Literature Review - Tohoku University Institutional Repository TOUR

5.2. Literature Review

they consider the valence of the attributes, either as positive and negative (Decker and Trusov, 2010; Qi et al., 2016; Xiao, Wei, and Dong, 2016) or at three or more levels (Bi et al., 2019, and this study). As discussed in the previous section, the consideration of attribute valence is based on the finding that consumers generally have different preference structures for overall satisfaction depending on whether they are satisfied or unsatisfied with individual product attributes (e.g., the Kano model; Kano et al., 1984).

Also, Xiao, Wei, and Dong (2016) and this study incorporate brand heterogeneity into the preference model by introducing heterogeneous coefficients that capture consumers' varying sensitivity for each brand.
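One way to make the roles of attribute valence and brand heterogeneity concrete is an ordered probit specification with valence-specific and brand-specific coefficients. The following is an illustrative sketch only: all symbols (the latent satisfaction y_i^*, the positive and negative mention indicators x^+_{ia} and x^-_{ia} for attribute a in review i, the brand index b(i), and the cut points \tau_k) are notation assumed here, not the specification of any cited paper or of the proposed model.

```latex
y_i^{*} = \sum_{a} \left( \beta^{+}_{b(i),a}\, x^{+}_{ia} + \beta^{-}_{b(i),a}\, x^{-}_{ia} \right) + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, 1),
\qquad y_i = k \iff \tau_{k-1} < y_i^{*} \le \tau_k
```

Under this assumed notation, an asymmetric preference structure in the sense of the Kano model corresponds to |\beta^{+}_{b,a}| \neq |\beta^{-}_{b,a}|, and brand heterogeneity corresponds to coefficients that vary with the brand index b.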

Furthermore, a notable feature of this study is that we consider the effects of consumer attributes such as age and reviewer status. The proposed models introduce two different effects of consumer attributes on the satisfaction structure: (i) a direct effect on review ratings, obtained by adding consumer attributes to the preference structure model as explanatory variables, and (ii) an indirect effect that operates through the importance of product attributes to the review writers, captured by an extended LDA with a hierarchical structure. Together, these make it possible to distinguish the direct effects of product attributes on the preference structure from general trends associated with consumer attributes, and to understand how the importance of product attributes varies across consumers. Thus, the proposed models can provide more valuable results for marketing activities than existing approaches can.

Some studies carry out these two processes sequentially (Decker and Trusov, 2010; Qi et al., 2016; Xiao, Wei, and Dong, 2016), while others estimate a single model that integrates them (Buschken2017; Büschken and Allenby, 2016). The advantage of the sequential approach is that the additional human tasks described above can improve the accuracy of attribute extraction by statistical models such as LDA and CRF. The advantage of the integrated approach is that it generally fits the data better than the sequential approach, because the integrated model finds the dimensions of product attributes while taking into account their impacts on the preference structure. Despite these advantages, the integrated approach does not involve the additional tasks of the sequential approach, and therefore we address some issues of simple text models such as LDA. One of the issues is that the model does not take word order and grammar into account, because it treats a unit of text data (such as a sentence or document) as a bag of words.
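The bag-of-words limitation can be illustrated with a minimal sketch. The two review sentences below are invented for illustration; they express opposite opinions but produce identical word counts once order is discarded.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and count tokens; word order is discarded."""
    return Counter(text.lower().split())

# Two hypothetical review sentences with opposite meanings.
a = "the battery is good but the screen is bad"
b = "the battery is bad but the screen is good"

# The sentences differ as strings but are indistinguishable as bags of words.
same_bag = bag_of_words(a) == bag_of_words(b)
print(same_bag)  # True
```

Any model that only sees these counts, as the conventional LDA does, cannot tell which attribute was praised and which was criticized.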

Büschken and Allenby (2016) tackle the problem by extending the conventional LDA model to assign topics not to a word but to a sentence, that is, an n-gram model (here, n is the number of words in a sentence). They consider not the word order but the joint distribution of the words that make up a sentence, which relaxes the bag-of-words limitation of the conventional LDA model. Also, Buschken2017 propose the auto-correlated topics LDA model, which allows the latent topic of the current word to carry over to the topic of the next word; that is, it considers a bi-gram model.
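The difference between these topic-assignment schemes can be sketched as toy generative processes. This is a simplified illustration, not the cited authors' specifications: the sizes, the Dirichlet draws, and in particular the carry-over probability `rho` are hypothetical choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)

n_topics, vocab_size = 3, 8
theta = rng.dirichlet(np.ones(n_topics))                 # document-topic proportions
phi = rng.dirichlet(np.ones(vocab_size), size=n_topics)  # topic-word distributions

def word_level_topics(sentence_lengths):
    """Conventional LDA: every word draws its own topic independently."""
    return [rng.choice(n_topics, size=n, p=theta).tolist() for n in sentence_lengths]

def sentence_level_topics(sentence_lengths):
    """Sentence-constrained sketch: one topic per sentence, shared by all its words."""
    return [[int(rng.choice(n_topics, p=theta))] * n for n in sentence_lengths]

def carry_over_topics(n_words, rho=0.7):
    """Auto-correlated sketch: a word keeps the previous word's topic
    with (hypothetical) probability rho, otherwise draws a fresh topic."""
    topics = [int(rng.choice(n_topics, p=theta))]
    for _ in range(n_words - 1):
        if rng.random() < rho:
            topics.append(topics[-1])
        else:
            topics.append(int(rng.choice(n_topics, p=theta)))
    return topics

def generate_words(topic_assignments):
    """Draw one word id per topic assignment from the matching topic-word distribution."""
    return [[int(rng.choice(vocab_size, p=phi[z])) for z in sent] for sent in topic_assignments]

sentences = sentence_level_topics([4, 3, 5])
words = generate_words(sentences)
```

In the sentence-level scheme all words of a sentence are generated jointly from one topic, whereas the carry-over scheme only links adjacent words, which is why the text describes them as n-gram and bi-gram models respectively.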

In this study, we propose a word embedding model that takes word topics into account to extract product attributes from the review text while relaxing the bag-of-words limitation. The word embedding model, or word2vec (Mikolov et al., 2013), was proposed in the field of natural language processing and has attracted attention because of its strong performance in many natural language tasks, such as text summarization and translation; it is still being actively studied, and many extended models have been proposed. Compared with existing text models, the standout features of the word2vec model are that it treats words as dense vectors, rather than sparse vectors such as one-hot encodings, by embedding them in a feature space, and that it models the word generation process while taking word order into account through the skip-gram model. As for the former, it flexibly incorporates the meanings of words into the statistical model by defining the word generation probability while representing a single word as a vector with hundreds of dimensions. As for the latter, the skip-gram considers the conditional probability of the focal word given its surrounding words. In predicting the appropriate word in a sentence, for example, “I . . . a student.” (where . . . is the masked word to be predicted), the skip-gram defines the conditional probability p(. . . | I, a, student). Therefore, p(am | I, a, student) is expected to be larger than p(are | I, a, student) if the model learns good embedding representations.
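The conditional probability of a masked word given its surrounding words can be sketched with a softmax over dense vectors. This is a minimal illustration under stated assumptions, not the proposed model: the vocabulary, the dimension, and the random (untrained) vectors are hypothetical, and averaging the context vectors follows the CBOW-style parameterization of Mikolov et al. (2013), whose skip-gram variant instead predicts each surrounding word from the focal word.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["i", "am", "are", "a", "student"]
index = {w: i for i, w in enumerate(vocab)}
dim = 4

# Hypothetical, untrained embeddings: random values stand in for learned vectors.
input_vecs = rng.normal(size=(len(vocab), dim))   # vectors used for context words
output_vecs = rng.normal(size=(len(vocab), dim))  # vectors used for the predicted word

def masked_word_probs(context):
    """Softmax over the vocabulary given the mean of the context word vectors."""
    h = input_vecs[[index[w] for w in context]].mean(axis=0)
    scores = output_vecs @ h
    scores -= scores.max()  # subtract the maximum for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

# p(. | i, a, student) for every word in the toy vocabulary.
probs = masked_word_probs(["i", "a", "student"])
```

With trained embeddings one would expect `probs[index["am"]]` to exceed `probs[index["are"]]`; with the random vectors used here no such ordering is guaranteed.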

Another limitation of the conventional LDA model is the interpretability of the estimated topics. Previous studies using LDA have reported topics that can be interpreted through the semantic coherence of the words assigned to them. However, LDA itself has no mechanism to guarantee the interpretability of topics, and in practice we often face situations in which we are unable to interpret the meaning of topics even if the model estimates them appropriately. Such issues have not been given much consideration in the field of customer review analysis, but in the field of natural language processing, some studies have evaluated the interpretability of the LDA model using coherence measures, as discussed by Mimno et al. (2011).
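The coherence measure of Mimno et al. (2011) scores a topic by how often its top words co-occur in documents: for top words v_1, ..., v_M ordered by probability, it sums log((D(v_m, v_l) + 1) / D(v_l)) over all pairs with l < m, where D counts the documents containing the given word or words. A minimal sketch follows; the toy documents and word lists are invented for illustration.

```python
import math

def umass_coherence(top_words, documents):
    """Sum of log((D(v_m, v_l) + 1) / D(v_l)) over ordered pairs of top words.

    Assumes every top word occurs in at least one document, so D(v_l) > 0.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co_freq = doc_freq(top_words[m], top_words[l])
            score += math.log((co_freq + 1) / doc_freq(top_words[l]))
    return score

# Hypothetical tokenized reviews.
docs = [
    ["battery", "life", "long"],
    ["battery", "charge", "fast"],
    ["screen", "bright", "battery"],
    ["screen", "resolution", "bright"],
]

coherent = umass_coherence(["screen", "bright"], docs)        # words that co-occur
incoherent = umass_coherence(["battery", "resolution"], docs)  # words that never co-occur
```

Higher (less negative) values indicate that the top words of a topic tend to appear together, which is used as a proxy for interpretability.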

In this study, we use vectors pre-trained on a large corpus as the initial values of the word vectors when training the proposed models, in order to improve the interpretability of the LDA model. In the field of natural language processing, word embedding models trained on large corpora such as Wikipedia, for example GloVe (Pennington, Socher, and Manning, 2014) and BERT (Devlin et al., 2019), have been published.

Moreover, reducing training costs by giving such pre-trained vectors to the model as initial values is a common approach. Although the use of pre-trained models is generally considered to contribute to improved performance and faster estimation for the task at hand, this study expects it to also contribute to improved topic interpretability. This is because initializing the topic assignments with vector representations of the general meaning and use of words is expected to allow the model to be optimized appropriately and quickly, so that it comes to encompass the meanings unique to the focal domain, compared with randomly initializing the assignments in the absence of any prior information.
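Initializing word vectors from pre-trained values can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this study: the function name `init_embeddings`, the small random fallback for out-of-vocabulary words, and the three-dimensional toy vectors (standing in for vectors loaded from a published model such as GloVe) are all hypothetical.

```python
import numpy as np

def init_embeddings(vocab, pretrained, dim, seed=0, scale=0.1):
    """Build an initial embedding matrix: copy pre-trained vectors where
    available, and fall back to small random values otherwise."""
    rng = np.random.default_rng(seed)
    matrix = rng.normal(scale=scale, size=(len(vocab), dim))
    found = 0
    for i, word in enumerate(vocab):
        vec = pretrained.get(word)
        if vec is not None:
            matrix[i] = vec
            found += 1
    return matrix, found

# Hypothetical stand-in for vectors loaded from a published pre-trained model.
pretrained = {
    "battery": np.array([0.2, -0.1, 0.4]),
    "screen": np.array([0.1, 0.3, -0.2]),
}
vocab = ["battery", "screen", "unboxing"]

matrix, found = init_embeddings(vocab, pretrained, dim=3)
```

Training then starts from `matrix` instead of a fully random initialization, so words covered by the pre-trained model begin with their general-purpose representations and are adapted to the review domain during estimation.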

Finally, Table 5.1 summarizes the above discussion, comparing this study with existing studies from several viewpoints of customer review analysis.

TABLE 5.1: Comparison with existing studies

| Papers | Factors on overall satisfaction: product attributes (valence of attributes) | Factors on overall satisfaction: consumer attributes | Factors on overall satisfaction: others | Model for preference estimation | Method of attribute extraction | Whether the order of words is considered | Preference estimation and attribute extraction |
|---|---|---|---|---|---|---|---|
| Decker and Trusov (2010) | Positive and negative | - | - | Poisson regression, negative binomial regression | Association rule, conditional random field with handcrafted lexicon | Considered (bi-gram model) | Separated |
| Archak, Ghose, and Ipeirotis (2011) | Levels of attributes vary for its evaluations | - | Price, review statistics | Log-linear regression | Clustering similar words using WordNet, dependency analysis | Considered (dependency between attributes and valence) | Separated |
| Qi et al. (2016) | Positive and negative | - | - | Linear regression | LDA, PageRank, handcrafted lexicon | Ignored (uni-gram model) | Separated |
| Xiao, Wei, and Dong (2016) | Positive and negative | - | Brand intercept | Ordered probit regression with heteroskedasticity and heterogeneity | Manual transformation, automated similarity calculation | Ignored (uni-gram model) | Separated |
| Büschken and Allenby (2016) | - | - | - | Ordered probit regression | Sentence-constrained LDA | Considered (n-gram model) | Integrated |
| Buschken2017 | - | - | - | Ordered probit regression | Auto-correlated topics LDA | Considered (bi-gram model) | Integrated |
| Bi et al. (2019) | 5 levels of valence | - | - | Ensemble neural network | LDA, manual integration of topics | Ignored (uni-gram model) | Separated |
| This study | Positive, neutral, and negative | Considered | Brand heterogeneity | Ordered probit regression | Word embedding model considering word topics and sentiments | Considered (skip-gram model) | Integrated |
