Chapter 8. Humor Detection

TABLE 8.4: Results of my proposed method

Category               Precision   Recall    F1-score
Positive               89.79%      88.00%    88.89%*
Negative               78.57%      71.74%    74.99%*
Optimistic Humorous    79.71%      83.33%    81.48%*
Pessimistic Humorous   65.00%      72.22%    68.42%*

*p<0.05
by adding “optimistic humorous” and “pessimistic humorous” categories and considering Internet slang and emojis, the F1-score of each category exceeded that of the previous method. My proposed four-category sentiment analysis approach improves performance, showing that low-cost, small-scale data labeling can outperform a widely used state-of-the-art method when emoji and slang information is added to the learning process.
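As a quick check on how the F1-scores in Table 8.4 relate to the reported precision and recall, the sketch below recomputes F1 as the harmonic mean of the two. The function name `f1_score` and the dictionary layout are illustrative choices, not part of the thesis code. Because the inputs are the rounded values from the table, the last digit can differ from the reported figure: the Negative row comes out at 75.00% rather than 74.99%, presumably because the reported value was computed from unrounded precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall in percent, copied from Table 8.4.
reported = {
    "Positive": (89.79, 88.00),
    "Negative": (78.57, 71.74),
    "Optimistic Humorous": (79.71, 83.33),
    "Pessimistic Humorous": (65.00, 72.22),
}

# Recompute F1 per category and round to two decimals, as in the table.
f1_by_category = {
    name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()
}

for name, f1 in f1_by_category.items():
    print(f"{name}: {f1:.2f}%")
```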
classified by my proposed method as “optimistic humorous” while the baseline incorrectly recognized it as a negative one.
FIGURE 8.3: Example of correct classification of a humorous post.
This post and similar entries were usually posted as a comment on a GIF or video showing a referee who displays her or his skills in basketball by performing a slam dunk.
This entry seems to express an implied humorous nuance of exaggerated surprise at how good the referee was. Because the expression is accompanied by an emoji, the emoji information helps the classifier predict the implicit humorous meaning.
To address the humor detection problems encountered in previous research (Li et al., 2018), the fine-grained sentiment classification method I proposed can identify the emotions of Weibo posts more precisely. In Figure 8.4, I show an example of a microblog that was correctly classified by my proposed method as “optimistic humorous,” while the baseline recognized it as a positive one. As the evaluators agreed, it seems that this user wrote a joke just for fun, and my proposed method correctly recognized this kind of emotion.
FIGURE 8.4: Another example of correct classification of a humorous post.
Error analysis showed that the results for negative emotions and “pessimistic humorous” emotions in my proposed method were still relatively low, which is closely related to the difficulty of recognizing sarcasm and irony.
In the future, I plan to train more deep learning models and increase the amount of data to improve the detection of pessimistic humorous posts.
Furthermore, some posts were wrongly predicted because new slang was missing from both the parser’s dictionary and my slang lexicon, which had a clearly negative impact on the results. Ptaszynski et al. (2016) pointed out that a typical cause of the gradual decrease in performance of systems dealing with Internet language is that Internet slang is constantly changing. This point is also reflected in my research; in Figure 8.5 I show an example of a post misclassified as “pessimistic humorous” but labeled “optimistic humorous” by the annotators. Slang expressions such as miao bian (“changing in seconds”) and du ji tang (“poisonous chicken soup”) were parsed incorrectly, and one shifted character caused mis-recognition by the segmentation tool. Du ji tang is a slang term derived from ji tang and means “anti-motivational quotes” (for example: “Some are born great, some achieve greatness, and some wind up like you” or “I’m not lazy, I am just highly motivated to do nothing”). As an abbreviation of the “Chicken Soup for the Soul” book series, ji tang has been used in recent years to mean “motivational quotes”. Many new words similar to du ji tang emerge on social media every year. Adding new phrases to the slang lexicon is costly, and it is not enough to keep up with the speed of Internet slang evolution. To deal with this phenomenon, a character-level contextualized word embedding method (for example, a pre-trained Chinese BERT embedding model) could be considered in the next stage of this research.
FIGURE 8.5: An example of an “optimistic humorous” post misclassified as “pessimistic humorous”.
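The segmentation problem described above can be illustrated with a small sketch. It uses forward maximum matching, a simple dictionary-based segmenter chosen here only as a stand-in; the segmentation tool actually used in this research is not reproduced. The Chinese strings 秒变 and 毒鸡汤 are assumed renderings of miao bian and du ji tang, and both lexicons are hypothetical toy examples.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: take the longest lexicon entry
    starting at the current position, else fall back to one character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical base dictionary that knows "chicken soup" but no new slang.
base_lexicon = {"鸡汤"}
# Hypothetical slang lexicon entries (assumed characters for the pinyin in the text).
slang_lexicon = {"秒变", "毒鸡汤"}

text = "秒变毒鸡汤"

# Without slang entries, the slang terms are split into unrelated pieces.
without_slang = forward_max_match(text, base_lexicon)
# With slang entries, each slang term survives as a single token.
with_slang = forward_max_match(text, base_lexicon | slang_lexicon)

print(without_slang)
print(with_slang)
```

Without the slang entries, du ji tang is broken into a lone character plus the ordinary word ji tang, which reverses its sentiment. A character-level model sidesteps the lookup entirely, which is the motivation for the embedding approach suggested above.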
Chapter 9
Conclusions of the Thesis
In this study, I proposed the HEMOS (Humor-EMOji-Slang) system for fine-grained sentiment classification. I collected 576 frequent Chinese Internet slang expressions and created a slang lexicon; then, I converted the 109 Weibo emojis into textual features, creating a Chinese emoji lexicon. I also analyzed the sentiment polarity of Weibo emojis. I performed a series of experiments to verify the validity of both lexicons and the emoji polarities. Furthermore, by adding the new “optimistic humorous” and “pessimistic humorous” types, I created the basis for a new, four-level sentiment classification of Weibo posts. I applied both lexicons to my novel deep learning approach, an attention-based bi-directional long short-term memory recurrent neural network (AttBiLSTM), for more fine-grained sentiment analysis of Chinese social media. My experimental results show that the HEMOS system can significantly improve the performance of predicting sentiment polarity on Weibo.
To achieve an even more effective Chinese sentiment analysis method, I am going to increase the amount of labeled data to address the visibly lower results for the “pessimistic humorous” and “negative” categories in the proposed fine-grained classification.
In this study, I excluded images and videos. However, in further research, it would be interesting to add images to the data source. I assume this may enhance text sentiment analysis, since stickers and memes also carry emotions. To utilize such additional information, an image processing phase must be added during the preprocessing stage.
Moreover, during the data labeling phase, I found that posts by spammers (users who spread malicious links or commercial content) contain specific emojis notably more often than posts by regular users. Dealing with this problem could be an interesting research topic, and my methods could be useful for differentiating regular users from spammers.
My ultimate goal is to investigate how beneficial the newly introduced emotion-related features are for sentiment analysis by feeding them to a deep learning model, which should allow us to construct a high-quality sentiment recognizer for a wider spectrum of sentiment in the Chinese language.
Bibliography
Aldunate, Nerea and Roberto González-Ibáñez (2017). “An integrated review of emoticons in computer-mediated communication”. In: Frontiers in Psychology 7, p. 2061.
Attardo, Salvatore (2010). Linguistic theories of humor. Vol. 1. Walter de Gruyter.
Batty, Magali and Margot J Taylor (2003). “Early processing of the six basic facial emotional expressions”. In: Cognitive Brain Research 17.3, pp. 613–620.
Bengio, Yoshua et al. (2003). “A neural probabilistic language model”. In: Journal of machine learning research 3.Feb, pp. 1137–1155.
Bridle, John S (1990). “Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition”. In: Neurocomputing. Springer, pp. 227–236.
Chen, Yuxiao et al. (2018). “Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM”. In: 2018 ACM Multimedia Conference on Multimedia Conference. ACM, pp. 117–125.
Chen, Zhao et al. (2015). “Combining convolution neural network and word sentiment sequence features for Chinese text sentiment analysis”. In: Journal of Chinese Information Processing 29.6, pp. 172–178.
Cortes, Corinna and Vladimir Vapnik (1995). “Support-vector networks”. In: Machine learning 20.3, pp. 273–297.
Dhaka, Deepali and Monica Mehrotra (2019). “Cross-Domain Spam Detection in Social Media: A Survey”. In: International Conference on Emerging Technologies in Computer Engineering. Springer, pp. 98–112.
Dhande, Lina L and Girish K Patnaik (2014). “Analyzing sentiment of movie review data using Naive Bayes neural classifier”. In: International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) 3.4, pp. 313–320.
Ekman, Paul and Wallace V Friesen (1971). “Constants across cultures in the face and emotion.” In: Journal of personality and social psychology 17.2, p. 124.
Felbo, Bjarke et al. (2017). “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm”. In: arXiv preprint arXiv:1708.00524.
Furnham, Adrian (1984). “Tourism and culture shock”. In: Annals of Tourism Research 11.1, pp. 41–57.
Graves, Alex and Jürgen Schmidhuber (2005). “Framewise phoneme classification with bidirectional LSTM and other neural network architectures”. In: Neural Networks 18.5-6, pp. 602–610.
Guibon, Gaël, Magalie Ochs, and Patrice Bellot (2016). “From emojis to sentiment analysis”. In: WACAI 2016.
Ho, Tin Kam (1995). “Random decision forests”. In: Document analysis and recognition, 1995., proceedings of the third international conference on. Vol. 1. IEEE, pp. 278–282.
Hochreiter, Sepp and Jürgen Schmidhuber (1997). “Long short-term memory”. In: Neural Computation 9.8, pp. 1735–1780.
Jones, Graham M and Bambi B Schieffelin (2009). “Talking text and talking back: “My BFF Jill” from boob tube to YouTube”. In: Journal of Computer-Mediated Communication 14.4, pp. 1050–1079.
Jones, Jonathon D (2005). A complete analysis of Plato’s philosophy of humor. http://www.jonathonjones.com/papers/plato.pdf.
Kavanaugh, Andrea L et al. (2012). “Social media use by government: From the routine to the critical”. In: Government Information Quarterly 29.4, pp. 480–491.
Khan, Aurangzeb et al. (2010). “A review of machine learning algorithms for text-documents classification”. In: Journal of advances in information technology 1.1, pp. 4–20.
Kim, Yoon (2014). “Convolutional neural networks for sentence classification”. In: arXiv preprint arXiv:1408.5882.
Kulkarni, Vivek and William Yang Wang (2017). “TFW, DamnGina, Juvie, and Hotsie-Totsie: On the Linguistic and Social Aspects of Internet Slang”. In: arXiv preprint arXiv:1712.08291.
Li, Da, Rafal Rzepka, and Kenji Araki (2018). “Preliminary Analysis of Weibo Emojis for Sentiment Analysis of Chinese Social Media”. In: Proceedings of the 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 1J3–04.
Li, Da et al. (2018). “Emoticon-Aware Recurrent Neural Network Model for Chinese Sentiment Analysis”. In: The Ninth IEEE International Conference on Awareness Science and Technology (iCAST 2018), pp. 161–166.
Li, Da et al. (2019a). “A Novel Machine Learning-based Sentiment Analysis Method for Chinese Social Media Considering Chinese Slang Lexicon and Emoticons”. In: The AAAI-19 Workshop on Affective Content Analysis, AffCon 2019, CEUR WS Vol–2328, paper 10.
Li, Da et al. (2019b). “Emoji-Aware Attention-based Bi-directional GRU Network Model for Chinese Sentiment Analysis”. In: Joint Proceedings of the Workshops on Linguistic and Cognitive Approaches to Dialog Agents (LaCATODA 2019) and on Bridging the Gap Between Human and Automated Reasoning (BtG 2019) co-located with 28th International Joint Conference on Artificial Intelligence (IJCAI 2019), CEUR WS Vol–2452, paper 2.
Li, Ruijing et al. (2014). “A method of polarity computation of Chinese sentiment words based on gaussian distribution”. In: International Conference on Intelligent Text Processing and Computational Linguistics. Springer, pp. 53–61.
Liu, Shuhua Monica and Jiun-Hung Chen (2015). “A multi-label classification based approach for sentiment classification”. In: Expert Systems with Applications 42.3, pp. 1083–1093.
Manuel, K, Kishore Varma Indukuri, and P Radha Krishna (2010). “Analyzing internet slang for sentiment mining”. In: 2010 second Vaagdevi international conference on information Technology for Real World Problems. IEEE, pp. 9–11.
Martin, Rod A (2006). The psychology of humor. Elsevier.
Martin, Rod A and Thomas Ford (2018). The psychology of humor: An integrative approach. Academic Press.
Mathews, Lindsay (2016). Role of Humor in Emotion Regulation: Differential Effects of Adaptive and Maladaptive Forms of Humor. CUNY Academic Works.
Merity, Stephen et al. (2016). “Pointer sentinel mixture models”. In: arXiv preprint arXiv:1609.07843.
Mikolov, Tomas et al. (2013). “Efficient estimation of word representations in vector space”. In: arXiv preprint arXiv:1301.3781.
Novak, Petra Kralj et al. (2015). “Sentiment of emojis”. In: PLOS ONE 10.12, e0144296.
Peng, Haiyun, Erik Cambria, and Amir Hussain (2017). “A review of sentiment analysis research in Chinese language”. In: Cognitive Computation 9.4, pp. 423–435.
Ptaszynski, Michal et al. (2016). “Sustainable cyberbullying detection with category-maximized relevance of harmful phrases and double-filtered automatic optimization”. In: International Journal of Child-Computer Interaction 8, pp. 15–30.
Raskin, Victor (2012). Semantic mechanisms of humor. Vol. 24. Springer Science & Business Media.
Reyes, Antonio, Paolo Rosso, and Davide Buscaldi (2012). “From humor recognition to irony detection: The figurative language of social media”. In: Data & Knowledge Engineering 74, pp. 1–12.
Rish, Irina et al. (2001). “An empirical study of the naive Bayes classifier”. In: IJCAI 2001 workshop on empirical methods in artificial intelligence. Vol. 3. 22. IBM New York, pp. 41–46.
Ruch, Willibald (1993). Exhilaration and humor. Vol. 1. The Guilford Press, pp. 605–616.
Rzepka, Rafal, Noriyuki Okumura, and Michal Ptaszynski (2017). “Worlds Linking Faces – Meaning and Possibilities of Contemporary Pictograms (in Japanese)”. In: Journal of the Japanese Society for Artificial Intelligence, pp. 350–355.
Sharma, Priyanka and Manavjeet Kaur (2013). “Classification in pattern recognition: A review”. In: International Journal of Advanced Research in Computer Science and Software Engineering 3.4.
Sharma, Raja (2011). Comedy in New Light-Literary Studies. Lulu.com.
Soliman, Taysir Hassan et al. (2014). “Sentiment analysis of Arabic slang comments on facebook”. In: International Journal of Computers & Technology 12.5, pp. 3470–3478.
Sukhbaatar, Sainbayar, Jason Weston, Rob Fergus, et al. (2015). “End-to-end memory networks”. In: Advances in neural information processing systems, pp. 2440–2448.
Tan, Songbo and Jin Zhang (2008). “An empirical study of sentiment analysis for Chinese documents”. In: Expert Systems with applications 34.4, pp. 2622–2629.
Tang, Wenbing, Zuohua Ding, and Mengchu Zhou (2019). “A spammer identification method for class imbalanced weibo datasets”. In: IEEE Access 7, pp. 29193–29201.
Vinodhini, G and RM Chandrasekaran (2012). “Sentiment analysis and opinion mining: a survey”. In: International Journal 2.6, pp. 282–292.
Wang, Xinyu et al. (2013). “A depression detection model based on sentiment analysis in micro-blog social network”. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp. 201–213.
Weinberger, Kilian Q and Lawrence K Saul (2009). “Distance metric learning for large margin nearest neighbor classification”. In: Journal of Machine Learning Research 10.Feb, pp. 207–244.
Wu, Liang, Fred Morstatter, and Huan Liu (2016). “Slangsd: Building and using a sentiment dictionary of slang words for short-text sentiment classification”. In: arXiv preprint arXiv:1608.05129.
Yang, Zichao et al. (2016). “Hierarchical attention networks for document classification”. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1480–1489.
Yu, Hsiang-Fu, Fang-Lan Huang, and Chih-Jen Lin (2011). “Dual coordinate descent methods for logistic regression and maximum entropy models”. In: Machine Learning 85.1-2, pp. 41–75.
Yue, Xiao Dong (2010). Exploration of Chinese humor: Historical review, empirical findings, and critical reflections.
Yue, Xiaodong (2014). “The attitudes towards humor of Confucian, Buddhist, and Taoist culture”. In: Psychological Exploration, pp. 1–5.
Zagibalov, Taras and John Carroll (2008). “Automatic seed word selection for unsupervised sentiment classification of Chinese text”. In: Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1. Association for Computational Linguistics, pp. 1073–1080.
Zhang, Changli et al. (2009). “Sentiment analysis of Chinese documents: From sentence to document level”. In: Journal of the American Society for Information Science and Technology 60.12, pp. 2474–2487.
Zhao, Jichang et al. (2012). “Moodlens: an emoticon-based sentiment analysis system for chinese tweets”. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1528–1531.
Zhao, Peijun et al. (2018). “Analyzing and Predicting Emoji Usages in Social Media”. In: Companion of the The Web Conference 2018 on The Web Conference 2018. International World Wide Web Conferences Steering Committee, pp. 327–334.
Zhuo, Shaojian, Xing Wu, and Xiangfeng Luo (2014). “Chinese text sentiment analysis based on fuzzy semantic model”. In: Cognitive Informatics & Cognitive Computing (ICCI*CC), 2014 IEEE 13th International Conference on. IEEE, pp. 535–540.