Evaluation Results - A Study on Framework for Multimodal Intelligent Interfaces Using Emotion Data

good). The evaluations were performed from several different viewpoints.

6.5.1 Evaluation Results for Visualization Method

The first examination investigates the usefulness of the proposed visualization method. As a comparative example, we prepared a bar-chart visualization, shown in Figure 6.3. The examinees answered several questions about the proposed visualization method, comparing it with the bar-chart visualization.

Question 1. Can you immediately understand how the performer feels?

Table 6.1 shows the answers to this question. All the examinees gave a higher score to the proposed method than to the bar charts. This result suggests that the proposed visualization method helps viewers quickly comprehend the user’s feelings.

Table 6.1: Evaluation Results about the Understandability of Relationships of Whole Body and Emotions.

Question 2. Can you immediately understand what kind of emotion category each body part means?

Table 6.2 shows the evaluation results. As with Question 1, all the examinees gave a higher score to the proposed method. This result suggests that a human avatar shows the emotion of each body part more effectively than text labels for the body parts alone.

Table 6.2: Evaluation Results about Understandability of Relationships of Body Parts and Emotions.

Question 3. Can you understand the changes of emotion states?

Table 6.3 shows the evaluation results. As with Questions 1 and 2, all the examinees gave a higher score to the proposed system. This result indicates that changes in emotion are easy to follow when they are shown together with the motions.

Question 4. Can you understand the relationships of body movements and emotion categories?

The evaluation results for this question are shown in Table 6.4. All the examinees gave a higher score to the proposed method. This result suggests that showing emotion and body movements simultaneously makes the relationship between them easy to understand.

Table 6.3: Evaluation Results about Understandability of the Changes of Motions and Emotions.
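The per-question comparisons in this section all rest on the same check: whether every examinee scored the proposed method higher than the bar charts. The following is a minimal sketch of that check, not the procedure used in the study. The function names and the integer rating scale are assumptions, and the scores are placeholders rather than the values in Tables 6.1 to 6.4.

```python
from statistics import mean
from typing import Sequence


def all_prefer_proposed(proposed: Sequence[int], baseline: Sequence[int]) -> bool:
    """Return True when every examinee scored the proposed method strictly higher."""
    if len(proposed) != len(baseline):
        raise ValueError("each examinee needs one score per method")
    return all(p > b for p, b in zip(proposed, baseline))


def mean_gain(proposed: Sequence[int], baseline: Sequence[int]) -> float:
    """Average per-examinee difference (proposed minus baseline)."""
    if len(proposed) != len(baseline) or not proposed:
        raise ValueError("need matching, non-empty score lists")
    return mean([p - b for p, b in zip(proposed, baseline)])


# Placeholder scores for three hypothetical examinees.
# These are NOT the values reported in the dissertation's tables.
proposed_scores = [4, 5, 4]
bar_chart_scores = [2, 3, 3]

print(all_prefer_proposed(proposed_scores, bar_chart_scores))
print(mean_gain(proposed_scores, bar_chart_scores))
```

Pairing the scores per examinee, instead of comparing two overall averages, matches the way the results are stated in the text ("all the examinees gave a higher score").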

6.5.2 Evaluation Results for Fusion Process

This evaluation investigates the benefit of using linked data. We prepared two example systems. The first did not reference linked data in the fusion process. The second updated the reliability by referencing linked data. Several evaluators answered the following questions, comparing the two examples.

Question 1. Can you immediately understand the current emotion state?

The evaluation result is shown in Table 6.5. Almost all the evaluators gave a higher score to the method that references linked data. The result indicates the usefulness of the linked data for the fusion process.

Table 6.4: Evaluation Results about Understandability of Relationships between Motions and Emotions.

Question 2. Can you clearly understand the changes of emotion states?

Table 6.6 shows the results. As with Question 1, most evaluators gave a higher score to the method that references linked data. This result also indicates the usefulness of linked data for the fusion process.
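The difference between the two example systems compared in this section can be illustrated with a small sketch. It assumes that fusion is a weighted average of per-modality emotion scores and that the reliability values act as the weights; the dissertation's actual fusion rule is not restated here, so this is only one plausible reading. The function name `fuse`, the modality keys, and all numbers are hypothetical, and the `reliability` dictionary stands in for values that the real system would obtain by referencing linked data.

```python
from typing import Dict, Optional

Scores = Dict[str, float]


def fuse(per_modality: Dict[str, Scores],
         reliability: Optional[Dict[str, float]] = None) -> Scores:
    """Weighted average of per-modality emotion scores.

    With reliability=None every modality gets weight 1.0, which mimics the
    example system that does not reference linked data.
    """
    fused: Scores = {}
    total_weight = 0.0
    for modality, scores in per_modality.items():
        weight = 1.0 if reliability is None else reliability.get(modality, 1.0)
        total_weight += weight
        for emotion, value in scores.items():
            fused[emotion] = fused.get(emotion, 0.0) + weight * value
    if total_weight == 0.0:
        raise ValueError("at least one modality needs a non-zero weight")
    return {emotion: value / total_weight for emotion, value in fused.items()}


# Hypothetical observations from the three input devices.
observations = {
    "voice": {"joy": 0.2, "sadness": 0.8},
    "face": {"joy": 0.9, "sadness": 0.1},
    "motion": {"joy": 0.7, "sadness": 0.3},
}
# Hypothetical reliabilities; in the real system these would come from linked data.
looked_up_reliability = {"voice": 0.2, "face": 1.0, "motion": 0.8}

print(fuse(observations))                         # without reliability weights
print(fuse(observations, looked_up_reliability))  # with reliability weights
```

In this toy input the voice channel disagrees with the other two, so lowering its weight moves the fused result toward the emotion shown by the face and the motion. This is the kind of effect the evaluators were asked to judge.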

6.5.3 Evaluation Results for Practical Usability

The last evaluation concerns practical usability. Because people can use words skillfully in conversation, we sometimes cannot judge how a speaker feels from the spoken content alone.

The three actions explained in Section 6.4 were performed and shown to the evaluators. The purpose of this experiment is to investigate whether the visualization system enables examinees to see which performance truly reflects the feeling expressed in the spoken content. Table 6.7 shows the evaluation results. Almost all the evaluators gave the same answer. The results show that the system helps viewers understand a performer's true feelings from his/her performance.

Table 6.5: Evaluation Results of the Reference Method for Understandability.

Table 6.6: Evaluation Results of Reference Method for Emotion State Changes.

Table 6.7: Evaluation Results of Which Motion Truly Expresses JOY.

Chapter 7 Summary and Future Works

This dissertation has described methods for building linked data and a software architecture that communicates with linked datasets in order to develop multimodal intelligent interface systems. As an application example, we also presented an emotion visualization system built on the proposed framework. The evaluation results of the system indicate that the proposed framework, which comprises the software architecture and the linked data built with the proposed methods, is useful for multimodal interface systems.

7.1 Compendium

Chapter 1 introduced the research background and research purposes. Most of the technical terms and related work of our research were described in this chapter.

Chapter 2 explained the proposed software architecture for multimodal intelligent interface systems. The main concept of the framework is to actively adopt Web standard technologies such as WebSocket, HTTP and Linked Data. Chapter 3 described the proposed method of building linked data as knowledge bases for intelligent interfaces. The linked data represent relationships between emotion words and their emotion intensities, which are the strengths of the impressions that humans receive from emotion words. We conducted several experiments to confirm that linked data are useful for emotion recognition that analyzes the emotion intensities of emotion words. In this chapter, as an application example, a facial expression system using the linked data was shown. In Chapter 4, we proposed a method of building linked data that describe relationships between the positional states of the Action Units of a facial expression and the corresponding human emotions. We performed an experiment to investigate which Action Units have higher reliability for emotion recognition, and built linked data describing that reliability. The approach of Chapter 5 was almost the same as that of Chapter 4. We explained a method that constructs linked data for walking motions and emotions. The linked data also contained the analyzed and experimental data. Chapter 6 presented the emotion visualization system that uses the software architecture and linked data proposed in Chapters 2, 3, 4 and 5. The system had multimodal interfaces, namely a voice input device, a camera device and a motion capture device, and showed current emotion states using a color circle on a mobile device. The evaluation results of the system and the experimental results indicate that the proposed framework is useful for multimodal intelligent interface systems.
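To make the summary of Chapter 3 more concrete, the following sketch shows how a relationship between an emotion word and its emotion intensity could be written as linked data in N-Triples form. It is an illustration under stated assumptions, not the vocabulary built in the dissertation: the namespace `http://example.org/emotion#`, the property names `category` and `intensity`, the 0 to 1 intensity range, and the sample word are all hypothetical.

```python
from typing import List, NamedTuple
from urllib.parse import quote

EX = "http://example.org/emotion#"  # hypothetical namespace, not the author's
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


class EmotionWord(NamedTuple):
    word: str         # the emotion word itself
    category: str     # emotion category the word belongs to
    intensity: float  # assumed to lie in the range 0.0 to 1.0


def to_ntriples(entry: EmotionWord) -> List[str]:
    """Serialize one emotion word as two N-Triples statements."""
    # Percent-encode the word so that non-ASCII words still yield a valid IRI.
    subject = f"<{EX}word/{quote(entry.word, safe='')}>"
    return [
        f"{subject} <{EX}category> <{EX}{entry.category}> .",
        f'{subject} <{EX}intensity> "{entry.intensity}"^^<{XSD_DOUBLE}> .',
    ]


# Hypothetical entry; the intensity value is a placeholder.
for line in to_ntriples(EmotionWord("delighted", "joy", 0.8)):
    print(line)
```

Because each statement is a self-contained line, datasets of this kind can be published on the Web and merged with other linked data without a shared schema file, which is the property the framework relies on.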

7.2 Future Works and Prospective Application Example

We have argued that it is important for multimodal interface systems to communicate with Web resources represented as linked data. When various types of linked data about personal information are published on the Web and shared in standardized data formats that computers can understand, our proposed approach provides powerful tools for analyzing various types of sensor data. The analysis can use not only databases of appearance information, e.g. facial expressions and gestures, but also databases of the user’s background information that changes dynamically in his/her daily life, e.g. health condition and daily work schedule. In addition to this information, computers can analyze the data by using global datasets represented as linked data, such as geolocation information and climate information.

As one future vision of this framework, we introduce a prospective application example that uses our proposed framework, shown in Figure 7.1. The application is a robot that cooks and serves dishes to its owner. The robot recognizes the owner’s apparent information, e.g. facial expressions and the contents of conversations. When it analyzes the apparent information, the robot also searches the owner’s background information, such as health condition and yesterday's lunch menu, by using linked data of the owner’s personal information. By crawling the Web of Data, the robot can add generic information to the analysis, e.g. the current ambient temperature and the geolocation of the owner. The robot integrates all this information and decides on a menu suited to the owner’s current situation.

Efforts to publish information as linked data are still at an early stage. We should publish various types of linked data so that computers can analyze and understand them effectively. If we publish our personal information in a data format that computers can read, computers will be able to take more intelligent actions to support us.

Our proposed framework for multimodal intelligent interface systems will show its full ability in emotion recognition and inference when many linked datasets of personal information are published on the Web and shared among computers. We hope that the procedures and concepts of our research can help researchers and software developers implement such multimodal intelligent interface systems.

Figure 7.1: Prospective Application.

