
Recognition of expressive speech based on a multi-layer model for perception


JAIST Repository

https://dspace.jaist.ac.jp/

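Title 多層知覚モデルに基づく音声中に含まれる感情の認識に関する研究 (A study on the recognition of emotions contained in speech based on a multi-layer perception model)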

Author(s) Aoki, Yuusuke (青木, 祐介)

Citation

Issue Date 2009‑03

Type Thesis or Dissertation

Text version author

URL http://hdl.handle.net/10119/8111

Rights

Description Supervisor: Masato Akagi (赤木正人), School of Information Science, Master's degree


Recognition of expressive speech based on a multi-layer model for perception

Yuusuke Aoki (0710001)

School of Information Science, Japan Advanced Institute of Science and Technology

February 5, 2009

Keywords: expressive speech, emotion cognition, multi-layer model, rule-based, fuzzy inference system (FIS).

1 Introduction

We communicate through speech, from which listeners perceive various kinds of information. The information in speech is roughly divided into linguistic information, which conveys the content the speaker intends to communicate, and non-linguistic information, which includes individuality, emotion, and dialect.

Recently, speech interfaces have been much in demand. In the future, speech interfaces that handle non-linguistic information are expected to imitate the human perception mechanism. Speech applications involving non-linguistic information can reinforce both human-human and human-machine communication. We focus on emotion within non-linguistic information because emotion is a special element that does not depend on the content of utterances and reflects the speaker's intention, which is useful in communication.

In traditional research on emotion recognition, acoustic features are mapped directly onto a single category. Phonemes, speaker individuality, and similar attributes can indeed be mapped onto one category. Emotions, however, cannot, because humans usually perceive multiple emotions in a single utterance. Moreover, speech contains various emotions at various intensities. In the human sensory process, emotion is perceived by first perceiving vague semantic primitives from acoustic features and then combining these semantic primitives. It is therefore quite difficult to recognize the emotions in speech with simple mapping-based techniques.

Copyright © 2009 by Yuusuke Aoki

To imitate the human perception mechanism, we adopt the multi-layer perception model for emotional speech proposed by Huang and Akagi. In this study, we aim to recognize emotions using this model.

2 Emotion recognition system

To recognize emotions by imitating the human perception mechanism, we construct an emotion recognition system using the multi-layer perception model [1][2] and a Fuzzy Inference System (FIS) [3]. The multi-layer perception model for emotional speech was constructed to model vague human perception. It employs a three-layer structure to express the perception process from acoustic features to emotion: a semantic primitive layer sits between the acoustic feature layer and the expressive speech layer, and emotion perception is modeled as a combination of semantic primitives. The model can therefore track how the emotion layer changes as the semantic primitives change. To connect these layers, Huang and Akagi identified the strongly connected elements between acoustic features and semantic primitives using correlations, and those between semantic primitives and emotional perception using FIS.
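As a rough illustration of the correlation step described above, the following sketch keeps only the acoustic features whose correlation with a semantic primitive's ratings is strong. The feature names (`mean_f0`, `power_range`), the primitive (`bright`), the toy values, and the 0.6 threshold are all hypothetical; the source does not state which features, primitives, or threshold were used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / (sx * sy)


def select_features(features, primitive_ratings, threshold=0.6):
    """Map each feature whose |r| meets the threshold to its correlation.

    `threshold` is an assumed cut-off, not a value taken from the source.
    """
    selected = {}
    for name, values in features.items():
        r = pearson(values, primitive_ratings)
        if abs(r) >= threshold:
            selected[name] = r
    return selected


# Hypothetical toy data: one value per utterance (5 utterances).
features = {
    "mean_f0": [110.0, 150.0, 180.0, 210.0, 240.0],
    "power_range": [30.0, 28.0, 33.0, 29.0, 31.0],
}
bright = [1.0, 2.0, 2.5, 3.5, 4.0]  # listener ratings of the primitive "bright"

selected = select_features(features, bright)
print(sorted(selected))
```

With this toy data only `mean_f0` survives the cut, because `power_range` varies almost independently of the ratings.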

FIS, which combines symbolic and numeric processing, represents vague, experience-based human knowledge in IF-THEN form. It can therefore express vague human judgment.
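To make the IF-THEN form concrete, here is a minimal sketch of a fuzzy inference step that maps two semantic primitive ratings to one emotion intensity. It assumes a zero-order Sugeno-style system with triangular membership functions and `min` as the AND operator; the source does not specify the FIS type, and the primitives (`bright`, `fast`), the 0 to 4 rating scale, and the four rules are hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 at a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def low(x):
    """Degree to which a 0..4 rating is 'low' (1 at 0, 0 at 4)."""
    return tri(x, -4.0, 0.0, 4.0)


def high(x):
    """Degree to which a 0..4 rating is 'high' (0 at 0, 1 at 4)."""
    return tri(x, 0.0, 4.0, 8.0)


def infer_joy(bright, fast):
    """Infer a joy intensity on a 0..4 scale from two primitive ratings."""
    # Each rule is (firing strength, constant output); AND is taken as min.
    rules = [
        (min(high(bright), high(fast)), 4.0),  # IF bright high AND fast high THEN joy 4
        (min(low(bright), low(fast)), 0.0),    # IF bright low  AND fast low  THEN joy 0
        (min(high(bright), low(fast)), 2.0),   # IF bright high AND fast low  THEN joy 2
        (min(low(bright), high(fast)), 1.0),   # IF bright low  AND fast high THEN joy 1
    ]
    total = sum(weight for weight, _ in rules)
    if total == 0.0:
        return 0.0  # no rule fired
    # Defuzzify with the firing-strength-weighted average of the rule outputs.
    return sum(weight * out for weight, out in rules) / total


print(infer_joy(4.0, 4.0), infer_joy(2.0, 2.0), infer_joy(0.0, 0.0))
```

Because every rule fires to a degree rather than all-or-nothing, intermediate ratings yield intermediate intensities, which is the property that lets such a system express graded, vague judgments.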

We aim for the recognition system to imitate the human perception mechanism by using the multi-layer model and FIS. First, we extract acoustic features, semantic primitives, and emotional perception results from all utterances of the Fujitsu Laboratory database. The acoustic features are extracted using STRAIGHT [4]. The semantic primitives and emotional perception results are obtained by subjective assessment in listening tests. Finally, the multi-layer system is constructed by combining multiple FISs.

3 Experimental evaluation

To investigate the performance of the constructed recognition system, we compare it with traditional recognition systems. To evaluate the effectiveness of the multi-layer, multiple-FIS model, we construct two baselines: a two-layer model, for comparison with the multi-layer model, and a recognition system using Multiple Regression Analysis (MRA), for comparison with the system using FIS.

Moreover, the recognition system that combines the multi-layer model and FIS is compared with the system that combines the two-layer model and MRA. By comparing FIS and MRA, we investigate whether the system can imitate the vague process of human perception; by comparing the multi-layer and two-layer models, we investigate whether it can express the sensory process of human perception in terms of semantic primitives. Recognition accuracy of these systems was measured by Euclidean distance on the absolute scale and by correlation on the relative scale.
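The two accuracy measures can be sketched as follows. This assumes each system outputs one intensity value per item and that listener ratings serve as the reference; the source does not say over which vectors the measures were computed, and the toy values are hypothetical.

```python
import math


def euclidean_distance(predicted, reference):
    """Absolute-scale error: straight-line distance between the two vectors."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)))


def pearson(xs, ys):
    """Relative-scale agreement: Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / (sx * sy)


# Hypothetical toy data: the prediction is the reference shifted up by 1.
reference = [0.0, 1.0, 3.0, 4.0]  # listener ratings
predicted = [1.0, 2.0, 4.0, 5.0]  # system outputs

distance = euclidean_distance(predicted, reference)
correlation = pearson(predicted, reference)
print(distance, correlation)
```

The toy prediction follows the reference perfectly in shape but is offset by a constant, so the correlation is at its maximum while the distance is not zero. This is why the two measures are complementary: one penalizes absolute deviation and the other only disagreement in relative ordering and spread.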

4 Conclusions

To imitate the human perception mechanism, in this study we constructed an emotion recognition system based on the multi-layer model proposed by Huang and Akagi.

The evaluation results for FIS and MRA indicate that the recognition systems with FIS are more useful than those with MRA. Furthermore, the two-layer and multi-layer models recognize emotion with almost the same accuracy. Because the multi-layer model can also track changes in semantic primitives, it imitates human perception better than the two-layer model. In the sense of imitating the human perception mechanism, the constructed system is a more effective emotion recognition system than the conventional methods. We therefore conclude that emotion can be recognized by imitating the human perception mechanism.


References

[1] Chun-Fang Huang, Masato Akagi, “A Multi-Layer fuzzy logical model for emotional speech perception,” Proc. EuroSpeech 2005, pp. 417–420, Lisbon, Portugal, 2005.

[2] Chun-Fang Huang, Masato Akagi, “A three-layered model for expressive speech perception,” Speech Commun., Vol. 50, pp. 810–828, 2008.

[3] J. S. R. Jang, C. T. Sun, E. Mizutani, “Neuro-Fuzzy and Soft Computing,” Prentice Hall, 1996.

[4] H. Kawahara, I. Masuda-Katsuse, A. de Cheveigné, “Restructuring Speech Representations Using a Pitch Adaptive Time-Frequency Smoothing and an Instantaneous-Frequency-Based F0 Extraction,” Speech Commun., Vol. 27, pp. 187–207, 1999.
