JAIST Repository

https://dspace.jaist.ac.jp/

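Title: A study on physical values related to emotional expression in speech and their control (音声中の感情表現に関連する物理量とその制御に関する研究)
Author(s): Sugimoto, Takashi (杉本, 隆)
Citation:
Issue Date: 2000-03
Type: Thesis or Dissertation
Text version: author
URL: http://hdl.handle.net/10119/1357
Rights:
Description: Supervisor: Masato Akagi (赤木 正人), School of Information Science, Master's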

speech and their control

Takashi Sugimoto

School of Information Science,

Japan Advanced Institute of Science and Technology

February 15, 2000

Keywords: emotional information, synthetic speech, STRAIGHT, Lombard effect.

1 Introduction

Not only linguistic information but also emotional information, which plays an important role in human communication, must be handled in order to synthesize speech that is as easy to listen to as human speech.

F0, amplitude, and duration of utterance have been discussed as physical parameters of emotional information in speech, and research on F0 contours and other values has been done [1, 2]. However, speech synthesized on the basis of this research cannot express emotion well. This is because the synthetic speech disregarded the emotional correlates and physical correlates that originate in the generative mechanism of speech.

This paper investigates the relationship between physical values in speech and the perception of emotion. Transformation rules of physical values for controlling emotion are made using the results, combined with physical correlates that originate in the generative mechanism of speech. Additionally, the influence of rule-based transformed speech on emotion perception is investigated.

2 Analysis

2.1 Speech data

The speakers are two males and three females in their twenties, and they are either actors, attending a vocal school, or attending an actor training school. Such speakers were chosen because they are more precisely aware of techniques for expressing emotional states through speech than the general person.

Copyright © 2000 by Takashi Sugimoto

(F0) is handled in this paper. Therefore, the sentence "いいじゃない", which is composed of vowels and some voiced consonants, is used as speech data.

A total of 167 samples were collected. Recording was done in a soundproof room. The uttered speech waves were recorded on DAT outside the soundproof room at a sampling rate of 48 kHz. They were downsampled to 20 kHz and stored on a workstation (WS).
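The 48 kHz to 20 kHz conversion is a rational-ratio resampling. As a minimal sketch (the helper names `resample_factors` and `resampled_length` are hypothetical, and the abstract does not say which resampling method was used), the integer up/down factors and the resulting sample count can be computed as follows:

```python
from fractions import Fraction


def resample_factors(src_hz, dst_hz):
    """Return the smallest integer (up, down) with dst_hz == src_hz * up / down."""
    ratio = Fraction(dst_hz, src_hz)
    return ratio.numerator, ratio.denominator


def resampled_length(n_samples, src_hz, dst_hz):
    """Number of output samples after rational resampling, rounded up."""
    up, down = resample_factors(src_hz, dst_hz)
    return -(-n_samples * up // down)  # ceiling division on integers


up, down = resample_factors(48_000, 20_000)
print(up, down)                                  # 5 12
print(resampled_length(48_000, 48_000, 20_000))  # 20000
```

One second of the 48 kHz recording therefore maps to 20000 samples, with an upsampling factor of 5 and a decimation factor of 12.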

2.2 Experiment for emotion classification

This research targets the emotion perceived by the listener. Therefore, listening tests were done for emotion classification.

Subjects were asked to judge the emotion as neutral, joy, anger, or sadness. The subjects were eleven graduate school students, all with normal hearing. Subjects listened to the stimuli through headphones in the soundproof room. The speech data were stored on the WS outside the soundproof room, and were presented in response to the listener's answers.

The listening experiment gave sufficient results, since samples with high distinction rates¹ existed for each emotion.
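The distinction rate, defined in the footnote as the share of listeners who replied with the emotion that was expressed, can be illustrated with a minimal sketch. The function name and the listener answers below are hypothetical and are not data from the thesis:

```python
from collections import Counter


def distinction_rate(intended, answers):
    """Share of listeners whose answer matches the intended emotion."""
    if not answers:
        raise ValueError("no listener answers")
    return Counter(answers)[intended] / len(answers)


# Hypothetical answers from eleven listeners for one sample uttered as "anger".
answers = ["anger"] * 9 + ["neutral", "joy"]
print(f"{distinction_rate('anger', answers):.2f}")  # 0.82
```

Samples whose rate is high for one emotion are the ones that can be treated as clear examples of that emotion in the later analysis.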

2.3 Analysis Results

Physical values related to emotion were extracted by the STRAIGHT analysis-synthesis system [3], and the differences between the emotions classified in Section 2.2 were investigated.

The results show the same tendencies as past research [1] in the analysis of the expression of each emotion. However, there was one difference from the past research result: the duration of utterance for anger was longer than that for neutral.

3 Transformation rules of physical values for emotional control

Transformation rules of physical values were constructed using the analyzed results. The duration of utterance in consonant portions is not stretched. Accordingly, consonant parts are excluded from stretching by a time-axis change map made using spectral cues.

Furthermore, physical correlates that originate in the generative mechanism of speech are integrated into the transformation rules of physical parameters by using the Lombard effect.
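The idea of stretching the utterance while leaving consonant portions untouched can be sketched as a piecewise time-axis map. This is only an illustration under stated assumptions: the segment labels, durations, and the function name `stretch_segments` are hypothetical, and the thesis derives its map from spectral cues, which is not modeled here.

```python
def stretch_segments(segments, factor):
    """Build a piecewise time-axis map that skips consonants.

    segments: list of (label, duration_s), label "C" for consonant, "V" otherwise.
    Returns rows of (label, old_start, old_end, new_start, new_end).
    Only "V" segments are scaled by `factor`; "C" segments keep their duration.
    """
    mapping = []
    old_t = new_t = 0.0
    for label, dur in segments:
        new_dur = dur if label == "C" else dur * factor
        mapping.append((label, old_t, old_t + dur, new_t, new_t + new_dur))
        old_t += dur
        new_t += new_dur
    return mapping


# Hypothetical segmentation in seconds; not measured from the thesis data.
segments = [("V", 0.20), ("C", 0.05), ("V", 0.25)]
for row in stretch_segments(segments, 1.5):
    print(row)
```

Under this assumption the total duration grows by less than the nominal factor whenever consonants are present, because only the non-consonant segments are lengthened.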

3.1 Lombard effect

The speech wave is transformed as the utterance style changes; this is called the Lombard effect [4]. In this effect, other physical parameters change as speech becomes louder. Then, modeling of

1 The rate of listeners who replied with the same emotion as the one expressed is called the distinction rate.

tral. Rise in F0, shift of formant frequency, and increase of spectra in the high frequency region occur in the Lombard effect.
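An increase of spectra in the high frequency region is conveniently expressed in decibels; the anger rule in this section uses 5 dB. A minimal sketch of the standard decibel-to-linear conversion follows (the function names are illustrative, and the abstract does not state whether the 5 dB applies to amplitude or power spectra, so both are shown):

```python
def db_to_amplitude_gain(db):
    """Linear gain on an amplitude spectrum for a change of `db` decibels."""
    return 10.0 ** (db / 20.0)


def db_to_power_gain(db):
    """Linear gain on a power spectrum for a change of `db` decibels."""
    return 10.0 ** (db / 10.0)


print(round(db_to_amplitude_gain(5.0), 3))  # 1.778
print(round(db_to_power_gain(5.0), 3))      # 3.162
```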

The transformation rules of physical values for emotional control with the Lombard effect are as follows.

joy:
  The duration of utterance and the long-time averages of power and F0 are the same as neutral.
  The change rate of F0 is increased to 2 times.
  F0 is raised only at the end of a word.

anger:
  The duration of utterance is expanded to 1.2 times. (Consonant parts are not stretched.)
  The long-time average of F0 is increased to 1.2 times.
  The long-time average of power is increased to 1.4 times.
  The change rate of F0 is increased to 1.4 times.
  The formant frequency is shifted.
  The spectrum in the high frequency region is increased by 5 dB.

sadness:
  The duration of utterance is expanded to 1.5 times. (Consonant parts are not stretched.)
  The long-time average of F0 is decreased to 0.9 times.
  The long-time average of power is decreased to 0.8 times.
  The change rate of F0 is decreased to 0.5 times.
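The scale factors in these rules can be collected in a small table and applied to an F0 contour. The sketch below is illustrative only: the names `Rule`, `RULES`, and `transform_f0` are hypothetical, the F0 values are made up, and it assumes that the "change rate of F0" scales the deviation of F0 from its long-time average, which the abstract does not define. The formant shift, the 5 dB high-frequency increase, and the word-final F0 rise for joy are not modeled.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class Rule:
    duration: float    # scale on duration of utterance (consonants excluded)
    f0_mean: float     # scale on long-time average of F0
    power_mean: float  # scale on long-time average of power
    f0_change: float   # scale on change rate of F0


# Scale factors relative to neutral speech, as listed in the rules.
RULES = {
    "joy": Rule(1.0, 1.0, 1.0, 2.0),
    "anger": Rule(1.2, 1.2, 1.4, 1.4),
    "sadness": Rule(1.5, 0.9, 0.8, 0.5),
}


def transform_f0(f0_hz, rule):
    """ASSUMPTION: the change rate scales deviations of F0 around its mean."""
    mean = fmean(f0_hz)
    return [mean * rule.f0_mean + (f - mean) * rule.f0_change for f in f0_hz]


neutral = [110.0, 120.0, 130.0]  # hypothetical voiced-frame F0 values in Hz
print(transform_f0(neutral, RULES["sadness"]))
print(transform_f0(neutral, RULES["joy"]))
```

With these assumptions the sadness rule lowers and flattens the contour, while the joy rule keeps the average and doubles the excursions around it.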

4 Experiment

Listening experiments were done to confirm whether the synthesized speech with controlled emotion actually conveyed that emotion.

A listening experiment was done as a closed test, using speech stimuli made by applying the transformation rules of physical parameters to the neutral speech used for the analysis. Listening experiments were also done as an open test, in much the same way, using speech stimuli made by applying the rules to "けっこうです" and "そうですか" from the ATR speech database. The experimental conditions were the same as in the emotion classification experiment.

As a result, a high recognition rate was obtained in the closed test for all emotions. The recognition rate was low in the open test.

However, many common points were found across the closed test and the open test. The change rate of F0 was strongly involved in the expression of each emotion, and it was found to play an especially important part in sadness.

Anger was fully expressed when the duration of utterance was long. This is thought to originate in the difference between cold anger and hot anger, and it does not mean that the duration of utterance is unimportant for the expression of anger.

This research examined the influence that changes in the physical parameters of speech have on emotion distinction. The results almost agreed with past research results.

Transformation rules of physical values for emotional control were constructed using the analyzed results together with physical correlates. Rise in F0, shift of formant frequency, and increase of spectra in the high frequency region were related to increase of power.

Emotion was controlled in neutral speech by the transformation rules of physical values, and the influence on emotion perception was examined. The change rate of F0 was strongly involved in the expression of each emotion, and it was found to play an especially important part in sadness. As for anger, the recognition rate improved when the physical parameters were related through the Lombard effect.

References

[1] Yoshinori Kitahara, Yoh'ichi Tohkura: "Prosodic Control to Express Emotions for Man-Machine Speech Interaction", IEICE Trans. Fundamentals, Vol. E75-A, No. 2, Feb. 1992.

[2] J. E. H. Noad, S. P. Whiteside and P. D. Green: "A Macroscopic Analysis of an Emotional Speech Corpus", Proc. of Eurospeech, pp. 517-520, 1997.

[3] H. Kawahara, I. Masuda-Katsuse, K. Toyama: "Compensatory time window for speech analysis, modification and synthesis using STRAIGHT", ASJ Tech. Report, H97-47, 1997.

[4] B. J. Stanton: "Acoustic-Phonetic Analysis of Loud and Lombard Speech in Simulated Cockpit Conditions", Proc. of ICASSP, pp. 331-334, Apr. 1988.
