JAIST Repository


https://dspace.jaist.ac.jp/

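Title: A study on speaker individuality in the temporal variation of fundamental frequency in sentence speech

Author(s): Ohno, Hiroshi

Citation:

Issue Date: 1997-09

Type: Thesis or Dissertation

Text version: author

URL: http://hdl.handle.net/10119/1105

Rights:

Description: Supervisor: Masato Akagi, School of Information Science, Master's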


contours of sentences

Hiroshi Ohno

School of Information Science,

Japan Advanced Institute of Science and Technology

February 13, 1998

Keywords: fundamental frequency contours, speaker individuality, Fujisaki model.

1 Introduction

This paper discusses speaker individuality in fundamental frequency contours of sentences, based on analysis using the Fujisaki model and on psychoacoustic experiments. The stimuli used for the experiments are synthesized using STRAIGHT [1], with fundamental frequency contours modified by the Fujisaki model. The experimental results indicate that (1) fundamental frequency contours of sentences carry much speaker individuality; (2) in particular, the base frequency $F_{\min}$ and the timing parameters ($T_0$, $T_1$ and $T_2$) of the frequency contour carry more speaker individuality than the other parameters, and subjects can be divided into two groups, for whom either fundamental frequency height or the timing of fundamental frequency dynamics affects discrimination; and (3) speaker individuality can be controlled by manipulating a few parameters, including the timing parameters.

2 Fujisaki model

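A fundamental frequency contour $F_0(t)$ is expressed by the Fujisaki model [2] as follows:

$$
\ln F_0(t) = \ln F_{\min} + \sum_{i=1}^{I} A_{pi}\, G_{pi}(t - T_{0i}) + \sum_{j=1}^{J} A_{aj} \left\{ G_{aj}(t - T_{1j}) - G_{aj}(t - T_{2j}) \right\},
$$

$$
G_{pi}(t) =
\begin{cases}
\alpha_i^{2}\, t \exp(-\alpha_i t) & (t \ge 0), \\
0 & (t < 0),
\end{cases}
\qquad
G_{aj}(t) =
\begin{cases}
\min\left[\, 1 - (1 + \beta_j t) \exp(-\beta_j t),\ \theta_j \,\right] & (t \ge 0), \\
0 & (t < 0),
\end{cases}
\tag{1}
$$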

Figure 1: F ratio of each parameter (horizontal axis: F ratio, 0 to 50; parameter groups: $F_{\min}$, $A_p$, $A_a$, $\Delta T_0$, $\Delta T_1$, $\Delta T_2$)

where $F_{\min}$: baseline value of an $F_0$ contour, $I$: number of phrase commands, $J$: number of accent commands, $A_{pi}$: magnitude of the $i$-th phrase command, $A_{aj}$: amplitude of the $j$-th accent command, $T_{0i}$: instant of occurrence of the $i$-th phrase command, $T_{1j}$: onset of the $j$-th accent command, $T_{2j}$: end of the $j$-th accent command, $\alpha_i$: natural angular frequency of the phrase control mechanism to the $i$-th phrase command, $\beta_j$: natural angular frequency of the accent control mechanism to the $j$-th accent command, and $\theta_j$: ceiling level of the accent component for the $j$-th accent command.
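As an illustration of how equation (1) generates a contour, the following is a minimal Python sketch, not the implementation used in this study. The function names and the tuple layout of the commands are assumptions made for the example, and any numeric values passed in are hypothetical.

```python
import math


def phrase_response(t, alpha):
    """G_p(t): response of the phrase control mechanism; zero before the command."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta, theta):
    """G_a(t): response of the accent control mechanism, capped at the ceiling theta."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), theta)


def fujisaki_f0(t, f_min, phrases, accents):
    """Return F0(t) in the same unit as f_min.

    phrases: iterable of (A_p, T_0, alpha) tuples (assumed layout).
    accents: iterable of (A_a, T_1, T_2, beta, theta) tuples (assumed layout).
    """
    log_f0 = math.log(f_min)
    for a_p, t0, alpha in phrases:
        log_f0 += a_p * phrase_response(t - t0, alpha)
    for a_a, t1, t2, beta, theta in accents:
        log_f0 += a_a * (accent_response(t - t1, beta, theta)
                         - accent_response(t - t2, beta, theta))
    return math.exp(log_f0)
```

Because the components are summed in the logarithmic domain, $F_0(t)$ equals $F_{\min}$ before the first command occurs, and each accent command contributes only between its onset $T_{1j}$ and the decay after its end $T_{2j}$.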

3 Analysis of differences in fundamental frequency contours of sentences

Speech data for all the experiments are sentences such as "aoi aoi ga aoi yane no ue ni aru" (accent positions are marked in the original transcription), uttered by five male speakers.

Parameters of the Fujisaki model are estimated by minimizing the mean squared error between the extracted $F_0$ contour and the modeled $F_0$ contour on a logarithmic scale. The minimization process utilizes the analysis-by-synthesis method.
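The error criterion described above can be sketched as follows. This is only an illustration under stated assumptions: the contours are given as equal-length lists of positive $F_0$ values at the same frames, and the helper `best_candidate` is a hypothetical stand-in for one analysis-by-synthesis step (synthesize several candidates, keep the one with the smallest error), not the search procedure actually used.

```python
import math


def log_mse(extracted_f0, modeled_f0):
    """Mean squared error between two F0 contours on a logarithmic scale."""
    if len(extracted_f0) != len(modeled_f0) or not extracted_f0:
        raise ValueError("contours must be non-empty and of equal length")
    total = 0.0
    for extracted, modeled in zip(extracted_f0, modeled_f0):
        diff = math.log(extracted) - math.log(modeled)
        total += diff * diff
    return total / len(extracted_f0)


def best_candidate(extracted_f0, candidates):
    """Return the name of the candidate contour with the smallest log-scale error.

    candidates: dict mapping a name to a synthesized contour (assumed layout).
    """
    return min(candidates, key=lambda name: log_mse(extracted_f0, candidates[name]))
```

Measuring the error on a logarithmic scale means that equal frequency ratios are penalized equally, so a contour that is off by a factor of two costs the same at 100 Hz as at 200 Hz.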

To choose physical characteristics that represent speaker individuality among the analyzed parameters, we calculated the F ratio (inter-speaker variation divided by averaged intra-speaker variation) for each parameter.

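$$
F_k = \frac{\displaystyle \sum_{i}^{n} \left( \bar{c}_{ik} - \frac{1}{n} \sum_{i'}^{n} \bar{c}_{i'k} \right)^{2}}
           {\displaystyle \frac{1}{N} \sum_{i}^{n} \sum_{j}^{N} \left( c_{ijk} - \bar{c}_{ik} \right)^{2}},
\qquad
\left( \bar{c}_{ik} = \frac{1}{N} \sum_{j}^{N} c_{ijk} \right)
\tag{2}
$$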

where $c_{ijk}$ is the $j$-th observation of the $i$-th speaker for the parameter $k$. A larger F ratio indicates that the parameter is more significant for speaker classification. Note that the $\Delta$ in $\Delta T_{0i}$, $\Delta T_{1j}$ and $\Delta T_{2j}$ indicates differences between the phrase command timings and the mora boundary $T_{00}$.
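The F ratio of a single parameter can be computed as in the sketch below. It assumes that every speaker has the same number $N$ of observations and that the normalization is as read from equation (2): the squared deviations of the speaker means from their overall mean, divided by the intra-speaker squared deviations scaled by $1/N$. The function name and the nested-list data layout are assumptions for the example.

```python
def f_ratio(observations):
    """F ratio for one parameter k.

    observations[i][j] is the j-th observation of the i-th speaker
    (assumed layout: n speakers, N observations each).
    """
    n = len(observations)
    if n < 2:
        raise ValueError("at least two speakers are required")
    num_obs = len(observations[0])
    if num_obs < 2 or any(len(row) != num_obs for row in observations):
        raise ValueError("each speaker needs the same number (>= 2) of observations")

    # Per-speaker means and their overall mean.
    speaker_means = [sum(row) / num_obs for row in observations]
    grand_mean = sum(speaker_means) / n

    # Inter-speaker variation: spread of the speaker means.
    inter = sum((mean - grand_mean) ** 2 for mean in speaker_means)

    # Intra-speaker variation: spread of observations around each speaker's mean.
    intra = sum(
        (value - mean) ** 2
        for row, mean in zip(observations, speaker_means)
        for value in row
    ) / num_obs
    if intra == 0.0:
        raise ValueError("intra-speaker variation is zero; F ratio is undefined")
    return inter / intra
```

For example, `f_ratio([[1, 3], [5, 7]])` has speaker means 2 and 6, an inter-speaker term of 8 and an intra-speaker term of 2, giving 4.0. Shifting the two speakers further apart raises the ratio, while noisier repetitions by the same speaker lower it.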


Speakers         5
Subjects         5
Headphones       SENNHEISER HDA200
Headphone amp    SANSUI AU-907MR
Hearing level    approx. 76 dB(A)

Table 2: t-test of the experiment results (between synthetic speech)

stimuli    same sample    same speaker    different speaker
O, ST      1.424          4.079           9.111
O, SF      1.585          3.654           9.199
ST, SF     1.187          0.115           0.265

t_0.05 = 1.960; t_0.01 = 2.576

4 Perception of speaker individuality

In order to investigate fundamental frequency contours modeled by the Fujisaki model, the psychoacoustic experiments used STRAIGHT speech waves with the spectra and amplitude exchanged.

The types of stimuli are as follows:

1. O: original speech waves.

2. ST: speech synthesized by STRAIGHT and TEMPO, whose spectra come from another speaker's speech.

3. SF: speech synthesized by STRAIGHT and the Fujisaki model, whose spectra also come from another speaker's speech.

The psychoacoustic experiment used the method of paired comparison with a five-point rating scale.

The results of the t-test among the three stimuli are shown in Table 2 and Table 3.

The experimental results indicate that (1) fundamental frequency contours of sentences have speaker individuality, and (2) fundamental frequency contours generated by the Fujisaki model have as much speaker individuality as those obtained by TEMPO.

5 Shift of perception by each parameter

The psychoacoustic experiment used the ABX method. The stimulus x was resynthesized by exchanging a few parameters, and subjects judged whether the synthetic speech x was closer to speaker a or speaker b.


stimuli    same stimuli and different speaker    same speaker and different speaker
ST         41.024                                61.221
SF         37.722                                57.52

t_0.05 = 1.960; t_0.01 = 2.576

Table 4: Parameter sets

type      A  B  C  D  E  F  G  H
base      a  b  a  a  a  b  b  a
phrase    a  a  b  a  a  b  a  b
accent    a  a  a  b  a  a  b  b
timing    a  a  a  a  b  a  a  a

2. phrase: $A_{pi}$

3. accent: $A_{aj}$

4. timing: $T_{0i}$, $T_{1j}$, $T_{2j}$

The exchanged parameter sets are shown in Table 4.

The result of the psychoacoustic experiment is shown in Table 5. It gives the average rate at which subjects judged the stimulus to be speaker b.
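The parameter exchange of Table 4 can be illustrated with a short sketch. The set definitions are transcribed from Table 4; the function name, the dictionary layout and any per-speaker values supplied by the caller are hypothetical and serve only to show how the parameters of stimulus x are assembled from the two speakers.

```python
# Rows of Table 4: for each parameter group, the source speaker ("a" or "b")
# used in parameter sets A to H, in that order.
TABLE_4_ROWS = {
    "base": "abaaabba",
    "phrase": "aabaabab",
    "accent": "aaabaabb",
    "timing": "aaaabaaa",
}

# PARAMETER_SETS["E"] == {"base": "a", "phrase": "a", "accent": "a", "timing": "b"}
PARAMETER_SETS = {
    name: {group: row[index] for group, row in TABLE_4_ROWS.items()}
    for index, name in enumerate("ABCDEFGH")
}


def mix_parameters(set_name, speaker_a, speaker_b):
    """Build the parameters of stimulus x for one parameter set.

    speaker_a and speaker_b map each group name ("base", "phrase",
    "accent", "timing") to that speaker's analyzed parameters (assumed layout).
    """
    sources = {"a": speaker_a, "b": speaker_b}
    return {
        group: sources[source][group]
        for group, source in PARAMETER_SETS[set_name].items()
    }
```

With this layout, `mix_parameters("E", speaker_a, speaker_b)` keeps the base, phrase and accent parameters of speaker a and takes only the timing parameters from speaker b, while set A is the unmodified reference for speaker a.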

The experimental results indicate that (1) the shift of perception is affected by the difference in the parameters between speakers; (2) $F_{\min}$ and the timing parameters ($T_0$, $T_1$ and $T_2$) of the frequency contour carry more speaker individuality than the other parameters; (3) subjects can be divided into two groups, for whom either fundamental frequency height or the timing of fundamental frequency dynamics affects discrimination; and (4) speaker individuality can be controlled by manipulating three parameters, including the timing parameters.

The results indicate that the timing parameters in the fundamental frequency contours of sentences carry more speaker individuality than those of words. This result agrees with the report [4] that speaker individuality is affected by differences in acoustic features.

6 Conclusion

In order to investigate speaker individuality in fundamental frequency contours of sentences, parameter extraction by the Fujisaki model, analysis of differences, and psychoacoustic experiments were carried out.

The results indicate that fundamental frequency contours of sentences have speaker individuality, and that the timing parameters carry more speaker individuality than the other parameters.


parameter set    A  B  C  D  E  F  G  H
subject 1        ×  △  ○  △  ○  ◎  ○  ◎
subject 2        ×  ○  ○  △  ○  ◎  ○  ◎
subject 3        ×  ○  △  △  ◎  ◎  ○  ◎
subject 4        △  ○  △  △  ◎  ○  ○  ○
subject 5        ×  △  ×  △  ◎  △  ○  ○
average          ×  ○  △  △  ◎  ○  ○  ◎

Perceptual rate: × 0 to 5%; △ 5 to 20%; ○ 20 to 40%; ◎ 40 to 100%

References

[1] H. Kawahara: "A high quality speech analysis, modification and synthesis method STRAIGHT", J. Acoust. Soc. Jpn., pp. 189-192 (1997)

[2] H. Fujisaki and K. Hirose: "Analysis of voice fundamental frequency contours for declarative sentences of Japanese", J. Acoust. Soc. Jpn. (E) 5, 4 (1984)

[3] M. Akagi and T. Ienaga: "Speaker individualities in fundamental frequency contours and its control", J. Acoust. Soc. Jpn. (E) 18, 2 (1997)

[4] M. Hashimoto and N. Higuchi: "Analysis of acoustic features affecting speaker identification", Eurospeech '95, pp. 435-438 (1995)
