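JAIST Repository
https://dspace.jaist.ac.jp/
Title: A Study on Speaker Individuality in the Temporal Variation of Fundamental Frequency in Sentence Speech (文音声中の基本周波数の時間変化に含まれる個人性に関する研究)
Author(s): Ohno, Hiroshi (大野, 宏)
Issue Date: 1997-09
Type: Thesis or Dissertation
Text version: author
URL: http://hdl.handle.net/10119/1105
Description: Supervisor: Masato Akagi (赤木 正人), School of Information Science, Master's degree
contours of sentences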
Hiroshi Ohno
School of Information Science,
Japan Advanced Institute of Science and Technology
February 13, 1998
Keywords: fundamental frequency contours, speaker individuality, Fujisaki model.
1 Introduction
This paper discusses speaker individuality in fundamental frequency contours of sentences, based on analysis using the Fujisaki model and on psychoacoustic experiments. The stimuli used for the experiments are synthesized using STRAIGHT [1], with fundamental frequency contours modified by the Fujisaki model. The experimental results indicate that (1) fundamental frequency contours of sentences carry much speaker individuality; (2) in particular, the base frequency $F_{min}$ and the timing parameters ($T_0$, $T_1$ and $T_2$) in the frequency contour carry more speaker individuality than the other parameters, and subjects can be divided into two groups, in which either the fundamental frequency height or the timing of the fundamental frequency dynamics affects discrimination; and (3) speaker individuality can be controlled by manipulating a few parameters including the timing parameters.
2 Fujisaki model
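The Fujisaki model [2] expresses a fundamental frequency contour $F_0(t)$ as follows:

$$
\ln F_0(t) = \ln F_{min} + \sum_{i=1}^{I} A_{pi}\, G_{pi}(t - T_{0i}) + \sum_{j=1}^{J} A_{aj} \left\{ G_{aj}(t - T_{1j}) - G_{aj}(t - T_{2j}) \right\},
$$

$$
G_{pi}(t) = \begin{cases} \alpha_i^2\, t \exp(-\alpha_i t) & (t \ge 0), \\ 0 & (t < 0), \end{cases} \tag{1}
$$

$$
G_{aj}(t) = \begin{cases} \min\left[ 1 - (1 + \beta_j t) \exp(-\beta_j t),\ \theta_j \right] & (t \ge 0), \\ 0 & (t < 0). \end{cases}
$$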
Figure 1: F ratio of each parameter. [Bar chart of the F ratio (axis range 0 to 50) for $F_{min}$; $A_{p0}$ to $A_{p2}$; $A_{a0}$ to $A_{a4}$; $\Delta T_{01}$, $\Delta T_{02}$; $\Delta T_{10}$ to $\Delta T_{14}$; $\Delta T_{20}$ to $\Delta T_{24}$.]
In Eq. (1), $F_{min}$ is the baseline value of the $F_0$ contour, $I$ is the number of phrase commands, $J$ is the number of accent commands, $A_{pi}$ is the magnitude of the $i$-th phrase command, $A_{aj}$ is the amplitude of the $j$-th accent command, $T_{0i}$ is the instant of occurrence of the $i$-th phrase command, $T_{1j}$ is the onset of the $j$-th accent command, $T_{2j}$ is the end of the $j$-th accent command, $\alpha_i$ is the natural angular frequency of the phrase control mechanism for the $i$-th phrase command, $\beta_j$ is the natural angular frequency of the accent control mechanism for the $j$-th accent command, and $\theta_j$ is the ceiling level of the accent component for the $j$-th accent command.
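The model of Eq. (1) can be evaluated directly. The following is a minimal sketch in Python, not the implementation used in this study: the function names and all numerical values (100 Hz baseline, one phrase command, one accent command) are illustrative assumptions, and the Greek symbols follow the usual notation of the Fujisaki model ($\alpha$, $\beta$, $\theta$).

```python
import math


def phrase_response(t, alpha):
    """G_p(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta, theta):
    """G_a(t) = min[1 - (1 + beta*t) * exp(-beta*t), theta] for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), theta)


def ln_f0(t, f_min, phrases, accents):
    """ln F0(t) as in Eq. (1).

    phrases: list of (A_p, T_0, alpha) tuples
    accents: list of (A_a, T_1, T_2, beta, theta) tuples
    """
    value = math.log(f_min)
    for a_p, t0, alpha in phrases:
        value += a_p * phrase_response(t - t0, alpha)
    for a_a, t1, t2, beta, theta in accents:
        value += a_a * (accent_response(t - t1, beta, theta)
                        - accent_response(t - t2, beta, theta))
    return value


# Illustrative values only; they are not taken from the analysis in this paper.
F_MIN = 100.0                              # baseline frequency in Hz
PHRASES = [(0.5, 0.0, 3.0)]                # (A_p, T_0 [s], alpha [1/s])
ACCENTS = [(0.4, 0.3, 0.8, 20.0, 0.9)]     # (A_a, T_1 [s], T_2 [s], beta [1/s], theta)

# F0 contour in Hz sampled every 10 ms over 2 s.
contour = [math.exp(ln_f0(0.01 * k, F_MIN, PHRASES, ACCENTS)) for k in range(200)]
```

Because the commands are summed in the logarithmic domain, changing $F_{min}$ scales the whole contour, whereas changing $T_{0i}$, $T_{1j}$ or $T_{2j}$ only shifts when each component rises and falls.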
3 Analysis of differences in fundamental frequency contours of sentences
The speech data for all the experiments are sentences such as "aoi aoi ga aoi yane no ue ni aru", uttered by five male speakers.
Parameters of the Fujisaki model are estimated by minimizing the mean squared error between the extracted $F_0$ contour and the modeled $F_0$ contour on a logarithmic scale. The minimization process uses the analysis-by-synthesis method.
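The estimation principle can be illustrated with a deliberately reduced example. The sketch below is an assumption-laden toy, not the procedure used in this study: it uses a single phrase command with fixed timing and fixed $\alpha$, and replaces the actual optimizer with an exhaustive grid search over $F_{min}$ and $A_p$. It only shows the analysis-by-synthesis idea of synthesizing a contour for each candidate and keeping the one with the smallest mean squared error in $\ln F_0$.

```python
import math


def model_ln_f0(t, f_min, a_p, t0=0.0, alpha=3.0):
    """Log-F0 of a one-phrase-command model (a reduced form of Eq. (1))."""
    x = t - t0
    g = alpha * alpha * x * math.exp(-alpha * x) if x >= 0 else 0.0
    return math.log(f_min) + a_p * g


def mean_squared_log_error(times, observed_f0, f_min, a_p):
    """Mean squared error between observed and modeled contours in ln F0."""
    errors = [(math.log(f) - model_ln_f0(t, f_min, a_p)) ** 2
              for t, f in zip(times, observed_f0)]
    return sum(errors) / len(errors)


def fit_by_grid_search(times, observed_f0, f_min_grid, a_p_grid):
    """Synthesize a contour for every candidate pair and keep the best match."""
    best = None
    for f_min in f_min_grid:
        for a_p in a_p_grid:
            err = mean_squared_log_error(times, observed_f0, f_min, a_p)
            if best is None or err < best[0]:
                best = (err, f_min, a_p)
    return best


# Synthetic "extracted" contour generated from known, illustrative parameters.
times = [0.02 * k for k in range(100)]
observed = [math.exp(model_ln_f0(t, 110.0, 0.4)) for t in times]

f_min_grid = [90.0 + 5.0 * k for k in range(9)]   # 90, 95, ..., 130 Hz
a_p_grid = [0.1 * k for k in range(1, 9)]         # 0.1, 0.2, ..., 0.8

err, f_min_hat, a_p_hat = fit_by_grid_search(times, observed, f_min_grid, a_p_grid)
```

With real speech the timing parameters and accent commands must also be searched, so a grid of this kind quickly becomes impractical and an iterative optimizer is the more likely choice.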
To choose physical characteristics that represent speaker individuality among the analyzed parameters, we calculated the F ratio (inter-speaker variation divided by averaged intra-speaker variation) for each parameter.
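$$
F_k = \frac{\displaystyle \sum_{i}^{n} \left( \bar{c}_{ik} - \frac{1}{n} \sum_{i}^{n} \bar{c}_{ik} \right)^{2}}{\displaystyle \frac{1}{N} \sum_{i}^{n} \sum_{j}^{N} \left( c_{ijk} - \bar{c}_{ik} \right)^{2}}, \qquad \left( \bar{c}_{ik} = \frac{1}{N} \sum_{j}^{N} c_{ijk} \right) \tag{2}
$$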
Here $c_{ijk}$ is the $j$-th observation of the $i$-th speaker for the parameter $k$. A larger F ratio indicates that the parameter is more significant for speaker classification. Note that the $\Delta$ in $\Delta T_{0i}$, $\Delta T_{1j}$ and $\Delta T_{2j}$ indicates the difference between the command timing and the mora boundary $T_{00}$.
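As a concrete reading of Eq. (2), the following Python sketch computes the F ratio for one parameter $k$. It assumes the form of Eq. (2) in which the numerator is the sum over speakers of squared deviations of the speaker means from their average, and the denominator is $1/N$ times the pooled squared deviations within speakers. The function name and the toy data are hypothetical.

```python
def f_ratio(observations):
    """F ratio of one parameter k.

    observations[i][j] is c_ijk, the j-th observation of the i-th speaker.
    Every speaker is assumed to contribute the same number N of observations.
    """
    n = len(observations)              # number of speakers
    big_n = len(observations[0])       # observations per speaker (N)

    # Per-speaker means (c-bar_ik) and their average over speakers.
    speaker_means = [sum(row) / big_n for row in observations]
    grand_mean = sum(speaker_means) / n

    # Numerator: inter-speaker variation.
    inter = sum((m - grand_mean) ** 2 for m in speaker_means)

    # Denominator: intra-speaker variation, pooled over speakers, times 1/N.
    intra = sum((c - m) ** 2
                for row, m in zip(observations, speaker_means)
                for c in row) / big_n
    return inter / intra


# Toy data: two speakers, two observations each (not measured values).
well_separated = [[1.0, 3.0], [5.0, 7.0]]
overlapping = [[1.0, 3.0], [2.0, 4.0]]
```

In the toy data both cases have the same intra-speaker spread, so the parameter whose speaker means lie further apart receives the larger F ratio.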
Speakers               5
Subjects               5
Headphones             SENNHEISER HDA200
Headphone amplifier    SANSUI AU-907MR
Hearing level          approx. 76 dB(A)
Table 2: t-test of the experiment result (between synthetic speech)
stimuli sample    same     same speaker    different speaker
O, ST             1.424    4.079           9.111
O, SF             1.585    3.654           9.199
ST, SF            1.187    0.115           0.265
$t_{0.05} = 1.960$; $t_{0.01} = 2.576$
4 Perception of speaker individuality
To investigate fundamental frequency contours modeled by the Fujisaki model, the psychoacoustic experiments used speech waves synthesized by STRAIGHT with the spectra and amplitude exchanged.
The types of stimuli are as follows:
1. O: original speech waves.
2. ST: speech synthesized by STRAIGHT and TEMPO, whose spectra come from another speaker's speech.
3. SF: speech synthesized by STRAIGHT and the Fujisaki model, whose spectra also come from another speaker's speech.
The psychoacoustic experiment used the method of paired comparison with a five-point judgment scale.
The results of the t-tests among the three stimuli are shown in Table 2 and Table 3.
The experimental results indicate that (1) fundamental frequency contours of sentences have speaker individuality, and (2) fundamental frequency contours generated by the Fujisaki model have as much speaker individuality as those generated by TEMPO.
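The reading of Table 2 against the critical values $t_{0.05} = 1.960$ and $t_{0.01} = 2.576$ can be checked mechanically. The short Python sketch below copies the t values from Table 2 (reading the entry "0:265" as 0.265) and labels each one; the function name and the label strings are illustrative assumptions.

```python
T_CRIT_05 = 1.960   # critical value at the 5% level, from Table 2
T_CRIT_01 = 2.576   # critical value at the 1% level, from Table 2

# t values copied from Table 2, in the column order of the table.
TABLE2 = {
    ("O", "ST"): (1.424, 4.079, 9.111),
    ("O", "SF"): (1.585, 3.654, 9.199),
    ("ST", "SF"): (1.187, 0.115, 0.265),
}


def significance(t_value):
    """Label a t value by comparing its magnitude with the critical values."""
    if abs(t_value) > T_CRIT_01:
        return "p < 0.01"
    if abs(t_value) > T_CRIT_05:
        return "p < 0.05"
    return "n.s."


labels = {pair: [significance(t) for t in values] for pair, values in TABLE2.items()}
for pair, row in labels.items():
    print(pair, row)
```

Every entry in the ST, SF row stays below the 5% threshold, which is the basis for statement (2): contours generated by the Fujisaki model are not distinguished from those generated by TEMPO.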
5 Shift of perception by each parameter
The psychoacoustic experiment used the ABX method: the stimulus x was resynthesized with a few parameters exchanged, and subjects judged whether the synthetic speech x was closer to speaker a or to speaker b.
Table 3
stimuli    same stimuli and different speaker    same speaker and different speaker
ST         41.024                                61.221
SF         37.722                                57.52
$t_{0.05} = 1.960$; $t_{0.01} = 2.576$
Table 4: Parameter set
type      A  B  C  D  E  F  G  H
base      a  b  a  a  a  b  b  a
phrase    a  a  b  a  a  b  a  b
accent    a  a  a  b  a  a  b  b
timing    a  a  a  a  b  a  a  a
1. base: $F_{min}$
2. phrase: $A_{pi}$
3. accent: $A_{aj}$
4. timing: $T_{0i}$, $T_{1j}$, $T_{2j}$
The exchanged parameter sets are shown in Table 4.
The result of the psychoacoustic experiment is shown in Table 5. It gives the average rate at which subjects judged the stimulus to be speaker b.
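The construction of the stimuli from Table 4 and the symbol legend of Table 5 can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the experiment: the speaker parameter values are hypothetical placeholders, the function names are invented for the example, and the legend boundaries are assumed to include their lower end (for example, 20% maps to ○).

```python
TYPES = "A B C D E F G H".split()

# Rows of Table 4: which speaker (a or b) supplies each parameter group.
TABLE4_ROWS = {
    "base":   "a b a a a b b a",
    "phrase": "a a b a a b a b",
    "accent": "a a a b a a b b",
    "timing": "a a a a b a a a",
}


def source_speakers(set_type):
    """Return {parameter group: 'a' or 'b'} for one column of Table 4."""
    column = TYPES.index(set_type)
    return {group: row.split()[column] for group, row in TABLE4_ROWS.items()}


def exchange(params_a, params_b, set_type):
    """Assemble the parameters of stimulus x from speakers a and b."""
    sources = source_speakers(set_type)
    return {group: (params_a if who == "a" else params_b)[group]
            for group, who in sources.items()}


def rate_symbol(percent_judged_b):
    """Map the rate of 'speaker b' judgments (in %) to the Table 5 legend."""
    if percent_judged_b < 5:
        return "×"
    if percent_judged_b < 20:
        return "△"
    if percent_judged_b < 40:
        return "○"
    return "◎"


# Hypothetical parameter values for two speakers (placeholders only).
speaker_a = {"base": {"F_min": 100.0}, "phrase": {"A_p": [0.5]},
             "accent": {"A_a": [0.4]},
             "timing": {"T_0": [0.0], "T_1": [0.3], "T_2": [0.8]}}
speaker_b = {"base": {"F_min": 120.0}, "phrase": {"A_p": [0.6]},
             "accent": {"A_a": [0.3]},
             "timing": {"T_0": [-0.1], "T_1": [0.25], "T_2": [0.7]}}

# Type E: only the timing parameters are taken from speaker b.
stimulus_x = exchange(speaker_a, speaker_b, "E")
```

Type A takes every group from speaker a and therefore serves as the reference against which the shift toward speaker b is read.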
The experimental results indicate that (1) the shift of perception depends on the difference of the parameters between speakers; (2) $F_{min}$ and the timing parameters ($T_0$, $T_1$ and $T_2$) in the frequency contour carry more speaker individuality than the other parameters; (3) subjects can be divided into two groups, in which either the fundamental frequency height or the timing of the fundamental frequency dynamics affects discrimination; and (4) speaker individuality can be controlled by manipulating three parameters including the timing parameters.
The results indicate that the timing parameters in the fundamental frequency contours of sentences carry more speaker individuality than those of words. The experimental result agrees with the report [4] that differences in acoustic features affect speaker individuality.
6 Conclusion
To investigate speaker individuality in fundamental frequency contours of sentences, parameter extraction by the Fujisaki model, analysis of differences, and psychoacoustic experiments were carried out.
The results indicate that fundamental frequency contours of sentences have speaker individuality, and that the timing parameters carry more speaker individuality than the other parameters.
Table 5
parameter set    A  B  C  D  E  F  G  H
subject 1        ×  △  ○  △  ○  ◎  ○  ◎
subject 2        ×  ○  ○  △  ○  ◎  ○  ◎
subject 3        ×  ○  △  △  ◎  ◎  ○  ◎
subject 4        △  ○  △  △  ◎  ○  ○  ○
subject 5        ×  △  ×  △  ◎  △  ○  ○
average          ×  ○  △  △  ◎  ○  ○  ◎
perceptual rate  ×: 0-5%; △: 5-20%; ○: 20-40%; ◎: 40-100%
References
[1] H. Kawahara: "A high quality speech analysis, modification and synthesis method STRAIGHT", J. Acoust. Soc. Jpn., pp. 189-192 (1997)
[2] H. Fujisaki and K. Hirose: "Analysis of voice fundamental frequency contours for declarative sentences of Japanese", J. Acoust. Soc. Jpn. (E) 5, 4 (1984)
[3] M. Akagi and T. Ienaga: "Speaker individualities in fundamental frequency contours and its control", J. Acoust. Soc. Jpn. (E) 18, 2 (1997)
[4] M. Hashimoto and N. Higuchi: "Analysis of acoustic features affecting speaker identification", Eurospeech '95, pp. 435-438 (1995)