Original Report
A Model of Coarticulation
Minoru SHIGENAGA, Kiyoshi TANAKA, Hitoshi ARIIZUMI, Kenji SAITO
(Received October 26, 1970)
Synopsis
In the study of speech, one of the principal problems is to investigate the mechanism of context in speech and to express it by rule. We have constructed a simple rule for the transition of vocal tract area functions. The rule assumes target configurations $S_1, S_2, \ldots$ for each phoneme, and expresses the area function $S$ on the way of transition between $S_1$ and $S_2$ as follows: $S = S_1 + (S_2 - S_1) \cdot T(n)$ ($n$ = normalized time, $T(n)$ = normalized locus of the second formant frequency F2 between $S_1$ and $S_2$). For the transition among $S_1$, $S_2$ and $S_3$, the locus of F2 is assumed to be the product of the locus of F2 between $S_1$ and $S_2$ ($T_1(n)$) and the one between $S_2$ and $S_3$ ($T_2(n)$), and on the articulatory level it is expressed as follows: $S = S_1 + [S_2 + (S_3 - S_2) \cdot T_2(n - n_\alpha) - S_1] \cdot T_1(n)$ ($n_\alpha$ = instant of beginning to carry out the command to move on to $S_3$). It has been shown that the results calculated by our rule coincide well enough with the observed effects in $V_1V_2V_1$, $V_1CV_2$ and $CVC$ contexts ($V$ = vowel, $C$ = stop or nasal) and that the results can be used for phoneme discrimination.
1. Introduction
In the study of speech, it is one of the fundamental problems, and plays an important role, to make clear the mechanism of coarticulation in speech and to express it by rule. This paper is concerned with how the coarticulation effects appear in the contexts of $V_1V_2V_1$, $V_1CV_2$, etc. ($V$ = vowel, $C$ = consonant) and how we should move the vocal tract area function in order to realize the effects. Further, this rule may contribute to the discrimination among clusters of similar phonemes.
The problem of coarticulation reported in the past was treated in terms of the frequency spectrum1), and when it is discussed on the articulatory level2) the relation with the frequency spectrum is not clearly solved. Such a problem may not be solved on the acoustical level only but requires investigating the movement of the articulatory organs. However, it is not easy to take X-ray photographs; moreover, it is not yet possible to decide the vocal tract area function precisely enough from the lateral cineradiograph. Therefore, it is necessary to make a model for the transition of the area function. The writers have made efforts to make such a model, and the formant patterns calculated by the model have come to coincide qualitatively with the observed ones and to be able to express the coarticulation effect.
2. A model for the transition of vocal tract area function: A rule of coarticulation
Our model is made up as follows: each phoneme has its proper target area function (target configuration), and uttering phonemes in succession is expressed by combining the target configurations of each phoneme by rule. That is, if the target configurations of two phonemes are expressed as $S_1(x)$ and $S_2(x)$ ($x$ = distance from glottis) respectively, then the area function $S(x)$ on the way of transition from $S_1(x)$ to $S_2(x)$ is shown as follows:
$S^{1/p}(x) = S_1^{1/p}(x) + [S_2^{1/p}(x) - S_1^{1/p}(x)] \cdot n$   (1)
$n$ = normalized time, $0 \le n \le 1$
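As an illustration of Eq. 1, the interpolation can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function name `interpolate_area`, the three-section tract and the area values are hypothetical and are not taken from the paper; only the linear interpolation in the $1/p$-power domain follows Eq. 1. The default $p = 3$ is the value adopted later in the text.

```python
def interpolate_area(s1, s2, n, p=3.0):
    """Eq. 1: interpolate each section area linearly in the 1/p-power domain."""
    if not 0.0 <= n <= 1.0:
        raise ValueError("normalized time n must lie in [0, 1]")
    areas = []
    for a1, a2 in zip(s1, s2):
        r1, r2 = a1 ** (1.0 / p), a2 ** (1.0 / p)
        areas.append((r1 + (r2 - r1) * n) ** p)
    return areas


# Hypothetical target areas (cm^2) at three positions x along the tract.
S1 = [1.0, 8.0, 27.0]
S2 = [8.0, 1.0, 8.0]

# Area function halfway through the transition from S1 to S2.
halfway = interpolate_area(S1, S2, 0.5)
print([round(a, 3) for a in halfway])
```

At $n = 0$ and $n = 1$ the function returns the two targets, so the rule only shapes the path between them.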
If we make $p$ equal to 2 to 3 and let the second formant frequencies corresponding to $S(x)$, $S_1(x)$ and $S_2(x)$ be $F$, $F_1$ and $F_2$ respectively, then the following relation is approximately realized.
$F = F_1 + (F_2 - F_1) \cdot n$   (2)
This relation is also realized approximately for the transition of the first formant frequency and has also been applied to the transitions between any consonant and vowel. Accordingly, by making $p = 3$ and using $T(n)$ instead of $n$ in Eqs. 1 and 2, where $T(n)$ is a normalized function of $n$ coinciding approximately with the F2 curve, the following equation is obtained.
$S^{1/3}(x) = S_1^{1/3}(x) + [S_2^{1/3}(x) - S_1^{1/3}(x)] \cdot T(n)$, $0 \le T(n) \le 1$   (3)
Moreover, by adding the term of complementary functions, the following equation is formed for the transition between two phonemes.
[Eq. 4 is not legible in the source; only a term in $\{S_1(x) \cdot S_2(x)\}^{1/3}$ survives]   (4)
where $D_{LH}(n, x)$ and $K(n, x)$ express the inherent features for the coarticulation between the phonemes.3,9)
Next, let us consider the case of uttering three phonemes successively. The target configurations of each phoneme are represented as $S_1$, $S_2$ and $S_3$; then the F2 curve may be expressed as the product of the normalized time function $T_1(n)$, which is the normalized F2 curve drawn on the way of transition from $S_1$ to $S_2$, and the function $T_2(n - n_\alpha)$, which is the normalized F2 curve for the transition from $S_2$ to $S_3$, where $n_\alpha$ is the instant of beginning to carry out the command to move on to $S_3$ on the way of transition from $S_1$ to $S_2$. That is,
$T(n) = T_1(n) \cdot [1 - c \cdot T_2(n - n_\alpha)]$   (5)
where $c$ is the coefficient of transformation between the normalized F2 curve from $S_1$ to $S_2$ ($T_1(n)$) and the normalized F2 curve from $S_2$ to $S_3$ ($T_2(n)$). This rule has been concluded for the F2 curve drawn by successive utterance of three vowels $V_1V_2V_1$, but we assume that this rule is applicable to every context.
Rewriting this relation by formant frequencies and designating the F2's of $S_1$, $S_2$, $S_3$ and $S$ as $F_1$, $F_2$, $F_3$ and $F$ respectively, we get the following equation.
$\frac{F - F_1}{F_2 - F_1} = T_1(n) \cdot \left[1 - \frac{F_2 - F_3}{F_2 - F_1} \cdot T_2(n - n_\alpha)\right]$
$\therefore F = F_1 + [F_2 + (F_3 - F_2) \cdot T_2(n - n_\alpha) - F_1] \cdot T_1(n)$   (6)
This equation is interpreted as follows: instead of the target frequency $F_2$ for the transition from $S_1$ to $S_2$, the F2 on the way from $S_2$ to $S_3$, that is $F_2 + (F_3 - F_2) \cdot T_2(n - n_\alpha)$, is regarded as the target frequency for the transition from $S_1$ to $S_2$.
Meanwhile, if we use Eq. 3 for the transition of the area function, the relation between the F2 curve calculated from the area function and the time function for the transition of the area function becomes almost linear. So the area function $S$ among $S_1$, $S_2$ and $S_3$ is shown as follows:
$S^{1/3} = S_1^{1/3} + [S_2^{1/3} + (S_3^{1/3} - S_2^{1/3}) \cdot T_2(n - n_\alpha) - S_1^{1/3}] \cdot T_1(n)$   (7)
In the case of more than three phonemes, the above relation may easily be applied in expanded forms.
Moreover, in the case of the context of three vowels $V_1V_2V_1$, if the F2 is traced and each symbol is defined as shown in Fig. 1 and $F_{2t}$ is the target frequency of the second formant of $V_2$, the following relation has been confirmed by tracing the F2 curves for various $V_1V_2V_1$, for example as shown in Fig. 1.
$F_{2o} = \kappa (F_{2i} - F_{2t}) \cdot \exp(-\beta\tau) + F_{2t}$   (8)
$\kappa$, $\beta$ = constants
mum value of F2 curve, and have to coincide with thc rcsult obtained by Eq.5, consequently Eq.7. Now, if wc approximate both T1(n)and T2(n) by[1−exp(一αη)], the maximum or minimum point of」F2 curve occurs at the cross point of T1(n)and T2(n−nα)at which n==・τ/2, and the value at theAModel of Coarticulation
2.0 1.0 0.5 ⊃αw
》・1! \ } ⊃ 1 缶 = 1 ) 己o.3!) 1
0・21 f O.1 0.071 1 0 ) 、 F20 く ト ti()i 「 R自Rx x\。 1\ x 口 V, x x x 口 ・x:male talker A 。ロ:: 〃 〃 B, ・\ご ’ loio「 ヘへ ・ \< 。 .Sx e .. \\一 ゜ o°° ・ θ、\●\2 \ 0.04L.,一__一一 ・一.−秩E・∼一一t鼈黶E.n−→一一一_T−一_ 2.QO 400 τ(ms) Fig・1 Relation between(F20−F2t)/(F2i−F2t)and transitional duration τfor /ioi/ and /oio/ uttered by A and B(male). extremc point, which is the normalized.F20, is shown as f()110ws:一 [1−cxp(一ατ/2)]2 =1−2cxp(一ατ/2)十exp(一ατ) Ifατis not so small then [1−exp(一ατ/2)]2二ご1−2 exp(一ατ/2) (9)while by Eq.8the normalized F20 is cxpresscd as
f()llOWS:_ F20−F2i =1一κ・exp(一βτ) (10) F2¢−F2zNamely the normalizcd F20’s in Eqs.9and 10,
which are dcrived from Eqs.5and 8 respcctively,
take approximately the same fbrm. Theref()re, by moving the target con丘guration as shown in Eq.7, the relation of Eq.8may bc almost consistellt. The above equations havc bccn made sure f()r vowels, but by taking the inhcrcnt features of cach phoncmc into considerations Eq.7 may bc applica−ble to the transition among any phoncmes. After
all, the problem of coarticulation is exprcssed byEq.4fbr thc transition bctween two phoncmes
and by Eqs.5,60r 7 fbr the transition among thrce phonemcs quantitatively, and f()r the transitionamong morc phonemes the rule may be expandcd.
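The three-target rule of Eq. 6 can be tried numerically. The sketch below is a minimal illustration, not the authors' program: it assumes that both $T_1$ and $T_2$ are $1 - \exp(-\alpha n)$ (the approximation used in the text), that $T_2$ is zero before the instant $n_\alpha$, and it uses hypothetical values for the rate constant and for the second-formant targets. The names `T`, `f2_three_targets` and `extreme_f2` are likewise hypothetical.

```python
import math


def T(n, alpha):
    """Assumed transition curve 1 - exp(-alpha*n), held at 0 until the movement starts."""
    return 1.0 - math.exp(-alpha * n) if n > 0.0 else 0.0


def f2_three_targets(n, f1, f2, f3, n_alpha, alpha):
    """Eq. 6: F = F1 + [F2 + (F3 - F2)*T2(n - n_alpha) - F1] * T1(n)."""
    moving_target = f2 + (f3 - f2) * T(n - n_alpha, alpha)
    return f1 + (moving_target - f1) * T(n, alpha)


# Hypothetical F2 targets (Hz) for a V1 V2 V1 sequence, and a hypothetical rate.
F_V1, F_V2 = 2200.0, 800.0
ALPHA = 5.0  # per unit of normalized time


def extreme_f2(n_alpha, steps=400, n_max=4.0):
    """Lowest F2 reached on a regular grid over 0 <= n <= n_max."""
    grid = (n_max * k / steps for k in range(steps + 1))
    return min(f2_three_targets(n, F_V1, F_V2, F_V1, n_alpha, ALPHA) for n in grid)


# The earlier the command towards the third target, the further F2 stays from F_V2.
for n_alpha in (0.3, 0.5, 1.0):
    print(n_alpha, round(extreme_f2(n_alpha), 1))
```

With these assumed numbers the F2 extreme moves away from the target of $V_2$ as $n_\alpha$ decreases, which is the undershoot that the rule is meant to express.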
In the above relations there are some points which are not proved strictly, but they are simple expressions and consistent enough to a first approximation, and we will explain by a few examples that these relations are consistent and also useful in speech recognition.
3. Examples
3.1 Vowels
In the case that three vowels, $V_1V_2V_1$, are uttered successively at different speeds, Eq. 8 is established fairly well if $F_{2i}$, $F_{2o}$ and $\tau$ are defined as in Fig. 1 and $F_{2t}$ is equal to the target frequency of $F_{2o}$. Inversely, the vowel $V_2$ in $V_1V_2V_1$ may be distinguished by calculating $F_{2t}$, substituting the measured $F_{2i}$, $F_{2o}$ and $\tau$ into Eq. 8.4) In this case, if instead of expressing $\kappa$ and $\beta$ as functions of $(F_{2i} - F_{2o})$ we give the time function $T(n)$, then $F_{2t}$ may also be calculated by Eq. 11.
$(F_{2o} - F_{2i})/(F_{2t} - F_{2i}) = T^2(n)$   (11)
The same relation can be discussed as to F1. And with the same calculation of $F_{1t}$ versus $F_{1o}$, $V_2$ can be distinguished even when $V_1V_2V_1$ is uttered rapidly. But there can be some cases in which inaccuracy is left in measuring $\tau$.
Fig. 2 shows the calculated vocal tract area functions and the formant frequencies in the cases where we make $n_\alpha$ in Eq. 7 equal to 0.3 and 0.5 for the context of /iui/. It shows that the vocal tract returns to /i/ without reaching /u/ sufficiently. And we can see that the faster the speed of the utterance (the smaller $n_\alpha$), the clearer this phenomenon becomes.
3.2 Nasal consonants
Let us consider the case that $V_1NV_2$ ($V_1$ = /i, o/, $N$ = /m, n, ŋ/ and $V_2$ = /i, e, a, o, u/) are uttered at different speeds. If we define $F_{2l}$ and $\tau$ as shown in Fig. 3 and $F_{2t}$ designates the target frequency of $F_{2l}$, we will find that the next relation holds fairly well, as shown in Fig. 4.6)
$F_{2l} = \kappa (F_{2i} - F_{2t}) \cdot \exp(-\beta\tau) + F_{2t}$   (12)
$\kappa$, $\beta$ = constants
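Since Eq. 12 is linear in $F_{2t}$, it can be solved for the target: writing $w = \kappa \exp(-\beta\tau)$ gives $F_{2t} = (F_{2l} - w F_{2i})/(1 - w)$. The following minimal sketch evaluates Eq. 12 in both directions. The constants $\kappa = 1.0$ and $\beta = 0.017$ (with $\tau$ in ms) are the values quoted in the caption of Fig. 6; the frequencies, the duration and the function names are hypothetical and serve only as an illustration.

```python
import math

KAPPA = 1.0   # value quoted in the caption of Fig. 6
BETA = 0.017  # per ms, value quoted in the caption of Fig. 6


def onset_f2(f2_i, f2_t, tau_ms, kappa=KAPPA, beta=BETA):
    """Eq. 12: F2l = kappa * (F2i - F2t) * exp(-beta * tau) + F2t."""
    return kappa * (f2_i - f2_t) * math.exp(-beta * tau_ms) + f2_t


def target_f2(f2_i, f2_l, tau_ms, kappa=KAPPA, beta=BETA):
    """Eq. 12 solved for F2t; not defined when kappa * exp(-beta * tau) equals 1."""
    w = kappa * math.exp(-beta * tau_ms)
    if abs(1.0 - w) < 1e-12:
        raise ValueError("F2t cannot be recovered when kappa*exp(-beta*tau) is 1")
    return (f2_l - w * f2_i) / (1.0 - w)


# Hypothetical measurement: F2i = 2200 Hz, target 1200 Hz, nasal segment of 60 ms.
f2_l = onset_f2(2200.0, 1200.0, 60.0)
recovered = target_f2(2200.0, f2_l, 60.0)
print(round(f2_l, 1), round(recovered, 1))
```

With $\kappa = 1.0$ the inversion breaks down as $\tau$ approaches zero, so very short segments would need separate treatment in any practical use of this sketch.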
Fig. 2 Transitions of area functions and formant frequencies of /iui/ for $n_\alpha$ = 0.3 and 0.5 obtained by the model. The smaller $n_\alpha$ corresponds to the faster utterance. (Axes: normalized time $n$; distance from glottis $x$ (cm).)
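The /iui/ case of Fig. 2 can be imitated at the area-function level with Eq. 7. The sketch below is an assumption-laden illustration rather than a reproduction of the figure: the three-section area values for /i/ and /u/, the rate constant and the function names are hypothetical, and the transition curves are taken as $1 - \exp(-\alpha n)$ with $T_2$ equal to zero before $n_\alpha$.

```python
import math


def T(n, alpha):
    """Assumed transition curve 1 - exp(-alpha*n), held at 0 until the movement starts."""
    return 1.0 - math.exp(-alpha * n) if n > 0.0 else 0.0


def area_three_targets(s1, s2, s3, n, n_alpha, alpha):
    """Eq. 7 applied section by section in the cube-root domain."""
    t1, t2 = T(n, alpha), T(n - n_alpha, alpha)
    areas = []
    for a1, a2, a3 in zip(s1, s2, s3):
        r1, r2, r3 = a1 ** (1 / 3), a2 ** (1 / 3), a3 ** (1 / 3)
        areas.append((r1 + (r2 + (r3 - r2) * t2 - r1) * t1) ** 3)
    return areas


# Hypothetical areas (cm^2) at three positions along the tract.
S_I = [4.0, 0.5, 3.0]  # stand-in for the /i/ target
S_U = [1.0, 6.0, 0.3]  # stand-in for the /u/ target
ALPHA = 5.0            # hypothetical rate constant

# Area function at the instant n_alpha, when the return towards /i/ begins.
for n_alpha in (0.3, 0.5):
    at_turn = area_three_targets(S_I, S_U, S_I, n_alpha, n_alpha, ALPHA)
    print(n_alpha, [round(a, 2) for a in at_turn])
```

In this toy setting the tract is further from the /u/ stand-in at the turning instant for $n_\alpha = 0.3$ than for $n_\alpha = 0.5$, in line with the statement that the smaller $n_\alpha$ corresponds to the faster utterance.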
Fig. 3 Illustration of symbols.
Moving the vocal tract area function by Eq. 7 and calculating $F_{2i}$, $F_{2l}$ and $\tau$ from the area function, we have plotted $F_{2l}$ according to Eq. 12 by making $F_{2t}$ identical with the $F_{2l}$ of the monosyllable ($NV_2$), and it is shown in Fig. 5. The result reproduces the behaviour of Fig. 4 well enough qualitatively.7)
In the case of monosyllables, the discrimination among the nasal consonants /m/, /n/ and /ŋ/ before the same following vowel may be possible by the accurate extraction of $F_{2l}$. And if only the discrimination between /m/ and /n/ is required, even in words, it is possible in real time.5) But it is impossible to discriminate /m/, /n/ and /ŋ/ in words by the extraction of $F_{2l}$ only. Therefore, we tried the transformation of the $F_{2l}$'s into the domain of $F_{2t}$ by Eq. 12. By that, the $F_{2t}$'s of /m/, /n/ and /ŋ/ related to the same following vowel were separated into each nasal domain and the discrimination among them became possible as shown in Fig. 6.6)
Fig. 4 Normalized $F_{2l}$ plotted against nasal segment duration $\tau$ (ms) in the contexts of /ino/, /iŋo/ and /imo/ (male voice).
Fig. 5 Normalized $F_{2l}$ plotted against nasal segment duration $\tau$ (in normalized time) calculated for /ino/ by the model.
Fig. 6 $F_{2t}$'s calculated by Eq. 12 assigning 1.0 and 0.017 for $\kappa$ and $\beta$ respectively.
Fig. 7 Area functions at the instant of /k/-explosion [(1) and (3)] and the onset of the following vowel /a/ [(2) and (4)] for /ka/ and /ika/ ($n_\alpha$ = 0.225) respectively, calculated by the model. Curve (5) is the target configuration of /k/. Curves (3) and (4) show the influence of the preceding vowel /i/ upon /ka/.
3.3 Stop Consonants
In the case of stop consonants, we also move the vocal tract area function according to Eq. 7, which is now applied with particular consideration of the features of stops, i.e. the existence of the period of vocal tract closure and the rapid movement of the place of constriction just after the instant of explosion. Fig. 7 shows the area functions at the instant of explosion and the onset of the following vowel /a/ for /ka/ and /ika/ calculated by the model. They show the effect of coarticulation by /i/ upon /ka/.
Now we compare the actual values measured as to the influences of coarticulation on $V_1CV_2$ and $CVC$ ($C$ = stop) with the results of calculation by our model.
(i) In the case of $V_1CV_2$
For $V_1CV_2$ ($V_1$ = /i, o/, $C$ = /k, t, p/, $V_2$ = /a/) uttered at different speeds, we have measured the second formant frequencies at the onset of $V_2$ ($F_{2l}$). The $F_{2l}$'s of /ka/, /ta/ and /pa/, which are separated in the case of monosyllables, overlap each other because of the coarticulation. But when we investigate the relation between $\log[(F_{2l} - F_{2t})/(F_{2i} - F_{2t})]$ and $\tau$, defining $F_{2i}$, $F_{2l}$ and $\tau$ in the same way as in Fig. 3 (putting $C$ in place of $N$), we find an almost linear relation as shown in Fig. 8. So the next relation can be concluded in the same way as in the case of nasals.
$F_{2l} = \kappa (F_{2i} - F_{2t}) \cdot \exp(-\beta\tau) + F_{2t}$   (13)
$\kappa$, $\beta$ = constants, $F_{2t}$ = target of $F_{2l}$
We have calculated the $F_{2t}$'s inversely from the measured $F_{2i}$, $F_{2l}$ and $\tau$, fixing $\kappa$ and $\beta$ to the mean values for each context as shown in Fig. 8. Then the $F_{2t}$'s are divided into each domain of /ka/, /ta/ and /pa/, which makes it possible to distinguish them from each other.8)
Fig. 8 Normalized $F_{2l}$ plotted against consonant segment duration $\tau$ (ms) for various contexts of $V_1CV_2$ ($V_1$ = /i, o/, $V_2$ = /a/, $C$ = /k, t, p/) (male voice). In Fig. 8 and 9, ○: /ika/, ●: /oka/, △: /ita/, ▲: /ota/, +: /ipa/ and ×: /opa/. (Curve label as it appears in the source: 0.56 exp(0.019τ).)
Fig. 9 Normalized $F_{2l}$ plotted against consonant segment duration $\tau$ (in normalized time) calculated by the model. Notations are the same as in Fig. 8.
Our model also shows similar results with the
Fig. 10