Creating Speaker Independent HMM Models for Restricted Database Using STRAIGHT-TEMPO Morphing
From both speakers we extract the recognized vowel parts, producing vectors of fundamental frequency $F0_x(t)$ and spectrum $|A_x(t,f)|$, where $t$ is the time of the sample and $f$ is the frequency in the spectrum for the speaker $x$. $F0_x(t)$ and $|A_x(t,f)|$ are obtained from STRAIGHT by applying pconv = 1.0 and fconv = 1.0.

Aligning these recognized vowels between both speakers $x$ and $y$, with vowels of the same kind, we get pairs of ranges for the fundamental frequency

$$\big(F0_x(t_{xs}(p), t_{xe}(p)),\; F0_y(t_{ys}(p), t_{ye}(p))\big)$$

and for the spectrum

$$\big(A_x(t_{xs}(p), t_{xe}(p), f),\; A_y(t_{ys}(p), t_{ye}(p), f)\big),$$

where $p$ is the pair number with $p \in 1,\dots,P$ and $P$ is the maximum number of pairs we can align within the available data. The first index of the time $t$ refers to the speaker and the second to the start $s$ and the end $e$ of the alignment, respectively.

Using these aligned pairs of vowels we can estimate pconv and fconv according to the following equations (1), (2), (3) and (4):

$$N_x(p) = t_{xe}(p) - t_{xs}(p) \tag{1}$$

$$N_y(p) = t_{ye}(p) - t_{ys}(p) \tag{2}$$

$$pconv = \frac{1}{P} \sum_{p=1}^{P} \frac{\dfrac{1}{N_y(p)} \sum_{t=t_{ys}(p)}^{t_{ye}(p)} F0_y(t)}{\dfrac{1}{N_x(p)} \sum_{t=t_{xs}(p)}^{t_{xe}(p)} F0_x(t)} \tag{3}$$

$$fconv = \frac{1}{P} \sum_{p=1}^{P} \frac{\dfrac{1}{N_y(p)} \sum_{t=t_{ys}(p)}^{t_{ye}(p)} \sum_{f} f \log |A_y(t,f)|}{\dfrac{1}{N_x(p)} \sum_{t=t_{xs}(p)}^{t_{xe}(p)} \sum_{f} f \log |A_x(t,f)|} \tag{4}$$

3. DATABASE

The voices of 6 children were recorded at 48 kHz and downsampled to 16 kHz. The ages and gender of the children are shown in Table 1. The 210 words uttered by the children are words in common use among children of their age.

Table 1: Children database description. [Table body not recoverable.]

The adult data we used were the 5240 words of the MHT and FSU speakers and the 216 balanced words of the MHT, MAU, MMY, FKN, FSU and FYM speakers of ATR SetA [3].

4. PARAMETERIZATION

Table 2 describes the experimental conditions (the parameterization used) for the following experiment as well as for the rest of the experiments in this paper.

Table 2: Parameterization. [Table body not recoverable.]

All the recognition rates in the next section refer to tied mixture models. 55 phoneme models were used.

5. EXPERIMENTS

The first experiment aims to confirm the reliability of the estimation of pconv (pitch conversion rate) and fconv (frequency conversion rate). The second experiment evaluates how this algorithm works when applied to children data.

5.1. PITCH AND FREQUENCY RATE ESTIMATION

To evaluate the pconv and fconv estimation, a male speaker (MHT, ATR SetA [3]) was morphed towards a female voice with appropriate constant pconv and fconv parameters, generating a morphed female voice (FMHT).

The values of pconv and fconv used to convert the MHT speaker data towards the FSU speaker data, as well as their reestimated values, are shown in Table 3. The reestimated pconv and fconv are close to the values used for the conversion of MHT to FMHT.

Table 3: Used and estimated conversion rates.

Parameter   Used   Estimated
pconv       2.22   2.17
fconv       1.25   1.27

The word recognition rates using the MHT, FSU and morphed FMHT data are shown in Table 4. Models are trained with the odd numbered words and tested against the even numbered words of the ATR SetA database.

Table 4: Word recognition rate (%). FMHT is the MHT speaker morphed towards the FSU speaker. [Table body not recoverable.]

From this result we conclude that the morphed data, which approximate the real female data, increase the recognition rate from 60.2% to 87.3%.

Also, for an adult male voice morphed towards a female voice, the pconv and fconv estimates diverge by only about 2% from the values used, attesting the robustness of the algorithm.
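As a rough illustration of the pitch rate estimate and of the "about 2%" figure, the sketch below reads the pconv equation as the ratio of mean F0 values of speaker y to speaker x, computed per aligned vowel pair and then averaged over all pairs. That reading, the function names `estimate_pconv` and `relative_deviation`, and the toy F0 values are assumptions made for this example; only the values 2.22, 2.17, 1.25 and 1.27 come from Table 3.

```python
from statistics import mean


def estimate_pconv(pairs):
    """Average, over aligned vowel pairs, of mean F0 of speaker y
    divided by mean F0 of speaker x (assumed reading of the pconv equation).

    pairs: list of (f0_x, f0_y) tuples, each a list of F0 samples in Hz
    covering one aligned vowel.
    """
    ratios = [mean(f0_y) / mean(f0_x) for f0_x, f0_y in pairs]
    return mean(ratios)


def relative_deviation(used, estimated):
    """Relative difference between the rate used for morphing and its reestimate."""
    return abs(estimated - used) / used


# Toy data, invented for illustration only (not from the paper).
toy_pairs = [
    ([100.0, 110.0, 120.0], [220.0, 242.0, 264.0]),  # ratio 2.2
    ([120.0, 130.0], [250.0, 250.0]),                # ratio 2.0
]
toy_pconv = estimate_pconv(toy_pairs)

# Deviations for the values reported in Table 3.
pconv_dev = relative_deviation(2.22, 2.17)
fconv_dev = relative_deviation(1.25, 1.27)
print(f"toy pconv = {toy_pconv:.2f}")
print(f"pconv deviation = {pconv_dev:.1%}, fconv deviation = {fconv_dev:.1%}")
```

With the Table 3 values, both deviations come out near 2%, which is consistent with the statement in the text.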
ICSLP'98

5.2. MORPHING ADULT DATA TOWARDS CHILDREN DATA

The next experiment aims to evaluate the efficiency of STRAIGHT in morphing adult voices towards children voices. We used 6 adult speakers (3 male, 3 female, ATR SetA) with 216 balanced words each. For comparison we used 6 children (3 male, 3 female); each child uttered 210 words containing all the Japanese phonemes.

Three types of acoustic models were created:

• the first by using only children voices;
• the second by using only adult voices;
• the third by using only morphed data.

The pconv and fconv between the 6 adults and the 6 children were estimated. The results can be seen in Table 5.

Table 5: pconv and fconv between speakers (from adult speaker, to child speaker 01, 02, 03). Values in extraction order; the row and column assignment is not recoverable:

2.08 1.19 2.22 1.28 1.43 1.12 1.02 1.29 1.06 1.24 0.91 1.12

Four experiments were then performed:

• First, each child's voice was recognized with the model generated from the remaining children voices (corresponds to the first column in Table 6 and Table 7).
• Second, each adult voice was recognized with the model generated from the remaining adult voices (corresponds to the first row in Table 6 and Table 7).
• Third, children voices were recognized with the model generated from adult voices (corresponds to the values in the central part of Table 6 and Table 7).
• Fourth, children voices were recognized with the model generated from adult voices morphed towards the children test data (corresponds to the values after the slash in Table 6 and Table 7).

The morphing experiments are carried out within the same gender and age. Table 6 and Table 7 show the recognition results. The first letter of the speaker name represents its gender, where male and female are represented by M and F, respectively.

Table 6: Word recognition rate (%) using "male adult model"/"morphed model".

child     children   MHT         MAU        MMY
(adult)   -          100.0       98.2       98.2
MCH01     85.2       17.1/68.3   4.8        18.6
MCH02     59.5       1.4         1.0/39.2   2.4
MCH03     90.0       17.1        7.6        22.1/63.8
FCH01     73.3       4.3         2.4        1.8
FCH02     70.0       1.4         0.5        0.0
FCH03     73.3       8.6         5.2        14.8

Table 7: Word recognition rate (%) using "female adult model"/"morphed model".

child     children   FKN         FSU         FYM
(adult)   -          95.4        96.3        93.5
MCH01     85.2       65.7        67.6        61.0
MCH02     59.5       39.5        32.4        45.2
MCH03     90.0       71.9        69.1        68.1
FCH01     73.3       49.1/49.1   43.8        51.4
FCH02     70.0       28.1        19.1/31.0   37.6
FCH03     73.3       35.2        41.0        41.4/33.3

5.3. FIXED PITCH AND FREQUENCY CONVERSION RATE

In order to compare the estimates of pconv and fconv with the optimum values, we carried out recognition using models morphed in steps of 0.05 for each parameter, close to the estimated values. These recognition results are shown in Tables 8 and 9.

Table 8: Confirming the male pconv and fconv estimates by carrying out recognition using pconv and fconv near the optimum. The recognition rate is expressed in word accuracy (%). [Table body not recoverable.]

The fixed pconv and fconv experiments resulted in the highest recognition rates near the values (see Table 5) obtained with the estimation algorithms (equations (1) to (4)). This attests the robustness of the proposed algorithms.

For male data morphed towards children data, a large increase of the recognition rate was achieved, while the female data showed almost no significant improvement. This shows that additional degrees of manipulation are necessary to morph adult data towards children data.
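The step search of Section 5.3 can be pictured with the following minimal sketch. It only enumerates candidate (pconv, fconv) pairs around an estimate in steps of 0.05 and picks the pair with the highest score. The number of steps on each side (`n_steps`), the function names, the starting values 2.0 and 1.2, and the `toy_accuracy` function standing in for a full train-and-recognize run are all assumptions for illustration; the paper states only the step size.

```python
def candidate_rates(estimate, step=0.05, n_steps=2):
    """Conversion rates around an estimate, spaced by `step` (n_steps per side)."""
    return [round(estimate + k * step, 2) for k in range(-n_steps, n_steps + 1)]


def candidate_grid(pconv_est, fconv_est, step=0.05, n_steps=2):
    """All (pconv, fconv) combinations near the estimated values."""
    return [
        (p, f)
        for p in candidate_rates(pconv_est, step, n_steps)
        for f in candidate_rates(fconv_est, step, n_steps)
    ]


def best_candidate(grid, accuracy):
    """Pair with the highest word accuracy; `accuracy` is a caller-supplied function."""
    return max(grid, key=lambda pf: accuracy(*pf))


# Hypothetical estimates, not taken from the paper's tables.
grid = candidate_grid(2.0, 1.2)


def toy_accuracy(pconv, fconv):
    """Placeholder score peaking at (2.05, 1.2); a real run would morph the
    adult data with these rates, train models and measure word accuracy."""
    return -((pconv - 2.05) ** 2 + (fconv - 1.2) ** 2)


best = best_candidate(grid, toy_accuracy)
print(len(grid), best)
```

If the estimates are reliable, the best pair found this way should lie close to the estimated pair, which is what the comparison in Tables 8 and 9 is meant to check.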
Table 9: Confirming the female pconv and fconv estimates by carrying out recognition using pconv and fconv near the optimum. The recognition rate is expressed in word accuracy (%). [Table body not recoverable.]

6. CONCLUSIONS AND FUTURE WORK

This paper presented an alternative way to enlarge the database for HMM acoustic model generation by using the high-quality STRAIGHT-TEMPO algorithm.

When morphing adult data towards adult data, the algorithm increased the female voice recognition rate obtained with models trained on male data from 60.2% to 87.3% with morphed data.

The algorithms proposed for pitch and frequency conversion rate estimation proved to be robust for adult data.

The increase in the word recognition rate for children data when adult data is morphed towards children data attests the usefulness of the proposed method for both male and female adult data.

In this way, large amounts of adult male and female data can be morphed to match children data, while each child only needs to record a small number of words.

In the future we plan to investigate a non-linear frequency conversion, as well as a more robust estimation of the frequency conversion rate, by adapting the frequency range used to follow the frequency conversion obtained.

ACKNOWLEDGMENT

This work is supported by CREST (Core Research for Evolutional Science and Technology), JAPAN.

7. REFERENCES

1. H. Kawahara. Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited. IEEE Int. Conf. Acoust., Speech and Signal Process., vol. 2, pages 1303-1306, Munich, 1997.

2. H. Kawahara and de Cheveigne. Error free F0 extraction method and its evaluation. Tech. Report of IEICE, SP96-96, 9-18, 1997. (in Japanese).

3. H. Kuwabara, Y. Sagisaka, K. Takeda, and M. Abe, "Construction of ATR Japanese speech database as a research tool," Technical Report TR-I-0086, ATR, 1989. (in Japanese).