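Japanese-Accented English Speech Synthesis Preserving Speaker Individuality Based on Partial Correction of Prosody and Phonemes, and the Effect of English Proficiency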
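IPSJ SIG Technical Report, Vol.2015-SLP-105 No.3, 2015/2/27

1. Introduction

Cross-lingual speech synthesis is a technique for reflecting the speaker individuality of a speaker of one language in synthetic speech of a different language. Speaker individuality helps listeners identify the source of information and therefore promotes smooth communication. In Japan in particular, demand for synthesis between Japanese and English is high, and applications to speech translation systems, dubbing of foreign films, and CALL systems [1] are expected.

Methods that apply speaker conversion to the speech of native English speakers using highly natural speech data, such as bilingual speech or Japanese speech, have been widely studied [5], [6], [7], both in statistical voice conversion [2] and in speaker adaptation [4] for speech synthesis based on hidden Markov models (HMMs) [3]. These methods can synthesize English speech with relatively high naturalness, but they tend to degrade speaker individuality compared with synthesis within a single language [5].

In contrast, we have proposed a method that improves naturalness while strongly reflecting speaker individuality [9]. It builds the model from Japanese-accented English speech (ERJ: English Read by Japanese) [8] and applies a prosody correction method to the prosodic errors of ERJ speech. However, that method was evaluated only with native Japanese listeners and a small number of speakers, so the effects of the listeners' native language and of the speakers' English proficiency have not been investigated. In addition, phonetic errors, which are another cause of reduced naturalness in ERJ speech, are not taken into account, which limits the achievable improvement in naturalness.

In this paper, we investigate how the listeners' native language and the speakers' English proficiency affect the prosody correction, and we propose a phoneme correction method based on unvoiced consonant spectrum replacement. Experimental evaluations show that (1) the naturalness improvement from power correction is pronounced when evaluated by native English speakers, (2) the prosody correction method improves naturalness regardless of English proficiency, and (3) the phoneme correction method is effective for improving naturalness.

2. Adaptation Techniques in HMM-Based Speech Synthesis

Fig. 1 gives an overview of model adaptation in HMM-based speech synthesis. In HMM-based speech synthesis, the spectral parameters, excitation parameters, and state durations of speech are modeled in a unified HMM-based framework [10]. The output probability $b_c(o_t)$ of class $c$ obtained by context clustering is given by

$$ b_c(o_t) = \mathcal{N}(o_t; \mu_c, \Sigma_c) \qquad (1) $$

where $o_t = [c_t^\top, \Delta c_t^\top, \Delta\Delta c_t^\top]^\top$ is the joint vector of the static feature $c_t$ at time $t$ and its first-order and second-order dynamic features $\Delta c_t$ and $\Delta\Delta c_t$, and $\mathcal{N}(\cdot; \mu_c, \Sigma_c)$ denotes a normal distribution with mean $\mu_c$ and covariance matrix $\Sigma_c$.

Fig. 1 Model adaptation in HMM-based speech synthesis. (The source speaker's spectral, excitation, and duration models are adapted to the target speaker's model with adaptation data using Eqs. (2) and (3).)

With model adaptation, the HMM of a target speaker can be built from the HMM of another speaker. Using a pre-trained source model and adaptation data from the target speaker, the parameters of the source model are transformed to obtain a model adapted to the target speaker. The adapted mean vector $\hat{\mu}_c$ and covariance matrix $\hat{\Sigma}_c$ are computed as

$$ \hat{\mu}_c = A \mu_c + b \qquad (2) $$

$$ \hat{\Sigma}_c = A \Sigma_c A^\top \qquad (3) $$

Here, the adaptation matrix $A$ and the bias vector $b$ are regression parameters, estimated for each regression class to which multiple distributions belong. In HMM-based speech synthesis, the spectral parameters, excitation parameters, and state durations are all modeled with normal distributions, and adaptation is applied to all of them. This makes it possible to adapt prosodic features as well as segmental features at the same time.

At parameter generation time, a sentence HMM is constructed from the contexts obtained by analyzing the input text. The state durations are then determined by maximizing the likelihood of the duration model, the parameters are generated by maximizing the HMM likelihood under the explicit constraint between static and dynamic features [11], and speech is synthesized through vocoder-based waveform generation.

3. Prosody Correction and Phoneme Correction for ERJ Speech Synthesis

3.1 Prosody Correction Based on Model Adaptation

Fig. 2 gives an overview of the prosody correction method based on model adaptation. First, a speaker-dependent HMM for a native English speaker is trained on that speaker's English speech. The speech parameters used as observations are log power, spectral envelope parameters, and excitation parameters, and an output probability distribution for each parameter and a state duration distribution are obtained. Next, to build an HMM for English speech synthesis that reflects the speaker individuality of the target native Japanese speaker, the native English speaker's HMM is adapted using the target speaker's ERJ speech. This method focuses on duration and power as factors that degrade the naturalness of ERJ speech. By adapting only the model parameters other than state duration and log power, an ERJ HMM that takes the native English speaker's prosody into account is built. This adaptation makes it possible to synthesize ERJ speech with improved naturalness while preserving the speaker individuality of the target native Japanese speaker as much as possible [9].

1 Graduate School of Information Science, Nara Institute of Science and Technology

ⓒ 2015 Information Processing Society of Japan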
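The affine update of Eqs. (2) and (3), together with the selective adaptation described in Section 3.1, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the transforms `A` and `b` are taken as given (in the paper they are estimated per regression class), each parameter type is stored as a separate hypothetical dictionary entry (the paper trains log power and mel-cepstrum in a single stream), and the names `adapt_gaussian` and `adapt_model` are invented for this example.

```python
def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def matmul(A, B):
    """Multiply matrices A and B given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def adapt_gaussian(mu, sigma, A, b):
    """Eq. (2): mu_hat = A mu + b.  Eq. (3): sigma_hat = A sigma A^T."""
    mu_hat = [m + bias for m, bias in zip(matvec(A, mu), b)]
    sigma_hat = matmul(matmul(A, sigma), transpose(A))
    return mu_hat, sigma_hat


def adapt_model(native_model, transforms, keep=("duration", "log_power")):
    """Adapt every parameter type except those in `keep`.

    Parameter types listed in `keep` retain the native speaker's Gaussians,
    which is the idea behind the prosody correction ("Dur.+Pow.").
    """
    adapted = {}
    for name, (mu, sigma) in native_model.items():
        if name in keep or name not in transforms:
            adapted[name] = (list(mu), [list(row) for row in sigma])
        else:
            A, b = transforms[name]
            adapted[name] = adapt_gaussian(mu, sigma, A, b)
    return adapted


# Toy one-Gaussian-per-type model (hypothetical values).
native = {
    "duration": ([10.0], [[4.0]]),
    "log_power": ([-2.0], [[1.0]]),
    "mel_cepstrum": ([1.0, 2.0], [[1.0, 0.0], [0.0, 4.0]]),
}
transforms = {
    "duration": ([[2.0]], [1.0]),  # available, but ignored because of `keep`
    "mel_cepstrum": ([[2.0, 0.0], [0.0, 0.5]], [0.5, -1.0]),
}
adapted = adapt_model(native, transforms)
print(adapted["mel_cepstrum"])
print(adapted["duration"])
```

Passing `keep=()` would correspond to adapting all parameters, as in the "Adapt" condition; keeping only `"duration"` would correspond to "Dur.".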
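Fig. 2 An overview of the prosody correction method based on model adaptation technique. (Native HSMMs trained on native English speech are adapted with ERJ speech for spectrum and excitation, while power and duration are retained, yielding ERJ HSMMs with modified prosody.)

3.2 Phoneme Correction Based on Unvoiced Consonant Spectrum Replacement

Fig. 3 shows the procedure of the phoneme correction method based on unvoiced consonant spectrum replacement. The proposed method corrects the phonemes of ERJ speech by partially using the spectral parameters of a native English speaker. Pitch and voiced sounds strongly affect the perception of speaker individuality [12], whereas the speaker dependency of unvoiced consonants is expected to be small. Replacing the unvoiced consonants of ERJ speech is therefore expected to improve naturalness while preserving speaker individuality.

First, speech parameters are generated from both the native English speaker's HMM and the prosody-corrected ERJ HMM. Note that the two HMMs share the same duration model, so the generated parameters are aligned in time. Next, within the spectral parameter sequence of the native Japanese speaker, only the frames corresponding to unvoiced consonants are replaced with the native English speaker's spectral parameters. To avoid the quality degradation caused by a mismatch between the replaced spectrum and the original voiced/unvoiced information, a frame is not replaced if the native English speaker's F0 is voiced in that unvoiced consonant frame.

Fig. 3 An overview of the phoneme correction method based on spectrum swapping of the unvoiced consonants. (Spectral and excitation parameters are generated from the native HSMMs and from the ERJ HSMMs with modified prosody; the unvoiced consonant spectra are swapped with the native speaker's, and the phoneme-modified synthetic speech of the ERJ speaker is synthesized.)

4. Experimental Evaluation

4.1 Experimental Conditions

As training data, we use the 593 sentences of set A from the CMU ARCTIC speech database [13], uttered by one male and one female native English speaker. The evaluation data are the 50 sentences of set B from the same database. The sampling frequency of the training, evaluation, and adaptation data is 16 kHz. STRAIGHT analysis [14] is used to extract speech parameters. The spectral features are log power and the 1st through 24th mel-cepstral coefficients. The excitation features are log F0 and average aperiodicity in five frequency bands. The frame shift is 5 ms. These speech parameters, together with their first-order and second-order dynamic features, form the observation vectors, and 5-state left-to-right HSMMs [15] are trained. Log power and mel-cepstrum are trained in the same stream. Model adaptation uses CSMAPLR+MAP [16], with block-diagonal regression matrices corresponding to the static, first-order dynamic, and second-order dynamic features. For adaptation, the HMM trained on data from the native English speaker of the same gender as the speaker of the adaptation data is used.

To evaluate the effect of the proposed prosody correction, subjective evaluations of speaker individuality, naturalness, and intelligibility are conducted using synthetic speech produced by the methods listed in Table 1.

Table 1 Synthetic speech samples used for evaluation.
Method | Training data | Adaptation data | Prosody correction | Phoneme correction
ERJ | ERJ speech | – | None | None
HMM+VC (see footnote 2) | Native English speech | – | – | –
Adapt | Native English speech | ERJ speech | None | None
Dur. | Native English speech | ERJ speech | State duration | None
Dur.+Pow. | Native English speech | ERJ speech | State duration, log power | None
Dur.+Pow.+UVC | Native English speech | ERJ speech | State duration, log power | Unvoiced consonant spectrum
Native | Native English speech | – | – | –

4.2 Effect of Prosody Correction Based on Model Adaptation

4.2.1 Effect of the Listeners' Native Language on the Prosody Correction Method

The target speakers are two male native Japanese speakers in their twenties. One is a graduate student with no study-abroad experience who received a standard English education in Japan ("Monolingual"). The other is a student who studied in Australia for one year and has high English proficiency ("Bilingual"). The 593 sentences of set A of the ARCTIC speech database uttered by these two speakers are used as adaptation data.

Speaker individuality is evaluated with a 5-point DMOS (Degradation Mean Opinion Score) test, using analysis-synthesized Japanese speech of the target native Japanese speaker as the reference.

2 Following the conventional method [5] (but using one-to-one speaker conversion based on ERJ speech instead of one-to-many speaker conversion), GMM-based statistical voice conversion is applied to the output speech parameters of the native English speaker's speaker-dependent HSMM.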
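The frame-wise replacement rule of Section 3.2 can be sketched in a few lines. This is a simplified illustration rather than the authors' code: it assumes that the two parameter sequences are already time-aligned (as the shared duration model guarantees), that per-frame unvoiced-consonant flags are available from the phoneme alignment, and that an F0 value of 0 marks an unvoiced frame, which is a common convention but is not stated in the paper. The function name and the toy values are hypothetical.

```python
def swap_unvoiced_consonant_frames(erj_spectrum, native_spectrum,
                                   is_unvoiced_consonant, native_f0):
    """Replace ERJ spectral frames with native frames in unvoiced consonants.

    A frame is replaced only if it belongs to an unvoiced consonant AND the
    native speaker's F0 is unvoiced there (assumed here: f0 <= 0), which
    avoids a mismatch between spectrum and voiced/unvoiced information.
    """
    lengths = {len(erj_spectrum), len(native_spectrum),
               len(is_unvoiced_consonant), len(native_f0)}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same number of frames")

    corrected = []
    for erj_frame, native_frame, uvc, f0 in zip(
            erj_spectrum, native_spectrum, is_unvoiced_consonant, native_f0):
        native_is_voiced = f0 > 0.0
        if uvc and not native_is_voiced:
            corrected.append(list(native_frame))  # take the native spectrum
        else:
            corrected.append(list(erj_frame))     # keep the ERJ spectrum
    return corrected


# Four toy frames with two spectral coefficients each (hypothetical values).
erj = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
native = [[1.1, 1.2], [1.3, 1.4], [1.5, 1.6], [1.7, 1.8]]
uvc_flags = [False, True, True, False]
native_f0 = [120.0, 0.0, 118.0, 0.0]

corrected = swap_unvoiced_consonant_frames(erj, native, uvc_flags, native_f0)
print(corrected)
```

Only the second frame is replaced here: the third frame is labeled as an unvoiced consonant but the native F0 is voiced, so the ERJ spectrum is kept.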
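In the individuality evaluation, five methods are evaluated: "ERJ", "HMM+VC", "Adapt", "Dur.", and "Dur.+Pow.". Naturalness is evaluated with a 5-point MOS (Mean Opinion Score) test on the naturalness of the English speech, covering six methods: "ERJ", "HMM+VC", "Adapt", "Dur.", "Dur.+Pow.", and "Native". Each evaluation uses an experiment set created for each target speaker and is carried out by six native Japanese speakers and six native English speakers.

Fig. 4 and Fig. 5 show the evaluation results for the prosody correction method in terms of speaker individuality and naturalness, respectively (see footnote 3). We first look at the effect of the listeners' native language on the methods without correction ("ERJ" and "Adapt"). Comparing (a) and (b) in Fig. 4, the individuality scores are at a similar level for listeners with different native languages. Comparing (a) and (b) in Fig. 5, however, the naturalness scores given by native English speakers tend to be considerably lower than those given by native Japanese speakers. Next, for the methods that reflect the native English speaker's power ("HMM+VC" and "Dur.+Pow."), the naturalness results in Fig. 5 show a relative increase in score when evaluated by native English speakers compared with native Japanese speakers. These results are attributed to native English speakers being more sensitive than native Japanese speakers to the rhythm and stress of English utterances.

In the naturalness evaluation by both listener groups, "Dur." and "Dur.+Pow." obtain higher scores than the other methods. In the individuality evaluation, "Dur." and "Dur.+Pow." preserve speaker individuality to the same degree as "ERJ". This confirms the effectiveness of the proposed prosody correction method.

These results show that evaluations differ between native Japanese and native English speakers, and that native English speakers are more sensitive to the prosody of English utterances. They also show that the proposed duration and power correction makes it possible to synthesize English speech that native English speakers perceive as highly natural, while preserving the speaker individuality of the native Japanese speaker.

Fig. 4 Results of subjective evaluation of individuality for prosody correction method. (a) Evaluated by Japanese speakers, (b) Evaluated by English speakers. (DMOS with 95% confidence intervals for the "Bilingual" and "Monolingual" speakers.)

Fig. 5 Results of subjective evaluation of naturalness for prosody correction method. (a) Evaluated by Japanese speakers, (b) Evaluated by English speakers. (MOS with 95% confidence intervals for the "Bilingual" and "Monolingual" speakers.)

4.2.2 Effect of the Speakers' English Proficiency on the Prosody Correction Method

The adaptation data are 60 TIMIT [17] sentences uttered by four speakers in total, male and female, who have the highest ("High") or lowest ("Low") English proficiency scores in the database of English read aloud by Japanese students [8]. In this paper, English proficiency refers to the average of the ratings on several criteria defined in the database (phoneme production and rhythm). The evaluation procedure is the same as in Section 4.2.1, except that the individuality evaluation uses analysis-synthesized ERJ speech of the target native Japanese speaker as the reference. Three methods are evaluated for individuality ("HMM+VC", "Adapt", and "Dur.+Pow."), and four for naturalness ("HMM+VC", "Adapt", "Dur.+Pow.", and "Native"). Each evaluation uses an experiment set containing the speech of all native Japanese speakers and is carried out by six native English speakers.

Fig. 7 shows the subjective evaluation results for the prosody correction method in terms of individuality and naturalness, aggregated by English proficiency ("High" and "Low"). In the individuality results, "HMM+VC", which uses GMM-based voice conversion, tends to degrade individuality considerably for "Low" compared with "Adapt", which adapts all model parameters. For "High" the degradation is smaller, but the same tendency is observed. In contrast, "Dur.+Pow.", the proposed method with duration and power correction, preserves individuality to the same degree as "Adapt" regardless of English proficiency.

In the naturalness results, "Adapt" causes a large degradation for "Low" compared with "HMM+VC". "Dur.+Pow.", on the other hand, prevents this degradation through prosody correction and achieves naturalness comparable to "HMM+VC" regardless of English proficiency.

These results confirm that the proposed method works robustly regardless of English proficiency.

3 Fig. 4 (a) and Fig. 5 (a) are reproduced from [9].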
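The figures report mean opinion scores with 95% confidence intervals. The paper does not state how the intervals were computed, so the following is only a sketch of one common choice: the sample mean plus or minus 1.96 standard errors (a normal approximation, which is an assumption here; with few listeners a Student's t quantile would give a wider interval). The function name and the ratings are hypothetical.

```python
import math
import statistics


def mos_with_ci(scores, z=1.96):
    """Return (mean score, half-width of the confidence interval).

    Uses the normal approximation: half-width = z * s / sqrt(n), where s is
    the sample standard deviation. z = 1.96 corresponds to a 95% interval.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    mean = statistics.mean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, half_width


# Hypothetical 5-point ratings collected for one method.
ratings = [4, 3, 5, 4, 4, 3, 4, 5]
mos, half_width = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

The same computation applies to DMOS ratings, since only the question posed to the listeners differs.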
Duration and power correction makes it possible to synthesize highly natural English speech while preserving the speaker individuality of ERJ speech. The correction is particularly effective for speakers with low English proficiency.

Fig. 7 Results calculated in each English proficiency level (prosody correction method). (a) Individuality (DMOS), (b) Naturalness (MOS). (Scores with 95% confidence intervals for "High" and "Low".)

4.3 Effect of Phoneme Correction Based on Unvoiced Consonant Spectrum Replacement

The adaptation data are 60 sentences of set A of the ARCTIC speech database uttered by the "Monolingual" and "Bilingual" speakers of Section 4.2.1, and 60 TIMIT sentences uttered by the four speakers belonging to "High" or "Low" in Section 4.2.2. Speaker individuality is evaluated with a preference test (XAB test) that uses analysis-synthesized ERJ speech of the native Japanese speaker as the reference. Two methods are evaluated: "Dur.+Pow." and "Dur.+Pow.+UVC". Naturalness is evaluated with a preference test (AB test) on the naturalness of the English speech. Three methods are evaluated: "Dur.+Pow.", "Dur.+Pow.+UVC", and "Native". Each evaluation uses an experiment set containing the speech of all native Japanese speakers and is carried out by six native English speakers. The results are calculated for each English proficiency level, with "Monolingual" and "Bilingual" assigned to "Low" and "High", respectively.

Fig. 6 shows example spectrograms for each method. Compared with "Native", the shape of the spectral envelope in "Dur.+Pow." differs considerably in unvoiced consonant segments (such as /s/), especially in the high-frequency region. This causes unnatural sounds when power correction is applied. In "Dur.+Pow.+UVC", by contrast, a spectral envelope shape similar to that of "Native" is obtained, which mitigates the adverse effect of power correction, so an improvement in naturalness can be expected.

Fig. 6 Example of spectrograms of synthetic speech samples for a word fragment "consonants". (Vertical axis: frequency in kHz.)

Fig. 8 shows the subjective evaluation results for the phoneme correction method in terms of individuality and naturalness, aggregated by English proficiency ("High" and "Low"). For "Low", "Dur.+Pow.+UVC" improves naturalness over "Dur.+Pow." while preserving individuality to the same degree. For "High", "Dur.+Pow.+UVC" maintains naturalness and individuality comparable to "Dur.+Pow.". A t-test between "Dur.+Pow." and "Dur.+Pow.+UVC" found a significant difference only for the naturalness of "Low" (p < .01).

Fig. 8 Results calculated in each English proficiency level (phoneme correction method). (a) Individuality, (b) Naturalness. (Preference scores with 95% confidence intervals for "High" and "Low".)

These results show that, like the prosody correction method, the proposed phoneme correction method is effective for improving naturalness, particularly for speakers with low English proficiency.

4.4 Evaluation of Intelligibility

A dictation test on intelligibility is conducted for the proposed method. The evaluation data are 50 SUS [18] sentences, and three methods are evaluated: "HMM+VC", "Dur.+Pow.+UVC", and "Native". Each evaluation uses an experiment set containing the speech of all native Japanese speakers and is carried out by six native English speakers. The results are calculated for each English proficiency level, with "Monolingual" and "Bilingual" assigned to "Low" and "High", respectively.
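The dictation test is scored with a word correct rate and a word accuracy. The paper does not define these measures, so the sketch below assumes the definitions commonly used in speech recognition scoring: with N reference words, S substitutions, D deletions, and I insertions, correct = (N - S - D) / N and accuracy = (N - S - D - I) / N. The error counts come from a minimum-edit-distance alignment; the function names and the example sentences are hypothetical.

```python
def count_errors(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment."""
    n, m = len(reference), len(hypothesis)
    # table[i][j] = (edit distance, substitutions, deletions, insertions)
    table = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = (i, 0, i, 0)   # delete all reference words so far
    for j in range(1, m + 1):
        table[0][j] = (j, 0, 0, j)   # insert all hypothesis words so far
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist, sub, dele, ins = table[i - 1][j - 1]
            if reference[i - 1] != hypothesis[j - 1]:
                dist, sub = dist + 1, sub + 1
            diagonal = (dist, sub, dele, ins)
            d, s, de, inn = table[i - 1][j]
            deletion = (d + 1, s, de + 1, inn)
            d, s, de, inn = table[i][j - 1]
            insertion = (d + 1, s, de, inn + 1)
            table[i][j] = min(diagonal, deletion, insertion, key=lambda c: c[0])
    return table[n][m][1:]


def word_scores(reference, hypothesis):
    """Return (word correct rate, word accuracy) in percent."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    sub, dele, ins = count_errors(ref_words, hyp_words)
    n = len(ref_words)
    correct = 100.0 * (n - sub - dele) / n
    accuracy = 100.0 * (n - sub - dele - ins) / n
    return correct, accuracy


# A made-up semantically unpredictable sentence and a listener's transcription.
ref = "the table walked through the blue truth"
hyp = "the table walk through the the blue truth"
correct, accuracy = word_scores(ref, hyp)
print(f"correct = {correct:.1f}%, accuracy = {accuracy:.1f}%")
```

Because insertions are penalized only in the accuracy, it is never higher than the correct rate, which is why a figure can show the two measures as separate bars.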
Fig. 9 shows the dictation test results aggregated by English proficiency ("High" and "Low"). For "Low", "Dur.+Pow.+UVC" improves intelligibility compared with "HMM+VC". This is attributed to the phoneme correction in "Dur.+Pow.+UVC", which restores the intelligibility of unvoiced consonant phonemes relative to "HMM+VC". For "High", "Dur.+Pow.+UVC" achieves intelligibility comparable to "HMM+VC" and higher than for "Low". Compared with "Native", "Dur.+Pow.+UVC" limits the degradation in word accuracy to 5% for "High" and 8% for "Low".

Fig. 9 Results of dictation test on intelligibility calculated in each English proficiency level. (Recognition rate in %, shown as word correct rate and word accuracy for "High" and "Low".)

5. Conclusion

With the aim of improving naturalness while preserving speaker individuality in ERJ speech synthesis, this paper investigated how the listeners' native language and the speakers' English proficiency affect the prosody correction method based on model adaptation, and proposed a phoneme correction method based on correcting the spectra of unvoiced consonants. Experimental evaluations showed that (1) the naturalness improvement from power correction is pronounced when evaluated by native English speakers, (2) the prosody correction method improves naturalness regardless of English proficiency, and (3) the phoneme correction method is effective for improving naturalness. Future work includes investigating the optimal correction method based on the phonetic errors of each target speaker.

Acknowledgments

Part of this work was supported by the commissioned research project "Research and development of an Asian medical exchange support system based on knowledge and language grid" of the National Institute of Information and Communications Technology, and by JSPS KAKENHI Grant Number 26280060.

References

[1] Takamichi, S., Oshima, Y., Toda, T., Graham, N., Sakriani, S. and Nakamura, S.: A study of English learning support using speech synthesis technology for Japanese-accented English (in Japanese), Japanese Society for Information and Systems in Education, Vol. 29, No. 5, pp. 111–116 (2015).
[2] Toda, T., Black, A. W. and Tokuda, K.: Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory, IEEE Trans. ASLP, Vol. 15, No. 8, pp. 2222–2235 (2007).
[3] Tokuda, K., Nankaku, Y., Toda, T., Zen, H., Yamagishi, J. and Oura, K.: Speech synthesis based on hidden Markov models, Proc. IEEE, Vol. 101, No. 5, pp. 1234–1252 (2013).
[4] Yamagishi, J. and Kobayashi, T.: Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training, IEICE Trans. Inf. and Syst., Vol. 90, No. 2, pp. 533–543 (2007).
[5] Hattori, N., Toda, T., Kawai, H., Saruwatari, H. and Shikano, K.: Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation, Proc. INTERSPEECH, pp. 2769–2772 (2011).
[6] Liang, H., Qian, Y., Soong, F. K. and Liu, G.: A cross-language state mapping approach to bilingual (Mandarin-English) TTS, Proc. ICASSP, pp. 4641–4644 (2008).
[7] Qian, Y., Xu, J. and Soong, F. K.: A frame mapping based HMM approach to cross-lingual voice transformation, Proc. ICASSP, pp. 5120–5123 (2011).
[8] Minematsu, N., Tomiyama, Y., Yoshimoto, K., Shimizu, K., Nakagawa, S., Dantsuji, M. and Makino, S.: Development of English Speech Database Read by Japanese to Support CALL Research, Proc. ICA, Vol. 1, pp. 557–560 (2004).
[9] Oshima, Y., Takamichi, S., Toda, T., Graham, N., Sakriani, S. and Nakamura, S.: Prosody correction preserving speaker individuality in HMM-based ERJ speech synthesis (in Japanese), IEICE Tech. Rep., Vol. 114, No. 365, pp. 63–68 (2014).
[10] Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T. and Kitamura, T.: Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis (in Japanese), IEICE Trans., Vol. J83-D2, No. 11, pp. 2099–2107 (2000).
[11] Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T. and Kitamura, T.: Speech Parameter Generation Algorithms for HMM-based Speech Synthesis, Proc. ICASSP, Vol. 3, pp. 1315–1318 (2000).
[12] Kitamura, T. and Akagi, M.: Speaker Individualities in Speech Spectral Envelopes, Proc. ICSLP, Vol. 3, pp. 1183–1186 (1994).
[13] Kominek, J. and Black, A. W.: CMU ARCTIC databases for speech synthesis, CMU Language Technologies Institute, Technical report, CMU-LTI-03-177 (2003).
[14] Kawahara, H., Masuda-Katsuse, I. and de Cheveigné, A.: Restructuring Speech Representations Using a Pitch-adaptive Time-frequency Smoothing and an Instantaneous-frequency-based F0 Extraction: Possible Role of a Repetitive Structure in Sounds, Speech Commun., Vol. 27, No. 3-4, pp. 187–207 (1999).
[15] Zen, H., Tokuda, K., Masuko, T., Kobayashi, T. and Kitamura, T.: Hidden Semi-Markov Model Based Speech Synthesis System, IEICE Trans. Inf. and Syst., Vol. E90-D, No. 5, pp. 825–834 (2007).
[16] Yamagishi, J., Nose, T., Zen, H., Ling, Z.-H., Toda, T., Tokuda, K., King, S. and Renals, S.: Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis, IEEE Trans. ASLP, Vol. 17, No. 6, pp. 1208–1230 (2009).
[17] Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G. and Pallett, D. S.: DARPA TIMIT acoustic-phonetic continuous speech corpus, Technical report, NISTIR 4930, NIST, Gaithersburg, MD (1993).
[18] Benoît, C., Grice, M. and Hazan, V.: The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using Semantically Unpredictable Sentences, Speech Communication, Vol. 18, No. 4, pp. 381–392 (1996).