Double-Talk Free Spoken Dialogue Interface Combining Sound Field Control with Semi-Blind Source Separation


Shigeki Miyabe†, Tomoya Takatani†, Yoshimitsu Mori†, Hiroshi Saruwatari†, Kiyohiro Shikano†, Yosuke Tatekura‡

†Nara Institute of Science and Technology, {shig…, tomoya-t, yoshim-m, sawatari, shikano}@is.naist.jp
‡Shizuoka University, [email protected]

1-4244-0469-X/06/$20.00 ©2006 IEEE, ICASSP 2006, p. I-809

ABSTRACT

In this paper we introduce a new double-talk-free spoken dialogue interface combining sound field control and a source separation technique based on independent component analysis (ICA). First, sound field control provides silent zones at the microphone elements and prevents the response sound from being observed. In the second step, we propose a novel semi-blind source separation algorithm to suppress the error caused by fluctuation of the room transfer function. By giving the response sound signal to ICA as a direct input, the source separation problem can be converted into a supervised learning problem. Since the problem becomes easier, the proposed method shows higher performance than the method using blind source separation.

1. INTRODUCTION

In human-machine communication based on a spoken dialogue system, it is desirable that a user can input his speech without wearing special equipment. In addition, the system should be ready to receive the user's speech input at any time to free the user from waiting, even at the moment when the system outputs a message to the user by sound (response sound). However, in such a situation, when the user and the system utter simultaneously, the user's speech utterance is observed mixed with the response sound and its speech recognition performance degrades. For a successful realization of the hands-free spoken dialogue system, a mechanism to eliminate the response sound is necessary.

To eliminate the response sound from the system, an acoustic echo canceller (AEC) is commonly used. Many types of AECs have been proposed, e.g., single-channel, stereophonic, wave synthesis, and beamformer-integrated types [1, 2, 3]. However, the AEC has an inherent problem: accurate adaptation is difficult while both the user and the system utter simultaneously (double-talk). Because of this problem, the conventional AEC should adapt its filter coefficients when only the system utters, and detect the double-talk duration and stop adaptation; this implies that the elimination performance is likely to degrade when a change in the room transfer function arises during double-talk.

To solve the problem of the AEC, one of the authors has proposed the Multiple-Output and Multiple-No-Input (MOMNI) method [4], which combines sound field control and beamforming. The MOMNI method controls the sound field around the microphones to be silent and prevents the response sound from being observed. In the second step, the observed signals are applied to delay-and-sum array signal processing to improve the robustness of the elimination of the response sound. The elimination performance of the MOMNI method can be improved by increasing the numbers of loudspeakers and microphone elements.

Though the control of the MOMNI method is highly robust against fluctuation of the transfer function, there is still room for improvement in its performance. By applying an adaptive process to update the filter coefficients of the microphone array, the MOMNI method gains the ability to adapt to fluctuation of the room transfer functions. Among the many types of array signal processing, one of the most powerful candidates is blind source separation (BSS) based on independent component analysis (ICA), because BSS doesn't require double-talk detection. In this paper we extend BSS and propose semi-blind source separation, which is a supervised learning. By giving ICA a direct input of the response sound signal as an answer, we make the other output signals statistically independent of the response sound; in other words, only the response sound is eliminated.

Fig. 1. Configuration of the conventional MOMNI method.

2. CONVENTIONAL MOMNI METHOD

2.1. Algorithm

The configuration of the MOMNI method is shown in Fig. 1. We set $M$ loudspeakers $S_m$ ($m = 1, \ldots, M$) and $K+2$ control points $C_k$ ($k = 1, \ldots, K+2$) to satisfy the condition $M > K+2$. $K$ control points $C_k$ ($k = 1, \ldots, K$) are set on the microphone elements to observe the response sound, and $C_{K+1}$ and $C_{K+2}$ are set on the user's right and left ears. The vector $r(\omega) = [r_R(\omega), r_L(\omega)]^T$, where $\{\cdot\}^T$ denotes transposition and $\omega$ is the angular frequency, is the set of signals intended to be reproduced at the control points $C_{K+1}$ and $C_{K+2}$, and the vector

$$d(\omega) = [d_1(\omega), \ldots, d_K(\omega), d_{K+1}(\omega), d_{K+2}(\omega)]^T \qquad (1)$$

is the set of signals at the control points.

The room transfer functions between the loudspeakers $S_m$ ($m = 1, \ldots, M$) and the control points $C_k$ ($k = 1, \ldots, K+2$) are described by the $(K+2) \times M$ matrix $G(\omega)$ whose entries are the room transfer functions $g_{km}(\omega)$. To reproduce the input signals $r(\omega)$ at the control points, we design an $M \times (K+2)$ inverse filter matrix $H(\omega)$, composed of $h_{mk}(\omega)$ ($m = 1, \ldots, M$, $k = 1, \ldots, K+2$), by calculating the Moore-Penrose generalized inverse matrix of $G(\omega)$. Then we truncate the matrix $H(\omega)$ into $H'(\omega)$, which is an $M \times 2$ filter matrix composed of the filter components $h_{mk'}(\omega)$ ($m = 1, \ldots, M$, $k' = K+1, K+2$) of $H(\omega)$. With this filter matrix, the following equation holds:

$$d(\omega) = G(\omega) H'(\omega) r(\omega) = [\underbrace{0, \ldots, 0}_{K}, r_R(\omega), r_L(\omega)]^T. \qquad (2)$$

Therefore, on one hand, the signals at the user's ears equal the response sound signals ($[d_{K+1}(\omega), d_{K+2}(\omega)] = [r_R(\omega), r_L(\omega)]$), which are thus reproduced exactly. On the other hand, silent zones are realized at the microphone elements ($d_k(\omega) = 0$ for $k = 1, \ldots, K$) and the response sound is prevented from being observed at the microphone elements. Then, delay-and-sum array signal processing is applied to the observed signals.

Since the MOMNI method uses an inverse filter of the room transfer function, three-dimensional sound field reproduction can be presented. To make full use of this property, we generate the response sound signals $(r_R(\omega), r_L(\omega))$ by multiplying the room transfer functions $g_{\mathrm{pri}}(\omega) = [g_{\mathrm{pri}R}(\omega), g_{\mathrm{pri}L}(\omega)]^T$ between a primary sound source and both of the user's ears by a monaural source of the response sound signal $r_{\mathrm{src}}(\omega)$ as

$$[r_R(\omega), r_L(\omega)]^T = g_{\mathrm{pri}}(\omega)\, r_{\mathrm{src}}(\omega). \qquad (3)$$

This mechanism can present the source position of an agent of the dialogue system with high precision.

Fig. 2. Configuration of the simple connection of BSS with the MOMNI method.

2.2. Response Sound Elimination Error When Changing Room Transfer Functions

The MOMNI method can make its control robust against fluctuation of the room transfer functions. Assume that the number of loudspeakers $M$ is sufficiently larger than the number of control points, and that the condition number of the inverse filter matrix approaches 1.
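The inverse-filter design of Sect. 2.1 and the condition-number assumption of Sect. 2.2 can be checked numerically at a single frequency. The following is a minimal sketch, not the authors' implementation: the transfer matrix is a random complex Gaussian stand-in (an assumption, since no measured room responses are given here), $G$ is stored with one row per control point so that Eq. (2) is dimensionally consistent, and the names `design_momni_filter` and `random_transfer_matrix` are hypothetical.

```python
import numpy as np

def design_momni_filter(G, K):
    """Return H' (M x 2): the last two columns of the pseudo-inverse of G.

    G has shape (K + 2, M): rows are control points (K microphones, then
    the right and left ears), columns are loudspeakers.
    """
    H = np.linalg.pinv(G)      # M x (K + 2) Moore-Penrose generalized inverse
    return H[:, K:]            # keep only the columns for the two ear points

def random_transfer_matrix(rng, n_points, n_speakers):
    """Toy stand-in for G at one frequency (i.i.d. complex Gaussian; assumption)."""
    shape = (n_points, n_speakers)
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(0)
K, M = 2, 8                                  # two microphones, eight loudspeakers
G = random_transfer_matrix(rng, K + 2, M)
H_prime = design_momni_filter(G, K)

r = np.array([0.7 + 0.2j, -0.3 + 0.5j])      # [r_R, r_L] at this frequency (arbitrary)
d = G @ H_prime @ r                          # signals at the control points, Eq. (2)

# Condition number of the full inverse filter for a small and a large array.
cond_small = np.linalg.cond(np.linalg.pinv(G))
cond_large = np.linalg.cond(
    np.linalg.pinv(random_transfer_matrix(rng, K + 2, 512)))

print("microphone points:", np.round(np.abs(d[:K]), 12))
print("ear points match r:", np.allclose(d[K:], r))
print("condition numbers:", cond_small, cond_large)
```

In this toy model the first $K$ entries of `d` vanish and the last two reproduce `r`, as Eq. (2) states, and the condition number moves toward 1 as the number of loudspeakers grows. Real room transfer functions are not i.i.d. Gaussian, so the numbers only illustrate the trend.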
Under this assumption, it is proved that the elimination error after fluctuation of the room transfer function is proportional to $1/\sqrt{MK}$ [4]. Therefore, the robustness of the MOMNI method against fluctuation of the room transfer functions is improved by increasing the number of loudspeakers and microphone elements.

3. INTRODUCING INDEPENDENT COMPONENT ANALYSIS TO MOMNI METHOD

In this section we propose an algorithm that applies ICA after the sound field control of the MOMNI method. The conventional MOMNI method adopts delay-and-sum array signal processing with fixed filter coefficients. If some adaptive array signal processing is applied, the MOMNI method can gain robustness not only against fluctuation of the room transfer function but also against environmental noise or another talker. Though most adaptive array signal processing methods require information on the single-talk duration, BSS based on independent component analysis can learn its filter coefficients from the observed signals alone. In this paper we assume that there is no additional noise and discuss only the elimination of the response sound mixed into the observed signal by fluctuation of the room transfer function. However, in case there is some additional noise, the proposed method gains the ability to separate the user's speech from the additional noise by increasing the number of microphone elements and the size of the filter matrix of ICA.

3.1. Simple Connection of BSS with MOMNI Method

The simplest idea is just to connect BSS with the MOMNI method as shown in Fig. 2. We define an $M$-dimensional vector $g_k(\omega)$ ($k = 1, \ldots, K$) composed of the room transfer functions $g_{km}(\omega)$ ($m = 1, \ldots, M$) between the $k$-th microphone element and all the $M$ loudspeakers before fluctuation. Then we define the room transfer function after fluctuation, $g'_k(\omega)$, as

$$g'_k(\omega) = g_k(\omega) + \Delta g_k(\omega), \qquad (4)$$

where $\Delta g_k(\omega)$ is the difference between $g_k(\omega)$ and $g'_k(\omega)$. If the input signals are given by (3), then $g_k(\omega) H'(\omega) = 0$ and the observed signal $x_k(\omega)$ at the $k$-th microphone element is given by

$$x_k(\omega) = g'_k(\omega) H'(\omega) g_{\mathrm{pri}}(\omega) r_{\mathrm{src}}(\omega) + s_k(\omega) = \Delta g_k(\omega) H'(\omega) g_{\mathrm{pri}}(\omega) r_{\mathrm{src}}(\omega) + s_k(\omega), \qquad (5)$$

where $s_k(\omega)$ is the component of the user's utterance observed at the $k$-th microphone element. Equation (5) shows that the number of independent signals included in $x_k(\omega)$ is two, so separation can be achieved by using two observed signals. Therefore, this method uses two microphone elements ($K = 2$) and inputs the observed signals of these microphone elements to frequency-domain ICA (FD-ICA). We define the $2 \times 2$ separation filter matrix $W(\omega)$ as

$$y(\omega) = W(\omega)\, x(\omega) = \begin{bmatrix} w_{11}(\omega) & w_{12}(\omega) \\ w_{21}(\omega) & w_{22}(\omega) \end{bmatrix} x(\omega), \qquad (6)$$

where the two-dimensional column vector $y(\omega) = [y_1(\omega), y_2(\omega)]^T$ describes the output signals. FD-ICA updates its filter $W(\omega)$ to make its output signals statistically independent. The update of the filter coefficients is given by

$$W^{++}(\omega) = W(\omega) - \eta \left\{ I - \left\langle \varphi(y(\omega, t))\, y^H(\omega, t) \right\rangle_t \right\} W(\omega), \qquad (7)$$

where $W^{++}(\omega)$ is the updated filter, $y(\omega, t)$ is $y(\omega)$ observed at time $t$, $\langle \cdot \rangle_t$ is a time-average operator, $\eta$ is a step-size parameter, and $\varphi(\cdot)$ is an activation function.

The activation function $\varphi(\cdot)$ in (7) is, for example, the polar-coordinate function [5] given by

$$\varphi(y(\omega)) = \begin{bmatrix} \tanh(|y_1(\omega)|) \exp(j \arg(y_1(\omega))) \\ \tanh(|y_2(\omega)|) \exp(j \arg(y_2(\omega))) \end{bmatrix}. \qquad (8)$$

Since the gain at each frequency is arbitrary in FD-ICA, its output signals are distorted. To compensate for this, projection back [6] is applied. In this case, the output signals $p(\omega) = [p_1(\omega), p_2(\omega)]^T$ processed by projection back can be written as

$$p(\omega) = \mathrm{diag}\left(W^{-1}(\omega)\right) \odot y(\omega), \qquad (9)$$

where $\mathrm{diag}(\cdot)$ is an operator that makes a vector composed of the diagonal components of its argument and $\odot$ denotes the element-wise product, i.e., $p_i(\omega) = [W^{-1}(\omega)]_{ii}\, y_i(\omega)$.

In the learning of FD-ICA, a null beamformer with some reasonable directivity pattern is often used as an initial filter. In addition, since the filter coefficients of FD-ICA at each frequency are learned separately, permutation ambiguity occurs. To align the permutation, the directivity pattern of the separation filter is utilized [7]. However, as shown in (5), since the observed response sound is multiplied not by the room transfer function but by the difference of the room transfer functions, it is difficult to find reliable directivity patterns. Therefore, we cannot expect this method to perform as well as ordinary BSS.

3.2. Proposed Method: Semi-Blind Source Separation with Observed Signal of a Microphone and Direct Input of Response Sound

Since the response sound signal $r_{\mathrm{src}}(\omega)$ is known to the system, we can use this signal as an input signal of ICA. Therefore, in the proposed method, we use only one microphone element as shown in Fig. 3 and learn the separation filter of (6) in which $x(\omega) = [x_1(\omega), r_{\mathrm{src}}(\omega)]^T$ is substituted. Then, if we try to make the output signal $y_2(\omega)$ include only the component of $r_{\mathrm{src}}(\omega)$, that condition can be satisfied by setting $w_{21}(\omega) = 0$ because

$$y_2(\omega) = w_{21}(\omega) x_1(\omega) + w_{22}(\omega) r_{\mathrm{src}}(\omega) = w_{22}(\omega) r_{\mathrm{src}}(\omega). \qquad (10)$$

Therefore, by setting $w_{21}(\omega) = 0$ as an initial value, the learning can be started from the state where one of the signals is already separated. Since the separation of one signal is finished, this separation problem is now neither blind nor unsupervised. We call it semi-blind source separation. Although the update of (7) changes the value of $w_{21}(\omega)$, the semi-blind condition can be maintained by substituting $w_{21}(\omega) = 0$ in every iteration. Under this constraint that $w_{21}(\omega)$ be zero, $y_1(\omega)$ is updated to be statistically independent of $y_2(\omega) = w_{22}(\omega) r_{\mathrm{src}}(\omega)$, and the independence is satisfied when and only when

$$y_1(\omega) = c(\omega) s_1(\omega), \qquad (11)$$

where $c(\omega)$ is an arbitrary value. Since $y_1(\omega)$ can be given by

$$y_1(\omega) = w_{11}(\omega) x_1(\omega) + w_{12}(\omega) r_{\mathrm{src}}(\omega) = \left( w_{11}(\omega) \Delta g_k(\omega) H'(\omega) g_{\mathrm{pri}}(\omega) + w_{12}(\omega) \right) r_{\mathrm{src}}(\omega) + w_{11}(\omega) s_1(\omega), \qquad (12)$$

the condition (11) yields

$$\left( w_{11}(\omega) \Delta g_k(\omega) H'(\omega) g_{\mathrm{pri}}(\omega) + w_{12}(\omega) \right) r_{\mathrm{src}}(\omega) = 0, \qquad \frac{w_{12}(\omega)}{w_{11}(\omega)} = -\Delta g_k(\omega) H'(\omega) g_{\mathrm{pri}}(\omega). \qquad (13)$$

Therefore, the separation filter is optimum only when $w_{12}(\omega) / w_{11}(\omega)$ identifies the negated transfer function from the input of the inverse filter to the microphone element. In fact, the output signal $p(\omega)$ of the projection back in (9) is given by

$$p(\omega) = \left[ x_1(\omega) + \frac{w_{12}(\omega)}{w_{11}(\omega)} r_{\mathrm{src}}(\omega),\; r_{\mathrm{src}}(\omega) \right]^T, \qquad (14)$$

which agrees with (13). On one hand, BSS aims to make an inverse filter of the transfer system and requires a filter longer than the transfer functions. To obtain good performance with a long filter, FD-ICA requires long input signals. On the other hand, the proposed semi-blind source separation requires only a filter of the same length as that of the transfer system.

In addition, since increasing the number of microphone elements in the MOMNI method lowers the stability of sound field control, removing one microphone element is beneficial to the MOMNI method.

4. SIMULATION

In this section, we present two experiments in which the proposed method is compared with the conventional methods, i.e., an acoustic echo canceller and the MOMNI method, and with the simple connection of BSS to the MOMNI method discussed in Sect. 3.1. To validate the robustness of the proposed method against fluctuation of the room transfer functions, we perform a response sound elimination experiment in which changes in the transfer functions are simulated. Then we evaluate the performance of each method on the basis of a speech recognition experiment to verify the applicability of the proposed method to a spoken dialogue system.

4.1. Experimental Conditions

Figure 4 shows the arrangement of the apparatuses. We placed a dummy head, which has an average human head and an upper body, at the user's position. We designed the filters used in the MOMNI method and the proposed method with the room transfer functions before fluctuation. We gave the AEC the room transfer functions before fluctuation as its filter coefficients, assuming that its adaptation was performed accurately without errors before the fluctuation of the transfer functions. However, after the fluctuation, the adaptation could not be performed due to double-talk. We evaluated the performances with the average over 12 kinds of impulse responses caused by movements of a mannequin. The interelement spacing was 30 cm with the conventional MOMNI method, and 6 cm with the simple connection of BSS. The sampling frequency was 16 kHz. In the learning of ICA, we used the first 5 seconds of the input signals. The length of the separation filters is 2048 taps.

Fig. 4. Layout of acoustic environment room.

4.2. Evaluation of Response Sound Elimination

We evaluated the signal-to-noise ratios of the observed signal ($\mathrm{SNR}_{\mathrm{obs}}$) and of the final output signal ($\mathrm{SNR}_{\mathrm{out}}$) of the system in Fig. 5. These SNRs are just the power ratios of the user's speech and the response sound. Therefore, distortion of the spectrum doesn't influence these scores. When two microphone elements are used, we evaluated their average. Regarding $\mathrm{SNR}_{\mathrm{obs}}$, the result of one microphone element shows higher performance than that of two microphone elements. However, by the effect of delay-and-sum array signal processing, $\mathrm{SNR}_{\mathrm{out}}$ of two elements is recovered to the same level as that of one element. This reveals that the condition of eight loudspeakers and two microphones is a hard condition for stable control of the MOMNI method, and its performance doesn't agree with the law described in Sect. 2.2 that the error is proportional to $1/\sqrt{MK}$. In the simple combination of BSS and the MOMNI method, BSS cannot improve $\mathrm{SNR}_{\mathrm{out}}$ from its input because of its poor initial filter and the difficulty of solving the permutation. However, the proposed method improves $\mathrm{SNR}_{\mathrm{out}}$ considerably. This shows the efficacy of semi-blind source separation.

Fig. 5. Comparison of $\mathrm{SNR}_{\mathrm{obs}}$ and $\mathrm{SNR}_{\mathrm{out}}$ of (a) acoustic echo canceller, (b) MOMNI with 1 microphone element, (c) MOMNI with delay-and-sum with 2 microphone elements, (d) simple connection of ICA and MOMNI with two microphone elements, and (e) proposed method.

4.3. Speech Recognition Experiment

The effect of the response sound elimination is evaluated using a large vocabulary continuous speech recognition task. To evaluate the speech recognition performance, we adopt word accuracy (WA) as an evaluation score [8]. Table 1 lists the experimental conditions for the speech recognition.

Table 1. Experimental conditions for speech recognition.

Task              Newspaper dictation from JNAS [8]
Feature vector    12 MFCCs, 12 ΔMFCCs, Δpower
Language model    Newspaper dictation with 20,000 words
Phoneme model     Phonetic Tied Mixture (PTM) [8]
Decoder           Julius ver. 3.4.2 standard [8]
User's speech     200 sentences (23 males and 23 females)
Response sound    female utterance

Figure 6 shows the WAs with all the combinations. All the scores in the graph are almost proportional to those of the SNRs except for the simple connection of BSS and the MOMNI method. Because of the permutation discussed in Sect. 3.1, the simple connection has large distortion and its performance is worse than that of the MOMNI method. The proposed method is not so much affected by permutation and shows the highest performance.

Fig. 6. Comparison of WAs of (a) acoustic echo canceller, (b) MOMNI with 1 microphone element, (c) MOMNI with delay-and-sum with 2 microphone elements, (d) simple connection of ICA and MOMNI with two microphone elements, and (e) proposed method.

5. CONCLUSION

We proposed a semi-blind source separation algorithm and applied it to the spoken dialogue interface using sound field control. As the results of the experiment, the robustness of sound elimination and the performance of speech recognition improved with the proposed method. From these findings, the efficacy of the proposed method is ascertained.

6. REFERENCES

[1] E. Hänsler, "Acoustic echo and noise control: where do we come from - where do we go?," in Proc. 7th International Workshop on Acoustic Echo and Noise Control, pp. 1-4, September 2001.

[2] S. Makino and S. Shimauchi, "Stereophonic acoustic echo cancellation - an overview and recent solutions," in Proc. The 1999 IEEE Workshop on Acoustic Echo and Noise Control, pp. 12-19, September 1999.

[3] W. Herbordt, J. Ying, H. Buchner, and W. Kellermann, "A real-time acoustic human-machine front-end for multimedia applications integrating robust adaptive beamforming and stereophonic acoustic echo cancellation," in Proc. 7th International Conf. on Spoken Language Processing, vol. 2, pp. 773-776, September 2002.

[4] Y. Hinamoto, K. Mino, H. Saruwatari, and K. Shikano, "Interface for barge-in free spoken dialogue system based on sound field control and microphone array," in Proc. 2003 IEEE International Conf. on Acoustics, Speech, and Signal Processing, vol. 5, pp. 505-508, April 2003.

[5] H. Sawada, R. Mukai, S. Araki, and S. Makino, "Polar coordinate based nonlinear function for frequency domain blind source separation," IEICE Trans. Fundamentals, vol. E86-A, no. 3, pp. 590-596, March 2003.

[6] N. Murata and S. Ikeda, "An on-line algorithm for blind source separation on speech signals," in Proc. 1998 International Symposium on Nonlinear Theory and its Applications, vol. 3, pp. 923-926, September 1998.

[7] S. Kurita, H. Saruwatari, S. Kajita, K. Takeda, and F. Itakura, "Evaluation of blind signal separation method using directivity pattern under reverberant conditions," in Proc. 2000 IEEE International Conf. on Acoustics, Speech, and Signal Processing, vol. 5, pp. 3140-3143, June 2000.

[8] A. Lee, T. Kawahara, and K. Shikano, "Julius - an open source real-time large vocabulary recognition engine," in Proc. 7th European Conf. on Speech Communication and Technology, vol. 3, pp. 1691-1694, September 2001.
