Blind Source Separation Based on Fast-Convergence Algorithm Using ICA and Beamforming for Real Convolutive Mixture


Hiroshi SARUWATARI, Toshiya KAWAMURA, Katsuyuki SAWAI, Atsunobu KAMINUMA, and Masao SAKATA

Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma-shi, Nara, 630-0101, JAPAN
Nissan Research Center, NISSAN MOTOR CO., LTD., 1 Natsushima-cho, Yokosuka-shi, Kanagawa 237-8523, JAPAN

ABSTRACT

We propose a new algorithm for blind source separation (BSS), in which independent component analysis (ICA) and beamforming are combined to resolve the low-convergence problem through optimization in ICA. The proposed method consists of the following three parts: (1) frequency-domain ICA with direction-of-arrival (DOA) estimation, (2) null beamforming based on the estimated DOA, and (3) integration of (1) and (2) based on the algorithm diversity in both iteration and frequency domain. The inverse of the mixing matrix obtained by ICA is temporally substituted by the matrix based on null beamforming through iterative optimization, and the temporal alternation between ICA and beamforming can realize fast- and high-convergence optimization. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, even under reverberant conditions.

1. INTRODUCTION

Blind source separation (BSS) is the approach taken to estimate original source signals using only the information of the mixed signals observed in each input channel. This technique is applicable to the realization of noise-robust speech recognition and high-quality hands-free telecommunication systems.
In recent work on BSS based on independent component analysis (ICA) [1], several methods have been proposed in which the inverses of the complex mixing matrices are calculated in the frequency domain to deal with the arrival lags among the elements of the microphone array system [2, 3, 4]. However, this ICA-based approach has the disadvantage that the nonlinear optimization suffers from low convergence [5].

In this paper, we describe a new algorithm for BSS in which ICA and beamforming are combined. The proposed method consists of the following three parts: (1) frequency-domain ICA with estimation of the direction of arrival (DOA) of the sound source, (2) null beamforming based on the estimated DOA, and (3) integration of (1) and (2) based on the algorithm diversity in both iteration and frequency domain. The temporal utilization of null beamforming through ICA iterations can realize fast- and high-convergence optimization. The following sections describe the proposed method in detail, and it is shown that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method. Also, the experiment in a real car environment shows that the separation performances of the proposed method are remarkably superior to those of the conventional DS array.

0-7803-7402-9/02/$17.00 ©2002 IEEE

Fig. 1. Configuration of a microphone array and signals.

2. DATA MODEL AND CONVENTIONAL BSS METHOD

In this study, a straight-line array is assumed. The coordinates of the elements are designated as $d_k$ ($k = 1, \ldots, K$), and the directions of arrival of multiple sound sources are designated as $\theta_l$ ($l = 1, \ldots, L$) (see Fig. 1), where we deal with the case of $K = L = 2$.
In the frequency domain, the observed signals in which multiple source signals are mixed are given by $X(f) = A(f)S(f)$, where $X(f) = [X_1(f), \ldots, X_K(f)]^T$ is the observed signal vector and $S(f) = [S_1(f), \ldots, S_L(f)]^T$ is the source signal vector. $A(f)$ is the mixing matrix, which is assumed to be complex valued because we introduce a model to deal with the arrival lags among the elements of the microphone array and room reverberations.

In the frequency-domain ICA, first, the short-time analysis of observed signals is conducted by frame-by-frame discrete Fourier transform (DFT). By plotting the spectral values in a frequency bin of each microphone input frame by frame, we consider them as a time series. Hereafter, we designate the time series as $X(f,t) = [X_1(f,t), \ldots, X_K(f,t)]^T$. Next, we perform signal separation using the complex-valued inverse of the mixing matrix, $W(f)$, so that the $L$ time-series outputs $Y(f,t) = [Y_1(f,t), \ldots, Y_L(f,t)]^T = W(f)X(f,t)$ become mutually independent. We perform this procedure with respect to all frequency bins. Finally, by applying the inverse DFT and the overlap-add technique to the separated time series $Y(f,t)$, we reconstruct the resultant source signals in the time domain.

In the conventional ICA-based BSS method, the optimal $W(f)$ is obtained by the following iterative equation [2]:

$$W_{i+1}(f) = \eta \left[ \mathrm{diag}\left( \langle \boldsymbol{\Phi}(Y(f,t)) Y^{H}(f,t) \rangle_t \right) - \langle \boldsymbol{\Phi}(Y(f,t)) Y^{H}(f,t) \rangle_t \right] W_i(f) + W_i(f), \tag{1}$$

where $\langle \cdot \rangle_t$ denotes the time-averaging operator, $i$ is used to express the value of the $i$th step in the iterations, and $\eta$ is the step-size parameter. Also, we define the nonlinear vector function $\boldsymbol{\Phi}(\cdot)$ as

$$\boldsymbol{\Phi}(Y(f,t)) \equiv [\Phi(Y_1(f,t)), \ldots, \Phi(Y_L(f,t))]^T, \tag{2}$$

$$\Phi(Y_l(f,t)) \equiv \left[1 + \exp\left(-Y_l^{(R)}(f,t)\right)\right]^{-1} + j \left[1 + \exp\left(-Y_l^{(I)}(f,t)\right)\right]^{-1}, \tag{3}$$

where $Y_l^{(R)}(f,t)$ and $Y_l^{(I)}(f,t)$ are the real and imaginary parts of $Y_l(f,t)$, respectively.

3. PROPOSED ALGORITHM

The conventional ICA method inherently has a significant disadvantage which is due to low convergence through nonlinear optimization in ICA. In order to resolve the problem, we propose an algorithm based on the temporal alternation of learning between ICA and beamforming; the inverse of the mixing matrix obtained through ICA is temporally substituted by the matrix based on null beamforming for a temporal initialization or acceleration of the iterative optimization. The proposed algorithm is conducted by the following steps with respect to all frequency bins in parallel (see Fig. 2).

Fig. 2. Proposed algorithm combining frequency-domain ICA and beamforming.

[Step 1: Initialization] Set the initial $W_i(f)$, i.e., $W_0(f)$, to an arbitrary value, where the subscript $i$ is set to be 0.

[Step 2: 1-time ICA iteration] Optimize $W_i(f)$ using the following 1-time ICA iteration:

$$W_{i+1}^{(\mathrm{ICA})}(f) = \eta \left[ \mathrm{diag}\left( \langle \boldsymbol{\Phi}(Y(f,t)) Y^{H}(f,t) \rangle_t \right) - \langle \boldsymbol{\Phi}(Y(f,t)) Y^{H}(f,t) \rangle_t \right] W_i(f) + W_i(f), \tag{4}$$

where the superscript "(ICA)" is used to express that the inverse of the mixing matrix is obtained by ICA.

[Step 3: DOA estimation] Estimate DOAs of the sound sources by utilizing the directivity pattern of the array system, $F_l(f,\theta)$, which is given by

$$F_l(f,\theta) = \sum_{k=1}^{K} W_{lk}^{(\mathrm{ICA})}(f) \exp\left[j 2\pi f d_k \sin\theta / c\right], \tag{5}$$

where $W_{lk}^{(\mathrm{ICA})}(f)$ is the element of $W_{i+1}^{(\mathrm{ICA})}(f)$. In the directivity patterns, directional nulls exist in only two particular directions. Accordingly, by obtaining statistics with respect to the directions of nulls at all frequency bins, we can estimate the DOAs of the sound sources. The DOA of the $l$th sound source, $\hat{\theta}_l$, can be estimated as $\hat{\theta}_l = \frac{2}{N} \sum_{m=1}^{N/2} \theta_l(f_m)$, where $N$ is the total number of DFT points, and $\theta_l(f_m)$ represents the DOA of the $l$th sound source at the $m$th frequency bin. These are given by

$$\theta_1(f_m) = \min\left[ \arg\min_{\theta} |F_1(f_m,\theta)|, \ \arg\min_{\theta} |F_2(f_m,\theta)| \right], \tag{6}$$

$$\theta_2(f_m) = \max\left[ \arg\min_{\theta} |F_1(f_m,\theta)|, \ \arg\min_{\theta} |F_2(f_m,\theta)| \right], \tag{7}$$

where $\min[x, y]$ ($\max[x, y]$) is defined as a function that returns the smaller (larger) value of $x$ and $y$.

[Step 4: Beamforming] Construct an alternative matrix for signal separation, $W^{(\mathrm{BF})}(f)$, based on the null-beamforming technique, where the DOA results obtained in the previous step are used. In the case that the look direction is $\hat{\theta}_1$ and the directional null is steered to $\hat{\theta}_2$, the elements of the matrix for signal separation are given as

$$W_{11}^{(\mathrm{BF})}(f_m) = -\exp\left[-j 2\pi f_m d_1 \sin\hat{\theta}_2 / c\right] \times \left\{ -\exp\left[j 2\pi f_m d_1 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c\right] + \exp\left[j 2\pi f_m d_2 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c\right] \right\}^{-1}, \tag{8}$$

$$W_{12}^{(\mathrm{BF})}(f_m) = \exp\left[-j 2\pi f_m d_2 \sin\hat{\theta}_2 / c\right] \times \left\{ -\exp\left[j 2\pi f_m d_1 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c\right] + \exp\left[j 2\pi f_m d_2 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c\right] \right\}^{-1}. \tag{9}$$

Also, in the case that the look direction is $\hat{\theta}_2$ and the directional null is steered to $\hat{\theta}_1$, the elements of the matrix are given as

$$W_{21}^{(\mathrm{BF})}(f_m) = \exp\left[-j 2\pi f_m d_1 \sin\hat{\theta}_1 / c\right] \times \left\{ \exp\left[j 2\pi f_m d_1 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c\right] - \exp\left[j 2\pi f_m d_2 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c\right] \right\}^{-1}, \tag{10}$$

$$W_{22}^{(\mathrm{BF})}(f_m) = -\exp\left[-j 2\pi f_m d_2 \sin\hat{\theta}_1 / c\right] \times \left\{ \exp\left[j 2\pi f_m d_1 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c\right] - \exp\left[j 2\pi f_m d_2 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c\right] \right\}^{-1}. \tag{11}$$

With the directivity pattern of Eq. (5), each row defined in this way has unit gain in its look direction and zero gain in its null direction.

[Step 5: Diversity with cost function] Select the most suitable unmixing matrix in each frequency bin and at each iteration point, i.e., algorithm diversity in both iteration and frequency domain. As a cost function used to achieve the diversity, we calculate two kinds of cosine distances between the separated signals which are obtained by ICA and beamforming. These are given by

$$J^{(\mathrm{ICA})}(f) = \frac{\left| \langle Y_1^{(\mathrm{ICA})}(f,t) \, Y_2^{(\mathrm{ICA})}(f,t)^{*} \rangle_t \right|}{\left[ \langle |Y_1^{(\mathrm{ICA})}(f,t)|^2 \rangle_t \, \langle |Y_2^{(\mathrm{ICA})}(f,t)|^2 \rangle_t \right]^{1/2}}, \tag{12}$$

$$J^{(\mathrm{BF})}(f) = \frac{\left| \langle Y_1^{(\mathrm{BF})}(f,t) \, Y_2^{(\mathrm{BF})}(f,t)^{*} \rangle_t \right|}{\left[ \langle |Y_1^{(\mathrm{BF})}(f,t)|^2 \rangle_t \, \langle |Y_2^{(\mathrm{BF})}(f,t)|^2 \rangle_t \right]^{1/2}}, \tag{13}$$

where $Y_l^{(\mathrm{ICA})}(f,t)$ is the separated signal by ICA, and $Y_l^{(\mathrm{BF})}(f,t)$ is the separated signal by beamforming. If the separation performance of beamforming is superior to that of ICA, we obtain the condition $J^{(\mathrm{ICA})}(f) > J^{(\mathrm{BF})}(f)$; otherwise $J^{(\mathrm{ICA})}(f) \le J^{(\mathrm{BF})}(f)$. Thus, an observation of the conditions yields the following algorithm:

$$W(f) = \begin{cases} W_{i+1}^{(\mathrm{ICA})}(f), & J^{(\mathrm{ICA})}(f) \le J^{(\mathrm{BF})}(f), \\ W^{(\mathrm{BF})}(f), & J^{(\mathrm{ICA})}(f) > J^{(\mathrm{BF})}(f). \end{cases} \tag{14}$$

If the $(i+1)$th iteration was the final iteration, go to step 6; otherwise go back to step 2 and repeat the ICA iteration, inserting the $W(f)$ given by Eq. (14) into $W_i(f)$ in Eq. (4) with an increment of $i$.

[Step 6: Ordering and scaling] Using the DOA information obtained in step 3, we detect and correct the source permutation and the gain inconsistency [6].

4. EXPERIMENTS IN REVERBERANT ROOM

4.1. Conditions for experiments

A two-element array with the interelement spacing of 4 cm is assumed. The speech signals are assumed to arrive from two directions, −30° and 40°. Two kinds of sentences, those spoken by two male and two female speakers selected from the ASJ continuous speech corpus for research, are used as the original speech samples. Using these sentences, we obtain 12 combinations with respect to speakers and source directions. In these experiments, we use the following signals as the source signals: the original speech convolved with the impulse responses specified by different reverberation times (RTs) of 150 msec and 300 msec. The impulse responses are recorded in a variable reverberation time room as shown in Fig. 3.
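Steps 2 to 5 of Sect. 3 can be sketched for a single frequency bin and two microphones as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`phi`, `separate`, `ica_step`, `steer`, `directivity`, `null_direction`, `null_beamformer`, `cosine_distance`, `select_unmixing`) are hypothetical, the sound velocity of 343 m/s and the 1-degree search grid for the nulls are assumed values that the text does not give, and the averaging of the DOAs over all frequency bins in Step 3 is omitted. Only the Python standard library is used, so 2x2 matrices are plain nested lists.

```python
import cmath
import math

C = 343.0  # assumed speed of sound in m/s (the text only names the symbol c)


def phi(y):
    """Eq. (3): sigmoid applied separately to the real and imaginary parts."""
    return complex(1 / (1 + math.exp(-y.real)), 1 / (1 + math.exp(-y.imag)))


def separate(W, frames):
    """Y(f, t) = W(f) X(f, t); W is 2x2 and each frame is a pair (X1, X2)."""
    return [(W[0][0] * x1 + W[0][1] * x2, W[1][0] * x1 + W[1][1] * x2)
            for x1, x2 in frames]


def ica_step(W, frames, eta):
    """Step 2, Eq. (4): one ICA iteration in a single frequency bin."""
    Y = separate(W, frames)
    # R[a][b] = <phi(Y_a) conj(Y_b)>_t, i.e. the time average of Phi(Y) Y^H
    R = [[sum(phi(y[a]) * y[b].conjugate() for y in Y) / len(Y)
          for b in range(2)] for a in range(2)]
    # diag(R) - R: zero diagonal, negated off-diagonal entries
    D = [[0.0 if a == b else -R[a][b] for b in range(2)] for a in range(2)]
    return [[W[a][b] + eta * (D[a][0] * W[0][b] + D[a][1] * W[1][b])
             for b in range(2)] for a in range(2)]


def steer(f, d, theta):
    """Phase term exp[j 2 pi f d sin(theta) / c] of Eq. (5)."""
    return cmath.exp(2j * math.pi * f * d * math.sin(theta) / C)


def directivity(row, f, d1, d2, theta):
    """Eq. (5) for one row (W_l1, W_l2) of an unmixing matrix."""
    return row[0] * steer(f, d1, theta) + row[1] * steer(f, d2, theta)


def null_direction(row, f, d1, d2):
    """Step 3: angle in radians of the deepest null on a 1-degree grid."""
    grid = [math.radians(a) for a in range(-90, 91)]
    return min(grid, key=lambda th: abs(directivity(row, f, d1, d2, th)))


def null_beamformer(f, d1, d2, th1, th2):
    """Step 4, Eqs. (8)-(11): row l looks at source l and nulls the other."""
    def row(look, null):
        k = 2j * math.pi * f / C
        ds = math.sin(look) - math.sin(null)
        den = -cmath.exp(k * d1 * ds) + cmath.exp(k * d2 * ds)
        return [-cmath.exp(-k * d1 * math.sin(null)) / den,
                cmath.exp(-k * d2 * math.sin(null)) / den]
    return [row(th1, th2), row(th2, th1)]


def cosine_distance(Y):
    """Eqs. (12)-(13): normalized correlation between the two outputs."""
    num = abs(sum(y1 * y2.conjugate() for y1, y2 in Y))
    den = math.sqrt(sum(abs(y1) ** 2 for y1, _ in Y)
                    * sum(abs(y2) ** 2 for _, y2 in Y))
    return num / den


def select_unmixing(W_ica, W_bf, frames):
    """Step 5, Eq. (14): keep ICA unless beamforming decorrelates better."""
    j_ica = cosine_distance(separate(W_ica, frames))
    j_bf = cosine_distance(separate(W_bf, frames))
    return W_ica if j_ica <= j_bf else W_bf


# Geometry of Sect. 4.1 (4 cm spacing, sources at -30 and 40 degrees);
# the 1 kHz bin and the coordinate choice d1 = 0 are illustrative assumptions.
W_bf = null_beamformer(1000.0, 0.0, 0.04, math.radians(-30), math.radians(40))
```

In a full implementation these functions would be called once per frequency bin and per iteration, with the matrix returned by `select_unmixing` fed back into `ica_step` as $W_i(f)$, following the loop described at the end of Step 5.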
The analytical conditions of these experiments are as follows: the sampling frequency is 8 kHz, the frame length is 128 msec, the frame shift is 2 msec, and the step-size parameter $\eta$ is set to be $1.0 \times 10^{-5}$.

Fig. 3. Layout of reverberant room used in experiments.

4.2. Objective evaluation of separated signals

In order to compare the performance of the proposed algorithm with that of the conventional BSS described in Sect. 2 for different iteration points in ICA, the noise reduction rate (NRR), defined as the output signal-to-noise ratio (SNR) in dB minus the input SNR in dB, is shown in Fig. 4. These values are averages over all of the combinations with respect to speakers and source directions. As for the proposed algorithm, we also plot the NRR rescaled by the computational cost (see dotted lines) because the proposed algorithm has a computational complexity of about 1.9-fold compared with the conventional ICA.

Fig. 4. Noise reduction rates for different iteration in ICA. Reverberation time is 150 msec (top) and 300 msec (bottom). Horizontal axis: number of iterations. Curves: conventional ICA, proposed method, and proposed method rescaled by computational cost.

In Fig. 4, it is evident that the separation performances of the proposed algorithm are superior to those of the conventional ICA-based BSS method at every iteration point, even considering the additional computational cost of the proposed algorithm. For example, compared with the conventional method, the proposed method improves the NRR by about 4.6 dB at the 50-iteration point in the conventional ICA when the RT is 150 msec. Also, when the RT is 300 msec, the proposed method improves the NRR by about 1.5 dB.

Figure 5 shows a result of alternation between ICA and null beamforming through iterative optimization by the proposed algorithm when the RT is 300 msec. In this figure, the symbol "•" represents that the null beamforming is used in the iteration point and frequency bin. As shown in Fig. 5, the proposed algorithm can work automatically as follows: (1) null beamforming is used for the acceleration of learning at early times in the iterations because $W^{(\mathrm{BF})}(f)$ is a rough approximation of the inverse of the mixing matrix $A(f)$, (2) ICA is used after the early part of the iterations because ICA can update the inverse of the mixing matrix more accurately, and (3) the inverse of the mixing matrix obtained by ICA is substituted by the matrix based on null beamforming through whole iteration points at particular frequency bins where the independence between the sources is low.

Fig. 5. The result of alternation between ICA and null beamforming through iterative optimization by the proposed algorithm. The symbol "•" represents that the null beamforming is used in the iteration point and frequency bin. The RT is 300 msec. Horizontal axis: number of iterations.

From these results, although null beamforming is not suitable for signal separation under the condition that the direct sounds and their reflections exist, we can confirm that the temporal utilization of null beamforming for algorithm diversity through ICA iterations is effective for improving the separation performance and convergence.

5. EXPERIMENTS IN CAR ENVIRONMENT

A two-element array with the interelement spacing of 4 cm is assumed. The speech signals are assumed to arrive from two directions, 5° for the driver and 50° for the speaker in the assistant seat. The impulse responses are recorded in a real car environment as shown in Fig. 6, where we use 3 kinds of array position. The analytical conditions in this experiment are the same as those of the previous section, except for the sampling frequency (which is 16 kHz).

Fig. 6. Layout of array in car cabin used in experiment.

Figure 7 shows NRR results of the proposed method, where we also plot the results of the conventional 16-element delay-and-sum (DS) array for comparison (a priori information on DOAs was given in the DS array). From this figure, it is evident that the separation performances of the proposed method are remarkably superior to those of the conventional DS array at every array position. This indicates that the BSS is effective for speech enhancement in the car environment.

6. CONCLUSION

In this paper, we described a fast- and high-convergence algorithm for BSS where null beamforming is used for temporal algorithm diversity through ICA iterations. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, and the utilization of null beamforming in ICA is effective for improving the separation performance and convergence, even under reverberant conditions.
Also, the experiment in a real car environment shows that the separation performances of the proposed method are remarkably superior to those of the conventional DS array.

Fig. 7. Noise reduction rates for different array position.

7. ACKNOWLEDGEMENT

The authors are grateful to Dr. Shoji Makino, Mr. Ryo Mukai of NTT CO., LTD., and Mr. Masaru Yamazaki of NISSAN MOTOR CO., LTD. for their discussions on this work. This work was partly supported by NISSAN MOTOR CO., LTD. and CREST (Core Research for Evolutional Science and Technology) in Japan.

8. REFERENCES

[1] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol.36, pp.287-314, 1994.
[2] N. Murata and S. Ikeda, "An on-line algorithm for blind source separation on speech signals," Proceedings of 1998 International Symposium on Nonlinear Theory and Its Application (NOLTA '98), vol.3, pp.923-926, Sep. 1998.
[3] P. Smaragdis, "Blind separation of convolved mixtures in the frequency domain," Neurocomputing, vol.22, pp.21-34, 1998.
[4] L. Parra and C. Spence, "Convolutive blind separation of non-stationary sources," IEEE Trans. Speech & Audio Process., vol.8, pp.320-327, 2000.
[5] H. Saruwatari, S. Kurita, K. Takeda, F. Itakura, and K. Shikano, "Blind source separation based on subband ICA and beamforming," Proc. ICSLP2000, vol.3, pp.94-97, Oct. 2000.
[6] S. Kurita, H. Saruwatari, S. Kajita, K. Takeda, and F. Itakura, "Evaluation of blind signal separation method using directivity pattern under reverberant conditions," Proc. ICASSP2000, vol.5, pp.3140-3143, June 2000.
