Permutation-Robust Structure for ICA-Based Blind Source Extraction


Yu Takahashi, Tomoya Takatani, Hiroshi Saruwatari, Kiyohiro Shikano
Nara Institute of Science and Technology, Ikoma, Nara 630-0192 JAPAN (e-mail: [email protected])

ABSTRACT

In this paper, we investigate a new blind source separation (BSS) structure from a permutation-robustness viewpoint. The goal is to mitigate the permutation problem that commonly arises in frequency-domain independent component analysis (ICA). Permutation robustness describes how little a BSS method is affected when permutations arise with a certain probability. This differs from conventional approaches, which try to solve the permutation itself. We analyze our previously proposed BSS architecture, the so-called blind spatial subtraction array (BSSA). In BSSA, source extraction is achieved by subtracting the power spectrum of the noise estimated via ICA from the power spectrum of the signal partly enhanced by a delay-and-sum (DS) procedure. BSSA does partly suffer from the permutation problem in its ICA-based noise estimator. However, it efficiently reduces the adverse effect of permutation owing to the over-subtraction in spectral subtraction and the defocusing property of DS. Experiments using artificial and real-recording-based simulations reveal that the proposed method outperforms the conventional ICA.

Index Terms: Speech enhancement, acoustic signal processing, acoustic arrays

This work was partly supported by the MEXT e-Society leading project.

1-4244-0728-1/07/\$20.00 ©2007 IEEE. ICASSP 2007.

1. INTRODUCTION

Blind source separation (BSS) is an approach to estimating the original sources using only the observed signals [1]. Various BSS methods based on independent component analysis (ICA) have recently been presented for acoustic-sound separation [2, 3]. In particular, frequency-domain ICA (FDICA) [2] is the most popular approach to the convolutive BSS problem. In FDICA, however, a source permutation ambiguity arises in each frequency bin and heavily degrades the resultant quality. It is therefore indispensable to align the permutation so that each separated signal contains frequency components from the same source. Various permutation solvers have been proposed [2, 3, 4], e.g., the direction-of-arrival (DOA) based method, but the permutation problem cannot be solved completely. In addition, increasing the permutation-solving accuracy requires higher computational costs.

To mitigate these problems, we investigate a new BSS structure from a "permutation-robustness" viewpoint, unlike the conventional permutation-solving approaches. Permutation robustness describes how little a BSS method is affected when permutations arise with a certain probability. This important property has not been studied in previous ICA research. Improving permutation robustness at a small computational cost is a novel and efficient way to increase BSS quality. How can we construct a permutation-robust BSS? The answer lies in our previously proposed BSS architecture, the so-called blind spatial subtraction array (BSSA) [5]. In BSSA, source extraction is achieved by subtracting the power spectrum of the noise estimated via ICA from the power spectrum of the signal partly enhanced by a delay-and-sum (DS) procedure. BSSA does partly suffer from the permutation problem in its ICA-based noise estimator. However, it can efficiently suppress the adverse effect of permutation owing to the over-subtraction in spectral subtraction and the defocusing property of DS. The efficacy of the proposed method is demonstrated by artificial and real-recording-based simulations.

2. BLIND SPATIAL SUBTRACTION ARRAY [5]

2.1. Overview of BSSA

BSSA consists of a primary path based on a delay-and-sum (DS) array and a reference path for ICA-based noise estimation (see Fig. 1). The noise component estimated by ICA is subtracted from the primary-path output in the power-spectrum domain, without phase information. The detailed signal processing is described below.

2.2. Partial speech enhancement in primary path

First, short-time analysis of the observed signals is conducted by a frame-by-frame discrete Fourier transform (DFT). The observed signal of the $J$-channel array is given by

$$X(f,\tau) = [X_1(f,\tau),\ldots,X_J(f,\tau)]^T = A(f)\{S(f,\tau) + N(f,\tau)\}, \tag{1}$$

where

$$S(f,\tau) = [\underbrace{0,\ldots,0}_{U-1},\,S_U(f,\tau),\,\underbrace{0,\ldots,0}_{K-U}]^T, \tag{2}$$

$$N(f,\tau) = [N_1(f,\tau),\ldots,N_{U-1}(f,\tau),\,0,\,N_{U+1}(f,\tau),\ldots,N_K(f,\tau)]^T. \tag{3}$$

Here $f$ is the frequency bin, $\tau$ is the frame number, and $A(f)$ is a $J \times K$ mixing matrix. $S(f,\tau)$ is the target speech signal vector, $N(f,\tau)$ is the noise signal vector, $U$ is the index of the target speech, and $K$ is the number of sound sources. In the primary path, the target speech signal is partly enhanced in advance by DS:

$$Y(f,\tau) = W_{\mathrm{DS}}^T(f)X(f,\tau) = W_{\mathrm{DS}}^T(f)A(f)S(f,\tau) + W_{\mathrm{DS}}^T(f)A(f)N(f,\tau), \tag{4}$$

$$W_{\mathrm{DS}}(f) = [W_1^{(\mathrm{DS})}(f),\ldots,W_J^{(\mathrm{DS})}(f)]^T, \tag{5}$$

$$W_j^{(\mathrm{DS})}(f) = \frac{1}{J}\exp\!\left(-i2\pi\frac{f}{M}f_s d_j\sin\theta_U / c\right). \tag{6}$$

In these equations:

- $Y(f,\tau)$ is the primary-path output, in which the target speech is slightly enhanced.
- $W_{\mathrm{DS}}(f)$ is the filter coefficient vector of DS.
- $M$ is the DFT size and $f_s$ is the sampling frequency.
- $d_j$ is the position of the $j$-th microphone.
- $c$ is the sound velocity.
- $\theta_U$ is the estimated DOA of the target speech, which is given by the ICA part in Sect. 2.3.

In Eq. (4), the second term on the right-hand side is the noise remaining in the output of the primary path.

2.3. ICA-based noise estimation in reference path

The proposed BSSA includes ICA-based noise estimation in the reference path. In the ICA part, we perform signal separation using a complex-valued unmixing matrix $W_{\mathrm{ICA}}(f)$ so that the output signals $O(f,\tau) = [O_1(f,\tau),\ldots,O_K(f,\tau)]^T$ become mutually independent.
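The separation step just described adapts one unmixing matrix per frequency bin until its outputs are mutually independent. The sketch below does this for a single bin. It is a minimal illustration, not the authors' implementation. It uses a natural-gradient update of the same form as Eq. (8), which is given in the next paragraph. The nonlinearity is a hypothetical polar-type choice, $\varphi(o) = \tanh(|o|)\,o/|o|$; the paper only refers to "an appropriate nonlinear vector function" [3]. The sketch also assumes as many sources as microphones, and it omits DOA estimation, projection back, and permutation alignment.

```python
import numpy as np


def polar_nonlinearity(o):
    """Hypothetical phi(.): tanh(|o|) * o / |o|, applied elementwise.

    The paper only specifies "an appropriate nonlinear vector function" [3];
    this polar-type choice is an assumption for illustration.
    """
    mag = np.abs(o)
    return np.tanh(mag) * o / np.maximum(mag, 1e-12)


def natural_gradient_ica(x, mu=0.1, n_iter=500):
    """Estimate an unmixing matrix for ONE frequency bin.

    x: complex array of shape (J, T) holding X(f, tau) over all frames.
    Returns W of shape (J, J); assumes as many sources as microphones.
    """
    num_ch, num_frames = x.shape
    w = np.eye(num_ch, dtype=complex)          # initial W^[0] = I
    eye = np.eye(num_ch)
    for _ in range(n_iter):
        o = w @ x                               # O(f, tau) = W X(f, tau)
        # time average <phi(O) O^H>_tau over the frames of this bin
        corr = polar_nonlinearity(o) @ o.conj().T / num_frames
        w = mu * (eye - corr) @ w + w           # natural-gradient step
    return w


# Usage on synthetic data: two circular, super-Gaussian complex sources
# (exponential magnitude, uniform phase) mixed by a fixed 2 x 2 matrix.
rng = np.random.default_rng(0)
n_frames = 5000
s = rng.exponential(1.0, (2, n_frames)) * np.exp(2j * np.pi * rng.uniform(size=(2, n_frames)))
a = np.array([[1.0, 0.6 + 0.3j], [0.5 - 0.4j, 1.0]])
w = natural_gradient_ica(a @ s)

# W A should be close to a scaled permutation matrix: one dominant entry per row.
p = np.abs(w @ a) ** 2
print(np.round(p.max(axis=1) / p.sum(axis=1), 3))
```

ICA leaves the order, scale, and phase of the outputs in each bin arbitrary. Across bins, this order ambiguity is exactly the permutation problem that Sect. 3 analyses.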

Formally, the separation is represented by

$$O(f,\tau) = W_{\mathrm{ICA}}(f)X(f,\tau), \tag{7}$$

$$W_{\mathrm{ICA}}^{[p+1]}(f) = \mu\left[I - \left\langle \varphi(O(f,\tau))\,O^H(f,\tau)\right\rangle_\tau\right]W_{\mathrm{ICA}}^{[p]}(f) + W_{\mathrm{ICA}}^{[p]}(f), \tag{8}$$

where $\mu$ is the step-size parameter, $[p]$ denotes the value at the $p$-th iteration, and $I$ is the identity matrix. In addition, $\langle\cdot\rangle_\tau$ denotes a time-averaging operator, $M^H$ denotes the conjugate transpose of a matrix $M$, and $\varphi(\cdot)$ is an appropriate nonlinear vector function [3]. At the same time, we can estimate the DOAs by looking at the null directions in the directivity pattern shaped by $W_{\mathrm{ICA}}(f)$ [3]. We designate the DOA of the target speech signal as $\theta_U$. The reference path does not need the target signal, because we want to estimate only the noise component. Accordingly, we remove the separated speech component $O_U(f,\tau)$ from the ICA outputs $O(f,\tau)$ and construct the following "noise-only vector" $Q(f,\tau)$:

$$Q(f,\tau) = [O_1(f,\tau),\ldots,O_{U-1}(f,\tau),\,0,\,O_{U+1}(f,\tau),\ldots,O_K(f,\tau)]^T. \tag{9}$$

Next, we apply the projection back (PB) method [2] to remove the amplitude ambiguity:

$$E(f,\tau) = W_{\mathrm{ICA}}^{+}(f)\,Q(f,\tau), \tag{10}$$

where $M^{+}$ denotes the Moore-Penrose pseudo-inverse of a matrix $M$. Since $Q(f,\tau)$ is composed only of noise components, $E(f,\tau)$ is a good estimate of the noise signals received at the array:

$$E(f,\tau) \simeq A(f)N(f,\tau). \tag{11}$$

Finally, we obtain the estimated noise signal $Z(f,\tau)$ by performing DS:

$$Z(f,\tau) = W_{\mathrm{DS}}^T(f)E(f,\tau) \simeq W_{\mathrm{DS}}^T(f)A(f)N(f,\tau). \tag{12}$$

Equation (12) is expected to equal the noise term of Eq. (4) in the primary path.

2.4. Source extraction processing

In the proposed BSSA, source extraction subtracts the estimated noise power spectrum (Eq. (12)) from the partly enhanced target speech power spectrum (Eq. (4)):

$$Y_{\mathrm{BSSA}}(f,\tau) = \begin{cases} \left\{|Y(f,\tau)|^2 - \beta\,|Z(f,\tau)|^2\right\}^{1/2} & \text{if } |Y(f,\tau)|^2 - \beta\,|Z(f,\tau)|^2 \ge 0,\\ \gamma\,|Y(f,\tau)| & \text{otherwise}, \end{cases} \tag{13}$$

where $Y_{\mathrm{BSSA}}(f,\tau)$ is the output of BSSA, $\beta$ is the over-subtraction parameter, and $\gamma$ is the flooring parameter. An appropriate setting, e.g., $\beta > 1$ and $1 \gg \gamma > 0$, gives efficient noise reduction. Finally, we perform mel-scale filter bank analysis, a log transform, and a discrete cosine transform to obtain the mel-frequency cepstrum coefficients for the speech recognizer [5].

Fig. 1. Block diagram of proposed BSSA (primary path and reference path).

3. PERMUTATION-ROBUSTNESS ANALYSIS IN BSSA

3.1. Overview

In this section, we present a permutation-robustness analysis of the BSSA architecture. In the conventional ICA, when permutation arises, we directly suffer from the permuted noise component, which is wrongly regarded as the target signal. Thus the conventional ICA has no robustness against permutation. In BSSA, on the other hand, two mechanisms mitigate the adverse effect of permutation. First, the spectral-subtraction-based source extraction reduces the permuted component. Second, DS defocuses components arriving from outside the look direction. We can therefore say that the BSSA architecture is a permutation-robust structure. The detailed analysis is given below.

3.2. Permutation robustness by over-subtraction

Here, we assume that FDICA performs source separation perfectly except for a permutation arising in frequency bin $f_p$. Under this assumption, the target speech signal estimated by ICA (including PB processing) in bin $f_p$ can be described as

$$Y_{\mathrm{ICA}}(f_p,\tau) = A(f_p)N_{\mathrm{s}}(f_p,\tau), \tag{14}$$

$$N_{\mathrm{s}}(f_p,\tau) = [\underbrace{0,\ldots,0}_{n-1},\,N_n(f_p,\tau),\,\underbrace{0,\ldots,0}_{K-n}]^T. \tag{15}$$

In these equations:

- $Y_{\mathrm{ICA}}(f_p,\tau)$ is the ICA output signal vector intended as the target.
- $N_{\mathrm{s}}(f_p,\tau)$ is a noise signal vector estimated as the target speech signal vector by mistake.
- $N_n(f_p,\tau)$ is the noise component estimated as the target speech component by mistake.
- $n\ (\neq U)$ is the component number of that noise.

Moreover, $N_{\mathrm{s}}(f_p,\tau)$ has only zero components except the specific noise component $N_n(f_p,\tau)$. Therefore $Y_{\mathrm{ICA}}(f_p,\tau)$ can be rewritten as

$$Y_{\mathrm{ICA}}(f_p,\tau) = \hat{A}(f_p)N_n(f_p,\tau), \tag{16}$$

$$\hat{A}(f_p) = [A_{1n}(f_p),\ldots,A_{Jn}(f_p)]^T, \tag{17}$$

where $\hat{A}(f_p)$ is the transfer function vector of the noise component $N_n(f_p,\tau)$, and $A_{ij}(f)$ is the $(i,j)$ element of the mixing matrix $A(f)$. On the other hand, the noise signal estimated in the reference path of BSSA can be represented by

$$Z(f_p,\tau) = W_{\mathrm{DS}}^T(f_p)A(f_p)L(f_p,\tau), \tag{18}$$

$$L(f_p,\tau) = [L_1(f_p,\tau),\ldots,L_{n-1}(f_p,\tau),\,0,\,L_{n+1}(f_p,\tau),\ldots,L_K(f_p,\tau)]^T, \tag{19}$$

where $L(f_p,\tau)$ is the estimated noise component vector, which includes the target signal by mistake. Note that the observed signal can be rewritten as $X(f_p,\tau) = A(f_p)\{L(f_p,\tau) + N_{\mathrm{s}}(f_p,\tau)\}$. When $|Y(f_p,\tau)|^2 - \beta|Z(f_p,\tau)|^2 \ge 0$, we can use Eqs. (4) and (18) to write the expectation of the power spectrum of the BSSA output as

$$\begin{aligned} E\big[|Y_{\mathrm{BSSA}}(f_p,\tau)|^2\big] &= E\big[|Y(f_p,\tau)|^2 - \beta|Z(f_p,\tau)|^2\big]\\ &= E\big[|W_{\mathrm{DS}}^T(f_p)X(f_p,\tau)|^2\big] - \beta\,E\big[|W_{\mathrm{DS}}^T(f_p)A(f_p)L(f_p,\tau)|^2\big]\\ &= E\big[|W_{\mathrm{DS}}^T(f_p)A(f_p)\{L(f_p,\tau) + N_{\mathrm{s}}(f_p,\tau)\}|^2\big] - \beta\,E\big[|W_{\mathrm{DS}}^T(f_p)A(f_p)L(f_p,\tau)|^2\big]\\ &\simeq (1-\beta)\,E\big[|W_{\mathrm{DS}}^T(f_p)A(f_p)L(f_p,\tau)|^2\big] + E\big[|W_{\mathrm{DS}}^T(f_p)A(f_p)N_{\mathrm{s}}(f_p,\tau)|^2\big], \end{aligned} \tag{20}$$

where $E[\cdot]$ denotes the expectation operator. We use the fact that the cross-terms among the distinct noise components are negligible after taking the expectation. Since we usually set the over-subtraction

parameter to $\beta > 1$, the first term on the right-hand side of Eq. (20) is negative, and the following relation holds:

$$E\big[|Y_{\mathrm{BSSA}}(f_p,\tau)|^2\big] < E\big[|W_{\mathrm{DS}}^T(f_p)A(f_p)N_{\mathrm{s}}(f_p,\tau)|^2\big] = E\big[|W_{\mathrm{DS}}^T(f_p)\hat{A}(f_p)N_n(f_p,\tau)|^2\big]. \tag{21}$$

3.3. Permutation robustness by defocusing in DS

Under reverberant conditions, $\hat{A}(f_p)$ can be expressed as a superposition of all reflection components:

$$\hat{A}(f_p) = \sum_q r^{(q)}\,a(f_p,\theta^{(q)}), \tag{22}$$

$$a(f,\theta) = [a_1(f,\theta),\ldots,a_J(f,\theta)]^T, \tag{23}$$

$$a_j(f,\theta) = \exp\!\left(i2\pi\frac{f}{M}f_s d_j\sin\theta / c\right). \tag{24}$$

In these equations:

- The superscript $(q)$ indexes the $q$-th reflection component.
- $r^{(q)}$ is a reflection coefficient.
- $\theta^{(q)}$ is the DOA of that reflection component of the permuted noise $N_n(f_p,\tau)$.
- $a(f,\theta)$ is a steering vector expressing the phase information of a sound source arriving from direction $\theta$.

Using Eq. (22), we obtain

$$\begin{aligned} |W_{\mathrm{DS}}^T(f_p)\hat{A}(f_p)N_n(f_p,\tau)|^2 &= \Big|\sum_q r^{(q)}W_{\mathrm{DS}}^T(f_p)a(f_p,\theta^{(q)})N_n(f_p,\tau)\Big|^2\\ &= \sum_q \big|r^{(q)}W_{\mathrm{DS}}^T(f_p)a(f_p,\theta^{(q)})N_n(f_p,\tau)\big|^2 + C_1, \end{aligned} \tag{25}$$

where $C_1$ contains all cross-terms among reflection components. Likewise, the power of the conventional ICA's output at a specific microphone $j$, $Y_{\mathrm{ICA},j}(f_p,\tau)$, can be written as

$$\begin{aligned} |Y_{\mathrm{ICA},j}(f_p,\tau)|^2 &= \Big|\sum_q r^{(q)}a_j(f_p,\theta^{(q)})N_n(f_p,\tau)\Big|^2\\ &= \sum_q \big|r^{(q)}a_j(f_p,\theta^{(q)})N_n(f_p,\tau)\big|^2 + C_2, \end{aligned} \tag{26}$$

where $C_2$ likewise contains all cross-terms among reflection components. The directivity gain of the DS filter $W_{\mathrm{DS}}(f)$ is unity only when $\theta$ equals the focus direction of DS, $\theta_U$. In all other directions it is less than one (i.e., defocused):

$$|W_{\mathrm{DS}}^T(f)\,a(f,\theta)| \le 1. \tag{27}$$

Thus, the power of each reflection component satisfies

$$|W_{\mathrm{DS}}^T(f_p)a(f_p,\theta^{(q)})|^2\,|r^{(q)}N_n(f_p,\tau)|^2 \le |a_j(f_p,\theta^{(q)})|^2\,|r^{(q)}N_n(f_p,\tau)|^2, \tag{28}$$

because $|a_j(f,\theta)| = 1$, as in Eq. (24). We assume that almost all reflection components of $N_n(f_p,\tau)$ arrive from around the noise DOA, i.e., away from $\theta_U$. Under this assumption, Eq. (28) becomes a strict inequality:

$$|r^{(q)}W_{\mathrm{DS}}^T(f_p)a(f_p,\theta^{(q)})N_n(f_p,\tau)|^2 < |r^{(q)}a_j(f_p,\theta^{(q)})N_n(f_p,\tau)|^2. \tag{29}$$

If the interference among reflection components arises statistically at random, $C_1$ in Eq. (25) and $C_2$ in Eq. (26) can be expected to be statistically the same. Therefore, the following holds:

$$\sum_q |r^{(q)}W_{\mathrm{DS}}^T(f_p)a(f_p,\theta^{(q)})N_n(f_p,\tau)|^2 + C_1 < \sum_q |r^{(q)}a_j(f_p,\theta^{(q)})N_n(f_p,\tau)|^2 + C_2. \tag{30}$$

By Eqs. (25) and (26), and since $\hat{A}(f_p)N_n(f_p,\tau) = A(f_p)N_{\mathrm{s}}(f_p,\tau)$, this is equivalent to

$$|W_{\mathrm{DS}}^T(f_p)A(f_p)N_{\mathrm{s}}(f_p,\tau)|^2 < |Y_{\mathrm{ICA},j}(f_p,\tau)|^2. \tag{31}$$

From Eqs. (21) and (31), the following relation holds:

$$E\big[|Y_{\mathrm{BSSA}}(f_p,\tau)|^2\big] < E\big[|W_{\mathrm{DS}}^T(f_p)A(f_p)N_{\mathrm{s}}(f_p,\tau)|^2\big] < E\big[|Y_{\mathrm{ICA},j}(f_p,\tau)|^2\big]. \tag{32}$$

This relation indicates that, in the permutation-arising frequency bin $f_p$, the power of the BSSA output is less than that of the ICA output. On the other hand, when $|Y(f_p,\tau)|^2 - \beta|Z(f_p,\tau)|^2 < 0$, the flooring parameter $\gamma$ floors the resultant power spectrum of BSSA. If $\gamma$ is sufficiently small, $Y_{\mathrm{BSSA}}(f_p,\tau)$ becomes smaller than the error component of the permutation. From these facts, we conclude that BSSA is more permutation-robust than ICA.

However, the over-subtraction parameter $\beta$ must be set with care. An oversized $\beta$ can suppress the permutation perfectly, but it also reduces the target component, not only the noise components, in other, innocent (non-permuted) frequency bins. We should therefore use an appropriate $\beta$, because an oversized one causes artificial distortion, the so-called musical noise.

4. EXPERIMENTS AND RESULTS

4.1. Evaluation of permutation-robustness in BSSA

First, we compare ICA and BSSA on the basis of the noise reduction rate (NRR) [3], which is defined as the output signal-to-noise ratio (SNR) minus the input SNR in dB. In this experiment, we assume that source separation is performed perfectly except for permutations, which are generated artificially in randomly selected frequency bins. We increase the number of permutation-arising frequency bins to evaluate the robustness against the permutation problem. Figure 2 illustrates the layout of the reverberant room in this experiment. The experimental conditions are as follows:

- Speech signals (male and female) are used as the original speech, and the input SNR is set to 0 dB at the array.
- The target signal is male speech, the noise is female speech, and the noise direction is 50 degrees.
- A four-element or eight-element array with an inter-element spacing of 2 cm is used, and the DFT size is 512.
- The over-subtraction parameter $\beta$ is 1.2 and the flooring coefficient $\gamma$ is 0.0.

Figure 3 shows the resultant NRR curves of ICA and BSSA as the number of permutation-arising frequency bins increases. These results confirm that the NRR of BSSA exceeds that of ICA even as the percentage of permutation-arising bins increases. They clearly indicate that BSSA has a permutation-robust structure.

Although these NRR results are favorable to BSSA, one might suspect that sound distortion increases. Indeed, musical noise can be observed in the output of the proposed BSSA. Owing to space limitations we cannot provide distortion assessment results. Instead, we show speech recognition results, since speech recognition is the final goal of BSSA and its accuracy reflects the overall quality of the separated sound. We compare ICA and BSSA on the basis of word accuracy under the same experimental conditions, with the following settings:

- An eight-element array is used, and 5% or 10% permutations are generated artificially.
- 46 speakers (200 sentences) are used as the original source, and male speech (1 sentence) is used as the interference noise source.
- The noise direction is 50 or 80 degrees.
- The speech recognition task is 20k-word dictation, and the acoustic model is a phonetic tied mixture [7].
- 260 speakers (150 sentences per speaker) are used as training data for the acoustic model.
- Julius [7] 3.5.1 is used as the speech decoder.

Figure 4 shows the word accuracy under each condition. These results show that the word accuracy of the proposed BSSA is superior to that of ICA under all conditions.
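The two mechanisms analysed in Sect. 3 can be checked numerically for a single permuted bin: defocusing (Eq. (27)) and over-subtraction (Eq. (20)). The artificial-permutation experiment above probes the same mechanisms. The sketch below also implements the subtraction rule of Eq. (13) and the NRR definition. It works under simplifying assumptions that are not from the paper:

- Free field, with no reverberation, so the cross-terms $C_1$ and $C_2$ vanish.
- $K = 2$ sources with unit power.
- Sampling frequency 16 kHz and sound speed 340 m/s.
- An arbitrary bin index of 100.

Only the 4-element, 2 cm, 512-point DFT, $\beta = 1.2$, and 50-degree noise settings are taken from Sect. 4.1. The expectation in Eq. (20) is evaluated analytically rather than by simulating frames.

```python
import numpy as np

SOUND_SPEED = 340.0  # c in m/s (assumed value)


def steering(f_bin, n_fft, fs, mic_pos, theta):
    """Steering vector a(f, theta), Eqs. (23)-(24)."""
    d = np.asarray(mic_pos, dtype=float)
    phase = 2 * np.pi * (f_bin / n_fft) * fs * d * np.sin(theta) / SOUND_SPEED
    return np.exp(1j * phase)


def ds_weights(f_bin, n_fft, fs, mic_pos, theta_u):
    """DS filter of Eqs. (5)-(6): conjugate steering vector scaled by 1/J."""
    a_u = steering(f_bin, n_fft, fs, mic_pos, theta_u)
    return a_u.conj() / a_u.size


def bssa_magnitude(y, z, beta, gamma):
    """Over-subtraction with flooring, Eq. (13), for complex spectra y and z."""
    diff = np.abs(y) ** 2 - beta * np.abs(z) ** 2
    return np.where(diff >= 0, np.sqrt(np.maximum(diff, 0.0)), gamma * np.abs(y))


def nrr_db(in_sig_pow, in_noise_pow, out_sig_pow, out_noise_pow):
    """Noise reduction rate: output SNR minus input SNR, in dB."""
    return 10 * np.log10(out_sig_pow / out_noise_pow) - 10 * np.log10(in_sig_pow / in_noise_pow)


# Toy permuted bin: 4 mics, 2 cm spacing, DFT size 512 (as in Sect. 4.1);
# fs = 16 kHz, bin index 100 and unit source powers are assumptions.
fs, n_fft, f_p = 16000.0, 512, 100
mics = [0.0, 0.02, 0.04, 0.06]
theta_u, theta_n = 0.0, np.deg2rad(50.0)   # target / noise DOAs
beta = 1.2

w_ds = ds_weights(f_p, n_fft, fs, mics, theta_u)
gain_target = abs(w_ds @ steering(f_p, n_fft, fs, mics, theta_u)) ** 2  # unity, Eq. (27)
gain_noise = abs(w_ds @ steering(f_p, n_fft, fs, mics, theta_n)) ** 2   # < 1 (defocused)

# Free field, K = 2: in bin f_p the reference path holds the target (L), while
# the ICA "target" output is the noise N_n.
p_bssa = (1 - beta) * gain_target + gain_noise   # right-hand side of Eq. (20)
p_ds_noise = gain_noise                          # middle term of Eq. (32)
p_ica = 1.0                                      # |a_j|^2 = 1 at any mic j
print(p_bssa, p_ds_noise, p_ica)                 # expect p_bssa < p_ds_noise < p_ica
```

In this toy case the over-subtraction term $(1-\beta)$ removes part of the DS-filtered permuted noise. The DS gain toward 50 degrees already keeps that noise below the unit power seen by ICA at a single microphone. Raising `beta` lowers `p_bssa` further, but as noted in Sect. 3.3 it would also cut into the target in non-permuted bins.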

Fig. 2. Layout of reverberant room used in the experiment that simulates the permutation problem.

Fig. 3. Curves of NRR with increasing permutation-arising frequency bins for (a) 4-element and (b) 8-element arrays. [Plot: NRR of ICA and BSSA versus percentage of permutation-arising frequency bins [%]]

Fig. 4. Word accuracy in the experiment that simulates the permutation problem artificially, for (a) 5% and (b) 10% permutation. [Plot: word accuracy of ICA and BSSA versus noise direction [deg]]

4.2. Speech recognition test in real environment

Next, we conduct real BSS experiments in a real environment. We compare four methods:

- DS.
- ICA.
- The conventional single-channel spectral subtraction [6] cascaded with ICA (ICA+SS).
- BSSA.

In this scenario, ICA cannot work perfectly. There is therefore not only the permutation problem but also target or noise estimation error. Figure 5 illustrates the layout of the reverberant room in this experiment. The conditions and task for speech recognition are the same as those of Sect. 4.1, except for the following:

- As the interference, we use male speech recorded in the real environment, which also contains background noise.
- The input SNR is set to 10 dB.
- The over-subtraction parameter $\beta$ is 2.0 and the flooring parameter $\gamma$ is 0.2.
- A DOA-based permutation solver [3] is used in ICA.

Figure 6 shows the NRR and the word accuracy of each method. These results reveal that the word accuracy of the proposed BSSA is remarkably superior to those of the conventional methods. Notably, the proposed BSSA also outperforms the simple combination of existing ICA and SS. This is promising evidence that the proposed BSSA is applicable to speech recognition that is robust to noise, including permutation.

Fig. 5. Layout of reverberant room for speech recognition test in real environment.

Fig. 6. (a) Result of noise reduction rate in real separation, and (b) word accuracy score of each method. [Bar charts for DS, ICA, ICA+SS, and proposed BSSA]

5. CONCLUSIONS

In this paper, we have shown theoretically and experimentally that BSSA is a blind source extraction method with a permutation-robust structure. BSSA is permutation-robust because its over-subtraction and defocusing properties reduce the adverse effect of the permutation problem. In the experiment that simulates the permutation problem artificially, the NRR and word accuracy of BSSA exceeded those of the conventional ICA. Moreover, in the real environment, the word accuracy of the proposed BSSA exceeded those of DS, ICA, and ICA+SS.

6. REFERENCES

[1] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, pp. 287-314, 1994.
[2] S. Ikeda et al., Proc. Intern. Workshop on ICA and BSS, pp. 365-371, 1999.
[3] H. Saruwatari et al., "Blind source separation combining independent component analysis and beamforming," EURASIP J. Applied Signal Proc., vol. 2003, no. 11, pp. 1135-1146, 2003.
[4] H. Sawada et al., "A robust and precise method for solving the permutation problem of frequency-domain blind source separation," IEEE Trans. Speech and Audio Processing, vol. 12, pp. 530-538, 2004.
[5] Y. Takahashi et al., "Blind spatial subtraction array with independent component analysis for hands-free speech recognition," Proc. of IWAENC, 2006.
[6] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoustics, Speech, Signal Proc., vol. ASSP-27, no. 2, pp. 113-120, 1979.
[7] A. Lee et al., "Julius - An open source real-time large vocabulary recognition engine," Proc. Eurospeech, pp. 1691-1694, 2001.
