Permutation-Robust Structure for ICA-Based Blind Source Extraction
This procedure can be represented by

$$ \mathbf{O}(f,\tau) = \mathbf{W}_{\mathrm{ICA}}(f)\,\mathbf{X}(f,\tau), \tag{7} $$

$$ \mathbf{W}_{\mathrm{ICA}}^{[p+1]}(f) = \mu\left[\mathbf{I} - \left\langle \boldsymbol{\varphi}(\mathbf{O}(f,\tau))\,\mathbf{O}^{\mathrm{H}}(f,\tau) \right\rangle_{\tau}\right]\mathbf{W}_{\mathrm{ICA}}^{[p]}(f) + \mathbf{W}_{\mathrm{ICA}}^{[p]}(f), \tag{8} $$

where $\mu$ is the step-size parameter, $[p]$ denotes the value at the $p$-th step of the iterations, and $\mathbf{I}$ is the identity matrix. In addition, $\langle\cdot\rangle_{\tau}$ denotes a time-averaging operator, $\mathbf{M}^{\mathrm{H}}$ denotes the conjugate transpose of a matrix $\mathbf{M}$, and $\boldsymbol{\varphi}(\cdot)$ is an appropriate nonlinear vector function [3]. At the same time, we can estimate DOAs by looking at the null directions in the directivity pattern shaped by $\mathbf{W}_{\mathrm{ICA}}(f)$ [3], and we designate the DOA of the target speech signal as $\theta_U$.

In the reference path, the target signal is not required because we want to estimate only the noise component. Accordingly, we remove the separated speech component $O_U(f,\tau)$ from the ICA outputs $\mathbf{O}(f,\tau)$ and construct the following "noise-only vector" $\mathbf{Q}(f,\tau)$:

$$ \mathbf{Q}(f,\tau) = \left[O_1(f,\tau), \ldots, O_{U-1}(f,\tau), 0, O_{U+1}(f,\tau), \ldots, O_K(f,\tau)\right]^{\mathrm{T}}. \tag{9} $$

Next, we apply the projection back (PB) method [2] to remove the ambiguity of amplitude. This procedure can be represented as

$$ \mathbf{E}(f,\tau) = \mathbf{W}_{\mathrm{ICA}}^{+}(f)\,\mathbf{Q}(f,\tau), \tag{10} $$

where $\mathbf{M}^{+}$ denotes the Moore-Penrose pseudo-inverse of a matrix $\mathbf{M}$. Here, $\mathbf{Q}(f,\tau)$ is composed of only noise components. Therefore, $\mathbf{E}(f,\tau)$ is a good estimate of the noise signals received at the array:

$$ \mathbf{E}(f,\tau) \simeq \mathbf{A}(f)\,\mathbf{N}(f,\tau). \tag{11} $$

Finally, we obtain the estimated noise signal $Z(f,\tau)$ by performing DS as follows:

$$ Z(f,\tau) = \mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f)\,\mathbf{E}(f,\tau) \simeq \mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f)\,\mathbf{A}(f)\,\mathbf{N}(f,\tau). \tag{12} $$

Equation (12) is expected to be equal to the noise term of Eq. (4) in the primary path.

2.4. Source extraction processing

In the proposed BSSA, source extraction is carried out by subtracting the estimated noise power spectrum (Eq. (12)) from the partly enhanced target speech power spectrum (Eq. (4)); thus

$$ Y_{\mathrm{BSSA}}(f,\tau) = \begin{cases} \left( |Y(f,\tau)|^2 - \beta\,|Z(f,\tau)|^2 \right)^{1/2} & (\text{if } |Y(f,\tau)|^2 - \beta\,|Z(f,\tau)|^2 \ge 0), \\ \gamma\,|Y(f,\tau)| & (\text{otherwise}), \end{cases} \tag{13} $$

where $Y_{\mathrm{BSSA}}(f,\tau)$ is the output of BSSA, $\beta$ is an over-subtraction parameter, and $\gamma$ is a flooring parameter. An appropriate setting, e.g., $\beta > 1$ and $1 \gg \gamma > 0$, gives efficient noise reduction. Finally, we perform mel-scale filter bank analysis, log transform, and discrete cosine transform to obtain the mel-frequency cepstrum coefficients for the speech recognizer [5].

Fig. 1. Block diagram of proposed BSSA.

3. PERMUTATION-ROBUSTNESS ANALYSIS IN BSSA

3.1. Overview

In this section, we present a permutation-robustness analysis of the BSSA architecture. In the conventional ICA, when a permutation arises, we directly suffer from the permuted noise component, which is wrongly regarded as the target signal. Thus the conventional ICA has no robustness against the permutation. In BSSA, on the other hand, the adverse effect of the permutation is mitigated because the spectral-subtraction-based source extraction reduces the permuted component, and DS defocuses the component arriving from outside the look direction. Therefore, we can say that the BSSA architecture is a permutation-robust structure. The detailed analysis is shown below.

3.2. Permutation robustness by over-subtraction

Here, we assume that source separation is performed perfectly by FDICA except for a permutation arising in the frequency bin $f_p$. Under this assumption, the target speech signal estimated by ICA (including PB processing) in the frequency bin $f_p$ can be described as

$$ \mathbf{Y}_{\mathrm{ICA}}(f_p,\tau) = \mathbf{A}(f_p)\,\mathbf{N}_e(f_p,\tau), \tag{14} $$

$$ \mathbf{N}_e(f_p,\tau) = [\underbrace{0,\ldots,0}_{n-1},\, N_n(f_p,\tau),\, \underbrace{0,\ldots,0}_{K-n}]^{\mathrm{T}}, \tag{15} $$

where $\mathbf{Y}_{\mathrm{ICA}}(f_p,\tau)$ is the signal vector output by ICA as the target, $\mathbf{N}_e(f_p,\tau)$ is a noise signal vector estimated as the target speech signal vector by mistake, $N_n(f_p,\tau)$ is a noise component estimated as the target speech component by mistake, and $n$ ($\neq U$) is the component number of the noise. Moreover, since $\mathbf{N}_e(f_p,\tau)$ consists of zeros except for the specific noise component $N_n(f_p,\tau)$, $\mathbf{Y}_{\mathrm{ICA}}(f_p,\tau)$ can be rewritten as

$$ \mathbf{Y}_{\mathrm{ICA}}(f_p,\tau) = \hat{\mathbf{A}}(f_p)\,N_n(f_p,\tau), \tag{16} $$

$$ \hat{\mathbf{A}}(f_p) = \left[A_{1n}(f_p), \ldots, A_{Jn}(f_p)\right]^{\mathrm{T}}, \tag{17} $$

where $\hat{\mathbf{A}}(f_p)$ is the transfer function vector of the noise component $N_n(f_p,\tau)$, and $A_{ij}(f)$ is an element of the mixing matrix $\mathbf{A}(f)$. On the other hand, the estimated noise signal in the reference path of BSSA can be represented by

$$ Z(f_p,\tau) = \mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\,\mathbf{A}(f_p)\,\mathbf{L}(f_p,\tau), \tag{18} $$

$$ \mathbf{L}(f_p,\tau) = \left[L_1(f_p,\tau), \ldots, L_{n-1}(f_p,\tau), 0, L_{n+1}(f_p,\tau), \ldots, L_K(f_p,\tau)\right]^{\mathrm{T}}, \tag{19} $$

where $\mathbf{L}(f_p,\tau)$ is the estimated noise component vector, which includes the target signal by mistake. Note that the observed signal $\mathbf{X}(f_p,\tau)$ can be rewritten as $\mathbf{X}(f_p,\tau) = \mathbf{A}(f_p)\{\mathbf{L}(f_p,\tau) + \mathbf{N}_e(f_p,\tau)\}$. When $|Y(f_p,\tau)|^2 - \beta\,|Z(f_p,\tau)|^2 \ge 0$, using Eqs. (4) and (18), we can write the expectation of the power spectrum of the BSSA output as

$$ \begin{aligned} E\left[|Y_{\mathrm{BSSA}}(f_p,\tau)|^2\right] &= E\left[|Y(f_p,\tau)|^2 - \beta\,|Z(f_p,\tau)|^2\right] \\ &= E\left[|\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\mathbf{X}(f_p,\tau)|^2 - \beta\,|\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\mathbf{A}(f_p)\mathbf{L}(f_p,\tau)|^2\right] \\ &= E\left[|\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\mathbf{A}(f_p)\left(\mathbf{L}(f_p,\tau) + \mathbf{N}_e(f_p,\tau)\right)|^2\right] - \beta\,E\left[|\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\mathbf{A}(f_p)\mathbf{L}(f_p,\tau)|^2\right] \\ &\simeq (1-\beta)\,E\left[|\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\mathbf{A}(f_p)\mathbf{L}(f_p,\tau)|^2\right] + E\left[|\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\mathbf{A}(f_p)\mathbf{N}_e(f_p,\tau)|^2\right], \end{aligned} \tag{20} $$

where $E[\cdot]$ denotes the expectation operator, and we use the relation that the cross-terms among distinct noise components become negligible when the expectation is taken. Since we usually set the over-subtraction parameter to $\beta > 1$, the first term on the right-hand side of Eq. (20) is a negative quantity, and the following relation holds:

$$ E\left[|Y_{\mathrm{BSSA}}(f_p,\tau)|^2\right] < E\left[|\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\mathbf{A}(f_p)\mathbf{N}_e(f_p,\tau)|^2\right] = E\left[|\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\hat{\mathbf{A}}(f_p)N_n(f_p,\tau)|^2\right]. \tag{21} $$

3.3. Permutation robustness by defocusing in DS

Under reverberant conditions, $\hat{\mathbf{A}}(f_p)$ can be expressed as a superposition of all reflection components. Therefore $\hat{\mathbf{A}}(f_p)$ can be rewritten as

$$ \hat{\mathbf{A}}(f_p) = \sum_{q} r^{(q)}\,\mathbf{a}(f_p, \theta^{(q)}), \tag{22} $$

$$ \mathbf{a}(f,\theta) = \left[a_1(f,\theta), \ldots, a_J(f,\theta)\right]^{\mathrm{T}}, \tag{23} $$

$$ a_j(f,\theta) = \exp\left(i 2\pi (f/M) f_s d_j \sin\theta / c\right), \tag{24} $$

where the superscript $(q)$ indexes the $q$-th reflection component, $r^{(q)}$ is a reflection coefficient, $\theta^{(q)}$ is the DOA of the $q$-th reflection component of the permuted noise $N_n(f_p,\tau)$, and $\mathbf{a}(f,\theta)$ is a steering vector that expresses the phase information of a sound source arriving from direction $\theta$. Using Eq. (22), we obtain the following equation:

$$ \begin{aligned} |\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\hat{\mathbf{A}}(f_p)N_n(f_p,\tau)|^2 &= \Big|\sum_{q} r^{(q)}\,\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\,\mathbf{a}(f_p,\theta^{(q)})\,N_n(f_p,\tau)\Big|^2 \\ &= \sum_{q} \left|r^{(q)}\,\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\,\mathbf{a}(f_p,\theta^{(q)})\,N_n(f_p,\tau)\right|^2 + C_1, \end{aligned} \tag{25} $$

where $C_1$ is a term that contains all cross-terms among the reflection components. Also, the power of the conventional ICA output at a specific microphone $j$, $Y_j^{\mathrm{ICA}}(f_p,\tau)$, can be written as

$$ \begin{aligned} |Y_j^{\mathrm{ICA}}(f_p,\tau)|^2 &= \Big|\sum_{q} r^{(q)}\,a_j(f_p,\theta^{(q)})\,N_n(f_p,\tau)\Big|^2 \\ &= \sum_{q} \left|r^{(q)}\,a_j(f_p,\theta^{(q)})\,N_n(f_p,\tau)\right|^2 + C_2, \end{aligned} \tag{26} $$

where $C_2$ likewise contains all cross-terms among the reflection components. Here, the directivity gain of the DS filter $\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f)$ is unity only when $\theta$ equals the focus direction of DS, $\theta_U$, and it is less than one (i.e., defocused) in the other directions. This is represented by

$$ |\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f)\,\mathbf{a}(f,\theta)| \le 1. \tag{27} $$

Thus, the power of each reflection component satisfies

$$ |\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\,\mathbf{a}(f_p,\theta)|^2\,|r^{(q)} N_n(f_p,\tau)|^2 \le |a_j(f_p,\theta)|^2\,|r^{(q)} N_n(f_p,\tau)|^2, \tag{28} $$

because $|a_j(f,\theta)| = 1$ as in Eq. (24). Using the assumption that almost all the reflection components of $N_n(f_p,\tau)$ come from around the noise DOA and from outside $\theta_U$, we can modify Eq. (28) as

$$ \left|r^{(q)}\,\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\,\mathbf{a}(f_p,\theta^{(q)})\,N_n(f_p,\tau)\right|^2 < \left|r^{(q)}\,a_j(f_p,\theta^{(q)})\,N_n(f_p,\tau)\right|^2. \tag{29} $$

If the interference among the reflection components arises statistically at random, it can be expected that $C_1$ in Eq. (25) and $C_2$ in Eq. (26) become statistically the same. Therefore, the following relation holds:

$$ \sum_{q} \left|r^{(q)}\,\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\,\mathbf{a}(f_p,\theta^{(q)})\,N_n(f_p,\tau)\right|^2 + C_1 < \sum_{q} \left|r^{(q)}\,a_j(f_p,\theta^{(q)})\,N_n(f_p,\tau)\right|^2 + C_2. \tag{30} $$

By Eqs. (25) and (26), this relation can be replaced by the following:

$$ |\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\hat{\mathbf{A}}(f_p)N_n(f_p,\tau)|^2 < |Y_j^{\mathrm{ICA}}(f_p,\tau)|^2. \tag{31} $$

From Eqs. (21) and (31), the following relation holds:

$$ E\left[|Y_{\mathrm{BSSA}}(f_p,\tau)|^2\right] < E\left[|\mathbf{W}_{\mathrm{DS}}^{\mathrm{T}}(f_p)\hat{\mathbf{A}}(f_p)N_n(f_p,\tau)|^2\right] < E\left[|Y_j^{\mathrm{ICA}}(f_p,\tau)|^2\right]. \tag{32} $$

This relation indicates that the power of the BSSA output is less than that of the ICA output in the permutation-arising frequency bin $f_p$. On the other hand, when $|Y(f_p,\tau)|^2 - \beta\,|Z(f_p,\tau)|^2 < 0$, the resultant power spectrum of BSSA is floored by the flooring parameter $\gamma$. If $\gamma$ is sufficiently small, $Y_{\mathrm{BSSA}}(f_p,\tau)$ becomes smaller than the error component of the permutation. From these facts, we can conclude that BSSA is more permutation-robust than ICA. However, we must pay attention to the setting of the over-subtraction parameter $\beta$. Although an oversized $\beta$ can suppress the permutation perfectly, it reduces not only the noise components but also the target component in the other, non-permuted frequency bins. Therefore, we should use an appropriate over-subtraction parameter $\beta$, because an oversized one causes an artificial distortion, so-called musical noise.

4. EXPERIMENTS AND RESULT

4.1. Evaluation of permutation-robustness in BSSA

First, we compare ICA and BSSA on the basis of the noise reduction rate (NRR) [3], which is defined as the output signal-to-noise ratio (SNR) minus the input SNR in dB. In this experiment, we assume that source separation is performed perfectly except for the permutation, which is generated artificially in randomly selected frequency bins. We increase the number of permutation-arising frequency bins to evaluate the robustness against the permutation problem. Figure 2 illustrates the layout of the reverberant room in this experiment. We use speech signals (male and female) as the original speech, and the input SNR is set to 0 dB at the array.
The target signal is male speech, the noise is female speech, and the noise direction is 50 degrees. A four-element or eight-element array with an interelement spacing of 2 cm is used, and the DFT size is 512. The over-subtraction parameter β is 1.2 and the flooring coefficient γ is 0.0. Figure 3 shows the resultant curves of the NRRs of ICA and BSSA with increasing permutation-arising frequency bins. From these results, we can confirm that the NRR of BSSA outperforms that of ICA even when the percentage of permutation-arising bins increases. These results indicate that BSSA involves a permutation-robust structure.

Although the NRR results above are positive for BSSA, one might speculate that the sound distortion increases; indeed, we can observe musical noise in the resultant output of the proposed BSSA. Unfortunately, we cannot provide distortion assessment results because of space limitations; instead, we show results of speech recognition, which is the final goal of BSSA and in which the separated sound quality is taken into account as a whole. We compare ICA and BSSA on the basis of word accuracy under the same experimental conditions. We use an eight-element array, and we generate 5% or 10% permutations artificially. We use 46 speakers (200 sentences) as the original source, and we use male speech (1 sentence) as the interference noise source. The noise direction is 50 or 80 degrees. The speech recognition task is 20 k-word dictation, the acoustic model is a phonetic tied mixture [7], we use 260 speakers (150 sentences per speaker) as training data for the acoustic model, and we use Julius 3.5.1 [7] as the speech decoder. Figure 4 shows the word accuracy under each condition. From these results, we can see that the word accuracy of the proposed BSSA is superior to that of ICA under all conditions.
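As a rough illustration of the quantities used in this section, the subtraction rule of Eq. (13) and the NRR definition (output SNR minus input SNR, in dB) can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function names `bssa_extract` and `nrr_db` are hypothetical, the inputs are assumed to be per-bin spectra of the primary path Y and the reference path Z, and the SNRs are assumed to be computed from separately known signal and noise powers.

```python
import numpy as np


def bssa_extract(Y, Z, beta=1.2, gamma=0.0):
    """Magnitude output of Eq. (13) for arrays of per-bin spectra.

    Y     : primary-path (DS) output, real or complex
    Z     : reference-path noise estimate, real or complex
    beta  : over-subtraction parameter
    gamma : flooring parameter
    (Function name and argument layout are assumptions for illustration.)
    """
    Y = np.asarray(Y)
    Z = np.asarray(Z)
    # Power difference |Y|^2 - beta * |Z|^2 for every bin.
    diff = np.abs(Y) ** 2 - beta * np.abs(Z) ** 2
    # Clip before the square root so bins that will be floored
    # do not produce NaN inside np.where.
    subtracted = np.sqrt(np.maximum(diff, 0.0))
    floored = gamma * np.abs(Y)
    return np.where(diff >= 0, subtracted, floored)


def nrr_db(out_signal_pow, out_noise_pow, in_signal_pow, in_noise_pow):
    """NRR in dB: output SNR minus input SNR (hypothetical helper)."""
    out_snr = 10.0 * np.log10(out_signal_pow / out_noise_pow)
    in_snr = 10.0 * np.log10(in_signal_pow / in_noise_pow)
    return out_snr - in_snr
```

With β = 1.2 and γ = 0.0 as in this section, any bin in which β|Z|² exceeds |Y|² is set to zero, which corresponds to the flooring case of Eq. (13).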
Fig. 2. Layout of reverberant room used in the experiment which simulates the permutation problem.

Fig. 3. Curves of NRR with increasing permutation-arising frequency bins for (a) 4-element and (b) 8-element arrays. (Horizontal axis: percentage of permutation-arising frequency bins [%].)

Fig. 4. Word accuracy in the experiment which simulates the permutation problem artificially for (a) 5% and (b) 10% permutation. (Horizontal axis: noise direction [deg].)

4.2. Speech recognition test in real environment

Next, we conduct real BSS experiments and compare DS, ICA, the conventional single-channel spectral subtraction [6] cascaded with ICA (ICA+SS), and BSSA in a real environment. In this scenario, there is not only the permutation problem but also target or noise estimation error, because ICA cannot work perfectly. Figure 5 illustrates the layout of the reverberant room in this experiment. The conditions and task for speech recognition are the same as those of Sect. 4.1. We use male speech recorded in the real environment as the interference, including background noise. The input SNR is set to 10 dB. In addition, the over-subtraction parameter β is 2.0 and the flooring parameter γ is 0.2. Moreover, we use a DOA-based permutation solver [3] in ICA. Figure 6 shows the NRR and the word accuracy of each method. These results reveal that the word accuracy of the proposed BSSA is remarkably superior to those of the conventional methods. It should be mentioned that the proposed BSSA still outperforms the simple combination of the existing ICA and SS. This is promising evidence that the proposed BSSA is applicable to noise-robust (including permutation-robust) speech recognition.

Fig. 5. Layout of reverberant room for the speech recognition test in the real environment.

Fig. 6. (a) Result of noise reduction rate in real separation, and (b) word accuracy score of each method (DS, ICA, ICA+SS, and proposed BSSA).

5. CONCLUSIONS

In this paper, we have shown theoretically and experimentally that BSSA is a blind source extraction method with a permutation-robust structure. BSSA is permutation-robust because its over-subtraction and defocusing properties reduce the adverse effect of the permutation problem. It was confirmed that the NRR and word accuracy of BSSA exceed those of the conventional ICA in the experiment which simulates the permutation problem artificially. Moreover, we revealed that the word accuracy of the proposed BSSA exceeds those of DS, ICA, and ICA+SS in the real environment.

6. REFERENCES

[1] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol.36, pp.287-314, 1994.
[2] S. Ikeda et al., Proc. Intern. Workshop on ICA and BSS, pp.365-371, 1999.
[3] H. Saruwatari et al., "Blind source separation combining independent component analysis and beamforming," EURASIP J. Applied Signal Proc., vol.2003, no.11, pp.1135-1146, 2003.
[4] H. Sawada et al., "A robust and precise method for solving the permutation problem of frequency-domain blind source separation," IEEE Trans. Speech and Audio Processing, vol.12, pp.530-538, 2004.
[5] Y. Takahashi et al., "Blind spatial subtraction array with independent component analysis for hands-free speech recognition," Proc. of IWAENC, 2006.
[6] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoustics, Speech, Signal Proc., vol.ASSP-27, no.2, pp.113-120, 1979.
[7] A. Lee et al., "Julius - An open source real-time large vocabulary recognition engine," Proc. Eurospeech, pp.1691-1694, 2001.