Source-Oriented Localization Control of Stereo Audio Signals Based on Blind Source Separation
value of the $i$-th step in the iterations, $\mu$ is the step-size parameter, and $\phi(\cdot)$ is the appropriate nonlinear vector function [8].

Fig. 1. Separation step based on conventional ICA (FDICA+PB).

Since the criterion of independence does not specify the amplitude and order of signals, FDICA itself is insufficient as filter learning. Ambiguity of amplitude, the so-called scaling problem, randomizes spectral characteristics and results in distortion in the output signals. Ambiguity of order is known as the permutation problem, and without identifying the correspondence between the sources and separated outputs, broad-band estimation of the source signals cannot be obtained.

2.3. Projection Back

Here we discuss the solution of the scaling problem using projection back (PB) [9]. Under the assumption that the demixing matrix separates source components accurately, and permutation ambiguity is aligned by some means [7], $W(f)$ can be expressed as follows:

\[ W(f) = \operatorname{Diag}(C(f))\, A(f)^{-1}, \tag{4} \]

where $C(f) = [C_1(f), \ldots, C_L(f)]^T$ is a constant vector which denotes the gain ambiguity of ICA, and $\operatorname{Diag}(\cdot)$ is the diagonal matrix whose diagonal components are the elements of the column vector $\cdot$. To compensate for the effect of $C(f)$, PB applies the inverse matrix of the demixing matrix $W(f)$,

\[ H(f) = W(f)^{-1}. \tag{5} \]

The inverse matrix $H(f)$ reconstructs the amplitude of the separated signals at each of the audio channels, and its output signal $Z_{ml}(f,t)$ of the $l$-th separated signal at the $m$-th channel can be given as follows:

\[ [Z_{ml}(f,t)]_{ml} = H(f)\,\operatorname{Diag}(Y(f,t)) = A(f)\,\operatorname{Diag}(S(f,t)). \tag{6} \]

Thus, the scaling problem is solved in the form of the reconstruction of the transfer system $A(f)$, referred to as projection back (PB) [9]. In general, $Z_{ml}(f,t)$ is obtained directly by the filter $H(f)W_l(f)$ instead of obtaining $Y(f,t)$, where $W_l(f)$ denotes the demixing matrix obtained by replacing all coefficients with zero except the $l$-th row of $W(f)$ (see Fig. 1).

3. ANALYSIS OF SOUND LOCALIZATION

In this section, we propose an analysis method of sound localization. Since sound localization is determined individually for each of the sound sources, analysis of sound localization is inextricably linked to source separation.

3.1. Extraction of Sound-Localization Information

From Eq. (1), the transfer system $A(f)$ contains all information on sound localization for each of the source signals in $S(f,t)$. Assuming $A(f)$ is known, the following processing is possible by using $A(f)$.

First, from Eq. (1), the source separation is entirely achieved as follows:

\[ Y(f,t) = A(f)^{-1} X(f,t) = S(f,t). \tag{7} \]

Second, the same sound localization that the $l$-th sound source has can be given to another monaural sound source $R(f,t)$ as

\[ \tilde{X}(f,t) = A(f)\,[Y_1(f,t), \ldots, Y_{l-1}(f,t), R(f,t), Y_{l+1}(f,t), \ldots, Y_L(f,t)]^T, \tag{8} \]

where $\tilde{X}(f,t)$ is the signal obtained by replacing $Y_l(f,t)$ with $R(f,t)$. We call such a substitution of the sources punch in.

Third, we can control sound localization individually and freely by modifying the relation among the channels of $A(f)$. Using the modified transfer system $\hat{A}(f)$, the localization-controlled audio signal $\hat{X}(f,t) = [\hat{X}_1(f,t), \ldots, \hat{X}_M(f,t)]^T$ is given by

\[ \hat{X}(f,t) = \hat{A}(f)\, Y(f,t). \tag{9} \]

However, the transfer system $A(f)$ is generally unknown in practical situations and should be estimated by some means.

3.2. Problem of Conventional Projection Back

From Eqs. (4) and (5), the inverse matrix $H(f)$ of $W(f)$ estimated by ICA is given as follows:

\[ H(f) = A(f)\,\operatorname{Diag}(C(f))^{-1}. \tag{10} \]

If the punch-in process were implemented by using $H(f)$, another monaural sound source $R(f,t)$ would be affected not only by $A(f)$ but also by $\operatorname{Diag}(C(f))^{-1}$. Therefore, $H(f)$ is inadequate as a substitute for the transfer system $A(f)$. However, it is very difficult to achieve the deconvolution without information on the source signals.

3.3. Proposed Algorithm

If the separation is achieved without distortion, its inverse filter plays only the role of reconstructing localization. Then the inverse filter can be used as an approximation of the transfer system $A(f)$, and can achieve punch in without distorting the substituted source. Thus our strategy is to divide the separation process into two steps: a separation step without distortion and a localization reconstruction step (see Fig. 2). The localization control can be achieved by modifying the localization reconstruction step.

Fig. 2. Procedure step of proposed method.

3.3.1. Monaural separation step without distortion

In this section, to separate the observed signals into each of the monaural source signals without distortion, we obtain the demixing filter which intentionally scales each of its separated signals to an average value of the channels. It is easy to obtain the average value of the channels with respect to each sound source at the audio channels by using PB. Furthermore, it can be said that the average value of the channels is a monaural signal with little distortion. By using Eq. (6), the channel-averaged source estimation is given by

\[ \left[ \frac{1}{M}\sum_{m=1}^{M} Z_{m1}(f,t), \ldots, \frac{1}{M}\sum_{m=1}^{M} Z_{mL}(f,t) \right]^T = \frac{1}{M}\operatorname{Diag}\!\big([H(f)]^T [\underbrace{1, \ldots, 1}_{M}]^T\big)\, W(f)\, X(f,t). \tag{11} \]

Therefore, the demixing filter $W_s(f)$ is defined as follows:

\[ W_s(f) = \frac{1}{M}\operatorname{Diag}\!\big([H(f)]^T [1, \ldots, 1]^T\big)\, W(f). \tag{12} \]

Thus, by using $W_s(f)$, the average value of the channels with respect to each sound source, $U(f,t) = [U_1(f,t), \ldots, U_L(f,t)]^T$, is given by

\[ U(f,t) = W_s(f)\, X(f,t). \tag{13} \]

3.3.2. Localization reconstruction step

Here, $H_s(f) = W_s(f)^{-1}$ is defined as the localization reconstruction filter. This filter only takes charge of reconstructing sound localization to the separated signal $U(f,t)$.
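The two-step decomposition can be checked numerically for a single frequency bin. The following is a minimal sketch, assuming $M = L = 2$, an idealized ICA solution of the form of Eq. (4), and hypothetical toy values for $A$, $C$, and $S$; the helper names (`inv2`, `matmul`, `matvec`, `diag`, `two_step_filters`) are illustrative and not taken from the paper.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec(p, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [sum(p[i][k] * v[k] for k in range(2)) for i in range(2)]


def diag(v):
    """2x2 diagonal matrix built from a length-2 vector."""
    return [[v[0], 0.0], [0.0, v[1]]]


def two_step_filters(W):
    """Return (W_s, H_s) for one frequency bin with M = L = 2."""
    M = 2
    H = inv2(W)                                        # Eq. (5)
    col_sums = [H[0][l] + H[1][l] for l in range(2)]   # [H]^T [1, 1]^T
    W_s = matmul(diag([c / M for c in col_sums]), W)   # Eq. (12)
    H_s = inv2(W_s)                                    # localization reconstruction filter
    return W_s, H_s


# Hypothetical single-bin values, chosen only for illustration.
A = [[1.0 + 0.2j, 0.4 - 0.1j], [0.3 + 0.1j, 0.9 - 0.3j]]   # transfer system
C = [2.5 - 1.0j, 0.2 + 0.7j]                               # ICA gain ambiguity
S = [0.8 + 0.5j, -0.3 + 1.1j]                              # source spectra

W = matmul(diag(C), inv2(A))    # Eq. (4): idealized ICA solution
X = matvec(A, S)                # observed stereo signal
W_s, H_s = two_step_filters(W)
U = matvec(W_s, X)              # Eq. (13): channel-averaged sources
Z = matmul(H_s, diag(U))        # H_s applied to Diag(U)
```

In this toy setting each `U[l]` equals the channel average of the $l$-th column of `A` times `S[l]`, regardless of the value of `C`, and `Z` matches the PB output $A(f)\operatorname{Diag}(S(f,t))$ of Eq. (6).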
By applying $H_s(f)$ to $U(f,t)$, the output signals are equivalent to the output signals of PB as follows:

\[ H_s(f)\,\operatorname{Diag}(U(f,t)) = [Z_{ml}(f,t)]_{ml}. \tag{14} \]

This indicates that $H_s(f)$ reconstructs the inter-channel level and phase differences to the monaural separated signal $U(f,t)$, which has sound reverberation caused by the transfer system $A(f)$. Therefore, $H_s(f)$ can be approximated to play only the role of reconstructing sound localization of $U(f,t)$.

4. PROPOSED SOURCE-LOCALIZATION CONTROL

4.1. Motivation

In this section, by changing the inter-channel gain difference of the localization reconstruction filter $H_s(f)$, we control the direction of the virtual image of each source; the configuration is shown in Fig. 3. The inter-channel gain difference between the left and right channels of $H_s(f)$ concerned with each separated signal is nearly in one-to-one correspondence with the direction of the source. Thus, by modifying the inter-channel gain difference between the left and right channels of $H_s(f)$, the direction of each separated signal can be controlled.

Fig. 3. Configuration of sound-localization control.

In general, the number of sources must be estimated in advance in BSS. Additionally, high-quality separation is difficult with many sources. Nevertheless, the proposed method can deal with an arbitrary number of sources for the following reasons. First, since the proposed control of localization is a simple conversion of the inter-channel gain difference and the explicit identification of the sources is unnecessary, we need not solve the permutation, which is difficult to solve with an unknown or large number of sources. Second, as discussed in the following section, two-input two-output ICA can sufficiently analyze the localization of a stereo signal consisting of an arbitrary number of sources.

4.2. Behavior of Localization Analysis with Many Sources

In this section, we discuss the behavior of two-input two-output ICA for a stereo signal consisting of many sources. Assuming sparseness among sources [10], it can be expected that the number of dominant sources often decreases in each narrow subband. Sparseness among sources is the assumption that the magnitudes of the sources are distributed sparsely in the time-frequency domain and no two dominant source components share the same time-frequency grid.

First, in the frequency bin where the number of dominant sources is at most two throughout all the frames, the analysis of sound localization can be achieved successfully as discussed in Sect. 3.3. Next, in the frequency bin where more than two dominant sources exist, ICA separates two dominant sources to maximize the difference in statistical behaviors between the separated signals. As a result, ICA estimates two clusters of sources, and the localization reconstruction filter plays the role of reconstructing sound localization to the separated monaural source clusters. Thus, ICA can sufficiently analyze sound localization information of stereo signals consisting of more than two sources.

4.3. Algorithm

By changing the inter-channel gain difference between the left and right channels of $H_s(f)$ with its total power maintained, only the direction can be controlled without affecting the perception of distance. Here, sound-localization control of the individual sources is achieved approximately by converting the inter-channel gain difference of each separated signal with some function as

\[ \frac{|\hat{H}_{s1k}(f)|}{|\hat{H}_{s2k}(f)|} = F\!\left(\frac{|H_{s1k}(f)|}{|H_{s2k}(f)|}\right), \tag{15} \]

where $H_{smk}(f)$ denotes the unprocessed coefficient of the localization reconstruction filter concerned with the $k$-th separated signal at the $m$-th channel, $\hat{H}_{smk}(f)$ denotes its modified version, and $F(\cdot)$ is an arbitrary function to modify the inter-channel gain difference. Various kinds of control are possible according to the design of this function $F(\cdot)$. Using $H_{smk}(f)$, the modified coefficient $\hat{H}_{smk}(f)$ can be written as

\[ \hat{H}_{smk}(f) = \sqrt{\frac{\sum_{m'} |H_{sm'k}(f)|^2}{F\!\left(\dfrac{|H_{s1k}(f)|}{|H_{s2k}(f)|}\right)^{2(-1)^m} + 1}}\; \frac{H_{smk}(f)}{|H_{smk}(f)|} \quad \text{for } m = 1, 2. \tag{16} \]

Using $\hat{H}_s(f) = [\hat{H}_{smk}(f)]_{mk}$ obtained above, the signal in which the direction of each sound source is controlled, $X_{\mathrm{prop}}(f,t) = [X_{1\mathrm{prop}}(f,t), X_{2\mathrm{prop}}(f,t)]^T$, can be given as

\[ X_{\mathrm{prop}}(f,t) = \hat{H}_s(f)\, U(f,t). \tag{17} \]

5. EVALUATION EXPERIMENT

5.1. Experimental Condition

In this section, we verify the efficiency of the proposed analysis and processing of the localization information using ICA by comparing the performance of the proposed method and competitive methods. The comparison is conducted in both subjective and objective evaluations. In this experiment, to simplify the discussion, we used the following gain-difference conversion function in Eq. (15) to control the range of the localized directions:

\[ F\!\left(\frac{|H_{s1k}(f)|}{|H_{s2k}(f)|}\right) = \left(\frac{|H_{s1k}(f)|}{|H_{s2k}(f)|}\right)^{\alpha}. \tag{18} \]

With this function, the gain difference is converted in proportion to $\alpha$ in the log domain. Here we describe two competitive methods.

Competitive method 1: This method is a control of localization based on fixed filtering. In this method, the inter-channel averaged level difference of the analysis frames of the stereo channels $X_1(f,t)$, $X_2(f,t)$ is modified to its $\alpha$-th power.

Competitive method 2: This method is a control of localization based on time-varying filtering. In this method, the inter-channel level difference of the stereo channels $X_1(f,t)$, $X_2(f,t)$ is modified to its $\alpha$-th power without changing the total power of the channels in each of the time-frequency grids.

In both the subjective and objective evaluations, we used six stereo recordings of music. Each of the stereo signals consists of three instruments, and each panned stereo signal of each source track is available separately and is used in the evaluation of the signal-to-noise ratio (SNR). All of them were recorded and edited by professional musicians. They were recorded at a sampling frequency of 44.1 kHz with 16-bit quantization. For each of the stereo signals, we made six processed signals by all three methods with two settings of the parameter, i.e., setting $\alpha = 10$ to spread the width of the spatial image and setting $\alpha = 1/10$ to narrow the width. The length of the filter is 1024 taps.

5.2. Objective Evaluation

We compare the controllability of the conventional and proposed methods in the objective evaluation. By filtering the stereo signal of the separated source track $s_{lm}(n)$ in each of the methods, we obtain the processed stereo signal $\hat{s}_{lm}(n)$ of each source, where $l$ denotes the index of the sources, $m = 1, 2$ denotes the index of the stereo channels, and $n$ is the index of samples.
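For the proposed method, the filtering mentioned above is the gain-difference conversion of Eqs. (15), (16), and (18). A minimal sketch of that conversion for one separated signal in one frequency bin is given below. It assumes nonzero coefficients, a moderate gain ratio (so that the power function does not overflow), and a total-power normalization written directly from the requirement that the summed power of the two channels be maintained; the function name `convert_gain_difference` and the sample values are hypothetical.

```python
import cmath
import math


def convert_gain_difference(h1, h2, alpha):
    """Modify the left/right gain ratio of one separated signal in one bin.

    h1, h2: complex coefficients H_s1k(f), H_s2k(f), both assumed nonzero.
    alpha:  exponent of the conversion function F(r) = r ** alpha, Eq. (18).
    """
    ratio = abs(h1) / abs(h2)
    target = ratio ** alpha                     # Eq. (15) with F from Eq. (18)
    power = abs(h1) ** 2 + abs(h2) ** 2         # total power to be maintained
    mag2 = math.sqrt(power / (target ** 2 + 1.0))
    mag1 = target * mag2
    # Keep the original phases so that only the level difference changes.
    return mag1 * h1 / abs(h1), mag2 * h2 / abs(h2)


# Hypothetical coefficients for illustration.
h1 = cmath.rect(0.8, 0.3)
h2 = cmath.rect(0.5, -1.0)
wide = convert_gain_difference(h1, h2, 2.0)     # alpha > 1 enlarges the difference
narrow = convert_gain_difference(h1, h2, 0.5)   # alpha < 1 shrinks it
```

With `alpha = 1` the coefficients are returned unchanged, which gives a quick sanity check of the normalization.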
In addition, by modifying the amplitude of each separated track in each channel, the target processed signal $t_{lm}(n)$ is obtained. As an evaluation score, we used the SNR of each source, which evaluates the power ratio of the target to the processing error, given by

\[ \mathrm{SNR}_l = 10 \log_{10} \frac{\sum_n \left\{ |t_{l1}(n)|^2 + |t_{l2}(n)|^2 \right\}}{\sum_n \left\{ |t_{l1}(n) - \hat{s}_{l1}(n)|^2 + |t_{l2}(n) - \hat{s}_{l2}(n)|^2 \right\}}. \tag{19} \]

We evaluated the averaged SNR of the sources. The result of the objective evaluation is shown in Fig. 4. In both parameter settings, the proposed method shows the best performance. In contrast, the performance of the conventional methods changes depending on the parameters. Thus the proposed method can achieve stable controllability of localization for either parameter setting.

Fig. 4. Result of objective evaluation (proposed method, competitive method 1, competitive method 2; α = 10 and α = 1/10).

5.3. Subjective Evaluation

We evaluated the ability of desired control from the viewpoints of source localization ability and sound quality in the subjective evaluation. In the evaluation of localization, two stimuli selected from different methods are presented in random order, and the subjects select the one that better suits the purpose of the processing. In the evaluation of sound quality, the processed signals are presented in random order followed by the presentation of the unprocessed signals, and the subjects evaluate the degradation of the sound quality. The stimuli are given with headphones. The subjects consisted of eight males and one female. Table 1 shows the rating scheme.

Table 1. Rating scheme.
Score   Impairment
0       Imperceptible
-1      Perceptible but not annoying
-2      Slightly annoying
-3      Annoying
-4      Very annoying

We show the results of the subjective evaluation in Fig. 5.

Fig. 5. Results of subjective evaluation: (a) preference score of the controllability, (b) sound-quality score, for α = 10 and α = 1/10, with 95% confidence intervals.

The filter design of competitive method 1, with a single fixed filter coefficient in a frequency bin, assumes the existence of only a single source in a frequency bin through all the analysis frames. Thus, in the frequency subbands where multiple sources exist, this method cannot modify the inter-channel level difference of each of the sources separately. In addition, the application of the single-channel filter to each of the channels causes coloration, which degrades the quality.

The time-varying filtering of competitive method 2 assumes the existence of only a single source in each time-frequency grid, which is often satisfied. However, in the time-frequency grids where the assumption is not satisfied, this method causes musical noise similar to that of the Wiener filter and time-frequency binary masking [11], and the degradation of quality is more significant than in competitive method 1.

In contrast, as discussed in Sect. 4.2, the proposed method has a mechanism to analyze the localization information sufficiently even when the sparseness assumption does not hold. Thus the proposed method can control sound localization without degrading sound quality. The proposed method shows the best performance in controllability in both settings, and the degradation of sound quality is not significant. As a result, it is ascertained that the proposed method can sufficiently control sound localization of stereo audio signals with multiple sources.

6. CONCLUSION

In this paper, first, we proposed a localization information analysis method with low distortion. Next, we proposed a localization control method for stereo audio signals consisting of multiple sources. The efficacy of the proposed method is ascertained in the objective and subjective evaluations.

The processing of the proposed method and the punch in described in Sec. 3.1 are demonstrated at the following URL.

http://spalab.naist.jp/database/Demo/slc/

7. REFERENCES

[1] J. Herre, S. Disch, J. Hilpert, and O. Hellmuth, "From SAC to SAOC - Recent developments in parametric coding of spatial audio," Proc. AES 22nd UK Conf., 2007.
[2] C. Faller and F. Baumgarte, "Binaural Cue Coding-Part II: Schemes and Applications," IEEE Trans. Speech and Audio Process., vol. 11, no. 6, pp. 520-531, 2003.
[3] J. Blauert, Spatial Hearing, MIT Press, Cambridge, MA, 1997.
[4] O. Gillet and G. Richard, "Extraction and remixing of drum tracks from polyphonic music signals," Proc. WASPAA, pp. 315-318, 2005.
[5] C. Avendano, "Frequency-domain source identification and manipulation in stereo mixes for enhancement, suppression and re-panning applications," Proc. WASPAA, pp. 55-58, 2003.
[6] P. Comon, "Independent component analysis-A new concept?," Signal Process., vol. 36, pp. 287-314, 1994.
[7] H. Saruwatari, S. Kurita, K. Takeda, F. Itakura, T. Nishikawa, and K. Shikano, "Blind source separation combining independent component analysis and beamforming," EURASIP Journal on Applied Signal Processing, vol. 2003, no. 11, pp. 1135-1146, 2003.
[8] H. Sawada, R. Mukai, S. Araki, and S. Makino, "Polar coordinate based nonlinear function for frequency domain blind source separation," IEICE Trans. Fundam., vol. E86-A, no. 3, pp. 590-596, 2003.
[9] N. Murata and S. Ikeda, "An on-line algorithm for blind source separation on speech signals," Proc. NOLTA '98, pp. 923-926, 1998.
[10] P. Bofill, "Underdetermined blind separation of delayed sound sources in the frequency domain," Neurocomputing, vol. 55, pp. 627-641, 2003.
[11] S. Ben Jebara, "A perceptual approach to reduce musical noise phenomenon with Wiener denoising technique," Proc. ICASSP, pp. III-49-III-52, 2006.