Timing Optimization of Filter Replacement in Compressive Coding for Stereo Audio Signals Using Independent Component Analysis


ICSP2008 Proceedings. 978-1-4244-2179-4/08/$25.00 ©2008 IEEE.

Keisuke Masatoki (1), Shigeki Miyabe* (1), Yu Takahashi (1), Hiroshi Saruwatari (1), Kiyohiro Shikano (1), and Toshiyuki Nomura (2)
(1) Nara Institute of Science and Technology
(2) Common Platform Software Research Laboratories, NEC Corporation

* Research Fellow of the Japan Society for the Promotion of Science. This work was partly supported by MIC Strategic Information and Communications R&D Promotion Programme in Japan.

Abstract

In this paper we propose a timing optimization of the filter replacement in a compressive coding method for stereo audio signals using independent component analysis (ICA). We previously proposed a novel compressive coding method for stereo audio signals using ICA as a coding technique offering high sound quality at a low bit rate. However, the sound quality degrades if the separation filter is not replaced in time when the composition of the musical instruments changes. Therefore we newly propose an algorithm that optimizes the timing of separation filter replacement, and experimental results show that our algorithm can provide the best timing to replace the filter with less computation.

1. Introduction

In recent years, demand for efficient compressive coding of stereo or multi-channel audio signals has been increasing, amid the diversification of delivery systems and the spread of multi-channel systems, as typified by home theater systems. Since many traditional audio coding methods deal with only a single channel, their bit rates increase in proportion to the number of channels. Among the many recently proposed joint-coding techniques for multi-channel audio, binaural cue coding (BCC) [1] is the most attractive method, and the ISO/MPEG standardization group is discussing a next-generation audio standard based on BCC [2]. BCC represents multi-channel audio at a bit rate only slightly higher than that of a single channel: it transmits a single-channel audio signal called the sum signal together with low-bit-rate parameters characterizing human perception of source localization. In the decoder, multi-channel filters are designed to reproduce the parameterized features in each analysis frame, and such time-variant linear filters convert the sum signal into multi-channel signals. However, the filters designed from small-sized parameters cause fatal distortion of the spectrum and blur of source localization in the resultant reconstructed multi-channel signals.

In this paper, we propose a timing optimization algorithm for the separation filter in compressive coding of stereo audio signals using independent component analysis (ICA) [3]. We utilize ICA to extract the sparseness between stereo channels. However, sound quality degrades if we do not replace the separation filter in time when the composition of the musical instruments changes. This is because the separation filter we obtain is time-invariant, so source separation does not always achieve high quality. An automatic algorithm for replacing the separation filter remains an undeveloped issue. Therefore, in this paper we newly propose an optimization algorithm for separation filter replacement in compressive coding using ICA. Experimental results show that our algorithm can achieve the best replacement timing of the separation filter.

2. Our Previous Works: Selective Placement with ICA

2.1. Mixing process

In this section, we assume that the number of sound sources is $L$ and the number of audio channels is $M$, and we deal with the case of $L = M$. The source signals of the $L$ sources in the time-frequency domain are denoted by an $L$-dimensional vector $S(f,t) = [S_1(f,t), \ldots, S_L(f,t)]^{\mathrm{T}}$, where $f$ is the index of the frequency bin, $t$ is the index of the analysis frame, and superscript $\mathrm{T}$ denotes transposition of a vector/matrix. In addition, a linear time-invariant transfer system is denoted by an $M \times L$ mixing matrix $A(f) = [A_{ml}(f)]_{ml}$, where $A_{ml}(f)$ is the transfer function from the $l$-th source to the $m$-th channel, and $[x]_{ml}$ denotes the matrix that includes the element $x$ in the $m$-th row and the $l$-th column. Then, the observed signal $X(f,t) = [X_1(f,t), \ldots, X_M(f,t)]^{\mathrm{T}}$ in the time-frequency domain is written approximately as

$$X(f,t) = A(f)\,S(f,t). \tag{1}$$

2.2. Blind source separation

Although sparseness among musical instrument sources is generally high, sparseness is low between the channels of signals consisting of many instruments, because the channels are highly correlated with each other. We therefore use blind source separation (BSS) with frequency-domain independent component analysis (FDICA) [4] to separate the mixed signals into the individual instrument sources, which enhances the sparseness between the channels of the signals. First, we perform signal separation by using an $L \times M$ complex-valued separation matrix $W(f)$, optimized so that the output signals $Y(f,t)$ in the time-frequency domain become mutually statistically independent. Thus $Y(f,t)$ is given as

$$Y(f,t) = [Y_1(f,t), \ldots, Y_L(f,t)]^{\mathrm{T}} = W(f)\,X(f,t). \tag{2}$$

In addition, $W(f)$ can be optimized by the following iterative updating formula, so-called higher-order ICA (HO-ICA):

$$W^{[i+1]}(f) = \mu\left[I - \left\langle \Phi(Y(f,t))\,Y(f,t)^{\mathrm{H}} \right\rangle_t\right] W^{[i]}(f) + W^{[i]}(f), \tag{3}$$

where $I$ denotes the identity matrix, $\langle\cdot\rangle_t$ denotes the time-averaging operator, superscript $\mathrm{H}$ denotes complex conjugate transposition of a vector/matrix, $[i]$ is used to express the value of the $i$-th step in the iterations, and $\mu$ is the step-size parameter. Also, $\Phi(\cdot)$ is an appropriate nonlinear vector function. Next, we can obtain the $M$-channel image of each source by using projection back (PB) [5], which applies the inverse matrix of $W(f)$. Hence the restored source $Z_l(f,t)$ for $l = 1, \ldots, L$ is given as

$$Z_l(f,t) = W(f)^{-1}\,[0, \ldots, 0, Y_l(f,t), 0, \ldots, 0]^{\mathrm{T}}. \tag{4}$$

2.3. Selective placement (SP) [3]

The configuration of the SP coding algorithm is shown in Fig. 1. Here we compare the powers of the time-frequency grids between the two channels, whose sparseness has been raised by FDICA, and we perform encoding. First, we need to detect the grids that are non-sparse between the channels, so we check whether the power of the ICA outputs exceeds a threshold. For this we use the largest power component among the channels,

$$P_l(f,t) = \max_{m=1,2} |Z_{ml}(f,t)|^2, \quad l = 1, 2, \tag{5}$$

where $Z_{ml}(f,t)$ is the $m$-th component of $Z_l(f,t)$. Non-sparse grids are detected by setting a threshold $\mathrm{Th}$ as

$$\mathrm{Th} < P_1(f,t)\,P_2(f,t). \tag{6}$$

Thus a grid is judged to be sparse if (6) is not satisfied.

Figure 1. Configuration of our previous SP coding (transmitter and receiver).

The encoding process is as follows. If (6) holds, we set the flag of non-sparseness by setting

$$J^{\mathrm{SP}}(f,t) = 0, \tag{7}$$

and store both values of $Y(f,t)$ in $V^{\mathrm{SP}}(f,t)$ as

$$V^{\mathrm{SP}}(f,t) = Y(f,t). \tag{8}$$

If (6) does not hold, the grid is regarded as sparse, and we store the index of the dominant signal in $J^{\mathrm{SP}}(f,t)$ and only the signal with the larger power in $V^{\mathrm{SP}}(f,t)$, as

$$J^{\mathrm{SP}}(f,t) = \operatorname*{argmax}_{l=1,2} P_l(f,t), \tag{9}$$

$$V^{\mathrm{SP}}(f,t) = Y_{J^{\mathrm{SP}}(f,t)}(f,t). \tag{10}$$

In the decoding steps, we first obtain the estimate $\hat{Y}^{\mathrm{SP}}(f,t)$ of $Y(f,t)$ by allocating $V^{\mathrm{SP}}(f,t)$ correctly according to $J^{\mathrm{SP}}(f,t)$. If $J^{\mathrm{SP}}(f,t) = 1$ or $2$,

$$\hat{Y}_i^{\mathrm{SP}}(f,t) = \begin{cases} V^{\mathrm{SP}}(f,t) & \text{if } i = J^{\mathrm{SP}}(f,t), \\ 0 & \text{otherwise}. \end{cases} \tag{11}$$

On the other hand, if $J^{\mathrm{SP}}(f,t) = 0$,

$$\hat{Y}^{\mathrm{SP}}(f,t) = V^{\mathrm{SP}}(f,t). \tag{12}$$

3. Proposed Method

3.1. Motivation

SP realizes high compression efficiency by using ICA to extract the sparseness among sound sources and choosing only the dominant one. However, sparseness tends to be low when the performance of source separation degrades, so the compression efficiency worsens as the number of grids that must be transmitted in full increases. In fact, the separation filter $W(f)$ is time-invariant and no additional source separation is performed when the composition of musical instruments changes, so the deterioration of sound quality is a serious problem inherent in SP coding. If we conduct source separation of spatially time-variant audio signals by using a single separation filter $W(f)$, the performance degrades because the filter cannot adapt to the change of source composition.
Moreover, if we use multiple separation filters formed at regular intervals, the performance still degrades in the intervals where the composition of sources changes. In addition, several separation filters within the same interval lead to redundant transmission, because we transmit the same separation filter again and again.
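The selective placement rule of Sect. 2.3 determines how many values each time-frequency grid costs, which is why degraded separation hurts compression. The following is a minimal sketch of Eqs. (5)-(12) for $L = M = 2$, assuming NumPy arrays; the function names `sp_encode`, `sp_decode`, and `transmitted_values` and the array layouts are hypothetical and are not taken from the original implementation.

```python
import numpy as np

def sp_encode(Y, Z, th):
    """Selective placement encoder sketch for L = M = 2.

    Y  : complex array (2, F, T), ICA outputs Y_l(f, t)
    Z  : complex array (2, 2, F, T), Z[m, l] = Z_ml(f, t) after projection back
    th : threshold Th of Eq. (6)
    Returns J (F, T) and V (2, F, T); on sparse grids only V[0] carries data.
    """
    P = np.max(np.abs(Z) ** 2, axis=0)       # Eq. (5): P_l = max_m |Z_ml|^2
    non_sparse = th < P[0] * P[1]            # Eq. (6)
    dom = np.argmax(P, axis=0)               # 0-based index of the dominant source
    J = np.where(non_sparse, 0, dom + 1)     # Eq. (7) and Eq. (9)
    V = np.zeros_like(Y)
    V[:, non_sparse] = Y[:, non_sparse]      # Eq. (8): keep both values
    dominant = np.take_along_axis(Y, dom[None], axis=0)[0]
    V[0, ~non_sparse] = dominant[~non_sparse]  # Eq. (10): keep one value
    return J, V

def sp_decode(J, V):
    """Selective placement decoder sketch: rebuild the estimate of Y."""
    Y_hat = np.zeros_like(V)
    ns = J == 0
    Y_hat[:, ns] = V[:, ns]                  # Eq. (12)
    for l in (1, 2):
        sel = J == l
        Y_hat[l - 1, sel] = V[0, sel]        # Eq. (11)
    return Y_hat

def transmitted_values(J):
    """Number of complex values sent: 2 per non-sparse grid, 1 per sparse grid."""
    return int(np.sum(np.where(J == 0, 2, 1)))

# Toy example with W(f) = I, so that Z_ml = Y_l when m == l and 0 otherwise.
Y = np.array([[[3.0, 0.1, 2.0]], [[0.1, 2.0, 2.0]]], dtype=complex)
Z = np.zeros((2, 2) + Y.shape[1:], dtype=complex)
Z[0, 0], Z[1, 1] = Y[0], Y[1]
J, V = sp_encode(Y, Z, th=1.0)
Y_hat = sp_decode(J, V)
print(J, transmitted_values(J))
```

In this toy case the first two grids are sparse and cost one value each, while the third grid is non-sparse and costs two. Poorer separation pushes more grids into the second category.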

3.2. Timing optimization algorithm of separation filter replacement

To solve the problem in Sect. 3.1, we replace the separation filter in response to changes in the composition of instruments. Thus, we need to find automatically the optimal timing of replacing the separation filter according to those changes. Therefore we propose a timing optimization algorithm of replacement, which detects the optimal timing based on the distortion of the decoded signal. The configuration of the proposed algorithm is shown in Fig. 2.

Figure 2. Configuration of proposed algorithm.

First, we encode at all possible timings of replacement to detect the optimal timing of replacement. The decoded stereo signal $\hat{Z}_{T_k}(f,t_k)$ for one timing of replacement $T_k$ ($T_k \in t$) is denoted as

$$\hat{Z}_{T_k}(f,t_k) = W_{T_k}^{+}(f,t_k)\,\hat{Y}_{T_k}(f,t_k) \quad (t_k = T_{k-1}, \ldots, T_k), \tag{13}$$

where the decoded signal $\hat{Y}_{T_k}(f,t_k)$ is obtained by using the separation filter $W_{T_k}(f,t_k)$, which is obtained using ICA from the input signals $X(f,t_k)$, and $T_0 = 1$. Next, we calculate $\mathrm{SNR}_{T_k}$, the signal-to-noise ratio (SNR) for each timing of replacement, as

$$\mathrm{SNR}_{T_k} = 10 \log_{10} \frac{\sum_t \sum_f \|X(f,t)\|^2}{\sum_t \sum_f \|X(f,t) - \hat{Z}_{T_k}(f,t)\|^2}. \tag{14}$$

After calculating $\mathrm{SNR}_{T_k}$ for all timings of replacement $T_k$, the optimal timing of separation filter replacement $T_{\mathrm{opt}}$ is given as

$$T_{\mathrm{opt}} = \operatorname*{argmax}_{T_k} \left(\mathrm{SNR}_{T_k}\right). \tag{15}$$

The higher the $\mathrm{SNR}_{T_k}$ achieved, the better sparseness is satisfied.

Updating the separation filter $W(f)$ by FDICA in all switching intervals requires a huge amount of computation. Fortunately, our purpose is not high-performance source separation itself but detection of the filter replacement timing, so a limited source separation performance suffices to search for the optimal timing of the filter replacement. In the searching steps, we use faster ICA methods than HO-ICA (see (3)), e.g., closed-form 2nd-order ICA (SO-ICA) [6][7] or fastICA [8]. Both have a fast-convergence property and a certain level of separation performance.

3.3. Closed-form 2nd-order ICA

Closed-form SO-ICA was found by Tanaka [6], and its application to acoustic signals is now being studied by us [7]. This subsection briefly describes the overview of signal processing in the closed-form ICA. The strict proofs of the theorem are omitted owing to the limited space of the current manuscript.

First, we obtain the correlation matrices at different time points as

$$R_{t_i}(f) = \left\langle X(f,t)\,X(f,t)^{\mathrm{H}} \right\rangle_{t_i}, \tag{16}$$

where $\langle\cdot\rangle_{t_i}$ denotes the time-averaging operator over a specific time duration $t_i$, and $i = 1, 2, \ldots$ are the indices of the time-averaging blocks. Next, we apply the singular value decomposition (SVD) to a superposition of $R_{t_i}(f)$, which is represented as

$$\sum_i R_{t_i}(f) = U(f)\,\mathrm{diag}(\lambda_1, \lambda_2, \ldots)\,U(f)^{\mathrm{H}}, \tag{17}$$

where $\lambda_k$ are the eigenvalues, $\mathrm{diag}(\lambda_1, \ldots)$ denotes the diagonal matrix that includes the eigenvalues, and $U(f)$ is the matrix consisting of the eigenvectors. Then we obtain a full-rank decomposition for the pseudo-inverse of $\sum_i R_{t_i}(f)$ as follows:

$$\left[\sum_i R_{t_i}(f)\right]^{+} = L(f)\,L(f)^{\mathrm{H}}, \tag{18}$$

$$L(f) = U(f)\,\mathrm{diag}\!\left(\frac{1}{\sqrt{\lambda_1}}, \frac{1}{\sqrt{\lambda_2}}, \ldots\right). \tag{19}$$

If the cross-covariance among the sources $S(f,t)$ in $t_i$ is negligible, every $L(f)^{\mathrm{H}} R_{t_i}(f) L(f)$ for any $i$ shares the same eigenvectors, and this is given in SVD form as

$$L(f)^{\mathrm{H}} R_{t_i}(f) L(f) = T(f)\,\mathrm{diag}(\sigma_1(t_i), \sigma_2(t_i), \ldots)\,T(f)^{\mathrm{H}}, \tag{20}$$

where $\sigma_k(t_i)$ are the eigenvalues for a specific time block $t_i$, and $T(f)$ denotes the matrix consisting of the shared eigenvectors, which are independent of the time-block index $i$. Therefore, for any $i$, the simultaneous diagonalization of $R_{t_i}(f)$ can be achieved as follows:

$$T(f)^{\mathrm{H}} L(f)^{\mathrm{H}} R_{t_i}(f) L(f) T(f) = \mathrm{diag}(\sigma_1(t_i), \sigma_2(t_i), \ldots), \tag{21}$$

and this means that the optimal separation filter matrix in the 2nd-order sense is given by

$$W_{\mathrm{SO}}(f) = \left(L(f)\,T(f)\right)^{\mathrm{H}}. \tag{22}$$

The computational cost of the closed-form SO-ICA is very small. In fact, the whole computation in the closed-form solution is almost the same as that for 1 or 2 iterations of HO-ICA.

4. Experiments and results

4.1. Conditions of experiments

In this experiment we evaluate the performance of the timing optimization algorithm of the separation filter in compressive coding using ICA, and we use two stereo recordings of music. At first, in track 1 a flute is localized at the right and a guitar at the center.
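The search of Sect. 3.2 that this experiment evaluates (Eqs. (14) and (15)) can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions: it considers a single replacement, treats each candidate $T_k$ as splitting the signal into two intervals that are each coded with one time-invariant filter, and uses a toy rank-1 codec (`rank1_codec`) as a stand-in for the SP/ICA coder. The function names and the synthetic test signal are hypothetical.

```python
import numpy as np

def snr_db(X, Z):
    """Eq. (14): SNR in dB between the input X and the decoded signal Z."""
    num = np.sum(np.abs(X) ** 2)
    den = np.sum(np.abs(X - Z) ** 2)
    return 10.0 * np.log10(num / den)

def search_replacement_timing(X, candidates, codec):
    """Evaluate every candidate timing and return (T_opt, SNR per candidate).

    X          : array (M, F, T) of observed time-frequency frames
    candidates : iterable of frame indices T_k with 0 < T_k < T
    codec      : hypothetical callable mapping a segment (M, F, N) to its
                 decoded version obtained with one time-invariant filter
    """
    snrs = {}
    for Tk in candidates:
        Z = np.concatenate([codec(X[..., :Tk]), codec(X[..., Tk:])], axis=-1)
        snrs[Tk] = snr_db(X, Z)
    T_opt = max(snrs, key=snrs.get)          # Eq. (15)
    return T_opt, snrs

def rank1_codec(seg):
    """Toy stand-in codec (NOT the SP/ICA coder of the paper): in each
    frequency bin, keep only the dominant spatial component of the segment."""
    out = np.empty_like(seg)
    for f in range(seg.shape[1]):
        U, s, Vh = np.linalg.svd(seg[:, f, :], full_matrices=False)
        out[:, f, :] = s[0] * np.outer(U[:, 0], Vh[0])
    return out

# Synthetic stereo signal whose source jumps from channel 1 to channel 2
# at frame 50, plus weak noise (an assumption made for illustration only).
rng = np.random.default_rng(0)
F, T = 4, 100
s = rng.standard_normal((F, T))
X = np.zeros((2, F, T))
X[0, :, :50] = s[:, :50]
X[1, :, 50:] = s[:, 50:]
X += 1e-3 * rng.standard_normal(X.shape)

T_opt, snrs = search_replacement_timing(X, range(10, 100, 10), rank1_codec)
print(T_opt)
```

Because the codec is only a stand-in, the absolute SNR values carry no meaning here. The sketch only illustrates that the SNR peaks when the candidate timing coincides with the change of source composition.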

Figure 3. The compositions of sources at track 1 and track 2.

Track 1 then changes so that the flute lies between the right and the center while the guitar remains at the center (Fig. 3). In track 2, a flute is localized at the left and a guitar at the right at first. Track 2 then changes so that the flute is absent, the guitar is at the center, and both drums and a bass are at the center (Fig. 3). Both track 1 and track 2 were recorded and edited by professional musicians, and the composition of the instruments changes near the center of each signal. They are recorded at a sampling frequency of 44.1 kHz with 16-bit quantization. The length of the filter is 1024 taps. The window size is 1024 points with 90 overlap points (60-point Hanning, 30-point zeros). In this experiment we use three ICA techniques: HO-ICA with the identity matrix as the default matrix, SO-ICA, and fastICA, and we calculate $\mathrm{SNR}_{T_k}$ at each timing $T_k$, as given by (14).

Figure 4. The result of proposed searching optimal timing to replace at track 1 (horizontal axis: frame index n).

Figure 5. The result of proposed searching optimal timing to replace at track 2 (horizontal axis: frame index n).

4.2. Experimental results

We show two results of searching for the optimal timing: track 1, which has two sources, in Fig. 4, and track 2, which has four sources, in Fig. 5, where the shaded zone marks the neighborhood of the correct timing. In both tracks the composition of the sources changes near the center of the signal, and for both tracks all of the ICA techniques show a peak at the correct timing. In addition, SO-ICA and fastICA can be used for searching the optimal timing of the change in source composition, although they provide lower $\mathrm{SNR}_{T_k}$ than HO-ICA. Note that the computational efficiency of SO-ICA is remarkable: SO-ICA works with 4% of the computations of HO-ICA.

5. Conclusion

In this paper we proposed a timing optimization of the separation filter in a compressive coding method for stereo signals using ICA. Experimental results show that the proposed algorithm finds the optimal timing at which the composition of instruments localized in two directions changes. In the future, we need to apply the optimal timing algorithm to a larger number of music recordings from compact discs, and to develop a more efficient algorithm that handles more than three timing changes.

6. References

[1] C. Faller and F. Baumgarte, "Binaural cue coding-part II: schemes and applications," IEEE Trans. Speech and Audio Processing, vol. 11, pp. 520-531, 2003.
[2] J. Herre et al., "Spatial audio coding: next-generation efficient and compatible coding of multi-channel audio," 117th Conv. Aud. Eng. Soc., Preprint 6187, 2004.
[3] S. Miyabe et al., "Compressive coding of stereo audio signals extracting sparseness among sound sources with independent component analysis," Proc. WASPAA, pp. 331-334, 2007.
[4] P. Smaragdis, "Blind separation of convolved mixtures in the frequency domain," Neurocomputing, vol. 22, pp. 21-34, 1998.
[5] N. Murata and S. Ikeda, "An on-line algorithm for blind source separation," Proc. NOLTA, pp. 923-926, 1998.
[6] A. Tanaka et al., "Theoretical foundations of second-order-statistics-based blind source separation for non-stationary sources," Proc. ICASSP, pp. 600-603, 2006.
[7] K. Tachibana et al., "Efficient blind source separation combining closed-form second-order ICA and nonclosed-form higher-order ICA," Proc. ICASSP, vol. 1, pp. 45-48, 2007.
[8] A. Hyvärinen and E. Oja, "A fast fixed-point algorithm for independent component analysis," Neural Computation, vol. 9, pp. 1483-1492, 1997.
