Robust digital watermarks for audio signal.


Akira Nakayama*, Jinlin Lu**, Satoshi Nakamura**, Kiyohiro Shikano**

*NTT Cyber Space Laboratories, 3-9-11, Midori-cho, Musashino, 180-8585 Japan
**Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0101, Japan

ABSTRACT

Digital watermarking algorithms have been studied actively in recent years, and for audio it is essential that the watermarks be inaudible. One such algorithm, proposed by Boney et al., is based on the MPEG psychoacoustic model. The Boney et al. algorithm uses a PN sequence as the watermark, which is passed through a zero-pole filter. In this paper, we introduce temporal masking in addition to simultaneous masking, and we improve the Boney et al. watermarking algorithm (the version Boney et al. call the "First Stage") with respect to the masking approximation filter and the bandwidth used for embedding the watermarks. Our approximation filter, obtained with a cepstral analysis/synthesis method, fits the masking threshold much better than that of Boney et al., and the bandwidth for embedding the watermarks is restricted to below 12 kHz to improve robustness against MPEG codecs. We also introduce a self-clocking mechanism to detect the watermark automatically. The Boney et al. algorithm and ours are compared using a watermark detection algorithm and evaluated by objective segmental SNR and by subjective double-blind triple-stimulus tests, using several kinds of music sources. Both evaluations show the following advantages for our watermarking algorithm. (1) The audio signal with embedded watermarks shows no degradation, and the watermarks are inaudible. (2) Our watermark is superior under noise attack, low-pass filtering, resampling, MPEG encoding/decoding, and DA/AD conversion with analogue wire transmission; it can be detected in almost all cases. (3) Our watermarking algorithm is simpler than the "Full Watermark Generator" proposed by Boney et al. Moreover, our watermark is inaudible and its robustness is as high as that of the "Full Watermark Generator".

1. INTRODUCTION

As the Internet and digital broadcasting grow, the amount of digital audio information increases. Digital information can be distributed rapidly to the general public through the Internet. As a result, copyright is widely infringed. In recent years, digital watermarking technologies have been developed. A digital watermark is a deterrent to people who would make and distribute illegal copies. For this purpose, the watermark needs to satisfy the requirements described below [1]. The watermark should: be inaudible; be robust to attacks by pirates, to audio compression, and to signal processing operations; and have enough room for the embedded data.

Several techniques have been developed. These techniques exploit phase alteration [2], spread spectrum [3], and the frequency masking characteristic of the human auditory system [4] to embed data. These reports contain no exhaustive quality evaluation of the watermarked audio signals.

The method proposed by Boney et al., based on the frequency masking mentioned above [4], considers simultaneous masking only. If characteristics of the human auditory system other than frequency masking can be exploited, the watermark can be made more robust. Hence, we add to this method a temporal masking model based on psychoacoustic experiments.

2. ALGORITHM

2.1. Embedding Algorithm

The watermark is generated by filtering a PN sequence. The filter approximates the frequency masking characteristics, which are computed with MPEG psychoacoustic model 1.
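As a rough illustration of the idea in Section 2.1, the following Python sketch generates a ±1 PN sequence from a seed and shapes its spectrum with a per-bin gain that stands in for the masking-threshold filter. The function names, the `gain` argument, the seeded random generator used as the PN source, and the frequency-domain shaping are all assumptions made for this illustration; the paper fits a filter to the threshold computed by MPEG psychoacoustic model 1, which is not reproduced here. The 12 kHz band limit and the 48 kHz sampling rate are the values quoted in the paper.

```python
import cmath
import math
import random


def pn_sequence(length, seed):
    """Return a pseudo-noise sequence of +1.0/-1.0 values; the seed acts as the key."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for one short frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def shape_pn(pn, gain, fs=48000.0, f_max=12000.0):
    """Scale bin k of the PN spectrum by gain[k] and zero every bin above f_max.

    gain holds one non-negative value per bin 0..n/2; it is a hypothetical
    stand-in for the response of the masking-threshold filter.
    """
    n = len(pn)
    spectrum = dft(pn)
    shaped = [0j] * n
    for k in range(n // 2 + 1):
        g = gain[k] if k * fs / n <= f_max else 0.0
        shaped[k] = spectrum[k] * g
        if 0 < k < n - k:
            # Scale the mirror bin identically so the output stays real-valued.
            shaped[n - k] = spectrum[n - k] * g
    return idft(shaped)


# One 512-sample frame with a flat (placeholder) gain below 12 kHz.
watermark_frame = shape_pn(pn_sequence(512, seed=1234), gain=[0.01] * 257)
```

In a fuller implementation the flat placeholder gain would be replaced, frame by frame, by the filter response fitted to the masking threshold.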

Figure 1 shows our modification of the Boney et al. algorithm.

Figure 1: The watermark algorithm.

1. Each frame of the signal (512 samples) is weighted with a window, and the power spectrum of the weighted signal is calculated.
2. The masking threshold of the signal is calculated.
3. An approximation filter obtained with the cepstral analysis/synthesis method is fitted to the masking threshold, and the bandwidth for embedding the watermarks is restricted to below 12 kHz to improve robustness against MPEG codecs.
4. A PN sequence is filtered with the approximation filter.
5. The filtered PN sequence is weighted in the time domain with the envelope of the signal.
6. The watermark is quantized and then added to the original audio signal.

Figure 2 shows an example of the masking threshold, the zero-pole filter, and the proposed filter.

Figure 2: Masking threshold and approximation filters (curves: masking threshold, zero-pole filter, cepstral filter; horizontal axis: frequency [kHz]).

2.2. Detection algorithm

For detection, we use the correlation function between the original watermark and the watermark calculated from the received signal. Let y(t), s(t), and w(t) denote the audio signal with the embedded watermark, the original signal, and the watermark, respectively. The detection algorithm is as follows.

1. x(t) = y(t) - s(t) is calculated. x(t) contains the watermark shaped by the filter that approximates the masking threshold, together with quantization error and noise, and it may have been damaged by attackers.
2. The correlation function between x(t) and w(t) is calculated. When the maximum value of the correlation exceeds the threshold, we decide that the audio signal y(t) contains the watermark w(t).

3. TEMPORAL MASKING MODEL

Simultaneous masking describes situations where the masker and the maskee occur at the same time. Masking can also occur when a signal is presented after the masker; this is called forward masking.
The amount of forward masking decreases, and the masking pattern spreads along the frequency axis, as the interval between the masker and the maskee increases [5]. Hence, in our research, we assume that past masking patterns also affect the present masking. Temporal masking can be expressed as follows.

$$
M_t(\omega, t) = M(\omega, t) + e^{-1/\tau} M(\omega, t-1) \otimes S(1, \omega) + e^{-2/\tau} M(\omega, t-2) \otimes S(2, \omega) + \cdots + e^{-k/\tau} M(\omega, t-k) \otimes S(k, \omega) + \cdots \tag{1}
$$

where M is the amount of simultaneous masking, ω is the frequency, t is the frame number, M_t is the amount of masking when temporal masking is taken into account, τ is a constant representing the decay of the effect of previous masking, k is the interval in time (in frames), and S is a function representing the spread. In our research, a Hamming window is used for the function S(k, ω):

$$
S(k, \omega) = \frac{0.54 + 0.46 \cos\left(\dfrac{\pi \omega}{\nu k + N}\right)}{\displaystyle\sum_{\omega' = -(\nu k + N)}^{\nu k + N} \left[ 0.54 + 0.46 \cos\left(\dfrac{\pi \omega'}{\nu k + N}\right) \right]} \tag{2}
$$

where N is a constant representing the initial spread, and ν is a constant that determines how much the function S spreads the masking pattern as the interval k increases. ω ranges over -(νk + N) ≤ ω ≤ νk + N.
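Equations (1) and (2) can be sketched in Python as follows. This is a minimal illustration under several assumptions that the text does not fix: the operator between M and S in Eq. (1) is taken to be convolution along the frequency axis, the masking values are treated as linear quantities indexed by frequency bin, S is the Hamming-shaped kernel of half-width νk + N normalized to unit sum, ν and N are integers with νk + N ≥ 1, and the open-ended sum in Eq. (1) is truncated after `max_lag` past frames. The function names and the parameter `max_lag` are hypothetical.

```python
import math


def spread(k, nu, n0):
    """Hamming-shaped kernel S(k, .) of half-width nu*k + n0, normalized to sum to 1."""
    half = nu * k + n0
    raw = [0.54 + 0.46 * math.cos(math.pi * w / half) for w in range(-half, half + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def convolve_same(values, kernel):
    """Convolve along the frequency axis, keeping the input length.

    Bins outside the spectrum are treated as zero (an assumption).
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(values)):
        acc = 0.0
        for j, s in enumerate(kernel):
            idx = i - (j - half)
            if 0 <= idx < len(values):
                acc += values[idx] * s
        out.append(acc)
    return out


def temporal_masking(frames, tau, nu, n0, max_lag):
    """Evaluate Eq. (1) for the newest frame.

    frames: list of simultaneous-masking spectra M(., t), oldest first.
    Returns M_t(., t) for the last frame, using at most max_lag past frames.
    """
    total = list(frames[-1])
    for k in range(1, max_lag + 1):
        if k >= len(frames):
            break  # no older frame available
        decay = math.exp(-k / tau)
        spread_past = convolve_same(frames[-1 - k], spread(k, nu, n0))
        total = [a + decay * b for a, b in zip(total, spread_past)]
    return total
```

Because the kernel sums to one, each past frame contributes its total masking energy scaled by exp(-k/τ), spread over a band that widens with k (apart from energy that falls off the ends of the spectrum).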

We add the absolute masking threshold defined by the MPEG psychoacoustic model to M_t, and thereby obtain the masking threshold mentioned in Section 2.1.

4. WATERMARK DETECTION

4.1. Experimental conditions

We performed several watermark detection experiments. We used segments of four different pieces as test materials: a viola piece, a castanet piece, a piano piece, and a song. Each segment is about 10 seconds long, and all segments are sampled at 48 kHz. Watermarks were embedded in the segments with the Boney et al. method and with the improved method (that is, including temporal masking and the filter processing). We report the results of both methods in Section 4.2, and only those of the improved method elsewhere.

4.2. Attack of MPEG encoding/decoding

We encoded and decoded the segments with MPEG (Layer 1, psychoacoustic model 1, 192 kbit/s) and then detected the watermark, because an audio signal is likely to encounter this kind of encoding/decoding system on its transmission path. This kind of perceptual coding affects the watermark signal considerably. Table 1 shows the results of detection from a single frame. With the original method, the MPEG encoding/decoding process deletes the watermarks except for those embedded in the background noise of the segments.

Figure 3: Watermark detection with additive noise distortion (vertical axis: detection rate; horizontal axis: noise-to-masking level [dB]).

4.4. Attack of low-pass filtering

We assume that the embedded signal is attacked by low-pass filtering, and then detect the watermark. The cutoff frequency of the LPF was set to 6, 8, 10, 12, 16, and 20 kHz. The embedded signal is passed through these filters; we then calculate the SNR and segmental SNR of the filtered signal and detect the watermark from it. Figure 4 shows the detection rate of the improved method for the low-pass-filtered signals. Figure 5 shows their SNR and segmental SNR.
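The text does not define SNR and segmental SNR. A common definition is sketched below in Python, with segmental SNR taken as the mean of the per-frame SNR values in dB. The 512-sample frame length matches the frame size of the embedding algorithm, but its use here, the skipping of frames with zero signal or zero error, and the function names are assumptions made for illustration.

```python
import math


def snr_db(reference, processed):
    """Global SNR in dB: reference energy over the energy of the difference."""
    signal = sum(r * r for r in reference)
    noise = sum((r - p) ** 2 for r, p in zip(reference, processed))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(reference, processed, frame_len=512):
    """Mean of per-frame SNR values in dB over non-overlapping frames."""
    values = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        proc = processed[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - p) ** 2 for r, p in zip(ref, proc))
        if signal == 0.0 or noise == 0.0:
            continue  # skip frames where the ratio is undefined (assumption)
        values.append(10.0 * math.log10(signal / noise))
    if not values:
        raise ValueError("no frame with non-zero signal and non-zero error")
    return sum(values) / len(values)
```

Unlike the global SNR, the segmental measure weights quiet and loud frames equally, so an error that is small relative to loud passages but large relative to quiet ones lowers it noticeably.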
When the cutoff frequency is higher than the maximum watermark frequency (here, 12 kHz), the detection rate is almost constant and equal to the result without low-pass filtering. The detection rate decreases when the cutoff frequency is lower than 12 kHz, but the SNR of the low-pass-filtered signal also becomes poor, so the signal no longer has the original quality. From the standpoint of quality, this case can be disregarded.

Figure 4: Watermark detection with LP filtering attack (vertical axis: detection rate; horizontal axis: LPF cutoff frequency [kHz]).

Table 1: Watermark detection with MPEG distortion

  Segment     Detection rate [%]
              Original    Improved
  Viola       46.2        84.7
  Castanet    51.7        72.7
  Piano       43.2        84.7
  Song        50.0        73.7

4.3. Additive noise attack

We added noise to the segments with embedded watermarks and then detected the watermarks. The noise is theoretically inaudible. It models either a deliberate attack or another PN sequence in a multiple-PN-sequence scheme. The results are similar for both methods. Figure 3 shows the detection rate of the improved method for several noise levels.

4.5. Attack of resampling

We down-sampled the embedded signal and then detected the watermark. Considering commonly used sampling rates, we set the down-sampling rates to 40, 36, 32, 24, 20, and 16 kHz.

The signals were down-sampled from the original 48 kHz sampling rate. Figure 6 shows the resulting detection rates.

Figure 5: SNR and segmental SNR of the LP-filtered signal (horizontal axis: LPF cutoff frequency [kHz]).

Figure 6: Watermark detection with resampling attack (vertical axis: detection rate; horizontal axis: down-sampling frequency [kHz]).

4.6. Redundancy

We passed the signal through DA/AD conversion with analogue wire transmission and then detected the watermark. To check the redundancy of this system, we embedded the same watermark in several consecutive frames and then detected it. Table 2 shows the results for MPEG encoding/decoding (64 kbps) and for DA/AD conversion with a redundancy of 1, 10, 20, and 40 frames. As the number of frames carrying the same watermark increases, the detection rate increases. For DA/AD conversion, a 100% detection rate was obtained with only 10 frames.

Table 2: Watermark detection with MPEG distortion and DA/AD conversion (redundancy)

  Frames (time)    Detection rate [%]
                   MPEG     DA/AD
  1 (10 ms)        77.4     62.7
  10 (107 ms)      82.8     100
  20 (213 ms)      90.6     100
  40 (427 ms)      93.8     100

(For the 1-frame row, the source does not make clear which of the two values belongs to which column; they are listed in the order in which they appear.)

5. QUALITY TEST

Each of thirteen subjects completed 48 trials of a double-blind triple-stimulus test. Table 3 shows the results. They indicate that the watermarking scheme introduces no perceptual distortion into the original signal. Except for the piano segment, the differences in preference score are not significant at the -30 dB level. This means that the subjects cannot distinguish the original segments from the watermarked segments.

Table 3: Preference score [%]. Columns in the source: original method (-20 dB, -30 dB) and improved method (-20 dB, -30 dB). Values are listed per segment in the order in which they appear; the column alignment is not recoverable for every row. * indicates a significant difference between the original and embedded sources.

  Viola       -0.59*   -0.03    -0.46*   -0.01
  Castanet    -0.75    -0.38*   -0.02    -0.18
  Piano       -0.36    -0.76    -0.69    -0.36   (all four marked *)
  Song        +0.09    +0.16    +0.19    +0.16

6. CONCLUSIONS

We added a temporal masking model and some filter processing to the Boney et al. method.
Both methods are robust to the noise attack. The improved method is more robust to MPEG encoding/decoding, and it is also effective against DA/AD conversion, filtering, and other attacks. The results of the subjective tests show that both methods introduce no perceptual distortion into the original signals.

7. REFERENCES

[1] I. Cox, J. Kilian, T. Leighton and T. Shamoon, "Secure Spread Spectrum Watermarking for Multimedia," Tech. Rep. 95-10, NEC Research Institute, 1995.

[2] Y. Yardimci, A.E. Cetin and R. Ansari, "Data Hiding in Speech Using Phase Coding," ESCA, Eurospeech97, Greece, pp. 1679-1682 (1997).

[3] M. Iwakiri and K. Matsui, "Watermarking Technique for High Quality Audio Media," SCIS98-8.2.C (1998). (in Japanese)

[4] Laurence Boney, Ahmed H. Tewfik and Khaled N. Hamdy, "Digital Watermarks for Audio Signals," IEEE Intl. Conf. on Multimedia Computing and Systems, Hiroshima, pp. 473-480 (1996).

[5] E. Miyasaka, "Spatio-temporal characteristics of masking of brief test-tone pulses by a tone-burst with abrupt switching transients," J. Acoust. Soc. Jpn., 39, pp. 614-623 (1983). (in Japanese)
