Analysis of the Relationship between Prosodic Features of Fillers
and Its Forms or Occurrence Positions
Shizuka Nakamura
Ryosuke Nakanishi
Katsuya Takanashi
Tatsuya Kawahara
Graduate School of Informatics, Kyoto University
Abstract: Fillers contribute to listeners' ease of understanding and to turn-taking. However, knowledge about their prosodic features is insufficient, and these features have not yet been modeled. As a result, dialog systems currently lack the knowledge needed to generate natural and appropriate fillers. To clarify the prosodic features of fillers, this study analyzed their relationship with filler forms and occurrence positions. The forms examined were ‘Ano’ and ‘Eto’; the occurrence positions were defined by whether the filler occurred at a Dialog Act boundary and whether turn-taking took place. Duration, F0, and intensity were used as prosodic features. The analysis showed that prosodic features differ with the occurrence position even for fillers of the same form, and that similar prosodic features are found at the same occurrence position even across different forms.
1.
[1]
2.
∗ 606-8501 E-mail: [email protected]
Japanese Society for Artificial Intelligence (JSAI) SIG Technical Report SIG-SLUD-B506-03
[7] [2-4] [5] [6]
Count   [%]     [%]
195     50.4    63.7
 81     20.9      -
 63     16.3    20.6
 15      3.9     4.9
 33      8.5    10.8
387    100.0   100.0
Dialog Act (DA)
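The two percentage columns in the count table above appear to use different denominators: the first is taken over all 387 items, the second over the 306 items that remain once the row marked "-" is left out. This reading is an assumption inferred from the numbers themselves. A minimal Python sketch that reproduces both columns under that assumption (the function name `percent_columns` is illustrative only):

```python
def percent_columns(counts, excluded_index):
    """Return two percentage columns for a list of counts.

    The first column is each count's share of the grand total.
    The second column is each count's share of the total without the
    excluded row; the excluded row itself gets None (shown as "-").
    """
    total = sum(counts)
    reduced_total = total - counts[excluded_index]
    pct_all = [round(100 * c / total, 1) for c in counts]
    pct_reduced = [
        None if i == excluded_index else round(100 * c / reduced_total, 1)
        for i, c in enumerate(counts)
    ]
    return pct_all, pct_reduced


# Counts taken from the table; the second row is the one marked "-".
counts = [195, 81, 63, 15, 33]
pct_all, pct_reduced = percent_columns(counts, excluded_index=1)
print(pct_all)
print(pct_reduced)
```

Both computed columns agree with the tabulated percentages, which supports reading the final row (387, 100.0, 100.0) as the column totals.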
3.
[7] 1 30 1 10 5 50
4.
4.1.
1 [7] 1 8 1
4.2.
[7] DA [8] Long Utterance Unit (LUU) [9] LUU DA DA DA DA 606 156 25.7% 79 13.0%
4.3.
2 46.8% 29.1% 69.6% 18.9%
Table 2
Count   [%]     [%]
 77     49.4      -
 37     23.7    46.8
 23     14.7    29.1
 19     12.2    24.1
156    100.0   100.0

Count   [%]     [%]
185     68.4    69.6
 43     18.6    18.9
 14      6.1     6.2
  4      1.7      -
 12      5.1     5.3
231    100.0   100.0
1 2 3 2 50.0% 18.2% 45.6%
Table 3
Count   [%]     [%]
 65     74.7      -
 11     12.6    50.0
  4      4.6    18.2
  7      8.0    31.8
 87    100.0   100.0

Count   [%]     [%]
 26     37.7    45.6
 19     27.5    33.3
 12     17.4      -
 12     17.4    21.1
 69    100.0   100.0
33.3% 2 1 2 1 2 5.2
5.
5.1.
5.1.1.
… 2 … 1
        (1)           (2)           (3)           (4)           (5)           (6)
N       37    159     37    159     11    26      11    159     11    159     26    159
Mean    2.31  2.26    0.14  0.20    2.35  2.30    2.35  2.26    0.10  0.20    2.30  2.26
SD      0.06  0.11    0.14  0.16    0.03  0.06    0.03  0.11    0.05  0.16    0.06  0.11
p       < 0.001       < 0.05        < 0.01        < 0.001       < 0.001       < 0.01
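Section 5 appears to compare feature values between pairs of conditions with a t-test, but the variant used is not specified here. As a hedged illustration, the sketch below computes Welch's t statistic and its approximate degrees of freedom from summary statistics for the six comparisons (1) to (6) listed above. It assumes that the three numeric rows of that listing are the sample size, mean, and standard deviation of each group. The function name `welch_t` and the choice of Welch's unequal-variance form are assumptions, not the authors' stated procedure.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from the mean, standard deviation, and size of two groups."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# (mean1, sd1, n1, mean2, sd2, n2) for each comparison, read from the
# listing above under the assumed row interpretation.
comparisons = {
    "(1)": (2.31, 0.06, 37, 2.26, 0.11, 159),
    "(2)": (0.14, 0.14, 37, 0.20, 0.16, 159),
    "(3)": (2.35, 0.03, 11, 2.30, 0.06, 26),
    "(4)": (2.35, 0.03, 11, 2.26, 0.11, 159),
    "(5)": (0.10, 0.05, 11, 0.20, 0.16, 159),
    "(6)": (2.30, 0.06, 26, 2.26, 0.11, 159),
}

results = {label: welch_t(*stats) for label, stats in comparisons.items()}
for label, (t, df) in results.items():
    print(f"{label}: t = {t:.2f}, df = {df:.1f}")
```

Under these assumptions the sign of each t statistic matches the direction of the inequality given for comparisons (1) to (6) in Section 5.2, with |t| ranging from roughly 2.3 to 7.2.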
5.1.2.
TANDEM-STRAIGHT [10] XSX [11] 10 2 n
(equation in F0 with index n; symbols lost in extraction)
5.1.3.
TANDEM-STRAIGHT XSX
5.2.
t 4 4 (1) > (2) < (3) > (4) > (5) < (6) > 2 2 24.3 1 2
5.3.
t
6.
DA [6-7] DA [5]
JST ERATO
[1] http://pj.ninjal.ac.jp/corpus_center/csj/k-report-f/CSJ_rep.pdf, pp. 23-130 (2006).
[2] Vol. 108, pp. 74-92 (1995).
[3] M. Watanabe: Features and Roles of Filled Pauses in Speech Communication: A corpus-based study of spontaneous speech, Hitsuji Syobo Publishing (2009).
[4] (2010).
[5] Vol. 16, No. 3, pp. 106-107 (2012).
[6] SLUD-B505-30, pp. 114-119 (2016).
[7] SIG-SLUD-B506, in press (2017).
[8] http://pj.ninjal.ac.jp/corpus_center/csj/k-report-f/CSJ_rep.pdf, pp. 255-321 (2006).
[9] Japanese Discourse Research Initiative: version 2.0, http://www.jdri.org/resources/manuals/uu-doc-2.0.pdf (2014).
[10] Kawahara, H., Morise, M., Takahashi, T., Nisimura, R., Irino, T. and Banno, H.: TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation, Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, IEEE, pp. 3933–3936 (2008).
[11] Itagaki, H., Morise, M., Nisimura, R., Irino, T. and Kawahara, H.: A bottom-up procedure to extract periodicity structure of voiced sounds and its application to represent and restoration of pathological voices, MAVEBA, pp. 115–118 (2009).