Copyright©2014 NTT corp. All Rights Reserved.
Role Play Dialogue Topic Model for Language Model Adaptation
in Multi-Party Conversation Speech Recognition
Ryo MASUMURA, Takanobu OBA, Hirokazu MASATAKI, Osamu YOSHIOKA and Satoshi TAKAHASHI
NTT Media Intelligence Laboratories, NTT Corporation, Japan
Share the topic distribution among all speakers.
Model not only the conversational topic but also the speaker role.
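In RPDTM, the word probability distribution takes a different form depending on the value of the condition variable $x$, which takes the value 0 or 1. For a word $w$ with topic $z$ spoken in role $r$:
$$P(w \mid z, x) = \begin{cases} \mathrm{Multi}(w \mid \psi_r) & \text{if } x \text{ selects the speaker role} \\ \mathrm{Multi}(w \mid \phi_z) & \text{if } x \text{ selects the dialogue topic} \end{cases}$$
where $\psi_r$ is the word distribution of role $r$ and $\phi_z$ is the word distribution of topic $z$.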
3-1. Role Play Dialogue Topic Model (RPDTM)
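1. For each topic t = 1, …, T:
   (a). Draw the topic word distribution $\phi_t \sim \mathrm{Dirichlet}(\beta)$
2. For each speaker r = 1, …, R:
   (a). Draw the role word distribution $\psi_r$ from a Dirichlet prior
   (b). Draw the condition variable distribution $\pi_r$ from a Dirichlet prior
3. For each dialogue m = 1, …, M:
   (a). Draw the topic distribution $\theta_m$ from a Dirichlet prior
   (b). For each speaker r = 1, …, R: for each word i = 1, …, $N_{m,r}$:
      (b-1). Draw the topic variable $z_i \sim \mathrm{Multi}(\theta_m)$
      (b-2). Draw the condition variable $x_i \sim \mathrm{Multi}(\pi_r)$
      (b-3). If $x_i$ selects the speaker role, then draw $w_i \sim \mathrm{Multi}(\psi_r)$, else draw $w_i \sim \mathrm{Multi}(\phi_{z_i})$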
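The generative process of RPDTM can be sketched in Python with NumPy. This is a minimal illustration rather than the authors' implementation. The function names, the symmetric prior values (`alpha`, `beta`, `gamma`, `delta`), and the encoding of the condition variable (`ROLE = 0`, `TOPIC = 1`) are all assumptions made for this sketch.

```python
import numpy as np

# Assumed encoding of the condition variable x for this sketch.
ROLE, TOPIC = 0, 1

def draw_rpdtm_parameters(num_topics, num_roles, vocab_size,
                          beta=0.1, gamma=0.1, delta=1.0, rng=None):
    """Steps 1 and 2: draw the topic-level and role-level distributions."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: one word distribution per topic.
    phi = rng.dirichlet(np.full(vocab_size, beta), size=num_topics)
    # Step 2: one word distribution and one condition distribution per role.
    psi = rng.dirichlet(np.full(vocab_size, gamma), size=num_roles)
    pi = rng.dirichlet(np.full(2, delta), size=num_roles)
    return phi, psi, pi

def generate_rpdtm_dialogue(phi, psi, pi, words_per_role, alpha=0.5, rng=None):
    """Step 3: sample one dialogue as a list of (z, x, w) arrays, one per role."""
    rng = np.random.default_rng() if rng is None else rng
    num_topics, vocab_size = phi.shape
    # (a) A single topic distribution shared by all speakers of the dialogue.
    theta = rng.dirichlet(np.full(num_topics, alpha))
    dialogue = []
    for r, n_words in enumerate(words_per_role):
        z = rng.choice(num_topics, size=n_words, p=theta)  # (b-1) topic variable
        x = rng.choice(2, size=n_words, p=pi[r])           # (b-2) condition variable
        w = np.empty(n_words, dtype=int)
        for i in range(n_words):                           # (b-3) word
            dist = psi[r] if x[i] == ROLE else phi[z[i]]
            w[i] = rng.choice(vocab_size, p=dist)
        dialogue.append((z, x, w))
    return theta, dialogue
```

For a contact center call, `words_per_role` would hold two entries, for example `[120, 80]` for the operator and the customer. Both speakers then draw their topics from the same `theta`, while their role-dependent words come from different `psi` rows.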
1. Overview
[Objective]
Introduce an unsupervised language model adaptation
technique for multi-party conversation tasks.
[Points]
Propose a novel topic model, the role play dialogue topic model (RPDTM),
and an adaptation framework that can exploit multi-party conversation attributes.
• Each speaker shares the same conversational topic.
• Each speaker's utterances depend not only on the
conversational topic but also on the speaker's own role.
2-1. Latent Dirichlet Allocation (LDA) [D. M. Blei+, 2003.]
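1. For each topic t = 1, …, T:
   (a). Draw the topic word distribution $\phi_t \sim \mathrm{Dirichlet}(\beta)$
2. For each document m = 1, …, M:
   (a). Draw the topic distribution $\theta_m \sim \mathrm{Dirichlet}(\alpha)$
   (b). For each word i = 1, …, $N_m$:
      (b-1). Draw the topic $z_i \sim \mathrm{Multi}(\theta_m)$
      (b-2). Draw the word $w_i \sim \mathrm{Multi}(\phi_{z_i})$
A topic model can capture semantic properties of words and documents.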
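The LDA generative process can be sketched in Python with NumPy. This is a minimal illustration under assumptions: the function name `generate_lda_corpus`, the symmetric prior values, and the toy sizes are hypothetical and not taken from the poster.

```python
import numpy as np

def generate_lda_corpus(num_topics, vocab_size, doc_lengths,
                        alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process (symmetric priors assumed)."""
    rng = np.random.default_rng(seed)
    # Step 1: one word distribution per topic.
    phi = rng.dirichlet(np.full(vocab_size, beta), size=num_topics)
    thetas, docs = [], []
    for n_words in doc_lengths:
        # Step 2(a): each document gets its own topic distribution.
        theta = rng.dirichlet(np.full(num_topics, alpha))
        # Step 2(b-1): a topic for every word position.
        z = rng.choice(num_topics, size=n_words, p=theta)
        # Step 2(b-2): each word is drawn from the word distribution of its topic.
        w = np.array([rng.choice(vocab_size, p=phi[t]) for t in z], dtype=int)
        thetas.append(theta)
        docs.append(w)
    return phi, thetas, docs
```

Note that every document draws an independent `theta`, which is the assumption that makes plain LDA a poor fit when several speakers share one conversation.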
2-2. LDA-based unsupervised LM adaptation [Y. Tam+, 2006.]
[Figure: Single recognition hypothesis → Topic probability estimation → Adapted unigram → Unigram marginal (with Background N-gram) → Adapted N-gram]
A recognition hypothesis is used to estimate the topic probability, from which an adapted unigram probability is calculated. The background n-gram is then adapted to this unigram probability using the unigram marginal technique.
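The two steps can be sketched in Python with NumPy. The poster does not state the formulas, so this sketch assumes the commonly used form of unigram marginal adaptation, in which each background n-gram probability is rescaled by the ratio of adapted to background unigram probability and then renormalized per history. The function names, the dense array layout, and the exponent `scale` are assumptions of this sketch.

```python
import numpy as np

def adapted_unigram(topic_probs, phi):
    """P_adapted(w) = sum_t P(t | hypothesis) * P(w | t)."""
    return topic_probs @ phi

def unigram_marginal_adapt(background_ngram, background_unigram,
                           adapted_uni, scale=0.5):
    """Rescale P_bg(w | h) by (P_adapted(w) / P_bg(w)) ** scale, then renormalize.

    background_ngram has one row per history h and one column per word w.
    All probabilities are assumed to be strictly positive.
    """
    ratio = (adapted_uni / background_unigram) ** scale
    unnormalized = background_ngram * ratio  # the ratio is broadcast over histories
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)
```

A real n-gram model is sparse and uses back-off, so a practical implementation would apply the same rescaling inside the back-off structure rather than on a dense table.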
3-3. RPDTM-based unsupervised LM adaptation
Take the recognition hypotheses of all speakers as input and simultaneously adapt an LM for each speaker role using the shared conversational topic.
[Figure, speaker A: Recognition Hypothesis A (Operator) → Adapted unigram A → Unigram marginal (with Background N-gram) → Adapted N-gram A]
4. Experiments
Conventional models are appropriate only for single-speaker tasks because they assume that each document has its own topic.
In multi-party conversation, the correlation among the speech sets of the different speakers must be taken into account.
[Figure, speaker B: Recognition Hypothesis B (Customer) → Adapted unigram B → Unigram marginal → Adapted N-gram B]
3-2. Inference of RPDTM
Gibbs sampling can be used to assign the topic variable $z$
and the condition variable $x$.
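$$P(x_i \mid x_{-i}, z, w) \propto \begin{cases} P(x_i \mid x_{-i}, r)\, P(w_i \mid w_{-i}, r) & \text{if } x_i \text{ selects the speaker role} \\ P(x_i \mid x_{-i}, r)\, P(w_i \mid w_{-i}, z_i) & \text{if } x_i \text{ selects the dialogue topic} \end{cases}$$
$$P(z_i \mid z_{-i}, x, w) \propto \begin{cases} P(z_i \mid z_{-i}, m)\, P(w_i \mid w_{-i}, z_i) & \text{if } x_i \text{ selects the dialogue topic} \\ P(z_i \mid z_{-i}, m) & \text{if } x_i \text{ selects the speaker role} \end{cases}$$
Here $z_i$ is the topic variable and $x_i$ the condition variable of word $w_i$, $m$ is its dialogue, $r$ is its speaker role, and the subscript $-i$ excludes position $i$.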
Contact center dialogue data sets were used. One dialogue set is a telephone call between one operator and one customer.
Method | Description | Topic sharing | Speaker role considered | PPL | WER (%)
BASE | First decoding pass based on the background LM. | - | - | 47.12 | 22.70
LDA1 | Adapted LM constructed individually from each speaker's recognition hypothesis, based on LDA. | × | ○ | 42.56 | 22.26
LDA2 | Single adapted LM constructed from all speakers' recognition hypotheses, based on LDA. | ○ | × | 43.88 | 22.18
RPDTM | Adapted LMs constructed individually from all speakers' recognition hypotheses, based on RPDTM. | ○ | ○ | 39.66 | 21.20
• Training set: 1922 dialogues (1.7M)
• Test set: 18 dialogues (20K)
• Background LM: 3-gram hierarchical Pitman-Yor LM (60K)
• Acoustic model: Triphone DNN-HMM (7 hidden layers of 2048 nodes)
• Decoder: WFST-based VoiceRex
• Number of topics: 20
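RPDTM-based adaptation is more effective than LDA-based adaptation: it reaches a PPL of 39.66 and a WER of 21.20%, compared with 42.56 and 22.26% for LDA1 and 43.88 and 22.18% for LDA2. Both topic sharing and role-dependent adaptation are effective for multi-party conversation.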
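The size of these gains can be checked with a short calculation. The snippet below uses only the PPL and WER values reported in the results table. The helper name `relative_reduction` is hypothetical.

```python
# (PPL, WER %) per method, copied from the results table.
results = {
    "BASE": (47.12, 22.70),
    "LDA1": (42.56, 22.26),
    "LDA2": (43.88, 22.18),
    "RPDTM": (39.66, 21.20),
}

def relative_reduction(baseline, value):
    """Relative reduction in percent with respect to the baseline value."""
    return 100.0 * (baseline - value) / baseline

base_ppl, base_wer = results["BASE"]
for method, (ppl, wer) in results.items():
    if method == "BASE":
        continue
    print(f"{method}: PPL reduced by {relative_reduction(base_ppl, ppl):.1f}%, "
          f"WER reduced by {relative_reduction(base_wer, wer):.1f}% relative to BASE")
```

For RPDTM this gives a relative reduction of about 15.8% in PPL and about 6.6% in WER over the first decoding pass.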
Once the topic variable and condition variable assignments have been determined, each probability distribution can be calculated.
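One way to carry out this calculation is to count the assignments and normalize the smoothed counts. The sketch below is a minimal illustration and makes several assumptions: the token layout `(m, r, w, z, x)`, the encoding `ROLE = 0` and `TOPIC = 1`, the symmetric prior values, and the function name are all hypothetical.

```python
import numpy as np

# Assumed encoding of the condition variable x for this sketch.
ROLE, TOPIC = 0, 1

def estimate_distributions(tokens, num_topics, num_roles, num_dialogues, vocab_size,
                           alpha=0.5, beta=0.1, gamma=0.1, delta=1.0):
    """Estimate the model distributions from fixed assignments.

    tokens: iterable of (dialogue m, role r, word w, topic z, condition x).
    """
    topic_word = np.zeros((num_topics, vocab_size))
    role_word = np.zeros((num_roles, vocab_size))
    role_cond = np.zeros((num_roles, 2))
    dialogue_topic = np.zeros((num_dialogues, num_topics))
    for m, r, w, z, x in tokens:
        role_cond[r, x] += 1
        # z is drawn for every word in the generative process, so count it always.
        dialogue_topic[m, z] += 1
        if x == ROLE:
            role_word[r, w] += 1   # word explained by the speaker role
        else:
            topic_word[z, w] += 1  # word explained by the dialogue topic

    def normalize(counts, prior):
        smoothed = counts + prior
        return smoothed / smoothed.sum(axis=1, keepdims=True)

    phi = normalize(topic_word, beta)        # word distribution per topic
    psi = normalize(role_word, gamma)        # word distribution per role
    pi = normalize(role_cond, delta)         # condition distribution per role
    theta = normalize(dialogue_topic, alpha) # topic distribution per dialogue
    return phi, psi, pi, theta
```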
• For one value of the condition variable $x$, the word $w$ is related to the speaker role.
• For the other value of $x$, the word $w$ is related to the topic of the dialogue.
A role is a speaker type. For example, a contact center dialogue
has two roles: operator and customer.
RPDTM assumes that each speaker's role in the conversation is given.
RPDTM generates a topic distribution for each dialogue, which includes several speech sets.
[Figure labels: topic variable; condition variable]
[Figure label: Topic probability estimation]
One of the most accurate approaches is based on probabilistic topic
models such as latent Dirichlet allocation.
Problems:
RPDTM is used to estimate topic probability for the target dialogue.
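Given the estimated topic probability of the target dialogue, a role-dependent adapted unigram can be computed for each speaker role. The poster does not spell out this formula. The sketch below assumes one natural reading, obtained by summing out the topic variable and the condition variable in the generative process. The function name, the array layout, and the encoding `ROLE = 0`, `TOPIC = 1` are assumptions.

```python
import numpy as np

# Assumed encoding of the condition variable x for this sketch.
ROLE, TOPIC = 0, 1

def role_adapted_unigrams(theta, phi, psi, pi):
    """Adapted unigram per role, one row per role and one column per word.

    P(w | r) = pi[r, ROLE] * psi[r, w] + pi[r, TOPIC] * sum_t theta[t] * phi[t, w]

    theta: topic probabilities of the target dialogue, shape (T,)
    phi:   word distribution per topic, shape (T, V)
    psi:   word distribution per role, shape (R, V)
    pi:    condition distribution per role, shape (R, 2)
    """
    topic_unigram = theta @ phi  # shared by all roles of the dialogue
    return pi[:, [ROLE]] * psi + pi[:, [TOPIC]] * topic_unigram
```

Each row of the result can then be used as the adapted unigram for one speaker role, for example row 0 for the operator and row 1 for the customer, in the unigram marginal step.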