ICASSP 2014 poster (PDF). Ryo Masumura


Copyright©2014 NTT corp. All Rights Reserved.

Role Play Dialogue Topic Model for Language Model Adaptation

in Multi-Party Conversation Speech Recognition

Ryo MASUMURA, Takanobu OBA, Hirokazu MASATAKI, Osamu YOSHIOKA and Satoshi TAKAHASHI

NTT Media Intelligence Laboratories, NTT Corporation, Japan


• Share the topic distribution among all speakers.

• Model not only the conversational topic but also the speaker role.

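In RPDTM, the word probability distribution takes a different form depending on the value of the condition variable $x$, which takes the value 0 or 1:

$$P(w \mid z, r) = \begin{cases} P(w \mid \psi_r), & \text{if } x \text{ selects the speaker role} \\ P(w \mid \phi_z), & \text{if } x \text{ selects the conversational topic} \end{cases}$$

Here $\psi_r$ is the word distribution of role $r$ and $\phi_z$ is the word distribution of topic $z$.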

3-1. Role Play Dialogue Topic Model (RPDTM)

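1. For each topic t = 1,…,T:
   (a) Draw $\phi_t \sim \mathrm{Dirichlet}(\beta)$ (word distribution of topic t).
2. For each speaker r = 1,…,R:
   (a) Draw the role-dependent word distribution $\psi_r$ from a Dirichlet prior.
   (b) Draw the condition-variable distribution $\lambda_r$ from a Dirichlet prior.
3. For each dialogue m = 1,…,M:
   (a) Draw $\theta_m \sim \mathrm{Dirichlet}(\alpha)$ (topic distribution shared by all speakers).
   (b) For each speaker r = 1,…,R and each word i = 1,…,$N_{m,r}$:
       (b-1) Draw $z_{m,r,i} \sim \mathrm{Multi}(\theta_m)$.
       (b-2) Draw $x_{m,r,i} \sim \mathrm{Multi}(\lambda_r)$.
       (b-3) If $x_{m,r,i}$ selects the speaker role, draw $w_{m,r,i} \sim \mathrm{Multi}(\psi_r)$; otherwise draw $w_{m,r,i} \sim \mathrm{Multi}(\phi_{z_{m,r,i}})$.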
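The RPDTM generative process can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the names `phi` (topic word distributions), `psi` (role word distributions), `lam` (per-role distribution of the condition variable) and `theta` (per-dialogue topic distribution) are chosen here, the priors are symmetric with arbitrary values, and the encoding `ROLE = 0`, `TOPIC = 1` for the condition variable is an assumption, not something stated on the poster.

```python
import numpy as np

ROLE, TOPIC = 0, 1  # assumed encoding of the condition variable x (not from the poster)


def sample_rpdtm_dialogue(phi, psi, lam, words_per_role, alpha=0.5, rng=None):
    """Sample one dialogue; every role shares the same topic distribution theta."""
    rng = np.random.default_rng() if rng is None else rng
    num_topics, vocab_size = phi.shape
    # Step 3(a): one topic distribution per dialogue, shared by all speakers.
    theta = rng.dirichlet(np.full(num_topics, alpha))
    dialogue = []
    for r, n_words in enumerate(words_per_role):
        z = rng.choice(num_topics, size=n_words, p=theta)  # (b-1) topic per word
        x = rng.choice(2, size=n_words, p=lam[r])          # (b-2) condition per word
        w = np.empty(n_words, dtype=int)
        for i in range(n_words):
            # (b-3) pick the role-dependent or the topic-dependent word distribution.
            dist = psi[r] if x[i] == ROLE else phi[z[i]]
            w[i] = rng.choice(vocab_size, p=dist)
        dialogue.append({"z": z, "x": x, "w": w})
    return theta, dialogue


rng = np.random.default_rng(0)
T, R, V = 4, 2, 12  # topics, roles (e.g. operator and customer), vocabulary size
phi = rng.dirichlet(np.full(V, 0.5), size=T)  # step 1: topic word distributions
psi = rng.dirichlet(np.full(V, 0.5), size=R)  # step 2(a): role word distributions
lam = rng.dirichlet(np.ones(2), size=R)       # step 2(b): P(x | role)
theta, dialogue = sample_rpdtm_dialogue(phi, psi, lam, words_per_role=[6, 9], rng=rng)
```

The single `theta` drawn per dialogue is what ties the speakers together: both roles draw their topics from it, while `psi[r]` and `lam[r]` let each role keep its own vocabulary preferences.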

1. Overview

[Objective]

Introduce an unsupervised language model adaptation

technique for multi-party conversation tasks.

[Points]

Propose a novel topic model called role play dialogue topic

model (RPDTM) and also propose an adaptation framework

that can utilize multi-party conversation attributes.

• Each speaker shares the same conversational topic.

• Each speaker's utterances depend not only on the conversational topic but also on the speaker's own role.

2-1. Latent Dirichlet Allocation (LDA) [D. M. Blei+, 2003.]

1. For each topic t = 1,…,T:
   (a) Draw $\phi_t \sim \mathrm{Dirichlet}(\beta)$.
2. For each document m = 1,…,M:
   (a) Draw $\theta_m \sim \mathrm{Dirichlet}(\alpha)$.
   (b) For each word i = 1,…,$N_m$:
       (b-1) Draw $z_{m,i} \sim \mathrm{Multi}(\theta_m)$.
       (b-2) Draw $w_{m,i} \sim \mathrm{Multi}(\phi_{z_{m,i}})$.

Topic models can capture semantic properties of words and documents.
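As a rough illustration of the LDA generative process, the following NumPy sketch samples a small synthetic corpus. The function name, the symmetric prior values and the toy sizes are assumptions made for the example; `phi` stands for the per-topic word distributions and `theta` for the per-document topic distribution.

```python
import numpy as np


def sample_lda_corpus(num_topics, vocab_size, doc_lengths, alpha=0.5, beta=0.1, seed=0):
    """Sample documents from the LDA generative process (symmetric priors assumed)."""
    rng = np.random.default_rng(seed)
    # Step 1: one word distribution phi_t per topic.
    phi = rng.dirichlet(np.full(vocab_size, beta), size=num_topics)
    thetas, docs = [], []
    for n_words in doc_lengths:
        # Step 2(a): topic distribution theta_m for this document.
        theta = rng.dirichlet(np.full(num_topics, alpha))
        # Step 2(b-1): a topic z for every word position.
        z = rng.choice(num_topics, size=n_words, p=theta)
        # Step 2(b-2): a word w drawn from the chosen topic's word distribution.
        w = np.array([rng.choice(vocab_size, p=phi[t]) for t in z])
        thetas.append(theta)
        docs.append(w)
    return phi, thetas, docs


phi, thetas, docs = sample_lda_corpus(num_topics=3, vocab_size=10, doc_lengths=[5, 8])
```

Each document gets its own `theta`, which is exactly the assumption that becomes a limitation when several speakers take part in one conversation.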

2-2. LDA-based unsupervised LM adaptation [Y. Tam+, 2006.]

[Figure: single recognition hypothesis → topic probability estimation → adapted unigram → unigram marginal (with background N-gram) → adapted N-gram]

A recognition hypothesis is used to estimate the topic probability, from which the adapted unigram probability is calculated. The n-gram model is then adapted with this unigram probability using the unigram marginal technique.
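The two steps described here can be sketched as follows. This is a simplified illustration, not the exact procedure of the cited work: the adapted unigram is taken to be the topic mixture $\sum_t \theta_t \phi_{t,w}$, the background n-gram distribution for one history is rescaled by the ratio of adapted to background unigram raised to an exponent, and the result is renormalized. The function names, the exponent `gamma` and the toy numbers are assumptions, and back-off handling is ignored.

```python
import numpy as np


def adapted_unigram(theta, phi):
    """Adapted unigram P_adapt(w) = sum_t theta[t] * phi[t, w]."""
    return theta @ phi


def unigram_rescale(p_bg_given_h, p_bg_unigram, p_adapt_unigram, gamma=0.5):
    """Rescale one background distribution P_bg(. | h) and renormalize it."""
    scale = (p_adapt_unigram / p_bg_unigram) ** gamma
    unnormalized = scale * p_bg_given_h
    return unnormalized / unnormalized.sum()


# Toy example with 2 topics and a 3-word vocabulary (all numbers hypothetical).
theta = np.array([0.8, 0.2])                        # topic probability from the hypothesis
phi = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])  # topic word distributions
p_bg_unigram = np.array([1 / 3, 1 / 3, 1 / 3])      # background unigram
p_bg_given_h = np.array([0.2, 0.5, 0.3])            # background P(w | h) for one history h

p_adapt = adapted_unigram(theta, phi)
p_adapted_given_h = unigram_rescale(p_bg_given_h, p_bg_unigram, p_adapt)
```

Words favored by the estimated topics (word 0 in this toy case) gain probability in every history, while the background n-gram still decides how words combine.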

3-3. RPDTM-based unsupervised LM adaptation

Use the recognition hypotheses of all speakers together and simultaneously adapt the LM of each speaker role using the shared conversational topic.
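One way to picture the role-dependent adapted unigram is to marginalize the condition variable and the topic in the RPDTM word model: a role-specific part plus a topic mixture that is shared by all roles in the dialogue. The sketch below is an assumption-laden illustration, not the authors' stated formula; `psi_r` (role word distribution), `p_role` (probability that a word of that role is role-dependent) and all numbers are hypothetical.

```python
import numpy as np


def role_adapted_unigram(theta, phi, psi_r, p_role):
    """P(w | r) = p_role * psi_r[w] + (1 - p_role) * sum_t theta[t] * phi[t, w]."""
    topic_mixture = theta @ phi  # shared across roles within one dialogue
    return p_role * psi_r + (1.0 - p_role) * topic_mixture


theta = np.array([0.8, 0.2])                        # shared dialogue topic probability
phi = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])  # topic word distributions
psi = {"operator": np.array([0.1, 0.1, 0.8]), "customer": np.array([0.6, 0.3, 0.1])}
p_role = {"operator": 0.4, "customer": 0.3}

adapted = {r: role_adapted_unigram(theta, phi, psi[r], p_role[r]) for r in psi}
```

Both roles use the same `theta`, estimated from all hypotheses of the dialogue, yet they end up with different adapted unigrams, which can then each drive a unigram marginal adaptation of the background N-gram.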

[Figure: recognition hypothesis A (operator) → adapted unigram A → unigram marginal (with background N-gram) → adapted N-gram A]

4. Experiments

The conventional models are only appropriate for single-speaker tasks because they assume that each document has a different topic.

In multi-party conversation, the correlation among the speech sets of the different speakers must be taken into account.

[Figure: recognition hypothesis B (customer) → adapted unigram B → unigram marginal → adapted N-gram B]

3-2. Inference of RPDTM

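Gibbs sampling can be used to assign the topic variable $z$ and the condition variable $x$. Each $z_i$ is sampled from $P(z_i \mid \mathbf{z}^{-i}, \mathbf{w}, \mathbf{x})$ and each $x_i$ from $P(x_i \mid \mathbf{x}^{-i}, \mathbf{w}, \mathbf{z})$, where the superscript $-i$ excludes the current word. Both conditionals take a different form depending on the value of $x_i$.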

We used contact center dialogue data sets. One dialogue set is a telephone call between one operator and one customer.

Method  Topic sharing  Speaker role  PPL    WER (%)  Description
BASE    -              -             47.12  22.70    First decoding pass based on the background LM.
LDA1    ×              ○             42.56  22.26    Adapted LM constructed individually from each speaker's recognition hypothesis, based on LDA.
LDA2    ○              ×             43.88  22.18    Single adapted LM constructed from all speakers' recognition hypotheses, based on LDA.
RPDTM   ○              ○             39.66  21.20    Adapted LMs constructed individually from all speakers' recognition hypotheses, based on RPDTM.

• Training set: 1922 dialogues (1.7M)

• Test set: 18 dialogues (20K)

• Background LM: 3-gram hierarchical Pitman-Yor LM (60K)

• Acoustic model: Triphone DNN-HMM (7 hidden layers of 2048 nodes)

• Decoder: WFST-based VoiceRex

• Number of topics: 20

RPDTM-based adaptation is more effective than LDA-based adaptation. Both topic sharing and role-dependent adaptation are effective for multi-party conversation.
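To put the reported numbers in perspective, the relative reductions against the BASE system can be computed directly from the PPL and WER values in the results table. The helper name below is chosen for this example only.

```python
# (PPL, WER %) per method, copied from the results table.
results = {
    "BASE": (47.12, 22.70),
    "LDA1": (42.56, 22.26),
    "LDA2": (43.88, 22.18),
    "RPDTM": (39.66, 21.20),
}


def relative_reduction(baseline, value):
    """Relative reduction in percent with respect to the baseline."""
    return 100.0 * (baseline - value) / baseline


base_ppl, base_wer = results["BASE"]
reductions = {
    name: (relative_reduction(base_ppl, ppl), relative_reduction(base_wer, wer))
    for name, (ppl, wer) in results.items()
}
for name, (ppl_red, wer_red) in reductions.items():
    print(f"{name}: PPL -{ppl_red:.2f}%, WER -{wer_red:.2f}%")
```

For RPDTM this gives a relative reduction of roughly 15.8% in perplexity and 6.6% in WER over BASE, against roughly 2% in WER for either LDA variant.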

Once the topic variable and condition variable assignments are fixed, each probability distribution can be calculated from them.
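A common way to do this in Gibbs-sampled topic models is to turn assignment counts into smoothed relative frequencies (posterior means under symmetric Dirichlet priors). The sketch below follows that standard recipe and is not taken from the poster: the count matrices, the prior names `alpha`, `beta`, `gamma`, `delta` and their values are all assumptions.

```python
import numpy as np


def smooth(counts, prior):
    """Row-wise posterior mean under a symmetric Dirichlet prior."""
    counts = np.asarray(counts, dtype=float) + prior
    return counts / counts.sum(axis=1, keepdims=True)


def estimate_distributions(n_mt, n_tw, n_rw, n_rx, alpha=0.5, beta=0.1, gamma=0.1, delta=1.0):
    """Estimate the distributions from counts of sampled assignments.

    n_mt[m, t]: words in dialogue m assigned to topic t
    n_tw[t, w]: times word w was generated from topic t
    n_rw[r, w]: times word w was generated from role r
    n_rx[r, x]: times role r used each value of the condition variable
    """
    theta = smooth(n_mt, alpha)  # topic distribution per dialogue
    phi = smooth(n_tw, beta)     # word distribution per topic
    psi = smooth(n_rw, gamma)    # word distribution per role
    lam = smooth(n_rx, delta)    # condition-variable distribution per role
    return theta, phi, psi, lam


theta, phi, psi, lam = estimate_distributions(
    n_mt=[[3, 1]],
    n_tw=[[2, 1, 0], [0, 0, 1]],
    n_rw=[[1, 0, 2], [0, 2, 0]],
    n_rx=[[3, 2], [2, 2]],
)
```

With `n_mt=[[3, 1]]` and `alpha=0.5`, the estimated topic distribution of the dialogue is `[0.7, 0.3]`; the same smoothing is applied to the other three count tables.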

• For one value of $x$, the word $w$ is related to the speaker role.

• For the other value of $x$, the word $w$ is related to the topic of the dialogue.

The role is the speaker type. For example, a contact center dialogue has two roles: operator and customer.

RPDTM assumes that each speaker's role in the conversation is given.

RPDTM generates a topic distribution for each dialogue, which includes several speech sets.

[Figure labels: topic variable; topic variable and condition variable; topic probability estimation]

One of the most accurate approaches to unsupervised LM adaptation is based on probabilistic topic models such as latent Dirichlet allocation.

models such as latent Dirichlet allocation.

Problems:

RPDTM is used to estimate topic probability for the target dialogue.