Copyright©2014 NTT corp. All Rights Reserved.

Ryo MASUMURA, Taichi ASAMI, Takanobu OBA, Hirokazu MASATAKI and Sumitaka SAKAUCHI

NTT Media Intelligence Laboratories, NTT Corporation, Japan

Mixture of Latent Words Language Models for Domain Adaptation

2. N-gram mixture modeling

• N-gram models define a probability distribution $P(w \mid u, \Theta)$ over the current word $w$ given the context $u$ and the model parameter $\Theta$.

• An n-gram mixture model is constructed by combining several n-gram models trained on different sources:

$$P(w_t \mid u_t) = \sum_{k=1}^{K} \lambda_k \, P(w_t \mid u_t, \Theta_k)$$

where $\Theta_k$ is the parameter of the $k$-th n-gram model and $\lambda_k$ is its mixture weight.

• The mixture weights $\lambda_k$ can be optimized on development data with the EM algorithm.
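As a minimal sketch of the EM procedure for mixture weights, the following Python code re-estimates the weights from development data. It assumes the component probabilities of every development token have already been computed by each n-gram model. The function name `em_mixture_weights` and the toy numbers are hypothetical and do not come from the poster.

```python
def em_mixture_weights(component_probs, n_iters=50):
    """Estimate mixture weights by EM.

    component_probs[t][k] is the probability that component model k
    assigns to development token t given its context.
    """
    n_tokens = len(component_probs)
    n_models = len(component_probs[0])
    weights = [1.0 / n_models] * n_models  # uniform initialisation
    for _ in range(n_iters):
        expected_counts = [0.0] * n_models
        for probs in component_probs:
            # E-step: posterior responsibility of each model for this token.
            joint = [w * p for w, p in zip(weights, probs)]
            total = sum(joint)
            for k in range(n_models):
                expected_counts[k] += joint[k] / total
        # M-step: normalised expected counts become the new weights.
        weights = [c / n_tokens for c in expected_counts]
    return weights


# Toy development set (hypothetical numbers): model 0 fits three of four tokens better.
dev_probs = [[0.20, 0.05], [0.10, 0.02], [0.30, 0.10], [0.01, 0.08]]
weights = em_mixture_weights(dev_probs)
```

Each iteration cannot decrease the development-set likelihood, so the weights drift toward the component that explains the development data better.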


3. Latent words language model (LWLM)

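• LWLMs are soft class-based n-gram models in which each latent variable (latent word) is associated with an observed word:

$$P(\boldsymbol{w}) = \sum_{\boldsymbol{h}} \prod_{t=1}^{T} P(w_t \mid h_t, \Theta)\, P(h_t \mid l_t, \Theta)$$

where $h_t$ is the latent word for the observed word $w_t$ and $l_t$ is the context of preceding latent words.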
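The sum over latent word sequences in an LWLM can be computed efficiently with the forward algorithm. The sketch below assumes a bigram latent context (each latent word depends only on the previous one) and uses small hypothetical probability tables; the function name `lwlm_sequence_prob` and the `start` distribution are illustrative assumptions, not part of the poster.

```python
def lwlm_sequence_prob(words, start, transition, emission):
    """Return P(w) = sum_h prod_t P(w_t | h_t) P(h_t | h_{t-1}).

    Assumes a bigram latent context:
      start[h]         = P(h_1 = h)
      transition[g][h] = P(h_t = h | h_{t-1} = g)
      emission[h][w]   = P(w_t = w | h_t = h)
    """
    latent_vocab = list(start)
    # forward[h] = P(w_1..w_t, h_t = h)
    forward = {h: start[h] * emission[h].get(words[0], 0.0) for h in latent_vocab}
    for word in words[1:]:
        forward = {
            h: emission[h].get(word, 0.0)
            * sum(forward[g] * transition[g][h] for g in latent_vocab)
            for h in latent_vocab
        }
    return sum(forward.values())


# Hypothetical toy tables; latent words share the observed vocabulary.
start = {"talk": 0.5, "speech": 0.5}
transition = {
    "talk": {"talk": 0.7, "speech": 0.3},
    "speech": {"talk": 0.4, "speech": 0.6},
}
emission = {
    "talk": {"talk": 0.8, "speech": 0.2},
    "speech": {"talk": 0.2, "speech": 0.8},
}
prob = lwlm_sequence_prob(["talk", "speech"], start, transition, emission)
```

Because every latent word can emit every observed word with some probability, an observed word sequence receives probability mass from many latent sequences, which is what makes the model a soft class-based one.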

4. LWLM mixture modeling

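• LWLM mixture modeling can be regarded as the combination of an n-gram mixture model and an LWLM:

$$P(\boldsymbol{w}) = \sum_{\boldsymbol{h}} \prod_{t=1}^{T} \sum_{k=1}^{K} \lambda_k \, P(w_t \mid h_t, \Theta_k)\, P(h_t \mid l_t, \Theta_k)$$

where $h_t$ is the latent word at position $t$, $l_t$ is its latent-word context, and $\lambda_k$ is the mixture weight of the $k$-th LWLM with parameter $\Theta_k$.

• Latent variables in an LWLM are represented as specific words, so multiple LWLMs can share a common latent variable space.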

• To optimize the mixture weights, we use the Bayesian criterion and a sampling procedure that is compatible with LWLM training.

• We use Gibbs sampling to assign a latent word sequence and a model index sequence to the development set.

[Equations: Gibbs sampling distributions for the latent word and for the model index at each position.]

• Under the Bayesian criterion, the optimized mixture weights are estimated by Monte Carlo integration over multiple sampled model index sequences.

[Equation: Monte Carlo estimate of the mixture weights.]
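The sampling-based weight estimation can be illustrated with a simplified sketch. It samples only the model index sequence and treats the per-token component probabilities as fixed, whereas the method described here also samples the latent word sequence. The symmetric Dirichlet hyperparameter `alpha`, the function name `sample_mixture_weights`, and the toy data are assumptions made for illustration.

```python
import random


def sample_mixture_weights(component_probs, alpha=1.0, n_samples=200,
                           burn_in=50, seed=0):
    """Estimate mixture weights by Gibbs sampling over model indices.

    component_probs[t][k] is the probability component k assigns to
    development token t (latent words are assumed fixed in this sketch).
    alpha is a symmetric Dirichlet hyperparameter (an assumption).
    """
    rng = random.Random(seed)
    n_tokens = len(component_probs)
    n_models = len(component_probs[0])
    indices = [rng.randrange(n_models) for _ in range(n_tokens)]
    counts = [indices.count(k) for k in range(n_models)]
    weight_sums = [0.0] * n_models
    for sweep in range(burn_in + n_samples):
        for t, probs in enumerate(component_probs):
            counts[indices[t]] -= 1  # remove token t from the counts
            scores = [probs[k] * (counts[k] + alpha) for k in range(n_models)]
            indices[t] = rng.choices(range(n_models), weights=scores)[0]
            counts[indices[t]] += 1
        if sweep >= burn_in:
            # Posterior-mean weights given this sampled index sequence.
            for k in range(n_models):
                weight_sums[k] += (counts[k] + alpha) / (n_tokens + n_models * alpha)
    # Monte Carlo average over the retained index sequences.
    return [s / n_samples for s in weight_sums]


# Hypothetical development data: component 0 explains every token better.
dev_probs = [[0.9, 0.1]] * 50
weights = sample_mixture_weights(dev_probs)
```

Averaging over many sampled index sequences, rather than keeping a single one, is what makes this a Monte Carlo approximation of the Bayesian estimate.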


We introduce a novel language model adaptation method based on a mixture of latent words language models.

6. Experiments

• Conventional: n-gram mixture modeling

Model merging is conducted in the observed word space.

Mixture weights are determined under the maximum likelihood (ML) criterion.

• Proposed: LWLM mixture modeling

Model merging is conducted in the latent variable space.

Mixture weights are determined under the Bayesian criterion.

1. Background

• Techniques for building effective language models from limited target-domain data are needed, since large amounts of domain-specific data are often unavailable.

① Robust modeling: smoothing, dimensionality reduction

  • Realize high performance even when the target-domain data is limited.

② Domain adaptation: mixture modeling

  • Specialize in the target domain using out-of-domain data.

Problems of n-gram mixture modeling

  • It is hard for out-of-domain LMs to offer adequate adaptation performance, since the words in out-of-domain LMs often differ from those in the target-domain LM.

Solution

  • Realize a method in which model merging is conducted in a latent variable space, the same space that is used for robust modeling.

5. Implementation

• N-gram mixture model

The mixture is approximately expressed as a single back-off n-gram model.

A hierarchical Pitman-Yor LM is used as the n-gram LM.

• LWLM and LWLM mixture model

An approximate model is constructed by randomly generating text data according to the model's stochastic process and training a standard back-off n-gram model on the generated text.
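The approximation step can be sketched as follows: text is sampled from a latent-word model, and n-gram counts are collected from the samples. The sketch assumes a bigram latent context and hypothetical toy tables, and it stops at raw bigram counts. Smoothing and back-off estimation, which a standard n-gram toolkit would perform, are omitted.

```python
import random
from collections import Counter


def generate_text(n_words, start, transition, emission, seed=0):
    """Sample words from a toy latent-word model with bigram latent context."""
    rng = random.Random(seed)

    def draw(dist):
        items = list(dist.items())
        return rng.choices([x for x, _ in items], weights=[p for _, p in items])[0]

    words = []
    latent = draw(start)
    for _ in range(n_words):
        words.append(draw(emission[latent]))  # emit an observed word
        latent = draw(transition[latent])     # move to the next latent word
    return words


def bigram_counts(words):
    """Count adjacent word pairs in the generated text."""
    return Counter(zip(words, words[1:]))


# Hypothetical toy tables; latent words share the observed vocabulary.
start = {"talk": 0.5, "speech": 0.5}
transition = {
    "talk": {"talk": 0.7, "speech": 0.3},
    "speech": {"talk": 0.4, "speech": 0.6},
}
emission = {
    "talk": {"talk": 0.8, "speech": 0.2},
    "speech": {"talk": 0.2, "speech": 0.8},
}
text = generate_text(1000, start, transition, emission)
counts = bigram_counts(text)
```

The resulting n-gram model contains no latent variables, so it can be used directly in a standard decoder.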

Data set                       Source                       Size
Training 1 (target domain)     CSJ academic lecture         3.5M words
Training 2 (out-of-domain)     CSJ extemporaneous lecture   3.8M words
Development (target domain)    CSJ academic lecture         28K words
Test 1 (target domain)         CSJ academic lecture         28K words
Test 2 (out-of-domain)         CSJ extemporaneous lecture   20K words

Decoder           VoiceRex (WFST-based)
Acoustic model    Context-dependent DNN-HMM, 7 hidden layers of 2048 nodes

• Evaluate both robust modeling and domain adaptation in terms of perplexity (PPL) and word error rate (WER).
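For reference, the two evaluation measures can be computed as in the following sketch, which uses their standard definitions. The function names and toy inputs are hypothetical; the poster does not describe its scoring tools.

```python
import math


def perplexity(token_probs):
    """exp of the negative mean log probability of the test tokens."""
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


ppl = perplexity([0.25, 0.25, 0.25, 0.25])
wer = word_error_rate("a b c d".split(), "a x c".split())
```

Lower is better for both measures: PPL reflects how well the language model predicts the test text, while WER measures the end-to-end recognition result.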

Model                                                          Test 1             Test 2
                                                               PPL     WER(%)     PPL      WER(%)
N-gram constructed from Training 1                             62.85   21.98      183.38   32.51
LWLM constructed from Training 1                               62.34   21.85      165.87   31.43
N-gram mixture model constructed from each training data set   64.19   20.34      178.71   26.56
LWLM mixture model constructed from each training data set     60.19   19.36      164.46   25.34

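Using the out-of-domain training data in addition to the target-domain training data yields further improvement over using the target-domain training data alone (for example, WER on Test 1 drops from 21.85% for the LWLM to 19.36% for the LWLM mixture model).

The proposed LWLM mixture modeling improves on n-gram mixture modeling for both the target-domain and the out-of-domain test data (WER 20.34% to 19.36% on Test 1, and 26.56% to 25.34% on Test 2).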

[Figures: graphical models. N-gram mixture model: model index sequence and observed word sequence. LWLM: latent variable sequence and observed word sequence. LWLM mixture model: latent variable sequence, observed word sequence, and model index sequence.]

• Employ the Corpus of Spontaneous Japanese (CSJ).
