Survey slides (PDF) · Recent updates · Ryo Masumura: Web

(1)

Deep Learning × Natural Language Processing

Ryo Masumura

Survey slides

(2)

Purpose of this talk

To give a rough overview that still leaves you thinking, "That sounds interesting!"

(3)

Today's topics

• Fundamentals: Deep Learning × NLP in general
  • Recent work on embeddings
• Applications: Deep Learning × machine translation
  • Recent work on Neural Machine Translation

(4)

Fundamentals

Deep Learning × NLP in general

Recent work on embeddings

(5)

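Deep Learning × NLP in general

Two phrases heard constantly over the last few years: Distributed Representation and Embedding (embedding information into a vector).

• Distributed Representation of Word, Word Embedding (mostly topics from about two years ago)
  • The so-called Word2Vec
  • Represents the meaning of a word as a fixed-length vector
• Distributed Sentence Representation, Sentence Embedding (mostly topics from last year up to now)
  • Represents a variable-length sequence as a fixed-length vector
  • From sentences on to documents that contain multiple sentences
  • From sequence to sequence (covered in the second half today)

Today's scope: the latest topics.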

Word Embedding research

All of the following are trained by unsupervised learning:

• NN Embedding [Bengio+, Journal of MLR 2003]
• RNN Embedding [Mikolov+, NAACL 2013]
• CBOW, Skip-gram [Mikolov+, NIPS 2013]
• GloVe [Pennington+, EMNLP 2014]

Meaning is carried by the direction of the vector.

Note: NN = Neural Network; RNN = Recurrent NN (not Recursive in this presentation).
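The remark that meaning is carried by vector direction can be illustrated with cosine similarity, which compares directions and ignores vector length. This is a minimal sketch: the three 3-dimensional vectors are hand-picked toy values (an assumption for illustration), not trained embeddings.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, chosen by hand for illustration only.
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.2, 0.9],
}

sim_cat_dog = cosine(toy_vectors["cat"], toy_vectors["dog"])
sim_cat_car = cosine(toy_vectors["cat"], toy_vectors["car"])
# Scaling a vector changes its length but not its direction.
sim_scaled = cosine(toy_vectors["cat"], [2 * x for x in toy_vectors["cat"]])
```

With these toy values, "cat" and "dog" point in nearly the same direction while "cat" and "car" do not, and doubling a vector leaves its similarity to the original at 1.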

(7)

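Word vs. Sentence Embedding: different aims

• Aim of Word Embedding
  • Problem addressed: sparsity
  • => Turn a 1-hot vector into a fixed-length continuous-valued vector
• Aim of Sentence Embedding
  • Problems addressed: sparsity, variable length, sequentiality
  • => Turn the information in a variable-length sequence into a fixed-length continuous-valued vector

[Figure: a single word, and a sequence of words, each mapped by "Embedding" to a vector [ , , ..., ]]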
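The two aims above can be sketched in a few lines of plain Python. The vocabulary and the 2-dimensional embedding table are hypothetical toy values, and averaging word vectors is used here only as the simplest possible sentence embedding; it is one of the baselines listed on the next slide, not the method of any particular paper.

```python
def one_hot(index, vocab_size):
    """Sparse 1-hot vector: all zeros except a single 1."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]

def embed_word(index, table):
    """Multiplying a 1-hot vector by the table just selects one row."""
    return table[index]

def embed_sentence(indices, table):
    """Fixed-length sentence vector: element-wise mean of the word vectors."""
    dim = len(table[0])
    total = [0.0] * dim
    for idx in indices:
        for d in range(dim):
            total[d] += table[idx][d]
    return [t / len(indices) for t in total]

vocab = {"i": 0, "like": 1, "this": 2, "movie": 3, "a": 4, "lot": 5}
# Hypothetical 2-dimensional embedding table (one row per word).
table = [[0.1, 0.3], [0.7, 0.2], [0.0, 0.5], [0.4, 0.4], [0.2, 0.1], [0.6, 0.9]]

short_vec = embed_sentence([vocab[w] for w in "i like this".split()], table)
long_vec = embed_sentence([vocab[w] for w in "i like this movie a lot".split()], table)
```

Both sentences end up as vectors of the same length regardless of how many words they contain. Note that a plain average ignores word order, so it addresses sparsity and variable length but not sequentiality.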

(8)

Sentence Embedding research

• Unsupervised learning
  • Recursive Auto Encoder [Socher+, NIPS 2011]
  • Average of Word Vector [Socher+, EMNLP 2013]
  • Paragraph Vector [Le and Mikolov, ICML 2014]
  • Semi Supervised LSTM [Dai+, Arxiv 201511]
• Supervised learning (learned in the course of sentence classification)
  • Recursive NN [Socher+, EMNLP 2013]
  • LSTM-RNN [Tai+, ACL 2015]
  • CNN [Kalchbrenner+, ACL 2014][Kim, EMNLP 2014]
  • Tree LSTM [Tai+, ACL 2015][Zhu+, ICML 2015]

Note: CNN = Convolutional NN; LSTM = Long Short Term Memory.

(9)

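Sentence Embedding: supervised learning

The network is trained to classify sentences (for example, sentiment classification).

[Figure: three example architectures. CNN example: word embedding -> convolution (window size: 2) -> pooling -> softmax. RNN or LSTM example and Recursive NN example: word embeddings at the bottom, softmax at the top.]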
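The CNN example in the figure (word embedding, convolution with window size 2, pooling, softmax) can be traced with a forward pass in plain Python. This is a minimal sketch under stated assumptions: all weights are hand-set hypothetical values, `tanh` and max pooling are illustrative choices, and no training is shown.

```python
import math

def conv_window2(word_vecs, filters, bias):
    """Apply each filter to every pair of adjacent word vectors (window size 2)."""
    feature_maps = []
    for pos in range(len(word_vecs) - 1):
        window = word_vecs[pos] + word_vecs[pos + 1]  # concatenate two vectors
        feats = [math.tanh(sum(w * x for w, x in zip(f, window)) + b)
                 for f, b in zip(filters, bias)]
        feature_maps.append(feats)
    return feature_maps

def max_pool(feature_maps):
    """Max over positions: one value per filter, whatever the sentence length."""
    return [max(col) for col in zip(*feature_maps)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def classify(word_vecs, filters, bias, out_weights):
    """Return the pooled sentence embedding and the class probabilities."""
    pooled = max_pool(conv_window2(word_vecs, filters, bias))
    scores = [sum(w * p for w, p in zip(row, pooled)) for row in out_weights]
    return pooled, softmax(scores)

# Hypothetical hand-set parameters: 2-dim word vectors, 3 filters, 2 classes.
filters = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1], [0.3, 0.3, -0.5, 0.2]]
bias = [0.0, 0.1, -0.1]
out_weights = [[1.0, -0.5, 0.2], [-1.0, 0.5, -0.2]]
sentence = [[0.1, 0.3], [0.7, 0.2], [0.0, 0.5], [0.4, 0.4]]

sentence_embedding, class_probs = classify(sentence, filters, bias, out_weights)
```

The pooled vector plays the role of the sentence embedding: its length equals the number of filters, so it stays fixed however long the input sentence is.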

(10)

Word Embedding for Sentence Embedding

• Word embeddings are essential when learning sentence embeddings
  => The word embeddings themselves are usually pre-trained
• If the word embeddings were pre-trained on data different from the sentence-embedding training data, fine-tuning is essential

Examples of how word embeddings are pre-trained:

Paper                 Method     Data       Dim  Tuning
[Zhang+, EMNLP 2015]  Word2Vec   Other      128  Fine-tuned
[Mou+, EMNLP 2015]    Word2Vec   In-Domain  300  Fixed
[Lei+, EMNLP 2015]    GloVe      Other      512  Fixed
[Kim, EMNLP 2014]     CBOW       Other      300  Fine-tuned
[Liu+, EMNLP 2015]    Word2Vec   Other      100  Fine-tuned
[Tang+, EMNLP 2015]   Skip-Gram  In-Domain  200  Fixed
[Zeng+, EMNLP 2015]   Skip-Gram  In-Domain  50   Fixed

(11)

From Sentence to Document

• Scope of Sentence Embedding: a single sentence, few words
  • Stanford Sentiment Treebank: 1 sentence, 19 words on average
  • TREC question dataset: 1 sentence, 10 words on average
• Scope of Document Embedding: multiple sentences, many words
  • Restaurant Review: 9 sentences and 150 words per document
  • Movie Review: 14 sentences and 325 words per document

(12)

Document Embedding research

• Supervised learning (learned in the course of document classification)
  • Word RNN-Sentence RNN [Tang+, EMNLP 2015]
• Unsupervised learning (including language models)
  • Hierarchical Neural Auto Encoder [Li+, ACL 2015]
  • Hierarchical RNNLM [Lin+, EMNLP 2015]
  • Document Context LM [Ji+, Arxiv 201511]
  • Large Context LM [Wang and Cho, Arxiv 201511]
  • Skip-Thought Vectors [Kiros+, Arxiv 201506]

Note: many of these studies are closely tied to the second half of today's talk, so the details are omitted here.

(13)

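Document Embedding: supervised learning

Word RNN-Sentence RNN example:

• The word-level RNN captures the sequential structure among the words within a sentence
• The sentence-level RNN captures the sequential structure across sentences

[Figure: word embedding -> sentence embedding -> pooling -> document embedding -> softmax]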
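The two-level structure in the figure can be sketched with a plain (Elman-style) RNN at both levels. This is a simplification under stated assumptions: the weights are hand-set hypothetical values, the last hidden state of the word-level RNN is taken as the sentence embedding, mean pooling is used over the sentence-level states, and the final softmax classifier is omitted.

```python
import math

def rnn_step(x, h, w_in, w_rec):
    """One step of a simple RNN: h' = tanh(W_in x + W_rec h)."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(w_in[j], x)) +
                      sum(wr * hi for wr, hi in zip(w_rec[j], h)))
            for j in range(len(h))]

def run_rnn(inputs, w_in, w_rec):
    """Return the hidden state at every step of the sequence."""
    h = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_in, w_rec)
        states.append(h)
    return states

def embed_document(doc, word_rnn, sent_rnn):
    # Word-level RNN: last hidden state of each sentence = sentence embedding.
    sent_vecs = [run_rnn(sent, word_rnn[0], word_rnn[1])[-1] for sent in doc]
    # Sentence-level RNN over the sentence embeddings, then mean pooling.
    states = run_rnn(sent_vecs, sent_rnn[0], sent_rnn[1])
    doc_vec = [sum(col) / len(states) for col in zip(*states)]
    return sent_vecs, doc_vec

# Hypothetical (input weights, recurrent weights) for each level.
word_rnn = ([[0.5, -0.3], [0.2, 0.8]], [[0.1, 0.0], [0.0, 0.1]])
sent_rnn = ([[0.6, 0.1], [-0.2, 0.4]], [[0.2, 0.0], [0.0, 0.2]])
doc = [
    [[0.1, 0.3], [0.7, 0.2], [0.0, 0.5]],   # sentence 1: three word vectors
    [[0.4, 0.4], [0.2, 0.1]],               # sentence 2: two word vectors
]

sentence_embeddings, document_embedding = embed_document(doc, word_rnn, sent_rnn)
```

Sentences of different lengths and documents with different numbers of sentences all map to a document vector of the same size.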

(14)

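Fundamentals

The appeal of embeddings

• Everything becomes a fixed-length vector
• Features are obtained without feature engineering

The trend in embeddings

• From words to sentences, and then to documents
• Not covered today, but research that treats characters as the minimal unit is also interesting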

(15)

Applications

Deep Learning × machine translation

Recent work on Neural Machine Translation

(16)

Deep Learning × machine translation

• Extensions of phrase-based translation
  • Neural Network Joint Model [Devlin+, ACL 2014]
    Note: deployed in Microsoft's translation service
  • Continuous Space Translation Model [Le+, NAACL 2012]
  • Recurrent Continuous Translation Model [Kalchbrenner+, EMNLP 2013]
    Note: in principle, the forerunner of NMT
• Fully neural-network translation
  • Neural Machine Translation [Cho+, EMNLP 2014][Sutskever+, NIPS 2014]

(17)

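Neural Machine Translation (NMT)

• Machine translation by a neural-network encoder-decoder approach
• Both the encoder and the decoder are RNNs

(Example) Input: A B C D, Output: X Y Z

[Figure: the encoder reads A B C D <EOS> and turns the input into a fixed-length vector; the decoder (beam search) then emits X Y Z <EOS>, with each generated word fed back in as the next decoder input]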
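The decoder side of the figure uses beam search. The sketch below shows only that search loop; the encoder and the trained decoder RNN are replaced by `toy_decoder`, a hypothetical lookup table of next-word probabilities that depends on the previous word alone, so the numbers are illustrative and not from any model.

```python
import math

def beam_search(next_log_probs, beam_size, max_len, eos="<EOS>"):
    """Keep the beam_size best partial outputs at each step."""
    beams = [([], 0.0)]            # (tokens so far, total log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, lp in next_log_probs(tokens).items():
                candidates.append((tokens + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens, score))   # hypothesis is complete
            else:
                beams.append((tokens, score))      # keep extending
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])

# Hypothetical stand-in for a decoder conditioned on the encoder's vector.
TOY_TABLE = {
    "<s>": {"X": 0.6, "Y": 0.3, "Z": 0.1},
    "X": {"Y": 0.8, "<EOS>": 0.2},
    "Y": {"Z": 0.7, "<EOS>": 0.3},
    "Z": {"<EOS>": 0.9, "X": 0.1},
}

def toy_decoder(prefix):
    last = prefix[-1] if prefix else "<s>"
    return {tok: math.log(p) for tok, p in TOY_TABLE[last].items()}

best_tokens, best_score = beam_search(toy_decoder, beam_size=2, max_len=5)
```

With this toy table and a beam of 2, the search returns X Y Z <EOS>, matching the example output on the slide.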

(18)

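Advantages of NMT

• Beyond assuming that language is a sequence of delimited tokens, no language-specific knowledge is needed
• The translation problem can be modeled directly
  • Conventionally: decode with a generative-model-based system, then rescore the hypotheses with MERT
• Runs with a small memory footprint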

(19)

Challenges for NMT

• Translating long sentences (many input words)
  => Attention based NMT [Bahdanau+, ICLR 2015]
• Translating unknown words
  => UNK Replace [Luong+, ACL 2015]
• Using monolingual data
  => Deep Fusion [Gulcehre+, ACL 2015]

(20)

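Attention-based NMT

• When generating the next word, the model decides which input-side words to attend to
• Weights over the input-side hidden-layer outputs are computed, and the weighted sum is used to generate the next word

[Figure: having produced X, the decoder generates the next word while attention weights between 0 and 1 are placed over the inputs A B C D. Captions: "Where is the information useful for generating the next word?" and "B looks important."]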
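The attention step described above (score each input position, normalize to weights between 0 and 1, take the weighted sum) can be sketched as follows. Note the assumption: a dot product is used as the scoring function for brevity, whereas [Bahdanau+, ICLR 2015] scores with a small feed-forward network; the hidden states are hypothetical toy values.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attend(decoder_state, encoder_states):
    """Dot-product scores -> weights in [0, 1] -> weighted sum of encoder states."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states))
               for k in range(len(decoder_state))]
    return weights, context

source = ["A", "B", "C", "D"]
# Hypothetical encoder hidden states, one per input word.
encoder_states = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5], [1.0, 1.0]]
# Hypothetical decoder state after producing X.
decoder_state = [0.0, 1.5]

weights, context = attend(decoder_state, encoder_states)
most_attended = source[weights.index(max(weights))]
```

In this toy setting the largest weight falls on B, mirroring the "B looks important" caption in the figure; the context vector is what the decoder would use to generate the next word.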

(21)

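Attention example

• When "European" is generated, attention falls on "europeenne": the correspondence is found automatically
• Attention-based NMT = translating while performing alignment automatically

[Figure: attention weights (0 to 1) over the input words for each generated word]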

(22)

Effect of attention

• With attention, long sentences are translated well

[Figure: comparison of results without attention and with attention]

(23)

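UNK Replace

• Normally, for a rare input word (an unknown word), the model generates the <UNK> symbol
  Note: current NMT vocabularies are about 50,000 words on both the input and the output side
• UNK Replace = at attention time, find the input word that corresponds to <UNK>, then replace <UNK> with a simple dictionary-based translation of that input word

[Figure: the input A BBBB C D <EOS> contains the rare word BBBB; the decoder outputs X <UNK> (the unknown-word symbol). The attention weights (0 to 1) over A BBBB C D show that <UNK> in the translation corresponds to BBBB in the input.]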
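The replacement step can be sketched as a small post-processing function over the decoder output. Everything below is illustrative: the attention matrix, the one-entry dictionary, and the choice to copy the source word unchanged when the dictionary has no entry are assumptions, not details taken from the slide.

```python
def replace_unk(output_tokens, source_tokens, attention, dictionary):
    """Replace each <UNK> using the most-attended source word."""
    result = []
    for step, tok in enumerate(output_tokens):
        if tok == "<UNK>":
            weights = attention[step]
            src = source_tokens[weights.index(max(weights))]
            # Dictionary lookup; copy the source word unchanged if it is missing.
            tok = dictionary.get(src, src)
        result.append(tok)
    return result

source_tokens = ["A", "BBBB", "C", "D"]
output_tokens = ["X", "<UNK>", "Z"]
# Hypothetical attention weights: one row per output word, one column per input word.
attention = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.8, 0.05, 0.05],
    [0.1, 0.1, 0.3, 0.5],
]
# Hypothetical bilingual dictionary entry for the rare word.
dictionary = {"BBBB": "Y"}

fixed = replace_unk(output_tokens, source_tokens, attention, dictionary)
copied = replace_unk(output_tokens, source_tokens, attention, {})
```

Here `fixed` restores the translation of BBBB, while `copied` shows the fallback of passing the source word through, which is often reasonable for names and numbers.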

(24)

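Effect of UNK Replace

• UNK Replace is an easy way to address NMT's unknown-word problem

[Figure: results with and without UNK Replace. Horizontal axis: sentences ranked by word frequency (further right = sentences containing words that almost never occur).]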

(25)

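Deep Fusion

• NMT trained on parallel data (limited in size) cannot directly use monolingual data (which can be collected in large quantities)
• Deep Fusion = integrate, inside the network, the NMT model with an RNN built from monolingual data

[Figure: the hidden layer of the RNN built from monolingual data feeds a layer that merges the monolingual RNN with the NMT decoder]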

(26)

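Effect of Deep Fusion

• Deep Fusion can improve performance by exploiting the rich information in monolingual data

Note: Shallow Fusion is linear interpolation at the log-probability level.
Note: the parallel data is tens of thousands of words; the monolingual data is billions of words.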
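For contrast with Deep Fusion, the Shallow Fusion baseline mentioned in the note combines the two models only at scoring time. A minimal sketch, assuming the common form score = log p_NMT + beta * log p_LM; the probabilities and the value of `beta` are hypothetical.

```python
import math

def shallow_fusion_scores(nmt_probs, lm_probs, beta):
    """Combine NMT and language-model scores at the log-probability level."""
    return {w: math.log(nmt_probs[w]) + beta * math.log(lm_probs[w])
            for w in nmt_probs}

# Hypothetical next-word distributions from the two models.
nmt_probs = {"X": 0.50, "Y": 0.45, "Z": 0.05}
lm_probs = {"X": 0.10, "Y": 0.60, "Z": 0.30}

nmt_only = shallow_fusion_scores(nmt_probs, lm_probs, beta=0.0)
fused = shallow_fusion_scores(nmt_probs, lm_probs, beta=0.5)

best_nmt_only = max(nmt_only, key=nmt_only.get)
best_fused = max(fused, key=fused.get)
```

With `beta=0.0` the NMT model's first choice wins; with `beta=0.5` the language model's preference tips the decision. Deep Fusion instead merges the hidden states of the two networks, so the combination itself is learned.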

(27)

Other NMT tricks

• Source Reversing [Sutskever+, NIPS 2014]
• Ensemble Modeling [Sutskever+, NIPS 2014]
• Retraining based Adaptation [Luong+, IWSLT 2015]

Note: these are better regarded as practical tricks than as fully fledged proposed methods.

(28)

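Source Reversing

• Standard NMT feeds both the input sentence and the translation in forward order
• Source Reversing = feeding the input sentence in reverse order, which makes relationships at the start of the sentence easier to capture
• For example, when sentence-initial A corresponds to X, the information from A is easier to exploit

[Figure: forward input "A B C D <EOS>" vs. reversed input "D C B A <EOS>", both decoded as X Y Z <EOS>]

Effect of Source Reversing

[Figure: BLEU comparison]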
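Source reversing is a preprocessing step on the input side only. The sketch below prepares a training pair and counts how many encoder steps separate a source word from the moment decoding starts; the helper names are hypothetical and the step count is just an illustrative proxy for "how far back the information lies".

```python
def make_training_pair(source_tokens, target_tokens, reverse_source=True, eos="<EOS>"):
    """Build (encoder input, decoder output); only the source side is reversed."""
    src = list(reversed(source_tokens)) if reverse_source else list(source_tokens)
    return src + [eos], list(target_tokens) + [eos]

def steps_until_decoding(encoder_input, word):
    """Encoder steps from reading `word` to the end of the input."""
    return len(encoder_input) - encoder_input.index(word)

forward_src, forward_tgt = make_training_pair(["A", "B", "C", "D"], ["X", "Y", "Z"],
                                              reverse_source=False)
reversed_src, reversed_tgt = make_training_pair(["A", "B", "C", "D"], ["X", "Y", "Z"])

gap_forward = steps_until_decoding(forward_src, "A")
gap_reversed = steps_until_decoding(reversed_src, "A")
```

With forward input, A is read five steps before decoding begins; with reversed input it is read only two steps before, so it sits much closer to the first output word X.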

(29)

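Ensemble Modeling

• Normally a single NMT model, estimated in one training run, is used
• Ensemble Modeling = prepare multiple models that differ in initial values or hidden-layer size, and combine their results to produce the output

[Figure: Model 1 and Model 2 both read A B C D <EOS>; their predictions are combined to output X Y Z <EOS>]

Effect of Ensemble Modeling

[Figure: comparison of a single model with an ensemble of 5 models]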
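One simple way to combine the models' results is to average their next-word distributions at every decoding step. The slide does not specify the combination rule, so plain averaging is an assumption here, and the three distributions are hypothetical.

```python
def ensemble_next_word(distributions):
    """Average next-word probabilities over models and pick the best word."""
    vocab = distributions[0].keys()
    avg = {w: sum(d[w] for d in distributions) / len(distributions) for w in vocab}
    return max(avg, key=avg.get), avg

# Hypothetical next-word distributions from three separately trained models.
model_outputs = [
    {"X": 0.50, "Y": 0.40, "Z": 0.10},
    {"X": 0.20, "Y": 0.60, "Z": 0.20},
    {"X": 0.45, "Y": 0.40, "Z": 0.15},
]

best_word, averaged = ensemble_next_word(model_outputs)
single_model_choice = max(model_outputs[0], key=model_outputs[0].get)
```

In this toy case the first model alone would pick X, while the averaged distribution picks Y; disagreements between individual models are smoothed out.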

(30)

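Retraining based Adaptation

• Standard NMT is trained on the target-task data starting from a random initialization
• Retraining based Adaptation = take an NMT model built from a large amount of out-of-domain training data as the initial model, then train it on the target-domain training data

[Figure: random initialization -> training on a large out-of-domain set -> training on a small target-domain set]

Effect of Retraining based Adaptation

[Figure: comparison with and without this step]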

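(31)

Where NMT stands today

Results at the evaluation workshops held in 2015:

• University of Montreal [Jean+, WMT 2015]
  • En->Cs, En->De: on par with or better than conventional SMT
  • Cs->En, De->En: not quite there yet
• Stanford University [Luong+, IWSLT 2015]
  • En->De: a large improvement over conventional SMT
  • En->Vietnamese (low-resource): not quite there yet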

(32)

Other research related to NMT

• Other encoder-decoder approaches
  • Speech recognition [Chorowski+ NIPS 2014][Chan+, Arxiv 201508]
  • Multi-microphone speech recognition [Kim+, Arxiv 201511]
  • Image caption generation [Xu+, ICML 2015]
  • Video caption generation [Yao+, CORR 2015]
  • Task-oriented dialogue response generation [Wen+, EMNLP 2015]
  • Open-domain dialogue response generation [Vinyals and Le, ICML 2015]
  • Text summarization [Rush+, EMNLP 2015]
• Multi-task encoder-decoder approaches
  • Multi-Task Encoder-Decoder [Luong+, Arxiv 201511]

(33)

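Applications

NMT today

• The technology is developing rapidly
• Performance is starting to show

The future of NMT

• There is room for methods that handle larger vocabularies
• There is room for methods related to Deep Fusion
• Practically useful techniques such as UNK Replace are important
• Is the day near when NMT replaces the current framework?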
