
NLC EMNLP: recent update history, Ryo Masumura: Web


Academic year: 2018


(1)

International conference report: EMNLP2015 (2)

~ From a deep learning perspective ~

NTT Media Intelligence Laboratories

Ryo Masumura

(2)

EMNLP2015

How much research is related to deep learning?

(3)

Number of EMNLP 2015 papers whose titles contain any of the following terms: 70 / 312

neural, lstm, recursive, rnn, recurrent, cnn, convolution, dnn, deep, embedding, distributed representation
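The count above comes from matching keywords against paper titles. A minimal sketch of such a count follows, assuming case-insensitive substring matching; the matching rule, the function names, and the sample titles (two taken from the reference list of this talk, one invented) are assumptions for illustration, not the procedure actually used.

```python
# Keywords listed on the slide.
KEYWORDS = [
    "neural", "lstm", "recursive", "rnn", "recurrent", "cnn",
    "convolution", "dnn", "deep", "embedding", "distributed representation",
]

def is_deep_learning_title(title, keywords=KEYWORDS):
    """Return True if the title contains any keyword (case-insensitive substring match)."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in keywords)

def count_matching(titles):
    """Count the titles that contain at least one keyword."""
    return sum(1 for title in titles if is_deep_learning_title(title))

sample_titles = [
    "Document Modeling with Gated Recurrent Neural Network for Sentiment Classification",
    "Effective Approaches to Attention-based Neural Machine Translation",
    "A Survey of Treebank Annotation Schemes",  # hypothetical title with no keyword
]
matched = count_matching(sample_titles)
print(f"{matched} / {len(sample_titles)}")
```

Substring matching means "convolution" also catches "convolutional", which is probably the intent of such a keyword list.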

(4)

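Covering all 70 papers in one talk is impractical.

Instead, this talk focuses on two topics and introduces a few interesting papers from each:

- Research related to distributed representations

- Research related to encoder-decoders

Note: studies that apply RNNs or LSTMs to sequence labeling are excluded from this talk.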

(5)

Research related to distributed representations

(6)

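Main topic from about a year ago to now

- Representing variable-length sequences such as sentences and documents as fixed-length vectors

Main topic about two years ago

- Distributed representations of words, the so-called Word2Vec

- Packaging of word representations into tools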

(7)

○ Previous work

Word Vector

• CBOW, Skip-gram [Mikolov+, NIPS 2013]

• GloVe [Pennington+, EMNLP 2014]

Sentence Vector

• Average of word vectors

• Recursive-NN [Socher+, EMNLP 2011]

• Paragraph Vector [Le and Mikolov, ICML 2014]

• LSTM-RNN [Tai+, ACL 2015]

• CNN [Kalchbrenner+, ACL 2014][Kim, EMNLP 2014]

• Recursive LSTM [Tai+, ACL 2015][Zhu+, ICML 2015]
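The simplest sentence vector in the list above is the average of the word vectors. A minimal sketch follows, assuming word vectors are stored in a plain dictionary and unknown words are skipped; the toy two-dimensional vectors and the function name are invented for illustration.

```python
def average_word_vectors(tokens, word_vectors):
    """Average the vectors of the known tokens into one fixed-length sentence vector."""
    known = [word_vectors[token] for token in tokens if token in word_vectors]
    if not known:
        raise ValueError("no token has a word vector")
    dim = len(known[0])
    return [sum(vector[i] for vector in known) / len(known) for i in range(dim)]

# Toy 2-dimensional word vectors (hypothetical values).
toy_vectors = {
    "good": [1.0, 0.0],
    "food": [0.0, 1.0],
    "bad": [-1.0, 0.0],
}

# "unknown" has no vector and is skipped.
sentence_vector = average_word_vectors(["good", "food", "unknown"], toy_vectors)
print(sentence_vector)
```

The result has the same dimension whatever the sentence length, which is the property the other models in the list also provide, but averaging ignores word order.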

(8)

[Figure: building a sentence vector and passing it to a classifier with a CNN, an RNN or LSTM, and a Recursive NN. Layer labels: embedding, convolution (window size: 2), pooling, softmax.]

(9)

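Research at EMNLP2015

• Improving distributed representations of sentences

[Mou+, EMNLP 2015] [Lei+, EMNLP 2015]

• Distributed representations of documents containing multiple sentences

[Tang+, EMNLP 2015][Lin+, EMNLP 2015][Liu+, EMNLP 2015]

• Directly modeling the relationship between two sentences

[Zhang+, EMNLP2015][He+, EMNLP 2015]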

(10)

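Featured paper 1

Document Modeling with Gated Recurrent Neural Network for Sentiment Classification [Tang+, EMNLP 2015]

□ Research problem

- Classify sentiment at the level of a document that contains multiple sentences

- Previous approaches cannot capture relations that span sentence vectors

□ Content

- Proposes a model that explicitly treats a document as containing multiple sentences and captures the relations among the sentence vectors with a gated RNN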

(11)

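[Figure: word vectors are composed into sentence vectors (sentence 1, sentence 2, ..., sentence N), which are composed into a document vector.]

Supervised training by backpropagation on pairs of a document and its label.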
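The word, sentence, document hierarchy can be sketched as follows. This is only an illustration under strong assumptions: sentence vectors are plain averages of word vectors (standing in for the CNN or LSTM composition), the gated recurrent step has no learned weight matrices, and nothing is trained. All names and toy values are hypothetical.

```python
import math

def average(vectors):
    """Element-wise mean of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_step(state, sentence_vec):
    """One simplified gated update: the gate z mixes the old state with a candidate."""
    new_state = []
    for h, s in zip(state, sentence_vec):
        z = sigmoid(h + s)            # update gate (no learned weights in this sketch)
        candidate = math.tanh(h + s)  # candidate state from the current sentence
        new_state.append((1.0 - z) * h + z * candidate)
    return new_state

def document_vector(document, word_vectors):
    """Compose word vectors into sentence vectors, then run the gated recurrence over sentences."""
    dim = len(next(iter(word_vectors.values())))
    state = [0.0] * dim
    for sentence in document:
        sentence_vec = average([word_vectors[w] for w in sentence])
        state = gated_step(state, sentence_vec)
    return state

toy_vectors = {"great": [1.0, 0.0], "service": [0.0, 1.0], "slow": [-1.0, 0.0]}
doc = [["great", "service"], ["slow", "service"]]
doc_vec = document_vector(doc, toy_vectors)
print(doc_vec)
```

In the paper's setting the resulting document vector would feed a softmax classifier and all parameters would be learned by backpropagation; this sketch only shows the data flow.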

(12)

Evaluation on document-level sentiment classification tasks

                                       Restaurant Review   Movie Review
                                       (5 class)           (10 class)
SVM - Bag of Words                     61.1                39.9
SVM - Average of Word Embedding        56.8                31.9
Sentence Softmax - Paragraph Vector    60.5                34.1
Sentence Softmax - CNN                 61.5                37.6
Document Softmax - CNN-GRNN            66.0                42.5
Document Softmax - LSTM-GRNN           67.6                45.3

• Restaurant Review: 9 sentences and 150 words per document

• Movie Review: 14 sentences and 325 words per document

• Stanford Sentiment Treebank: 1 sentence and 19 words

(13)

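Featured paper 2

Hierarchical Recurrent Neural Network for Document Modeling [Lin+, EMNLP 2015]

□ Research problem

- A language model applicable to data containing multiple sentences

- Conventional techniques cannot take the order of sentences in a document into account

□ Content

- Proposes a model that hierarchically combines an RNN capturing the order of sentences in a document with the usual word-level RNN (RNN language model)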

(14)

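[Figure: two RNNs combined hierarchically.]

- RNN governing the order of sentences (each sentence is captured as a bag of words)

- RNN governing the order of words (the so-called RNNLM)

Unsupervised training on a collection of documents: first train the sentence-order model, then fix it and train the word-order model.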

(15)

Perplexity on the IWSLT2014 corpus, trained and evaluated on documents containing multiple sentences

                      Perplexity
RNNLM                 183
Hierarchical RNNLM    174
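For reference, perplexity is the exponential of the average negative log-probability that the model assigns to each token. A minimal sketch follows, assuming the per-token probabilities are already available as a list; the function name and the toy inputs are illustrative.

```python
import math

def perplexity(token_probabilities):
    """exp of the mean negative log-probability over the evaluated tokens."""
    if not token_probabilities:
        raise ValueError("need at least one token probability")
    total_log_prob = sum(math.log(p) for p in token_probabilities)
    return math.exp(-total_log_prob / len(token_probabilities))

# A model that spreads probability uniformly over 183 choices at every step
# has perplexity 183, which gives an intuition for the RNNLM row above.
uniform = perplexity([1.0 / 183] * 50)
print(round(uniform, 3))
```

Lower is better: going from 183 to 174 means the hierarchical model is, on average, slightly less uncertain about each next word.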

(16)

○ How are the word vectors needed to build sentence vectors constructed?

Paper                   Method      Data        Dim   Tuning
[Zhang+, EMNLP 2015]    Word2Vec    Other       128   Fine-tuned
[Mou+, EMNLP 2015]      Word2Vec    In-Domain   300   Fixed
[Lei+, EMNLP 2015]      GloVe       Other       512   Fixed
[Zhang+, EMNLP 2015]    Skip-gram   Other       50    ?
[Kim, EMNLP 2014]       CBOW        Other       300   Fine-tuned
[Liu+, EMNLP 2015]      Word2Vec    Other       100   Fine-tuned
[Tang+, EMNLP 2015]     Skip-Gram   In-Domain   200   Fixed
[Zeng+, EMNLP 2015]     Skip-Gram   In-Domain   50    Fixed

(17)

Fine-tuning is essential for word vectors built from out-of-task data

[Figure: experiments that vary the training data for the word vectors, with and without fine-tuning, in opinion mining [Liu+, EMNLP 2015].]

(18)

Research related to encoder-decoders

(19)

Encoder-Decoder

• Neural Machine Translation [Kalchbrenner and Blunsom, EMNLP 2013][Cho+, EMNLP 2014][Sutskever+, NIPS 2014]

• Neural Conversational Model [Vinyals and Le, Arxiv 2015]

[Figure: encoder-decoder with Input: A B C D and Output: X Y Z. The encoder reads A B C D <EOS> and turns the input into a fixed-length vector; the decoder outputs X Y Z <EOS>.]

(20)

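○ Recent encoder-decoders = attention-based models

• Neural Machine Translation [Bahdanau+, ICLR 2015]

• Speech Recognition [Chorowski+, Arxiv 2015]

• Image Caption Generation [Xu+, ICML 2015]

[Figure: encoder-decoder with Input: A B C D and Output: X Y Z, with attention weights on a 0 to 1 scale over the inputs A B C D.]

At decoding time, the outputs of the input-side hidden layers are weighted and summed before use (this decides where to attend).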
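The weighted sum described above can be sketched as follows. The scoring function here is a plain dot product between the decoder state and each encoder hidden state, which is an assumption made for brevity; the cited models use their own, partly learned, scoring functions. All names and toy values are hypothetical.

```python
import math

def softmax(scores):
    """Normalize scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(decoder_state, encoder_states):
    """Return the attention weights and the weighted sum of the encoder hidden states."""
    weights = softmax([dot(decoder_state, h) for h in encoder_states])
    dim = len(encoder_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context

# Three toy encoder hidden states (one per input word) and one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print(weights, context)
```

The weights are recomputed at every decoding step, so each output word can draw on a different part of the input instead of a single fixed-length vector.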

(21)

Research at EMNLP2015

• State-of-the-art neural machine translation

[Jean+, WMT 2015]

• Improvements to attention-based approaches

[Luong+, EMNLP 2015]

• Use of encoder-decoders for other tasks

- Response generation for task-oriented dialogue [Wen+, EMNLP 2015]

- Summary generation [Rush+, EMNLP 2015]

- Compressed-sentence generation [Filippova+, EMNLP 2015]

(22)

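Featured paper 1 (Best paper)

Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems [Wen+, EMNLP 2015]

□ Research problem

• Building a system with hand-crafted mechanisms is difficult

• The goal is a framework that takes a dialog act as input and outputs a sentence

□ Content

• A framework that feeds dialog act information into an LSTM to generate the response sentence

• A cell that controls the input dialog act is introduced directly into the LSTM decoder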

(23)

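How dialog act to sentence generation works

□ Goal

- Generate the sentence word by word using the dialog act information and the word context

- Input information that has already been generated should not be attended to again

Note: previously, already-generated information was removed heuristically [Wen+, SIGDIAL 2015]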

(24)

Semantically Controlled LSTM cell

Automatically controls how the dialog act information is used when each word is generated

(25)

Evaluation on a hotel search task with BLEU and slot error rate (ERR)

                           BLEU    ERR (%)
Hand-Crafted               0.56    0.00
K-NN                       0.68    1.87
LSTM without heuristics    0.81    1.93
LSTM with heuristics       0.81    1.53
SC LSTM                    0.80    0.78
Deep SC LSTM               0.83    0.41

Input:

inform type=hotel, count= , dogsallowed=dontcare

Output:

There are 182 hotels if you do not care whether dogs are allowed

(26)

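Featured paper 2

A Neural Attention Model for Abstractive Sentence Summarization [Rush+, EMNLP 2015]

□ Research problem

• Targets abstractive (generative) rather than extractive sentence summarization

• Conventional abstractive sentence summarization is based on classical statistical machine translation

□ Content

• Proposes sentence summarization with an attention-based encoder-decoder method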

(27)

○ Attention during summary generation

[Figure: weights over each word of the source sentence when generating the summary word "calls".]

(28)

Trained and evaluated using the DUC-2004 corpus

                                                            ROUGE-1
Sentence Compression                                        19.77
Traditional SMT (MOSES) with MERT                           26.50
Attention-based NMT (the NMT in this work is NNLM-based)    26.55
Attention-based NMT with MERT                               28.18

Source sentence

australian foreign minister stephen smith sunday congratulated new zealand 's new prime minister-elect john key as he praised ousted leader helen clark as a "gutsy and respected politician" .

Summary (compare with the human-written one)

australian foreign minister congratulates smith new zealand as leader .
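ROUGE-1 measures unigram overlap between a generated summary and a human-written reference. A minimal recall-oriented sketch follows, assuming whitespace tokenization and a single reference; the reference sentence below is invented for illustration because the slide does not reproduce the human-written summary, and official ROUGE scoring has further options not covered here.

```python
from collections import Counter

def rouge_1_recall(candidate, reference):
    """Fraction of reference unigrams that also appear in the candidate (clipped counts)."""
    candidate_counts = Counter(candidate.split())
    reference_counts = Counter(reference.split())
    if not reference_counts:
        raise ValueError("reference must contain at least one token")
    overlap = sum(
        min(count, candidate_counts[token])
        for token, count in reference_counts.items()
    )
    return overlap / sum(reference_counts.values())

candidate = "australian foreign minister congratulates smith new zealand as leader ."
# Hypothetical reference, NOT the actual human-written summary from the corpus.
reference = "australia congratulates new zealand prime minister-elect"
score = rouge_1_recall(candidate, reference)
print(score)
```

Because only unigrams are counted, a summary such as the one above can score reasonably while still getting the meaning wrong (here, who congratulated whom).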

(29)

Conclusion

(30)

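EMNLP2015

About 25% of the research is related to deep learning

□ From distributed representations of sentences (about 20 words) to distributed representations of documents (100-300 words, multiple sentences)

□ Encoder-decoders are still developing but steadily progressing, and are being applied to a variety of generation tasks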

(31)

○ What's next?

□ Still important!

- Domain adaptation for encoder-decoder approaches

□ Rarely seen, so perhaps a promising target?

- Use of Connectionist Temporal Classification [Graves+, ICML 2006]

(32)

[Luong+, EMNLP 2015]

Minh-Thang Luong, Hieu Pham, and Christopher D. Manning, "Effective Approaches to Attention-based Neural Machine Translation", In Proc. EMNLP, pp.1412-1421, 2015.

[Tang+, EMNLP 2015]

Duyu Tang, Bing Qin, and Ting Liu, "Document Modeling with Gated Recurrent Neural Network for Sentiment Classification", In Proc. EMNLP, pp.1422-1432, 2015.

[Tai+, ACL 2015]

Kai Sheng Tai, Richard Socher, and Christopher D. Manning, "Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks", In Proc. ACL, pp.1556-1566, 2015.

[Lin+, EMNLP 2015]

Rui Lin, Shujie Liu, Muyun Yang, Mu Li, Ming Zhou, and Sheng Li, "Hierarchical Recurrent Neural Network for Document Modeling", In Proc. EMNLP, pp. -907, 2015.

[Lei+, EMNLP 2015]

Tao Lei, Regina Barzilay, and Tommi Jaakkola, "Molding CNNs for text: non-linear, non-consecutive convolutions", In Proc. EMNLP, pp. -1575, 2015.

(33)

[Mou+, EMNLP 2015]

Lili Mou, Hao Peng, Ge Li, Yan Xu, Lu Zhang, and Zhi Jin, "Discriminative Neural Sentence Modeling by Tree-Based Convolution", In Proc. EMNLP, pp. -2325, 2015.

[Liu+ EMNLP 2015a]

Pengfei Liu, Xipeng Qiu, Xinchi Chen, Shiyu Wu, and Xuanjing Huang, "Multi-Timescale Long Short-Term Memory Neural Network for Modelling Sentences and Documents", In Proc. EMNLP, pp.2326-2335, 2015.

[Liu+ EMNLP 2015b]

Pengfei Liu, Shafiq Joty, and Helen Meng, "Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings", In Proc. EMNLP, pp. -1443, 2015.

[Zhang+, EMNLP 2015]

Biao Zhang, Jinsong Su, Deyi Xiong, Yaojie Lu, Hong Duan, and Junfeng Yao, "Shallow Convolutional Neural Network for Implicit Discourse Relation Recognition", In Proc. EMNLP, pp.2230-2235, 2015.

[He+, EMNLP 2015]

Hua He, Kevin Gimpel, and Jimmy Lin, "Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks", In Proc. EMNLP, pp. -1586, 2015.

(34)

[Kalchbrenner+, ACL 2014]

Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom, "A Convolutional Neural Network for Modelling Sentences", In Proc. ACL, pp. -665, 2014.

[Rush+, EMNLP 2015]

Alexander M. Rush, Sumit Chopra, and Jason Weston, "A Neural Attention Model for Abstractive Sentence Summarization", In Proc. EMNLP, pp. -389, 2015.

[Wen+, SIGDIAL 2015]

Tsung-Hsien Wen, Milica Gasic, Dongho Kim, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young, "Stochastic Language Generation in Dialogue using Recurrent Neural Networks with Convolutional Sentence Reranking", In Proc. SIGDIAL, pp. -284, 2015.

[Wen+, EMNLP 2015]

Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young, "Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems", In Proc. EMNLP, pp. -1721, 2015.

[Filippova+, EMNLP 2015]

Katja Filippova, Enrique Alfonseca, Carlos A. Colmenares, Lukasz Kaiser, and Oriol Vinyals, "Sentence Compression by Deletion with LSTMs", In Proc. EMNLP, pp. -368, 2015.

(35)

[Kim, EMNLP 2014]

Yoon Kim, "Convolutional Neural Networks for Sentence Classification", In Proc. EMNLP, pp.1746-1751, 2014.

[Zeng+, EMNLP 2015]

Daojian Zeng, Kang Liu, Yubo Chen, and Jun Zhao, "Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks", In Proc. EMNLP, pp. -1762, 2015.

[Pennington+, EMNLP 2014]

Jeffrey Pennington, Richard Socher, and Christopher D. Manning, "Glove: Global Vectors for Word Representation", In Proc. EMNLP, pp.1532-1543, 2014.

[Socher+, EMNLP 2011]

Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning, "Semi-supervised recursive autoencoders for predicting sentiment distributions", In Proc. EMNLP, pp. -161, 2011.

[Le and Mikolov, ICML 2014]

Quoc V. Le and Tomas Mikolov, "Distributed representations of sentences and documents", In Proc. ICML, pp. -1196, 2014.

(36)

[Zhu+, ICML 2015]

Xiaodan Zhu, Parinaz Sobhani, and Hongyu Guo, "Long Short-Term Memory Over Tree Structures", In Proc. ICML, pp. -1612, 2015.

[Kalchbrenner and Blunsom, EMNLP 2013]

Nal Kalchbrenner and Phil Blunsom, "Recurrent Continuous Translation Models", In Proc. EMNLP, pp.1700-1709, 2013.

[Sutskever+, NIPS 2014]

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, "Sequence to Sequence Learning with Neural Networks", In Proc. NIPS

[Cho+, EMNLP 2014]

Kyunghyun Cho, Bart Van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, "Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation", In Proc. EMNLP, pp.1724-1734, 2014.

[Vinyals and Le, Arxiv 2015]

Oriol Vinyals and Quoc V. Le, "A Neural Conversational Model", Arxiv, 2015.

(37)

[Bahdanau+, Arxiv 2014]

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, "Neural Machine Translation by jointly learning to align and translate", Arxiv, 2014.

[Bahdanau+, Arxiv 2015]

Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, and Yoshua Bengio, "End-to-End Attention-based Large Vocabulary Speech Recognition", Arxiv, 2015.

[Xu+, ICML 2015]

Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel, and Yoshua Bengio, "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", In Proc. ICML, pp. -2057, 2015.
