[Figure: probabilities of the Shift and Reduce actions]

Syntactic Parsing

30 Bi-LSTM Feature Representation

Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations [Kiperwasser+ 2016]

• Learns global features from the entire input sentence and uses them for dependency parsing


31 Bi-LSTM Feature Representation

Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations [Kiperwasser+ TACL 2016]

Word features are learned from the entire input sentence with a Bi-LSTM.

Simple, yet highly effective for dependency parsing.
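A minimal sketch of the feature extractor, assuming toy random word vectors and untrained LSTM weights without bias terms (the names `LSTMCell` and `bilstm_features` are hypothetical and not taken from the paper's code). Each word's feature is the concatenation of a forward LSTM state, which summarizes the words to its left, and a backward LSTM state, which summarizes the words to its right. As a result, every feature depends on the whole sentence.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """Toy LSTM cell: random, untrained weights and no bias terms (assumption)."""

    def __init__(self, input_dim, hidden_dim, seed):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate (i, f, o) and for the candidate cell g,
        # each applied to the concatenation [x; h_prev].
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(input_dim + hidden_dim)]
                   for _ in range(hidden_dim)]
            for gate in ("i", "f", "o", "g")
        }

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation [x; h_prev]
        i = [sigmoid(v) for v in matvec(self.W["i"], z)]
        f = [sigmoid(v) for v in matvec(self.W["f"], z)]
        o = [sigmoid(v) for v in matvec(self.W["o"], z)]
        g = [math.tanh(v) for v in matvec(self.W["g"], z)]
        c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c_prev, i, g)]
        h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        return h, c

    def run(self, xs):
        """Run the cell over a sequence and return the hidden state at each step."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        states = []
        for x in xs:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


def bilstm_features(xs, fwd, bwd):
    """Feature of word t = [forward state at t; backward state at t]."""
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]  # run right-to-left, then re-align
    return [f + b for f, b in zip(forward, backward)]


# Hypothetical 4-dimensional random embeddings for a toy sentence.
rng = random.Random(42)
words = ["The", "auto", "maker", "sold", "cars"]
emb = {w: [rng.uniform(-1, 1) for _ in range(4)] for w in words}
fwd, bwd = LSTMCell(4, 3, seed=1), LSTMCell(4, 3, seed=2)
features = bilstm_features([emb[w] for w in words], fwd, bwd)
print(len(features), len(features[0]))  # 5 6  (5 words, each a 3 + 3 dim vector)
```

A parser would then score each candidate arc from these vectors, for example by feeding the concatenated head and dependent features to a small MLP. That step is not shown here.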


32 Dependency Parsing as Head Selection

Dependency Parsing as Head Selection [Zhang+ 2016]

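• Learns global features from the entire sentence [Kiperwasser+ 2016]
• Decoding is simplified even further: the head of each word is selected independently (!!)

Note: the output is not guaranteed to be a tree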

The auto maker sold 1000 cars last year.
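A minimal sketch of head-selection decoding, assuming a score matrix has already been computed (the scores below are made up, and `select_heads` and `is_tree` are hypothetical names). Each word independently takes its highest-scoring head. A separate check then shows why the result need not be a tree, under the common convention of exactly one word attached to ROOT.

```python
def select_heads(scores):
    """scores[d][h]: score that word h is the head of word d.

    Index 0 is the artificial ROOT and words are 1..n. Each word picks its
    best head independently (a word may not head itself).
    """
    n = len(scores) - 1
    heads = [None]  # ROOT has no head
    for d in range(1, n + 1):
        candidates = [h for h in range(n + 1) if h != d]
        heads.append(max(candidates, key=lambda h: scores[d][h]))
    return heads


def is_tree(heads):
    """True if exactly one word attaches to ROOT and following heads from
    every word reaches ROOT without entering a cycle."""
    n = len(heads) - 1
    if sum(1 for d in range(1, n + 1) if heads[d] == 0) != 1:
        return False
    for d in range(1, n + 1):
        seen, node = set(), d
        while node != 0:
            if node in seen:
                return False  # cycle
            seen.add(node)
            node = heads[node]
    return True


# Made-up scores for a 3-word sentence; rows are dependents, columns are heads.
scores = [
    [0.0, 0.0, 0.0, 0.0],  # ROOT row (unused)
    [0.1, 0.0, 2.0, 0.5],  # word 1 prefers word 2
    [0.3, 1.8, 0.0, 0.2],  # word 2 prefers word 1
    [1.5, 0.4, 0.2, 0.0],  # word 3 prefers ROOT
]
heads = select_heads(scores)
print(heads, is_tree(heads))  # [None, 2, 1, 0] False  (words 1 and 2 form a cycle)
```

A common remedy, not shown here, is to run a maximum spanning tree algorithm over the same scores whenever the greedy output is not a tree.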


35 Dependency Parsing as Head Selection


• Transition-based and graph-based dependency parsers build the tree bottom-up
• Head selection instead decides the head of each word independently

The auto maker sold 1000 cars last year.

[Figure: for the word "The", a probability is computed for every candidate head, e.g. P(1000 | The), …, P(year | The)]
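The quantities P(1000 | The), …, P(year | The) together form a probability distribution over every candidate head of "The". A minimal sketch follows, assuming the raw scores are already available (the numbers below are invented, and `head_distribution` is a hypothetical name). A softmax over the candidates turns the scores into probabilities, and the predicted head is their argmax.

```python
import math


def head_distribution(scores):
    """Softmax over candidate heads: {head: score} -> {head: probability}."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {h: math.exp(s - m) for h, s in scores.items()}
    total = sum(exps.values())
    return {h: e / total for h, e in exps.items()}


# Invented scores for the candidate heads of "The" in
# "The auto maker sold 1000 cars last year ." (ROOT is also a candidate).
scores_for_the = {
    "ROOT": -1.0, "auto": 0.5, "maker": 3.0, "sold": 1.0,
    "1000": -0.5, "cars": 0.0, "last": -1.5, "year": -0.2, ".": -2.0,
}
p = head_distribution(scores_for_the)
best = max(p, key=p.get)
print(best)  # maker
```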


37 Dependency Parsing as Head Selection


Results on English dependency parsing

• High accuracy
• How well it performs on longer sentences remains to be verified

Syntactic Parsing (Constituency)

38 Tree Linearization

Vinyals et al., “Grammar as a Foreign Language”, Arxiv, 2015

• Treats the problem of predicting a tree structure as sequence modeling, solved with a 3-layer LSTM
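A minimal sketch of depth-first tree linearization in the spirit of Vinyals et al. It assumes trees are nested `(label, children)` tuples whose leaves are POS tags. The token conventions are also assumptions: `(NP` opens a constituent, `)NP` closes it, and words are replaced by their tags. The name `linearize` is hypothetical.

```python
def linearize(tree):
    """Depth-first, bracketed linearization of a constituency tree.

    A tree is either a leaf string (here a POS tag) or a (label, children) tuple.
    """
    if isinstance(tree, str):
        return [tree]
    label, children = tree
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")" + label)  # closing token carries the label too
    return tokens


# "The auto maker sold 1000 cars last year ." with words replaced by POS tags.
tree = ("S", [
    ("NP", ["DT", "NN", "NN"]),
    ("VP", ["VBD", ("NP", ["CD", "NNS"]), ("NP", ["JJ", "NN"])]),
    ".",
])
print(" ".join(linearize(tree)))
# (S (NP DT NN NN )NP (VP VBD (NP CD NNS )NP (NP JJ NN )NP )VP . )S
```

The LSTM is trained to emit this token sequence given the input sentence, so parsing becomes ordinary sequence generation.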

39 Tree Linearization

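• The model outputs an ill-formed tree only 1.5% of the time (surprisingly rare)
• Without attention, accuracy drops substantially
• The final results are roughly on par with conventional methods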
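The LSTM emits tokens one at a time, so nothing forces its output to be a valid tree. The 1.5% figure counts such ill-formed outputs. A minimal sketch of a validity check follows, assuming the `(X` / `)X` bracket conventions (the function `is_well_formed` is hypothetical). A stack verifies three conditions: the brackets are balanced, each closing bracket matches the label of its opening bracket, and there is exactly one top-level constituent.

```python
def is_well_formed(tokens):
    """Check that a linearized tree has balanced, correctly labeled brackets
    and exactly one top-level constituent."""
    stack = []
    roots = 0
    for tok in tokens:
        if tok.startswith("("):
            if not stack:
                roots += 1  # a new top-level constituent starts here
            stack.append(tok[1:])
        elif tok.startswith(")"):
            if not stack or stack.pop() != tok[1:]:
                return False  # unmatched or mislabeled closing bracket
        elif not stack:
            return False  # leaf outside any constituent
    return not stack and roots == 1


ok = "(S (NP DT NN )NP (VP VBD )VP )S".split()
bad_label = "(S (NP DT NN )VP )S".split()
unclosed = "(S (NP DT NN )NP".split()
print(is_well_formed(ok), is_well_formed(bad_label), is_well_formed(unclosed))
# True False False
```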

Syntactic Parsing

40 Other Work

• Span-Based Constituency Parsing with a Structure-Label System and Provably Optimal Dynamic Oracles [Cross+ ACL 2016 Outstanding Paper]
• Global Neural CCG Parsing with Optimality Guarantees [Lee+ EMNLP 2016 Best Paper]: outputs the optimal tree using A* search

41 Sequence-to-Sequence Generation (Sequence-to-Sequence Learning)

Seq2Seq Learning

42 Applications:
• Machine translation
• Automatic summarization
• Question answering
• Dialogue
• Grammatical error correction

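43 Modeling Machine Translation with an RNN

[Figure: the RNN reads the source sentence A B C D followed by <eos>, then generates the translation X Y Z <eos>, feeding each generated word back in as the next input]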

Machine Translation

Sutskever et al., “Sequence to Sequence Learning with Neural Networks”, Arxiv, 2014
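The decoding loop in the figure can be sketched as follows. The sketch assumes hypothetical stand-ins: `encode` takes the place of the encoder RNN, and `decoder_step` wraps a hard-coded lookup table rather than a neural network. Only the control flow is meant to match the figure. Decoding starts from `<eos>`, and each predicted token is fed back as the next input until `<eos>` is produced again.

```python
# Hypothetical stand-in for a trained decoder: it maps the previous output
# token to the next one and ignores the state, which a real RNN would use.
TOY_DECODER = {"<eos>": "X", "X": "Y", "Y": "Z", "Z": "<eos>"}


def encode(source_tokens):
    """Stand-in encoder: a real one would run an RNN over the source and
    return its final hidden state; here we just keep the tokens."""
    return tuple(source_tokens)


def decoder_step(state, prev_token):
    """Stand-in decoder step returning (new_state, predicted_token)."""
    return state + (prev_token,), TOY_DECODER[prev_token]


def greedy_decode(source_tokens, max_len=20):
    state = encode(source_tokens)
    prev = "<eos>"  # as in the figure, <eos> after the source starts decoding
    output = []
    for _ in range(max_len):
        state, token = decoder_step(state, prev)
        if token == "<eos>":
            break
        output.append(token)
        prev = token  # feed the prediction back as the next input
    return output


print(greedy_decode(["A", "B", "C", "D"]))  # ['X', 'Y', 'Z']
```

The `max_len` cap guards against a decoder that never emits `<eos>`.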

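44 Attention-Based RNN

[Figure: the same encoder-decoder (source A B C D <eos>, output X Y Z <eos>), with attention from each decoding step to the source positions]

The model learns where in the source sentence to "attend" while translating.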

Bahdanau et al., “Neural Machine Translation by Jointly Learning to Align and Translate”, ICLR, 2015
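A minimal sketch of one attention step in the additive form used by Bahdanau et al., e_j = vᵀ tanh(W s + U h_j). The sketch assumes tiny hand-picked weights instead of learned ones; `attend` and all weight values are hypothetical. The scores over the encoder states h_1..h_n are normalized with a softmax. The resulting context vector is their weighted sum, which the decoder uses when emitting its next word.

```python
import math


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(s_prev, enc_states, W, U, v):
    """Additive attention: e_j = v . tanh(W s_prev + U h_j)."""
    Ws = matvec(W, s_prev)
    scores = []
    for h in enc_states:
        hidden = [math.tanh(a + b) for a, b in zip(Ws, matvec(U, h))]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    alphas = softmax(scores)  # attention weights over source positions, sum to 1
    dim = len(enc_states[0])
    context = [sum(a * h[k] for a, h in zip(alphas, enc_states)) for k in range(dim)]
    return alphas, context


# Toy 2-dim encoder states for source words A, B, C, D and a decoder state.
enc_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
s_prev = [0.5, -0.5]
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
alphas, context = attend(s_prev, enc_states, W, U, v)
print([round(a, 3) for a in alphas], context)
```

The decoder state changes at every output step, so the weights are recomputed each time. This is how the model can focus on different source words for different target words.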


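Problems with Word-Based Generation Models

48 In sequence models that output words, the output layer is expensive to compute: at every step, a softmax must be computed over the entire vocabulary

[Figure: input layer ~10⁵ dimensions, hidden layer ~10² dimensions, output layer ~10⁵ dimensions (= vocabulary size) normalized by a softmax function]

• Weak on unknown words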
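To make the cost concrete, here is a minimal sketch that assumes the slide's sizes: a vocabulary of V = 10⁵ words and a hidden layer of H = 10² units. The function names are hypothetical. Every decoding step multiplies the hidden vector by a V × H matrix and normalizes V scores with a softmax. A word outside the fixed vocabulary can only be represented by a single `<unk>` token, which is an assumed but common convention.

```python
import math


def output_layer_cost(vocab_size, hidden_size):
    """Multiply-adds per decoding step for logits = W h, with W of shape V x H."""
    return vocab_size * hidden_size


def softmax_over_vocab(logits):
    """The normalizer sums over every vocabulary entry, so cost grows with V."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_ids(tokens, vocab):
    """Map tokens to ids; out-of-vocabulary words collapse to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(t, unk) for t in tokens]


print(output_layer_cost(10**5, 10**2))  # 10000000 multiply-adds per output word

vocab = {"<unk>": 0, "the": 1, "auto": 2, "maker": 3}
print(to_ids(["the", "automaker", "maker"], vocab))  # [1, 0, 3]
```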

Sequence-to-Sequence Learning

49 Subword-Based Machine Translation

Neural Machine Translation of Rare Words with Subword Units [Sennrich+ ACL 2016]

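• Words are segmented with byte pair encoding (BPE) [Gage 1994], a compression method that repeatedly replaces the most frequent pair of adjacent symbols with a new single symbol
• In machine translation, words need not be segmented by the same criteria that humans use

Example: merging the pair "a b" turns a b c d e f into ab c d e f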
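A minimal sketch of learning BPE merges for word segmentation, as adapted by Sennrich et al. It assumes a toy word-frequency table and an end-of-word marker `</w>`. The function names `merge_pair` and `learn_bpe` are hypothetical; this is not the authors' released script. The procedure counts adjacent symbol pairs weighted by word frequency, merges the most frequent pair everywhere, and repeats.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from {word: frequency}."""
    # Start from characters, with an end-of-word marker so merges stay within words.
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the pair counted first
        merges.append(best)
        vocab = {merge_pair(s, best): f for s, f in vocab.items()}
    return merges


# The slide's example: merging the pair ("a", "b") shortens the sequence by one symbol.
print(merge_pair(("a", "b", "c", "d", "e", "f"), ("a", "b")))
# ('ab', 'c', 'd', 'e', 'f')

print(learn_bpe({"ab": 10, "abc": 5, "cd": 1}, 2))
# [('a', 'b'), ('ab', '</w>')]
```

The number of merges sets the size of the subword vocabulary. Rare words fall back to smaller pieces instead of becoming unknown tokens.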

50 Subword-Based Machine Translation

English ↔ German machine translation
