8. Recurrent Neural Networks: Supplement
Recurrent Neural Networks (Last Week)

[Figure: last week's model. A sentence vector (Vocab) is passed through tanh hidden layers to produce a single label vector (1).]
Recurrent Neural Networks

[Figure: the RNN unrolled over time steps t = 0, t = 1, .... At each step t, the word vector x_t (Vocab) feeds a tanh hidden layer, giving the hidden vector h_t (2); a softmax over the hidden vector gives the POS vector p_t (POS Type), and h_t also feeds the next step's hidden layer.]
Forward Propagation (t = 0)

[Figure: word vector x_0 (Vocab) → tanh → hidden vector h_0 (2) → softmax → POS vector p_0 (POS Type), using matrix w_rx (2, Vocab) and matrix w_oh (POS Type, 2).]

h_0 = tanh(w_rx x_0 + b_r)
p_0 = softmax(w_oh h_0 + b_o)
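To make the t = 0 step concrete, here is a minimal numpy sketch; the toy sizes (Vocab = 5, 3 POS types, hidden size 2), the random initialization, and the softmax helper are illustrative assumptions, not part of the slides.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())               # shift for numerical stability
    return e / e.sum()

w_rx = np.random.randn(2, 5) * 0.1        # matrix w_rx: (2, Vocab)
w_oh = np.random.randn(3, 2) * 0.1        # matrix w_oh: (POS Type, 2)
b_r, b_o = np.zeros(2), np.zeros(3)       # biases

x_0 = np.zeros(5); x_0[2] = 1.0           # one-hot word vector x_0
h_0 = np.tanh(w_rx @ x_0 + b_r)           # h_0 = tanh(w_rx x_0 + b_r)
p_0 = softmax(w_oh @ h_0 + b_o)           # p_0 = softmax(w_oh h_0 + b_o)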
Forward Propagation (t > 0)

[Figure: word vector x_t (Vocab) and previous hidden vector h_{t-1} (2) → tanh → hidden vector h_t (2) → softmax → POS vector p_t (POS Type), using matrix w_rx (2, Vocab), matrix w_rh (2, 2), and matrix w_oh (POS Type, 2).]

h_t = tanh(w_rx x_t + w_rh h_{t-1} + b_r)
p_t = softmax(w_oh h_t + b_o)
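Continuing the same sketch for t > 0, the only new piece is the recurrence matrix w_rh feeding the previous hidden vector back in (w_rh and x_1 are again illustrative).

w_rh = np.random.randn(2, 2) * 0.1              # matrix w_rh: (2, 2)

x_1 = np.zeros(5); x_1[4] = 1.0                 # one-hot word vector x_1
h_1 = np.tanh(w_rx @ x_1 + w_rh @ h_0 + b_r)    # previous hidden vector h_0 feeds in
p_1 = softmax(w_oh @ h_1 + b_o)                 # output layer is unchanged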
Back Propagation 1 (t = max)

[Figure: error flows from the POS vector p_t (POS Type) back through softmax to the hidden vector h_t (2) via matrix w_oh (POS Type, 2).]

δ_o' = d err / d(w_oh h_t + b_o) = p_correct,t − p_t

δ_r = d err / d h_t
    = [d err / d(w_oh h_t + b_o)] · [d(w_oh h_t + b_o) / d h_t]
    = δ_o' w_oh

Δw_oh = d err / d w_oh
      = [d err / d(w_oh h_t + b_o)] · [d(w_oh h_t + b_o) / d w_oh]
      = δ_o' h_t^T

Δb_o = d err / d b_o
     = [d err / d(w_oh h_t + b_o)] · [d(w_oh h_t + b_o) / d b_o]
     = δ_o'
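In the running sketch, the output-layer step at the last time looks as follows; p_correct as a one-hot vector for the true tag is an assumption about the data encoding.

p_correct = np.zeros(3); p_correct[1] = 1.0   # one-hot correct POS tag (assumed)
delta_o = p_correct - p_1                     # δ_o' = p_correct,t − p_t
delta_r = w_oh.T @ delta_o                    # δ_r = δ_o' w_oh (transposed in column-vector form)
dw_oh = np.outer(delta_o, h_1)                # Δw_oh = δ_o' h_t^T
db_o = delta_o                                # Δb_o = δ_o'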
Back Propagation 2

[Figure: the error δ_r at the hidden vector h_t (2) is pushed back through the tanh non-linearity.]

δ_r' = d err / d(w_rh h_{t-1} + w_rx x_t + b_r)
     = [d err / d h_t] · [d h_t / d(w_rh h_{t-1} + w_rx x_t + b_r)]
     = δ_r ⊙ (1 − h_t²)
Back Propagation 3

[Figure: the error δ_r' flows back to the word vector x_t (Vocab) via matrix w_rx (2, Vocab).]

Δw_rx = d err / d w_rx
      = [d err / d(w_rh h_{t-1} + w_rx x_t + b_r)] · [d(w_rh h_{t-1} + w_rx x_t + b_r) / d w_rx]
      = δ_r' x_t^T

Δb_r = d err / d b_r
     = [d err / d(w_rh h_{t-1} + w_rx x_t + b_r)] · [d(w_rh h_{t-1} + w_rx x_t + b_r) / d b_r]
     = δ_r'
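In the running sketch, the two gradients on this slide are one outer product and one copy.

dw_rx = np.outer(delta_r_prime, x_1)          # Δw_rx = δ_r' x_t^T
db_r = delta_r_prime                          # Δb_r = δ_r'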
Back Propagation 4 (t > 0)

[Figure: the error δ_r' flows back to the previous hidden vector h_{t-1} (2) via matrix w_rh (2, 2).]

Δw_rh = d err / d w_rh
      = [d err / d(w_rh h_{t-1} + w_rx x_t + b_r)] · [d(w_rh h_{t-1} + w_rx x_t + b_r) / d w_rh]
      = δ_r' h_{t-1}^T
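And the recurrence-matrix gradient, defined only when a previous hidden vector exists (t > 0).

dw_rh = np.outer(delta_r_prime, h_0)          # Δw_rh = δ_r' h_{t-1}^T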
Back Propagation 1 (t < max)

[Figure: as at t = max, error flows from the POS vector p_t (POS Type) back through matrix w_oh (POS Type, 2), but the hidden vector h_t (2) now also receives error from the next hidden vector h_{t+1} (2) through matrix w_rh (2, 2).]

δ_o' = d err / d(w_oh h_t + b_o) = p_correct,t − p_t

δ_r = d err / d h_t + (error from the next time step)
    = [d err / d(w_oh h_t + b_o)] · [d(w_oh h_t + b_o) / d h_t] + δ_r,t+1' w_rh
    = δ_o' w_oh + δ_r,t+1' w_rh

Δw_oh = d err / d w_oh
      = [d err / d(w_oh h_t + b_o)] · [d(w_oh h_t + b_o) / d w_oh]
      = δ_o' h_t^T

Δb_o = d err / d b_o
     = [d err / d(w_oh h_t + b_o)] · [d(w_oh h_t + b_o) / d b_o]
     = δ_o'
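Putting the back-propagation slides together, a gradient_rnn for the training pseudocode below might be sketched as follows; packing net as a 5-tuple, summing gradients over time, and treating y_correct as a list of tag ids (as the training slide builds it) are assumptions.

def gradient_rnn(net, x, h, p, y_correct):
    w_rx, w_rh, b_r, w_oh, b_o = net
    dw_rx, dw_rh = np.zeros_like(w_rx), np.zeros_like(w_rh)
    db_r = np.zeros_like(b_r)
    dw_oh, db_o = np.zeros_like(w_oh), np.zeros_like(b_o)
    delta_r_prime = np.zeros_like(b_r)            # δ_r' from step t+1 (zero at t = max)
    for t in reversed(range(len(x))):
        p_correct = np.zeros_like(p[t])
        p_correct[y_correct[t]] = 1.0             # tag id → one-hot
        delta_o = p_correct - p[t]                # δ_o' = p_correct,t − p_t
        dw_oh += np.outer(delta_o, h[t])          # Δw_oh = δ_o' h_t^T
        db_o += delta_o                           # Δb_o = δ_o'
        delta_r = w_oh.T @ delta_o + w_rh.T @ delta_r_prime   # extra term only non-zero when t < max
        delta_r_prime = delta_r * (1 - h[t] ** 2) # push through tanh
        dw_rx += np.outer(delta_r_prime, x[t])    # Δw_rx = δ_r' x_t^T
        db_r += delta_r_prime                     # Δb_r = δ_r'
        if t > 0:
            dw_rh += np.outer(delta_r_prime, h[t-1])   # Δw_rh = δ_r' h_{t-1}^T
    return dw_rx, dw_rh, db_r, dw_oh, db_o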
Forward RNN

forward_rnn(net, x)
  h = [ ]   # hidden layer values (at each time t)
  p = [ ]   # output probability distributions (at each time t)
  y = [ ]   # predicted labels (at each time t)
  for each time t in 0 .. len(x)-1:
    if t > 0:
      h[t] = tanh(w_rx x[t] + w_rh h[t-1] + b_r)
    else:
      h[t] = tanh(w_rx x[t] + b_r)
    p[t] = softmax(w_oh h[t] + b_o)
    y[t] = find_max(p[t])
  return h, p, y
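A direct numpy rendering of this pseudocode could look like the sketch below; representing net as a 5-tuple and x as a list of one-hot vectors are assumptions about the data layout.

import numpy as np

def forward_rnn(net, x):
    w_rx, w_rh, b_r, w_oh, b_o = net
    h, p, y = [], [], []
    for t in range(len(x)):
        if t > 0:
            h.append(np.tanh(w_rx @ x[t] + w_rh @ h[t-1] + b_r))
        else:
            h.append(np.tanh(w_rx @ x[t] + b_r))
        z = w_oh @ h[t] + b_o
        e = np.exp(z - z.max())
        p.append(e / e.sum())                     # softmax
        y.append(int(np.argmax(p[t])))            # find_max: most probable tag id
    return h, p, y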
Training

create defaultdict x_ids, y_ids; array data
get len(x_ids), len(y_ids)
for each sentence in the data
  create x_list, y_list
  for each labeled pair x, y in the sentence
    add create_onehot(x, x_ids) to x_list
    add create_ids(y, y_ids) to y_list
  add (x_list, y_list) to feat_lab
initialize net randomly   # w_rx, w_rh, b_r, w_oh, b_o
for I iterations
  for each labeled pair x, y_correct in feat_lab
    h, p, y_predict = forward_rnn(net, x)
    Δ = gradient_rnn(net, x, h, p, y_correct)
    update_weights(net, Δ, λ)
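The helpers this loop relies on are not spelled out on the slide; under the same assumptions they could be as small as the sketch below (create_onehot assumes the id-building pass over the data is already complete, so len(x_ids) is final).

import numpy as np

def create_onehot(token, ids):
    vec = np.zeros(len(ids))          # ids must already contain every token
    vec[ids[token]] = 1.0
    return vec

def update_weights(net, delta, lam):
    for param, grad in zip(net, delta):
        param += lam * grad           # Δ points uphill, since δ_o' = p_correct − p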
Testing

read ids from id_file
read net from weights_file
for each sentence in the data
  create x_list
  for each x in the sentence
    add create_onehot(x, x_ids) to x_list   # one-hot, matching what forward_rnn expects
  h, p, y_list = forward_rnn(net, x_list)
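A hypothetical end-to-end test call, reusing the helpers sketched above; the example sentence is purely illustrative.

sentence = ["time", "flies"]                           # assumed pre-tokenized input
x_list = [create_onehot(w, x_ids) for w in sentence]
h, p, y_list = forward_rnn(net, x_list)
print(y_list)                                          # predicted POS ids, one per word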