8. Recurrent Neural Networks: Supplement
Recurrent Neural Networks (Last Week)

[Figure: last week's model. A sentence vector (Vocab) is passed through tanh hidden layers to produce a single label vector (1).]
Recurrent Neural Networks

[Figure: the RNN unrolled over time steps t = 0, t = 1, .... At each step t, the word vector x_t (Vocab) feeds a tanh hidden layer, giving the hidden vector h_t (2); a softmax over the hidden vector gives the POS vector p_t (POS Type), and h_t also feeds the next step's hidden layer.]
Forward Propagation (t = 0)

[Figure: word vector x_0 (Vocab) → tanh → hidden vector h_0 (2) → softmax → POS vector p_0 (POS Type), using matrix w_rx (2, Vocab) and matrix w_oh (POS Type, 2).]

h_0 = tanh(w_rx x_0 + b_r)
p_0 = softmax(w_oh h_0 + b_o)
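To make the t = 0 step concrete, here is a minimal numpy sketch; the toy sizes (Vocab = 5, 3 POS types, hidden size 2), the random initialization, and the softmax helper are illustrative assumptions, not part of the slides.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())               # shift for numerical stability
    return e / e.sum()

w_rx = np.random.randn(2, 5) * 0.1        # matrix w_rx: (2, Vocab)
w_oh = np.random.randn(3, 2) * 0.1        # matrix w_oh: (POS Type, 2)
b_r, b_o = np.zeros(2), np.zeros(3)       # biases

x_0 = np.zeros(5); x_0[2] = 1.0           # one-hot word vector x_0
h_0 = np.tanh(w_rx @ x_0 + b_r)           # h_0 = tanh(w_rx x_0 + b_r)
p_0 = softmax(w_oh @ h_0 + b_o)           # p_0 = softmax(w_oh h_0 + b_o)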
Forward Propagation (t > 0)

[Figure: word vector x_t (Vocab) and previous hidden vector h_{t-1} (2) → tanh → hidden vector h_t (2) → softmax → POS vector p_t (POS Type), using matrix w_rx (2, Vocab), matrix w_rh (2, 2), and matrix w_oh (POS Type, 2).]

h_t = tanh(w_rx x_t + w_rh h_{t-1} + b_r)
p_t = softmax(w_oh h_t + b_o)
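Continuing the same sketch for t > 0, the only new piece is the recurrence matrix w_rh feeding the previous hidden vector back in (w_rh and x_1 are again illustrative).

w_rh = np.random.randn(2, 2) * 0.1              # matrix w_rh: (2, 2)

x_1 = np.zeros(5); x_1[4] = 1.0                 # one-hot word vector x_1
h_1 = np.tanh(w_rx @ x_1 + w_rh @ h_0 + b_r)    # previous hidden vector h_0 feeds in
p_1 = softmax(w_oh @ h_1 + b_o)                 # output layer is unchanged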
Back Propagation 1 (t = max)

[Figure: error flows from the POS vector p_t (POS Type) back through softmax to the hidden vector h_t (2) via matrix w_oh (POS Type, 2).]

δ_o' = d err / d(w_oh h_t + b_o) = p_correct,t − p_t

δ_r = d err / d h_t
    = [d err / d(w_oh h_t + b_o)] · [d(w_oh h_t + b_o) / d h_t]
    = δ_o' w_oh

Δw_oh = d err / d w_oh
      = [d err / d(w_oh h_t + b_o)] · [d(w_oh h_t + b_o) / d w_oh]
      = δ_o' h_t^T

Δb_o = d err / d b_o
     = [d err / d(w_oh h_t + b_o)] · [d(w_oh h_t + b_o) / d b_o]
     = δ_o'
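In the running sketch, the output-layer step at the last time looks as follows; p_correct as a one-hot vector for the true tag is an assumption about the data encoding.

p_correct = np.zeros(3); p_correct[1] = 1.0   # one-hot correct POS tag (assumed)
delta_o = p_correct - p_1                     # δ_o' = p_correct,t − p_t
delta_r = w_oh.T @ delta_o                    # δ_r = δ_o' w_oh (transposed in column-vector form)
dw_oh = np.outer(delta_o, h_1)                # Δw_oh = δ_o' h_t^T
db_o = delta_o                                # Δb_o = δ_o'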
Back Propagation 2

[Figure: the error δ_r at the hidden vector h_t (2) is pushed back through the tanh non-linearity.]

δ_r' = d err / d(w_rh h_{t-1} + w_rx x_t + b_r)
     = [d err / d h_t] · [d h_t / d(w_rh h_{t-1} + w_rx x_t + b_r)]
     = δ_r ⊙ (1 − h_t²)
Back Propagation 3

[Figure: the error δ_r' flows back to the word vector x_t (Vocab) via matrix w_rx (2, Vocab).]

Δw_rx = d err / d w_rx
      = [d err / d(w_rh h_{t-1} + w_rx x_t + b_r)] · [d(w_rh h_{t-1} + w_rx x_t + b_r) / d w_rx]
      = δ_r' x_t^T

Δb_r = d err / d b_r
     = [d err / d(w_rh h_{t-1} + w_rx x_t + b_r)] · [d(w_rh h_{t-1} + w_rx x_t + b_r) / d b_r]
     = δ_r'
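In the running sketch, the two gradients on this slide are one outer product and one copy.

dw_rx = np.outer(delta_r_prime, x_1)          # Δw_rx = δ_r' x_t^T
db_r = delta_r_prime                          # Δb_r = δ_r'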
Back Propagation 4 (t > 0)

[Figure: the error δ_r' flows back to the previous hidden vector h_{t-1} (2) via matrix w_rh (2, 2).]

Δw_rh = d err / d w_rh
      = [d err / d(w_rh h_{t-1} + w_rx x_t + b_r)] · [d(w_rh h_{t-1} + w_rx x_t + b_r) / d w_rh]
      = δ_r' h_{t-1}^T
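And the recurrence-matrix gradient, defined only when a previous hidden vector exists (t > 0).

dw_rh = np.outer(delta_r_prime, h_0)          # Δw_rh = δ_r' h_{t-1}^T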
Back Propagation 1 (t < max)

[Figure: as at t = max, error flows from the POS vector p_t (POS Type) back through matrix w_oh (POS Type, 2), but the hidden vector h_t (2) now also receives error from the next hidden vector h_{t+1} (2) through matrix w_rh (2, 2).]

δ_o' = d err / d(w_oh h_t + b_o) = p_correct,t − p_t

δ_r = d err / d h_t + (error from the next time step)
    = [d err / d(w_oh h_t + b_o)] · [d(w_oh h_t + b_o) / d h_t] + δ_r,t+1' w_rh
    = δ_o' w_oh + δ_r,t+1' w_rh

Δw_oh = d err / d w_oh
      = [d err / d(w_oh h_t + b_o)] · [d(w_oh h_t + b_o) / d w_oh]
      = δ_o' h_t^T

Δb_o = d err / d b_o
     = [d err / d(w_oh h_t + b_o)] · [d(w_oh h_t + b_o) / d b_o]
     = δ_o'
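Putting the back-propagation slides together, a gradient_rnn for the training pseudocode below might be sketched as follows; packing net as a 5-tuple, summing gradients over time, and treating y_correct as a list of tag ids (as the training slide builds it) are assumptions.

def gradient_rnn(net, x, h, p, y_correct):
    w_rx, w_rh, b_r, w_oh, b_o = net
    dw_rx, dw_rh = np.zeros_like(w_rx), np.zeros_like(w_rh)
    db_r = np.zeros_like(b_r)
    dw_oh, db_o = np.zeros_like(w_oh), np.zeros_like(b_o)
    delta_r_prime = np.zeros_like(b_r)            # δ_r' from step t+1 (zero at t = max)
    for t in reversed(range(len(x))):
        p_correct = np.zeros_like(p[t])
        p_correct[y_correct[t]] = 1.0             # tag id → one-hot
        delta_o = p_correct - p[t]                # δ_o' = p_correct,t − p_t
        dw_oh += np.outer(delta_o, h[t])          # Δw_oh = δ_o' h_t^T
        db_o += delta_o                           # Δb_o = δ_o'
        delta_r = w_oh.T @ delta_o + w_rh.T @ delta_r_prime   # extra term only non-zero when t < max
        delta_r_prime = delta_r * (1 - h[t] ** 2) # push through tanh
        dw_rx += np.outer(delta_r_prime, x[t])    # Δw_rx = δ_r' x_t^T
        db_r += delta_r_prime                     # Δb_r = δ_r'
        if t > 0:
            dw_rh += np.outer(delta_r_prime, h[t-1])   # Δw_rh = δ_r' h_{t-1}^T
    return dw_rx, dw_rh, db_r, dw_oh, db_o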
Forward RNN

forward_rnn(net, x)
  h = [ ]   # hidden layer values (at each time t)
  p = [ ]   # output probability distributions (at each time t)
  y = [ ]   # predicted labels (at each time t)
  for each time t in 0 .. len(x)-1:
    if t > 0:
      h[t] = tanh(w_rx x[t] + w_rh h[t-1] + b_r)
    else:
      h[t] = tanh(w_rx x[t] + b_r)
    p[t] = softmax(w_oh h[t] + b_o)
    y[t] = find_max(p[t])
  return h, p, y
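A direct numpy rendering of this pseudocode could look like the sketch below; representing net as a 5-tuple and x as a list of one-hot vectors are assumptions about the data layout.

import numpy as np

def forward_rnn(net, x):
    w_rx, w_rh, b_r, w_oh, b_o = net
    h, p, y = [], [], []
    for t in range(len(x)):
        if t > 0:
            h.append(np.tanh(w_rx @ x[t] + w_rh @ h[t-1] + b_r))
        else:
            h.append(np.tanh(w_rx @ x[t] + b_r))
        z = w_oh @ h[t] + b_o
        e = np.exp(z - z.max())
        p.append(e / e.sum())                     # softmax
        y.append(int(np.argmax(p[t])))            # find_max: most probable tag id
    return h, p, y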
Training

create defaultdict x_ids, y_ids; array data
get len(x_ids), len(y_ids)
for each sentence in the data
  create x_list, y_list
  for each labeled pair x, y in the sentence
    add create_onehot(x, x_ids) to x_list
    add create_ids(y, y_ids) to y_list
  add (x_list, y_list) to feat_lab
initialize net randomly   # w_rx, w_rh, b_r, w_oh, b_o
for I iterations
  for each labeled pair x, y_correct in feat_lab
    h, p, y_predict = forward_rnn(net, x)
    Δ = gradient_rnn(net, x, h, p, y_correct)
    update_weights(net, Δ, λ)
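The helpers this loop relies on are not spelled out on the slide; under the same assumptions they could be as small as the sketch below (create_onehot assumes the id-building pass over the data is already complete, so len(x_ids) is final).

import numpy as np

def create_onehot(token, ids):
    vec = np.zeros(len(ids))          # ids must already contain every token
    vec[ids[token]] = 1.0
    return vec

def update_weights(net, delta, lam):
    for param, grad in zip(net, delta):
        param += lam * grad           # Δ points uphill, since δ_o' = p_correct − p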
Testing

read ids from id_file
read net from weights_file
for each sentence in the data
  create x_list
  for each x in the sentence
    add create_onehot(x, x_ids) to x_list   # one-hot, matching what forward_rnn expects
  h, p, y_list = forward_rnn(net, x_list)
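A hypothetical end-to-end test call, reusing the helpers sketched above; the example sentence is purely illustrative.

sentence = ["time", "flies"]                           # assumed pre-tokenized input
x_list = [create_onehot(w, x_ids) for w in sentence]
h, p, y_list = forward_rnn(net, x_list)
print(y_list)                                          # predicted POS ids, one per word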