7. Neural Networks
Supplement
Forward Propagation
・ Input: φ_0   Vector (Vocab)
・ Hidden layer: w_0 Matrix (2, Vocab), b_0 Vector (2)
・ φ_1 = tanh( w_0 φ_0 + b_0 )   Vector (2)
Forward Propagation
・ Output layer: w_1 Matrix (1, 2), b_1 Vector (1)
・ φ_2 = tanh( w_1 φ_1 + b_1 )   Vector (1)
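The two layers above can be sketched in NumPy; the function name forward_nn matches the pseudocode later in this supplement, and the shapes follow the slide (net[i] = (w, b)). Vocab = 3 is an illustrative assumption.

```python
import numpy as np

def forward_nn(net, phi0):
    """Return [phi_0, phi_1, phi_2], where phi_{i+1} = tanh(w_i phi_i + b_i)."""
    phis = [np.asarray(phi0, dtype=float)]
    for w, b in net:
        phis.append(np.tanh(w @ phis[-1] + b))
    return phis

# shapes as in the slide, with Vocab = 3 for illustration
net = [(np.zeros((2, 3)), np.zeros(2)),   # w_0 Matrix (2, Vocab), b_0 Vector (2)
       (np.zeros((1, 2)), np.zeros(1))]   # w_1 Matrix (1, 2),     b_1 Vector (1)
phis = forward_nn(net, [1, 0, 1])
```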
Hyperbolic Tangent
・ tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
・ tanh′(x) = [ (e^x + e^(−x))² − (e^x − e^(−x))² ] / (e^x + e^(−x))²
           = 1 − (e^(2x) − 2 + e^(−2x)) / (e^x + e^(−x))²
           = 1 − (e^x − e^(−x))² / (e^x + e^(−x))²
           = 1 − tanh²(x)
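The identity tanh′(x) = 1 − tanh²(x) can be checked numerically with a central difference; the point x and step h are arbitrary choices.

```python
import math

x, h = 0.5, 1e-6
numeric = (math.tanh(x + h) - math.tanh(x - h)) / (2 * h)  # central difference
analytic = 1 - math.tanh(x) ** 2                           # the identity above
```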
Why is the gradient used?
・ err = (y′ − y)² / 2
・ Minimizing the err function:
・ derr/dw = 0 → w is ideal
・ derr/dw > 0 → w is too big (w −= λ derr/dw → w will be decreased)
・ derr/dw < 0 → w is too small (w −= λ derr/dw → w will be increased)
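The update rule w −= λ derr/dw can be seen on a one-dimensional example; err(w) = (w − 3)² / 2 is an illustrative stand-in, so derr/dw = w − 3 and w moves toward the minimizer 3 from either side.

```python
# illustrative 1-D error: err(w) = (w - 3)**2 / 2, so derr/dw = w - 3
w, lam = 0.0, 0.1          # start below the minimizer: derr/dw < 0, so w grows
for _ in range(200):
    w -= lam * (w - 3)     # w -= lam * derr/dw
```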
Back Propagation
・ φ_2   Vector (1)
・ err = (y′ − φ_2)² / 2
・ δ_2 = derr/dφ_2 = φ_2 − y′
・ δ′_2 = derr/d(w_1 φ_1 + b_1)
       = (derr/dφ_2) (dφ_2/d(w_1 φ_1 + b_1))
       = δ_2 (1 − φ_2²)   Vector (1)
Back Propagation
・ φ_1   Vector (2)
・ δ_1 = derr/dφ_1
       = (derr/d(w_1 φ_1 + b_1)) (d(w_1 φ_1 + b_1)/dφ_1)
       = δ′_2 w_1
・ δ′_2 Vector (1), w_1 Matrix (1, 2), δ_1 Vector (2)
Back Propagation
・ δ′_1 = derr/d(w_0 φ_0 + b_0)
       = (derr/dφ_1) (dφ_1/d(w_0 φ_0 + b_0))
       = δ_1 (1 − φ_1²)
・ δ′_1   Vector (2)
Back Propagation
・ w_0 Matrix (2, Vocab), b_0 Vector (2)
・ δ_0 = derr/dφ_0
       = (derr/d(w_0 φ_0 + b_0)) (d(w_0 φ_0 + b_0)/dφ_0)
       = δ′_1 w_0
・ δ′_1   Vector (2)
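The backward steps above can be collected into one backward_nn sketch (the name comes from the pseudocode later in this supplement); it returns the δ′ vectors that the weight updates need. The zero-weight example is illustrative.

```python
import numpy as np

def backward_nn(net, phis, label):
    # output layer: derr/dphi_last = phi_last - y'
    # then delta'_{i+1} = delta_{i+1} * (1 - phi_{i+1}**2), delta_i = delta'_{i+1} w_i
    delta_p = [None] * (len(net) + 1)
    delta = phis[-1] - label
    for i in reversed(range(len(net))):
        delta_p[i + 1] = delta * (1 - phis[i + 1] ** 2)
        delta = delta_p[i + 1] @ net[i][0]
    return delta_p

# zero-weight example with Vocab = 3: all hidden/output activations are 0
net = [(np.zeros((2, 3)), np.zeros(2)), (np.zeros((1, 2)), np.zeros(1))]
phis = [np.array([1.0, 0.0, 1.0]), np.zeros(2), np.zeros(1)]
delta_p = backward_nn(net, phis, 1.0)
```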
Update Weights
・ w_0 Matrix (2, Vocab), b_0 Vector (2), φ_0 Vector (Vocab), δ′_1 Vector (2)
・ w_0 −= λ derr/dw_0 = λ δ′_1 (d(w_0 φ_0 + b_0)/dw_0) = λ δ′_1 φ_0ᵀ
・ b_0 −= λ derr/db_0 = λ δ′_1 (d(w_0 φ_0 + b_0)/db_0) = λ δ′_1
Update Weights
・ w_1 Matrix (1, 2), b_1 Vector (1), φ_1 Vector (2), δ′_2 Vector (1)
・ w_1 −= λ derr/dw_1 = λ δ′_2 (d(w_1 φ_1 + b_1)/dw_1) = λ δ′_2 φ_1ᵀ
・ b_1 −= λ derr/db_1 = λ δ′_2 (d(w_1 φ_1 + b_1)/db_1) = λ δ′_2
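Both update slides follow the same pattern, w_i −= λ δ′_{i+1} φ_iᵀ and b_i −= λ δ′_{i+1}, so one function covers them; the name update_weights matches the pseudocode later in this supplement, and the one-layer example is illustrative.

```python
import numpy as np

def update_weights(net, phis, delta_p, lam):
    # w_i -= lam * delta'_{i+1} phi_i^T ;  b_i -= lam * delta'_{i+1}
    for i, (w, b) in enumerate(net):
        w -= lam * np.outer(delta_p[i + 1], phis[i])
        b -= lam * delta_p[i + 1]

# tiny one-layer example: w Matrix (1, 2), b Vector (1)
net = [(np.zeros((1, 2)), np.zeros(1))]
phis = [np.array([1.0, 2.0]), np.zeros(1)]
delta_p = [None, np.array([1.0])]
update_weights(net, phis, delta_p, lam=0.5)
```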
Saving Models
・ Saving the dict to a file
・ Saving the network to a file
    network = list [ net[0], net[1], …, net[i] ]
    net[i] = tuple ( w, b )
    w, b = array ([[…],[…], …, […]])
How to save to a file?
・ Write each key, value pair in the dict
Serializer
・ A serializer converts an object hierarchy into a byte stream
・ Saving
・ Loading
・ The network can be saved easily!
import pickle
pickle.dump(network, file_object)  # object first, then the file
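A complete round trip looks like this; "network.pkl" is a hypothetical filename, and the one-layer network is a stand-in for the real net.

```python
import pickle

network = [([[0.1, 0.2]], [0.0])]         # stand-in for [ (w, b), ... ]
with open("network.pkl", "wb") as f:      # "network.pkl" is a hypothetical name
    pickle.dump(network, f)               # object first, then the file
with open("network.pkl", "rb") as f:
    restored = pickle.load(f)
```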
Create Feature
CREATE_FEATURES(x):
    create list phi (len = len(ids))
    split x into words
    for word in words:
        # Training
        phi[ids["UNI:" + word]] += 1
        # Testing
        if "UNI:" + word in ids:
            phi[ids["UNI:" + word]] += 1
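The pseudocode above can be written in Python; the train flag that switches between the two branches is an assumption about how the two modes would be combined into one function.

```python
from collections import defaultdict

ids = defaultdict(lambda: len(ids))   # feature name -> id, grows during training

def create_features(x, train=True):
    # the `train` flag combining the two branches is an assumption
    words = x.split()
    if train:
        for word in words:
            ids["UNI:" + word]        # assign an id to any new feature
    phi = [0] * len(ids)
    for word in words:
        if "UNI:" + word in ids:      # at test time, skip unseen features
            phi[ids["UNI:" + word]] += 1
    return phi
```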
Training
create defaultdict ids, array feat_lab
for each labeled pair x, y in the data:
    add (create_features(x), y) to feat_lab
get len(ids)
initialize net randomly
for I iterations:
    for each labeled pair φ0, y in feat_lab:
        φ = forward_nn(net, φ0)
        δ′ = backward_nn(net, φ, y)
        update_weights(net, φ, δ′, λ)
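A runnable end-to-end sketch of this training loop, restating minimal forward_nn / backward_nn / update_weights implementations so it stands alone; the toy data, vocab size 2, iteration count, and λ are illustrative assumptions.

```python
import numpy as np

def forward_nn(net, phi0):
    phis = [np.asarray(phi0, dtype=float)]
    for w, b in net:
        phis.append(np.tanh(w @ phis[-1] + b))   # phi_{i+1} = tanh(w_i phi_i + b_i)
    return phis

def backward_nn(net, phis, y):
    delta_p = [None] * (len(net) + 1)
    delta = phis[-1] - y                         # derr/dphi_last = phi - y'
    for i in reversed(range(len(net))):
        delta_p[i + 1] = delta * (1 - phis[i + 1] ** 2)
        delta = delta_p[i + 1] @ net[i][0]       # propagate back through w_i
    return delta_p

def update_weights(net, phis, delta_p, lam):
    for i, (w, b) in enumerate(net):
        w -= lam * np.outer(delta_p[i + 1], phis[i])
        b -= lam * delta_p[i + 1]

rng = np.random.default_rng(0)
net = [(rng.uniform(-0.1, 0.1, (2, 2)), np.zeros(2)),   # w_0, b_0 (Vocab = 2)
       (rng.uniform(-0.1, 0.1, (1, 2)), np.zeros(1))]   # w_1, b_1
feat_lab = [([1, 0], 1.0), ([0, 1], -1.0)]              # toy labeled pairs

def total_err(net):
    return sum((forward_nn(net, phi0)[-1][0] - y) ** 2 / 2 for phi0, y in feat_lab)

err_before = total_err(net)
for _ in range(100):                                    # I iterations
    for phi0, y in feat_lab:
        phis = forward_nn(net, phi0)
        update_weights(net, phis, backward_nn(net, phis, y), lam=0.1)
err_after = total_err(net)
```

The gradient steps drive the squared error down over the iterations.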
Testing
read ids from id_file
read net from weights_file
for each x in the data:
    φ0 = create_features(x)
    φ = forward_nn(net, φ0)
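The loop above stops at the forward pass; one way to turn φ into a label is to take the sign of the final output, assuming binary labels in {+1, −1} (the slides do not spell this step out). The one-layer net here is illustrative.

```python
import numpy as np

def predict_one(net, phi0):
    # run the forward pass, then take the sign of the final output
    # (assumption: binary labels in {+1, -1})
    phi = np.asarray(phi0, dtype=float)
    for w, b in net:
        phi = np.tanh(w @ phi + b)
    return 1 if phi[0] >= 0 else -1

# one-layer illustration: weight +1 on the first feature, -1 on the second
net = [(np.array([[1.0, -1.0]]), np.zeros(1))]
```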