3H4OS24b Organized Session "OS24 Deep Learning"


The 28th Annual Conference of the Japanese Society for Artificial Intelligence, 2014


Analysis of Deep Neural Networks Using Dynamical Systems Analysis

Mototake Yoichi*1, Oka Mizuki*2, Ikegami Takashi*1

*1 Graduate School of Arts and Sciences, The University of Tokyo
*2 Graduate School of Systems and Information Engineering, University of Tsukuba

Since Hinton et al. (2006) revived the multilayered feed-forward network, called a deep neural network, many researchers have started to investigate its potential capabilities and applications. For example, Google Inc. showed that deep learning automatically extracted cat faces and human bodies from millions of randomly selected YouTube images [Quoc 12]. In this study, we compute the information flow within a deep neural network in order to reveal its underlying dynamical-systems properties. We report an unexpected power-law behavior of the eigenvalues computed from the Jacobian matrices of the deep net.

1. Introduction

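After the limitations of backpropagation in multilayer perceptrons were discovered, interest in neural networks declined. With the discovery of an effective learning method by [Hinton 06], however, neural networks with deep hierarchies became relatively easy to train. These networks, called Deep Neural Networks (hereafter DNNs), have recorded remarkable recognition accuracy, and neural networks are once again in the spotlight. Nevertheless, much remains unexplained about the fundamental question of why deep learning (hereafter DL) works so well [Saxe 14].

In this study, we attempt to approach this question through a dynamical analysis of the network inside a DNN.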

2. Dynamics of Deep Learning

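Two kinds of dynamics can be considered in a DNN. One is the time evolution of the weights during learning. The other is the view, illustrated in Figure 1, in which each layer of the DNN is mapped to a time step and the firing pattern of the neurons, which changes as the signal advances through the layers, is regarded as a time evolution.

In this study, we focus on the latter view.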

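Under this view, we define the time evolution of the neuron firing by

$$h_j^{(t+1)} = \mathrm{sigmoid}\left( g \sum_i w_{ij}^{(t)} h_i^{(t)} + \mathrm{Bias}_j^{(t)} \right) \qquad (1)$$

where $h_j^{(t)}$ is the firing of neuron $j$ in layer $t$, $w_{ij}^{(t)}$ is the weight from neuron $i$ in layer $t$ to neuron $j$ in layer $t+1$, $\mathrm{Bias}_j^{(t)}$ is the bias, and $g$ denotes the gain.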
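The layer-as-time view of Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, the row-vector convention (`weights[t][i, j]` holds $w_{ij}^{(t)}$), and the random example weights are assumptions.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def propagate(h0, weights, biases, g):
    """Iterate Eq. (1), treating the layer index t as time.

    weights[t][i, j] holds w_ij^(t) and biases[t][j] holds Bias_j^(t).
    Returns the firing patterns [h^(0), h^(1), ..., h^(T)].
    """
    states = [np.asarray(h0, dtype=float)]
    for w, b in zip(weights, biases):
        # states[-1] @ w gives sum_i h_i^(t) w_ij^(t) for every j
        states.append(sigmoid(g * (states[-1] @ w) + b))
    return states


# Hypothetical example: 10 layers of 5 neurons with random weights
rng = np.random.default_rng(0)
N, T = 5, 10
weights = [rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N)) for _ in range(T)]
biases = [np.zeros(N) for _ in range(T)]
states = propagate(rng.random(N), weights, biases, g=1.05)
print(len(states), states[-1].shape)  # 11 (5,)
```

Each entry of `states` is one "time step" of the layer-direction dynamics, which is the object analyzed in the rest of the paper.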

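DL, however, is a broad term that covers a wide range of related techniques. To learn which factors contribute to the improved performance, and to what extent, this study examines the properties of each element separately. As a point of comparison, we also analyze a setting that incorporates many elements and achieves complex learning. This study therefore adopts both approaches.

For the former, we analyze the following factors one at a time:

Learning techniques: dropout, pooling, etc.
Input type: handwritten characters, images, etc.
Network structure: size of each layer, etc.

For the latter, the analysis target is the pre-trained weight set provided with DeCAF [Donahue 13], a DL library that provides convolutional neural networks, dropout [Hinton 12], and other techniques.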

Figure 1. Time evolution along the layer direction

3. Previous Work

3.1 Pre-training

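One study of DNN dynamics of the kind described above is [Ganguli 14]. Focusing in particular on pre-training, it analytically derived the behavior of neural networks with infinitely many layers under various approximations. It also performed pre-training with Restricted Boltzmann Machines (hereafter RBMs) on a 100-layer DNN and, combining this with the analysis, concluded that pre-training corresponds to initializing the weight matrices to be close to orthogonal matrices. It further showed that, from such initial values, learning in an infinitely deep neural network converges in finite time.

This study, however, assumes orthogonality with respect to the input. Moreover, its actual calculations were limited to simulations on MNIST [LeCun 98], a relatively simple dataset of handwritten digits.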
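The conclusion above ties pre-training to approximately orthogonal weight matrices. For reference, one standard way to draw a random orthogonal matrix is the QR decomposition of a Gaussian matrix with a sign correction. This is a generic sketch; we do not know which construction [Ganguli 14] used, and the function name `random_orthogonal` is hypothetical.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix from the QR decomposition of a Gaussian matrix.

    Multiplying column j of Q by sign(R_jj) makes the result Haar-distributed
    (uniform over orthogonal matrices) instead of biased by the QR sign convention.
    """
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))


rng = np.random.default_rng(1)
W = random_orthogonal(6, rng)
# An orthogonal matrix preserves lengths, so all of its singular values equal 1
print(np.round(np.linalg.svd(W, compute_uv=False), 6))
```

Because every singular value of such a matrix is 1, a layer initialized this way neither amplifies nor attenuates small perturbations, which is the property that links orthogonal initialization to the singular-value analysis of Section 3.2.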


3H4-OS-24b-4





 



3.2 The Edge of Chaos and Pre-training

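A known result for recurrent neural networks is that high learning performance is achieved near a phase transition of the system called the edge of chaos [Bertschinger 04]. The prior study applied this idea to DL: it focused on the phase transition, with respect to $g$ in Eq. (1), of the variance properties around the output layer, and analyzed its relationship to the singular values of the whole network. The singular values express how strongly a small change in the input layer is transmitted to the output layer. When the weight matrices of the up and down paths are transposes of each other, singular values of order O(1), regardless of the direction of the small change, are known to be useful for propagating information in Back Propagation (hereafter BP).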
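The statement that singular values measure how strongly a small input change reaches the output can be checked directly. In this sketch a random matrix `A` stands in for a linearized input-to-output map (an assumption for illustration, using the column-vector convention δ_out = A δ_in); a perturbation along the k-th right singular vector is amplified by exactly σ_k.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))      # stand-in for a linearized input -> output map
U, s, Vt = np.linalg.svd(A)      # A = U @ diag(s) @ Vt, s sorted in descending order

eps = 1e-6
gains = []
for k in range(len(s)):
    delta_in = eps * Vt[k]       # small perturbation along right singular vector v_k
    delta_out = A @ delta_in     # the same perturbation as seen at the output
    gains.append(np.linalg.norm(delta_out) / np.linalg.norm(delta_in))

print(np.allclose(gains, s))     # True
```

Since the transpose of `A` has the same singular values, the same numbers also govern how error signals are attenuated or amplified when BP sends them backward through the network.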

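The calculation shows that a phase transition occurs near g = 1. For g < 1, the singular values are small overall. For g > 1, on the other hand, a few large singular values appear while the singular values in almost all other directions are very small, giving an extremely skewed distribution. Neither state is therefore favorable for BP. Near the edge of chaos, g ≈ 1, by contrast, a distribution containing many singular values of order O(1) appears, and this state is well suited to BP.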
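Why g ≈ 1 is the boundary can be seen in an idealized case that ignores the sigmoid. If every layer's Jacobian were g times an orthogonal matrix, the Jacobian over all layers would have every singular value equal to g raised to the number of layers, so perturbations shrink for g < 1, are preserved for g = 1, and grow for g > 1. This linear model is our simplifying assumption (the helper names are hypothetical); it cannot reproduce the skewed spectrum described above for g > 1, which arises in the nonlinear networks of the prior study.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Haar-random orthogonal matrix via sign-corrected QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))


def end_to_end_singular_values(g, n=8, depth=10, seed=0):
    """Singular values of the product of `depth` layer Jacobians g * Q_t.

    Idealized linear model: each Q_t is orthogonal, so the product is
    g**depth times an orthogonal matrix.
    """
    rng = np.random.default_rng(seed)
    J = np.eye(n)
    for _ in range(depth):
        J = J @ (g * random_orthogonal(n, rng))
    return np.linalg.svd(J, compute_uv=False)


for g in (0.9, 1.0, 1.1):
    sv = end_to_end_singular_values(g)
    print(f"g = {g}: min = {sv.min():.4f}, max = {sv.max():.4f}")
```

With 10 layers the singular values are 0.9**10 ≈ 0.349, exactly 1, and 1.1**10 ≈ 2.594, respectively: even a 10% deviation from g = 1 changes the transmitted signal by a large factor after only a few layers.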

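As described above, the prior study showed that an optimal initial state can be obtained through pre-training and the value of $g$. However, $g$ is probably not the only parameter that determines criticality [Bertschinger 04]; other factors include, for example, the properties of the input and the variance of the weights. The prior study performed its calculations with a fixed input variance, whereas in actual datasets the variance is not constant. Moreover, its results were not demonstrated with actual pre-training.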

3.3 Summary of Previous Work

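In summary, the prior study showed that pre-training is close to initializing the weights with orthogonal matrices, and that in the state realized by such an initialization the vicinity of the edge of chaos is optimal for BP. We consider, however, that the prior study does not sufficiently address pre-training on diverse datasets, and in this paper we focus our analysis on this point.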

4. Experiments and Analysis

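We used MNIST and a dataset of more complex images (CIFAR-100 [Krizhevsky 09]) and performed RBM-based pre-training following [Hinton 06]. As in Eq. (1), the neurons are continuous-valued, and the network has 10 layers. We set g = 1.05 and the number of training samples to 12,800. Using the trained network, we computed the singular values by the following procedure.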

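From Eq. (1), the derivative of a neuron in layer $t+1$ with respect to a neuron in layer $t$ is

$$\frac{\partial h_j^{(t+1)}}{\partial h_i^{(t)}} = g\, h_j^{(t+1)} \left( 1 - h_j^{(t+1)} \right) w_{ij}^{(t)} \qquad (2)$$

Hence, for layers of $N$ neurons, the Jacobian of the transformation between layers $t$ and $t+1$ is

$$J^{(t)} = \begin{pmatrix} \dfrac{\partial h_1^{(t+1)}}{\partial h_1^{(t)}} & \cdots & \dfrac{\partial h_N^{(t+1)}}{\partial h_1^{(t)}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial h_1^{(t+1)}}{\partial h_N^{(t)}} & \cdots & \dfrac{\partial h_N^{(t+1)}}{\partial h_N^{(t)}} \end{pmatrix}$$

and the Jacobian $J$ of the whole network, from the input layer $h^{(0)}$ to the output layer $h^{(T)}$, is obtained as

$$J = J^{(0)} J^{(1)} \cdots J^{(T-1)} \qquad (3)$$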
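The per-layer Jacobian (J^(t))_ij = g h_j^(t+1) (1 - h_j^(t+1)) w_ij^(t) and the whole-network product J = J^(0) J^(1) ... J^(T-1) can be sketched as follows, with a central finite difference as a check on the chain rule. Layer size, g = 1.05 and the random weights are illustrative assumptions, not the trained networks of this paper, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward(h0, weights, biases, g):
    """Layer-wise update of Eq. (1); returns the firing patterns [h^(0), ..., h^(T)]."""
    states = [np.asarray(h0, dtype=float)]
    for w, b in zip(weights, biases):
        states.append(sigmoid(g * (states[-1] @ w) + b))
    return states


def network_jacobian(states, weights, g):
    """J = J^(0) J^(1) ... J^(T-1), with J^(t)[i, j] = g h_j^(t+1) (1 - h_j^(t+1)) w^(t)[i, j]."""
    J = np.eye(states[0].shape[0])
    for t, w in enumerate(weights):
        h_next = states[t + 1]
        # broadcasting multiplies column j of w by g * h_j (1 - h_j)
        J = J @ (w * (g * h_next * (1.0 - h_next)))
    return J


rng = np.random.default_rng(3)
N, T, g = 6, 10, 1.05
weights = [rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N)) for _ in range(T)]
biases = [rng.normal(0.0, 0.1, size=N) for _ in range(T)]
h0 = rng.random(N)
J = network_jacobian(forward(h0, weights, biases, g), weights, g)

# Central finite differences: row i of J is d h^(T) / d h_i^(0)
eps = 1e-6
J_fd = np.empty_like(J)
for i in range(N):
    e = np.zeros(N)
    e[i] = eps
    out_plus = forward(h0 + e, weights, biases, g)[-1]
    out_minus = forward(h0 - e, weights, biases, g)[-1]
    J_fd[i] = (out_plus - out_minus) / (2 * eps)

print(np.allclose(J, J_fd, rtol=1e-5, atol=1e-9))
```

The row-vector convention (rows indexed by the input neuron i) is what makes the factors multiply left to right in the order J^(0) J^(1) ...; with the transposed convention the product order reverses, but the singular values are unchanged.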

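From $J$ we computed $J J^*$ and obtained the singular values of $J$ from its nonnegative eigenvalues (each singular value is the square root of one of these eigenvalues).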
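The step from $J J^*$ to singular values can be sketched as below and compared against a direct SVD. The helper name `singular_values_via_gram` is hypothetical, and the random `J` is a stand-in for the trained network's Jacobian.

```python
import numpy as np


def singular_values_via_gram(J):
    """Singular values of J computed as square roots of the eigenvalues of J J^*.

    J J^* is Hermitian positive semidefinite, so np.linalg.eigvalsh applies;
    tiny negative eigenvalues produced by round-off are clipped to zero.
    """
    gram = J @ J.conj().T
    eig = np.linalg.eigvalsh(gram)                 # ascending order
    return np.sqrt(np.clip(eig, 0.0, None))[::-1]  # descending, like np.linalg.svd


rng = np.random.default_rng(4)
J = rng.normal(size=(5, 5))
sv_gram = singular_values_via_gram(J)
sv_svd = np.linalg.svd(J, compute_uv=False)
print(np.allclose(sv_gram, sv_svd))
```

Forming $J J^*$ squares the condition number, so the smallest singular values lose relative accuracy. Because the singular values reported here are very small, calling `np.linalg.svd(J, compute_uv=False)` directly is the numerically safer choice.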

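The results are shown in Figures 2 and 3. The singular values follow a power-law-like distribution. Moreover, they are very small, so, for the reason given in Section 3.2, the network cannot be said to be in a well-initialized state. Possible causes include the fact that the variance of real samples is not constant, and we consider that pre-training does not work as expected here. We will analyze the cause further and explain it at the presentation.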
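One simple way to quantify a "power-law-like" singular-value distribution is the slope of a straight-line fit on a log-log rank plot. We do not know how Figures 2 and 3 were produced; this is a rough diagnostic under that assumption, the helper name is hypothetical, and the data below are synthetic, not the paper's results.

```python
import numpy as np


def rank_loglog_slope(values):
    """Least-squares slope of log(value) against log(rank), values sorted descending.

    For an exact power law value_k = C * k**(-a) the slope equals -a.
    """
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    v = v[v > 0]                                   # the logarithm needs positive values
    ranks = np.arange(1, v.size + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(v), 1)
    return slope


# Hypothetical data: an exact power law with exponent 1.5
k = np.arange(1, 101)
synthetic = 3.0 * k ** -1.5
print(round(rank_loglog_slope(synthetic), 6))      # -1.5
```

A good straight-line fit is suggestive but not proof of a power law; maximum-likelihood estimators with goodness-of-fit tests are the more reliable approach when the claim matters.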

5. Summary

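In this paper, we analyzed the state realized after pre-training in DL. The results suggest that the edge of chaos, where information propagates easily under BP, may not actually appear in the same form in practice. In the presentation, we will describe how the results change with differences in structure and regularization, such as CNNs and dropout, and we also plan to discuss an analysis under practical conditions using the pre-trained networks of DeCAF.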

Figure 2. Distribution of singular values for MNIST

Figure 3. Distribution of singular values for CIFAR-100



References

[Bertschinger 04] Bertschinger, N. and Natschläger, T.: Real-time computation at the edge of chaos in recurrent neural networks, Neural Computation, 16(7):1413-1436, 2004.

[Donahue 13] Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E. and Darrell, T.: DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, arXiv preprint arXiv:1310.1531, 2013.

[Hinton 06] Hinton, G. E., Osindero, S. and Teh, Y.: A fast learning algorithm for deep belief nets, Neural Computation, 18, pp. 1527-1554, 2006.

[Hinton 12] Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R. R.: Improving neural networks by preventing co-adaptation of feature detectors, arXiv:1207.0580v1, 2012.

[Krizhevsky 09] Krizhevsky, A.: Learning Multiple Layers of Features from Tiny Images, 2009.

[LeCun 98] LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P.: Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998.

[Saxe 14] Saxe, A. M., McClelland, J. L. and Ganguli, S.: Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, NIPS Workshop on Deep Learning, 2013.