Application to DNN Pre-training


Appendix B

Randomized Rounding Method

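This appendix describes the Randomized Rounding method (RR method) [18], which this study uses as its discretization and quantized-learning method. For a discrete space $\Omega = \{k\gamma : k \in \mathbb{Z}\}$ with a fixed grid width $\gamma > 0$, the RR method is defined as follows.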

$$w^{\mathrm{new}} = w^{\mathrm{old}} + \epsilon\,\Delta w \tag{B.1}$$

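$$
D_\Omega(w^{\mathrm{new}}) =
\begin{cases}
L_\Omega(w^{\mathrm{new}}) & \text{with probability } \dfrac{G_\Omega(w^{\mathrm{new}}) - w^{\mathrm{new}}}{R_\Omega(w^{\mathrm{new}})} \\[2ex]
G_\Omega(w^{\mathrm{new}}) & \text{with probability } \dfrac{w^{\mathrm{new}} - L_\Omega(w^{\mathrm{new}})}{R_\Omega(w^{\mathrm{new}})}
\end{cases}
\tag{B.2}
$$

Here, Eq. (B.1) is the weight update rule of BP learning: at a given time step during training, the weight $w^{\mathrm{old}}$ is updated to the new weight $w^{\mathrm{new}}$ by the update $\Delta w$ multiplied by the learning rate $\epsilon$. The RR method then stochastically rounds this $w^{\mathrm{new}}$ to a discrete value with the probabilities in Eq. (B.2). $G_\Omega(w^{\mathrm{new}})$ and $L_\Omega(w^{\mathrm{new}})$ are the adjacent discrete values on either side of $w^{\mathrm{new}}$, so that $L_\Omega(w^{\mathrm{new}}) \le w^{\mathrm{new}} \le G_\Omega(w^{\mathrm{new}})$ with $G_\Omega(w^{\mathrm{new}}) > L_\Omega(w^{\mathrm{new}})$, and they differ by the discretization width $R_\Omega(w^{\mathrm{new}}) = G_\Omega(w^{\mathrm{new}}) - L_\Omega(w^{\mathrm{new}})$. This bracketing is what keeps both probabilities in Eq. (B.2) within $[0, 1]$ and makes them sum to 1.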
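As a minimal sketch (not the code used in this study), Eq. (B.2) on the uniform grid $\Omega = \{k\gamma\}$ can be written in Python. The function name `randomized_round` and the use of Python's `random.Random` as the random source are illustrative assumptions; on a uniform grid the width $R_\Omega(w^{\mathrm{new}})$ is simply $\gamma$.

```python
import math
import random


def randomized_round(w_new: float, gamma: float, rng: random.Random) -> float:
    """Round w_new onto {k * gamma : k in Z} with the probabilities of Eq. (B.2).

    Illustrative sketch; the name and signature are hypothetical.
    """
    lower = math.floor(w_new / gamma) * gamma  # L_Omega(w_new)
    upper = lower + gamma                      # G_Omega(w_new)
    width = upper - lower                      # R_Omega(w_new), equal to gamma here
    # Round up with probability (w_new - L) / R; otherwise round down,
    # which happens with probability (G - w_new) / R.
    p_up = (w_new - lower) / width
    return upper if rng.random() < p_up else lower


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [randomized_round(0.37, 0.25, rng) for _ in range(10000)]
    print(sorted(set(samples)))         # only the two neighbours 0.25 and 0.5 occur
    print(sum(samples) / len(samples))  # sample mean, close to 0.37
```

A value that already lies on the grid gets `p_up == 0` and is returned unchanged, so rounding never moves a weight that needs no rounding.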

Figure B.1 Schematic diagram of the RR method

The expected value of the discrete value $D_\Omega(w^{\mathrm{new}})$ obtained by the RR method is then given by Eq. (B.3).

$$E\bigl(D_\Omega(w^{\mathrm{new}})\bigr) = w^{\mathrm{new}} \tag{B.3}$$
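Equation (B.3) follows in one line from the two probabilities in Eq. (B.2). Writing $L$, $G$, $R$ and $w$ for $L_\Omega(w^{\mathrm{new}})$, $G_\Omega(w^{\mathrm{new}})$, $R_\Omega(w^{\mathrm{new}})$ and $w^{\mathrm{new}}$ (shorthand introduced only for this derivation):

$$
\begin{aligned}
E\bigl(D_\Omega(w)\bigr)
&= L \cdot \frac{G - w}{R} + G \cdot \frac{w - L}{R} \\
&= \frac{LG - Lw + Gw - GL}{R}
 = \frac{w\,(G - L)}{R}
 = w,
\end{aligned}
$$

where the last step uses $R = G - L$. The argument does not require the grid to be uniform; it only needs $L$ and $G$ to be the neighbouring grid points that bracket $w$.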

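Therefore, because the expected value of the discretized weight equals the continuous value before discretization, it is considered that learning can proceed under the same learning rule as for continuous weights even after discretization.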
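The point above can be illustrated with a one-weight toy sketch, assuming a constant, hypothetical update $\epsilon\Delta w = 0.01$ and a uniform grid $\gamma = 0.25$. Each step applies Eq. (B.1) and then discretizes. Deterministic round-to-nearest (a baseline added here for comparison, not part of the source) discards every update smaller than $\gamma/2$, so the weight never moves; RR moves the weight up by $\gamma$ with probability $\epsilon\Delta w/\gamma$, so on average it follows the continuous trajectory.

```python
import math
import random


def nearest_round(w: float, gamma: float) -> float:
    """Deterministic round-to-nearest onto {k * gamma} (comparison baseline only)."""
    return round(w / gamma) * gamma


def randomized_round(w: float, gamma: float, rng: random.Random) -> float:
    """RR onto {k * gamma}: round up with probability (w - L) / R, as in Eq. (B.2)."""
    lower = math.floor(w / gamma) * gamma
    p_up = (w - lower) / gamma
    return lower + gamma if rng.random() < p_up else lower


def train_one_weight(steps: int, eps: float, delta_w: float, gamma: float,
                     rounding: str, seed: int = 0) -> float:
    """Repeat Eq. (B.1) and discretize after every step (hypothetical toy setup)."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        w_new = w + eps * delta_w  # continuous update, Eq. (B.1)
        if rounding == "rr":
            w = randomized_round(w_new, gamma, rng)
        else:
            w = nearest_round(w_new, gamma)
    return w


if __name__ == "__main__":
    # Continuous learning would reach 100 * 0.1 * 0.1 = 1.0 after 100 steps.
    print(train_one_weight(100, 0.1, 0.1, 0.25, "nearest"))  # prints 0.0
    runs = [train_one_weight(100, 0.1, 0.1, 0.25, "rr", seed=s) for s in range(2000)]
    print(sum(runs) / len(runs))  # average over runs, close to 1.0
```

Individual RR runs remain noisy around that mean; what the sketch shows is only that the rounding introduces no systematic drift, which is why the update rule of Eq. (B.1) can be left unchanged.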

In this study, however, weights are discretized to powers of two, so the discrete space is defined as follows.

$$\Omega = \{\, w : |w| \le 1,\ w \in 2^{x} \,\} \qquad (x:\ \text{discrete grid width}) \tag{B.4}$$
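Because the grid in Eq. (B.4) is non-uniform, the width $R_\Omega(w^{\mathrm{new}})$ depends on where $w^{\mathrm{new}}$ falls. The sketch below assumes (the source does not spell this out) that $\Omega$ contains 0 and both signs, i.e. $\pm 2^{e}$ for integer exponents from a hypothetical smallest exponent `min_exp` up to 0, and that values outside $[-1, 1]$ are clipped to $\pm 1$. Clipping is a further assumption of this sketch and makes the rounding biased at the ends of the range.

```python
import bisect
import random


def power_of_two_grid(min_exp: int) -> list[float]:
    """Hypothetical grid: 0 and +/-2**e for e = min_exp..0, so every |w| <= 1."""
    positives = [2.0 ** e for e in range(min_exp, 1)]
    return sorted([-p for p in positives] + [0.0] + positives)


def randomized_round_pow2(w_new: float, grid: list[float], rng: random.Random) -> float:
    """RR onto a sorted, non-uniform grid; R_Omega depends on where w_new falls."""
    # Assumption: values outside the grid range are clipped to its end points.
    if w_new <= grid[0]:
        return grid[0]
    if w_new >= grid[-1]:
        return grid[-1]
    i = bisect.bisect_right(grid, w_new)
    lower, upper = grid[i - 1], grid[i]  # L_Omega(w_new) <= w_new < G_Omega(w_new)
    width = upper - lower                # R_Omega(w_new), varies with w_new
    p_up = (w_new - lower) / width
    return upper if rng.random() < p_up else lower


if __name__ == "__main__":
    grid = power_of_two_grid(-3)
    print(grid)  # [-1.0, -0.5, -0.25, -0.125, 0.0, 0.125, 0.25, 0.5, 1.0]
    rng = random.Random(1)
    samples = [randomized_round_pow2(0.3, grid, rng) for _ in range(10000)]
    print(sum(samples) / len(samples))  # sample mean, close to 0.3
```

Near zero the grid points are dense and the effective width is small, while near $\pm 1$ it is large, so the same update $\epsilon\Delta w$ is rounded with very different probabilities depending on the current weight magnitude.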

Appendix C

Network Configurations Used in Chapter 2

Table C.1 Hidden-layer structures of all networks (784-hidden-10)

3-layer (1 hidden layer): 30, 50, 100, 150, 200, 250, 300, 350, 375, 400, 500, 550, 600, 700, 750, 900, 1000, 1500, 2000

4-layer (2 hidden layers): 50-100, 50-150, 50-250, 100-100, 100-150, 100-200, 100-300, 100-400, 150-100, 150-150, 150-200, 150-350, 200-100, 200-150, 200-200, 200-400, 200-550, 200-600, 250-250, 250-750, 300-300, 300-350, 300-900, 350-150, 350-300, 350-350, 375-375, 400-100, 400-200, 400-400, 500-300, 500-500, 500-1000, 550-200, 550-550, 600-200, 700-400, 700-500, 750-250, 750-750, 900-300, 1000-500, 1000-1000, 1000-2000, 1500-500, 2000-2000

5-layer (3 hidden layers): 50-100-150, 50-150-300, 50-250-500, 100-100-100, 100-100-150, 100-100-200, 100-100-300, 100-100-400, 100-150-150, 100-200-100, 100-300-100, 100-400-400, 150-100-100, 150-150-100, 150-150-150, 150-150-200, 150-200-150, 200-100-100, 200-150-100, 200-150-150, 200-200-200, 200-200-400, 200-200-550, 200-200-600, 200-400-200, 200-550-550, 200-600-200, 250-250-250, 300-100-100, 300-300-300, 300-300-900, 300-350-300, 300-350-350, 300-900-300, 350-300-350, 350-350-300, 375-375-375, 400-100-100, 400-200-200, 500-300-100, 500-300-150, 500-500-500, 550-200-200, 550-550-200, 600-200-200, 700-400-300, 700-500-250, 900-300-300, 1000-500-1000, 1000-1000-1000, 1000-2000-3000, 1500-500-250, 2000-2000-2000

6-layer (4 hidden layers): 50-100-150-200, 50-150-300-500, 50-250-500-700, 100-100-100-100, 100-100-100-200, 100-100-150-150, 100-100-200-100, 100-100-400-400, 100-150-150-100, 100-200-100-100, 100-400-400-100, 150-100-100-150, 150-150-100-100, 200-100-100-100, 200-150-100-50, 200-200-200-400, 200-200-400-200, 200-200-550-550, 200-400-200-200, 200-550-550-200, 250-250-250-250, 300-300-300-300, 375-375-375-375, 400-100-100-400, 400-200-200-200, 400-400-100-100, 500-300-100-30, 500-300-150-50, 500-500-500-500, 550-200-200-550, 550-550-200-200, 700-500-300-100, 700-500-250-50, 1000-1000-1000-500, 1000-1000-1000-1000, 1000-2000-3000-4000, 1500-500-250-30, 2000-2000-2000-2000

Table C.2 Hidden-layer structures (784-hidden-10) of the networks with 500, 1000, and 1500 hidden units in total

Total hidden units: 500

3-layer (1 hidden layer): 500

4-layer (2 hidden layers): 100-400, 150-350, 250-250, 350-150, 400-100

5-layer (3 hidden layers): 50-150-300, 100-100-300, 100-300-100, 150-150-200, 150-200-150, 200-100-100, 200-150-100, 200-150-150

6-layer (4 hidden layers): 50-100-150-200, 100-100-100-200, 100-100-150-150, 100-100-200-100, 100-150-150-100, 100-200-100-100, 150-100-100-150, 150-150-100-100, 200-100-100-100, 200-150-100-50

Total hidden units: 1000

3-layer (1 hidden layer): 1000

4-layer (2 hidden layers): 250-750, 500-500, 750-250

5-layer (3 hidden layers): 200-200-550, 200-200-600, 200-600-200, 500-300-150, 550-200-200, 550-550-200, 600-200-200

6-layer (4 hidden layers): 50-150-300-500, 100-100-400-400, 100-400-400-100, 200-200-200-400, 200-200-400-200, 200-400-200-200, 250-250-250-250, 400-100-100-400, 400-200-200-200, 400-400-100-100, 500-300-100-30, 500-300-150-50

Total hidden units: 1500

3-layer (1 hidden layer): 1500

4-layer (2 hidden layers): 500-1000, 750-750, 1000-500

5-layer (3 hidden layers): 300-300-900, 300-900-300, 500-500-500, 700-500-250, 900-300-300

6-layer (4 hidden layers): 200-200-550-550, 200-550-550-200, 375-375-375-375, 550-200-200-550, 550-550-200-200, 700-500-250-50

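Acknowledgments

In carrying out this research, I received guidance and advice from many people. My supervisor, Professor Koji Nakajima, gave me extensive guidance and advice on every aspect of the research, from its overall direction to how to organize data, the perspectives from which to analyze results, and how to present them. I am also deeply grateful that, although I was still immature as a researcher and tended to push the research forward in my own way, he guided me patiently, respecting my independence rather than ordering me around, and watched over my growth. Professor Kazuyuki Tanaka asked me sharp and valuable questions about the analysis of the data as I compiled this work. I am also sincerely grateful that he allowed me to take part in the regularly held information and mathematical physics study group (情報数物研究会), where I gained valuable knowledge for this research. Professor Shigeo Sato taught me a great deal about the theoretical framework of computational neuroscience and learning theory in the early stages of the research, and gave me incisive comments and advice as I compiled it. I am also deeply grateful that he readily agreed to hold a study group on computational neuroscience, which gave me the opportunity to deepen my understanding of the field. Professor Yoshihiro Hayakawa of the National Institute of Technology, Sendai College, gave me wide-ranging guidance and advice on every aspect of the research, including how to proceed, its direction, and how to collect and organize data. I deeply appreciate the uncompromising, incisive discussions he patiently held with me. Assistant Professor Ono (小野美武) supported my research life in many ways and worked tirelessly to provide an environment in which I could conduct research without inconvenience, for which I am deeply grateful. I also thank him for the sharp comments and questions he raised at seminar presentations despite working in a different field. Mr. Tanno (丹野航太) of the Institute for Materials Research, Tohoku University, taught me a great deal about parallel computing with GPGPU, programming, and the handling of computers. I am sincerely grateful to him. The members of the Nakajima Laboratory helped me in many ways throughout my research life. In particular, my classmate Mr. Horiuchi (堀内優太) gave me advice and guidance on many matters, starting with the content of the research itself, and helped me greatly in compiling this work. I am deeply grateful to him. I also received valuable advice and hints for this research from many other people, both those I met in person during my research life and those I came to know through social media. I am very grateful to them. Finally, I am deeply grateful to my family, relatives, and friends, who have supported my growth and made it possible for me to continue my studies.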

References

[1] Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, Vol. 5, No. 4, pp. 115–133, 1943.

[2] Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. DeepFace: Closing the gap to human-level performance in face verification. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp. 1701–1708. IEEE, 2014.

[3] Quoc V Le. Building high-level features using large scale unsupervised learning. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pp. 8595–8598. IEEE, 2013.

[4] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends® in Machine Learning, Vol. 2, No. 1, pp. 1–127, 2009.

[5] Johan Hastad. Almost optimal lower bounds for small depth circuits. In Proceedings of the eighteenth annual ACM symposium on Theory of computing, pp. 6–20. ACM, 1986.

[6] Stephen José Hanson and Lorien Y Pratt. Comparing biases for minimal network construction with back-propagation. In Advances in neural information processing systems, pp. 177–185, 1989.

[7] Yann Le Cun, B Boser, John S Denker, D Henderson, Richard E Howard, W Hubbard, and Lawrence D Jackel. Handwritten digit recognition with a back-propagation network. In Advances in neural information processing systems, 1990.

[8] Steven J Nowlan and Geoffrey E Hinton. Simplifying neural networks by soft weight-sharing. Neural computation, Vol. 4, No. 4, pp. 473–493, 1992.

[9] Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural computation, Vol. 18, No. 7, pp. 1527–1554, 2006.

[10] Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.

[11] Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. Regularization of neural networks using DropConnect. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 1058–1066, 2013.

[12] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.

[13] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013.

[14] Kazuyuki Aihara, T Takabe, and M Toyoda. Chaotic neural networks. Physics letters A, Vol. 144, No. 6, pp. 333–340, 1990.

[15] Yoshihiro Hayakawa and Koji Nakajima. Design of the inverse function delayed neural network for solving combinatorial optimization problems. Neural Networks, IEEE Transactions on, Vol. 21, No. 2, pp. 224–237, 2010.

[16] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Cognitive modeling, Vol. 5, 1988.

[17] David H Ackley, Geoffrey E Hinton, and Terrence J Sejnowski. A learning algorithm for Boltzmann machines. Cognitive science, Vol. 9, No. 1, pp. 147–169, 1985.

[18] Prabhakar Raghavan and Clark D Thompson. Randomized rounding: a technique for provably good algorithms and algorithmic proofs. Combinatorica, Vol. 7, No. 4, pp. 365–374, 1987.

[19] Marcin Wojnarski. Nondeterministic discretization of weights improves accuracy of neural networks. Machine Learning: ECML 2007, pp. 765–772, 2007.

[20] Daniel Golovin, D Sculley, H Brendan McMahan, and Michael Young. Large-scale learning with less ram via randomization. arXiv preprint arXiv:1303.4664, 2013.

[21] Chuan Zhang Tang and Hon Keung Kwan. Multilayer feedforward neural networks with single powers-of-two weights. Signal Processing, IEEE Transactions on, Vol. 41, No. 8, pp. 2724–2727, 1993.

[22] Emile Fiesler, Amar Choudry, and H John Caulfield. Weight discretization paradigm for optical neural networks. The Hague’90, 12-16 April, pp. 164–173, 1990.

[23] Sorin Draghici. On the capabilities of neural networks using limited precision weights. Neural networks, Vol. 15, No. 3, pp. 395–414, 2002.

[24] Takeshi Kamio, Hisato Fujisaka, and Mititada Morisue. Backpropagation algorithm for logic oriented neural networks with quantized weights and multilevel threshold neurons. IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences, Vol. 84, No. 3, pp. 705–712, 2001.

[25] Jian Bao, Yu Chen, and Jinshou Yu. An optimized discrete neural network in embedded systems for road recognition. Engineering Applications of Artificial Intelligence, Vol. 25, No. 4, pp. 775–782, 2012.

[26] Minjae Lee, Kyuyeon Hwang, and Wonyong Sung. Fault tolerance analysis of digital feed-forward deep neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pp. 5031–5035. IEEE, 2014.

[27] Kyuyeon Hwang and Wonyong Sung. Fixed-point feedforward deep neural network design using weights +1, 0, and -1. In Signal Processing Systems (SiPS), 2014 IEEE Workshop on, pp. 1–6. IEEE, 2014.

[28] Mehmet Vural, A Ozgur, Alexandre Schmid, and Yusuf Leblebici. Fault tolerance of feed-forward artificial neural network architectures targeting nano-scale implementations. In Circuits and Systems, 2007. MWSCAS 2007. 50th Midwest Symposium on, pp. 779–782. IEEE, 2007.

[29] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, Vol. 86, No. 11, pp. 2278–2324, 1998.

[30] Clément Farabet, Yann LeCun, Koray Kavukcuoglu, Eugenio Culurciello, Berin Martini, Polina Akselrod, and Selcuk Talay. Large-scale FPGA-based convolutional networks. Machine Learning on Very Large Data Sets, 2011.

[31] Paul Smolensky. Information processing in dynamical systems: Foundations of harmony theory. 1986.

[32] Christopher M Bishop. Pattern recognition and machine learning. Springer New York, 2006.

[33] Geoffrey E Hinton. Training products of experts by minimizing contrastive divergence. Neural computation, Vol. 14, No. 8, pp. 1771–1800, 2002.

[34] Yoshua Bengio and Olivier Delalleau. Justifying and generalizing contrastive divergence. Neural Computation, Vol. 21, No. 6, pp. 1601–1621, 2009.

[35] Geoffrey Hinton. A practical guide to training restricted Boltzmann machines. Technical Report UTML TR 2010-003, University of Toronto, 2010.

A Study on Deep Learning of Neural Networks with Quantized Connections (量子化結合ニューラルネットワークの深層学習に関する研究)
