Quantization Error-aware Neural Network Training

Kazutoshi Hirose 1*, Kota Ando 1, Kodai Ueyoshi 1, Masayuki Ikebe 1, Tetsuya Asai 1, Masato Motomura 1, and Shinya Takamaeda-Yamazaki 1

1 Graduate School of Information Science and Technology (IST), Hokkaido University
Abstract: Deep neural networks are a widely used technology for various machine learning applications. A training technique that achieves both low precision and high accuracy is desired for low-power neural network hardware. We propose a quantization-error-aware training method for higher accuracy of quantized neural networks. Our approach appends an additional regularization term, based on the quantization errors of the weights, to the loss function. Evaluation results on MNIST and CIFAR-10 show that the proposed approach achieves higher accuracy than the standard approach.
1 Introduction
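Deep neural networks (DNNs) are widely used as a machine learning technology in applications such as image recognition, speech recognition [9], and translation [6]. Larger and more complex networks let DNNs handle more advanced tasks than conventional machine learning, but they require enormous computational resources. On servers, both inference and training are performed, mainly on GPUs, which deliver high performance but consume a large amount of power. Future IoT devices such as mobile terminals and embedded equipment, on the other hand, must operate in constrained environments. DNN techniques specialized for low-power, low-memory hardware are therefore in demand.

One hardware-oriented DNN technique is the quantization of numerical values. Quantization represents values held in floating point using fixed-point [11], logarithmic [12], or binary [2] representations. Quantized values reduce the memory footprint and simplify arithmetic. For example, computation with binarized weights and activations needs no multiplication, which is replaced by XNOR operations. Because an XNOR circuit is far simpler than a multiplier circuit, power consumption drops substantially [1].

However, quantization lowers recognition accuracy. The cause is the quantization error that arises when floating-point values are quantized. In general there is a trade-off between the coarseness of the weight representation and recognition accuracy: the more aggressively the weights are quantized, the more accuracy tends to fall. The challenge is therefore how far the precision of the weights can be reduced while recognition accuracy is preserved.

In this work, we propose a training method that takes quantization error into account in order to raise the recognition accuracy of quantized neural networks. The quantization error of the weights should be small to prevent a loss of accuracy. Our method reduces the quantization error by incorporating a regularization term based on it into the objective function during training.

[Figure 1: Application of logarithmic quantization (example of LogQuant(w, 3, 1)). Axes: $w$ and $w^q$; curves: with Eq. (3), with Eq. (4).]

* Contact: Integrated Architecture Laboratory, 2F, Information Science Building (M wing), Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo, Hokkaido 060-0814, Japan. E-mail: [email protected]

JSAI Technical Report (人工知能学会研究会資料), SIG-FPAI-B507-01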
[Figure 2: Quantization error (|QE|) when logarithmic quantization is applied. Axes: $w$ and |QE|; curves: with Eq. (3), with Eq. (4).]
2 Quantized Neural Networks
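Quantizing a neural network means representing values held in floating point with less information, for example in fixed point or in binarized (two-valued, ±1) form. More recently, neural networks that apply logarithmic quantization, which represents values by their logarithm, have been proposed. Logarithmic quantization makes it possible to increase both the range of representable values and the resolution for small values at the same time. This work considers logarithmic quantization and binarization as weight quantization methods. Quantization of activations is not considered.

Let $w$ be the original real-valued weight. Logarithmic quantization is then expressed as

$$AP2(w) = \mathrm{sign}(w) \times 2^{\mathrm{round}(\log_2 |w|)} \tag{1}$$

The name $AP2(\cdot)$ stands for approximate-power-of-2: the operation approximates its argument by the nearest power of two. With this operation, a value held in floating point can be logarithmically quantized. Further, let bitwidth be the number of bits needed to represent one value (including the sign bit), and let maxV and minV be the maximum and minimum values representable with this bit width. The bit constraint is applied by

$$\mathrm{LogQuant}(w, \mathit{bitwidth}, \mathit{maxV}) = \mathrm{Clip}(AP2(w), \mathit{minV}, \mathit{maxV}) \tag{2}$$

Eq. (1) contains a round operation. The round operation commonly used is ordinary rounding to the nearest integer:

$$\mathrm{round}(x) = \begin{cases} \mathrm{ceil}(x) & (x - \lfloor x \rfloor \ge 0.5) \\ \mathrm{floor}(x) & (x - \lfloor x \rfloor < 0.5) \end{cases} \tag{3}$$

Because the midpoint of this round operation is 0.5, it minimizes the quantization error in the real (linear) domain. In the logarithmic domain, however, the midpoint is not 0.5 but $\log_2(3/2)$, since the value halfway between $2^n$ and $2^{n+1}$ is $(3/2) \cdot 2^n$. Using the following rule in Eq. (1) therefore reduces the error introduced by quantization:

$$\mathrm{round}(x) = \begin{cases} \mathrm{ceil}(x) & (x - \lfloor x \rfloor \ge \log_2(3/2)) \\ \mathrm{floor}(x) & (x - \lfloor x \rfloor < \log_2(3/2)) \end{cases} \tag{4}$$

Figure 2 shows the error of logarithmic quantization when this rule is used. For binarization, the following equation is used:

$$\mathrm{Binarize}(w) = \mathrm{sign}(w) = \begin{cases} +1 & (\text{if } w \ge 0) \\ -1 & (\text{otherwise}) \end{cases} \tag{5}$$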
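The difference between the rounding rules of Eq. (3) and Eq. (4) can be illustrated with a minimal pure-Python sketch. The function names, the mapping of zero to zero, and the choice to clip the magnitude while keeping the sign are assumptions made for illustration. The text does not state how minV is derived from bitwidth, so `min_v` is passed explicitly here.

```python
import math

# Threshold of Eq. (4): the linear-domain midpoint between 2^n and 2^(n+1),
# expressed in the log domain (about 0.585).
LOG_MIDPOINT = math.log2(1.5)

def round_log(x, threshold=LOG_MIDPOINT):
    """Round up when the fractional part of x reaches `threshold`, else down.
    threshold=0.5 gives Eq. (3); threshold=log2(3/2) gives Eq. (4)."""
    frac = x - math.floor(x)
    return math.ceil(x) if frac >= threshold else math.floor(x)

def ap2(w, threshold=LOG_MIDPOINT):
    """Eq. (1): approximate w by a signed power of two."""
    if w == 0.0:
        return 0.0  # assumption: log2(0) is undefined, so zero maps to zero
    sign = 1.0 if w > 0 else -1.0
    return sign * 2.0 ** round_log(math.log2(abs(w)), threshold)

def log_quant(w, min_v, max_v, threshold=LOG_MIDPOINT):
    """Eq. (2), assuming the clip acts on the magnitude and keeps the sign."""
    q = ap2(w, threshold)
    if q == 0.0:
        return 0.0
    magnitude = min(max(abs(q), min_v), max_v)
    return math.copysign(magnitude, q)

def binarize(w):
    """Eq. (5): +1 for w >= 0, otherwise -1."""
    return 1.0 if w >= 0 else -1.0
```

For w = 0.72, the fractional part of log2(w) lies between 0.5 and log2(3/2), so the two rules disagree: `ap2(0.72, threshold=0.5)` rounds up to 1.0 (error 0.28), while `ap2(0.72)` rounds down to 0.5 (error 0.22).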
3 Quantization Error-based Regularization
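Existing quantized neural networks do not take into account the quantization error (QE) that arises when values are quantized. In this work, we propose a training method for neural networks that does. To suppress the drop in recognition accuracy caused by QE, the weights should be trained so that QE becomes small. We therefore add QE to the objective function as a regularization term. Training the weights against this objective reduces QE and the recognition error at the same time, which makes it possible to preserve recognition accuracy.

Let $w$ be the real-valued weights and $w^q$ the weights after quantization (logarithmic quantization or binarization). The quantization error is defined as

$$\mathrm{QE}(w) = w - w^q \tag{6}$$

The quantization error-based regularization (QER) term is then defined as

$$\mathrm{QER}(w) = \frac{1}{2}\,\lVert w - w^q \rVert^2 \tag{7}$$

Further, let $E(w)$ be the recognition error function. The objective function with the QER term added is defined as

$$L(w) = E(w) + \eta_2\,\mathrm{QER}(w) \tag{8}$$

Training proceeds in the direction that minimizes this objective, so that the recognition error and QE decrease together:

$$\min_w L(w) \tag{9}$$

Ordinary neural networks sometimes use the L2 norm of the weights as a regularization term to prevent overfitting [5]. L2 regularization pulls the weights toward 0 and thereby keeps them from diverging. The proposed QER, in contrast, pulls the real-valued weights toward their quantized values. The quantization error therefore shrinks, and the recognition error can be kept low.

Algorithm 1: Training a quantized neural network with QER

Require: a minibatch of inputs and targets $(x_0, x^*)$, previous weights $w$, previous learning rate $\eta^t$.
Ensure: updated weights $w^{t+1}$, updated learning rate $\eta^{t+1}$.

1. Forward propagation
for $l = 1$ to $L$ do
    $w_l^q \Leftarrow \mathrm{Quantize}(w_l)$
    $u_l \Leftarrow x_{l-1} \cdot w_l^q$
    if $l < L$ then
        $x_l \Leftarrow \mathrm{ReLU}(u_l)$
    end if
end for

2. Backward propagation
Compute $\partial E / \partial u_L$ knowing $u_L$ and $x^*$
for $l = L$ to $1$ do
    $\partial E / \partial u_{l-1}^q \Leftarrow (\partial E / \partial u_l) \cdot w_l^q$
    $\partial E / \partial w_l \Leftarrow (\partial E / \partial u_l)^T \cdot u_{l-1}^q$
end for

3. Accumulating the parameter gradients
for $l = 1$ to $L$ do
    $\partial \mathrm{QER}(w_l) / \partial w_l \Leftarrow \mathrm{QE}(w_l)$
    $w_l^{t+1} \Leftarrow w_l - \eta_1^t \cdot \partial E / \partial w_l - \eta_2^t \cdot \partial \mathrm{QER}(w_l) / \partial w_l$
    $\eta^{t+1} \Leftarrow \lambda \eta^t$
end for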
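The three phases of Algorithm 1 can be traced on a single linear unit with the following pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: QER is taken as half the squared norm of QE so that its gradient (with $w^q$ held constant) equals QE as in step 3, the recognition error is a squared error, the update is plain gradient descent rather than Adam, and the names `qe`, `qer` and `train_step` are hypothetical.

```python
def binarize(w):
    """Eq. (5): +1 for w >= 0, otherwise -1."""
    return 1.0 if w >= 0 else -1.0

def qe(weights, quantize):
    """Eq. (6): element-wise quantization error w - w^q."""
    return [w - quantize(w) for w in weights]

def qer(weights, quantize):
    """QER taken as half the squared L2 norm of QE (assumed scaling)."""
    return 0.5 * sum(e * e for e in qe(weights, quantize))

def train_step(weights, x, target, eta1, eta2, quantize=binarize):
    """One update of a single linear unit, mirroring the phases of Algorithm 1."""
    # 1. Forward propagation uses the quantized weights.
    wq = [quantize(w) for w in weights]
    u = sum(xi * wi for xi, wi in zip(x, wq))
    # 2. Backward propagation for the assumed loss E = 0.5 * (u - target)^2.
    #    The gradient computed with w^q is applied to the real-valued weights.
    dE_du = u - target
    dE_dw = [dE_du * xi for xi in x]
    # 3. The QER gradient, with w^q treated as a constant, is QE itself.
    dqer_dw = qe(weights, quantize)
    return [w - eta1 * g - eta2 * r
            for w, g, r in zip(weights, dE_dw, dqer_dw)]
```

With a zero input the error gradient vanishes, which isolates the effect of the QER term: `train_step([0.4, -0.2], [0.0, 0.0], 0.0, eta1=0.1, eta2=0.5)` moves the weights to about 0.7 and -0.6, that is, toward their binarized values +1 and -1.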
4 Evaluation
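In this section, we compare ordinary training without the QER term against training with the QER term added to the objective function. The evaluation used the machine learning framework TensorFlow. As above, the numerical representations are logarithmic quantization and binarization. For logarithmic quantization, the weights in all layers were quantized with LogQuant(w, 4, 1). Training follows Algorithm 1. The learning rate $\eta_1$ was set to 0.001. The coefficient $\eta_2$ was given an initial value of 0.00001 and multiplied by 1.2 every 10 epochs. With these values, the weight updates emphasize the recognition error early in training and the quantization error toward the end of training. Adam [7] was used as the optimizer. The following two datasets were used for training.

1. MNIST [10] is a dataset of handwritten digit images from 0 to 9, consisting of 28×28 grayscale images. It contains 70000 images in total: 60000 for training and 10000 for evaluation. Data augmentation is not used. The network used for training is a multi-layer perceptron with two hidden layers, as follows, where FC denotes a fully connected layer.
   FC784-256, FC256-256, FC256-10

2. CIFAR-10 [8] is a 10-class image classification dataset consisting of 32×32 color images. It contains 60000 images in total: 50000 for training and 10000 for evaluation. Data augmentation is not used. The network used for training is a CNN (convolutional neural network), as follows, where C3-X denotes a convolutional layer with 3×3 filters and X output channels, and MP2 denotes a max-pooling layer.
   C3-64, MP2, C3-64, MP2, FC4096-384, FC384-192, FC192-10

Table 1: Recognition accuracy of each method

Method                        MNIST    CIFAR-10
float                         0.9777   0.6941
LogQuantize (4bit) w/o QER    0.9773   0.6844
LogQuantize (4bit) w/ QER     0.9783   0.7031
Binarize (1bit) w/o QER       0.9664   0.6724
Binarize (1bit) w/ QER        0.9709   0.6839

[Figure 3: Convergence of recognition accuracy on CIFAR-10. Two panels, accuracy versus epoch; first panel: float, LogQuantize (4bit) w/o QER, LogQuantize (4bit) w/ QER; second panel: float, Binarize w/o QER, Binarize w/ QER.]

Table 1 shows the recognition accuracy and Figure 3 shows how it converges. On both the MNIST and CIFAR-10 benchmarks, training with the QER term improved recognition accuracy for both logarithmic quantization and binarization. In particular, LogQuantize (4bit) with QER exceeded the accuracy of float on both datasets (0.9783 versus 0.9777 on MNIST and 0.7031 versus 0.6941 on CIFAR-10).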
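The schedule for the coefficient $\eta_2$ described in this section (initial value 0.00001, multiplied by 1.2 every 10 epochs) can be written as a one-line function. This is a sketch under two assumptions that the text does not settle: epochs are counted from 0, and the factor is applied as a staircase at every tenth epoch. The function name is hypothetical.

```python
def eta2_schedule(epoch, initial=1e-5, factor=1.2, period=10):
    """QER coefficient at a given epoch: `initial` scaled by `factor`
    once per completed `period` of epochs (staircase growth)."""
    return initial * factor ** (epoch // period)

# The coefficient stays at its initial value for epochs 0-9,
# then grows by 20% at epoch 10, 20, 30, and so on.
schedule_preview = [eta2_schedule(e) for e in (0, 9, 10, 20)]
```

Under these assumptions the coefficient grows by a factor of 1.2^30 (roughly 237) over 300 epochs, the range of the epoch axis in Figure 3, which is how the quantization error comes to dominate the update late in training.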
5 Related Work
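Methods have been proposed that quantize weights to compress them into a form suited to hardware. Shin et al. compressed weights in order to use a LUT (look-up table) [13]. Gysel et al. proposed a fine-tuning technique for converting weights to a hardware-oriented fixed-point representation [3]. These methods optimize the weights to raise recognition accuracy under a limited representation. Our work differs from them in that it takes the quantization error into account. It can, however, be used together with these methods.

Loss-aware binarization [4] adopts a proximal Newton algorithm with a diagonal Hessian approximation in order to minimize the loss with respect to the binarized weights. Our work is similar in that it considers the effect of quantization. We aim to raise recognition accuracy by means of a regularization term, and our approach is also applicable to quantization schemes other than binarization.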
6 Conclusion
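In this work, we proposed a training method that takes quantization error into account, aimed at hardware implementation of neural networks. Adding a regularization term based on the quantization error to the objective function makes it possible to train so that both the quantization error and the recognition error become small.

As future work, the method needs to be applied to larger networks and to linear quantization, in addition to logarithmic quantization and binarization, and evaluated in those settings. Furthermore, because introducing the regularization term requires a coefficient, techniques for dynamically optimizing this coefficient are a possible direction.

Acknowledgments

This work was supported by JST ACCEL and Technova.

References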
[1] Ando, K., Orimo, K., Ueyoshi, K., Yonekawa, H., Sato, S., Nakahara, H., Ikebe, M., Asai, T., Takamaeda-Yamazaki, S., Kuroda, T., Motomura, M.: BRein Memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS. In: 2017 IEEE Symposium on VLSI Circuits (VLSI-Circuits). pp. C24–C25. Kyoto, Japan (2017)
[2] Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., Bengio, Y.: Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. ArXiv e-prints (Feb 2016)
[3] Gysel, P., Motamedi, M., Ghiasi, S.: Hardware-oriented Approximation of Convolutional Neural Networks. ArXiv e-prints (Apr 2016)
[4] Hou, L., Yao, Q., Kwok, J.T.: Loss-aware Binarization of Deep Networks. ArXiv e-prints (Nov 2016)
[5] Janocha, K., Czarnecki, W.M.: On Loss Functions for Deep Neural Networks in Classification. ArXiv e-prints (Feb 2017)
[6] Johnson, M., Schuster, M., Le, Q.V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F., Wattenberg, M., Corrado, G., Hughes, M., Dean, J.: Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. ArXiv e-prints (Nov 2016)
[7] Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. ArXiv e-prints (Dec 2014)
[8] Krizhevsky, A., Nair, V., Hinton, G.: CIFAR-10 (Canadian Institute for Advanced Research), http://www.cs.toronto.edu/~kriz/cifar.html
[9] LeCun, Y., Bengio, Y., Hinton, G.: Nature. Nature (2016)
[10] LeCun, Y., Cortes, C.: MNIST handwritten digit database (2010), http://yann.lecun.com/exdb/mnist/
[11] Lin, D.D., Talathi, S.S., Sreekanth Annapureddy, V.: Fixed Point Quantization of Deep Convolutional Networks. ArXiv e-prints (Nov 2015)
[12] Miyashita, D., Lee, E.H., Murmann, B.: Convolutional Neural Networks using Logarithmic Data Representation. ArXiv e-prints (Mar 2016)
[13] Shin, D., Lee, J., Lee, J., Yoo, H.J.: 14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks. In: 2017 IEEE International Solid-State Circuits Conference (ISSCC). pp. 240–241 (Feb 2017)