it-ken_open.key

(1)

深層学習技術と

 

信号処理・通信系アルゴリズム

̶概観と展望̶

名古屋工業大学

和田山正

(2)

深層学習技術の進展

• 画像認識

• 音声認識

• 自然言語処理

• 機械翻訳

ImageNet Classiﬁcation

Figure 4: (Left) Eight ILSVRC-2010 test images and the five labels considered most probable by our model. The correct label is written under each image, and the probability assigned to the correct label is also shown with a red bar (if it happens to be in the top 5). (Right) Five ILSVRC-2010 test images in the first column. The remaining columns show the six training images that produce feature vectors in the last hidden layer with the smallest Euclidean distance from the feature vector for the test image.

In the left panel of Figure 4 we qualitatively assess what the network has learned by computing its top-5 predictions on eight test images. Notice that even off-center objects, such as the mite in the top-left, can be recognized by the net. Most of the top-5 labels appear reasonable. For example, only other types of cat are considered plausible labels for the leopard. In some cases (grille, cherry) there is genuine ambiguity about the intended focus of the photograph.

Another way to probe the network’s visual knowledge is to consider the feature activations induced by an image at the last, 4096-dimensional hidden layer. If two images produce feature activation vectors with a small Euclidean separation, we can say that the higher levels of the neural network consider them to be similar. Figure 4 shows five images from the test set and the six images from the training set that are most similar to each of them according to this measure. Notice that at the pixel level, the retrieved training images are generally not close in L2 to the query images in the first column. For example, the retrieved dogs and elephants appear in a variety of poses. We present the results for many more test images in the supplementary material.

Computing similarity by using Euclidean distance between two 4096-dimensional, real-valued vec-tors is inefficient, but it could be made efficient by training an auto-encoder to compress these vecvec-tors to short binary codes. This should produce a much better image retrieval method than applying auto-encoders to the raw pixels [14], which does not make use of image labels and hence has a tendency to retrieve images with similar patterns of edges, whether or not they are semantically similar.

7 Discussion

Our results show that a large, deep convolutional neural network is capable of achieving record-breaking results on a highly challenging dataset using purely supervised learning. It is notable that our network’s performance degrades if a single convolutional layer is removed. For example, removing any of the middle layers results in a loss of about 2% for the top-1 performance of the network. So the depth really is important for achieving our results.

To simplify our experiments, we did not use any unsupervised pre-training even though we expect that it will help, especially if we obtain enough computational power to significantly increase the size of the network without obtaining a corresponding increase in the amount of labeled data. Thus far, our results have improved as we have made our network larger and trained it longer but we still have many orders of magnitude to go in order to match the infero-temporal pathway of the human visual system. Ultimately we would like to use very large and deep convolutional nets on video sequences where the temporal structure provides very helpful information that is missing or far less obvious in static images.

8

cited from: ``ImageNet Classiﬁcation with Deep

Convolutional Neural Networks , Alex Krizhevsky et al. https://www.nvidia.cn/content/tesla/pdf/machine-learning/imagenet-classiﬁcation-with-deep-convolutional-nn.pdf

深層学習技術は、これらの分野において特に圧倒的な強みを見せている

(3)

広がり軸深化軸画像診断材料工学遺伝子解析構造探索蒸留 GAN 自己回帰型  ネットモデル圧縮転移学習実験物理通信工学自動運転医療応用ドメイン固有の知識が必要

深層学習の「深化と広がり」

画像認識音声認識自然言語処理畳み込み  ネットワーク半教師付き  学習生成モデルロボティクス急速に進む研究・開発激しい競争

(4)

広がり軸深化軸画像診断材料工学遺伝子解析構造探索蒸留 GAN 自己回帰型  ネットモデル圧縮転移学習実験物理通信工学自動運転医療応用

深層学習の「深化と広がり」

画像認識音声認識自然言語処理畳み込み  ネットワーク半教師付き  学習生成モデルロボティクス急速に進む研究・開発激しい競争本日のテーマ：今後進展の可能性がかなりあるのでは？

(5)

cited from: http://icc2018.ieee-icc.org/workshop/promises-and-challenges-machine-learning-communication-networks-ml4com

(6)

(7)

本日のテーマ：

 

データ駆動型アルゴリズム設計

深層学習技術

人工知能実現のための技術

(8)

本日のテーマ：

 

データ駆動型アルゴリズム設計

深層学習技術

人工知能実現のための技術

通信系アルゴリズムとの自然な結びつき

学習可能な確率モデル

多段関数の非凸最適化

データ駆動型

 

アルゴリズム設計

(9)

本講演の概要

Deep Learning in 5 minutes

可微分プログラミング

研究事例紹介

(スパース信号再現アルゴリズム、MIMO検出、

量子化器設計)

(10)

本講演の概要

Deep Learning in 5 minutes

可微分プログラミング

研究事例紹介

(スパース信号再現アルゴリズム、MIMO検出、

量子化器設計)

(11)

こんな関数を作りたい！

f : ℝ

n

→ ℝ

0: 猫

(12)

{ , 1}

f

_Θ

: ℝ

n

→ ℝ

Θ

関数の形状をコントロールするパラメータ群

関数の形状を制御するパラメータを導入する

訓練データを準備する

{ , 1} { , 1}

{ , 0} { , 0} { , 0}

入力値教師ラベル

(13)

パラメータを更新する(訓練・学習）

f

_Θ

: ℝ

n

→ ℝ

{ , 1}

{ , 0}

出力と教師ラベルとの間の食い違いがなるべく小さくなるようにパラメータを更新する大量の訓練データを利用近い値？

̂y

y

(14)

深層ネットワークモデル

Wh + b

線形層活性化関数

g

f

_Θ

: ℝ

n

→ ℝ

m

入力

出力

W

h

+ =

h

b

バイアス係数行列 h′

h′

Θ = {W

1 , b

1 , W

2 , b

2 , …}

活性化関数

(15)

W

h

+ =

b

h′

人工ニューラル素子

(

ふたたび

)

人工ニューロン素子は次の構造を持つ

:

y = f

!

_n

!

i=1

w

_i

x

_i

+ b

"

(4)

関数

f

は活性化関数であり、さまざまな種類の関数が利用されて

いる

:

アフィン変換

活性化関数

(16)

深層ネットワークの訓練

Θ = {W

₁

, b

₁

, W

₂

, b

₂

, …}

_{̂y = f}

Θ

(x)

{ , 1}

損失関数

x

loss(y, ̂y)

y

訓練・学習過程では、損失関数値を最小化するようにパラメータを変更する

(17)

深層学習 = 最小二乗法の親玉？

パラメータ付き関数

f (x|w)

の選択

(1)

多項式モデル

x

_{∈ R}

f (x|w) = w

₀

+ w

₁

x + w

₂

x

2 +

_{· · · + w}

_M

x

M

をこの多項式モデルの次数と呼ぶ。

3 次関数モデルによるフィッティング

(18)

最適化技法として確率的勾配法を利用

パラメータの勾配ベクトルの効率的な計算には

誤差逆伝播法(back prop)を利用

補足

よい汎化性能を得るためには、大量の訓練

 

データが必要

一般には、非凸最適化となる

(19)

D = {(x

₁

, y

₁

), (x

₂

, y

₂

), …, (x

_T

, y

_T

)}

訓練データ

確率的勾配法(SGD)

ミニバッチ

B = {(x

_b1

, y

_b1

), (x

_b2

, y

_b2

), …, (x

_bK

, y

_bK

)}

目的関数

G

B

(Θ) = 1

_K

K

∑

k=1

loss(y

_bk

− f

_Θ

(x

_bk

))

り、学習可能パラメータ Θ の値を更新する（注4）。 Step 1 (初期点設定) Θ := Θ0 Step 2 (ミニバッチ取得) B をランダムに生成（注5） Step 3 (勾配ベクトルの計算) g := _∇GB(Θ) Step 4 (探索点更新) Θ := Θ _{− αg} Step 5 (反復) Step 2 に戻るミニバッチ B は確率変数であるため、確率的勾配法は、目的関数 GB(Θ) の期待値を最小化している手順と見なすことができる。ニューラルネットワーク学習などの学習問題に確率的勾配法を適用することにより汎化能力の高いパラメータが見出されることが経験的に知られている[10]。確率的勾配法の振る舞いやミニバッチサイズ・学習率と汎化誤差との関係に関して、現在の深層学習分野において実験的・理論的研究が活発に進められている[16] [14]。上で述べた確率的勾配法は最も単純なものであり、多くのバリエーションや関連した収束加速手法が知られている[10]。例えば、RMSprop, Adam [17] などの手法は、広く深層学習の研究に利用されている。適切な確率的勾配法の選択、学習率の設定とそのスケジューリング、ミニバッチサイズなどは、ミニバッチ学習において最も重要なハイパーパラメータ群であり、学習結果の良否はこれらの設定の良し悪しに強く影響される。良いハイパーパラメータ設定を見つけるためには、多少の（場合によってはかなりの）試行錯誤的プロセスが必要となることもある。 2. 2 誤差逆伝播法確率的勾配法の実行のためには、学習可能パラメータ Θ の導関数値 (勾配値) を計算する必要がある。誤差逆伝播法は、微分の連鎖則を利用することにより層構造 (または疎な計算グラフ構造) を持つ関数に対して効率の良い導関数値の計算を実現する。誤差逆伝播法は、前向き計算フェーズと後ろ向き計算フェーズの２つ計算フェーズを持つ。前向き計算フェーズでは、深層ニューラルネットワークにおいては入力層から順に出力層に向かって信号ベクトル値の評価が行われる。信号流グラフにおいても同様であり、各層での処理を多変数入力・ベクトル値関数 φ1, φ2, . . . で表現すると y = φk(· · · φ2(φ1(x))) として関数値を順に計算していく。なお、後ろ向き計算フェーズのために計算の途中結果は捨ててしまわずに残しておく必要がある。後ろ向き計算フェーズでは、まず、損失関数値を初期値として、出力端から入力端に向かって、微分の連鎖律に従い計算が進行する。具体的な計算としては、φi に関するヤコビ行列と逆伝播ベクトルの積の計算が順次行われる。深層ニューラルネットワークの学習プロセスにおいては、大量のデータを含む訓練データ集合が利用される。各層での前向き計算では、yk = g(Wkxk + bk) という計算を行う必要がある。（注4）：確率的勾配法はもともとミニバッチサイズが 1 の場合を想定して提案された。本稿では、ミニバッチ型の確率勾配法と陽に区別はしていない。（注5）：実際には、全訓練データをミニバッチに分割し、そのミニバッチを順次利用するという手段が取られることが多い。すべての訓練データを使い果たすと (1 エポック)、改めて再度ミニバッチに分割し手続きを繰り返す。ここで、W は係数行列であり、bk はバイアスベクトルである。すなわち、逆誤差伝播法の実行過程においては大量の行列・ベクトル積 (正確にはテンソル積) の計算を行う必要があり、この計算量が学習時間において支配的となる。 2. 3 フレームワークフレームワークは、深層学習の要素技術の主たるものをライブラリの形でまとめたものであり、深層学習技術を利用した研究を進める上で欠かせないプログラム開発のプラットフォームである。代表的なフレームワークとして、TensorFlow、Keras, PyTorch、Chainer などがあり、ライブラリ仕様の拡張や機能拡充などを含めて、現在も活発な開発が継続中である。フレームワークを利用する最も大きな利点は、フレームワーク利用者 (プログラム開発者) が誤差逆伝播法の後ろ向き計算を自分で書かなくても良い点にある。すでに述べたように後ろ向き計算フェーズにおいては、微分の連鎖律に従い、行列・ベクトル積の計算と非線形関数の適用を反復して計算を行うことになる。前向き計算と異なり後ろ向き計算は記述をミスしやすく、その計算自体を陽にプログラムとして記述することは、プログラム開発者の著しい負担となる。すべてのフレームワークにおいて誤差逆伝播法の後ろ向き計算は自動化されており、フレームワーク利用者はその恩恵を受けることができる。各フレームワークの内部では計算グラフが構築され、その計算グラフに基づき、後ろ向き計算が実行される。計算グラフの構築においては、前向き計算時に動的に計算グラフが構成される環境 (define by run; PyTorch, Chainer) と事前に計算グ

ラフを明示的にユーザ側で構築しておく環境 (define and run;

TensorFlow) がある。後者にない前者の大きな特徴は訓練データに依存した計算グラフの構築が可能であることが挙げられる。すなわち、前向き計算部 (計算グラフ構築部）において、ループや条件式の利用が可能である。すでに述べたように深層学習において、最も計算時間がかかるのが訓練プロセス実行時 (学習時) に実行される行列・ベクトル積 (正確には、ひとつのミニバッチが一気に計算されるためにテンソル積計算となる) の計算に必要とされる計算時間である。特に画像認識のように扱うべきテンソルのサイズが大きい場合には、GPU に基づく並列計算の利用が必須である。すべてのフレームワークは GPU 計算に対応しており、GPU プログラミングに精通していない開発者も容易に GPU をフル活用した計算を行うことができる。深層学習関連の研究を開始する際には、まずどのフレームワークを利用するかを決めなければならない。個人的な印象としては、アルゴリズム設計・評価研究においては、現時点では、TensorFlow(TensorFlow 上で動作する Keras も含めて) と PyTorch の二択の状況に思える。フレームワークの利用者数の点では、世界的に TensorFlow が圧倒的に優位に立っており、それに伴い膨大な関連の情報をネット上で見つけることができる。また、最新の深層学習の論文に掲載されているアルゴリズムの実装を Github において公開することが広く行われており、 TensorFlow による実装が公開されているケースが非常に多い。一方 PyTorch は、TensorFlow と異なる計算グラフ構成の方 — 4 — 確率的勾配法に基づく最小化バリエーション Momentum AdaDelta RMSprop Adam

(20)

誤差逆伝播法(backprop)

注：ただし途中の計算結果を残す前向き計算フェーズ（単なる値の評価）損失関数

loss(y, ̂y)

・パラメータの勾配ベクトルを効率良く求めることが目的・微分の連鎖律の利用(BCJRアルゴリズムにとても良く似ている）後ろ向き計算フェーズヤコビ行列とベクトルの積を順次計算

(21)

本講演の概要

Deep Learning in 5 minutes

可微分プログラミング

研究事例紹介

(スパース信号再現アルゴリズム、MIMO検出、

量子化器設計)

(22)

LeCunは、入出力を伴う処理(NNとは限らない）

 

に対して、深層学習技術が適用可能であることを

 

示唆している

処理全体が微分可能であれば、内部に含まれる

 

パラメータをbackprop+SGDで最適化できる

 

→

可微分プログラミング

 

(diﬀerentiable programming )

可微分プログラミング

… important point is that people are now building a new

kind of software by assembling networks of parameterized

functional blocks and by training them from examples using some form of gradient-based optimization…

(23)

• ビリーフプロパゲーション

• ガウス・ザイデル法

• 共役勾配法

• 反復収縮法(IST)

• AMP

• ADMM

 

…

反復型アルゴリズムの信号フロー

プロセスA

プロセスB

プロセスC

入力出力

(24)

信号フローを時間方向に展開

プロ

セス

A

プロ

セス

 

B

プロ

セス

C

プロ

セス

A

プロ

セス

 

B

プロ

セス

C

各プロセスが微分可能であり、かつ、その導関数が

ほぼ至るところでゼロでなければ、

backprop可能

(25)

学習可能パラメータや学習可

能な関数を内包させる

プロ

セス

A

プロ

セス

 

B

プロ

セス

C

プロ

セス

A

プロ

セス

 

B

プロ

セス

C

学習可能パラメータ

学習可能な関数（NN)

(26)

プロ

セス

A

プロ

セス

 

B

プロ

セス

C

プロ

セス

A

プロ

セス

 

B

プロ

セス

C

損失

関数

{観測信号, 元信号}

推定アルゴリズムにおける学習フェーズ

(27)

End-to-End アプローチ

プロセスA

プロセスB

プロセスC

入力出力

深層ニューラル

ネットワーク

入力出力

(28)

従前のアルゴリズムを超える性能を持つ

 

アルゴリズムを構成できる可能性がある

 

(具体例は後ほど)

可微分プログラミングのメリット

データに基づく学習可能性を持つ柔軟な派生

 

アルゴリズムを構成できる(未知の状況に適応

 

できる）

End-to-endアプローチよりも大規模問題に

 

スケールしやすい

過去の知見・技術が活かせる

(29)

問題に関する先見的知見・数理的洞察に基づき演繹的に導かれるアルゴリズム構造データに基づく学習学習可能パラメータの導入可塑的な部分構造の導入

可微分プログラミングに基づく新しいアルゴリズムの設計

(30)

どのパラメータを学習可能パラメータとするか、

は重要な問題（最終的な性能や学習の速度・

 

安定性に大きく影響）

可微分プログラミングにおける

 

考えどころ

オンライン学習の設定なのか、オフライン学習の

設定なのか

(31)

本講演の概要

Deep Learning in 5 minutes

可微分プログラミング

研究事例紹介

(スパース信号再現アルゴリズム、MIMO検出、

量子化器設計)

(32)

符号化・復号アルゴリズム

Nachmani et al. 2016 [21] LDPC符号の復号アルゴリズムであるビリーフプロパゲーション(BP)復号法の性能改善のために深層学習技術を初めて利用した。Nachmaniらは、変数ノードメッセージに乗じる形の学習可能パラメータを導入し、ミニバッチ学習によりそれらの学習パラメータの最適化を行った。 Vasic et al. 2018 [29] BP復号法の変数ノード・チェックノード処理自体をNNを利用して学習 Gruber et al. 2017 [12] 復号器全体を深層ニューラルネットワークで実現するというアプローチを検討 Kim et al, 2018 

Deepcode: Feedback Codes via Deep Learning

(33)

変数ノード処理

ビリーフプロパゲーションに基づくLDPC符号の復号

β

_j→i

= λ

_j

_{+ ∑}

k∈B(j)\i

α

_k→j

β

α

β

α

_i→j

= 2 tanh

−1

∏

k∈A(i)\j

tanh (

1

2 β

k→i

)

チェックノード処理

λ

すべての処理が微分可能！

(34)

変数ノード処理

Nachmani et. alのアプローチ

β

_j→i

= λ

_j

_{+ ∑}

k∈B(j)\i

w

_k,j

α

_k→j

β

α

β

α

α_i→j = 2 tanh−1 ∏ k∈A(i)\jtanh ( 1 2 βk→i) チェックノード処理

λ

学習可能パラメータの導入

Nachmani et al. 2016 [21]

(35)

本講演の概要

Deep Learning in 5 minutes

可微分プログラミング

研究事例紹介

(スパース信号再現アルゴリズム、MIMO検出、

量子化器設計)

(36)

研究事例紹介

スパース信号再現アルゴリズム

過負荷MIMO 検出

LDPC符号に適した量子化器設計

本研究室の研究を紹介します

Ito, Takabe, W, Trainable ISTA for Sparse Signal Recovery , IEEE ICC Workshop, May, 2018

Imanishi, Takabe, W, Deep Learning-Aided Iterative Detector for Massive Overloaded MIMO Channels , submitted to IEEE GlobalSIP, 2018

W, Takabe, Joint Quantizer Optimization based on Neural Quantizer for Sum-Product Decoder , to appear, IEEE Globecom, 2018

(37)

圧縮センシングの問題設定

+

=

観測行列疎ベクトル

ノイズ

観測ベクトル

観測ベクトルを見て、疎ベクトルを可能な限り正確に

 

推定したい

「未知変数の数 > 方程式の本数」となる

劣決定系問題

A

x

w

y

N

M

(38)

II. SPARSE RECOVERY BY NEURAL NETWORKS A. Binary compressed sensing

The main problem for binary compressed sensing is to reconstruct an unknown sparse signal vector x ∈ Rn _{from the}

observation signal vector u ∈ {+1, −1}m _{under the condition}

that these signals satisfy the relationship:

u = sign(Ax). (1) The sign function sign(·) is defined by

sign(a) = !

−1 (a ≤ 0),

+1 (a > 0). (2) The matrix A ∈ Rm_×n _{is called a sensing matrix. We assume}

that the length of the observation signal vector u is smaller than the length of the sparse signal vector x, i.e., m < n. This problem setup is similar to that of the original compressed sensing. The notable difference between them is that the observation signal u is binarized in a sensing process of binary compressed sensing. A receiver obtains the observation signal u and then it tries to recover the corresponding hidden signal x. We here make two assumptions for the signal x and the sensing matrix A. The first assumption is sparsity of the hidden signal x. The original binary signal x ∈ {0, 1}n contains only k non-zero elements, where k is a positive integer much smaller than n, i.e., Hamming weight of x should be k. We call the set of binary vectors with Hamming weight k is k-sparse signals. The second assumption is that the receiver completely knows the sensing matrix A.

B. Network architecture

When we need an extremely high speed signal processing or an energy-efficient sparse signal processing method for battery powered sensor, it would be reasonable to develop a sparse signal recovery algorithm suitable for the situation. In the sparse signal recovery method based on neural networks to be described in this section requires only several matrix-vector products to obtain an output signal, which is an estimate signal of the sparse vector x. Thus, the proposed method needs smaller computational costs than those required by conventional iterative methods.

Our sparse recovery method is based on a 3-layer feedfor-ward neural network illustrated in Fig.??. This architecture is fairly common one; it consists of the input, hidden and output layers. Adjacent layers are connected by weighted edges and each layer includes neural units that can keep real values. As an activation function, we employed the sigmoid function to determine the values of the hidden and output layers. In our problem setting, the observation signal u is fed into the input layer from the left in Fig.??. The signal propagates from left to right and the output signal y eventually comes out from the output layer. The network should be trained so that the output signal y is an accurate estimation of the original sparse signal x. The precise description of the network in Fig.?? is given by the following equations:

h = σ(Whu + bh), (3) y = σ(Woh + bo), (4) ˆ x = round(y). (5) Input layer Hidden layer Output layer ・・・・・・・・・・・・・・ round

Fig. 1. Architecture of feedforward neural networks for sparse signal

recovery. An observation signal u comes from the left and is fed to the input layer. A sparse estimation vector comes out from the output layer. The sigmoid function is used as an activation function.

The function σ(·) is the sigmoid function defined by f(a) = 1/(1 + e−a). In this paper, we will follow a simple con-vention that f(a) represents (f(a1), f (a2), . . . , f (an)) where

a = (a1, . . . , an). The round function round(a)(a ∈ R) gives

the nearest integer from a. The equation (??) defines the signal transformation from the input layer to the hidden layer. An affine transformation is firstly applied to the input signal u _{∈ {+1, −1}}n and then the sigmoid function is applied. The weight matrix Wh ∈ Rm×α and the bias vector bh ∈ Rα

defines the affine transformation. The resulting signal h ∈ Rα

is kept in the units in the hidden layer, which are called hidden units. The parameter α thus means the number of hidden units. From the hidden layer to the output layer, the equation (??) governs the second signal transformation. The second transformation to yield the signal y consists of the affine transformation, based on the weight matrix Wo ∈ Rα×n

and the bias vector bo ∈ Rn, and the nonlinear mapping based

on the sigmoid function. The vector y ∈ Rn _{emitted from}

the output layer is finally rounded to a nearest integer vector because we assumed that non-zero elements in the original sparse signal x takes the value one. Since the range of the sigmoid function lies in the open interval (0, 1), an element in the estimate vector ˆx should take the value zero or one.

C. Training

The network in Fig.?? can be seen as a parametrized estimator ˆx = round(Φθ(u)) where θ is the set of the trainable

parameters θ = {Wh, Wo, bh, bo}. It is expected that the

trainable parameter θ should be adjusted in the training phase so as to minimize the error probability P rob[x ̸= ˆx]. However, it may be computationally intractable to minimize the error probability directly. Instated of direct minimization of the error probability itself, we will minimize a loss function including a cross entropy-like loss function and an L1-regularization

term. In this subsection, the details of the training process is described.

In the training phase of the network, the parameter θ should be updated in order to minimize the values of the given loss function. Let D = {(u1_{, x}1_{), . . . , (u}L_{, x}L₎_{} be the set}

of the training data used in the training phase. The signals ui and xi relate as ui = sign(Axi), i = 1, 2, . . . , L. In the

ニューラルネットに

 

基づくスパース信号再現

Ito, W, SITA2016

0 50 100 150 200 250 0.0 0.2 0.4 0.6 0.8 1.0

Index of signal component

0 50 100 150 200 250 0.0 0.2 0.4 0.6 0.8 1.0

Fig. 2. Sparse signal recovery for a 6-sparse vector. (top: the original sparse signal x, bottom: the output y = Φθ∗(x) from the trained neural network.

n = 256, m = 120) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 1 2 3 4 5 6 7 8 9 Recovery rate Learning steps (x104₎

Fig. 3. Learning steps and recovery rate. (n = 256, m = 140, k = 6)

the progress contains fluctuations. The recovery rate appears to be saturated around 5.0 × 104 _{steps. In following experiments,}

the number of learning steps is thus set to 5 × 104 _{based on}

this observation.

C. Sparse recovery by integer programming

We introduce integer programming (IP)-based sparse sig-nal recovery as a performance benchmark in the subsequent subsections because it provides the optimal recovery rate. The IP formulation shown here is based on the linear programming formulation in [?]. Although IP-based sparse signal recovery requires huge computer resources, it is applicable to moderate size problems if we employ a recent advanced IP solver. We used IBM CPLEX Optimizer for solving the IP problem shown below. The problem needed to solve is to find a feasible binary vector z = (z1, . . . , zn) ∈ {0, 1}n satisfying the following

0 0.2 0.4 0.6 0.8 1 0 50 100 150 200 250 Recovery rate

Length of observation vector m NN : k=4 NN : k=6 IP : k=4 IP : k=6

Fig. 4. Recovery rates of the proposed scheme (denoted by NN). (n = 256, k = 4, 6) As benchmarks, recovery rates of IP-based scheme is also included (denoted byIP).

conditions: n ! i=1 zi = k, (7) ∀i ∈ [1, m], n ! j=1 Ai,jzj > 0, if ui = +1, (8) ∀i ∈ [1, m], n ! j=1 Ai,jzj ≤ 0, if ui = −1, (9)

where Ai,j is the (i, j) element of the sensing matrix A

and ui is the i-th element of the observation signal u. If a

feasible solution z∗ _{satisfying all the above conditions exists,}

it becomes an estimate ˆx = z∗_{. It is clear that these conditions}

are consistent with our setting of binary compressed sensing. D. Experimental results

Fig.?? presents the recovery rates of the proposed scheme under the condition where n = 256, k = 4, 6. The neural network shown in Fig.?? was used. In Fig.??, it is seen that the recovery rate tend to increase as m increases for all k. The recovery rate is beyond 90% at m = 140 when the original signal is 4-sparse. It can be also observed that the recovery rate strongly depends on the sparseness parameter k. For example, the recovery rate of k = 6 comes around 70% at m = 140. The IP-based sparse recovery provides the recovery rate 99% at m ≥ 60 when the original signal is 6-sparse vector. On the other hand, our neural network yields the recovery rate more than 90% at m ≥ 240 when the original signal is 6-sparse. Although computation costs of the neural network in the recovery phase are much smaller than those required for IP-based sparse recovery, the performance gap appears rather huge and the neural-based reconstruction should be further improved.

IV. MAJORITY VOTINGNEURALNETWORKS

In the previous section, we saw that neural-based sparse signal recovery is successful under some parameter setting

End-to-Endアプローチ

(39)

既存手法: ISTA

ISITAはよく知られている反復型信号再現アルゴリズム

Brief review of ISTA

ISTA is a well-known sparse recovery algorithm defined by the

following simple recursion:

r

_t

= s

_t

+ βA

T

(y − As

_t

)

(1)

s

_t+1

= η(r

_t

; τ),

(2)

where β represents a step size and η is the soft thresholding

function.

!

_{The initial value is assumed to be s}

₀

_{= 0.}

!

_{Lasso formulation is given by}

1

2 ||

y − Ax||

2

2 + λ||x||

1 .

(3)

Since the proximal operator of L

₁

-regularization term ||x||

₁

is

the soft thresholding function, ISTA can be seen as a proximal

gradient descent algorithm

(40)

Brief review of ISTA

ISTA is a well-known sparse recovery algorithm defined by the

following simple recursion:

r

_t

= s

_t

+ βA

T

(y − As

_t

)

(1)

s

_t+1

= η(r

_t

; τ),

(2)

where β represents a step size and η is the soft thresholding

function.

!

_{The initial value is assumed to be s}

₀

_{= 0.}

!

_{Lasso formulation is given by}

1

2 ||

y − Ax||

2

2 + λ||x||

1 .

(3)

Since the proximal operator of L

₁

-regularization term ||x||

₁

is

the soft thresholding function, ISTA can be seen as a proximal

gradient descent algorithm

̂x = ||y − Ax||

2 ₂

+ λ||x||

₁

LASSO

近接勾配法

 

(proximal gradient descent)

勾配法ステップ

近接射影ステップ

(41)

LISTA

K. Gregor, and Y. LeCun,

``Learning fast approximations of sparse coding,''

Proc. 27th Int. Conf. Machine Learning}, pp. 399--406, 2010.

2.2. スパース信号復元に関する関連研究

9 定義

2.1 関数

η :

R !−→ R

が

E[η

′

_{(R)] = 0}

_(2.23)

を満たす時，

η

をダイバージェンスフリー関数と呼ぶ．

ダイバージェンスフリー関数は任意の関数

η

ˆ

を用いて

η

_DF

(r) = C

_{· (ˆη(r) − E}

_R

[ˆ

η

′

(R)]

_{· r)}

(2.24)

と構成することができる．ここで，パラメータ

C

は任意の係数である．例として，関

数

η

ˆ

を軟判定閾値関数

ˆ

η(r) = sign(r) max

_{{|r| − τ, 0}}

(2.25)

としたとき，ダイバージェンスフリー関数は

η

_DF

(r) = C

_·

!

ˆ

η(r)

₋

!

1 N

N

"

i=1

I(|r

i

| > τ)

#

· r

#

(2.26)

となる．ここで，関数

_I(|r

_i

_{| > τ)}

は指示関数

I(|r

i

| > τ) =

$

1 (if

_|r

_i

_{| > τ)}

0 (if

_|r

_i

_{| ≤ τ)}

(2.27)

である．

誤差分散

v

_t

2 , τ

_t

2 の推定式は

SE

より導出される

[18]

．推定式の導出については次章に

て詳細を述べる．

Ma

らは

IID

ガウシアン行列やユニタリ不変行列が観測行列の場合に

ついて，線形推定誤差と非線形推定誤差が統計的に直交することを用いてこの

SE

が信

頼できることを示した

[18]

．

2.2.4 LISTA

前述した

ISTA

，

AMP

，

OAMP

と異なるアプローチとして，

ISTA

の構造を持つニュー

ラルネットワークを用いたスパース信号復元法が提案された

[27]

．この手法は

Learned

ISTA

と呼ばれている．

LISTA

では漸化式

r

_t

= Bs

_t

+ Sy

(2.28)

s

_t+1

= η(r

_t

; τ

_t

)

(2.29)

を

T

回反復計算することにより推定信号

x = s

ˆ

_T

を求める．参考として，

LISTA

の第

t

層の概要図を図

2.2 に示す．

10 第2 章基礎的事項図 2.2: LAMP のt反復目の概要．学習可能パラメータは B, S, τt．st, y を受け取り st+1 を出力する． LISTAは学習可能パラメータΘ = _{{S, B, {τ}t}T_t=0−1}を持つ．学習可能パラメータの初期

値はISTAの設定に基づき，S = βAT, B = I_−SAと設定する．Gregorらは，ISTAにお

ける閾値τ を層依存の閾値τt にして，訓練データ {(y, x)}に基づき loss(Θ) = ||ˆx − x||

を最小化するように学習可能パラメータ S, B, τt を調整することを提案した．学習可

能パラメータの調整には，深層学習において用いられている学習法を用いている．これにより，LISTA は ISTA と比べて非常に高速な推定を可能とした．LISTA では前述

した全ての層で学習をした行列 S, B を共有するモデルの他に，全ての層で独立に行列 St, Bt の学習を行うモデルを考えることができる．全層で行列の共有を行う場合， B _{∈ R}N×N, S _{∈ R}N×M より，学習可能パラメータサイズは _{|Θ| = N}2 + N M + T となる．これに対して，全層で独立に行列の学習を行う場合には，前者より復元性能が少し上昇するものの，学習可能パラメータサイズが _{|Θ| = T (N}2 _{+ N M + 1)}となり，大幅に総数が増加することになる．ニューラルネットワークの学習可能パラメータは，学習後の振る舞いについての解析が一般的には困難である．しかしながら，LISTA は ISTA の構造を基にニューラルネットワークを構成しているため，それぞれの学習可能パラメータの振る舞いを ISTA を基に考察することができると考えられている．学習可能パラメータ τt に着目すると， ISTAにおいて τ は軟判定閾値関数に対する閾値として用いられており，τ はrt − x の標準偏差を用いることが良いとされている．このことから，LISTA では大量の訓練サンプルをもとに，統計的に rt − x の標準偏差の推定を行っていると考えることができる．また，ISTAではτ の値は全ての反復で同じ値を用いているが，LISTAでは層ごとに異なるτt を用いることで，推定信号 rt に合わせた τt を使用して収縮推定を行うことができる．以上の特徴より，LISTA では ISTA と比べて非常に高速な収束を可能としている．上述の手法以外にも，ISTAの構造を基にしたニューラルネットワークの構成について様々な研究がされている．例えば，ISTAのパラメータ調整についてではなく，ISTA における軟判定閾値関数を MSE 最適な関数に置き換える手法が提案されている [31]．この手法では，MSE最適な関数を近似した関数を収縮関数として使用することを考え学習可能  パラメータ  （行列）

可微分アプローチの先駆け

(42)

提案手法: TISTA

Ito, Takabe, W, Trainable ISTA for Sparse Signal Recovery , IEEE ICC Workshop, May, 2018

Definition of TISTA

TISTA

The recursive formula of TISTA are summarized as follows:

r

_t

= s

_t

+ γ

_t

W (y − As

_t

),

(4)

s

_t+1

= η

_MMSE

(r

_t

; τ

2_t

),

(5)

v

_t2

= max

!

||

_{y − As}

_t

||

2₂

_{− Mσ}

2

trace(A

T

A)

, ϵ

"

,

(6)

τ

2_t

=

v

2 t

N

(N + (γ

2 t

− 2γ

t

)M) +

γ

2_t

σ

2

N

trace(WW

T

_),

₍₇₎

!

γ

_t

: trainable variable (trained by mini-batch training)

(43)

提案手法: TISTA

Definition of TISTA

TISTA

The recursive formula of TISTA are summarized as follows:

r

_t

= s

_t

+ γ

_t

W (y − As

_t

),

(4)

s

_t+1

= η

_MMSE

(r

_t

; τ

2_t

),

(5)

v

_t2

= max

!

||

_{y − As}

_t

||

2₂

_{− Mσ}

2

trace(A

T

A)

, ϵ

"

,

(6)

τ

2_t

=

v

2 t

N

(N + (γ

2 t

− 2γ

t

)M) +

γ

2_t

σ

2

N

trace(WW

T

_),

₍₇₎

!

γ

_t

: trainable variable (trained by mini-batch training)

学習可能  パラメータ  （スカラー）

誤差分散の

推定式

MMSE Shrinkage function

Plots of ηMMSE as a function of a received signal y

α2= 1, σ2= 0.2, 0.8. -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3

output of MMSE estimator

received signal

こんな感じの関数→

近接勾配  ライクな処理

(44)

Block diagram of TISTA

The t-th iteration of the TISTA with learnable variable γ

_t

.

Process A Process B Process C Signal ﬂow

(a) Flow diagram of an iterative algorithm

Process A B C A B C

(b) Signal-ﬂow network and deep learning process Stochastic gradient descent Loss function Parameter updates Training data

TISTAの１ラウンド分のブロック図

損失

関数

y

x

学習可能  パラメータ  （スカラー）

(45)

は、ガウス雑音ベクトルである。圧縮センシングでは、与えら

れた観測ベクトル

y

から原信号

x

を可能な限り高い精度で再現

することを目標とする。圧縮センシングは、最近では広い工学

分野においてその応用研究が進められている。例えば、医療画

像処理分野

(

特に

MRI)

では、サンプリング回数削減を実現す

るための画像再現原理として注目されている。また無線通信分

野においてもランダムアクセスプロトコル、スペクトルセンシ

ング、通信路推定などの分野で応用研究が発表されている。

スパース信号再現アルゴリズムとして様々なアルゴリズムが

知られているが、ここではその一部について紹介するに留める。

ISTA [4] [5]

は、古くから知られる反復型アルゴリズムであり、

次の再帰式に基づく計算を行う。

r

t

= s

t

+ βA

T

(y

− As

t

)

(2)

s

t+1

= η(r

t

; τ ).

(3)

式

(2)

は原信号に対する線形推定式であり、式

(3)

は、軟しき

い値関数

(soft thresholding function)η(r

t

; τ )

に基づく非線形

推定を表す関数である。以上の反復式は、元問題を

Lasso

形式

ˆ

x = arg min

x 1₂

||y − Ax||

2₂

+ λ

||x||

1

に表現し、それを近接演算

子

(proximal operator)

を利用した勾配法

(proximal gradient

descent)

を書き下すと

ISTA

の反復式

(2)(3)

が自然に導かれる。

ISTA

は単純な反復計算に基づくアルゴリズムであり、その

計算量も少ない

(1

ラウンドの計算につき、

O(N

2

)

の計算量で

十分である

)

ため、大規模なスパース信号再現問題に対しても適

用が可能である。しかし、解への収束は一般に遅く、数百回か

ら数千回の反復が必要になることもある。また、ハイパーパラ

メータ

β

の設定も、かなりシビアであることが知られている。

観測行列がある種の条件を満たすときに、

ISTA

と比べて圧

倒的に高速な収束を見せるスパース信号再現アルゴリズムとし

て、近年では、

AMP [6]

がよく知られている。

AMP

は、

ISTA

に似た反復型の構造を持っている。また、最近では、

OAMP

な

ど

AMP

を改良したアルゴリズムもいくつか提案されている。

4. 2 TISTA

の詳細

ここでは、文献

[15]

に基づき、

TISTA

の詳細について述べ

る。

TISTA

の反復処理は、次の通りまとめられる。

r

t

= s

t

+ γ

t

W (y

− As

t

),

(4)

s

t+1

= η

M M SE

(r

t

; τ

t2

),

(5)

v

_t2

= max

!

||y − As

t

||

2₂

− Mσ

2

trace(A

T

_A)

, ϵ

"

,

(6)

τ

_t2

=

v

2 t

N

(N + (γ

2 t

− 2γ

t

)M ) +

γ

_t2

σ

2

N

trace(W W

T

). (7)

行列

W

はセンシング行列

A

の疑似逆行列である。式

(4)

と式

(5)

は、ほぼ

ISTA

の線形推定式・非線形推定式と同様である。

ただし、

TISTA

では、収縮関数として軟しきい値関数の代わ

りに

MMSE

収縮関数

η

M M SE

(y; σ

2

)

が利用されている。ここ

で、注目すべき点として、式

(4)

に登場する変数

γ

t

が

TISTA

における学習可能パラメータである。このパラメータは、学習

プロセスにおいて適切な値に調整されることになる。式

(4)

を

一種の勾配法の過程としてみるとき、

γ

t

はステップサイズを定

めるパラメータと見ることもできる。

ここで収縮関数である

η

M M SE

(y; σ

2

)

は

η

M M SE

(y; σ

2

) =

#

yα

2

ξ

$

pF (y; ξ)

(1

_{− p)F (y; σ}

2

_{) + pF (y; ξ)}

,

(8)

と定義される。ここで、

ξ = α

2

+ σ

2

であり、

F

は

F (z; v) =

1 √ 2πv

exp

%

−z2 2v

&

と定義される。この

MMSE

収縮関数は元信

号の事前確率分布

(

この場合、ベルヌーイ・ガウシアン事前分

布

)

により定まる。図

2 に

η

M M SE

(y; σ

2

)

の概形を示す。

この図から見て取れるようにパラメータ

σ

2

の値により、収

縮関数の概形が制御されることになる。

TISTA

の反復過程に

おいて、この

σ

2

は推定誤差分散

τ

_t2

と等しく置かれており、推

定誤差分散により適切な値が自動的に設定される。

TISTA

は

ISTA

をベースとして構成されたアルゴリズムで

あるが、学習パラメータの導入の仕方に大きな自由度があるこ

とを指摘しておきたい。

TISTA

の場合は

γ

t

として、スカラー

の学習可能パラメータを各反復処理ごとに導入している。一方、

既存の学習型スパース信号再生アルゴリズムである

LISTA [11]

や

LAMP [2]

では、線形推定式に現れる行列そのものを学習可

能パラメータとして置いている。このように深層学習に基づい

て派生アルゴリズムを作り出す際には、高い構造上の自由度が

ある。どのような形で学習可能パラメータを導入するか、また

は可塑性のあるニューラルネットワーク関数近似をどの部分に

導入するか、などの慎重な吟味・検討が性能の良い派生アルゴ

リズム構成のためには重要となる。

-3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3

output of MMSE estimator

received signal

図

2 TISTA

で利用される

MMSE

収縮関数

表

1

学習可能パラメータ数の比較

(T

ラウンドの処理を仮定

)

TISTA

LISTA

LAMP

# of params

T

T (N

2

+ M N + 1) T (N M + 2)

既存のアルゴリズムをベースとして、学習可能なアルゴリズ

ムを構成する際には、どの部分を学習可能にするかの選択に

よって最終的なアルゴリズム全体の性能、学習プロセスの難易

度・速度は変わってくる。表

1 に

TISTA, LISTA, LAMP

に

おいて利用される学習可能パラメータ数の比較を示す。

TISTA

は従前のアルゴリズムと比べて圧倒的に学習可能パラメータ数

— 6 —

学習パラメータ数の比較

から、サポートベクトルマシン

[1]

のように学習における大域

最適性の保証が可能な凸最適化に基づくアルゴリズムへの機械

学習研究者の興味の移り変わりがあったように見える。今は

ニューラルネットワーク研究の第

3 次ブームがやってきている

状況であるが、それは、非凸最適化の復権、そして発展の時代

を意味しているのかもしれない。

世間的には、深層学習に対して、現状の技術水準を超えた期

待がちらほら出始めている気もするが、技術者・研究者として

は、冷静に、その技術的な核は何かを見定めながら、深層学習

技術とつきあっていくとともに、その可能性を関連研究分野に

おいて探るのが生産的である。

謝

辞

本稿の研究の一部は科研費基盤

(B)16H02878

に基づく。名

古屋工業大学高邉賢史氏からは、本稿の内容を改善するために

有益なコメントを頂戴した。感謝の意を表する。

文

献

[1] C. M. Bishop, “Pattern recognition and machine learning, ” Springer, 2006.

[2] M. Borgerding and P. Schniter, “Onsager-corrected deep learning for sparse linear inverse problems,” 2016 IEEE Global Conf. Signal and Inf. Proc. (GlobalSIP), Washing-ton, DC, Dec. 2016, pp. 227-231.

[3] A. Caciularu, D. Burshtein “Blind channel equalization us-ing variational autoencoders, ” arXiv:1803.01526, 2018.

[4] A. Chambolle, R. A. DeVore, N. Lee, and B. J. Lucier, “Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage,” IEEE Trans. Image Process., vol. 7, no. 3, pp. 319–335, Mar, 1998.

[5] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Comm. Pure and Appl. Math., col. 57, no. 11, pp. 1413-1457, Nov. 2004.

[6] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proceedings of the National Academy of Sciences, vol. 106, no. 45, pp. 18914–18919, Nov. 2009.

[7] N. Farsad, and A. Goldsmith, “Neural network detec-tion of data sequences in communicadetec-tion systems, ” arXiv:1802.02046, 2018.

[8] K. Fukushima, “Neocognitron: A self-organizing neural net-work model for a mechanism of pattern recognition unaf-fected by shift in position,” Bio. Cybern., vol. 36, no. 4, pp. 193-202, 1980.

[9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio,“ Gen-erative adversarial nets,”in Advances in neural information processing systems, pp. 26722680, 2014.

[10] I. Goodfellow, Y. Bengio, A. Courville “Deep learning, ” The MIT Press, 2016.

[11] K. Gregor, and Y. LeCun, “Learning fast approximations of sparse coding,” Proc. 27th Int. Conf. Machine Learning, pp. 399–406, 2010.

[12] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, “On deep learning-based channel decoding, ” arXiv:1701.07738, 2017.

[13] G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, “Deep neural networks for acoustic modeling in speech recognition: The Shared Views of Four Research

Groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, Nov. 2012.

[14] E. Hoﬀer, I. Hubara, D. Soudry, “Train longer, generalize better: closing the generalization gap in large batch training of neural networks,” arXiv:1705.08741, 2017.

[15] D. Ito, S. Takabe, T. Wadayama, “Trainable ISTA for sparse signal recovery, ” IEEE International Conference on Com-munications (ICC) workshop, Promises and Challenges of Machine Learning in Communication, Kansas City, 2018. (arXiv:1801.01978)

[16] N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang, “On large-batch training for deep learning: gen-eralization gap and sharp minima, ” in Proceedings of ICLR 2017.

[17] D. P. Kingma and J. L. Ba, “Adam: A method for stochas-tic optimization,” arXiv:1412.6980, 2014.

[18] D. P. Kingma, M. Welling, “Auto-encoding variational Bayes, ” arXiv:1312.6114v10, 2013.

[19] A. Krizhevsky, I. Sutskever, G. E. Hinton, “Imagenet classi-fication with deep convolutional neural networks.” Advances in Neural Inf. Proc. Sys. 2012, pp. 1097-1105, 2012.

[20] Y. A. LeCun, L. Bottou, G. B. Orr, and K. R. M¨uller, “Ef-ficient backprop,” in Neural Networks: Tricks of the Trade, G. B. Orr and K. R. M¨uller, Eds. Springer-Verlag, London, UK, 1998, pp. 9-50.

[21] E. Nachmani, Y. Be´ery and D. Burshtein, “Learning to de-code linear de-codes using deep learning,” 2016 54th Annual Allerton Conf. Comm., Control, and Computing, 2016, pp. 341-346.

[22] E. Nachmani, E. Marciano, D. Burshtein and Y. Be’ery, “RNN decoding of linear block codes, ” arXiv:1702.07560, 2017.

[23] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu, “WaveNet: a generative model for raw au-dio,” arXiv:1609.03499v2, 2016.

[24] T. O’Shea and J. Hoydis, “An introduction to deep learn-ing for the physical layer,” IEEE Transactions on Cognitive Communications and Networking, vol. PP, no. 99, pp. 1-1. [25] J. G. Proakis, “Digital communications (2nd ed.),”

McGraw-Hill Book, 1989.

[26] T. Richardson and R. Urbanke, “Modern coding theory,” Cambridge University Press, 2008.

[27] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learn-ing representations by back-propagat“Learn-ing errors,” Nature, vol. 323, no. 6088, pp. 533?536, 1986.

[28] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Royal Stat. Society, Series B, vol. 58, pp. 267–288, 1996.

[29] B. Vasic, X. Xiao and S. Lin, “Learning to decode LDPC codes with finite-alphabet message passing, ” http://ita. ucsd.edu/workshop/18/files/paper/paper_273.pdf

[30] T. Wadayama and S. Takabe, “Joint quantizer optimiza-tion based on neural quantizer for sum-product decoder, ” arXiv:1804.06002, 2018.

[31] W. Xu, Z. Wu, Y. L. Ueng, X. You, and C. Zhang, “ Im-proved polar decoder based on deep learning,”in IEEE In-ternational Workshop on Signal Processing Systems (SiPS), Oct 2017, pp. 16, 2017.

[32] 岡谷貴之, “深層学習, ” 講談社, 2015.

[33] 斎藤康毅, “ゼロから作る Deep Learning ―Python で学ぶ

ディープラーニングの理論と実装, ” オライリー・ジャパン [34] フランシス・ショレ (著), 巣籠悠輔 (訳), “Python と Keras によるディープラーニング,” 株式会社クイープ, 2018. [35] “ディープラーニングに入門するためのリソース集と学習法（2018 年版）,” https://qiita.com/keitakurita/items/ df3a07135c9cfad810c7, Qiita, 2018

— 8 —

(46)

Comparisons in NMSE

NMSE of TISTA, LISTA and AMP;

A

_i,j

∼

_{N (0, 1/M), N = 500, M = 250, SNR = 40dB.}

The condition A

_i,j

∼

_{N (0, 1/M) is required for AMP to converge.}

-45 -40 -35 -30 -25 -20 -15 -10 -5 0 2 4 6 8 10 12 14 16 NMSE [dB] iteration TISTA LISTA AMP

Comparisons in NMSE

NMSE of TISTA, LISTA and AMP;

A

_i,j

∼

_{N (0, 1/M), N = 500, M = 250, SNR = 40dB.}

The condition A

_i,j

∼

_{N (0, 1/M) is required for AMP to converge.}

-45 -40 -35 -30 -25 -20 -15 -10 -5 0 2 4 6 8 10 12 14 16 NMSE [dB] iteration TISTA LISTA AMP

TISTAの復元性能 (10%が非ゼロ元となる状況)

(47)

Values of γ

_t

Three sequences of learned parameters γ

_t

;

A

_i,j

∼

_{N (0, 1/M), N = 500, M = 250, p = 0.1, SNR = 40dB.}

1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 0 2 4 6 8 10 value iteration TISTA1 TISTA2 TISTA3

学習の結果(学習可能パラメータの値)

Values of γ

_t

Three sequences of learned parameters γ

_t

;

A

_i,j

∼

_{N (0, 1/M), N = 500, M = 250, p = 0.1, SNR = 40dB.}

(48)

Values of γ

_t

Three sequences of learned parameters γ

_t

;

A

_i,j

∼

_{N (0, 1/M), N = 500, M = 250, p = 0.1, SNR = 40dB.}

学習の結果(学習可能パラメータの値)

Values of γ

_t

Three sequences of learned parameters γ

_t

;

A

_i,j

∼

_{N (0, 1/M), N = 500, M = 250, p = 0.1, SNR = 40dB.}

1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 0 2 4 6 8 10 value iteration TISTA1 TISTA2 TISTA3 なぜこの形が有利なのか（収束が速くなるのか）  。。。

(49)

Binary sensing matrix: TISTA,LISTA

N = 500, M = 250, p = 0.1, Ai,j ∈ {−1, 1}, SNR = 40 dB -45 -40 -35 -30 -25 -20 -15 -10 -5 2 4 6 8 10 12 14 16 NMSE [dB] iteration TISTA LISTA

観測行列が２値(+1, -1)の場合

Binary sensing matrix: TISTA,LISTA

N = 500, M = 250, p = 0.1, A

_i,j

_{∈ {−1, 1}, SNR = 40 dB}

-45 -40 -35 -30 -25 -20 -15 -10 -5 2 4 6 8 10 12 14 16 NMSE [dB] iteration TISTA LISTA

it-ken_open.key

深層学習技術と

信号処理・通信系アルゴリズム

̶概観と展望̶

名古屋工業大学

和田山 正

深層学習技術の進展

•

画像認識

•

音声認識

•

自然言語処理

•

機械翻訳

7 Discussion

深層学習の「深化と広がり」

深層学習の「深化と広がり」

本日のテーマ：

データ駆動型アルゴリズム設計

深層学習技術

人工知能実現のための技術

本日のテーマ：

データ駆動型アルゴリズム設計

深層学習技術

人工知能実現のための技術

通信系アルゴリズムとの自然な結びつき

学習可能な確率モデル

多段関数の非凸最適化

データ駆動型

アルゴリズム設計

本講演の概要

Deep Learning in 5 minutes

可微分プログラミング

関連研究の動向

研究事例紹介

(スパース信号再現アルゴリズム、MIMO検出、

量子化器設計)

本講演の概要

Deep Learning in 5 minutes

可微分プログラミング

関連研究の動向

研究事例紹介

(スパース信号再現アルゴリズム、MIMO検出、

量子化器設計)

こんな関数を作りたい！

f : ℝ

n

→ ℝ

0: 猫

{ , 1}

f

Θ

: ℝ

n

→ ℝ

Θ

関数の形状を制御するパラメータを導入する

訓練データを準備する

{ , 1} { , 1}

{ , 0} { , 0} { , 0}

パラメータを更新する(訓練・学習）

f

Θ

: ℝ

n

→ ℝ

{ , 1}

{ , 1}

{ , 0}

̂y

y

深層ネットワークモデル

Wh + b

線形層 活性化関数

g

f

Θ

: ℝ

n

和田山正

_Θ

_Θ

線形層活性化関数

_Θ

_{̂y = f}

_{∈ R}

₀