
Chapter 3: Handwriting Feature Extraction with a Conditional AutoEncoder

3.3 Evaluation Experiments

3.3.3 Models Compared with the Proposed Method

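To verify whether the way the proposed method conditions on character-class labels is effective for extracting handwriting features that do not depend on the character class, we compare it with two models that differ in how the character-class label is supplied to the AE. In addition, by comparing a VAE, which is trained with a constraint on the latent space, against the proposed AE-based method, we examine how structuring the latent space affects handwriting feature extraction.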

3.3.3.1 Effectiveness of Supplying Character-Class Labels to the AutoEncoder

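To verify whether supplying character-class information to both the Encoder and the Decoder, as in AE_2, is effective for extracting handwriting features that do not depend on the character class, we compare AE_2 with an ordinary AutoEncoder that receives no character-class information and with a model that supplies character-class information only to the Decoder.

[Figure 3.7: Preprocessing of hiragana data. Recoverable labels: noise removal; cropping inside the frame; extracted image (8 bit), 140 × 140; normalized image (1 bit), 64 × 64; Created Frame; ROI Extraction; Binarization; Resized.]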

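The network structure of the AutoEncoder without character-class labels that was built for comparison with AE_2 (hereafter AE_0) is shown in Figure 3.8 and Table 3.2. Likewise, the network structure of the AutoEncoder that supplies the character-class label only to the Decoder, also built for comparison with AE_2 (hereafter AE_1), is shown in Figure 3.9 and Table 3.3. As with AE_2, the error function used for backpropagation in AE_0 and AE_1 is the reconstruction error D_RE, computed as the mean absolute error between the input image and the output image.

[Figure 3.8: Network structure of AE_0. Recoverable elements: Input, Encoder, Latent, Decoder, Output, and the Reconstruction Error between Input and Output. Legend: Convolution+ReLU, Max Pooling, Dense+ReLU+L2 Norm, Dense, Batch Normalization, Upsampling, Convolution, Sigmoid.]

Table 3.2: Details of each layer of AE_0

| Model | Layer name | Layer type | Output shape (w, h, channel) or unit size | Parameters |
|---|---|---|---|---|
| Encoder | Input | Input | width, height, 1 | |
| Encoder | Conv1 | Convolution | width, height, 4 | Activation=ReLU, Kernel=3, Stride=1, Padding=1 |
| Encoder | MP1 | MaxPooling | width/2, height/2, 4 | Kernel=2, Stride=2 |
| Encoder | Conv2 | Convolution | width/2, height/2, 16 | Activation=ReLU, Kernel=3, Stride=1, Padding=1 |
| Encoder | MP2 | MaxPooling | width/4, height/4, 16 | Kernel=2, Stride=2 |
| Encoder | Flatten | Flatten | (width/4)×(height/4)×16 | |
| Encoder | Enc-D1 | Dense | 500 | Activation=ReLU, L2 regularization (λ=0.01) |
| Encoder | Enc-D2 | Dense | 500 | Activation=ReLU, L2 regularization (λ=0.01) |
| Encoder | Enc-D2-BN | BatchNormalization | | |
| Encoder | Latent | Dense | z dim | |
| Decoder | Dec-D1 | Dense | 500 | Activation=ReLU, L2 regularization (λ=0.01) |
| Decoder | Dec-D2 | Dense | (width/4)×(height/4)×16 | Activation=ReLU, L2 regularization (λ=0.01) |
| Decoder | Dec-D2-BN | BatchNormalization | | |
| Decoder | Reshape | Reshape | width/4, height/4, 16 | |
| Decoder | US1 | UpSampling | width/2, height/2, 16 | |
| Decoder | DeConv1 | DeConvolution | width/2, height/2, 4 | Activation=ReLU, Kernel=3, Stride=1, Padding=1 |
| Decoder | US2 | UpSampling | width, height, 4 | |
| Decoder | DeConv2 | DeConvolution | width, height, 1 | Kernel=3, Stride=1, Padding=1 |
| Decoder | Dec-DeConv2-BN | BatchNormalization | | |
| Decoder | Output | Output | width, height, 1 | Activation=Sigmoid |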
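The reconstruction error D_RE described above can be illustrated with a minimal sketch. It assumes images are given as nested lists of pixel values in [0, 1] and that the mean is taken over all pixels; the exact form of Equation (3.1) is not reproduced in this section, so the function name `mean_absolute_error` and the sample values are hypothetical.

```python
def mean_absolute_error(input_image, output_image):
    """Mean absolute pixel difference between two equally sized 2-D images."""
    if len(input_image) != len(output_image):
        raise ValueError("images must have the same height")
    total = 0.0
    count = 0
    for row_in, row_out in zip(input_image, output_image):
        if len(row_in) != len(row_out):
            raise ValueError("images must have the same width")
        for p_in, p_out in zip(row_in, row_out):
            total += abs(p_in - p_out)
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


# Hypothetical 2 x 2 binarized input and a sigmoid-valued reconstruction.
original = [[0.0, 1.0], [1.0, 0.0]]
reconstruction = [[0.1, 0.9], [0.8, 0.0]]
d_re = mean_absolute_error(original, reconstruction)  # approximately 0.1
```

Because the output layer uses a sigmoid, both images lie in [0, 1], so under this assumption D_RE also stays in [0, 1].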

[Figure 3.9: Network structure of AE_1. Recoverable elements: Input, Encoder, Latent, Label, Concatenation, Decoder, Output, and the Reconstruction Error between Input and Output. Legend: Convolution+ReLU, Max Pooling, Dense+ReLU+L2 Norm, Dense, Batch Normalization, Upsampling, Convolution, Sigmoid.]
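The Concatenation step that feeds the Label into the Decoder of AE_1 might look like the following sketch. It assumes the label is one-hot encoded with n entries, which the source does not state explicitly (it only gives the label input size n); the helper names `one_hot` and `merge_latent_and_label` and the sample values are hypothetical.

```python
def one_hot(label_index, n_classes):
    """Return a one-hot list of length n_classes (assumed label encoding)."""
    if not 0 <= label_index < n_classes:
        raise ValueError("label_index out of range")
    return [1.0 if i == label_index else 0.0 for i in range(n_classes)]


def merge_latent_and_label(z, label_index, n_classes):
    """Concatenate the latent vector z with the label, giving z_dim + n values."""
    return list(z) + one_hot(label_index, n_classes)


# Hypothetical latent vector (z_dim = 3) and label 1 out of n = 4 classes.
z = [0.3, -1.2, 0.7]
merged = merge_latent_and_label(z, label_index=1, n_classes=4)
```

Since the character class is supplied to the Decoder separately, the latent vector itself does not need to carry it in order to reconstruct the image, which is the motivation for conditioning given in this section.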

Table 3.3: Details of each layer of AE_1

| Model | Layer name | Layer type | Output shape (w, h, channel) or unit size | Parameters |
|---|---|---|---|---|
| Encoder | Input | Input | width, height, 1 | |
| Encoder | Conv1 | Convolution | width, height, 4 | Activation=ReLU, Kernel=3, Stride=1, Padding=1 |
| Encoder | MP1 | MaxPooling | width/2, height/2, 4 | Kernel=2, Stride=2 |
| Encoder | Conv2 | Convolution | width/2, height/2, 16 | Activation=ReLU, Kernel=3, Stride=1, Padding=1 |
| Encoder | MP2 | MaxPooling | width/4, height/4, 16 | Kernel=2, Stride=2 |
| Encoder | Flatten | Flatten | (width/4)×(height/4)×16 | |
| Encoder | Enc-D1 | Dense | 500 | Activation=ReLU, L2 regularization (λ=0.01) |
| Encoder | Enc-D2 | Dense | 500 | Activation=ReLU, L2 regularization (λ=0.01) |
| Encoder | Enc-D2-BN | BatchNormalization | | |
| Encoder | Latent | Dense | z dim | |
| Decoder | Label | Input | n | |
| Decoder | Dec-Merge | Merge | z dim+n | Concatenation |
| Decoder | Dec-D1 | Dense | 500 | Activation=ReLU, L2 regularization (λ=0.01) |
| Decoder | Dec-D2 | Dense | (width/4)×(height/4)×16 | Activation=ReLU, L2 regularization (λ=0.01) |
| Decoder | Dec-D2-BN | BatchNormalization | | |
| Decoder | Reshape | Reshape | width/4, height/4, 16 | |
| Decoder | US1 | UpSampling | width/2, height/2, 16 | |
| Decoder | DeConv1 | DeConvolution | width/2, height/2, 4 | Activation=ReLU, Kernel=3, Stride=1, Padding=1 |
| Decoder | US2 | UpSampling | width, height, 4 | |
| Decoder | DeConv2 | DeConvolution | width, height, 1 | Kernel=3, Stride=1, Padding=1 |
| Decoder | Dec-DeConv2-BN | BatchNormalization | | |
| Decoder | Output | Output | width, height, 1 | Activation=Sigmoid |
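To make the shape columns of Tables 3.2 and 3.3 concrete, the following sketch traces the output shape of each layer for a given input size. It is only a bookkeeping aid, not the authors' implementation: the function names, the value z_dim = 16, and the label size n = 10 are hypothetical, and UpSampling is assumed to double each spatial axis, as the tabulated shapes suggest. BatchNormalization layers are omitted because they do not change the shape.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output length of max pooling along one spatial axis."""
    return (size - kernel) // stride + 1


def trace_shapes(width, height, z_dim, n_labels=0, label_in_decoder=False):
    """Return {layer name: output shape} in the layer order of Tables 3.2 and 3.3."""
    shapes = {"Input": (width, height, 1)}
    w, h = conv_out(width), conv_out(height)
    shapes["Conv1"] = (w, h, 4)
    w, h = pool_out(w), pool_out(h)
    shapes["MP1"] = (w, h, 4)
    w, h = conv_out(w), conv_out(h)
    shapes["Conv2"] = (w, h, 16)
    w, h = pool_out(w), pool_out(h)
    shapes["MP2"] = (w, h, 16)
    flat = w * h * 16
    shapes["Flatten"] = (flat,)
    shapes["Enc-D1"] = (500,)
    shapes["Enc-D2"] = (500,)
    shapes["Latent"] = (z_dim,)
    if label_in_decoder:
        # AE_1 only: the label is concatenated to the latent vector.
        shapes["Dec-Merge"] = (z_dim + n_labels,)
    shapes["Dec-D1"] = (500,)
    shapes["Dec-D2"] = (flat,)
    shapes["Reshape"] = (w, h, 16)
    w, h = 2 * w, 2 * h  # assumption: UpSampling doubles each axis
    shapes["US1"] = (w, h, 16)
    shapes["DeConv1"] = (conv_out(w), conv_out(h), 4)
    w, h = 2 * w, 2 * h
    shapes["US2"] = (w, h, 4)
    shapes["DeConv2"] = (conv_out(w), conv_out(h), 1)
    return shapes


# 64 x 64 is the normalized image size given in Figure 3.7.
ae0_shapes = trace_shapes(64, 64, z_dim=16)
ae1_shapes = trace_shapes(64, 64, z_dim=16, n_labels=10, label_in_decoder=True)
```

With Kernel=3, Stride=1, Padding=1 the convolutions preserve the spatial size, so only the pooling and upsampling layers change width and height. The only structural difference between the two traces is the Dec-Merge entry of AE_1.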

3.3.3.2 Effect of Structuring the Latent Space on Handwriting Feature Extraction

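Like the AE, the VAE is a method that can represent the latent features of an image. Because the AE is trained without any constraint on the latent space, it does not aim to give the resulting latent space any particular structure. The VAE, by contrast, assumes that images are generated by some statistical process and obtains the latent space by taking that generative process into account. In this experiment, following [69], z is assumed to follow a multivariate standard normal distribution. For comparison with the proposed AE_2, the model parameters are obtained by training a model with the VAE structure shown in Figure 3.10 and Table 3.4 (hereafter VAE_2).

[Figure 3.10: Network structure of VAE_2. Recoverable elements: Input, Label, Encoder, Mean, Sigma, "Sample ε from N(0, I)" with ＊ and ＋ operators, Latent z, D_KL = KL[N(µ, σ) || N(0, I)], Decoder, Output, and the Reconstruction Error between Input and Output. Legend: Convolution+ReLU, Max Pooling, Concatenation, Dense+ReLU+L2 Norm, Dense, Batch Normalization, Upsampling, Convolution, Sigmoid.]

Table 3.4: Details of each layer of VAE_2

| Model | Layer name | Layer type | Output shape (w, h, channel) or unit size | Parameters |
|---|---|---|---|---|
| Encoder | Input | Input | width, height, 1 | |
| Encoder | Conv1 | Convolution | width, height, 4 | Activation=ReLU, Kernel=3, Stride=1, Padding=1 |
| Encoder | MP1 | MaxPooling | width/2, height/2, 4 | Kernel=2, Stride=2 |
| Encoder | Conv2 | Convolution | width/2, height/2, 16 | Activation=ReLU, Kernel=3, Stride=1, Padding=1 |
| Encoder | MP2 | MaxPooling | width/4, height/4, 16 | Kernel=2, Stride=2 |
| Encoder | Flatten | Flatten | (width/4)×(height/4)×16 | |
| Encoder | Label | Input | n | |
| Encoder | Enc-Merge | Merge | (width/4)×(height/4)×16+n | Concatenation |
| Encoder | Enc-D1 | Dense | 500 | Activation=ReLU, L2 regularization (λ=0.01) |
| Encoder | Enc-D2 | Dense | 500 | Activation=ReLU, L2 regularization (λ=0.01) |
| Encoder | Enc-D2-BN | BatchNormalization | | |
| Encoder | Mean | Dense | z dim | |
| Encoder | Sigma | Dense | z dim | |
| Encoder | Latent | Dense | z dim | |
| Decoder | Dec-Merge | Merge | z dim+n | Concatenation |
| Decoder | Dec-D1 | Dense | 500 | Activation=ReLU, L2 regularization (λ=0.01) |
| Decoder | Dec-D2 | Dense | (width/4)×(height/4)×16 | Activation=ReLU, L2 regularization (λ=0.01) |
| Decoder | Dec-D2-BN | BatchNormalization | | |
| Decoder | Reshape | Reshape | width/4, height/4, 16 | |
| Decoder | US1 | UpSampling | width/2, height/2, 16 | |
| Decoder | DeConv1 | DeConvolution | width/2, height/2, 4 | Activation=ReLU, Kernel=3, Stride=1, Padding=1 |
| Decoder | US2 | UpSampling | width, height, 4 | |
| Decoder | DeConv2 | DeConvolution | width, height, 1 | Kernel=3, Stride=1, Padding=1 |
| Decoder | Dec-DeConv2-BN | BatchNormalization | | |
| Decoder | Output | Output | width, height, 1 | Activation=Sigmoid |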
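The sampling step indicated in Figure 3.10 (Mean, Sigma, "Sample ε from N(0, I)", ＊ and ＋) is commonly implemented as z = µ + σ·ε. The sketch below assumes that form and assumes the Sigma layer outputs the standard deviation directly rather than a log-variance; neither detail is confirmed by the source. The function name `sample_latent` and the sample values are hypothetical.

```python
import random


def sample_latent(mean, sigma, rng):
    """Draw z = mean + sigma * eps element-wise, with eps ~ N(0, 1)."""
    if len(mean) != len(sigma):
        raise ValueError("mean and sigma must have the same length")
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, sigma)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical Encoder outputs for z_dim = 2.
z = sample_latent([0.5, -0.5], [1.0, 0.1], rng)

# With sigma = 0 the sample collapses onto the mean.
z_deterministic = sample_latent([0.5, -0.5], [0.0, 0.0], rng)
```

Writing the sample as a deterministic function of µ, σ and an independent ε keeps the randomness outside the network, so gradients can flow back through Mean and Sigma during backpropagation.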

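The error function L_VAE_2(I, O) used for backpropagation when training VAE_2 is given in Equation (3.2). L_VAE_2 consists of two terms: the reconstruction error D_RE and the Kullback-Leibler divergence D_KL, which measures how close the distribution of z is to the multivariate standard normal distribution. D_KL is computed by Equation (3.3). D_RE is computed as the mean absolute error of Equation (3.1), in the same way as the reconstruction error used in the error function L_CAE.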

\[
L_{VAE\_2}(I, O) = D_{RE}(I, O) + D_{KL} \tag{3.2}
\]

\[
D_{KL} = -\frac{1}{2} \sum_{k=1}^{z\,\mathrm{dim}} \left( 1 + \log\left(\sigma_k^{2}\right) - \mu_k^{2} - \sigma_k^{2} \right) \tag{3.3}
\]
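As a numerical check of Equations (3.2) and (3.3), the following sketch evaluates D_KL and the total loss for small hand-picked vectors. It assumes σ is given as a positive standard deviation per latent dimension and uses the natural logarithm; the function names and the input values are hypothetical.

```python
import math


def kl_divergence(mean, sigma):
    """D_KL of Equation (3.3); every sigma value must be positive."""
    if len(mean) != len(sigma):
        raise ValueError("mean and sigma must have the same length")
    return -0.5 * sum(
        1.0 + math.log(s * s) - m * m - s * s for m, s in zip(mean, sigma)
    )


def vae_loss(d_re, mean, sigma):
    """L_VAE_2 of Equation (3.2): reconstruction error plus D_KL."""
    return d_re + kl_divergence(mean, sigma)


# At mu = 0, sigma = 1 the distribution equals the prior, so D_KL is zero.
kl_at_prior = kl_divergence([0.0, 0.0], [1.0, 1.0])

# Shifting one mean to 1 gives -0.5 * (1 + 0 - 1 - 1) = 0.5.
kl_shifted = kl_divergence([1.0], [1.0])

# Total loss with a hypothetical reconstruction error of 0.1.
total = vae_loss(0.1, [1.0], [1.0])
```

The zero at µ = 0, σ = 1 shows that D_KL penalizes only deviations of the encoded distribution from the multivariate standard normal distribution assumed for z.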

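Excerpt from: A Study on Character-Class-Independent Offline Writer Verification Based on Feature Representations Using Deep Learning (深層学習を用いた特徴表現に基づく字種非依存型オフライン筆者照合に関する研究), pages 37-41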