
and the true output activations V(n−1) in series-parallel mode. Using an output activation from the previous step is rare (it makes the output layer recurrent), but it is essential for the naturalness learning framework [35]. Table 5.7 shows the results for training and testing. Compared to all the previous trials, the mse_x,y for training was lower by an order of magnitude, and C_x,y reached an almost perfect correlation. The network uses its own history of outputs to compute the output at the next step. When training was completed, the network was tested in parallel mode to see whether it could generate correct outputs on its own. As expected, the performance decreased. However, the drop in performance was not as severe as in the FFNNHV or FFNNHUV cases.
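To make the distinction between the two modes concrete, the following minimal numpy sketch (an illustration, not the code used in the experiments) runs the same update loop in either mode; it assumes tanh transfer functions, zero initial activations, and the ESN update equations given later in Eqs. (6.2) and (6.3). In series-parallel mode the true previous output is fed back; in parallel mode the network's own previous estimate is used instead.

```python
import numpy as np

def run_esn(u_seq, y_teacher, W_in, W, W_back, W_out, teacher_forcing=True):
    """Run an ESN over an input sequence.

    teacher_forcing=True  ~ series-parallel mode: the true previous output
                            y_teacher[n-1] is fed back into the network.
    teacher_forcing=False ~ parallel mode: the network's own previous
                            estimate is fed back instead.
    """
    N, L = W.shape[0], W_out.shape[0]
    x = np.zeros(N)          # internal (reservoir) activations
    y_prev = np.zeros(L)     # previous output activations
    outputs = []
    for n in range(len(u_seq)):
        fb = y_teacher[n - 1] if (teacher_forcing and n > 0) else y_prev
        x = np.tanh(W_in @ u_seq[n] + W @ x + W_back @ fb)
        # recurrent output layer: the fed-back output is part of the regressor
        y_prev = np.tanh(W_out @ np.concatenate([u_seq[n], x, fb]))
        outputs.append(y_prev)
    return np.array(outputs)
```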

Fig. 5.20 plots the teacher signals vs. the signal estimated by the ESN model in the testing trial for writer wr = 1. Fig. 5.21 plots the residuals.

The performance of the ESN model in the test trial is visualized in Fig. 5.22.

Plots for other writers are available in appendix A.7. As we can see, the network could regenerate all letters from the training set on its own, with a substantial increase in visual quality.

Table 5.7: ESN: Mean square errors and correlation coefficients for the x and y components of naturalness.

train
wr    mse_x        mse_y        C_x       C_y
#1    7.47×10^−4   4.18×10^−4   98.08%    98.36%
#2    6.28×10^−4   4.27×10^−4   98.27%    98.03%
#3    1.20×10^−3   1.16×10^−3   98.38%    97.64%
#4    6.28×10^−4   5.62×10^−4   98.63%    98.30%
#5    5.90×10^−4   6.38×10^−4   98.48%    98.10%

test
wr    mse_x        mse_y        C_x       C_y
#1    8.64×10^−3   7.44×10^−3   75.15%    65.86%
#2    9.06×10^−3   9.68×10^−3   71.12%    46.71%
#3    1.65×10^−2   1.41×10^−2   75.19%    65.95%
#4    1.38×10^−2   1.10×10^−2   66.87%    61.79%
#5    8.85×10^−3   1.18×10^−2   75.05%    58.96%

It is often the case that the same letter written by the same writer several times takes on a slightly different shape each time. It is therefore appropriate to judge the performance visually rather than numerically7. Fig. 5.23 visualizes the modeling trial for writer wr = 1 and shows that the ESN could generate untrained letters fairly well. Plots for other writers are available in appendix A.8.

Close inspection of Fig. 5.23 reveals that parts of strokes with a substantial change of shape are accompanied by slight zig-zag distortions. This can most likely be attributed to the inability of the network to settle down quickly after a rapid change in the input signals, which occurs in places where the stroke shape changes abruptly.

This problem can be overcome by simple smoothing, as shown in Fig. 5.24, resulting in production-quality letters (see appendix A.8 for smoothed plots of other writers).
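The text does not state which smoothing filter was used; a centered moving average is one simple choice that fits the description. The sketch below (the window length and function name are hypothetical) smooths one generated trajectory component at a time:

```python
import numpy as np

def smooth(signal, window=5):
    """Centered moving-average smoothing of a 1-D trajectory component.

    The window length is a hypothetical choice; the text only says that
    "simple smoothing" was applied, without naming the exact filter.
    """
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

# e.g. smooth the generated x and y naturalness components separately:
# x_smooth, y_smooth = smooth(x_generated), smooth(y_generated)
```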

7 Further details are given in Chapter 7.

Figure 5.20: ESN: Teacher signals vs. model-estimated signals for the x and y components of naturalness for writer wr = 1 (horizontal axes: update steps, n = 1, ..., 2704).

Fig. 5.25 demonstrates how the ESN can transform a font into a handwritten form. The results obtained indicate that modeling naturalness is possible, and that the naturalness learning framework is a promising and transparent approach to generating handwritten letters.

Figure 5.21: ESN: Residuals for the x (upper plot) and y (lower plot) components of naturalness for writer wr = 1 (horizontal axes: update steps, n = 1, ..., 2704).

Figure 5.22: ESN: Test trial for writer wr = 1.

Figure 5.23: Modeling trial for writer wr = 1 in the ESN setup.

Figure 5.24: Production quality: modeling trial for writer wr = 1 in the ESN setup with smoothing applied.

Figure 5.25: Font text transformed into a handwritten form by the proposed system. The top line shows the font text; the bottom line shows the generated handwriting. The text means "Happy New Year" in Japanese.

Table 5.8: ESN: Mean square errors and correlation coefficients for the x and y components of naturalness in the modeling trial.

wr    mse_x        mse_y        C_x       C_y
#1    1.17×10^−2   9.64×10^−3   57.39%    48.09%
#2    1.16×10^−2   1.13×10^−2   55.82%    40.11%
#3    2.31×10^−2   2.14×10^−2   58.87%    44.09%
#4    1.39×10^−2   1.30×10^−2   57.79%    51.60%
#5    1.19×10^−2   1.39×10^−2   57.82%    41.73%

Chapter 6

RNN with Recurrent Output Layer for Learning of Naturalness

It is often the case that the input units of an RNN are connected directly to the output units, whereas connecting the output units to each other (which makes the output layer recurrent) is seldom done. The aim of this chapter is to explain why a recurrent output layer (ROL) works well with the naturalness learning framework.

6.1 Dynamics of an RNN With a Recurrent Output Layer

Adopting a standard perspective of system theory, we view a deterministic, discrete-time dynamical system as a function G that yields the next system output, given the input and output histories:

y(n + 1) = G(..., u(n), u(n + 1); ..., y(n − 1), y(n))        (6.1)

where u(n) is the input vector and y(n) is the output vector at time step n.

The echo-state approach enables us to approximate systems represented by G directly, without the need to convert the time series into static input patterns by the sliding-window technique [17].

Let us consider a discrete-time ESN [18] consisting of K input units with an activation vector u(n) = (u1(n), ..., uK(n))^t, N internal units with an activation vector x(n) = (x1(n), ..., xN(n))^t, and L output units with an activation vector y(n) = (y1(n), ..., yL(n))^t, where ^t denotes the transpose. The corresponding input, internal, and output connection weights are collected in the N × K, N × N, and L × (K + N + L) weight matrices Win, W, and Wout, respectively. Optionally, an N × L matrix Wback may be used to project the output units back to the internal units.
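As an illustration of these dimensions, the following sketch builds the four weight matrices with numpy. The unit counts, weight ranges, and spectral-radius rescaling are assumptions in the spirit of common ESN practice [18], not the actual values used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, L = 2, 100, 2     # input, internal, and output unit counts (illustrative)

W_in   = rng.uniform(-0.1, 0.1, size=(N, K))   # input weights,    N x K
W      = rng.uniform(-0.5, 0.5, size=(N, N))   # internal weights, N x N
W_back = rng.uniform(-0.1, 0.1, size=(N, L))   # output feedback,  N x L
W_out  = np.zeros((L, K + N + L))              # trained output weights, L x (K+N+L)

# Common ESN practice is to rescale W to a spectral radius below 1; the
# exact value used here is an assumption.
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
```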

The internal units’ activation is computed according to

x(n + 1) = f(Win u(n + 1) + W x(n) + Wback y(n))        (6.2)

where f denotes the component-wise application of the transfer (activation) function to each internal unit. The output is computed as

y(n + 1) = fout(Wout (u(n + 1), x(n + 1), y(n)))        (6.3)

where (u(n + 1), x(n + 1), y(n)) is the concatenated vector consisting of the input, internal, and output activation vectors. The concatenated vector often consists only of the input and internal activations, or of the internal activations alone. Fig. 5.19 shows the architecture of an ESN. See [18] for further details concerning the training of ESNs.
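Read as code, one update step corresponding to Eqs. (6.2) and (6.3) might look as follows (a sketch; the transfer functions f and fout are assumed here to be tanh, which the text does not restate):

```python
import numpy as np

def esn_step(u_next, x, y_prev, W_in, W, W_back, W_out,
             f=np.tanh, f_out=np.tanh):
    """One ESN update: u_next = u(n+1), x = x(n), y_prev = y(n)."""
    x_next = f(W_in @ u_next + W @ x + W_back @ y_prev)    # Eq. (6.2)
    z = np.concatenate([u_next, x_next, y_prev])           # (u(n+1), x(n+1), y(n))
    y_next = f_out(W_out @ z)                              # Eq. (6.3)
    return x_next, y_next
```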

A closer look at Eq. (6.3) reveals that the system output y(n + 1) is constructed from the given input and output history via two distinct mechanisms: indirectly, from the activation vector of the internal units x(n) (by computing x(n + 1) via Eq. (6.2)), and, optionally, directly from the activation vectors of the input units u(n + 1) and/or the output units y(n).

The internal units' activation x(n + 1) is computed using the input and output activations u(n + 1) and y(n) and the activation x(n) of the internal units from the previous step, which recursively reflects the influence of the input and output activations from previous steps. Eq. (6.2) can therefore be rewritten as

x(n + 1) = E(..., u(n), u(n + 1); ..., y(n − 1), y(n))        (6.4)

where E depends on the history of the input signal u and on the history of the desired output signal y itself; thus, in each particular task, E shares certain properties with the desired output and/or the given input. How strongly the internal activation x(n + 1) is influenced by the activations u(n + 1), y(n), and x(n) (which recursively consist of previous input/output activations) is controlled by the size of the weights in the matrices Win, Wback, and W, respectively. The algebraic properties of the matrix W are particularly important for the short-term memory property of an ESN [36].

Besides using the activations of the internal units, it is sometimes also advantageous to use the activations of the input and output units directly. Although the activation vector x(n) reflects the history of the desired output and/or the given input, the activation vectors u(n + 1) and y(n) in Eq. (6.3) are used merely as another form of input. This usage corresponds to connecting the input units directly to the output units, and the output units directly to themselves.

Direct connections from the input units to the output units are often used, whereas direct connections from the output units to the output units are rare. It is this connection of the output units to each other that makes the output layer recurrent.

A recurrent output layer (ROL) allows for a substantial influence1 of the previously generated output y(n) on the successive output y(n + 1). The activation y(n) is only an approximation of the system output at step n

1 How big this influence is depends on the absolute sizes of the ROL weights. Even small weights, however, might render a network unstable.

and is thus always generated with a certain error. This error enters the computation of the successive output activation y(n + 1) and can easily accumulate with each update step. It is for this reason that computation using an ROL has been rare.
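As a deliberately simplified picture of this accumulation (a scalar, linear toy that ignores the reservoir and the output nonlinearity), consider an output fed back through a single hypothetical weight w_yy: the per-step error re-enters every subsequent update and builds up well beyond its individual size.

```python
# Toy illustration of error accumulation through a recurrent output layer.
# A per-step error e_new re-enters the next update through the (hypothetical)
# output-to-output weight w_yy, so e(n+1) ≈ w_yy * e(n) + e_new.
w_yy, e_new = 0.9, 0.01
e = 0.0
for n in range(50):
    e = w_yy * e + e_new
print(f"accumulated error after 50 steps: {e:.3f}")  # ~0.10, ten times e_new
```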

6.2 RNN With a Recurrent Output Layer for
