
We evaluated the generalization of three-input IDS models in terms of classification. The two-spiral problem [20] is a well-known classification benchmark for supervised learning algorithms. In particular, it has often been used to demonstrate the performance of new FNN architectures and algorithms. The test involves the classification of two data sets, each of which forms a spiral. Typically, each spiral comprises 97 data points, so that 194 data points are plotted on the plane. Points in the spiral are located according to the following condition:

$$r = p(\theta + 2\pi n) + r_o, \tag{3.7}$$

where $r$ and $\theta$ are the radius and angle, respectively; $p$ and $r_o$ are the parameters that determine the size of the spiral; and $n$ is an integer that represents the number of revolutions. $p = 1/\pi$ and $r_o = 0.5$ are used as standard values.
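As an illustration of condition (3.7), the following minimal Python sketch generates the two spirals with the standard values $p = 1/\pi$ and $r_o = 0.5$. The angular step of $\pi/16$ between consecutive points and the construction of the second spiral as the point reflection of the first are assumptions taken from the common form of this benchmark, not details stated in the text; the function name `two_spirals` is hypothetical.

```python
import math

def two_spirals(points_per_spiral=97, p=1 / math.pi, r_o=0.5, step=math.pi / 16):
    """Return two lists of (x, y) points, one list per spiral.

    Assumptions (not stated in the text): points are sampled at a constant
    angular step, and the second spiral mirrors the first through the origin.
    """
    first, second = [], []
    for k in range(points_per_spiral):
        phi = k * step            # accumulated angle, i.e. theta + 2*pi*n
        r = p * phi + r_o         # radius from condition (3.7)
        x, y = r * math.cos(phi), r * math.sin(phi)
        first.append((x, y))
        second.append((-x, -y))   # point reflection gives the other spiral
    return first, second

first, second = two_spirals()
print(len(first) + len(second))   # total number of data points on the plane
```

Under these assumptions the radius grows from 0.5 to 6.5 over three revolutions, which is consistent with the axis range of the spiral plots.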

Generally, single-hidden-layer networks based on standard BPL cannot perform stable perfect classifications in the two-spiral problem. Simple gradient descent methods involve numerous training iterations, which require a considerable amount of computational time. Only FNNs with refined architectures or algorithms can achieve 100% classification with fast convergence. By using cascade-correlation learning proposed by Fahlman and Lebiere [21], it was possible to solve the two-spiral problem with fewer training iterations and a smaller network size. Hwang et al. [22] indicated the drawback of cascade-correlation learning that it is

22 CHAPTER 3. MODELING ABILITY OF THE IDS METHOD

Figure 3.6: Spiral data separated into 13 partitions for each input. (Axes: x1 and x2, each ranging from -6 to 6.)

difficult to achieve a high accuracy in regression modeling tasks. Their projection pursuit learning network exhibited good performance in both regression modeling and classification tasks. Jia and Chua [23] described the effect of input data encoding, testing binary, weighted binary, Gray, and temperature encoding schemes. These encoding schemes can rapidly increase the number of input units. It is considered that the two-spiral problem can be solved by using many input units. Álvarez [24] proposed a knowledge-based neural network in which the radius and angle were used as inputs. This takes advantage of the geometric nature of input patterns; the use of polar coordinates is very effective in the classification of two spirals. Singh [25] used spiral data expanded into a three-dimensional coordinate space. In this case, the position of the spirals was determined by x, y, and z coordinates; the z coordinate was equal to the mean of the x and y coordinates. Singh used eight features, including the x, y, and z coordinates, as the input vector for the FNN.

As described above, many researchers have sought an efficient way to solve the two-spiral problem with FNNs by employing input encoding schemes or by preprocessing the input data. It is evident that effective input data encoding improves the performance of FNNs. In other words, a supervised learning algorithm that can solve complex classification problems without such techniques can be regarded as an excellent algorithm. Therefore, we applied the IDS method in its standard form to the two-spiral benchmark, in the same manner as in the previous benchmarks, without geometrically preprocessing the spiral data.

In the IDS modeling, each input domain was divided at regular intervals. The resolution of the x-y plane was 256×256 and the ink drop pattern size was set to 9. In intricate classification tasks, it is preferable to reduce the size of the ink drop pattern in terms of the classification rate and learning speed because the interpolation between the data points is not necessarily required. The IDS method cannot solve the two-spiral problem if the number of partitions of the input domain is insufficient. First, we examined the minimum

3.5. CLASSIFICATION 23

Figure 3.7: Output of a 13-partition IDS model constructed using a two-spiral data set. (100% classification achieved; axes x1 and x2, each ranging from -6 to 6.)

number of partitions that could be used to achieve the perfect classification of the spiral data and found that the 13-partition IDS model performed 100% classification. Figure 3.6 shows the spiral data points separated into 13 partitions. In this classification task, a data point was classified as a white spiral when the model output at that point was greater than 0.6, and as a black spiral when the model output was less than 0.4. The remaining range, 0.4 to 0.6, was defined as an invalid output. This rule has been frequently applied to the binary output of FNNs. Figure 3.7 shows the output of the 13-partition IDS model for 64×64 input points closely set at regular intervals over the data plane. The cascade-correlation network is one of the FNNs that can stably achieve perfect classification in the two-spiral benchmark. We obtained the source code of the cascade-correlation learning algorithm from the CMU AI repository [26], and applied the algorithm to the two-spiral benchmark. Figure 3.8 shows the output of the cascade-correlation network. It can be observed that the IDS model represents the two spirals more clearly.
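The decision rule described above (greater than 0.6 for the white spiral, less than 0.4 for the black spiral, invalid otherwise) can be sketched in Python as follows. The function names `classify` and `classification_rate` are hypothetical, and counting an invalid output as a misclassification is an assumption.

```python
def classify(output, low=0.4, high=0.6):
    """Map a model output to a class label using the 0.4/0.6 rule."""
    if output > high:
        return "white"
    if output < low:
        return "black"
    return "invalid"  # outputs in [0.4, 0.6] are not assigned to a spiral

def classification_rate(outputs, labels):
    """Percentage of points whose label matches the rule's decision.

    Assumption: an invalid output counts as a misclassification.
    """
    correct = sum(1 for o, t in zip(outputs, labels) if classify(o) == t)
    return 100.0 * correct / len(labels)

print(classification_rate([0.9, 0.1, 0.5, 0.7],
                          ["white", "black", "white", "black"]))
```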

Singh [25] used spiral data that was expanded into a three-dimensional coordinate space as a new two-spiral benchmark. Figure 3.9 represents the two-spiral data points in three-dimensional coordinates. We applied the IDS method to this three-dimensional two-spiral benchmark. First, by using the constructive algorithm, we obtained a minimum structure that achieved 100% classification of the spiral data. To this end, we used the stop condition (3.3) and set $e_{min}(n) = 0$. $M_{min}$ and $M_{max}$ were set to 2 and 15, respectively. The minimum structure obtained from the search was $m_1 = 5$, $m_2 = 15$, and $m_3 = 3$.

Figure 3.8: Output of a cascade-correlation network. (100% classification achieved; axes x1 and x2, each ranging from -6 to 6.)

Figure 3.9: The two spirals in three-dimensional coordinates. (Axes: x, y, and z, each ranging from -6 to 6.)

Next, we examined the generalization of the IDS model obtained by the constructive algorithm. For the evaluation of the generalization in the three-dimensional two-spiral problem, we used the same approach as that used by Singh [25]. Eight test sets were generated by introducing an offset in the training data: 1) $(x_d+\delta, y_d+\delta, z_d+\delta)$, 2) $(x_d+\delta, y_d+\delta, z_d-\delta)$, 3) $(x_d+\delta, y_d-\delta, z_d+\delta)$, 4) $(x_d+\delta, y_d-\delta, z_d-\delta)$, 5) $(x_d-\delta, y_d+\delta, z_d+\delta)$, 6) $(x_d-\delta, y_d+\delta, z_d-\delta)$, 7) $(x_d-\delta, y_d-\delta, z_d+\delta)$, and 8) $(x_d-\delta, y_d-\delta, z_d-\delta)$, where $(x_d, y_d, z_d)$, $x_d \in X_1$, $y_d \in X_2$, $z_d \in X_3$, is the spiral data point, and $\delta$ is the offset to be added. In this experiment, the input domain was defined as $[-6.5, 6.5]^3$. When a small offset is introduced in the original spiral data points, one or two test points in each test set are located outside the input domain. We excluded such points from the calculations of the classification rate. Table 3.3 lists the classification rates for each test set.

Table 3.3: Classification Rate (%) for Each Test Set

Test set   δ = 0.1   δ = 0.2   δ = 0.3
1          99.0      90.6      80.2
2          98.5      91.7      83.3
3          97.4      92.7      80.2
4          97.9      94.8      85.9
5          96.9      92.2      84.9
6          97.4      89.6      78.1
7          99.0      94.8      87.0
8          100       96.9      88.0
ave.       98.3      92.9      83.5

For comparison, the average rates for the test sets with offsets 0.1, 0.2, and 0.3 in Singh’s FNN [25] were 96.5%, 91.0%, and 80.5%, respectively. For this benchmark, the average rates of the IDS model (98.3%, 92.9%, and 83.5%) exceed those of Singh’s FNN at every offset, so its generalization performance is sufficiently high.
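The construction of the eight offset test sets, including the exclusion of points that fall outside the input domain, can be sketched as follows. This is an illustrative sketch rather than the procedure actually used in the experiment; the function name `offset_test_sets` is hypothetical, and the domain bounds of -6.5 and 6.5 are taken from the input domain given in the text.

```python
from itertools import product

def offset_test_sets(points, delta, lo=-6.5, hi=6.5):
    """Build the eight test sets obtained by shifting each coordinate by +/-delta.

    Points that leave the cube [lo, hi]^3 after shifting are dropped,
    mirroring the exclusion rule described in the text.
    """
    test_sets = []
    # product((1, -1), repeat=3) yields (+,+,+), (+,+,-), ..., (-,-,-),
    # which matches the order of test sets 1) to 8).
    for signs in product((1, -1), repeat=3):
        shifted = [tuple(c + s * delta for c, s in zip(pt, signs)) for pt in points]
        inside = [pt for pt in shifted if all(lo <= c <= hi for c in pt)]
        test_sets.append(inside)
    return test_sets

# Two made-up 3-D points; the second lies close to the domain boundary.
sets = offset_test_sets([(0.0, 0.0, 0.0), (6.45, 0.0, 3.225)], delta=0.1)
print([len(s) for s in sets])
```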


Chapter 4

Performance Evaluation

4.1 Introduction

This chapter deals with the performance of the IDS method as a soft computing tool. In the experiments, we use Hwang’s benchmark [17], described in Section 4.1.1. We compare the noise tolerance, fault tolerance, and real-time capabilities of IDS models with those of MLPs, radial basis function networks (RBFNs), and adaptive neuro-fuzzy inference systems (ANFISs). The MLP is the most popular model of neural networks. The RBFN is a variant of the ANN and is known to have superior fault tolerance and fast convergence [27]. The ANFIS is characterized by its hybrid learning: the parameters of the premise part and those of the consequent part in fuzzy inference are adjusted by the gradient descent method and the least squares method, respectively [28]. The MLP, RBFN, and ANFIS used in the experiments are described in Sections 4.1.2, 4.1.3, and 4.1.4, respectively.

4.1.1 Hwang’s Benchmark

Hwang’s five-function set is used for function approximation in several studies on ANNs [19][29]–[33]. Some of the studies that deal with this benchmark evaluate the generalization performance using both noiseless and noisy training data, and their benchmark results are based on the same conditions regarding the number of training examples, the distribution of test data, and the SNR. Hwang’s five-function set comprises the following nonlinear functions $g_i : [0,1]^2 \to \mathbb{R}$.

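Simple Interaction Function:

$$g_1(x_1, x_2) = 10.391\,\bigl((x_1 - 0.4)(x_2 - 0.6) + 0.36\bigr).$$

Radial Function:

$$g_2(x_1, x_2) = 24.234\,\bigl(r^2 (0.75 - r^2)\bigr), \qquad r^2 = (x_1 - 0.5)^2 + (x_2 - 0.5)^2.$$

Harmonic Function:

$$g_3(x_1, x_2) = 42.659\,\bigl(0.1 + \tilde{x}_1 (0.05 + \tilde{x}_1^4 - 10\,\tilde{x}_1^2 \tilde{x}_2^2 + 5\,\tilde{x}_2^4)\bigr),$$

where $\tilde{x}_1 = x_1 - 0.5$ and $\tilde{x}_2 = x_2 - 0.5$.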


28 CHAPTER 4. PERFORMANCE EVALUATION

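Additive Function:

$$g_4(x_1, x_2) = 1.3356\,\bigl(1.5(1 - x_1) + e^{2x_1 - 1} \sin(3\pi (x_1 - 0.6)^2) + e^{3(x_2 - 0.5)} \sin(4\pi (x_2 - 0.9)^2)\bigr).$$

Complicated Interaction Function:

$$g_5(x_1, x_2) = 1.9\,\bigl(1.35 + e^{x_1} \sin(13 (x_1 - 0.6)^2)\, e^{-x_2} \sin(7 x_2)\bigr).$$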
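For reference, the five functions can be written as a short Python sketch. It assumes the standard form of Hwang’s functions, in which the subtractions (for example $x_1 - 0.4$) and the negative exponent $e^{-x_2}$ appear explicitly; the function names simply follow the symbols $g_1$ to $g_5$.

```python
import math

def g1(x1, x2):
    """Simple interaction function."""
    return 10.391 * ((x1 - 0.4) * (x2 - 0.6) + 0.36)

def g2(x1, x2):
    """Radial function."""
    r2 = (x1 - 0.5) ** 2 + (x2 - 0.5) ** 2
    return 24.234 * (r2 * (0.75 - r2))

def g3(x1, x2):
    """Harmonic function."""
    t1, t2 = x1 - 0.5, x2 - 0.5
    return 42.659 * (0.1 + t1 * (0.05 + t1 ** 4 - 10 * t1 ** 2 * t2 ** 2 + 5 * t2 ** 4))

def g4(x1, x2):
    """Additive function."""
    return 1.3356 * (1.5 * (1 - x1)
                     + math.exp(2 * x1 - 1) * math.sin(3 * math.pi * (x1 - 0.6) ** 2)
                     + math.exp(3 * (x2 - 0.5)) * math.sin(4 * math.pi * (x2 - 0.9) ** 2))

def g5(x1, x2):
    """Complicated interaction function."""
    return 1.9 * (1.35 + math.exp(x1) * math.sin(13 * (x1 - 0.6) ** 2)
                  * math.exp(-x2) * math.sin(7 * x2))

# Evaluate every function at the center of the input domain [0, 1]^2.
for g in (g1, g2, g3, g4, g5):
    print(g.__name__, g(0.5, 0.5))
```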

Figure 4.1 graphically represents these functions. For the test conditions of this benchmark, 225 randomly generated examples are used as the training set. Let $(x_l, y_l)$ be the $l$th training example. The noisy training data are generated as follows:

$$y_l = g_i(x_l) + 0.25\,\epsilon_l, \qquad i = 1, 2, \ldots, 5 \tag{4.1}$$

where $\epsilon_l \sim N(0,1)$ is zero-mean unit-variance Gaussian noise. The test set comprises 10000 data points uniformly distributed over the input domain, as shown in (2.13). In this benchmark, FVU (2.12) is employed as an error measure. The model accuracy is shown by the FVU calculated from the 10000 data points.
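The data generation in (4.1) and the error measure can be sketched as follows. Several points are assumptions: Python’s `random` module stands in for the pseudo-random generator, the 10000 test points are placed on a 100×100 regular grid, and FVU is taken in its usual sense (sum of squared errors divided by the sum of squared deviations of the targets from their mean), since equations (2.12) and (2.13) are not reproduced in this section. The names `make_noisy_training_set` and `fvu` are hypothetical.

```python
import random

def g1(x1, x2):
    """Simple interaction function, used here as the example target."""
    return 10.391 * ((x1 - 0.4) * (x2 - 0.6) + 0.36)

def make_noisy_training_set(g, n=225, noise_scale=0.25, seed=0):
    """Draw n random inputs in [0, 1]^2 and add scaled Gaussian noise as in (4.1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = g(*x) + noise_scale * rng.gauss(0.0, 1.0)
        data.append((x, y))
    return data

def fvu(model, g, grid=100):
    """Fraction of variance unexplained on a grid x grid set of test points.

    Assumption: FVU = sum (model - target)^2 / sum (target - mean(target))^2.
    """
    xs = [(i / (grid - 1), j / (grid - 1)) for i in range(grid) for j in range(grid)]
    targets = [g(*x) for x in xs]
    mean = sum(targets) / len(targets)
    sse = sum((model(*x) - t) ** 2 for x, t in zip(xs, targets))
    sst = sum((t - mean) ** 2 for t in targets)
    return sse / sst

train = make_noisy_training_set(g1)
print(len(train), fvu(g1, g1))
```

A model that reproduces the target exactly gives an FVU of zero under this definition, and a constant model equal to the mean of the targets gives an FVU of one.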

In order to generate training sets based on random numbers, we used the drand48 function implemented as a pseudo-random number generator in the FreeBSD 6.1 operating system. Figure 4.2 shows the graphical representation of the noisy data whose points were uniformly plotted.

4.1.2 Multilayer Perceptron

Single-output MLPs are described as follows:

$$\mathrm{MLP}(x) = \sum_{i=1}^{H} u_i\, f\Bigl(\sum_{j=1}^{I} v_{ij} x_j - v_{i0}\Bigr) - u_0 \tag{4.2}$$

where $H$ and $I$ are the numbers of hidden units and input units, respectively; $u_0$ is the bias of the output unit; $u_i$, the interconnection weight between the output unit and the $i$th hidden unit; $v_{i0}$, the bias of the $i$th hidden unit; $v_{ij}$, the interconnection weight between the $i$th hidden unit and the $j$th input unit; and $f$ is a sigmoidal activation function. The weights and biases are initialized with small random values and trained using standard backpropagation learning [16].
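The forward computation of (4.2) can be sketched in a few lines of Python. The logistic function is assumed for the sigmoidal activation $f$, which the text does not specify further; the function names and the example weights are illustrative only.

```python
import math

def sigmoid(z):
    """Logistic function, assumed here as the sigmoidal activation f."""
    return 1.0 / (1.0 + math.exp(-z))

def mlp(x, u, u0, v, v0):
    """Single-output MLP following (4.2).

    x  : input vector (length I)
    u  : hidden-to-output weights (length H)
    u0 : bias of the output unit
    v  : input-to-hidden weights, one row of length I per hidden unit
    v0 : biases of the hidden units (length H)
    """
    total = 0.0
    for u_i, v_i, v_i0 in zip(u, v, v0):
        net = sum(v_ij * x_j for v_ij, x_j in zip(v_i, x)) - v_i0
        total += u_i * sigmoid(net)
    return total - u0

# One hidden unit with zero net input: 2.0 * sigmoid(0) - 0.5
print(mlp([0.0, 0.0], u=[2.0], u0=0.5, v=[[1.0, 1.0]], v0=[0.0]))
```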

4.1.3 Radial Basis Function Network

Single-output RBFNs with Gaussian activation functions are described as follows:

$$\mathrm{RBFN}(x) = w_0 + \sum_{i=1}^{H} w_i \exp\Bigl(-\frac{\|x - c_i\|^2}{r^2}\Bigr) \tag{4.3}$$

where $c_i$ is the center of the RBF of the $i$th hidden unit; $r$, the radius of the RBF; and $w_i$, the interconnection weight between the output unit and the $i$th hidden unit. $\|\cdot\|$ denotes the Euclidean norm.

For the setup conditions and training procedure of the RBFN, we followed the procedure given in [27] as shown below. The centers of the RBFs are uniformly distributed over the input domain, and their radii are fixed at the same value. These parameters are not updated during training. The weights are initialized to zero and trained using the gradient descent method.
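The setup described above (fixed, uniformly placed centers, a common radius, zero-initialized weights trained by gradient descent) can be sketched as follows. This is not the procedure of [27] itself: the grid size, radius, learning rate, per-example update, and the decision to train the bias $w_0$ together with the weights are all assumptions, and the function names are hypothetical.

```python
import math

def grid_centers(per_axis):
    """Centers placed on a regular per_axis x per_axis grid over [0, 1]^2."""
    step = 1.0 / (per_axis - 1)
    return [(i * step, j * step) for i in range(per_axis) for j in range(per_axis)]

def activations(x, centers, r):
    """Gaussian activation of every hidden unit for input x."""
    return [math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / r ** 2) for c in centers]

def rbfn(x, w0, w, centers, r):
    """Network output following (4.3)."""
    return w0 + sum(w_i * p for w_i, p in zip(w, activations(x, centers, r)))

def train_step(x, y, w0, w, centers, r, lr=0.1):
    """One gradient descent step on the squared error for a single example.

    Centers and radius stay fixed; only w0 and w are updated.
    """
    phi = activations(x, centers, r)
    err = y - (w0 + sum(w_i * p for w_i, p in zip(w, phi)))
    new_w0 = w0 + lr * err
    new_w = [w_i + lr * err * p for w_i, p in zip(w, phi)]
    return new_w0, new_w

centers = grid_centers(3)
w0, w = 0.0, [0.0] * len(centers)          # weights initialized to zero
w0, w = train_step((0.5, 0.5), 1.0, w0, w, centers, r=0.5)
print(rbfn((0.5, 0.5), w0, w, centers, r=0.5))
```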

4.1. INTRODUCTION 29

Figure 4.1: Graphs of five functions. (Panels: g1: Simple interaction; g2: Radial; g3: Harmonic; g4: Additive; g5: Complicated interaction. Axes: x1, x2, and y.)

Figure 4.2: Examples of noisy data. (30×30 data points distributed at regular intervals over the input domain. Panels: (a) g1, (b) g4. Axes: x1, x2, and y.)


Figure 4.3: Structure of an ANFIS. (Inputs x1 and x2; fuzzy sets A11, A12, A21, and A22; Layers 1 to 5; premise parameters and consequent parameters.)

4.1.4 Adaptive Neuro-Fuzzy Inference System

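ANFISs use fuzzy rules of the first-order Sugeno type [34]. The $k$th rule is shown as follows:

$$\text{If } x_1 \text{ is } A^k_1 \text{ and } \cdots \text{ and } x_N \text{ is } A^k_N, \text{ then } y = p^k_0 + p^k_1 x_1 + \cdots + p^k_N x_N. \tag{4.4}$$

$A_{ij}$ denotes the $j$th fuzzy set for the $i$th input variable. The membership function of $A_{ij}$ uses the following formula:

$$\mu_{A_{ij}}(x) = \frac{1}{1 + \Bigl(\bigl(\frac{x - c_{ij}}{a_{ij}}\bigr)^2\Bigr)^{b_{ij}}}. \tag{4.5}$$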

The parameters $a_{ij}$, $b_{ij}$, and $c_{ij}$ are referred to as the premise parameters, and the parameters $p^k_0, p^k_1, \ldots, p^k_N$ as the consequent parameters. The learning of ANFISs adjusts these parameters. The ANFIS has a neural-like network structure and employs a hybrid learning algorithm. Figure 4.3 shows the structure of a two-input ANFIS. In the forward pass, the premise parameters are fixed and the consequent parameters are updated by the least squares method after a batch of training examples is processed. In the backward pass, the consequent parameters are fixed and the premise parameters are updated by the gradient descent method after a batch of training examples is processed. For the learning of ANFISs, a batch mode and an incremental mode are used. The abovementioned procedure corresponds to the batch mode. In the incremental mode, the switch between the update of premise parameters and that of consequent parameters is performed for every training example.
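The inference part of an ANFIS, that is, the membership function (4.5) and the first-order Sugeno rules (4.4), can be sketched as follows. The sketch covers only the computation of the output for fixed parameters, not the hybrid learning. Combining the membership grades of a rule by their product and taking the weighted average of the rule outputs are assumptions based on the usual ANFIS formulation; the function names and example parameters are hypothetical.

```python
def bell(x, a, b, c):
    """Membership function (4.5) with premise parameters a, b, c."""
    return 1.0 / (1.0 + (((x - c) / a) ** 2) ** b)

def sugeno_output(x, rules):
    """Output of a first-order Sugeno system.

    rules: list of (premise, consequent) pairs, where
      premise    = one (a, b, c) triple per input variable,
      consequent = [p0, p1, ..., pN] as in rule (4.4).
    Assumptions: firing strength = product of membership grades,
    final output = firing-strength-weighted average of rule outputs.
    """
    num = den = 0.0
    for premise, p in rules:
        strength = 1.0
        for x_i, (a, b, c) in zip(x, premise):
            strength *= bell(x_i, a, b, c)
        y = p[0] + sum(p_i * x_i for p_i, x_i in zip(p[1:], x))
        num += strength * y
        den += strength
    return num / den

# Two illustrative rules for a two-input system.
rules = [
    ([(0.5, 2.0, 0.0), (0.5, 2.0, 0.0)], [0.0, 1.0, 1.0]),
    ([(0.5, 2.0, 1.0), (0.5, 2.0, 1.0)], [1.0, -1.0, 0.5]),
]
print(sugeno_output((0.25, 0.75), rules))
```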
