• 検索結果がありません。

An efficient learning scheme for pattern classification using multi-layer feed-forward neural networks

N/A
N/A
Protected

Academic year: 2021

シェア "An efficient learning scheme for pattern classification using multi-layer feed-forward neural networks"

Copied!
8
0
0

読み込み中.... (全文を見る)

全文

(1)

An Ecient Learning Scheme for Pattern Classi cation

Us-ing Multi-layer Feed-forward Neural Networks

33

KanadKeeni y

,KenjiNakayama yy

,and HiroshiShimo daira yyy

,

SUMMARY Thisstudyhighlightsonthesubjectofweight initializationinmulti-layerfeed-forwardnetworks.Trainingdata isanalyzedandthenotionofcriticalpointisintroducedfor de-terminingtheinitialweightsandthenumberofhiddenunits. Theprop osedmetho dhasb eenappliedtoarti cialdata, pub-liclyavailablecancerdatabaseandDevanagaricharacters. The exp erimental resultsofarti cialdata showthatthe prop osed metho dtakesalmost1/3ofthetrainingtimerequiredfor stan-dardbackpropagation.Inordertoverifythee ectivenessofthe prop osedmetho d,standardbackpropagationwherethelearning startswithrandominitialweightshasb eenappliedtothecancer databaseandDevanagaricharacters. Theexp erimentalresults indicatethetheprop osedweightinitializationmetho dresultsin b ettergeneralization.

key words: Neural networks, pattern classi cation, initial weights,decisionboundary,criticalpoints

1. Intro duction

Neuralnetworksarchitectureshavesparkedofgreat in-terestinrecentyearsb ecauseoftheirintriguing learn-ingcapabilities.Severallearningalgorithmshaveb een developed for trainingthe networksand out of them Back Propagation [1] is probably most widely used. The reason for thepopularity isthe underlying sim-plicityandrelativep owerofthealgorithm. Itsp ower derivesfromthefactthatunlikeitsprecursors,thep er-ceptionlearningrule[2],andtheWidrow-Ho learning rule[3],itcanb eemployedfortrainingnonlinear net-works of arbitrary connectivity. Since suchnetworks are often required for real-worldapplications, such a learning pro cedure iscritical. However thestandard back-propagationhasthefollowingdrawbacks.

1. Thelearningpro cedureistimeconsuming. Train-ingalwaysstartsfromrandominitialweightsand theweightadjustmentpro cedureisveryslow

2. Itisnotclearastohowtoconstructanappropriate networkarchitecture.Thenumberofhiddenunits

y

TheauthoriswiththeFacultyofInformationSystems andQuantitativeSciencesofNanzanUniversity

yy

TheauthoriswiththeFacultyElec. Eng.andComp. ScienceofKanazawaUniversity

yyy

TheauthoriswiththeFacultyofInformationScience ofJapanAdvancedInstituteofScienceandTechnology

33

Apartofthispap erwaspresentedattheInternational JointConferenceonNeuralNetworks(IJCNN'99)and In-ternational ConferenceonComputationalIntelligenceand

for a particular problem can onlyb edetermined bytrialanderror

In caseof 1., it iswidely known that theinitial weightsofthenetworkwouldlargelye ectthe general-izationp erformance. Forexample,twonetworkswith samearchitecture,whentrainedwithtotally di erent initialweightswouldpro ducedi erentresults. Several researchershavedesignedsystemsinwhichweightsare initializedsothattheinitialactivityofthenetwork cor-resp ondstothesuccessiverules,thatmaycomefroman exp ert. Wilsonhasprop osedFastBPN[4],wherethe initialparametersaredeterminedbyestimatingthe sig-nalrankwithgenerallikelihoo dratiotest(GLRT)and thesingularvaluedecomp osition(SVD)oftheGLRT covariancematrix. Howeverthedisadvantageof their metho disthatthenumberofhiddenunitscannot ex-ceedtheinputfeaturedimension. Inordertotackle1, mostoftheresearchershavemainlyfo cusedon improv-ingtheoptimizationpro cedurebydynamically adapt-ingthelearningrates[5]-[6]. Ontheotherhandithas b eenshownin [7]that trainingdataselection isalso imp ortant.

Incaseof2.,theproblemhasb eentreatedin var-ious ways. One mostcommon approachhas b eento start with a large number of hidden units and then prunethenetworkonceitistrained[8],[9]. However, pruningdo es not alwaysimprovegeneralization. An-other strategyfor nding a minimal architecture has b eentoaddorremoveunitssequentially[10],[11].

Itisalsowellknownthatneuralnetworksdonot make anyassumption ab out theprobability distribu-tion functions of data and can solve complex prob-lems witharbitrary decisionb oundary. Therefore, it isdesirabletoinvestigatethelearningcharacteristicsof thenetworksfor ndinganestimateaboutthedecision b oundary. However,there hasb een nosigni cant re-searchfor ndinginitialweightsand minimalnetwork architecturebyexploitingthecharacteristicsofdecision b oundary.

Inthepresentstudy,theab ovementioned draw-backsof backpropagation has b een carefully investi-gatedandametho dhasb eenprop osedfordetermining theinitialweightsforinputtohiddenlayerand estima-tionofhiddenunitsautomatically.

(2)

Multi-layerNeuralNetworks(MLNN). Thethird sec-tionpresentstheautomaticmetho dforgenerating ini-tialweightsforinputtohiddenlayerconnections.The fourth section describ es the database. Exp erimental resultsofarti cial,realworlddataareprovidedinthe Fifthsection.Finallythelastsectionisdevotedto con-clusionandfurtherresearches.

2. Pattern classi cation characteristics of MLNN

Inanypatternclassi cationsystem, patternmapping orpatternclassi cationisequivalenttodividinganN dimensionalspacewhere thepatterns aredistributed. Incaseofmulti-layerneuralnetworks(MLNN),thisN dimensional spaceisdividedbyforminghyper-planes with the help of synaptic weights of nonlinear neu-rons. MLNNsdonotmakeanyassumptionab outthe probabilitydistribution functionsof thetrainingdata and can solvecomplex problems witharbitrary deci-sion b oundary. The degreeof freedom inplacingthe decisionb oundaryisveryhigh. Therefore,neural net-works areconsidered to b ea go o dchoicefor pattern classi cationtasks.

Anothermostimp ortantasp ectofneuralnetworks islearn-ability.Incaseofsup ervisedlearning,networks can nd optimal synaptic weights through learning. However,sinceneuralnetworksarenonlinearsystems andgradientdescentisusedto ndasetofweightsthat optimizethep erformanceforaparticulartask,thereis alwaysa possibilityof getting stuckinlo cal minima. Therefore,globalminimaoroptimalsolutionisnot al-waysguaranteed. Furthermore,thelearningprocessis timeconsuminganditishighlydep endentonthe prob-lemthatistob esolved.

Ifweassumethattherearenooverlapamongthe distributionoftrainingpatterns,thenpatternmapping canb ecategorizedinthefollowingclasses.

1. kXi0Xjkissmall V kY i 0Yjkissmall 2. kX i 0X j kissmall V kY i 0Y j kislarge 3. kX i 0X j kislarge V kY i 0Y j kissmall 4. kX i 0X j kislarge V kY i 0Y j kislarge Here,X i and X j b elong toclass! 1 and! 2 ,Y i andY j

arethecorresp ondingoutputvectors,andk1k standsfortheEuclideannorm. Incaseof1.,the prob-lemistomapsimilarinputvectorsinawaysuchthat thecorresp ondingoutputvectorsalsob ecomesimilar. In thesecond case, theinput vectorsare similarbut theyaretob emapp edasdi erentpatternsinthe out-putspace. Thethirdcaseimpliesthattheinput pat-ternsthatarefarfromeachotherintheinputspaceare tob emapp edassimilarpatternsintheoutputspace. Finally,the4thcasemeansthattheinputpatternsare

farfromeachotherintheinputspaceandtheyareto b e mapp edas di erentpatterns intheoutput space. Now,thepatternmappingof1.,3.,and4. arenotthat dicult.However,incaseof2.,theproblemistomap thepatternsthatareverycloseintheinputspace,as di erentpatternsintheoutputspace.Inthiscaseeven thoughthesolutionexists,duetothedicultyofthe problemthetrainingprocesswouldb etimeconsuming. Therefore,thesecondtypeofpatternmappingresults inveryslowlearningandthep ossibilityofarrivingat alocalsolutionisveryhigh.

For example, if we de ne the connection weight fromthei'thinputtothej'thhiddenunitasw

ij then thetotalinputandoutputofthej'thhiddenunitcan b ede nedasfollows. net j = n X i=1 w ij x i + j (1) O j =(net j ) (2) (netj)= 1 1+e 0netj (3)

where, (1)istheactivationfunctionand j

isthe bias. Atthesametimethetotalinputtothek'th out-putunitandthecorresp ondingoutputcanb ede ned asfollows. net k = j X j =1 w jk O i + k (4) Ok=(netk) (5)

where,  (1) isthe sameactivation functionas it was withthehiddenlayer.

As mentionedearlierthe similarpatterns playa criticalroleinlearning.Supp osewehavetraining pat-ternsx

1n andx

2n

thatareverycloseintheinputspace andthepatternsb elongtotheclass!

1 and!

2 resp ec-tively. Inthiscase,thenetworkoutputwouldb ecome extremelysensitive.Thisisb ecausethenetworkoutput mustchangerapidlyforasmallchangeintheinput.

Now,ifthedecisionb oundaryisfarfromthe pat-terns x

1n and x

2n

, then the corresp onding outputs wouldhavethevalueO

1n  = O 2n  = 0 or1. However, duringthelearning pro cess,as thedecisionb oundary approachesx

1n andx

2n

theoutputofthecorresp ond-ingpatternsapproachthesamevalue,andthelearning pro cess b ecomesextremely slow. Inthiscase, as the decisionb oundary moves closeto the pair x

1n ,x

2n or entersthe regionb etween thepair, theamountof weightcorrectionb ecomesextremelysmall.Tob e spe-ci c,ifweassumeO  = O  =

(3)

amountofcorrectionforthen'thpattern1 n wouldb e asfollows. 1 n =  n O nj ; (6)  n =(t n 0O n )  f(net n ); (7) where,O nj

istheoutputofthej'thhiddenunit. Now, asthepatternsx

1n andx

2n

aresimilar,theoutputof thejthhiddenunitwouldalsob ecomesimilar,thatis

O 1nj  =O 2nj ; (8) and  f(net 1n )  =  f(net 2n ): (9)

Inthiscase,theweightcorrectionwillb easfollows.

11n+12 n = O1nj((t1n0O1n) +(t 2n 0O 2n ))  f(net 1n ): (10)

Ifitisassumedthatthetargetsofthepatternsare

t 1n

=1;t 2n

=0; (11)

andtheoutputofthepatternsare

O1n=z ;O2 n

=z0; (12)

thentheweightcorrectionwouldb ecomeasfollows.

1 1n +1 2n = O 1nj ((t 1n 0z) +(t 2n 0(z0)))  f(net 1n ) =((t 1n +t 2n )02z+)  f(net 1n ) =(102z+)  f(net 1n ) (13)

Nowatb eginningoftraining,thedecisionb ound-ary wouldb e far fromx

1n and x

2n

and inthat case thecorrectionofsynapticweightswouldnotb esmall. However, duringthe trainingpro cess,as thedecision b oundarymovestowardsx

1n andx

2n

,becauseofthe similarityof the patternstheoutput wouldapproach thesamevalue. Themostcriticalsituationwouldtake placeaszandkkapproachthevalue0.5and0resp ec-tively.Thatis,

11n+12 n = lim z !0:5 lim !0 (102z+)  f(net 1n )  = 0 (14)

Therefore,thecorrectionofweightsforthese pat-ternswouldb ecomeverysmallandasaresultthe learn-ingpro cesswouldb ecomeextremelyslow.

Ontheotherhand,ifthepatternsx 1n

andx 2n

are farfromeachotherintheinputspace,evenifthe de-cisionb oundarymovestowardsthemtheactivationof thecorresp ondingoutputswouldnotbecomethesame atthesametime. Hence,theweightcorrectionwillnot

Critical point

Decision boundary

Class2

Fig.1 Criticalp ointsanddecisionb oundary

3. Estimationofdecisionb oundary

Here,thedecisionruleistoselecttheclass corre-sp ondingtotheoutputneuronwiththelargestoutput. Forthesakeofsimplicity,thenumberofoutputunitis settotwo(two-classclassi cationproblem). However, theconcept can b e hop efully extended tomulti-class classi cation problems. The decisionb oundary for a multi-layerfeed-forwardnetworkisde nedasfollows.

De nition1. Thedecisionb oundarybetweentwo classesinafeed-forwardneuralnetworksisthelo cusof p ointswhere b othoftheoutputneuronsproduce the sameactivation.

If wede netheactivationoutputunitiasO i

(x) where x isan input vectorand let d(x) = O

1 (x)0 O

2

(x),thenthedecisionb oundarycanb ede nedas

fxjd(x)=0g (15)

Next,thenotionofCriticalpointsisintroducedas follows.

De nition2. Thesetofcriticalp ointscontainspairsof datathatsatisfythefollowingcondition:

min k (d(p i ;q k ))=d(p i ;q j ); (16) min k (d(p k ;q j ))=d(p i ;q j ) (17) whered(p i ;q k

)denotestheEuclideandistanceb etween thevectorp

i andq

k .

Ifwedenotethesamplesinclass! 1 asp i and sam-plesinclass!2 asq j

thenforeachsampleinclass!1 andclass!2,thesetofcriticalp ointsCcanb ede ned as C=f(p i ;q j )jmin k (d(p i ;q k ))=d(p i ;q j ); min k (d(p k ;q j ))=d(p i ;q j );p i 2! 1 ;q j 2! 2 g (18)

Ahyper-planehastobeplacedinb etweenthepair ofcriticalp oints. Iftheco ordinateofthepairof criti-calp oints(p i ;q k )are(x i ;y k )and(u i ;v k

)thentheideal hyp erplanemustgothroughthepoint

(xi+ui) 2

; (yk+vk)

2 and theslop ez ofthe straightlinecanb e calculated fromthefollowingequation.

v k 0y k u i 0x i 2z=01 (19)

In thepresent approachinsteadof starting from scratchtheinitialweightsforthehiddenunits

(4)

connec-Since,theweightvectorsareorthogonaltothe sep-aratinghyp er-plane,theinitialweightsaregeneratedin thefollowingway. First,thepairofcriticalp ointsare determinedfromthetrainingdataasmentionedab ove. Next forall pair of criticalpoints(p

i ;q k )theweight vectorsm n

aregeneratedbythefollowingequation:

m n = p i 0q k kp i 0q i k (20)

andthebiases n

aregeneratedbythefollowing equa-tion:  n =0m t n 1P=0 (p i 0q k ) t kp i 0q k k 1 (p i +q k ) 2 (21) 4. Databse

ga

u

a

0

kha

gha ca cha pa pha ba

bha

ta

da

dha na ta

tha da dha na

ya

ra

la

va

sa

sa sa

ha

kta

pta

ru

tra sra sra sta dhra

nja

tta

i

u

r

e

,

h

end

1

2

3

4

5

6

7

8

9

dva dbha

ddha

tta

pra

dba

ka

ja

ma

ksa

stha

a

0

Fig.2 SomeofthebasicDevanagaricharacters

Theprop osedmetho dhasb eenappliedtothe can-cerdatabaseobtainedfromtheUniversityofCalifornia Machine Learning Rep ository (two-class classi cation problem) andDevanagaricharacters(multi-class clas-si cation problem). The cancer database is publicly availableand it contains699 instanceseachhaving 9 attributes. TheDevanagaridatabaseconsistsof44 dif-ferentcharacters(23consonants,13c-ccombinations, 3 vowels,2 sp ecialcharactersand3 numerals). Sixty seven Devanagari characterstakenfromthetext [12] are showninFigure. 2. Thecharactersarestoredin the formof162 16 oatingp ointimages. Adetails ab outtheDevanagaridatabsecanb efoundin[13].

5. Exp eriments

5.1 Exp erimentswitharti cialdata

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Decision Boundary

"class1.output"

"class2.output"

"class1.orgdat"

"class2.orgdat"

"01.dat"

"10.dat"

Fig.3 Decisionb oundarygivenbytheprop osedmetho d

Inthepresentapproachthenumberofhiddenunits iskeptthesameas thenumb er ofcriticalp oints cal-culatedfromthetrainingdata. Twodimensionaldata isusedfortrainingandtesting. Fivesamplesforeach class(inthiscase2classes)wererandomlygenerated for training. The network had two input units, two outputunitsandthenumberofhiddenunitwassetto 3. Trainingwascontinueduntilthemeansquareerror reach0.001. Fortesting,10000sampleswererandomly generated and the class to which the testing sample fallsisdecidedbyconsideringthemaximumactivation oftheoutputunits. Thenetworkcouldlearnthe train-ingdatawith3hiddenunits.Thedecisionboundaryis estimatedfromtheoutputactivationofthenetworkin resp ecttothetestingsamplesasfollows.

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Decision Boundary

"class1.output"

"class2.output"

"class1.orgdat"

"class2.orgdat"

"01.dat"

"10.dat"

Fig.4 Decisionb oundarygivenbytheConventional Back-propagation

Foreachtestingsamplecorrectlyclassi edasclass !

1

, ndthenearesttestingsamplecorrectlyclassi edas class!

2

. Thesamepro cessisrep eatedforthetesting samples classi edas class!

2

. Nowthe line connect-ingthepairsmentionedab ovemustpassthroughthe decisionb oundary sincethepair of samplescorrectly classi eddi erently. The decisionboundarygivenby thenetworkisshowninFigure 3,whereeachofthe

(5)

cal-In orderto evaluatethe e ectivenessof theprop osed metho d another set of exp eriments were p erformed byemployingtheconventionalback-propagation algo-rithmandthedecisionb oundaryisshowninFigure 4. Next,thenetworkwastrainedwith vedi erentinitial weights(weightsforhiddentooutputunitconnection) ,andtheresultissummarizedinTable 1.

Initweight Iteration Iteration Prop osedmetho d StandardBP

Seed1 6345 21279 Seed2 6341 21347 Seed3 6251 21263 Seed4 6179 21050 Seed5 6224 22019 Mean 6268 21392

Table1 Comparisonofresults

Itcanb eseeninTable 1,thattheiterations neces-saryfortheprop osedmetho dislessthan1/3ofthatof standardback-propagation. Incaseof standard back-propagation,thereisnootherwaythantocutandtry for determiningthenumberofhiddenunitsnecessary forsolvingaproblem. Incaseoftheprop osedmetho d, thenumberofhiddenunitisdeterminedautomatically.

5.2 Two-classclassi cationproblem

Exp erimentswerep erformed byemployingthecancer data base. It wasdivided intoten training and ten testsetsandatenfoldcrossvalidationwasp erformed. In orderto evaluatethe e ectivenessof theprop osed metho d another setof exp eriment wasp erformed by applying the standard back-propagation. The num-b er of hidden unit wasset to thenumber of critical p ointsgivenbytheprop osedmetho d. Theaverage ac-curacy rate of the prop osed metho d, standard back-propagationandtheresultsofapplyingBayesdecision (assuming normaldistributionforeachcategory)rule isshowninTable 2. Ontheotherhand,thefollowing

Metho d Accuracy(%) Prop osed 96.8 StandardBP 96.3

Bayes 95.27

Table2 Averageaccuracyrate

linear programming methods for pattern recognition: MultiSurfaceMetho d[18](MSM),RobustLinear Pro-gramming [19] (RLP), and Perturb ed Robust Linear Programming [20] (RLP-P) were also applied on the samedatasetandtheresultsaresummarizedinTable 3.

ItisclearfromTable 3thatoutofthethree meth-o dsRLPPgivestheb estaccuracy of96.6%. Nowif compared to Table. 2, it is clear that theprop osed metho dgivesslightlyb etterresultthanRLPP.Onthe otherhand,thesup eriorityoftheprop osedmethodover

MSM 92.6

RLP 96.4

RLP-P 96.6

Table3 Averageaccuracyrateoflinearprogramming meth-o ds

Bayesdecisionrulesuggeststhattheboundarygivenby theproposedmethodisreliable.

5.3 Multi-classClassi cationproblem

Here the proposed metho d is applied to Devanagari charactersforevaluatingitsapplicabilitytomulti-class classi cationproblem. However, a di erentapproach hasb eentakenforrepresentingtheoutputlayer. The outputlayerco dingmetho disthoroughlydiscussedin [12]. However,fortheb ene tofthereadersabrief de-scriptionofthemethodisgiveninthenextsubsection.

5.3.1 Automaticco dingscheme

Split the 16 X 16 image

into four parts

Cluster each part

5 bit binary number

Feature vector

y

y

y

y

q

q

q

q

C

(n)

Map each element of q to a

l

l

l

l

l

Binary number

encoder

1

2

3

4

1

2

3

4

n

n

n

n

(20−bit)

Fig.5 Automaticco dingpro cedure

One ofthe imp ortantasp ect of information pro-cessingisthewaytheinformationisrepresented. The conventionalapproachof assigningacategorytoeach cellisreasonableinthesensethat,thedistanceamong theco desisconstant. However,thiskindofapproach cannotbetakenforrepresentingalargenumb erof cat-egories. Althoughbinaryrepresentationapp earstobe reasonableinthesensethatitwouldrequirelog

2 nbits forrepresentingncategories; theyarenot wellsuited fornetworkoutputrepresentationsb ecauseofthe inter-dep endencyofcells whichmakethelearning dicult. Forexample,ifwerepresentthenumb ers7and8in bi-naryas0111and1000thenirresp ectiveofthefact thatthenumb ersareadjacentintegers,4outputunits willhavedi erentvalue.Here,insteadofastraight for-ward binarycodingmetho d, a di erentapproachhas b eentakenandthemethodisasfollows. At rsteach trainingdatax

n

(n=1;111;N)(16216pixels)is splitintofourparts(828pixels)inthefollowingway.

x n

(6)

Asaresultthefollowingsetofpatternisgeneratedfor eachparti=1;111;4. Y i =fy 1i ;y 2i ;111;y Ni g Next, eachY i

(i =1, 2, 3,4) isclustered. Herethe LBG method [14] wasapplied for clustering and the numberofclusterwassetto 16. Afterclustering, for each !

l

(l = 1;111;L) the clusterto which the part b elongs ischeckedand the nalclusterisdetermined byconsideringthemaximumnumb erofcharactersb e-longing toaspeci ccluster. So,inthisway,for each category !

l

there will b e a four dimensional vector q l = (q l1 ;q l2 ;q l3 ;q l4

), where eachelementof the vec-tor will corresp ondtotheclustertowhichthe

corre-11011

11010

10111

10110

10011

10001

01111

01110

01101

01100

01011

01001 00111

00110 00011

00010

Fig.6 Co deassignmentpro cedure

sp ondingpartofthetrainingsamplebelongs.Now,each elementof vectorq

l

ismapp edtoa 5-bitbinary pat-tern. Thebinarypatternsarehandpickedandtheyare mapp edinawaysuchthatforanytwoclusterswhose parentisthesame,theEuclideandistanceb etweenthe binarypatternsb ecomesone.Thiswillsomehowensure thatthesimilarpart(q

l

)ofeachcategory! l

wouldget similarcodes. On theotherhand, neglectingthe bi-narypatternscontainingonly0'sor1's,theminimum numberof bitsrequiredtomaintainthisconstraintis 5. Thecodeassignmentpro cedureisshowninFigure 6. Finally,q

l

istransformedtoafeature vectorwhich isa20-bit co de. Inthiswayforeachtrainingsample x

n

(n=1;111;N)therewillb e afeature vector c (n) with 20 elements. (Here(n) representsthe category numberto which x

n

b elongs.) Theentire pro cessis summarizedinFigure 5.

5.3.2 Recognitionpro cess

In the present approach, training pro cess is di-videdintotwophases. Inthe rstphase,thenetwork is trainedwith thetraining data ina usual way. In thesecondphase,thetrainingdataisfedtothe net-workandtheoutputvectorsgivenbythenetworkare employedforformingtheprototypesforeachcategory.

Input(xn)(n=1;:::;N) ? NN ? Output t t R o=(o 1 ;o 2 ;:::;o K ) (K=dimensionofo) Testmo de Learnmo de t 9 m1= 1 N 1 X x2! 1 o m2= 1 N 2 X x2!2 o . . . m M = 1 NM X x2! M o d 1 = K X k=1 jo k 0m 1k j 2 d2= K X k=1 jok0m2kj 2 . . . dM= K X k =1 jok0mMkj 2

Fig.7 Recognitionpro cess dM

3

=mindM=)M3:recognizedcategory

m 1 = 1 N 1 X x2!1 o m 2 = 1 N 2 X x2!2 o . . . m M = 1 N M X x2!M o

Here N and o stand for the number of samples usedforeachcategoryduringtrainingandtheoutput vectorresp ectively.Theseprototypesarelaterusedfor recognition.

During evaluation, Euclidean distance has b een takenamongtheprototypesformedinthesecondphase oftrainingandtheoutputvectorgivenbythenetwork duringtest. Thewholepro cess,fromformationof pro-totyp evectorsto evaluationpro cessissummarized in Figure 7.

Herethetargetoutputsaregeneratedby employ-ingtheprop osedautomaticco dingscheme.Thissolves theproblemofoutputlayerrepresentation. Thenext issues are related to network architecture and initial weights.

Insteadofp erformingtrialanderror,thenumber ofhiddenunitisdeterminedbyemployingthe knowl-edgeofcriticalp oints.Inthiscasethestraightforward approachofsetting thenumberofhidden unittothe numberofcalculatedcriticalpointspairscouldnotb e appliedduetotheenormousnumberofcriticalp oints. Therefore,thepair ofcriticalp oints aresortedbased onthedistanceb etweeneachpair. Next, thenumber ofhiddenunitisdeterminedby xingathresholdvalue d

m

axoverthedistance. Inthiswayif d m

axsettox resultsinuhiddenunits,then rstucriticalpointsare employedforcalculatingtheinitialweightsandbiases.

(7)

inthesense thatthepatternsthat staycloseto each otherintheinputspacewouldlargelye ectthe learn-ingpro cess.

5.3.3 Results

Therelationofdmaxandthenumberofhidden units isgiveninTable.4.

dmax 2.0 2.2 2.4 2.5 3.0 #Hiddenunits 35 46 57 61 76 Table4 Relationofdmaxand#hiddenunit

0.50

1.00

1.50

2.00

2.50

3.00

3.50

# Hidden unit

Method1

Method2

Training epoch (X 10 )

3

35

46

57

61

76

Fig.8 Trainingep o ch(Metho d1VsMetho d2)

97.5

98

98.5

99

99.5

100

35

40

45

50

55

60

65

70

75

80

C

l

ass

ifi

ca

ti

on ra

t

e

(%)

"training-error-method1"

"training-error-method2"

Fig. 9 Classi cation rate (Training data : Metho d1 Vs Metho d2)

Theperformanceof theprop osedautomaticco d-ing pro cedurecould haveb eencompared tothe con-ventionaloutputlayerrepresentationwhereeachofthe output unitrepresentsa singlecategory,howeverthis

92.5

93

93.5

94

94.5

95

95.5

96

96.5

97

35

40

45

50

55

60

65

70

75

80

C

l

ass

ifi

ca

ti

on ra

t

e

(%)

#

idd

i

"test-error-method1"

"test-error-method2"

Fig. 10 Classi cation rate (Testingdata : Metho d1 Vs Metho d2)

if thereal worldproblemswhere thenumberof cate-gorytob erecognizedistoolarge(forexampleKanji) isconsideredthenitisunrealistictoapplythe conven-tionalmetho d that would requiremuchmorestorage and/memorythantheprop osedmetho d.

In order to evaluate the e ect of the prop osed metho d, one set of exp eriments were p erformed by employingonlyautomaticco dingpro cedure(Metho d1), andanothersetofexp erimentswereperformedby em-ployingb othautomaticco ding and automatic weight initializationpro cedure(Metho d2). However,inb othof thecases,thelearningparameterswerekeptthesame. Theexp erimentaloutcomesareshowninFigure. 8,9, and10.

ItisclearfromFigure. 8,thatthenumberof train-ingep o chrequiredforthecombinationoftheprop osed metho dsislessthanthecasewhereonlyautomaticco d-ingprocedureisemployed.

In Figure 9, 10 it can be seen that the com-bined metho d results in b etter p erformance against b othtrainingandtestingdata. Nowb othofMetho d1 andMethod2gavesamep erformancefortrainingdata with57hiddenunits(Figure. 9). However,compared toMetho d1,Metho d2gaveb etterp erformancewith re-sp ecttotestingdata(Figure. 10).Thisassuresthatthe combinedmetho d(Metho d2)producesbettersolution. Therefore,theresults indicatethat theinitialweights aree ectiveforfastertrainingandbettersolution.

6. Conclusion

It has b een successfully shown through exp eriments that thea priori relatedto decisionboundarycanb e employedfordetermining theinitial weightsofa net-work.Comparedtostandardback-propagationthe pro-p osedmetho dreducestrainingtime. Themethodhas

(8)

cer databaseand Devanagaricharacters. Themetho d alsodeterminesthenumberofhiddenunits automati-cally.

However, inthepresentstage thepairof critical p oints areselectedbasedon a thresholdoverthe dis-tanceb etweenthepairofcriticalp oints,whichcanb e consideredasalo calapproach.Hence,someother cri-terionforselectingthepairofcriticalp ointsincaseof complexclassb oundaryistob efurtherinvestigated.

Acknowledgement

TheauthorwouldliketothanktheEducationCenter for InformationPro cessing(ECIP)ofTohoku Univer-sityfor providingtheDevanagaridatabase. Theyare also gratefulto MasayukiKimura, SusumuHoriguchi and Kazunori Kotaniof Japan AdvancedInstituteof ScienceandTechnologyfortheirvaluablecomments.

References

[1] Rumelhart, McClelland, and the PDP Research Group, \ParallelDistributedPro cessing,"TheMITPress,1989. [2] Rosenblatt,F:TwoTheoremsofStatisticalSeparabilityin

thePerceptron;ProceedingsofaSymposiumonthe Mecha-nizationofThoughtProcess,HerMajesty'sStationary Of- ce,London,1959,421-456.

[3] Widrow, B., and Ho , M.E: Adaptive Switching Cir-cuits;InstituteofRadioEngineers,WesternElectronicShow andConvention,ConventionRecord,part4,1960,96-104. [4] DavidH.Kil, Frances B.Shin,\Patternrecognitionand

Predictionwith applicationstoSignalCharacterization," AIPPRESS,1996,pp.134-138.

[5] Riedmiller,MartinandBraun:RPROP-AfastAdaptive learningalgorithm;Technicalrep ortUniversitatKarlsruhe, 1992.

[6] Y.Riedmiller,MartinandBraun:RPROP-Afast Adap-tivelearningalgorithm;Technicalrep ortUniversitat Karl-sruhe,1992.

[7] K. Hara and K. Nakayama: Training Data Selection Metho dforNeural Networks;IEICETrans.ofInf.Syst. SocietyofJapan,Fundamentals,Vol.E81-A,No.3,pp. 374-381,March1998.

[8] M.C.Mozer,P.Smolensky:Skelitonization:Atechnique fortrimmingthefatfromanetworkviarelevance assess-ment; inAdvancesin neural informationprocessing sys-tems,1,pp.107-115,1989.

[9] J.SietsmaandDow: Neuralnetworkpruning-whyand how;Proceedingofthesecondinternationalconferenceon neuralnetworks,pp.326-333,July1988.

[10]Fahlman,E.Scott:Anempiricalstudyoflearningsp eedin backpropagationnetworks;Technicalrep ort CMU-CS-88-162,1988.

[11]T.Ash: Dynamicno decreationinbackpropagation net-works;ConnectionScience,1(4),pp.365-375,1989. [12]H. Kernand BunyiuNanjio: Saddharmapundarika,

ST.-PETERSBOURG,1912.

[13]K.Keeni,H.Shimo daira,T.Nishino,Y.Tan\Recognition ofDevanagariCharactersUsingNeuralNetworks," Trans-actionofInformationandSystemsSo cietyofJapan,IEICE TRANS.INF.&SYST.,VOL.E79-D,No.5,May1996. [14]Y.Linde,A.Buzo,andR.M.Gray:Analgorithmfor

Fig. 1 Critical p oints and decision b oundary
Fig. 3 Decision b oundary given by the prop osed metho d
Table 3 Average accuracy rate of linear programming meth-
Fig. 7 Recognition pro cess
+2

参照

関連したドキュメント

An easy-to-use procedure is presented for improving the ε-constraint method for computing the efficient frontier of the portfolio selection problem endowed with additional cardinality

I give a proof of the theorem over any separably closed field F using ℓ-adic perverse sheaves.. My proof is different from the one of Mirkovi´c

In this paper, under some conditions, we show that the so- lution of a semidiscrete form of a nonlocal parabolic problem quenches in a finite time and estimate its semidiscrete

Keywords: continuous time random walk, Brownian motion, collision time, skew Young tableaux, tandem queue.. AMS 2000 Subject Classification: Primary:

By considering the p-laplacian operator, we show the existence of a solution to the exterior (resp interior) free boundary problem with non constant Bernoulli free boundary

Since the boundary integral equation is Fredholm, the solvability theorem follows from the uniqueness theorem, which is ensured for the Neumann problem in the case of the

Then it follows immediately from a suitable version of “Hensel’s Lemma” [cf., e.g., the argument of [4], Lemma 2.1] that S may be obtained, as the notation suggests, as the m A

This paper presents an investigation into the mechanics of this specific problem and develops an analytical approach that accounts for the effects of geometrical and material data on