An Ecient Learning Scheme for Pattern Classication
Us-ing Multi-layer Feed-forward Neural Networks
33
KanadKeeni y
,KenjiNakayama yy
,and HiroshiShimo daira yyy
,
SUMMARY Thisstudyhighlightsonthesubjectofweight initializationinmulti-layerfeed-forwardnetworks.Trainingdata isanalyzedandthenotionofcriticalpointisintroducedfor de-terminingtheinitialweightsandthenumberofhiddenunits. Theprop osedmetho dhasb eenappliedtoarticialdata, pub-liclyavailablecancerdatabaseandDevanagaricharacters. The exp erimental resultsofarticialdata showthatthe prop osed metho dtakesalmost1/3ofthetrainingtimerequiredfor stan-dardbackpropagation.Inordertoverifytheeectivenessofthe prop osedmetho d,standardbackpropagationwherethelearning startswithrandominitialweightshasb eenappliedtothecancer databaseandDevanagaricharacters. Theexp erimentalresults indicatethetheprop osedweightinitializationmetho dresultsin b ettergeneralization.
key words: Neural networks, pattern classication, initial weights,decisionboundary,criticalpoints
1. Intro duction
Neuralnetworksarchitectureshavesparkedofgreat in-terestinrecentyearsb ecauseoftheirintriguing learn-ingcapabilities.Severallearningalgorithmshaveb een developed for trainingthe networksand out of them Back Propagation [1] is probably most widely used. The reason for thepopularity isthe underlying sim-plicityandrelativep owerofthealgorithm. Itsp ower derivesfromthefactthatunlikeitsprecursors,thep er-ceptionlearningrule[2],andtheWidrow-Holearning rule[3],itcanb eemployedfortrainingnonlinear net-works of arbitrary connectivity. Since suchnetworks are often required for real-worldapplications, such a learning pro cedure iscritical. However thestandard back-propagationhasthefollowingdrawbacks.
1. Thelearningpro cedureistimeconsuming. Train-ingalwaysstartsfromrandominitialweightsand theweightadjustmentpro cedureisveryslow
2. Itisnotclearastohowtoconstructanappropriate networkarchitecture.Thenumberofhiddenunits
y
TheauthoriswiththeFacultyofInformationSystems andQuantitativeSciencesofNanzanUniversity
yy
TheauthoriswiththeFacultyElec. Eng.andComp. ScienceofKanazawaUniversity
yyy
TheauthoriswiththeFacultyofInformationScience ofJapanAdvancedInstituteofScienceandTechnology
33
Apartofthispap erwaspresentedattheInternational JointConferenceonNeuralNetworks(IJCNN'99)and In-ternational ConferenceonComputationalIntelligenceand
for a particular problem can onlyb edetermined bytrialanderror
In caseof 1., it iswidely known that theinitial weightsofthenetworkwouldlargelyeectthe general-izationp erformance. Forexample,twonetworkswith samearchitecture,whentrainedwithtotally dierent initialweightswouldpro ducedierentresults. Several researchershavedesignedsystemsinwhichweightsare initializedsothattheinitialactivityofthenetwork cor-resp ondstothesuccessiverules,thatmaycomefroman exp ert. Wilsonhasprop osedFastBPN[4],wherethe initialparametersaredeterminedbyestimatingthe sig-nalrankwithgenerallikelihoo dratiotest(GLRT)and thesingularvaluedecomp osition(SVD)oftheGLRT covariancematrix. Howeverthedisadvantageof their metho disthatthenumberofhiddenunitscannot ex-ceedtheinputfeaturedimension. Inordertotackle1, mostoftheresearchershavemainlyfo cusedon improv-ingtheoptimizationpro cedurebydynamically adapt-ingthelearningrates[5]-[6]. Ontheotherhandithas b eenshownin [7]that trainingdataselection isalso imp ortant.
Incaseof2.,theproblemhasb eentreatedin var-ious ways. One mostcommon approachhas b eento start with a large number of hidden units and then prunethenetworkonceitistrained[8],[9]. However, pruningdo es not alwaysimprovegeneralization. An-other strategyfor nding a minimal architecture has b eentoaddorremoveunitssequentially[10],[11].
Itisalsowellknownthatneuralnetworksdonot make anyassumption ab out theprobability distribu-tion functions of data and can solve complex prob-lems witharbitrary decisionb oundary. Therefore, it isdesirabletoinvestigatethelearningcharacteristicsof thenetworksforndinganestimateaboutthedecision b oundary. However,there hasb een nosignicant re-searchforndinginitialweightsand minimalnetwork architecturebyexploitingthecharacteristicsofdecision b oundary.
Inthepresentstudy,theab ovementioned draw-backsof backpropagation has b een carefully investi-gatedandametho dhasb eenprop osedfordetermining theinitialweightsforinputtohiddenlayerand estima-tionofhiddenunitsautomatically.
Multi-layerNeuralNetworks(MLNN). Thethird sec-tionpresentstheautomaticmetho dforgenerating ini-tialweightsforinputtohiddenlayerconnections.The fourth section describ es the database. Exp erimental resultsofarticial,realworlddataareprovidedinthe Fifthsection.Finallythelastsectionisdevotedto con-clusionandfurtherresearches.
2. Pattern classication characteristics of MLNN
Inanypatternclassicationsystem, patternmapping orpatternclassicationisequivalenttodividinganN dimensionalspacewhere thepatterns aredistributed. Incaseofmulti-layerneuralnetworks(MLNN),thisN dimensional spaceisdividedbyforminghyper-planes with the help of synaptic weights of nonlinear neu-rons. MLNNsdonotmakeanyassumptionab outthe probabilitydistribution functionsof thetrainingdata and can solvecomplex problems witharbitrary deci-sion b oundary. The degreeof freedom inplacingthe decisionb oundaryisveryhigh. Therefore,neural net-works areconsidered to b ea go o dchoicefor pattern classicationtasks.
Anothermostimp ortantasp ectofneuralnetworks islearn-ability.Incaseofsup ervisedlearning,networks can nd optimal synaptic weights through learning. However,sinceneuralnetworksarenonlinearsystems andgradientdescentisusedtondasetofweightsthat optimizethep erformanceforaparticulartask,thereis alwaysa possibilityof getting stuckinlo cal minima. Therefore,globalminimaoroptimalsolutionisnot al-waysguaranteed. Furthermore,thelearningprocessis timeconsuminganditishighlydep endentonthe prob-lemthatistob esolved.
Ifweassumethattherearenooverlapamongthe distributionoftrainingpatterns,thenpatternmapping canb ecategorizedinthefollowingclasses.
1. kXi0Xjkissmall V kY i 0Yjkissmall 2. kX i 0X j kissmall V kY i 0Y j kislarge 3. kX i 0X j kislarge V kY i 0Y j kissmall 4. kX i 0X j kislarge V kY i 0Y j kislarge Here,X i and X j b elong toclass! 1 and! 2 ,Y i andY j
arethecorresp ondingoutputvectors,andk1k standsfortheEuclideannorm. Incaseof1.,the prob-lemistomapsimilarinputvectorsinawaysuchthat thecorresp ondingoutputvectorsalsob ecomesimilar. In thesecond case, theinput vectorsare similarbut theyaretob emapp edasdierentpatternsinthe out-putspace. Thethirdcaseimpliesthattheinput pat-ternsthatarefarfromeachotherintheinputspaceare tob emapp edassimilarpatternsintheoutputspace. Finally,the4thcasemeansthattheinputpatternsare
farfromeachotherintheinputspaceandtheyareto b e mapp edas dierentpatterns intheoutput space. Now,thepatternmappingof1.,3.,and4. arenotthat dicult.However,incaseof2.,theproblemistomap thepatternsthatareverycloseintheinputspace,as dierentpatternsintheoutputspace.Inthiscaseeven thoughthesolutionexists,duetothedicultyofthe problemthetrainingprocesswouldb etimeconsuming. Therefore,thesecondtypeofpatternmappingresults inveryslowlearningandthep ossibilityofarrivingat alocalsolutionisveryhigh.
For example, if we dene the connection weight fromthei'thinputtothej'thhiddenunitasw
ij then thetotalinputandoutputofthej'thhiddenunitcan b edenedasfollows. net j = n X i=1 w ij x i + j (1) O j =(net j ) (2) (netj)= 1 1+e 0netj (3)
where, (1)istheactivationfunctionand j
isthe bias. Atthesametimethetotalinputtothek'th out-putunitandthecorresp ondingoutputcanb edened asfollows. net k = j X j =1 w jk O i + k (4) Ok=(netk) (5)
where, (1) isthe sameactivation functionas it was withthehiddenlayer.
As mentionedearlierthe similarpatterns playa criticalroleinlearning.Supp osewehavetraining pat-ternsx
1n andx
2n
thatareverycloseintheinputspace andthepatternsb elongtotheclass!
1 and!
2 resp ec-tively. Inthiscase,thenetworkoutputwouldb ecome extremelysensitive.Thisisb ecausethenetworkoutput mustchangerapidlyforasmallchangeintheinput.
Now,ifthedecisionb oundaryisfarfromthe pat-terns x
1n and x
2n
, then the corresp onding outputs wouldhavethevalueO
1n = O 2n = 0 or1. However, duringthelearning pro cess,as thedecisionb oundary approachesx
1n andx
2n
theoutputofthecorresp ond-ingpatternsapproachthesamevalue,andthelearning pro cess b ecomesextremely slow. Inthiscase, as the decisionb oundary moves closeto the pair x
1n ,x
2n or entersthe regionb etween thepair, theamountof weightcorrectionb ecomesextremelysmall.Tob e spe-cic,ifweassumeO = O =
amountofcorrectionforthen'thpattern1 n wouldb e asfollows. 1 n = n O nj ; (6) n =(t n 0O n ) f(net n ); (7) where,O nj
istheoutputofthej'thhiddenunit. Now, asthepatternsx
1n andx
2n
aresimilar,theoutputof thejthhiddenunitwouldalsob ecomesimilar,thatis
O 1nj =O 2nj ; (8) and f(net 1n ) = f(net 2n ): (9)
Inthiscase,theweightcorrectionwillb easfollows.
11n+12 n = O1nj((t1n0O1n) +(t 2n 0O 2n )) f(net 1n ): (10)
Ifitisassumedthatthetargetsofthepatternsare
t 1n
=1;t 2n
=0; (11)
andtheoutputofthepatternsare
O1n=z ;O2 n
=z0; (12)
thentheweightcorrectionwouldb ecomeasfollows.
1 1n +1 2n = O 1nj ((t 1n 0z) +(t 2n 0(z0))) f(net 1n ) =((t 1n +t 2n )02z+) f(net 1n ) =(102z+) f(net 1n ) (13)
Nowatb eginningoftraining,thedecisionb ound-ary wouldb e far fromx
1n and x
2n
and inthat case thecorrectionofsynapticweightswouldnotb esmall. However, duringthe trainingpro cess,as thedecision b oundarymovestowardsx
1n andx
2n
,becauseofthe similarityof the patternstheoutput wouldapproach thesamevalue. Themostcriticalsituationwouldtake placeaszandkkapproachthevalue0.5and0resp ec-tively.Thatis,
11n+12 n = lim z !0:5 lim !0 (102z+) f(net 1n ) = 0 (14)
Therefore,thecorrectionofweightsforthese pat-ternswouldb ecomeverysmallandasaresultthe learn-ingpro cesswouldb ecomeextremelyslow.
Ontheotherhand,ifthepatternsx 1n
andx 2n
are farfromeachotherintheinputspace,evenifthe de-cisionb oundarymovestowardsthemtheactivationof thecorresp ondingoutputswouldnotbecomethesame atthesametime. Hence,theweightcorrectionwillnot
Critical point
Decision boundary
Class2
Fig.1 Criticalp ointsanddecisionb oundary
3. Estimationofdecisionb oundary
Here,thedecisionruleistoselecttheclass corre-sp ondingtotheoutputneuronwiththelargestoutput. Forthesakeofsimplicity,thenumberofoutputunitis settotwo(two-classclassicationproblem). However, theconcept can b e hop efully extended tomulti-class classication problems. The decisionb oundary for a multi-layerfeed-forwardnetworkisdenedasfollows.
Denition1. Thedecisionb oundarybetweentwo classesinafeed-forwardneuralnetworksisthelo cusof p ointswhere b othoftheoutputneuronsproduce the sameactivation.
If wedenetheactivationoutputunitiasO i
(x) where x isan input vectorand let d(x) = O
1 (x)0 O
2
(x),thenthedecisionb oundarycanb edenedas
fxjd(x)=0g (15)
Next,thenotionofCriticalpointsisintroducedas follows.
Denition2. Thesetofcriticalp ointscontainspairsof datathatsatisfythefollowingcondition:
min k (d(p i ;q k ))=d(p i ;q j ); (16) min k (d(p k ;q j ))=d(p i ;q j ) (17) whered(p i ;q k
)denotestheEuclideandistanceb etween thevectorp
i andq
k .
Ifwedenotethesamplesinclass! 1 asp i and sam-plesinclass!2 asq j
thenforeachsampleinclass!1 andclass!2,thesetofcriticalp ointsCcanb edened as C=f(p i ;q j )jmin k (d(p i ;q k ))=d(p i ;q j ); min k (d(p k ;q j ))=d(p i ;q j );p i 2! 1 ;q j 2! 2 g (18)
Ahyper-planehastobeplacedinb etweenthepair ofcriticalp oints. Iftheco ordinateofthepairof criti-calp oints(p i ;q k )are(x i ;y k )and(u i ;v k
)thentheideal hyp erplanemustgothroughthepoint
(xi+ui) 2
; (yk+vk)
2 and theslop ez ofthe straightlinecanb e calculated fromthefollowingequation.
v k 0y k u i 0x i 2z=01 (19)
In thepresent approachinsteadof starting from scratchtheinitialweightsforthehiddenunits
connec-Since,theweightvectorsareorthogonaltothe sep-aratinghyp er-plane,theinitialweightsaregeneratedin thefollowingway. First,thepairofcriticalp ointsare determinedfromthetrainingdataasmentionedab ove. Next forall pair of criticalpoints(p
i ;q k )theweight vectorsm n
aregeneratedbythefollowingequation:
m n = p i 0q k kp i 0q i k (20)
andthebiases n
aregeneratedbythefollowing equa-tion: n =0m t n 1P=0 (p i 0q k ) t kp i 0q k k 1 (p i +q k ) 2 (21) 4. Databse
ga
u
a
0
kha
gha ca cha pa pha ba
bha
ta
da
dha na ta
tha da dha na
ya
ra
la
va
sa
sa sa
ha
kta
pta
ru
tra sra sra sta dhra
nja
tta
i
u
r
e
,
h
end
1
2
3
4
5
6
7
8
9
dva dbha
ddha
tta
pra
dba
ka
ja
ma
ksa
stha
a
0
Fig.2 SomeofthebasicDevanagaricharacters
Theprop osedmetho dhasb eenappliedtothe can-cerdatabaseobtainedfromtheUniversityofCalifornia Machine Learning Rep ository (two-class classication problem) andDevanagaricharacters(multi-class clas-sication problem). The cancer database is publicly availableand it contains699 instanceseachhaving 9 attributes. TheDevanagaridatabaseconsistsof44 dif-ferentcharacters(23consonants,13c-ccombinations, 3 vowels,2 sp ecialcharactersand3 numerals). Sixty seven Devanagari characterstakenfromthetext [12] are showninFigure. 2. Thecharactersarestoredin the formof162 16 oatingp ointimages. Adetails ab outtheDevanagaridatabsecanb efoundin[13].
5. Exp eriments
5.1 Exp erimentswitharticialdata
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Decision Boundary
"class1.output"
"class2.output"
"class1.orgdat"
"class2.orgdat"
"01.dat"
"10.dat"
Fig.3 Decisionb oundarygivenbytheprop osedmetho d
Inthepresentapproachthenumberofhiddenunits iskeptthesameas thenumb er ofcriticalp oints cal-culatedfromthetrainingdata. Twodimensionaldata isusedfortrainingandtesting. Fivesamplesforeach class(inthiscase2classes)wererandomlygenerated for training. The network had two input units, two outputunitsandthenumberofhiddenunitwassetto 3. Trainingwascontinueduntilthemeansquareerror reach0.001. Fortesting,10000sampleswererandomly generated and the class to which the testing sample fallsisdecidedbyconsideringthemaximumactivation oftheoutputunits. Thenetworkcouldlearnthe train-ingdatawith3hiddenunits.Thedecisionboundaryis estimatedfromtheoutputactivationofthenetworkin resp ecttothetestingsamplesasfollows.
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Decision Boundary
"class1.output"
"class2.output"
"class1.orgdat"
"class2.orgdat"
"01.dat"
"10.dat"
Fig.4 Decisionb oundarygivenbytheConventional Back-propagation
Foreachtestingsamplecorrectlyclassiedasclass !
1
,ndthenearesttestingsamplecorrectlyclassiedas class!
2
. Thesamepro cessisrep eatedforthetesting samples classiedas class!
2
. Nowthe line connect-ingthepairsmentionedab ovemustpassthroughthe decisionb oundary sincethepair of samplescorrectly classieddierently. The decisionboundarygivenby thenetworkisshowninFigure 3,whereeachofthe
cal-In orderto evaluatethe eectivenessof theprop osed metho d another set of exp eriments were p erformed byemployingtheconventionalback-propagation algo-rithmandthedecisionb oundaryisshowninFigure 4. Next,thenetworkwastrainedwithvedierentinitial weights(weightsforhiddentooutputunitconnection) ,andtheresultissummarizedinTable 1.
Initweight Iteration Iteration Prop osedmetho d StandardBP
Seed1 6345 21279 Seed2 6341 21347 Seed3 6251 21263 Seed4 6179 21050 Seed5 6224 22019 Mean 6268 21392
Table1 Comparisonofresults
Itcanb eseeninTable 1,thattheiterations neces-saryfortheprop osedmetho dislessthan1/3ofthatof standardback-propagation. Incaseof standard back-propagation,thereisnootherwaythantocutandtry for determiningthenumberofhiddenunitsnecessary forsolvingaproblem. Incaseoftheprop osedmetho d, thenumberofhiddenunitisdeterminedautomatically.
5.2 Two-classclassicationproblem
Exp erimentswerep erformed byemployingthecancer data base. It wasdivided intoten training and ten testsetsandatenfoldcrossvalidationwasp erformed. In orderto evaluatethe eectivenessof theprop osed metho d another setof exp eriment wasp erformed by applying the standard back-propagation. The num-b er of hidden unit wasset to thenumber of critical p ointsgivenbytheprop osedmetho d. Theaverage ac-curacy rate of the prop osed metho d, standard back-propagationandtheresultsofapplyingBayesdecision (assuming normaldistributionforeachcategory)rule isshowninTable 2. Ontheotherhand,thefollowing
Metho d Accuracy(%) Prop osed 96.8 StandardBP 96.3
Bayes 95.27
Table2 Averageaccuracyrate
linear programming methods for pattern recognition: MultiSurfaceMetho d[18](MSM),RobustLinear Pro-gramming [19] (RLP), and Perturb ed Robust Linear Programming [20] (RLP-P) were also applied on the samedatasetandtheresultsaresummarizedinTable 3.
ItisclearfromTable 3thatoutofthethree meth-o dsRLPPgivestheb estaccuracy of96.6%. Nowif compared to Table. 2, it is clear that theprop osed metho dgivesslightlyb etterresultthanRLPP.Onthe otherhand,thesup eriorityoftheprop osedmethodover
MSM 92.6
RLP 96.4
RLP-P 96.6
Table3 Averageaccuracyrateoflinearprogramming meth-o ds
Bayesdecisionrulesuggeststhattheboundarygivenby theproposedmethodisreliable.
5.3 Multi-classClassicationproblem
Here the proposed metho d is applied to Devanagari charactersforevaluatingitsapplicabilitytomulti-class classicationproblem. However, a dierentapproach hasb eentakenforrepresentingtheoutputlayer. The outputlayerco dingmetho disthoroughlydiscussedin [12]. However,fortheb enetofthereadersabrief de-scriptionofthemethodisgiveninthenextsubsection.
5.3.1 Automaticco dingscheme
Split the 16 X 16 image
into four parts
Cluster each part
5 bit binary number
Feature vector
y
y
y
y
q
q
q
q
C
(n)
Map each element of q to a
l
l
l
l
l
Binary number
encoder
1
2
3
4
1
2
3
4
n
n
n
n
(20−bit)
Fig.5 Automaticco dingpro cedure
One ofthe imp ortantasp ect of information pro-cessingisthewaytheinformationisrepresented. The conventionalapproachof assigningacategorytoeach cellisreasonableinthesensethat,thedistanceamong theco desisconstant. However,thiskindofapproach cannotbetakenforrepresentingalargenumb erof cat-egories. Althoughbinaryrepresentationapp earstobe reasonableinthesensethatitwouldrequirelog
2 nbits forrepresentingncategories; theyarenot wellsuited fornetworkoutputrepresentationsb ecauseofthe inter-dep endencyofcells whichmakethelearning dicult. Forexample,ifwerepresentthenumb ers7and8in bi-naryas0111and1000thenirresp ectiveofthefact thatthenumb ersareadjacentintegers,4outputunits willhavedierentvalue.Here,insteadofastraight for-ward binarycodingmetho d, a dierentapproachhas b eentakenandthemethodisasfollows. Atrsteach trainingdatax
n
(n=1;111;N)(16216pixels)is splitintofourparts(828pixels)inthefollowingway.
x n
Asaresultthefollowingsetofpatternisgeneratedfor eachparti=1;111;4. Y i =fy 1i ;y 2i ;111;y Ni g Next, eachY i
(i =1, 2, 3,4) isclustered. Herethe LBG method [14] wasapplied for clustering and the numberofclusterwassetto 16. Afterclustering, for each !
l
(l = 1;111;L) the clusterto which the part b elongs ischeckedand thenalclusterisdetermined byconsideringthemaximumnumb erofcharactersb e-longing toaspeciccluster. So,inthisway,for each category !
l
there will b e a four dimensional vector q l = (q l1 ;q l2 ;q l3 ;q l4
), where eachelementof the vec-tor will corresp ondtotheclustertowhichthe
corre-11011
11010
10111
10110
10011
10001
01111
01110
01101
01100
01011
01001 00111
00110 00011
00010
Fig.6 Co deassignmentpro cedure
sp ondingpartofthetrainingsamplebelongs.Now,each elementof vectorq
l
ismapp edtoa 5-bitbinary pat-tern. Thebinarypatternsarehandpickedandtheyare mapp edinawaysuchthatforanytwoclusterswhose parentisthesame,theEuclideandistanceb etweenthe binarypatternsb ecomesone.Thiswillsomehowensure thatthesimilarpart(q
l
)ofeachcategory! l
wouldget similarcodes. On theotherhand, neglectingthe bi-narypatternscontainingonly0'sor1's,theminimum numberof bitsrequiredtomaintainthisconstraintis 5. Thecodeassignmentpro cedureisshowninFigure 6. Finally,q
l
istransformedtoafeature vectorwhich isa20-bit co de. Inthiswayforeachtrainingsample x
n
(n=1;111;N)therewillb e afeature vector c (n) with 20 elements. (Here(n) representsthe category numberto which x
n
b elongs.) Theentire pro cessis summarizedinFigure 5.
5.3.2 Recognitionpro cess
In the present approach, training pro cess is di-videdintotwophases. Intherstphase,thenetwork is trainedwith thetraining data ina usual way. In thesecondphase,thetrainingdataisfedtothe net-workandtheoutputvectorsgivenbythenetworkare employedforformingtheprototypesforeachcategory.
Input(xn)(n=1;:::;N) ? NN ? Output t t R o=(o 1 ;o 2 ;:::;o K ) (K=dimensionofo) Testmo de Learnmo de t 9 m1= 1 N 1 X x2! 1 o m2= 1 N 2 X x2!2 o . . . m M = 1 NM X x2! M o d 1 = K X k=1 jo k 0m 1k j 2 d2= K X k=1 jok0m2kj 2 . . . dM= K X k =1 jok0mMkj 2
Fig.7 Recognitionpro cess dM
3
=mindM=)M3:recognizedcategory
m 1 = 1 N 1 X x2!1 o m 2 = 1 N 2 X x2!2 o . . . m M = 1 N M X x2!M o
Here N and o stand for the number of samples usedforeachcategoryduringtrainingandtheoutput vectorresp ectively.Theseprototypesarelaterusedfor recognition.
During evaluation, Euclidean distance has b een takenamongtheprototypesformedinthesecondphase oftrainingandtheoutputvectorgivenbythenetwork duringtest. Thewholepro cess,fromformationof pro-totyp evectorsto evaluationpro cessissummarized in Figure 7.
Herethetargetoutputsaregeneratedby employ-ingtheprop osedautomaticco dingscheme.Thissolves theproblemofoutputlayerrepresentation. Thenext issues are related to network architecture and initial weights.
Insteadofp erformingtrialanderror,thenumber ofhiddenunitisdeterminedbyemployingthe knowl-edgeofcriticalp oints.Inthiscasethestraightforward approachofsetting thenumberofhidden unittothe numberofcalculatedcriticalpointspairscouldnotb e appliedduetotheenormousnumberofcriticalp oints. Therefore,thepair ofcriticalp oints aresortedbased onthedistanceb etweeneachpair. Next, thenumber ofhiddenunitisdeterminedbyxingathresholdvalue d
m
axoverthedistance. Inthiswayif d m
axsettox resultsinuhiddenunits,thenrstucriticalpointsare employedforcalculatingtheinitialweightsandbiases.
inthesense thatthepatternsthat staycloseto each otherintheinputspacewouldlargelyeectthe learn-ingpro cess.
5.3.3 Results
Therelationofdmaxandthenumberofhidden units isgiveninTable.4.
dmax 2.0 2.2 2.4 2.5 3.0 #Hiddenunits 35 46 57 61 76 Table4 Relationofdmaxand#hiddenunit
0.50
1.00
1.50
2.00
2.50
3.00
3.50
# Hidden unit
Method1
Method2
Training epoch (X 10 )
3
35
46
57
61
76
Fig.8 Trainingep o ch(Metho d1VsMetho d2)
97.5
98
98.5
99
99.5
100
35
40
45
50
55
60
65
70
75
80
C
l
ass
ifi
ca
ti
on ra
t
e
(%)
"training-error-method1"
"training-error-method2"
Fig. 9 Classication rate (Training data : Metho d1 Vs Metho d2)
Theperformanceof theprop osedautomaticco d-ing pro cedurecould haveb eencompared tothe con-ventionaloutputlayerrepresentationwhereeachofthe output unitrepresentsa singlecategory,howeverthis
92.5
93
93.5
94
94.5
95
95.5
96
96.5
97
35
40
45
50
55
60
65
70
75
80
C
l
ass
ifi
ca
ti
on ra
t
e
(%)
#
idd
i
"test-error-method1"
"test-error-method2"
Fig. 10 Classication rate (Testingdata : Metho d1 Vs Metho d2)
if thereal worldproblemswhere thenumberof cate-gorytob erecognizedistoolarge(forexampleKanji) isconsideredthenitisunrealistictoapplythe conven-tionalmetho d that would requiremuchmorestorage and/memorythantheprop osedmetho d.
In order to evaluate the eect of the prop osed metho d, one set of exp eriments were p erformed by employingonlyautomaticco dingpro cedure(Metho d1), andanothersetofexp erimentswereperformedby em-ployingb othautomaticco ding and automatic weight initializationpro cedure(Metho d2). However,inb othof thecases,thelearningparameterswerekeptthesame. Theexp erimentaloutcomesareshowninFigure. 8,9, and10.
ItisclearfromFigure. 8,thatthenumberof train-ingep o chrequiredforthecombinationoftheprop osed metho dsislessthanthecasewhereonlyautomaticco d-ingprocedureisemployed.
In Figure 9, 10 it can be seen that the com-bined metho d results in b etter p erformance against b othtrainingandtestingdata. Nowb othofMetho d1 andMethod2gavesamep erformancefortrainingdata with57hiddenunits(Figure. 9). However,compared toMetho d1,Metho d2gaveb etterp erformancewith re-sp ecttotestingdata(Figure. 10).Thisassuresthatthe combinedmetho d(Metho d2)producesbettersolution. Therefore,theresults indicatethat theinitialweights areeectiveforfastertrainingandbettersolution.
6. Conclusion
It has b een successfully shown through exp eriments that thea priori relatedto decisionboundarycanb e employedfordetermining theinitial weightsofa net-work.Comparedtostandardback-propagationthe pro-p osedmetho dreducestrainingtime. Themethodhas
cer databaseand Devanagaricharacters. Themetho d alsodeterminesthenumberofhiddenunits automati-cally.
However, inthepresentstage thepairof critical p oints areselectedbasedon a thresholdoverthe dis-tanceb etweenthepairofcriticalp oints,whichcanb e consideredasalo calapproach.Hence,someother cri-terionforselectingthepairofcriticalp ointsincaseof complexclassb oundaryistob efurtherinvestigated.
Acknowledgement
TheauthorwouldliketothanktheEducationCenter for InformationPro cessing(ECIP)ofTohoku Univer-sityfor providingtheDevanagaridatabase. Theyare also gratefulto MasayukiKimura, SusumuHoriguchi and Kazunori Kotaniof Japan AdvancedInstituteof ScienceandTechnologyfortheirvaluablecomments.
References
[1] Rumelhart, McClelland, and the PDP Research Group, \ParallelDistributedPro cessing,"TheMITPress,1989. [2] Rosenblatt,F:TwoTheoremsofStatisticalSeparabilityin
thePerceptron;ProceedingsofaSymposiumonthe Mecha-nizationofThoughtProcess,HerMajesty'sStationary Of-ce,London,1959,421-456.
[3] Widrow, B., and Ho, M.E: Adaptive Switching Cir-cuits;InstituteofRadioEngineers,WesternElectronicShow andConvention,ConventionRecord,part4,1960,96-104. [4] DavidH.Kil, Frances B.Shin,\Patternrecognitionand
Predictionwith applicationstoSignalCharacterization," AIPPRESS,1996,pp.134-138.
[5] Riedmiller,MartinandBraun:RPROP-AfastAdaptive learningalgorithm;Technicalrep ortUniversitatKarlsruhe, 1992.
[6] Y.Riedmiller,MartinandBraun:RPROP-Afast Adap-tivelearningalgorithm;Technicalrep ortUniversitat Karl-sruhe,1992.
[7] K. Hara and K. Nakayama: Training Data Selection Metho dforNeural Networks;IEICETrans.ofInf.Syst. SocietyofJapan,Fundamentals,Vol.E81-A,No.3,pp. 374-381,March1998.
[8] M.C.Mozer,P.Smolensky:Skelitonization:Atechnique fortrimmingthefatfromanetworkviarelevance assess-ment; inAdvancesin neural informationprocessing sys-tems,1,pp.107-115,1989.
[9] J.SietsmaandDow: Neuralnetworkpruning-whyand how;Proceedingofthesecondinternationalconferenceon neuralnetworks,pp.326-333,July1988.
[10]Fahlman,E.Scott:Anempiricalstudyoflearningsp eedin backpropagationnetworks;Technicalrep ort CMU-CS-88-162,1988.
[11]T.Ash: Dynamicno decreationinbackpropagation net-works;ConnectionScience,1(4),pp.365-375,1989. [12]H. Kernand BunyiuNanjio: Saddharmapundarika,
ST.-PETERSBOURG,1912.
[13]K.Keeni,H.Shimo daira,T.Nishino,Y.Tan\Recognition ofDevanagariCharactersUsingNeuralNetworks," Trans-actionofInformationandSystemsSo cietyofJapan,IEICE TRANS.INF.&SYST.,VOL.E79-D,No.5,May1996. [14]Y.Linde,A.Buzo,andR.M.Gray:Analgorithmfor