Cognition: Differential-geometrical View on Neural Networks


Photocopying permitted by license only the Gordon and Breach Science Publishers imprint.

Printed in Malaysia.


S.A. BUFFALOV*

Radio-physical Department, Tomsk State University, Lytkina 24-109, Tomsk 634034, Russia

(Received 4 February 1999)

A neural network taken as a model of a trainable system appears to be nothing but a dynamical system evolving on a tangent bundle with changeable metrics. In other words, to learn means to change the metrics of a definite manifold.

Keywords: Neural network, Cognition, Dynamical system, Manifold, Metrics

I. INTRODUCTION

The application of differential or integro-differential calculus to the modeling of dynamical and self-organizing processes in social and natural systems has been a tradition since the works of A. Lotka, who published the book "Elements of Physical Biology" (Baltimore, 1925), and V. Volterra, whose paper "Sulla periodicità delle fluttuazioni biologiche" appeared in 1927. Many complicated problems in mathematics, physics, astronomy, chemistry and biology find their solutions (Hilborn and Tufillaro, 1997) through the modern, sophisticated and carefully elaborated nonlinear-dynamical approach. It combines dynamical systems (Katok and Hasselblatt, 1995) and category theories, topology (Akin, 1993) and differential geometry, ergodic (Pollicott and Michiko, 1997) and fixed point theories, combinatorics (Harper and Mandelbaum, 1985), representation theory (Vershik, 1992), domain theory (Potts, 1995), etc. This pleiad of theories works wonderfully for natural phenomena and is absolutely helpless as soon as one tries to apply any of them to social and cultural events.

* E-mail: [email protected].

Today one should ask oneself whether the formalism of integro-differential equations that one applies in the social realm is sufficient for an adequate synergetic exposition of a phenomenon such as socioeconomic development. To be fair, the most frequent answer is going to be "no". The reason lies in man.

Models of some natural, socioeconomic, political, etc. dynamical and self-organizing processes should take into account the presence of the anthropological factor intrinsic to them. A man with his diverse set of behavioral patterns enriches any kind of human-loaded phenomena (HLP) with unpredictability and enormous complexity.


In particular, in this humanitarian context the cognitive activity of a human being appears to be the part of HLP that is almost the most difficult to explicate and, at the same time, a generic feature of a carrier of cultural patterns and archetypes. In modeling the synergetic aspects of physical, chemical and other "behavioral systems" there is no such difficulty. Therefore due regard for cognition in social-synergetic models, being an independent scientific problem, is a suitable criterion of their completeness.

This article elaborates HLP models that use differential calculus by introducing a mathematical caption of cognition, obtained by considering a dynamical system embedded in a manifold with inconstant metrics.

The author shows that such a system is nothing but an "intellectually and mentally inspired" neural network (Buffalov, 1998) capable of learning (Mitchell, 1997), recognition (Ripley, 1996), generalizing and forecasting. It is also shown that metrics alteration actually is the training of this neural network.

There is an alternative attempt by Scott and Fucks (1995) to depict some features of the human brain using the theories of attractors and Sil’nikov chaos. It gives a notion of the complexity and perpetuity of the dynamics by means of dynamical systems theory, whereas we try to use the same theory together with differential geometry to show how to provide a dynamical system with intellectual and mental properties that make it suitable for modeling social and cultural HLP.

Intellectual systems with cognition and self-regulation usually represent a wide class of complex adaptive living beings studied by the humanitarian, medical and biological sciences. Machine learning theory (Mitchell, 1997) deals with man-made self-training devices analogous to their biological prototypes. We address a neural network studied by this theory as one such artificial system endowed with synthetic intellect and cognition, suitable for the "intellectual" sophistication of ordinary differential calculus.

II. NEURAL NETWORKS

Neural networks (Ripley, 1996) are an information processing technique based on the way biological nervous systems, such as the brain, process information. The fundamental concept of neural networks is the structure of the information processing system. Composed of a large number of highly interconnected processing elements, or neurons, a neural network system uses the human-like technique of learning by example to resolve problems.

The neural network is configured for a specific application, such as data classification or pattern recognition, through a learning process called training. Just as in biological systems, learning involves adjustments to the synaptic connections that exist between the neurons. Neural networks can differ in: the way their neurons are connected; the specific kinds of computations their neurons do; the way they transmit patterns of activity throughout the network; and the way they learn, including their learning rate.

In this article we are going to use the differential-geometrical formalism to describe neural networks of a certain architecture outlined in Petritis (1995) and to apply them to the "intellectualization" of the differential formalism and of dynamical systems in particular. This approach is rather new, though there were some attempts in Potts (1995), concerning forgetful neural networks, to derive the embedding strength decay rate of the stored patterns using recent advances in domain and topology theories.

We consider neural networks that can be defined as a cascade conjunction of several properly constructed layers. A typical one has the following structure (Petritis, 1995):

(1) A level of input neurons fed with a vector of external signals.

(2) A linear transformation level. Here the input vector is multiplied by a matrix of synaptic weights responsible for information storing.

(3) A nonlinear transformation level (a set of neurons with nonlinear transfer functions). Here a linearly transformed signal is nonlinearly converted.

(4) A level of output neurons.

The last layer is fed back to the first one.
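The four levels and the feedback loop listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`mat_vec`, `network_pass`, `run_with_feedback`), the choice of `tanh` as transfer function and the numerical weights are hypothetical and do not come from Petritis (1995).

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def network_pass(weights, signal, transfer=math.tanh):
    """One pass through levels (2)-(4): linear map, then nonlinear transfer."""
    linear = mat_vec(weights, signal)        # level (2): synaptic weights
    return [transfer(v) for v in linear]     # level (3), read out at level (4)


def run_with_feedback(weights, external_signal, passes):
    """Feed the output level back to the input level `passes` times."""
    signal = list(external_signal)           # level (1): input neurons
    for _ in range(passes):
        signal = network_pass(weights, signal)
    return signal


weights = [[0.5, -0.2], [0.1, 0.3]]          # hypothetical synaptic weights
print(run_with_feedback(weights, [1.0, -1.0], passes=5))
```

With a bounded transfer function such as `tanh`, every fed-back signal stays inside the interval (-1, 1), whatever the weights are.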

The previous passage outlines the neural network’s description framework, giving a strict definition of its structure, which is of fundamental importance in neural network technique. Relying on that fact, we assume that any system, including a dynamical one, that allows a description within that framework can be treated as "intellectual" and as possessing cognition to the same extent as a neural network.

Now one can transfer the concept of cognition to the scene of differential calculus and the theory of dynamical systems in a very simple and universal fashion: just develop a generalized description of dynamical systems in such a manner that it incorporates the neural network’s description as a particular case. Such a unifying generalization will automatically assign all properties of the neural network to the dynamical system and vice versa. The context accompanying the assignment will define the differential-geometric content of cognition.

III. DIFFERENTIAL GEOMETRY BACKGROUND

A topological space is a set of points with certain subsets designated as open. It is required that an arbitrary union and the intersection of any finite number of open sets be open as well. The set itself and the empty set should be open. We will work with an important particular case of a topological space, a metric space, for any two points $x$ and $y$ of which there is defined a function $\rho(x, y)$, called the distance between $x$ and $y$, with the following properties:

1. $\rho(x, y) = \rho(y, x)$;

2. $\rho(x, x) = 0$ and $\rho(x, y) > 0$ if $x \neq y$;

3. Triangle inequality: $\rho(x, y) \le \rho(x, z) + \rho(z, y)$.

Given a set $M$, one says that there is a structure of an $n$-dimensional differentiable manifold on $M$ if for each $x \in M$ there exists a neighborhood $U$ of $x$ and a homeomorphism $h$ from $U$ to an open ball in $\mathbb{R}^n$. We call $(U, h)$ a chart (or system of local coordinates) about $x$.

If $M$ is a manifold and $x \in M$ is a point, then we define the tangent space to $M$ at $x$ (denoted $T_xM$) to be the set of all vectors tangent to $M$ at $x$.

The tangent bundle of $M$, denoted $TM$, is defined to be the disjoint union over $x \in M$ of $T_xM$, i.e. $TM = \bigcup_{x \in M} T_xM$. We think of $TM$ as the set of pairs $(x, v)$, where $x \in M$ and $v \in T_xM$. The tangent bundle is in fact a manifold itself. One can introduce the cotangent bundle if one considers a covector instead of a vector.

Let $M$ be a differentiable manifold. We say that $M$ is a Riemannian manifold if there is an inner product $g_x(\cdot, \cdot)$ defined on each tangent space $T_xM$ for $x \in M$ such that for any smooth vector fields $X$ and $Y$ on $M$ the function $x \mapsto g_x(X(x), Y(x))$ is a smooth function of $x$.

In every neighborhood $U_i$ with local coordinates $(x_i^1, \ldots, x_i^n)$ a positive-definite symmetric matrix $g^{(i)}_{jk}(x)$ sets a Riemannian metrics so that for any vector $\xi$ at a point $x$ the equality $|\xi|^2 = g^{(i)}_{jk}\,\xi^j \xi^k$ holds.

A metrics $g_{ij}(y^1, \ldots, y^n)$ is said to be Euclidean if there exists a system of coordinates $x^1, \ldots, x^n$, $x^i = x^i(y^1, \ldots, y^n)$, $i = 1, \ldots, n$, such that

$$\det\left(\frac{\partial x^i}{\partial y^j}\right) \neq 0 \quad \text{and} \quad g_{ij} = \sum_{k=1}^{n} \frac{\partial x^k}{\partial y^i} \frac{\partial x^k}{\partial y^j}.$$

These coordinates $x^1, \ldots, x^n$ are called Euclidean.

IV. DYNAMICAL SYSTEMS BACKGROUND

For the purpose of this paper a dynamical system is a topological metric space $X$ and a continuous vector field $F$. The system is denoted as a pair $(X, F)$. Locally it is described by a system of ordinary differential equations of the first order.

There exist two principal approaches to dynamical systems that rest on a well-developed theoretical base: the Lagrangian and Hamiltonian formalisms. The first is a particular case of the second. That is why we restrict ourselves to Hamiltonian dynamical systems.

In any space $\mathbb{R}^n$ with coordinates $(y^1, \ldots, y^n)$ and metrics $g_{ij}$, $i, j = 1, \ldots, n$, it is possible to define a scalar product and index raising. So the gradient $\nabla f$ of the function $f(y^1, \ldots, y^n)$ looks like

$$(\nabla f)^i = g^{ij} \frac{\partial f}{\partial y^j}.$$

The vector field $\nabla f$ has a corresponding system of differential equations

$$\dot{y}^i = (\nabla f)^i,$$

called a gradient system.
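As an illustration of index raising, the following Python sketch computes the gradient with respect to a metrics and takes one explicit step along the gradient system. It assumes that the inverse metric matrix (the components with upper indices) is supplied directly and that partial derivatives are approximated by central differences; the function names and the test function are hypothetical.

```python
def partial_derivatives(f, y, h=1e-6):
    """Central-difference approximation of df/dy^j at the point y."""
    derivs = []
    for j in range(len(y)):
        up, down = list(y), list(y)
        up[j] += h
        down[j] -= h
        derivs.append((f(up) - f(down)) / (2 * h))
    return derivs


def metric_gradient(f, y, g_inverse):
    """Raise the index: (grad f)^i = sum_j g^{ij} df/dy^j."""
    d = partial_derivatives(f, y)
    n = len(y)
    return [sum(g_inverse[i][j] * d[j] for j in range(n)) for i in range(n)]


def gradient_flow_step(f, y, g_inverse, dt):
    """One explicit Euler step of the gradient system dy^i/dt = (grad f)^i."""
    v = metric_gradient(f, y, g_inverse)
    return [yi + dt * vi for yi, vi in zip(y, v)]


def f(y):                                     # hypothetical test function
    return y[0] ** 2 + y[1] ** 2


print(metric_gradient(f, [1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]]))
print(metric_gradient(f, [1.0, 2.0], [[0.5, 0.0], [0.0, 2.0]]))
```

The same function yields different gradient vectors for different metrics, which is the freedom that the later sections interpret as training.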

A space with a skew-symmetrical metrics $G = (g_{ij})_{i,j=1}^{2n}$ is called a phase space if it allows such coordinates $(q, p)$ that

$$G = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix},$$

where $I$ is the unit matrix, $p$ is a covector and $(q, p)$ belongs to a cotangent bundle of a configuration manifold.

A gradient system in a phase space is called a Hamiltonian dynamical system. In general, an even-dimensional manifold (phase space), a symplectic structure on it (integral Poincaré invariant) and a function on it (Hamiltonian) completely define a Hamiltonian system.
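A minimal Python sketch of a Hamiltonian vector field may help here. It assumes the canonical block form of the skew-symmetrical metrics, with the unit matrix in the upper-right block and its negative in the lower-left block, and uses the harmonic oscillator $H = (q^2 + p^2)/2$ as a hypothetical example; none of the function names come from the paper.

```python
def canonical_form(n):
    """Return the 2n x 2n block matrix [[0, I], [-I, 0]]."""
    size = 2 * n
    G = [[0.0] * size for _ in range(size)]
    for i in range(n):
        G[i][n + i] = 1.0        # upper-right block: +I
        G[n + i][i] = -1.0       # lower-left block:  -I
    return G


def hamiltonian_field(grad_H, y):
    """Velocity dy/dt = G * dH/dy for y = (q^1..q^n, p_1..p_n)."""
    size = len(y)
    G = canonical_form(size // 2)
    F = grad_H(y)
    return [sum(G[i][j] * F[j] for j in range(size)) for i in range(size)]


def oscillator_grad(y):
    """Gradient of the hypothetical Hamiltonian H = (q^2 + p^2) / 2."""
    q, p = y
    return [q, p]


print(hamiltonian_field(oscillator_grad, [1.0, 0.0]))
```

Because the matrix is skew-symmetric, the velocity is always orthogonal to the gradient of the Hamiltonian, so the Hamiltonian is conserved along the flow.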

V. "INTELLECTUALIZATION" OF DYNAMICAL SYSTEMS

We avoid considering an arbitrary dynamical system for now and address the Hamiltonian one embedded in a cotangent bundle of a configuration manifold with the Riemannian skew-symmetrical metrics $G = (g_{ij})_{i,j=1}^{2n}$. Let it be described by the Hamilton equations for generalized coordinates $q^i$ and impulses $p_i$, which can be written in the following form:

$$\dot{y} = G F(y, t), \qquad (1)$$

where $y^i = q^i$, $y^{n+i} = p_i$, $i = 1, \ldots, n$, and $F_j(y, t) = \partial H(y, t)/\partial y^j$, $j = 1, \ldots, 2n$.

In the case of an arbitrary, not necessarily gradient, dynamical system $\dot{q}^i = Q_i(y, t)$, $\dot{p}_i = P_i(y, t)$, $i = 1, \ldots, n$, in a cotangent bundle, the quantities in Eq. (1) will have the following denotation: $F_i(y, t) = Q_i(y, t)$, $F_{n+i}(y, t) = P_i(y, t)$, and $G(y, t) = (\partial \tilde{y}^i/\partial y^j)_{i,j=1}^{2n}$ is the Jacobi matrix of frame transformation ($GG^T$ is the Euclidean metrics).

Equation (1) can be written in the form of a finite difference scheme with a sufficiently small time discretization step $\tau$. According to the Euler method we obtain an iterative process with the $n$th step giving

$$y_n = y_{n-1} + \tau G_n F(y_{n-1}, t_{n-1}). \qquad (2)$$

It can be easily interpreted in terms of a neural network with input vector $y$, linear transformation $G$, nonlinear transformation, i.e. a set of transfer functions $F_i$, and a feedback signal decay rate $\tau$.
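The iteration (2) can be read directly as code. The sketch below is an illustration under simplifying assumptions: the matrix $G$ is held constant over the run (in the text it may change from step to step), and the transfer functions are those of a hypothetical harmonic oscillator. The names `euler_network_step` and `iterate` are not from the paper.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def euler_network_step(y, G, F, tau, t):
    """One step of scheme (2): transfer functions F, weights G, feedback tau."""
    transformed = mat_vec(G, F(y, t))        # nonlinear level, then linear level
    return [yi + tau * si for yi, si in zip(y, transformed)]


def iterate(y0, G, F, tau, steps, t0=0.0):
    """Repeat the step, feeding each output back as the next input."""
    y, t = list(y0), t0
    for _ in range(steps):
        y = euler_network_step(y, G, F, tau, t)
        t += tau
    return y


G = [[0.0, 1.0], [-1.0, 0.0]]                # assumed constant weight matrix


def F(y, t):
    """Hypothetical transfer functions: gradient of H = (q^2 + p^2) / 2."""
    return [y[0], y[1]]


print(iterate([1.0, 0.0], G, F, tau=0.001, steps=1000))
```

For this example the explicit Euler scheme slowly inflates the radius of the orbit, which is one reason a small $\tau$ is needed.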

It is known from numerical methods that the accuracy of the approximation (2) can be substantially improved by adding to the right-hand side of (2) a vector of errors calculated using the first formula of Runge

$$R = \frac{y_{\tau} - y_{k\tau}}{k^2 - 1}, \qquad (3)$$

where $y_{\tau}$ and $y_{k\tau}$ are approximations calculated with decay rates $\tau$ and $k\tau$ for any integer $k$.

This procedure can be interpreted as a fruitful discussion between two neural networks with different decay rates (or "intellectual levels").

The discretization of Eq. (1) provides two ways of displaying cognition in the framework of dynamical systems by means of interpretations held in terms of neural network theory:

Mathematical caption of cognition through metrics alterations

Any Hamiltonian dynamical system $(X, F)$ evolving in the phase space $X$ with the changeable Riemannian skew-symmetrical metrics $G$ defines a neural network with the set of transfer functions $F_i$, $i = 1, \ldots, 2n$, and $G$ as the matrix of synaptic weights.

Mathematical caption of cognition through frame alterations

Both an arbitrary non-Hamiltonian dynamical system $(X, F)$ evolving in the metric space $X$ and an inconstant Jacobi matrix $G$ of frame transformation define a neural network with the set of transfer functions $F_i$, $i = 1, \ldots, 2n$, and $G$ as the matrix of synaptic weights.

Resting upon one of these interpretations one can:

treat a dynamical system in a differentiable manifold with a changeable metrics as a neural network undergoing a training process;

give a more explicit solution of the central problem of neural network theory: memorization of an arbitrary set of patterns and determination of their attraction basins. With a certain network architecture in hand, this problem is solved by an appropriate choice of its transfer functions (i.e. a vector field $F_i$, $i = 1, \ldots, n$, which is in fact a dynamical system) and training algorithm (a law of evolution of the manifold metrics). In other words, the solution is given by correctly setting up a dynamical system $(X, F)$, where $X$ is a metric space with a metrics $G$ providing the best (according to a given criterion) memorization of patterns;

make use of the rich toolkit of topology and smooth theories for the investigation of "knowledge" structures generated by a neural network that are invariant to continuous and smooth changes of coordinates, i.e. patterns remaining stable in the memory of the network under its training. Such patterns can be called unconditioned reflexes;

address fixed point theory as the most powerful tool for the perception of patterns stable under the network’s "cognitive" dynamics when recognizing, generalizing, predicting, etc. In particular, these patterns can be called conditioned reflexes obtained through learning for certain external inputs;

refine and deepen research on neural networks using Lie algebras of vector fields and a phase portrait of the trained neural network (its output signal’s dynamics during recognition, etc.), namely, of the appropriate dynamical system in a curved manifold;

generalize one’s investigations by means of the categories of topological spaces and vector fields elaborated in category theory.

VI. METRICS ALTERATION VERSUS TRAINING

"Intellectualization" endows a dynamical system $(X, F)$ with one more degree of freedom, revealed in the plasticity of the quantities defining the metrics of $X$. This plasticity reflects the training abilities of the neural network associated with the dynamical system.

Let us consider autonomous differential equations establishing an arbitrary training algorithm:

$$\dot{G} = G_{tr}(y), \qquad (4)$$

where $y$ is defined through an integral with $G$ in the integrand [refer to Eq. (2)].

Close enough to the end of the training process, the integro-differential equation (4) pertaining to $G$ can be simplified to an ordinary differential equation (see Appendix)

$$\ddot{g}_{ij} = R_{ij}^{kr}\, g_{kr}, \qquad (5)$$

where

$$R_{ij}^{kr}(y, t) = \frac{\partial G^{tr}_{ij}}{\partial y^k} F_r(y, t), \qquad i, j, k, r = 1, \ldots, 2n.$$

Here and further we imply summation over all values of the dummy indexes.

As one can see, the metrics evolution equations (5) describe the motion of $2n$ coupled oscillators.

VII. SOLUTION OF THE METRICS EVOLUTION EQUATIONS

We rewrite Eqs. (5) in a concise matrix form:

$$\ddot{\mathbf{g}} = R\,\mathbf{g}, \qquad (6)$$

where $\mathbf{g}$ is a vector representation of the metric tensor $G = (g_{ij})_{i,j=1}^{2n}$ and $R$ is an operator representation of the tensor $R_{ij}^{kr}$.

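If $R$ is implicitly time-dependent and $\mathbf{g}(y_0, t_0) = \mathbf{g}_0$, $\dot{\mathbf{g}}(y_0, t_0) = \dot{\mathbf{g}}_0$ are initial conditions, then Eq. (6) has a solution:

$$\mathbf{g}(y, t) = \cos[W(t - t_0)]\,\mathbf{g}_0 + W^{-1}\sin[W(t - t_0)]\,\dot{\mathbf{g}}_0, \qquad (7)$$

where $W(y) = \sqrt{-R}$ and

$$\cos(Wt) = I + \frac{Rt^2}{2!} + \frac{R^2t^4}{4!} + \cdots, \qquad W^{-1}\sin(Wt) = It + \frac{Rt^3}{3!} + \frac{R^2t^5}{5!} + \cdots.$$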
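The oscillatory solution can be checked numerically through its power series. The Python sketch below rests on assumptions that are not spelled out in the text: $R$ is taken as a constant matrix, $W^2 = -R$, and $t_0 = 0$. The function names and the sample matrices are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_vec(a, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def series_propagators(R, t, terms=30):
    """Truncated series for cos(Wt) and W^-1 sin(Wt), assuming W^2 = -R."""
    n = len(R)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    cos_part = [[0.0] * n for _ in range(n)]
    sin_part = [[0.0] * n for _ in range(n)]
    even, odd = 1.0, t                   # t^(2k)/(2k)!  and  t^(2k+1)/(2k+1)!
    for k in range(terms):
        for i in range(n):
            for j in range(n):
                cos_part[i][j] += power[i][j] * even
                sin_part[i][j] += power[i][j] * odd
        power = mat_mul(power, R)        # next power of R
        even = even * t * t / ((2 * k + 1) * (2 * k + 2))
        odd = odd * t * t / ((2 * k + 2) * (2 * k + 3))
    return cos_part, sin_part


def metric_solution(R, g0, dg0, t):
    """g(t) = cos(Wt) g0 + W^-1 sin(Wt) dg0, with t0 = 0."""
    c, s = series_propagators(R, t)
    return [a + b for a, b in zip(mat_vec(c, g0), mat_vec(s, dg0))]


R = [[-1.0, 0.0], [0.0, -4.0]]           # hypothetical: frequencies 1 and 2
print(metric_solution(R, [1.0, 0.0], [0.0, 2.0], 0.5))
```

For a diagonal $R$ each component reduces to an ordinary harmonic oscillation, so the result can be compared with the scalar cosine and sine.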

We can always find such a nonsingular matrix $X$ that $\mathbf{g}_0 = X\dot{\mathbf{g}}_0$. Introducing $X$, we rewrite Eq. (7) as follows:

$$\mathbf{g}(y, t) = \{\cos[W(t - t_0)]\,X + W^{-1}\sin[W(t - t_0)]\}\,\dot{\mathbf{g}}_0.$$

Using the well-known trigonometric relations we obtain

$$\mathbf{g}(y, t) = A(y)\sin[W(t - t_0) + \Phi(y)]\,\dot{\mathbf{g}}_0, \qquad (8)$$

where $A = \sqrt{X^2 + W^{-2}}$ and $\Phi = \arctan[WX]$.

Equation (8) is the solution of the metrics evolution equation (6). It describes the complicated oscillatory dynamics of the neural network’s synaptic weights defining the metrics of the manifold. Such a solution is very interesting from the neuro-dynamical point of view since it allows one to speak about the existence in neural network theory of an analog of the unfading oscillatory electrochemical activity of the neocortex, i.e. the brain’s rhythms (Haken and Stadler, 1990).

During training the behavior of $\mathbf{g}(y, t)$ is rather complicated because of the constantly varying amplitudes, frequencies and phases of the coupled harmonics in (8). But at the very moment when the neural network is trained, all these magnitudes take fixed values and no longer vary in time. The network passes into a phase of unfading oscillations whose parameters reflect the information stored by it.

VIII. CATASTROPHE

As soon as the dynamical system (2) settles down to some fixed point $y_f$, i.e. $F(y_f, t) = 0$, the elements of the metric tensor (or matrix of synaptic weights) are subjected to unbounded linear growth in time. This becomes evident if one considers Eq. (6) with the right-hand side set to zero.
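The linear growth can be seen in a small numerical experiment. The Python sketch below integrates the second-order weight equation with a semi-implicit Euler scheme, which is a choice made here for illustration and not a method taken from the paper; the operator $R$, the initial values and the function name are hypothetical.

```python
def integrate_weights(R, g0, dg0, dt, steps):
    """Semi-implicit Euler integration of g'' = R g for a vector of weights."""
    g, v = list(g0), list(dg0)
    n = len(g)
    for _ in range(steps):
        accel = [sum(R[i][j] * g[j] for j in range(n)) for i in range(n)]
        v = [vi + dt * ai for vi, ai in zip(v, accel)]   # update velocities
        g = [gi + dt * vi for gi, vi in zip(g, v)]       # then the weights
    return g


# Right-hand side set to zero: the weight drifts linearly, g = g0 + dg0 * t.
print(integrate_weights([[0.0]], [1.0], [0.5], dt=0.01, steps=1000))

# Nonzero restoring operator: the weight keeps oscillating and stays bounded.
print(integrate_weights([[-1.0]], [1.0], [0.0], dt=0.01, steps=1000))
```

In the first run the weight grows in proportion to the elapsed time, while in the second it remains within a fixed range.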

Such a catastrophic outcome occurs only if $y_f$ is a stable fixed point and the "cognitive" dynamics of the neural network fades (assume that our brain stops functioning. It’s impossible!). Otherwise, when $y_f$ is unstable, the output signals of the network evolve endlessly and never settle down. The catastrophe never occurs, but another problem, that of everlasting dynamics, appears.

To solve this problem and to make the training procedure of the neural network decay, one has to restrict the scope of the synaptic weights evolution in light of a special kind of dynamical system (2). One of the possible ways, which is in wonderful agreement with experiment, is to consider a dynamical system displaying the Sil’nikov chaos (Scott and Fucks, 1995). In this case it never actually settles in a stable fixed point at all, but continuously evolves in the vicinity of a saddle focus.

So to avoid the catastrophe and to provide an adequate memorization of a given set of patterns we should construct an appropriate dynamical system (2) exhibiting the Sil’nikov chaos and a training algorithm (4) in such a manner that

any given pattern is a stable fixed point of the map $G_{tr}$;

any stable fixed point of the map $G_{tr}$ coincides with one of the saddle foci lying on homoclinic orbits of the dynamical system, i.e. the transfer functions of the neural network.

Now we say that the neural network is trained when its output signal dynamics is restrained to a vicinity of one of the saddle foci. At this very moment the amplitudes, frequencies and phases of the coupled harmonics in (8) take "fixed" values that vary only insignificantly in time. The network passes into a phase of unfading, slowly varying oscillations whose parameters reflect the information stored by it.


IX. CONCLUSION

We tried to pay due regard to cognition in social-synergetic models of HLP that use differential calculus by introducing a mathematical caption of cognition, obtained by considering a dynamical system embedded in a manifold with changeable metrics.

Any dynamical system $(X, F)$ evolving in the phase space $X$ with changeable Riemannian metrics $G$ appears to be a neural network with transfer functions $F_i$, $i = 1, \ldots, n$, and $G$ as the matrix of synaptic weights. Such an interpretation has two very important consequences:

It greatly enriches neural network theory with the theoretical and computational power of topology and smooth theories, category and ergodic theories, dynamical systems and fixed point theories, Lie algebras, the phase portrait technique, etc.

It endows social-synergetic models with extra "cognitive" degrees of freedom, giving a real possibility to grasp the anthropological dimension of some natural, cultural, socioeconomic, political, dynamical and self-organizing processes, etc.

When close enough to a fixed point, the dynamics of the synaptic weights defining the metrics $G$ is described by the system of differential equations for $2n^2$ coupled oscillators. We find this solution to be in wonderful coherence with the fact of the oscillatory activity of the neocortex.

The idea of a dynamical system embedded in a manifold with inconstant metrics plays a considerable role in the new understanding of neural networks and the nature of training. The interpretation offered here does not claim generality or completeness in the exposition of all details. Its main purpose is to designate the new approach to

the comprehension of the anthropological dimension in social-synergetic models;

the understanding of neural networks within the framework of nonlinear dynamics (synergetics).

References

Akin, E. (1993). The General Topology of Dynamical Systems. Am. Math. Soc., p. 261.

Buffalov, S.A. (1998). Neural networks in physics. Deposited in Russian J. "Fizica", Tomsk, 26.05.98, 1591-A98. P. 8.

Haken, H. and Stadler, M. (1990). Synergetics of Cognition. Springer, Berlin, p. 438.

Harper, J.R. and Mandelbaum, R. (1985). Combinatorial Methods in Topology and Algebraic Geometry. Am. Math. Soc., p. 349.

Hilborn, R.C. and Tufillaro, N.B. (1997). Resource letter: nonlinear dynamics. Am. J. Phys. 65, 822-834.

Katok, A. and Hasselblatt, B. (1995). Introduction to the Modern Theory of Dynamical Systems. Cambridge University Press, Cambridge, p. 254.

Mitchell, T. (1997). Machine Learning. McGraw Hill, New York, p. 414.

Petritis, D. (1995). Thermodynamic formalism of neural computing. http://www.ma.utexas.edu/mp_arc/mp_arc-home.html.

Pollicott, M. and Michiko, Y. (1997). Dynamical Systems and Ergodic Theory. Cambridge University Press, Cambridge, p. 180.

Potts, P.J. (1995). The Storage Capacity of Forgetful Neural Networks. http://xyz.lanl.gov.

Ripley, B.D. (1996). Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, p. 416.

Scott, J.A. and Fucks, A. (1995). Self-organizing dynamics of the human brain: critical instabilities and Sil’nikov chaos. Chaos 5(1), 64-69.

Vershik, A.M. (1992). Representation Theory and Dynamical Systems. Am. Math. Soc., p. 267.

APPENDIX

For simplification of Eq. (4) we use the expansion of $G_{tr}(y)$ in the Taylor series in a vicinity of a fixed point $y_f$. So $G_{tr}(y_f) = 0$ implies that the neural network is trained or the metrics evolution has come into a stationary state:

$$\dot{g}_{ij} = \left.\frac{\partial G^{tr}_{ij}}{\partial y^k}\right|_{y_f} (y^k - y_f^k) + \cdots, \qquad i, j, k = 1, \ldots, 2n.$$

We neglect derivatives of the second and higher orders and then differentiate with respect to time. Here we use the fact that [see Eq. (1)]

$$\frac{\partial y^k}{\partial t} = g_{kr} F_r(y, t).$$

After this we obtain the system of ordinary differential equations (5).