
Principal Component Analysis for a Dynamical System of Emigration

Manabu Yuasa and Wasaburo Unno

Kinki University, Research Institute for Science and Technology, Higashi-Osaka, Osaka 577, Japan

(Received January 17, 199ii)

Abstract

Dynamical system theory of migration is developed and applied to study the causal relations among the principal components describing emigration from England and Wales, 1861-1900. The material is taken exclusively from Morgan (Appendix in Baines, 1985). Our method can elucidate the driving forces of migration as far as the relevant information is included in the input data, even implicitly.

Morgan's data system is found to be controlled essentially by 4 principal components for over 90% of its activity. The first principal component, which governs about 50% of the system variations, is however a combination of several variables in the data base, implying that Morgan's data base does not contain the direct agencies responsible for emigration. The driving force of emigration depends by only about 50% on the 4 principal components describing 90% of the system activity, so that the controlling factors of the other 50% lie outside the system constructible from the data base.

Key words: Emigration, Principal component analysis, Dynamical system, Causal relation of activities

1 Introduction

In our preceding study [1], a method was developed to find causal relations among various entities related to migration by use of principal component analysis (PCA). In this paper, the data base used by Morgan [2] for linear covariance analysis is employed to construct a multi-dimensional phase space in which the dynamical system of emigration is embedded. The result of the analysis differs from Morgan's in that the dynamical structure of emigration is obtained in the form of a differential equation. For this purpose, the multi-dimensional phase space is constructed not only from the hierarchy of observational data, including nonlinear products, but also from their time derivatives.

The data are taken from Morgan's article (Appendix in Baines, 1985), treating migration in England and Wales, 1861-1900. The inclusion of the time derivatives in the extended PCA enables us to elucidate the driving agencies of the system.

In Morgan [2], the data base contains: $X_1$ = (Age), $X_2$ = (ΔAgric), $X_3$ = (ΔLiteracy), $X_4$ = (ΔWages), $X_5$ = (Urban), $X_6$ = (Mlag), $Y_1$ = (male overseas emigrant), $Y_2$ = (female overseas emigrant), $Y_3$ = (male internal migrant), and $Y_4$ = (female internal migrant). For the meaning of these variables, the reader is referred to Morgan in Baines [2] (see also Unno, Yuasa and Onishi [1]). These data are given for 52 counties and for each of the 4 decades from 1861 to 1900. Therefore, these ten variables and their combinations are considered as the state variables (like pressure, density, temperature, and velocity components in fluid dynamics) describing the state of each county in each decade.

In other words, these variables and their combinations form a vector phase space in which the dynamical state of the system is embedded. For complete embedding without ambiguity, however, a $(2n+1)$-dimensional space is needed to embed an $n$-dimensional system. Since $n$ may be fairly large, it is better to increase the number of variables by taking not only the lagged emigration, $X_6$ = (Mlag), but also 2nd order quantities, $Z_1 = Y_1^2$, $Z_2 = Y_1 Y_2$, $Z_3 = Y_1 Y_3$, $Z_4 = Y_1 Y_4$, $Z_5 = Y_1 X_6$, etc., as state variables constructing a state vector.
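The construction of the state vector described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `state_vector` is hypothetical, the first product is taken to be $Z_1 = Y_1^2$, and the numbers are made up rather than taken from Morgan's data base.

```python
def state_vector(X, Y):
    """Return the 15 unnormalized state variables for one county and decade.

    X: six values X1..X6 (X6 is the lagged emigration, Mlag).
    Y: four values Y1..Y4.
    """
    if len(X) != 6 or len(Y) != 4:
        raise ValueError("expected 6 X-values and 4 Y-values")
    y1 = Y[0]
    # Second-order quantities (assumed reading):
    # Z1 = Y1^2, Z2 = Y1*Y2, Z3 = Y1*Y3, Z4 = Y1*Y4, Z5 = Y1*X6
    Z = [y1 * y1, y1 * Y[1], y1 * Y[2], y1 * Y[3], y1 * X[5]]
    return list(X) + list(Y) + Z


# Made-up numbers, for illustration only.
example = state_vector([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2.0, 3.0, 4.0, 5.0])
print(len(example))   # 15
print(example[10:])   # [4.0, 6.0, 8.0, 10.0, 12.0]
```

The ordering (X, then Y, then Z) matches the order in which the variables are later collected into a single 15-component vector.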

We are now interested mostly in how the rate of change of emigration, $\partial_t (Y_1 + Y_2)$, depends on the state variables $X_i$, $Y_j$ and $Z_k$. The extended principal component analysis developed in the previous paper [1] is appropriate for establishing such a dynamical system formalism.

2 Standardization of Variables

Variables should be normalized in the PCA, since we are interested in deriving causal relations.

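The normalized variables $x_i$, $y_j$, $z_k$ are given by the following formulae:

$$x_i = \frac{X_i - \langle X_i \rangle}{\sigma_{X_i}}, \qquad y_j = \frac{Y_j - \langle Y_j \rangle}{\sigma_{Y_j}}, \qquad z_k = \frac{Z_k - \langle Z_k \rangle}{\sigma_{Z_k}}, \tag{1}$$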

where $\langle X_i \rangle$ and $\sigma_{X_i}$ etc. denote the average and the standard deviation (the square root of the variance) of $X_i$ etc., respectively. The values of these normalized variables should be given for every county and for every decade. Extrapolation and/or interpolation may be used to evaluate the normalized variables if the data are not available at the right epoch. The multiplicity can be regarded as $(m_1 + m_2)$ for all variables, because in this paper we are investigating the net overseas emigration, $Y (= Y_1 + Y_2)$, against the background of all variables. Then the average and the standard deviation are given for $X_i$ by

$$\langle X_i \rangle = \frac{\sum_{c=1}^{52} \sum_{p=1}^{4} (m_1 + m_2) X_i}{\sum_{c=1}^{52} \sum_{p=1}^{4} (m_1 + m_2)}, \tag{2}$$

and

$$\sigma_{X_i} = \left[ \frac{\sum_{c=1}^{52} \sum_{p=1}^{4} (m_1 + m_2) (X_i - \langle X_i \rangle)^2}{\sum_{c=1}^{52} \sum_{p=1}^{4} (m_1 + m_2)} \right]^{1/2}, \tag{3}$$

where the summation extends over counties from $c = 1$ to 52 and over epochs from $p = 1$ to 4. Similar equations hold for $Y_j$ and $Z_k$. Thus, we have all the variables normalized for each of the 52 counties and for each of the 4 decades, and 208 points can therefore be plotted in the phase space of 15 dimensions, $(x_1, \ldots, x_6, y_1, \ldots, y_4, z_1, \ldots, z_5)$.
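As a small illustration of the weighted average and standard deviation of equations (2) and (3), and of the normalization of equation (1), the following Python sketch uses three made-up county-decade cells. The function names and the numbers are assumptions for the example only; the list `m` stands for the multiplicities $(m_1 + m_2)$.

```python
import math


def weighted_mean_std(values, weights):
    """Weighted average and standard deviation, as in equations (2) and (3)."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)


def normalize(values, weights):
    """Normalized variable of equation (1): subtract the mean, divide by sigma."""
    mean, std = weighted_mean_std(values, weights)
    return [(v - mean) / std for v in values]


# Toy data (made up): one variable X_i in three cells, with multiplicities m.
X = [1.0, 2.0, 4.0]
m = [1.0, 1.0, 2.0]

mean, std = weighted_mean_std(X, m)
x = normalize(X, m)

# After normalization the weighted mean is 0 and the weighted variance is 1.
check_mean = sum(w * v for v, w in zip(x, m)) / sum(m)
check_var = sum(w * v * v for v, w in zip(x, m)) / sum(m)
print(mean, round(std, 4), round(check_mean, 12), round(check_var, 12))
```

In the paper's setting the same two functions would be applied to each of the 15 variables over all 208 county-decade cells.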

3 Principal Component Analysis

Each emigrant plots one point representing his background of emigration in the 15-D phase space. Then, each of the above 208 points should have the multiplicity $(m_1 + m_2)$. A 15-D ellipsoidal distribution approximates the distribution of these points in the phase space, and the principal axes of the ellipsoid correspond to the principal components. The principal axes are, therefore, specified by the eigenvalues and eigenvectors of the matrix of correlations taken for all combinations of two coordinates among $(x_i, y_j, z_k)$.

Let $q_n$ $(n = 1, \ldots, 15)$ be the unified symbol for $x_i$, $y_j$, and $z_k$ in this order. A set of values of $q_n$ $(n = 1, \ldots, 15)$ is obtained for each of 208 cases (denoted by $c = 1$ to $c = 208$) of different counties and epochs to which emigrants belong, with the multiplicity $m_1 + m_2$ or the weight $w_c$ given by $w_c = (m_1 + m_2) / \sum_{c=1}^{208} (m_1 + m_2)$. Note that these $q_n$'s have been normalized so that the averages of $q_n$ are zero, $\langle q_n \rangle = \sum_{c=1}^{208} w_c q_n^{(c)} = 0$, and the variances are unity, $\sigma_{q_n}^2 = \sum_{c=1}^{208} w_c [q_n^{(c)}]^2 = 1$, for all $n$. The $(i,j)$ element of the correlation matrix, $r_{ij}$, is given by

$$r_{ij} = \sum_{c=1}^{208} w_c \, q_i^{(c)} q_j^{(c)}. \tag{4}$$

Eigenvalues $\lambda_\alpha$ and the corresponding eigenvectors $\mu^{(\alpha)}$ are defined by the following set of linear equations,

$$\sum_{j=1}^{15} r_{ij} \mu_j^{(\alpha)} = \lambda_\alpha \mu_i^{(\alpha)}, \quad (i = 1, \ldots, 15), \tag{5}$$

having nontrivial solutions under the fulfillment of the characteristic equation,

$$\det (r_{ij} - \lambda \delta_{ij}) = 0, \tag{6}$$

det being the determinant.

The principal component axes are the eigenvectors $\mu^{(\alpha)}$ belonging to the roots $\lambda_\alpha$ $(\alpha = 1, \ldots, 15)$ of the characteristic equation, and the component $\mu_i^{(\alpha)}$, normalized so that $\sum_{i=1}^{15} (\mu_i^{(\alpha)})^2 = 1$, defines the direction cosine of the principal component vector $\mu^{(\alpha)}$ with respect to the $q_i$-axis. The value of a principal component $P_\alpha$ of a point $q_n$ $(n = 1, \ldots, 15)$ in space is, therefore, given by

$$P_\alpha^{(c)} = \sum_{n=1}^{15} \mu_n^{(\alpha)} q_n^{(c)} \quad (\alpha = 1, \ldots, 15). \tag{7}$$

The importance of a principal component $P_\alpha$ is judged from the value of the eigenvalue $\lambda_\alpha$ being larger or smaller than unity. This is because the eigenvalue is the variance of the principal component, as shown by the expression,

$$\sum_{c=1}^{208} w_c \, [P_\alpha^{(c)}]^2 = \sum_{i=1}^{15} \sum_{j=1}^{15} \mu_i^{(\alpha)} \mu_j^{(\alpha)} r_{ij} = \lambda_\alpha, \tag{8}$$

which measures the contribution of the principal component $P_\alpha$ in controlling the system. We note here that $\sum_{\alpha=1}^{15} \lambda_\alpha = 15$, as derived from the trace of the matrix in equation (6).
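The chain from weighted correlations to a principal component and its variance can be checked numerically on a toy problem. The sketch below is a hedged illustration, not the procedure used in the paper: it uses made-up data with 3 variables and 5 cases instead of 15 variables and 208 cases, and it finds only the leading eigenpair by plain power iteration, a stand-in chosen to keep the example free of external libraries. All function names are hypothetical.

```python
import math


def normalize_columns(raw, w):
    """Weighted zero-mean, unit-variance normalization of each column (weights sum to 1)."""
    n = len(raw[0])
    means = [sum(wc * row[i] for row, wc in zip(raw, w)) for i in range(n)]
    stds = [math.sqrt(sum(wc * (row[i] - means[i]) ** 2 for row, wc in zip(raw, w)))
            for i in range(n)]
    return [[(row[i] - means[i]) / stds[i] for i in range(n)] for row in raw]


def correlation_matrix(q, w):
    """r_ij = sum_c w_c q_i^(c) q_j^(c) for normalized data q[c][i]."""
    n = len(q[0])
    return [[sum(wc * row[i] * row[j] for row, wc in zip(q, w)) for j in range(n)]
            for i in range(n)]


def leading_eigenpair(r, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    n = len(r)
    mu = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        v = [sum(r[i][j] * mu[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        mu = [x / norm for x in v]
    lam = sum(mu[i] * r[i][j] * mu[j] for i in range(n) for j in range(n))
    return lam, mu


# Toy data (made up): 5 cases, 3 variables, multiplicities m.
raw = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 1.0], [4.0, 3.0, 3.0], [5.0, 6.0, 2.0]]
m = [1.0, 2.0, 1.0, 3.0, 1.0]
w = [mc / sum(m) for mc in m]

q = normalize_columns(raw, w)
r = correlation_matrix(q, w)
lam, mu = leading_eigenpair(r)

# The trace equals the number of variables; the weighted variance of the
# principal component scores equals the eigenvalue.
trace = sum(r[i][i] for i in range(3))
scores = [sum(mu[n] * row[n] for n in range(3)) for row in q]
score_variance = sum(wc * p * p for wc, p in zip(w, scores))
print(round(trace, 6), round(lam, 6), round(score_variance, 6))
```

For a full decomposition into all principal components, a symmetric eigenvalue routine from a numerical library would normally replace the power iteration.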

The results are as follows. Eigenvalues:

$\lambda_1 = 7.581$, $\lambda_2 = 3.521$, $\lambda_3 = 1.322$, $\lambda_4 = 1.451$, $\lambda_5 = 0.404$, $\lambda_6 = 0.322$, $\lambda_7 = 0.206$, $\lambda_8 = 0.194$, $\lambda_9 = 0.126$, $\lambda_{10} = 0.071$, $\lambda_{11} = 0.050$, $\lambda_{12} = 0.035$, $\lambda_{13} = 0.017$, $\lambda_{14} = 0.003$, $\lambda_{15} = 0.002$.

Eigenvectors corresponding to the 4 largest eigenvalues:

(  0.159,  0.327,  0.332,
   0.113,  0.281,  0.332,
  -0.211,  0.111,  0.333,
  -0.211,  0.115,  0.335,
   0.288,  0.167,  0.327 ),

(  0.188, -0.179, -0.169,
   0.409, -0.185, -0.191,
  -0.111, -0.215,  0.053,
  -0.290,  0.438,  0.045,
  -0.334,  0.432, -0.166 ),

(  0.548,  0.084,  0.087,
   0.252, -0.034, -0.031,
   0.151, -0.224, -0.261,
  -0.023, -0.367, -0.253,
  -0.333, -0.371,  0.126 ),

( -0.382, -0.008, -0.107,
   0.081, -0.096, -0.004,
   0.819,  0.134, -0.086,
  -0.328, -0.004, -0.085,
  -0.072, -0.014, -0.106 ).
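The shares of the system variation quoted in the abstract can be recomputed from the listed eigenvalues. The sketch below is a plain arithmetic check, assuming the shares are taken relative to the trace of 15 stated in the text; the values as printed here sum to about 15.3 rather than exactly 15, so the shares should be read as approximate.

```python
# Eigenvalues as listed above, in the printed order.
eigenvalues = [7.581, 3.521, 1.322, 1.451, 0.404, 0.322, 0.206, 0.194,
               0.126, 0.071, 0.050, 0.035, 0.017, 0.003, 0.002]

n_variables = 15  # trace of the correlation matrix, as stated in the text

first_share = eigenvalues[0] / n_variables        # share of the first component
top4_share = sum(eigenvalues[:4]) / n_variables   # share of the four largest
above_unity = [lam for lam in eigenvalues if lam > 1.0]

print(round(first_share, 3), round(top4_share, 3), len(above_unity))
# 0.505 0.925 4
```

Exactly four eigenvalues exceed unity, which agrees with retaining 4 principal components, with the first carrying about 50% and the four together a little over 90% of the total.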

ditional insignificant one, we obtain the following eigen values:

concluding equation,

$$\partial_t q_0 = \sum_{n=1}^{15} a_n q_n,$$

0.577,

eigenvectors:

0.116, 0.577, -0.706 ),
0.737, -0.009, -0.001 ),
0.570, 0.096, -0.002 ),
-0.321, 0.570, -0.001 ),
0.118, 0.577, 0.708 ).

(12)

(13)

If we neglect $\lambda_5$, which corresponds to the new additional principal component, the following concluding relation is obtained.

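$$\begin{aligned} \partial_t q_0 = {} & 0.220 q_1 - 0.047 q_2 - 0.683 q_3 + 0.246 q_4 + 0.074 q_5 \\ & - 0.084 q_6 - 0.007 q_7 - 0.153 q_8 + 0.121 q_9 + 0.127 q_{10} \\ & - 0.014 q_{11} - 0.080 q_{12} + 0.061 q_{13} + 0.058 q_{14} - 0.019 q_{15}. \end{aligned} \tag{14}$$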
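To make the reading of equation (14) concrete, the following Python sketch stores its fifteen coefficients, ranks them by magnitude, and evaluates the right-hand side for a hypothetical state in which only $q_3$ deviates from its mean. The function name `rate_of_change` and the test state are assumptions for illustration; only the coefficients come from equation (14).

```python
# Coefficients a_1..a_15 of equation (14), in the order q_1..q_15.
a = [0.220, -0.047, -0.683, 0.246, 0.074, -0.084, -0.007, -0.153,
     0.121, 0.127, -0.014, -0.080, 0.061, 0.058, -0.019]


def rate_of_change(q):
    """Right-hand side of equation (14) for a normalized state q_1..q_15."""
    if len(q) != len(a):
        raise ValueError("expected 15 normalized state variables")
    return sum(an * qn for an, qn in zip(a, q))


# Rank the variables by the magnitude of their coefficient (1-based indices).
ranked = sorted(range(1, len(a) + 1), key=lambda n: abs(a[n - 1]), reverse=True)
largest = ranked[0]

# Hypothetical state: q_3 one standard deviation below its mean, all else at the mean.
q = [0.0] * 15
q[2] = -1.0
print(largest, rate_of_change(q))   # 3 0.683
```

Because the coefficient of $q_3$ is negative, a decrease in $q_3$ gives a positive contribution to $\partial_t q_0$.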

The largest driving force for the emigration seems to be $q_3$ (Literacy). If $q_3$ decreases, namely Literacy increases, the rate of change of emigration increases. Other small driving forces can also be interpreted similarly. But the value of $\lambda_5$ is rather large, and the error derived from the neglect of $\lambda_5$ is estimated to be about 50% $(= \lambda/\sqrt{2})$. Consequently, the driving force of emigration depends by only about 50% on the 4 principal components describing 90% of the system activity, so that the controlling factors of the other 50% lie outside the system constructible from the data base.

References

[1] Unno, W., Yuasa, M. and Onishi, T., Dynamical System Theory of Migration, Science and Technology, Kinki University, No. 8, 1996.

[2] Morgan, M., in Migration in a Mature Economy: Emigration and Internal Migration in England and Wales, 1861-1900, by D. Baines (Cambridge University Press, 1985).
