
TA session #1

Shouto Yonekura

April 17, 2016

Contents

1 Probability

2 Random variables, Probability density functions and Distribution functions

3 Independence

4 Moments

5 Moment generating functions

6 Texts

1 Probability

Let Ω be an abstract (nonempty) space and let F ⊆ 2^Ω, where 2^Ω is the power set of Ω.

Def 1.1

F is a σ-algebra if it satisfies
(1) Ω ∈ F,
(2) A ∈ F ⇒ A^c ∈ F,
(3) A_1, A_2, ..., A_n, ... ∈ F ⇒ ∪_{i=1}^∞ A_i ∈ F.

We call (Ω, F) a measurable space. Next, we define a probability measure.

Def 1.2

A probability measure is a function P : F → [0, 1] such that
(1) P(Ω) = 1,
(2) A_1, A_2, ..., A_n, ... ∈ F with A_i ∩ A_j = ∅ for all i ≠ j ⇒ Σ_{i=1}^∞ P(A_i) = P(∪_{i=1}^∞ A_i).

Property (2) is called countable additivity, and P(A) is called the probability of the event A ∈ F. We call (Ω, F, P) a probability space.

Thm 1.3


Let (Ω, F, P) be a probability space and A, B ∈ F. Then:
(1) P(A^c) = 1 − P(A),
(2) P(∅) = 0,
(3) A ⊂ B ⇒ P(A) ≤ P(B),
(4) Σ_{i=1}^∞ P(A_i) ≥ P(∪_{i=1}^∞ A_i),
(5) A_n ⊆ A_{n+1} for all n ⇒ lim_{n→∞} P(A_n) = P(∪_n A_n),
(6) A_n ⊇ A_{n+1} for all n and P(A_1) < ∞ ⇒ lim_{n→∞} P(A_n) = P(∩_n A_n).

Proof

(1) Since P(Ω) = 1 and Ω = A ∪ A^c with A ∩ A^c = ∅, we get 1 = P(Ω) = P(A ∪ A^c) = P(A) + P(A^c).
(2) Since Ω^c = ∅ ∈ F, we get P(∅) = 1 − P(Ω) = 0 by (1).
(3) Let A ⊂ B. Then P(B) = P(A ∪ (B − A)) = P(A) + P(B − A) ≥ P(A), so P(A) ≤ P(B).
(4) Set B_1 := A_1 and B_n := A_n − ∪_{i=1}^{n−1} B_i (n ≥ 2). Then the {B_n} are disjoint, B_n ⊆ A_n, and ∪_i A_i = ∪_i B_i. Therefore P(∪_i A_i) = P(∪_i B_i) = Σ_i P(B_i) ≤ Σ_i P(A_i).

(5) Set B_1 := A_1 and B_i := A_i − A_{i−1} for i > 1. Then ∪_i A_i = ∪_i B_i and the {B_n} are disjoint. Thus
P(∪_i A_i) = P(∪_i B_i) = Σ_i P(B_i) = lim_{n→∞} Σ_{i=1}^n P(B_i) = lim_{n→∞} P(∪_{i=1}^n B_i) = lim_{n→∞} P(A_n).

(6) Left as an exercise. Q.E.D
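For intuition, properties (1)-(4) can be checked numerically on a small finite probability space. The following Python sketch uses hypothetical point masses and events chosen purely for illustration:

from fractions import Fraction

# A finite probability space Omega = {a, b, c, d} with hypothetical point masses.
weights = {"a": Fraction(1, 10), "b": Fraction(2, 10), "c": Fraction(3, 10), "d": Fraction(4, 10)}
omega = set(weights)

def P(event):
    # Probability measure: sum of the point masses contained in the event.
    return sum(weights[w] for w in event)

A = {"a", "b"}
B = {"a", "b", "c"}

assert P(omega) == 1                                    # P(Omega) = 1
assert P(omega - A) == 1 - P(A)                         # (1) complement rule
assert P(set()) == 0                                    # (2) P(empty set) = 0
assert P(A) <= P(B)                                     # (3) monotonicity, A subset of B
events = [{"a"}, {"a", "b"}, {"c", "d"}]                # (4) union bound (events may overlap)
assert P(set().union(*events)) <= sum(P(E) for E in events)
print("Thm 1.3 (1)-(4) hold on this finite example.")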

2 Random variables, Probability density functions and Distribution functions

We call the smallest σ-algebra which contains all intervals in R the Borel algebra, denoted by B(R).

Def 1.4

Let (Ω, F, P) be a probability space. We say X is a random variable if it satisfies
X^{−1}(A) := {ω ∈ Ω ; X(ω) ∈ A} ∈ F for all A ∈ B(R).
In addition,
P_X(A) := P(X^{−1}(A)), A ∈ B(R),
is called the probability distribution of X.

Note that P_X is also a probability measure on (R, B(R)), since
P_X(∪_n A_n) = P(X^{−1}(∪_n A_n)) = P(∪_n X^{−1}(A_n)) = Σ_n P(X^{−1}(A_n)) = Σ_n P_X(A_n)


holds for disjoint {A_n} ⊂ B(R), and P_X obviously satisfies (1) and (2) of Def 1.2.

Def 1.5

Let (Ω, F, P) be a probability space. Then
F(x) := P(X ≤ x) = P_X((−∞, x])
is called the (cumulative) distribution function of the random variable X.

Prop 1.6

F(x) satisfies the following properties:
(1) x_1 ≤ x_2 ⇒ F(x_1) ≤ F(x_2),
(2) lim_{x→∞} F(x) = 1 and lim_{x→−∞} F(x) = 0,
(3) F is right continuous,
(4) 0 ≤ F(x) ≤ 1.

Proof

(1) If x_1 ≤ x_2, then (−∞, x_1] ⊂ (−∞, x_2] holds, so the claim follows from Thm 1.3 (3).
(2) Since R = ∪_n (−∞, n], Thm 1.3 (5) implies
1 = P_X(R) = P_X(∪_n (−∞, n]) = lim_{n→∞} P_X((−∞, n]) = lim_{x→∞} F(x).
The limit at −∞ follows similarly from ∩_n (−∞, −n] = ∅ and Thm 1.3 (6).

(3) Since
F(x+) := lim_{y↓x} F(y) = lim_{n→∞} F(x + 1/n) = lim_{n→∞} P_X((−∞, x + 1/n]),
and ∩_n (−∞, x + 1/n] = (−∞, x], the claim follows from Thm 1.3 (6).

(4) Obvious by the definition. Q.E.D

If a random variable X takes only countably many values, then we say the distribution P_X of X is discrete, and
p(x_i) := P(X = x_i) = P_X({x_i}), i = 1, 2, ...,
is called its probability function. By definition, p(x_i) satisfies
(1) p(x_i) ≥ 0, i = 1, 2, ...,
(2) Σ_i p(x_i) = 1,
(3) F(x) = Σ_{k: x_k ≤ x} p(x_k) for all x ∈ R.

If X takes values in a continuum, then we say P_X is continuous, and a function f(x) such that
P(X ≤ x) = F(x) := ∫_{−∞}^{x} f(t)dt for all x ∈ R
is called a probability density function. It is easy to see that f(x) satisfies
(1) dF(x)/dx = f(x),
(2) f(x) ≥ 0 for all x ∈ R,
(3) ∫_{−∞}^{∞} f(x)dx = 1.
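To make the discrete and continuous cases concrete, the sketch below builds a probability function and a density in Python and checks the properties listed above; the Poisson(2) and standard normal choices are only illustrative examples, not part of the definitions.

import math

# Discrete: Poisson(2) probability function, truncated where the tail is negligible.
lam = 2.0
xs = range(60)
p = [math.exp(-lam) * lam**k / math.factorial(k) for k in xs]
assert all(pk >= 0 for pk in p)                       # (1) nonnegativity
assert abs(sum(p) - 1.0) < 1e-10                      # (2) sums to one
F = lambda x: sum(pk for k, pk in zip(xs, p) if k <= x)
assert abs(F(3) - sum(p[:4])) < 1e-12                 # (3) F(x) = sum of p(x_k) for x_k <= x

# Continuous: standard normal density, integrated with a simple trapezoidal rule.
f = lambda t: math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
n, a, b = 200_000, -8.0, 8.0
h = (b - a) / n
total = h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))
assert f(0.0) >= 0 and abs(total - 1.0) < 1e-6        # f >= 0 and integrates to (almost) 1
print("p(x) behaves like a probability function, f(x) like a density.")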


Next we extend the above to X ∈ R^n. Let X = (X_1, X_2, ..., X_n) be a random vector on (Ω, F) with distribution P_X. For any x = (x_1, ..., x_n) ∈ R^n, we define
F(x) := P({X_1 ≤ x_1, ..., X_n ≤ x_n}) = P_X(Π_i (−∞, x_i]),
and we call this the (n-dimensional) joint distribution of X. If X is discrete, then
p(x_i) := P({X = x_i}) = P_X({x_i}), i = 1, 2, ...,
and if X is continuous, then
F(x) = ∫_{−∞}^{x_1} ··· ∫_{−∞}^{x_n} f(t_1, ..., t_n) dt_1 ··· dt_n for all x ∈ R^n.
In the continuous case the joint density also satisfies f(x) = ∂^n F(x) / (∂x_1 ··· ∂x_n).

3 Independence

Def 1.7

Let X and Y be random variables on (Ω, F). If
P({X ∈ A} ∩ {Y ∈ B}) = P({X ∈ A}) × P({Y ∈ B}) for all A, B ∈ B(R)
holds, then we say X and Y are independent.

Recall that the conditional probability of A given B is defined by P(A | B) := P(A ∩ B)/P(B). If A and B are independent, then we get
P(A | B) := P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).

Thm 1.8

X and Y are independent ⇐⇒ F_{X,Y}(x, y) = F_X(x)F_Y(y) for all x, y ∈ R, where F_{X,Y} is the joint distribution function of X and Y.

Proof

Without loss of generality, we only consider the one-dimensional case.
(⇒) By the definition of independence, we get
F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) = P(X ∈ (−∞, x])P(Y ∈ (−∞, y]) = P(X ≤ x)P(Y ≤ y) = F_X(x)F_Y(y)


for any x, y ∈ R.

(⇐) Fix a rectangle I = (a_1, b_1] × (a_2, b_2] in R². Then we get
P(X ∈ (a_1, b_1], Y ∈ (a_2, b_2]) = P((X, Y) ∈ I)
= F_{X,Y}(b_1, b_2) − F_{X,Y}(b_1, a_2) − F_{X,Y}(a_1, b_2) + F_{X,Y}(a_1, a_2)
= (F_X(b_1) − F_X(a_1)) × (F_Y(b_2) − F_Y(a_2))
= P(X ∈ (a_1, b_1]) P(Y ∈ (a_2, b_2]).
Since events of this form generate B(R), X and Y are independent. Q.E.D
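Thm 1.8 can also be seen by simulation: for two independently generated variables, the empirical joint distribution function should factorize up to Monte Carlo error. A minimal sketch, with an arbitrary uniform/exponential pair chosen purely as an example:

import random

random.seed(0)
N = 200_000
# Independent draws: X ~ Uniform(0,1), Y ~ Exponential(1).
sample = [(random.random(), random.expovariate(1.0)) for _ in range(N)]

F_joint = lambda x, y: sum(u <= x and v <= y for u, v in sample) / N
F_X = lambda x: sum(u <= x for u, _ in sample) / N
F_Y = lambda y: sum(v <= y for _, v in sample) / N

for x, y in [(0.3, 0.5), (0.7, 1.0), (0.9, 2.0)]:
    print(f"F_XY({x},{y}) = {F_joint(x, y):.4f}  vs  F_X(x)F_Y(y) = {F_X(x) * F_Y(y):.4f}")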

Thm 1.9

Let X = (X_1, X_2, ..., X_n) be a discrete or continuous random vector. Then X_1, X_2, ..., X_n are independent ⇐⇒
p_{(X_1,...,X_n)}(x^1_{k_1}, ..., x^n_{k_n}) = Π_i p_{X_i}(x^i_{k_i}) for all k_i = 1, 2, ..., i = 1, 2, ..., n (discrete case),
f_{(X_1,...,X_n)}(x_1, ..., x_n) = Π_i f_{X_i}(x_i) for all x ∈ R^n (continuous case).

Proof (n = 2)

Discrete case
(⇒) Let {(x_i, y_j) | i, j = 1, 2, ...} be the values of (X, Y). By the definition of independence we get
p_{(X,Y)}(x_i, y_j) = P(X = x_i, Y = y_j) = P(X = x_i)P(Y = y_j) = p_X(x_i) p_Y(y_j).

(⇐) Fix A, B ∈ B(R). Then we get
P(X ∈ A, Y ∈ B) = P((X, Y) ∈ A × B)
= Σ_{(k,l): (x_k, y_l) ∈ A×B} P((X, Y) = (x_k, y_l))
= Σ_{(k,l): (x_k, y_l) ∈ A×B} p_{(X,Y)}(x_k, y_l)
= Σ_{(k,l): (x_k, y_l) ∈ A×B} p_X(x_k) p_Y(y_l)
= (Σ_{k: x_k ∈ A} p_X(x_k)) (Σ_{l: y_l ∈ B} p_Y(y_l))
= P(X ∈ A) P(Y ∈ B).

Continuous case
(⇒) For any a_i < b_i, we can show that
∫_{a_1}^{b_1} ∫_{a_2}^{b_2} f_{(X,Y)}(x, y) dy dx = P(X ∈ (a_1, b_1], Y ∈ (a_2, b_2])
= P(X ∈ (a_1, b_1]) P(Y ∈ (a_2, b_2])
= (∫_{a_1}^{b_1} f_X(x)dx)(∫_{a_2}^{b_2} f_Y(y)dy)
= ∫_{a_1}^{b_1} ∫_{a_2}^{b_2} f_X(x) f_Y(y) dy dx,
which, since it holds for all a_i < b_i, is equivalent to
f_{(X,Y)}(x, y) = f_X(x) f_Y(y)


for all x, y ∈ R.

(⇐) For all a_1, a_2 ∈ R, we observe that
F_{(X,Y)}(a_1, a_2) = ∫_{−∞}^{a_1} ∫_{−∞}^{a_2} f_{(X,Y)}(x, y) dy dx
= ∫_{−∞}^{a_1} ∫_{−∞}^{a_2} f_X(x) f_Y(y) dy dx
= (∫_{−∞}^{a_1} f_X(x)dx)(∫_{−∞}^{a_2} f_Y(y)dy)
= F_X(a_1) F_Y(a_2). Q.E.D
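For the discrete case of Thm 1.9, the factorization can be verified exactly on a toy example. The sketch below takes two independent fair dice (a hypothetical choice) and also shows a dependent pair for which the product formula fails:

from fractions import Fraction

one_sixth = Fraction(1, 6)
p_X = {i: one_sixth for i in range(1, 7)}
p_Y = {j: one_sixth for j in range(1, 7)}

# Independent pair: joint pmf defined as the product of the marginals.
p_XY = {(i, j): p_X[i] * p_Y[j] for i in p_X for j in p_Y}
assert all(p_XY[i, j] == p_X[i] * p_Y[j] for (i, j) in p_XY)

# Dependent pair: Y' = X concentrates mass on the diagonal, so factorization fails.
p_dep = {(i, j): (one_sixth if i == j else Fraction(0)) for i in p_X for j in p_Y}
marg_X = {i: sum(p_dep[i, j] for j in p_Y) for i in p_X}
marg_Y = {j: sum(p_dep[i, j] for i in p_X) for j in p_Y}
assert any(p_dep[i, j] != marg_X[i] * marg_Y[j] for (i, j) in p_dep)
print("Product formula holds for the independent pair and fails for the dependent one.")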

4 Moments

The expectation of a random variable X, denoted µ, is given by
E[X] := Σ_x x p(x) (discrete case) or E[X] := ∫ x f(x)dx (continuous case),
and its variance, denoted σ², is given by
V[X] := Σ_x (x − E[X])² p(x) (discrete case) or V[X] := ∫ (x − E[X])² f(x)dx (continuous case).
Let g(·) be a (measurable) function. Then its expectation is given by
E[g(X)] := Σ_x g(x) p(x) (discrete case) or E[g(X)] := ∫ g(x) f(x)dx (continuous case).
The expectation operator is linear, that is,

E[a g(X) + b h(X)] = a E[g(X)] + b E[h(X)] for any a, b ∈ R. We often use the following formula for calculating a variance:

σ² = E[(X − µ)²] = E[X² − 2µX + µ²] = E[X²] − 2µE[X] + µ² = E[X²] − E[X]².
We also easily observe that

V[a + bX] = E[(a + bX − E[a + bX])²]
= E[(a + bX − a − bE[X])²]
= E[b²X² − 2b²X E[X] + b²E[X]²]
= b²E[X²] − b²E[X]²
= b²(E[X²] − E[X]²)
= b²V[X],
for any a, b ∈ R.
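The identities E[a + bX] = a + bE[X] and V[a + bX] = b²V[X] are easy to confirm by simulation; the sketch below uses an arbitrary normal sample and arbitrary constants a, b:

import random

random.seed(1)
N = 100_000
a, b = 3.0, -2.0
X = [random.gauss(5.0, 2.0) for _ in range(N)]        # hypothetical N(5, 4) sample
Y = [a + b * x for x in X]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print("E[a+bX] =", round(mean(Y), 4), "  a+bE[X] =", round(a + b * mean(X), 4))
print("V[a+bX] =", round(var(Y), 4), "  b^2 V[X] =", round(b * b * var(X), 4))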

We call
µ'_k := E[X^k] and µ_k := E[(X − µ)^k]
the k-th moment (about the origin) and the k-th central moment (deviation from the mean), respectively. We can show the following formula for calculating µ_k from the µ'_k:

(X − µ)^k = Σ_{j=0}^{k} (k choose j) (−µ)^j X^{k−j}
= X^k − kµX^{k−1} + (k(k−1)/2) µ² X^{k−2} − ··· + (−1)^{k−1} kµ^{k−1} X + (−1)^k µ^k,
thus we get

µ_k = E[X^k] − kµE[X^{k−1}] + (k(k−1)/2) µ² E[X^{k−2}] − ··· + (−1)^{k−1} kµ^{k−1} E[X] + (−1)^k µ^k
= µ'_k − kµ'_1 µ'_{k−1} + (k(k−1)/2)(µ'_1)² µ'_{k−2} − ··· + (−1)^{k−1} k(µ'_1)^{k−1} µ'_1 + (−1)^k (µ'_1)^k.

For example, in the cases of µ_2, µ_3 and µ_4, we get
µ_2 = E[(X − µ)²] = E[X²] − E[X]² = µ'_2 − (µ'_1)²,
µ_3 = E[(X − µ)³] = µ'_3 − 3µ'_1 µ'_2 + 2(µ'_1)³ = E[X³] − 3E[X]E[X²] + 2(E[X])³,
µ_4 = E[(X − µ)⁴] = µ'_4 − 4µ'_1 µ'_3 + 6(µ'_1)² µ'_2 − 3(µ'_1)⁴ = E[X⁴] − 4E[X]E[X³] + 6E[X]²E[X²] − 3E[X]⁴.
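These central-moment formulas can be tested numerically. The sketch below uses an Exponential(1) density (a hypothetical choice, with raw moments µ'_k = k!), computes the raw moments by a trapezoidal rule, and compares the formulas for µ_2, µ_3, µ_4 with direct integration of (x − µ)^k f(x):

import math

f = lambda x: math.exp(-x)                       # Exponential(1) density on [0, infinity)

def integrate(g, a=0.0, b=60.0, n=200_000):
    # Trapezoidal rule; accurate enough for these smooth, fast-decaying integrands.
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n)))

raw = [integrate(lambda x, k=k: x**k * f(x)) for k in range(5)]   # mu'_0, ..., mu'_4
m = raw[1]                                                        # mu = E[X]

mu2 = raw[2] - m**2
mu3 = raw[3] - 3 * m * raw[2] + 2 * m**3
mu4 = raw[4] - 4 * m * raw[3] + 6 * m**2 * raw[2] - 3 * m**4
direct = [integrate(lambda x, k=k: (x - m)**k * f(x)) for k in (2, 3, 4)]

print("from raw moments:", [round(v, 4) for v in (mu2, mu3, mu4)])   # ~ [1.0, 2.0, 9.0]
print("direct          :", [round(v, 4) for v in direct])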

The covariance of X and Y is defined by
Cov[X, Y] := E[(X − E[X])(Y − E[Y])],
and the correlation is given by
ρ(X, Y) := Cov[X, Y] / √(V[X]V[Y]).

Thm 1.10

(1) X and Y are independent ⇒ ρ(X, Y) = 0.
(2) −1 ≤ ρ(X, Y) ≤ 1.

Proof


(1) We can rewrite
Cov[X, Y] = E[(X − µ_X)(Y − µ_Y)] = E[XY − µ_X Y − X µ_Y + µ_X µ_Y] = E[XY] − E[X]E[Y],
and since E[XY] = E[X]E[Y] by the independence assumption, we get Cov[X, Y] = 0.

(2) The Cauchy–Schwarz inequality implies
|Σ_i a_i b_i| ≤ √(Σ_i a_i²) √(Σ_i b_i²).
Thus, setting a_i = x_i − µ_X and b_i = y_i − µ_Y (with each term weighted by its probability), we get
|Cov(X, Y)|² ≤ V[X]V[Y],
and hence −1 ≤ ρ(X, Y) ≤ 1. Q.E.D
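Both claims of Thm 1.10 are easy to see in simulation: an independently generated pair has sample correlation near zero, while even a strongly dependent pair keeps ρ inside [−1, 1]. A minimal sketch with arbitrary Gaussian samples:

import random

random.seed(2)
N = 200_000

def corr(xs, ys):
    mx, my = sum(xs) / N, sum(ys) / N
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / N
    vx = sum((x - mx) ** 2 for x in xs) / N
    vy = sum((y - my) ** 2 for y in ys) / N
    return cov / (vx * vy) ** 0.5

X = [random.gauss(0, 1) for _ in range(N)]
Y = [random.gauss(0, 1) for _ in range(N)]            # independent of X
Z = [-2 * x + 0.1 * random.gauss(0, 1) for x in X]    # strongly (negatively) dependent on X

print("rho(X, Y) ~", round(corr(X, Y), 3))            # near 0
print("rho(X, Z) ~", round(corr(X, Z), 3))            # near -1, but never below -1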

5 Moment generating functions

Def 1.11

Let X be a discrete or continuous random variable. We call
M(t) := E[e^{tX}] = Σ_x e^{tx} p(x) (discrete case), M(t) := E[e^{tX}] = ∫_{−∞}^{∞} e^{tx} f(x)dx (continuous case),
the moment generating function (m.g.f.) of X.

Calculating moments using the m.g.f.:
d^k M(t)/dt^k |_{t=0} = E[X^k], k = 1, 2, ....
For the continuous case,
dM(t)/dt = d/dt ∫_{−∞}^{∞} e^{tx} f(x)dx = ∫_{−∞}^{∞} ∂/∂t e^{tx} f(x)dx = ∫_{−∞}^{∞} x e^{tx} f(x)dx,
so
dM(t)/dt |_{t=0} = ∫_{−∞}^{∞} x f(x)dx = E[X],
d²M(t)/dt² |_{t=0} = ∫_{−∞}^{∞} x² e^{tx} f(x)dx |_{t=0} = E[X²],
V[X] = E[X²] − E[X]² = d²M(t)/dt² |_{t=0} − (dM(t)/dt |_{t=0})².
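The relation E[X^k] = d^k M(t)/dt^k |_{t=0} can be checked by numerical differentiation. The sketch below uses the m.g.f. of an Exponential(rate = 2) variable, M(t) = 2/(2 − t) for t < 2, as a hypothetical example:

# m.g.f. of an Exponential(rate = 2) variable: M(t) = 2 / (2 - t), valid for t < 2.
M = lambda t: 2.0 / (2.0 - t)

h = 1e-4
M1 = (M(h) - M(-h)) / (2 * h)              # central difference ~ M'(0)  = E[X]
M2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2    # second difference  ~ M''(0) = E[X^2]

print("E[X]   ~", round(M1, 5), "(exact 0.5)")
print("E[X^2] ~", round(M2, 5), "(exact 0.5)")
print("V[X]   ~", round(M2 - M1**2, 5), "(exact 0.25)")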


Example

X has a normal distribution if it has the following p.d.f.:
f(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}, x ∈ R.
Then its m.g.f. is given by

M(t) = ∫_{−∞}^{∞} (e^{tx}/√(2πσ²)) e^{−(x−µ)²/(2σ²)} dx,

and can be calculated as follows. The exponent of the integrand is
tx − (x − µ)²/(2σ²) = −(1/(2σ²))(x² − 2µx + µ² − 2σ²tx)
= −(1/(2σ²))(x² − 2x(µ + σ²t) + (µ + σ²t)²) + ((µ + σ²t)² − µ²)/(2σ²)
= −(x − (µ + σ²t))²/(2σ²) + µt + σ²t²/2,
thus we get

M(t) = e^{µt + σ²t²/2} ∫_{−∞}^{∞} (1/√(2πσ²)) e^{−(x − µ − σ²t)²/(2σ²)} dx = e^{µt + σ²t²/2},
since the remaining integrand is the density of a N(µ + σ²t, σ²) distribution and integrates to 1.

Using this m.g.f., the mean and the variance of a normal distribution can be calculated as follows:
dM(t)/dt |_{t=0} = exp(µt + σ²t²/2)(µ + σ²t) |_{t=0} = µ = E[X],
d²M(t)/dt² |_{t=0} = (σ² exp(µt + σ²t²/2) + exp(µt + σ²t²/2)(µ + σ²t)²) |_{t=0} = σ² + µ² = E[X²],
V[X] = d²M(t)/dt² |_{t=0} − (dM(t)/dt |_{t=0})² = σ² + µ² − µ² = σ².
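The closed form M(t) = exp(µt + σ²t²/2) can be verified against the defining integral by numerical quadrature; the parameter values below are arbitrary:

import math

mu, sigma = 1.5, 0.7

def mgf_numeric(t, n=400_000, width=10.0):
    # Trapezoidal rule for E[e^{tX}], X ~ N(mu, sigma^2), over mu +/- width*sigma.
    a, b = mu - width * sigma, mu + width * sigma
    h = (b - a) / n
    g = lambda x: math.exp(t * x - (x - mu) ** 2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n)))

for t in (-1.0, 0.5, 2.0):
    print(t, round(mgf_numeric(t), 6), round(math.exp(mu * t + sigma**2 * t**2 / 2), 6))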

Example

Next we calculate the moments of a binomial distribution. First we derive its m.g.f. as follows, writing q := 1 − p:
M(t) = Σ_x e^{tx} (n choose x) p^x q^{n−x}
= Σ_x (n choose x) (e^t p)^x q^{n−x}
= (e^t p + q)^n.


Using this gives
dM(t)/dt |_{t=0} = n(e^t p + q)^{n−1} e^t p |_{t=0} = np = E[X],
d²M(t)/dt² |_{t=0} = (n(e^t p + q)^{n−1} e^t p + n(n − 1)(e^t p + q)^{n−2}(e^t p)²) |_{t=0} = np + n(n − 1)p²,
V[X] = d²M(t)/dt² |_{t=0} − (dM(t)/dt |_{t=0})² = np + n(n − 1)p² − n²p² = np(1 − p) = npq.
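A quick check of the binomial m.g.f. and the resulting moments (the values n = 10, p = 0.3 are arbitrary):

import math

n, p = 10, 0.3
q = 1 - p
pmf = [math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

# m.g.f. from the definition vs. the closed form (e^t p + q)^n.
for t in (-0.5, 0.0, 0.8):
    from_def = sum(math.exp(t * k) * pk for k, pk in enumerate(pmf))
    closed = (math.exp(t) * p + q) ** n
    print(t, round(from_def, 10), round(closed, 10))

# Moments: E[X] = np and V[X] = npq.
EX = sum(k * pk for k, pk in enumerate(pmf))
EX2 = sum(k * k * pk for k, pk in enumerate(pmf))
print("E[X] =", round(EX, 10), "  np  =", n * p)
print("V[X] =", round(EX2 - EX**2, 10), "  npq =", round(n * p * q, 10))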

Thm 1.12

Let X and Y be independent random variables and define Z := X + Y. Then the m.g.f. of Z is given by
M_Z(t) = M_X(t) M_Y(t).
Moreover, if X_1, ..., X_n are independent and Z := Σ_{i=1}^n X_i, then the m.g.f. of Z is given by
M_Z(t) = Π_{i=1}^n M_{X_i}(t).

Proof

We only consider the case n = 2. Let X and Y be independent random variables. Then e^{tX} and e^{tY} are also independent. Thus we get

M_Z(t) = E[e^{tZ}] = E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}] E[e^{tY}] = M_X(t) M_Y(t),
as required. Q.E.D
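Thm 1.12 can also be seen empirically: for independently simulated X and Y, the sample estimate of M_{X+Y}(t) is close to the product of the individual estimates. A minimal sketch with an arbitrary normal/uniform pair:

import math
import random

random.seed(3)
N = 200_000
X = [random.gauss(0, 1) for _ in range(N)]        # X ~ N(0, 1)
Y = [random.random() for _ in range(N)]           # Y ~ Uniform(0, 1), independent of X
Z = [x + y for x, y in zip(X, Y)]

mgf_hat = lambda sample, t: sum(math.exp(t * s) for s in sample) / N

for t in (-0.5, 0.3, 1.0):
    print(f"t={t}: M_Z ~ {mgf_hat(Z, t):.4f},  M_X * M_Y ~ {mgf_hat(X, t) * mgf_hat(Y, t):.4f}")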

Thm 1.13

Let F_X be the distribution function of X and M_X(t) its m.g.f. Then for any continuity points a < b of F_X,
F_X(b) − F_X(a) = lim_{T→∞} (1/2π) ∫_{−T}^{T} [exp(−iat) − exp(−ibt)] / (it) · M_X(it) dt
holds (this is the Lévy inversion formula, stated here through M_X(it), i.e. the characteristic function of X).

I will skip the proof of this theorem. However, it says that analysing a distribution is equivalent to analysing its m.g.f.


6 Texts

• Statistics

Young and Smith (2010) Essentials of Statistical Inference
Keener (2010) Theoretical Statistics: Topics for a Core Course

• Linear algebra

Axler, Sheldon (2006) Linear Algebra Done Right (Undergraduate Texts in Mathematics)
Harville, David (1997) Matrix Algebra From a Statistician's Perspective

• Measure theory and probability theory

Capinski, Marek and Kopp, Peter (2008) Measure, Integral and Probability
Rosenthal (2006) A First Look at Rigorous Probability Theory
