TA session #1
Shouto Yonekura
April 17, 2016
Contents
1 Probability
2 Random variables, Probability density functions and Distribution functions
3 Independence
4 Moments
5 Moment generating functions
6 Texts
1 Probability
Let Ω be an abstract (nonempty) space and let F ⊆ 2^Ω, where 2^Ω is the power set of Ω.

Def 1.1
F is a σ-algebra if it satisfies
(1) Ω ∈ F,
(2) A ∈ F ⇒ A^c ∈ F,
(3) A_1, A_2, …, A_n, … ∈ F ⇒ ∪_{i=1}^∞ A_i ∈ F.
We call (Ω, F) a measurable space. Next, we define a probability measure.

Def 1.2
A probability measure is a function P : F → [0, 1] such that
(1) P(Ω) = 1,
(2) A_1, A_2, …, A_n, … ∈ F with A_i ∩ A_j = ∅ for all i ≠ j ⇒ P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).
Property (2) is called countable additivity, and P(A) is called the probability of the event A ∈ F. We call (Ω, F, P) a probability space.
Thm 1.3
Let (Ω, F, P) be a probability space and A, B, A_1, A_2, … ∈ F.
(1) P(A^c) = 1 − P(A)
(2) P(∅) = 0
(3) A ⊂ B ⇒ P(A) ≤ P(B)
(4) P(∪_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i)
(5) A_n ⊆ A_{n+1} for all n ⇒ lim_{n→∞} P(A_n) = P(∪_n A_n)
(6) A_n ⊇ A_{n+1} for all n and P(A_1) < ∞ ⇒ lim_{n→∞} P(A_n) = P(∩_n A_n)

Proof
(1) Since P(Ω) = 1 and Ω = A ∪ A^c with A ∩ A^c = ∅, we get 1 = P(Ω) = P(A ∪ A^c) = P(A) + P(A^c).
(2) Since Ω^c = ∅ ∈ F, we get P(∅) = 1 − P(Ω) = 0.
(3) Let A ⊂ B. Then P(B) = P(A ∪ (B − A)) = P(A) + P(B − A). Thus P(A) ≤ P(B).
(4) Set B_1 := A_1 and B_n := A_n − ∪_{i=1}^{n−1} B_i for n ≥ 2. Then the {B_n} are disjoint, B_n ⊆ A_n, and ∪_i A_i = ∪_i B_i. Therefore
P(∪_i A_i) = P(∪_i B_i) = Σ_i P(B_i) ≤ Σ_i P(A_i).
(5) Set B_1 := A_1 and B_i := A_i − A_{i−1} for i > 1. Then ∪_i A_i = ∪_i B_i and the {B_n} are disjoint. Thus
P(∪_i A_i) = P(∪_i B_i)
= Σ_i P(B_i)
= lim_{n→∞} Σ_{i=1}^n P(B_i)
= lim_{n→∞} P(∪_{i=1}^n B_i)
= lim_{n→∞} P(A_n).
(6) Left as an exercise. Q.E.D
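The properties above are easy to sanity-check numerically on a finite space. The following Python snippet (a minimal illustration, not part of the original notes) uses a six-point uniform space; the events A and B are arbitrary illustrative choices.

```python
# Toy probability space: Omega = {1,...,6} with the uniform measure
# and F = the full power set of Omega.
omega = frozenset(range(1, 7))

def P(A):
    return len(A) / len(omega)

A = frozenset({1, 2, 3})
B = frozenset({1, 2, 3, 4})

assert P(omega - A) == 1 - P(A)        # Thm 1.3 (1), complement rule
assert P(frozenset()) == 0             # Thm 1.3 (2)
assert A <= B and P(A) <= P(B)         # Thm 1.3 (3), monotonicity
assert P(A | B) <= P(A) + P(B)         # Thm 1.3 (4), subadditivity
```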
2 Random variables, Probability density functions and Distribution functions
We call the smallest σ-algebra which contains all intervals in R the Borel algebra, denoted by B(R).

Def 1.4
Let (Ω, F, P) be a probability space. We say X is a random variable if it satisfies
∀A ∈ B(R), X^{-1}(A) := {ω ∈ Ω : X(ω) ∈ A} ∈ F.
In addition,
P_X(A) := P(X^{-1}(A)), A ∈ B(R)
is called the probability distribution of X.
Note that P_X is also a probability measure on (R, B(R)), since
P_X(∪_n A_n) = P(X^{-1}(∪_n A_n))
= P(∪_n X^{-1}(A_n))
= Σ_n P(X^{-1}(A_n))
= Σ_n P_X(A_n)
holds for disjoint {A_n} ⊂ B(R), and P_X obviously satisfies (1) and (2) of Def 1.2.

Def 1.5
Let (Ω, F, P) be a probability space and X a random variable on it. Then
F(x) := P(X ≤ x) = P_X((−∞, x])
is called the (cumulative) distribution function of the random variable X.

Prop 1.6
F(x) satisfies the following properties:
(1) x_1 ≤ x_2 ⇒ F(x_1) ≤ F(x_2)
(2) lim_{x→∞} F(x) = 1, lim_{x→−∞} F(x) = 0
(3) F is right continuous
(4) 0 ≤ F(x) ≤ 1

Proof
(1) If x_1 ≤ x_2, then (−∞, x_1] ⊂ (−∞, x_2], so the claim follows from Thm 1.3 (3).
(2) Since R = ∪_n (−∞, n], Thm 1.3 (5) implies
1 = P_X(R) = P_X(∪_n (−∞, n]) = lim_{n→∞} P_X((−∞, n]) = lim_{x→∞} F(x).
The second limit follows similarly from ∩_n (−∞, −n] = ∅ and Thm 1.3 (6).
(3) Since
F(x+) := lim_{y↓x} F(y) = lim_{n→∞} F(x + 1/n) = lim_{n→∞} P_X((−∞, x + 1/n])
and ∩_n (−∞, x + 1/n] = (−∞, x], it follows from Thm 1.3 (6) that F(x+) = P_X((−∞, x]) = F(x).
(4) Obvious by the definition. Q.E.D
If a random variable X takes only countably many values, then we say P_X is discrete, and
p(x_i) := P(X = x_i) = P_X({x_i}), i = 1, 2, …
is called a probability function. By the definition, p(x_i) satisfies
(1) p(x_i) ≥ 0, i = 1, 2, …,
(2) Σ_i p(x_i) = 1,
(3) F(x) = Σ_{k: x_k ≤ x} p(x_k), ∀x ∈ R.
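As a small illustration (not from the notes), the snippet below builds the distribution function of a discrete variable as the cumulative sum in property (3); the fair-die pmf is an arbitrary example.

```python
# Probability function of a fair die and its cdf F(x) = sum_{k: x_k <= x} p(x_k).
pmf = {x: 1 / 6 for x in range(1, 7)}

def F(x):
    """Distribution function built from the probability function."""
    return sum(p for xk, p in pmf.items() if xk <= x)

assert abs(sum(pmf.values()) - 1) < 1e-12                  # property (2)
assert F(0) == 0 and abs(F(3) - 0.5) < 1e-12 and abs(F(6) - 1) < 1e-12
```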
If the values X takes are continuous, then we say P_X is continuous, and a function f(x) such that
P(X ≤ x) = F(x) := ∫_{−∞}^x f(t) dt, ∀x ∈ R
is called a probability density function. It is easy to see that f(x) satisfies
(1) dF(x)/dx = f(x),
(2) f(x) ≥ 0, ∀x ∈ R,
(3) ∫_{−∞}^∞ f(x) dx = 1.
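Properties (1) and (3) can be checked symbolically. The sketch below (not part of the notes) uses the standard normal density as an illustrative choice.

```python
import sympy as sp

# Check dF/dx = f and total mass 1 for the standard normal density.
x, t = sp.symbols('x t', real=True)
f = sp.exp(-t**2 / 2) / sp.sqrt(2 * sp.pi)              # density in variable t

F = sp.integrate(f, (t, -sp.oo, x))                     # F(x) = int_{-oo}^x f(t) dt
assert sp.simplify(sp.diff(F, x) - f.subs(t, x)) == 0   # (1) dF/dx = f(x)
assert sp.integrate(f, (t, -sp.oo, sp.oo)) == 1         # (3) integrates to 1
```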
Next we extend the above to X ∈ R^n. Let X = (X_1, X_2, …, X_n) be a random vector on (Ω, F, P) with distribution P_X. For any x = (x_1, …, x_n) ∈ R^n, we define
F(x) := P(X_1 ≤ x_1, …, X_n ≤ x_n) = P_X(∏_i (−∞, x_i]),
and we call this the (n-dimensional) joint distribution of X. If X is discrete, then
p(x_i) := P(X = x_i) = P_X({x_i}), i = 1, 2, …,
and if X is continuous, then
F(x) = ∫_{−∞}^{x_1} … ∫_{−∞}^{x_n} f(t_1, …, t_n) dt_1 … dt_n, ∀x ∈ R^n.
In the continuous case, f(x) = ∂^n F(x) / (∂x_1 … ∂x_n) also holds.
3 Independence
Def 1.7
Let X and Y be random variables on (Ω, F, P). If
∀A, B ∈ B(R), P({X ∈ A} ∩ {Y ∈ B}) = P(X ∈ A) × P(Y ∈ B)
holds, then we say X and Y are independent.
Recall that the conditional probability of A given B is given by P(A | B) := P(A ∩ B)/P(B). If A and B are independent, then we get
P(A | B) = P(A ∩ B)/P(B)
= P(A)P(B)/P(B)
= P(A).
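A quick finite check of P(A | B) = P(A) for independent events (not from the notes): the sample space is two fair dice, and the events A, B are illustrative choices.

```python
from itertools import product

# Two independent fair dice; uniform measure on the 36 outcomes.
omega = list(product(range(1, 7), repeat=2))

def P(event):
    return sum(1 for w in omega if event(w)) / len(omega)

A = lambda w: w[0] % 2 == 0          # first die is even
B = lambda w: w[1] <= 2              # second die is at most 2

p_cond = P(lambda w: A(w) and B(w)) / P(B)   # P(A | B) = P(A ∩ B) / P(B)
assert abs(p_cond - P(A)) < 1e-12            # independence gives P(A | B) = P(A)
```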
Thm 1.8
X and Y are independent ⇔ F_{X,Y}(x, y) = F_X(x) F_Y(y) for all x, y, where F_{X,Y} is the joint distribution of X and Y.

Proof
Without loss of generality, we only consider the one-dimensional case.
(⇒) By the definition of independence, we get
F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y)
= P(X ∈ (−∞, x]) P(Y ∈ (−∞, y])
= P(X ≤ x) P(Y ≤ y)
= F_X(x) F_Y(y),
for any x, y ∈ R.
(⇐) Fix I = (a_1, b_1] × (a_2, b_2] in R². Then we get
P(X ∈ (a_1, b_1], Y ∈ (a_2, b_2]) = P((X, Y) ∈ I)
= Δ²_I F_{X,Y}(x, y)
= F_{X,Y}(b_1, b_2) − F_{X,Y}(b_1, a_2) − F_{X,Y}(a_1, b_2) + F_{X,Y}(a_1, a_2)
= (F_X(b_1) − F_X(a_1))(F_Y(b_2) − F_Y(a_2))     [using F_{X,Y} = F_X F_Y]
= Δ_{(a_1,b_1]} F_X(x) × Δ_{(a_2,b_2]} F_Y(y)
= P(X ∈ (a_1, b_1]) P(Y ∈ (a_2, b_2]).
Since such rectangles generate B(R²), the claim follows. Q.E.D
Thm 1.9
Let X = (X_1, X_2, …, X_n) be a discrete or continuous random vector. Then X_1, X_2, …, X_n are independent ⇔
p_{(X_1,…,X_n)}(x¹_{k_1}, …, xⁿ_{k_n}) = ∏_i p_{X_i}(x^i_{k_i}), ∀k_i = 1, 2, …, i = 1, 2, …, n (discrete case),
f_{(X_1,…,X_n)}(x_1, …, x_n) = ∏_i f_{X_i}(x_i), ∀x ∈ R^n (continuous case).
Proof (n = 2)
Discrete case:
(⇒) Let {(x_i, y_j) : i, j = 1, 2, …} be the values of (X, Y). By the definition we get
p_{(X,Y)}(x_i, y_j) = P(X = x_i, Y = y_j)
= P(X = x_i) P(Y = y_j)
= p_X(x_i) p_Y(y_j).
(⇐) Fix A, B ∈ B(R). Then we get
P(X ∈ A, Y ∈ B) = P((X, Y) ∈ A × B)
= Σ_{(k,l): (x_k, y_l) ∈ A×B} P(X = x_k, Y = y_l)
= Σ_{(k,l): (x_k, y_l) ∈ A×B} p_{(X,Y)}(x_k, y_l)
= Σ_{(k,l): (x_k, y_l) ∈ A×B} p_X(x_k) p_Y(y_l)
= P(X ∈ A) P(Y ∈ B).
Continuous case:
(⇒) For any a_i < b_i, we can show that
∫_{a_1}^{b_1} ∫_{a_2}^{b_2} f_{(X,Y)}(x, y) dx dy = P(X ∈ (a_1, b_1], Y ∈ (a_2, b_2])
= P(X ∈ (a_1, b_1]) P(Y ∈ (a_2, b_2])
= (∫_{a_1}^{b_1} f_X(x) dx)(∫_{a_2}^{b_2} f_Y(y) dy)
= ∫_{a_1}^{b_1} ∫_{a_2}^{b_2} f_X(x) f_Y(y) dx dy.
Since this holds for all a_i < b_i, it follows that f_{(X,Y)}(x, y) = f_X(x) f_Y(y) for (almost) all x, y ∈ R.
(⇐) For all a_1, a_2 ∈ R, we observe that
F_{(X,Y)}(a_1, a_2) = ∫_{−∞}^{a_1} ∫_{−∞}^{a_2} f_{(X,Y)}(x, y) dy dx
= ∫_{−∞}^{a_1} ∫_{−∞}^{a_2} f_X(x) f_Y(y) dy dx
= (∫_{−∞}^{a_1} f_X(x) dx)(∫_{−∞}^{a_2} f_Y(y) dy)
= F_X(a_1) F_Y(a_2). Q.E.D
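A Monte Carlo sketch of the factorization in Thm 1.8 (not part of the notes): for X and Y drawn independently, the empirical joint distribution function approximately factorizes. The distributions and the test point (x0, y0) are illustrative choices.

```python
import numpy as np

# F_{X,Y}(x0, y0) should match F_X(x0) F_Y(y0) up to sampling noise.
rng = np.random.default_rng(0)
n = 200_000
X = rng.normal(size=n)
Y = rng.exponential(size=n)          # generated independently of X

x0, y0 = 0.5, 1.0
F_joint = np.mean((X <= x0) & (Y <= y0))
F_prod = np.mean(X <= x0) * np.mean(Y <= y0)
assert abs(F_joint - F_prod) < 0.01  # agreement up to ~1/sqrt(n)
```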
4 Moments
The expectation of a random variable X, denoted µ, is given by
E[X] := Σ_x x p(x) (discrete case), E[X] := ∫ x f(x) dx (continuous case),
and its variance, denoted σ², is given by
V[X] := Σ_x (x − E[X])² p(x) (discrete case), V[X] := ∫ (x − E[X])² f(x) dx (continuous case).
Let g(·) be a (measurable) function. Then its expectation is given by
E[g(X)] := Σ_x g(x) p(x) (discrete case), E[g(X)] := ∫ g(x) f(x) dx (continuous case).
The expectation has linearity, that is,
E[a g(X) + b h(X)] = a E[g(X)] + b E[h(X)],
for any a, b ∈ R. We often use the following formula for calculating a variance:
σ² = E[(X − µ)²] = E[X² − 2µX + µ²]
= E[X²] − 2µE[X] + µ²
= E[X²] − E[X]².
We also easily observe that, for any a, b ∈ R,
V[a + bX] = E[(a + bX − E[a + bX])²]
= E[(a + bX − a − bE[X])²]
= E[b²X² − 2b²XE[X] + b²E[X]²]
= b²E[X²] − b²E[X]²
= b²(E[X²] − E[X]²)
= b²V[X].
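Both identities are easy to verify numerically. In the sketch below (not from the notes), the gamma distribution and the constants a, b are arbitrary illustrative choices.

```python
import numpy as np

# Check sigma^2 = E[X^2] - E[X]^2 and V[a + bX] = b^2 V[X] on samples.
rng = np.random.default_rng(1)
X = rng.gamma(shape=2.0, scale=3.0, size=1_000_000)
a, b = 5.0, -2.0

assert np.isclose(np.var(X), np.mean(X**2) - np.mean(X)**2)
assert np.isclose(np.var(a + b * X), b**2 * np.var(X))
```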
We call
µ′_k := E[X^k], µ_k := E[(X − µ)^k],
the k-th moment and the k-th central moment (moment about the mean), respectively. We can show the following formula for calculating µ_k from the µ′_k. By the binomial theorem,
(X − µ)^k = Σ_{j=0}^{k} C(k, j) (−µ)^j X^{k−j}
= X^k − kµX^{k−1} + (k(k−1)/2) µ² X^{k−2} − … + (−1)^{k−1} k µ^{k−1} X + (−1)^k µ^k,
where C(k, j) denotes the binomial coefficient. Thus we get
µ_k = E[X^k] − kµ E[X^{k−1}] + (k(k−1)/2) µ² E[X^{k−2}] − … + (−1)^{k−1} k µ^{k−1} E[X] + (−1)^k µ^k
= µ′_k − k µ′_1 µ′_{k−1} + (k(k−1)/2) (µ′_1)² µ′_{k−2} − … + (−1)^{k−1} k (µ′_1)^{k−1} µ′_1 + (−1)^k (µ′_1)^k.
For example, in the cases of µ_2, µ_3 and µ_4, we get
µ_2 = E[(X − µ)²] = E[X²] − E[X]² = µ′_2 − (µ′_1)²,
µ_3 = E[(X − µ)³] = µ′_3 − 3µ′_1µ′_2 + 2(µ′_1)³ = E[X³] − 3E[X]E[X²] + 2(E[X])³,
µ_4 = E[(X − µ)⁴] = µ′_4 − 4µ′_1µ′_3 + 6(µ′_1)²µ′_2 − 3(µ′_1)⁴ = E[X⁴] − 4E[X]E[X³] + 6E[X]²E[X²] − 3E[X]⁴.
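The µ_3 and µ_4 formulas can be confirmed symbolically. The sketch below (not from the notes) uses an Exp(1) variable, an illustrative choice for which E[X^k] = k!.

```python
import sympy as sp

# Verify the mu_3 and mu_4 expansions for X ~ Exp(1).
x = sp.symbols('x', nonnegative=True)
f = sp.exp(-x)                                        # Exp(1) density

def m(k):                                             # mu'_k = E[X^k]
    return sp.integrate(x**k * f, (x, 0, sp.oo))

mu = m(1)
mu3 = sp.integrate((x - mu)**3 * f, (x, 0, sp.oo))    # E[(X - mu)^3]
mu4 = sp.integrate((x - mu)**4 * f, (x, 0, sp.oo))    # E[(X - mu)^4]

assert sp.simplify(mu3 - (m(3) - 3*m(1)*m(2) + 2*m(1)**3)) == 0
assert sp.simplify(mu4 - (m(4) - 4*m(1)*m(3) + 6*m(1)**2*m(2) - 3*m(1)**4)) == 0
```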
The covariance of X and Y is defined by
Cov[X, Y] := E[(X − E[X])(Y − E[Y])],
and the correlation is given by
ρ(X, Y) := Cov[X, Y] / (√V[X] √V[Y]).
Thm 1.10
(1) X and Y are independent ⇒ ρ(X, Y) = 0.
(2) −1 ≤ ρ(X, Y) ≤ 1.
Proof
(1) We can rewrite
Cov[X, Y] = E[(X − µ_X)(Y − µ_Y)]
= E[XY − µ_X Y − µ_Y X + µ_X µ_Y]
= E[XY] − E[X]E[Y].
Since E[XY] = E[X]E[Y] by the assumption of independence, we get Cov[X, Y] = 0.
(2) The Cauchy–Schwarz inequality implies
|Σ_i a_i b_i| ≤ √(Σ_i a_i²) √(Σ_i b_i²).
Thus if we set a_i = x_i − µ_X and b_i = y_i − µ_Y (in the discrete case; the continuous case uses the expectation form E[UV]² ≤ E[U²]E[V²]), then we get
|Cov(X, Y)|² ≤ V(X)V(Y). Q.E.D
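A Monte Carlo sketch of both claims (not part of the notes); the two sampling distributions are illustrative choices.

```python
import numpy as np

# Independently drawn samples give correlation near 0; |rho| <= 1 always.
rng = np.random.default_rng(2)
X = rng.normal(size=100_000)
Y = rng.uniform(size=100_000)               # drawn independently of X

rho = np.corrcoef(X, Y)[0, 1]
assert abs(rho) < 0.02                      # ~ 0 up to sampling noise
assert np.isclose(np.corrcoef(X, 2 * X + 1)[0, 1], 1.0)  # boundary case rho = 1
```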
5 Moment generating functions
Def 1.11
Let X be a discrete or continuous random variable. We call
M(t) := E[e^{tX}] = Σ_x e^{tx} p(x) (discrete case),
M(t) := E[e^{tX}] = ∫_{−∞}^∞ e^{tx} f(x) dx (continuous case),
the moment generating function of X.
Calculating moments using the m.g.f.:
d^k M(t)/dt^k |_{t=0} = E[X^k], k = 1, 2, …
Indeed, in the continuous case,
dM(t)/dt = d/dt ∫_{−∞}^∞ e^{tx} f(x) dx
= ∫_{−∞}^∞ ∂/∂t e^{tx} f(x) dx
= ∫_{−∞}^∞ x e^{tx} f(x) dx,
so
dM(t)/dt |_{t=0} = ∫_{−∞}^∞ x f(x) dx = E[X],
d²M(t)/dt² |_{t=0} = ∫_{−∞}^∞ x² e^{tx} f(x) dx |_{t=0} = E[X²],
and therefore
V[X] = E[X²] − E[X]² = d²M(t)/dt² |_{t=0} − (dM(t)/dt |_{t=0})².
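The differentiate-at-zero recipe can be carried out symbolically. The following sketch (not from the notes) uses an Exp(lam) variable as an illustrative example; its m.g.f. is lam/(lam − t) for t < lam.

```python
import sympy as sp

# Moments of X ~ Exp(lam) from its m.g.f. by differentiation at t = 0.
t, x = sp.symbols('t x', real=True)
lam = sp.symbols('lam', positive=True)

f = lam * sp.exp(-lam * x)                                        # Exp(lam) density
M = sp.integrate(sp.exp(t * x) * f, (x, 0, sp.oo), conds='none')  # = lam/(lam - t)

EX = sp.diff(M, t).subs(t, 0)          # M'(0)  = E[X]   = 1/lam
EX2 = sp.diff(M, t, 2).subs(t, 0)      # M''(0) = E[X^2] = 2/lam^2
assert sp.simplify(EX - 1 / lam) == 0
assert sp.simplify(EX2 - EX**2 - 1 / lam**2) == 0                 # V[X] = 1/lam^2
```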
Example
X is normally distributed if it has the following p.d.f.:
f(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}.
Then its m.g.f. is given by
M(t) = ∫_{−∞}^∞ e^{tx} (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} dx,
and can be calculated as follows. The exponent is
tx − (x − µ)²/(2σ²) = −(1/(2σ²))(x² − 2µx + µ² − 2σ²tx)
= −(1/(2σ²))(x² − 2x(µ + σ²t) + (µ + σ²t)²) + (µ + σ²t)²/(2σ²) − µ²/(2σ²)
= −(1/(2σ²))(x − µ′)² + µt + σ²t²/2,
where µ′ := µ + σ²t. Thus we get
M(t) = e^{µt + σ²t²/2} ∫_{−∞}^∞ (1/√(2πσ²)) e^{−(x−µ′)²/(2σ²)} dx = e^{µt + σ²t²/2},
since the remaining integrand is the N(µ′, σ²) density and integrates to 1.
Using this m.g.f., the mean and the variance of a normal distribution can be calculated as follows:
dM(t)/dt |_{t=0} = exp(µt + σ²t²/2)(µ + σ²t) |_{t=0} = µ = E[X],
d²M(t)/dt² |_{t=0} = (σ² exp(µt + σ²t²/2) + exp(µt + σ²t²/2)(µ + σ²t)²) |_{t=0} = σ² + µ² = E[X²],
V[X] = d²M(t)/dt² |_{t=0} − (dM(t)/dt |_{t=0})²
= σ² + µ² − µ²
= σ².
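The whole derivation can be double-checked symbolically, as in this sketch (not part of the notes):

```python
import sympy as sp

# Confirm M(t) = exp(mu t + sigma^2 t^2 / 2) and the moment calculations.
x, t, mu = sp.symbols('x t mu', real=True)
sigma = sp.symbols('sigma', positive=True)

f = sp.exp(-(x - mu)**2 / (2 * sigma**2)) / sp.sqrt(2 * sp.pi * sigma**2)
M = sp.simplify(sp.integrate(sp.exp(t * x) * f, (x, -sp.oo, sp.oo)))

assert sp.simplify(M - sp.exp(mu * t + sigma**2 * t**2 / 2)) == 0
assert sp.simplify(sp.diff(M, t).subs(t, 0) - mu) == 0                     # E[X]
assert sp.simplify(sp.diff(M, t, 2).subs(t, 0) - (sigma**2 + mu**2)) == 0  # E[X^2]
```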
Example
Next we calculate the moments of a binomial distribution. First we derive its m.g.f. as follows (with q := 1 − p):
M(t) = Σ_{x=0}^n e^{tx} C(n, x) p^x q^{n−x}
= Σ_{x=0}^n C(n, x) (pe^t)^x q^{n−x}
= (pe^t + q)^n.
Using this gives
dM(t)/dt |_{t=0} = n(pe^t + q)^{n−1} pe^t |_{t=0} = np = E[X],
d²M(t)/dt² |_{t=0} = (n(pe^t + q)^{n−1} pe^t + n(n−1)(pe^t + q)^{n−2}(pe^t)²) |_{t=0} = np + n(n−1)p²,
V[X] = d²M(t)/dt² |_{t=0} − (dM(t)/dt |_{t=0})²
= np + n(n−1)p² − n²p²
= np(1 − p) = npq.
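A symbolic sketch of the same computation (not from the notes):

```python
import sympy as sp

# The binomial m.g.f. (p e^t + q)^n gives E[X] = np and V[X] = npq.
t = sp.symbols('t', real=True)
n = sp.symbols('n', positive=True, integer=True)
p = sp.symbols('p', positive=True)
q = 1 - p

M = (p * sp.exp(t) + q)**n
EX = sp.diff(M, t).subs(t, 0)
EX2 = sp.diff(M, t, 2).subs(t, 0)

assert sp.simplify(EX - n * p) == 0
assert sp.simplify(EX2 - EX**2 - n * p * q) == 0   # V[X] = np(1 - p)
```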
Thm 1.12
Let X and Y be independent random variables and define Z := X + Y. Then the m.g.f. of Z is given by
M_Z(t) = M_X(t) M_Y(t).
More generally, if X_1, …, X_n are independent and Z_n := Σ_{i=1}^n X_i, then the m.g.f. of Z_n is given by
M_{Z_n}(t) = ∏_{i=1}^n M_{X_i}(t).
Proof
We only consider the case of n = 2. Let X and Y be independent random variables. Then e^{tX} and e^{tY} are also independent. Thus we get
M_Z(t) = E[e^{tZ}]
= E[e^{t(X+Y)}]
= E[e^{tX} e^{tY}]
= E[e^{tX}] E[e^{tY}]
= M_X(t) M_Y(t),
as required. Q.E.D
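For instance (an illustration, not from the notes), multiplying the m.g.f.s of two independent normals recovers the m.g.f. of a normal with added means and variances:

```python
import sympy as sp

# Product of N(mu1, s1^2) and N(mu2, s2^2) m.g.f.s = m.g.f. of
# N(mu1 + mu2, s1^2 + s2^2), i.e. the sum of the two variables.
t, mu1, mu2 = sp.symbols('t mu1 mu2', real=True)
s1, s2 = sp.symbols('s1 s2', positive=True)

MX = sp.exp(mu1 * t + s1**2 * t**2 / 2)
MY = sp.exp(mu2 * t + s2**2 * t**2 / 2)
MZ = sp.exp((mu1 + mu2) * t + (s1**2 + s2**2) * t**2 / 2)

assert sp.simplify(MX * MY - MZ) == 0
```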
Thm 1.13
Let F_X be the distribution function of X and M_X(t) its m.g.f. Then for any continuity points a < b of F_X,
F_X(b) − F_X(a) = lim_{T→∞} (1/2π) ∫_{−T}^{T} (exp(−ibt) − exp(−iat))/(−it) · M_X(it) dt
holds, where M_X(it) = E[e^{itX}] is the characteristic function of X.
I will skip the proof of this theorem. However, this theorem says that analysing a distribution is equivalent to analysing its m.g.f.
6 Texts
• Statistics
Young and Smith (2010), Essentials of Statistical Inference.
Keener (2010), Theoretical Statistics: Topics for a Core Course.
• Linear algebra
Axler, Sheldon (2006), Linear Algebra Done Right (Undergraduate Texts in Mathematics).
Harville, David (1997), Matrix Algebra From a Statistician's Perspective.
• Measure theory and probability theory
Capinski, Marek and Kopp, Peter (2008), Measure, Integral and Probability.
Rosenthal (2006), A First Look at Rigorous Probability Theory.