
3. Theorem: Let $\phi(\theta_1, \theta_2)$ be the moment-generating function of $(X, Y)$.

The moment-generating function of $X$ is given by $\phi_1(\theta_1)$ and that of $Y$ is $\phi_2(\theta_2)$.

Then, we have the following facts:

$$\phi_1(\theta_1) = \phi(\theta_1, 0), \qquad \phi_2(\theta_2) = \phi(0, \theta_2).$$

Proof:

Again, by definition, the moment-generating function of $(X, Y)$ is represented as:

$$\phi(\theta_1, \theta_2) = E(e^{\theta_1 X + \theta_2 Y}) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{\theta_1 x + \theta_2 y} f_{xy}(x, y) \, dx \, dy.$$

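When $\phi(\theta_1, \theta_2)$ is evaluated at $\theta_2 = 0$, $\phi(\theta_1, 0)$ is rewritten as follows:

$$\begin{aligned}
\phi(\theta_1, 0) = E(e^{\theta_1 X}) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{\theta_1 x} f_{xy}(x, y) \, dx \, dy \\
&= \int_{-\infty}^{\infty} e^{\theta_1 x} \left( \int_{-\infty}^{\infty} f_{xy}(x, y) \, dy \right) dx \\
&= \int_{-\infty}^{\infty} e^{\theta_1 x} f_x(x) \, dx = E(e^{\theta_1 X}) = \phi_1(\theta_1).
\end{aligned}$$

The third equality uses the definition of the marginal density, $f_x(x) = \int_{-\infty}^{\infty} f_{xy}(x, y) \, dy$.

Thus, we obtain the result: $\phi(\theta_1, 0) = \phi_1(\theta_1)$.

Similarly, $\phi(0, \theta_2) = \phi_2(\theta_2)$ can be derived.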
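The result can be checked numerically on a small example. The sketch below assumes a discrete pair (X, Y), so the integrals in the proof become sums; the joint probabilities and the function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical joint probability function of (X, Y); the values sum to 1.
joint = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

def joint_mgf(t1, t2):
    """phi(t1, t2) = E(exp(t1*X + t2*Y)), with a sum in place of the integral."""
    return sum(p * math.exp(t1 * x + t2 * y) for (x, y), p in joint.items())

def marginal_pmf(index):
    """Marginal probability function of X (index 0) or Y (index 1)."""
    marginal = {}
    for pair, p in joint.items():
        marginal[pair[index]] = marginal.get(pair[index], 0.0) + p
    return marginal

def marginal_mgf(index, t):
    """phi_1(t) for index 0, phi_2(t) for index 1, from the marginal alone."""
    return sum(p * math.exp(t * v) for v, p in marginal_pmf(index).items())

# phi(t1, 0) should agree with phi_1(t1), and phi(0, t2) with phi_2(t2).
print(joint_mgf(0.7, 0.0), marginal_mgf(0, 0.7))
print(joint_mgf(0.0, -1.2), marginal_mgf(1, -1.2))
```

Each printed pair should agree up to floating-point rounding, whatever joint probabilities are used.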

4. Theorem: The moment-generating function of $(X, Y)$ is given by $\phi(\theta_1, \theta_2)$.

Let $\phi_1(\theta_1)$ and $\phi_2(\theta_2)$ be the moment-generating functions of $X$ and $Y$, respectively.

If $X$ is independent of $Y$, we have:

$$\phi(\theta_1, \theta_2) = \phi_1(\theta_1) \phi_2(\theta_2).$$

Proof:

From the definition of $\phi(\theta_1, \theta_2)$, the moment-generating function of $(X, Y)$ is rewritten as follows:

$$\phi(\theta_1, \theta_2) = E(e^{\theta_1 X + \theta_2 Y}) = E(e^{\theta_1 X}) E(e^{\theta_2 Y}) = \phi_1(\theta_1) \phi_2(\theta_2).$$

The second equality holds because X is independent of Y.
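As an illustration of the factorization, the following sketch builds a joint probability function as the product of two marginals (which is what independence means in the discrete case) and compares the joint moment-generating function with the product of the marginal ones. The marginal probabilities and all function names are hypothetical.

```python
import math

# Hypothetical marginal probability functions of X and Y (illustrative values).
px = {0: 0.3, 2: 0.7}
py = {-1: 0.5, 1: 0.25, 3: 0.25}

# Under independence the joint probability function is the product of marginals.
joint = {(x, y): p * q for x, p in px.items() for y, q in py.items()}

def mgf(pmf, t):
    """E(exp(t*V)) for a discrete variable V with probability function pmf."""
    return sum(p * math.exp(t * v) for v, p in pmf.items())

def joint_mgf(t1, t2):
    """phi(t1, t2) = E(exp(t1*X + t2*Y))."""
    return sum(p * math.exp(t1 * x + t2 * y) for (x, y), p in joint.items())

def factorization_gap(t1, t2):
    """|phi(t1, t2) - phi_1(t1) * phi_2(t2)|; zero up to rounding here."""
    return abs(joint_mgf(t1, t2) - mgf(px, t1) * mgf(py, t2))

print(factorization_gap(0.3, -0.4))
```

The gap should be at the level of floating-point rounding for any choice of the two arguments.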

Multivariate Case: For multivariate random variables $X_1, X_2, \cdots, X_n$, the moment-generating function is defined as:

$$\phi(\theta_1, \theta_2, \cdots, \theta_n) = E(e^{\theta_1 X_1 + \theta_2 X_2 + \cdots + \theta_n X_n}).$$

1. Theorem: If the multivariate random variables $X_1, X_2, \cdots, X_n$ are mutually independent, the moment-generating function of $X_1, X_2, \cdots, X_n$, denoted by $\phi(\theta_1, \theta_2, \cdots, \theta_n)$, is given by:

$$\phi(\theta_1, \theta_2, \cdots, \theta_n) = \phi_1(\theta_1) \phi_2(\theta_2) \cdots \phi_n(\theta_n),$$

where $\phi_i(\theta) = E(e^{\theta X_i})$.

Proof:

From the definition of the moment-generating function in the multivariate case, we obtain the following:

$$\begin{aligned}
\phi(\theta_1, \theta_2, \cdots, \theta_n) &= E(e^{\theta_1 X_1 + \theta_2 X_2 + \cdots + \theta_n X_n}) \\
&= E(e^{\theta_1 X_1}) E(e^{\theta_2 X_2}) \cdots E(e^{\theta_n X_n}) \\
&= \phi_1(\theta_1) \phi_2(\theta_2) \cdots \phi_n(\theta_n).
\end{aligned}$$
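The same factorization can be sketched for n variables. The example below assumes n = 3 mutually independent 0/1 variables with hypothetical success probabilities, enumerates all outcomes to obtain the joint moment-generating function, and compares it with the product of the marginal ones. None of the names come from the text.

```python
import itertools
import math

# Hypothetical success probabilities of n = 3 mutually independent 0/1 variables.
probs = [0.2, 0.5, 0.9]

def marginal_mgf(p, t):
    """phi_i(t) = E(exp(t*X_i)) for a 0/1 variable with P(X_i = 1) = p."""
    return (1.0 - p) + p * math.exp(t)

def joint_mgf(thetas):
    """E(exp(theta_1*X_1 + ... + theta_n*X_n)) by enumerating all outcomes."""
    total = 0.0
    for outcome in itertools.product([0, 1], repeat=len(probs)):
        # Mutual independence: the joint probability is the product of marginals.
        prob = math.prod(p if x == 1 else 1.0 - p for p, x in zip(probs, outcome))
        total += prob * math.exp(sum(t * x for t, x in zip(thetas, outcome)))
    return total

def product_of_marginals(thetas):
    """phi_1(theta_1) * phi_2(theta_2) * ... * phi_n(theta_n)."""
    return math.prod(marginal_mgf(p, t) for p, t in zip(probs, thetas))

thetas = [0.4, -0.3, 1.1]
print(joint_mgf(thetas), product_of_marginals(thetas))
```

The two printed values should coincide up to rounding. `math.prod` requires Python 3.8 or later.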

2. Theorem: Suppose that the multivariate random variables $X_1, X_2, \cdots, X_n$ are mutually independent and identically distributed.

Suppose that $X_i \sim N(\mu, \sigma^2)$.

Let us define $\hat{\mu} = \sum_{i=1}^{n} a_i X_i$, where $a_i$, $i = 1, 2, \cdots, n$, are assumed to be known.

Then, $\hat{\mu} \sim N\left(\mu \sum_{i=1}^{n} a_i, \; \sigma^2 \sum_{i=1}^{n} a_i^2\right)$.

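Proof:

From Example 1.8 (p.111) and Example 1.9 (p.147), it is shown that the moment-generating function of $X$ is given by $\phi_x(\theta) = \exp(\mu\theta + \frac{1}{2}\sigma^2\theta^2)$, when $X$ is normally distributed as $X \sim N(\mu, \sigma^2)$.

Let $\phi_{\hat{\mu}}$ be the moment-generating function of $\hat{\mu}$.

$$\begin{aligned}
\phi_{\hat{\mu}}(\theta) = E(e^{\theta\hat{\mu}}) &= E(e^{\theta \sum_{i=1}^{n} a_i X_i}) = \prod_{i=1}^{n} E(e^{\theta a_i X_i}) \\
&= \prod_{i=1}^{n} \phi_x(a_i\theta) = \prod_{i=1}^{n} \exp\left(\mu a_i \theta + \frac{1}{2}\sigma^2 a_i^2 \theta^2\right) \\
&= \exp\left(\mu \sum_{i=1}^{n} a_i \theta + \frac{1}{2}\sigma^2 \sum_{i=1}^{n} a_i^2 \theta^2\right),
\end{aligned}$$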

which is equivalent to the moment-generating function of the normal distribution with mean $\mu \sum_{i=1}^{n} a_i$ and variance $\sigma^2 \sum_{i=1}^{n} a_i^2$, where $\mu$ and $\sigma^2$ in $\phi_x(\theta)$ are simply replaced by $\mu \sum_{i=1}^{n} a_i$ and $\sigma^2 \sum_{i=1}^{n} a_i^2$ in $\phi_{\hat{\mu}}(\theta)$, respectively.

Moreover, note the following.

When $a_i = 1/n$ is taken for all $i = 1, 2, \cdots, n$, i.e., when $\hat{\mu} = \overline{X}$ is taken, $\overline{X}$ is normally distributed as:

$$\overline{X} \sim N(\mu, \sigma^2/n).$$
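The key step of the proof, that the product of the individual moment-generating functions is again a normal moment-generating function, can be checked numerically. In this sketch the weights, the values of the mean and variance, and the function names are all hypothetical choices for illustration.

```python
import math

def normal_mgf(theta, mu, sigma2):
    """phi_x(theta) = exp(mu*theta + sigma2*theta**2/2) for X ~ N(mu, sigma2)."""
    return math.exp(mu * theta + 0.5 * sigma2 * theta ** 2)

def weighted_sum_params(a, mu, sigma2):
    """Mean and variance of mu_hat = sum(a_i * X_i) as stated in the theorem."""
    return mu * sum(a), sigma2 * sum(w * w for w in a)

def mgf_of_weighted_sum(theta, a, mu, sigma2):
    """Product of phi_x(a_i * theta), the middle step of the proof."""
    return math.prod(normal_mgf(w * theta, mu, sigma2) for w in a)

a = [0.5, 1.5, -1.0]          # hypothetical known weights
mean, var = weighted_sum_params(a, mu=2.0, sigma2=3.0)
print(mgf_of_weighted_sum(0.3, a, 2.0, 3.0), normal_mgf(0.3, mean, var))

# Equal weights a_i = 1/n give the sample mean, with variance sigma2 / n.
print(weighted_sum_params([0.25] * 4, mu=2.0, sigma2=3.0))
```

With equal weights and n = 4, the second line should report a mean of 2.0 and a variance of 0.75, that is, 3.0/4.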


6 Law of Large Numbers and Central Limit Theorem

6.1 Chebyshev's Inequality

Theorem: Let $g(X)$ be a nonnegative function of the random variable $X$, i.e., $g(X) \ge 0$.

If $E(g(X))$ exists, then we have:

$$P(g(X) \ge k) \le \frac{E(g(X))}{k}, \tag{6}$$

for a positive constant value $k$.

Proof:

We define the discrete random variable $U$ as follows:

$$U = \begin{cases} 1, & \text{if } g(X) \ge k, \\ 0, & \text{if } g(X) < k. \end{cases}$$

Thus, the discrete random variable $U$ takes the value 0 or 1.

Suppose that the probability function of $U$ is given by:

$$f(u) = P(U = u),$$

where $P(U = u)$ is represented as:

$$P(U = 1) = P(g(X) \ge k), \qquad P(U = 0) = P(g(X) < k).$$

Then, regardless of the value which $U$ takes, the following inequality always holds:

$$g(X) \ge kU,$$

which implies that we have $g(X) \ge k$ when $U = 1$ and $g(X) \ge 0$ when $U = 0$, where $k$ is a positive constant value.

Therefore, taking the expectation on both sides, we obtain:

$$E(g(X)) \ge kE(U), \tag{7}$$

where $E(U)$ is given by:

$$E(U) = \sum_{u=0}^{1} u P(U = u) = 1 \times P(U = 1) + 0 \times P(U = 0) = P(U = 1) = P(g(X) \ge k). \tag{8}$$

Accordingly, substituting equation (8) into equation (7), we have the following inequality:

$$P(g(X) \ge k) \le \frac{E(g(X))}{k}.$$
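Inequality (6) can be verified exactly on a small discrete example. The sketch below assumes a fair six-sided die and takes g(X) = X, which is nonnegative; both choices, and the function names, are hypothetical and not part of the text.

```python
from fractions import Fraction

# Hypothetical distribution: a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def g(x):
    """A nonnegative function of X; here simply g(X) = X."""
    return x

def tail_probability(k):
    """P(g(X) >= k), computed exactly."""
    return sum(p for x, p in pmf.items() if g(x) >= k)

def bound(k):
    """E(g(X)) / k, the right-hand side of inequality (6)."""
    expectation = sum(p * g(x) for x, p in pmf.items())
    return expectation / k

for k in range(1, 7):
    print(k, tail_probability(k), bound(k))
```

For k = 5, the exact probability is 1/3 while the bound is 7/10, so the inequality holds but is not tight.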

Chebyshev's Inequality: Assume that $E(X) = \mu$, $V(X) = \sigma^2$, and $\lambda$ is a positive constant value. Then, we have the following inequality:

$$P(|X - \mu| \ge \lambda\sigma) \le \frac{1}{\lambda^2},$$

or equivalently,

$$P(|X - \mu| < \lambda\sigma) \ge 1 - \frac{1}{\lambda^2},$$

which is called Chebyshev's inequality.

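Proof:

Take $g(X) = (X - \mu)^2$ and $k = \lambda^2\sigma^2$. Then, we have:

$$P((X - \mu)^2 \ge \lambda^2\sigma^2) \le \frac{E((X - \mu)^2)}{\lambda^2\sigma^2},$$

which implies

$$P(|X - \mu| \ge \lambda\sigma) \le \frac{1}{\lambda^2}.$$

Note that $E((X - \mu)^2) = V(X) = \sigma^2$.

Since we have $P(|X - \mu| \ge \lambda\sigma) + P(|X - \mu| < \lambda\sigma) = 1$, we can derive the following inequality:

$$P(|X - \mu| < \lambda\sigma) \ge 1 - \frac{1}{\lambda^2}. \tag{9}$$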

An Interpretation of Chebyshev's inequality: $1/\lambda^2$ is an upper bound for the probability $P(|X - \mu| \ge \lambda\sigma)$.

Equation (9) is rewritten as:

$$P(\mu - \lambda\sigma < X < \mu + \lambda\sigma) \ge 1 - \frac{1}{\lambda^2}.$$

That is, the probability that $X$ falls within $\lambda\sigma$ units of $\mu$ is greater than or equal to $1 - 1/\lambda^2$.

For example, taking $\lambda = 2$, the probability that $X$ falls within two standard deviations of its mean is at least 0.75.
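To see how conservative the bound can be, the sketch below compares it with the exact probability under one particular distribution. Normality is an assumption made here only for the comparison (Chebyshev's inequality itself needs only a finite mean and variance), and the function names are hypothetical.

```python
import math

def chebyshev_lower_bound(lam):
    """1 - 1/lam**2, the distribution-free lower bound from equation (9)."""
    return 1.0 - 1.0 / lam ** 2

def normal_within(lam):
    """P(|X - mu| < lam*sigma) when X ~ N(mu, sigma^2), via the error function."""
    return math.erf(lam / math.sqrt(2.0))

for lam in (1.0, 1.5, 2.0, 3.0):
    print(lam, chebyshev_lower_bound(lam), normal_within(lam))
```

For $\lambda = 2$ the bound is 0.75, while the exact normal probability is roughly 0.954, so the bound holds with considerable slack in this case.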

Furthermore, note the following.

Taking $\epsilon = \lambda\sigma$, we obtain:

$$P(|X - \mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2},$$

i.e.,

$$P(|X - E(X)| \ge \epsilon) \le \frac{V(X)}{\epsilon^2}, \tag{10}$$

which is used in the next section.
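Equation (10) can also be checked exactly on a discrete example. The sketch below again assumes a fair six-sided die, a hypothetical choice, and compares the exact probability of a deviation of at least $\epsilon$ from the mean with the bound $V(X)/\epsilon^2$.

```python
from fractions import Fraction

# Hypothetical distribution: a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def mean():
    """E(X)."""
    return sum(p * x for x, p in pmf.items())

def variance():
    """V(X) = E(X^2) - E(X)^2."""
    return sum(p * x * x for x, p in pmf.items()) - mean() ** 2

def exact_tail(eps):
    """P(|X - E(X)| >= eps), computed exactly."""
    m = mean()
    return sum(p for x, p in pmf.items() if abs(x - m) >= eps)

def chebyshev_bound(eps):
    """V(X) / eps**2, the right-hand side of equation (10)."""
    return variance() / eps ** 2

print(mean(), variance(), exact_tail(2), chebyshev_bound(2))
```

Here E(X) = 7/2 and V(X) = 35/12. With $\epsilon = 2$, only the outcomes 1 and 6 deviate far enough, so the exact probability is 1/3 against a bound of 35/48.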

Remark: Equation (10) can be derived when we take $g(X) = (X - \mu)^2$, $\mu = E(X)$ and $k = \epsilon^2$ in equation (6).

Even when we have $\mu \ne E(X)$, the following inequality still holds:

$$P(|X - \mu| \ge \epsilon) \le \frac{E((X - \mu)^2)}{\epsilon^2}.$$

Note that $E((X - \mu)^2)$ represents the mean square error (MSE).

When $\mu = E(X)$, the mean square error reduces to the variance.
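The last statement can be made explicit with the standard decomposition of the mean square error into variance plus squared bias. This short derivation is added as a sketch and is not taken from the text; it uses only $E(X - E(X)) = 0$.

```latex
\begin{aligned}
E\big((X - \mu)^2\big)
  &= E\big((X - E(X) + E(X) - \mu)^2\big) \\
  &= E\big((X - E(X))^2\big)
     + 2\,(E(X) - \mu)\,E\big(X - E(X)\big)
     + (E(X) - \mu)^2 \\
  &= V(X) + (E(X) - \mu)^2 .
\end{aligned}
```

The second term vanishes when $\mu = E(X)$, which leaves $E((X - \mu)^2) = V(X)$.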

6.2 Law of Large Numbers and Convergence in Probability

Law of Large Numbers 1: Assume that $X_1, X_2, \cdots, X_n$ are mutually independent and identically distributed with mean $E(X_i) = \mu$ for all $i$.

Suppose that the moment-generating function of $X_i$ is finite.

Define $\overline{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$.

Then, $\overline{X}_n \longrightarrow \mu$ as $n \longrightarrow \infty$.

Proof: The moment-generating function is written as:

$$\phi(\theta) = 1 + \mu'_1 \theta + \frac{1}{2!} \mu'_2 \theta^2 + \frac{1}{3!} \mu'_3 \theta^3 + \cdots = 1 + \mu'_1 \theta + O(\theta^2),$$

where $\mu'_k = E(X^k)$ for all $k$. That is, all the moments exist.

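Because $X_1, X_2, \cdots, X_n$ are mutually independent and identically distributed, the moment-generating function of $\overline{X}_n$ is:

$$\begin{aligned}
\phi_{\overline{x}}(\theta) = \left(\phi\left(\frac{\theta}{n}\right)\right)^n &= \left(1 + \mu'_1 \frac{\theta}{n} + O\left(\frac{\theta^2}{n^2}\right)\right)^n \\
&= \left(1 + \mu'_1 \frac{\theta}{n} + O\left(\frac{1}{n^2}\right)\right)^n \\
&= \left((1 + x)^{1/x}\right)^{\mu\theta + O(n^{-1})} \longrightarrow \exp(\mu\theta) \quad \text{as } x \longrightarrow 0,
\end{aligned}$$

where $x = \mu\theta/n + O(n^{-2})$, so that $x \longrightarrow 0$ as $n \longrightarrow \infty$, and $\mu'_1 = E(X) = \mu$. The limit $\exp(\mu\theta)$ is the moment-generating function of the following probability function:

$$f(x) = \begin{cases} 1, & \text{if } x = \mu, \\ 0, & \text{otherwise.} \end{cases}$$

Indeed, for this $f(x)$,

$$\phi(\theta) = \sum_x e^{\theta x} f(x) = e^{\theta\mu} f(\mu) = e^{\theta\mu}.$$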
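The convergence of the moment-generating function of the sample mean to $\exp(\mu\theta)$ can be illustrated numerically. The sketch below assumes, purely for illustration, an exponential distribution with mean 1, whose moment-generating function $1/(1 - \theta)$ for $\theta < 1$ is a standard result; the function names are hypothetical.

```python
import math

def exponential_mgf(theta):
    """MGF of an exponential distribution with mean 1: 1/(1 - theta), theta < 1."""
    if theta >= 1.0:
        raise ValueError("the MGF is finite only for theta < 1")
    return 1.0 / (1.0 - theta)

def sample_mean_mgf(theta, n):
    """(phi(theta/n))**n, the MGF of the mean of n i.i.d. draws."""
    return exponential_mgf(theta / n) ** n

def gap_to_limit(theta, n, mu=1.0):
    """Distance between the MGF of the sample mean and exp(mu*theta)."""
    return abs(sample_mean_mgf(theta, n) - math.exp(mu * theta))

for n in (1, 10, 1000, 100000):
    print(n, sample_mean_mgf(0.5, n), gap_to_limit(0.5, n))
```

The gap should shrink as n grows, in line with the $O(n^{-1})$ term in the exponent above.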

Law of Large Numbers 2: Assume that $X_1, X_2, \cdots, X_n$ are mutually independent and identically distributed with mean $E(X_i) = \mu$ and variance $V(X_i) = \sigma^2 < \infty$ for all $i$.
