
Slide 4_distribution, Recent Update History, Keisuke Kawata's HP


Academic year: 2018


(1)

Econometrics: Statistics

Keisuke Kawata

Hiroshima University

(2)

Plan of this class

4/28: Cancelled

4/30: First R session

5/7: Midterm tests (you may not use a calculator or bring course slides or notes)

(3)

What is Statistics?

Statistics: Science of using data to learn about the world around us.

⇒ We can estimate the characteristics of the population set and then answer the research questions.

e.g.,)

1. What are the mean wages of college graduates?

2. What is the mean number of years of education?

3. Do mean earnings differ between urban and rural people, and if so, by how much?

(4)

Plan of talks

1. Key concepts

2. Estimation for a population mean

   1. Statistical test

   2. Confidence interval

3. Estimation for a conditional population mean

4. Estimation for average population difference

(5)

Key statistical methods

• Economists often use three types of statistical methods:

Estimation: try to obtain a best-guess numerical value for an unknown characteristic of a population distribution (e.g., mean, variance).

Hypothesis testing: try to test a specific hypothesis about the population (e.g., no wage difference between males and females in Bangladesh) by using sample evidence.

Confidence interval: try to estimate an interval for an unknown population characteristic.

(6)

Estimation of the Population Mean

• Suppose that we want to know the mean value of Y in a population, and we have data about the population.

Estimators:

e.g.,) Our interest is mean wages, and we have data:

ID  Wage
1   1000
2   800

What are the estimators? → Potentially, there are many candidates, e.g., the mean of the sample, the maximum value, or the constant 1.

(7)

The conditions of a good estimator

• The characteristics of a sample are random variables → the estimator is also a random variable.

• What are desirable characteristics of an estimator? → We would like to use an estimator that gets as close as possible to the unknown true value.

• Formally, there are three desirable characteristics:

Unbiasedness: The expected value of an estimator is equal to the unknown true value.

Consistency: When the sample size is large, the variance of the estimator is very small.

Efficiency: The variance of the estimator is no larger than that of any other possible estimator.

(8)

Formal definitions

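• Let $\mu_Y$ and $\hat{\mu}_Y$ denote the (unknown) population mean and its estimator, respectively.

Unbiasedness: $E[\hat{\mu}_Y] = \mu_Y$

Consistency: $\hat{\mu}_Y \xrightarrow{p} \mu_Y$ as the sample size $n \to \infty$

Efficiency: For any possible estimator $\tilde{\mu}_Y$, $\mathrm{var}(\hat{\mu}_Y) \le \mathrm{var}(\tilde{\mu}_Y)$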

(9)

The property of sample mean

If your data come from pure random sampling, the mean of the sample is

• an unbiased and consistent estimator of the population mean (see Slide 3)

• the most efficient estimator of the population mean among all unbiased estimators

⇒ The sample mean is the BLUE (best linear unbiased estimator) of the population mean.
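The properties on this slide can be checked with a small simulation. The sketch below is only an illustration: the five-value wage population, the function name `simulate_sample_means`, and the sample sizes are all assumptions made for the example, not part of the course material. It draws many random samples, averages each one, and reports the average and the variance of those sample means.

```python
import random
import statistics


def simulate_sample_means(population, n, draws, seed=0):
    """Draw `draws` random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]


# Hypothetical population of wages (illustrative numbers only).
population = [800, 1000, 1200, 1500, 3000]
true_mean = statistics.fmean(population)

for n in (5, 50, 500):
    means = simulate_sample_means(population, n, draws=2000)
    # Average of the sample means (should be near true_mean) and their variance.
    print(n, round(statistics.fmean(means), 1), round(statistics.pvariance(means), 1))
```

Under these assumptions, the average of the sample means should stay close to the population mean for every sample size, while their variance should shrink as the sample size grows.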

(10)

Note: Importance of unbiasedness

• Is the sample mean the most efficient estimator?

e.g., consider an estimator that equals 1 for any sample. The variance of this estimator is 0.

However, this estimator is always biased.

• Even if the variance of an estimator is quite small, the estimator is not good if it is not an unbiased estimator (i.e., it is a biased estimator).
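A minimal sketch of this trade-off, assuming a made-up wage population and a sample size of 20 (both hypothetical): it compares an estimator that always returns 1 with the sample mean, reporting the bias and the variance of each across repeated samples.

```python
import random
import statistics

rng = random.Random(1)
population = [800, 1000, 1200, 1500, 3000]  # hypothetical wages
true_mean = statistics.fmean(population)

constant_estimates = []     # estimator that ignores the data and always answers 1
sample_mean_estimates = []  # estimator that averages the sample
for _ in range(2000):
    sample = rng.choices(population, k=20)
    constant_estimates.append(1.0)
    sample_mean_estimates.append(statistics.fmean(sample))


def bias(estimates):
    """Average estimate minus the true population mean."""
    return statistics.fmean(estimates) - true_mean


print("constant:   ", bias(constant_estimates), statistics.pvariance(constant_estimates))
print("sample mean:", bias(sample_mean_estimates), statistics.pvariance(sample_mean_estimates))
```

The constant estimator has zero variance but a large bias, whereas the sample mean has positive variance and a bias close to zero.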

(11)

The property of sample mean: sample size

• Recall that the variance of the sample mean is $\mathrm{var}(\bar{Y}) = \mathrm{var}(Y)/n$.

• $\mathrm{var}(Y)$ ↑ and $n$ ↓ ⇒ the accuracy of the estimator ↓

(12)

Limitation of estimation

• The sample mean is a good estimator of the population mean.

e.g.,)

• The sample average of household income is $1500. From this result, can we argue that the population mean of household income is higher than $1000?

Name  Income
I     200
Y     1000
K     200

ID  Income
1   2000
2   1000

(13)

Hypothesis test

• A hypothesis test is constructed from two hypotheses.

Null hypothesis:

Alternative hypothesis:

To test whether the population mean is not equal to $\mu_{Y,0}$, the null hypothesis ($H_0$) should be

$H_0: E[Y] = \mu_{Y,0}$

The alternative hypothesis is then

$H_1: E[Y] \ne \mu_{Y,0}$

(14)

Hypothesis Tests

• How can we test whether the null hypothesis is correct or not?

⇒ We take the following steps.

1. Suppose that the null hypothesis is true (the population mean is $\mu_{Y,0}$).

2. Under the null hypothesis, we estimate the distribution of the sample mean.

3. Based on the (estimated) distribution, we calculate the probability that we draw our data.

⇒ If this probability is small (i.e., the distance is large), the null hypothesis is rejected, and the alternative hypothesis is accepted.

(15)

Casual example: Lottery

• You draw a ball from a black box (the population).

• You know that each ball is either red or white, but you do not know the share of each color.

• Your hypothesis is that the share of red balls in the box is 50% ⇒ the probability that you draw a red ball is 50%.

• You draw a ball ten times, and all of the drawn balls are white (your data).

⇒ If the hypothesis is correct, the probability that you draw a white ball 10 times in a row is just about 0.098% ($0.5^{10}$).
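The probability in the lottery example can be reproduced in a few lines. This sketch assumes the ten draws are independent (for example, each ball is returned to the box before the next draw); the function name `prob_all_white` is made up for the illustration.

```python
from fractions import Fraction


def prob_all_white(share_red, draws):
    """Probability that every one of `draws` independent draws is white."""
    return (1 - share_red) ** draws


p = prob_all_white(Fraction(1, 2), 10)
print(p, f"{float(p):.4%}")  # prints: 1/1024 0.0977%
```

Such a small probability is the reason to doubt the hypothesis that half of the balls are red.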

(16)

Probability distribution of sample means

• Suppose that the data come from pure random sampling and that the population mean is $\mu_{Y,0}$.

⇒ We can use the following theorems.

1. The expected value of the sample mean is $\mu_{Y,0}$ (the property of an unbiased estimator).

2. The variance of the sample mean is $\mathrm{var}(Y)/n$ (where $\mathrm{var}(Y)$ is the population variance, and $n$ is the sample size).

3. If the sample size is large enough (more than 30), the distribution of the sample mean can be approximated by the normal distribution.

⇒ The normal distribution is characterized using only its mean and variance.

(17)

Estimator of the population variance

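We use the estimator of var(Y) as

$s_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$

and the estimator of the standard error is then

$SE(\bar{Y}) = s_Y / \sqrt{n}$

$s_Y^2$ is an unbiased and consistent estimator of the population variance. The probability distribution of sample means can be estimated as $N(\mu_{Y,0},\, s_Y^2/n)$.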
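As a rough illustration, the sample variance and the standard error of the sample mean can be computed with Python's standard library. The five wage values are hypothetical, and the sketch assumes the usual sample variance that divides by n - 1, which is what `statistics.variance` computes.

```python
import math
import statistics

wages = [1000, 800, 1200, 950, 1050]  # hypothetical sample of wages
n = len(wages)

sample_mean = statistics.fmean(wages)
sample_var = statistics.variance(wages)     # sample variance, divides by n - 1
standard_error = math.sqrt(sample_var / n)  # estimated std. deviation of the sample mean

print(sample_mean, sample_var, round(standard_error, 2))
```

With only five observations this is purely an arithmetic illustration; the normal approximation discussed in the slides needs a larger sample.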

(18)

p-value

p-value: The probability of drawing a sample mean at least as far in the tails of its distribution under the null hypothesis as the observed sample mean.

$p\text{-value} = \Pr_{H_0}\left[\, |\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}| \,\right]$

where $\Pr_{H_0}$ is the probability computed under the null hypothesis, and $\bar{Y}^{act}$ is the value of the observed sample mean.

• If the p-value is small, drawing the observed sample mean is a very rare event.

(19)

p-value in statistical test for population mean

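• The p-value is then calculated as

$p\text{-value} = 2\Phi(-|t^{act}|)$

where $\Phi$ is the cumulative distribution function of the standard normal distribution N(0,1), and

$t^{act} = \frac{\bar{Y}^{act} - \mu_{Y,0}}{SE(\bar{Y})}$ (t-statistic),

with $SE(\bar{Y})$ the estimated standard error of the sample mean.

• The t-statistic is a measure of the distance between the sample mean and the null hypothesis.

• Based on the p-value, we judge whether the null hypothesis can be rejected or not.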
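The calculation of the t-statistic and the two-sided p-value can be sketched as follows. This is an illustration under stated assumptions rather than the course's own code: the income sample is generated by an arbitrary deterministic rule, the function names are hypothetical, and the standard error is taken to be the square root of the sample variance (dividing by n - 1) over n.

```python
import math
from statistics import NormalDist, fmean, variance


def p_value_from_t(t_stat):
    """Two-sided p-value based on the standard normal distribution."""
    return 2 * NormalDist().cdf(-abs(t_stat))


def two_sided_p_value(sample, mu_0):
    """Return (t-statistic, p-value) for the null 'population mean equals mu_0'."""
    se = math.sqrt(variance(sample) / len(sample))  # estimated standard error
    t_stat = (fmean(sample) - mu_0) / se
    return t_stat, p_value_from_t(t_stat)


# Hypothetical sample of 40 household incomes (made-up numbers).
incomes = [1500 + 100 * ((i * 7) % 11 - 5) for i in range(40)]
t_stat, p_value = two_sided_p_value(incomes, mu_0=1000)
print(round(t_stat, 2), p_value)
```

A t-statistic of about 1.96 in absolute value corresponds to a p-value of about 5%, so larger distances from the null give smaller p-values.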

(20)

Hypothesis Testing: Significance level

• When should we reject the null hypothesis?

⇒ In empirical work, we often set the significance level at 5%.

– The p-value is lower than 5% ⇒ the null hypothesis will be rejected.

– The p-value is higher than 5% ⇒ the null hypothesis will not be rejected.

• Note that thresholds of 1% and 10% are also often used.

• What does the significance level mean?

(21)

Errors of Hypothesis Testing

• A hypothesis test can make two types of mistakes:

Type I error: the null hypothesis is rejected when it is in fact true.

Type II error: the null hypothesis is not rejected when it is in fact false.

• The significance level is related to the type I error.

• We use a significance level of 5% ⇒ the probability of a type I error is 5%, and you would then incorrectly reject a true null on average once in every 20 tests.

⇒ If you set a lower significance level, the probability of a type I error is lower.
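The link between the significance level and the type I error can be illustrated by simulation. The sketch below assumes a normally distributed population with mean 1000 and standard deviation 200 (made-up values) and repeatedly tests a null hypothesis that is true by construction; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def rejects_null(sample, mu_0, level):
    """True if the two-sided p-value (normal approximation) is below `level`."""
    se = math.sqrt(variance(sample) / len(sample))
    t_stat = (fmean(sample) - mu_0) / se
    return 2 * NormalDist().cdf(-abs(t_stat)) < level


def type_i_error_rate(level, n=100, trials=2000, seed=0):
    """Share of simulated samples in which a TRUE null hypothesis is rejected."""
    rng = random.Random(seed)
    true_mean = 1000.0  # the null hypothesis is true by construction
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 200.0) for _ in range(n)]
        rejections += rejects_null(sample, true_mean, level)
    return rejections / trials


print(type_i_error_rate(0.05), type_i_error_rate(0.01))
```

The rejection share should come out close to the chosen significance level, and lowering the level from 5% to 1% should reduce it.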

(22)

Important Remark

• If the p-value is high ⇒ the null hypothesis is not rejected ⇒ can we argue that the null hypothesis is true?

⇒ We can just say that, by using the current data, we cannot find evidence to reject the null hypothesis.

⇒ Statistical hypothesis testing can be posed as either rejecting the null hypothesis or failing to do so.

Casual example:

You come to Japan to reject the null hypothesis that "there are no white crows".

⇒ You cannot find a white crow (evidence to reject the null).

(23)

Confidence intervals

• Because the characteristics of your data are random variables, it is impossible to learn the exact value of the population mean.

⇒ Alternatively, we can construct a confidence interval for the population mean.

Confidence interval: a set of values that contains the true population mean with a certain prespecified probability (the confidence level).

Confidence interval with a 95% confidence level (95% confidence interval): contains the true population mean with 95% probability.
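A common way to build such an interval, assuming the sample is large enough for the normal approximation, is the sample mean plus or minus about 1.96 standard errors. The slide does not state this formula explicitly, so the sketch below is an assumption, and the wage sample and function name are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    se = math.sqrt(variance(sample) / len(sample))  # estimated standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)       # about 1.96 when level = 0.95
    center = fmean(sample)
    return center - z * se, center + z * se


# Hypothetical sample of 33 wages (made-up numbers).
wages = [1000 + 50 * ((i * 7) % 11 - 5) for i in range(33)]
low, high = confidence_interval(wages)
print(round(low, 1), round(high, 1))
```

A higher confidence level, such as 99%, gives a wider interval from the same data.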

(24)

Question

• True/False question. If our data are pure random sampling data:

1. The sample mean is a consistent estimator of the population mean.

2. The sample mean is an unbiased estimator of the population mean.

3. The sample mean is the most efficient estimator of the population mean.

4. To calculate the p-value, we need to estimate the distribution of the population.

5. If the sample size is large enough, for any distribution of a population, we can conduct hypothesis tests.

6. If the p-value of a null hypothesis (population mean = 1000) is huge, we can say that the population mean is equal to 1000 with high probability.

(25)

Importance of Random Sampling

• If you can use random sampling data, the sample mean is a good estimator of the population mean.

• If our data are non-random sampling data, what happens?

⇒ The sample mean is biased.

e.g.,) Suppose that the population is the people in your country.

• Internet survey ⇒ the share of young, rich people living in urban areas is high ⇒ the expected sample mean of income may be higher than the population mean.

• Survey in an urban area ⇒ the share of rich persons is high ⇒ the expected sample mean of income may be higher than the population mean.

(26)

Importance of Random Sampling

• The randomness of your data sampling crucially depends on the design of the sample selection scheme (Internet, door to door) ⇒ you should take care!

• We should check the response rate (the share of respondents).

⇒ If the response rate is low, your data set may be biased.

• You should compare the descriptive statistics of your data with those of other data (e.g., government statistics). If there is a large difference, your data set is probably not a random sample from the population.

(27)

How to overcome?

• You should discuss the bias in your paper.

• A feasible research plan depends on the characteristics of your data.

e.g.,) Even though the initial research plan focuses on the national average impact of education on incomes, you only get data in urban areas.

⇒ You should change your research plan and title to "the impacts of education on income in urban areas".

(28)

Importance of sample size

• If the sample size is small,

– The variation of the estimator becomes large.

– The distribution of sample means cannot be approximated by the normal distribution ⇒ we cannot use the statistical test based on the normal distribution.

• To conduct a valid statistical test, the sample size should be higher than 30.

(29)

Estimation of conditional means

• In advanced discussions, conditional means in the population play important roles. ⇒ We should learn how to estimate them.

e.g.,)

• You would like to estimate the conditional mean income among males and among females.

⇒ If your sample size is large enough, the sub-sample means in the male and female sub-samples are the BLUE of the population conditional means.

(30)

Importance of sample size: revisit

• To estimate sub-sample means, the requirement for sample size is more demanding.

⇒ The sample size in each sub-sample must be higher than 30.

e.g.) If you would like to estimate the conditional mean income among junior high, high school, and college graduates, the sample size must be higher than 90 (30 × 3).

If the distribution across groups is unbalanced, the analysis is more difficult.

e.g.,) You would like to estimate the conditional mean income among majority and minority ethnic groups. The sample size of the majority is 990, but the size of the minority is 10.

⇒ You cannot estimate the mean income of the minority.

(31)

Estimation of mean difference

• Many research questions are related to the difference between two or more populations.

e.g.,)

Gender discrimination: The difference of mean income between males and females.

Effect of education: The difference of mean income between college and non-college graduates.

Income growth: The difference of mean income between the current and the past year.

(32)

BLUE of mean difference

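• Suppose that there are two sub-samples (male and female).

• We would like to estimate the (population) mean income difference between males and females as

$E[Y \mid \text{male}] - E[Y \mid \text{female}]$

(Reminder) The sub-sample means are the BLUE of $E[Y \mid \text{male}]$ and $E[Y \mid \text{female}]$.

⇒ The BLUE of the mean income difference is $\bar{Y}_m - \bar{Y}_f$, the difference between the male and female sub-sample means.

If the sample size in each sub-sample is large enough, we can conduct the statistical test and construct the confidence interval.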

(33)

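Detail: Property of estimator

• The variance of the difference of sample means is

$\mathrm{var}(\bar{Y}_m - \bar{Y}_f) = \frac{\sigma_m^2}{n_m} + \frac{\sigma_f^2}{n_f}$

where $\bar{Y}_m$ and $\bar{Y}_f$ are the sub-sample means, $\sigma_m^2$ and $\sigma_f^2$ are the variances in the populations, and $n_m$ and $n_f$ are the sizes of each sub-sample.

• If the size of a sub-sample is quite small, the variance is very large (even if the size of the other group is large) ⇒ study about a minority group is difficult.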
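To see why a small sub-sample dominates, the variance of the difference can be evaluated for a balanced and an unbalanced split of 1000 observations. The sketch assumes two independent sub-samples and a population variance of 10,000 in both groups; these numbers and the function name are hypothetical.

```python
def var_of_mean_difference(var_a, n_a, var_b, n_b):
    """Variance of the difference between two independent sub-sample means."""
    return var_a / n_a + var_b / n_b


# Hypothetical: the population variance is 10,000 in both groups.
balanced = var_of_mean_difference(10_000, 500, 10_000, 500)
unbalanced = var_of_mean_difference(10_000, 990, 10_000, 10)

print(balanced)              # prints: 40.0
print(round(unbalanced, 1))  # prints: 1010.1
```

With the same total of 1000 observations, the 990/10 split gives a variance roughly 25 times larger than the 500/500 split, almost all of it coming from the group of 10.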

(34)

Hypothesis Tests for the difference

• Using the same technique as the hypothesis tests for a population mean, we can test hypotheses about the difference such as

The null hypothesis ($H_0$): $E[Y \mid \text{male}] - E[Y \mid \text{female}] = d_0$

The alternative hypothesis ($H_1$): $E[Y \mid \text{male}] - E[Y \mid \text{female}] \ne d_0$

where $d_0$ is the hypothesized difference.

• Similar to the statistical test for a population mean, we can calculate the p-value of each null and construct the confidence interval.

e.g.,) Is there a gender difference? You should test the null $E[Y \mid \text{male}] - E[Y \mid \text{female}] = 0$.
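A test for the difference of two means can be sketched in the same way as the one-sample test. The code below is an illustration under assumptions: the two sub-samples are independent, both are large enough for the normal approximation, the income data are made up, and the names `difference_test` and `d_0` are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def difference_test(sample_a, sample_b, d_0=0.0):
    """Return (difference, t-statistic, p-value) for the null 'mean_a - mean_b = d_0'."""
    diff = fmean(sample_a) - fmean(sample_b)
    # Standard error of the difference of two independent sample means.
    se = math.sqrt(variance(sample_a) / len(sample_a)
                   + variance(sample_b) / len(sample_b))
    t_stat = (diff - d_0) / se
    p_value = 2 * NormalDist().cdf(-abs(t_stat))
    return diff, t_stat, p_value


# Hypothetical incomes: 33 males and 33 females (made-up numbers).
male = [1000 + 50 * ((i * 7) % 11 - 5) for i in range(33)]
female = [x - 100 for x in male]

diff, t_stat, p_value = difference_test(male, female)
print(diff, round(t_stat, 2), round(p_value, 4))
```

Setting `d_0` to another value tests whether the difference equals that value instead of zero.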

(35)

Conclusion

• We learned estimation, statistical tests, and confidence intervals.

• These approaches can be used to estimate the conditional and unconditional means and the mean difference.
