Additional rule
Student ID Weight(kg)
1 60
2 70
3 80
Additional rule
Student ID Test score Gender
1 40 male
2 40 female
3 70 male
Download R and R studio
• In the software exercise, we use R (and R studio).
R: Statistical software (Free!!!)
R studio: Useful interface for R
• You should download R and R studio.
1. Download R (please download the newest version)
see (http://rprogramming.net/how-to-download-r-quickly-and-easily/)
2. Download R studio
Econometrics: Probability
Keisuke Kawata
Hiroshima University
Randomness (or Chance)
Phenomena that include elements of chance or randomness.
• Rolling a die to see which number comes out on top
• The gender, height, and weight of the next new person you meet
• Tomorrow's weather
• The winner of the next World Cup
• The GDP growth rate 10 years ago
• The results of the lifetime competition between Prof. Kaneko and Yoshida.
Structure of statistical works
Population
Sample
Sample: a subset of the population's data.
Randomness of your data
→ Given the population, the characteristic of your data is a random variable.
e.g.,) Wages of faculty staff
Population
Name Wage
Prof I 1000
Prof K 800
Prof Y 700
Potential sets of data (sample size=2)
ID Wage
1 1000
2 800
ID Wage
1 800
2 700
ID Wage
1 1000
2 700
Plan of talks
1. Definition of probability.
2. Key concepts of a single random variable.
3. Key concepts of multiple random variables.
4. Properties of random sampling observations.
5. The asymptotic (large-sample) sampling distribution ← plays a central role in statistics and econometrics.
Definition of probability and outcome
• Outcomes: the (mutually exclusive) potential results of a random process.
• The probability of an outcome: the proportion of times the outcome occurs if the random process were repeated infinitely many times.
Question: What are the outcomes of rolling a die, and what are their probabilities?
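The "proportion of times" definition can be illustrated by simulation. The following is a minimal sketch, assuming a fair six-sided die; the function name `relative_frequencies` and the fixed seed are illustrative choices, not part of the lecture. As the number of rolls grows, each face's share of the rolls should settle near 1/6.

```python
import random
from collections import Counter

def relative_frequencies(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times; return each face's share of the rolls."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

# The shares move closer to 1/6 (about 0.167) as the number of rolls increases.
for n in (60, 6_000, 60_000):
    freqs = relative_frequencies(n)
    print(n, {face: round(share, 3) for face, share in freqs.items()})
```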
Note: Discrete and continuous random outcomes
Formal definition
• Random variable
– Discrete random variable: takes on only a discrete set of values (the number shown on a die, the gender of the next new person you meet)
– Continuous random variable: takes on a continuum of values (the height and weight of the next new person you meet, GDP)
Probability distribution: Discrete random variable
• The probability distribution of a discrete random variable: the list of all possible outcomes and the probability of each.
• The probability of outcome x is denoted by Pr(x).
• The sum of the probabilities of all outcomes must be 1.
e.g.,) The probability distribution of rolling a die
Outcome x 1 2 3 4 5 6
Probability Pr(x) 1/6 1/6 1/6 1/6 1/6 1/6
Probability of an event
• Event: A set of outcomes.
e.g.,) The event "the number on top of the die is lower than 3" = {1, 2}
• The probability of an event = the sum of the probabilities of the outcomes in the event.
e.g.,) The probability of the event in which the number on the die is higher than 2 = Pr(3)+Pr(4)+Pr(5)+Pr(6)=4/6=2/3.
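The event calculation can be written out as a short sketch. The names `die` and `event_probability` are hypothetical helpers introduced for illustration; exact fractions are used so that the result matches the hand calculation.

```python
from fractions import Fraction

# Probability distribution of a fair six-sided die: outcome -> Pr(outcome)
die = {x: Fraction(1, 6) for x in range(1, 7)}

def event_probability(distribution, event):
    """Sum Pr(x) over the outcomes x that belong to the event (a set of outcomes)."""
    return sum(distribution[x] for x in event)

# The probabilities of all outcomes must sum to 1.
assert sum(die.values()) == 1

# Event: the number on the die is higher than 2, i.e. {3, 4, 5, 6}
higher_than_2 = {x for x in die if x > 2}
print(event_probability(die, higher_than_2))  # 2/3
```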
Cumulative probability distribution: Discrete random variable
• Cumulative probability distribution: the probability that the random variable is less than or equal to a particular value.
e.g.,) The probability distribution of rolling a die
Outcome 1 2 3 4 5 6
Probability distribution 1/6 1/6 1/6 1/6 1/6 1/6
Cumulative probability distribution 1/6 2/6 3/6 4/6 5/6 1
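For a discrete random variable, the cumulative probabilities are running sums of the probability distribution. A minimal sketch for the fair die, using exact fractions (printed in lowest terms, so 2/6 appears as 1/3); the variable names are illustrative:

```python
from fractions import Fraction
from itertools import accumulate

outcomes = [1, 2, 3, 4, 5, 6]
probabilities = [Fraction(1, 6)] * 6

# Running sum: cumulative[i] = Pr(X <= outcomes[i])
cumulative = list(accumulate(probabilities))

for x, p, c in zip(outcomes, probabilities, cumulative):
    print(f"x = {x}: Pr(x) = {p}, Pr(X <= x) = {c}")
```

The last cumulative probability is always 1, because the random variable is certain to be less than or equal to its largest possible outcome.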
Cumulative probability: Continuous random outcome
• Cumulative probability distribution: the probability that the random variable is less than or equal to a particular value.
e.g.,) The cumulative probability distribution of the height of the next new person you meet.
[Figure: cumulative probability with a continuous outcome; vertical axis: cumulative probability, rising to 1]
Probability distribution: Continuous random outcome
• We cannot make a list of the probabilities of each possible outcome because the outcome is a continuous variable. → We use the probability density function instead.
e.g.,) The probability density function of the height of the next new person you meet.
[Figure: probability density function; vertical axis: probability density]
Expected values and variance
• Using only the probability distribution, can we show the characteristics of the random variable? ⇒
• To grasp the characteristics of a random variable, we often use some important mathematical concepts.
• In this class, we focus on discrete random variables.
⇒ The definitions of each concept for a continuous random variable are basically the same as those for a discrete random variable (if you are interested, see Stock and Watson).
Expected values (or Mean)
• Expected value: the long-run average value of the random variable over repeated trials.
(Note) Expected value = expectation = mean
Question
Probability distribution
Outcomes 0 1 2
Probability 0.1 0.8 0.1
What is the expected value?
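One way to check an answer is to compute the probability-weighted sum of the outcomes and compare it with the average of many simulated draws. This is a sketch under the assumption that the expected value of a discrete random variable is the sum of x × Pr(x) over all outcomes; the seed and the number of draws are arbitrary choices.

```python
import random

outcomes = [0, 1, 2]
probabilities = [0.1, 0.8, 0.1]

# Expected value: each outcome weighted by its probability
expected_value = sum(x * p for x, p in zip(outcomes, probabilities))

# Long-run average: the mean of many independent draws from the same distribution
rng = random.Random(0)  # fixed seed (an assumption) for reproducibility
draws = rng.choices(outcomes, weights=probabilities, k=100_000)
long_run_average = sum(draws) / len(draws)

print("expected value:  ", expected_value)
print("long-run average:", long_run_average)
```

The two numbers should be close to each other, and both close to 0 × 0.1 + 1 × 0.8 + 2 × 0.1 = 1.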
Example: The limitation of expected value
• The random variable A
Outcomes 0 1 2
Probability 0.1 0.8 0.1
• The random variable B
Outcomes 0 1 2
Probability 0.3 0.4 0.3
• The expected values of random variables A and B are the same (both equal 1), although B is more spread out than A.
Standard Deviation and Variance
• To measure the dispersion or the spread of a probability distribution, we often use the standard deviation and the variance.
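The random variables A and B from the example on the limitation of the expected value can be compared with these two measures. The sketch below assumes the usual definitions: the variance is the probability-weighted average of squared deviations from the expected value, and the standard deviation is its square root. The helper names `mean` and `variance` are illustrative.

```python
import math

def mean(distribution):
    """Expected value of a discrete distribution given as {outcome: probability}."""
    return sum(x * p for x, p in distribution.items())

def variance(distribution):
    """Probability-weighted average of squared deviations from the expected value."""
    mu = mean(distribution)
    return sum((x - mu) ** 2 * p for x, p in distribution.items())

A = {0: 0.1, 1: 0.8, 2: 0.1}
B = {0: 0.3, 1: 0.4, 2: 0.3}

for name, dist in (("A", A), ("B", B)):
    var = variance(dist)
    print(f"{name}: mean = {mean(dist):.3f}, variance = {var:.3f}, "
          f"standard deviation = {math.sqrt(var):.3f}")
```

Both means are 1, but the variance of B (0.6) is three times that of A (0.2), which is the difference the expected value alone cannot show.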
Example: rolling a die
• The probability distribution
Outcomes 1 2 3 4 5 6
Probability 1/6 1/6 1/6 1/6 1/6 1/6
• What's the expected value?
• What's the variance?
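A hand calculation for the die can be checked with exact fractions. This is a minimal sketch assuming a fair die and the standard definition of the variance as the expected squared deviation from the mean.

```python
from fractions import Fraction

# Fair six-sided die: outcome -> probability
die = {x: Fraction(1, 6) for x in range(1, 7)}

expected_value = sum(x * p for x, p in die.items())
variance = sum((x - expected_value) ** 2 * p for x, p in die.items())

print(expected_value)  # 7/2
print(variance)        # 35/12
```

In decimals, the expected value is 3.5 and the variance is about 2.92, so the standard deviation is about 1.71.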
Key distribution: Normal distribution
[Figure: probability density of the normal distribution]
Probabilistic Property of data
What is the probabilistic relationship between the population and simple random sampling data?
Important assumption ⇒ Simple random sampling: each member of the population is equally likely to be included in the sample.
Sampling distribution of the sample average
Example
Population
Name Wage
Prof I 1000
Prof K 800
Prof Y 600
Mean: 800
Potential samples (sample size=2)
ID Wage
1 1000
2 800
Mean: 900
ID Wage
1 800
2 600
Mean: 700
ID Wage
1 1000
2 600
Mean: 800
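The sampling distribution in this example can be enumerated directly. The sketch below assumes samples of size 2 drawn without replacement, with the order of the two draws ignored, so that each of the three possible samples is equally likely; these assumptions are not stated on the slide.

```python
from itertools import combinations

population = {"Prof I": 1000, "Prof K": 800, "Prof Y": 600}

# Every possible sample of size 2 (without replacement, order ignored)
sample_means = {}
for names in combinations(population, 2):
    wages = [population[name] for name in names]
    sample_means[names] = sum(wages) / len(wages)

for names, m in sample_means.items():
    print(names, m)

# With equally likely samples, the expected value of the sample mean
# is the plain average of the sample means.
mean_of_sample_means = sum(sample_means.values()) / len(sample_means)
population_mean = sum(population.values()) / len(population)
print(mean_of_sample_means, population_mean)
```

The sample mean takes the values 900, 800, and 700 depending on which sample is drawn, so it is itself a random variable, while its expected value coincides with the population mean of 800.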
Large-Sample Approximation
• There are two approaches to characterizing the sampling distribution:
– Exact distribution approach: deriving a formula for the sampling distribution that holds exactly for any sample size n.
– Large-sample approximation approach: using approximations of the sampling distribution that hold when the sample size is infinitely large (n → ∞). ⇒ We use this approach.
• Two key theorems: the law of large numbers and the central limit theorem.
Consistency
Law of large numbers
• Combining Theorem 1 and consistency shows the following theorem.
Theorem 2: The sample mean is arbitrarily close to the population mean with very high probability when the sample size is very large.
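The "with very high probability" part of the statement can be illustrated by simulation. This is a sketch assuming independent rolls of a fair die (population mean 3.5); the tolerance of 0.1, the number of repeated samples, and the function name `share_far_from_mean` are arbitrary illustrative choices.

```python
import random

def share_far_from_mean(n, n_samples=300, tolerance=0.1, seed=0):
    """Share of simulated samples of size n whose sample mean misses 3.5 by more than tolerance."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    far = 0
    for _ in range(n_samples):
        sample_mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(sample_mean - 3.5) > tolerance:
            far += 1
    return far / n_samples

# The share of "far" sample means shrinks toward 0 as the sample size grows.
for n in (10, 100, 500, 2000):
    print(n, share_far_from_mean(n))
```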
Central Limit Theorem
Quiz
• True/False questions.
Suppose the data come from simple random sampling.
1. If the sample size is very large, the distribution of the value of a single observation converges to the normal distribution.
2. The sample mean equals the population mean.
3. If the population mean and variance are known, we can calculate the probability that the sample mean in a large sample is between 0 and 1.
Conclusion
• Given the population, the characteristic of your data is a random variable.
→ If your interest is in the characteristics of the population, you must first study the probabilistic relationship between the population and samples.
• If your data are simple random sampling data,
– The expected value of the sample mean is equal to the population mean, but the sample mean still has a positive variance.
– If the sample size is large enough, the distribution of the sample mean can be approximated by the normal distribution (Central Limit Theorem).
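The normal approximation can be checked by simulation. The sketch below assumes independent rolls of a fair die (population mean 3.5, variance 35/12) and uses the fact that a normal random variable falls within 1.96 standard deviations of its mean with probability of about 0.95; the sample size, the number of repeated samples, and the function name `coverage` are illustrative choices.

```python
import math
import random

POP_MEAN = 3.5
POP_SD = math.sqrt(35 / 12)  # standard deviation of one roll of a fair die

def coverage(n, n_samples=2000, seed=0):
    """Share of sample means within 1.96 standard errors of the population mean."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    standard_error = POP_SD / math.sqrt(n)  # standard deviation of the sample mean
    inside = 0
    for _ in range(n_samples):
        sample_mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(sample_mean - POP_MEAN) <= 1.96 * standard_error:
            inside += 1
    return inside / n_samples

# If the normal approximation is good, this share should be close to 0.95.
print(coverage(100))
```

A single roll is uniformly distributed over 1 to 6, which is far from normal, yet the mean of 100 rolls already behaves approximately like a normal random variable.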