By the end of this chapter, the student should be able to:
Recognize Central Limit Theorem problems.
Classify continuous word problems by their distributions.
Apply and interpret the Central Limit Theorem for Averages.
Apply and interpret the Central Limit Theorem for Sums.
What does it mean to be average? Why are we so concerned with averages? Two reasons are that they give us a middle ground for comparison and they are easy to calculate. In this chapter, you will study averages and the Central Limit Theorem.
The Central Limit Theorem (CLT for short) is one of the most powerful and useful ideas in all of statistics. The theorem has two alternative forms, and both are concerned with drawing finite samples of size n from a population with a known mean, μ, and a known standard deviation, σ. The first alternative says that if we collect samples of size n, where n is “large enough,” calculate each sample’s mean, and create a histogram of those means, then the resulting histogram will tend to have an approximately normal bell shape. The second alternative says that if we again collect samples of size n that are “large enough,” calculate the sum of each sample, and create a histogram of those sums, then the resulting histogram will again tend to have a normal bell shape.
In either case, it does not matter what the distribution of the original population is, or whether you even need to know it. The important fact is that the sample means (averages) and the sums tend to follow the normal distribution. And, the rest you will learn in this chapter.
The size of the sample, n, that is required in order to be “large enough” depends on the original population from which the samples are drawn. If the original population is far from normal, then more observations are needed for the sample averages or the sample sums to be approximately normal. Sampling is done with replacement.
Do the following example in class: Suppose 8 of you roll 1 fair die 10 times, 7 of you roll 2 fair dice 10 times, 9 of you roll 5 fair dice 10 times, and 11 of you roll 10 fair dice 10 times. (The 8, 7, 9, and 11 were randomly chosen.)
Each time a person rolls more than one die, he/she calculates the average of the faces showing. For example, one person might roll 5 fair dice and get a 2, 2, 3, 4, 6 on one roll.
The average is $\frac{2 + 2 + 3 + 4 + 6}{5} = 3.4$. The 3.4 is one average when 5 fair dice are rolled. This same person would roll the 5 dice 9 more times and calculate 9 more averages for a total of 10 averages.
Your instructor will pass out the dice to several people as described above. Roll your dice 10 times. For each roll, record the faces and find the average. Round to the nearest 0.5.
Your instructor (and possibly you) will produce one graph (it might be a histogram) for 1 die, one graph for 2 dice, one graph for 5 dice, and one graph for 10 dice. Since the “average” when you roll one die is just the face on the die, what distribution do these “averages” appear to be representing?
Draw the graph for the averages using 2 dice. Do the averages show any kind of pattern?
Draw the graph for the averages using 5 dice. Do you see any pattern emerging?
Finally, draw the graph for the averages using 10 dice. Do you see any pattern to the graph? What can you conclude as you increase the number of dice?
As the number of dice rolled increases from 1 to 2 to 5 to 10, the following is happening:
The average of the averages remains approximately the same.
The spread of the averages (the standard deviation of the averages) gets smaller.
The graph appears steeper and thinner.
You have just demonstrated the Central Limit Theorem (CLT).
The Central Limit Theorem tells you that as you increase the number of dice, the sample means (averages) tend toward a normal distribution (the sampling distribution).
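The classroom experiment can also be imitated on a computer. The following is a minimal sketch, not part of the original activity: the function name `simulate_dice_averages`, the seed, and the choice of 10,000 rolls per group are all assumptions made for illustration. It rolls 1, 2, 5, and 10 simulated fair dice and reports the mean and the standard deviation of the resulting averages.

```python
import random
import statistics

def simulate_dice_averages(num_dice, num_rolls, rng):
    """Return one average per roll; each roll throws `num_dice` fair dice."""
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_rolls)
    ]

rng = random.Random(7)  # arbitrary fixed seed so the run is repeatable
results = {}            # num_dice -> (mean of averages, stdev of averages)

for num_dice in (1, 2, 5, 10):
    averages = simulate_dice_averages(num_dice, 10_000, rng)
    results[num_dice] = (statistics.mean(averages), statistics.stdev(averages))
    mean_of_averages, sd_of_averages = results[num_dice]
    print(f"{num_dice:>2} dice: mean of averages = {mean_of_averages:.3f}, "
          f"standard deviation of averages = {sd_of_averages:.3f}")
```

In a run of this sketch, the mean of the averages should stay near 3.5 for every group, while the standard deviation should shrink as the number of dice grows, which matches the three observations listed above.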
Average: A number that describes the central tendency of the data. There are a number of specialized averages, including the arithmetic mean, weighted mean, median, mode, and geometric mean.
Central Limit Theorem: Given a random variable (RV) with known mean μ and known standard deviation σ. We are sampling with size n and we are interested in two new RVs: the sample mean, $\bar{X}$, and the sample sum, ΣX. If the size n of the sample is sufficiently large, then $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ and $\Sigma X \sim N\left(n\mu, \sqrt{n}\,\sigma\right)$. If the size n of the sample is sufficiently large, then the distribution of the sample means and the distribution of the sample sums will approximate a normal distribution regardless of the shape of the population. The mean of the sample means will equal the population mean and the mean of the sample sums will equal n times the population mean. The standard deviation of the distribution of the sample means, $\frac{\sigma}{\sqrt{n}}$, is called the standard error of the mean.
Suppose X is a random variable with a distribution that may be known or unknown (it can be any distribution). Using a subscript that matches the random variable, suppose:
a. $\mu_X$ = the mean of X
b. $\sigma_X$ = the standard deviation of X
If you draw random samples of size n, then as n increases, the random variable $\bar{X}$, which consists of sample means, tends to be normally distributed and
$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X}{\sqrt{n}}\right)$
The Central Limit Theorem for Sample Means (Averages) says that if you keep drawing larger and larger samples (like rolling 1, 2, 5, and, finally, 10 dice) and calculating their means, the sample means (averages) form their own normal distribution (the sampling distribution). The normal distribution has the same mean as the original distribution and a variance that equals the original variance divided by n, the sample size. Here n is the number of values that are averaged together, not the number of times the experiment is done.
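The random variable $\bar{X}$ has a different z-score associated with it than the random variable X. $\bar{x}$ is the value of $\bar{X}$ in one sample.
$z = \frac{\bar{x} - \mu_X}{\sigma_X / \sqrt{n}}$
$\mu_X$ is both the average of X and of $\bar{X}$.
$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$ = standard deviation of $\bar{X}$ and is called the standard error of the mean.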
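Why the standard deviation of the averages is the original standard deviation divided by $\sqrt{n}$ can be seen from a short derivation. This is a sketch under the assumption that the n draws, written here as $X_1, \dots, X_n$ (a labeling introduced only for illustration), are independent, each with mean $\mu_X$ and standard deviation $\sigma_X$. Sampling with replacement gives this independence.

```latex
\begin{aligned}
E(\bar{X}) &= \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\cdot n\,\mu_X = \mu_X \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{1}{n^{2}}\cdot n\,\sigma_X^{2} = \frac{\sigma_X^{2}}{n} \\
\sigma_{\bar{X}} &= \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma_X}{\sqrt{n}}
\end{aligned}
```

The first line shows why the averages are centered at the population mean; the last line is the standard error of the mean.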
Example 7.1.
An unknown distribution has a mean of 90 and a standard deviation of 15. Samples of size n = 25 are drawn randomly from the population.
Problem 1.
Find the probability that the sample mean is between 85 and 92.
Solution
Let X = one value from the original unknown population. The probability question asks you to find a probability for the sample mean (or average).
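Let $\bar{X}$ = the mean or average of a sample of size 25. Since $\mu_X = 90$, $\sigma_X = 15$, and n = 25,
then $\bar{X} \sim N\left(90, \frac{15}{\sqrt{25}}\right)$.
Find $P(85 < \bar{x} < 92)$. Draw a graph.
$P(85 < \bar{x} < 92) = 0.6997$
The probability that the sample mean is between 85 and 92 is 0.6997.
TI-83 or 84: normalcdf(lower value, upper value, mean for averages, stdev for averages), where stdev = standard deviation.
The parameter list is abbreviated (lower, upper, $\mu$, $\frac{\sigma}{\sqrt{n}}$).
normalcdf(85, 92, 90, $\frac{15}{\sqrt{25}}$) = 0.6997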
Problem 2.
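Find the average value that is 2 standard deviations above the mean of the averages.
Solution
To find the average value that is 2 standard deviations above the mean of the averages, use the formula
value = $\mu_X + (\text{number of standard deviations}) \cdot \frac{\sigma_X}{\sqrt{n}}$
value = $90 + 2 \cdot \frac{15}{\sqrt{25}} = 96$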
So, the average value that is 2 standard deviations above the mean of the averages is 96.
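For readers without a TI calculator, both parts of Example 7.1 can be checked with Python's standard library. This is a minimal sketch: `statistics.NormalDist` stands in for `normalcdf`, and the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu_x, sigma_x, n = 90, 15, 25

# Standard error of the mean: sigma_X / sqrt(n) = 15 / 5 = 3
standard_error = sigma_x / sqrt(n)

# Distribution of the sample mean according to the CLT for averages
xbar_dist = NormalDist(mu_x, standard_error)

# Problem 1: P(85 < x-bar < 92)
prob_between = xbar_dist.cdf(92) - xbar_dist.cdf(85)

# Problem 2: the average that is 2 standard deviations above the mean of the averages
two_sd_above = mu_x + 2 * standard_error

print(round(prob_between, 4))  # 0.6997
print(two_sd_above)            # 96.0
```

Note that the standard deviation passed to `NormalDist` is the standard error, 3, not the population standard deviation, 15.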
Example 7.2.
The length of time, in hours, it takes an “over 40” group of people to play one soccer match is normally distributed with a mean of 2 hours and a standard deviation of 0.5 hours. A sample of size n = 50 is drawn randomly from the population.
Problem
Find the probability that the sample mean is between 1.8 hours and 2.3 hours.
Solution
Let X = the time, in hours, it takes to play one soccer match.
The probability question asks you to find a probability for the sample mean or average time, in hours, it takes to play one soccer match.
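Let $\bar{X}$ = the average time, in hours, it takes to play one soccer match. If $\mu_X$ = _________, $\sigma_X$ = __________, and n = ___________, then $\bar{X} \sim N$(______, ______) by the Central Limit Theorem for Averages of Sample Means. $\mu_X = 2$, $\sigma_X = 0.5$, n = 50, and $\bar{X} \sim N\left(2, \frac{0.5}{\sqrt{50}}\right)$.
Find $P(1.8 < \bar{x} < 2.3)$. Draw a graph.
normalcdf(1.8, 2.3, 2, $\frac{0.5}{\sqrt{50}}$)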
The probability that the sample mean is between 1.8 hours and 2.3 hours is ______.
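Normal Distribution: A continuous random variable (RV) with pdf $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, where μ is the mean of the distribution and σ is the standard deviation. Notation: X ~ N(μ, σ). If μ = 0 and σ = 1, the RV is called the standard normal distribution.
Standard Error of the Mean: The standard deviation of the distribution of the sample means, $\frac{\sigma}{\sqrt{n}}$.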
Suppose X is a random variable with a distribution that may be known or unknown (it can be any distribution) and suppose:
a. $\mu_X$ = the mean of X
b. $\sigma_X$ = the standard deviation of X
If you draw random samples of size n, then as n increases, the random variable ΣX, which consists of sums, tends to be normally distributed and
$\Sigma X \sim N\left(n \cdot \mu_X, \sqrt{n} \cdot \sigma_X\right)$
The Central Limit Theorem for Sums says that if you keep drawing larger and larger samples and taking their sums, the sums form their own normal distribution (the sampling distribution). The normal distribution has a mean equal to the original mean multiplied by the sample size and a standard deviation equal to the original standard deviation multiplied by the square root of the sample size.
The random variable ΣX has the following z-score associated with it:
a. Σx is one sum.
b. $z = \frac{\Sigma x - n \cdot \mu_X}{\sqrt{n} \cdot \sigma_X}$
a. $n \cdot \mu_X$ = the mean of ΣX
b. $\sqrt{n} \cdot \sigma_X$ = standard deviation of ΣX
Example 7.3.
An unknown distribution has a mean of 90 and a standard deviation of 15. A sample of size 80 is drawn randomly from the population.
Problem
a. Find the probability that the sum of the 80 values (or the total of the 80 values) is more than 7500.
b. Find the sum that is 1.5 standard deviations below the mean of the sums.
Solution
Let X = one value from the original unknown population. The probability question asks you to find a probability for the sum (or total of) 80 values.
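ΣX = the sum or total of 80 values. Since $\mu_X = 90$, $\sigma_X = 15$, and n = 80, then
$\Sigma X \sim N\left(80 \cdot 90, \sqrt{80} \cdot 15\right)$
a. mean of the sums = $n \cdot \mu_X = 80 \cdot 90 = 7200$
b. standard deviation of the sums = $\sqrt{n} \cdot \sigma_X = \sqrt{80} \cdot 15 \approx 134.16$
c. sum of 80 values = Σx = 7500
Find $P(\Sigma X > 7500)$. Draw a graph.
P(ΣX > 7500) = 0.0127
normalcdf(lower value, upper value, mean of sums, stdev of sums)
The parameter list is abbreviated (lower, upper, $n \cdot \mu_X$, $\sqrt{n} \cdot \sigma_X$).
normalcdf(7500, 1E99, $80 \cdot 90$, $\sqrt{80} \cdot 15$) = 0.0127
Reminder: 1E99 = $10^{99}$. Press the EE key for E.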
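The same calculation can be sketched in Python, again using `statistics.NormalDist` in place of `normalcdf`. The variable names are illustrative, and the second result is simply the arithmetic for part (b) of the problem statement, mean of the sums minus 1.5 standard deviations of the sums.

```python
from math import sqrt
from statistics import NormalDist

mu_x, sigma_x, n = 90, 15, 80

# CLT for sums: mean n * mu_X, standard deviation sqrt(n) * sigma_X
sum_dist = NormalDist(n * mu_x, sqrt(n) * sigma_x)

# a. P(sum > 7500), the area to the right of 7500
prob_above_7500 = 1 - sum_dist.cdf(7500)

# b. the sum that is 1.5 standard deviations below the mean of the sums
sum_below = sum_dist.mean - 1.5 * sum_dist.stdev

print(round(sum_dist.stdev, 2))   # 134.16
print(round(prob_above_7500, 4))  # 0.0127
print(round(sum_below, 1))        # 6998.8
```

No stand-in for 1E99 is needed here because the right tail is computed as one minus the cumulative probability.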
It is important for you to understand when to use the CLT. If you are being asked to find the probability of an average or mean, use the CLT for means or averages. If you are being asked to find the probability of a sum or total, use the CLT for sums. This also applies to percentiles for averages and sums.
If you are being asked to find the probability of an individual value, do not use the CLT. Use the distribution of its random variable.
Law of Large Numbers
The Law of Large Numbers says that if you take samples of larger and larger size from any population, then the mean $\bar{x}$ of the sample gets closer and closer to μ. From the Central Limit Theorem, we know that as n gets larger and larger, the sample averages follow a normal distribution. The larger n gets, the smaller the standard deviation gets. (Remember that the standard deviation for $\bar{X}$ is $\frac{\sigma}{\sqrt{n}}$.) This means that the sample mean $\bar{x}$ must be close to the population mean μ. We can say that μ is the value that the sample averages approach as n gets larger. The Central Limit Theorem illustrates the Law of Large Numbers.
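The Law of Large Numbers can be watched in a small simulation. The sketch below is illustrative only: it assumes an exponential population with mean 22 (an arbitrary choice of a clearly non-normal population), a fixed seed, and sample sizes picked for convenience. For each n it prints the sample mean next to the standard error $\frac{\sigma}{\sqrt{n}}$.

```python
import random
from math import sqrt

rng = random.Random(2024)  # arbitrary fixed seed so the run is repeatable
mu = 22                    # assumed population mean; for an exponential, sigma equals mu
sample_means = {}          # n -> mean of one random sample of size n

for n in (10, 100, 1_000, 10_000, 100_000):
    sample = [rng.expovariate(1 / mu) for _ in range(n)]
    sample_means[n] = sum(sample) / n
    standard_error = mu / sqrt(n)
    print(f"n = {n:>6}: sample mean = {sample_means[n]:7.3f}, "
          f"standard error = {standard_error:.3f}")
```

As n grows, the standard error shrinks toward 0, so the sample means are expected to settle closer and closer to 22.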
Central Limit Theorem for the Mean (Average) and Sum Examples
Example 7.4.
A study involving stress is done on a college campus among the students. The stress scores follow a uniform distribution with the lowest stress score equal to 1 and the highest equal to 5. Using a sample of 75 students, find:
1. The probability that the average stress score for the 75 students is less than 2.
2. The 90th percentile for the average stress score for the 75 students.
3. The probability that the total of the 75 stress scores is less than 200.
4. The 90th percentile for the total stress score for the 75 students.
Let X = one stress score.
Problems 1. and 2. ask you to find a probability or a percentile for an average or mean. Problems 3 and 4 ask you to find a probability or a percentile for a total or sum. The sample size, n , is equal to 75.
Since the individual stress scores follow a uniform distribution, X ~ U(1, 5) where a = 1 and b = 5 (See Continuous Random Variables for the uniform).
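For problems 1. and 2., let $\bar{X}$ = the average stress score for the 75 students. For the uniform distribution U(1, 5), $\mu_X = \frac{1 + 5}{2} = 3$ and $\sigma_X = \sqrt{\frac{(5 - 1)^2}{12}} \approx 1.15$. Then,
$\bar{X} \sim N\left(3, \frac{1.15}{\sqrt{75}}\right)$ where n = 75.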
Problem 1.
Find $P(\bar{x} < 2)$. Draw the graph.
Solution
$P(\bar{x} < 2) = 0$
The probability that the average stress score is less than 2 is about 0.
normalcdf(1, 2, 3, $\frac{1.15}{\sqrt{75}}$) = 0
Reminder: The smallest stress score is 1. Therefore, the smallest average for 75 stress scores is 1.
Problem 2.
Find the 90th percentile for the average of 75 stress scores. Draw a graph.
Solution
Let k = the 90th percentile.
Find k where $P(\bar{x} < k) = 0.90$.
k = 3.2
The 90th percentile for the average of 75 scores is about 3.2. This means that 90% of all the averages of 75 stress scores are at most 3.2 and 10% are at least 3.2.
invNorm(0.90, 3, $\frac{1.15}{\sqrt{75}}$) = 3.2
For problems 3 and 4, let ΣX = the sum of the 75 stress scores. Then, $\Sigma X \sim N\left(75 \cdot 3, \sqrt{75} \cdot 1.15\right)$.
Problem 3.
Find P ( ΣX < 200 ) . Draw the graph.
Solution
The mean of the sum of 75 stress scores is 75 ⋅ 3 = 225.
The standard deviation of the sum of 75 stress scores is $\sqrt{75} \cdot 1.15 = 9.96$.
P(ΣX < 200) ≈ 0.006
The probability that the total of 75 scores is less than 200 is about 0.006, which is close to 0.
normalcdf(75, 200, 75 ⋅ 3, $\sqrt{75} \cdot 1.15$) ≈ 0.006
Reminder: The smallest total of 75 stress scores is 75 since the smallest single score is 1.
Problem 4.
Find the 90th percentile for the total of 75 stress scores. Draw a graph.
Solution
Let k = the 90th percentile.
Find k where P ( ΣX < k ) = 0.90 .
k = 237.8
The 90th percentile for the sum of 75 scores is about 237.8. This means that 90% of all the sums of 75 scores are no more than 237.8 and 10% are no less than 237.8.
invNorm(0.90, 75 ⋅ 3, $\sqrt{75} \cdot 1.15$) = 237.8
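All four parts of Example 7.4 can be reproduced with a short Python sketch. It assumes `statistics.NormalDist` as a stand-in for `normalcdf` and `invNorm`, and it keeps the unrounded standard deviation of the uniform distribution (about 1.1547) rather than 1.15, so the last digits can differ slightly from the calculator results. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

a, b, n = 1, 5, 75                 # X ~ U(1, 5), sample size 75
mu_x = (a + b) / 2                 # 3.0
sigma_x = sqrt((b - a) ** 2 / 12)  # about 1.1547; the text rounds to 1.15

xbar_dist = NormalDist(mu_x, sigma_x / sqrt(n))     # CLT for averages
sum_dist = NormalDist(n * mu_x, sqrt(n) * sigma_x)  # CLT for sums

p_avg_below_2 = xbar_dist.cdf(2)      # Problem 1
pct90_avg = xbar_dist.inv_cdf(0.90)   # Problem 2
p_sum_below_200 = sum_dist.cdf(200)   # Problem 3
pct90_sum = sum_dist.inv_cdf(0.90)    # Problem 4

print(f"{p_avg_below_2:.4f}")         # 0.0000
print(round(pct90_avg, 2))            # 3.17
print(round(p_sum_below_200, 4))      # 0.0062
print(round(pct90_sum, 1))            # 237.8
```

The sketch integrates from negative infinity rather than from the smallest possible average or total; for these numbers the difference is negligible. The probability for the total is small but not exactly zero.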
Example 7.5.
Suppose that a market research analyst for a cell phone company conducts a study of their customers who exceed the time allowance included on their basic cell phone contract; the analyst finds that for those people who exceed the time included in their basic contract, the excess time used follows an exponential distribution with a mean of 22 minutes.
Consider a random sample of 80 customers who exceed the time allowance included in their basic cell phone contract.
Let X = the excess time used by one INDIVIDUAL cell phone customer who exceeds his contracted time allowance.
X ~ Exp(1/22). From Chapter 5, we know that μ = 22 and σ = 22.
Let $\bar{X}$ = the AVERAGE excess time used by a sample of n = 80 customers who exceed their contracted time allowance.
$\bar{X} \sim N\left(22, \frac{22}{\sqrt{80}}\right)$ by the CLT for Sample Means or Averages
Problem 1.
a. Find the probability that the average excess time used by the 80 customers in the sample is longer than 20 minutes. This is asking us to find $P(\bar{x} > 20)$. Draw the graph.
b. Suppose that one customer who exceeds the time limit for his cell phone contract is randomly selected. Find the probability that this individual customer’s excess time is longer than 20 minutes. This is asking us to find P(X > 20).
c. Explain why the probabilities in (a) and (b) are different.
Solution
a. Find $P(\bar{x} > 20)$:
$P(\bar{x} > 20) = 0.7919$ using normalcdf(20, 1E99, 22, $\frac{22}{\sqrt{80}}$).
The probability is 0.7919 that the average excess time used is more than 20 minutes, for a sample of 80 customers who exceed their contracted time allowance.
Reminder: 1E99 = $10^{99}$ and -1E99 = $-10^{99}$. Press the EE key for E. Or just use 10^99 instead of 1E99.
b. Find P(X > 20). Remember to use the exponential distribution for an individual: X ~ Exp(1/22).
P(X > 20) = $e^{-(1/22) \cdot 20}$ or $e^{-0.04545 \cdot 20}$ = 0.4029
c. P(X > 20) = 0.4029 but $P(\bar{x} > 20) = 0.7919$.
The probabilities are not equal because we use different distributions to calculate the probability for individuals and for averages.
When asked to find the probability of an individual value, use the stated distribution of its random variable; do not use the CLT. Use the CLT with the normal distribution when you are being asked to find the probability for an average.
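The contrast between the individual and the average can be made concrete in a few lines of Python. This is a sketch with illustrative variable names: the individual probability uses the exponential survival formula $e^{-x/\mu}$, and the probability for the average uses a normal distribution from `statistics.NormalDist`.

```python
from math import exp, sqrt
from statistics import NormalDist

mean_excess, n = 22, 80  # exponential mean (and standard deviation) and sample size

# b. Individual customer: X ~ Exp(1/22), so P(X > 20) = e^(-20/22)
p_individual = exp(-20 / mean_excess)

# a. Average of 80 customers: X-bar ~ N(22, 22 / sqrt(80)) by the CLT
xbar_dist = NormalDist(mean_excess, mean_excess / sqrt(n))
p_average = 1 - xbar_dist.cdf(20)

print(round(p_individual, 4))  # 0.4029
print(round(p_average, 4))     # 0.7919
```

The two answers differ because the first line uses the skewed exponential distribution of one value, while the second uses the much narrower, approximately normal distribution of the sample mean.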
Problem 2.
Find the 95th percentile for the sample average excess time for samples of 80 customers who exceed their basic contract time allowances. Draw a graph.
Solution
Let k = the 95th percentile. Find k where $P(\bar{x} < k) = 0.95$.
k = 26.0 using invNorm(0.95, 22, $\frac{22}{\sqrt{80}}$) = 26.0
The 95th percentile for the sample average excess time used is about 26.0 minutes for random samples of 80 customers who exceed their contractual allowed time.
95% of such samples would have averages under 26 minutes; only 5% of such samples would have averages above 26 minutes.
Historically, being able to compute binomial probabilities was one of the most important applications of the Central Limit Theorem. Binomial probabilities were displayed in a table in a book with a small value for n (say, 20). To calculate the probabilities with large values of n , you had to use the binomial formula which could be very complicated. Using the Normal Approximation to the Binomial simplified the process. To compute the Normal Approximation to the Binomial, take a simple random sample from a population. You must meet the conditions for a binomial distribution:
there are a certain number n of independent trials
the outcomes of any trial are success or failure
each trial has the same probability of a success p
Recall that if X is the binomial random variable, then X ~ B(n, p). The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (np > 5 and nq > 5; the approximation is better if they are both greater than or equal to 10). Then the binomial can be approximated by the normal distribution with mean $\mu = np$ and standard deviation $\sigma = \sqrt{npq}$. Remember that q = 1 − p. In order to get the best approximation, add 0.5 to x or subtract 0.5 from x (use x + 0.5 or x − 0.5). The number 0.5 is called the continuity correction factor.
Example 7.6.
Suppose in a local Kindergarten through 12th grade (K - 12) school district, 53 percent of the population favor a charter school for grades K - 5. A simple random sample of 300 is surveyed.
1. Find the probability that at least 150 favor a charter school.
2. Find the probability that at most 160 favor a charter school.
3. Find the probability that more than 155 favor a charter school.
4. Find the probability that less than 147 favor a charter school.
5. Find the probability that exactly 175 favor a charter school.
Let X = the number that favor a charter school for grades K - 5. X ~ B(n, p) where n = 300 and p = 0.53. Since np > 5 and nq > 5, use the normal approximation to the binomial. The formulas for the mean and standard deviation are $\mu = np$ and $\sigma = \sqrt{npq}$. The mean is 159 and the standard deviation is 8.6447. The random variable for the normal distribution is Y. Y ~ N(159, 8.6447). See The Normal Distribution for help with calculator instructions.
For Problem 1., you include 150 so P(X ≥ 150) has normal approximation P(Y ≥ 149.5) = 0.8641.
normalcdf(149.5, 10^99, 159, 8.6447) = 0.8641
For Problem 2., you include 160 so P(X ≤ 160) has normal approximation P(Y ≤ 160.5) = 0.5689.
normalcdf(0, 160.5, 159, 8.6447) = 0.5689
For Problem 3., you exclude 155 so P(X > 155) has normal approximation P(Y > 155.5) = 0.6572.
normalcdf(155.5, 10^99, 159, 8.6447) = 0.6572
For Problem 4., you exclude 147 so P(X < 147) has normal approximation P(Y < 146.5) = 0.0741.
normalcdf(0, 146.5, 159, 8.6447) = 0.0741
For Problem 5., P(X = 175) has normal approximation P(174.5 < Y < 175.5) = 0.0083.
normalcdf(174.5, 175.5, 159, 8.6447) = 0.0083
Because of calculators and computer software that easily let you calculate binomial probabilities for large values of n, it is not necessary to use the Normal Approximation to the Binomial provided you have access to these technology tools. Most school labs have Microsoft Excel, an example of computer software that calculates binomial probabilities. Many students have access to the TI-83 or 84 series calculators and they easily calculate probabilities for the binomial. In an Internet browser, if you type in “binomial probability distribution calculation,” you can find at least one online calculator for the binomial.
For Example 7.6, the probabilities are calculated using the binomial (n = 300 and p = 0.53) below. Compare the binomial and normal distribution answers. See Discrete Random Variables for help with calculator instructions for the binomial.
P(X ≥ 150): 1 - binomialcdf(300, 0.53, 149) = 0.8641
P(X ≤ 160): binomialcdf(300, 0.53, 160) = 0.5684
P(X > 155): 1 - binomialcdf(300, 0.53, 155) = 0.6576
P(X < 147): binomialcdf(300, 0.53, 146) = 0.0742
P(X = 175): (You use the binomial pdf.) binomialpdf(300, 0.53, 175) = 0.0083
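The comparison between the exact binomial and its normal approximation can also be sketched in Python. The helper names `binom_pmf` and `binom_cdf` are hypothetical, written here from the binomial formula with `math.comb`; they are not calculator or library functions. The normal side uses `statistics.NormalDist` with the continuity correction of 0.5.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 300, 0.53
q = 1 - p

def binom_pmf(k):
    """Exact P(X = k) for X ~ B(n, p), from the binomial formula."""
    return comb(n, k) * p**k * q**(n - k)

def binom_cdf(k):
    """Exact P(X <= k), summing the pmf from 0 through k."""
    return sum(binom_pmf(i) for i in range(k + 1))

# Normal approximation Y ~ N(np, sqrt(npq))
approx = NormalDist(n * p, sqrt(n * p * q))

# label -> (exact binomial, normal approximation with continuity correction)
comparisons = {
    "P(X >= 150)": (1 - binom_cdf(149), 1 - approx.cdf(149.5)),
    "P(X <= 160)": (binom_cdf(160), approx.cdf(160.5)),
    "P(X > 155)": (1 - binom_cdf(155), 1 - approx.cdf(155.5)),
    "P(X < 147)": (binom_cdf(146), approx.cdf(146.5)),
    "P(X = 175)": (binom_pmf(175), approx.cdf(175.5) - approx.cdf(174.5)),
}

for label, (exact, normal) in comparisons.items():
    print(f"{label}: binomial = {exact:.4f}, normal approximation = {normal:.4f}")
```

Each pair of values should agree to about three decimal places, which is why the approximation was so useful before exact binomial calculations became easy.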
**Contributions made to Example 2 by Roberta Bloom
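Exponential Distribution: A continuous random variable (RV) that appears when we are interested in the intervals of time between some random events, for example, the length of time between emergency arrivals at a hospital. Notation: X ~ Exp(m). The mean is $\mu = \frac{1}{m}$ and the standard deviation is $\sigma = \frac{1}{m}$. The probability density function is $f(x) = me^{-mx}$, x ≥ 0, and the cumulative distribution function is $P(X \le x) = 1 - e^{-mx}$.
Mean: A number that measures the central tendency. A common name for mean is ‘average.’ The term ‘mean’ is a shortened form of ‘arithmetic mean.’ By definition, the mean for a sample (denoted by $\bar{x}$) is $\bar{x} = \frac{\text{sum of all values in the sample}}{\text{number of values in the sample}}$, and the mean for a population (denoted by μ) is $\mu = \frac{\text{sum of all values in the population}}{\text{number of values in the population}}$.
Uniform Distribution: A continuous random variable (RV) that has equally likely outcomes over the domain, a < x < b. Often referred to as the Rectangular distribution because the graph of the pdf has the form of a rectangle. Notation: X ~ U(a, b). The mean is $\mu = \frac{a + b}{2}$ and the standard deviation is $\sigma = \sqrt{\frac{(b - a)^2}{12}}$. The probability density function is $f(x) = \frac{1}{b - a}$ for a < x < b or a ≤ x ≤ b. The cumulative distribution is $P(X \le x) = \frac{x - a}{b - a}$.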
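Formula 7.1. Central Limit Theorem for Sample Means (Averages)
$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X}{\sqrt{n}}\right)$ Mean for Averages ($\bar{X}$): $\mu_X$
Formula 7.2. Central Limit Theorem for Sample Means (Averages) Z-Score and Standard Error of the Mean
$z = \frac{\bar{x} - \mu_X}{\sigma_X / \sqrt{n}}$ Standard Error of the Mean (Standard Deviation for Averages ($\bar{X}$)): $\frac{\sigma_X}{\sqrt{n}}$
Formula 7.3. Central Limit Theorem for Sums
$\Sigma X \sim N\left(n \cdot \mu_X, \sqrt{n} \cdot \sigma_X\right)$ Mean for Sums (ΣX): $n \cdot \mu_X$
Formula 7.4. Central Limit Theorem for Sums Z-Score and Standard Deviation for Sums
$z = \frac{\Sigma x - n \cdot \mu_X}{\sqrt{n} \cdot \sigma_X}$ Standard Deviation for Sums (ΣX): $\sqrt{n} \cdot \sigma_X$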
The student will explore the properties of data through the Central Limit Theorem.
Yoonie is a personnel manager in a large corporation. Each month she must review 16 of the employees. From past experience, she has found that the reviews take her approximately 4 hours each to do with a population standard deviation of 1.2 hours. Let X be the random variable representing the time it takes her to complete one review. Assume X is normally distributed. Let $\bar{X}$ be the random variable representing the average time to complete the 16 reviews. Let ΣX be the total time it takes Yoonie to complete all of the month’s reviews.
Complete the distributions.
X ~
$\bar{X}$ ~
ΣX ~
For each problem below:
a. Sketch the graph. Label and scale the horizontal axis. Shade the region corresponding to the probability.
b. Calculate the value.
Exercise 7.6.1. (Go to Solution)
Find the probability that one review will take Yoonie from 3.5 to 4.25 hours.
a.
b. P( ________ ________ ) = _______
Exercise 7.6.2. (Go to Solution)
Find the probability that the average of a month’s reviews will take Yoonie from 3.5 to 4.25 hrs.
a.
b. P( ) = _______
Exercise 7.6.3. (Go to Solution)
Find the 95th percentile for the average time to complete one month’s reviews.
a.
b. The 95th Percentile =
Exercise 7.6.4. (Go to Solution)
Find the probability that the sum of the month’s reviews takes Yoonie from 60 to 65 hours.
a.
b. The Probability =
Exercise 7.6.5. (Go to Solution)
Find the 95th percentile for the sum of the month’s reviews.
a.
b. The 95th percentile =
Exercise 7.7.1. (Go to Solution)
X ~ . Suppose that you form random samples of 25 from this distribution. Let $\bar{X}$ be the random variable of averages. Let ΣX be the random variable of sums. For c - f, sketch the graph, shade the region, label and scale the horizontal axis for $\bar{X}$, and find the probability.
a. Sketch the distributions of X and $\bar{X}$ on the same graph.
b. $\bar{X}$ ~
c.
d. Find the 30th percentile.
e.
f.
g. ΣX ~
h. Find the minimum value for the upper quartile.
i. P(1400 < ΣX < 1550) =
Exercise 7.7.2.
Determine which of the following are true and which are false. Then, in complete sentences, justify your answers.
a. When the sample size is large, the mean of $\bar{X}$ is approximately equal to the mean of X.
b. When the sample size is large, $\bar{X}$ is approximately normally distributed.
c. When the sample size is large, the standard deviation of $\bar{X}$ is approximately the same as the standard deviation of X.
Exercise 7.7.3. (Go to Solution)
The percent of fat calories that a person in America consumes each day is normally distributed with a mean of about 36 and a standard deviation of about 10. Suppose that 16 individuals are randomly chosen.
Let $\bar{X}$ = average percent of fat calories.
a. $\bar{X}$ ~ ______ ( ______ , ______ )
b. For the group of 16, find the probability that the average percent of fat calories consumed is more than 5. Graph the situation and shade in the area to be determined.
c. Find the first quartile for the average percent of fat calories.
Exercise 7.7.4.
Previously, De Anza statistics students estimated that the amount of change daytime statistics students carry is exponentially distributed with a mean of $0.88. Suppose that we randomly pick 25 daytime statistics students.
a. In words, X =
b. X ~
c. In words, $\bar{X}$ =
d. $\bar{X}$ ~ ______ ( ______ , ______ )
e. Find the probability that an individual had between $0.80 and $1.00. Graph the situation and shade in the area to be determined.
f. Find the probability that the average of the 25 students was between $0.80 and $1.00. Graph the situation and shade in the area to be determined.
g. Explain why there is a difference in (e) and (f).
Exercise 7.7.5. (Go to Solution)
Suppose that the distance of fly balls hit to the outfield (in baseball) is normally distributed with a mean of 250 feet and a standard deviation of 50 feet. We randomly sample 49 fly balls.
a. If $\bar{X}$ = average distance in feet for 49 fly balls, then $\bar{X}$ ~ _______ ( _______ , _______ )
b. What is the probability that the 49 balls traveled an average of less than 240 feet? Sketch the graph. Scale the horizontal axis for $\bar{X}$. Shade the region corresponding to the probability. Find the probability.
c. Find the 80th percentile of the distribution of the average of 49 fly balls.
Exercise 7.7.6.
Suppose that the weight of open boxes of cereal in a home with children is uniformly distributed from 2 to 6 pounds. We randomly survey 64 homes with children.
a. In words, X =
b. X ~
c. $\mu_X$ =
d. $\sigma_X$ =
e. In words, ΣX =
f. ΣX ~
g. Find the probability that the total weight of open boxes is less than 250 pounds.
h. Find the 35th percentile for the total weight of open boxes of cereal.
Exercise 7.7.7. (Go to Solution)
Suppose that the duration of a particular type of criminal trial is known to have a mean of 21 days and a standard deviation of 7 days. We randomly sample 9 trials.
a. In words, ΣX =
b. ΣX ~
c. Find the probability that the total length of the 9 trials is at least 225 days.
d. 90 percent of the total of 9 of these types of trials will last at least how long?
Exercise 7.7.8.
According to the Internal Revenue Service, the average length of time for an individual to complete (record keep, learn, prepare, copy, assemble and send) IRS Form 1040 is 10.53 hours (without any attached schedules). The distribution is unknown. Let us assume that the standard deviation is 2 hours. Suppose we randomly sample 36 taxpayers.
a. In words, X =
b. In words, $\bar{X}$ =
c. $\bar{X}$ ~
d. Would you be surprised if the 36 taxpayers finished their Form 1040s in an average of more than 12 hours? Explain why or why not in complete sentences.
e. Would you be surprised if one taxpayer finished his Form 1040 in more than 12 hours? In a complete sentence, explain why.
Exercise 7.7.9. (Go to Solution)
Suppose that a category of world class runners are known to run a marathon (26 miles) in an average of 145 minutes with a standard deviation of 14 minutes. Consider 49 of the races.
Let $\bar{X}$ = the average of the 49 races.
a. $\bar{X}$ ~
b. Find the probability that the runner will average between 142 and 146 minutes in these 49 marathons.
c. Find the 80th percentile for the average of these 49 marathons.
d. Find the median of the average running times.
Exercise 7.7.10.
The attention span of a two year-old is exponentially distributed with a mean of about 8 minutes. Suppose we randomly survey 60 two year-olds.
a. In words, X =
b. X ~
c. In words, $\bar{X}$ =
d. $\bar{X}$ ~
e. Before doing any calculations, which do you think will be higher? Explain why.
f. Calculate the probabilities in part (e).
g. Explain why the distribution for $\bar{X}$ is not exponential.
Exercise 7.7.11. (Go to Solution)
Suppose that the length of research papers is uniformly distributed from 10 to 25 pages. We survey a class in which 55 research papers were turned in to a professor. We are interested in the average length of the research papers.
a. In words, X =
b. X ~
c. $\mu_X$ =
d. $\sigma_X$ =
e. In words, $\bar{X}$ =
f. $\bar{X}$ ~
g. In words, ΣX =
h. ΣX ~
i. Without doing any calculations, do you think that it’s likely that the professor will need to read a total of more than 1050 pages? Why?
j. Calculate the probability that the professor will need to read a total of more than 1050 pages.
k. Why is it so unlikely that the average length of the papers will be less than 12 pages?
Exercise 7.7.12.
The length of songs in a collector’s CD collection is uniformly distributed from 2 to 3.5 minutes. Suppose we randomly pick 5 CDs from the collection. There is a total of 43 songs on the 5 CDs.
a. In words, X =
b. X ~
c. In words, $\bar{X}$ =
d. $\bar{X}$ ~
e. Find the first quartile for the average song length.
f. The IQR (interquartile range) for the average song length is from _______ to _______.
Exercise 7.7.13. (Go to Solution)
Salaries for teachers in a particular elementary school district are normally distributed with a mean of $44,000 and a standard deviation of $6500. We randomly survey 10 teachers from that district.
a. In words, X =
b. In words, X̄ =
c. X̄ ~
d. In words, ΣX =
e. ΣX ~
f. Find the probability that the teachers earn a total of over $400,000.
g. Find the 90th percentile for an individual teacher’s salary.
h. Find the 90th percentile for the average teachers’ salary.
i. If we surveyed 70 teachers instead of 10, graphically, how would that change the distribution for X̄?
j. If each of the 70 teachers received a $3000 raise, graphically, how would that change the distribution for X̄?
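The contrast between an individual salary, the average of 10 salaries, and the sum of 10 salaries can be sketched as follows. Because the population is stated to be normal, the distributions of the average and the sum are normal for any sample size. The variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 44_000, 6_500, 10

individual = NormalDist(mu, sigma)                  # X
xbar = NormalDist(mu, sigma / sqrt(n))              # X-bar for n = 10
total = NormalDist(n * mu, sqrt(n) * sigma)         # sum of the 10 salaries

p_total_over_400k = 1 - total.cdf(400_000)          # part (f)
pct90_individual = individual.inv_cdf(0.90)         # part (g)
pct90_average = xbar.inv_cdf(0.90)                  # part (h)

# Part (i): a larger sample keeps the center and shrinks the spread of X-bar
xbar_70 = NormalDist(mu, sigma / sqrt(70))

print(f"P(total > 400,000) = {p_total_over_400k:.4f}")
print(f"90th percentile, individual = {pct90_individual:,.0f}")
print(f"90th percentile, average    = {pct90_average:,.0f}")
print(f"SD of X-bar: n=10 -> {xbar.stdev:,.0f}, n=70 -> {xbar_70.stdev:,.0f}")
```

The 90th percentile of the average sits much closer to $44,000 than the 90th percentile of an individual salary because averaging reduces variability.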
Exercise 7.7.14.
The distribution of income in some Third World countries is considered wedge shaped (many very poor people, very few middle income people, and few to many wealthy people). Suppose we pick a country with a wedge distribution. Let the average salary be $2000 per year with a standard deviation of $8000. We randomly survey 1000 residents of that country.
a. In words, X =
b. In words, X̄ =
c. X̄ ~
d. How is it possible for the standard deviation to be greater than the average?
e. Why is it more likely that the average of the 1000 residents will be from $2000 to $2100 than from $2100 to $2200?
Exercise 7.7.15. (Go to Solution)
The average length of a maternity stay in a U.S. hospital is said to be 2.4 days with a standard deviation of 0.9 days. We randomly survey 80 women who recently bore children in a U.S. hospital.
a. In words, X =
b. In words, X̄ =
c. X̄ ~
d. In words, ΣX =
e. ΣX ~
f. Is it likely that an individual stayed more than 5 days in the hospital? Why or why not?
g. Is it likely that the average stay for the 80 women was more than 5 days? Why or why not?
h. Which is more likely:
i. If we were to sum up the women’s stays, is it likely that, collectively, they spent more than a year in the hospital? Why or why not?
Exercise 7.7.16.
In 1940 the average size of a U.S. farm was 174 acres. Let’s say that the standard deviation was 55 acres. Suppose we randomly survey 38 farmers from 1940. (Source: U.S. Dept. of Agriculture)
a. In words, X =
b. In words, X̄ =
c. X̄ ~
d. The IQR for X̄ is from _______ acres to _______ acres.
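The interquartile range of a sample average runs from its 25th to its 75th percentile. A minimal sketch, assuming a sample of 38 is large enough for the Central Limit Theorem to apply to average farm size:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 174, 55, 38      # acres, acres, number of farmers

# CLT for averages: X-bar is approximately N(mu, sigma / sqrt(n))
xbar = NormalDist(mu, sigma / sqrt(n))

q1 = xbar.inv_cdf(0.25)         # first quartile of X-bar
q3 = xbar.inv_cdf(0.75)         # third quartile of X-bar

print(f"X-bar ~ N({xbar.mean}, {xbar.stdev:.2f})")
print(f"IQR for X-bar: from {q1:.2f} acres to {q3:.2f} acres")
```

Because the normal curve is symmetric, the two quartiles are equally far from the mean of 174 acres.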
Exercise 7.7.17. (Go to Solution)
The stock closing prices of 35 U.S. semiconductor manufacturers are given below. (Source: Wall Street Journal)
8.625; 30.25; 27.625; 46.75; 32.875; 18.25; 5; 0.125; 2.9375; 6.875; 28.25; 24.25; 21; 1.5; 30.25; 71; 43.5; 49.25; 2.5625; 31; 16.5; 9.5; 18.5; 18; 9; 10.5; 16.625; 1.25; 18; 12.875; 7; 12.875; 2.875; 60.25; 29.25
a. In words, X =
b. i. x̄ =  ii. s _{ x } =  iii. n =
c. Construct a histogram of the distribution. Start at x = − 0.0005. Make bar widths of 10.
d. In words, describe the distribution of stock prices.
e. Randomly average 5 stock prices together. (Use a random number generator.) Continue averaging 5 prices together until you have 10 averages. List those 10 averages.
f. Use the 10 averages from (e) to calculate:  i. x̄ =  ii. s _{ x̄ } =
g. Construct a histogram of the distribution of the averages. Start at x = − 0.0005. Make bar widths of 10.
h. Does this histogram look like the graph in (c)?
i. In 1 - 2 complete sentences, explain why the graphs either look the same or look different.
j. Based upon the theory of the Central Limit Theorem, X̄ ~
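The summary statistics and the random averaging in this exercise can be sketched in Python. The data list is copied from the exercise. The seed is an arbitrary assumption, and `random.sample` is only one possible way to pick 5 prices at random for each average.

```python
import random
from math import sqrt
from statistics import mean, stdev

prices = [
    8.625, 30.25, 27.625, 46.75, 32.875, 18.25, 5, 0.125, 2.9375, 6.875,
    28.25, 24.25, 21, 1.5, 30.25, 71, 43.5, 49.25, 2.5625, 31,
    16.5, 9.5, 18.5, 18, 9, 10.5, 16.625, 1.25, 18, 12.875,
    7, 12.875, 2.875, 60.25, 29.25,
]

n = len(prices)
x_bar = mean(prices)            # sample mean of the 35 prices
s = stdev(prices)               # sample SD (divides by n - 1)

random.seed(3)                  # arbitrary seed so the averages are repeatable
group_size = 5
averages = [mean(random.sample(prices, group_size)) for _ in range(10)]

# Spread the CLT predicts for an average of 5 prices
clt_sd = s / sqrt(group_size)

print(f"n = {n}, mean = {x_bar:.2f}, s = {s:.2f}")
print("ten averages of 5:", [round(a, 2) for a in averages])
print(f"mean of averages = {mean(averages):.2f}, SD of averages = {stdev(averages):.2f}")
print(f"CLT-predicted SD of an average of 5 = {clt_sd:.2f}")
```

With only 10 averages, the observed spread can differ noticeably from the predicted one; a different seed gives a different list.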
Exercise 7.7.18.
Use the Initial Public Offering data (see “Table of Contents”) to do this problem.
a. In words, X =
b. i. μ _{ X } =  ii. σ _{ X } =  iii. n =
c. Construct a histogram of the distribution. Start at x = − 0.50. Make bar widths of $5.
d. In words, describe the distribution of stock prices.
e. Randomly average 5 stock prices together. (Use a random number generator.) Continue averaging 5 prices together until you have 15 averages. List those 15 averages.
f. Use the 15 averages from (e) to calculate the following:  i. x̄ =  ii. s _{ x̄ } =
g. Construct a histogram of the distribution of the averages. Start at x = − 0.50. Make bar widths of $5.
h. Does this histogram look like the graph in (c)? Explain any differences.
i. In 1 - 2 complete sentences, explain why the graphs either look the same or look different.
j. Based upon the theory of the Central Limit Theorem, X̄ ~
The next two questions refer to the following information: The time to wait for a particular rural bus is distributed uniformly from 0 to 75 minutes. 100 riders are randomly sampled to learn how long they waited.
Exercise 7.7.19. (Go to Solution)
The 90th percentile sample average wait time (in minutes) for a sample of 100 riders is:
A. 315.0
B. 40.3
C. 38.5
D. 65.2
Exercise 7.7.20. (Go to Solution)
Would you be surprised, based upon numerical calculations, if the sample average wait time (in minutes) for 100 riders was less than 30 minutes?
A. Yes
B. No
C. There is not enough information.
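Both bus questions rest on the same distribution for the sample average. The sketch below assumes the usual formulas for a uniform distribution on [0, 75] and the Central Limit Theorem for averages with n = 100. Names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

a, b, n = 0, 75, 100            # wait-time range in minutes, number of riders

mu = (a + b) / 2                # mean of U(a, b)
sigma = (b - a) / sqrt(12)      # standard deviation of U(a, b)

xbar = NormalDist(mu, sigma / sqrt(n))      # CLT for averages

pct_90 = xbar.inv_cdf(0.90)     # 90th percentile of the sample average
p_under_30 = xbar.cdf(30)       # chance the sample average is below 30 minutes

print(f"X-bar ~ N({xbar.mean}, {xbar.stdev:.3f})")
print(f"90th percentile of X-bar = {pct_90:.1f} minutes")
print(f"P(X-bar < 30) = {p_under_30:.5f}")
```

A very small value for `p_under_30` means an average below 30 minutes would be surprising under these assumptions.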
Exercise 7.7.21. (Go to Solution)
Which of the following is NOT TRUE about the distribution for averages?
A. The mean, median and mode are equal
B. The area under the curve is one
C. The curve never touches the x-axis
D. The curve is skewed to the right
The next three questions refer to the following information: The cost of unleaded gasoline in the Bay Area once followed an unknown distribution with a mean of $2.59 and a standard deviation of $0.10. Sixteen gas stations from the Bay Area are randomly chosen. We are interested in the average cost of gasoline for the 16 gas stations.
Exercise 7.7.22. (Go to Solution)
The distribution to use for the average cost of gasoline for the 16 gas stations is
A. X̄ ~ N(2.59, 0.10)
B. X̄ ~
C. X̄ ~
D. X̄ ~
Exercise 7.7.23. (Go to Solution)
What is the probability that the average price for 16 gas stations is over $2.69?
A. Almost zero
B. 0.1587
C. 0.0943
D. Unknown
Exercise 7.7.24. (Go to Solution)
Find the probability that the average price for 30 gas stations is less than $2.55.
A. 0.6554
B. 0.3446
C. 0.0142
D. 0.9858
E. 0
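The two gasoline probabilities differ only in the sample size used for the standard deviation of the average. A minimal sketch, assuming the Central Limit Theorem for averages applies at n = 16 and n = 30 even though the population distribution is unknown:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 2.59, 0.10          # population mean and SD of the price in dollars

# Average price for 16 stations
xbar_16 = NormalDist(mu, sigma / sqrt(16))
p_over_269 = 1 - xbar_16.cdf(2.69)

# Average price for 30 stations
xbar_30 = NormalDist(mu, sigma / sqrt(30))
p_under_255 = xbar_30.cdf(2.55)

print(f"n = 16: SD of X-bar = {xbar_16.stdev:.4f}, P(X-bar > 2.69) = {p_over_269:.6f}")
print(f"n = 30: SD of X-bar = {xbar_30.stdev:.4f}, P(X-bar < 2.55) = {p_under_255:.4f}")
```

For n = 16, a price of $2.69 lies four standard deviations of the average above the mean, which is why the first probability is so small.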
Exercise 7.7.25. (Go to Solution)
For the Charter School Problem (Example 3) in “Central Limit Theorem: Using the Central Limit Theorem,” calculate the following using the normal approximation to the binomial.
A. Find the probability that fewer than 100 favor a charter school for grades K - 5.
B. Find the probability that 170 or more favor a charter school for grades K - 5.
C. Find the probability that no more than 140 favor a charter school for grades K - 5.
D. Find the probability that there are fewer than 130 that favor a charter school for grades K - 5.
E. Find the probability that exactly 150 favor a charter school for grades K - 5.
If you have access to either an appropriate calculator or computer software, try calculating these probabilities using the technology. Also try the suggestion at the bottom of “Central Limit Theorem: Using the Central Limit Theorem” for finding a website that calculates binomial probabilities.
Exercise 7.7.26. (Go to Solution)
Four friends, Janice, Barbara, Kathy and Roberta, decided to carpool to school. Each day the driver would be chosen by randomly selecting one of the four names. They carpool to school for 96 days. Use the normal approximation to the binomial to calculate the following probabilities.
A. Find the probability that Janice is the driver at most 20 days.
B. Find the probability that Roberta is the driver more than 16 days.
C. Find the probability that Barbara drives exactly 24 of those 96 days.
If you have access to either an appropriate calculator or computer software, try calculating these probabilities using the technology. Also try the suggestion at the bottom of “Central Limit Theorem: Using the Central Limit Theorem” for finding a website that calculates binomial probabilities.
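For the carpool exercise, the normal approximation can be compared against exact binomial probabilities using only the Python standard library. This sketch assumes each friend drives on any given day with probability 1/4, independently from day to day, and applies the usual continuity correction of 0.5. The helper `binom_pmf` is a hypothetical name written for this example.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 96, 0.25                         # days, chance a given friend drives

mu = n * p                              # binomial mean
sigma = sqrt(n * p * (1 - p))           # binomial standard deviation
approx = NormalDist(mu, sigma)          # normal approximation to B(96, 0.25)

def binom_pmf(k):
    """Exact binomial probability P(X = k) for B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Normal approximation with continuity correction
approx_a = approx.cdf(20.5)                         # P(X <= 20)
approx_b = 1 - approx.cdf(16.5)                     # P(X > 16)
approx_c = approx.cdf(24.5) - approx.cdf(23.5)      # P(X = 24)

# Exact binomial values for comparison
exact_a = sum(binom_pmf(k) for k in range(0, 21))
exact_b = sum(binom_pmf(k) for k in range(17, n + 1))
exact_c = binom_pmf(24)

for label, a_val, e_val in [("A", approx_a, exact_a),
                            ("B", approx_b, exact_b),
                            ("C", approx_c, exact_c)]:
    print(f"{label}: normal approx = {a_val:.4f}, exact binomial = {e_val:.4f}")
```

The same pattern works for the Charter School Problem once n and p are taken from that example.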
**Exercise 24 contributed by Roberta Bloom
Solution to Exercise 7.7.1. (Return to Exercise)
b.
c. 0.5000
d. 59.06
e. 0.8536
f. 0.1333
h. 1530.35
i. 0.8536
Solution to Exercise 7.7.7. (Return to Exercise)
a. The total length of time for 9 criminal trials
b. N(189, 21)
c. 0.0432
d. 162.09
Solution to Exercise 7.7.11. (Return to Exercise)
b. U(10, 25)
c. 17.5
d. √((25 − 10)² / 12) = 4.3301
f. N(17.5, 0.5839)
h. N(962.5, 32.11)
j. 0.0032
Solution to Exercise 7.7.17. (Return to Exercise)
b. $20.71; $17.31; 35
d. Exponential distribution, X ~
f. $20.71; $11.14
j.
The next three questions refer to the following information: Richard’s Furniture Company delivers furniture from 10 A.M. to 2 P.M. continuously and uniformly. We are interested in how long (in hours) past the 10 A.M. start time that individuals wait for their delivery.
Exercise 7.8.2. (Go to Solution)
The average wait time is:
A. 1 hour
B. 2 hours
C. 2.5 hours
D. 4 hours
Exercise 7.8.3. (Go to Solution)
Suppose that it is now past noon on a delivery day. The probability that a person must wait at least more hours is:
A. |
B. |
C. |
D. |
Exercise 7.8.4. (Go to Solution)
Given: .
a. Find P(X > 1) |
b. Calculate the minimum value for the upper quartile. |
c. Find |
Exercise 7.8.5. (Go to Solution)
40% of full-time students took 4 years to graduate
30% of full-time students took 5 years to graduate
20% of full-time students took 6 years to graduate
10% of full-time students took 7 years to graduate
The expected time for full-time students to graduate is:
A. 4 years
B. 4.5 years
C. 5 years
D. 5.5 years
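The expected value of a discrete distribution is the sum of each value times its probability. A minimal sketch for the graduation data, with a hypothetical dictionary name:

```python
# Years to graduate mapped to the proportion of full-time students
years_to_graduate = {4: 0.40, 5: 0.30, 6: 0.20, 7: 0.10}

# The probabilities of a valid distribution must sum to 1
total_probability = sum(years_to_graduate.values())

# Expected value: sum of x * P(x)
expected_years = sum(x * p for x, p in years_to_graduate.items())

print(f"total probability = {total_probability:.2f}")
print(f"expected time to graduate = {expected_years:.1f} years")
```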
Exercise 7.8.6. (Go to Solution)
Which of the following distributions is described by the following example?
Many people can run a short distance of under 2 miles, but as the distance increases, fewer people can run that far.
A. Binomial
B. Uniform
C. Exponential
D. Normal
Exercise 7.8.7. (Go to Solution)
The length of time to brush one’s teeth is generally thought to be exponentially distributed with a mean of minutes. Find the probability that a randomly selected person brushes his/her teeth less than minutes.
A. 0.5 |
B. |
C. 0.43 |
D. 0.63 |
Exercise 7.8.8. (Go to Solution)
Which distribution accurately describes the following situation?
The chance that a teenage boy regularly gives his mother a kiss goodnight (and he should!!) is about 20%. Fourteen teenage boys are randomly surveyed.
X = the number of teenage boys that regularly give their mother a kiss goodnight
A. B(14, 0.20)
B. P(2.8)
C. N(2.8, 2.24)
D.
Exercise 7.8.9. (Go to Solution)
Which distribution accurately describes the following situation?
A 2008 report on technology use states that approximately 20 percent of U.S. households have never sent an e-mail. (source: http://www.webguild.org/2008/05/20-percent-of-americans-have-never-used-email.php) Suppose that we select a random sample of fourteen U.S. households.
X = the number of households in the sample of 14 households that have never sent an e-mail
A. B(14, 0.20)
B. P(2.8)
C. N(2.8, 2.24)
D.
**Exercise 9 contributed by Roberta Bloom
Class Time:
Names:
The student will examine properties of the Central Limit Theorem.
This lab works best when sampling from several classes and combining data.
Count the change in your pocket. (Do not include bills.)
Randomly survey 30 classmates. Record the values of the change.
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
Construct a histogram. Make 5 - 6 intervals. Sketch the graph using a ruler and pencil. Scale the axes.
Figure 7.1.
Calculate the following ( n = 1; surveying one person at a time):
a. x̄ =
b. s =
Draw a smooth curve through the tops of the bars of the histogram. Use 1 – 2 complete sentences to describe the general shape of the curve.
Repeat steps 1 - 5 (of the section above titled “Collect the Data”) with one exception. Instead of recording the change of 30 classmates, record the average change of 30 pairs.
Randomly survey 30 pairs of classmates. Record the values of the average of their change.
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
Construct a histogram. Scale the axes using the same scaling you did for the section titled “Collect the Data”. Sketch the graph using a ruler and a pencil.
Figure 7.2.
Calculate the following ( n = 2; surveying two people at a time):
a. x̄ =
b. s =
Draw a smooth curve through tops of the bars of the histogram. Use 1 – 2 complete sentences to describe the general shape of the curve.
Repeat steps 1 – 5 (of the section titled “Collect the Data”) with one exception. Instead of recording the change of 30 classmates, record the average change of 30 groups of 5.
Randomly survey 30 groups of 5 classmates. Record the values of the average of their change.
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
Construct a histogram. Scale the axes using the same scaling you did for the section titled “Collect the Data”. Sketch the graph using a ruler and a pencil.
Figure 7.3.
Calculate the following ( n = 5; surveying five people at a time):
a. x̄ =
b. s =
Draw a smooth curve through tops of the bars of the histogram. Use 1 – 2 complete sentences to describe the general shape of the curve.
As n changed, why did the shape of the distribution of the data change? Use 1 – 2 complete sentences to explain what happened.
In the section titled “Collect the Data”, what was the approximate distribution of the data? X ~
In the section titled “Collecting Averages of Groups of Five”, what was the approximate distribution of the averages? X̄ ~
In 1 – 2 complete sentences, explain any differences in your answers to the previous two questions.
Class Time:
Names:
The student will examine properties of the Central Limit Theorem.
X = length of time (in days) that a cookie recipe lasted at the Olmstead Homestead. (Assume that each of the different recipes makes the same quantity of cookies.)
Recipe # | X | Recipe # | X | Recipe # | X | Recipe # | X
---|---|---|---|---|---|---|---
1 | 1 | 16 | 2 | 31 | 3 | 46 | 2
2 | 5 | 17 | 2 | 32 | 4 | 47 | 2
3 | 2 | 18 | 4 | 33 | 5 | 48 | 11
4 | 5 | 19 | 6 | 34 | 6 | 49 | 5
5 | 6 | 20 | 1 | 35 | 6 | 50 | 5
6 | 1 | 21 | 6 | 36 | 1 | 51 | 4
7 | 2 | 22 | 5 | 37 | 1 | 52 | 6
8 | 6 | 23 | 2 | 38 | 2 | 53 | 5
9 | 5 | 24 | 5 | 39 | 1 | 54 | 1
10 | 2 | 25 | 1 | 40 | 6 | 55 | 1
11 | 5 | 26 | 6 | 41 | 1 | 56 | 2
12 | 1 | 27 | 4 | 42 | 6 | 57 | 4
13 | 1 | 28 | 1 | 43 | 2 | 58 | 3
14 | 3 | 29 | 6 | 44 | 6 | 59 | 6
15 | 2 | 30 | 2 | 45 | 2 | 60 | 5
Calculate the following:
a. μ _{ x } =
b. σ _{ x } =
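Since the 60 recipes are treated as the whole population, the mean and standard deviation asked for here are population values. A short sketch, assuming the table above has been typed in recipe order (the list name `days` is illustrative):

```python
from statistics import mean, pstdev

# X = days each cookie recipe lasted, in recipe order 1 through 60
days = [
    1, 5, 2, 5, 6, 1, 2, 6, 5, 2, 5, 1, 1, 3, 2,    # recipes 1-15
    2, 2, 4, 6, 1, 6, 5, 2, 5, 1, 6, 4, 1, 6, 2,    # recipes 16-30
    3, 4, 5, 6, 6, 1, 1, 2, 1, 6, 1, 6, 2, 6, 2,    # recipes 31-45
    2, 2, 11, 5, 5, 4, 6, 5, 1, 1, 2, 4, 3, 6, 5,   # recipes 46-60
]

mu_x = mean(days)           # population mean
sigma_x = pstdev(days)      # population SD (divides by N, not N - 1)

print(f"N = {len(days)}")
print(f"mu_x = {mu_x:.2f} days")
print(f"sigma_x = {sigma_x:.2f} days")
```

Using `pstdev` rather than `stdev` matters here: the lab asks for σ, the standard deviation of the full population, not an estimate from a sample.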
Use a random number generator to randomly select 4 samples of size n = 5 from the given population. Record your samples below. Then, for each sample, calculate the mean to the nearest tenth. Record them in the spaces provided. Record the sample means for the rest of the class.
Complete the table:
Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample means from other groups: | |
---|---|---|---|---|---|
Means: |
Calculate the following:
a. x̄ =
b. s =
Again, use a random number generator to randomly select 4 samples from the population. This time, make the samples of size n = 10. Record the samples below. As before, for each sample, calculate the mean to the nearest tenth. Record them in the spaces provided. Record the sample means for the rest of the class.
Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample means from other groups: | |
---|---|---|---|---|---|
Means: |
Calculate the following:
a. x̄ =
b. s =
For the original population, construct a histogram. Make intervals with bar width = 1 day. Sketch the graph using a ruler and pencil. Scale the axes.
Figure 7.4.
Draw a smooth curve through the tops of the bars of the histogram. Use 1 – 2 complete sentences to describe the general shape of the curve.
For the sample of n = 5 days averaged together, construct a histogram of the averages (your means together with the means of the other groups). Make intervals with . Sketch the graph using a ruler and pencil. Scale the axes.
Figure 7.5.
Draw a smooth curve through the tops of the bars of the histogram. Use 1 – 2 complete sentences to describe the general shape of the curve.
For the sample of n = 10 days averaged together, construct a histogram of the averages (your means together with the means of the other groups). Make intervals with . Sketch the graph using a ruler and pencil. Scale the axes.
Figure 7.6.
Draw a smooth curve through the tops of the bars of the histogram. Use 1 – 2 complete sentences to describe the general shape of the curve.
Compare the three histograms you have made, the one for the population and the two for the sample means. In three to five sentences, describe the similarities and differences.
State the theoretical (according to the CLT) distributions for the sample means.
a. n = 5: X̄ ~
b. n = 10: X̄ ~
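The theoretical distributions can be compared with a simulation of the sampling done in this lab. The sketch below assumes sampling with replacement from the 60 recipes and uses many more samples than a class would collect. The seed and the number of trials are arbitrary choices, and `simulate_means` is a hypothetical helper.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

random.seed(7)      # arbitrary seed so the run is repeatable

# X = days each cookie recipe lasted, in recipe order 1 through 60
days = [
    1, 5, 2, 5, 6, 1, 2, 6, 5, 2, 5, 1, 1, 3, 2,
    2, 2, 4, 6, 1, 6, 5, 2, 5, 1, 6, 4, 1, 6, 2,
    3, 4, 5, 6, 6, 1, 1, 2, 1, 6, 1, 6, 2, 6, 2,
    2, 2, 11, 5, 5, 4, 6, 5, 1, 1, 2, 4, 3, 6, 5,
]

mu = mean(days)
sigma = pstdev(days)

def simulate_means(n, trials=5000):
    """Means of `trials` random samples of size n, drawn with replacement."""
    return [mean(random.choices(days, k=n)) for _ in range(trials)]

results = {}
for n in (5, 10):
    means = simulate_means(n)
    # (simulated center, simulated spread, spread predicted by the CLT)
    results[n] = (mean(means), stdev(means), sigma / sqrt(n))
    print(f"n = {n:2d}: mean of sample means = {results[n][0]:.3f}, "
          f"SD of sample means = {results[n][1]:.3f}, "
          f"sigma / sqrt(n) = {results[n][2]:.3f}")
```

The simulated centers stay near μ for both sample sizes, while the spread of the sample means shrinks as n grows from 5 to 10.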
Are the sample means for n = 5 and n = 10 “close” to the theoretical mean, μ _{ x } ? Explain why or why not.
Which of the two distributions of sample means has the smaller standard deviation? Why?
As n changed, why did the shape of the distribution of the data change? Use 1 – 2 complete sentences to explain what happened.