By the end of this chapter, the student should be able to:
Interpret the chi-square probability distribution as the sample size changes.
Conduct and interpret chi-square goodness-of-fit hypothesis tests.
Conduct and interpret chi-square test of independence hypothesis tests.
Conduct and interpret chi-square single variance hypothesis tests (optional).
Have you ever wondered if lottery numbers were evenly distributed or if some numbers occurred with a greater frequency? How about if the types of movies people preferred were different across different age groups? What about if a coffee machine was dispensing approximately the same amount of coffee each time? You could answer these questions by conducting a hypothesis test.
You will now study a new distribution, one that is used to determine the answers to the above examples. This distribution is called the Chi-square distribution.
In this chapter, you will learn the three major applications of the Chi-square distribution:
The goodness-of-fit test, which determines if data fit a particular distribution, such as with the lottery example
The test of independence, which determines if events are independent, such as with the movie example
The test of a single variance, which tests variability, such as with the coffee example
Although most chi-square calculations are done with calculators or computers, a table is also available (see the Table of Contents 15. Tables). TI-83+ and TI-84 calculator instructions are included in the text.
Look in the sports section of a newspaper or on the Internet for some sports data (baseball averages, basketball scores, golf tournament scores, football odds, swimming times, etc.). Plot a histogram and a boxplot using your data. See if you can determine a probability distribution that your data fits. Have a discussion with the class about your choice.
The notation for the chi-square distribution is:
$\chi^2 \sim \chi^2_{df}$
where df = degrees of freedom, which depend on how chi-square is being used. (If you want to practice calculating chi-square probabilities, then use df = n – 1. The degrees of freedom for the three major uses are each calculated differently.)
For the $\chi^2$ distribution, the population mean is $\mu = df$ and the population standard deviation is $\sigma = \sqrt{2(df)}$.
The random variable is shown as $\chi^2$ but may be any upper case letter.
The random variable for a chi-square distribution with k degrees of freedom is the sum of k independent, squared standard normal variables.
The curve is nonsymmetrical and skewed to the right.
There is a different chi-square curve for each df.
Figure 11.1.
The test statistic for any test is always greater than or equal to zero.
When df > 90, the chi-square curve approximates the normal. For $X \sim \chi^2_{1000}$, the mean is $\mu = df = 1000$ and the standard deviation is $\sigma = \sqrt{2(1000)} \approx 44.7$. Therefore, $X \sim N(1000, 44.7)$, approximately.
The mean, μ , is located just to the right of the peak.
Figure 11.2.
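The description of the chi-square random variable as a sum of squared standard normals can be checked numerically. The sketch below is a Python simulation using only the standard library; the function name `simulate_chi_square`, the seed, and the number of draws are illustrative choices, not part of the text. The sample mean should land near $\mu = df$ and the sample standard deviation near $\sqrt{2(df)}$, and no draw can be negative.

```python
import math
import random
import statistics


def simulate_chi_square(df, n_draws, seed=0):
    """Return n_draws simulated chi-square values with df degrees of freedom.

    Each value is Z_1**2 + ... + Z_df**2, where the Z_i are independent
    standard normal draws.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_draws)
    ]


df = 4
draws = simulate_chi_square(df, n_draws=20_000)

print(statistics.mean(draws))   # should be close to mu = df = 4
print(statistics.stdev(draws))  # should be close to sqrt(2 * df), about 2.83
print(min(draws) >= 0)          # True: a sum of squares is never negative

# Standard deviation quoted above for df = 1000
print(round(math.sqrt(2 * 1000), 1))  # 44.7
```

Rerunning with a larger df moves the simulated values to the right and makes their histogram look more symmetric, in line with the properties listed above.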
In this type of hypothesis test, you determine whether the data “fit” a particular distribution or not. For example, you may suspect your unknown data fit a binomial distribution. You use a chi-square test (meaning the distribution for the hypothesis test is chi-square) to determine if there is a fit or not. The null and the alternate hypotheses for this test may be written in sentences or may be stated as equations or inequalities.
The test statistic for a goodness-of-fit test is:
$$\sum_{n} \frac{(O-E)^2}{E}$$
where:
O = observed values (data)
E = expected values (from theory)
n = the number of different data cells or categories
The observed values are the data values and the expected values are the values you would expect to get if the null hypothesis were true. There are n terms of the form $\frac{(O-E)^2}{E}$.
The degrees of freedom are df = (number of categories - 1).
The goodness-of-fit test is almost always right-tailed. If the observed values and the corresponding expected values are not close to each other, then the test statistic can get very large and will fall far out in the right tail of the chi-square curve.
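A minimal Python sketch of the goodness-of-fit statistic and its degrees of freedom follows. The function names are hypothetical, and the two sets of observed counts are made-up values chosen only to show how the statistic grows when the observed counts drift away from the expected counts.

```python
def goodness_of_fit_statistic(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)**2 / E over the cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def goodness_of_fit_df(observed):
    """Degrees of freedom: number of cells (categories) minus 1."""
    return len(observed) - 1


expected = [25, 25, 25, 25]  # hypothetical expected counts under H0
close = [24, 26, 25, 25]     # observed counts near the expected counts
far = [10, 40, 25, 25]       # observed counts far from the expected counts

print(goodness_of_fit_df(close))                             # 3
print(round(goodness_of_fit_statistic(close, expected), 2))  # 0.08
print(round(goodness_of_fit_statistic(far, expected), 2))    # 18.0
```

Every term is a square divided by a positive expected count, so the statistic can only grow as the mismatch grows; that is why a poor fit shows up in the right tail.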
Example 11.1.
Absenteeism of college students from math classes is a major concern to math instructors because missing class appears to increase the drop rate. Three statistics instructors wondered whether the absentee rate was the same for every day of the school week. They took a sample of absent students from three of their statistics classes during one week of the term. The results of the survey appear in the table.
Day | Number of Students Absent |
---|---|
Monday | 28 |
Tuesday | 22 |
Wednesday | 18 |
Thursday | 20 |
Friday | 32 |
Determine the null and alternate hypotheses needed to run a goodness-of-fit test.
Since the instructors wonder whether the absentee rate is the same for every school day, we could say in the null hypothesis that the data “fit” a uniform distribution.
$H_0$: The rate at which college students are absent from their statistics class fits a uniform distribution.
The alternate hypothesis is the opposite of the null hypothesis.
$H_a$: The rate at which college students are absent from their statistics class does not fit a uniform distribution.
Problem 1.
How many students do you expect to be absent on any given school day?
Solution
The total number of students in the sample is 120. If the null hypothesis were true, you would divide 120 by 5 to get 24 absences expected per day. The expected number is based on a true null hypothesis.
Problem 2.
What are the degrees of freedom (df)?
Solution
There are 5 days of the week or 5 “cells” or categories.
df = no. cells - 1 = 5 - 1 = 4
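Both answers can be reproduced in a few lines of Python. This is a sketch using the counts from the table above; the variable names are illustrative. It computes the expected count per day under a uniform null hypothesis and the degrees of freedom.

```python
absences = {"Monday": 28, "Tuesday": 22, "Wednesday": 18, "Thursday": 20, "Friday": 32}

total = sum(absences.values())            # 120 students in the sample
expected_per_day = total / len(absences)  # uniform H0: 120 / 5 = 24.0
df = len(absences) - 1                    # 5 cells - 1 = 4

print(total, expected_per_day, df)        # 120 24.0 4
```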
Example 11.2.
Employers particularly want to know which days of the week employees are absent in a five-day work week. Most employers would like to believe that employees are absent equally during the week. That is, the average number of times an employee is absent is the same on Monday, Tuesday, Wednesday, Thursday, or Friday. Suppose a sample of 20 absent days was taken and the days absent were distributed as follows:
Day | Number of Absences |
---|---|
Monday | 5 |
Tuesday | 4 |
Wednesday | 2 |
Thursday | 3 |
Friday | 6 |
Problem
For the population of employees, do the absent days occur with equal frequencies during a five-day work week? Test at a 5% significance level.
Solution
The null and alternate hypotheses are:
$H_0$: The absent days occur with equal frequencies, that is, they fit a uniform distribution.
$H_a$: The absent days occur with unequal frequencies, that is, they do not fit a uniform distribution.
If the absent days occur with equal frequencies, then, out of 20 absent days, there would be 4 absences on Monday, 4 on Tuesday, 4 on Wednesday, 4 on Thursday, and 4 on Friday. These numbers are the expected ( E ) values. The values in the table are the observed ( O ) values or data.
This time, calculate the $\chi^2$ test statistic by hand. Make a chart with the following headings:
Expected (E) values
Observed (O) values
$(O - E)$
$(O - E)^2$
$\frac{(O - E)^2}{E}$
Now add (sum) the last column. Verify that the sum is 2.5. This is the $\chi^2$ test statistic.
To find the p-value, calculate $P(\chi^2 > 2.5)$. This test is right-tailed.
The dfs are the number of cells – 1 = 4 .
Next, complete a graph like the one below with the proper labeling and shading. (You should shade the right tail. It will be a “large” right tail for this example because the p-value is “large.”)
Use a computer or calculator to find the p-value. You should get p-value = 0.6446.
The decision is to not reject the null hypothesis.
Conclusion: At a 5% level of significance, from the sample data, there is not sufficient evidence to conclude that the absent days do not occur with equal frequencies.
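The hand chart and the p-value can also be checked in Python without third-party packages. This sketch relies on a closed form that holds only when df is even: for df = 2k, $P(\chi^2 > x) = e^{-x/2}\sum_{i=0}^{k-1} \frac{(x/2)^i}{i!}$. The helper name `chi2_right_tail_even_df` is hypothetical; for odd df you would need a general routine such as SciPy's `scipy.stats.chi2.sf` instead.

```python
import math


def chi2_right_tail_even_df(x, df):
    """Right-tail area P(chi-square > x) for an even, positive df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!,
    which is valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires an even, positive df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)**0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! from the previous term
        total += term
    return math.exp(-half) * total


observed = [5, 4, 2, 3, 6]  # Monday through Friday
expected = [4, 4, 4, 4, 4]  # 20 absences spread evenly under H0

print("E  O  O-E  (O-E)^2  (O-E)^2/E")
for o, e in zip(observed, expected):
    print(e, o, o - e, (o - e) ** 2, (o - e) ** 2 / e)

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_right_tail_even_df(statistic, df)
print(statistic, df, round(p_value, 4))  # 2.5 4 0.6446
```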
TI-83+ and TI-84: Press 2nd DISTR. Arrow down to χ2cdf. Press ENTER. Enter (2.5,1E99,4). Rounded to 4 places, you should see 0.6446, which is the p-value.
TI-83+ and some TI-84 calculators do not have a special program for the test statistic for the goodness-of-fit test. The next example (Example 11.3) has the calculator instructions.
The newer TI-84 calculators have in STAT TESTS the test Chi2 GOF. To run the test, put the observed values (the data) into a first list and the expected values (the values you expect if the null hypothesis is true) into a second list. Press STAT TESTS and Chi2 GOF. Enter the list names for the Observed list and the Expected list. Enter whatever else is asked and press calculate or draw. Make sure you clear any lists before you start. See below.
To Clear Lists in the calculators: Go into STAT EDIT and arrow up to the list name area of the particular list. Press CLEAR and then arrow down. The list will be cleared. Or, you can press STAT and press 4 (for ClrList). Enter the list name and press ENTER.
Example 11.3.
One study indicates that the number of televisions that American families have is distributed (this is the given distribution for the American population) as follows:
Number of Televisions | Percent |
---|---|
0 | 10 |
1 | 16 |
2 | 55 |
3 | 11 |
over 3 | 8 |
The table contains expected ( E ) percents.
A random sample of 600 families in the far western United States resulted in the following data:
Number of Televisions | Frequency |
---|---|
0 | 66 |
1 | 119 |
2 | 340 |
3 | 60 |
over 3 | 15 |
Total | 600 |
The table contains observed ( O ) frequency values.
Problem
At the 1% significance level, does it appear that the distribution “number of televisions” of far western United States families is different from the distribution for the American population as a whole?
Solution
This problem asks you to test whether the far western United States families distribution fits the distribution of the American families. This test is always right-tailed.
The first table contains expected percentages. To get expected ( E ) frequencies, multiply the percentage by 600. The expected frequencies are:
Number of Televisions | Percent | Expected Frequency |
---|---|---|
0 | 10 | ( 0.10 ) ⋅ ( 600 ) = 60 |
1 | 16 | ( 0.16 ) ⋅ ( 600 ) = 96 |
2 | 55 | ( 0.55 ) ⋅ ( 600 ) = 330 |
3 | 11 | ( 0.11 ) ⋅ ( 600 ) = 66 |
over 3 | 8 | ( 0.08 ) ⋅ ( 600 ) = 48 |
Therefore, the expected frequencies are 60, 96, 330, 66, and 48. In the TI calculators, you can let the calculator do the math. For example, instead of 60, enter .10*600.
$H_0$: The “number of televisions” distribution of far western United States families is the same as the “number of televisions” distribution of the American population.
$H_a$: The “number of televisions” distribution of far western United States families is different from the “number of televisions” distribution of the American population.
Distribution for the test: $\chi^2_4$ where df = (the number of cells) – 1 = 5 – 1 = 4.
Calculate the test statistic: $\chi^2 = 29.65$
Graph:
Probability statement: p-value $= P(\chi^2 > 29.65) = 0.000006$.
Compare α and the p-value:
α = 0.01
p-value = 0.000006
So, α > p-value .
Make a decision: Since α > p-value, reject $H_0$.
This means you reject the belief that the distribution for the far western states is the same as that of the American population as a whole.
Conclusion: At the 1% significance level, from the data, there is sufficient evidence to conclude that the “number of televisions” distribution for the far western United States is different from the “number of televisions” distribution for the American population as a whole.
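The following Python sketch repeats the Example 11.3 calculation: it turns the national percentages into expected counts for 600 families, forms the test statistic from the observed far-western counts, and finds the right-tail p-value. The helper uses the right-tail closed form $P(\chi^2 > x) = e^{-x/2}\sum_{i=0}^{df/2-1} \frac{(x/2)^i}{i!}$, which holds only for even df (df = 4 here); its name is hypothetical.

```python
import math


def chi2_right_tail_even_df(x, df):
    """P(chi-square > x) for even df, via exp(-x/2) * sum_{i<df/2} (x/2)**i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires an even, positive df")
    half = max(x, 0.0) / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


percents = [10, 16, 55, 11, 8]     # national distribution: 0, 1, 2, 3, over 3 TVs
observed = [66, 119, 340, 60, 15]  # far western sample
n = sum(observed)                  # 600 families

expected = [p * n / 100 for p in percents]  # 60, 96, 330, 66, 48
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_right_tail_even_df(statistic, df)

print([round(e, 2) for e in expected])  # [60.0, 96.0, 330.0, 66.0, 48.0]
print(round(statistic, 2), df)          # 29.65 4
print(f"{p_value:.2e}")                 # roughly 5.8e-06; the text rounds this to 0.000006
```

Because the sketch keeps the unrounded statistic (29.646...), its p-value can differ from the calculator's 5.77E-6 in the third significant digit; the decision is the same.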
TI-83+ and some TI-84 calculators: Press STAT and ENTER. Make sure to clear lists L1, L2, and L3 if they have data in them (see the note at the end of Example 11.2). Into L1, put the observed frequencies 66, 119, 340, 60, 15. Into L2, put the expected frequencies .10*600, .16*600, .55*600, .11*600, .08*600.
Arrow over to list L3 and up to the name area "L3". Enter (L1-L2)^2/L2 and ENTER. Press 2nd QUIT. Press 2nd LIST and arrow over to MATH. Press 5.
You should see "sum". Enter L3. Rounded to 2 decimal places, you should see 29.65. Press 2nd DISTR. Press 7 or arrow down to 7:χ2cdf and press ENTER. Enter (29.65,1E99,4). Rounded to 4 places, you should see 5.77E-6 = .000006 (rounded to 6 decimal places), which is the p-value.
Example 11.4.
Problem
Suppose you flip two coins 100 times. The results are 20 HH, 27 HT, 30 TH, and 23 TT. Are the coins fair? Test at a 5% significance level.
Solution
This problem can be set up as a goodness-of-fit problem. The sample space for flipping two fair coins is {HH, HT, TH, TT}. Out of 100 flips, you would expect 25 HH, 25 HT, 25 TH, and 25 TT. This is the expected distribution. The question, “Are the coins fair?” is the same as saying, “Does the distribution of the coins (20 HH, 27 HT, 30 TH, 23 TT) fit the expected distribution?”
Random Variable: Let X = the number of heads in one flip of the two coins. X takes on the values 0, 1, 2. (There are 0, 1, or 2 heads in the flip of 2 coins.) Therefore, the number of cells is 3. Since X = the number of heads, the observed frequencies are 20 (for 2 heads), 57 (for 1 head: 27 HT + 30 TH), and 23 (for 0 heads or both tails). The expected frequencies are 25 (for 2 heads), 50 (for 1 head), and 25 (for 0 heads or both tails). This test is right-tailed.
$H_0$: The coins are fair.
$H_a$: The coins are not fair.
Distribution for the test: $\chi^2_2$ where df = 3 – 1 = 2.
Calculate the test statistic: $\chi^2 = 2.14$
Graph:
Probability statement: p-value $= P(\chi^2 > 2.14) = 0.3430$
Compare α and the p-value:
α = 0.05
p-value = 0.3430
So, α < p-value .
Make a decision: Since α < p-value, do not reject $H_0$.
Conclusion: At a 5% level of significance, from the data, there is not sufficient evidence to conclude that the coins are not fair.
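Example 11.4 hinges on regrouping the four outcomes into the three values of X before testing. The Python sketch below (variable names are illustrative) does that regrouping, builds the fair-coin expected counts, and computes the statistic. For df = 2 the right-tail area has the exact closed form $P(\chi^2 > x) = e^{-x/2}$, so no statistics library is needed.

```python
import math

# Outcomes of 100 flips of two coins
flips = {"HH": 20, "HT": 27, "TH": 30, "TT": 23}

# Group outcomes by X = number of heads (2, 1, 0)
observed = {
    2: flips["HH"],
    1: flips["HT"] + flips["TH"],
    0: flips["TT"],
}
n = sum(flips.values())

# Fair coins: P(X = 2) = 1/4, P(X = 1) = 1/2, P(X = 0) = 1/4
expected = {2: n / 4, 1: n / 2, 0: n / 4}

statistic = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1               # 3 cells - 1 = 2
p_value = math.exp(-statistic / 2)   # right tail for df = 2 is exactly exp(-x/2)

print(observed)                      # {2: 20, 1: 57, 0: 23}
print(round(statistic, 2), df)       # 2.14 2
print(round(p_value, 4))             # 0.343 (that is, 0.3430)
```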
TI-83+ and some TI-84 calculators: Press STAT and ENTER. Make sure you clear lists L1, L2, and L3 if they have data in them. Into L1, put the observed frequencies 20, 57, 23. Into L2, put the expected frequencies 25, 50, 25. Arrow over to list L3 and up to the name area "L3". Enter (L1-L2)^2/L2 and ENTER. Press 2nd QUIT. Press 2nd LIST and arrow over to MATH. Press 5. You should see "sum". Enter L3. Rounded to 2 decimal places, you should see 2.14. Press 2nd DISTR. Arrow down to 7:χ2cdf (or press 7). Press ENTER. Enter (2.14,1E99,2). Rounded to 4 places, you should see .3430, which is the p-value.
For the newer TI-84 calculators, check STAT TESTS to see if you have Chi2 GOF. If you do, see the calculator instructions (a NOTE) before Example 11.3.
Tests of independence involve using a contingency table of observed (data) values. You first saw a contingency table when you studied probability in the Probability Topics chapter.
The test statistic for a test of independence is similar to that of a goodness-of-fit test:
$$\sum_{(i \cdot j)} \frac{(O-E)^2}{E}$$
where:
O = observed values
E = expected values
i = the number of rows in the table
j = the number of columns in the table
There are $i \cdot j$ terms of the form $\frac{(O-E)^2}{E}$.
A test of independence determines whether two factors are independent or not. You first encountered the term independence in Chapter 3. As a review, consider the following example.
Example 11.5.
Suppose A = a speeding violation in the last year and B = a car phone user. If A and B are independent, then $P(A \text{ AND } B) = P(A)P(B)$. A AND B is the event that a driver received a speeding violation last year and is also a car phone user. Suppose, in a study of drivers who received speeding violations in the last year and who use car phones, that 755 people were surveyed. Out of the 755, 70 had a speeding violation and 685 did not; 305 were car phone users and 450 were not.
Let y = expected number of car phone users who received speeding violations.
If A and B are independent, then $P(A \text{ AND } B) = P(A)P(B)$. By substitution,
$$\frac{y}{755} = \left(\frac{70}{755}\right)\left(\frac{305}{755}\right)$$
Solve for y:
$$y = \frac{(70)(305)}{755} \approx 28.3$$
About 28 people from the sample are expected to be car phone users and to receive speeding violations.
In a test of independence, we state the null and alternate hypotheses in words. Since the contingency table consists of two factors, the null hypothesis states that the factors are independent and the alternate hypothesis states that they are not independent (dependent). If we do a test of independence using the example above, then the null hypothesis is:
$H_0$: Being a car phone user and receiving a speeding violation are independent events.
If the null hypothesis were true, we would expect about 28 people to be car phone users and to receive a speeding violation.
The test of independence is always right-tailed because of the calculation of the test statistic. If the expected and observed values are not close together, then the test statistic is very large and falls far out in the right tail of the chi-square curve, as in a goodness-of-fit test.
The degrees of freedom for the test of independence are:
df = (number of columns - 1)(number of rows - 1)
The following formula calculates the expected number (E):
$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}}$$
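The expected-count formula can be applied to every cell at once. The Python sketch below is an illustration, and the function name `expected_counts` is hypothetical. It needs only the row and column totals, so it can be run on the margins from Example 11.5 (70 / 685 drivers with / without a violation, 305 / 450 car phone users / nonusers); the top-left cell reproduces the expected count of about 28 found there, and the sketch also fills in the other three cells and the degrees of freedom.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: (row total)(column total) / grand total."""
    grand_total = sum(row_totals)
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must add to the same grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


rows = [70, 685]   # speeding violation: yes, no
cols = [305, 450]  # car phone user: yes, no

table = expected_counts(rows, cols)
df = (len(cols) - 1) * (len(rows) - 1)

for row in table:
    print([round(e, 2) for e in row])
# [28.28, 41.72]
# [276.72, 408.28]
print("df =", df)  # df = 1
```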
Example 11.6.
In a volunteer group, adults 21 and older volunteer from one to nine hours each week to spend time with a disabled senior citizen. The program recruits among community college students, four-year college students, and nonstudents. The following table is a sample of the adult volunteers and the number of hours they volunteer per week.
Type of Volunteer | 1-3 Hours | 4-6 Hours | 7-9 Hours | Row Total |
---|---|---|---|---|
Community College Students | 111 | 96 | 48 | 255 |
Four-Year College Students | 96 | 133 | 61 | 290 |
Nonstudents | 91 | 150 | 53 | 294 |
Column Total | 298 | 379 | 162 | 839 |
Problem
Is the number of hours volunteered independent of the type of volunteer?
Solution
The observed table and the question at the end of the problem, “Is the number of hours volunteered independent of the type of volunteer?”, tell you this is a test of independence. The two factors are number of hours volunteered and type of volunteer. This test is always right-tailed.
$H_0$: The number of hours volunteered is independent of the type of volunteer.
$H_a$: The number of hours volunteered is dependent on the type of volunteer.
The expected table is:
Type of Volunteer | 1-3 Hours | 4-6 Hours | 7-9 Hours |
---|---|---|---|
Community College Students | 90.57 | 115.19 | 49.24 |
Four-Year College Students | 103.00 | 131.00 | 56.00 |
Nonstudents | 104.42 | 132.81 | 56.77 |
For example, the calculation for the expected frequency for the top left cell is
$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}} = \frac{(255)(298)}{839} = 90.57$$
Calculate the test statistic: $\chi^2 = 12.99$ (calculator or computer)
Distribution for the test: $\chi^2_4$
df = (3 columns – 1)(3 rows – 1) = (2)(2) = 4
Graph:
Probability statement: p-value $= P(\chi^2 > 12.99) = 0.0113$
Compare α and the p-value: Since no α is given, assume α = 0.05. p-value = 0.0113. α > p-value.
Make a decision: Since α > p-value, reject $H_0$. This means that the factors are not independent.
Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude that the number of hours volunteered and the type of volunteer are dependent on one another.
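For Example 11.6, the whole test can be sketched in Python with only the standard library: row and column totals, the expected table, the statistic, df, and the right-tail p-value. Because df = 4, the right-tail area has the closed form $P(\chi^2 > x) = e^{-x/2}\left(1 + \frac{x}{2}\right)$; that shortcut is specific to df = 4 and is an implementation choice here, not the method the text uses (the text uses a calculator). With SciPy installed, `scipy.stats.chi2_contingency` should give the same statistic for this table, since its continuity correction applies only when df = 1.

```python
import math

observed = [
    [111, 96, 48],  # community college students: 1-3, 4-6, 7-9 hours
    [96, 133, 61],  # four-year college students
    [91, 150, 53],  # nonstudents
]

row_totals = [sum(row) for row in observed]        # 255, 290, 294
col_totals = [sum(col) for col in zip(*observed)]  # 298, 379, 162
grand_total = sum(row_totals)                      # 839

# Expected count for each cell: (row total)(column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(col_totals) - 1) * (len(row_totals) - 1)  # (3 - 1)(3 - 1) = 4

# Right tail for df = 4: exp(-x/2) * (1 + x/2)
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

print([[round(e, 2) for e in row] for row in expected])  # matches the expected table
print(round(statistic, 2), df, round(p_value, 4))        # 12.99 4 0.0113
```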
For the above example, if there had been another type of volunteer, teenagers, what would the degrees of freedom be?
Calculator instructions follow.
TI-83+ and TI-84 calculator: Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press 3 ENTER 3 ENTER. Enter the table values by row from Example 11.6. Press ENTER after each. Press 2nd QUIT. Press STAT and arrow over to TESTS. Arrow down to C:χ2-TEST. Press ENTER. You should see Observed:[A] and Expected:[B]. Arrow down to Calculate. Press ENTER. The test statistic is 12.9909 and the p-value = 0.0113. Do the procedure a second time but arrow down to Draw instead of calculate.
Example 11.7.
De Anza College is interested in the relationship between anxiety level and the need to succeed in school. A random sample of 400 students took a test that measured anxiety level and need to succeed in school. The table shows the results. De Anza College wants to know if anxiety level and need to succeed in school are independent events.
Need to Succeed in School | High Anxiety | Med-high Anxiety | Medium Anxiety | Med-low Anxiety | Low Anxiety | Row Total |
---|---|---|---|---|---|---|
High Need | 35 | 42 | 53 | 15 | 10 | 155 |
Medium Need | 18 | 48 | 63 | 33 | 31 | 193 |
Low Need | 4 | 5 | 11 | 15 | 17 | 52 |
Column Total | 57 | 95 | 127 | 63 | 58 | 400 |
Problem 1.
How many high anxiety level students are expected to have a high need to succeed in school?
Solution
The column total for a high anxiety level is 57. The row total for high need to succeed in school is 155. The sample size or total surveyed is 400.
The expected number of students who have a high anxiety level and a high need to succeed in school is $\frac{(155)(57)}{400} = 22.09$, or about 22.
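The Problem 1 calculation generalizes to any cell of the table. A short Python sketch (names are illustrative) that stores the Example 11.7 counts, derives the row and column totals, and computes a cell's expected count under independence:

```python
observed = {
    "High Need":   [35, 42, 53, 15, 10],
    "Medium Need": [18, 48, 63, 33, 31],
    "Low Need":    [4, 5, 11, 15, 17],
}
anxiety_levels = ["High", "Med-high", "Medium", "Med-low", "Low"]

row_totals = {need: sum(counts) for need, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]  # 57, 95, 127, 63, 58
grand_total = sum(row_totals.values())                      # 400


def expected(need, anxiety):
    """Expected count for one cell: (row total)(column total) / total surveyed."""
    col = anxiety_levels.index(anxiety)
    return row_totals[need] * col_totals[col] / grand_total


print(expected("High Need", "High"))  # 22.0875, about 22
```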
Problem 2.
If the two variables are independent, how many students do you expect to have a low need to succeed in school and a med-low level of anxiety?
Solution
The column total for a med-low anxiety level is 63. The row total for a low need to succeed in school is 52. The sample size or total surveyed is 400.
Problem 3.
a. Expected number = __________
b. The expected number of students who have a med-low anxiety level and a low need to succeed in school is about: __________
Contingency table: the method of displaying a frequency distribution as a table with rows and columns to show how two variables may be dependent (contingent) upon each other. The table provides an easy way to calculate conditional probabilities.
A test of a single variance assumes that the underlying distribution is normal. The null and alternate hypotheses are stated in terms of the population variance (or population standard deviation). The test statistic is:
$$\chi^2 = \frac{(n-1) \cdot s^2}{\sigma^2}$$
where:
n = the total number of data
s ^{2} = sample variance
σ ^{2} = population variance
You may think of s as the random variable in this test. The degrees of freedom are df = n – 1.
A test of a single variance may be right-tailed, left-tailed, or two-tailed.
The following example will show you how to set up the null and alternate hypotheses. The null and alternate hypotheses contain statements about the population variance.
Example 11.8.
Problem
Math instructors are not only interested in how their students do on exams, on average, but also in how the exam scores vary. To many instructors, the variance (or standard deviation) may be more important than the average.
Suppose a math instructor believes that the standard deviation for his final exam is 5 points. One of his best students thinks otherwise. The student claims that the standard deviation is more than 5 points. If the student were to conduct a hypothesis test, what would the null and alternate hypotheses be?
Solution
Even though we are given the population standard deviation, we can set the test up using the population variance as follows.
$H_0$: $\sigma^2 = 5^2$
$H_a$: $\sigma^2 > 5^2$
Example 11.9.
Problem
With individual lines at its various windows, a post office finds that the standard deviation for normally distributed waiting times for customers on Friday afternoon is 7.2 minutes. The post office experiments with a single main waiting line and finds that for a random sample of 25 customers, the waiting times for customers have a standard deviation of 3.5 minutes.
With a significance level of 5%, test the claim that a single line causes lower variation among waiting times (more consistent waiting times) for customers.
Solution
Since the claim is that a single line causes lower variation, this is a test of a single variance. The parameter is the population variance, $\sigma^2$, or the population standard deviation, $\sigma$.
Random Variable: The sample standard deviation, s, is the random variable. Let s = standard deviation for the waiting times.
$H_0$: $\sigma^2 = 7.2^2$
$H_a$: $\sigma^2 < 7.2^2$
The word “lower” tells you this is a left-tailed test.
Distribution for the test: $\chi^2_{24}$, where:
n = the number of customers sampled
df = n – 1 = 25 – 1 = 24
Calculate the test statistic:
$$\chi^2 = \frac{(n-1) \cdot s^2}{\sigma^2} = \frac{(25-1)(3.5)^2}{(7.2)^2} = 5.67$$
where n = 25, s = 3.5, and σ = 7.2.
Graph:
Probability statement: p-value $= P(\chi^2 < 5.67) = 0.000042$
Compare α and the p-value: α = 0.05; p-value = 0.000042; α > p-value.
Make a decision: Since α > p-value, reject $H_0$.
This means that you reject $\sigma^2 = 7.2^2$. In other words, you do not think the standard deviation of the waiting times is 7.2 minutes, but lower.
Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude that a single line causes a lower variation among the waiting times or with a single line, the customer waiting times vary less than 7.2 minutes.
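The Example 11.9 numbers can be reproduced with a short Python sketch. Because this is a left-tailed test, the p-value is the area to the left of the statistic, $P(\chi^2 < 5.67)$. With df = 24 (even), that area has the closed form $1 - e^{-x/2}\sum_{i=0}^{11} \frac{(x/2)^i}{i!}$, which the hypothetical helper `chi2_cdf_even_df` uses; it does not work for odd df, where a library routine such as `scipy.stats.chi2.cdf` would be needed.

```python
import math


def chi2_cdf_even_df(x, df):
    """Left-tail area P(chi-square < x) for an even, positive df.

    Uses 1 - exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!, valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires an even, positive df")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)**i / i!
        total += term
    return 1.0 - math.exp(-half) * total


n, s, sigma0 = 25, 3.5, 7.2
statistic = (n - 1) * s ** 2 / sigma0 ** 2  # (24)(3.5^2) / 7.2^2
df = n - 1

p_value = chi2_cdf_even_df(statistic, df)   # left-tailed: H_a is sigma^2 < 7.2^2
print(round(statistic, 2), df)              # 5.67 24
print(f"{p_value:.6f}")                     # 0.000042
```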
TI-83+ and TI-84 calculators: In 2nd DISTR, use 7:χ2cdf. The syntax is (lower, upper, df) for the parameter list. For Example 11.9, χ2cdf(-1E99,5.67,24). The p-value = 0.000042.
Formula 11.1. The Chi-square Probability Distribution
$\mu = df$ and $\sigma = \sqrt{2(df)}$
Formula 11.2. Goodness-of-Fit Hypothesis Test
Use goodness-of-fit to test whether a data set fits a particular probability distribution.
The degrees of freedom are number of cells or categories - 1.
The test statistic is $\sum_{n} \frac{(O-E)^2}{E}$, where O = observed values (data), E = expected values (from theory), and n = the number of different data cells or categories.
The test is right-tailed.
Formula 11.3. Test of Independence
Use the test of independence to test whether two factors are independent or not.
The degrees of freedom are equal to (number of columns - 1)(number of rows - 1).
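The test statistic is $\sum_{(i \cdot j)} \frac{(O-E)^2}{E}$, where O = observed values, E = expected values, i = the number of rows in the table, and j = the number of columns in the table.
The test is right-tailed.
If the null hypothesis is true, the expected number is $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}}$.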
Formula 11.4. Test of a Single Variance
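Use the test to determine whether the variation (variance) of a normally distributed population differs from a claimed value.
The degrees of freedom are the number of data values (the sample size) - 1.
The test statistic is $\frac{(n-1) \cdot s^2}{\sigma^2}$, where n = the total number of data, $s^2$ = sample variance, and $\sigma^2$ = population variance.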
The test may be left, right, or two-tailed.
The student will explore the properties of goodness-of-fit test data.
The following data are real. The cumulative number of AIDS cases reported for Santa Clara County through December 31, 2003, is broken down by ethnicity as follows:
Ethnicity | Number of Cases |
---|---|
White | 2032 |
Hispanic | 897 |
African-American | 372 |
Asian, Pacific Islander | 168 |
Native American | 20 |
Total | 3489 |
The percentage of each ethnic group in Santa Clara County is as follows:
Ethnicity | Percentage of total county population | Number expected (round to 2 decimal places) |
---|---|---|
White | 47.79% | 1667.39 |
Hispanic | 24.15% | |
African-American | 3.55% | |
Asian, Pacific Islander | 24.21% | |
Native American | 0.29% | |
Total | 100% | |
If the ethnicity of AIDS victims followed the ethnicity of the total county population, fill in the expected number of cases per ethnic group.
Perform a goodness-of-fit test to determine whether the make-up of AIDS cases follows the ethnicity of the general population of Santa Clara County.
Exercise 11.8.1.
$H_0$:
Exercise 11.8.2.
$H_a$:
Exercise 11.8.3.
Is this a right-tailed, left-tailed, or two-tailed test?
Exercise 11.8.7.
Graph the situation. Label and scale the horizontal axis. Mark the mean and test statistic. Shade in the region corresponding to the p-value.
Let α = 0.05
Decision:
Reason for the Decision:
Conclusion (write out in complete sentences):
Exercise 11.8.8.
Does it appear that the pattern of AIDS cases in Santa Clara County corresponds to the distribution of ethnic groups in this county? Why or why not?
Solution to Exercise 11.8.6.
Rounded to 4 decimal places, the p-value is 0.0000.
The student will explore the properties of contingency tables.
Conduct a hypothesis test to determine if smoking level and ethnicity are independent.
Copy the data provided in Probability Topics Practice 1: Calculating Probabilities into the table below.
Smoking Level Per Day | African American | Native Hawaiian | Latino | Japanese Americans | White | TOTALS |
---|---|---|---|---|---|---|
1-10 | ||||||
11-20 | ||||||
21-30 | ||||||
31+ | ||||||
TOTALS |
State the hypotheses.
$H_0$:
$H_a$:
Enter expected values in the table above. Round to two decimal places.
Calculate the following values:
Exercise 11.9.4.
Is this a right-tailed, left-tailed, or two-tailed test? Explain why.
Exercise 11.9.5.
Graph the situation. Label and scale the horizontal axis. Mark the mean and test statistic. Shade in the region corresponding to the p-value.
State the decision and conclusion (in a complete sentence) for the following preconceived levels of α .
Exercise 11.9.6.
α = 0.05
a. Decision:
b. Reason for the decision:
c. Conclusion (write out in a complete sentence):
Exercise 11.9.7.
α = 0.01
a. Decision:
b. Reason for the decision:
c. Conclusion (write out in a complete sentence):
The student will explore the properties of data with a test of a single variance.
Suppose an airline claims that its flights are consistently on time with an average delay of at most 15 minutes. It claims that the average delay is so consistent that the variance is no more than 150 minutes squared. Doubting the consistency part of the claim, a disgruntled traveler calculates the delays for his next 25 flights. The average delay for those 25 flights is 22 minutes with a standard deviation of 15 minutes.
Exercise 11.10.1.
Is the traveler disputing the claim about the average or about the variance?
Exercise 11.10.2.
A sample standard deviation of 15 minutes is the same as a sample variance of __________ minutes squared.
Exercise 11.10.3.
Is this a right-tailed, left-tailed, or two-tailed test?
Perform a hypothesis test on the consistency part of the claim.
Exercise 11.10.4.
$H_0$:
Exercise 11.10.5.
$H_a$:
Exercise 11.10.9.
Graph the situation. Label and scale the horizontal axis. Mark the mean and test statistic. Shade the p-value.
Exercise 11.10.10.
Let α = 0.05
Decision:
Conclusion (write out in a complete sentence):
Exercise 11.10.11.
How did you know to test the variance instead of the mean?
Exercise 11.10.12.
If an additional test were done on the claim of the average delay, which distribution would you use?
Exercise 11.10.13.
If an additional test was done on the claim of the average delay, but 45 flights were surveyed, which distribution would you use?
Exercise 11.11.1.
a. Explain why the “goodness-of-fit” test and the “test for independence” are generally right-tailed tests.
b. If you did a left-tailed test, what would you be testing?
For each word problem, use a solution sheet to solve the hypothesis test problem. Go to The Table of Contents 14. Appendix for the solution sheet. Round expected frequency to two decimal places.
Exercise 11.11.2.
A 6-sided die is rolled 120 times. Fill in the expected frequency column. Then, conduct a hypothesis test to determine if the die is fair. The data below are the result of the 120 rolls.
Face Value | Frequency | Expected Frequency |
---|---|---|
1 | 15 | |
2 | 29 | |
3 | 16 | |
4 | 15 | |
5 | 30 | |
6 | 15 | |
Exercise 11.11.3.
The marital status distribution of the U.S. male population, age 15 and older, is as shown below. (Source: U.S. Census Bureau, Current Population Reports)
Marital Status | Percent | Expected Frequency |
---|---|---|
never married | 31.3 | |
married | 56.1 | |
widowed | 2.5 | |
divorced/separated | 10.1 | |
Suppose that a random sample of 400 U.S. young adult males, 18 – 24 years old, yielded the following frequency distribution. We are interested in whether this age group of males fits the distribution of the U.S. adult population. Calculate the frequency one would expect when surveying 400 people. Fill in the above table, rounding to two decimal places.
Marital Status | Frequency |
---|---|
never married | 140 |
married | 238 |
widowed | 2 |
divorced/separated | 20 |
The next two questions refer to the following information. The columns in the chart below contain the Race/Ethnicity of U.S. Public Schools: High School Class of 2009, the percentages for the Advanced Placement Examinee Population for that class and the Overall Student Population. (Source: http://www.collegeboard.com). Suppose the right column contains the result of a survey of 1000 local students from the Class of 2009 who took an AP Exam.
Race/Ethnicity | AP Examinee Population | Overall Student Population | Survey Frequency |
---|---|---|---|
Asian, Asian American or Pacific Islander | 10.2% | 5.4% | 113 |
Black or African American | 8.2% | 14.5% | 94 |
Hispanic or Latino | 15.5% | 15.9% | 136 |
American Indian or Alaska Native | 0.6% | 1.2% | 10 |
White | 59.4% | 61.6% | 604 |
Not reported/other | 6.1% | 1.4% | 43 |
Exercise 11.11.4.
Perform a goodness-of-fit test to determine whether the local results follow the distribution of the U.S. Overall Student Population, based on ethnicity.
Exercise 11.11.5. (Go to Solution)
Perform a goodness-of-fit test to determine whether the local results follow the distribution of the U.S. AP Examinee Population, based on ethnicity.
Exercise 11.11.6.
The City of South Lake Tahoe, CA, has an Asian population of 1419 people, out of a total population of 23,609 (Source: U.S. Census Bureau, Census 2000). Suppose that a survey of 1419 self-reported Asians in the Manhattan, NY, area yielded the data in the table below. Conduct a goodness-of-fit test to determine if the self-reported subgroups of Asians in the Manhattan area fit those of the Lake Tahoe area.
Race | Lake Tahoe Frequency | Manhattan Frequency | |
---|---|---|---|
Asian Indian | 131 | 174 | |
Chinese | 118 | 557 | |
Filipino | 1045 | 518 | |
Japanese | 80 | 54 | |
Korean | 12 | 29 | |
Vietnamese | 9 | 21 | |
Other | 24 | 66 | |
The next two questions refer to the following information: UCLA conducted a survey of more than 263,000 college freshmen from 385 colleges in fall 2005. The results of student expected majors by gender were reported in The Chronicle of Higher Education (2/2/2006). Suppose a survey of 5000 graduating females and 5000 graduating males was done as a follow-up in 2010 to determine what their actual majors were. The results are shown in the tables for Exercises 11.11.7 and 11.11.8. The second column in each table does not add to 100% because of rounding.
Exercise 11.11.7. (Go to Solution)
Conduct a hypothesis test to determine if the actual college major of graduating females fits the distribution of their expected majors.
Major | Women - Expected Major | Women - Actual Major |
---|---|---|
Arts & Humanities | 14.0% | 670 |
Biological Sciences | 8.4% | 410 |
Business | 13.1% | 685 |
Education | 13.0% | 650 |
Engineering | 2.6% | 145 |
Physical Sciences | 2.6% | 125 |
Professional | 18.9% | 975 |
Social Sciences | 13.0% | 605 |
Technical | 0.4% | 15 |
Other | 5.8% | 300 |
Undecided | 8.0% | 420 |
Exercise 11.11.8.
Conduct a hypothesis test to determine if the actual college major of graduating males fits the distribution of their expected majors.
Major | Men - Expected Major | Men - Actual Major |
---|---|---|
Arts & Humanities | 11.0% | 600 |
Biological Sciences | 6.7% | 330 |
Business | 22.7% | 1130 |
Education | 5.8% | 305 |
Engineering | 15.6% | 800 |
Physical Sciences | 3.6% | 175 |
Professional | 9.3% | 460 |
Social Sciences | 7.6% | 370 |
Technical | 1.8% | 90 |
Other | 8.2% | 400 |
Undecided | 6.6% | 340 |
Exercise 11.11.9. (Go to Solution)
A recent debate about where in the United States skiers believe the skiing is best prompted the following survey. Test to see if the best ski area is independent of the level of the skier.
U.S. Ski Area | Beginner | Intermediate | Advanced |
---|---|---|---|
Tahoe | 20 | 30 | 40 |
Utah | 10 | 30 | 60 |
Colorado | 10 | 40 | 50 |
Exercise 11.11.10.
Car manufacturers are interested in whether there is a relationship between the size of car an individual drives and the number of people in the driver’s family (that is, whether car size and family size are independent). To test this, suppose that 800 car owners were randomly surveyed with the following results. Conduct a test for independence.
Family Size | Sub & Compact | Mid-size | Full-size | Van & Truck |
---|---|---|---|---|
1 | 20 | 35 | 40 | 35 |
2 | 20 | 50 | 70 | 80 |
3 - 4 | 20 | 50 | 100 | 90 |
5+ | 20 | 30 | 70 | 70 |
Exercise 11.11.11. (Go to Solution)
College students may be interested in whether or not their majors have any effect on starting salaries after graduation. Suppose that 300 recent graduates were surveyed as to their majors in college and their starting salaries after graduation. Below are the data. Conduct a test for independence.
Major | < $30,000 | $30,000 - $39,999 | $40,000 + |
---|---|---|---|
English | 5 | 20 | 5 |
Engineering | 10 | 30 | 60 |
Nursing | 10 | 15 | 15 |
Business | 10 | 20 | 30 |
Psychology | 20 | 30 | 20 |
Exercise 11.11.12.
Some travel agents claim that honeymoon hot spots vary according to the age of the bride and groom. Suppose that 280 recent East Coast brides were interviewed as to where they spent their honeymoons. The information is given below. Conduct a test for independence.
Location | 20 - 29 | 30 - 39 | 40 - 49 | 50 and over |
---|---|---|---|---|
Niagara Falls | 15 | 25 | 25 | 20 |
Poconos | 15 | 25 | 25 | 10 |
Europe | 10 | 25 | 15 | 5 |
Virgin Islands | 20 | 25 | 15 | 5 |
Exercise 11.11.13. (Go to Solution)
A manager of a sports club keeps information concerning the main sport in which members participate and their ages. To test whether there is a relationship between the age of a member and his or her choice of sport, 643 members of the sports club are randomly selected. Conduct a test for independence.
Sport | 18 - 25 | 26 - 30 | 31 - 40 | 41 and over |
---|---|---|---|---|
racquetball | 42 | 58 | 30 | 46 |
tennis | 58 | 76 | 38 | 65 |
swimming | 72 | 60 | 65 | 33 |
Exercise 11.11.14.
A major food manufacturer is concerned that the sales for its skinny French fries have been decreasing. As a part of a feasibility study, the company conducts research into the types of fries sold across the country to determine if the type of fries sold is independent of the area of the country. The results of the study are below. Conduct a test for independence.
Type of Fries | Northeast | South | Central | West |
---|---|---|---|---|
skinny fries | 70 | 50 | 20 | 25 |
curly fries | 100 | 60 | 15 | 30 |
steak fries | 20 | 40 | 10 | 10 |
Exercise 11.11.15. (Go to Solution)
According to Dan Lenard, an independent insurance agent in the Buffalo, N.Y. area, the table below breaks down the amount of life insurance purchased by males in several age groups. He is interested in whether the age of the male and the amount of life insurance purchased are independent events. Conduct a test for independence.
Age of Males | None | $50,000 - $100,000 | $100,001 - $150,000 | $150,001 - $200,000 | $200,000 + |
---|---|---|---|---|---|
20 - 29 | 40 | 15 | 40 | 0 | 5 |
30 - 39 | 35 | 5 | 20 | 20 | 10 |
40 - 49 | 20 | 0 | 30 | 0 | 30 |
50 + | 40 | 30 | 15 | 15 | 10 |
Exercise 11.11.16.
Suppose that 600 thirty-year-olds were surveyed to determine whether or not there is a relationship between the level of education an individual has and salary. Conduct a test for independence.
Annual Salary | Not a high school grad. | High school graduate | College graduate | Masters or doctorate |
---|---|---|---|---|
< $30,000 | 15 | 25 | 10 | 5 |
$30,000 - $40,000 | 20 | 40 | 70 | 30 |
$40,000 - $50,000 | 10 | 20 | 40 | 55 |
$50,000 - $60,000 | 5 | 10 | 20 | 60 |
$60,000 + | 0 | 5 | 10 | 150 |
Exercise 11.11.17. (Go to Solution)
A plant manager is concerned that her equipment may need recalibrating. The actual weight of the 15-oz. cereal boxes it fills seems to have been fluctuating. The standard deviation should be at most 0.5 oz. To determine whether the machine needs to be recalibrated, 84 randomly selected boxes of cereal from the next day’s production were weighed. The standard deviation of the 84 boxes was 0.54 oz. Does the machine need to be recalibrated?
Exercise 11.11.18.
Consumers may be interested in whether the cost of a particular calculator varies from store to store. Based on surveying 43 stores, which yielded a sample mean of $84 and a sample standard deviation of $12, test the claim that the standard deviation is greater than $15.
Exercise 11.11.19. (Go to Solution)
Isabella, an accomplished Bay to Breakers runner, claims that the standard deviation for her time to run the 7½-mile race is at most 3 minutes. To test her claim, Rupinder looks up 5 of her race times: 55 minutes, 61 minutes, 58 minutes, 63 minutes, and 57 minutes.
Exercise 11.11.20.
Airline companies are interested in the consistency of the number of babies on each flight, so that they have adequate safety equipment. They are also interested in the variation of the number of babies. Suppose that an airline executive believes the average number of babies on flights is 6 with a variance of 9 at most. The airline conducts a survey. The results of the 18 flights surveyed give a sample average of 6.4 with a sample standard deviation of 3.9. Conduct a hypothesis test of the airline executive’s belief.
Exercise 11.11.21. (Go to Solution)
According to the U.S. Bureau of the Census, United Nations, in 1994 the number of births per woman in China was 1.8. This fertility rate has been attributed to the law passed in 1979 restricting births to one per woman. Suppose that a group of students studied whether or not the standard deviation of births per woman was greater than 0.75. They asked 50 women across China the number of births they had. Below are the results. Does the students’ survey indicate that the standard deviation is greater than 0.75?
# of births | Frequency |
---|---|
0 | 5 |
1 | 30 |
2 | 10 |
3 | 5 |
Exercise 11.11.22.
According to an avid aquarist, the average number of fish in a 20-gallon tank is 10, with a standard deviation of 2. His friend, also an aquarist, does not believe that the standard deviation is 2. She counts the number of fish in 15 other 20-gallon tanks. Based on the results that follow, do you think that the standard deviation is different from 2? Data: 11; 10; 9; 10; 10; 11; 11; 10; 12; 9; 7; 9; 11; 10; 11
Exercise 11.11.23. (Go to Solution)
The manager of “Frenchies” is concerned that patrons are not consistently receiving the same amount of French fries with each order. The chef claims that the standard deviation for a 10-ounce order of fries is at most 1.5 oz., but the manager thinks that it may be higher. He randomly weighs 49 orders of fries, which yields a mean of 11 oz. and a standard deviation of 2 oz.
Exercise 11.11.24. (Go to Solution)
As the degrees of freedom increase, the graph of the chi-square distribution looks more and more symmetrical.
Exercise 11.11.25. (Go to Solution)
The standard deviation of the chi-square distribution is twice the mean.
Exercise 11.11.26. (Go to Solution)
The mean and the median of the chi-square distribution are the same if .
Exercise 11.11.27. (Go to Solution)
In a Goodness-of-Fit test, the expected values are the values we would expect if the null hypothesis were true.
Exercise 11.11.28. (Go to Solution)
In general, if the observed values and expected values of a Goodness-of-Fit test are not close together, then the test statistic can get very large and, on a graph, will lie far out in the right tail.
Exercise 11.11.29. (Go to Solution)
The degrees of freedom for a Test for Independence are equal to the sample size minus 1.
Exercise 11.11.30. (Go to Solution)
Use a Goodness-of-Fit test to determine if high school principals believe that students are absent equally during the week or not.
Exercise 11.11.31. (Go to Solution)
The Test for Independence uses tables of observed and expected data values.
Exercise 11.11.32. (Go to Solution)
The test to use when determining if the college or university a student chooses to attend is related to his/her socioeconomic status is a Test for Independence.
Exercise 11.11.33. (Go to Solution)
The test to use to determine if a six-sided die is fair is a Goodness-of-Fit test.
Exercise 11.11.34. (Go to Solution)
In a Test of Independence, the expected number is equal to the row total multiplied by the column total divided by the total surveyed.
Exercise 11.11.35. (Go to Solution)
In a Goodness-of-Fit test, if the p-value is 0.0113, in general, do not reject the null hypothesis.
Exercise 11.11.36. (Go to Solution)
For a Chi-Square distribution with 17 degrees of freedom, the probability that a value is greater than 20 is 0.7258.
Exercise 11.11.37. (Go to Solution)
If , the chi-square distribution has a shape that reminds us of the exponential.
Solution to Exercise 11.11.3. (Return to Exercise)
a. The data fit the distribution
b. The data do not fit the distribution
c. 3
e. 19.27
f. 0.0002
h. Decision: Reject Null; Conclusion: Data do not fit the distribution.
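The solution above can be checked with a short computation. The sketch below is in Python (our choice; the text itself relies on TI-83/84 calculators), and `chi2_sf` and `goodness_of_fit` are hypothetical helper names, not a library API. It computes each expected frequency as n times the hypothesized proportion, adds up (O − E)²/E, uses df = number of categories − 1, and finds the right-tail area with the closed-form chi-square series that exists for whole-number degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X > x) for a chi-square variable with a positive integer df.

    Uses the closed-form series that exists when df is a whole number.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * e^(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def goodness_of_fit(observed, proportions):
    """Return expected counts, the chi-square statistic, df, and the p-value."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    return expected, stat, df, chi2_sf(stat, df)


# Exercise 11.11.3: young adult males vs. the U.S. male marital-status distribution
observed = [140, 238, 2, 20]                # never married, married, widowed, divorced/separated
proportions = [0.313, 0.561, 0.025, 0.101]
expected, stat, df, p = goodness_of_fit(observed, proportions)
print([round(e, 2) for e in expected])      # [125.2, 224.4, 10.0, 40.4]
print(round(stat, 2), df, round(p, 4))      # 19.27 3 0.0002
```

If SciPy is installed, `scipy.stats.chisquare(f_obs, f_exp)` returns the same statistic and p-value; the hand-written version keeps every step of the solution sheet visible.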
Solution to Exercise 11.11.5. (Return to Exercise)
c. 5
e. 13.4
f. 0.0199
g. Decision: Reject null when $\alpha = 0.05$; Conclusion: Local data do not fit the AP Examinee Distribution. Decision: Do not reject null when $\alpha = 0.01$; Conclusion: Local data do fit the AP Examinee Distribution.
Solution to Exercise 11.11.7. (Return to Exercise)
c. 10
e. 11.48
f. 0.3214
h. Decision: Do not reject null when $\alpha = 0.05$ and $\alpha = 0.01$; Conclusion: Distribution of majors by graduating females fits the distribution of expected majors.
Solution to Exercise 11.11.9. (Return to Exercise)
c. 4
e. 10.53
f. 0.0324
h. Decision: Reject null; Conclusion: Best ski area and level of skier are not independent.
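A minimal sketch of the test for independence, in Python with hypothetical helper names (`chi2_sf`, `independence_test`). Each expected count is (row total)(column total)/(grand total), and df = (number of rows − 1)(number of columns − 1). The closed-form right-tail helper is repeated so the block runs on its own.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X > x) for a chi-square variable with a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * e^(-x/2) * sum of x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def independence_test(table):
    """Chi-square test of independence for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = (row total)(column total) / (grand total)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1)(columns - 1)
    return expected, stat, df, chi2_sf(stat, df)


# Exercise 11.11.9: rows = Tahoe, Utah, Colorado; columns = Beginner, Intermediate, Advanced
ski = [
    [20, 30, 40],
    [10, 30, 60],
    [10, 40, 50],
]
expected, stat, df, p = independence_test(ski)
print(round(stat, 2), df, round(p, 4))  # 10.53 4 0.0324
```

If SciPy is available, `scipy.stats.chi2_contingency` performs the same computation; note that for a table with only one degree of freedom it applies a continuity correction by default, which this sketch does not.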
Solution to Exercise 11.11.11. (Return to Exercise)
c. 8
e. 33.55
f. approximately 0 (about 0.00005)
h. Decision: Reject null; Conclusion: Major and starting salary are not independent events.
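A p-value reported as 0 here is a rounded display, not an exact zero. When df is even, the right-tail area has a short closed form, $P(\chi^2 > x) = e^{-x/2}\sum_{k=0}^{df/2-1} \frac{(x/2)^k}{k!}$, which makes it easy to see just how small it is. A minimal Python sketch (`chi2_sf_even` is a hypothetical name), using the rounded statistic 33.55 with df = 8:

```python
import math


def chi2_sf_even(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with a positive even df.

    For even df this equals the Poisson probability P(N <= df/2 - 1) with mean x/2.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0  # k = 0 term of the sum
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p = chi2_sf_even(33.55, 8)
print(f"{p:.4f}")  # 0.0000  (what a four-decimal display shows)
print(f"{p:.2e}")  # 4.90e-05
```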
Solution to Exercise 11.11.13. (Return to Exercise)
c. 6
e. 25.21
f. 0.0003
h. Decision: Reject null; Conclusion: Age of member and choice of sport are not independent.
Solution to Exercise 11.11.17. (Return to Exercise)
c. 83
d. 96.81
e. 0.1426
g. Decision: Do not reject null; Conclusion: The standard deviation is at most 0.5 oz.
h. It does not need to be recalibrated.
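For a test of a single variance, the test statistic is $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$ with df = n − 1. The sketch below plugs in the values behind the solution above (n = 84, s = 0.54, and the hypothesized σ = 0.5 oz named in the conclusion) and takes the right-tail area, because the alternative hypothesis is σ > 0.5. As before, the Python helper names are hypothetical and the closed-form tail helper is repeated so the block is self-contained.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X > x) for a chi-square variable with a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * e^(-x/2) * sum of x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def variance_test(n: int, s: float, sigma0: float):
    """Chi-square statistic and df for a test about a single population variance."""
    stat = (n - 1) * s ** 2 / sigma0 ** 2
    return stat, n - 1


# Exercise 11.11.17: H0: sigma <= 0.5 oz, Ha: sigma > 0.5 oz (right-tailed)
stat, df = variance_test(n=84, s=0.54, sigma0=0.5)
p = chi2_sf(stat, df)  # right-tail area
print(round(stat, 2), df)  # 96.81 83
print(round(p, 4))
```

For a left-tailed alternative (σ less than the hypothesized value), the p-value would instead be `1 - chi2_sf(stat, df)`.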
Solution to Exercise 11.11.19. (Return to Exercise)
c. 4
d. 4.52
e. 0.3402
g. Decision: Do not reject null.
h. No
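When raw data are given, the sample variance has to be computed first. A minimal Python sketch for Exercise 11.11.19, using the standard-library `statistics.variance` (which divides by n − 1); since df = 4 is even, a short closed-form tail helper (hypothetical name `chi2_sf_even`) is enough. Working from the unrounded variance, $s^2 = 10.2$, gives a statistic of about 4.53 and a p-value of about 0.339; the 4.52 and 0.3402 in the solution come from rounding s to 3.19 before squaring. The decision is the same either way.

```python
import math
import statistics


def chi2_sf_even(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with a positive even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


times = [55, 61, 58, 63, 57]     # Exercise 11.11.19 race times, minutes
sigma0 = 3                       # claimed maximum standard deviation, minutes
n = len(times)
s2 = statistics.variance(times)  # sample variance (divides by n - 1)
stat = (n - 1) * s2 / sigma0 ** 2
df = n - 1
p = chi2_sf_even(stat, df)       # right-tailed: Ha is sigma > 3
print(round(s2, 2), round(stat, 2), round(p, 4))  # 10.2 4.53 0.3386
```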
Solution to Exercise 11.11.21. (Return to Exercise)
c. 49
d. 54.37
e. 0.2774
g. Decision: Do not reject null; Conclusion: The standard deviation is at most 0.75.
h. No
Solution to Exercise 11.11.23. (Return to Exercise)
a. $\sigma^2 \le (1.5)^2$
c. 48
d. 85.33
e. 0.0007
g. Decision: Reject null.
h. Yes
The next two questions refer to the following real study:
A recent survey of U.S. teenage pregnancy was answered by 720 girls, age 12 - 19. 6% of the girls surveyed said they have been pregnant. (Parade Magazine) We are interested in the true proportion of U.S. girls, age 12 - 19, who have been pregnant.
Exercise 11.12.1. (Go to Solution)
Find the 95% confidence interval for the true proportion of U.S. girls, age 12 - 19, who have been pregnant.
Exercise 11.12.2. (Go to Solution)
The report also stated that the results of the survey are accurate to within ± 3.7% at the 95% confidence level. Suppose that a new study is to be done. It is desired to be accurate to within 2% at the 95% confidence level. What will happen to the minimum number that should be surveyed?
Exercise 11.12.3.
Given: X ~ . Sketch the graph that depicts: P(X > 1).
The next four questions refer to the following information:
Suppose that the time that owners keep their cars (purchased new) is normally distributed with a mean of 7 years and a standard deviation of 2 years. We are interested in how long an individual keeps his car (purchased new). Our population is people who buy their cars new.
Exercise 11.12.5. (Go to Solution)
Suppose that we randomly survey one person. Find the probability that the person keeps his or her car for less than 2.5 years.
Exercise 11.12.6. (Go to Solution)
If we are to pick individuals 10 at a time, find the distribution for the average length of car ownership.
Exercise 11.12.7. (Go to Solution)
If we are to pick 10 individuals, find the probability that the sum of their ownership time is more than 55 years.
Exercise 11.12.8. (Go to Solution)
For which distribution is the median not equal to the mean?
A. Uniform
B. Exponential
C. Normal
D. Student-t
Exercise 11.12.9. (Go to Solution)
Compare the standard normal distribution to the student-t distribution, centered at 0. Explain which of the following are true and which are false.
a. As the number surveyed increases, the area to the left of -1 for the student-t distribution approaches the area for the standard normal distribution.
b. As the number surveyed increases, the area to the left of -1 for the standard normal distribution approaches the area for the student-t distribution.
c. As the degrees of freedom decrease, the graph of the student-t distribution looks more like the graph of the standard normal distribution.
d. If the number surveyed is less than 30, the normal distribution should never be used.
The next five questions refer to the following information:
We are interested in the checking account balance of a twenty-year-old college student. We randomly survey 16 twenty-year-old college students. We obtain a sample mean of $640 and a sample standard deviation of $150. Let X = checking account balance of an individual twenty-year-old college student.
Exercise 11.12.10.
Explain why we cannot determine the distribution of X .
Exercise 11.12.11. (Go to Solution)
If you were to create a confidence interval or perform a hypothesis test for the population average checking account balance of 20-year-old college students, what distribution would you use?
Exercise 11.12.12. (Go to Solution)
Find the 95% confidence interval for the true average checking account balance of a twenty-year-old college student.
Exercise 11.12.13. (Go to Solution)
What type of data is the balance of the checking account considered to be?
Exercise 11.12.14. (Go to Solution)
What type of data is the number of 20-year-olds considered to be?
Exercise 11.12.15. (Go to Solution)
On average, a busy emergency room gets a patient with a shotgun wound about once per week. We are interested in the number of patients with a shotgun wound the emergency room gets per 28 days.
a. Define the random variable X.
b. State the distribution for X.
c. Find the probability that the emergency room gets no patients with shotgun wounds in the next 28 days.
The next two questions refer to the following information:
The probability that a certain slot machine will pay back money when a quarter is inserted is 0.30. Assume that the plays of the slot machine are independent of one another. A person puts in 15 quarters for 15 plays.
Exercise 11.12.16. (Go to Solution)
Is the expected number of plays of the slot machine that will pay back money greater than, less than or the same as the median? Explain your answer.
Exercise 11.12.17. (Go to Solution)
Is it likely that exactly 8 of the 15 plays would pay back money? Justify your answer numerically.
Exercise 11.12.18. (Go to Solution)
A game is played with the following rules:
it costs $10 to enter
a fair coin is tossed 4 times
if you do not get 4 heads or 4 tails, you lose your $10
if you get 4 heads or 4 tails, you get back your $10, plus $30 more
Over the long run of playing this game, what are your expected earnings?
Exercise 11.12.19. (Go to Solution)
The average grade on a math exam in Rachel’s class was 74, with a standard deviation of 5. Rachel earned an 80.
The average grade on a math exam in Becca’s class was 47, with a standard deviation of 2. Becca earned a 51.
The average grade on a math exam in Matt’s class was 70, with a standard deviation of 8. Matt earned an 83.
Find whose score was the best, compared to his or her own class. Justify your answer numerically.
The next two questions refer to the following information:
Seventy compulsive gamblers were asked how many days per week they go to casinos. The results are given in the following graph:
Figure 11.3.
Exercise 11.12.22. (Go to Solution)
Based upon research at De Anza College, it is believed that about 19% of the student population speaks a language other than English at home.
Suppose that a study was done this year to see if that percent has decreased. Ninety-eight students were randomly surveyed with the following results. Fourteen said that they speak a language other than English at home.
a. State an appropriate null hypothesis.
b. State an appropriate alternate hypothesis.
c. Define the random variable, P′.
d. Calculate the test statistic.
e. Calculate the p-value.
f. At the 5% significance level, what is your decision about the null hypothesis?
g. What is the Type I error?
h. What is the Type II error?
Exercise 11.12.23.
Assume that you are an emergency paramedic called in to rescue victims of an accident. You need to help a patient who is bleeding profusely. The patient is also considered to be a high risk for contracting AIDS. Assume that the null hypothesis is that the patient does not have the HIV virus. What is a Type I error?
Exercise 11.12.24. (Go to Solution)
It is often said that Californians are more casual than the rest of Americans. Suppose that a survey was done to see if the proportion of Californian professionals who wear jeans to work is greater than the proportion of non-Californian professionals who do. Fifty of each group were surveyed, with the following results: 10 Californians wear jeans to work and 4 non-Californians wear jeans to work.
C = Californian professional
= non-Californian professional
a. State appropriate null and alternate hypotheses.
b. Define the random variable.
c. Calculate the test statistic and p-value.
d. At the 5% significance level, do you reject or fail to reject the null hypothesis?
e. What is the Type I error?
f. What is the Type II error?
The next two questions refer to the following information:
A group of statistics students has developed a technique that they feel will lower their anxiety level on statistics exams. They measured their anxiety level at the start of the quarter and again at the end of the quarter. The paired data are recorded in that order: (1000, 900); (1200, 1050); (600, 700); (1300, 1100); (1000, 900); (900, 900).
Exercise 11.12.25. (Go to Solution)
This is a test of (pick the best answer):
A. large samples, independent means
B. small samples, independent means
C. dependent means
Solution to Exercise 11.12.22. (Return to Exercise)
d. $z = -1.19$
e. 0.1171
f. Do not reject the null
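The test statistic and p-value above can be reproduced with the standard-library `statistics.NormalDist` (Python 3.8 or later). This sketch assumes the usual one-proportion z-test, with the standard error computed from the hypothesized proportion $p_0 = 0.19$ and a left-tailed alternative ($p < 0.19$), which matches the question of whether the percent has decreased.

```python
from statistics import NormalDist

p0 = 0.19   # hypothesized proportion (De Anza research)
n = 98      # students surveyed
x = 14      # students who speak another language at home

p_hat = x / n                          # observed value of P'
se = (p0 * (1 - p0) / n) ** 0.5        # standard error under H0
z = (p_hat - p0) / se
p_value = NormalDist().cdf(z)          # left-tailed: area to the left of z
print(round(z, 2), round(p_value, 4))  # -1.19 0.1171
```

Because the p-value is larger than 0.05, the decision at the 5% level is not to reject the null hypothesis, as in the solution.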
Class Time:
Names:
The student will evaluate collected data to determine whether they fit either the uniform or the exponential distribution.
Go to your local supermarket. Ask 30 people as they leave for the total amount on their grocery receipts. (Or, ask 3 cashiers for the last 10 amounts. Be sure to include the express lane, if it is open.)
Record the values.
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
__________ | __________ | __________ | __________ | __________ |
Construct a histogram of the data. Make 5 - 6 intervals. Sketch the graph using a ruler and pencil. Scale the axes.
Figure 11.4.
Calculate the following:
a. $\bar{x}$ =
b. $s$ =
c. $s^2$ =
Test to see if grocery receipts follow the uniform distribution.
Using your lowest and highest values, X ~ U(_____, _____)
Divide the distribution above into fifths.
Calculate the following:
a. Lowest value =
b. 20th percentile =
c. 40th percentile =
d. 60th percentile =
e. 80th percentile =
f. Highest value =
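For a uniform distribution the fifths are equally spaced, so the kth cut point is lowest + k(highest − lowest)/5, and each fifth should hold one fifth of the receipts. A minimal Python sketch; the lowest and highest values below are hypothetical placeholders for your own data, and `uniform_cut_points` and `count_in_cells` are illustrative names.

```python
import bisect


def uniform_cut_points(low: float, high: float, cells: int = 5):
    """Boundaries that split U(low, high) into `cells` equal-probability intervals."""
    width = (high - low) / cells
    return [low + k * width for k in range(cells + 1)]


def count_in_cells(values, cuts):
    """Count observations per interval; a value equal to an interior cut goes to the upper interval."""
    counts = [0] * (len(cuts) - 1)
    for v in values:
        i = bisect.bisect_right(cuts, v) - 1
        i = min(max(i, 0), len(counts) - 1)  # keep the highest value in the last interval
        counts[i] += 1
    return counts


# Hypothetical lowest and highest receipts, for illustration only
cuts = uniform_cut_points(5.00, 105.00)
print(cuts)      # [5.0, 25.0, 45.0, 65.0, 85.0, 105.0]
expected = [30 / 5] * 5  # 30 receipts, equally likely to fall in each fifth
print(expected)  # [6.0, 6.0, 6.0, 6.0, 6.0]
```

With 30 receipts each expected count is 6, which satisfies the commonly used guideline that every expected count be at least 5.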
For each fifth, count the observed number of receipts and record it. Then determine the expected number of receipts and record that.
Fifth | Observed | Expected |
---|---|---|
1st | | |
2nd | | |
3rd | | |
4th | | |
5th | | |
$H_0$:
$H_a$:
What distribution should you use for a hypothesis test?
Why did you choose this distribution?
Calculate the test statistic.
Find the p-value.
Sketch a graph of the situation. Label and scale the x-axis. Shade the area corresponding to the p-value.
Figure 11.5.
State your decision.
State your conclusion in a complete sentence.
Test to see if grocery receipts follow the exponential distribution with decay parameter $1/\bar{x}$.
Using $1/\bar{x}$ as the decay parameter, X ~ Exp(_____).
Calculate the following:
a. Lowest value =
b. First quartile =
c. 37th percentile =
d. Median =
e. 63rd percentile =
f. Third quartile =
g. Highest value =
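For an exponential distribution with decay parameter m, $F(x) = 1 - e^{-mx}$, so the qth percentile is $-\ln(1-q)/m$; with $m = 1/\bar{x}$ this becomes $-\bar{x}\ln(1-q)$. The sketch below assumes that choice of m and a hypothetical $\bar{x}$ of 40 dollars; `exponential_percentile` is an illustrative name. The six cells between these percentiles have probabilities 0.25, 0.12, 0.13, 0.13, 0.12, and 0.25.

```python
import math


def exponential_percentile(q: float, mean: float) -> float:
    """q-th quantile (0 <= q < 1) of an exponential with decay m = 1/mean.

    Solves F(x) = 1 - exp(-m x) = q for x.
    """
    return -mean * math.log(1 - q)


xbar = 40.0                                  # hypothetical sample mean receipt, dollars
quantiles = [0.25, 0.37, 0.50, 0.63, 0.75]   # Q1, 37th, median, 63rd, Q3
cuts = [exponential_percentile(q, xbar) for q in quantiles]
print([round(c, 2) for c in cuts])

# Probability of each of the six cells, then expected counts for 30 receipts
edges = [0.0] + quantiles + [1.0]
probs = [b - a for a, b in zip(edges, edges[1:])]
expected = [30 * p for p in probs]
print([round(e, 2) for e in expected])       # [7.5, 3.6, 3.9, 3.9, 3.6, 7.5]
```

With only 30 receipts, the four middle cells each expect fewer than 5 receipts, so the usual chi-square guideline is not met; a larger sample, or merging cells, would help. The 63rd percentile lands close to $\bar{x}$ because $F(\bar{x}) = 1 - e^{-1} \approx 0.632$.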
For each cell, count the observed number of receipts and record it. Then determine the expected number of receipts and record that.
Cell | Observed | Expected |
---|---|---|
1st | | |
2nd | | |
3rd | | |
4th | | |
5th | | |
6th | | |
$H_0$:
$H_a$:
What distribution should you use for a hypothesis test?
Why did you choose this distribution?
Calculate the test statistic.
Find the p-value.
Sketch a graph of the situation. Label and scale the x-axis. Shade the area corresponding to the p-value.
Figure 11.6.
State your decision.
State your conclusion in a complete sentence.
Did your data fit either distribution? If so, which?
In general, do you think it’s likely that data could fit more than one distribution? In complete sentences, explain why or why not.
Class Time:
Names:
The student will evaluate whether there is a significant relationship between favorite type of snack and gender.
Using your class as a sample, complete the following chart.
Gender | sweets (candy & baked goods) | ice cream | chips & pretzels | fruits & vegetables | Total |
---|---|---|---|---|---|
male | | | | | |
female | | | | | |
Total | | | | | |
Looking at the above chart, does it appear to you that there is dependence between gender and favorite type of snack food? Why or why not?
Conduct a hypothesis test to determine if the factors are independent:
$H_0$:
$H_a$:
What distribution should you use for a hypothesis test?
Why did you choose this distribution?
Calculate the test statistic.
Find the p-value.
Sketch a graph of the situation. Label and scale the x-axis. Shade the area corresponding to the p-value.
Figure 11.7.
State your decision.
State your conclusion in a complete sentence.
Is the conclusion of your study the same as or different from your answer to (I2) above?
Why do you think that occurred?