Chapter 11. The Chi-Square Distribution

11.1. The Chi-Square Distribution*

Student Learning Objectives

By the end of this chapter, the student should be able to:

  • Interpret the chi-square probability distribution as the sample size changes.

  • Conduct and interpret chi-square goodness-of-fit hypothesis tests.

  • Conduct and interpret chi-square test of independence hypothesis tests.

  • Conduct and interpret chi-square single variance hypothesis tests (optional).

Introduction

Have you ever wondered if lottery numbers were evenly distributed or if some numbers occurred with a greater frequency? How about if the types of movies people preferred were different across different age groups? What about if a coffee machine was dispensing approximately the same amount of coffee each time? You could answer these questions by conducting a hypothesis test.

You will now study a new distribution, one that is used to determine the answers to the above examples. This distribution is called the Chi-square distribution.

In this chapter, you will learn the three major applications of the Chi-square distribution:

  • The goodness-of-fit test, which determines if data fit a particular distribution, such as with the lottery example

  • The test of independence, which determines if events are independent, such as with the movie example

  • The test of a single variance, which tests variability, such as with the coffee example

Note

Though the Chi-square calculations depend on calculators or computers for most of the calculations, there is a table available (see the Table of Contents 15. Tables). TI-83+ and TI-84 calculator instructions are included in the text.

Optional Collaborative Classroom Activity

Look in the sports section of a newspaper or on the Internet for some sports data (baseball averages, basketball scores, golf tournament scores, football odds, swimming times, etc.). Plot a histogram and a boxplot using your data. See if you can determine a probability distribution that your data fits. Have a discussion with the class about your choice.

11.2. Notation*

The notation for the chi-square distribution is:

$\chi^2 \sim \chi^2_{df}$

where df = degrees of freedom, which depend on how the chi-square distribution is being used. (If you want to practice calculating chi-square probabilities, use df = n – 1. The degrees of freedom for the three major uses are each calculated differently.)

For the $\chi^2$ distribution, the population mean is $\mu = df$ and the population standard deviation is $\sigma = \sqrt{2 \cdot df}$.

The random variable is shown as $\chi^2$ but may be any upper case letter.

The random variable for a chi-square distribution with k degrees of freedom is the sum of k independent, squared standard normal variables.
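
A short simulation can make this definition concrete. The sketch below, written in Python with only the standard library, builds chi-square draws by summing k squared standard normal values and compares the sample mean and standard deviation with μ = df and σ = √(2·df). The choice k = 4, the seed, the number of draws, and the function name chi_square_draw are illustrative assumptions, not part of the text.

```python
import random
import statistics


def chi_square_draw(k, rng):
    """One draw: the sum of k independent squared standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(11)      # fixed seed so the run is repeatable (arbitrary choice)
k = 4                        # degrees of freedom (illustrative choice)
draws = [chi_square_draw(k, rng) for _ in range(20000)]

sample_mean = statistics.fmean(draws)
sample_sd = statistics.stdev(draws)

print(f"sample mean = {sample_mean:.2f}  (theory: df = {k})")
print(f"sample sd   = {sample_sd:.2f}  (theory: sqrt(2*df) = {(2 * k) ** 0.5:.2f})")
```

Because the draws are random, the sample values should land close to the theoretical ones but will not match them exactly.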

11.3. Facts About the Chi-Square Distribution*

  1. The curve is nonsymmetrical and skewed to the right.

  2. There is a different chi-square curve for each df.

    Figure 11.1. Two nonsymmetrical chi-square curves, each with a different df. (a) The curve begins at (0,∞) and slopes downward to (∞,0). (b) The curve is skewed to the right: the peak is closer to the left and more values are in the tail on the right.


  3. The test statistic for any test is always greater than or equal to zero.

  4. When df > 90, the chi-square curve approximates the normal. For $X \sim \chi^2_{1000}$, the mean is $\mu = df = 1000$ and the standard deviation is $\sigma = \sqrt{2 \cdot 1000} = 44.7$. Therefore, $X \sim N(1000, 44.7)$, approximately.

  5. The mean, μ , is located just to the right of the peak.

    Figure 11.2. 

    Example of how the mean is located to the right of the peak with a nonsymmetrical chi-square curve skewed to the right with the mean on the x-axis.
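
Fact 4 can be checked numerically. The following Python sketch (standard library only) compares chi-square cumulative probabilities for df = 1000 with those of a normal curve with mean 1000 and standard deviation √2000. The helper chi_square_cdf_even_df is a hypothetical name; it relies on a closed form that holds only for even df, which is an assumption of this sketch. The three x values are arbitrary.

```python
import math
from statistics import NormalDist


def chi_square_cdf_even_df(x, df):
    """P(X <= x) for X ~ chi-square with an EVEN number of degrees of freedom.

    Uses 1 - exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, valid only when df is even.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0           # the i = 0 term of the sum
    for i in range(1, df // 2):
        term *= half / i             # builds (x/2)^i / i! step by step
        total += term
    return 1.0 - math.exp(-half) * total


df = 1000
mu = df                              # mean of the chi-square distribution
sigma = math.sqrt(2 * df)            # standard deviation, about 44.7
normal = NormalDist(mu, sigma)

for x in (950, 1000, 1050):
    exact = chi_square_cdf_even_df(x, df)
    approx = normal.cdf(x)
    print(f"x = {x}: chi-square CDF = {exact:.4f}, normal CDF = {approx:.4f}")
```

The two columns should agree closely, with a small gap that reflects the slight right skew still present at df = 1000.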


11.4. Goodness-of-Fit Test*

In this type of hypothesis test, you determine whether the data “fit” a particular distribution or not. For example, you may suspect your unknown data fit a binomial distribution. You use a chi-square test (meaning the distribution for the hypothesis test is chi-square) to determine if there is a fit or not. The null and the alternate hypotheses for this test may be written in sentences or may be stated as equations or inequalities.

The test statistic for a goodness-of-fit test is:

$\sum_{n} \frac{(O - E)^2}{E}$    (11.1)

where:

  • O = observed values (data)

  • E = expected values (from theory)

  • n = the number of different data cells or categories

The observed values are the data values and the expected values are the values you would expect to get if the null hypothesis were true. There are n terms of the form $\frac{(O - E)^2}{E}$.

The degrees of freedom are df = (number of categories - 1).

The goodness-of-fit test is almost always right-tailed. If the observed values and the corresponding expected values are not close to each other, then the test statistic can get very large and will fall far out in the right tail of the chi-square curve.

Example 11.1. 

Absenteeism of college students from math classes is a major concern to math instructors because missing class appears to increase the drop rate. Three statistics instructors wondered whether the absentee rate was the same for every day of the school week. They took a sample of absent students from three of their statistics classes during one week of the term. The results of the survey appear in the table.

Table 11.1.
                        Monday   Tuesday   Wednesday   Thursday   Friday
# of students absent    28       22        18          20         32

Determine the null and alternate hypotheses needed to run a goodness-of-fit test.

Since the instructors wonder whether the absentee rate is the same for every school day, we could say in the null hypothesis that the data “fit” a uniform distribution.

H o : The rate at which college students are absent from their statistics class fits a uniform distribution.

The alternate hypothesis is the opposite of the null hypothesis.

H a : The rate at which college students are absent from their statistics class does not fit a uniform distribution.

Problem 1.

How many students do you expect to be absent on any given school day?

Solution

The total number of students in the sample is 120. If the null hypothesis were true, you would divide 120 by 5 to get 24 absences expected per day. The expected number is based on a true null hypothesis.



Problem 2.

What are the degrees of freedom (df)?

Solution

There are 5 days of the week or 5 “cells” or categories.

df = no. cells - 1 = 5 - 1 = 4
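
The arithmetic in Problems 1 and 2 can be sketched in Python. The last step, the sum of (O − E)²/E, is not carried out in this example and is included here only to illustrate how the statistic would be computed from these data; the variable names are arbitrary.

```python
# Observed absences, Monday through Friday (Table 11.1).
observed = [28, 22, 18, 20, 32]

total = sum(observed)                 # 120 absences in the sample
cells = len(observed)                 # 5 school days
expected_per_day = total / cells      # uniform null hypothesis: 24 per day
df = cells - 1                        # 4 degrees of freedom

# Goodness-of-fit statistic: sum of (O - E)^2 / E over the five cells.
statistic = sum((o - expected_per_day) ** 2 / expected_per_day for o in observed)

print(f"expected per day = {expected_per_day}, df = {df}")
print(f"sum of (O - E)^2 / E = {statistic:.2f}")
```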




Example 11.2. 

Employers particularly want to know which days of the week employees are absent in a five day work week. Most employers would like to believe that employees are absent equally during the week. That is, the average number of times an employee is absent is the same on Monday, Tuesday, Wednesday, Thursday, or Friday. Suppose a sample of 20 absent days was taken and the days absent were distributed as follows:

Table 11.2. Day of the Week Absent
                      Monday   Tuesday   Wednesday   Thursday   Friday
Number of Absences    5        4         2           3          6

Problem

For the population of employees, do the absent days occur with equal frequencies during a five day work week? Test at a 5% significance level.

Solution

The null and alternate hypotheses are:

  • H o : The absent days occur with equal frequencies, that is, they fit a uniform distribution.

  • H a : The absent days occur with unequal frequencies, that is, they do not fit a uniform distribution.

If the absent days occur with equal frequencies, then, out of 20 absent days, there would be 4 absences on Monday, 4 on Tuesday, 4 on Wednesday, 4 on Thursday, and 4 on Friday. These numbers are the expected ( E ) values. The values in the table are the observed ( O ) values or data.

This time, calculate the χ 2 test statistic by hand. Make a chart with the following headings:

  • Expected ( E ) values

  • Observed ( O ) values

  • $(O - E)$

  • $(O - E)^2$

  • $\frac{(O - E)^2}{E}$

Now add (sum) the last column. Verify that the sum is 2.5. This is the $\chi^2$ test statistic.

To find the p-value, calculate $P(\chi^2 > 2.5)$. This test is right-tailed.

The degrees of freedom are df = (number of cells) – 1 = 5 – 1 = 4.

Next, complete a graph like the one below with the proper labeling and shading. (You should shade the right tail. It will be a “large” right tail for this example because the p-value is “large.”)

Blank nonsymmetrical chi-square curve for the test statistic of the days of the week absent.

Use a computer or calculator to find the p-value. You should get p-value = 0.6446.

The decision is to not reject the null hypothesis.

Conclusion: At a 5% level of significance, from the sample data, there is not sufficient evidence to conclude that the absent days do not occur with equal frequencies.

TI-83+ and TI-84: Press 2nd DISTR. Arrow down to χ 2 cdf. Press ENTER. Enter (2.5,1E99,4). Rounded to 4 places, you should see 0.6446 which is the p-value.
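
For readers working on a computer instead of a TI calculator, here is a minimal Python sketch of the same chart and p-value. The right-tail formula used, exp(−x/2)·(1 + x/2), is valid only for df = 4; that shortcut and the variable names are assumptions of this sketch, not a general method.

```python
import math

observed = [5, 4, 2, 3, 6]                                   # Table 11.2, Monday through Friday
expected = [sum(observed) / len(observed)] * len(observed)   # 4 absences per day under H o

# Print the chart described in the example, one row per day.
print(f"{'E':>6}{'O':>6}{'O-E':>8}{'(O-E)^2':>10}{'(O-E)^2/E':>12}")
statistic = 0.0
for o, e in zip(observed, expected):
    diff = o - e
    contribution = diff ** 2 / e
    statistic += contribution                                # running sum of the last column
    print(f"{e:>6.1f}{o:>6d}{diff:>8.1f}{diff ** 2:>10.2f}{contribution:>12.4f}")

df = len(observed) - 1                                       # 5 cells - 1 = 4

# Right-tail area for df = 4 ONLY: P(chi-square > x) = exp(-x/2) * (1 + x/2).
half = statistic / 2
p_value = math.exp(-half) * (1 + half)

print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The final line should reproduce the test statistic of 2.5 and the p-value of 0.6446 found above.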

Note

TI-83+ and some TI-84 calculators do not have a special program for the test statistic for the goodness-of-fit test. The next example (Example 11.3) has the calculator instructions. The newer TI-84 calculators have in STAT TESTS the test Chi2 GOF. To run the test, put the observed values (the data) into a first list and the expected values (the values you expect if the null hypothesis is true) into a second list. Press STAT TESTS and Chi2 GOF. Enter the list names for the Observed list and the Expected list. Enter whatever else is asked and press calculate or draw. Make sure you clear any lists before you start. See below.

Note

To Clear Lists in the calculators: Go into STAT EDIT and arrow up to the list name area of the particular list. Press CLEAR and then arrow down. The list will be cleared. Or, you can press STAT and press 4 (for ClrList). Enter the list name and press ENTER.




Example 11.3. 

One study indicates that the number of televisions that American families have is distributed (this is the given distribution for the American population) as follows:

Table 11.3.
Number of Televisions   Percent
0                       10
1                       16
2                       55
3                       11
over 3                  8

The table contains expected ( E ) percents.

A random sample of 600 families in the far western United States resulted in the following data:

Table 11.4.
Number of Televisions   Frequency
0                       66
1                       119
2                       340
3                       60
over 3                  15
                        Total = 600

The table contains observed ( O ) frequency values.

Problem

At the 1% significance level, does it appear that the distribution “number of televisions” of far western United States families is different from the distribution for the American population as a whole?

Solution

This problem asks you to test whether the far western United States families distribution fits the distribution of the American families. This test is always right-tailed.

The first table contains expected percentages. To get expected ( E ) frequencies, multiply the percentage by 600. The expected frequencies are:

Table 11.5.
Number of Televisions   Percent   Expected Frequency
0                       10        (0.10) ⋅ (600) = 60
1                       16        (0.16) ⋅ (600) = 96
2                       55        (0.55) ⋅ (600) = 330
3                       11        (0.11) ⋅ (600) = 66
over 3                  8         (0.08) ⋅ (600) = 48

Therefore, the expected frequencies are 60, 96, 330, 66, and 48. In the TI calculators, you can let the calculator do the math. For example, instead of 60, enter .10*600.

H o : The “number of televisions” distribution of far western United States families is the same as the “number of televisions” distribution of the American population.

H a : The “number of televisions” distribution of far western United States families is different from the “number of televisions” distribution of the American population.

Distribution for the test: $\chi^2_4$, where df = (the number of cells) – 1 = 5 – 1 = 4.

Note

df ≠ 600 − 1

Calculate the test statistic: $\chi^2 = 29.65$

Graph:

Non-symmetric chi-square curve with values of 0, 4, and 29.65 on the x-axis representing the test statistic of the comparison of the number of televisions in America. A vertical upward line extends from 29.65 to the curve, and the area to the right of this line is equal to the p-value.

Probability statement: p-value = $P(\chi^2 > 29.65) = 0.000006$.

Compare α and the p-value:

  • α = 0.01

  • p-value = 0.000006

So, α > p-value .

Make a decision: Since α > p-value , reject H o .

This means you reject the belief that the distribution for the far western states is the same as that of the American population as a whole.

Conclusion: At the 1% significance level, from the data, there is sufficient evidence to conclude that the “number of televisions” distribution for the far western United States is different from the “number of televisions” distribution for the American population as a whole.

Note

TI-83+ and some TI-84 calculators: Press STAT and ENTER. Make sure to clear lists L1, L2, and L3 if they have data in them (see the note at the end of Example 11.2). Into L1, put the observed frequencies 66, 119, 340, 60, 15. Into L2, put the expected frequencies .10*600, .16*600, .55*600, .11*600, .08*600. Arrow over to list L3 and up to the name area "L3". Enter (L1-L2)^2/L2 and ENTER. Press 2nd QUIT. Press 2nd LIST and arrow over to MATH. Press 5. You should see "sum". Enter L3. Rounded to 2 decimal places, you should see 29.65. Press 2nd DISTR. Press 7 or Arrow down to 7:χ2cdf and press ENTER. Enter (29.65,1E99,4). You should see 5.77E-6 = .000006 (rounded to 6 decimal places), which is the p-value.
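
The same list arithmetic can be sketched in Python for readers without a TI calculator. The helper chi_square_upper_tail_even_df is a hypothetical name; it uses a closed form that holds only when df is even (here df = 4), so it is an assumption of this sketch and not a general replacement for χ2cdf.

```python
import math


def chi_square_upper_tail_even_df(x, df):
    """P(chi-square > x) for EVEN df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = x / 2.0
    term, total = 1.0, 1.0            # the i = 0 term of the sum
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


percents = [10, 16, 55, 11, 8]        # Table 11.3: 0, 1, 2, 3, over 3 televisions
observed = [66, 119, 340, 60, 15]     # Table 11.4
n = sum(observed)                     # 600 families

# Expected frequency = percent of 600, as in Table 11.5.
expected = [p * n / 100 for p in percents]

# Same computation as sum((L1 - L2)^2 / L2) on the calculator.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                # 5 cells - 1 = 4, not 600 - 1
p_value = chi_square_upper_tail_even_df(statistic, df)

print(f"expected = {expected}")
print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.2e}")
```

Because the code keeps the unrounded test statistic, the p-value may differ from the calculator value in the last displayed digit.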




Example 11.4. 

Problem

Suppose you flip two coins 100 times. The results are 20 HH, 27 HT, 30 TH, and 23 TT. Are the coins fair? Test at a 5% significance level.

Solution

This problem can be set up as a goodness-of-fit problem. The sample space for flipping two fair coins is {HH, HT, TH, TT}. Out of 100 flips, you would expect 25 HH, 25 HT, 25 TH, and 25 TT. This is the expected distribution. The question, “Are the coins fair?” is the same as saying, “Does the distribution of the coins (20 HH, 27 HT, 30 TH, 23 TT) fit the expected distribution?”

Random Variable: Let X = the number of heads in one flip of the two coins. X takes on the value 0, 1, 2. (There are 0, 1, or 2 heads in the flip of 2 coins.) Therefore, the number of cells is 3. Since X = the number of heads, the observed frequencies are 20 (for 2 heads), 57 (for 1 head), and 23 (for 0 heads or both tails). The expected frequencies are 25 (for 2 heads), 50 (for 1 head), and 25 (for 0 heads or both tails). This test is right-tailed.

H o : The coins are fair.

H a : The coins are not fair.

Distribution for the test: $\chi^2_2$, where df = 3 – 1 = 2.

Calculate the test statistic: $\chi^2 = 2.14$

Graph:

Nonsymmetrical chi-square curve with values of 0 and 2.14 on the x-axis representing the test statistic of results from flipping a coin. A vertical upward line extends from 2.14 to the curve and the area to the right of this is equal to the p-value.

Probability statement: p-value = $P(\chi^2 > 2.14) = 0.3430$

Compare α and the p-value:

  • α = 0.05

  • p-value = 0.3430

So, α < p-value .

Make a decision: Since α < p-value , do not reject H o .

Conclusion: At a 5% level of significance, there is insufficient evidence to conclude that the coins are not fair.

Note

TI-83+ and some TI-84 calculators: Press STAT and ENTER. Make sure you clear lists L1, L2, and L3 if they have data in them. Into L1, put the observed frequencies 20, 57, 23. Into L2, put the expected frequencies 25, 50, 25. Arrow over to list L3 and up to the name area "L3". Enter (L1-L2)^2/L2 and ENTER. Press 2nd QUIT. Press 2nd LIST and arrow over to MATH. Press 5. You should see "sum". Enter L3. Rounded to 2 decimal places, you should see 2.14. Press 2nd DISTR. Arrow down to 7:χ2cdf (or press 7). Press ENTER. Enter (2.14,1E99,2). Rounded to 4 places, you should see .3430, which is the p-value.
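
The step that turns the four outcomes into three cells is easy to get wrong, so here is a minimal Python sketch of it, followed by the test statistic and p-value. The right-tail formula exp(−x/2) is valid only for df = 2; that shortcut and the variable names are assumptions of this sketch.

```python
import math
from collections import Counter

# Observed results of flipping two coins 100 times.
flips = {"HH": 20, "HT": 27, "TH": 30, "TT": 23}

# Collapse the four outcomes into cells keyed by X = number of heads.
observed = Counter()
for outcome, count in flips.items():
    observed[outcome.count("H")] += count      # HT and TH both land in the cell X = 1

n = sum(flips.values())                        # 100 flips

# For two fair coins: P(X = 0) = 0.25, P(X = 1) = 0.50, P(X = 2) = 0.25.
fair_probability = {0: 0.25, 1: 0.50, 2: 0.25}
expected = {heads: p * n for heads, p in fair_probability.items()}

statistic = sum((observed[h] - expected[h]) ** 2 / expected[h] for h in expected)
df = len(expected) - 1                         # 3 cells - 1 = 2

# Right-tail area for df = 2 ONLY: P(chi-square > x) = exp(-x/2).
p_value = math.exp(-statistic / 2)

print(f"observed = {dict(observed)}, expected = {expected}")
print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
```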

Note

For the newer TI-84 calculators, check STAT TESTS to see if you have Chi2 GOF. If you do, see the calculator instructions (a NOTE) before Example 11.3.




11.5. Test of Independence*

Tests of independence involve using a contingency table of observed (data) values. You first saw a contingency table when you studied probability in the Probability Topics chapter.

The test statistic for a test of independence is similar to that of a goodness-of-fit test:

$\sum_{i \cdot j} \frac{(O - E)^2}{E}$    (11.2)

where:

  • O = observed values

  • E = expected values

  • i = the number of rows in the table

  • j = the number of columns in the table

There are $i \cdot j$ terms of the form $\frac{(O - E)^2}{E}$.

A test of independence determines whether two factors are independent or not. You first encountered the term independence in Chapter 3. As a review, consider the following example.

Example 11.5. 

Suppose A = a speeding violation in the last year and B = a car phone user. If A and B are independent then P ( A AND B ) = P ( A ) P ( B ) . A AND B is the event that a driver received a speeding violation last year and is also a car phone user. Suppose, in a study of drivers who received speeding violations in the last year and who use car phones, that 755 people were surveyed. Out of the 755, 70 had a speeding violation and 685 did not; 305 were car phone users and 450 were not.

Let y = expected number of car phone users who received speeding violations.

If A and B are independent, then P ( A AND B ) = P ( A ) P ( B ) . By substitution, $\frac{y}{755} = \frac{70}{755} \cdot \frac{305}{755}$

Solve for y: $y = \frac{70 \cdot 305}{755} = 28.3$

About 28 people from the sample are expected to be car phone users and to receive speeding violations.

In a test of independence, we state the null and alternate hypotheses in words. Since the contingency table consists of two factors, the null hypothesis states that the factors are independent and the alternate hypothesis states that they are not independent (dependent). If we do a test of independence using the example above, then the null hypothesis is:

H o : Being a car phone user and receiving a speeding violation are independent events.

If the null hypothesis were true, we would expect about 28 people to be car phone users and to receive a speeding violation.

The test of independence is always right-tailed because of the calculation of the test statistic. As with the goodness-of-fit test, if the expected and observed values are not close together, then the test statistic is very large and falls far out in the right tail of the chi-square curve.

The degrees of freedom for the test of independence are:

df = (number of columns - 1)(number of rows - 1)

The following formula calculates the expected number ( E ):

$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}}$


Example 11.6. 

In a volunteer group, adults 21 and older volunteer from one to nine hours each week to spend time with a disabled senior citizen. The program recruits among community college students, four-year college students, and nonstudents. The following table is a sample of the adult volunteers and the number of hours they volunteer per week.

Table 11.6. Number of Hours Worked Per Week by Volunteer Type (Observed)
The table contains observed (O) values (data).
Type of Volunteer             1-3 Hours   4-6 Hours   7-9 Hours   Row Total
Community College Students    111         96          48          255
Four-Year College Students    96          133         61          290
Nonstudents                   91          150         53          294
Column Total                  298         379         162         839

Problem

Is the number of hours volunteered independent of the type of volunteer?

Solution

The observed table and the question at the end of the problem, “Is the number of hours volunteered independent of the type of volunteer?” tell you this is a test of independence. The two factors are number of hours volunteered and type of volunteer. This test is always right-tailed.

H o : The number of hours volunteered is independent of the type of volunteer.

H a : The number of hours volunteered is dependent on the type of volunteer.

The expected table is:

Table 11.7. Number of Hours Worked Per Week by Volunteer Type (Expected)
The table contains expected ( E ) values (data).
Type of Volunteer             1-3 Hours   4-6 Hours   7-9 Hours
Community College Students    90.57       115.19      49.24
Four-Year College Students    103.00      131.00      56.00
Nonstudents                   104.42      132.81      56.77

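For example, the calculation for the expected frequency for the top left cell is $E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}} = \frac{(255)(298)}{839} = 90.57$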

Calculate the test statistic: $\chi^2 = 12.99$ (calculator or computer)

Distribution for the test: $\chi^2_4$

df = (3 columns – 1)(3 rows – 1) = (2)(2) = 4

Graph:

Nonsymmetrical chi-square curve with values of 0 and 12.99 on the x-axis representing the test statistic of number of hours worked by volunteers of different types. A vertical upward line extends from 12.99 to the curve and the area to the right of this is equal to the p-value.

Probability statement: p-value = $P(\chi^2 > 12.99) = 0.0113$

Compare α and the p-value: Since no α is given, assume α = 0.05. p-value = 0.0113. α > p-value.

Make a decision: Since α > p-value, reject H o . This means that the factors are not independent.

Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude that the number of hours volunteered and the type of volunteer are dependent on one another.

For the above example, if there had been another type of volunteer, teenagers, what would the degrees of freedom be?

Note

Calculator instructions follow.

TI-83+ and TI-84 calculator: Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press 3 ENTER 3 ENTER. Enter the table values by row from Example 11.6. Press ENTER after each. Press 2nd QUIT. Press STAT and arrow over to TESTS. Arrow down to C:χ2-TEST. Press ENTER. You should see Observed:[A] and Expected:[B]. Arrow down to Calculate. Press ENTER. The test statistic is 12.9909 and the p-value = 0.0113. Do the procedure a second time but arrow down to Draw instead of Calculate.
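
As an alternative to the matrix procedure, the expected table and test statistic can be sketched in Python. Each expected count is (row total)(column total)/(total surveyed). The right-tail formula exp(−x/2)·(1 + x/2) is valid only for df = 4; that shortcut and the variable names are assumptions of this sketch.

```python
import math

# Observed counts from Table 11.6 (rows: volunteer type; columns: 1-3, 4-6, 7-9 hours).
observed = [
    [111, 96, 48],     # Community College Students
    [96, 133, 61],     # Four-Year College Students
    [91, 150, 53],     # Nonstudents
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of (O - E)^2 / E over every cell of the table.
statistic = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

df = (len(observed[0]) - 1) * (len(observed) - 1)   # (columns - 1)(rows - 1) = 4

# Right-tail area for df = 4 ONLY: P(chi-square > x) = exp(-x/2) * (1 + x/2).
half = statistic / 2
p_value = math.exp(-half) * (1 + half)

for row in expected:
    print("  ".join(f"{e:7.2f}" for e in row))
print(f"chi-square = {statistic:.4f}, df = {df}, p-value = {p_value:.4f}")
```

The printed expected counts should match Table 11.7, and the last line should agree with the calculator output above.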




Example 11.7. 

De Anza College is interested in the relationship between anxiety level and the need to succeed in school. A random sample of 400 students took a test that measured anxiety level and need to succeed in school. The table shows the results. De Anza College wants to know if anxiety level and need to succeed in school are independent events.

Table 11.8. Need to Succeed in School vs. Anxiety Level
Need to Succeed in School   High Anxiety   Med-high Anxiety   Medium Anxiety   Med-low Anxiety   Low Anxiety   Row Total
High Need                   35             42                 53               15                10            155
Medium Need                 18             48                 63               33                31            193
Low Need                    4              5                  11               15                17            52
Column Total                57             95                 127              63                58            400

Problem 1.

How many high anxiety level students are expected to have a high need to succeed in school?

Solution

The column total for a high anxiety level is 57. The row total for high need to succeed in school is 155. The sample size or total surveyed is 400.

$E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}} = \frac{(155)(57)}{400} = 22.09$

The expected number of students who have a high anxiety level and a high need to succeed in school is about 22.



Problem 2.

If the two variables are independent, how many students do you expect to have a low need to succeed in school and a med-low level of anxiety?

Solution

The column total for a med-low anxiety level is 63. The row total for a low need to succeed in school is 52. The sample size or total surveyed is 400.

Problem 3. (Go to Solution)

a. $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}} =$ ________
b. The expected number of students who have a med-low anxiety level and a low need to succeed in school is about: ________




Solutions to Exercises

Solution to Exercise 3. (Return to Problem)

a. $E = \frac{(52)(63)}{400} = 8.19$
b. 8

Glossary

Contingency Table

The method of displaying a frequency distribution as a table with rows and columns to show how two variables may be dependent (contingent) upon each other. The table provides an easy way to calculate conditional probabilities.

11.6. Test of a Single Variance (Optional)*

A test of a single variance assumes that the underlying distribution is normal. The null and alternate hypotheses are stated in terms of the population variance (or population standard deviation). The test statistic is:

$\frac{(n - 1) \cdot s^2}{\sigma^2}$    (11.3)

where:

  • n = the total number of data

  • $s^2$ = sample variance

  • $\sigma^2$ = population variance

You may think of s as the random variable in this test. The degrees of freedom are df = n – 1.

A test of a single variance may be right-tailed, left-tailed, or two-tailed.

The following example will show you how to set up the null and alternate hypotheses. The null and alternate hypotheses contain statements about the population variance.

Example 11.8. 

Problem

Math instructors are not only interested in how their students do on exams, on average, but how the exam scores vary. To many instructors, the variance (or standard deviation) may be more important than the average.

Suppose a math instructor believes that the standard deviation for his final exam is 5 points. One of his best students thinks otherwise. The student claims that the standard deviation is more than 5 points. If the student were to conduct a hypothesis test, what would the null and alternate hypotheses be?

Solution

Even though we are given the population standard deviation, we can set the test up using the population variance as follows.

  • H o : $\sigma^2 = 5^2$

  • H a : $\sigma^2 > 5^2$




Example 11.9. 

Problem

With individual lines at its various windows, a post office finds that the standard deviation for normally distributed waiting times for customers on Friday afternoon is 7.2 minutes. The post office experiments with a single main waiting line and finds that for a random sample of 25 customers, the waiting times for customers have a standard deviation of 3.5 minutes.

With a significance level of 5%, test the claim that a single line causes lower variation among waiting times (more consistent waiting times) for customers.

Solution

Since the claim is that a single line causes lower variation, this is a test of a single variance. The parameter is the population variance, $\sigma^2$, or the population standard deviation, $\sigma$.

Random Variable: The sample standard deviation, s , is the random variable. Let s = standard deviation for the waiting times.

  • H o : $\sigma^2 = 7.2^2$

  • H a : $\sigma^2 < 7.2^2$

The word “lower” tells you this is a left-tailed test.

Distribution for the test: $\chi^2_{24}$, where:

  • n = the number of customers sampled

  • df = n – 1 = 25 – 1 = 24

Calculate the test statistic:

$\chi^2 = \frac{(n - 1) \cdot s^2}{\sigma^2} = \frac{(25 - 1) \cdot 3.5^2}{7.2^2} = 5.67$

where n = 25, s = 3.5, and σ = 7.2.

Graph:

Nonsymmetrical chi-square curve with values of 0 and 5.67 on the x-axis representing the test statistic of waiting times at the post office. A vertical upward line extends from 5.67 to the curve and the area to the left of this is equal to the p-value.

Probability statement: p-value = $P(\chi^2 < 5.67) = 0.000042$

Compare α and the p-value: α = 0.05 and p-value = 0.000042, so α > p-value.

Make a decision: Since α > p-value, reject H o .

This means that you reject $\sigma^2 = 7.2^2$. In other words, you do not think the standard deviation of the waiting times is 7.2 minutes; you think it is lower.

Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude that a single line causes a lower variation among the waiting times or with a single line, the customer waiting times vary less than 7.2 minutes.

TI-83+ and TI-84 calculators: In 2nd DISTR, use 7:χ2cdf. The syntax is (lower, upper, df) for the parameter list. For Example 11.9, χ2cdf(-1E99,5.67,24). The p-value = 0.000042.
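
The left-tailed calculation can also be sketched in Python. The helper chi_square_cdf_even_df is a hypothetical name; it uses a closed form that holds only for even df (here df = 24), so it is an assumption of this sketch and not a general replacement for χ2cdf.

```python
import math


def chi_square_cdf_even_df(x, df):
    """P(chi-square <= x) for EVEN df: 1 - exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0            # the i = 0 term of the sum
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return 1.0 - math.exp(-half) * total


n = 25            # customers sampled
s = 3.5           # sample standard deviation, in minutes
sigma_0 = 7.2     # standard deviation stated in the null hypothesis, in minutes

df = n - 1                                   # 24
statistic = df * s ** 2 / sigma_0 ** 2       # (n - 1) * s^2 / sigma^2

# Left-tailed test: the p-value is the area to the LEFT of the test statistic.
p_value = chi_square_cdf_even_df(statistic, df)

alpha = 0.05
decision = "reject H o" if p_value < alpha else "do not reject H o"
print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.6f}: {decision}")
```

Because the code keeps the unrounded test statistic, the p-value may differ from the calculator value in the last displayed digit.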




11.7. Summary of Formulas*

Formula 11.1. The Chi-square Probability Distribution

$\mu = df$ and $\sigma = \sqrt{2 \cdot df}$


Formula 11.2. Goodness-of-Fit Hypothesis Test

  • Use goodness-of-fit to test whether a data set fits a particular probability distribution.

  • The degrees of freedom are number of cells or categories - 1.

  • The test statistic is $\sum_{n} \frac{(O - E)^2}{E}$, where O = observed values (data), E = expected values (from theory), and n = the number of different data cells or categories.

  • The test is right-tailed.


Formula 11.3. Test of Independence

  • Use the test of independence to test whether two factors are independent or not.

  • The degrees of freedom are equal to (number of columns - 1)(number of rows - 1).

  • The test statistic is $\sum_{i \cdot j} \frac{(O - E)^2}{E}$, where O = observed values, E = expected values, i = the number of rows in the table, and j = the number of columns in the table.

  • The test is right-tailed.

  • If the null hypothesis is true, the expected number $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}}$.


Formula 11.4. Test of a Single Variance

  • Use the test to determine variation.

  • The degrees of freedom are the number of data values - 1, that is, df = n - 1.

  • The test statistic is $\frac{(n - 1) \cdot s^2}{\sigma^2}$, where n = the total number of data, $s^2$ = sample variance, and $\sigma^2$ = population variance.

  • The test may be left, right, or two-tailed.


11.8. Practice 1: Goodness-of-Fit Test*

Student Learning Outcomes

  • The student will explore the properties of goodness-of-fit test data.

Given

The following data are real. The cumulative number of AIDS cases reported for Santa Clara County through December 31, 2003, is broken down by ethnicity as follows:

Table 11.9.
Ethnicity                 Number of Cases
White                     2032
Hispanic                  897
African-American          372
Asian, Pacific Islander   168
Native American           20
                          Total = 3489

The percentage of each ethnic group in Santa Clara County is as follows:

Table 11.10.
Ethnicity                 Percentage of total county population   Number expected (round to 2 decimal places)
White                     47.79%                                  1667.39
Hispanic                  24.15%
African-American          3.55%
Asian, Pacific Islander   24.21%
Native American           0.29%
                          Total = 100%

Expected Results

If the ethnicity of AIDS victims followed the ethnicity of the total county population, fill in the expected number of cases per ethnic group.

Goodness-of-Fit Test

Perform a goodness-of-fit test to determine whether the make-up of AIDS cases follows the ethnicity of the general population of Santa Clara County.

Exercise 11.8.1.

H o :


Exercise 11.8.2.

H a :


Exercise 11.8.3.

Is this a right-tailed, left-tailed, or two-tailed test?


Exercise 11.8.4. (Go to Solution)

degrees of freedom =


Exercise 11.8.5. (Go to Solution)

test statistic =


Exercise 11.8.6. (Go to Solution)

p-value =


Exercise 11.8.7.

Graph the situation. Label and scale the horizontal axis. Mark the mean and test statistic. Shade in the region corresponding to the p-value.


Let α = 0.05

Decision:

Reason for the Decision:

Conclusion (write out in complete sentences):


Discussion Question

Exercise 11.8.8.

Does it appear that the pattern of AIDS cases in Santa Clara County corresponds to the distribution of ethnic groups in this county? Why or why not?


Solutions to Exercises

Solution to Exercise 11.8.4. (Return to Exercise)

degrees of freedom = 4


Solution to Exercise 11.8.5. (Return to Exercise)

1132.12


Solution to Exercise 11.8.6. (Return to Exercise)

Rounded to 4 decimal places, the p-value is 0.0000.


11.9. Practice 2: Contingency Tables*

Student Learning Outcomes

  • The student will explore the properties of contingency tables.

Conduct a hypothesis test to determine if smoking level and ethnicity are independent.

Collect the Data

Copy the data provided in Probability Topics Practice 1: Calculating Probabilities into the table below.

Table 11.11. Smoking Levels by Ethnicity (Observed)
Smoking Level Per Day   African American   Native Hawaiian   Latino   Japanese Americans   White   TOTALS
1-10
11-20
21-30
31+
TOTALS

Hypothesis

State the hypotheses.

  • H o :

  • H a :

Expected Values

Enter expected values in the table above. Round to two decimal places.

Analyze the Data

Calculate the following values:

Exercise 11.9.1. (Go to Solution)

Degrees of freedom =


Exercise 11.9.2. (Go to Solution)

test statistic =


Exercise 11.9.3. (Go to Solution)

p-value =


Exercise 11.9.4. (Go to Solution)

Is this a right-tailed, left-tailed, or two-tailed test? Explain why.


Graph the Data

Exercise 11.9.5.

Graph the situation. Label and scale the horizontal axis. Mark the mean and test statistic. Shade in the region corresponding to the p-value.

Blank graph with vertical and horizontal axes.


Conclusions

State the decision and conclusion (in a complete sentence) for the following preconceived levels of α.

Exercise 11.9.6. (Go to Solution)

α = 0.05

a. Decision:
b. Reason for the decision:
c. Conclusion (write out in a complete sentence):


Exercise 11.9.7.

α = 0.01

a. Decision:
b. Reason for the decision:
c. Conclusion (write out in a complete sentence):


Solutions to Exercises

Solution to Exercise 11.9.1. (Return to Exercise)

 12


Solution to Exercise 11.9.2. (Return to Exercise)

 10301.8


Solution to Exercise 11.9.3. (Return to Exercise)

 0


Solution to Exercise 11.9.4. (Return to Exercise)

 right


Solution to Exercise 11.9.6. (Return to Exercise)

a. Reject the null hypothesis

11.10. Practice 3: Test of a Single Variance*

Student Learning Outcomes

  • The student will explore the properties of data with a test of a single variance.

Given

Suppose an airline claims that its flights are consistently on time with an average delay of at most 15 minutes. It claims that the average delay is so consistent that the variance is no more than 150 minutes. Doubting the consistency part of the claim, a disgruntled traveler calculates the delays for his next 25 flights. The average delay for those 25 flights is 22 minutes with a standard deviation of 15 minutes.

Sample Variance

Exercise 11.10.1.

Is the traveler disputing the claim about the average or about the variance?


Exercise 11.10.2. (Go to Solution)

A sample standard deviation of 15 minutes is the same as a sample variance of __________ minutes.


Exercise 11.10.3.

Is this a right-tailed, left-tailed, or two-tailed test?


Hypothesis Test

Perform a hypothesis test on the consistency part of the claim.

Exercise 11.10.4.

H o :


Exercise 11.10.5.

H a :


Exercise 11.10.6. (Go to Solution)

Degrees of freedom =


Exercise 11.10.7. (Go to Solution)

test statistic =


Exercise 11.10.8. (Go to Solution)

p-value =


Exercise 11.10.9.

Graph the situation. Label and scale the horizontal axis. Mark the mean and test statistic. Shade the p-value.

Blank graph with vertical and horizontal axes.


Exercise 11.10.10.

Let α = 0.05

Decision:

Conclusion (write out in a complete sentence):


Discussion Questions

Exercise 11.10.11.

How did you know to test the variance instead of the mean?


Exercise 11.10.12.

If an additional test were done on the claim of the average delay, which distribution would you use?


Exercise 11.10.13.

If an additional test were done on the claim of the average delay, but 45 flights were surveyed, which distribution would you use?


Solutions to Exercises

Solution to Exercise 11.10.2. (Return to Exercise)

 225


Solution to Exercise 11.10.6. (Return to Exercise)

 24


Solution to Exercise 11.10.7. (Return to Exercise)

 36


Solution to Exercise 11.10.8. (Return to Exercise)

 0.0549
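
The three values above can be checked with a short script. This is only a sketch: it assumes the null hypothesis σ² ≤ 150 and the usual single-variance statistic (n − 1)s²/σ², and `chi2_sf` is a hypothetical hand-written helper for the right-tail area, not a library function.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-square distribution (illustrative helper)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)

n_flights = 25            # flights sampled by the traveler
s = 15.0                  # sample standard deviation, in minutes
sigma_sq_claimed = 150.0  # airline's claimed upper bound on the variance

df = n_flights - 1
test_stat = df * s**2 / sigma_sq_claimed   # (n - 1) s^2 / sigma^2
p_value = chi2_sf(test_stat, df)           # right-tailed test

print(df, test_stat, round(p_value, 4))
```

Because the p-value is slightly above 0.05, the script agrees with a "do not reject" decision at α = 0.05.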


11.11. Homework*

Exercise 11.11.1.

a. Explain why the “goodness of fit” test and the “test for independence” are generally right tailed tests.
b. If you did a left-tailed test, what would you be testing?

Word Problems

For each word problem, use a solution sheet to solve the hypothesis test problem. Go to The Table of Contents 14. Appendix for the solution sheet. Round expected frequency to two decimal places.

Exercise 11.11.2.

A 6-sided die is rolled 120 times. Fill in the expected frequency column. Then, conduct a hypothesis test to determine if the die is fair. The data below are the result of the 120 rolls.

Table 11.12.
Face Value  Frequency  Expected Frequency
1           15
2           29
3           16
4           15
5           30
6           15

Exercise 11.11.3. (Go to Solution)

The marital status distribution of the U.S. male population, age 15 and older, is as shown below. (Source: U.S. Census Bureau, Current Population Reports)

Table 11.13.
Marital Status      Percent  Expected Frequency
never married       31.3
married             56.1
widowed             2.5
divorced/separated  10.1

Suppose that a random sample of 400 U.S. young adult males, 18 – 24 years old, yielded the following frequency distribution. We are interested in whether this age group of males fits the distribution of the U.S. adult population. Calculate the frequency one would expect when surveying 400 people. Fill in the above table, rounding to two decimal places.

Table 11.14.
Marital Status      Frequency
never married       140
married             238
widowed             2
divorced/separated  20

The next two questions refer to the following information. The columns in the chart below contain the Race/Ethnicity of U.S. Public Schools: High School Class of 2009, the percentages for the Advanced Placement Examinee Population for that class and the Overall Student Population. (Source: http://www.collegeboard.com). Suppose the right column contains the result of a survey of 1000 local students from the Class of 2009 who took an AP Exam.

Table 11.15.
Race/Ethnicity                             AP Examinee Population  Overall Student Population  Survey Frequency
Asian, Asian American or Pacific Islander  10.2%                   5.4%                        113
Black or African American                  8.2%                    14.5%                       94
Hispanic or Latino                         15.5%                   15.9%                       136
American Indian or Alaska Native           0.6%                    1.2%                        10
White                                      59.4%                   61.6%                       604
Not reported/other                         6.1%                    1.4%                        43

Exercise 11.11.4.

Perform a goodness-of-fit test to determine whether the local results follow the distribution of the U.S. Overall Student Population based on ethnicity.


Exercise 11.11.5. (Go to Solution)

Perform a goodness-of-fit test to determine whether the local results follow the distribution of the U.S. AP Examinee Population, based on ethnicity.


Exercise 11.11.6.

The City of South Lake Tahoe, CA, has an Asian population of 1419 people, out of a total population of 23,609 (Source: U.S. Census Bureau, Census 2000). Suppose that a survey of 1419 self-reported Asians in the Manhattan, NY, area yielded the data in the table below. Conduct a goodness-of-fit test to determine if the self-reported sub-groups of Asians in the Manhattan area fit that of the Lake Tahoe area.

Table 11.16.
Race          Lake Tahoe Frequency  Manhattan Frequency
Asian Indian  131                   174
Chinese       118                   557
Filipino      1045                  518
Japanese      80                    54
Korean        12                    29
Vietnamese    9                     21
Other         24                    66

The next two questions refer to the following information: UCLA conducted a survey of more than 263,000 college freshmen from 385 colleges in fall 2005. The results of student expected majors by gender were reported in The Chronicle of Higher Education (2/2/2006). Suppose a survey of 5000 graduating females and 5000 graduating males was done as a follow-up in 2010 to determine what their actual major was. The results are shown in the tables for Exercises 7 and 8. The second column in each table does not add to 100% because of rounding.

Exercise 11.11.7. (Go to Solution)

Conduct a hypothesis test to determine if the actual college major of graduating females fits the distribution of their expected majors.

Table 11.17.
Major                Women - Expected Major  Women - Actual Major
Arts & Humanities    14.0%                   670
Biological Sciences  8.4%                    410
Business             13.1%                   685
Education            13.0%                   650
Engineering          2.6%                    145
Physical Sciences    2.6%                    125
Professional         18.9%                   975
Social Sciences      13.0%                   605
Technical            0.4%                    15
Other                5.8%                    300
Undecided            8.0%                    420

Exercise 11.11.8.

Conduct a hypothesis test to determine if the actual college major of graduating males fits the distribution of their expected majors.

Table 11.18.
Major                Men - Expected Major  Men - Actual Major
Arts & Humanities    11.0%                 600
Biological Sciences  6.7%                  330
Business             22.7%                 1130
Education            5.8%                  305
Engineering          15.6%                 800
Physical Sciences    3.6%                  175
Professional         9.3%                  460
Social Sciences      7.6%                  370
Technical            1.8%                  90
Other                8.2%                  400
Undecided            6.6%                  340

Exercise 11.11.9. (Go to Solution)

A recent debate about where in the United States skiers believe the skiing is best prompted the following survey. Test to see if the best ski area is independent of the level of the skier.

Table 11.19.
U.S. Ski Area  Beginner  Intermediate  Advanced
Tahoe          20        30            40
Utah           10        30            60
Colorado       10        40            50

Exercise 11.11.10.

Car manufacturers are interested in whether there is a relationship between the size of car an individual drives and the number of people in the driver’s family (that is, whether car size and family size are independent). To test this, suppose that 800 car owners were randomly surveyed with the following results. Conduct a test for independence.

Table 11.20.
Family Size  Sub & Compact  Mid-size  Full-size  Van & Truck
1            20             35        40         35
2            20             50        70         80
3 - 4        20             50        100        90
5+           20             30        70         70

Exercise 11.11.11. (Go to Solution)

College students may be interested in whether or not their majors have any effect on starting salaries after graduation. Suppose that 300 recent graduates were surveyed as to their majors in college and their starting salaries after graduation. Below are the data. Conduct a test for independence.

Table 11.21.
Major        < $30,000  $30,000 - $39,999  $40,000 +
English      5          20                 5
Engineering  10         30                 60
Nursing      10         15                 15
Business     10         20                 30
Psychology   20         30                 20

Exercise 11.11.12.

Some travel agents claim that honeymoon hot spots vary according to age of the bride and groom. Suppose that 280 East Coast recent brides were interviewed as to where they spent their honeymoons. The information is given below. Conduct a test for independence.

Table 11.22.
Location        20 - 29  30 - 39  40 - 49  50 and over
Niagara Falls   15       25       25       20
Poconos         15       25       25       10
Europe          10       25       15       5
Virgin Islands  20       25       15       5

Exercise 11.11.13. (Go to Solution)

A manager of a sports club keeps information concerning the main sport in which members participate and their ages. To test whether there is a relationship between the age of a member and his or her choice of sport, 643 members of the sports club are randomly selected. Conduct a test for independence.

Table 11.23.
Sport        18 - 25  26 - 30  31 - 40  41 and over
racquetball  42       58       30       46
tennis       58       76       38       65
swimming     72       60       65       33

Exercise 11.11.14.

A major food manufacturer is concerned that the sales for its skinny French fries have been decreasing. As a part of a feasibility study, the company conducts research into the types of fries sold across the country to determine if the type of fries sold is independent of the area of the country. The results of the study are below. Conduct a test for independence.

Table 11.24.
Type of Fries  Northeast  South  Central  West
skinny fries   70         50     20       25
curly fries    100        60     15       30
steak fries    20         40     10       10

Exercise 11.11.15. (Go to Solution)

According to Dan Lenard, an independent insurance agent in the Buffalo, N.Y. area, the following is a breakdown of the amount of life insurance purchased by males in the following age groups. He is interested in whether the age of the male and the amount of life insurance purchased are independent events. Conduct a test for independence.

Table 11.25.
Age of Males  None  $50,000 - $100,000  $100,001 - $150,000  $150,001 - $200,000  $200,000 +
20 - 29       40    15                  40                   0                    5
30 - 39       35    5                   20                   20                   10
40 - 49       20    0                   30                   0                    30
50 +          40    30                  15                   15                   10

Exercise 11.11.16.

Suppose that 600 thirty–year–olds were surveyed to determine whether or not there is a relationship between the level of education an individual has and salary. Conduct a test for independence.

Table 11.26.
Annual Salary      Not a high school grad.  High school graduate  College graduate  Masters or doctorate
< $30,000          15                       25                    10                5
$30,000 - $40,000  20                       40                    70                30
$40,000 - $50,000  10                       20                    40                55
$50,000 - $60,000  5                        10                    20                60
$60,000 +          0                        5                     10                150

Exercise 11.11.17. (Go to Solution)

A plant manager is concerned her equipment may need recalibrating. It seems that the actual weight of the 15 oz. cereal boxes it fills has been fluctuating. The standard deviation should be at most 0.5 oz. In order to determine if the machine needs to be recalibrated, 84 randomly selected boxes of cereal from the next day’s production were weighed. The standard deviation of the 84 boxes was 0.54 oz. Does the machine need to be recalibrated?


Exercise 11.11.18.

Consumers may be interested in whether the cost of a particular calculator varies from store to store. Based on surveying 43 stores, which yielded a sample mean of $84 and a sample standard deviation of $12, test the claim that the standard deviation is greater than $15.


Exercise 11.11.19. (Go to Solution)

Isabella, an accomplished Bay to Breakers runner, claims that the standard deviation for her time to run the 7 ½ mile race is at most 3 minutes. To test her claim, Rupinder looks up 5 of her race times. They are 55 minutes, 61 minutes, 58 minutes, 63 minutes, and 57 minutes.


Exercise 11.11.20.

Airline companies are interested in the consistency of the number of babies on each flight, so that they have adequate safety equipment. They are also interested in the variation of the number of babies. Suppose that an airline executive believes the average number of babies on flights is 6 with a variance of 9 at most. The airline conducts a survey. The results of the 18 flights surveyed give a sample average of 6.4 with a sample standard deviation of 3.9. Conduct a hypothesis test of the airline executive’s belief.


Exercise 11.11.21. (Go to Solution)

According to the U.S. Bureau of the Census, United Nations, in 1994 the number of births per woman in China was 1.8. This fertility rate has been attributed to the law passed in 1979 restricting births to one per woman. Suppose that a group of students studied whether or not the standard deviation of births per woman was greater than 0.75. They asked 50 women across China the number of births they had. Below are the results. Does the students’ survey indicate that the standard deviation is greater than 0.75?

Table 11.27.
# of births  Frequency
0            5
1            30
2            10
3            5

Exercise 11.11.22.

According to an avid aquarist, the average number of fish in a 20-gallon tank is 10, with a standard deviation of 2. His friend, also an aquarist, does not believe that the standard deviation is 2. She counts the number of fish in 15 other 20-gallon tanks. Based on the results that follow, do you think that the standard deviation is different from 2? Data: 11; 10; 9; 10; 10; 11; 11; 10; 12; 9; 7; 9; 11; 10; 11


Exercise 11.11.23. (Go to Solution)

The manager of “Frenchies” is concerned that patrons are not consistently receiving the same amount of French fries with each order. The chef claims that the standard deviation for a 10–ounce order of fries is at most 1.5 oz., but the manager thinks that it may be higher. He randomly weighs 49 orders of fries, which yields: mean of 11 oz., standard deviation of 2 oz.


Try these true/false questions.

Exercise 11.11.24. (Go to Solution)

As the degrees of freedom increase, the graph of the chi-square distribution looks more and more symmetrical.


Exercise 11.11.25. (Go to Solution)

The standard deviation of the chi-square distribution is twice the mean.


Exercise 11.11.26. (Go to Solution)

The mean and the median of the chi-square distribution are the same if .


Exercise 11.11.27. (Go to Solution)

In a Goodness-of-Fit test, the expected values are the values we would expect if the null hypothesis were true.


Exercise 11.11.28. (Go to Solution)

In general, if the observed values and expected values of a Goodness-of-Fit test are not close together, then the test statistic can get very large and on a graph will be way out in the right tail.


Exercise 11.11.29. (Go to Solution)

The degrees of freedom for a Test for Independence are equal to the sample size minus 1.


Exercise 11.11.30. (Go to Solution)

Use a Goodness-of-Fit test to determine if high school principals believe that students are absent equally during the week or not.


Exercise 11.11.31. (Go to Solution)

The Test for Independence uses tables of observed and expected data values.


Exercise 11.11.32. (Go to Solution)

The test to use when determining if the college or university a student chooses to attend is related to his/her socioeconomic status is a Test for Independence.


Exercise 11.11.33. (Go to Solution)

The test to use to determine if a six-sided die is fair is a Goodness-of-Fit test.


Exercise 11.11.34. (Go to Solution)

In a Test of Independence, the expected number is equal to the row total multiplied by the column total divided by the total surveyed.


Exercise 11.11.35. (Go to Solution)

In a Goodness-of-Fit test, if the p-value is 0.0113, in general, do not reject the null hypothesis.


Exercise 11.11.36. (Go to Solution)

For a Chi-Square distribution with degrees of freedom of 17, the probability that a value is greater than 20 is 0.7258.


Exercise 11.11.37. (Go to Solution)

If , the chi-square distribution has a shape that reminds us of the exponential.


Solutions to Exercises

Solution to Exercise 11.11.3. (Return to Exercise)

a. The data fits the distribution
b. The data does not fit the distribution
c. 3
e. 19.27
f. 0.0002
h. Decision: Reject Null; Conclusion: Data does not fit the distribution.
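
The expected frequencies and the values in parts c, e, and f can be reproduced with a few lines of code. The sketch below assumes the usual goodness-of-fit statistic, the sum of (O − E)²/E over the categories, with df equal to the number of categories minus 1; `chi2_sf` is a hypothetical hand-written helper, not a library routine.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-square distribution (illustrative helper)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)

# Percentages from Table 11.13 and sample counts from Table 11.14.
percents = {"never married": 31.3, "married": 56.1,
            "widowed": 2.5, "divorced/separated": 10.1}
observed = {"never married": 140, "married": 238,
            "widowed": 2, "divorced/separated": 20}

n = sum(observed.values())                      # 400 young men surveyed
expected = {k: n * pct / 100 for k, pct in percents.items()}

test_stat = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1
p_value = chi2_sf(test_stat, df)                # right-tailed test

print(expected)
print(df, round(test_stat, 2), round(p_value, 4))
```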

Solution to Exercise 11.11.5. (Return to Exercise)

c. 5
e. 13.4
f. 0.0199
g. Decision: Reject null when α = 0.05; Conclusion: Local data do not fit the AP Examinee Distribution. Decision: Do not reject null when α = 0.01; Conclusion: Local data do fit the AP Examinee Distribution.

Solution to Exercise 11.11.7. (Return to Exercise)

c. 10
e. 11.48
f. 0.3214
h. Decision: Do not reject null when α = 0.05 and α = 0.01; Conclusion: Distribution of majors by graduating females fits the distribution of expected majors.

Solution to Exercise 11.11.9. (Return to Exercise)

c. 4
e. 10.53
f. 0.0324
h. Decision: Reject null; Conclusion: Best ski area and level of skier are not independent.
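
A test of independence like this one can be sketched in code as follows. The sketch assumes the expected count for each cell is (row total)(column total)/(total surveyed) and df = (rows − 1)(columns − 1); `chi2_sf` is a hypothetical hand-written helper for the right-tail area.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-square distribution (illustrative helper)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)

# Observed counts from Table 11.19 (columns: Beginner, Intermediate, Advanced).
observed = [
    [20, 30, 40],  # Tahoe
    [10, 30, 60],  # Utah
    [10, 40, 50],  # Colorado
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

test_stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_sf(test_stat, df)

print(df, round(test_stat, 2), round(p_value, 4))
```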

Solution to Exercise 11.11.11. (Return to Exercise)

c. 8
e. 33.55
f. 0
h. Decision: Reject null; Conclusion: Major and starting salary are not independent events.

Solution to Exercise 11.11.13. (Return to Exercise)

c. 6
e. 25.21
f. 0.0003
h. Decision: Reject null

Solution to Exercise 11.11.15. (Return to Exercise)

c. 12
e. 125.74
f. 0
h. Decision: Reject null

Solution to Exercise 11.11.17. (Return to Exercise)

c. 83
d. 96.81
e. 0.1426
g. Decision: Do not reject null; Conclusion: The standard deviation is at most 0.5 oz.
h. It does not need to be calibrated

Solution to Exercise 11.11.19. (Return to Exercise)

c. 4
d. 4.52
e. 0.3402
g. Decision: Do not reject null.
h. No

Solution to Exercise 11.11.21. (Return to Exercise)

c. 49
d. 54.37
e. 0.2774
g. Decision: Do not reject null; Conclusion: The standard deviation is at most 0.75.
h. No

Solution to Exercise 11.11.23. (Return to Exercise)

a. σ² ≤ (1.5)²
c. 48
d. 85.33
e. 0.0007
g. Decision: Reject null.
h. Yes

Solution to Exercise 11.11.24. (Return to Exercise)

 True


Solution to Exercise 11.11.25. (Return to Exercise)

 False


Solution to Exercise 11.11.26. (Return to Exercise)

 False


Solution to Exercise 11.11.27. (Return to Exercise)

 True


Solution to Exercise 11.11.28. (Return to Exercise)

 True


Solution to Exercise 11.11.29. (Return to Exercise)

 False


Solution to Exercise 11.11.30. (Return to Exercise)

 True


Solution to Exercise 11.11.31. (Return to Exercise)

 True


Solution to Exercise 11.11.32. (Return to Exercise)

 True


Solution to Exercise 11.11.33. (Return to Exercise)

 True


Solution to Exercise 11.11.34. (Return to Exercise)

 True


Solution to Exercise 11.11.35. (Return to Exercise)

 False


Solution to Exercise 11.11.36. (Return to Exercise)

 False


Solution to Exercise 11.11.37. (Return to Exercise)

 True
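
Two of the true/false answers above (Exercises 11.11.25 and 11.11.36) can be checked numerically. The sketch below relies on the standard facts that a chi-square distribution has mean df and standard deviation √(2·df); `chi2_sf` is a hypothetical hand-written helper for the right-tail area.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-square distribution (illustrative helper)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)

df = 17
mean = df                    # mean of a chi-square distribution
sd = math.sqrt(2 * df)       # standard deviation; not twice the mean

right_tail = chi2_sf(20, df)   # P(X > 20)
left_tail = 1 - right_tail     # P(X < 20)

print(mean, round(sd, 2))
print(round(right_tail, 4), round(left_tail, 4))
```

The left-tail area, not the right-tail area, is the one close to 0.7258, which is consistent with the answer "False" for Exercise 11.11.36.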


11.12. Review*

The next two questions refer to the following real study:

A recent survey of U.S. teenage pregnancy was answered by 720 girls, age 12 - 19. 6% of the girls surveyed said they have been pregnant. (Parade Magazine) We are interested in the true proportion of U.S. girls, age 12 - 19, who have been pregnant.

Exercise 11.12.1. (Go to Solution)

Find the 95% confidence interval for the true proportion of U.S. girls, age 12 - 19, who have been pregnant.


Exercise 11.12.2. (Go to Solution)

The report also stated that the results of the survey are accurate to within ± 3.7% at the 95% confidence level. Suppose that a new study is to be done. It is desired to be accurate to within 2% at the 95% confidence level. What will happen to the minimum number that should be surveyed?


Exercise 11.12.3.

Given: X ~ . Sketch the graph that depicts: P(X > 1).


The next four questions refer to the following information:

Suppose that the time that owners keep their cars (purchased new) is normally distributed with a mean of 7 years and a standard deviation of 2 years. We are interested in how long an individual keeps his car (purchased new). Our population is people who buy their cars new.

Exercise 11.12.4. (Go to Solution)

60% of individuals keep their cars at most how many years?


Exercise 11.12.5. (Go to Solution)

Suppose that we randomly survey one person. Find the probability that person keeps his/her car less than 2.5 years.


Exercise 11.12.6. (Go to Solution)

If we are to pick individuals 10 at a time, find the distribution for the average length of car ownership.


Exercise 11.12.7. (Go to Solution)

If we are to pick 10 individuals, find the probability that the sum of their ownership time is more than 55 years.


Exercise 11.12.8. (Go to Solution)

For which distribution is the median not equal to the mean?

A. Uniform
B. Exponential
C. Normal
D. Student-t

Exercise 11.12.9. (Go to Solution)

Compare the standard normal distribution to the student-t distribution, centered at 0. Explain which of the following are true and which are false.

a. As the number surveyed increases, the area to the left of -1 for the student-t distribution approaches the area for the standard normal distribution.
b. As the number surveyed increases, the area to the left of -1 for the standard normal distribution approaches the area for the student-t distribution.
c. As the degrees of freedom decrease, the graph of the student-t distribution looks more like the graph of the standard normal distribution.
d. If the number surveyed is less than 30, the normal distribution should never be used.

The next five questions refer to the following information:

We are interested in the checking account balance of a twenty-year-old college student. We randomly survey 16 twenty-year-old college students. We obtain a sample mean of $640 and a sample standard deviation of $150. Let X = checking account balance of an individual twenty year old college student.

Exercise 11.12.10.

Explain why we cannot determine the distribution of X.


Exercise 11.12.11. (Go to Solution)

If you were to create a confidence interval or perform a hypothesis test for the population average checking account balance of 20-year old college students, what distribution would you use?


Exercise 11.12.12. (Go to Solution)

Find the 95% confidence interval for the true average checking account balance of a twenty-year-old college student.


Exercise 11.12.13. (Go to Solution)

What type of data is the balance of the checking account considered to be?


Exercise 11.12.14. (Go to Solution)

What type of data is the number of 20 year olds considered to be?


Exercise 11.12.15. (Go to Solution)

On average, a busy emergency room gets a patient with a shotgun wound about once per week. We are interested in the number of patients with a shotgun wound the emergency room gets per 28 days.

a. Define the random variable X.
b. State the distribution for X.
c. Find the probability that the emergency room gets no patients with shotgun wounds in the next 28 days.

The next two questions refer to the following information:

The probability that a certain slot machine will pay back money when a quarter is inserted is 0.30. Assume that each play of the slot machine is independent of the others. A person puts in 15 quarters for 15 plays.

Exercise 11.12.16. (Go to Solution)

Is the expected number of plays of the slot machine that will pay back money greater than, less than or the same as the median? Explain your answer.


Exercise 11.12.17. (Go to Solution)

Is it likely that exactly 8 of the 15 plays would pay back money? Justify your answer numerically.


Exercise 11.12.18. (Go to Solution)

A game is played with the following rules:

  • it costs $10 to enter

  • a fair coin is tossed 4 times

  • if you do not get 4 heads or 4 tails, you lose your $10

  • if you get 4 heads or 4 tails, you get back your $10, plus $30 more

Over the long run of playing this game, what are your expected earnings?


Exercise 11.12.19. (Go to Solution)

  • The average grade on a math exam in Rachel’s class was 74, with a standard deviation of 5. Rachel earned an 80.

  • The average grade on a math exam in Becca’s class was 47, with a standard deviation of 2. Becca earned a 51.

  • The average grade on a math exam in Matt’s class was 70, with a standard deviation of 8. Matt earned an 83.

Find whose score was the best, compared to his or her own class. Justify your answer numerically.


The next two questions refer to the following information:

70 compulsive gamblers were asked the number of days they go to casinos per week. The results are given in the following graph:

Figure 11.3. 

Histogram of 5 bars with relative frequency on the y-axis, from 0.1-0.3 in increments of 0.1, and number of days on the x-axis, from 0-7 in increments of 1. No bars are present for 4 or 6.


Exercise 11.12.20. (Go to Solution)

Find the number of responses that were “5”.


Exercise 11.12.21. (Go to Solution)

Find the mean, standard deviation, all four quartiles and IQR.


Exercise 11.12.22. (Go to Solution)

Based upon research at De Anza College, it is believed that about 19% of the student population speaks a language other than English at home.

Suppose that a study was done this year to see if that percent has decreased. Ninety-eight students were randomly surveyed with the following results. Fourteen said that they speak a language other than English at home.

a. State an appropriate null hypothesis.
b. State an appropriate alternate hypothesis.
c. Define the Random Variable, P’.
d. Calculate the test statistic.
e. Calculate the p-value.
f. At the 5% level of decision, what is your decision about the null hypothesis?
g. What is the Type I error?
h. What is the Type II error?

Exercise 11.12.23.

Assume that you are an emergency paramedic called in to rescue victims of an accident. You need to help a patient who is bleeding profusely. The patient is also considered to be a high risk for contracting AIDS. Assume that the null hypothesis is that the patient does not have the HIV virus. What is a Type I error?


Exercise 11.12.24. (Go to Solution)

It is often said that Californians are more casual than the rest of Americans. Suppose that a survey was done to see if the proportion of Californian professionals that wear jeans to work is greater than the proportion of non-Californian professionals. Fifty of each was surveyed with the following results. 10 Californians wear jeans to work and 4 non-Californians wear jeans to work.

  • C = Californian professional

  • = non-Californian professional

a. State appropriate null and alternate hypotheses.
b. Define the Random Variable.
c. Calculate the test statistic and p-value.
d. At the 5% level of decision, do you accept or reject the null hypothesis?
e. What is the Type I error?
f. What is the Type II error?

The next two questions refer to the following information:

A group of Statistics students have developed a technique that they feel will lower their anxiety level on statistics exams. They measured their anxiety level at the start of the quarter and again at the end of the quarter. Recorded is the paired data in that order: (1000, 900); (1200, 1050); (600, 700); (1300, 1100); (1000, 900); (900, 900).

Exercise 11.12.25. (Go to Solution)

This is a test of (pick the best answer):

A. large samples, independent means
B. small samples, independent means
C. dependent means

Exercise 11.12.26. (Go to Solution)

State the distribution to use for the test.


Solutions to Exercises

Solution to Exercise 11.12.1. (Return to Exercise)

(0.0424, 0.0770)


Solution to Exercise 11.12.2. (Return to Exercise)

2401
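
The two answers above can be reproduced as follows. This is a sketch under stated assumptions: the interval uses x = 43 respondents (6% of 720, rounded to a whole person, which is what matches the printed interval), and the sample-size calculation uses the conservative value p′ = 0.5 in n = z²p′q′/E².

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)   # critical value for 95% confidence

# Exercise 11.12.1: confidence interval for a proportion.
n = 720
x = 43                            # assumption: 6% of 720, rounded to a whole number
p_hat = x / n
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)

# Exercise 11.12.2: minimum sample size for a 2% error bound,
# using the conservative guess p' = q' = 0.5.
error_bound = 0.02
n_needed = math.ceil(z**2 * 0.5 * 0.5 / error_bound**2)

print(tuple(round(v, 4) for v in ci))
print(n_needed)
```

Tightening the error bound from 3.7% to 2% raises the minimum sample size above the 720 girls in the original survey.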


Solution to Exercise 11.12.4. (Return to Exercise)

7.5


Solution to Exercise 11.12.5. (Return to Exercise)

0.0122


Solution to Exercise 11.12.6. (Return to Exercise)

N(7, 0.63)


Solution to Exercise 11.12.7. (Return to Exercise)

0.9911
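
The answers to Exercises 11.12.4 through 11.12.7 can be checked with Python's standard-library `statistics.NormalDist`. The sketch assumes the ten ownership times are independent, so the sample mean has standard deviation 2/√10 and the sum has standard deviation 2√10.

```python
import math
from statistics import NormalDist

ownership = NormalDist(mu=7, sigma=2)      # years an owner keeps a new car

p60 = ownership.inv_cdf(0.60)              # 60th percentile
p_less = ownership.cdf(2.5)                # P(X < 2.5)

n = 10
mean_dist = NormalDist(mu=7, sigma=2 / math.sqrt(n))      # distribution of the sample mean
sum_dist = NormalDist(mu=n * 7, sigma=2 * math.sqrt(n))   # distribution of the sum
p_sum = 1 - sum_dist.cdf(55)               # P(sum > 55)

print(round(p60, 1), round(p_less, 4))
print(round(mean_dist.stdev, 2), round(p_sum, 4))
```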


Solution to Exercise 11.12.8. (Return to Exercise)

B


Solution to Exercise 11.12.9. (Return to Exercise)

a. True
b. False
c. False
d. False

Solution to Exercise 11.12.11. (Return to Exercise)

student-t with df = 15


Solution to Exercise 11.12.12. (Return to Exercise)

(560.07, 719.93)


Solution to Exercise 11.12.13. (Return to Exercise)

quantitative - continuous


Solution to Exercise 11.12.14. (Return to Exercise)

quantitative - discrete


Solution to Exercise 11.12.15. (Return to Exercise)

b. P(4)
c. 0.0183

Solution to Exercise 11.12.16. (Return to Exercise)

greater than


Solution to Exercise 11.12.17. (Return to Exercise)

No; P(X = 8) = 0.0348
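
The Poisson and binomial answers to Exercises 11.12.15 through 11.12.17 can be verified with the probability formulas written out directly. The helper names `poisson_pmf` and `binom_pmf` are hypothetical, chosen only for this sketch.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def binom_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Exercise 11.12.15: one patient per week on average, so 4 per 28 days.
p_none = poisson_pmf(0, 4)

# Exercises 11.12.16 and 11.12.17: 15 plays, each paying back with probability 0.30.
n, p = 15, 0.30
mean_plays = n * p
p_eight = binom_pmf(8, n, p)

# Median: smallest k whose cumulative probability reaches 0.5.
cumulative = 0.0
for k in range(n + 1):
    cumulative += binom_pmf(k, n, p)
    if cumulative >= 0.5:
        median = k
        break

print(round(p_none, 4), round(p_eight, 4))
print(mean_plays, median)
```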


Solution to Exercise 11.12.18. (Return to Exercise)

You will lose $5
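
The expected value can be worked out exactly with fractions. The sketch assumes "earnings" means net gain per game: +$30 when all four tosses match and −$10 otherwise.

```python
from fractions import Fraction

tosses = 4
# Only two of the 2**4 equally likely outcomes win: HHHH and TTTT.
p_win = Fraction(2, 2**tosses)

net_if_win = 30     # the $10 entry fee is returned, plus $30 more
net_if_lose = -10   # the entry fee is lost

expected_earnings = p_win * net_if_win + (1 - p_win) * net_if_lose
print(p_win, expected_earnings)
```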


Solution to Exercise 11.12.19. (Return to Exercise)

Becca
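
The numerical justification is a comparison of z-scores, (score − class mean) / class standard deviation. A minimal sketch:

```python
# (score, class mean, class standard deviation) from Exercise 11.12.19
exams = {
    "Rachel": (80, 74, 5),
    "Becca": (51, 47, 2),
    "Matt": (83, 70, 8),
}

# z-score: how many standard deviations each score lies above its class mean.
z_scores = {name: (score - mean) / sd for name, (score, mean, sd) in exams.items()}
best = max(z_scores, key=z_scores.get)

print(z_scores)
print(best)
```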


Solution to Exercise 11.12.20. (Return to Exercise)

14


Solution to Exercise 11.12.21. (Return to Exercise)

  • Mean = 3.2

  • Quartiles = 1.85, 2, 3, and 5

  • IQR = 3


Solution to Exercise 11.12.22. (Return to Exercise)

d. z = −1.19
e. 0.1171
f. Do not reject the null

Solution to Exercise 11.12.24. (Return to Exercise)

c. z = 1.73; p = 0.0419
d. Reject the null
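
The z statistics and p-values in the solutions to Exercises 11.12.22 and 11.12.24 can be reproduced as sketched below. The sketch assumes a left-tailed one-proportion test with the null value 0.19 in the standard error, and a right-tailed two-proportion test with a pooled proportion.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

# Exercise 11.12.22: one proportion, left-tailed (Ha: p < 0.19).
p0, x, n = 0.19, 14, 98
z_one = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)
p_one = std_normal.cdf(z_one)

# Exercise 11.12.24: two proportions, right-tailed
# (Ha: Californian proportion > non-Californian proportion).
x1, n1 = 10, 50   # Californians who wear jeans to work
x2, n2 = 4, 50    # non-Californians who wear jeans to work
pooled = (x1 + x2) / (n1 + n2)
std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z_two = (x1 / n1 - x2 / n2) / std_error
p_two = 1 - std_normal.cdf(z_two)

print(round(z_one, 2), round(p_one, 4))
print(round(z_two, 2), round(p_two, 4))
```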

Solution to Exercise 11.12.25. (Return to Exercise)

C


Solution to Exercise 11.12.26. (Return to Exercise)

t₅


11.13. Lab 1: Chi-Square Goodness-of-Fit*

Class Time:

Names:

Student Learning Outcome:

  • The student will evaluate data collected to determine if they fit either the uniform or exponential distributions.

Collect the Data

Go to your local supermarket. Ask 30 people as they leave for the total amount on their grocery receipts. (Or, ask 3 cashiers for the last 10 amounts. Be sure to include the express lane, if it is open.)

  1. Record the values.

    Table 11.28.
    __________________________________________________
    __________________________________________________
    __________________________________________________
    __________________________________________________
    __________________________________________________
    __________________________________________________

  2. Construct a histogram of the data. Make 5 - 6 intervals. Sketch the graph using a ruler and pencil. Scale the axes.

    Figure 11.4. 

    Blank graph with relative frequency on vertical


  3. Calculate the following:

    a. x̄ =
    b. s =
    c. s² =

Uniform Distribution

Test to see if grocery receipts follow the uniform distribution.

  1. Using your lowest and highest values, X ~

  2. Divide the distribution above into fifths.

  3. Calculate the following:

    a. Lowest value =
    b. 20th percentile =
    c. 40th percentile =
    d. 60th percentile =
    e. 80th percentile =
    f. Highest value =

  4. For each fifth, count the observed number of receipts and record it. Then determine the expected number of receipts and record that.

    Table 11.29.
    Fifth  Observed  Expected
    1st  
    2nd  
    3rd  
    4th  
    5th  

  5. H o :

  6. H a :

  7. What distribution should you use for a hypothesis test?

  8. Why did you choose this distribution?

  9. Calculate the test statistic.

  10. Find the p-value.

  11. Sketch a graph of the situation. Label and scale the x-axis. Shade the area corresponding to the p-value.

    Figure 11.5. 

    Blank graph with vertical and horizontal axes.


  12. State your decision.

  13. State your conclusion in a complete sentence.
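
The uniform-distribution steps above can be sketched in code. Everything about the data here is hypothetical: the 30 receipt totals are made up for illustration and should be replaced with your own. The sketch also assumes df = 5 − 1 = 4 (number of fifths minus 1), and `chi2_sf` is a hand-written helper, not a library function.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-square distribution (illustrative helper)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)

# Hypothetical receipt totals in dollars (NOT real data).
receipts = [12.50, 8.75, 45.20, 23.10, 67.80, 5.40, 31.25, 18.90, 52.60, 14.30,
            27.45, 9.95, 38.70, 21.15, 74.30, 16.80, 11.20, 42.90, 29.60, 7.85,
            58.40, 19.75, 33.10, 13.65, 24.80, 48.15, 10.40, 36.25, 15.90, 84.40]

low, high = min(receipts), max(receipts)     # X ~ U(low, high)
width = (high - low) / 5
# Lowest value, 20th, 40th, 60th, 80th percentiles, highest value.
cuts = [low + i * width for i in range(6)]

# Count the receipts falling in each fifth (the highest value goes in the 5th).
observed = [0] * 5
for r in receipts:
    observed[min(int((r - low) / width), 4)] += 1

expected = [len(receipts) / 5] * 5           # equal counts if uniform

test_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(test_stat, df)

print([round(c, 2) for c in cuts])
print(observed, expected)
print(round(test_stat, 2), round(p_value, 4))
```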

Exponential Distribution

Test to see if grocery receipts follow the exponential distribution with decay parameter 1/x̄.

  1. Using 1/x̄ as the decay parameter, X ~

  2. Calculate the following:

    a. Lowest value =
    b. First quartile =
    c. 37th percentile =
    d. Median =
    e. 63rd percentile =
    f. 3rd quartile =
    g. Highest value =

  3. For each cell, count the observed number of receipts and record it. Then determine the expected number of receipts and record that.

    Table 11.30.
    Cell  Observed  Expected
    1st  
    2nd  
    3rd  
    4th  
    5th  
    6th  

  4. H o :

  5. H a :

  6. What distribution should you use for a hypothesis test?

  7. Why did you choose this distribution?

  8. Calculate the test statistic.

  9. Find the p-value.

  10. Sketch a graph of the situation. Label and scale the x-axis. Shade the area corresponding to the p-value.

    Figure 11.6. 

    Blank graph with vertical and horizontal axes.


  11. State your decision.

  12. State your conclusion in a complete sentence.
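
The cell boundaries and expected counts for the exponential test can be sketched as follows. The sketch assumes the decay parameter is estimated as 1 divided by the sample mean, uses the exponential percentile formula x = −ln(1 − p)/m, and treats the first cell as starting at 0 and the last cell as open-ended. The sample mean shown is hypothetical; substitute your own.

```python
import math

n = 30                  # number of receipts collected
sample_mean = 28.50     # hypothetical sample mean in dollars (NOT real data)
rate = 1 / sample_mean  # estimated decay parameter m

def exp_percentile(p, rate):
    """Value below which a fraction p of an exponential distribution falls."""
    return -math.log(1 - p) / rate

# Cumulative probabilities at the boundaries listed in the lab:
# first quartile, 37th percentile, median, 63rd percentile, third quartile.
cum_probs = [0.25, 0.37, 0.50, 0.63, 0.75]
cuts = [exp_percentile(p, rate) for p in cum_probs]

# Six cells: below Q1, Q1 to 37th, 37th to median, median to 63rd,
# 63rd to Q3, above Q3.
edges = [0.0] + cum_probs + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(edges, edges[1:])]

print([round(c, 2) for c in cuts])
print([round(e, 1) for e in expected])
```

The observed counts come from tallying your receipts between these boundaries, after which the test statistic is computed exactly as in the uniform case.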

Discussion Questions

  1. Did your data fit either distribution? If so, which?

  2. In general, do you think it’s likely that data could fit more than one distribution? In complete sentences, explain why or why not.

11.14. Lab 2: Chi-Square Test for Independence*

Class Time:

Names:

Student Learning Outcome:

  • The student will evaluate if there is a significant relationship between favorite type of snack and gender.

Collect the Data

  1. Using your class as a sample, complete the following chart.

    Table 11.31. Favorite type of snack
            sweets (candy & baked goods)  ice cream  chips & pretzels  fruits & vegetables  Total
    male     
    female     
    Total     

  2. Looking at the above chart, does it appear to you that there is dependence between gender and favorite type of snack food? Why or why not?

Hypothesis Test

Conduct a hypothesis test to determine if the factors are independent.

  1. H o :

  2. H a :

  3. What distribution should you use for a hypothesis test?

  4. Why did you choose this distribution?

  5. Calculate the test statistic.

  6. Find the p-value.

  7. Sketch a graph of the situation. Label and scale the x-axis. Shade the area corresponding to the p-value.

    Figure 11.7. 

    Blank graph with vertical and horizontal axes.


  8. State your decision.

  9. State your conclusion in a complete sentence.

Discussion Questions

  1. Is the conclusion of your study the same as or different from your answer to question 2 under Collect the Data above?

  2. Why do you think that occurred?