## Testing hypotheses, finding confidence intervals and explaining the results

The following solutions cover formulating and testing hypotheses, finding confidence intervals, and explaining each result. For each data set we state the null and alternative hypotheses, then reject or fail to reject the null hypothesis based on the test statistic and p-value.

**Formulating and testing hypothesis at 5% significance level**

**Question**

For each problem, part “a” asks for a hypothesis test. You need to list the five steps of the hypothesis test on your answer sheet.

Note: Make sure you label your test statistic as Z or T and write out full conclusions.

Assume populations are approximately normally distributed when it is necessary in order to complete the problem.

1) Many students struggle with the math portion of the college entrance exam. A local high school creates a new course in hopes of helping students pass the math portion of the college entrance exam. They take a sample of students who did not pass the entrance exam on the first try and put them through the course. After they complete the course they take the entrance exam again. The results are below (passing is a score of 120):

a) At a 5% significance level, is there sufficient evidence to conclude that the average entrance exam score before the course is less than the average entrance exam score after the course?

H0: **Average score before the course >= Average score after the course**

Ha: **Average score before the course < Average score after the course**

T.S: **T = -6.23**

P-value: **0.0002**

Conclusion:

Since the p-value is less than 5%, we reject H0 in favor of Ha.

**So YES, the data provide sufficient evidence to conclude that the average entrance exam score before the course is less than the average entrance exam score after the course.**

**Finding the 95% confidence interval**

b) Find a 95% confidence interval for the true mean difference in the mean entrance exam score before and after the course is taken.

95% confidence interval for the true mean difference in the mean entrance exam score before and after the course is taken is:

( -15.0055, -6.7445 )

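NOTE: we already have m = mean(x) – mean(y) = -10.875 and s = 1.746808, where s is the standard error of the mean difference (the standard deviation of the differences divided by the square root of n),

so the required interval is given by m ± quantile × s, where quantile = 2.364624 is the upper 97.5% quantile of the t distribution with 7 degrees of freedom.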
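The arithmetic in this note can be checked with a short Python sketch. It reuses only the three numbers quoted above; the variable names are illustrative, and treating s as the standard error of the mean difference is an assumption that is consistent with T = m/s ≈ -6.23.

```python
# Values quoted in the note above; the names are illustrative.
mean_diff = -10.875     # m = mean(x) - mean(y)
std_error = 1.746808    # s, assumed to be the standard error of the mean difference
t_quantile = 2.364624   # upper 97.5% quantile of the t distribution, 7 degrees of freedom

# Test statistic for the paired test: T = m / s
t_stat = mean_diff / std_error

# Two-sided 95% interval: m +/- quantile * s
margin = t_quantile * std_error
ci_lower = mean_diff - margin
ci_upper = mean_diff + margin

print(f"T = {t_stat:.2f}")
print(f"95% CI = ({ci_lower:.4f}, {ci_upper:.4f})")
```

Both the test statistic and the interval come from the same two quantities, m and s; only the cutoff they are compared against differs.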

**Explaining the results**

c) Give a practical conclusion from your interval in part “b”?

We are 95% confident that the true mean difference in entrance exam score (before minus after) lies between -15.0055 and -6.7445. Because the whole interval is below zero, scores after the course are on average about 6.7 to 15.0 points higher than scores before it. Here "95% confident" means that if we drew random samples many times and built an interval from each one, about 95% of those intervals would contain the true mean difference.

d) Did your interval in part “b” HAVE to give you the same conclusion as your test in part “a”? Why or why not?

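NO.

The interval in part “b” need not give the same conclusion as the test in part “a”.

Part b gives the two-sided interval m ± c × s, where m is the observed mean of the differences, s is its standard error, and c is the upper 97.5% quantile of the t distribution. Part a uses T = m/s and rejects H0 if T < c*, where c* is the lower 5% quantile.

Because the test is one-sided at the 5% level while the interval is two-sided at the 95% level, the two cutoffs differ (|c*| < c), so the test can reject H0 even when the 95% interval contains zero.

So parts a and b don’t need to reach the same conclusion. In this problem they happen to agree: the test rejects H0 and the interval lies entirely below zero.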
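The gap between the two cutoffs can be written out as a short derivation. It assumes 7 degrees of freedom (the value implied by the quantile 2.364624 used in part “b”), and 1.895 is the rounded t-table value for a one-sided 5% test; the symbols m and s follow the note in part “b”.

$$
\begin{aligned}
\text{reject } H_0 \text{ at } 5\% &\iff T = \frac{m}{s} < -t_{0.95,\,7} \approx -1.895 \\
&\iff m + 1.895\,s < 0 \\
&\iff \text{the upper limit of the two-sided } 90\% \text{ interval is below } 0 .
\end{aligned}
$$

The 95% interval uses the larger quantile $t_{0.975,\,7} \approx 2.365$, so a value of T between -2.365 and -1.895 would reject H0 while the 95% interval still contained zero.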

**Formulating and testing hypothesis at 10% significance level**

2) In 2015, the average number of hours Americans spent on their phone per day was 4.8 hours. At that time the standard deviation was 1.2 hours. Assume the population standard deviation now is also 1.2 hours. A recent random sample of Americans was taken and their time on the phone in the previous 24 hours was recorded below:

3.2 5.4 6.3 2.8 5.8 7.0 5.7 6.8

a) At a 10% significance level, is there sufficient evidence to conclude that the average number of hours Americans are on their phones per day is now greater than 4.8 hours?

H0: the average number of hours Americans are on their phones per day <= 4.8

Ha: the average number of hours Americans are on their phones per day > 4.8

T.S : Z = 1.36

P-value: 0.0877

Conclusion:

Since the p-value (0.0877) is less than 10%, we reject H0. That means we have sufficient evidence to conclude that the average number of hours Americans are on their phones per day is now greater than 4.8 hours.

b) Find a 95% confidence interval for the mean amount of time Americans spend on their phones per day recently.

95% confidence interval for the mean amount of time Americans spend on their phones per day recently is:

**(4.5436, 6.2065)**
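Parts “a” and “b” can be reproduced with Python's standard library. This is a sketch rather than the method used to produce the answers above: the variable names are illustrative, and `statistics.NormalDist` requires Python 3.8 or later. The results should agree with the reported values up to rounding in the last digit.

```python
from math import sqrt
from statistics import NormalDist, mean

hours = [3.2, 5.4, 6.3, 2.8, 5.8, 7.0, 5.7, 6.8]  # sample from the problem
mu0 = 4.8     # mean under H0
sigma = 1.2   # population standard deviation (given)

n = len(hours)
x_bar = mean(hours)
std_error = sigma / sqrt(n)

# Right-tailed z test: Ha says the mean is greater than 4.8
z_stat = (x_bar - mu0) / std_error
p_value = 1 - NormalDist().cdf(z_stat)

# Two-sided 95% interval: x_bar +/- z_{0.975} * sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(0.975)
ci = (x_bar - z_crit * std_error, x_bar + z_crit * std_error)

print(f"Z = {z_stat:.2f}, p-value = {p_value:.4f}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

A Z statistic is used here, not T, because the population standard deviation is taken as known.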

**Interpreting the results**

c) Interpret the interval you found in part “b”.

We are 95% confident that the mean amount of time Americans currently spend on their phones per day lies between 4.5436 and 6.2065 hours. Here "95% confident" means that if we drew random samples many times and built an interval from each one, about 95% of those intervals would contain the true mean.

**Formulating and testing hypothesis at 1% significance level**

3) In 2019, 52% of Americans had some form of streaming service. A sample of 800 Americans was taken in early 2021 and found that 57% had some form of streaming service.

a) At a 1% significance level, is there sufficient evidence to conclude that in the year 2021 more than 52% of Americans have some form of streaming service?

Let p = the true proportion of Americans having some form of streaming service.

H0: p <= 52%

Ha: p > 52%

T.S: Z = 2.83

P-value: 0.0023

Conclusion:

Since the p-value (0.0023) is less than 1%, we **reject H0**. That means we have sufficient evidence to conclude that in the year 2021 more than 52% of Americans have some form of streaming service.

**Finding the 90% CI**

b) Find a 90% confidence interval for the proportion of all Americans in the year 2021 that have some form of streaming service.

90% confidence interval is: **(0.5412, 0.5988)**, using the sample proportion 0.57 in the standard error.
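The test in part “a” and the interval in part “b” can be sketched with the standard library as follows. The variable names are illustrative. The sketch assumes the usual convention that the test's standard error is computed under H0 (with 0.52) while the interval's standard error uses the sample proportion (0.57).

```python
from math import sqrt
from statistics import NormalDist

n = 800
p_hat = 0.57   # sample proportion in 2021
p0 = 0.52      # proportion under H0 (the 2019 value)

# Right-tailed z test; the standard error uses p0 because it is computed under H0
se_null = sqrt(p0 * (1 - p0) / n)
z_stat = (p_hat - p0) / se_null
p_value = 1 - NormalDist().cdf(z_stat)

# 90% Wald interval; the standard error uses the sample proportion p_hat
se_hat = sqrt(p_hat * (1 - p_hat) / n)
z_crit = NormalDist().inv_cdf(0.95)
ci = (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)

print(f"Z = {z_stat:.2f}, p-value = {p_value:.4f}")
print(f"90% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
print("p-value below 1%:", p_value < 0.01)
```

If the null value 0.52 were used in the interval's standard error instead of 0.57, each limit would move outward by about 0.0003.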

**Elaborating the results**

c) Interpret your interval in part “b”.

We are 90% confident that the true proportion of Americans having some form of streaming service in 2021 lies within the interval found in part “b”. Here "90% confident" means that if we drew random samples many times and built an interval from each one, about 90% of those intervals would contain the true proportion.

d) From your interval, can you conclude that the true proportion of Americans in the year 2021 that have some form of streaming service is less than 58%? Why or why not? Explain clearly.

**NO, we can’t** conclude that the true proportion of Americans in the year 2021 that have some form of streaming service is less than 58%. The upper limit of the interval is about 0.599, which is above 0.58, so values of 58% and higher remain plausible for the true proportion at the 90% confidence level. We could draw that conclusion only if the entire interval were below 0.58.

**Formulating and testing hypothesis at 10% significance level**

4) Are people getting worse cases of shingles? In the year 2000, the average number of weeks to recover from shingles was 3.7 weeks. Below is a sample of times to recovery for people who got shingles in 2020.

3.8 3.4 3.9 4.0 3.5 4.2 4.1

a) At a 10% significance level, is there sufficient evidence to conclude that the average recovery time for people with shingles in 2020 is more than 3.7 weeks?

H0: Average recovery times for people with shingles in 2020 <= 3.7

Ha: Average recovery times for people with shingles in 2020 > 3.7

T.S: T = 1.26

P-value: 0.1267

Conclusion:

Since the p-value (0.1267) is greater than 10%, we **fail to reject H0**. That means we don’t have sufficient evidence to conclude that the average recovery time for people with shingles in 2020 is more than 3.7 weeks.

b) Find a 99% confidence interval for the true mean number of weeks for recovery from shingles in 2020.

99% confidence interval is: (3.4236, 4.2621), using the t quantile with n – 1 = 6 degrees of freedom.
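The one-sample t statistic and the 99% interval can be sketched from the seven observations as follows. Python's standard library has no t distribution, so the quantile is hard-coded; its value (3.707428, for 6 degrees of freedom) is an assumption copied from a t table, and the names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

weeks = [3.8, 3.4, 3.9, 4.0, 3.5, 4.2, 4.1]  # sample from the problem
mu0 = 3.7                                     # mean under H0

n = len(weeks)
x_bar = mean(weeks)
std_error = stdev(weeks) / sqrt(n)   # sample standard deviation / sqrt(n)

# One-sample t statistic, compared with a t distribution on n - 1 = 6 df
t_stat = (x_bar - mu0) / std_error

# Assumption: upper 99.5% quantile of t with 6 degrees of freedom, from a t table
T_QUANTILE_995_DF6 = 3.707428
margin = T_QUANTILE_995_DF6 * std_error
ci = (x_bar - margin, x_bar + margin)

print(f"T = {t_stat:.2f}")
print(f"99% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

The degrees of freedom are n – 1, so a sample of seven observations uses 6; a different choice changes the quantile and therefore the width of the interval.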

**Interpreting the interval**

c) Interpret the interval you found in part “b”.

We are 99% confident that the true average recovery time for people with shingles in 2020 lies within the interval found in part “b”. Here "99% confident" means that if we drew random samples many times and built an interval from each one, about 99% of those intervals would contain the true average recovery time.

**Testing the hypothesis at the 5% significance level**

5) A recent study was done concerning the average age when people are first married. Below is a sample of ages of residents from the USA and a sample of ages of residents from the United Kingdom:

a) At a 5% significance level, is there sufficient evidence to conclude there is a difference in the mean age of first marriage for Americans and Britons?

H0: Mean age of first marriage for Americans = Mean age of first marriage for Britons

Ha: Mean age of first marriage for Americans != Mean age of first marriage for Britons

T.S: **T = -1.26**

P-value: **0.2254**

Conclusion:

Since the p-value is more than 5%, we fail to reject H0.

**So NO, the data do not provide sufficient evidence** to conclude there is a difference in the mean age of first marriage for Americans and Britons.

b) Find a 95% confidence interval for the mean difference in the age of first marriage for Americans and Britons.

95% confidence interval for the true mean difference in the mean age of first marriage for Americans and Britons: **(-10.7244, 2.7244)**

**Explaining the results**

c) Give a practical conclusion from your interval in part “b”?

We are 95% confident that the true difference in mean age of first marriage between Americans and Britons lies between -10.7244 and 2.7244 years. Because the interval contains zero, we cannot say that either group marries later on average. Here "95% confident" means that if we drew random samples many times and built an interval from each one, about 95% of those intervals would contain the true difference.

d) Did your interval in part “b” HAVE to give you the same conclusion as your test in part “a”? Why or why not?

YES.

Here the interval in part “b” has to give the same conclusion as the test in part “a”.

Part b gives the interval m ± c × s, where m is the observed difference of the means, s is its standard error, and c is the upper 97.5% quantile of the t distribution. Part a uses T = m/s and rejects H0 if |T| > c*.

For a two-sided test at the 5% level, c* is also the upper 97.5% quantile, so c* = c. Then |T| > c exactly when zero lies outside m ± c × s, provided the test and the interval use the same standard error and degrees of freedom.

So parts a and b must agree: the test fails to reject H0, and the interval (-10.7244, 2.7244) contains zero.
