## Confidence Interval

Q1. Cadmium, a heavy metal, is toxic to animals. Mushrooms however can absorb and accumulate cadmium at high concentrations. The Czech and Slovak governments have set a safety limit for cadmium in dry vegetables at 0.5 ppm. Cadmium levels in 12 randomly chosen samples of the edible mushroom Boletus pinicola are shown in data file.

a) Hand-calculate the 99% confidence interval for the cadmium level in this sample, and confirm your calculation with MiniTab output.

b) Write a proper conclusion from the 99% CI.

c) State the null and alternate hypotheses you would use to determine whether the mean cadmium concentration of all Boletus pinicola mushrooms in this region exceeds the safety limit of 0.5 ppm.

d) Calculate the test statistic and show your calculations.

e) Confirm your results using MiniTab (show result), with α = 0.01.

f) Write a clear conclusion regarding the results of the test of hypothesis.

g) What assumptions are required for the test results to be valid?

h) Use a boxplot and a probability plot (with Ryan-Joiner value) to test the data for normality, and explain whether the assumption of part ‘g’ is reasonably satisfied using the Normality p-value.

##### Question 1 Solution

Let the Cad.Level be X

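Therefore, the 99% confidence interval is calculated as follows, using $n = 12$, $\bar{x} = 0.5258$, $s = 0.3521$ and $t_{0.005,\,11} = 3.106$:

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} = 0.5258 \pm 3.106 \times \frac{0.3521}{\sqrt{12}} = 0.5258 \pm 0.3157$$

Therefore, the 99% confidence interval is (0.210, 0.842).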

Using Minitab :

Steps : Enter Data >> Go to Stat >> Descriptive Statistics >> One sample t test >> set confidence level to 0.99 >> Click Enter >> Output

One-Sample T: Cad.Level

Test of mu = 0.5 vs > 0.5

| Variable | N | Mean | StDev | SE Mean | 99% Lower Bound | T | P |
|---|---|---|---|---|---|---|---|
| Cad.Level | 12 | 0.525833 | 0.352122 | 0.101649 | 0.249543 | 0.25 | 0.402 |

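The manual calculation and the Minitab output are consistent. Because the test was run with the alternative mu > 0.5, Minitab reports a one-sided 99% lower bound (0.2495) rather than the two-sided interval; the two-sided interval (0.210, 0.842) follows from the same mean and SE mean.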
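The numbers in the output above can be reproduced from the summary statistics alone. The following is a minimal sketch, not the original working: it assumes the critical values 3.106 and 2.718 read from a t table with 11 degrees of freedom, and the function name `one_sample_summary` is purely illustrative.

```python
import math

# Summary statistics copied from the Minitab output above.
n = 12
mean = 0.525833
stdev = 0.352122
mu0 = 0.5  # safety limit (ppm), used as the hypothesized mean

# Assumption: critical values read from a t table with n - 1 = 11 degrees
# of freedom, hard-coded so that only the standard library is needed.
T_CRIT_TWO_SIDED_99 = 3.106  # t_{0.005, 11}
T_CRIT_ONE_SIDED_99 = 2.718  # t_{0.01, 11}


def one_sample_summary(n, mean, stdev, mu0):
    """Return SE, two-sided 99% CI, one-sided 99% lower bound and t statistic."""
    se = stdev / math.sqrt(n)
    ci = (mean - T_CRIT_TWO_SIDED_99 * se, mean + T_CRIT_TWO_SIDED_99 * se)
    lower_bound = mean - T_CRIT_ONE_SIDED_99 * se
    t_stat = (mean - mu0) / se
    return se, ci, lower_bound, t_stat


se, ci, lower_bound, t_stat = one_sample_summary(n, mean, stdev, mu0)
print(f"SE mean         = {se:.6f}")
print(f"99% CI          = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"99% lower bound = {lower_bound:.4f}")
print(f"t statistic     = {t_stat:.2f}")
```

Since the t statistic is well below the one-sided critical value of 2.718, the sketch is consistent with the large p-value in the output.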


(f)

Decision :

Since the test statistic (t = 0.25) does not lie in the critical region (t > 2.718 for 11 degrees of freedom at α = 0.01) and the p-value (0.402) > 0.01, we fail to reject the null hypothesis.

Conclusion :

Therefore, there is not enough evidence to conclude that the mean cadmium concentration exceeds the safety limit of 0.5 ppm.

In other words, at the 0.01 significance level the data do not demonstrate that the mean cadmium concentration of all Boletus pinicola mushrooms in this region exceeds 0.5 ppm; this is not the same as showing that it is below the limit.


## Performing A Sample T Test

We have performed a one-sample t test above.

Therefore, the assumptions required for the test results to be valid are:

1. The scale of measurement applied to the data collected follows a continuous or ordinal scale.

2. The sample is random.

3. The third assumption is normality: the population from which the sample is drawn should be approximately normal.

4. Homogeneity of variance (approximately equal standard deviations across samples) is needed only when two or more groups are compared, so it does not apply to this one-sample test.


## Verifying Normality Assumption

Here we have to verify the normality assumption using a boxplot and a probability plot.

Steps in Minitab :

Enter Data >> Go to stat >> Graphs >> Boxplot >> One Y Simple >> Enter Details >> Output

In the boxplot, the interquartile box shows symmetry, so we can conclude that the data are approximately normal.

For normality plot :

We can also use a normal probability plot to show that the data is normal as follows :

Steps in minitab :

Enter Data >> Go to stat >> Descriptive statistics >> Normality test >> Select Ryan-Joiner value in options >> Click Enter >> Output

Since all the points lie close to the line in the normal probability plot (Ryan-Joiner value = 0.943) and the normality p-value > 0.1, we do not reject normality.

Therefore, the normality assumption of part ‘g’ is reasonably satisfied.
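To make the Ryan-Joiner value less of a black box, here is a minimal sketch of how such a statistic can be computed. It assumes the usual description of the statistic as the correlation between the ordered data and normal scores at plotting positions (i − 3/8)/(n + 1/4). The function name `ryan_joiner` and the sample values are hypothetical; the sample is not the cadmium data, and the p-value (which Minitab obtains from critical-value approximations) is not computed.

```python
import math
from statistics import NormalDist, mean


def ryan_joiner(data):
    """Correlation between the ordered data and their approximate normal scores."""
    x = sorted(data)
    n = len(x)
    # Assumption: normal scores at plotting positions (i - 3/8) / (n + 1/4).
    scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    x_bar, s_bar = mean(x), mean(scores)
    sxy = sum((a - x_bar) * (b - s_bar) for a, b in zip(x, scores))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - s_bar) ** 2 for b in scores)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only; these are NOT the cadmium data.
sample = [0.2, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.7, 0.9]
print(f"Ryan-Joiner statistic = {ryan_joiner(sample):.3f}")
```

A value close to 1 means the points on the probability plot fall close to a straight line, which is what the plot-based argument above relies on.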

## Pooled Proportion Test

Q2 (8 Mark) Louis Pasteur conducted a series of experiments that demonstrated the roles that yeast and bacteria play in the fermentation process. These results gave Joseph Lister, a British physician, the idea that human infections might have a similar organic origin. Lister developed a theory that using carbolic acid as a surgical room disinfectant would improve the postoperative survival rates for surgical patients. Out of 40 patients amputated with the use of carbolic acid, 34 survived. Out of the 35 patients amputated without the use of carbolic acid, 19 survived.

a) Can we use the pooled proportion test here? Explain.

b) State the null and alternate hypotheses for assessing if the use of carbolic acid improves the postoperative survival rates.

c) What proportion of patients amputated without the use of carbolic acid survived? What proportion of patients amputated with the use of carbolic acid survived?

d) Calculate the test statistic. Show hand calculation.

e) Using MiniTab, test your hypothesis using a 5% significance level. Show your results

f) Based on the p-value, interpret the results of this test.

g) What are the assumptions in using the two-sample proportion test and have we met those conditions?

##### Question 2 Solution

(e)

Test and CI for Two Proportions

| Sample | X | N | Sample p |
|---|---|---|---|
| 1 | 34 | 40 | 0.850000 |
| 2 | 19 | 35 | 0.542857 |

Difference = p (1) - p (2)

Estimate for difference: 0.307143

95% lower bound for difference: 0.140388

Test for difference = 0 (vs > 0): Z = 2.91 P-Value = 0.002

Fisher's exact test: P-Value = 0.004

(f) Since the p-value (0.002) < 0.05, we reject the null hypothesis at the 5% significance level and conclude that the use of carbolic acid improves the postoperative survival rate.
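The sample proportions and the Z value in the output above can be checked by hand with the pooled two-proportion formula. The sketch below assumes the pooled estimate is used for the standard error of the test and an upper-tail alternative (carbolic acid improves survival); the function name `pooled_two_proportion_z` is illustrative only.

```python
import math
from statistics import NormalDist


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Upper-tail z test for p1 > p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)  # P(Z > z)
    return p1, p2, z, p_value


# Sample 1: with carbolic acid (34 of 40 survived).
# Sample 2: without carbolic acid (19 of 35 survived).
p1, p2, z, p_value = pooled_two_proportion_z(34, 40, 19, 35)
print(f"p1 = {p1:.4f}, p2 = {p2:.4f}")
print(f"z = {z:.2f}, one-sided p-value = {p_value:.3f}")

# Rough check of the large-sample condition: each group should have
# at least 5 survivors and at least 5 non-survivors.
counts = [34, 40 - 34, 19, 35 - 19]
print("all counts >= 5:", all(c >= 5 for c in counts))
```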

## Dependent and Independent Sampling

Q3 (10 Mark) The data in the file are the burning times (in minutes) of chemical flares of two different formulations. The design engineers are interested in both the mean and the variance of the burning times.

a) Is the sampling method dependent or independent?

b) State the null and alternate hypothesis we would use to determine whether there is a significant difference in the mean burning times of the two types of chemical flares.

c) Calculate the test statistic. Show hand calculation.

d) Using MiniTab, test your hypothesis using a 5% significance level. Show your results

e) Based on the p-value, interpret the results of this test.

f) Check assumptions for each data set with boxplot and probability plot (Normality check with Ryan-Joiner value). What are the assumptions in using the two-sample t-test and have we met those conditions using Normality p-values?

##### Question 3

d)

Q4. (11 marks) Two creams are available by prescription for treating moderate skin burns. A study to compare the effectiveness of the two creams is conducted using 15 randomly chosen patients with moderate burns on their arms. Two spots of the same size and degree of burn are marked on each patient’s arm. One of the creams is selected at random and applied to the first spot, while the remaining spot is treated with the other cream. The number of days until the burn has healed is recorded for each spot. The data are shown in data file.

a) Is this sampling method dependent or independent?

b) State the null and alternate hypotheses to test whether the two creams are equally effective (i.e., take the same average time to heal the burn).

c) Hand-calculate the test statistic. Show your calculation.

d) Run the test in MiniTab and show output. Use the 5% significance level.

e) What are the assumptions for this test? Are the assumptions met? Draw a boxplot and a probability plot (Normality check with Ryan-Joiner value) to check whether the assumptions are met using the Normality p-value.

f) Based on the p-value, interpret the results of this test.

g) Hand-calculate the 95% confidence interval for the difference between the two creams. Obtain MiniTab output to confirm your results.

h) Does your 95% CI confirm your hypothesis test conclusion? Explain.

##### Question 4

e. Assumptions:

1. The pairs (patients) are independent of each other.

2. The patients are randomly selected.

3. The differences in healing time should come from a normally distributed population.

Boxplot :

## Pooled T-test

Q5 (5 marks) American League baseball teams play their games with a designated hitter rule, meaning that pitchers do not bat. The league believes that replacing the pitcher, traditionally a weak hitter, with another player in the batting order produces more runs and generally more interest among the fans. The average number of home runs hit per game for the 2011 season in the American and National Leagues are found in the data file.

a) Obtain boxplots of the two data sets. Be sure to display them on the same plot. Are both data sets normally distributed? Does the spread of the data look roughly the same in each group? In other words, can we use the pooled t-test legitimately?

b) State the null and alternate hypothesis we would use to test whether there is a significant difference in the number of home runs hit per game between the American and National Leagues.

c) Run the two sample t-test two ways. The un-pooled test and the pooled test. What is the p-value for each test?

d) Do both tests reach the same conclusion? Use a 5% level of significance. Did the American League’s use of a designated hitter make any difference to the number of home runs hit per game?

##### Question 5 Solution

(a) Boxplot for both datasets can be obtained as below:

Comments on the Boxplot:

• The distributions of both samples seem to be approximately symmetrical about the median: the median appears equidistant from Q1 and Q3, and the whiskers of both datasets appear almost equal in length on either side.

• The IQR for the American League = (1.2 - 0.86) = 0.34, whereas the IQR for the National League = (1.1 - 0.8) = 0.3. Hence the spread of both datasets looks roughly the same. This suggests similar variances in the two groups, so we can legitimately use the pooled t test to test for a significant difference between the datasets.

## Hypothesis Testing

(b) H0: There is no difference between the mean home runs hit per game in the American and National Leagues.

$H_0: \mu_1 = \mu_2$

Ha: There is a significant difference between the mean home runs hit per game in the American and National Leagues.

$H_a: \mu_1 \neq \mu_2$

where $\mu_1$ = mean home runs hit per game in the American League

and $\mu_2$ = mean home runs hit per game in the National League

(c) Running the Pooled T Test assuming Equal Variances on both samples

t-Test: Two-Sample Assuming Equal Variances

| | American League | National League |
|---|---|---|
| Mean | 1.027857143 | 0.975375 |
| Variance | 0.048519516 | 0.04161 |
| Observations | 14 | 16 |
| Pooled Variance | 0.044818195 | |
| Hypothesized Mean Difference | 0 | |
| df | 28 | |
| t Stat | 0.677404004 | |
| P(T<=t) one-tail | 0.25185441 | |
| t Critical one-tail | 1.701130934 | |
| P(T<=t) two-tail | 0.50370882 | |
| t Critical two-tail | 2.048407142 | |

p value of the test = 0.504

Running the Un-Pooled T Test assuming Unequal Variances on both samples

t-Test: Two-Sample Assuming Unequal Variances

| | American League | National League |
|---|---|---|
| Mean | 1.027857 | 0.975375 |
| Variance | 0.04852 | 0.04161 |
| Observations | 14 | 16 |
| Hypothesized Mean Difference | 0 | |
| df | 27 | |
| t Stat | 0.673827 | |
| P(T<=t) one-tail | 0.253075 | |
| t Critical one-tail | 1.703288 | |
| P(T<=t) two-tail | 0.506149 | |
| t Critical two-tail | 2.051831 | |

p value of the test = 0.506

(d) Since the two-tail p-values of both tests (0.504 pooled, 0.506 un-pooled) are well above 0.05, both tests reach the same conclusion at the 5% significance level: we do not have enough evidence to reject the null hypothesis. The data therefore show no significant difference between the mean home runs hit per game in the American and National Leagues, so there is no evidence that the designated hitter rule made a difference.
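The two t statistics in the tables above differ only in how the standard error and degrees of freedom are built. The sketch below recomputes both from the reported means, variances and sample sizes; the function names `pooled_t` and `welch_t` are illustrative, and p-values are left out because the standard library has no t distribution.

```python
import math


def pooled_t(m1, v1, n1, m2, v2, n2):
    """Pooled (equal-variance) t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df  # pooled variance
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


def welch_t(m1, v1, n1, m2, v2, n2):
    """Un-pooled (Welch) t statistic and Welch-Satterthwaite degrees of freedom."""
    a, b = v1 / n1, v2 / n2
    t = (m1 - m2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# (mean, variance, n) taken from the tables above.
american = (1.027857143, 0.048519516, 14)
national = (0.975375, 0.04161, 16)

t_pooled, df_pooled = pooled_t(*american, *national)
t_welch, df_welch = welch_t(*american, *national)
print(f"pooled:    t = {t_pooled:.4f}, df = {df_pooled}")
print(f"un-pooled: t = {t_welch:.4f}, df = {df_welch:.2f}")
```

The Welch degrees of freedom come out fractional; the table above shows them rounded to a whole number.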

Q6 (9 marks) In a study of Vietnam veterans who were exposed to Agent Orange, a herbicide defoliant used during the Vietnam war, 20 Vietnam veterans were randomly chosen and blood and fat tissue samples were taken from each one of them. The TCDD (a dioxin) levels, measured in ppm, were recorded for the blood plasma and the fat tissue samples. Data are found in the data file.

(Note for the non-chemists…maybe all of you! The main ingredients of Agent Orange comprise an equal mixture of two phenoxyl herbicides – 2,4-dichlorophenoxyacetic acid (2,4-D) and 2,4,5-trichlorophenoxyacetic acid (2,4,5-T), and in the manufacturing process of those chemicals they may become contaminated with 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD)….and that is my chemistry lesson for today!).

a) Is the sampling method dependent or independent?

b) State the null and alternate hypothesis we would use to test whether there is any difference in the TCDD level in the blood plasma and the fat tissues.

c) Calculate the test statistic. Show hand calculation.

d) Run the appropriate test in MiniTab and show output. What is the p-value for the test?

e) Assuming α = 0.05, what are your conclusions for these data?

f) What assumptions are required for the test results to be valid?

g) Using a boxplot and a Normal check plot for the differences to test its normality, with Ryan-Joiner value, decide whether the assumption of part ‘f’ is reasonably satisfied using Normality p-value.

##### Question 6 Solution

Working:

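The figures below were obtained with the Excel data analysis tool using an independent-samples t-test.

a)

The sampling method is dependent: the blood plasma and fat tissue samples are taken from the same 20 veterans, so the observations are paired. A paired t-test on the differences is the appropriate analysis, and the independent-samples figures below do not account for this pairing.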

b)

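The null and alternative hypotheses are $H_0: \mu_{\text{plasma}} = \mu_{\text{fat}}$ and $H_a: \mu_{\text{plasma}} \neq \mu_{\text{fat}}$, where each $\mu$ is the mean TCDD level (ppm) in that tissue type.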

c)

The test statistic is given by

t = -0.3315

d)

p-value = 0.37103

e)

Since the p-value is greater than 0.05, we fail to reject the null hypothesis and conclude that there is not enough evidence of a difference in the TCDD level between the blood plasma and the fat tissues.
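Because both measurements come from the same veteran, and part g refers to the differences, a paired t statistic is the natural hand calculation for this design. The sketch below shows the mechanics only: the function name `paired_t` is illustrative and the input values are hypothetical toy numbers, not the TCDD data.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)   # mean of the differences
    s_d = stdev(diffs)    # sample standard deviation of the differences
    t = d_bar / (s_d / math.sqrt(n))
    return t, n - 1


# Hypothetical toy values for illustration only; NOT the TCDD measurements.
t_stat, df = paired_t([2, 4, 6], [1, 2, 3])
print(f"t = {t_stat:.4f}, df = {df}")
```

With the real data, the first list would hold the plasma levels and the second the fat tissue levels, in the same veteran order.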

Q7 (4 marks) A nutritionist wishes to estimate the difference between the percentage of men and women who have high cholesterol. What sample size should be obtained if she wishes the estimate to be within 3 percentage points, with 95% confidence assuming:

a) She uses the 2014 estimate of 18.8% males and 20.5% females from Health Canada

b) That she does not have any prior estimate.

##### Question 7 Solution

Q.7) Given that the margin of error is E = 0.03 (3 percentage points).

A 95% confidence level corresponds to a significance level of α = 0.05, and the critical value is $z_{\alpha/2} = z_{0.025} = 1.96$.

a) We want to find the sample size (n) for p1 = 0.188 and p2 = 0.205.
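One way to finish the calculation is the standard sample-size formula for estimating a difference of two proportions with equal group sizes, n = (p1·q1 + p2·q2)·(z/E)², rounded up. The sketch below assumes that formula, and uses p = 0.5 for both groups when no prior estimate is available (part b); the function name `n_for_two_proportions` is illustrative.

```python
import math
from statistics import NormalDist


def n_for_two_proportions(margin, confidence, p1=0.5, p2=0.5):
    """Sample size per group to estimate p1 - p2 within the given margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n = (p1 * (1 - p1) + p2 * (1 - p2)) * (z / margin) ** 2
    return math.ceil(n)  # always round up to a whole subject


n_a = n_for_two_proportions(0.03, 0.95, p1=0.188, p2=0.205)  # part a
n_b = n_for_two_proportions(0.03, 0.95)                      # part b, no prior estimate
print(f"a) n per group = {n_a}")
print(f"b) n per group = {n_b}")
```

Under these assumptions the sketch gives 1348 men and 1348 women for part a, and 2135 of each for part b.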

Q8 (2 marks) A company that manufactures the Stinger tee claims that the thinner shaft and smaller head will lessen resistance and drag, reducing spin and allowing the ball to travel further. To test this claim they compared the distance travelled by golf balls hit off regular wooden tees to those hit with the Stinger tees. All the balls were struck by the same golf club using a robotic device set to swing the club head at approximately 95 mph. In the original experiment, 6 balls were used for each type of tee. If we want the margin of error in the estimate of the mean difference to be just 1.0 yard, with 95% level of confidence, how many balls should have been struck off each type of tee? Be conservative and set the standard deviation to 3.0 yards.

(Hint: this is to determine sample size to estimate the difference in two population means)

##### Question 8
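Following the hint, one possible approach is the standard sample-size formula for the difference of two means with equal group sizes and a common standard deviation, n = 2·(z·σ/E)², rounded up. The sketch below assumes that formula, σ = 3.0 yards for both tee types, and a normal (z) critical value; the function name `n_for_two_means` is illustrative.

```python
import math
from statistics import NormalDist


def n_for_two_means(margin, confidence, sigma):
    """Sample size per group to estimate mu1 - mu2 within the given margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n = 2 * (z * sigma / margin) ** 2  # both groups assumed to share sigma
    return math.ceil(n)  # always round up to a whole ball


n_per_tee = n_for_two_means(margin=1.0, confidence=0.95, sigma=3.0)
print(f"balls per tee type = {n_per_tee}")
```

Under these assumptions, 70 balls would be struck off each type of tee.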