## Correlation and Hypothesis Testing

The questions below on correlation and hypothesis testing are answered with worked solutions. Some ask which correlation is most appropriate for a given study; others ask for calculations related to hypothesis testing.

## Determining the Best Correlation to Use for Different Studies

1. If you want to determine whether there is a significant relationship between attending Kendall Day (defined as yes/no) and a student’s grade on their final exam in chemistry (scores range from 0 to 100 points), what type of correlation would you conduct and why? (2 points)

**Ans:** Point-biserial correlation. One variable is numeric (the exam grade, from 0 to 100 points) and the other is binary, with only two possible responses (attended Kendall Day: yes/no).
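As a rough illustration of how a point-biserial coefficient can be computed, the sketch below uses the standard formula based on the two group means, the overall standard deviation, and the group proportions. The function name `point_biserial` and the four data points are hypothetical and are not taken from any study described here.

```python
import math


def point_biserial(group, score):
    """Point-biserial r for a 0/1 `group` variable and a numeric `score`."""
    n = len(score)
    ones = [s for g, s in zip(group, score) if g == 1]
    zeros = [s for g, s in zip(group, score) if g == 0]
    mean_all = sum(score) / n
    # Population standard deviation (divide by n), which pairs with sqrt(p * q)
    sd_all = math.sqrt(sum((s - mean_all) ** 2 for s in score) / n)
    p = len(ones) / n  # proportion coded 1
    q = 1 - p          # proportion coded 0
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd_all * math.sqrt(p * q)


# Hypothetical data: 1 = attended, 0 = did not attend
attended = [0, 0, 1, 1]
exam_score = [60, 70, 80, 90]

r_pb = point_biserial(attended, exam_score)
print(round(r_pb, 3))  # 0.894
```

The result is the same value Pearson's formula would give if the yes/no variable were coded as 1/0.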

2. If you wanted to determine whether there is a significant relationship between the number of stuffed animals a child sleeps with nightly and the number of hours they sleep each night, what type of correlation would you run and why? (2 points)

**Ans:** Pearson’s correlation coefficient, because we are measuring the association between two numeric variables (number of stuffed animals and hours of sleep).
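A minimal sketch of Pearson's coefficient for two numeric variables follows. The helper name `pearson_r` and the five data pairs are made up purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: stuffed animals per child and hours of sleep per night
stuffed_animals = [0, 1, 2, 3, 4]
hours_of_sleep = [8, 9, 9, 10, 11]

r = pearson_r(stuffed_animals, hours_of_sleep)
print(round(r, 3))  # 0.971
```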

3. If you wanted to determine whether there is a significant relationship between having a sibling (defined as yes/no) and the number of hours a child gets to spend with their guardians weekly, what type of correlation would you run and why? (2 points)

**Ans:** Point-biserial correlation. One variable is numeric (the number of hours a child spends with their guardians weekly) and the other is binary, with only two possible responses (has a sibling: yes/no).

4. If the slope of the regression line equation is -1.5, what does that mean? In your response, describe what this tells you about the type of relationship between the variables (positive or negative) as well as how ‘y’ changes when ‘x’ moves. (2 points)

**Ans:** The relationship is negative because the slope is negative. For each one-unit increase in x, y decreases by 1.5 units on average.
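To see where the "1.5 units per unit of x" reading comes from, consider a line with this slope and an arbitrary intercept $a$ (the intercept is a placeholder, since the question does not give one):

```latex
\hat{y}(x) = a - 1.5x
\quad\Longrightarrow\quad
\hat{y}(x+1) - \hat{y}(x)
  = \bigl(a - 1.5(x+1)\bigr) - \bigl(a - 1.5x\bigr)
  = -1.5
```

The intercept cancels, so the predicted change in y per unit of x depends only on the slope.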

## Testing Hypothesis

1. Prior research in the United States examining sibling relationships suggests that 20% of individuals do not have siblings, 64% have only full siblings, and 16% have at least one step, half, or adopted sibling. You conduct a study with 75 participants to see whether the proportion of sibling relationships seen in Canada differs significantly from the United States by asking your hypothetical participants which option (no sibling; only full siblings; at least one step, half, or adopted sibling) best describes their family. Below are the obtained values. **Round your responses to the hundredth.**

Provide your null and alternate hypotheses for this example. You should include the expected frequencies as part of this answer and show how you calculated the necessary expected frequencies. (2 points)

**Ans:**

Null hypothesis: The proportions of sibling relationships in Canada do not differ from those in the United States.

Alternate hypothesis: The proportions of sibling relationships in Canada differ from those in the United States.

We use the chi-square goodness-of-fit test to test these hypotheses.

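The expected frequency for each category is $E = n \times p_{H_0}$, where $n = 75$ is the sample size and $p_{H_0}$ is the proportion for that category under the null hypothesis (the United States proportion).

The table below gives the expected frequencies.

| Category | Proportion under $H_0$ | Expected frequency |
|---|---|---|
| No siblings | 0.20 | $75 \times 0.20 = 15$ |
| Only full siblings | 0.64 | $75 \times 0.64 = 48$ |
| At least one step, half, or adopted sibling | 0.16 | $75 \times 0.16 = 12$ |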

a. What is the critical value? Provide the value and explain how you determined the critical value. For this example, you are using an alpha level of .05.

**Ans:**

The test statistic follows a chi-squared distribution with $k - 1 = 3 - 1 = 2$ degrees of freedom, since there are 3 categories. At $\alpha = .05$, the critical value is $\chi^2_{0.95,\,2} = 5.99$.
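The critical value is normally read from a chi-squared table. As a quick check, for 2 degrees of freedom specifically, the upper-tail probability has the closed form $e^{-x/2}$, so the critical value can be computed directly. The function name below is hypothetical, and the shortcut applies only to the 2-degrees-of-freedom case.

```python
import math


def chi2_critical_df2(alpha):
    """Upper-tail critical value for chi-squared with 2 degrees of freedom.

    For df = 2 the upper-tail probability is exp(-x / 2), so solving
    exp(-x / 2) = alpha gives x = -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)


critical_value = chi2_critical_df2(0.05)
print(round(critical_value, 2))  # 5.99
```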

b. Complete the necessary calculations to get the obtained value. You must show the formulas used as well as the steps taken to complete each calculation. Round your responses to the hundredth if needed.

**Ans:**

Test statistic:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

$$\chi^2 = 0.6 + 3 + 6.75 = 10.35$$
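The calculation can be sketched in a few lines of Python. The expected counts follow from the stated sample size and the United States proportions. The observed counts are not reproduced in this section, so the values 18, 36, and 21 used below are an assumption: one set of counts that sums to 75 and yields the three stated components (0.6, 3, 6.75).

```python
n = 75
us_proportions = [0.20, 0.64, 0.16]  # none, only full, step/half/adopted

# Expected counts under the null hypothesis: n * p for each category
expected = [n * p for p in us_proportions]

# ASSUMED observed counts (not given in this section); they sum to 75
observed = [18, 36, 21]

# Per-category contribution (O - E)^2 / E, then the total statistic
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(components)

print([round(c, 2) for c in components])  # [0.6, 3.0, 6.75]
print(round(chi_square, 2))               # 10.35
```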

c. Are your results significant? How did you determine this? (1 point)

**Ans:** Yes. The obtained value $\chi^2 = 10.35$ exceeds the critical value of 5.99 at the 5% level of significance. Hence the result is statistically significant, and we reject the null hypothesis.

## Regression

2. (Hypothetical) prior research suggests that there is a positive correlation between the number of credit hours taken and time spent studying during finals week. You decide to conduct a study examining whether you can predict how many hours students study during finals week based on how many credits they are taking in a semester. Below is hypothetical data for four students that you survey regarding how many hours each reported studying during finals week and the number of credit hours they were taking.

**a.** What is the predictor (X)? What is the criterion (Y)? (1 point)

**Ans:** X (predictor): the number of credits a student is taking in the semester.

Y (criterion): the number of hours the student studies during finals week.

**b.** Complete the necessary prep work to complete the calculations in 2c. (4 points)

Ans: The prep work is shown in the table below:

**c.** Complete the calculations to determine the slope and y-intercept for the regression line. (3 points)

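**Ans:** The slope of the regression line $E(Y) = \alpha + \beta X$ is given by

$$\beta = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \frac{5}{34} \approx 0.147$$

The intercept is given by

$$\alpha = \bar{Y} - \beta\bar{X} = 13 - \frac{5}{34} \times 14 \approx 10.941$$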
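The arithmetic can be checked with a short sketch that uses only the summary quantities stated in the answer (the sum of cross-products 5, the sum of squares 34, and the means 14 and 13). The variable names are illustrative.

```python
# Summary quantities taken from the worked answer
sum_cross_products = 5.0   # sum of (X_i - X_bar) * (Y_i - Y_bar)
sum_squares_x = 34.0       # sum of (X_i - X_bar) ** 2
x_bar = 14.0               # mean credit hours
y_bar = 13.0               # mean study hours

# Least-squares slope and intercept
slope = sum_cross_products / sum_squares_x
intercept = y_bar - slope * x_bar

print(round(slope, 3))      # 0.147
print(round(intercept, 3))  # 10.941
```

Carrying the unrounded slope into the intercept calculation is what gives 10.941; using the rounded 0.147 would give 10.942.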

**d.** Create the regression formula based on the values you obtained. (1 point)

**Ans:** Hence, the estimated regression line is $E(Y) = 10.941 + 0.147X$.

**e.** Test the accuracy of your formula created in 2d by calculating the predicted y for the following values. (1.5 points)

A student has 12 credit hours

A student has 18 credit hours

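**Ans:** The predictions for the given X values are:

12 credit hours: $\hat{Y} = 10.941 + 0.147 \times 12 = 12.705 \approx 12.71$ hours

18 credit hours: $\hat{Y} = 10.941 + 0.147 \times 18 = 13.587 \approx 13.59$ hours

These were calculated with the formula $\hat{Y}_i = 10.941 + 0.147 \times X_i$.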
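A small sketch of the prediction step, using the rounded coefficients from the regression line. The function name `predict_study_hours` is hypothetical.

```python
def predict_study_hours(credit_hours, intercept=10.941, slope=0.147):
    """Predicted study hours from the fitted line Y-hat = intercept + slope * X."""
    return intercept + slope * credit_hours


for credits in (12, 18):
    # Three decimals shown so the values can be rounded to the hundredth by hand
    print(f"{credits} credit hours -> {predict_study_hours(credits):.3f} hours")
```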

**f.** Do you think that your formula results in inaccurate predictions? Why or why not? For full credit, you must address how the specific predicted values you obtained coincide or differ from the values used to create the regression equation. (1.5 points)

**Ans:** The predictions are not exact. The predicted value for 18 credit hours is 13.59 study hours, but the student in the data with 18 credit hours reported 15 hours, so the prediction is off by about 1.41 hours. However, we do not expect predictions to match the data exactly. The prediction for 12 credit hours (about 12.71 hours) seems reasonable: the sample contains 14 study hours at 10 credit hours and 11 study hours at 13 credit hours, and the predicted value falls between them.