## Assignment instructions

**Question 1**

An intern researcher (Ivie Monroe, MS) at US Fake News is conducting a study about tuition at North American 4-year colleges and universities. She collected data from 20 schools on their tuition costs (in thousands of dollars), their score on an independent rating scale (in points), their size (in thousands of undergraduates), and the type of school (Public: Type = 0; Private: Type = 1). A printout for the multiple regression of Tuition on the three predictor variables is shown below.

a) Interpret b0, b1, and b3 in terms of tuition costs, rating, and type of school.

b) The intern wants to know whether the combination of rating, size, and type of school is OVERALL useful for predicting tuition costs. Write out the hypotheses in MATH and in ENGLISH. Report your findings.

c) What is the R of this model? How do you interpret the R value in the context of tuition costs, rating, school size and type of school?

d) Use the printout to compute a 95% confidence interval for β2. Explain what the resulting confidence interval tells you about the usefulness of student size as a predictor of college tuition.

e) Does it appear from this model that private schools cost significantly more than public schools? Perform an appropriate hypothesis test, making sure to state the null and alternative hypotheses mathematically and in ENGLISH words with a justification of your choice, and give the p-value and your real-world conclusions.

**Question 2**

Dr. Josephine King’s current study includes 1000 subjects. For each person, she has recorded whether or not they had hypercholesterolemia (high cholesterol). She has also recorded values for several potential predictors including family history (famhist = 1 if the person has a close relative with high cholesterol and 0 otherwise), diet as represented by the average daily carbohydrate consumption in grams (yum!), exercise (an indicator which is 1 if the person exercises regularly and 0 if they don’t), high blood pressure (highbp = 1 if the person’s average systolic blood pressure is above 165 and 0 otherwise) and gender (1 = female and 0 = male). Dr. King has started by fitting a standard logistic regression model with all of these covariates.

a) Write out the full logistic regression model evaluating the associations of family history, diet, exercise, high blood pressure and gender on odds of having hypercholesterolemia.

b) Give a careful interpretation of the odds ratio for the exercise variable.

c) Suppose a person lowers their carbohydrate intake by 30 grams. Find the odds ratio associated with this reduction and the corresponding 95% confidence interval. Interpret the 95% OR confidence interval associated with this 30-gram carbohydrate reduction.

## Assignment Solution

**Question 1**

A. From the printout, b0 = -2.405 is the intercept: the predicted mean tuition cost (in thousands of dollars) when all the independent variables are zero, that is, for a public school (Type = 0) with a rating of 0 and a size of 0. No real school has such values, so the intercept mainly anchors the regression line and has little practical meaning on its own.

b_1 = 0.097 is the slope for rating. A one-point increase in rating is associated with an increase of 0.097 thousand dollars (about $97) in predicted mean tuition cost, when size and type of school are held constant.

b_3 = 16.858 is the slope for type of school. Because Type is an indicator (0 = public, 1 = private), a one-unit increase means moving from a public to a private school: private schools are predicted to cost 16.858 thousand dollars (about $16,858) more on average than public schools of the same rating and size.
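To make the "held constant" reading concrete, here is a minimal sketch that plugs the quoted coefficients into the fitted equation. The value b_2 = -0.0192 is the size coefficient quoted in part D; the school profile (rating 80, size 10) and the function name `predicted_tuition` are hypothetical and chosen only for illustration.

```python
# Coefficients as quoted in this solution (tuition in thousands of dollars).
B0, B_RATING, B_SIZE, B_TYPE = -2.405, 0.097, -0.0192, 16.858


def predicted_tuition(rating, size, school_type):
    """Predicted mean tuition; school_type is 0 (public) or 1 (private)."""
    return B0 + B_RATING * rating + B_SIZE * size + B_TYPE * school_type


# Hypothetical school profile, for illustration only.
rating, size = 80, 10
public = predicted_tuition(rating, size, 0)
private = predicted_tuition(rating, size, 1)

# The private-public gap equals b_3 whatever rating and size are used.
print(round(public, 3), round(private, 3), round(private - public, 3))
```

Changing `rating` or `size` moves both predictions but leaves the gap between them at b_3, which is what the interpretation of b_3 says.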

B. H0: β_1 = β_2 = β_3 = 0.

H1: β_j ≠ 0 for at least one j (j = 1, 2, 3).

H0: There is no linear relationship between the response variable (tuition cost) and any of the independent variables (rating, size and type of school).

H1: There is a linear relationship between the response variable (tuition cost) and at least one of the independent variables (rating, size and type of school).

Report: A multiple linear regression was used to predict tuition cost from rating, size and type of school. The ANOVA table gives F(3, 16) = 1686.01, p < 0.01. Because the p-value is below both the 5% and the 1% level of significance, we reject the null hypothesis and conclude that at least one of the independent variables (rating, size and type of school) is related to tuition cost, so the model is useful overall for predicting tuition.

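C. R for this model is 0.998. R is the multiple correlation coefficient: the correlation between the observed tuition costs and the tuition costs predicted by the model from rating, school size and type of school. A value this close to 1 indicates a very strong linear association. Equivalently, R^2 = 0.998^2 ≈ 0.996, so about 99.6% of the variation in tuition cost among these 20 schools is explained by rating, school size and type of school together.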
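As a consistency check, R can be recovered from the overall F statistic using the standard identity R^2 = kF / (kF + n - k - 1). The sketch below assumes the F value and degrees of freedom quoted in part B; the variable names are illustrative.

```python
import math

f_stat = 1686.01  # overall F statistic quoted in part B
df_model = 3      # k: number of predictors
df_resid = 16     # n - k - 1 = 20 - 3 - 1

# Rearranging F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) for R^2.
r_squared = (f_stat * df_model) / (f_stat * df_model + df_resid)
multiple_r = math.sqrt(r_squared)

print(round(r_squared, 4), round(multiple_r, 4))
```

The result agrees with the reported R = 0.998 up to rounding.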

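D. 95% confidence interval for β_2

From the printout, the estimate of β_2 is b_2 = -0.0192, with t_cal = -1.2 and S.E.(b_2) = 0.0161.

The interval is b_2 ± t_(0.025, 16) × S.E.(b_2). The error degrees of freedom are n - k - 1 = 20 - 3 - 1 = 16, and the critical value from the t table is t_(0.025, 16) = 2.120.

Hence we have

-0.0192 ± (2.120 × 0.0161) = -0.0192 ± 0.0341

95% confidence interval for β_2

(-0.0192 - 0.0341, -0.0192 + 0.0341)

(-0.0533, 0.0149)

Interpretation: We are 95% confident that the true slope for student size lies between -0.0533 and 0.0149 thousand dollars per thousand undergraduates, holding rating and type of school constant. Because the interval contains 0, student size is not a statistically significant predictor of tuition at the α = 5% level once rating and type of school are in the model.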
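The usual interval, estimate ± critical t × standard error, can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the critical value 2.120 is the t-table entry for 16 degrees of freedom (n - k - 1 = 20 - 3 - 1), hard-coded here rather than computed.

```python
b2 = -0.0192    # estimated coefficient for size, from the printout
se_b2 = 0.0161  # its standard error, from the printout
t_crit = 2.120  # assumed t-table value: 0.975 quantile, 16 degrees of freedom

margin = t_crit * se_b2
lower, upper = b2 - margin, b2 + margin

# If 0 lies inside the interval, size is not significant at the 5% level.
contains_zero = lower <= 0 <= upper
print(round(lower, 4), round(upper, 4), contains_zero)
```

An interval that straddles zero is consistent with the small calculated t statistic (-1.2) for this coefficient.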

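E. H0: β_3 ≤ 0

H1: β_3 > 0

H0: For schools of the same rating and size, private schools do not cost more than public schools.

H1: For schools of the same rating and size, private schools cost more than public schools.

A one-sided (upper-tail) alternative is used because the question asks specifically whether private schools cost more, not merely whether the costs differ.

From the printout, t = 50.21 and the p-value is 0.001.

Conclusion: Since the p-value is less than the 5% level of significance, we reject the null hypothesis and conclude that private schools cost significantly more than public schools, by an estimated 16.858 thousand dollars for schools of the same rating and size.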

**Question 2**

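A. **Hence, the full logistic regression model is**

ln(p / (1 - p)) = 5.3843 + 1.0192x1 + 0.0215x2 - 0.4361x3 + 0.226x4 - 0.2293x5

where p is the probability of having hypercholesterolemia, x1 = famhist, x2 = average daily carbohydrate consumption (grams), x3 = exercise, x4 = highbp and x5 = gender. Equivalently, in terms of the odds:

p / (1 - p) = e^(5.3843 + 1.0192x1 + 0.0215x2 - 0.4361x3 + 0.226x4 - 0.2293x5)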

B. Odds ratio for exercise = e^β

= e^(-0.4361) = 0.647

Since the odds ratio for the exercise variable is 0.647, the odds of having hypercholesterolemia for people who exercise regularly are 0.647 times (about 35% lower than) the odds for people who do not exercise regularly, holding family history, diet, high blood pressure and gender constant.
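The exponentiation can be verified directly. The sketch below takes the exercise coefficient -0.4361 from the fitted model; a negative coefficient always gives an odds ratio below 1.

```python
import math

beta_exercise = -0.4361  # exercise coefficient from the fitted model

or_exercise = math.exp(beta_exercise)
# Percentage change in the odds for exercisers relative to non-exercisers.
percent_change = (or_exercise - 1) * 100

print(round(or_exercise, 3), round(percent_change, 1))
```

A negative `percent_change` means the odds are lower for people who exercise regularly.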

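C. Odds ratio for a 30 g reduction in daily carbohydrate intake = e^(-30β)

= e^(-30 × 0.0215) = e^(-0.645) = 0.525

The odds ratio means that lowering carbohydrate intake by 30 g per day multiplies the odds of hypercholesterolemia by about 0.525, a reduction of roughly 48%, holding the other variables constant.

95% C.I. for the OR.

First find the 95% C.I. for β, using z = 1.96 and SE(β) = 0.0018:

0.0215 ± (1.96 × 0.0018) = 0.0215 ± 0.0035 = (0.0180, 0.0250)

Then multiply the endpoints by -30 and exponentiate (using unrounded endpoints):

95% C.I. = (e^(-30 × 0.0250), e^(-30 × 0.0180))

= (e^(-0.7508), e^(-0.5392))

= (0.472, 0.583)

This means we are 95% confident that a 30 g reduction in daily carbohydrate intake multiplies the odds of hypercholesterolemia by between 0.472 and 0.583, a decrease of about 42% to 53%. Because the interval lies entirely below 1, the reduction in odds is statistically significant at α = 5%.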
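For a change of more than one unit, the interval is built on the log-odds scale and then exponentiated. The sketch below assumes a Wald interval with z = 1.96 and uses the carbohydrate coefficient (0.0215) and standard error (0.0018) quoted in this solution; variable names are illustrative.

```python
import math

beta_carb = 0.0215  # carbohydrate coefficient (per gram)
se_carb = 0.0018    # its standard error
delta = -30         # change in daily carbohydrate intake, in grams
z = 1.96            # assumed normal critical value for a 95% Wald interval

# Work on the log-odds scale, then exponentiate the endpoints.
log_or = delta * beta_carb
half_width = z * abs(delta) * se_carb

or_point = math.exp(log_or)
ci_low = math.exp(log_or - half_width)
ci_high = math.exp(log_or + half_width)

print(round(or_point, 3), round(ci_low, 3), round(ci_high, 3))
```

Exponentiating the endpoints, rather than adding a margin to the odds ratio itself, keeps the interval positive and asymmetric around the point estimate, as an odds-ratio interval should be.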