# Steps for Completing Residual Analysis Homework in R

## Assignment: Residual Analysis

August 7, 2021

### Introduction

This work formulates a predictive model for a student's chance of admission, based on the explanatory variables listed in the next section. The dataset contains records for 400 students, including their GRE scores, TOEFL scores and CGPA, among other variables.

### Data Description

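The variables in the dataset, collected from 400 students, are described below.

- GRE.Score: the student's GRE score
- TOEFL.Score: the student's TOEFL score
- University.Rating: the rating of the university
- SOP: strength of the student's Statement of Purpose
- LOR: strength of the student's Letter of Recommendation
- CGPA: the student's cumulative grade point average
- Research: whether the student has done any research work before the study
- Chance.of.Admit: the student's chance of admission (the response variable)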

The dataset has no missing values, and the scatter plots above show no obvious outliers. This gives us the go-ahead to model the data.
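A minimal sketch of loading and screening the data in R. The original does not name the data file, so the `read.csv()` call is shown commented out with a hypothetical file name, and a hypothetical helper, `make_fake_admissions()`, builds a synthetic data frame `dat` with the same column names instead. Its values are random and are not the study's data.

```r
# Hypothetical helper: random stand-in data with the same column names as the
# admissions file. The values are synthetic and are NOT the study's data.
make_fake_admissions <- function(n = 400, seed = 2021) {
  set.seed(seed)
  d <- data.frame(
    GRE.Score = round(rnorm(n, 316, 11)),
    TOEFL.Score = round(rnorm(n, 107, 6)),
    University.Rating = sample(1:5, n, replace = TRUE),
    SOP = sample(seq(1, 5, by = 0.5), n, replace = TRUE),
    LOR = sample(seq(1, 5, by = 0.5), n, replace = TRUE),
    CGPA = round(rnorm(n, 8.6, 0.6), 2),
    Research = rbinom(n, 1, 0.5)
  )
  # Arbitrary linear signal plus noise, only so that lm() has something to fit
  d$Chance.of.Admit <- with(d, -1.3 + 0.002 * GRE.Score + 0.003 * TOEFL.Score +
                              0.1 * CGPA + 0.03 * LOR + rnorm(n, sd = 0.07))
  d
}

# With the real file this would be something like (file name is hypothetical):
# dat <- read.csv("Admission_Predict.csv")
dat <- make_fake_admissions()

# Missing values per column: all zeros means no missing data
na_counts <- colSums(is.na(dat))
print(na_counts)

# Scatter-plot matrix to look for linear trends and obvious outliers
pairs(dat, pch = 20, cex = 0.5)
```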

Correlation matrix of the variables:

| | GRE.Score | TOEFL.Score | University.Rating | SOP | LOR | CGPA | Research | Chance.of.Admit |
|---|---|---|---|---|---|---|---|---|
| GRE.Score | 1.0000000 | 0.8359768 | 0.6689759 | 0.6128307 | 0.5575545 | 0.8330605 | 0.5803906 | 0.8026105 |
| TOEFL.Score | 0.8359768 | 1.0000000 | 0.6955898 | 0.6579805 | 0.5677209 | 0.8284174 | 0.4898579 | 0.7915940 |
| University.Rating | 0.6689759 | 0.6955898 | 1.0000000 | 0.7345228 | 0.6601235 | 0.7464787 | 0.4477825 | 0.7112503 |
| SOP | 0.6128307 | 0.6579805 | 0.7345228 | 1.0000000 | 0.7295925 | 0.7181440 | 0.4440288 | 0.6757319 |
| LOR | 0.5575545 | 0.5677209 | 0.6601235 | 0.7295925 | 1.0000000 | 0.6702113 | 0.3968593 | 0.6698888 |
| CGPA | 0.8330605 | 0.8284174 | 0.7464787 | 0.7181440 | 0.6702113 | 1.0000000 | 0.5216542 | 0.8732891 |
| Research | 0.5803906 | 0.4898579 | 0.4477825 | 0.4440288 | 0.3968593 | 0.5216542 | 1.0000000 | 0.5532021 |
| Chance.of.Admit | 0.8026105 | 0.7915940 | 0.7112503 | 0.6757319 | 0.6698888 | 0.8732891 | 0.5532021 | 1.0000000 |
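A sketch of how a correlation matrix like the one above can be computed with `cor()`. It runs on synthetic stand-in data from the hypothetical helper `make_fake_admissions()`, so the printed values will differ from the table; with the real data, any identifier column in the file should be dropped before calling `cor()`.

```r
# Hypothetical helper: random stand-in data with the same column names as the
# admissions file. The values are synthetic and are NOT the study's data.
make_fake_admissions <- function(n = 400, seed = 2021) {
  set.seed(seed)
  d <- data.frame(
    GRE.Score = round(rnorm(n, 316, 11)),
    TOEFL.Score = round(rnorm(n, 107, 6)),
    University.Rating = sample(1:5, n, replace = TRUE),
    SOP = sample(seq(1, 5, by = 0.5), n, replace = TRUE),
    LOR = sample(seq(1, 5, by = 0.5), n, replace = TRUE),
    CGPA = round(rnorm(n, 8.6, 0.6), 2),
    Research = rbinom(n, 1, 0.5)
  )
  # Arbitrary linear signal plus noise
  d$Chance.of.Admit <- with(d, -1.3 + 0.002 * GRE.Score + 0.003 * TOEFL.Score +
                              0.1 * CGPA + 0.03 * LOR + rnorm(n, sd = 0.07))
  d
}

dat <- make_fake_admissions()

# Pearson correlation between every pair of columns
cor_mat <- cor(dat)
print(round(cor_mat, 4))

# Correlation of each variable with the response, strongest first
r_with_response <- sort(cor_mat[, "Chance.of.Admit"], decreasing = TRUE)
print(r_with_response)
```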

### Linear Regression Model

Since the correlation matrix shows a positive linear relationship between the response variable (chance of admission) and each explanatory variable, with correlations ranging from 0.55 (Research) to 0.87 (CGPA), we split the dataset into two groups: the first group is used to fit the model and test its adequacy, and the second is used for prediction.
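The residual degrees of freedom of the first model (193, with six predictors plus an intercept) imply 200 training rows, i.e. a 50/50 split of the 400 students. A minimal sketch of such a random split follows; the seed and the use of `sample()` are assumptions, since the original does not show how the split was made, and `make_fake_admissions()` is a hypothetical synthetic-data helper. Passing `data = train` to `lm()` also makes `attach()` unnecessary, which avoids the "objects are masked" messages that repeated `attach()` calls print.

```r
# Hypothetical helper: random stand-in data with the same column names as the
# admissions file. The values are synthetic and are NOT the study's data.
make_fake_admissions <- function(n = 400, seed = 2021) {
  set.seed(seed)
  d <- data.frame(
    GRE.Score = round(rnorm(n, 316, 11)),
    TOEFL.Score = round(rnorm(n, 107, 6)),
    University.Rating = sample(1:5, n, replace = TRUE),
    SOP = sample(seq(1, 5, by = 0.5), n, replace = TRUE),
    LOR = sample(seq(1, 5, by = 0.5), n, replace = TRUE),
    CGPA = round(rnorm(n, 8.6, 0.6), 2),
    Research = rbinom(n, 1, 0.5)
  )
  d$Chance.of.Admit <- with(d, -1.3 + 0.002 * GRE.Score + 0.003 * TOEFL.Score +
                              0.1 * CGPA + 0.03 * LOR + rnorm(n, sd = 0.07))
  d
}

dat <- make_fake_admissions()

set.seed(123)  # arbitrary seed so the split is reproducible
train_idx <- sample(seq_len(nrow(dat)), size = nrow(dat) / 2)
train <- dat[train_idx, ]   # used to fit the model
test  <- dat[-train_idx, ]  # held back for prediction
print(c(train = nrow(train), test = nrow(test)))
```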


Model fitted to the training data: `lm(formula = Chance.of.Admit ~ GRE.Score + TOEFL.Score + University.Rating + CGPA + SOP + LOR, data = train)`

Residuals:

| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -0.26231 | -0.02965 | 0.01063 | 0.04823 | 0.15903 |

Coefficients:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -1.5419997 | 0.1785288 | -8.637 | 2.16e-15 | `***` |
| GRE.Score | 0.0023330 | 0.0009709 | 2.403 | 0.017205 | `*` |
| TOEFL.Score | 0.0042376 | 0.0019930 | 2.126 | 0.034754 | `*` |
| University.Rating | 0.0073669 | 0.0069980 | 1.053 | 0.293791 | |
| CGPA | 0.1103004 | 0.0189598 | 5.818 | 2.45e-08 | `***` |
| SOP | -0.0057258 | 0.0081384 | -0.704 | 0.482560 | |
| LOR | 0.0308114 | 0.0092193 | 3.342 | 0.000999 | `***` |

Signif. codes: `0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1`

Residual standard error: 0.0753 on 193 degrees of freedom; multiple R-squared: 0.7745, adjusted R-squared: 0.7674; F-statistic: 110.5 on 6 and 193 DF, p-value < 2.2e-16.

Interpretation:

The R-squared value of 0.7745 means that about 77.45% of the total variation in chance of admission is explained by the variables in the model, while the remaining 22.55% is attributable to other factors.
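The figures in the summary are linked by standard formulas: with $p$ predictors and $n - p - 1$ residual degrees of freedom, adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)$ and $F = (R^2/p) / ((1 - R^2)/(n - p - 1))$. The short check below (the helper `fit_stats()` is hypothetical) uses only the reported numbers to recover the first model's sample size and unexplained share, and, up to the rounding of $R^2$, its adjusted $R^2$ and F statistic.

```r
# Hypothetical helper: derive n, the unexplained share, adjusted R-squared and
# the overall F statistic from R-squared, the number of predictors p and the
# residual degrees of freedom reported by summary(lm(...)).
fit_stats <- function(r2, p, df_resid) {
  n <- df_resid + p + 1
  c(n = n,
    unexplained_pct = 100 * (1 - r2),
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid,
    F = (r2 / p) / ((1 - r2) / df_resid))
}

# Figures reported for the first (training) model
train_stats <- fit_stats(r2 = 0.7745, p = 6, df_resid = 193)
print(round(train_stats, 4))
```

Applied to the second model's figures (R-squared 0.7979, four predictors, 395 residual degrees of freedom), the same formulas give n = 400.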

The results above show that University.Rating and SOP are not significant predictors of chance of admission, since their p-values (0.294 and 0.483) exceed the 5% level of significance. We therefore remove these two variables and refit the model on the second set of data.
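University.Rating and SOP were removed on the basis of their individual t-tests. A common complement (not necessarily what the original analysis did) is to refit without both terms using `update()` and test them jointly with a partial F-test via `anova()` on the nested models. The sketch uses the hypothetical synthetic-data helper `make_fake_admissions()`, so its printed numbers will not match the tables above.

```r
# Hypothetical helper: random stand-in data with the same column names as the
# admissions file. The values are synthetic and are NOT the study's data.
make_fake_admissions <- function(n = 400, seed = 2021) {
  set.seed(seed)
  d <- data.frame(
    GRE.Score = round(rnorm(n, 316, 11)),
    TOEFL.Score = round(rnorm(n, 107, 6)),
    University.Rating = sample(1:5, n, replace = TRUE),
    SOP = sample(seq(1, 5, by = 0.5), n, replace = TRUE),
    LOR = sample(seq(1, 5, by = 0.5), n, replace = TRUE),
    CGPA = round(rnorm(n, 8.6, 0.6), 2),
    Research = rbinom(n, 1, 0.5)
  )
  d$Chance.of.Admit <- with(d, -1.3 + 0.002 * GRE.Score + 0.003 * TOEFL.Score +
                              0.1 * CGPA + 0.03 * LOR + rnorm(n, sd = 0.07))
  d
}

dat <- make_fake_admissions()
set.seed(123)  # arbitrary seed for a reproducible 50/50 split
train <- dat[sample(seq_len(nrow(dat)), size = nrow(dat) / 2), ]

# Six-predictor model, as in the first summary
full <- lm(Chance.of.Admit ~ GRE.Score + TOEFL.Score + University.Rating +
             CGPA + SOP + LOR, data = train)

# Same model without the two non-significant terms
reduced <- update(full, . ~ . - University.Rating - SOP)
print(summary(reduced)$coefficients)

# Partial F-test: do University.Rating and SOP add anything jointly?
comparison <- anova(reduced, full)
print(comparison)
```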

### Residual Analysis Based on the Training Data


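The residual plots indicate that the model's residuals are approximately normally distributed. This supports the normality assumption behind the model's t-tests and p-values, although normal residuals alone do not show how well the model predicts new data.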
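The residual plots are not reproduced in this text. A sketch of how such plots are typically produced for an `lm` fit in base R, using the six-predictor training model; the Shapiro-Wilk test via `shapiro.test()` is an optional formal complement to the plots, not necessarily part of the original analysis. As before, `make_fake_admissions()` is a hypothetical helper producing synthetic data.

```r
# Hypothetical helper: random stand-in data with the same column names as the
# admissions file. The values are synthetic and are NOT the study's data.
make_fake_admissions <- function(n = 400, seed = 2021) {
  set.seed(seed)
  d <- data.frame(
    GRE.Score = round(rnorm(n, 316, 11)),
    TOEFL.Score = round(rnorm(n, 107, 6)),
    University.Rating = sample(1:5, n, replace = TRUE),
    SOP = sample(seq(1, 5, by = 0.5), n, replace = TRUE),
    LOR = sample(seq(1, 5, by = 0.5), n, replace = TRUE),
    CGPA = round(rnorm(n, 8.6, 0.6), 2),
    Research = rbinom(n, 1, 0.5)
  )
  d$Chance.of.Admit <- with(d, -1.3 + 0.002 * GRE.Score + 0.003 * TOEFL.Score +
                              0.1 * CGPA + 0.03 * LOR + rnorm(n, sd = 0.07))
  d
}

dat <- make_fake_admissions()
set.seed(123)
train <- dat[sample(seq_len(nrow(dat)), size = nrow(dat) / 2), ]

fit <- lm(Chance.of.Admit ~ GRE.Score + TOEFL.Score + University.Rating +
            CGPA + SOP + LOR, data = train)
res <- residuals(fit)

# Default lm diagnostics: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Histogram as a second visual check of normality
hist(res, breaks = 20, main = "Histogram of residuals", xlab = "Residual")

# Optional formal test: a small p-value suggests non-normal residuals
sw <- shapiro.test(res)
print(sw)
```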

### Modelling the Test Dataset


Model refitted with the four retained predictors: `lm(formula = Chance.of.Admit ~ GRE.Score + TOEFL.Score + CGPA + LOR, data = test)`

Residuals:

| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -0.279714 | -0.022678 | 0.009575 | 0.036309 | 0.160523 |

Coefficients:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -1.4630686 | 0.1057458 | -13.836 | < 2e-16 | `***` |
| GRE.Score | 0.0023179 | 0.0005761 | 4.023 | 6.88e-05 | `***` |
| TOEFL.Score | 0.0029252 | 0.0010761 | 2.718 | 0.00685 | `**` |
| CGPA | 0.1228233 | 0.0118475 | 10.367 | < 2e-16 | `***` |
| LOR | 0.0239713 | 0.0048405 | 4.952 | 1.09e-06 | `***` |

Signif. codes: `0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1`

Residual standard error: 0.06443 on 395 degrees of freedom; multiple R-squared: 0.7979, adjusted R-squared: 0.7959; F-statistic: 389.9 on 4 and 395 DF, p-value < 2.2e-16.

Interpretation

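With the result above, the coefficient of multiple determination has increased to 0.7979, which means that almost 80% of the variation in chance of admission is explained by the retained variables, and the adjusted R-squared has risen from 0.7674 to 0.7959. Dropping University.Rating and SOP therefore did not reduce the model's explanatory power. Note, however, that the residual degrees of freedom (395, with four predictors plus an intercept) imply that this model was fitted to 400 observations, i.e. the full dataset rather than only the second group, so the two R-squared values are computed on different samples. The fitted model is: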

$$\widehat{\text{Chance of admission}} = -1.4631 + 0.002318\,\text{GRE.Score} + 0.002925\,\text{TOEFL.Score} + 0.12282\,\text{CGPA} + 0.02397\,\text{LOR}$$

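All four coefficients are positive, so a higher value of any of these predictors raises the predicted chance of admission when the others are held fixed. CGPA has the largest per-unit coefficient (about 0.123 per CGPA point), followed by LOR (about 0.024 per point), while each GRE or TOEFL point adds only about 0.002 to 0.003. This suggests that a student with a high CGPA and a strong LOR, together with moderate TOEFL or GRE results, has a better chance of being admitted.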
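As a worked example of the fitted equation, the sketch below computes the predicted chance of admission for a hypothetical applicant (GRE 320, TOEFL 110, CGPA 9.0, LOR 4.0; illustrative values, not a student from the dataset) from the coefficients in the second summary. The result is about 0.80. With a fitted `lm` object, `predict(fit, newdata = ...)` performs the same calculation.

```r
# Coefficients of the reduced model, copied from the second summary table
b <- c(intercept = -1.4630686, GRE = 0.0023179, TOEFL = 0.0029252,
       CGPA = 0.1228233, LOR = 0.0239713)

# Hypothetical applicant (illustrative values only)
student <- c(GRE = 320, TOEFL = 110, CGPA = 9.0, LOR = 4.0)

# Intercept plus the sum of coefficient * predictor value
chance <- b["intercept"] + sum(b[names(student)] * student)
cat(sprintf("Predicted chance of admission: %.3f\n", chance))
```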

### Residual Analysis of the New Dataset


The residual plots for this model also suggest that its residuals are approximately normally distributed.
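R-squared computed on the same data a model was fitted to does not by itself measure predictive power. A more direct check (an addition, not the procedure used in this assignment) is to fit the reduced model on `train` only, predict the held-out `test` rows with `predict()`, and compute the out-of-sample RMSE and R-squared. The sketch relies on the hypothetical synthetic-data helper `make_fake_admissions()`, so its numbers say nothing about the real data.

```r
# Hypothetical helper: random stand-in data with the same column names as the
# admissions file. The values are synthetic and are NOT the study's data.
make_fake_admissions <- function(n = 400, seed = 2021) {
  set.seed(seed)
  d <- data.frame(
    GRE.Score = round(rnorm(n, 316, 11)),
    TOEFL.Score = round(rnorm(n, 107, 6)),
    University.Rating = sample(1:5, n, replace = TRUE),
    SOP = sample(seq(1, 5, by = 0.5), n, replace = TRUE),
    LOR = sample(seq(1, 5, by = 0.5), n, replace = TRUE),
    CGPA = round(rnorm(n, 8.6, 0.6), 2),
    Research = rbinom(n, 1, 0.5)
  )
  d$Chance.of.Admit <- with(d, -1.3 + 0.002 * GRE.Score + 0.003 * TOEFL.Score +
                              0.1 * CGPA + 0.03 * LOR + rnorm(n, sd = 0.07))
  d
}

dat <- make_fake_admissions()
set.seed(123)
train_idx <- sample(seq_len(nrow(dat)), size = nrow(dat) / 2)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit the reduced model on the training half only
fit <- lm(Chance.of.Admit ~ GRE.Score + TOEFL.Score + CGPA + LOR, data = train)

# Predict the held-out half and measure the error
pred <- predict(fit, newdata = test)
err <- test$Chance.of.Admit - pred
rmse <- sqrt(mean(err^2))
r2_out <- 1 - sum(err^2) /
  sum((test$Chance.of.Admit - mean(test$Chance.of.Admit))^2)
cat(sprintf("Held-out RMSE = %.4f, held-out R-squared = %.4f\n", rmse, r2_out))
```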