Linear Correlation with Regression Rate


Solution

Q1.

(a)

There are 19 variables in total in the College dataset, of which the first 2 columns are non-numerical attributes (the college name and Private). The 19th column is the graduation rate, Grad.Rate.

So there are 16 numerical attributes other than the graduation rate (columns 3 to 18). For each of them we report the estimated intercept and estimated slope of the simple linear regression of Grad.Rate on that attribute.

The matrix below gives these estimates. The ith row corresponds to the ith numerical attribute (the (i+2)th column of the College dataset); its 1st column is the estimated intercept and its 2nd column is the estimated slope.

Attribute     Intercept   Slope
Apps          63.50816    0.0006513635
Accept        64.51098    0.0004717347
Enroll        65.78546   -0.0004130195
Top10perc     52.17990    0.4820071339
Top25perc     42.36514    0.4139706867
F.Undergrad   66.49550   -0.0002789742
P.Undergrad   67.94348   -0.0028997609
Outstate      39.99511    0.0024393270
Room.Board    36.45998    0.0066559190
Books         65.40268    0.0001103764
Personal      74.62449   -0.0068334183
PhD           42.14600    0.3209089940
Terminal      38.53865    0.3378137260
S.F.Ratio     84.21679   -1.3310049683
perc.alumni   49.98633    0.6804899127
Expend        53.05884    0.0012840848
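Each row of the matrix comes from a simple least squares fit. As a minimal sketch (on simulated data, since College.csv is not reproduced here; x and y are illustrative stand-ins for one attribute and Grad.Rate), the two estimates have the closed form slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x), which agrees with coef(lm()):

```r
# Simulated stand-in for one attribute (x) and Grad.Rate (y); NOT the College data
set.seed(1)
x <- runif(100, min = 0, max = 100)
y <- 40 + 0.3 * x + rnorm(100, sd = 10)

# Closed-form least squares estimates for the line y = b0 + b1 * x
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)

fit <- lm(y ~ x)
print(c(intercept = b0, slope = b1))
print(coef(fit)) # the same two numbers, computed by lm()
```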

(b)

We compute the correlation between each of the 16 numerical attributes and the graduation rate using the cor() function in R.

We then rank the attributes by the magnitude (absolute value) of their correlation with the graduation rate.

The attributes, in descending order of |correlation|, are:

  1. Outstate
  2. Top10perc
  3. perc.alumni
  4. Top25perc
  5. Room.Board
  6. Expend
  7. S.F.Ratio
  8. PhD
  9. Terminal
  10. Personal
  11. P.Undergrad
  12. Apps
  13. F.Undergrad
  14. Accept
  15. Enroll
  16. Books
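The ranking step can be sketched as follows, on simulated data with three hypothetical attributes x1, x2, x3 (not the College columns). Sorting by abs() ranks a strong negative correlation above a weak positive one, and the last lines check the identity slope = r * sd(response) / sd(attribute), which links the correlations here to the slopes in (a):

```r
# Simulated response and three hypothetical attributes x1, x2, x3 (illustrative only)
set.seed(2)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$resp <- 2 * dat$x1 - 1 * dat$x2 + 0.1 * dat$x3 + rnorm(n)

attrs <- c("x1", "x2", "x3")
r <- sapply(attrs, function(v) cor(dat[[v]], dat$resp)) # named vector of correlations
ranking <- names(sort(abs(r), decreasing = TRUE))       # rank by magnitude, ignoring sign
print(round(r, 3))
print(ranking)

# In simple regression the slope equals r * sd(response) / sd(attribute)
slope_x1 <- coef(lm(resp ~ x1, data = dat))[["x1"]]
print(c(slope_x1, r[["x1"]] * sd(dat$resp) / sd(dat$x1)))
```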

(c)

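After centering every numerical variable (subtracting its mean, including from Grad.Rate) and repeating steps (a) and (b), the estimated slopes and the correlations are the same as on the original scale, because neither changes when a constant is added to or subtracted from x or y. The estimated intercepts, however, all become 0 (up to rounding error): the least squares line always passes through the point of means, and after centering that point is the origin. The slope column of the matrix in (a) and the ranking in (b) are therefore unchanged.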
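A small check of this invariance, as a sketch on simulated data (x and y are illustrative, not College columns): centering both variables leaves the slope and the correlation unchanged and drives the intercept to zero.

```r
# Simulated attribute x and response y (illustrative only)
set.seed(3)
x <- rnorm(50, mean = 100, sd = 15)
y <- 60 + 0.2 * x + rnorm(50, sd = 5)

fit_raw <- lm(y ~ x)
xc <- x - mean(x)                 # centred attribute
yc <- y - mean(y)                 # centred response
fit_centered <- lm(yc ~ xc)

print(coef(fit_raw))
print(coef(fit_centered))         # same slope, intercept numerically 0
print(c(cor(x, y), cor(xc, yc)))  # same correlation
```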

(d)

We transformed all the variables of the data set (16 numerical attributes and the graduation rate) into

  • Their square root values
  • Their square values
  • Their log values

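Unlike the centering in (c), these transformations are nonlinear, so in general they change the estimated slopes and intercepts in the matrix of part (a) and the correlations in part (b). The ranking of the attributes can change as well, because Pearson correlation is not invariant under nonlinear transformations. The log transformation also requires strictly positive values: an attribute containing zeros gives log(0) = -Inf, which lm() and cor() cannot use, so such a column needs an offset (for example log(x + 1)) or must be excluded.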
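To see that these transformations change the estimates, here is a sketch on simulated, strictly positive data (x and y are illustrative, not College columns); it recomputes the intercept, slope and correlation after each transformation of both variables:

```r
# Simulated positive-valued attribute x and response y (illustrative only)
set.seed(4)
x <- runif(100, min = 5, max = 50)
y <- 10 + 2 * x + rnorm(100, sd = 3)

# Intercept, slope ("u") and correlation ("r") for one pair of variables
fit_stats <- function(u, v) c(coef(lm(v ~ u)), r = cor(u, v))

res <- rbind(raw    = fit_stats(x, y),
             sqrt   = fit_stats(sqrt(x), sqrt(y)),
             square = fit_stats(x^2, y^2),
             log    = fit_stats(log(x), log(y)))
print(round(res, 4)) # each row differs from the raw fit
```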

(e)

The top 4 attributes from the ranking in (b), each plotted against the graduation rate, are:

  1. Outstate
  2. Top10perc
  3. perc.alumni
  4. Top25perc

The plot is attached below as required.

Q2.

(a)

Call:
lm(formula = mpg ~ horsepower, data = auto)

Residuals:
     Min       1Q   Median       3Q      Max
-13.5710  -3.2592  -0.3435   2.7630  16.9240

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.935861   0.717499   55.66   <2e-16 ***
horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.906 on 390 degrees of freedom
Multiple R-squared:  0.6059,   Adjusted R-squared:  0.6049
F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16

(b)

(i)

Yes, there is a relationship between the predictor and the response: the F-statistic is 599.7 on 1 and 390 DF, with a p-value < 2.2e-16.

(ii)

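The relationship is fairly strong: R-squared is 0.6059, so horsepower explains about 60.6% of the variance in mpg, with a residual standard error of 4.906 mpg. (The very low p-value shows that the relationship is statistically significant, but on its own it does not measure how strong the relationship is.)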

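(iii)

The relationship between the predictor and the response is negative, as the estimated slope (-0.157845) is negative: each additional unit of horsepower is associated with an average decrease of about 0.158 mpg.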

(iv)

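With a horsepower of 98, the predicted value of mpg is about 24.467 (39.935861 - 0.157845 × 98).

The associated 95% confidence interval for the mean mpg is 23.97308 to 24.96108, and the 95% prediction interval for the mpg of an individual car is 14.8094 to 34.12476 (from predict() with interval = "confidence" and interval = "prediction", respectively).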
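The point prediction is plain arithmetic on the coefficients in (a). The two intervals use the standard simple-regression formulas, with standard error s * sqrt(1/n + (x0 - mean(x))^2 / Sxx) for the mean response and s * sqrt(1 + 1/n + (x0 - mean(x))^2 / Sxx) for a new observation. The sketch below checks those formulas against predict() on simulated data (hp and mpg here are simulated stand-ins, not the Auto data):

```r
# Point prediction from the coefficients reported in (a)
b0 <- 39.935861
b1 <- -0.157845
yhat_98 <- b0 + b1 * 98
print(yhat_98) # about 24.467

# Closed-form 95% intervals at x0, checked against predict() on simulated data
set.seed(5)
hp  <- runif(392, 46, 230)
mpg <- 40 - 0.16 * hp + rnorm(392, sd = 5)
fit <- lm(mpg ~ hp)

x0   <- 98
n    <- length(hp)
s    <- summary(fit)$sigma                 # residual standard error
sxx  <- sum((hp - mean(hp))^2)
fit0 <- sum(coef(fit) * c(1, x0))          # fitted value at x0
tq   <- qt(0.975, df = n - 2)
se_mean <- s * sqrt(1 / n + (x0 - mean(hp))^2 / sxx)     # mean response
se_pred <- s * sqrt(1 + 1 / n + (x0 - mean(hp))^2 / sxx) # new observation

ci       <- fit0 + c(-1, 1) * tq * se_mean
pred_int <- fit0 + c(-1, 1) * tq * se_pred
print(ci);       print(predict(fit, data.frame(hp = x0), interval = "confidence"))
print(pred_int); print(predict(fit, data.frame(hp = x0), interval = "prediction"))
```

The prediction interval is always wider, because it adds the irreducible noise of a single car (the extra 1 under the square root) to the uncertainty in the fitted line.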

(c)

(d)

To test for a linear relationship between horsepower (predictor) and mpg (response), we test the null hypothesis H0: β1 = 0 (no relationship, so any apparent association is due to chance) against Ha: β1 ≠ 0. The t-statistic for horsepower is -24.49 with a p-value below 2e-16 (flagged with three asterisks), so H0 is rejected: the relationship between horsepower and mpg is highly statistically significant.
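The t value in the summary is simply the estimate divided by its standard error, and the p-value is the two-sided tail probability of a t distribution with the residual degrees of freedom. A minimal check using the numbers from (a):

```r
est      <- -0.157845  # horsepower estimate from (a)
se       <- 0.006446   # its standard error
df_resid <- 390        # residual degrees of freedom

t_stat <- est / se
p_val  <- 2 * pt(-abs(t_stat), df = df_resid) # two-sided p-value
print(t_stat) # about -24.49, as in the lm() summary
print(p_val)  # far below 2e-16
```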

Q3.

(a)

The scatterplot matrix of all the variables is as follows:

(b)

Call:
lm(formula = mpg ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data = auto)

Residuals:
    Min      1Q  Median      3Q     Max
-9.5903 -2.1565 -0.1169  1.8690 13.0604

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)  -17.218435   4.644294  -3.707  0.00024 ***
cylinders     -0.493376   0.323282  -1.526  0.12780
displacement   0.019896   0.007515   2.647  0.00844 **
horsepower    -0.016951   0.013787  -1.230  0.21963
weight        -0.006474   0.000652  -9.929  < 2e-16 ***
acceleration   0.080576   0.098845   0.815  0.41548
year           0.750773   0.050973  14.729  < 2e-16 ***
origin         1.426141   0.278136   5.127 4.67e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.328 on 384 degrees of freedom
Multiple R-squared:  0.8215,   Adjusted R-squared:  0.8182
F-statistic: 252.4 on 7 and 384 DF,  p-value: < 2.2e-16

(c)

  • Yes, there is a relationship between the predictors and the response: the F-statistic is 252.4 on 7 and 384 DF (p-value < 2.2e-16). However, cylinders, horsepower and acceleration are not statistically significant once the other predictors are in the model.
  • The predictors displacement, weight, year and origin appear to be statistically significant for the response variable mpg.
  • The coefficient for year suggests that, holding the other predictors fixed, mpg increases on average by 0.750773 for each additional model year.
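The overall F-statistic in the first point can be recovered from R-squared alone, via F = (R² / p) / ((1 - R²) / (n - p - 1)) with p = 7 predictors and n - p - 1 = 384 residual degrees of freedom. A quick sketch with the reported values (R-squared is rounded to 4 digits, so the result matches 252.4 only approximately):

```r
r2 <- 0.8215 # multiple R-squared from (b)
p  <- 7      # number of predictors
n  <- 392    # observations: 384 residual DF + 7 slopes + 1 intercept

f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_val  <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)
print(f_stat) # about 252.5, vs 252.4 reported
print(p_val)
```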

(d)

The residual plots obtained for the linear regression fit using the plot() function are below.

The plots suggest that the assumptions of the multiple linear regression employed here are satisfied.

The plots also show a few unusually large outliers: observations 321, 324 and 325.
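Outliers can also be flagged numerically with studentized residuals, for which an absolute value above 3 is a common rule of thumb. A sketch on simulated data with one planted outlier (observation 25 here is illustrative and unrelated to the Auto indices above):

```r
set.seed(8)
x <- rnorm(60)
y <- 1 + 2 * x + rnorm(60)
y[25] <- y[25] + 8          # plant one clear outlier at observation 25 (illustrative)

fit <- lm(y ~ x)
rs <- rstudent(fit)         # studentized (deleted) residuals
print(which(abs(rs) > 3))   # observations beyond the rule-of-thumb cut-off
```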

(e)

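Fitting the model with interaction effects (in an R formula, a * b expands to a + b + a:b, so chaining all seven predictors with * includes every main effect and every interaction among them):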

Call:
lm(formula = mpg ~ cylinders * displacement * horsepower * weight * acceleration * year * origin, data = auto)

Residuals:
    Min      1Q  Median      3Q     Max
-6.4533 -1.0707  0.0000  0.9962  9.1994

Coefficients: (15 not defined because of singularities)
Term  Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  6.785e+04  5.058e+04  1.342  0.1808
cylinders  -1.585e+04  9.813e+03  -1.616  0.1073
displacement  -1.419e+03  9.892e+02  -1.434  0.1527
horsepower  -1.300e+03  9.564e+02  -1.360  0.1751
weight  4.373e+01  3.998e+01  1.094  0.2750
acceleration  -8.400e+03  5.946e+03  -1.413  0.1589
year  6.232e+02  7.248e+02  0.860  0.3906
origin  -5.789e+04  4.376e+04  -1.323  0.1869
cylinders:displacement  3.404e+02  2.325e+02  1.464  0.1444
cylinders:horsepower  3.118e+02  2.133e+02  1.461  0.1450
displacement:horsepower  8.962e+00  6.394e+00  1.402  0.1622
cylinders:weight  -1.158e+01  9.975e+00  -1.161  0.2468
displacement:weight  2.313e-01  1.695e-01  1.365  0.1735
horsepower:weight  4.055e-01  3.201e-01  1.267  0.2063
cylinders:acceleration  2.009e+03  1.380e+03  1.456  0.1467
displacement:acceleration  1.682e+01  1.593e+01  1.056  0.2920
horsepower:acceleration  1.076e+01  2.313e+01  0.465  0.6421
weight:acceleration  3.748e-01  7.925e-01  0.473  0.6367
cylinders:year  -1.748e+02  1.665e+02  -1.049  0.2949
displacement:year  8.341e+00  6.084e+00  1.371  0.1715
horsepower:year  1.607e+00  4.964e+00  0.324  0.7464
weight:year  -9.176e-01  7.729e-01  -1.187  0.2361
acceleration:year  9.485e+01  6.703e+01  1.415  0.1582
cylinders:origin  1.319e+04  9.455e+03  1.395  0.1641
displacement:origin  1.351e+03  9.883e+02  1.367  0.1726
horsepower:origin  1.241e+03  9.258e+02  1.340  0.1813
weight:origin  -4.564e+01  3.692e+01  -1.236  0.2175
acceleration:origin  7.439e+03  5.873e+03  1.267  0.2064
year:origin  -7.745e+02  5.774e+02  -1.341  0.1809
cylinders:displacement:horsepower  -2.075e+00  1.370e+00  -1.515  0.1310
cylinders:displacement:weight  -4.967e-02  3.278e-02  -1.515  0.1308
cylinders:horsepower:weight  -9.443e-02  6.523e-02  -1.448  0.1488
displacement:horsepower:weight  -2.805e-03  2.095e-03  -1.339  0.1816
cylinders:displacement:acceleration  -3.446e+00  1.925e+00  -1.790  0.0745 .
cylinders:horsepower:acceleration  -1.650e+00  2.294e+00  -0.719  0.4726
displacement:horsepower:acceleration  -6.798e-02  1.616e-01  -0.421  0.6744
cylinders:weight:acceleration  -3.977e-02  8.196e-02  -0.485  0.6279
displacement:weight:acceleration  -3.174e-03  4.950e-03  -0.641  0.5220
horsepower:weight:acceleration  -3.717e-03  9.754e-03  -0.381  0.7035
cylinders:displacement:year  -1.871e+00  1.240e+00  -1.510  0.1323
cylinders:horsepower:year  -1.675e-01  4.792e-01  -0.349  0.7270
displacement:horsepower:year  -1.594e-02  3.453e-02  -0.462  0.6447
cylinders:weight:year  2.400e-01  1.956e-01  1.227  0.2208
displacement:weight:year  -7.017e-04  1.063e-03  -0.660  0.5098
horsepower:weight:year  -6.825e-04  2.087e-03  -0.327  0.7439
cylinders:acceleration:year  -2.225e+01  1.523e+01  -1.461  0.1452
displacement:acceleration:year  -1.068e-01  1.814e-01  -0.589  0.5565
horsepower:acceleration:year  -1.653e-01  2.979e-01  -0.555  0.5794
weight:acceleration:year  -6.359e-03  1.018e-02  -0.625  0.5326
cylinders:displacement:origin  -3.266e+02  2.342e+02  -1.394  0.1644
cylinders:horsepower:origin  -2.947e+02  2.146e+02  -1.373  0.1708
displacement:horsepower:origin  -8.446e+00  6.208e+00  -1.361  0.1747
cylinders:weight:origin  1.216e+01  9.713e+00  1.252  0.2118
displacement:weight:origin  -2.103e-01  1.620e-01  -1.298  0.1953
horsepower:weight:origin  -3.992e-01  2.979e-01  -1.340  0.1813
cylinders:acceleration:origin  -1.777e+03  1.391e+03  -1.278  0.2025
displacement:acceleration:origin  -1.175e+01  1.128e+01  -1.042  0.2985
horsepower:acceleration:origin  -3.670e+00  1.032e+01  -0.356  0.7223
weight:acceleration:origin  -1.515e-01  3.466e-01  -0.437  0.6624
cylinders:year:origin  2.131e+02  1.533e+02  1.390  0.1657
displacement:year:origin  -7.372e+00  5.763e+00  -1.279  0.2019
horsepower:year:origin  -6.129e-01  2.320e+00  -0.264  0.7919
weight:year:origin  9.512e-01  7.427e-01  1.281  0.2014
acceleration:year:origin  -8.096e+01  6.551e+01  -1.236  0.2175
cylinders:displacement:horsepower:weight  6.153e-04  4.125e-04  1.492  0.1369
cylinders:displacement:horsepower:acceleration  8.053e-03  1.331e-02  0.605  0.5458
cylinders:displacement:weight:acceleration  3.041e-04  4.129e-04  0.737  0.4620
cylinders:horsepower:weight:acceleration  3.603e-04  8.587e-04  0.420  0.6751
displacement:horsepower:weight:acceleration  2.971e-05  5.895e-05  0.504  0.6147
cylinders:displacement:horsepower:year  1.436e-03  2.859e-03  0.502  0.6160
cylinders:displacement:weight:year  5.645e-05  8.960e-05  0.630  0.5292
cylinders:horsepower:weight:year  5.134e-05  1.863e-04  0.276  0.7831
displacement:horsepower:weight:year  6.861e-06  1.258e-05  0.545  0.5860
cylinders:displacement:acceleration:year  1.517e-02  1.668e-02  0.910  0.3639
cylinders:horsepower:acceleration:year  2.417e-02  2.951e-02  0.819  0.4134
displacement:horsepower:acceleration:year  1.006e-03  2.111e-03  0.477  0.6339
cylinders:weight:acceleration:year  7.543e-04  1.042e-03  0.724  0.4696
displacement:weight:acceleration:year  4.571e-05  6.427e-05  0.711  0.4776
horsepower:weight:acceleration:year  5.842e-05  1.252e-04  0.467  0.6411
cylinders:displacement:horsepower:origin  1.974e+00  1.380e+00  1.430  0.1538
cylinders:displacement:weight:origin  4.576e-02  3.286e-02  1.392  0.1649
cylinders:horsepower:weight:origin  9.146e-02  6.533e-02  1.400  0.1626
displacement:horsepower:weight:origin  2.646e-03  2.022e-03  1.309  0.1917
cylinders:displacement:acceleration:origin  2.354e+00  1.622e+00  1.451  0.1479
cylinders:horsepower:acceleration:origin  -5.449e-02  2.162e-01  -0.252  0.8012
displacement:horsepower:acceleration:origin  2.922e-02  8.997e-02  0.325  0.7456
cylinders:weight:acceleration:origin  -1.186e-02  9.461e-03  -1.254  0.2109
displacement:weight:acceleration:origin  1.539e-03  3.037e-03  0.507  0.6127
horsepower:weight:acceleration:origin  2.184e-03  4.625e-03  0.472  0.6372
cylinders:displacement:year:origin  1.676e+00  1.247e+00  1.344  0.1800
cylinders:horsepower:year:origin  -8.713e-02  7.737e-02  -1.126  0.2611
displacement:horsepower:year:origin  8.343e-03  1.938e-02  0.430  0.6672
cylinders:weight:year:origin  -2.488e-01  1.927e-01  -1.291  0.1977
displacement:weight:year:origin  3.907e-04  6.625e-04  0.590  0.5558
horsepower:weight:year:origin  5.097e-04  9.857e-04  0.517  0.6055
cylinders:acceleration:year:origin  1.900e+01  1.533e+01  1.239  0.2162
displacement:acceleration:year:origin  3.522e-02  9.168e-02  0.384  0.7011
horsepower:acceleration:year:origin  6.027e-02  1.337e-01  0.451  0.6525
weight:acceleration:year:origin  2.916e-03  4.536e-03  0.643  0.5208
cylinders:displacement:horsepower:weight:acceleration  -2.295e-06  4.466e-06  -0.514  0.6078
cylinders:displacement:horsepower:weight:year  -4.178e-07  9.705e-07  -0.430  0.6672
cylinders:displacement:horsepower:acceleration:year  -1.124e-04  1.748e-04  -0.643  0.5206
cylinders:displacement:weight:acceleration:year  -4.299e-06  5.405e-06  -0.795  0.4270
cylinders:horsepower:weight:acceleration:year  -5.449e-06  1.111e-05  -0.491  0.6240
displacement:horsepower:weight:acceleration:year  -4.413e-07  7.632e-07  -0.578  0.5636
cylinders:displacement:horsepower:weight:origin  -5.872e-04  4.152e-04  -1.414  0.1584
cylinders:displacement:horsepower:acceleration:origin  NA  NA  NA  NA
cylinders:displacement:weight:acceleration:origin  NA  NA  NA  NA
cylinders:horsepower:weight:acceleration:origin  NA  NA  NA  NA
displacement:horsepower:weight:acceleration:origin  -1.684e-05  3.791e-05  -0.444  0.6573
cylinders:displacement:horsepower:year:origin  NA  NA  NA  NA
cylinders:displacement:weight:year:origin  NA  NA  NA  NA
cylinders:horsepower:weight:year:origin  NA  NA  NA  NA
displacement:horsepower:weight:year:origin  -4.410e-06  8.158e-06  -0.541  0.5893
cylinders:displacement:acceleration:year:origin  NA  NA  NA  NA
cylinders:horsepower:acceleration:year:origin  NA  NA  NA  NA
displacement:horsepower:acceleration:year:origin  -4.537e-04  1.169e-03  -0.388  0.6983
cylinders:weight:acceleration:year:origin  NA  NA  NA  NA
displacement:weight:acceleration:year:origin  -2.222e-05  3.907e-05  -0.569  0.5700
horsepower:weight:acceleration:year:origin  -3.306e-05  5.950e-05  -0.556  0.5789
cylinders:displacement:horsepower:weight:acceleration:year  3.293e-08  5.819e-08  0.566  0.5719
cylinders:displacement:horsepower:weight:acceleration:origin  NA  NA  NA  NA
cylinders:displacement:horsepower:weight:year:origin  NA  NA  NA  NA
cylinders:displacement:horsepower:acceleration:year:origin  NA  NA  NA  NA
cylinders:displacement:weight:acceleration:year:origin  NA  NA  NA  NA
cylinders:horsepower:weight:acceleration:year:origin  NA  NA  NA  NA
displacement:horsepower:weight:acceleration:year:origin  2.522e-07  4.889e-07  0.516  0.6064
cylinders:displacement:horsepower:weight:acceleration:year:origin  NA  NA  NA  NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.418 on 279 degrees of freedom
Multiple R-squared:  0.9315,   Adjusted R-squared:  0.904
F-statistic: 33.88 on 112 and 279 DF,  p-value: < 2.2e-16

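None of the interaction terms is statistically significant at the 5% level; the smallest p-value, 0.0745 for cylinders:displacement:acceleration, is significant only at the 10% level. With all 128 terms, 15 coefficients cannot be estimated because of singularities, and although R-squared rises from 0.8215 to 0.9315 (adjusted R-squared from 0.8182 to 0.904), the standard errors of the individual terms are very large.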
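Two facts behind this output can be checked on simulated data (the columns x1 to x8 and y below are hypothetical). First, chaining seven variables with * produces 2^7 = 128 model-matrix columns, intercept included. Second, when one column is an exact linear combination of others, lm() reports its coefficient as NA, which is what "not defined because of singularities" means:

```r
set.seed(9)
d <- as.data.frame(matrix(rnorm(300 * 7), ncol = 7))
names(d) <- paste0("x", 1:7)

X <- model.matrix(~ x1 * x2 * x3 * x4 * x5 * x6 * x7, data = d)
print(ncol(X)) # 2^7 = 128 columns, including the intercept

# An exactly collinear predictor gets an NA coefficient
d$y  <- rnorm(300)
d$x8 <- d$x1 + d$x2
fit  <- lm(y ~ x1 + x2 + x8, data = d)
print(coef(fit)) # x8 is NA
```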

(f)

Fitting the model with the square of the response, mpg^2:

Call:
lm(formula = (mpg^2) ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data = auto)

Residuals:
    Min      1Q  Median      3Q     Max
-483.45 -141.87  -19.62  103.58 1042.84

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.878e+03  2.928e+02  -6.412 4.22e-10 ***
cylinders    -1.436e+01  2.038e+01  -0.704  0.48157
displacement  1.328e+00  4.738e-01   2.802  0.00534 **
horsepower   -3.587e-01  8.693e-01  -0.413  0.68009
weight       -3.522e-01  4.111e-02  -8.567 2.62e-16 ***
acceleration  9.278e+00  6.232e+00   1.489  0.13740
year          4.081e+01  3.214e+00  12.698  < 2e-16 ***
origin        9.509e+01  1.754e+01   5.422 1.04e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 209.8 on 384 degrees of freedom
Multiple R-squared:  0.7292,   Adjusted R-squared:  0.7243
F-statistic: 147.8 on 7 and 384 DF,  p-value: < 2.2e-16

The statistically significant predictors are displacement, weight, year and origin.

Fitting the model with the square root of the response, sqrt(mpg):

Call:
lm(formula = sqrt(mpg) ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data = auto)

Residuals:
     Min       1Q   Median       3Q      Max
-0.98891 -0.18946  0.00505  0.16947  1.02581

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.075e+00  4.290e-01   2.506   0.0126 *
cylinders    -5.942e-02  2.986e-02  -1.990   0.0474 *
displacement  1.752e-03  6.942e-04   2.524   0.0120 *
horsepower   -2.512e-03  1.274e-03  -1.972   0.0493 *
weight       -6.367e-04  6.024e-05 -10.570  < 2e-16 ***
acceleration  2.738e-03  9.131e-03   0.300   0.7644
year          7.381e-02  4.709e-03  15.675  < 2e-16 ***
origin        1.217e-01  2.569e-02   4.735 3.09e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3074 on 384 degrees of freedom
Multiple R-squared:  0.8561,   Adjusted R-squared:  0.8535
F-statistic: 326.3 on 7 and 384 DF,  p-value: < 2.2e-16

All predictors except acceleration are statistically significant at the 5% level in predicting sqrt(mpg), although cylinders (p = 0.0474) and horsepower (p = 0.0493) only just meet that threshold.

Fitting the model with the log of the response, log(mpg):

Call:
lm(formula = log(mpg) ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data = auto)

Residuals:
     Min       1Q   Median       3Q      Max
-0.40955 -0.06533  0.00079  0.06785  0.33925

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.751e+00  1.662e-01  10.533  < 2e-16 ***
cylinders    -2.795e-02  1.157e-02  -2.415  0.01619 *
displacement  6.362e-04  2.690e-04   2.365  0.01852 *
horsepower   -1.475e-03  4.935e-04  -2.989  0.00298 **
weight       -2.551e-04  2.334e-05 -10.931  < 2e-16 ***
acceleration -1.348e-03  3.538e-03  -0.381  0.70339
year          2.958e-02  1.824e-03  16.211  < 2e-16 ***
origin        4.071e-02  9.955e-03   4.089 5.28e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1191 on 384 degrees of freedom
Multiple R-squared:  0.8795,   Adjusted R-squared:  0.8773
F-statistic: 400.4 on 7 and 384 DF,  p-value: < 2.2e-16

All predictors except acceleration are statistically significant at the 5% level in predicting log(mpg).
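The three transformed fits differ only in the left-hand side of the formula, so they can be produced in one loop. A sketch on simulated data (the weight, year and mpg columns below are simulated stand-ins, not the Auto data). Keep in mind that R-squared values for different response scales measure variance explained on different scales, so they are not directly comparable:

```r
set.seed(10)
d <- data.frame(weight = runif(200, 1600, 5000),
                year   = sample(70:82, 200, replace = TRUE))
d$mpg <- exp(4.5 - 0.0003 * d$weight + 0.03 * (d$year - 70) + rnorm(200, sd = 0.1))

# I() makes ^ mean arithmetic squaring inside a formula
responses <- c("mpg", "I(mpg^2)", "sqrt(mpg)", "log(mpg)")
fits <- lapply(responses, function(resp) lm(as.formula(paste(resp, "~ weight + year")), data = d))
names(fits) <- responses

print(sapply(fits, function(f) summary(f)$r.squared))
print(sapply(fits, function(f) coef(summary(f))["weight", "Pr(>|t|)"])) # p-value of weight per fit
```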

Q4.

(a)

Excluding mpg, year, origin and name, principal component analysis is applied to the remaining five attributes of the Auto dataset:

  1. Cylinders
  2. Displacement
  3. Horsepower
  4. Weight
  5. Acceleration

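We use the prcomp() function in R with scale. = TRUE, so each attribute is standardized to unit variance before the analysis.

The importance of each principal component (not of the individual attributes) is measured by its eigenvalue, i.e. the variance it explains: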

Component   Eigenvalue   % of variance   Cumulative % of variance
PC1         4.07185982      81.437196          81.43720
PC2         0.69386125      13.877225          95.31442
PC3         0.13349305       2.669861          97.98428
PC4         0.06426839       1.285368          99.26965
PC5         0.03651750       0.730350         100.00000
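The percentage columns follow directly from the eigenvalues: each eigenvalue divided by their sum, then accumulated. Because the five attributes are standardized, the eigenvalues sum to 5 (the trace of the correlation matrix). A check using the eigenvalues reported above:

```r
eig <- c(4.07185982, 0.69386125, 0.13349305, 0.06426839, 0.03651750)
pct <- 100 * eig / sum(eig) # % of variance per component
cum <- cumsum(pct)          # cumulative %
print(sum(eig))             # 5, one unit of variance per standardized attribute
print(round(cbind(eigenvalue = eig, pct, cum), 4))
```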

 

 

 

The principal components (loadings) are:

              PC1         PC2         PC3         PC4          PC5
cylinders    -0.4687175   0.2234786  -0.6587000   0.27031946   0.47265527
displacement -0.4824044   0.1786297  -0.1876404   0.02124108  -0.83649107
horsepower   -0.4738437  -0.1199885   0.6275710   0.59359925   0.12194007
weight       -0.4617902   0.3452864   0.3344970  -0.70324388   0.24715772
acceleration  0.3313787   0.8857363   0.1586562   0.28207124  -0.03038708
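The loadings act on the standardized attributes, not the raw ones: with scale. = TRUE, the component scores in pca$x equal scale(data) %*% pca$rotation. A sketch verifying this on simulated data (columns a, b, c are hypothetical):

```r
set.seed(12)
X <- data.frame(a = rnorm(100, 5, 2), b = rnorm(100, 50, 10))
X$c <- 3 * X$a + rnorm(100)

pca <- prcomp(X, scale. = TRUE)
scores_manual <- scale(X) %*% pca$rotation # standardize, then project on the loadings
print(all.equal(unname(scores_manual), unname(pca$x)))
print(colSums(pca$rotation^2))             # each loading vector has unit length
```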

(b)

R files: 

Q1 

college <- read.csv("College.csv") # to read the file

q1 <- function(data) # a function to solve part a
{
  coeff <- matrix(NA_real_, nrow = 16, ncol = 2) # column 1: intercept, column 2: slope
  for (i in 3:18)
  {
    m <- lm(data$Grad.Rate ~ data[, i]) # use the data argument, not the global college
    coeff[i - 2, ] <- coef(m)
  }
  print(coeff) # this will print the matrix
}

q2 <- function(data) # a function to solve part b
{
  correlation <- sapply(3:18, function(i) cor(data$Grad.Rate, data[, i]))
  names(correlation) <- names(data)[3:18]
  print(correlation)
  print(names(sort(abs(correlation), decreasing = TRUE))) # attributes ranked by |correlation|
}

q1(college)

q2(college)

college2 <- college  # part c: centre every numerical column (subtract its mean)
for (i in 3:19)
{
  college2[, i] <- college2[, i] - mean(college2[, i])
}

q1(college2)  #solving part a after normalization

q2(college2)  #solving part b after normalization

#part d

#with square root transformation

college3 <- college

college3[, 3:19]<- sqrt(college3[, 3:19])

q1(college3)

q2(college3)

#with square transformation

college4 <- college

college4[, 3:19]<- (college4[, 3:19]) ^ 2

q1(college4)

q2(college4)

#with log transformation

college5 <- college

college5[, 3:19] <- log(college5[, 3:19] + 1) # log(x + 1): a column containing zeros would give log(0) = -Inf and make lm() fail

q1(college5)

q2(college5)

# part e: plot
par(mfrow = c(2, 2))
par(oma = c(0, 0, 2, 0))
plot(college[, 10], college[, 19], main = "Outstate", col = "blue", xlab = "Outstate", ylab = "Grad.Rate")
plot(college[, 6], college[, 19], main = "Top10perc", col = "red", xlab = "Top10perc", ylab = "Grad.Rate")
plot(college[, 17], college[, 19], main = "perc.alumni", col = "green", xlab = "perc.alumni", ylab = "Grad.Rate")
plot(college[, 7], college[, 19], main = "Top25perc", col = "yellow", xlab = "Top25perc", ylab = "Grad.Rate")
title("Plots of attributes vs Graduation rate", outer = TRUE, col.main = "indianred1")

Q2 

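auto <- na.omit(read.csv("Auto.csv", na.strings = "?")) # reading the file; "?" marks missing horsepower values
reg <- lm(mpg ~ horsepower, data = auto) # linear regression
summary(reg) # summary
predict(reg, data.frame(horsepower = 98), interval = "confidence") # part b(iv)
predict(reg, data.frame(horsepower = 98), interval = "prediction")
plot(auto$horsepower, auto$mpg, xlab = "horsepower", ylab = "mpg") # plot
abline(reg) # to display the regression line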

Q3 

auto <- na.omit(read.csv("Auto.csv", na.strings = "?")) # to read the file; "?" marks missing horsepower values
plot(auto) # to plot scatterplot matrix (part a)
reg <- lm(mpg ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data = auto) # multiple linear regression
print(summary(reg)) # part b and c
plot(reg) # to produce the residual plots (part d)
# part e: with interaction effects

regint<- lm(mpg ~ cylinders * displacement * horsepower * weight * acceleration * year * origin, data = auto)

print(summary(regint))

#part f

#with transformation of square of response variable

regsquare<- lm((mpg^2) ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data = auto)

print(summary(regsquare))

#with transformation of square root of response variable

regroot<- lm(sqrt(mpg) ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data = auto)

print(summary(regroot))

#with transformation of log of response variable

reglog<- lm(log(mpg) ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data = auto)

print(summary(reglog)) 

Q4 

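auto <- na.omit(read.csv("Auto.csv", na.strings = "?")) # to read the file; "?" marks missing horsepower values
data <- auto[, 2:6] # extracting the attributes for PCA (cylinders to acceleration)
pca <- prcomp(data, scale. = TRUE) # PCA on standardized attributes
print(pca$sdev^2) # eigenvalues (variance explained by each component)
print(pca$sdev^2 / sum(pca$sdev^2)) # proportion of variance explained by each component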

pcr <- pca$rotation  # principal component loadings
print(pcr)
# part b, producing the required plot
par(mfrow = c(2, 3))
par(oma = c(0, 0, 2, 0))
cols <- c("red", "blue", "green", "yellow", "black")
for (k in 1:5)
{
  # pca$x holds the scores of the standardized data; the loadings were computed
  # with scale. = TRUE, so they should not be applied to the raw columns
  plot(pca$x[, k], auto$mpg, main = paste("Principal Component", k), col = cols[k],
       xlab = paste("Principal Component", k), ylab = "mpg")
}
title("Plots of all principal components versus mpg", outer = TRUE, col.main = "indianred1")