# Understanding R-squared value

R-squared is a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variable(s) in a regression model. In other words, it indicates the extent to which variation in the explanatory variables accounts for variation in the outcome.
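
As a minimal sketch of this definition, the snippet below computes R-squared by hand as one minus the ratio of the residual sum of squares to the total sum of squares, and compares it with the value reported by `summary()`. It uses R's built-in `mtcars` data purely for illustration; that dataset and the names `fit`, `ss_res`, `ss_tot` and `r2_manual` are assumptions of this example, not part of the attendance analysis.

```r
# Illustrative data only: mtcars ships with base R and is unrelated to the panel.
fit <- lm(mpg ~ wt, data = mtcars)

ss_res <- sum(residuals(fit)^2)                   # variation left unexplained
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total variation around the mean
r2_manual <- 1 - ss_res / ss_tot                  # share of variation explained

r2_manual
summary(fit)$r.squared  # should agree with r2_manual
```

For a model with an intercept, as here, the total sum of squares is measured around the mean of the dependent variable.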

## Fixed-effects estimator

This is a panel dataset with 25 teams (i = 25) observed over 17 years (t = 17), for 425 observations in total. The variables in the dataset are avgsalry, allstars, attend, wins and teamid; one variable is measured in millions.
Wins have a positive and approximately linear effect on attendance (an upward linear trend).
There appears to be homogeneity across wins.
The transformation is in the attached R script.
The coefficient on wins is 18941 using the lm() function.
The coefficient on wins is 18940.7 using the plm() function, so the two results are the same up to rounding.
lm(formula = attend ~ wins + factor(teamid) - 1, data = panel)

Residuals:

| Min | 1Q | Median | 3Q | Max |
|---:|---:|---:|---:|---:|
| -701531 | -237900 | -17908 | 206587 | 989027 |

Coefficients:

| | Estimate | Std. Error | t value | Pr(>\|t\|) | |
|---|---:|---:|---:|---:|---|
| wins | 18941 | 1493 | 12.684 | < 2e-16 | *** |
| factor(teamid)1 | 739134 | 149966 | 4.929 | 1.22e-06 | *** |
| factor(teamid)2 | 1064265 | 139783 | 7.614 | 1.95e-13 | *** |
| factor(teamid)3 | 743807 | 147562 | 5.041 | 7.05e-07 | *** |
| factor(teamid)4 | 448318 | 144802 | 3.096 | 0.002099 | ** |
| factor(teamid)5 | 896821 | 138176 | 6.490 | 2.54e-10 | *** |
| factor(teamid)6 | 597614 | 145993 | 4.093 | 5.15e-05 | *** |
| factor(teamid)7 | 669337 | 144653 | 4.627 | 5.02e-06 | *** |
| factor(teamid)8 | 540935 | 136722 | 3.956 | 9.00e-05 | *** |
| factor(teamid)9 | 417589 | 147338 | 2.834 | 0.004827 | ** |
| factor(teamid)10 | 541477 | 139344 | 3.886 | 0.000119 | *** |
| factor(teamid)11 | 1143475 | 145769 | 7.844 | 4.02e-14 | *** |
| factor(teamid)12 | 574438 | 137884 | 4.166 | 3.80e-05 | *** |
| factor(teamid)13 | 369900 | 138832 | 2.664 | 0.008026 | ** |
| factor(teamid)14 | 239706 | 142136 | 1.686 | 0.092489 | . |
| factor(teamid)15 | 800743 | 152913 | 5.237 | 2.65e-07 | *** |
| factor(teamid)16 | 705246 | 148987 | 4.734 | 3.07e-06 | *** |
| factor(teamid)17 | 463995 | 147113 | 3.154 | 0.001732 | ** |
| factor(teamid)18 | 605697 | 136287 | 4.444 | 1.14e-05 | *** |
| factor(teamid)19 | 413927 | 137157 | 3.018 | 0.002709 | ** |
| factor(teamid)20 | 573882 | 139783 | 4.106 | 4.89e-05 | *** |
| factor(teamid)21 | 599319 | 141841 | 4.225 | 2.96e-05 | *** |
| factor(teamid)22 | 570085 | 146142 | 3.901 | 0.000112 | *** |
| factor(teamid)23 | 966598 | 145173 | 6.658 | 9.20e-11 | *** |
| factor(teamid)24 | 721152 | 141399 | 5.100 | 5.26e-07 | *** |
| factor(teamid)25 | 902227 | 148387 | 6.080 | 2.81e-09 | *** |

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 318100 on 399 degrees of freedom
Multiple R-squared: 0.9809, Adjusted R-squared: 0.9796
F-statistic: 786.7 on 26 and 399 DF, p-value: < 2.2e-16

plm(formula = attend ~ wins + factor(teamid) - 1, data = panel)

Balanced Panel: n = 25, T = 17, N = 425

Residuals:

| Min. | 1st Qu. | Median | 3rd Qu. | Max. |
|---:|---:|---:|---:|---:|
| -701531 | -237900 | -17908 | 206587 | 989027 |

Coefficients:

| | Estimate | Std. Error | t-value | Pr(>\|t\|) | |
|---|---:|---:|---:|---:|---|
| wins | 18940.7 | 1493.3 | 12.684 | < 2.2e-16 | *** |

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 5.6651e+13
Residual sum of Squares: 4.0373e+13
R-Squared: 0.28735
F-statistic: 160.881 on 1 and 399 DF, p-value: < 2.22e-16

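There is not much difference between the two sets of results: the coefficient on wins, its standard error and the residuals are identical. The R-squared values do differ (0.9809 from lm() versus 0.28735 from plm()), because the lm() fit without an intercept counts the variation absorbed by the team dummies as explained, whereas plm() reports the R-squared of the within (demeaned) regression.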
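
The comparison between the dummy-variable regression and the within estimator can be sketched in base R. The example below assumes a simulated balanced panel (25 teams, 17 seasons, made-up numbers) rather than the real data, and it reproduces the within transformation by demeaning with `ave()` instead of calling plm(), so that it runs without extra packages. The names `sim`, `lsdv` and `within_fit` are hypothetical.

```r
# Simulated balanced panel; the numbers are invented for illustration.
set.seed(1)
n_teams <- 25
n_years <- 17
sim <- data.frame(teamid = rep(seq_len(n_teams), each = n_years))
team_effect <- rnorm(n_teams, mean = 600000, sd = 200000)
sim$wins   <- round(rnorm(nrow(sim), mean = 80, sd = 10))
sim$attend <- team_effect[sim$teamid] + 19000 * sim$wins +
  rnorm(nrow(sim), sd = 300000)

# (1) Dummy-variable regression without an intercept, same form as the lm() call.
lsdv <- lm(attend ~ wins + factor(teamid) - 1, data = sim)

# (2) Within transformation: subtract each team's own mean, then regress.
sim$attend_dm <- sim$attend - ave(sim$attend, sim$teamid)
sim$wins_dm   <- sim$wins   - ave(sim$wins,   sim$teamid)
within_fit <- lm(attend_dm ~ wins_dm - 1, data = sim)

coef(lsdv)["wins"]           # slope from the dummy-variable regression
coef(within_fit)["wins_dm"]  # same slope from the demeaned regression

summary(lsdv)$r.squared        # uncentered R-squared, includes the team dummies
summary(within_fit)$r.squared  # within R-squared, based on demeaned data only
```

The two slopes coincide, while the first R-squared is never smaller than the second because both fits share the same residual sum of squares but the first divides by the larger, uncentered total. Note that the standard error printed for `within_fit` is not adjusted for the 25 estimated team means, which is one reason to prefer plm() in practice.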

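The team fixed effects, expressed as deviations from their overall mean (about 652388), are:

| teamid | Estimate | Std. Error | t-value | Pr(>\|t\|) | |
|---|---:|---:|---:|---:|---|
| 1 | 86746 | 149966 | 0.5784 | 0.563293 | |
| 2 | 411878 | 139783 | 2.9466 | 0.003402 | ** |
| 3 | 91419 | 147562 | 0.6195 | 0.535921 | |
| 4 | -204069 | 144802 | -1.4093 | 0.159525 | |
| 5 | 244434 | 138176 | 1.7690 | 0.077656 | . |
| 6 | -54774 | 145993 | -0.3752 | 0.707726 | |
| 7 | 16950 | 144653 | 0.1172 | 0.906780 | |
| 8 | -111453 | 136722 | -0.8152 | 0.415455 | |
| 9 | -234799 | 147338 | -1.5936 | 0.111815 | |
| 10 | -110910 | 139344 | -0.7959 | 0.426537 | |
| 11 | 491088 | 145769 | 3.3689 | 0.000828 | *** |
| 12 | -77949 | 137884 | -0.5653 | 0.572171 | |
| 13 | -282488 | 138832 | -2.0347 | 0.042537 | * |
| 14 | -412682 | 142136 | -2.9034 | 0.003896 | ** |
| 15 | 148356 | 152913 | 0.9702 | 0.332537 | |
| 16 | 52859 | 148987 | 0.3548 | 0.722936 | |
| 17 | -188393 | 147113 | -1.2806 | 0.201079 | |
| 18 | -46691 | 136287 | -0.3426 | 0.732085 | |
| 19 | -238461 | 137157 | -1.7386 | 0.082877 | . |
| 20 | -78506 | 139783 | -0.5616 | 0.574688 | |
| 21 | -53069 | 141841 | -0.3741 | 0.708496 | |
| 22 | -82303 | 146142 | -0.5632 | 0.573637 | |
| 23 | 314211 | 145173 | 2.1644 | 0.031028 | * |
| 24 | 68765 | 141399 | 0.4863 | 0.627008 | |
| 25 | 249840 | 148387 | 1.6837 | 0.093020 | . |

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1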
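
The estimates in the table above appear to be the team intercepts from the lm() output minus their average. The short check below, a sketch that simply retypes the printed (rounded) numbers into two hypothetical vectors `intercepts` and `reported`, confirms that the two sets of values agree up to rounding.

```r
# Team intercepts for teamid 1..25, copied from the lm() output.
intercepts <- c(739134, 1064265, 743807, 448318, 896821, 597614, 669337,
                540935, 417589, 541477, 1143475, 574438, 369900, 239706,
                800743, 705246, 463995, 605697, 413927, 573882, 599319,
                570085, 966598, 721152, 902227)

# Estimates for teamid 1..25, copied from the table above.
reported <- c(86746, 411878, 91419, -204069, 244434, -54774, 16950,
              -111453, -234799, -110910, 491088, -77949, -282488, -412682,
              148356, 52859, -188393, -46691, -238461, -78506, -53069,
              -82303, 314211, 68765, 249840)

mean(intercepts)                          # average team intercept
deviations <- intercepts - mean(intercepts)
max(abs(deviations - reported))           # below 1, so only rounding separates them
```

Because only the centering changes, the standard errors in the two tables are identical; the t-values and p-values differ because they now test whether a team differs from the average team rather than from zero.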

The graphs suggest that the relationships are heteroskedastic. The relationship between attend and avgsalry looks linear, while the relationship between attend and allstars looks non-linear.

## Regression of attendance on variables

Coefficients:

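| | Estimate | Std. Error | t value | Pr(>\|t\|) | |
|---|---:|---:|---:|---:|---|
| (Intercept) | 820348 | 117997 | 6.952 | 1.38e-11 | *** |
| wins | 10875 | 1653 | 6.578 | 1.42e-10 | *** |
| avgsalry | 331558 | 22118 | 14.991 | < 2e-16 | *** |
| allstars | 66749 | 13125 | 5.086 | 5.52e-07 | *** |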

The expected value of attendance is 820348 when all the independent variables are equal to zero. Holding the other variables constant, the expected value of attendance increases by 10875 when wins increases by one unit, by 331558 when avgsalry increases by one unit, and by 66749 when allstars increases by one unit. Judged by the magnitude of the coefficients, avgsalry has the largest impact, although this comparison depends on the units in which each variable is measured.
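
To make the interpretation concrete, the sketch below plugs the estimated coefficients into the fitted equation for one hypothetical team-season. The input values (80 wins, avgsalry of 1, 2 all-stars) are invented for illustration, and the names `b`, `x` and `predicted` are assumptions of this example.

```r
# Coefficients copied from the regression table above.
b <- c(intercept = 820348, wins = 10875, avgsalry = 331558, allstars = 66749)

# Hypothetical team-season; these inputs are illustrative, not from the data.
x <- c(wins = 80, avgsalry = 1, allstars = 2)

predicted <- b["intercept"] + sum(b[names(x)] * x)
unname(predicted)  # 2155404

# Adding one win, with everything else fixed, shifts the prediction
# by exactly the wins coefficient.
x_plus <- x
x_plus["wins"] <- x["wins"] + 1
predicted_plus <- b["intercept"] + sum(b[names(x_plus)] * x_plus)
unname(predicted_plus - predicted)  # 10875
```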
The relationships seem heteroskedastic. The relationship between attend and the log of avgsalry looks linear, while the relationship between attend and the log of allstars looks non-linear.

| | Estimate | Std. Error | t value | Pr(>\|t\|) | |
|---|---:|---:|---:|---:|---|
| (Intercept) | -1521621 | 555133 | -2.741 | 0.00639 | ** |
| log(wins) | 837102 | 128833 | 6.498 | 2.31e-10 | *** |
| log(avgsalry) | 377530 | 23260 | 16.231 | < 2e-16 | *** |
| log(allstars) | 166452 | 30316 | 5.491 | 6.94e-08 | *** |

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 290500 on 421 degrees of freedom
Multiple R-squared: 0.5648, Adjusted R-squared: 0.5617
F-statistic: 182.1 on 3 and 421 DF, p-value: < 2.2e-16
Holding the other variables constant, the expected value of attendance increases by 837102 when the log of wins increases by one unit.
The expected value of attendance increases by 377530 when the log of avgsalry increases by one unit.
The expected value of attendance increases by 166452 when the log of allstars increases by one unit.
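
A one-unit change in a logged variable is a very large change (the variable is multiplied by about 2.72), so these coefficients are often easier to read in percentage terms. As a rough sketch, the snippet below converts each coefficient into the change in expected attendance associated with a 1% increase in the regressor, both exactly and with the common divide-by-100 approximation. The vector name `b_log` is an assumption of this example.

```r
# Coefficients on the logged regressors, copied from the table above.
b_log <- c(wins = 837102, avgsalry = 377530, allstars = 166452)

# Change in expected attendance for a 1% increase in each regressor,
# holding the others fixed: coefficient * log(1.01).
exact_change <- b_log * log(1.01)

# Common approximation: coefficient / 100.
approx_change <- b_log / 100

round(rbind(exact_change, approx_change))
```

For wins, both versions give a change of roughly 8,300 to 8,400 in attendance per 1% increase.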

## Studentized Breusch-Pagan test

data: olsmodel
BP = 1.8949, df = 3, p-value = 0.5945
Since the p-value (0.5945) is greater than any conventional significance level, we do not reject the null hypothesis of homoskedasticity. There is therefore no evidence of heteroskedasticity, and no need to correct the standard errors.
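
For readers who want to see what the statistic measures, here is a minimal base-R sketch of the studentized (Koenker) version of the test: regress the squared OLS residuals on the same regressors and multiply the resulting R-squared by the sample size. It uses simulated data with constant error variance, not the attendance data, and the names `ols`, `aux`, `bp_stat` and `p_value` are hypothetical. In practice the `bptest()` function from the lmtest package is the usual way to run this test.

```r
# Simulated data with homoskedastic errors by construction (illustrative only).
set.seed(42)
n  <- 425
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + 0.5 * x3 + rnorm(n)

ols <- lm(y ~ x1 + x2 + x3)

# Auxiliary regression: squared residuals on the same regressors.
u2  <- residuals(ols)^2
aux <- lm(u2 ~ x1 + x2 + x3)

bp_stat <- n * summary(aux)$r.squared  # studentized Breusch-Pagan statistic
bp_df   <- 3                           # number of regressors, intercept excluded
p_value <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(BP = bp_stat, df = bp_df, p.value = p_value)
```

A large p-value means the regressors explain little of the variation in the squared residuals, which is what the null hypothesis of constant variance implies.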

| | Estimate | Std. Error | t-value | Pr(>\|t\|) | |
|---|---:|---:|---:|---:|---|
| log(wins) | 930936 | 115273 | 8.0760 | 8.098e-15 | *** |
| log(avgsalry) | 326092 | 20840 | 15.6472 | < 2.2e-16 | *** |
| log(allstars) | 123676 | 27468 | 4.5026 | 8.835e-06 | *** |

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 5.6651e+13
Residual sum of Squares: 2.4522e+13
R-Squared: 0.56714
F-statistic: 173.383 on 3 and 397 DF, p-value: < 2.22e-16
Compared with the OLS estimates, the coefficient on the log of wins increases (from 837102 to 930936), while the coefficients on the log of avgsalry (from 377530 to 326092) and the log of allstars (from 166452 to 123676) decrease.

Coefficients:

| | Estimate | Std. Error | t-value | Pr(>\|t\|) | |
|---|---:|---:|---:|---:|---|
| log(wins) | 904804 | 103933 | 8.7057 | < 2.2e-16 | *** |
| log(avgsalry) | 305295 | 40274 | 7.5805 | 2.648e-13 | *** |
| log(allstars) | 126114 | 25081 | 5.0283 | 7.632e-07 | *** |

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 3.7492e+13
Residual sum of Squares: 1.8645e+13
R-Squared: 0.50268