This report uses logistic regression and K-Nearest Neighbors (KNN) to predict the shooting ability of NBA players. The objective is to classify each player as a "good shooter" (coded as 1) or a "not good shooter" (coded as 0) from five variables: Points Per Game (PTS), Three-Point Percentage (3P%), Free Throw Percentage (FT%), Rebounds per Game (REB), and Assists per Game (AST). In the logistic regression, rebounds per game stands out as a significant predictor of whether a player is classified as a good shooter. The homework also explores the limitations of other classification methods, emphasizing the importance of selecting the most accurate model for player evaluation and team decisions.
The dependent variable is binary: 1 represents a "good shooter" and 0 represents a "not good shooter." We build both a logistic regression model and a K-Nearest Neighbors (KNN) model to understand which variables contribute most to that classification.
The independent variables used to predict shooting ability include:
- PTS (Points Per Game): Indicates a player's scoring ability.
- 3P% (Three-Point Percentage): Measures a player's efficiency in shooting three-pointers.
- FT% (Free Throw Percentage): Represents the accuracy of free throw shooting.
- REB (Rebounds per Game): Measures a player's ability to secure missed shots.
- AST (Assists per Game): Indicates the ability to create scoring opportunities for teammates.
These variables are derived from NBA 2023 data.
Predicting whether a player is a "good shooter" is valuable for coaches, scouts, and team management, as it provides insights into a player's scoring ability, shooting efficiency, and overall offensive contribution to the team. Logistic regression is useful here because its coefficients show which variables are most strongly associated with the classification, and in which direction.
Table 1. Test of the null hypothesis H0: Pr(Good Shooter=1) = 0.373:
| Statistic | DF | Chi-square | Pr > Chi² |
Table 2. Type II analysis (Variable Good Shooter):
| Source | DF | Chi-square (Wald) | Pr > Wald | Chi-square (LR) | Pr > LR |
Table 3. Model parameters (Variable Good Shooter):
| Source | Value | Standard error | Wald Chi-Square | Pr > Chi² | Wald Lower bound (95%) | Wald Upper bound (95%) | Odds ratio | Odds ratio Lower bound (95%) | Odds ratio Upper bound (95%) |
The logistic regression equation, derived from the coefficients, is:
Pr(Good Shooter=1) = 1 / (1 + exp(-(4.636 - 0.159*PTS - 0.053*3P% - 0.053*FT% + 0.800*REB - 0.957*AST)))
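To make the equation concrete, here is a minimal Python sketch that evaluates it for one player. The coefficients are the ones stated above; the function name `good_shooter_probability`, the example stat line, and the assumption that 3P% and FT% are entered on a 0-100 scale are illustrative and not taken from the original analysis.

```python
import math

# Intercept and coefficients copied from the fitted equation in the text.
INTERCEPT = 4.636
COEFFICIENTS = {
    "PTS": -0.159,
    "3P%": -0.053,
    "FT%": -0.053,
    "REB": 0.800,
    "AST": -0.957,
}


def good_shooter_probability(stats):
    """Return Pr(Good Shooter=1) for a dict holding the five predictors."""
    # Linear predictor (log-odds): intercept plus coefficient * value.
    linear = INTERCEPT + sum(COEFFICIENTS[name] * stats[name] for name in COEFFICIENTS)
    # The logistic function maps log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical stat line, NOT a row from the NBA 2023 data.
# Assumption: percentages are on a 0-100 scale.
example_player = {"PTS": 15.0, "3P%": 35.0, "FT%": 80.0, "REB": 5.0, "AST": 3.0}
print(round(good_shooter_probability(example_player), 3))
```

A player would typically be classified as a "good shooter" when this probability exceeds a chosen cutoff such as 0.5; the cutoff used in the original analysis is not stated here.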
Interpretation of the Coefficients of the Predictor Variables
- The coefficient of PTS (-0.159) suggests that, holding the other variables constant, each additional point per game lowers the log-odds of being a good shooter by 0.159, indicating that scoring more points does not necessarily make a player a good shooter.
- The coefficient of 3P% (-0.053) indicates that, holding the other variables constant, a higher three-point shooting percentage is associated with a slightly lower likelihood of being a good shooter.
- The coefficient of FT% (-0.053) suggests that, holding the other variables constant, a higher free throw shooting percentage is associated with a slightly lower likelihood of being a good shooter.
- The coefficient of REB (0.800) means that, holding the other variables constant, each additional rebound per game raises the log-odds of being a good shooter by 0.800, so more rebounds are associated with a higher likelihood of being a good shooter.
- The coefficient of AST (-0.957) indicates that, holding the other variables constant, each additional assist per game lowers the log-odds of being a good shooter by 0.957, so more assists are associated with a lower likelihood of being a good shooter.
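The coefficients above are on the log-odds scale, which can be hard to read directly. A short sketch of the usual conversion: exponentiating a coefficient gives the odds ratio for a one-unit increase in that variable. The values below are computed from the rounded coefficients in the text, so they may differ slightly in the last digit from any odds ratios reported in Table 3.

```python
import math

# Coefficients copied from the logistic regression equation in the text.
COEFFICIENTS = {
    "PTS": -0.159,
    "3P%": -0.053,
    "FT%": -0.053,
    "REB": 0.800,
    "AST": -0.957,
}

# exp(coefficient) is the multiplicative change in the odds of being a
# good shooter for a one-unit increase, other variables held constant.
odds_ratios = {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}

for name, ratio in odds_ratios.items():
    percent_change = (ratio - 1.0) * 100.0
    print(f"{name}: odds ratio {ratio:.3f} ({percent_change:+.1f}% change in odds per unit)")
```

An odds ratio above 1 (REB) means the odds of being classified as a good shooter rise with the variable, and a ratio below 1 (PTS, 3P%, FT%, AST) means they fall.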
K-Nearest Neighbors (KNN)
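KNN classifies a player by majority vote among the k most similar players in the training data. Because PTS, REB, and AST are per-game counts while 3P% and FT% are percentages, the predictors sit on different scales, so a sketch of the method would usually standardize them before measuring distance. The example below is a minimal pure-Python illustration that assumes Euclidean distance and k = 3; the helper names and the six toy rows are invented for illustration and are not from the NBA 2023 data or the original analysis.

```python
import math
from collections import Counter


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # Fall back to 1.0 when a column is constant to avoid dividing by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest to query (Euclidean)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy rows in the order PTS, 3P%, FT%, REB, AST. Invented for illustration only.
toy_x = [
    [25.0, 38.0, 88.0, 9.0, 3.0],
    [22.0, 36.0, 85.0, 10.0, 2.5],
    [20.0, 37.0, 86.0, 8.5, 3.5],
    [8.0, 30.0, 70.0, 3.0, 6.0],
    [10.0, 31.0, 72.0, 2.5, 7.0],
    [9.0, 29.0, 68.0, 3.5, 6.5],
]
toy_y = [1, 1, 1, 0, 0, 0]  # 1 = good shooter, 0 = not good shooter

scaled_x, means, stds = standardize(toy_x)


def classify(query, k=3):
    """Scale a raw stat line with the training means/stds, then vote."""
    scaled_query = [(v - m) / s for v, m, s in zip(query, means, stds)]
    return knn_predict(scaled_x, toy_y, scaled_query, k)


print(classify([21.0, 36.5, 84.0, 9.0, 3.0]))
```

Unlike logistic regression, KNN produces no coefficients, so it predicts a class but does not by itself show which variable drives the prediction.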
Summary statistics for quantitative data/predictors:
| Variable | Observations | Obs. with missing data | Obs. without missing data | Minimum | Maximum | Mean | Std. deviation |