In this statistical analysis homework, we aimed to predict the age of abalones based on various physical measurements such as length, diameter, height, whole weight, shucked weight, viscera weight, and shell weight. The data used for this analysis is sourced from the UCI Abalone dataset (Kuhn, 2016). Our analysis revealed that height and shell weight (Shellwt) are significant predictors of an abalone's age, measured by the number of rings on its shell.
The primary objective of this analysis was to investigate the relationship between the number of rings on an abalone's shell and its physical attributes. We sought to determine whether the physical measurements could be used to predict the age of abalones.
The analysis followed a structured methodology:
- Descriptive Analysis: We conducted a descriptive analysis to understand the distribution of the available data. Basic statistics were computed for the physical measurements, both as a whole and segmented by the sex of the abalones. We also used boxplots to visualize the distributions of these variables and identify potential outliers.
- Correlation Analysis: We performed a Pearson correlation analysis to assess the relationship between the number of rings and the physical measurements. All correlations were found to be significantly different from zero (p < 0.001), indicating a relationship between these variables.
- Multiple Regression Analysis: Two linear regression models were constructed to predict abalone age. The first model, Model A, included multiple predictor variables (length, diameter, height, whole weight, shucked weight, viscera weight, and shell weight). The second model, Model B, included only height and shell weight. Model A explained approximately 53% of the variance in abalone age, while Model B accounted for around 40% of the variance. Height and Shellwt were identified as significant predictors of abalone age in Model B.
- Regression Diagnostics: We performed regression diagnostics for Model B, which revealed some issues, as shown in Figures 4 and 5. The variance of the residuals appeared to increase with the predicted value, and the residuals did not appear to be normally distributed. Therefore, Model B should be used with caution.
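As a rough illustration of the regression and diagnostic steps, the sketch below fits a two-predictor least-squares model of the same form as Model B (age on height and shell weight), then computes R² and the residuals. It is a minimal sketch under stated assumptions, not the code used for the homework: the function names are hypothetical, the six data points are made up for illustration, and the fit solves the centered normal equations by hand rather than calling a statistics package.

```python
def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 (hypothetical helper)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero if the predictors are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def fitted_and_residuals(x1, x2, y, coefs):
    """Return fitted values and residuals (observed minus fitted)."""
    b0, b1, b2 = coefs
    fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    residuals = [c - f for c, f in zip(y, fitted)]
    return fitted, residuals


def r_squared(y, residuals):
    """Share of the variance in y explained by the model."""
    my = sum(y) / len(y)
    sse = sum(e ** 2 for e in residuals)
    sst = sum((c - my) ** 2 for c in y)
    return 1 - sse / sst


# Made-up toy values for illustration only, NOT taken from the dataset.
height = [0.09, 0.10, 0.12, 0.13, 0.15, 0.16]
shell_wt = [0.07, 0.15, 0.14, 0.25, 0.28, 0.40]
age = [8.5, 9.5, 10.5, 11.5, 13.5, 16.5]

coefs = fit_two_predictors(height, shell_wt, age)
fitted, residuals = fitted_and_residuals(height, shell_wt, age, coefs)
r2 = r_squared(age, residuals)
```

Plotting `residuals` against `fitted` is the usual way to look for the widening spread described above, and a normal quantile plot of `residuals` is the usual check on normality.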
The dataset used for this analysis contained 4,177 observations with no missing or null values. The variables included sex, length, diameter, height, whole weight, shucked weight, viscera weight, shell weight, and the number of rings. Sex was a categorical variable with three levels (male, female, or infant), while the other variables were quantitative with a ratio level of measurement. Female abalones had higher mean values for all quantitative variables compared to males and infants.
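The segmented descriptive statistics can be sketched as a simple group-by-sex mean. This is only an illustrative sketch: the records are made-up toy values, the sex codes `"F"`, `"M"`, and `"I"` are an assumed encoding, and `mean_by_group` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Made-up toy records of (sex, height), NOT taken from the dataset.
records = [
    ("F", 0.16), ("F", 0.14),
    ("M", 0.15), ("M", 0.13),
    ("I", 0.10), ("I", 0.12),
]


def mean_by_group(rows):
    """Average the measurement within each sex category."""
    groups = defaultdict(list)
    for sex, value in rows:
        groups[sex].append(value)
    return {sex: mean(values) for sex, values in groups.items()}


group_means = mean_by_group(records)
```

The same grouping pattern applies to any of the quantitative variables; only the second element of each record changes.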
Age in years was calculated as the number of rings plus 1.5, and it showed moderate positive correlations with the quantitative variables. The strongest correlation was between age and Shellwt (r = 0.628), and the weakest was between age and Shuckedwt (r = 0.421). Furthermore, Length, Diameter, Height, Wholewt, Shuckedwt, and Viscerawt were strongly positively correlated with one another, with correlation coefficients greater than 0.70.
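The age conversion and the Pearson coefficient can be sketched as follows. Because age is the ring count shifted by a constant, its correlation with any measurement equals the correlation of the ring count with that measurement. The function names are hypothetical and the five data points are made up for illustration; they are not taken from the dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def age_from_rings(rings):
    """Age in years, using the rings + 1.5 rule stated above."""
    return [r + 1.5 for r in rings]


# Made-up toy values for illustration only, NOT taken from the dataset.
rings = [7, 9, 10, 12, 15]
shell_wt = [0.10, 0.21, 0.20, 0.33, 0.40]

age = age_from_rings(rings)
r_age = pearson_r(age, shell_wt)      # correlation using age in years
r_rings = pearson_r(rings, shell_wt)  # same value: a constant shift does not change r
```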
Our analysis revealed that all the physical variables were positively correlated with the number of rings on the abalones' shells: as these physical measurements increased, the number of rings tended to increase as well. Although Model A explained 53% of the variance in abalone age, it was not accepted because some of its coefficients had negative signs, which conflicts with the positive pairwise correlations. Such sign reversals can occur when predictors are strongly correlated with one another, as several of the physical measurements are.
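A small constructed example shows how a negative coefficient can coexist with a positive pairwise correlation when two predictors move closely together. This is an illustration of the general phenomenon only, not a diagnosis of Model A: the data are synthetic, built so that y = 3·x1 − x2 exactly, and the helper names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 (hypothetical helper)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Synthetic data: x2 tracks x1 closely, and y is defined as 3*x1 - x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.2, 1.9, 3.1, 3.8, 5.2]
y = [3 * a - b for a, b in zip(x1, x2)]

r_x1_x2 = pearson_r(x1, x2)  # predictors are strongly correlated
r_y_x2 = pearson_r(y, x2)    # y and x2 are positively correlated on their own
b0, b1, b2 = fit_two_predictors(x1, x2, y)  # yet the fitted b2 is negative
```

Here `x2` rises with `y` when viewed alone, but once `x1` is held fixed its partial effect is negative, so the sign of a coefficient in a multi-predictor model need not match the sign of the pairwise correlation.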
On the other hand, Model B, which included only Height and Shellwt as predictor variables, explained 40% of the variance in abalone age. Both Height and Shellwt were identified as significant predictors of abalone age. However, the regression diagnostics for Model B indicated that the residual variance was not constant and that the residuals were not normally distributed.
Our analysis demonstrates the potential for predicting abalone age based on physical measurements, particularly height and shell weight. However, caution is advised when using Model B because of the non-constant residual variance and non-normal residuals identified in the regression diagnostics.