Analysis and interpretation of data based on regression
Here we will collect data based on age, gender, and ICU status. We will then prepare and visualize the data to see the number of people who ended up in the ICU.
The pandemic has caused panic due to shortages in many aspects of life, and one of the major shortages was in medical supplies. We aim to predict which patients will end up in the ICU based on gender and age. Our outpatient clinics will launch marketing campaigns targeting the patient populations at higher risk of ending up in the ICU. This should help decrease shortages in our ICUs and medical departments. I need help with the following:
1) Identify and collect the data
2) Prepare, explore, and visualize the data
3) Use classification/regression modeling techniques
4) Evaluate the models
5) Communicate the results and conclusion data source
1. Identify and collect data: The variables for which data is to be collected are:
• ICU status: This measures whether the patient ended up in the intensive care unit (ICU). It is a binary variable where Yes denotes patients who ended up in the ICU and No denotes patients who did not. This is the dependent variable.
• Gender: This indicates whether the patient is male or female. It is also a binary variable, and it is an independent variable.
• Age: This measures the patient's age, recorded as an age group (0-17, 18-49, 50-64, or 65 years and above). It is a categorical variable and an independent variable.
2. Prepare, explore, and visualize the data
• Preparation: The data was loaded into the software. We found no missing entries and no data-entry errors, so the data is ready for exploration.
• Explore: We used frequency distributions to show the percentage of each category in each variable. The result is presented in Table 1.
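The preparation checks described above (no missing entries, no data-entry errors) could be sketched roughly as follows. The field names, category codes, and sample rows are hypothetical, since the actual coding of the dataset is not given here.

```python
# Hypothetical category codes; the actual coding of the dataset is not stated.
ALLOWED = {
    "icu": {"Yes", "No"},
    "gender": {"Male", "Female"},
    "age_group": {"0-17", "18-49", "50-64", "65+"},
}


def find_problems(records):
    """Return (row_index, field, value) for every missing or invalid entry."""
    problems = []
    for i, row in enumerate(records):
        for field, allowed in ALLOWED.items():
            value = row.get(field)
            # A missing value (None) or an unknown code both count as problems.
            if value is None or value not in allowed:
                problems.append((i, field, value))
    return problems


# Toy rows for illustration only (not the study data).
records = [
    {"icu": "No", "gender": "Male", "age_group": "65+"},
    {"icu": "Yes", "gender": "Female", "age_group": None},  # missing entry
    {"icu": "No", "gender": "F", "age_group": "18-49"},     # data-entry error
]

problems = find_problems(records)
print(problems)
```

An empty list from find_problems would correspond to the situation reported above, where the data needed no cleaning.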
Table 1: Frequency Distribution
Table 1 shows that 75.3% of the COVID patients sampled did not end up in the ICU while 24.7% did. 50.8% of patients are male and 49.2% are female. 1.6% of patients are aged 0-17 years, 23.8% are aged 18-49 years, 25.4% are aged 50-64 years, and 49.2% are aged 65 years and above.
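A frequency distribution like the one summarized in Table 1 can be produced with a few lines of code. This is a minimal sketch using only the standard library; the function name and the four sample values are illustrative and are not the study data.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}


# Toy values for illustration only (not the study data).
icu_status = ["No", "No", "No", "Yes"]

table = frequency_table(icu_status)
for category, (count, percent) in table.items():
    print(f"{category}: n={count}, {percent}%")
```

The same function would be applied to each variable in turn (ICU status, gender, age group) to fill in one block of the table per variable.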
Table 2: Cross Tabulation
Table 2 shows the distribution of ICU status within gender and age groups. 28.6% of patients aged 65 years and above ended up in the ICU, compared with 23.8% of those aged 50-64 years, 18.1% of those aged 18-49 years, and 18.8% of those aged 0-17 years. 28.9% of males with COVID ended up in the ICU, compared with 20.4% of females. The chi-square test shows a significant association between ICU status and gender and between ICU status and age group.
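To illustrate how the chi-square test of association works, the sketch below computes the Pearson statistic (without continuity correction) for a gender by ICU table. The sample size is not reported here, so the counts assume a hypothetical 1,000 patients chosen only to reproduce the reported percentages; the resulting statistic and p-value are therefore illustrative and are not the study's actual test results.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts for N = 1000 (assumption; the real N is not reported).
# Rows: Male, Female. Columns: ICU, no ICU.
# 147/508 is about 28.9% and 100/492 is about 20.4%, as in Table 2.
gender_by_icu = [[147, 361], [100, 392]]

stat = chi_square(gender_by_icu)
p = p_value_df1(stat)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

The closed-form p-value applies only to a 2x2 table (1 degree of freedom); the age-group table has 3 degrees of freedom and would need a general chi-square distribution function.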
Fig 1: Univariate Plot
Fig 2: Bivariate Plot
3. Classification/Regression Modeling Techniques
We used a classification tree as the classification technique and binary logistic regression as the regression technique. The result of the classification tree is presented in Fig 3. Gender is the most important variable for splitting the tree, with an improvement measure of 0.004. Males have a higher proportion in the ICU than females (28.9% vs 20.4%). For males, age group splits the tree further into 65 years and above (node 6) and below 65 years (node 5). Males aged 65 years and above have a higher proportion in the ICU than those below 65 years (33.2% vs 24.6%). Among males below 65 years, those aged 50 to 64 years (node 7) have a higher proportion in the ICU than those below 50 years (node 8) (26.1% vs 22.9%). For females, the age-group split is between patients below 50 years and those aged 50 years and above. Female patients below 50 years had a lower proportion in the ICU than those aged 50 years and above.
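As a rough check on the first split, the sketch below computes the decrease in Gini impurity from splitting on gender, using the rounded percentages from Tables 1 and 2. It assumes the improvement measure is a Gini decrease, which is not stated here and may not match the criterion the software actually used.

```python
def gini(p):
    """Gini impurity of a two-class node where p is the share of one class."""
    return 2 * p * (1 - p)


# Rounded figures from Tables 1 and 2.
share_male, share_female = 0.508, 0.492
icu_male, icu_female = 0.289, 0.204
icu_overall = 0.247

parent = gini(icu_overall)
children = share_male * gini(icu_male) + share_female * gini(icu_female)
improvement = parent - children
print(f"Gini improvement from the gender split: {improvement:.4f}")
```

With these inputs the improvement comes out a little above 0.003, in the same range as the reported 0.004 but not exactly on it. The gap may come from rounding of the percentages or from the tree having been fit on the training portion of the data only.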
Table 3 presents the result of the logistic regression model. Gender has a significant effect on ICU status (p<.001): female patients have significantly lower odds of being in the ICU than males. Similarly, age group has a significant effect on ICU status: patients younger than 65 have significantly lower odds of being in the ICU than patients aged 65 and above.
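The direction of these effects can be checked against the cross tabulation. The sketch below computes unadjusted odds ratios from the Table 2 percentages. These are not the coefficients of the fitted model in Table 3, which adjusts for both variables at once, so they should be read only as a consistency check.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


def odds_ratio(p_group, p_reference):
    """Odds of the outcome in a group relative to a reference group."""
    return odds(p_group) / odds(p_reference)


# ICU proportions from Table 2.
or_female_vs_male = odds_ratio(0.204, 0.289)

# Each younger age group compared with the 65-and-above group (28.6%).
age_rates = {"0-17": 0.188, "18-49": 0.181, "50-64": 0.238}
or_age_vs_65_plus = {g: odds_ratio(p, 0.286) for g, p in age_rates.items()}

print(f"Female vs male: {or_female_vs_male:.2f}")
for group, value in or_age_vs_65_plus.items():
    print(f"{group} vs 65+: {value:.2f}")
```

An odds ratio below 1 means lower odds than the reference group. All of the ratios here fall below 1, which agrees with the signs described for the regression.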
Fig 3: Classification Tree
Table 3: Logistic Regression
4. Model Evaluation
Table 5 presents the confusion matrix for the classification tree and the logistic regression model. We fit each model on 80% of the data and tested it on the remaining 20%. Both models perform identically, with a 75.5% correct prediction rate. However, with 100% sensitivity and 0% specificity, both models assign every test patient to the same class, so the 75.5% accuracy only reflects the share of non-ICU patients in the test set and the models' performance is questionable. This may be because some important variables are not included in the models.
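The pattern in these figures can be reproduced with a classifier that predicts the same class for everyone. The sketch below assumes a hypothetical test set of 1,000 patients and treats "not in ICU" as the positive class, which is the labeling consistent with 100% sensitivity, 0% specificity, and 75.5% accuracy together. The counts are illustrative and are not taken from Table 5.

```python
def evaluate(tp, fp, fn, tn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of actual positives found
        "specificity": tn / (tn + fp),   # share of actual negatives found
    }


# Hypothetical test set: 755 patients not in ICU (positive class here),
# 245 in ICU. The model predicts "not in ICU" for every patient, so all
# 245 ICU patients are misclassified (fp) and none are caught (tn = 0).
metrics = evaluate(tp=755, fp=245, fn=0, tn=0)
print(metrics)
```

A model that behaves this way identifies no ICU patients at all, so its accuracy says nothing about its ability to flag high-risk patients.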
Table 5: Confusion Matrix
5. Results and Conclusion
We see from the descriptive statistics that a low proportion of patients (24.7%) ended up in the ICU. The sample is split almost evenly by gender, while older age groups account for a larger share of patients. The cross tabulation shows that a significantly higher proportion of males than females ended up in the ICU, and that a higher proportion of patients in the older age groups (50 and above) ended up in the ICU than of younger patients (below 50). The regression and classification models both show that males have a higher probability of ending up in the ICU than females, and that older people have a higher probability than younger people. The classification results specifically show that males aged 65 years and above are the highest-risk group, followed by males aged 50 to 64 years. Males below 50 years and females aged 50 years and above also have a high risk of ending up in the ICU. In conclusion, as a matter of priority, this campaign should target males, and most importantly males aged 50 years and above.