Introduction
The first objective of this study is to determine whether the average length of hospital stay differs significantly between the pre- and post-implementation periods. The second is to determine whether the average length of hospital stay differs significantly among insurance types. We also explore which variables are important for predicting the length of stay in the hospital. The dataset was extracted from the Sentara hospital system and covers patients undergoing coronary artery bypass graft (CABG) surgery. It contains information on 1,013 patients.
Method
To test whether the average length of hospital stay differs significantly between the pre- and post-implementation periods, we use a simple linear regression model in which pre_post is an indicator variable for the period. The model to be estimated is
length of stay = β_0 + β_1 pre_post
The null and alternative hypotheses are
H_0: β_1 = 0 (there is no significant difference in the average length of hospital stay between the pre and post periods)
H_1: β_1 ≠ 0 (there is a significant difference in the average length of hospital stay between the pre and post periods)
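Because pre_post takes only two values, β_0 is the mean length of stay in the reference period and β_1 is the difference between the two period means. A minimal Python sketch on made-up numbers (not the study data; the function name and the 0/1 coding are assumptions for illustration) shows that the least-squares estimates reduce to these quantities:

```python
from statistics import mean

def ols_simple(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical lengths of stay in days (illustrative values only).
pre = [12.0, 15.0, 9.0, 14.0]
post = [10.0, 11.0, 8.0, 13.0, 9.0]

x = [0] * len(pre) + [1] * len(post)  # assumed coding: 0 = pre, 1 = post
y = pre + post

b0, b1 = ols_simple(x, y)
print(round(b0, 4), round(mean(pre), 4))               # intercept vs. pre mean
print(round(b1, 4), round(mean(post) - mean(pre), 4))  # slope vs. mean difference
```

Under this coding, a negative slope means a shorter average stay in the post period.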
To determine whether the average length of hospital stay differs significantly among insurance types, we use one-way ANOVA. The null and alternative hypotheses are:
H0: There is no significant difference in the average length of hospital stay among insurance types
H1: There is a significant difference in the average length of hospital stay among insurance types
The assumptions underlying this test are
 Independence: each record in the data must be a distinct and independent observation. This is met because each patient belongs to exactly one insurance group.
 Normality: the response within each factor level is normally distributed, i.e. the length of stay within each insurance group must be approximately normally distributed.
 Homogeneity of variance: the variance of the length of stay is equal across the groups.
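The F statistic of a one-way ANOVA is the ratio of the between-group mean square to the within-group mean square, with k - 1 and N - k degrees of freedom. With four insurance groups and 1,013 patients this gives (3, 1009). The sketch below computes F by hand in Python on made-up groups (not the study data; the function name is an assumption for illustration):

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical lengths of stay for three groups (illustrative values only).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```

A large F relative to the F distribution with these degrees of freedom leads to rejecting H0.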
To determine which variables are important for predicting the length of stay in the hospital, we use a multiple linear regression model. The model to be estimated is
length of stay = β_0 + β_1 hosp charge + β_2 race + β_3 insurance + β_4 age + β_5 infection + β_6 heart attack + β_7 glucose
Race, insurance, infection and heart attack are categorical and enter the model as indicator variables.
The hypotheses to be tested are
H_01: β_1 = 0; H_02: β_2 = 0; …; H_07: β_7 = 0
H_11: β_1 ≠ 0; H_12: β_2 ≠ 0; …; H_17: β_7 ≠ 0
The assumptions of simple and multiple linear regression are
Linearity: there must be a linear relationship between the dependent variable and the independent variables.
No autocorrelation: autocorrelation occurs when the residuals are not independent of each other, in other words when the value of y(x+1) is not independent of the value of y(x). For the linear regression model, we expect the residuals to be independent of one another.
Normality of residuals: we expect the residuals from the model to be normally distributed.
No heteroscedasticity: we expect the residual variance to be constant. Heteroscedasticity occurs when the variance of the residuals changes across observations.
No outliers: outliers are values that are much larger or much smaller than the other observations, and they may bias the estimates from the regression model. We require that no outliers exist in the dataset.
Little or no multicollinearity: multicollinearity exists when the independent variables are very highly correlated with one another. We therefore expect the correlations between the independent variables not to be too high.
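The source does not state which diagnostic was used for the independence assumption. One common check, shown here only as an illustration, is the Durbin-Watson statistic: the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 are consistent with no first-order autocorrelation. The statistic is only meaningful when the rows have a natural order. The residuals below are made up, and the function name is an assumption:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residual sequences (illustrative values only).
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips at every step
clustered = [1.0, 1.0, -1.0, -1.0]    # neighbours tend to share a sign

dw_alternating = durbin_watson(alternating)
dw_clustered = durbin_watson(clustered)
print(dw_alternating)  # above 2: negative autocorrelation
print(dw_clustered)    # below 2: positive autocorrelation
```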
Result
The descriptive statistics in Table 1 show that the average patient age is 63.9 years (SD = 10.06 years). The average length of stay in the hospital is 11.73 days (SD = 8.03 days), the average hospital charge is $150,606.1 (SD = $108,393.8), and the average glucose level is 137.25 (SD = 15.84). 43.83% of patients were measured during the pre-implementation period and 56.17% during the post-implementation period. 72.06% of patients had suffered a heart attack and 27.94% had not. 17.47% of patients acquired an in-hospital infection and 82.53% did not. 5.73% of patients have Medicaid insurance, 25.77% have Medicare, 9.97% have other insurance, and 58.54% have private insurance. 28.23% are African American, 63.28% are White, and 7.9% are of other races.
Table 2 presents the result of the simple linear regression of length of stay on the dummy variable for the pre- and post-implementation periods. The slope estimate is significant (β = -1.154, 95% CI [-2.15, -0.16], p = 0.02), which means that the length of stay differs significantly between the pre- and post-implementation periods.
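The Table 2 estimates can be cross-checked against the overall mean in Table 1. The short Python check below takes the slope as negative, as the Discussion's "lower by 1.15 days" implies, and assumes the intercept is the pre-period mean. Weighting the two period means by the Table 1 counts recovers the overall mean length of stay of about 11.73 days:

```python
intercept = 12.38          # Table 2: estimated mean stay in the pre period (days)
slope = -1.15              # Table 2 slope, read as post minus pre (assumed sign)
n_pre, n_post = 444, 569   # Table 1 counts for the two periods

post_mean = intercept + slope
overall = (n_pre * intercept + n_post * post_mean) / (n_pre + n_post)
print(round(post_mean, 2), round(overall, 2))
```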
Table 3 presents the ANOVA result testing for a difference in length of stay among insurance types. The result, F(3, 1009) = 4.39, p = 0.004, means that we reject the null hypothesis: there is a significant difference in the average length of stay among the insurance types. The multiple comparison results show a significant difference between the length of stay of patients with private insurance and those with Medicaid (p = 0.019).
Table 4 presents the result of the multiple regression model used to identify which variables are important in predicting length of stay. The result shows that age (β = 0.03, 95% CI [0.002, 0.058], p = 0.034), hospital charge (β = 5.92e-05, 95% CI [5.67e-05, 6.17e-05], p < 0.001), heart attack (β = 1.25, 95% CI [0.68, 1.83], p < 0.01), infection (β = 1.7, 95% CI [0.995, 2.40], p < 0.001) and average glucose (β = 0.05, 95% CI [0.03, 0.07], p < 0.01) were significant variables in the model.
Table 1: Descriptive summary of Infection data
Variable           Mean      Std. Dev.
pat_age            63.9      10.06
losadmitdischarge  11.73445  8.031901
hosp_charge        150606.1  108393.8
glucose            137.2452  15.84332

Variable     Category  n    %
pre_post     pre       444  43.83
             post      569  56.17
heartattack  No        283  27.94
             Yes       730  72.06
infection    No        836  82.53
             Yes       177  17.47
Insurance    Medicaid  58   5.73
             Medicare  261  25.77
             Others    101  9.97
             Private   593  58.54
Race         AAs       286  28.23
             Others    80   7.9
             Whites    641  63.28

Table 2: Simple linear regression model
Variables  Estimate  p-value  95% confidence interval
Intercept  12.38     <.0001   (11.63, 13.13)
prepost    -1.15     0.02     (-2.15, -0.16)
Table 3: ANOVA result
Test   Statistic  df       Value  p
ANOVA  F          3, 1009  4.39   0.0044

Multiple comparisons  p
Medicare - Medicaid   0.735
Others - Medicaid     0.674
Others - Medicare     1.000
Private - Medicaid    0.019
Private - Medicare    0.084
Private - Others      1.000
Table 4: Parameter Estimates of the Multiple Linear Regression
Discussion
Length of stay in the post-implementation period is 1.15 days shorter than in the pre-implementation period (95% CI [-2.15, -0.16]). This suggests that the intervention program was successful. Length of stay differs significantly across insurance types, but the pairwise comparisons locate the difference only between privately insured patients and Medicaid patients; no difference is found for the other pairs.
Age (β = 0.03, 95% CI [0.002, 0.058], p = 0.034), hospital charge (β = 5.92e-05, 95% CI [5.67e-05, 6.17e-05], p < 0.001), heart attack (β = 1.25, 95% CI [0.68, 1.83], p < 0.01), infection (β = 1.7, 95% CI [0.995, 2.40], p < 0.001) and average glucose (β = 0.05, 95% CI [0.03, 0.07], p < 0.01) were significant factors associated with length of stay in the hospital. Each additional year of age corresponds to 0.03 more days in the hospital. This is plausible because older patients tend to respond more slowly to treatment. The per-dollar effect of hospital charge is very small: a one-dollar increase in hospital charge corresponds to about 5.1 more seconds of stay (5.92e-05 days × 86,400 seconds per day), which amounts to 5.92 more days for every $100,000 in additional hospital charge. Patients who had experienced a heart attack stayed 1.25 days longer than those who had not, and patients who acquired an infection while in hospital stayed 1.7 days longer than those who did not. A one-unit increase in average glucose is associated with a 0.05-day increase in length of stay. White patients stayed 0.63 fewer days in hospital than African American patients (p = 0.036).
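The unit conversion for the hospital-charge coefficient can be verified with a few lines of Python. The sketch reads the coefficient as 5.92e-05 days per dollar, which is the reading consistent with 5.92 days per $100,000; the variable names are illustrative:

```python
beta_charge = 5.92e-05          # days of stay per additional dollar (as quoted in the text)
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

seconds_per_dollar = beta_charge * SECONDS_PER_DAY
days_per_100k = beta_charge * 100_000

print(round(seconds_per_dollar, 2))  # seconds of stay per extra dollar
print(round(days_per_100k, 2))       # days of stay per extra $100,000
```

The same coefficient therefore looks negligible per dollar but substantial at the scale of typical charges in this dataset.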
Appendix
proc means data=WORK.QUERY chartype mean std min max n vardef=df;
var pat_age losadmitdischarge avg_glucose hosp_charge;
run;
proc freq data=WORK.QUERY;
tables pre_post insurance race heartattack infection/ plots=(freqplot cumfreqplot);
run;
proc glmselect data=WORK.QUERY outdesign(addinputvars)=Work.reg_design;
class pre_post/ param=glm;
model losadmitdischarge= pre_post/showpvalues selection=none;
run;
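proc reg data=Work.reg_design alpha=0.05 plots(only)=(diagnostics residuals
observedbypredicted);
where pre_post is not missing;
ods select ParameterEstimates OutputStatistics ResidualStatistics SpecTest
DiagnosticsPanel ResidualPlot ObservedByPredicted;
/* The model statement is missing from the original listing. This one is an */
/* assumption: &_GLSMOD is the macro variable that PROC GLMSELECT creates   */
/* with the names of the design-matrix columns for the fitted model.        */
model losadmitdischarge=&_GLSMOD / ;
run;
quit;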
proc glmselect data=WORK.QUERY outdesign(addinputvars)=Work.reg_design;
class insurance race heartattack infection / param=glm;
model losadmitdischarge= pat_age avg_glucose hosp_charge insurance race heartattack infection /showpvalues selection=none;
run;
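proc reg data=Work.reg_design alpha=0.05 plots(only)=(diagnostics residuals
observedbypredicted);
where insurance is not missing and race is not missing and heartattack is not missing and infection is not missing;
ods select ParameterEstimates OutputStatistics ResidualStatistics SpecTest
DiagnosticsPanel ResidualPlot ObservedByPredicted;
/* The model statement is missing from the original listing. This one is an */
/* assumption: &_GLSMOD is the macro variable that PROC GLMSELECT creates   */
/* with the names of the design-matrix columns for the fitted model.        */
model losadmitdischarge=&_GLSMOD / ;
run;
quit;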
proc glm data=WORK.QUERY;
class insurance;
model losadmitdischarge=insurance;
/* The grouping variable must match the CLASS variable (insurance). */
means insurance / hovtest=levene welch plots=none;
lsmeans insurance / adjust=tukey pdiff alpha=.05;
run;
quit;