Data Analysis Case Study Homework Help

Are you stuck on a demanding data analysis case study? Our statistics homework helpers are available around the clock. You can get data analysis case study homework help from the comfort of your home, with no need to make an appointment or wait for a specific day to meet our experts. Whether you contact us in the early hours of the morning or late at night, you will find an expert who is ready to assist you with your project. There is no need to put off your homework. Hire our data analysis case study homework helpers now.

Case: Absenteeism

Hi, I’m Pam Poovey, the director of Human Resources for Ingels Sherman International Shipping. Recently I have been concerned about how often our workers have been taking days off for either personal or family health reasons (i.e., Family and Medical Leave). The company has a very generous leave policy, but I have noticed that many more people are using the leave policy than they used to. I suspect that at least some of the employees have been taking advantage of the company’s policy, but I would like to see if there is some sort of pattern in what kind of employee is more likely to take leave days. Our department has recorded whether or not each employee took a leave day last quarter, along with some basic information about each employee that we think may be related. We recorded their current salary, whether the employee is full time or part time, age, and whether or not the employee got a raise or promotion in the last five years.
Because this project is sensitive, we would like you to be an external consultant to help us analyze this data. Primarily, we would like to know if any of the factors we have access to can help us identify employees who are most likely to use a leave day. Secondarily, we would like to know how accurately the data can be used to predict whether or not an employee is likely to take a leave day during a given three-month period. As you might imagine, we want to avoid talking to employees to further investigate this issue unless we are very confident that they are likely to use the leave policy in three months. For that reason, before we use the model you develop to start investigating possible abuse of the policy, is there a way to tune the model so that at most 10% of the people who don’t use the leave policy are incorrectly predicted by the model to use the leave policy? If the model can be tuned that way, how good will the model be at accurately identifying people who actually will use the leave policy in three months if we do that?
Senior Analyst’s Objectives
1. Give a brief summary of the data using both numeric and visual summaries
2. Investigate the relationship between the predictors and the use of the leave policy to determine what model would be most appropriate
3. Evaluate and give a description of the utility and validity of the model
4. Analyze and interpret the relationship between the use of the leave policy and the predictors for the client
5. Be sure to comment on any issues or weaknesses the model may have so that the client understands the restrictions of the analysis

Summary of the Analysis

The main objective of the analysis is to determine whether there is a relationship between an employee taking a leave day and factors such as the employee’s salary, the employee’s age, whether the employee is full time or part time, and whether the employee got a raise or promotion in the last five years.
1.1 Brief summary of the data
Variable            Description of the variable

Took leave          Categorical variable describing whether the employee
                    has taken a leave. It takes the value ‘yesleave’ if
                    the employee took a leave and ‘noleave’ if not.

Salary              Numerical variable recording the employee’s salary.
                    The minimum salary is 20004.5 and the maximum
                    salary is 69997.

Employment Status   Categorical variable describing the employee’s
                    employment status. It takes the value PartTime when
                    the employee works part time and FullTime when the
                    employee works full time.

RaiseorPromo        Categorical variable describing whether the employee
                    has received a promotion or raise during the last
                    five years. It takes the value Yes if the employee
                    has received one and No if not.

➔ From the bar plot, we can see that around 800 employees took a leave and around 200 employees did not take a leave from their job.
➔ From the bar plot, we can see that around 780 employees work full time and around 220 work part time.
➔ From the bar plot, we can see that around 810 employees have received no promotion or raise during the last five years, while around 200 have.
➔ From the box plot, we can see that the median salary for the employees who took a leave is around $45,000, while for those who did not it is around $50,000.
➔ From the box plot, we can see that the median age is around 35 both for the employees who took a leave and for those who did not.
1.2 Relationship between the predictors and the leave policy
From the correlation analysis, we can see that there is a positive relationship between the employee’s age and the employee’s salary.
For the categorical variables, the type of relationship cannot be identified from a correlation coefficient, but we expect the probability of an employee taking a leave to increase as age increases and when the employee works part time.
1.3 Utility and validity of the model
The model we set up to find the relationship between the use of the leave policy and the predictor variables is a logistic regression model. The dependent variable, which we are trying to predict, is whether the employee took a leave; the independent variables are salary, age, employment status, and whether the employee got a raise or promotion.
This model is useful because it enables us to identify which predictor variables have an impact on whether an employee takes a leave.
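The client’s requirement that at most 10% of the employees who do not use the leave policy are wrongly flagged corresponds to capping the false positive rate at 0.10. The following is a minimal sketch of how a probability cut-off could be tuned for that, assuming predicted probabilities from the model are already available. The function name `tune_threshold` and the scores and labels are hypothetical and are not the client’s data.

```python
def tune_threshold(scores, labels, max_fpr=0.10):
    """Return (threshold, fpr, tpr) with the highest TPR whose FPR <= max_fpr.

    scores: predicted probabilities of taking leave (hypothetical values).
    labels: 1 if the employee actually took leave, 0 otherwise.
    An employee is flagged when score >= threshold.
    Assumes both classes are present in labels.
    """
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    best = (float("inf"), 0.0, 0.0)  # flag nobody: FPR = TPR = 0
    # Lowering the threshold can only add flagged employees, so FPR and TPR
    # never decrease; the last threshold that satisfies the cap is the best.
    for t in sorted(set(scores), reverse=True):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fpr, tpr = fp / negatives, tp / positives
        if fpr > max_fpr:
            break
        best = (t, fpr, tpr)
    return best


# Hypothetical predicted probabilities, for illustration only.
scores = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.80,  # no leave
          0.30, 0.55, 0.60, 0.70, 0.90]                                # took leave
labels = [0] * 10 + [1] * 5

threshold, fpr, tpr = tune_threshold(scores, labels)
print(threshold, fpr, tpr)
```

In this toy example the chosen cut-off wrongly flags one of the ten employees who did not take leave and correctly identifies four of the five who did. The true positive rate at the chosen cut-off is the answer to the client’s second question, and with real data it should be estimated on employees who were not used to fit the model.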
1.4 The model interpretation
According to the results of our model, employees who work full time have a higher estimated probability of taking a leave (odds ratio 1.450), although this effect is not statistically significant (p = 0.2494).
In addition, employees who have not received a raise or promotion during the last five years are more likely to take a leave (odds ratio 1.866, p = 0.0048).
Furthermore, as the employee’s age or salary increases, the estimated probability of taking a leave decreases slightly. The salary effect is statistically significant (p = 0.0001), while the age effect is not (p = 0.7164).
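To make this interpretation concrete, the sketch below turns the coefficients reported in the exhibit into a predicted probability for a single employee. It assumes that the modelled event is taking a leave, which the output does not state explicitly, and it uses the printed salary coefficient, which is heavily rounded, so the results are approximate. The function name `leave_probability` and the two example employees are hypothetical.

```python
import math

# Coefficients copied from the maximum likelihood table in the exhibit.
# The salary coefficient is printed rounded (-0.00004), so results are approximate.
INTERCEPT = -0.0717
B_FULLTIME = 0.3712   # FullTime vs PartTime (reference level)
B_NO_RAISE = 0.6241   # raiseOrPromo = no vs yes (reference level)
B_SALARY = -0.00004
B_AGE = -0.00469


def leave_probability(full_time, no_raise, salary, age):
    """Predicted probability of the modelled event from the fitted logit.

    Assumption: the modelled event is taking a leave.
    full_time and no_raise are 1/0 indicator values.
    """
    logit = (INTERCEPT + B_FULLTIME * full_time + B_NO_RAISE * no_raise
             + B_SALARY * salary + B_AGE * age)
    return 1.0 / (1.0 + math.exp(-logit))


# Two hypothetical full-time employees, aged 35, earning 45000,
# who differ only in whether they received a raise or promotion.
p_no_raise = leave_probability(full_time=1, no_raise=1, salary=45000, age=35)
p_raise = leave_probability(full_time=1, no_raise=0, salary=45000, age=35)
print(round(p_no_raise, 3), round(p_raise, 3))
```

Under these assumptions the employee without a raise or promotion has a predicted probability of roughly 0.26, compared with roughly 0.16 for the otherwise identical employee who received one. The ratio of their odds equals exp(0.6241), the odds ratio reported for raiseOrPromo.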
1.5 Problems of the model
The data are not fully representative for assessing the leave policy, and therefore for assessing the relationship between the use of the leave policy and the predictor variables.
Because the number of employees who have not taken a leave is higher than the number of employees who have, the two groups are unevenly represented in the data, which makes the use of the leave policy difficult to assess.
II. Statistical Appendix

Visual summary of the data

To visualize the data, we used bar plots and box plots.
Bar plots and box plots serve different purposes:
a) Bar plots are used for categorical variables. Categorical variables are variables that have two or more categories. For instance, the leave variable has two categories: yes when the employee has taken a leave and no when the employee has not.
Bar plots show graphically the relationship between a categorical variable and a numeric value: each category is represented as a bar, and the height of the bar represents the numeric value.
b) Box plots are used for continuous variables. Continuous variables are variables that can take any numerical value in a range. For instance, the salary of the employee is a continuous variable.
Box plots depict groups of numerical data graphically through their quartiles.
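As an illustration of what a box plot draws, the sketch below computes the five values behind it (minimum, first quartile, median, third quartile, maximum) with Python’s standard `statistics` module. The helper name `five_number_summary` and the salary values are hypothetical and are not the client’s data. Plotting libraries may use a slightly different quartile convention.

```python
import statistics


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum: the values a box plot is built on."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical salaries, for illustration only.
salaries = [20000, 30000, 40000, 50000, 70000]
summary = five_number_summary(salaries)
print(summary)
```

The box spans the first to the third quartile, and the line inside the box marks the median.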

Relationship between the predictors and the leave policy

To analyze the relationship between the predictors and the leave policy, we use the correlation coefficient.
The correlation coefficient is a numerical measure of the strength of the linear relationship between two variables. It takes values between -1 and 1; the closer it is to -1 or 1, the stronger the relationship between the variables.
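The following sketch shows how the Pearson correlation coefficient can be computed by hand for two numeric variables. The function name `pearson_r` and the age and salary values are hypothetical and serve only to illustrate the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ages and salaries (in thousands), for illustration only.
ages = [25, 30, 35, 40, 45]
salaries = [32, 30, 40, 38, 45]
r = pearson_r(ages, salaries)
print(round(r, 3))
```

A value close to 1, as in this toy example, indicates a strong positive linear relationship. The coefficient only makes sense for numeric variables, which is why it is not applied to the categorical predictors.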
III. Exhibit
Association of Predicted Probabilities and Observed Responses

Percent Concordant     64.0     Somers' D   0.279
Percent Discordant     36.0     Gamma       0.279
Percent Tied            0.0     Tau-a       0.089
Pairs                169744     c           0.640
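The association statistics above are based on comparing every pair made of one employee with the event and one employee without it. The sketch below computes the percentages, the c statistic and Somers' D under the usual definitions, c = (concordant + 0.5 × tied) / pairs and D = (concordant − discordant) / pairs. The function name `concordance` and the scores are hypothetical, and statistical software may group the predicted probabilities before counting pairs, so its figures can differ slightly.

```python
def concordance(scores, labels):
    """Pairwise association between predicted probabilities and outcomes.

    scores: predicted probabilities of the event (hypothetical values).
    labels: 1 for employees with the event, 0 for employees without it.
    """
    event = [s for s, y in zip(scores, labels) if y == 1]
    nonevent = [s for s, y in zip(scores, labels) if y == 0]
    pairs = len(event) * len(nonevent)
    # A pair is concordant when the employee with the event has the higher score.
    conc = sum(1 for e in event for n in nonevent if e > n)
    disc = sum(1 for e in event for n in nonevent if e < n)
    tied = pairs - conc - disc
    return {
        "pairs": pairs,
        "percent_concordant": 100.0 * conc / pairs,
        "percent_discordant": 100.0 * disc / pairs,
        "percent_tied": 100.0 * tied / pairs,
        "c": (conc + 0.5 * tied) / pairs,
        "somers_d": (conc - disc) / pairs,
    }


# Hypothetical scores, for illustration only.
result = concordance([0.9, 0.6, 0.4, 0.5, 0.3], [1, 1, 1, 0, 0])
print(result)
```

With the figures reported above (64.0% concordant, 36.0% discordant, no ties), these definitions give c = 0.64 and a Somers' D of about 0.28, in line with the printed 0.640 and 0.279. A c of 0.640 indicates that the model separates the two groups only moderately better than chance (0.5).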
       
Odds Ratio Estimates

Effect                                   Point Estimate   95% Wald Confidence Limits
employmentStatus FullTime vs PartTime    1.450            0.771    2.726
raiseOrPromo no vs yes                   1.866            1.210    2.879
salary                                   1.000            1.000    1.000
age                                      0.995            0.970    1.021
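Each odds ratio is the exponential of the corresponding coefficient in the maximum likelihood table of this exhibit, and the Wald limits are the exponentials of the coefficient plus or minus about 1.96 standard errors. The sketch below reproduces the odds ratio table from those printed coefficients. The function name `odds_ratio_with_ci` is hypothetical, and 1.96 is the usual approximation to the normal quantile.

```python
import math


def odds_ratio_with_ci(estimate, std_error, z=1.96):
    """Odds ratio and approximate 95% Wald limits from a logit coefficient."""
    point = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return point, lower, upper


# (estimate, standard error) copied from the maximum likelihood table.
coefficients = {
    "employmentStatus FullTime vs PartTime": (0.3712, 0.3223),
    "raiseOrPromo no vs yes": (0.6241, 0.2211),
    "salary": (-0.00004, 0.000011),
    "age": (-0.00469, 0.0129),
}

odds_ratios = {name: odds_ratio_with_ci(b, se) for name, (b, se) in coefficients.items()}
for name, (point, lower, upper) in odds_ratios.items():
    print(f"{name:40s} {point:.3f}  {lower:.3f}  {upper:.3f}")
```

An interval that contains 1, as for employment status and age, means the data are compatible with no effect. The salary odds ratio rounds to 1.000 because it refers to a change of a single unit of salary; the effect over a difference of several thousand is larger.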
                                     
Type 3 Analysis of Effects

Effect              DF   Wald Chi-Square   Pr > ChiSq
employmentStatus    1    1.3269            0.2494
raiseOrPromo        1    7.9647            0.0048
salary              1    14.7770           0.0001
age                 1    0.1320            0.7164
                                     
Analysis of Maximum Likelihood Estimates

Parameter                    DF   Estimate    Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept                    1    -0.0717     0.5912           0.0147            0.9035
employmentStatus FullTime    1    0.3712      0.3223           1.3269            0.2494
employmentStatus PartTime    0    0           .                .                 .
raiseOrPromo no              1    0.6241      0.2211           7.9647            0.0048
raiseOrPromo yes             0    0           .                .                 .
salary                       1    -0.00004    0.000011         14.7770           0.0001
age                          1    -0.00469    0.0129           0.1320            0.7164
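The Wald chi-square for each parameter is the squared ratio of the estimate to its standard error, and the p-value is the upper tail of a chi-square distribution with one degree of freedom. The sketch below recomputes both from the printed values. The function name `wald_test` is hypothetical. Because the printed estimates are rounded, the results match the table only approximately, and the salary row cannot be reproduced from its heavily rounded coefficient.

```python
import math


def wald_test(estimate, std_error):
    """Wald chi-square (1 df) and its p-value for a single coefficient."""
    chi_sq = (estimate / std_error) ** 2
    # P(chi-square with 1 df > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value


# (estimate, standard error) copied from the table above.
chi_raise, p_raise_effect = wald_test(0.6241, 0.2211)
chi_status, p_status = wald_test(0.3712, 0.3223)
chi_age, p_age = wald_test(-0.00469, 0.0129)

print(round(chi_raise, 2), round(p_raise_effect, 4))
print(round(chi_status, 2), round(p_status, 4))
print(round(chi_age, 2), round(p_age, 4))
```

At the 5% level, only raiseOrPromo and salary have p-values below 0.05 in the table; employment status and age do not.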