One-Sample T-test

Homework Assignment SPSS Exercises

The questionnaire used to collect the data for the survey is in our textbook – Avery Fitness Center. You will need it to define the labels in SPSS.

Questions:

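  1. Obtain the frequencies, mean, standard deviation (Std), and standard error of the mean (SEM) for each variable. State for which variables the mean, Std, and SEM are “meaningful” and give the interpretation of each.

| Variable | Mean | Std | SEM | Can the mean of this type of variable be interpreted? (Y/N) |
|---|---|---|---|---|
| Weight | | | | |
| Classes | | | | |
| Circuit | | | | |
| Station | | | | |
| Pool | | | | |
| Visits | | | | |
| Daypart | | | | |
| Doctor | | | | |
| Enjoy | | | | |
| Age | | | | |
| Gender | | | | |
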
  2. Using only the standard deviation for each of the Importance variables (in the survey, "how important …"), which variable had the greatest amount of agreement? List these four variables in the table below in order of most to least agreement.

| Importance Variable | Standard Deviation |
|---|---|

Create and present a Frequency table to present your answers to the following questions:

  1. What percentage of the respondents who answered the Gender question are male?
  2. What percentage of everyone who took the survey are female?

Create and present a single Frequency table using the variable Income and answer the following questions:

  1. What percentage of the respondents who answered the question make over $120,000 per year?
  2. What percentage of the respondents who answered the question reported making over $60,000 per year?
  3. What percentage of the respondents who answered the question reported making $30,000 or less per year?
  4. What percentage of the survey respondents reported making between $45,001 and $60,000 per year?

Create and present a Histogram with a normal curve (can use the SPSS graph) using the variable Age and answer the following questions:

  1. What are the mean, standard deviation, and count for age?
  2. What are the upper and lower boundaries (i.e., ages) of the normal distribution? How did you calculate these numbers?
  3. Identify (by specific ages) any outliers (if any)?
  4. If there are outliers, what do you recommend be done with them and why?

Create and present a Frequency table using the variable Gender:

  1. Based on the percentages of males, calculate the sampling error for the proportion using the formula from our book. Be sure to show and explain the numbers you used. Also be sure to show the resultant confidence interval.

Create and present a single table that lists the following:

  1. Percentages and counts for each category of the four continuous variables: General Health/Fitness, Social Aspects, Physical Enjoyment, and Specific Medical Concerns;
  2. The top two boxes for each of these variables;

Create a single table to compare the means between:

  1. The pairs of all of these four continuous Importance variables (General Fitness; Social Aspects; Physical Enjoyment; Specific Medical Concerns). Explain if there are/are not significant differences between each pair of variables.
  2. List all the variables in the table in the order of most important to least important (be sure to show why/how you determined the level of importance). Be sure to show the ranking numbers (e.g., 1,2,3,4).

Run a One-sample T-test and present a table to determine:

  1. If the average number of monthly visits (i.e., the variable Visits) is significantly different from the national average of eight. Interpret and explain your relevant results. Be sure to report the mean difference, t-value, degrees of freedom, and significance level.

Create and present a Cross-tabulation table of the variables Pool and Doctor.

  1. What percentage of the total sample utilized the therapy pool?
  2. What percentage of those who used the therapy pool were recommended by a doctor?
  3. What percentage of those recommended by a doctor utilized the therapy pool?
  4. Are the results significant?
  5. How strongly, if at all, are the variables associated with each other?

Show in a table:

  1. The comparison of the means between the number of Visits and whether people had Utilized the exercise circuit. Explain, and show, if the means are significantly different from each other.

Run and interpret a correlation analysis and create a single table that:

  1. Uses the four Importance variables (General Fitness; Social Aspects; Physical Enjoyment; Specific Medical Concerns) showing the correlations and which are significant.
  2. Replace the diagonal values with the respective means in the table.
  3. Interpret the table.

Recommendations

  1. Based on your analysis of ALL the data in this assignment, write an Executive Summary of your findings with clear managerial recommendations. 200 -300 words 

Solution

Answers:

 

| Variable | Mean | Std | SEM | Can the mean of this type of variable be interpreted? (Y/N) |
|---|---|---|---|---|
| weight | 0.32 | 0.465 | 0.022 | No |
| classes | 0.26 | 0.440 | 0.021 | No |
| circuit | 0.22 | 0.415 | 0.020 | No |
| station | 0.12 | 0.325 | 0.015 | No |
| pool | 0.45 | 0.498 | 0.023 | No |
| visits | 14.20 | 7.733 | 0.387 | Yes |
| daypart | 1.30 | 0.549 | 0.027 | No |
| doctor | 0.26 | 0.439 | 0.021 | No |
| enjoy | 3.91 | 1.090 | 0.055 | Yes |
| age | 62.56 | 19.630 | 0.937 | Yes |
| gender | 1.79 | 0.405 | 0.019 | No |

The mean can be interpreted for quantitative variables only. In our case, we can interpret the means of visits, enjoy, and age:

  • For the 400 persons who answered the question “Number of visits to AFC in previous 30 days”, the mean number of visits in the previous 30 days is 14.20, with a standard deviation of 7.733.
  • The mean score for physical enjoyment is 3.91, meaning that physical enjoyment is an important reason for participating in AFC.
  • For the 439 persons who answered the question related to age, the mean age is 62.56 years, with a standard deviation of 19.630.
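
The SEM values in the table can be checked by hand, since the standard error of the mean is the standard deviation divided by the square root of the number of valid answers. The sketch below is only an illustration in Python (the assignment itself uses SPSS); the function name `sem` and the dictionary layout are hypothetical.

```python
import math


def sem(std: float, n: int) -> float:
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return std / math.sqrt(n)


# (standard deviation, number of valid answers) as reported in the text above
reported = {"visits": (7.733, 400), "age": (19.630, 439)}

# Round to three decimals to match the precision of the SPSS table
sems = {name: round(sem(std, n), 3) for name, (std, n) in reported.items()}
print(sems)
```

Both results agree with the SEM column of the table to three decimals.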
  2. Using only the standard deviation of each Importance variable, the table below lists the variables from most to least agreement (a smaller standard deviation indicates more agreement among respondents).

| Importance Variable | Standard Deviation |
|---|---|
| Fitness | 0.745 |
| Enjoy | 1.090 |
| Medical | 1.206 |
| Social | 1.272 |

Let us create and present, using SPSS, a Frequency table for the variable Gender.

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | male | 89 | 19.8 | 20.6 | 20.6 |
| | female | 344 | 76.4 | 79.4 | 100.0 |
| | Total | 433 | 96.2 | 100.0 | |
| Missing | System | 17 | 3.8 | | |
| Total | | 450 | 100.0 | | |

  1. 20.6% of the respondents who answered the Gender question are male (89 males among 433 respondents).
  2. 76.4% of everyone who took the survey are female (344 females among 450 persons).
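
The two answers use different bases: the first divides by the respondents who answered the question (SPSS "Valid Percent"), the second by everyone surveyed (SPSS "Percent"). A minimal Python sketch of that distinction follows; the helper `pct` and the variable names are hypothetical, and the counts are taken from the frequency table.

```python
def pct(count: int, base: int) -> float:
    """Percentage of `count` within `base`, rounded to one decimal."""
    return round(100 * count / base, 1)


male, female, missing = 89, 344, 17
valid_n = male + female        # respondents who answered the Gender question
total_n = valid_n + missing    # everyone who took the survey

male_valid_pct = pct(male, valid_n)       # base: valid answers only
female_total_pct = pct(female, total_n)   # base: the whole sample
print(male_valid_pct, female_total_pct)
```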

Using SPSS, let us create and present a single Frequency table using the variable Income.

| | Income | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | 0 – 15,000 | 14 | 3.1 | 4.0 | 4.0 |
| | 15,001 – 30,000 | 43 | 9.6 | 12.3 | 16.3 |
| | 30,001 – 45,000 | 49 | 10.9 | 14.0 | 30.3 |
| | 45,001 – 60,000 | 83 | 18.4 | 23.7 | 54.0 |
| | 60,001 – 75,000 | 60 | 13.3 | 17.1 | 71.1 |
| | 75,001 – 90,000 | 35 | 7.8 | 10.0 | 81.1 |
| | 90,001 – 105,000 | 25 | 5.6 | 7.1 | 88.3 |
| | 105,001 – 120,000 | 23 | 5.1 | 6.6 | 94.9 |
| | more than 120,000 | 18 | 4.0 | 5.1 | 100.0 |
| | Total | 350 | 77.8 | 100.0 | |
| Missing | System | 100 | 22.2 | | |
| Total | | 450 | 100.0 | | |

  1. 5.1% (18 persons among 350) of the respondents who answered the question make over $120,000 per year.
  2. 46% (161 persons among 350) of the respondents who answered the question reported making over $60,000 per year.
  3. 16.3% (57 persons among 350) of the respondents who answered the question reported making $30,000 or less per year.
  4. 23.7% (83 persons among 350) of the respondents who answered the question reported making between $45,001 and $60,000 per year; measured against all 450 survey respondents, this is 18.4%.
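
The four income answers are sums over rows of the frequency table divided by the 350 valid answers. The following Python sketch reproduces them under that reading; the list `income_counts` copies the Frequency column in bracket order, and all names are hypothetical.

```python
# Frequencies in bracket order, from "0 - 15,000" up to "more than 120,000"
income_counts = [14, 43, 49, 83, 60, 35, 25, 23, 18]
valid_n = sum(income_counts)   # respondents who answered the Income question
total_n = valid_n + 100        # plus the 100 missing answers


def pct(count: int, base: int) -> float:
    """Percentage of `count` within `base`, rounded to one decimal."""
    return round(100 * count / base, 1)


over_120k = pct(income_counts[-1], valid_n)          # last bracket only
over_60k = pct(sum(income_counts[4:]), valid_n)      # brackets above 60,000
at_most_30k = pct(sum(income_counts[:2]), valid_n)   # first two brackets
mid_valid = pct(income_counts[3], valid_n)           # 45,001 - 60,000, valid base
mid_total = pct(income_counts[3], total_n)           # same bracket, full sample
print(over_120k, over_60k, at_most_30k, mid_valid, mid_total)
```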

The following figure presents a histogram with a normal curve for the variable Age.

  1. The mean age of the 439 respondents is 62.56 years, with a standard deviation of 19.63.
  2. The upper and lower boundaries are calculated as the mean plus or minus three standard deviations:

Upper boundary: 62.56 + 3 × 19.63 = 121.45 years (about 121 years)

Lower boundary: 62.56 − 3 × 19.63 = 3.67 years

  3. There are no outliers in the data.
  4. An outlier can be replaced with the maximum or minimum acceptable value, depending on whether it lies above the upper boundary or below the lower boundary. Alternatively, it can simply be replaced with the median.
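
The reported boundaries are consistent with a rule of mean plus or minus three standard deviations. A short Python sketch of that rule is given below; the function name `three_sigma_bounds` and the multiplier argument `k` are hypothetical, and the inputs are the mean and standard deviation of Age reported above.

```python
def three_sigma_bounds(mean: float, std: float, k: float = 3.0) -> tuple[float, float]:
    """Return (lower, upper) boundaries at k standard deviations from the mean."""
    return mean - k * std, mean + k * std


lower, upper = three_sigma_bounds(62.56, 19.63)
print(round(lower, 2), round(upper, 2))

# Any age outside [lower, upper] would be flagged as an outlier under this rule
def is_outlier(age: float) -> bool:
    return age < lower or age > upper
```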

The following table presents a Frequency table using the variable Gender.

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | male | 89 | 19.8 | 20.6 | 20.6 |
| | female | 344 | 76.4 | 79.4 | 100.0 |
| | Total | 433 | 96.2 | 100.0 | |
| Missing | System | 17 | 3.8 | | |
| Total | | 450 | 100.0 | | |

Based on the percentages of males, the sampling error for the proportion is:
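
The worked calculation is not preserved at this point. As a hedged illustration only, the Python sketch below assumes the common textbook formula for the sampling error of a proportion, z × sqrt(p(1 − p)/n), with z = 1.96 for 95% confidence; the book's own formula may differ, and the function name is hypothetical. The inputs are the 89 males among 433 valid answers from the table above.

```python
import math


def proportion_sampling_error(p: float, n: int, z: float = 1.96) -> float:
    """Assumed formula: z * sqrt(p * (1 - p) / n); z = 1.96 for 95% confidence."""
    return z * math.sqrt(p * (1 - p) / n)


n_valid = 433
p_male = 89 / n_valid   # proportion of males among valid answers

error = proportion_sampling_error(p_male, n_valid)
ci = (p_male - error, p_male + error)   # confidence interval for the proportion
print(f"p = {p_male:.3f}, sampling error = {error:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Under these assumptions the sampling error comes to roughly 3.8 percentage points around the 20.6% male share.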

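  1. The following table presents the percentages and counts for each category of the four continuous variables: General Health/Fitness, Social Aspects, Physical Enjoyment, and Specific Medical Concerns.

| Variable | | Category | Count | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|---|
| Fitness | Valid | 1 | 10 | 2.2 | 2.2 | 2.2 |
| | | 2 | 4 | .9 | .9 | 3.1 |
| | | 3 | 8 | 1.8 | 1.8 | 4.9 |
| | | 4 | 50 | 11.1 | 11.2 | 16.1 |
| | | 5 | 374 | 83.1 | 83.9 | 100.0 |
| | | Total | 446 | 99.1 | 100.0 | |
| | Missing | System | 4 | .9 | | |
| | Total | | 450 | 100.0 | | |
| Social | Valid | 1 | 53 | 11.8 | 13.5 | 13.5 |
| | | 2 | 66 | 14.7 | 16.8 | 30.2 |
| | | 3 | 113 | 25.1 | 28.7 | 58.9 |
| | | 4 | 94 | 20.9 | 23.9 | 82.7 |
| | | 5 | 68 | 15.1 | 17.3 | 100.0 |
| | | Total | 394 | 87.6 | 100.0 | |
| | Missing | System | 56 | 12.4 | | |
| | Total | | 450 | 100.0 | | |
| Enjoy | Valid | 1 | 18 | 4.0 | 4.6 | 4.6 |
| | | 2 | 20 | 4.4 | 5.1 | 9.6 |
| | | 3 | 84 | 18.7 | 21.3 | 31.0 |
| | | 4 | 128 | 28.4 | 32.5 | 63.5 |
| | | 5 | 144 | 32.0 | 36.5 | 100.0 |
| | | Total | 394 | 87.6 | 100.0 | |
| | Missing | System | 56 | 12.4 | | |
| | Total | | 450 | 100.0 | | |
| Medical | Valid | 1 | 33 | 7.3 | 8.1 | 8.1 |
| | | 2 | 14 | 3.1 | 3.4 | 11.6 |
| | | 3 | 43 | 9.6 | 10.6 | 22.2 |
| | | 4 | 122 | 27.1 | 30.0 | 52.2 |
| | | 5 | 194 | 43.1 | 47.8 | 100.0 |
| | | Total | 406 | 90.2 | 100.0 | |
| | Missing | System | 44 | 9.8 | | |
| | Total | | 450 | 100.0 | | |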
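  2. The following table shows the top two boxes for each of these variables, that is, the two highest ratings (4 and 5), together with their combined share of valid answers.

| Variable | Category | Count | Percent | Valid Percent |
|---|---|---|---|---|
| Fitness | 4 | 50 | 11.1 | 11.2 |
| | 5 | 374 | 83.1 | 83.9 |
| | Top two | 424 | 94.2 | 95.1 |
| Social | 4 | 94 | 20.9 | 23.9 |
| | 5 | 68 | 15.1 | 17.3 |
| | Top two | 162 | 36.0 | 41.1 |
| Enjoy | 4 | 128 | 28.4 | 32.5 |
| | 5 | 144 | 32.0 | 36.5 |
| | Top two | 272 | 60.4 | 69.0 |
| Medical | 4 | 122 | 27.1 | 30.0 |
| | 5 | 194 | 43.1 | 47.8 |
| | Top two | 316 | 70.2 | 77.8 |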
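
A top-two-box score is conventionally the share of valid answers falling in the two highest scale points. The Python sketch below assumes a 1 to 5 scale on which 5 is the most favourable rating (consistent with ranking importance by the mean) and uses the category counts from the frequency table in part 1; the function and dictionary names are hypothetical.

```python
# Valid counts for ratings 1..5, copied from the frequency table
rating_counts = {
    "Fitness": [10, 4, 8, 50, 374],
    "Social": [53, 66, 113, 94, 68],
    "Enjoy": [18, 20, 84, 128, 144],
    "Medical": [33, 14, 43, 122, 194],
}


def top_two_box(counts: list[int]) -> float:
    """Percent of valid answers in the two highest categories (assumed 4 and 5)."""
    return round(100 * sum(counts[-2:]) / sum(counts), 1)


top_two = {name: top_two_box(counts) for name, counts in rating_counts.items()}
print(top_two)
```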
  1. There are six possible pairs. The following table shows the comparisons of means between all pairs of the four continuous Importance variables (General Fitness; Social Aspects; Physical Enjoyment; Specific Medical Concerns).

Paired Differences

| Pair | Mean | Std. Deviation | Std. Error Mean | 95% CI of the Difference: Lower | 95% CI of the Difference: Upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|
| Pair 1: fitness – social | 1.629 | 1.321 | .067 | 1.499 | 1.760 | 24.482 | 393 | .000 |
| Pair 2: fitness – enjoy | .853 | 1.069 | .054 | .747 | .959 | 15.829 | 393 | .000 |
| Pair 3: fitness – medical | .700 | 1.210 | .060 | .581 | .818 | 11.645 | 405 | .000 |
| Pair 4: social – enjoy | -.788 | 1.110 | .057 | -.899 | -.676 | -13.937 | 385 | .000 |
| Pair 5: social – medical | -.925 | 1.537 | .080 | -1.081 | -.768 | -11.608 | 371 | .000 |
| Pair 6: enjoy – medical | -.128 | 1.488 | .077 | -.279 | .023 | -1.664 | 375 | .097 |

From the table above, all differences between the pairs are statistically significant (the p-values of the two-tailed tests are smaller than the 5% significance level), except the difference between enjoy and medical (Pair 6), which is not significant (t=-1.664, df=375, p-value=0.097>0.05). The means of these two variables therefore do not differ significantly.

  2. Using the means, the following table ranks the variables from most important to least important (a higher mean indicates greater importance).

| Rank | Importance Variable | Mean |
|---|---|---|
| 1 | Fitness | 4.74 |
| 2 | Medical | 4.06 |
| 3 | Enjoy | 3.91 |
| 4 | Social | 3.15 |
  1. In this question, we want to examine whether the average number of monthly visits is significantly different from the national average of eight. To do so, we run a One-sample T-test with a test value of 8. The following table shows the results obtained via SPSS.

| Variable | t | df | Sig. (2-tailed) | Mean Difference | 95% CI of the Difference: Lower | 95% CI of the Difference: Upper |
|---|---|---|---|---|---|---|
| visits | 16.022 | 399 | .000 | 6.195 | 5.43 | 6.96 |

From the table above, we can conclude, at the 5% significance level, that the average number of monthly visits is statistically different from the national average of eight (mean difference=6.195, t=16.022, df=399 and p-value<0.05). On average, respondents made about 6.2 more visits per month than the national average.
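
The t-value can be reproduced from the summary statistics alone, since a one-sample t statistic is the mean difference divided by the standard error of the mean. The Python sketch below is an illustration, not the SPSS procedure; the sample mean of 14.195 is an assumption implied by the reported mean difference of 6.195 (the text rounds it to 14.20), and the function name is hypothetical.

```python
import math


def one_sample_t(mean: float, std: float, n: int, mu0: float) -> tuple[float, int]:
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    standard_error = std / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Assumed inputs: mean implied by the mean difference; std and n from question 1
t_stat, df = one_sample_t(mean=14.195, std=7.733, n=400, mu0=8)
print(round(t_stat, 3), df)
```

The result matches the t and df columns of the SPSS table up to rounding.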

The following table presents a cross-tabulation of the variables Pool and Doctor.

| | Pool: No (Count) | Pool: Yes (Count) |
|---|---|---|
| Doctor: No | 203 | 130 |
| Doctor: Yes | 44 | 73 |

| Chi-square | df | Sig. |
|---|---|---|
| 19.071 | 1 | 0.000 |

  1. 45.1% of the total sample used the therapy pool (203 of 450).
  2. 36.0% of those who used the therapy pool were recommended by a doctor (73 of 203).
  3. 62.4% of those recommended by a doctor used the therapy pool (73 of 117).
  4. From the table above, the chi-square statistic is 19.071 with 1 degree of freedom and a p-value below 0.001, meaning that, at the 5% significance level, there is a significant association between use of the therapy pool and a doctor's recommendation.
  5. The coefficient of correlation (phi) between pool use and doctor recommendation is 0.206, meaning that there is a positive but only weak-to-moderate association between these two variables.
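
The percentages, the chi-square statistic, and the association coefficient can all be derived from the four cell counts. The Python sketch below uses the shortcut formula for a 2×2 Pearson chi-square (without continuity correction) and takes phi as sqrt(chi-square / N); treating the reported 0.206 as a phi coefficient is an assumption, and the variable names are hypothetical.

```python
import math

# Cell counts from the cross-tabulation: rows = doctor (no, yes), columns = pool (no, yes)
a, b = 203, 130   # doctor = no
c, d = 44, 73     # doctor = yes
n = a + b + c + d

pool_users = b + d
doctor_recommended = c + d

pct_pool = round(100 * pool_users / n, 1)                          # pool users in total sample
pct_doctor_among_pool = round(100 * d / pool_users, 1)             # recommended, among pool users
pct_pool_among_doctor = round(100 * d / doctor_recommended, 1)     # pool users, among recommended

# Pearson chi-square for a 2x2 table, no continuity correction
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
phi = math.sqrt(chi2 / n)   # strength of association for a 2x2 table

print(pct_pool, pct_doctor_among_pool, pct_pool_among_doctor)
print(round(chi2, 3), round(phi, 3))
```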
  1. The following table presents the comparison of the mean number of Visits between people who had and had not utilized the exercise circuit.

t-test for Equality of Means

| visits | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference: Lower | 95% CI of the Difference: Upper |
|---|---|---|---|---|---|---|---|
| Equal variances assumed | -2.522 | 398 | .012 | -2.302 | .913 | -4.096 | -.508 |
| Equal variances not assumed | -2.849 | 184.863 | .005 | -2.302 | .808 | -3.896 | -.708 |

From the table above, we can conclude that the mean number of visits differs significantly between people who utilized the exercise circuit and those who did not. The t-test is significant both when assuming equal variances (t=-2.522, df=398, p-value=0.012<0.05) and when not assuming equal variances (t=-2.849, df=184.863, p-value=0.005<0.05).

  1. The following table shows the correlations between the importance variables. The diagonal values have been replaced with the respective means.

| | | fitness | social | enjoy | medical |
|---|---|---|---|---|---|
| fitness | Pearson Correlation | 4.74 | .188** | .340** | .271** |
| | Sig. (2-tailed) | | .000 | .000 | .000 |
| | N | 446 | 394 | 394 | 406 |
| social | Pearson Correlation | .188** | 3.15 | .565** | .238** |
| | Sig. (2-tailed) | .000 | | .000 | .000 |
| | N | 394 | 394 | 386 | 372 |
| enjoy | Pearson Correlation | .340** | .565** | 3.91 | .188** |
| | Sig. (2-tailed) | .000 | .000 | | .000 |
| | N | 394 | 386 | 394 | 376 |
| medical | Pearson Correlation | .271** | .238** | .188** | 4.06 |
| | Sig. (2-tailed) | .000 | .000 | .000 | |
| | N | 406 | 372 | 376 | 406 |

**. Correlation is significant at the 0.01 level (2-tailed).

  2. See the table above.
  3. The table above shows significant, positive correlations between all four importance variables. This means that respondents who rate one reason as more important also tend to rate the others as more important, and those who rate one lower tend to rate the others lower. The strongest correlation is between social and enjoy (.565); the weakest are fitness with social and enjoy with medical (both .188).
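
Even the smallest correlations in the table are flagged as significant because the sample sizes are large. As a rough check, the Python sketch below converts a Pearson r into a t statistic with n − 2 degrees of freedom and approximates the two-sided p-value with the normal distribution, which is an assumption that is reasonable only because n is in the hundreds (SPSS uses the exact t distribution). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_t(r: float, n: int) -> float:
    """t statistic for H0: population correlation is zero, with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Weakest correlation in the table: enjoy vs. medical, r = .188 with N = 376
t_weakest = correlation_t(0.188, 376)

# Large-sample normal approximation to the two-sided p-value (assumption)
p_approx = 2 * (1 - NormalDist().cdf(abs(t_weakest)))
print(round(t_weakest, 2), p_approx < 0.01)
```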
  1. In this question, we summarize the results obtained from the statistical analysis of the Avery Fitness Center survey. Starting with the personal characteristics of the respondents, the mean age of the 439 respondents is 62.56 years, with a standard deviation of 19.63, and 79.4% of the respondents who reported their gender are female. The largest income group (23.7% of those who answered) reported making between $45,001 and $60,000 per year. We also analyzed the personal reasons for participating in AFC programs; the results showed that people rated fitness, medical concerns, and enjoyment, in that order, as the most important reasons. In addition, the mean number of monthly visits (14.20) is significantly different from, and higher than, the national average of eight. The mean number of visits also differs significantly between people who utilized the exercise circuit and those who did not. Furthermore, there is a significant association between use of the therapy pool and a doctor's recommendation.

This information should help the center focus its marketing strategy. It should target the female population aged between 30 and 70 years and making between $45,001 and $60,000 per year. The center should also strengthen the fitness program, for example through good monitoring, and continue to develop its medical programs and enjoyment-related offerings. Finally, the center should work with doctors to encourage recommendations.