SPSS 1 Assignment Instructions

This assignment is designed to teach you to describe a single variable: to report its central location and its dispersion, to create an appropriate graphic to illustrate it, and to discuss how the variable is distributed.

Use the following format:

  1. A title page with “SPSS 1: Describing a single variable” as the title and your name, section, TA’s name, professor’s name, date, and G# in the upper right-hand corner.
  2. Start a new page for each variable.
  3. Put the variable name in bold and underlined at the top of the page.
  4. Your answers should have the following sections:
    a) A properly formatted frequency table for the variable you are describing.
    b) A table of appropriate summary statistics.
    c) An appropriate graphic.
    d) A paragraph describing your variable.
 

For each of the variables:

  1. Identify the level of measurement for each variable.
  2. Build a table that shows the cumulative % and frequencies.
    1. The table must be in APA format.
    2. Some variables will have to be recoded to display effectively in a table. RULE OF THUMB: no more than 10 categories should appear in ANY table.
  3. Report the summary statistics that describe the variable in terms of all the appropriate measures of central location and dispersion.
  4. Create 1 appropriate graphic to display the distribution.
  5. Write a paragraph that describes the distribution in terms of central location, dispersion, outliers, and skew (if any).
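
The recoding rule of thumb can be checked mechanically. The sketch below is only an illustration in Python, not part of the SPSS workflow: the bin boundaries, the `recode` helper, and the raw scores are all made up for the example. It collapses a 21-value score into five categories and confirms that the recoded table keeps every valid case and stays within 10 rows.

```python
from collections import Counter

def recode(value, bins):
    """Return the label of the first (low, high, label) bin containing value."""
    for low, high, label in bins:
        if low <= value <= high:
            return label
    raise ValueError(f"{value} is not covered by any bin")

# Hypothetical bins for a score that runs from -10 to 10 (21 raw values).
BINS = [(-10, -6, "-10 to -6"), (-5, -1, "-5 to -1"), (0, 0, "0"),
        (1, 5, "1 to 5"), (6, 10, "6 to 10")]

# Made-up raw scores, for illustration only.
raw = [-10, -7, -7, -2, 0, 3, 6, 8, 8, 9, 10, 10]

original = Counter(raw)
recoded = Counter(recode(v, BINS) for v in raw)

# The recoded table must keep every valid case and stay within 10 rows.
assert sum(recoded.values()) == sum(original.values())
assert len(recoded) <= 10

for low, high, label in BINS:
    print(label, recoded[label])
```

In SPSS itself the same check amounts to comparing the valid totals of the original and recoded frequency tables.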
 

These are your variables:

  1. From the WORLD 2012 dataset use the variable named “polity” with the label “Higher scores more democratic (Polity)”.
  2. From the NES 2012 dataset use the variable named “dem_marital” with the label “Marital Status”.
  3. From the NES 2012 dataset use the variable named “relig_attend” with the label “Attendance: Religious Services”.
  4. From the GSS 2012 dataset use the variable named “wordsum” with the label “Number of words correct in vocabulary test”.
  5. From the GSS 2012 dataset use the variable named “educ_4” with the label “Education in 4 Categories”. 
 

***Sample Problem*** Variable: age5

  1. Level of measurement for this variable is ordinal.
  2. Cumulative % frequency table:

Age in 5 Categories^a
X      F     %      Cum %
18-30  437   21.7   21.7
31-40  384   19.1   40.8
41-50  403   20.0   60.8
51-60  369   18.3   79.1
61+    421   20.9   100.0
Total  2015  100.0
a. General Social Survey 2008

  3. Table of summary statistics:

Summary Statistics
N     Median  Mode   Range            Minimum  Maximum  Q1     Q3     IQR                V-ratio
2015  41-50   18-30  (61+) – (18-30)  18-30    61+      31-40  51-60  (51-60) – (31-40)  0.783

  4. Bar chart:

  5. Descriptive paragraph requirements:
    1. What does your variable measure?
    2. Describe this variable in terms of all appropriate measurements of central location.
    3. Describe this variable in terms of the appropriate measurements of dispersion.
    4. If appropriate, is this distribution skewed negative or positive?
    5. Discuss any other interesting and relevant details about this distribution.
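
The summary statistics in the sample problem can be reproduced from its frequency table. The following is a minimal Python sketch, not SPSS output; the helper name `category_at` is hypothetical, and N is taken from the table's Total row. It finds the mode, locates the median and quartiles as the first category whose cumulative share reaches 50%, 25%, and 75%, and computes the V-ratio as 1 minus the modal frequency divided by N.

```python
# Frequencies from the sample table (Age in 5 Categories, GSS 2008).
freq = {"18-30": 437, "31-40": 384, "41-50": 403, "51-60": 369, "61+": 421}
TOTAL = 2015  # N as reported in the table's Total row

def category_at(freq, share, total):
    """Return the first category whose cumulative share reaches `share`."""
    running = 0
    label = None
    for label, count in freq.items():  # dicts keep insertion (category) order
        running += count
        if running / total >= share:
            return label
    return label  # fall back to the last category

mode = max(freq, key=freq.get)
v_ratio = 1 - freq[mode] / TOTAL
q1 = category_at(freq, 0.25, TOTAL)
median = category_at(freq, 0.50, TOTAL)
q3 = category_at(freq, 0.75, TOTAL)

print("Mode:", mode)                  # 18-30
print("Median:", median)              # 41-50
print("Q1, Q3:", q1, q3)              # 31-40 51-60
print("V-ratio:", round(v_ratio, 3))  # 0.783
```

Because the variable is ordinal, the range and IQR are reported as category labels rather than as differences between codes.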
 

SPSS1 Frequently Asked Questions and Point Breakdown

5 questions:

  • Level of measurement:
    1. You must have the correct level of measurement.
    2. You should not put interval/ratio. You must clearly identify if the data is interval or ratio for full points.
    3. You need to report the level of measurement on the original data, not the recoded data.
  • Cumulative Frequency Table:
    1. You must have a title, a source and appropriate columns.
    2. Your table must be formatted according to APA guidelines. Week 6 under course content on our Blackboard Course website has both the template and an instructional video if you would like to refresh your memory from lab.
    3. If you have more than 10 rows you should recode the data. Make sure that the valid total of your recoded cumulative frequency table matches the valid total of the cumulative frequency table on the original data.
  • Descriptive Statistics:
    1. Do not directly copy and paste from SPSS. You must format the tables according to the APA Guidelines.
    2. Measures of central tendency must be correct for the level of measurement of the data.
    3. Measures of dispersion must be appropriate for the level of measurement of the data.
    4. Always report the category labels for categorical data.
    5. You will need to perform some calculations. Remember, SPSS does not give you the IQR or the V-ratio. You will need to calculate these correctly for full points.
    6. Remember, if you have ordinal data, the numbers associated with the data are just the coding. Therefore, you need to simplify the IQR and the range as much as possible. (For example, if you were using a Likert scale, you could end up with Hate-Love for the range and Somewhat dislike-Somewhat love for the IQR.) Remember, for categorical data you must work with the category labels.
    7. You must run statistics on the original data, not the recoded data. As such, your descriptive statistics should be appropriate to the level of measurement of your original data.
  • Graphics:
    1. These can be copied and pasted from SPSS. You need to make sure that your graphs have titles and sources. You may use Excel, but we strongly encourage you to use SPSS.
    2. Your graph must be appropriate to the level of measurement of your data (nominal = pie chart; ordinal = bar graph; interval/ratio = histogram).
      • If you use a recoded variable, use the level of measurement AFTER the recoding (interval recoded to ordinal should use a bar graph).
      • However, if you use a histogram with the original interval/ratio data, you will not be penalized.
    3. Have value labels. Remember to use the crosshairs in the Graph Editor in SPSS.
    4. Bar charts should use % of cases (“Percent”), not count.
  • Paragraph:
    1. You need to describe both central tendency and dispersion. Do not just laundry-list all of the statistics. Focus on the most salient univariate statistics for your level of measurement (i.e. if you have ratio data, you should discuss the mean for central tendency and the standard deviation and the variance for measures of dispersion). You need to interpret these as well. What do they tell you about the variable being measured? (For example, if you have a variable on age and the average age was 25, what could you infer about the population?) Report interesting trends from the statistics.
    2. Discuss the shape of the distribution curve (positive or negative skew) if it is appropriate for the level of measurement.
    3. Make sure that you report outliers if it is appropriate to the level of measurement for your data.
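
The FAQ rules can be summarized as a small lookup. This Python sketch is only a study aid: the graphic for each level is transcribed from the FAQ, while the lists of statistics are an assumption pieced together from the sample problem (ordinal) and the paragraph guidance (ratio). The function name `plan` is hypothetical. It rejects "interval/ratio" on purpose, since the FAQ asks for one or the other.

```python
# Graphic by level of measurement, as stated in the FAQ.
GRAPHIC = {
    "nominal": "pie chart",
    "ordinal": "bar graph",
    "interval": "histogram",
    "ratio": "histogram",
}

# Statistics by level. Assumption: built from the sample problem and the
# paragraph guidance, not from an official list in the assignment.
NUMERIC = ["mode", "median", "mean", "range", "IQR",
           "standard deviation", "variance"]
STATS = {
    "nominal": ["mode", "V-ratio"],
    "ordinal": ["mode", "median", "range", "IQR", "V-ratio"],
    "interval": NUMERIC,
    "ratio": NUMERIC,
}

def plan(level):
    """Return (graphic, statistics) for one clearly stated level."""
    key = level.strip().lower()
    if key not in GRAPHIC:
        raise ValueError(f"State one level of measurement, not {level!r}")
    return GRAPHIC[key], STATS[key]

print(plan("Ordinal"))
```

Remember that the level is judged on the original data for the statistics and on the recoded data for the graphic.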

Solution

SPSS1: Describing a Single variable

Variable named “polity” with the label “Higher scores more democratic (Polity)” from WORLD 2012 dataset

  1. Level of measurement

Level of measurement for this variable is interval: the scores run from -10 to 10 and represent degrees of democracy, so a score of 0 is not an absolute zero.

  2. Cumulative Frequency Table
Higher scores more democratic (Polity)
Score Cum %
-10 1.4
-9 3.5
-8 4.9
-7 11.8
-6 12.5
-5 13.2
-4 16.0
-3 18.1
-2 23.6
-1 25.0
0 25.7
1 27.1
2 29.2
3 31.9
4 34.7
5 38.9
6 47.2
7 54.9
8 66.7
9 77.8
10 100
  3. Descriptive Statistics
Summary Statistics
N    Median  Mode  Range  Minimum  Maximum  Q1     Q3  IQR   V-ratio
167  7       10    20     -10      10       -0.75  9   9.75  0.8083
Quartile gaps: Min to Q1 = 9.25, Q1 to Q2 = 7.75, Q2 to Q3 = 2, Q3 to Max = 1
IQR = Q3 - Q1 = 9 - (-0.75) = 9.75
V-ratio = 1 - 32/167 = 0.8083
  4. Graphics

  5. Paragraph

The mean of the data is 4.36 and the median is 7.00. Because the median is greater than the mean, the data are skewed to the left: a long tail of low scores pulls the mean down more than the median. The skewness statistic of -.972 confirms this. Skewness measures asymmetry, and a negative value indicates that the left tail is longer and the mass of the distribution is concentrated on the right. The standard deviation is 6.104. If the distribution were approximately normal, about 95% of the data would lie within 2 standard deviations of the mean, that is, between 4.36 - 2×6.104 = -7.848 and 4.36 + 2×6.104 = 16.568. That rule does not fit this variable well: the distribution is left-skewed, and the upper bound of 16.568 lies beyond the maximum possible score of 10.
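
The two-standard-deviation interval for polity is simple arithmetic, sketched here in Python as a check. The mean, standard deviation, and scale limits are the values reported in this solution; the helper name `two_sd_interval` is hypothetical.

```python
def two_sd_interval(mean, sd):
    """Return the (low, high) bounds of mean plus or minus 2 standard deviations."""
    return mean - 2 * sd, mean + 2 * sd

MEAN, SD = 4.36, 6.104          # reported for polity
SCALE_MIN, SCALE_MAX = -10, 10  # minimum and maximum of the polity score

low, high = two_sd_interval(MEAN, SD)
print(round(low, 3), round(high, 3))  # -7.848 16.568

# The upper bound falls outside the scale, one sign that a symmetric
# rule of thumb describes this skewed variable poorly.
print(high > SCALE_MAX)  # True
print(low < SCALE_MIN)   # False
```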

Variable named “dem_marital” with the label “Marital Status” from the NES 2012 dataset

  1. Level of measurement

Level of measurement for this variable is nominal

  2. Cumulative Frequency Table
PRE: Marital status
Category Cum %
Married: spouse present 51.5
Married: spouse absent 57.0
Widowed 67.6
Divorced 73.5
Separated 91.1
Never married 100.0
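  3. Descriptive Statistics
Summary Statistics
Values are category codes: 1 = Married: spouse present, 5 = Separated, 6 = Never married
N     Median  Mode  Range  Minimum  Maximum  Q1  Q3  IQR  V-ratio
5905  1       1     5      1        6        1   5   4    0.48467
Quartile gaps: Min to Q1 = 0, Q1 to Q2 = 0, Q2 to Q3 = 4, Q3 to Max = 1
V-ratio = 1 - 3043/5905 = 0.48467
  4. Graphics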

  5. Paragraph

The data are nominal, so a pie chart was plotted. The mode is category 1, “Married: spouse present”, which accounts for 51.5% of valid cases. The V-ratio of 0.48467 means that about 48.5% of respondents fall outside the modal category. SPSS also reports a mean of 2.59 and a median of 1, but both are computed from arbitrary category codes and are not meaningful for nominal data. For the same reason, skew and outliers are not defined for this variable.
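
For a nominal variable the mode and the V-ratio carry the description, and both depend on using the valid N rather than the total including missing cases. The Python sketch below illustrates this with the frequencies from the SPSS output for dem_marital; it is a check on the arithmetic, not part of the SPSS run.

```python
# Valid frequencies from the SPSS frequency table for dem_marital.
freq = {
    "Married: spouse present": 3043,
    "Married: spouse absent": 320,
    "Widowed": 629,
    "Divorced": 347,
    "Separated": 1042,
    "Never married": 524,
}
MISSING = 11  # system-missing cases reported by SPSS

valid_n = sum(freq.values())   # 5905
total_n = valid_n + MISSING    # 5916

mode = max(freq, key=freq.get)
valid_pct = 100 * freq[mode] / valid_n  # SPSS "Valid Percent"
raw_pct = 100 * freq[mode] / total_n    # SPSS "Percent"
v_ratio = 1 - freq[mode] / valid_n

print(mode)                                   # Married: spouse present
print(round(valid_pct, 1), round(raw_pct, 1)) # 51.5 51.4
print(round(v_ratio, 4))                      # 0.4847
```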

Variable named “relig_attend” with the label “Attendance: Religious Services” from the NES 2012 dataset

  1. Level of measurement

Level of measurement for this variable is ordinal

  2. Cumulative Frequency Table
Attendance: Religious Services
Category Cum %
Never 42.9
Few/Yr 57.9
1-2/Mnth 67.5
Alm/Evwk 78.6
Ev Week 100.0
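  3. Descriptive Statistics
Summary Statistics
Category codes: 0 = Never, 1 = Few/Yr, 2 = 1-2/Mnth, 3 = Alm/Evwk, 4 = Ev Week
N     Median  Mode   Range             Minimum  Maximum  Q1     Q3        IQR                V-ratio
5884  Few/Yr  Never  Never to Ev Week  Never    Ev Week  Never  Alm/Evwk  Never to Alm/Evwk  0.5707
Quartile gaps in codes: Min to Q1 = 0, Q1 to Q2 = 1, Q2 to Q3 = 2, Q3 to Max = 1
V-ratio = 1 - 2526/5884 = 0.5707
  4. Graphics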

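  5. Paragraph

The data are ordinal, so a bar chart was plotted. The mode is Never (42.9% of valid cases) and the median is Few/Yr, so at least half of respondents attend religious services a few times a year or less. The IQR runs from Never (Q1) to Alm/Evwk (Q3), and the range covers the full scale from Never to Ev Week. The V-ratio of 0.5707 means that about 57% of respondents fall outside the modal category. Ev Week is the second most common answer (21.4%), so responses cluster at the two ends of the scale. SPSS reports a mean of 1.53, but it is computed from the category codes and has a limited role for ordinal data. Outliers are not meaningful for a five-category ordinal variable.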

Variable named “wordsum” with the label “Number of words correct in vocabulary test” from the GSS 2012 dataset

  1. Level of measurement

Level of measurement for this variable is ratio scale

  2. Cumulative Frequency Table
Number Words Correct In Vocabulary Test
Score Cum %
0 .7
1 2.1
2 5.5
3 10.8
4 21.6
5 39.3
6 63.5
7 78.7
8 90.5
9 96.4
10 100.0
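  3. Descriptive Statistics
Summary Statistics
N (valid)  Median  Mode  Range  Minimum  Maximum  Q1  Q3  IQR  V-ratio
1283       6       6     10     0        10       5   7   2    0.7584
Valid N is 1283; 692 of the 1975 cases are missing.
Quartile gaps: Min to Q1 = 5, Q1 to Q2 = 1, Q2 to Q3 = 1, Q3 to Max = 3
V-ratio = 1 - 310/1283 = 0.7584
  4. Graphics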

  5. Paragraph

The data are ratio data. The mean is 5.91, and the median and the mode are both 6, so the mean sits slightly below the median. This points to a mild left skew, with a tail of low scores pulling the mean down, and the skewness statistic of -.234 agrees: it is negative but small, so the distribution is close to symmetric. The standard deviation is 1.988. If the distribution were approximately normal, about 95% of the data would lie within 2 standard deviations of the mean, that is, between 5.91 - 2×1.988 = 1.934 and 5.91 + 2×1.988 = 9.886. In the observed data, about 94% of valid cases (scores 2 through 9) fall in that interval, which is close to the normal benchmark and consistent with only slight skew. The middle half of respondents scored between 5 (Q1) and 7 (Q3).
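
The wordsum figures can be checked against the SPSS frequency table. This Python sketch is an illustration only. It uses the valid N of 1283 reported by SPSS for the V-ratio and the reported quartiles for the IQR. The listed frequencies sum to 1281 rather than 1283, presumably because the file is weighted and the displayed counts are rounded (an assumption), so the share inside the interval is approximate.

```python
# Valid frequencies for wordsum from the SPSS output (score -> count).
freq = {0: 9, 1: 17, 2: 43, 3: 69, 4: 138, 5: 227,
        6: 310, 7: 195, 8: 151, 9: 76, 10: 46}
VALID_N = 1283          # "N Valid" reported by SPSS
MEAN, SD = 5.91, 1.988  # reported by SPSS
Q1, Q3 = 5, 7           # reported 25th and 75th percentiles

mode = max(freq, key=freq.get)
v_ratio = 1 - freq[mode] / VALID_N
iqr = Q3 - Q1

# Share of listed cases within 2 standard deviations of the mean.
low, high = MEAN - 2 * SD, MEAN + 2 * SD
inside = sum(count for score, count in freq.items() if low <= score <= high)
share = inside / sum(freq.values())

print(mode, round(v_ratio, 4), iqr)  # 6 0.7584 2
print(round(share, 2))               # 0.94
```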

Variable named “educ_4” with the label “Education in 4 Categories” from the GSS 2012 dataset

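  1. Level of measurement

Level of measurement for this variable is ordinal: the categories run in order from <HS to Coll+.

  2. Cumulative Frequency Table
Education: 4 Cats
Category Cum %
<HS 16.2
HS 42.9
Some Coll 69.9
Coll+ 100.0
  3. Descriptive Statistics
Summary Statistics
Category codes: 1 = <HS, 2 = HS, 3 = Some Coll, 4 = Coll+
N (valid)  Median     Mode   Range         Minimum  Maximum  Q1  Q3     IQR          V-ratio
1974       Some Coll  Coll+  <HS to Coll+  <HS      Coll+    HS  Coll+  HS to Coll+  0.6996
Quartile gaps in codes: Min to Q1 = 1, Q1 to Q2 = 1, Q2 to Q3 = 1, Q3 to Max = 0
V-ratio = 1 - 593/1974 = 0.6996
  4. Graphics

  5. Paragraph

The data are ordinal, so under the guidelines above a bar chart of percentages is the appropriate graphic. The mode is Coll+ (30.1% of valid cases) and the median is Some Coll. The middle half of respondents falls between HS (Q1) and Coll+ (Q3), and the range covers all four categories. The V-ratio of 0.6996 shows that about 70% of respondents fall outside the modal category, so cases are spread fairly evenly across the categories. SPSS reports a mean of 2.71 on the category codes, slightly below the median code of 3, and a skewness of -.209, which suggests a slight negative skew, although the mean has a limited role for ordinal data. With only four ordered categories, outliers are not meaningful.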

GET
  FILE='C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav'.
DATASET NAME DataSet0 WINDOW=FRONT.
FREQUENCIES VARIABLES=wordsum
  /NTILES=4
  /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT
  /HISTOGRAM
  /ORDER=ANALYSIS.

Frequencies

Notes
Output Created 25-Oct-2017 01:21:40
Comments
Input Data C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav
Active Dataset DataSet1
Filter <none>
Weight Weight Variable
Split File <none>
N of Rows in Working Data File 1974
Missing Value Handling Definition of Missing User-defined missing values are treated as missing.
Cases Used Statistics are based on all cases with valid data.
Syntax FREQUENCIES VARIABLES=wordsum
  /NTILES=4
  /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT
  /HISTOGRAM
  /ORDER=ANALYSIS.

Resources Processor Time 00:00:00.280
Elapsed Time 00:00:00.310

[DataSet1] C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav

Statistics
Number Words Correct In Vocabulary Test
N Valid 1283
Missing 692
Mean 5.91
Std. Error of Mean .056
Median 6.00
Mode 6
Std. Deviation 1.988
Variance 3.954
Skewness -.234
Std. Error of Skewness .068
Kurtosis .067
Std. Error of Kurtosis .137
Range 10
Minimum 0
Maximum 10
Percentiles 25 5.00
50 6.00
75 7.00
Number Words Correct In Vocabulary Test
                      Frequency  Percent  Valid Percent  Cumulative Percent
Valid    0            9          .5       .7             .7
         1            17         .9       1.4            2.1
         2            43         2.2      3.4            5.5
         3            69         3.5      5.4            10.8
         4            138        7.0      10.8           21.6
         5            227        11.5     17.7           39.3
         6            310        15.7     24.2           63.5
         7            195        9.9      15.2           78.7
         8            151        7.7      11.8           90.5
         9            76         3.9      5.9            96.4
         10           46         2.3      3.6            100.0
         Total        1283       64.9     100.0
Missing  IAP          662        33.5
         DID NOT TRY  30         1.5
         Total        692        35.1
Total                 1975       100.0

FREQUENCIES VARIABLES=dem_marital
  /NTILES=4
  /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT
  /PIECHART FREQ
  /ORDER=ANALYSIS.

Frequencies

Notes
Output Created 25-Oct-2017 00:14:58
Comments
Input Data C:\Users\Akki\Desktop\fwdfiles\NES2012.sav
Active Dataset DataSet1
Filter <none>
Weight Weight variable
Split File <none>
N of Rows in Working Data File 5916
Missing Value Handling Definition of Missing User-defined missing values are treated as missing.
Cases Used Statistics are based on all cases with valid data.
Syntax FREQUENCIES VARIABLES=dem_marital
  /NTILES=4
  /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT
  /PIECHART FREQ
  /ORDER=ANALYSIS.

Resources Processor Time 00:00:00.358
Elapsed Time 00:00:00.621

[DataSet1] C:\Users\Akki\Desktop\fwdfiles\NES2012.sav

Statistics
PRE: Marital status
N Valid 5905
Missing 11
Mean 2.59
Std. Error of Mean .024
Median 1.00
Mode 1
Std. Deviation 1.874
Variance 3.514
Skewness .614
Std. Error of Skewness .032
Kurtosis -1.263
Std. Error of Kurtosis .064
Range 5
Minimum 1
Maximum 6
Percentiles 25 1.00
50 1.00
75 5.00
PRE: Marital status
                                          Frequency  Percent  Valid Percent  Cumulative Percent
Valid    1. Married: spouse present       3043       51.4     51.5           51.5
         2. Married: spouse absent {VOL}  320        5.4      5.4            57.0
         3. Widowed                       629        10.6     10.6           67.6
         4. Divorced                      347        5.9      5.9            73.5
         5. Separated                     1042       17.6     17.7           91.1
         6. Never married                 524        8.9      8.9            100.0
         Total                            5905       99.8     100.0
Missing  System                           11         .2
Total                                     5916       100.0

FREQUENCIES VARIABLES=relig_attend
  /NTILES=4
  /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT
  /BARCHART FREQ
  /ORDER=ANALYSIS.

Frequencies

Notes
Output Created 25-Oct-2017 00:54:18
Comments
Input Data C:\Users\Akki\Desktop\fwdfiles\NES2012.sav
Active Dataset DataSet1
Filter <none>
Weight Weight variable
Split File <none>
N of Rows in Working Data File 5916
Missing Value Handling Definition of Missing User-defined missing values are treated as missing.
Cases Used Statistics are based on all cases with valid data.
Syntax FREQUENCIES VARIABLES=relig_attend
  /NTILES=4
  /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT
  /BARCHART FREQ
  /ORDER=ANALYSIS.

Resources Processor Time 00:00:00.296
Elapsed Time 00:00:00.270

[DataSet1] C:\Users\Akki\Desktop\fwdfiles\NES2012.sav

Statistics
Attendance: Religious Services
N Valid 5884
Missing 32
Mean 1.53
Std. Error of Mean .021
Median 1.00
Mode 0
Std. Deviation 1.616
Variance 2.613
Skewness .478
Std. Error of Skewness .032
Kurtosis -1.413
Std. Error of Kurtosis .064
Range 4
Minimum 0
Maximum 4
Percentiles 25 .00
50 1.00
75 3.00
Attendance: Religious Services
                   Frequency  Percent  Valid Percent  Cumulative Percent
Valid    Never     2526       42.7     42.9           42.9
         Few/Yr    879        14.9     14.9           57.9
         1-2/Mnth  566        9.6      9.6            67.5
         Alm/Evwk  657        11.1     11.2           78.6
         EvWeek    1256       21.2     21.4           100.0
         Total     5884       99.5     100.0
Missing  System    32         .5
Total              5916       100.0

GET
  FILE='C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav'.
DATASET NAME DataSet0 WINDOW=FRONT.
FREQUENCIES VARIABLES=educ_4
  /NTILES=4
  /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SUM SKEWNESS SESKEW KURTOSIS SEKURT
  /PIECHART FREQ
  /ORDER=ANALYSIS.

Frequencies

Notes
Output Created 25-Oct-2017 01:29:31
Comments
Input Data C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav
Active Dataset DataSet1
Filter <none>
Weight Weight Variable
Split File <none>
N of Rows in Working Data File 1974
Missing Value Handling Definition of Missing User-defined missing values are treated as missing.
Cases Used Statistics are based on all cases with valid data.
Syntax FREQUENCIES VARIABLES=educ_4
  /NTILES=4
  /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SUM SKEWNESS SESKEW KURTOSIS SEKURT
  /PIECHART FREQ
  /ORDER=ANALYSIS.

Resources Processor Time 00:00:00.421
Elapsed Time 00:00:00.630

[DataSet1] C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav

Statistics
Education: 4 Cats
N Valid 1974
Missing 1
Mean 2.71
Std. Error of Mean .024
Median 3.00
Mode 4
Std. Deviation 1.064
Variance 1.132
Skewness -.209
Std. Error of Skewness .055
Kurtosis -1.214
Std. Error of Kurtosis .110
Range 3
Minimum 1
Maximum 4
Sum 5347
Percentiles 25 2.00
50 3.00
75 4.00
Education: 4 Cats
                    Frequency  Percent  Valid Percent  Cumulative Percent
Valid    <HS        320        16.2     16.2           16.2
         HS         528        26.7     26.8           42.9
         Some Coll  533        27.0     27.0           69.9
         Coll+      593        30.0     30.1           100.0
         Total      1974       99.9     100.0
Missing  System     1          .1
Total               1975       100.0