Statistical Analysis Homework: Histograms, QQ Plot & Box Plot in R

Problem Description:

This statistical analysis homework covers various statistical questions and tasks. We will provide solutions to these questions and briefly describe the approach taken for each.

Solution

Question 1

A study with a 70% power rate suggests that only 70 out of 100 statistical tests will actually find any true effects if there are any to be found in the 100 statistical tests. The projected Type II error threshold is 30% with a power of 70%. (1-0.7). The power of a study can be increased by increasing either the effect size, sample size or the significance level.

Question 2

a. Descriptive Statistics

library(haven)

data=read_sav('C:/Users/belak/Desktop/R/baguley_payne_2000.sav')

head(data)

summary(data)

group	match	trials	percent_accuracy
Min. :0.0	Min. : 1.00	Min. : 8	Min. : 12.50
1st Qu.:0.0	1st Qu.: 6.75	1st Qu.: 8	1st Qu.: 75.00
Median :0.5	Median :12.00	Median :16	Median : 87.50
Mean :0.5	Mean :13.43	Mean :16	Mean : 80.65
3rd Qu.:1.0	3rd Qu.:21.00	3rd Qu.:24	3rd Qu.: 91.67
Max. :1.0	Max. :24.00	Max. :24	Max. :100.00

b. The descriptive statistics is not appropriate for the “group” variable because it a categorical variable.

Question 4

a. Code and Plot


data=read_sav('C:/Users/belak/Desktop/R/Hayden_2005.sav')
head(data)
hist(data$days, xlab = "Days", main = "Histogram of Days")
hist(data$days, breaks=30, xlab = "Days"main="Histogram of Days With breaks=30")

Histogram of Days With Breaks

b. The histogram revealed that the days is normally distributed. It skewed to the left and hence the mean is less than the median.

Question 5

a. Code and Plot


data=read_sav('C:/Users/belak/Desktop/R/baguley_payne_2000.sav')
boxplot(data$percent_accuracy,ylab="Percent Accuracy",main="Boxplot of Percent Accuracy")

b. The box plot indicated that the percent accuracy is not normally distributed and negatively skewed. Hence the percent accuracy mean is less than the median.

c. Yes, there are outliers 4 outliers in the percent accuracy.

Question 6

a. Code and Plot


eclsk = read.csv("eclsk.csv")
head(eclsk)
lin.model <- lm (gen~math,eclsk)
plot(lin.model,2)

b. The Q-Q plot shows that the model residuals are normally distributed and hence, the normality of residual assumption was not violated.

Question 7

a. Code and Plot


eclsk = read.csv("eclsk.csv")
head(eclsk)
plot(eclsk$math,eclsk$gen, main='Regression for Math and Gen', xlab='Math',ylab='Gen')
abline(lm(gen~math,data=eclsk),col='red')

$Regrression for Math and Gen$

b. The scatter plot shows that there is linear relationship between the math and gen. Also, the assumption of homogeneity of variance is not violated.

Drawing Histograms, QQ Plot, and Box Plot in R for a Variety of Statistical Problems

Problem Description:

Solution