Wrangle Emotiv Data

Project on R: Wrangle Emotiv Data

You are given a data set in CSV format, which contains EEG data read from Emotiv EPOC headsets. The 14 channels of the Emotiv headset are shown in the following figure.

Note that the labels of the columns of the data set are stored in column E of the header line. Read the labels and figure out what they mean before you perform any activities. The labels of the columns that contain contact quality data have the prefix “CQ_”. Contact quality values range from 0 to 4, with 0 indicating the lowest contact quality and 4 the highest.
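As a small illustration of the naming convention described above, the following sketch picks out the contact quality columns by their "CQ_" prefix and pairs each with the channel it refers to. The label vector is a shortened, hypothetical subset chosen for illustration, not the full header of the data set.

```r
# Hypothetical subset of the column labels, for illustration only
labels <- c("COUNTER", "AF3", "F7", "RAW_CQ", "CQ_AF3", "CQ_F7", "CQ_CMS", "GYROX")

# Columns whose label starts with the "CQ_" prefix
cq_cols <- labels[startsWith(labels, "CQ_")]

# Strip the prefix to get the channel each quality column refers to
channels <- sub("^CQ_", "", cq_cols)

# Keep only the pairs whose channel also exists as a data column
has_data <- channels %in% labels
pairs <- data.frame(channel = channels[has_data], quality = cq_cols[has_data])
print(pairs)
```

In this subset, CQ_CMS has no matching data column, so it is listed among the quality columns but not paired with a channel.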

The Problem to Solve:

Write an R program to perform the following tasks:

  1. Replace the first row of the data frame by the actual labels of the columns.
  2. Purge the data with low contact quality, i.e., data items corresponding to a contact quality that is less than 2 (viz., 0 or 1) should be marked as missing (replaced by NA). After that, columns that contain contact qualities are removed from the data set.
  3. Delete columns for which the number of NA’s is over 1/3 of the total number of the rows.
  4. Delete rows that contain at least one NA.
  5. Normalize each column by scaling all the values to the range [0, 1].
  6. Save the resultant data frame to a CSV file.

Note that the above steps must be performed in the given order, especially steps 3 and 4. 
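To see why the order of steps 3 and 4 matters, consider the following sketch on a small made-up data frame (the values are hypothetical and unrelated to the EEG data). Dropping sparse columns first keeps more rows; dropping incomplete rows first lets a sparse column survive at the cost of most of the rows.

```r
# Made-up data: column b is half missing, column c has a single NA
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(NA, NA, NA, 4, 5, 6),   # 3 of 6 values missing (more than 1/3)
  c = c(1, NA, 3, 4, 5, 6)      # 1 of 6 values missing
)

# Step 3, then step 4: drop sparse columns first, then incomplete rows
step3 <- toy[, colMeans(is.na(toy)) <= 1/3, drop = FALSE]
in_order <- step3[complete.cases(step3), , drop = FALSE]

# Step 4, then step 3: drop incomplete rows first, then sparse columns
swapped <- toy[complete.cases(toy), , drop = FALSE]
swapped <- swapped[, colMeans(is.na(swapped)) <= 1/3, drop = FALSE]

dim(in_order)  # 5 rows, 2 columns (a and c)
dim(swapped)   # 3 rows, 3 columns (b survives, but half the rows are gone)
```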

Solution 

R-script-clean.R 

# sets the working directory

setwd('b:/R/')

############    TASK 1 ############################################################

# 1.  Replace the first row of the data frame by the actual labels of the columns.#

###################################################################################

# defines the variable names

V <- c("COUNTER", "INTERPOLATED",
       "AF3", "F7", "F3", "FC5", "T7", "P7", "O1", "O2", "P8", "T8", "FC6", "F4", "F8", "AF4",
       "RAW_CQ",
       "CQ_AF3", "CQ_F7", "CQ_F3", "CQ_FC5", "CQ_T7", "CQ_P7", "CQ_O1", "CQ_O2",
       "CQ_P8", "CQ_T8", "CQ_FC6", "CQ_F4", "CQ_F8", "CQ_AF4", "CQ_CMS", "CQ_DRL",
       "GYROX", "GYROY", "MARKER")

# loads the .csv file, gives appropriate names to the variables, and skips the first row

MyData <- read.csv(file = 'Meditation.csv.CSV', header = FALSE, col.names = V, dec = ".", skip = 1)

# converts all variables to numeric ones (the result must be assigned back to take effect)

MyData <- as.data.frame(lapply(MyData, as.numeric))

############    TASK 2 ##############################################################

# 2.  Purge the data with low contact quality, i.e., data items corresponding to    #

# a contact quality that is less than 2 (viz., 0 or 1) should be marked as missing  #

# (replaced by NA). After that, columns that contain contact qualities are removed  #

# from the data set.                                                                #

#####################################################################################

# attaches MyData so that its columns (e.g. CQ_AF3) can be referenced by name below

attach(MyData)

# initiates the list of variables that recorded contact quality

contact <- c("CQ_AF3", "CQ_F7", "CQ_F3", "CQ_FC5", "CQ_T7", "CQ_P7", "CQ_O1", "CQ_O2",
             "CQ_P8", "CQ_T8", "CQ_FC6", "CQ_F4", "CQ_F8", "CQ_AF4", "CQ_CMS", "CQ_DRL")

# recodes variables based on contact quality: readings taken with a
# contact quality below 2 (i.e., 0 or 1) are replaced by NA

MyData$AF3[CQ_AF3 < 2] <- NA

MyData$F7[CQ_F7 < 2] <- NA

MyData$F3[CQ_F3 < 2] <- NA

MyData$FC5[CQ_FC5 < 2] <- NA

MyData$T7[CQ_T7 < 2] <- NA

MyData$P7[CQ_P7 < 2] <- NA

MyData$O1[CQ_O1 < 2] <- NA

MyData$O2[CQ_O2 < 2] <- NA

MyData$P8[CQ_P8 < 2] <- NA

MyData$T8[CQ_T8 < 2] <- NA

MyData$FC6[CQ_FC6 < 2] <- NA

MyData$F4[CQ_F4 < 2] <- NA

MyData$F8[CQ_F8 < 2] <- NA

MyData$AF4[CQ_AF4 < 2] <- NA

# removes columns of contact quality

MyData[,contact]<-list(NULL)

############    TASK 3 #########################################################################

# 3.  Delete columns for which the number of NA's is over 1/3 of the total number of the rows. #

################################################################################################

# keeps a column only if its share of NA values does not exceed 1/3

MyData <- MyData[, colMeans(is.na(MyData)) <= 1/3, drop = FALSE]

############    TASK 4 #########################################################################

# 4.  Delete rows that contain at least one NA.                                                #

################################################################################################

MyData<-MyData[complete.cases(MyData),]

############    TASK 5 #########################################################################

# 5.  Normalize each column by scaling all the values to the range [0, 1].                     #

################################################################################################

# we use the dplyr package

library(dplyr)

# defines the normalization function

normalize<- function(x){

return((x-min(x))/(max(x)-min(x)))

}

# normalization of all variables

MyData <- MyData %>% mutate_all(normalize)

############    TASK 6 #########################################################################

# 6.  Save the resultant data frame to a CSV file.                                             #

################################################################################################

# row.names = FALSE keeps the original row indices out of the saved file

write.csv(MyData, file = "MyData.csv", row.names = FALSE)
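
The recoding in Task 2 repeats one statement per channel. A more compact variant, sketched here on a small made-up data frame, loops over the channel names and builds the matching contact quality label with paste0. The values and the two-channel layout are assumptions for illustration; only the "CQ_" naming follows the data set.

```r
# Toy data: two channels and their contact quality columns (made-up values)
toy <- data.frame(
  AF3    = c(4100, 4105, 4098, 4110),
  F7     = c(4200, 4195, 4210, 4205),
  CQ_AF3 = c(4, 1, 3, 0),
  CQ_F7  = c(2, 2, 0, 4)
)

channels <- c("AF3", "F7")

# For each channel, blank out readings whose contact quality is 0 or 1
for (ch in channels) {
  cq <- paste0("CQ_", ch)
  toy[[ch]][toy[[cq]] < 2] <- NA
}

# Drop the contact quality columns once they have been used
toy <- toy[, !startsWith(names(toy), "CQ_"), drop = FALSE]
print(toy)
```

Rows 2 and 4 of AF3 and row 3 of F7 become NA, which matches what the per-channel statements would produce for the same input.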