- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set.
- Appropriately labels the data set with descriptive variable names.
- From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
- The submitted data set is tidy.
- The GitHub repo contains the required scripts.
- GitHub contains a code book that modifies and updates the available codebooks with the data to indicate all the variables and summaries calculated, along with units, and any other relevant information.
- The README that explains the analysis files is clear and understandable.
- The work submitted for this project is the work of the student who submitted it.
##Read in the test data, subject IDs, and activity IDs, then combine
testData<<-fread("UCI HAR Dataset/test/subject_test.txt",sep= " ", header=FALSE, col.names = "subjectID") %>%
cbind(fread("UCI HAR Dataset/test/y_test.txt",sep= " ", header=FALSE, col.names = "activityID")) %>%
cbind(fread("UCI HAR Dataset/test/X_test.txt",sep= " ", header=FALSE, col.names=read.csv("UCI HAR Dataset/features.txt", sep=" ", header=FALSE, stringsAsFactors = FALSE)[,2]))
##Read in the train data and combine with the test data to form one full data set
fullData<<-fread("UCI HAR Dataset/train/subject_train.txt",sep= " ", header=FALSE, col.names = "subjectID") %>%
cbind(fread("UCI HAR Dataset/train/y_train.txt",sep= " ", header=FALSE, col.names = "activityID")) %>%
cbind(fread("UCI HAR Dataset/train/X_train.txt",sep= " ", header=FALSE, col.names=read.csv("UCI HAR Dataset/features.txt", sep=" ", header=FALSE, stringsAsFactors = FALSE)[,2])) %>%
rbind(testData)
## First create a vector with the column names I need to keep
myCols<-grep("mean|std",names(fullData),value = TRUE) %>%
append(c("subjectID","activityID"),.)
## Then create the new table containing only those columns
meanStd<<-fullData[,..myCols]
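
The pattern `"mean|std"` keeps every column whose name contains either substring, so it also keeps the `meanFreq()` features. The sketch below shows this on a handful of sample names written in the style of `features.txt`. The variable names `featureNames`, `broad`, and `strict` are hypothetical, and the stricter pattern is only one possible alternative, not what the script above uses.

```r
# Illustrative sample of feature names in the style of features.txt
featureNames <- c("tBodyAcc-mean()-X",
                  "tBodyAcc-std()-X",
                  "fBodyAcc-meanFreq()-X",
                  "angle(tBodyAccMean,gravity)",
                  "tBodyAcc-max()-X")

# Same pattern as the script: any name containing "mean" or "std"
# (case-sensitive, so "tBodyAccMean" is not matched)
broad <- grep("mean|std", featureNames, value = TRUE)

# Stricter alternative (an assumption, not the script's choice):
# only names containing the literal "mean()" or "std()"
strict <- grep("(mean|std)\\(\\)", featureNames, value = TRUE)

print(broad)
print(strict)
```

With the broad pattern the `meanFreq()` column is retained. With the strict one, only the `mean()` and `std()` columns remain.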
## Read in the activity labels and attach them to the data by activityID
myLabels<-fread("UCI HAR Dataset/activity_labels.txt", sep=" ",header=FALSE, col.names=c("activityID","activityLabel"))
meanStd<<-merge(myLabels, meanStd, by="activityID", all = TRUE)
## Average every remaining column for each activity and each subject
dataAvg<<-group_by(meanStd,activityLabel,subjectID) %>%
summarise_all(mean)
## Write the averaged data set to a text file
write.table(dataAvg, file="tidyData.txt", row.names=FALSE)
return(dataAvg)
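
A file written this way can be read back with `read.table` and `header = TRUE`. The sketch below shows the round trip on a small stand-in table rather than the real output. The names `toy`, `outFile`, and `roundTrip` and the values are made up for illustration. Setting `check.names = FALSE` is an assumption made here so that feature names containing `-` and `()` are kept unchanged.

```r
# Toy stand-in for dataAvg; values are invented for illustration only
toy <- data.frame(activityLabel = c("WALKING", "SITTING"),
                  subjectID = c(1L, 1L),
                  `tBodyAcc-mean()-X` = c(0.27, 0.26),
                  check.names = FALSE,
                  stringsAsFactors = FALSE)

# Write to a temporary file the same way the script writes tidyData.txt
outFile <- tempfile(fileext = ".txt")
write.table(toy, file = outFile, row.names = FALSE)

# Read it back; check.names = FALSE preserves the original column names
roundTrip <- read.table(outFile, header = TRUE,
                        check.names = FALSE, stringsAsFactors = FALSE)

print(names(roundTrip))
```

For the real output, the same `read.table` call would presumably be pointed at `tidyData.txt`.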
I chose to make my tidy data set wide, with a separate column for each average value. Per Hadley Wickham's paper "Tidy Data", tidy data sets can be wide or narrow.
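
For comparison, the narrow form of the same data would have one row per activity, subject, and feature. Below is a minimal sketch of that conversion using `melt` from `data.table`, on a toy table shaped like the output. The values, the table names `wide` and `long`, and the column names `feature` and `average` are all hypothetical.

```r
library(data.table)

# Toy wide table in the shape of the tidy output (values are made up)
wide <- data.table(
  activityLabel = c("WALKING", "WALKING", "SITTING"),
  subjectID = c(1L, 2L, 1L),
  `tBodyAcc-mean()-X` = c(0.28, 0.27, 0.26),
  `tBodyAcc-std()-X` = c(-0.28, -0.42, -0.98)
)

# melt() stacks the measurement columns into feature/average pairs,
# keeping activityLabel and subjectID as identifiers
long <- melt(wide,
             id.vars = c("activityLabel", "subjectID"),
             variable.name = "feature",
             value.name = "average")

print(long)
```

Here the 3-row, 4-column wide table becomes a 6-row, 4-column narrow table holding the same information.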