This repository hosts the submission for the course project of the Coursera course *Getting and Cleaning Data*.
The code in run_analysis.R performs the following five steps, as given in the project instructions:
- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set.
- Appropriately labels the data set with descriptive variable names.
- Creates a tidy data set with the average of each variable for each activity and each subject.
`X_train`, `y_train`, `subject_train`, `X_test`, `y_test`, and `subject_test` contain the data loaded from the downloaded txt files.
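A minimal sketch of the loading mechanics, using an in-memory `textConnection()` as a stand-in for the real files (the actual script points `read.table()` at paths such as `UCI HAR Dataset/train/X_train.txt`):

```r
# read.table() parses whitespace-separated values into a data frame.
# The textConnection() here stands in for the downloaded txt files,
# so this sketch runs without the dataset on disk.
X_train <- read.table(textConnection("0.1 0.2\n0.3 0.4"))
dim(X_train)  # 2 rows, 2 columns
```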
`train_test`, containing both the training and the test data, is then generated using `cbind()` and `rbind()`.
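The combining step can be sketched with small stand-in data frames (the real script binds the tables read from the txt files; column names here are illustrative):

```r
# Stand-in data frames; the real script uses the loaded X/y/subject tables.
X_train <- data.frame(f1 = c(0.1, 0.2), f2 = c(0.3, 0.4))
y_train <- data.frame(activity = c(1, 2))
subject_train <- data.frame(subject = c(1, 1))
X_test <- data.frame(f1 = 0.5, f2 = 0.6)
y_test <- data.frame(activity = 3)
subject_test <- data.frame(subject = 2)

# cbind() joins subject, activity, and measurement columns side by side;
# rbind() then stacks the training rows on top of the test rows.
train <- cbind(subject_train, y_train, X_train)
test <- cbind(subject_test, y_test, X_test)
train_test <- rbind(train, test)

nrow(train_test)  # 3 rows: 2 training + 1 test
```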
`features` loads the feature names corresponding to the columns of the `X_train` and `X_test` files.
`mean_and_std_positions` uses `grep()` to locate the feature names containing either `mean()` or `std()`. `train_test_selected` contains the corresponding subset of `train_test` that needs further analysis.
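A sketch of the selection step on a few sample feature names (the `+ 2` column offset is an assumption of this sketch, reserving the first two columns for subject and activity):

```r
# Sample feature names mimicking features.txt from the UCI HAR Dataset.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "tBodyAcc-meanFreq()-X", "angle(X,gravityMean)")

# Escaped parentheses make grep() match "mean()" and "std()" literally;
# "meanFreq()" is not matched because "mean(" never occurs in it.
mean_and_std_positions <- grep("mean\\(\\)|std\\(\\)", features)
mean_and_std_positions  # 1 2

# Keep subject and activity plus the selected measurement columns.
train_test <- data.frame(subject = 1, activity = 2,
                         m1 = 0.1, m2 = 0.2, m3 = 0.3, m4 = 0.4)
train_test_selected <- train_test[, c(1, 2, mean_and_std_positions + 2)]
```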
`activity_labels` loads the activity labels and the corresponding activity names. It is merged with `train_test_selected` by the activity label to get `all_data`, which has both the activity labels and the activity names.
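The merge can be sketched with a tiny label table (column names here are assumptions; `activity_labels.txt` in the dataset has no header row):

```r
# activity_labels.txt maps numeric codes to descriptive names.
activity_labels <- data.frame(activity = c(1, 2, 3),
                              activity_name = c("WALKING", "SITTING", "LAYING"))
train_test_selected <- data.frame(subject = c(1, 2),
                                  activity = c(2, 1),
                                  m1 = c(0.1, 0.2))

# merge() joins on the shared "activity" column, attaching the
# descriptive name to every row. Note that merge() may reorder rows.
all_data <- merge(train_test_selected, activity_labels, by = "activity")
```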
`names(all_data)` is then modified to label the data set with descriptive variable names.
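One common way to do this renaming is a chain of `gsub()` substitutions; this is a sketch, and the exact substitutions in run_analysis.R may differ:

```r
# Expand the t/f prefixes and strip the ()- punctuation from the
# feature names so they read as descriptive variable names.
nms <- c("tBodyAcc-mean()-X", "fBodyGyro-std()-Z")
nms <- gsub("^t", "Time", nms)
nms <- gsub("^f", "Frequency", nms)
nms <- gsub("-mean\\(\\)", "Mean", nms)
nms <- gsub("-std\\(\\)", "Std", nms)
nms <- gsub("-", "", nms)
nms  # "TimeBodyAccMeanX" "FrequencyBodyGyroStdZ"
```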
`tidy_data` is created with the `aggregate()` function and then saved to the `tidy_data.txt` file.
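The final averaging step can be sketched on a toy `all_data` (synthetic values; the real data has many measurement columns):

```r
all_data <- data.frame(subject = c(1, 1, 2, 2),
                       activity_name = c("WALKING", "WALKING",
                                         "SITTING", "SITTING"),
                       m1 = c(1, 3, 2, 4))

# Average every measurement column for each (subject, activity) pair;
# "." on the left of the formula means all non-grouping variables.
tidy_data <- aggregate(. ~ subject + activity_name, data = all_data, FUN = mean)

# write.table() with row.names = FALSE produces the submission file.
write.table(tidy_data, "tidy_data.txt", row.names = FALSE)
```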
- README.md is this README file.
- CodeBook.md describes the procedures and the variables.
- run_analysis.R is the R code that performs the analysis and saves the tidy data.
- UCI HAR Dataset is the folder containing the original collected data. The detailed explanation is here.
- tidy_data.txt is the tidy data required for submission. It is generated by the last step of the run_analysis.R code.
The community TAs' posts in the forum were quite helpful for finishing this project: