
getcleandate-courseproject's Introduction

This repository was uploaded for the course project of the "Getting and Cleaning Data" course on coursera.com.

The run_analysis.R file contains 3 functions: run_analysis, mergedata and casting.

The mean_output.txt file is the output of the run_analysis function.

The code book.md file describes the variables in mean_output.txt.

run_analysis function

The main function is run_analysis, as required by the project instructions. Overall, the function consists of five parts.

first part: load packages

In this assignment, the "plyr" and "reshape2" packages are used.

second part: merge train and test datasets

Three datasets have to be merged (merge.x, merge.y and merged.subject), each from a train file and a test file, so a separate function called "mergedata" does the merging. The six files are first opened as connections with the file() function and then passed to mergedata(). The close() function closes the connections at the end of this part.

third part: deal with the variables' name and activity labels

The variable names for "merge.x" and the activity labels for "merge.y" are stored as text files in the original dataset (features.txt and activity_labels.txt). They are therefore read into data frames first for later use: feature.df stores the variable names and actlabel.df stores the activity labels.
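As a minimal sketch of this step, the two lookup tables can be read with read.table(). The inline text below is a hypothetical stand-in for features.txt and activity_labels.txt, and the column names "index", "name", "code" and "label" are assumptions made for illustration, not names taken from run_analysis.R.

```r
# Hypothetical stand-ins for features.txt and activity_labels.txt;
# the real files contain many more rows.
features_txt <- c("1 tBodyAcc-mean()-X",
                  "2 tBodyAcc-std()-X",
                  "3 tBodyAcc-mad()-X")
labels_txt <- c("1 WALKING", "2 WALKING_UPSTAIRS")

# read.table() splits on whitespace by default, which yields an
# index column and a name column for each file.
feature.df <- read.table(text = features_txt,
                         col.names = c("index", "name"),
                         stringsAsFactors = FALSE)
actlabel.df <- read.table(text = labels_txt,
                          col.names = c("code", "label"),
                          stringsAsFactors = FALSE)

print(feature.df)
print(actlabel.df)
```

With the real files, the `text =` argument would be replaced by the file path.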

fourth part: prepare the data which is needed in this project

In this part, column names are assigned to the merged datasets for later use:

merge.x - col.names, which is subsetted from feature.df

merge.y - "activity"

merged.subject - subjects

A new data frame called sub.df is subsetted from merge.x, keeping the columns whose names contain "mean" or "std" (as instructed in the assignment), using the grep() function. The activity codes are replaced by descriptive labels from actlabel.df in a for loop. By the end of this part, a subsetted data frame is ready for the next step.
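The column selection and relabelling described above might look roughly like the following. This is a sketch on toy data, not the code from run_analysis.R: the values, the lookup column names ("name", "code", "label") and the exact pattern "mean|std" are assumptions.

```r
# Toy lookup tables (hypothetical values and column names).
feature.df <- data.frame(
  index = 1:3,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X"),
  stringsAsFactors = FALSE
)
actlabel.df <- data.frame(code = 1:2,
                          label = c("WALKING", "SITTING"),
                          stringsAsFactors = FALSE)

# Toy merged data: three measurement columns and three activity codes.
merge.x <- data.frame(c(0.1, 0.2, 0.3), c(0.4, 0.5, 0.6), c(0.7, 0.8, 0.9))
merge.y <- data.frame(activity = c(1, 2, 1))

# Assign the feature names, then keep only the mean/std columns.
names(merge.x) <- feature.df$name
keep <- grep("mean|std", names(merge.x))
sub.df <- merge.x[, keep, drop = FALSE]

# Replace each activity code by its descriptive label in a for loop.
activity <- character(nrow(merge.y))
for (i in seq_len(nrow(actlabel.df))) {
  activity[merge.y$activity == actlabel.df$code[i]] <- actlabel.df$label[i]
}
merge.y$activity <- activity

print(names(sub.df))
print(merge.y$activity)
```

Note that a pattern such as "mean|std" also matches names like "meanFreq()"; whether those columns belong in the result depends on how the assignment is read.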

fifth part: produce the required tidy data

sub.df is split by subject id with the split() function and passed to the casting() function to calculate the mean of each variable. mean_output.txt is then written with the write.table() function; this is the file that is uploaded to coursera.com.
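The split, summarise and write sequence can be sketched as follows. To keep the example free of package dependencies, a base R aggregate() call stands in for casting(); the helper name mean_per_activity, the toy values and the temporary output path are all assumptions made for illustration.

```r
# Toy version of sub.df with subject and activity columns attached.
sub.df <- data.frame(
  subject  = c(1, 1, 1, 2, 2),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
  mean_x   = c(1, 3, 5, 2, 4),
  stringsAsFactors = FALSE
)

# One data frame per subject id.
by_subject <- split(sub.df, sub.df$subject)

# Hypothetical stand-in for casting(): mean of every measurement
# column for each subject/activity pair.
mean_per_activity <- function(df) {
  aggregate(. ~ subject + activity, data = df, FUN = mean)
}

# Summarise each piece and stack the results back together.
tidy <- do.call(rbind, lapply(by_subject, mean_per_activity))
rownames(tidy) <- NULL

# Write the tidy data; a temporary directory is used here so the
# sketch does not touch the working directory.
out_file <- file.path(tempdir(), "mean_output.txt")
write.table(tidy, out_file, row.names = FALSE)
print(tidy)
```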

mergedata function

This function is designed to save memory and avoid redundant code, since three datasets have to be merged. Its arguments are two file connections and the desired class of the returned data frame, which defaults to "factor".
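A function with this interface could be sketched as below. The argument names (train.con, test.con, col.class), the use of read.table() with colClasses, and the train-then-test row order are assumptions; the actual body in run_analysis.R may differ.

```r
# Hypothetical sketch of mergedata(): read both connections with the
# requested column class and stack the rows (train first, then test).
mergedata <- function(train.con, test.con, col.class = "factor") {
  train <- read.table(train.con, colClasses = col.class)
  test  <- read.table(test.con, colClasses = col.class)
  rbind(train, test)
}

# Toy files standing in for one train/test pair.
train.path <- tempfile(fileext = ".txt")
test.path  <- tempfile(fileext = ".txt")
writeLines(c("1", "2"), train.path)
writeLines("3", test.path)

# Open the connections, merge, then close the connections.
train.con <- file(train.path)
test.con  <- file(test.path)
merge.y <- mergedata(train.con, test.con, col.class = "numeric")
close(train.con)
close(test.con)

print(merge.y)
```

Passing the class as an argument lets the same helper read the numeric measurements and the id-like columns (activity, subject) without duplicating the read and bind code.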

casting function

This function melts and casts the split data to calculate the mean of each variable by activity for each subject.
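A melt-and-cast step of this kind can be sketched with reshape2 as follows. The id column names "subject" and "activity" and the toy values are assumptions, and the real casting() may be written differently.

```r
library(reshape2)

# Hypothetical sketch of casting(): melt to long form, then cast back
# to wide form while averaging each variable per subject and activity.
casting <- function(df) {
  molten <- melt(df, id.vars = c("subject", "activity"))
  dcast(molten, subject + activity ~ variable,
        fun.aggregate = mean, value.var = "value")
}

# Toy data for a single subject, as produced by split().
one_subject <- data.frame(
  subject  = c(1, 1, 1),
  activity = c("WALKING", "WALKING", "SITTING"),
  mean_x   = c(1, 3, 5),
  std_x    = c(2, 4, 6),
  stringsAsFactors = FALSE
)

result <- casting(one_subject)
print(result)
```

The result has one row per activity for the subject, with each measurement column holding the mean of the original observations.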

getcleandate-courseproject's People

Contributors

monaludao
