
courseproject's Introduction

Readme file for the Data cleaning course project

Generating the tidy data set

  • The code that reads, cleans, and analyses the data is in run_analysis.R.
  • The code assumes that the raw data zip file, "getdata-projectfiles-UCI HAR Dataset.zip", is in the same directory as run_analysis.R.
  • The R code first checks that the raw data zip file exists: if it does, the script proceeds; otherwise it stops.
  • The R code then creates a subdirectory, "mergeddata", for the target files. If the directory already exists, the script skips this step and continues.
  • The next step merges the train and test data into a single data set, mergeddata. Before merging, the activity names are matched to the data set and each row is labelled with its subject from the separately provided subject file.
  • From the merged data set, only the columns with 'mean' or 'std' in their names are selected, as this step requires.
  • Finally, the column names of the merged data set are made more readable: the bracket characters are removed and the lowercase 'mean' is capitalised to 'Mean'.
  • The tidy_data.txt file is then generated and written to the "mergeddata" folder.
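
The steps above can be sketched in base R as follows. This is a minimal illustration on toy data, not the contents of run_analysis.R: the helper names (require_raw_zip, make_set), the feature names, and all values are assumptions made for the example, and the output goes to a temporary directory rather than next to the script.

```r
# Hypothetical helper: stop early if the raw data zip file is missing.
require_raw_zip <- function(path) {
  if (!file.exists(path)) stop("Raw data zip file not found: ", path)
  invisible(TRUE)
}

# Toy stand-ins for the feature and activity label files (made-up values).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(id = 1:2, activity = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

# Hypothetical helper: attach subject ids and activity names to one set.
make_set <- function(subject, activity_id, values) {
  x <- as.data.frame(values)
  names(x) <- features
  # match() looks up each activity id and keeps the row order of the data.
  activity <- activity_labels$activity[match(activity_id, activity_labels$id)]
  cbind(subject = subject, activity = activity, x, stringsAsFactors = FALSE)
}

train <- make_set(c(1, 1), c(1, 2), matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 2))
test  <- make_set(2, 1, matrix(c(0.7, 0.8, 0.9), nrow = 1))

# Merge train and test by stacking their rows.
merged <- rbind(train, test)

# Keep the id columns plus every column whose name contains "mean" or "std".
is_id <- names(merged) %in% c("subject", "activity")
merged <- merged[, is_id | grepl("mean|std", names(merged))]

# Tidy the names: drop the brackets, then capitalise "mean".
names(merged) <- gsub("[()]", "", names(merged))
names(merged) <- gsub("mean", "Mean", names(merged), fixed = TRUE)

# Create the target directory only if it is missing, then write the file.
out_dir <- file.path(tempdir(), "mergeddata")
if (!dir.exists(out_dir)) dir.create(out_dir)
write.table(merged, file.path(out_dir, "tidy_data.txt"), row.names = FALSE)
```

In a real run, require_raw_zip("getdata-projectfiles-UCI HAR Dataset.zip") would be called first, and out_dir would be "mergeddata" in the working directory.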

Last part: Second independent tidy data set with average of each variable

  • Melt the merged data set with activity and subject as the id variables, then dcast it back, summarising each variable by its mean for every activity and subject pair.
  • Write the second tidy file, which holds the summary of means, as "mergeddata.txt" in the mergeddata folder.
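
A minimal sketch of the melt and dcast step is shown below. It assumes the two functions come from the reshape2 package (the readme does not name the package), and the data frame, its column names, and its values are made up for illustration.

```r
library(reshape2)  # assumption: melt() and dcast() come from reshape2

# Toy merged data set (made-up values).
merged <- data.frame(
  subject  = c(1, 1, 1, 2),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING"),
  accMeanX = c(0.1, 0.3, 0.5, 0.7),
  accStdX  = c(1.0, 2.0, 3.0, 4.0),
  stringsAsFactors = FALSE
)

# Long format: one row per (subject, activity, variable, value).
molten <- melt(merged, id.vars = c("subject", "activity"))

# Wide format again: one row per subject and activity pair,
# with each variable replaced by its mean over that group.
averages <- dcast(molten, subject + activity ~ variable, fun.aggregate = mean)

# Write the summary of means (to a temporary directory in this sketch).
out_dir <- file.path(tempdir(), "mergeddata")
if (!dir.exists(out_dir)) dir.create(out_dir)
write.table(averages, file.path(out_dir, "mergeddata.txt"), row.names = FALSE)
```

Here subject 1 has two WALKING rows, so they collapse into a single row holding the mean of each variable, and the result has one row per subject and activity pair.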

courseproject's People

Contributors

kunbatra
