Getting and Cleaning Data Course Project

Objective of this Project

The objective of this project was to take raw accelerometer data collected from subjects using a Samsung Galaxy S smartphone and prepare a clean, tidy data set.

Contents

This project contains the following:

  • README.md: General description of the project
  • CodeBook.md: Describes the variables and measurements in the tidy data set
  • run_analysis.R: Cleaning script for preparing the tidy dataset from the raw data

Source of Raw Data

The initial data for the cleaning exercise was obtained from the following sources:

  • http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
  • https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

Generated tidy data set using the run_analysis.R script

You can find the tidy data set uploaded here: https://s3.amazonaws.com/coursera-uploads/user-5148c4bb98a5b56203474727/973500/asst-3/53114f10ebec11e4a9166b9aa399b77f.txt

The observations and measurements in the tidy data set are described in CodeBook.md.

What has been considered to be a "tidy dataset"

The cleaning script and the resulting data set (linked above) follow these tidy data principles:

  • Each variable measured is in ONE column
  • Each observation for that variable is in a different row
  • There is only ONE table for each 'kind' of variable
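
As a rough illustration of these principles, the sketch below reshapes a small table so that each row holds a single measurement. It uses base R only; the column names `mean_x` and `std_x` and all values are made up for the example and are not taken from the actual data set.

```r
# Toy "wide" table: one row per subject/activity, one column per feature.
# All values are invented for illustration.
wide <- data.frame(
  subject  = c(1, 1, 2),
  activity = c("WALKING", "SITTING", "WALKING"),
  mean_x   = c(0.28, 0.27, 0.26),
  std_x    = c(-0.99, -0.98, -0.97),
  stringsAsFactors = FALSE
)

# Stack the feature columns so that each row holds exactly one measurement.
feature_cols <- c("mean_x", "std_x")
long <- data.frame(
  subject      = rep(wide$subject, times = length(feature_cols)),
  activity     = rep(wide$activity, times = length(feature_cols)),
  feature      = rep(feature_cols, each = nrow(wide)),
  featureValue = unlist(wide[feature_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

print(long)
```

The three wide rows with two features each become six rows, one per subject, activity and feature combination.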

Description of the cleaning script: run_analysis.R

The script performs the following major functions:

  • Take the data in the train and test sets (for subjects, activities and features) and merge them into a master data set
  • Label the variables with descriptive names. The names provided in features.txt of the raw data were used
  • Create a smaller data set that contains only the mean() and std() feature measurements from the raw data
  • After creating the smaller data set, replace the factor-style values of the Activity variable with more descriptive alternatives
    • Again, the activity names provided in activity_labels.txt of the raw data were used
  • Group the observations by subject and activity, replacing each group with the average of every feature measurement over that group
  • Finally, melt the "wide"-style tidy data set into a "narrow-tall" style data set by combining all the feature measurements (now averaged per subject/activity pair) into a single "feature" column, with the corresponding measurement value in a "featureValue" column
  • Write the final tidy data out to "tidydataset.txt"
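
The steps above can be sketched roughly as follows. This is not the actual run_analysis.R: it is a minimal base R illustration that runs on small made-up tables instead of the raw files, and the helper objects (`train`, `test`, `master`, `small`, `averaged`, `tidy`, `out_file`) are hypothetical names chosen for the example. It also writes to a temporary directory rather than the working directory.

```r
# Toy stand-ins for features.txt, activity_labels.txt and the train/test files.
# With the real data these would be read with read.table().
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-energy()-X")
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)
train <- list(subject = c(1, 1, 2), activity = c(1, 1, 2),
              x = matrix(c(0.1, 0.3, 0.5,  1, 3, 5,  9, 9, 9), ncol = 3))
test  <- list(subject = c(3, 3), activity = c(2, 2),
              x = matrix(c(0.2, 0.4,  2, 4,  9, 9), ncol = 3))

# Merge train and test into one master data set and label the columns.
master <- data.frame(subject  = c(train$subject, test$subject),
                     activity = c(train$activity, test$activity),
                     rbind(train$x, test$x))
names(master) <- c("subject", "activity", features)

# Keep only the mean() and std() measurements.
keep  <- grepl("mean\\(\\)|std\\(\\)", features)
small <- master[, c("subject", "activity", features[keep])]

# Replace activity codes with descriptive names.
small$activity <- activity_labels$name[match(small$activity, activity_labels$id)]

# Average every feature for each subject/activity pair.
averaged <- aggregate(small[features[keep]],
                      by = list(subject = small$subject,
                                activity = small$activity),
                      FUN = mean)

# Melt the wide table into a narrow-tall one.
value_cols <- setdiff(names(averaged), c("subject", "activity"))
tidy <- data.frame(
  subject      = rep(averaged$subject, times = length(value_cols)),
  activity     = rep(averaged$activity, times = length(value_cols)),
  feature      = rep(value_cols, each = nrow(averaged)),
  featureValue = unlist(averaged[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Write the result (to a temporary directory in this sketch).
out_file <- file.path(tempdir(), "tidydataset.txt")
write.table(tidy, out_file, row.names = FALSE)
```

A file written this way can be read back with `read.table(out_file, header = TRUE)`.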
