kunbatra / courseproject

Getting and cleaning data - course project repository

R 100.00%

# Readme file for the data cleaning course project

Generating the tidy data set

The code that reads, cleans and analyses the files is in run_analysis.R.
The code assumes that the raw data zip file, "getdata-projectfiles-UCI HAR Dataset.zip", is in the same directory as run_analysis.R.
The script first checks that the raw data zip file exists: if it does, the script proceeds; otherwise it stops.
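A minimal sketch of this check in base R. The helper name `check_raw_zip` is hypothetical; only the zip file name comes from this README, and the real run_analysis.R may structure the check differently.

```r
# Hypothetical helper: stop early if the raw data zip file is missing.
# The default file name is the one this README says run_analysis.R expects.
check_raw_zip <- function(zipfile = "getdata-projectfiles-UCI HAR Dataset.zip") {
  if (!file.exists(zipfile)) {
    stop("Raw data zip file not found: ", zipfile)
  }
  invisible(TRUE)
}
```

Calling `check_raw_zip()` from the script's directory either returns silently or halts the script with an error naming the missing file.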
It then creates a subdirectory, "mergeddata", for the output files. If the directory already exists, the script moves on to the next step.
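One way to express "create it only if it is not already there" in base R is sketched below. The helper name `ensure_output_dir` is an assumption for illustration; `dir.exists()` needs R 3.2.0 or later.

```r
# Hypothetical helper: create the output folder only when it does not exist yet.
ensure_output_dir <- function(outdir = "mergeddata") {
  if (!dir.exists(outdir)) {
    dir.create(outdir)
  }
  invisible(outdir)
}
```

An equivalent shortcut is `dir.create(outdir, showWarnings = FALSE)`, which simply does nothing (apart from returning `FALSE`) when the folder is already present.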
Next, the train and test data are merged into a single data set, mergeddata. Before merging, the activity names are matched to the data, and each observation is labelled with its subject using the subject file provided separately.
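The merge can be sketched as "bind columns within each split, then stack the splits and look up activity names". The helpers `combine_split` and `merge_har`, and the `id`/`name` column names of the activity lookup table, are hypothetical; reading the raw files (for example with `read.table()` on X_train.txt, y_train.txt and subject_train.txt in the standard UCI HAR layout) is left out so the sketch runs on small in-memory data frames.

```r
# Hypothetical helpers; names and the data-frame interface are assumptions.
# x: measurement columns, y: activity codes, subject: subject IDs,
# all with one row per observation.

# Bind subject IDs and activity codes onto the measurements of one split.
combine_split <- function(x, y, subject) {
  cbind(subject = subject[[1]], activity_id = y[[1]], x)
}

# Stack train and test, then swap activity codes for their descriptive names.
# activity_labels is assumed to have columns `id` and `name`.
merge_har <- function(train, test, activity_labels) {
  merged <- rbind(train, test)
  merged$activity <- activity_labels$name[match(merged$activity_id, activity_labels$id)]
  merged$activity_id <- NULL
  # Put the identifier columns first, followed by the measurements.
  merged[c("subject", "activity", setdiff(names(merged), c("subject", "activity")))]
}
```

`match()` maps each numeric activity code to its row in the lookup table, so the codes never need to be sorted or contiguous.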
From the merged data set, only the columns with 'mean' or 'std' in their names are kept, as this step requires.
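A sketch of the column selection using a regular expression on the column names. The helper `select_mean_std` and its `id_cols` argument are hypothetical. Note that the pattern is case-sensitive and also matches names such as "meanFreq"; whether run_analysis.R keeps those is not stated here.

```r
# Keep the identifier columns plus every column whose name contains
# "mean" or "std" (case-sensitive, so it also matches e.g. "meanFreq").
select_mean_std <- function(df, id_cols = c("subject", "activity")) {
  keep <- names(df) %in% id_cols | grepl("mean|std", names(df))
  df[, keep, drop = FALSE]
}
```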
Finally, the column names of the merged data set are made more readable by removing the bracket characters and capitalising the first letter of the lowercase 'mean', so that it becomes 'Mean'.
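The two renaming rules can be sketched with `gsub()`. The helper `tidy_names` is hypothetical, and treating "brackets" as the round parentheses that appear in the UCI HAR feature names is an assumption.

```r
# Hypothetical helper: drop "(" and ")" and turn "mean" into "Mean".
tidy_names <- function(x) {
  x <- gsub("[()]", "", x)
  gsub("mean", "Mean", x, fixed = TRUE)
}
```

Applied as `names(df) <- tidy_names(names(df))`, it turns "tBodyAcc-mean()-X" into "tBodyAcc-Mean-X".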
The resulting tidy data set is written to tidy_data.txt in the "mergeddata" folder.
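Writing the file is a single `write.table()` call. The wrapper `write_tidy` is hypothetical, and `row.names = FALSE` is an assumed choice that keeps R's row numbers out of the output.

```r
# Hypothetical wrapper: write a data frame as a space-separated text file
# with a header line and no row names; returns the path invisibly.
write_tidy <- function(df, outdir = "mergeddata", filename = "tidy_data.txt") {
  path <- file.path(outdir, filename)
  write.table(df, file = path, row.names = FALSE)
  invisible(path)
}
```

The file can be read back with `read.table(path, header = TRUE)`.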

Last part: a second, independent tidy data set with the average of each variable

The merged data set is melted and then recast (dcast) by activity and subject, summarising each column by its mean.
The second tidy file, which holds this summary of means, is written as "mergeddata.txt" in the mergeddata folder.
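The same per-subject, per-activity averages can be sketched in base R with `aggregate()`. This is a swapped-in technique, not the melt/dcast route the script uses: it avoids the reshape2 dependency but should give the same wide table, one row per (subject, activity) pair with one mean per measurement column. The helper name `average_by_group` is hypothetical.

```r
# Average every measurement column for each (subject, activity) pair.
# Base-R aggregate() stands in for reshape2's melt() followed by dcast();
# "." on the left-hand side means "all columns not used for grouping".
average_by_group <- function(df) {
  aggregate(. ~ subject + activity, data = df, FUN = mean)
}
```

The result can then be written to "mergeddata.txt" with `write.table(..., row.names = FALSE)`, as for the first tidy file.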
