GettingAndCleaningData-Project

Initial data for research

The script analyzes data from the UCI HAR Dataset. It assumes that the archive has been extracted into the working directory.

The following files from the initial dataset are used:

  1. features.txt - descriptions of the measured features
  2. train/X_train.txt - measurements of the features in the train set (each row is one measurement of 561 features)
  3. test/X_test.txt - measurements of the features in the test set
  4. train/subject_train.txt - subject for each measurement from the train set
  5. test/subject_test.txt - subject for each measurement from the test set
  6. train/y_train.txt - activity (from 1 to 6) for each measurement from the train set
  7. test/y_test.txt - activity (from 1 to 6) for each measurement from the test set

How the script works

The script performs the following stages:

  1. Loads the ids and descriptions of the features measured in the experiment from the file features.txt into R.

  2. Independently loads the complete data for the train and test sets. Taking the train set as an example, the loading process is:
     a. The measurements are loaded from train/X_train.txt as a data frame.
     b. The column names of this data frame are replaced with more user-friendly ones, using the feature descriptions loaded in the previous stage. (STEP 4 of the Course Project: Appropriately label the data set with descriptive variable names)
     c. The activity labels and subjects for the measurements are loaded from the files train/y_train.txt and train/subject_train.txt and added to the data frame as separate columns.

The same steps are performed for the test set, and finally the rows of the two data frames are merged to form one data frame with the complete data. (STEP 1 of the assignment: Merge the training and the test sets to create one data set)
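
The loading and merging stages can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper name load_split is hypothetical, and the tiny stand-in dataset (3 made-up features instead of 561) is written to a temporary directory only so that the sketch runs without the real archive.

```r
# Build a toy copy of the dataset layout in a temporary directory.
# All values below are made up for illustration.
root <- file.path(tempdir(), "toy_har")
for (split in c("train", "test")) {
  dir.create(file.path(root, split), recursive = TRUE, showWarnings = FALSE)
}
writeLines(c("1 tBodyAcc-mean()-X", "2 tBodyAcc-std()-X", "3 tBodyAcc-max()-X"),
           file.path(root, "features.txt"))
writeLines(c("0.1 0.2 0.3", "0.4 0.5 0.6"), file.path(root, "train", "X_train.txt"))
writeLines(c("1", "2"), file.path(root, "train", "y_train.txt"))
writeLines(c("1", "1"), file.path(root, "train", "subject_train.txt"))
writeLines("0.7 0.8 0.9", file.path(root, "test", "X_test.txt"))
writeLines("1", file.path(root, "test", "y_test.txt"))
writeLines("2", file.path(root, "test", "subject_test.txt"))

# Feature ids and descriptions.
features <- read.table(file.path(root, "features.txt"),
                       col.names = c("id", "name"), stringsAsFactors = FALSE)

# Hypothetical helper: load one split ("train" or "test") as a data frame
# with descriptive column names plus activity and subject columns.
load_split <- function(root, split, feature_names) {
  path <- function(prefix) {
    file.path(root, split, paste0(prefix, "_", split, ".txt"))
  }
  x <- read.table(path("X"))
  names(x) <- feature_names                      # descriptive variable names
  x$activity <- read.table(path("y"))[[1]]       # activity id per measurement
  x$subject <- read.table(path("subject"))[[1]]  # subject id per measurement
  x
}

# Stack the rows of the two splits into one data frame.
complete <- rbind(load_split(root, "train", features$name),
                  load_split(root, "test", features$name))
```

Assigning names after reading, rather than letting read.table build them, keeps the parentheses and dashes of the original feature names intact.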

  3. To extract only the measurements of mean and standard deviation values, the script uses grep to find the column names that include "mean()" or "std()". The activity and subject columns are also added to the filtered data frame, since they are important dimensions. A new data frame containing only the necessary columns is then created. (STEP 2 of the assignment: Extract only the measurements on the mean and standard deviation for each measurement)

  4. To provide descriptive values for the activity labels, a new variable "activitylabel" is added to the dataset. It is a factor variable with the levels listed in the file activity_labels.txt. (STEP 3 of the assignment: Use descriptive activity names to name the activities in the data set)

  5. Creates a melted data frame using the activity label and subject as ids. The mean values of all variables, grouped by activity and subject, are then calculated with the dcast() function, which produces the tidy data frame. (STEP 5 of the assignment: Create a second, independent tidy data set with the average of each variable for each activity and each subject)
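
The filtering, labelling and averaging stages can be sketched on a toy data frame as below. This is an illustration under stated assumptions rather than the script's own code: the data values are made up, the activity names are written out by hand as they are understood to appear in activity_labels.txt (check them against the file), and base R's aggregate() is swapped in for the melt/dcast() approach described above so that the sketch needs no extra packages.

```r
# Toy stand-in for the merged data frame (made-up values).
complete <- data.frame(
  a = c(1, 3, 5, 7), b = c(2, 4, 6, 8), c = c(9, 9, 9, 9),
  activity = c(1, 1, 2, 2), subject = c(1, 1, 1, 2)
)
names(complete)[1:3] <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")

# Keep only columns whose names contain the literal "mean()" or "std()",
# plus the activity and subject dimensions.
keep <- grep("mean\\(\\)|std\\(\\)", names(complete), value = TRUE)
filtered <- complete[, c(keep, "activity", "subject")]

# Descriptive activity labels as a factor (assumed contents of activity_labels.txt).
activity_names <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                    "SITTING", "STANDING", "LAYING")
filtered$activitylabel <- factor(filtered$activity, levels = 1:6,
                                 labels = activity_names)

# Average every kept variable for each activity/subject pair.
# aggregate() is used here instead of melt() followed by dcast().
tidy <- aggregate(filtered[keep],
                  by = list(activitylabel = filtered$activitylabel,
                            subject = filtered$subject),
                  FUN = mean)
```

Because the parentheses are escaped in the pattern, names such as those containing "meanFreq()" would not match, which is consistent with searching for "mean()" literally.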

Contributors

xaraq
