
ramos-iyer / a-statistical-analysis-of-synthetic-population-data-using-multiple-linear-regression-and-binary-logi



R 100.00%


A-STATISTICAL-ANALYSIS-OF-SYNTHETIC-POPULATION-DATA-USING-MULTIPLE-LINEAR-REGRESSION-AND-BINARY-LOGI

Masters in Data Analytics Project

Project: A STATISTICAL ANALYSIS OF SYNTHETIC POPULATION DATA USING MULTIPLE LINEAR REGRESSION AND BINARY LOGISTIC REGRESSION


Overview

The primary objective of this project is to apply Multiple Linear Regression and Binary Logistic Regression. Using data from the Pew Research Center website, we analyze how well these regression techniques predict a continuous variable and a dichotomous variable, and we discuss the statistical findings. The data comes from the “2016 Online Opt-In Comparison Study”, which contains two data files; this project studies the synthetic population dataset. The research consists of two analyses:

a. Using personal factors such as Age, Gender, Ethnicity, Education, Marital Status, Children, US Citizenship, Income class and Worker class to analyze the number of hours a person works each week.

b. Using political, interpersonal, geographical and cultural factors such as Military Service, Home Ownership, Area, Tenure (more than 1 year or not), trust in neighbors, supported political party, religion and political ideology to analyze gun ownership in the household.

Components

There are three components to this project:

Data Preparation and Transformations

File 'Statistics.r' :

  • Loads the synthetic population dataset.
  • Applies the transformations needed to prepare the data for analysis.
  • Removes the outliers from the data.
  • Divides the data into two tables with different columns, one for each group of factors.
  • Exports the two datasets into 'SPSS' format for model application.
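
The steps above can be sketched in R as follows. This is a minimal illustration, not the contents of 'Statistics.r': the toy data, the column names, the 1.5 × IQR outlier rule and the output file names are all assumptions made for the example, and the source does not say which outlier rule the script actually uses. The export step runs only if the 'haven' package is installed.

```r
# Toy stand-in for the synthetic population data.
# All column names below are hypothetical.
set.seed(42)
n <- 200
pop <- data.frame(
  AGE      = sample(18:80, n, replace = TRUE),
  GENDER   = sample(c("Male", "Female"), n, replace = TRUE),
  WORK_HRS = c(round(rnorm(n - 2, mean = 40, sd = 8)), 150, 160),  # two planted outliers
  OWN_HOME = sample(c("Yes", "No"), n, replace = TRUE),
  OWN_GUN  = sample(c("Yes", "No"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
# With the real file, loading might look like this (path assumed):
# pop <- haven::read_sav("DataSet/synthetic_population_dataset.sav")

# Transformation: store categorical variables as factors.
cat_cols <- c("GENDER", "OWN_HOME", "OWN_GUN")
pop[cat_cols] <- lapply(pop[cat_cols], factor)

# Outlier removal with the 1.5 * IQR rule (an assumed rule, see above).
q <- unname(quantile(pop$WORK_HRS, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
keep <- pop$WORK_HRS >= q[1] - fence & pop$WORK_HRS <= q[2] + fence
pop_clean <- pop[keep, ]

# Split into one table per group of factors.
work_hrs_mlr <- pop_clean[, c("AGE", "GENDER", "WORK_HRS")]
own_gun_blr  <- pop_clean[, c("OWN_HOME", "OWN_GUN")]

# Export to SPSS format (skipped when 'haven' is not available).
if (requireNamespace("haven", quietly = TRUE)) {
  haven::write_sav(work_hrs_mlr, file.path(tempdir(), "WORK_HRS_MLR.sav"))
  haven::write_sav(own_gun_blr,  file.path(tempdir(), "OWN_GUN_BLR.sav"))
}
```

Writing to `tempdir()` keeps the sketch from overwriting the project's real '.sav' files.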

Multiple Linear Regression

File 'WorkHrsMulLinReg.spv' :

  • Loads the WORK HRS MLR.sav dataset.
  • Checks for all the assumptions of Multiple Linear Regression.
  • Evaluates the model performance.
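
For readers without SPSS, a rough R analogue of these checks might look like the sketch below. It is not a reproduction of the SPSS output file: the data is simulated, the variable names are hypothetical, and the particular diagnostics (variance inflation factors, the Durbin-Watson statistic, the Shapiro-Wilk test, Cook's distance) are common choices assumed for illustration. Only base R is used.

```r
# Simulated data; variable names are hypothetical.
set.seed(1)
n <- 300
d <- data.frame(AGE = runif(n, 18, 65), CHILDREN = rpois(n, 1))
d$WORK_HRS <- 30 + 0.2 * d$AGE - 1.5 * d$CHILDREN + rnorm(n, sd = 5)

# Fit the multiple linear regression model.
fit <- lm(WORK_HRS ~ AGE + CHILDREN, data = d)

# Multicollinearity: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on the remaining predictors.
predictors <- c("AGE", "CHILDREN")
vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = d))$r.squared
  1 / (1 - r2)
})

# Independence of errors: Durbin-Watson statistic (values near 2
# suggest no first-order autocorrelation).
e  <- residuals(fit)
dw <- sum(diff(e)^2) / sum(e^2)

# Normality of residuals: Shapiro-Wilk test.
sw <- shapiro.test(e)

# Influential observations: Cook's distance.
cd <- cooks.distance(fit)

# Model performance: adjusted R-squared.
adj_r2 <- summary(fit)$adj.r.squared
```

Plotting `fitted(fit)` against `e` is a quick visual check for linearity and constant variance of the residuals.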

Binary Logistic Regression

File 'OwnGunBinLogReg.spv' :

  • Loads the OWN GUN BLR.sav dataset.
  • Checks for all the assumptions of Binary Logistic Regression.
  • Evaluates the model performance.
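
A comparable model can be fitted in R with `glm`. The sketch below is an assumption-laden illustration rather than the analysis stored in the SPSS file: the data is simulated, the variable names are hypothetical, and McFadden's pseudo R-squared stands in for whichever fit statistics the SPSS output reports.

```r
# Simulated data; variable names are hypothetical.
set.seed(2)
n <- 500
d <- data.frame(
  MILITARY = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.85, 0.15))),
  OWN_HOME = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
lin_pred  <- -1 + 1.0 * (d$MILITARY == "Yes") + 0.8 * (d$OWN_HOME == "Yes")
d$OWN_GUN <- rbinom(n, 1, plogis(lin_pred))

# Fit the binary logistic regression model.
fit <- glm(OWN_GUN ~ MILITARY + OWN_HOME, data = d, family = binomial)

# Odds ratios: exponentiated coefficients.
odds_ratios <- exp(coef(fit))

# Overall model test: likelihood-ratio test against the intercept-only model.
null_fit <- update(fit, . ~ 1)
lr_test  <- anova(null_fit, fit, test = "Chisq")

# McFadden's pseudo R-squared.
mcfadden <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null_fit))

# Classification table and accuracy at a 0.5 probability cutoff.
pred     <- as.integer(fitted(fit) > 0.5)
conf_mat <- table(observed = d$OWN_GUN, predicted = pred)
accuracy <- mean(pred == d$OWN_GUN)
```

The 0.5 cutoff is a conventional default; with an unbalanced outcome, a different cutoff may classify better.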

Running the Code

Open 'Statistics.r' in RStudio and run it either as a whole or line by line. The comments in the code describe what each chunk does to the data.

'WorkHrsMulLinReg.spv' and 'OwnGunBinLogReg.spv' are SPSS files and must be opened in SPSS to see the results and how the analysis was performed.

System Configuration Steps

To run the code, you need the following:

  • R and RStudio: The code in 'Statistics.r' is written in R, so you need to install both R and RStudio to open and execute the file.
  • Packages: The following packages must be installed before running the code:

haven, caret, foreign

  • SPSS: Multiple Linear Regression and Binary Logistic Regression in 'WorkHrsMulLinReg.spv' and 'OwnGunBinLogReg.spv' were carried out in SPSS, so SPSS must be installed to open these files. SPSS is a GUI-based tool for statistical analysis.
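
One way to check for the listed R packages before running the script is sketched below. The variable names are illustrative, and the installation call is left commented out so that nothing is installed without your say-so.

```r
# Packages listed in the requirements above.
required <- c("haven", "caret", "foreign")

# Find which of them are not yet installed.
installed  <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
to_install <- required[!installed]

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(to_install)
} else {
  message("All required packages are installed.")
}
```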

File Descriptions

There are 2 main folders and 3 files in the root directory that are necessary for the project.

Folders:

  1. DataSet:
  • synthetic_population_dataset.sav: This file contains the entire synthetic population dataset in SPSS format.

  2. Cleaned Dataset:
  • OWN GUN BLR.sav: This file contains the dataset prepared for Binary Logistic Regression.
  • WORK HRS MLR.sav: This file contains the dataset prepared for Multiple Linear Regression.

Files:

  1. Statistics.r: This file contains the code for Data Preparation and Transformations.

  2. OwnGunBinLogReg.spv: This file contains the implementation of Binary Logistic Regression in SPSS format.

  3. WorkHrsMulLinReg.spv: This file contains the implementation of Multiple Linear Regression in SPSS format.

Credits and Acknowledgements

  • Pew Research for providing the dataset used for this project.
  • NCI for a challenging project as part of the 'Statistics for Data Analytics' subject in its full-time Masters in Data Analytics course.
