Git Product home page Git Product logo

dsc-in-depth-ab-testing-lab-houston-ds-042219's Introduction

In Depth A/B Testing - Lab

Introduction

In this lab, you'll explore a survey from Kaggle regarding budding data scientists. With this, you'll form some initial hypotheses, and test them using the tools you've acquired to date.

Objectives

You will be able to:

  • Conduct t-tests and an ANOVA on a real-world dataset and interpret the results

Load the Dataset and Perform a Brief Exploration

The data is stored in a file called multipleChoiceResponses_cleaned.csv. Feel free to check out the original dataset referenced at the bottom of this lab, although this cleaned version will undoubtedly be easier to work with. Additionally, meta-data regarding the questions is stored in a file name schema.csv. Load in the data itself as a Pandas DataFrame, and take a moment to briefly get acquainted with it.

Note: If you can't get the file to load properly, try changing the encoding format as in encoding='latin1'

#Your code here

Wages and Education

You've been asked to determine whether education is impactful to salary. Develop a hypothesis test to compare the salaries of those with Master's degrees to those with Bachelor's degrees. Are the two statistically different according to your results?

Note: The relevant features are stored in the 'FormalEducation' and 'AdjustedCompensation' features.

You may import the functions stored in the flatiron_stats.py file to help perform your hypothesis tests. It contains the stats functions that you previously coded: welch_t(a,b), welch_df(a, b), and p_value(a, b, two_sided=False).

Note that scipy.stats.ttest_ind(a, b, equal_var=False) performs a two-sided Welch's t-test and that p-values derived from two-sided tests are two times the p-values derived from one-sided tests. See the documentation for more information.

#Your code here

Wages and Education II

Now perform a similar statistical test comparing the AdjustedCompensation of those with Bachelor's degrees and those with Doctorates. If you haven't already, be sure to explore the distribution of the AdjustedCompensation feature for any anomalies.

#Your code here
Median Values: 
s1:74131.92 
s2:38399.4
Sample sizes: 
s1: 967 
s2: 1107
Welch's t-test p-value: 0.1568238199472023


Repeated Test with Ouliers Removed:
Sample sizes: 
s1: 964 
s2: 1103
Welch's t-test p-value with outliers removed: 0.0

Wages and Education III

Remember the multiple comparisons problem; rather than continuing on like this, perform an ANOVA test between the various 'FormalEducation' categories and their relation to 'AdjustedCompensation'.

#Your code here

Additional Resources

Here's the original source where the data was taken from:
Kaggle Machine Learning & Data Science Survey 2017

Summary

In this lab, you practiced conducting actual hypothesis tests on actual data. From this, you saw how dependent results can be on the initial problem formulation, including preprocessing!

dsc-in-depth-ab-testing-lab-houston-ds-042219's People

Contributors

mathymitchell avatar alexgriff avatar lmcm18 avatar

Watchers

James Cloos avatar Kevin McAlear avatar  avatar Mohawk Greene avatar Victoria Thevenot avatar Belinda Black avatar Bernard Mordan avatar raza jafri avatar  avatar Joe Cardarelli avatar Sara Tibbetts avatar The Learn Team avatar Sophie DeBenedetto avatar  avatar Antoin avatar  avatar  avatar Amanda D'Avria avatar  avatar Scott Ungchusri avatar Nicole Kroese  avatar Lore Dirick avatar Kaeland Chatman avatar Lisa Jiang avatar Vicki Aubin avatar Maxwell Benton avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.