The Probability Density Function - Lab

Introduction

In this lab, we will look at building visualizations known as density plots to estimate the probability density for a given set of data.

Objectives

You will be able to:

Plot and interpret density plots and comment on the shape of the plot
Estimate probabilities for continuous variables by using interpolation

Let's get started

Let's import the necessary libraries for this lab.

# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
# plt.style.use('ggplot')
import pandas as pd
import seaborn as sns

Import the data, and calculate the mean and the standard deviation

Import the dataset 'weight-height.csv' as a pandas dataframe.
Next, calculate the mean and standard deviation of weight and height for men and women separately. You can use the pandas .mean() and .std() methods to do so.

Hint: Use your pandas DataFrame subsetting skills, such as .loc[], .iloc[], and .groupby()

data = None
male_df =  None
female_df =  None

# Male Height mean: 69.02634590621737
# Male Height sd: 2.8633622286606517
# Male Weight mean: 187.0206206581929
# Male Weight sd: 19.781154516763813
# Female Height mean: 63.708773603424916
# Female Height sd: 2.696284015765056
# Female Weight mean: 135.8600930074687
# Female Weight sd: 19.022467805319007
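A minimal sketch of this step, assuming the CSV has columns named Gender, Height, and Weight, with Gender values "Male" and "Female" (check data.columns and data['Gender'].unique() on your copy before relying on it). The helper names split_by_gender, report, and gender_stats are hypothetical, chosen only for illustration.

```python
import pandas as pd


def split_by_gender(data: pd.DataFrame):
    """Return (male_df, female_df) using boolean masks with .loc[]."""
    # Assumed column name and labels: 'Gender' with values 'Male' / 'Female'
    male_df = data.loc[data["Gender"] == "Male"]
    female_df = data.loc[data["Gender"] == "Female"]
    return male_df, female_df


def report(data: pd.DataFrame) -> None:
    """Print the mean and sample standard deviation per gender, in the lab's format."""
    male_df, female_df = split_by_gender(data)
    for label, frame in (("Male", male_df), ("Female", female_df)):
        for col in ("Height", "Weight"):
            # pandas .std() uses ddof=1 (sample standard deviation) by default
            print(f"{label} {col} mean: {frame[col].mean()}")
            print(f"{label} {col} sd: {frame[col].std()}")


def gender_stats(data: pd.DataFrame) -> pd.DataFrame:
    """The same numbers in one table via groupby, indexed by gender."""
    return data.groupby("Gender")[["Height", "Weight"]].agg(["mean", "std"])


# Usage with the lab's file (assumed to be in the working directory):
# data = pd.read_csv("weight-height.csv")
# report(data)
# print(gender_stats(data))
```

The groupby version avoids building separate DataFrames, while the .loc[] masks give you male_df and female_df for the plotting steps that follow.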

Plot histograms (with densities on the y-axis) for male and female heights

Make sure to create overlapping plots
Use 10 bins (bins=10) and set the alpha level so that the overlap is visible

# Your code here
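A minimal matplotlib sketch of the overlapping histograms, assuming the samples are passed in as a dict of label to array. The function name plot_overlapping_hists and the alpha default of 0.7 are illustrative choices, not part of the lab.

```python
import matplotlib

matplotlib.use("Agg")  # only for running as a script without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_overlapping_hists(samples, bins=10, alpha=0.7, xlabel=None, ax=None):
    """Overlay density-normalised histograms, one per entry in `samples`.

    `samples` maps a legend label to a 1-D array of observations.
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 5))
    for label, values in samples.items():
        # density=True scales bar heights so each histogram's total area is 1
        ax.hist(np.asarray(values), bins=bins, density=True, alpha=alpha, label=label)
    if xlabel:
        ax.set_xlabel(xlabel)
    ax.set_ylabel("Density")
    ax.legend()
    return ax


# Usage (the 'Height' column name is an assumption about the CSV):
# plot_overlapping_hists({"Male": male_df["Height"], "Female": female_df["Height"]},
#                        xlabel="Height (inches)")
# plt.show()
```

Because both histograms are normalised to area 1, their heights are comparable even if the two groups had different sizes.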

# Record your observations - are these in line with your personal observations?

# Men tend to be taller than women in general
# Male and female heights overlap most between roughly 65 and 67 inches (about 5 and a half feet)
# Male heights have a slightly larger spread than female heights, so the male height peak is slightly lower than the female one
# Both height distributions look approximately normal

Create a density function using interpolation

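Write a function density() that takes in a sample of a random variable (an array of observations) and estimates its density by interpolating between the midpoints of the histogram bars
Use np.histogram() with density=True
The function should return two lists containing the x and y coordinates for plotting the density function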

def density(x):
    
    pass


# Generate test data and test the function - uncomment to run the test
# np.random.seed(5)
# mu, sigma = 0, 0.1 # mean and standard deviation
# s = np.random.normal(mu, sigma, 100)
# x,y = density(s)
# plt.plot(x,y, label = 'test')
# plt.legend()
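One possible way to fill in the density() stub, sketched under the assumption that "interpolation" means joining the tops of the histogram bars at their midpoints. The bins=10 default is an assumption chosen to match the histogram exercise; the commented test cell above should run against it unchanged.

```python
import numpy as np


def density(x, bins=10):
    """Estimate a density curve from a sample by interpolating a histogram.

    Returns two lists: the bin midpoints (x) and the density-normalised bar
    heights (y). Plotting them as a line joins the bar tops at their centres,
    i.e. piecewise-linear interpolation of the histogram.
    """
    counts, edges = np.histogram(x, bins=bins, density=True)
    midpoints = (edges[:-1] + edges[1:]) / 2
    return midpoints.tolist(), counts.tolist()


# Example: the density at an arbitrary point, interpolated linearly
# between the two nearest bin midpoints
np.random.seed(5)
s = np.random.normal(0, 0.1, 100)
xs, ys = density(s)
print(np.interp(0.05, xs, ys))
```

np.interp requires increasing x coordinates, which the bin midpoints always are. Note that the curve ends at the outermost midpoints, so it does not cover the outer halves of the first and last bins.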

Add overlapping density plots to the histograms plotted earlier

# Your code here
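A self-contained sketch that draws each histogram together with its interpolated density line in a matching colour. It repeats a midpoint-based density() helper so it can run on its own; the function name plot_hist_with_density is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # only for running as a script without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np


def density(x, bins=10):
    """Bin midpoints and density-normalised bar heights of a histogram."""
    counts, edges = np.histogram(x, bins=bins, density=True)
    return ((edges[:-1] + edges[1:]) / 2).tolist(), counts.tolist()


def plot_hist_with_density(samples, bins=10, alpha=0.5, ax=None):
    """Overlay histograms and their interpolated density curves."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 5))
    for i, (label, values) in enumerate(samples.items()):
        color = f"C{i}"  # same colour-cycle entry for the bars and the line
        ax.hist(values, bins=bins, density=True, alpha=alpha, color=color, label=label)
        xs, ys = density(values, bins=bins)
        ax.plot(xs, ys, color=color, label=f"{label} density")
    ax.set_ylabel("Density")
    ax.legend()
    return ax


# Usage (the 'Height' column name is an assumption about the CSV):
# plot_hist_with_density({"Male": male_df["Height"], "Female": female_df["Height"]})
# plt.show()
```

The same call works for the weights exercise by passing the weight columns instead.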

Repeat the above exercise for male and female weights

# Your code here

Write your observations in the cell below

# Record your observations - are these in line with your personal observations?


# What is the takeaway when comparing male and female heights and weights?


# The patterns and overlap are very similar to those of the height distributions
# Men are generally heavier than women
# Male and female weights overlap most around 160 lbs
# Male weight has a slightly larger spread than female weight (i.e. more variation)
# Most women weigh around 130-140 lbs, whereas most men weigh around 180-190 lbs

# Takeaway

# Weight separates males and females more clearly than height does: the gap between the group means is larger relative to the spread within each group
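One way to put a number on this takeaway, offered as an extra check rather than part of the original lab, is a standardised mean difference (Cohen's d): the gap between the group means divided by a pooled standard deviation. The sketch plugs in the means and standard deviations listed earlier in this lab and uses the simple pooled form sqrt((s1² + s2²) / 2), which assumes roughly equal group sizes.

```python
import math


def cohens_d(mean1, sd1, mean2, sd2):
    """Standardised mean difference with a simple pooled SD (equal group sizes assumed)."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd


# Means and standard deviations reported earlier in the lab
d_height = cohens_d(69.02634590621737, 2.8633622286606517,
                    63.708773603424916, 2.696284015765056)
d_weight = cohens_d(187.0206206581929, 19.781154516763813,
                    135.8600930074687, 19.022467805319007)

print(f"Height: d = {d_height:.2f}")  # Height: d = 1.91
print(f"Weight: d = {d_weight:.2f}")  # Weight: d = 2.64
```

The means are about 2.6 pooled standard deviations apart for weight versus about 1.9 for height, which matches the visual impression of less overlap in the weight plots.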

Repeat the above experiments in seaborn and compare with your results

# Code for heights here

# Code for weights here

# Your comments on the two approaches here.
# Are they similar? If they differ, what makes them different?

# Well, what do you think? Overlapping plots, or side by side (or rather top/bottom)?
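One difference worth commenting on: seaborn's density curves (for example sns.kdeplot(values), or sns.histplot(values, stat="density", kde=True) in seaborn 0.11 and later) come from Gaussian kernel density estimation rather than from interpolating histogram bars. The numpy-only sketch below illustrates that idea next to the histogram-midpoint approach. It is not seaborn's code; the bandwidth uses Scott's rule (sample standard deviation times n^(-1/5)), intended to mirror seaborn's default bw_method="scott", and the function names are hypothetical.

```python
import numpy as np


def hist_density(x, bins=10):
    """Histogram-based density: bin midpoints and bar heights (density=True)."""
    counts, edges = np.histogram(x, bins=bins, density=True)
    return (edges[:-1] + edges[1:]) / 2, counts


def gaussian_kde_1d(x, grid):
    """Gaussian kernel density estimate evaluated at each point of `grid`.

    Bandwidth follows Scott's rule: sample std * n ** (-1/5).
    """
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = x.size
    h = x.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per observation, summed and scaled to integrate to 1
    z = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=500)
mids, heights = hist_density(sample)
kde_at_mids = gaussian_kde_1d(sample, mids)
for m, hgt, k in zip(mids, heights, kde_at_mids):
    print(f"x={m:+.2f}  histogram={hgt:.3f}  kde={k:.3f}")
```

The KDE curve is smooth and extends into the tails, while the interpolated histogram is jagged, depends on the number of bins, and stops at the outermost bin midpoints; on the same data the two usually agree on the overall shape.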

Summary

In this lab, you learned how to build probability density curves for a given dataset and compare distributions visually by looking at their center, spread, and overlap. This is a useful EDA technique for answering some initial questions before embarking on a more complex analysis.

olitreadwell / flatiron-probability-density-functions-lab
