In the previous lesson, you learned how you can create a cumulative distribution function for discrete and continuous random variables. In this lab, you'll try to calculate a CDF for a dice roll yourself, and visualize it.
You will be able to:
- Calculate CDF in Python for a given discrete variable with a limited set of possible values
- Visualize and inspect a CDF in order to make assumptions about the underlying data
Recall the formula to calculate the cumulative probability from the previous lesson:
So given a list of all possible values of x, We can easily calculate the CDF for a given possible value
- Build a function
calculate_cdf(lst,X)
, wherelst
is a list of all possible values in a discrete variable$x$ (6 values for a dice roll), and$X$ is the value for which we want to calculate the cumulative distribution function - Initialize a variable called
count
- For all values in
lst
, if a value is less than or equal to$X$ , add one tocount
- do nothing otherwise. (this will tell us the total number of values less than$X$ ) - Calculate the cumulative probability of
$X$ dividingcount
by the total number of possible values - Round by 3 decimals and return the cumulative probability of
$X$
def calculate_cdf(lst, X):
pass
# test data
test_lst = [1,2,3]
test_X = 2
calculate_cdf(test_lst, test_X)
# 0.667
0.667
Now, use this function to calculate a CDF for each value in a dice roll so you can plot it later on.
Perform the following steps in the cell below:
- Create a list
dice_lst
with all possible values of a fair dice - Initialize an empty list
dice_cum
for storing cumulative probabilities for these values. - For each value in the
dice_lst
calculate its cumulative probability using the function above and store indice_cum
list.
dice_lst = None
dice_cum = None
dice_cum
# [0.167, 0.333, 0.5, 0.667, 0.833, 1.0]
[0.167, 0.333, 0.5, 0.667, 0.833, 1.0]
CDFs are implemented with two sorted lists: one list which contains the potential outcome values of your discrete distribution, and another list which contains cumulative probabilities.
Following this, we now have a list of possible values and a second list containing cumulative probabilities for each value. Let's go ahead and plot these values in matplotlib using a bar plot.
- Use
dice_lst
for x-axis anddice_cum
for y-axis
# Your code here
CDFs (and PMFs) can be calculated using built-in NumPy and matplotlib methods. So we don't have create custom functions to calculate these. We can draw a histogram styled CDF as shown below using the following steps
You would need to perform these steps
-
Use
np.histogram()
to automatically calculate the histogram with probabilities. Here is numpy histogram documentation to help you dig deeper. -
Use
plt.scatter()
method withnp.cumsum()
to calculate and plot cumulative probabilities (just like we did above).
# Your code here
In this lesson, we looked at developing a CDF - a percentile function of a discrete random variable. We looked at how to calculate and visualize a CDF. This technique can also be applied to continuous random variables which we shall see later in this section.