Video Gambling Data and Pandas 🧐

Video gambling, in Illinois, was legalized in 2012.

Since then, it has become a boon for local bars, restaurants, and communities with declining tax revenue, but comes with public health concerns of gambling addition. Because of this Video Gambling is a frequent issue voted on by municipal governments in Illinois.

Video Gambling is closely monitored by the Illinois Gaming Board and has been the subject of much reporting by local news agencies.

Let's use our skills with Pandas to investigate this topic.

# Run this cell unchanged

# if you get a long error msg:
# - restart the kernal 
# (click the circular arrow icon right below the tab for the notebook)
# - make a new cell above this one and import pandas as pd there

import pandas as pd
from IPython.display import display, Markdown

def markdown(text):
    display(Markdown(text))

#used for tests
#testing
from test_scripts.test_class import Test
import numpy as np

testing = Test()

Our data is located within the data folder of this repo.

It is titled 2019-il-vgambling.csv

In the cell below:

Set the path variable to the path ./data/2019-il-vgambling.csv.
- (reverse the slashes if you're on Windows)
Run the cell to import our dataset.

path = None
data = pd.read_csv(path)

Ok, let's print out the first 5 rows using the .head() method.

# Your code here

Column Descriptions

Column Name	Description
Municipality	The community's name
Establishment Count	Number of businesses with video gambling licenses
Terminal Count	Number of video gambling machines in the community.
Amount Played	Total amount spent on video gambling by players.
Amount Won	Total amount won by players
Nti Tax	The Net Terminal Income Tax Rate is 30% of the Net Terminal Income. The funds are divided between the State of Illinois and local governmental organizations.
State Share	Total revenue received by the State Government
Municipality Share	Total revenue received by the Municipality

When examining data at the municipal level, it is common to scale our data according to the municipality's population.

This is often referred to as scaling our data per capita.

To do this let's import some population data for Illinois Municipalities.

In the cell below:

Set the path variable to the path for the population.csv file within the data folder.
Run the cell to import our population data.

path = None
pop = pd.read_csv(path)

Cool Cool, let's print out the first 5 rows using the .head() method.

# Your Code here

Let's remove the Unnamed: 0 column.

pop.drop('Unnamed: 0', axis = 1, inplace = True)

We need to merge our two datasets.

When merging datasets, it's important to check the length of our datasets before and after merging to make sure we are not losing too much data.

In the cell below:

Set the variable length_before_merge to the length of our data dataframe using python's built in len function

# Your code here
length_before_merge = None


# Run Code below without change
string = '''<u>Length before merge:</u> **{}**'''.format(length_before_merge)
markdown(string)

Merge Time

In the cell below:

Merge the two dataframes on the Municipality column.
- Save the merged dataframe as the variable df

# Your code here

Now we need to check the length of our dataframe to make sure we didn't lose data!

In the cell below:

Set the length_after_merge variable to the length of df.

# Your code here
length_after_merge = None


# Run Code below without change
string = '''<u>Length after merge:</u> **{}**'''.format(length_after_merge,)
markdown(string)

In the cell below, set the Municipality column as the index using the .set_index() method.

# Your code here

Let's sort our index alphabetically using this method.

# Your code here

To make things easier on ourselves, let's reformat our column names.

In the cell below:

Replace spaces with underscores for each column name
Lower each column name

Bonus points if you do this via list comphrension 😃

# Your code here

So much cleaning

Ok Ok, we're almost done formatting our data.

In the cell below:

Print out the datatypes for each of our columns using the .info() method.

# Your code here

Our population column contains commas which is causing the computer to interpret the column as a string.

In the cell below:

Remove the commas from the column using the .apply method

If your confused: Find the answer relating to .apply() in this Stack Overflow thread.

Convert the column datatype to integer
- Bonus points if you can do steps 1 & 2 with 1️⃣ line of code! 😻

# Your code here

Cleaning Complete!

Ok Ok!

Let's create a column that shows the number of gambling terminals per capita!

In the cell below:

Create a new column called terminals_percapita by dividing terminal_count by population

# Your code here

Now let's identify which communities have the highest number of gambling devices per capita.

In the cell(s) below:

Sort the dataframe according to the terminals_percapita column.
Identify the 10 communities with the highest number of gambling machines per capita.
Save those 10 community names in a list called highest_machines_percapita

# Your code here

Run the cell below to see if you identified the correct Municipalities!

testing.run_test(highest_machines_percapita, 'highest_machines_percapita')

Next, let's figure out how much money players lost for each municipality.

In the cell below:

Create a new column called amount_lost that is the difference between the amount_played and amount_won columns

# Your code here

In the cell below:

Save the mean of the amount_loss column as the variable average_loss.
Using numpy, round the average_loss variable to 2 decimal points.
- Save the rounded number as the variable average_loss_rounded

# Your code here

average_loss = None
average_loss_rounded = None

Let's zoom in on this new loss data.

In the cell below:

Create a new column called loss_percapita that is the division of the amount_lost and population

# Your code here

In the cell below

Sort the dataframe by loss_percapita and save the 10 communities with the highest loss per capita to a list called highest_loss_percapita

# Your code here
highest_loss_percapita = None

Run the cell below to see if you idenitified the correct municipalities!

testing.run_test(highest_loss_percapita, 'highest_loss_percapita')

In the cell below:

Filter our dataframe to contain municipalities with a loss_percapita of 406 or greater.
- Save this filtered dataframe as high_loss_percapita
Filter our dataframe to contain municipalities with a loss_percapita of 155 or less.
- Save this filtered dataframe as low_loss_percapita
Identify the mean population for the municipalities with a high per capita loss
- Using numpy, round this data point to 2 decimals
- Save this data point as the variable high_loss_average_population
Identify the mean population for the municipalities with a low per capita loss.
- Using numpy round this data point to 2 decimals
- Save this data point as the variable low_loss_average_population

# Your code here

Run the cell below to see if you identified the correct averages!

high_result = testing.run_test(high_loss_average_population, 'high_loss_average_population')
low_result = testing.run_test(low_loss_average_population, 'low_loss_average_population')

christosmaglaras / warmup-pandas-1 Goto Github PK

warmup-pandas-1's Introduction

Video Gambling Data and Pandas 🧐

Video gambling, in Illinois, was legalized in 2012.

Column Descriptions

So much cleaning

Cleaning Complete!

Ok Ok!

warmup-pandas-1's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent