Since then, it has become a boon for local bars, restaurants, and communities with declining tax revenue, but comes with public health concerns of gambling addition. Because of this Video Gambling is a frequent issue voted on by municipal governments in Illinois.
Video Gambling is closely monitored by the Illinois Gaming Board and has been the subject of much reporting by local news agencies.
Let's use our skills with Pandas to investigate this topic.
# Run this cell unchanged
# if you get a long error msg:
# - restart the kernal
# (click the circular arrow icon right below the tab for the notebook)
# - make a new cell above this one and import pandas as pd there
import pandas as pd
from IPython.display import display, Markdown
def markdown(text):
display(Markdown(text))
#used for tests
#testing
from test_scripts.test_class import Test
import numpy as np
testing = Test()
Our data is located within the data
folder of this repo.
It is titled 2019-il-vgambling.csv
In the cell below:
- Set the
path
variable to the path./data/2019-il-vgambling.csv
.- (reverse the slashes if you're on Windows)
- Run the cell to import our dataset.
path = None
data = pd.read_csv(path)
Ok, let's print out the first 5 rows using the .head()
method.
# Your code here
Column Name | Description |
---|---|
Municipality | The community's name |
Establishment Count | Number of businesses with video gambling licenses |
Terminal Count | Number of video gambling machines in the community. |
Amount Played | Total amount spent on video gambling by players. |
Amount Won | Total amount won by players |
Nti Tax | The Net Terminal Income Tax Rate is 30% of the Net Terminal Income. The funds are divided between the State of Illinois and local governmental organizations. |
State Share | Total revenue received by the State Government |
Municipality Share | Total revenue received by the Municipality |
When examining data at the municipal level, it is common to scale our data according to the municipality's population.
This is often referred to as scaling our data per capita.
To do this let's import some population data for Illinois Municipalities.
In the cell below:
- Set the
path
variable to the path for thepopulation.csv
file within thedata
folder. - Run the cell to import our population data.
path = None
pop = pd.read_csv(path)
Cool Cool, let's print out the first 5 rows using the .head()
method.
# Your Code here
Let's remove the Unnamed: 0
column.
pop.drop('Unnamed: 0', axis = 1, inplace = True)
We need to merge our two datasets.
When merging datasets, it's important to check the length of our datasets before and after merging to make sure we are not losing too much data.
In the cell below:
- Set the variable
length_before_merge
to the length of ourdata
dataframe using python's built inlen
function
# Your code here
length_before_merge = None
# Run Code below without change
string = '''<u>Length before merge:</u> **{}**'''.format(length_before_merge)
markdown(string)
Merge Time
In the cell below:
- Merge the two dataframes on the
Municipality
column.- Save the merged dataframe as the variable
df
- Save the merged dataframe as the variable
# Your code here
Now we need to check the length of our dataframe to make sure we didn't lose data!
In the cell below:
- Set the
length_after_merge
variable to the length ofdf
.
# Your code here
length_after_merge = None
# Run Code below without change
string = '''<u>Length after merge:</u> **{}**'''.format(length_after_merge,)
markdown(string)
In the cell below, set the Municipality column as the index using the .set_index()
method.
# Your code here
Let's sort our index alphabetically using this method.
# Your code here
To make things easier on ourselves, let's reformat our column names.
In the cell below:
- Replace spaces with underscores for each column name
- Lower each column name
Bonus points if you do this via list comphrension ๐
# Your code here
Ok Ok, we're almost done formatting our data.
In the cell below:
- Print out the datatypes for each of our columns using the
.info()
method.
# Your code here
Our population
column contains commas which is causing the computer to interpret the column as a string.
In the cell below:
- Remove the commas from the column using the
.apply
method
If your confused: Find the answer relating to
.apply()
in this Stack Overflow thread.
-
Convert the column datatype to integer
- Bonus points if you can do steps 1 & 2 with 1๏ธโฃ line of code! ๐ป
# Your code here
Let's create a column that shows the number of gambling terminals per capita!
In the cell below:
- Create a new column called
terminals_percapita
by dividingterminal_count
bypopulation
# Your code here
Now let's identify which communities have the highest number of gambling devices per capita.
In the cell(s) below:
- Sort the dataframe according to the
terminals_percapita
column. - Identify the 10 communities with the highest number of gambling machines per capita.
- Save those 10 community names in a list called
highest_machines_percapita
# Your code here
Run the cell below to see if you identified the correct Municipalities!
testing.run_test(highest_machines_percapita, 'highest_machines_percapita')
Next, let's figure out how much money players lost for each municipality.
In the cell below:
- Create a new column called
amount_lost
that is the difference between theamount_played
andamount_won
columns
# Your code here
In the cell below:
- Save the mean of the
amount_loss
column as the variableaverage_loss
. - Using numpy, round the
average_loss
variable to 2 decimal points.- Save the rounded number as the variable
average_loss_rounded
- Save the rounded number as the variable
# Your code here
average_loss = None
average_loss_rounded = None
Let's zoom in on this new loss data.
In the cell below:
- Create a new column called
loss_percapita
that is the division of theamount_lost
andpopulation
# Your code here
In the cell below
- Sort the dataframe by
loss_percapita
and save the 10 communities with the highest loss per capita to a list calledhighest_loss_percapita
# Your code here
highest_loss_percapita = None
Run the cell below to see if you idenitified the correct municipalities!
testing.run_test(highest_loss_percapita, 'highest_loss_percapita')
In the cell below:
-
Filter our dataframe to contain municipalities with a
loss_percapita
of 406 or greater.- Save this filtered dataframe as
high_loss_percapita
- Save this filtered dataframe as
-
Filter our dataframe to contain municipalities with a
loss_percapita
of 155 or less.- Save this filtered dataframe as
low_loss_percapita
- Save this filtered dataframe as
-
Identify the mean population for the municipalities with a high per capita loss
- Using numpy, round this data point to 2 decimals
- Save this data point as the variable
high_loss_average_population
-
Identify the mean population for the municipalities with a low per capita loss.
- Using numpy round this data point to 2 decimals
- Save this data point as the variable
low_loss_average_population
# Your code here
Run the cell below to see if you identified the correct averages!
high_result = testing.run_test(high_loss_average_population, 'high_loss_average_population')
low_result = testing.run_test(low_loss_average_population, 'low_loss_average_population')