warmup-pandas-1's Introduction

Video Gambling Data and Pandas

Video gambling was legalized in Illinois in 2012.

Since then, it has become a boon for local bars, restaurants, and communities with declining tax revenue, but it comes with public health concerns about gambling addiction. Because of this, video gambling is a frequent issue voted on by municipal governments in Illinois.

Video Gambling is closely monitored by the Illinois Gaming Board and has been the subject of much reporting by local news agencies.

Let's use our skills with Pandas to investigate this topic.

# Run this cell unchanged

# if you get a long error msg:
# - restart the kernel
# (click the circular arrow icon right below the tab for the notebook)
# - make a new cell above this one and import pandas as pd there

import pandas as pd
from IPython.display import display, Markdown

def markdown(text):
    display(Markdown(text))

#used for tests
#testing
from test_scripts.test_class import Test
import numpy as np

testing = Test()

Our data is located within the data folder of this repo.

It is titled 2019-il-vgambling.csv

In the cell below:

  1. Set the path variable to the path ./data/2019-il-vgambling.csv.
    • (reverse the slashes if you're on Windows)
  2. Run the cell to import our dataset.
path = None
data = pd.read_csv(path)
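
As a rough illustration of loading a CSV by path, the sketch below builds the path with `pathlib`, which picks the right separator for the current operating system so the slash direction does not need to be changed by hand. The folder, file name, and contents here are made up so that the example runs anywhere; they are not the repo's actual data.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for a data folder, created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    csv_path = data_dir / "example.csv"  # hypothetical file name
    csv_path.write_text("Municipality,Terminal Count\nAlpha,10\nBeta,4\n")

    # pathlib joins the path parts with the correct separator for this OS.
    example = pd.read_csv(csv_path)

# .head() shows the first 5 rows by default (this toy table only has 2).
print(example.head())
```

`pd.read_csv` accepts either a string or a `Path` object, so the same pattern should work with a relative path such as `Path("data") / "some_file.csv"`.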

Ok, let's print out the first 5 rows using the .head() method.

# Your code here

Column Descriptions

| Column Name | Description |
| --- | --- |
| Municipality | The community's name |
| Establishment Count | Number of businesses with video gambling licenses |
| Terminal Count | Number of video gambling machines in the community |
| Amount Played | Total amount spent on video gambling by players |
| Amount Won | Total amount won by players |
| Nti Tax | The Net Terminal Income Tax Rate is 30% of the Net Terminal Income. The funds are divided between the State of Illinois and local governmental organizations. |
| State Share | Total revenue received by the State Government |
| Municipality Share | Total revenue received by the Municipality |

When examining data at the municipal level, it is common to scale our data according to the municipality's population.

This is often referred to as scaling our data per capita.

To do this let's import some population data for Illinois Municipalities.

In the cell below:

  1. Set the path variable to the path for the population.csv file within the data folder.
  2. Run the cell to import our population data.
path = None
pop = pd.read_csv(path)

Cool Cool, let's print out the first 5 rows using the .head() method.

# Your Code here

Let's remove the Unnamed: 0 column.

pop.drop('Unnamed: 0', axis = 1, inplace = True)

We need to merge our two datasets.

When merging datasets, it's important to check the length of our datasets before and after merging to make sure we are not losing too much data.

In the cell below:

  1. Set the variable length_before_merge to the length of our data dataframe using Python's built-in len function.
# Your code here
length_before_merge = None


# Run Code below without change
string = '''<u>Length before merge:</u> **{}**'''.format(length_before_merge)
markdown(string)

Merge Time

In the cell below:

  1. Merge the two dataframes on the Municipality column.
    • Save the merged dataframe as the variable df
# Your code here

Now we need to check the length of our dataframe to make sure we didn't lose data!

In the cell below:

  1. Set the length_after_merge variable to the length of df.
# Your code here
length_after_merge = None


# Run Code below without change
string = '''<u>Length after merge:</u> **{}**'''.format(length_after_merge,)
markdown(string)
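
To make the before-and-after length check concrete, here is a small sketch on two made-up tables (the names `left`, `right`, and all values are hypothetical). It assumes the default inner merge, which keeps only rows whose key appears in both tables, and then uses an outer merge with `indicator=True` to list the rows that did not match.

```python
import pandas as pd

# Toy stand-ins for the gambling and population tables (made-up values).
left = pd.DataFrame({"Municipality": ["Alpha", "Beta", "Gamma"],
                     "Terminal Count": [10, 4, 7]})
right = pd.DataFrame({"Municipality": ["Alpha", "Beta", "Delta"],
                      "Population": [1200, 800, 5000]})

rows_before = len(left)

# The default how="inner" keeps only municipalities present in both tables.
merged = left.merge(right, on="Municipality")
rows_after = len(merged)

# An outer merge with indicator=True adds a "_merge" column showing
# whether each row came from the left table, the right table, or both.
audit = left.merge(right, on="Municipality", how="outer", indicator=True)
unmatched = audit[audit["_merge"] != "both"]

print(rows_before, rows_after)
print(unmatched[["Municipality", "_merge"]])
```

If the length drops sharply after a merge, inspecting the unmatched rows this way often reveals spelling or formatting differences in the key column.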

In the cell below, set the Municipality column as the index using the .set_index() method.

# Your code here

Let's sort our index alphabetically using the .sort_index() method.

# Your code here
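
A minimal sketch of setting and sorting an index, using a made-up table rather than the exercise data. Both `.set_index()` and `.sort_index()` return a new dataframe by default, so the result is assigned back to the variable.

```python
import pandas as pd

# Toy table with made-up values.
toy = pd.DataFrame({"Municipality": ["Gamma", "Alpha", "Beta"],
                    "Terminal Count": [7, 10, 4]})

# Move the Municipality column into the index.
toy = toy.set_index("Municipality")

# Sort the rows alphabetically by index label.
toy = toy.sort_index()

print(toy)
```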

To make things easier on ourselves, let's reformat our column names.

In the cell below:

  1. Replace spaces with underscores for each column name
  2. Convert each column name to lowercase

Bonus points if you do this via list comprehension

# Your code here
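
One possible way to reformat column names with a list comprehension, shown on an empty made-up dataframe whose column names mirror a few from the description table above:

```python
import pandas as pd

# Empty toy dataframe; only the column names matter here.
toy = pd.DataFrame(columns=["Terminal Count", "Amount Played", "Nti Tax"])

# Replace spaces with underscores and lowercase each name in one pass.
toy.columns = [name.replace(" ", "_").lower() for name in toy.columns]

print(list(toy.columns))
```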

So much cleaning


Ok Ok, we're almost done formatting our data.

In the cell below:

  1. Print out the datatypes for each of our columns using the .info() method.
# Your code here

Our population column contains commas, which causes pandas to read the column as strings instead of numbers.

In the cell below:

  1. Remove the commas from the column using the .apply method

If you're confused: find the answer relating to .apply() in this Stack Overflow thread.

  2. Convert the column datatype to integer

    • Bonus points if you can do steps 1 & 2 with 1 line of code!
# Your code here
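
A small sketch of the comma-stripping idea on made-up values. It chains `.apply()` with `.astype(int)` so both steps happen in one statement; the column name and numbers are illustrative only.

```python
import pandas as pd

# Toy column of population figures stored as strings with thousands separators.
toy = pd.DataFrame({"population": ["1,200", "800", "5,000"]})

# Strip the commas from each string, then convert the column to integers.
toy["population"] = (
    toy["population"].apply(lambda value: value.replace(",", "")).astype(int)
)

print(toy.dtypes)
```

As an aside, `pd.read_csv` also accepts a `thousands=","` argument that handles this at load time, though the exercise asks for `.apply()`.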

Cleaning Complete!

Ok Ok!

Let's create a column that shows the number of gambling terminals per capita!

In the cell below:

  1. Create a new column called terminals_percapita by dividing terminal_count by population
# Your code here
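
Per capita scaling is just an element-wise division of one column by another. The sketch below uses made-up numbers to show the arithmetic: 10 terminals in a town of 1,000 people is 0.01 terminals per person.

```python
import pandas as pd

# Toy table with made-up counts and populations.
toy = pd.DataFrame({"terminal_count": [10, 4], "population": [1000, 800]},
                   index=["Alpha", "Beta"])

# Dividing two columns operates row by row and returns a new Series.
toy["terminals_percapita"] = toy["terminal_count"] / toy["population"]

print(toy)
```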

Now let's identify which communities have the highest number of gambling devices per capita.

In the cell(s) below:

  1. Sort the dataframe according to the terminals_percapita column.
  2. Identify the 10 communities with the highest number of gambling machines per capita.
  3. Save those 10 community names in a list called highest_machines_percapita
# Your code here
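
The sort-then-select pattern can be sketched on a toy table as follows. The values are made up, and the example keeps the top 2 rows instead of the top 10; it assumes the community names live in the index, as they do after the `.set_index()` step.

```python
import pandas as pd

# Toy table indexed by community name (made-up values).
toy = pd.DataFrame(
    {"terminals_percapita": [0.010, 0.005, 0.020, 0.001]},
    index=pd.Index(["Alpha", "Beta", "Gamma", "Delta"], name="municipality"),
)

# Sort from highest to lowest, then take the first rows.
ranked = toy.sort_values("terminals_percapita", ascending=False)

# The index holds the names, so convert it to a plain Python list.
top_two = ranked.head(2).index.tolist()

print(top_two)
```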

Run the cell below to see if you identified the correct Municipalities!

testing.run_test(highest_machines_percapita, 'highest_machines_percapita')

Next, let's figure out how much money players lost for each municipality.

In the cell below:

  1. Create a new column called amount_lost that is the difference between the amount_played and amount_won columns
# Your code here

In the cell below:

  1. Save the mean of the amount_lost column as the variable average_loss.
  2. Using numpy, round the average_loss variable to 2 decimal places.
    • Save the rounded number as the variable average_loss_rounded
# Your code here

average_loss = None
average_loss_rounded = None
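
For illustration, the difference, mean, and rounding steps look like this on a made-up table. The variable names `mean_loss` and `mean_loss_rounded` are hypothetical and chosen to avoid clashing with the exercise variables.

```python
import numpy as np
import pandas as pd

# Toy amounts (made-up values).
toy = pd.DataFrame({"amount_played": [100.0, 250.5, 80.25],
                    "amount_won": [90.0, 200.0, 70.0]})

# Players' losses are what they played minus what they won back.
toy["amount_lost"] = toy["amount_played"] - toy["amount_won"]

# Losses are 10.0, 50.5, and 10.25, so the mean is 70.75 / 3.
mean_loss = toy["amount_lost"].mean()

# np.round takes the value and the number of decimal places.
mean_loss_rounded = np.round(mean_loss, 2)

print(mean_loss_rounded)
```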

Let's zoom in on this new loss data.

In the cell below:

  1. Create a new column called loss_percapita by dividing amount_lost by population
# Your code here

In the cell below:

  1. Sort the dataframe by loss_percapita and save the 10 communities with the highest loss per capita to a list called highest_loss_percapita
# Your code here
highest_loss_percapita = None

Run the cell below to see if you identified the correct municipalities!

testing.run_test(highest_loss_percapita, 'highest_loss_percapita')

In the cell below:

  1. Filter our dataframe to contain municipalities with a loss_percapita of 406 or greater.

    • Save this filtered dataframe as high_loss_percapita
  2. Filter our dataframe to contain municipalities with a loss_percapita of 155 or less.

    • Save this filtered dataframe as low_loss_percapita
  3. Identify the mean population for the municipalities with a high per capita loss

    • Using numpy, round this data point to 2 decimals
    • Save this data point as the variable high_loss_average_population
  4. Identify the mean population for the municipalities with a low per capita loss.

    • Using numpy round this data point to 2 decimals
    • Save this data point as the variable low_loss_average_population
# Your code here
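
A sketch of boolean filtering followed by a rounded mean, on made-up data. The thresholds 406 and 155 come from the instructions above; the table values and the names `high`, `low`, `high_mean_pop`, and `low_mean_pop` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy table with made-up populations and per capita losses.
toy = pd.DataFrame({"population": [1000, 800, 50000, 120000],
                    "loss_percapita": [500.0, 410.0, 150.0, 90.0]})

# A comparison on a column yields a boolean mask; indexing with it keeps
# only the rows where the mask is True.
high = toy[toy["loss_percapita"] >= 406]
low = toy[toy["loss_percapita"] <= 155]

# Mean population within each filtered group, rounded to 2 decimals.
high_mean_pop = np.round(high["population"].mean(), 2)
low_mean_pop = np.round(low["population"].mean(), 2)

print(high_mean_pop, low_mean_pop)
```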

Run the cell below to see if you identified the correct averages!

high_result = testing.run_test(high_loss_average_population, 'high_loss_average_population')
low_result = testing.run_test(low_loss_average_population, 'low_loss_average_population')

warmup-pandas-1's People

Contributors

joelsewhere, ben-oren
