data-science-for-good's Introduction

Data-Science-for-Good: How to measure justice?

The Center for Policing Equity (CPE) hosts a Kaggle competition in which participants work together to build fairer and more just systems. The competition poses two main questions: How do you measure justice? And how do you solve the problem of racism in policing? In this project, I look for factors that drive racial disparities in policing by analyzing census data and police department deployment data.

1. Dataset

To begin with, let's look at the datasets we are provided.

In the Cpe-data folder, the datasets are categorized by region and state.

Take Dept_11-00091 as an example: it is the dataset folder from Massachusetts. The three main datasets are police shapefiles (police district geometry data), ACS data (census data covering gender, race, poverty, education level, employment status, etc.), and police crime records (provided by police departments across the United States, and described as the first and largest collection of standardized police behavioral data).

Prerequisite Knowledge

Note that ACS data is census data reported per census tract, an area roughly equivalent to a neighborhood that the Census Bureau establishes for analyzing populations. Census tracts generally encompass a population between 2,500 and 8,000 people. The Census Bureau describes them as "relatively permanent", but they do change over time.

Next, we have to consider how to combine census data, which is organized by census tract, with the geometry data we are given, which is organized by police district. Census tracts and police districts are different geographic units, so their boundaries do not line up. To solve this problem, we need to pull supplementary data from the US Census Bureau.

Luckily, we can easily download shapefiles for census tracts from the US Census Bureau's website: https://www.census.gov/geo/maps-data/data/cbf/cbf_tracts.html

The next thing that might be new to us is the shapefile. Without getting into too many details, shapefiles contain geospatial information (e.g. the shape of a U.S. county's boundaries within some coordinate system, stored in a .shp file) and attributes about the geographical entity. Shapefiles can be read using GeoPandas.

Each shapefile has its own Coordinate Reference System (CRS). To map different shapefiles onto the same coordinates, we have to standardize their CRS before mapping.
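As a minimal sketch of that CRS standardization step, the snippet below builds two tiny layers in memory and reprojects one into the CRS of the other with GeoPandas. The coordinates, column names, and EPSG codes are illustrative assumptions, not values taken from the project data; in practice each layer would be loaded from its .shp file with `gpd.read_file`.

```python
import geopandas as gpd
from shapely.geometry import box

# Two toy layers standing in for the real shapefiles (assumed values).
districts = gpd.GeoDataFrame(
    {"district": ["D1"]},
    geometry=[box(-71.10, 42.30, -71.00, 42.40)],
    crs="EPSG:4326",  # WGS 84, assumed for the police district layer
)
tracts = gpd.GeoDataFrame(
    {"tract": ["T1"]},
    geometry=[box(-71.08, 42.32, -71.02, 42.38)],
    crs="EPSG:4269",  # NAD83, assumed for the census tract layer
)

# Reproject the tracts into the districts' CRS so both layers share coordinates.
tracts_aligned = tracts.to_crs(districts.crs)
print(tracts_aligned.crs == districts.crs)
```

Reprojecting one layer to match the other (rather than hard-coding a target CRS) keeps the two layers consistent whatever CRS the source files happen to use.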

Clean and Select Data

In the police crime record data, some of our target features, such as race, contain unknown or empty values. We have to replace them with NaN values.
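A minimal sketch of that cleaning step with pandas is shown below. The column names and the list of placeholder strings are assumptions for illustration; the actual records may label missing values differently.

```python
import pandas as pd

# Toy records; "subject_race" is a hypothetical column name.
records = pd.DataFrame({
    "incident_id": [1, 2, 3, 4],
    "subject_race": ["Black", "Unknown", "  ", "White"],
})

# Strings that should count as missing (assumed list), plus the empty string.
placeholders = ["Unknown", "UNKNOWN", "N/A", ""]

cleaned = records.copy()
stripped = cleaned["subject_race"].str.strip()
# mask() replaces every value where the condition is True with NaN.
cleaned["subject_race"] = stripped.mask(stripped.isin(placeholders))

print(cleaned["subject_race"].isna().sum())
```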

In the ACS data, we need to select target features that are likely to relate to crime. The ACS data is large and spread across many tables, so it is simpler to work with the target features in a single DataFrame. We therefore extract the target features from the different DataFrames and merge them into one DataFrame.
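One way to sketch that extract-and-merge step is to keep one small table per topic and fold them together on a shared tract identifier. The table contents, the column names, and the `tract_id` key below are made up for illustration.

```python
from functools import reduce

import pandas as pd

# One toy table per ACS topic, all keyed by a hypothetical "tract_id" column.
race = pd.DataFrame({
    "tract_id": ["T1", "T2"],
    "pop_white": [1800, 900],
    "pop_black": [400, 1500],
})
poverty = pd.DataFrame({"tract_id": ["T1", "T2"], "pop_below_poverty": [300, 700]})
education = pd.DataFrame({"tract_id": ["T1", "T2"], "pop_bachelors": [900, 350]})

# Merge the tables pairwise on the shared key to get one wide DataFrame.
tables = [race, poverty, education]
acs_features = reduce(
    lambda left, right: left.merge(right, on="tract_id", how="inner"),
    tables,
)
print(acs_features.columns.tolist())
```

An inner join keeps only tracts present in every table; an outer join would be the choice if tracts missing from some tables should be retained.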


Data Aggregation

The first aggregation step is to merge the police crime records with the police district shapefiles on their common key, which gives us police crime data that also carries the police district geometry.
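The merge on a common key can be sketched as follows, assuming both tables share a column named `district` (a hypothetical name) and using square polygons in place of real district boundaries.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Toy crime records, each tagged with the district it occurred in.
records = pd.DataFrame({
    "incident_id": [1, 2, 3, 4, 5],
    "district": ["D1", "D1", "D2", "D2", "D2"],
})

# Toy district layer; real geometry would come from the police shapefile.
districts = gpd.GeoDataFrame(
    {"district": ["D1", "D2"]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)],
)

# Calling merge on the GeoDataFrame keeps the result a GeoDataFrame,
# so every record now carries its district's geometry.
records_geo = districts.merge(records, on="district")

# Number of records per district, useful for later plots.
incident_counts = records_geo.groupby("district").size()
print(incident_counts)
```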

To aggregate this police crime data (with its police district geometry) with the census tract shapefiles, we have to figure out the geometric relationship between police districts and census tracts. [image]

By running the code above, we can visualize the relationship between police districts and census tracts, and compute the percentage that each census tract contributes to each police district.
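A rough sketch of how such percentages could be computed is given below, using plain Shapely geometries (the geometry type GeoPandas stores) and toy rectangles. It is not the project's own code: the shapes and identifiers are invented, and with real data the areas should be measured in a projected CRS rather than in degrees.

```python
from shapely.geometry import box

# Toy polygons standing in for district and tract boundaries.
districts = {"D1": box(0, 0, 2, 2), "D2": box(2, 0, 4, 2)}
tracts = {"T1": box(1, 0, 3, 1), "T2": box(3, 1, 4, 2)}


def overlap_shares(tracts, districts):
    """Return {(tract_id, district_id): fraction of the tract's area in the district}."""
    shares = {}
    for tract_id, tract in tracts.items():
        for district_id, district in districts.items():
            overlap_area = tract.intersection(district).area
            if overlap_area > 0:
                shares[(tract_id, district_id)] = overlap_area / tract.area
    return shares


shares = overlap_shares(tracts, districts)
for key, share in sorted(shares.items()):
    print(key, round(share, 3))
```

Here tract T1 straddles both districts while T2 lies entirely inside D2, so T1's share is split and T2's is not.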

Then we can combine the police crime data (with its police district geometry) with the census tract shapefiles.
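Once each tract's share in each district is known, tract-level census counts can be rolled up to districts. The sketch below assumes people are spread evenly over a tract's area, which is a simplification, and uses invented shares and populations.

```python
import pandas as pd

# Fraction of each tract's area that falls in each district (assumed values).
shares = pd.DataFrame({
    "tract": ["T1", "T1", "T2"],
    "district": ["D1", "D2", "D2"],
    "share": [0.5, 0.5, 1.0],
})

# Tract-level census counts (made-up numbers).
tract_pop = pd.DataFrame({"tract": ["T1", "T2"], "population": [4000, 3000]})

# Weight each tract's population by its share, then sum per district.
weighted = shares.merge(tract_pop, on="tract")
weighted["population"] = weighted["population"] * weighted["share"]
district_pop = weighted.groupby("district", as_index=False)["population"].sum()
print(district_pop)
```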

With what we have aggregated so far, we can merge the different DataFrames into one and use it for visualization.

Data Visualization

What we can visualize with the new police crime data with police district geometry information (for more plots for other categories, please check out the code "CleanAndVisualizeMA"): [image]

What we can visualize with the census data for police districts (for more plots for other categories, please check out the code "CleanAndVisualizeMA"): [image]

Correlation

From the visualization results above, we can get a rough idea of how crime records relate to the target features we chose. But to measure equity in policing specifically and mathematically, we use a statistical tool: correlation.

Here, I correlate the population counts for the different target features (race, gender, age, poverty level, education, employment status) from the ACS data with the number of crime records in each police district. This is the correlation I found:

[image: correlation results]

Note: the correlation above doesn't necessarily mean that any one feature causes the crimes, but it can be a good reference for police departments inspecting their policing behavior.
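The correlation step can be sketched with pandas as shown below. The numbers are synthetic and only illustrate the mechanics; they are not the project's results. The choice of Pearson correlation and the column names are also assumptions.

```python
import pandas as pd

# Synthetic per-district table: crime record counts plus ACS population counts.
district_data = pd.DataFrame({
    "district": ["D1", "D2", "D3", "D4", "D5"],
    "incidents": [120, 80, 150, 60, 100],
    "pop_below_poverty": [900, 600, 1100, 500, 800],
    "pop_bachelors": [300, 500, 200, 600, 400],
})

# Pearson correlation of every feature with the incident count.
correlations = (
    district_data.drop(columns=["district"])
    .corr(method="pearson")["incidents"]
    .drop("incidents")
)
print(correlations.sort_values(ascending=False))
```

With only a handful of districts, coefficients like these are very sensitive to individual rows, so they are better read as pointers for further inspection than as firm measurements.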

data-science-for-good's People

Contributors

connorcheng2
