The Center for Policing Equity (CPE) hosted a Kaggle competition for community trainers working together to build fairer and more just systems. The competition poses two main questions: how do you measure justice, and how do you solve the problem of racism in policing? In this project, I look for factors that drive racial disparities in policing by analyzing census data and police department deployment data.
To begin with, let's look at the datasets we are provided.
In the Cpe-data folder, the data is categorized by region and state.
Take Dept_11-00091 as an example. It is the dataset folder for Massachusetts, and it contains three main datasets: police shapefiles (police district geometry data), ACS data (census data covering gender, race, poverty, education level, employment status, and so on), and police crime records (provided by police departments across the United States; this is the first and largest collection of standardized police behavioral data).
Note that ACS data is census data reported by census tract, an area roughly equivalent to a neighborhood, established by the Census Bureau for analyzing populations. Census tracts generally encompass a population of between 2,500 and 8,000 people. The Census Bureau describes them as "relatively permanent", but they do change over time.
Next, we have to work out how to combine census data organized by census tract with geometry data for police districts, because census tracts and police districts are different geographic units. To solve this, we need to pull supplementary data from the US Census Bureau.
Luckily, we can easily download shapefiles for census tracts from the US Census Bureau's website: https://www.census.gov/geo/maps-data/data/cbf/cbf_tracts.html
The next thing that might be new to us is the shapefile. Without getting into too many details, shapefiles contain geospatial information (e.g. the shape of the boundaries of a U.S. county within some coordinate system, stored in a .shp file) and attributes of the geographical entity. Shapefiles can be read using GeoPandas.
Each shapefile has its own Coordinate Reference System (CRS). To map different shapefiles onto the same coordinates, we have to standardize their CRS before mapping.
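A minimal sketch of aligning the CRS of two layers with GeoPandas. The two GeoDataFrames are tiny hypothetical stand-ins (in practice each would come from `gpd.read_file`), and the choice of EPSG:4326 for the police layer is an assumption for illustration; Census Bureau boundary files are distributed in NAD83 (EPSG:4269).

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical police district layer, assumed here to be in WGS 84 lon/lat.
districts = gpd.GeoDataFrame(
    {"district": ["D1"]},
    geometry=[Polygon([(-71.10, 42.30), (-71.00, 42.30), (-71.00, 42.40), (-71.10, 42.40)])],
    crs="EPSG:4326",
)

# Hypothetical census tract layer in NAD83.
tracts = gpd.GeoDataFrame(
    {"tract": ["T1"]},
    geometry=[Polygon([(-71.08, 42.32), (-71.02, 42.32), (-71.02, 42.38), (-71.08, 42.38)])],
    crs="EPSG:4269",
)

# Reproject the tracts into the districts' CRS so both layers share one coordinate system.
tracts_aligned = tracts.to_crs(districts.crs)

print(tracts.crs, "->", tracts_aligned.crs)
```

Which layer gets reprojected is a design choice; what matters is that every layer ends up in the same CRS before they are plotted or intersected.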
In the police crime record data, some of our target features, for example race, contain unknown or empty values. We have to replace them with NaN values.
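A small sketch of that cleanup with pandas. The column name `subject_race` and the placeholder strings are assumptions for illustration; the real files may use different names and different markers for missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the police records; the column name is an assumption.
records = pd.DataFrame({"subject_race": ["Black", "Unknown", "", "White", " "]})

# Placeholder strings assumed to mean "missing"; adjust to the values found in the data.
placeholders = ["Unknown", "NO DATA ENTERED", ""]

# Strip whitespace first so that blank-looking cells also match the empty string.
records["subject_race"] = (
    records["subject_race"].str.strip().replace(placeholders, np.nan)
)

print(records["subject_race"].isna().sum())
```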
In the ACS data, we need to select target features that are likely to relate to crime. The ACS data is large and spread across many tables, so it is simpler and more efficient to work with the target features in a single DataFrame. We therefore extract the target features from the different DataFrames and merge them into one.
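The extraction and merge could look roughly like the sketch below. The table contents, the column names, and the use of `GEOID` as the shared tract identifier are assumptions for illustration, with made-up numbers.

```python
import pandas as pd

# Hypothetical ACS tables, one per topic, each keyed by a tract identifier.
race = pd.DataFrame(
    {"GEOID": ["001", "002"], "total_pop": [4000, 3000], "other_col": [1, 2]}
)
poverty = pd.DataFrame(
    {"GEOID": ["001", "002"], "below_poverty": [500, 900], "other_col": [3, 4]}
)
education = pd.DataFrame(
    {"GEOID": ["001", "002"], "bachelors_or_higher": [1200, 700]}
)

# Keep only the target columns from each table, then join them on the tract key.
acs = (
    race[["GEOID", "total_pop"]]
    .merge(poverty[["GEOID", "below_poverty"]], on="GEOID", how="inner")
    .merge(education[["GEOID", "bachelors_or_higher"]], on="GEOID", how="inner")
)

print(acs)
```

Selecting the columns before merging avoids name clashes between tables that reuse generic column names.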
The first aggregation step is to merge the police crime records with the police district shapefiles on their common key, which gives us a new police crime dataset that carries the police district geometry.
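A minimal sketch of this merge, assuming the common key is a column called `district` in both tables (a hypothetical name) and using made-up square districts:

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Hypothetical police district layer with two square districts.
districts = gpd.GeoDataFrame(
    {"district": ["D1", "D2"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Hypothetical police records, one row per incident.
records = pd.DataFrame(
    {"district": ["D1", "D1", "D2"], "subject_race": ["Black", "White", "Black"]}
)

# Calling merge on the GeoDataFrame keeps the result a GeoDataFrame,
# so every record row now carries its district polygon.
records_geo = districts.merge(records, on="district", how="inner")

print(type(records_geo).__name__, len(records_geo))
```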
To aggregate this new police crime dataset with the census tract shapefiles, we have to figure out the geometric relationship between police districts and census tracts.
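One way to work out that relationship is a polygon overlay. The sketch below uses `geopandas.overlay` on made-up rectangles; it is an illustration under stated assumptions, not necessarily the exact code used in this project. The column names are hypothetical, and EPSG:26986 (NAD83 / Massachusetts Mainland, in metres) is assumed as a projected CRS so that areas are meaningful.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical layers already in the same projected CRS.
districts = gpd.GeoDataFrame(
    {"district": ["D1", "D2"]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)],
    crs="EPSG:26986",
)
tracts = gpd.GeoDataFrame(
    {"tract": ["T1", "T2"]},
    geometry=[box(0, 0, 1, 2), box(1, 0, 3, 2)],
    crs="EPSG:26986",
)

# Remember each tract's full area before it is cut into pieces.
tracts["tract_area"] = tracts.geometry.area

# Intersect the layers: one row per (tract, district) pair that overlaps.
pieces = gpd.overlay(tracts, districts, how="intersection")

# Share of each tract's area that falls inside each district.
pieces["share"] = pieces.geometry.area / pieces["tract_area"]

print(pieces[["tract", "district", "share"]])
```

Plotting `pieces` (for example with `pieces.plot(column="district")`) gives a quick visual check of how the tracts are split across districts.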
Overlaying the two layers lets us visualize how police districts and census tracts relate, and compute the percentage of each census tract that falls within each police district.
We can then combine the new police crime data, which carries the police district geometry, with the census tract shapefiles.
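A sketch of how tract-level census counts might be carried over to police districts using those percentages. It assumes that population is spread evenly across each tract, so that a tract's count can be split in proportion to area; that is a simplifying assumption, and the table layout, column names, and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical share of each tract's area that lies in each district.
shares = pd.DataFrame(
    {
        "tract": ["T1", "T2", "T2"],
        "district": ["D1", "D1", "D2"],
        "share": [1.0, 0.5, 0.5],
    }
)

# Hypothetical tract-level census counts.
acs = pd.DataFrame({"tract": ["T1", "T2"], "total_pop": [4000, 3000]})

# Attach the census counts to every (tract, district) piece.
weighted = shares.merge(acs, on="tract", how="left")

# Assumption: population is uniform within a tract, so split it by area share.
weighted["total_pop"] = weighted["total_pop"] * weighted["share"]

# Sum the pieces to get an estimated count per police district.
district_pop = weighted.groupby("district", as_index=False)["total_pop"].sum()

print(district_pop)
```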
With everything aggregated so far, we can merge the different DataFrames into one and use it for visualization.
Visualizations from the new police crime data with police district geometry (for plots of other categories, see the code in "CleanAndVisualizeMA"):
Visualizations from the census data aggregated by police district (for plots of other categories, see the code in "CleanAndVisualizeMA"):
From the visualizations above, we get a rough idea of how crime records relate to the target features we chose. But to measure equity in policing specifically and mathematically, we use a statistical tool: correlation.
Here, I correlate the population counts for the target features (race, gender, age, poverty level, education, employment status) from the ACS data with the number of crime records in each police district. These are the correlations I found:
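For reference, a correlation of this kind can be computed with pandas as sketched below. The numbers are made up purely to show the mechanics and are not results from this project; the column names are hypothetical, and Pearson correlation is an assumption since the method is not stated above.

```python
import pandas as pd

# Made-up district-level table: census counts plus number of police records.
district_table = pd.DataFrame(
    {
        "district": ["D1", "D2", "D3", "D4"],
        "total_pop": [5500, 1500, 3000, 4000],
        "below_poverty": [900, 200, 650, 400],
        "record_count": [120, 35, 80, 70],
    }
)

# Pearson correlation of every census feature with the record count.
corr = (
    district_table.drop(columns="district")
    .corr(method="pearson")["record_count"]
    .drop("record_count")
)

print(corr)
```

Each value lies between -1 and 1; with only a handful of districts, the estimates are noisy and should be read with caution.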
Note: the correlations above don't necessarily mean that any one feature causes crime, but they are a good reference for police departments inspecting their policing behavior.