sparrow0hawk / crime_sim_toolkit
Work-in-progress | Building a toolkit for simulating police crime count data
License: MIT License
The following lines in utils.populate_offence:
def populate_offence(crime_frame):
    ...
    if 'Week' in crime_frame.columns:
        time_res = 'Week'
    else:
        time_res = 'datetime'
    ...
    # reorder columns for ABM
    if time_res == 'Week':
        populated_frame = populated_frame[['UID', 'datetime', time_res, 'Crime_description', 'Crime_type', 'LSOA_code', 'Police_force']]
    else:
        populated_frame = populated_frame[['UID', 'datetime', 'Crime_description', 'Crime_type', 'LSOA_code', 'Police_force']]
    return populated_frame
These lines mean that the utility function only works with data produced by the Poisson sampler function. For ease of use, a simple refactor should let this function work on any data that has the expected Crime_type column.
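A minimal sketch of the suggested refactor, assuming the column-reordering step is the only part of populate_offence that depends on sampler output (the elided parts are not reproduced). `reorder_for_abm` and `ABM_COLUMNS` are hypothetical names; the column list itself comes from the code above. Missing ABM columns are skipped instead of raising a KeyError, so a frame that did not come from the sampler still passes through.

```python
import pandas as pd

# Preferred column order for the ABM, taken from populate_offence above.
# 'Week' only appears when the data has a weekly time resolution.
ABM_COLUMNS = ['UID', 'datetime', 'Week', 'Crime_description',
               'Crime_type', 'LSOA_code', 'Police_force']


def reorder_for_abm(frame: pd.DataFrame) -> pd.DataFrame:
    """Reorder any frame that has a Crime_type column into ABM order.

    Hypothetical helper: ABM columns that are absent are skipped rather than
    raising a KeyError, and any other columns are kept at the end.
    """
    if 'Crime_type' not in frame.columns:
        raise KeyError("input frame needs a 'Crime_type' column")
    known = [col for col in ABM_COLUMNS if col in frame.columns]
    extra = [col for col in frame.columns if col not in ABM_COLUMNS]
    return frame[known + extra]


# Example: a frame that did not come from the Poisson sampler.
events = pd.DataFrame({'Crime_type': ['Robbery'], 'LSOA_code': ['E01000001'],
                       'UID': [1], 'Month': ['2019-03']})
print(reorder_for_abm(events).columns.tolist())
# ['UID', 'Crime_type', 'LSOA_code', 'Month']
```

Keeping unknown columns at the end (rather than dropping them) is a design choice; the final ABM output could still select a strict subset afterwards.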
Can we add the fine-grained crime descriptions (using utils.populate_offence) to a generic data dump from Police UK?
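One possible first step, sketched under assumptions: rename the columns of a raw Police UK download so it resembles the frames populate_offence expects. The headers 'Crime type', 'LSOA code', 'Falls within' and 'Month' are what we believe Police UK street-level CSVs use; mapping 'Falls within' to Police_force and the helper name `standardise_police_uk` are assumptions. A UID column would still need to be generated before the existing ABM reordering step.

```python
import pandas as pd

# Assumed mapping from Police UK street-level CSV headers to the names used
# in this toolkit; check it against the files you actually download.
POLICE_UK_RENAMES = {
    'Crime type': 'Crime_type',
    'LSOA code': 'LSOA_code',
    'Falls within': 'Police_force',
}


def standardise_police_uk(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename a raw Police UK dump and parse its 'YYYY-MM' Month column."""
    frame = raw.rename(columns=POLICE_UK_RENAMES)
    frame['datetime'] = pd.to_datetime(frame['Month'], format='%Y-%m')
    return frame[['datetime', 'Crime_type', 'LSOA_code', 'Police_force']]


# Toy row shaped like a Police UK record.
raw = pd.DataFrame({'Month': ['2019-03'], 'Falls within': ['West Yorkshire Police'],
                    'LSOA code': ['E01000001'], 'Crime type': ['Robbery']})
print(standardise_police_uk(raw))
```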
The current matching procedure for the crime severity score weights does not generate a weight (or time) for anti-social behaviour events.
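A sketch of how unmatched events could at least be surfaced rather than silently left without a weight, assuming the matching is a pandas left join. `attach_css_weights`, the `CSS_weight` column and the `fill_weight` default are hypothetical; what weight anti-social behaviour should actually receive is a modelling decision this sketch does not settle. The `indicator=True` option of `DataFrame.merge` adds a `_merge` column marking rows found only on the left side.

```python
import pandas as pd


def attach_css_weights(events, weights, fill_weight=0.0):
    """Left-join severity weights onto events and report unmatched descriptions.

    Assumed columns: events['Crime_description'], weights['Offence_Description']
    and weights['CSS_weight'] (the last name is hypothetical).
    """
    merged = events.merge(weights, how='left', left_on='Crime_description',
                          right_on='Offence_Description', indicator=True)
    unmatched = merged.loc[merged['_merge'] == 'left_only', 'Crime_description']
    if not unmatched.empty:
        print('No CSS weight for:', sorted(unmatched.unique()))
    # Placeholder for events with no weight (e.g. anti-social behaviour);
    # the real value is left open here.
    merged['CSS_weight'] = merged['CSS_weight'].fillna(fill_weight)
    return merged.drop(columns=['_merge', 'Offence_Description'])


# Toy data: one description has a weight, anti-social behaviour does not.
events = pd.DataFrame({'Crime_description': ['robbery', 'anti-social behaviour']})
weights = pd.DataFrame({'Offence_Description': ['robbery'], 'CSS_weight': [1.0]})
print(attach_css_weights(events, weights))
```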
Using a large dataset from Police UK (218 MB), Initialiser.get_data fails with a MemoryError on a machine with 4 GB of RAM.
This likely comes from iteratively creating many DataFrames and then concatenating them in one large operation.
One option is to refactor this to generate a simple dictionary instead.
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-6-89b6520a04c4> in <module>
----> 1 data_file.get_data(directory='/home/alex/Downloads/WY_20162019',timeframe='Day')
~/Code/python/crime_sim_toolkit/crime_sim_toolkit/initialiser.py in get_data(self, directory, timeframe)
55 mut_counts_frame = self.reports_to_counts(dated_data, timeframe=timeframe)
56
---> 57 mut_counts_frame = self.add_zero_counts(mut_counts_frame)
58
59 return mut_counts_frame
~/Code/python/crime_sim_toolkit/crime_sim_toolkit/initialiser.py in add_zero_counts(self, counts_frame)
213
214
--> 215 new_tot_counts = pd.concat([counts_frame,pd.concat(pile_o_df)], sort=True)
216
217 new_tot_counts.reset_index(drop=True, inplace=True)
~/anaconda3/envs/crime_sim/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
256 )
257
--> 258 return op.get_result()
259
260
~/anaconda3/envs/crime_sim/lib/python3.7/site-packages/pandas/core/reshape/concat.py in get_result(self)
471
472 new_data = concatenate_block_managers(
--> 473 mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy
474 )
475 if not self.copy:
~/anaconda3/envs/crime_sim/lib/python3.7/site-packages/pandas/core/internals/managers.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
2051 else:
2052 b = make_block(
-> 2053 concatenate_join_units(join_units, concat_axis, copy=copy),
2054 placement=placement,
2055 )
~/anaconda3/envs/crime_sim/lib/python3.7/site-packages/pandas/core/internals/concat.py in concatenate_join_units(join_units, concat_axis, copy)
266 concat_values = concat_values.copy()
267 else:
--> 268 concat_values = _concat._concat_compat(to_concat, axis=concat_axis)
269
270 return concat_values
~/anaconda3/envs/crime_sim/lib/python3.7/site-packages/pandas/core/dtypes/concat.py in _concat_compat(to_concat, axis)
172 to_concat = [x.astype("object") for x in to_concat]
173
--> 174 return np.concatenate(to_concat, axis=axis)
175
176
<__array_function__ internals> in concatenate(*args, **kwargs)
MemoryError: Unable to allocate array with shape (2, 21278040) and data type int64
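The traceback points at the nested `pd.concat` in add_zero_counts. Rather than the dictionary approach suggested above, the sketch below swaps in a different technique: build the full key grid once with `pd.MultiIndex.from_product` and `reindex` the counts against it with `fill_value=0`, which avoids the list of intermediate frames and the two-level concatenation. This is a sketch under assumptions, not the toolkit's implementation: the `Counts` column name and key names are hypothetical, and here the grid levels come from values observed in the data, whereas in practice they would probably be the full date range and LSOA list. The final grid is still as large as before, so it may reduce peak memory rather than remove the problem; smaller dtypes (for example categorical LSOA codes) could help further.

```python
import pandas as pd


def add_zero_counts(counts, keys, value_col='Counts'):
    """Add zero-count rows for every combination of the key columns.

    Builds the full key grid once with MultiIndex.from_product and reindexes,
    instead of creating many small frames and concatenating them.
    Assumes each key combination appears at most once in `counts`.
    """
    grid = pd.MultiIndex.from_product(
        [counts[key].unique() for key in keys], names=keys)
    return (counts.set_index(keys)[value_col]
            .reindex(grid, fill_value=0)
            .reset_index())


counts = pd.DataFrame({'Day': [1, 2], 'LSOA_code': ['A', 'B'], 'Counts': [3, 4]})
print(add_zero_counts(counts, ['Day', 'LSOA_code']))
```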
The final columns output for the ABM need to change.
Current output (after allocating reports and crime descriptions):
ID, Hour, Day, Mon, Crime_description, Crime_type, LSOA_code, UID, Force-Area
Desired:
UID, Year, Mon, Day (or Week), Hour, Crime_description, Crime_type, LSOA_code, Force-Area
The current system takes the year-month of crime events from Police UK and breaks it into separate Year and Month columns plus a randomly allocated timeframe (day or week). A more robust way to handle this time data would be to combine it into a single datetime column using pandas' to_datetime functionality.
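A minimal sketch of the single-datetime-column idea, assuming the Police UK year-month arrives as 'YYYY-MM' strings and the random day is drawn uniformly within that month. `month_to_datetime` is a hypothetical name. `pd.to_datetime` parses the month once, `.dt.days_in_month` gives each row's month length, and numpy's `Generator.integers` broadcasts over that array so each offset stays inside its own month.

```python
import numpy as np
import pandas as pd


def month_to_datetime(months, seed=None):
    """Parse 'YYYY-MM' strings and add a random day within the month.

    Each offset is drawn uniformly from 0..(days in that month - 1),
    so the result never leaves its original month.
    """
    rng = np.random.default_rng(seed)
    month_start = pd.to_datetime(months, format='%Y-%m')
    offsets = rng.integers(0, month_start.dt.days_in_month.to_numpy())
    return month_start + pd.to_timedelta(
        pd.Series(offsets, index=month_start.index), unit='D')


months = pd.Series(['2019-02', '2020-02', '2019-12'])
print(month_to_datetime(months, seed=42))
```

Year, month, day or ISO week can then be derived on demand from the single column (for example `.dt.year` or `.dt.isocalendar()` in recent pandas) instead of being stored separately.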
Develop a method that can adjust crime counts from sampling to create a dataset that models an increase (or decrease) in the rate of a particular crime type, e.g.:
utils.crime_surge(crime_type='Robbery', increase_rate=1.2)
utils.crime_drop(crime_type='Robbery', decrease_rate=0.95)
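A minimal sketch of the proposed methods, assuming they operate on a counts frame with `Crime_type` and `Counts` columns. The extra `counts` argument, the `Counts` column name, the range checks and the `_scale_counts` helper are assumptions; only the function names and keyword arguments come from the proposal above.

```python
import numpy as np
import pandas as pd


def _scale_counts(counts, crime_type, rate):
    """Multiply the counts of one crime type by `rate`, rounding to whole crimes."""
    scaled = counts.copy()
    mask = scaled['Crime_type'] == crime_type
    scaled.loc[mask, 'Counts'] = np.rint(scaled.loc[mask, 'Counts'] * rate).astype(int)
    return scaled


def crime_surge(counts, crime_type, increase_rate):
    """Model a rise in one crime type (increase_rate >= 1)."""
    if increase_rate < 1:
        raise ValueError('increase_rate should be >= 1')
    return _scale_counts(counts, crime_type, increase_rate)


def crime_drop(counts, crime_type, decrease_rate):
    """Model a fall in one crime type (0 <= decrease_rate <= 1)."""
    if not 0 <= decrease_rate <= 1:
        raise ValueError('decrease_rate should be between 0 and 1')
    return _scale_counts(counts, crime_type, decrease_rate)


counts = pd.DataFrame({'Crime_type': ['Robbery', 'Robbery', 'Burglary'],
                       'Counts': [10, 5, 3]})
print(crime_surge(counts, crime_type='Robbery', increase_rate=1.2))
```

Rounding keeps the counts integral but deterministic; an alternative that preserves sampling noise would be to redraw each count from a Poisson distribution with the scaled mean (for example numpy's `Generator.poisson`).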
Currently, the capitalisation of the file used by the crime severity score matching procedure, crime_des_CSSweights.csv (column Offence_Description), does not match the capitalisation in the synthetic events generated by the sampler or in the historic data.
In particular, Offence_Description in that file is all lowercase, whereas the event files are inconsistently capitalised.
Can these be reconciled? The most sensible option is likely for both files to use lowercase, which should be easy to accomplish by changing the case in the event generator.
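A minimal sketch of the suggested reconciliation: lower-case the event descriptions as proposed, and apply the same normalisation to the weights file defensively. Trimming whitespace goes slightly beyond the proposal and is an assumption. Column names come from the issue except `CSS_weight`, which is hypothetical; the descriptions and weights are toy values.

```python
import pandas as pd


def normalise_description(series):
    """Lower-case offence descriptions and trim stray whitespace."""
    return series.str.strip().str.lower()


events = pd.DataFrame({'Crime_description': ['Robbery of Business Property ',
                                             'robbery of personal property']})
weights = pd.DataFrame({'Offence_Description': ['robbery of business property',
                                                'robbery of personal property'],
                        'CSS_weight': [1.0, 2.0]})  # toy values

events['Crime_description'] = normalise_description(events['Crime_description'])
weights['Offence_Description'] = normalise_description(weights['Offence_Description'])
matched = events.merge(weights, how='left', left_on='Crime_description',
                       right_on='Offence_Description')
print(matched['CSS_weight'].tolist())  # [1.0, 2.0]
```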