
crime_sim_toolkit's People

Contributors

sparrow0hawk

Forkers

danbirks m-o-p-d

crime_sim_toolkit's Issues

util.populate_offence function is inflexible for data not from sampler

The following lines:

def populate_offence(crime_frame):
    ...
    if 'Week' in crime_frame.columns:
        time_res = 'Week'
    else:
        time_res = 'datetime'

    ...

    # reorder columns for ABM
    if time_res == 'Week':
        populated_frame = populated_frame[['UID', 'datetime', time_res, 'Crime_description', 'Crime_type', 'LSOA_code', 'Police_force']]
    else:
        populated_frame = populated_frame[['UID', 'datetime', 'Crime_description', 'Crime_type', 'LSOA_code', 'Police_force']]

    return populated_frame

These lines mean the utility function only really works with data output by the Poisson sampler function. For ease of use, a simple refactor should let this function work on any data that has the expected Crime_type column.
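One possible shape for that refactor is sketched below. It is only an illustration, not the toolkit's code: the helper name `reorder_for_abm` and the `PREFERRED_ORDER` constant are hypothetical, and it assumes the only hard requirement is the `Crime_type` column, with the other columns kept in the ABM order when they happen to be present.

```python
import pandas as pd

# Hypothetical constant: the ABM column order quoted in the issue,
# with the optional 'Week' column in the position it currently takes.
PREFERRED_ORDER = ['UID', 'datetime', 'Week', 'Crime_description',
                   'Crime_type', 'LSOA_code', 'Police_force']


def reorder_for_abm(frame, preferred=PREFERRED_ORDER, required=('Crime_type',)):
    """Return frame with known columns in ABM order (hypothetical helper).

    Only the columns in `required` must exist; any other preferred
    column is included if present and skipped otherwise.
    """
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise KeyError(f"missing required columns: {missing}")
    ordered = [col for col in preferred if col in frame.columns]
    return frame[ordered]
```

With this approach the 'Week' branch disappears: a frame with a Week column keeps it in place, and a frame without one is simply reordered without it.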

add_zero_counts memory error with large dataset

With a large dataset from Police UK (218 MB), Initialiser.get_data fails with a MemoryError on a machine with 4 GB of RAM.

I think this comes from the iterative creation of dataframes followed by a large-scale concat operation.

This could be refactored to build a simple dictionary instead.

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-6-89b6520a04c4> in <module>
----> 1 data_file.get_data(directory='/home/alex/Downloads/WY_20162019',timeframe='Day')

~/Code/python/crime_sim_toolkit/crime_sim_toolkit/initialiser.py in get_data(self, directory, timeframe)
     55         mut_counts_frame = self.reports_to_counts(dated_data, timeframe=timeframe)
     56 
---> 57         mut_counts_frame = self.add_zero_counts(mut_counts_frame)
     58 
     59         return mut_counts_frame

~/Code/python/crime_sim_toolkit/crime_sim_toolkit/initialiser.py in add_zero_counts(self, counts_frame)
    213 
    214 
--> 215         new_tot_counts = pd.concat([counts_frame,pd.concat(pile_o_df)], sort=True)
    216 
    217         new_tot_counts.reset_index(drop=True, inplace=True)

~/anaconda3/envs/crime_sim/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    256     )
    257 
--> 258     return op.get_result()
    259 
    260 

~/anaconda3/envs/crime_sim/lib/python3.7/site-packages/pandas/core/reshape/concat.py in get_result(self)
    471 
    472             new_data = concatenate_block_managers(
--> 473                 mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy
    474             )
    475             if not self.copy:

~/anaconda3/envs/crime_sim/lib/python3.7/site-packages/pandas/core/internals/managers.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   2051         else:
   2052             b = make_block(
-> 2053                 concatenate_join_units(join_units, concat_axis, copy=copy),
   2054                 placement=placement,
   2055             )

~/anaconda3/envs/crime_sim/lib/python3.7/site-packages/pandas/core/internals/concat.py in concatenate_join_units(join_units, concat_axis, copy)
    266                 concat_values = concat_values.copy()
    267     else:
--> 268         concat_values = _concat._concat_compat(to_concat, axis=concat_axis)
    269 
    270     return concat_values

~/anaconda3/envs/crime_sim/lib/python3.7/site-packages/pandas/core/dtypes/concat.py in _concat_compat(to_concat, axis)
    172                 to_concat = [x.astype("object") for x in to_concat]
    173 
--> 174     return np.concatenate(to_concat, axis=axis)
    175 
    176 

<__array_function__ internals> in concatenate(*args, **kwargs)

MemoryError: Unable to allocate array with shape (2, 21278040) and data type int64
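As a rough sketch of how the zero-filling step could avoid building and concatenating many small dataframes, the example below uses a single pandas reindex against the full product of key values. This is a different technique from the dictionary idea mentioned above, and it is not the toolkit's implementation: the function name `add_zero_counts_reindex` and the column names `datetime`, `Crime_type`, `LSOA_code` and `Counts` are assumptions made for illustration.

```python
import pandas as pd


def add_zero_counts_reindex(counts_frame,
                            keys=('datetime', 'Crime_type', 'LSOA_code'),
                            count_col='Counts'):
    """Fill in zero counts for every missing key combination.

    Hypothetical helper; the column names are assumptions.
    """
    keys = list(keys)
    # Every combination of the observed key values.
    full_index = pd.MultiIndex.from_product(
        [counts_frame[key].unique() for key in keys], names=keys)
    # Sum per combination, then expand to the full index in one step,
    # filling combinations that never occurred with 0.
    summed = counts_frame.groupby(keys)[count_col].sum()
    return summed.reindex(full_index, fill_value=0).reset_index()
```

This removes the intermediate list of frames and the nested concat, but the result still has one row per key combination, so a dataset with many dates, crime types and LSOAs may remain large in memory.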

Correct output columns for ABM

The final columns output for the ABM need to change.

Current output after allocating reports and crime descriptions:

ID, Hour, Day, Mon, Crime_description, Crime_type, LSOA_code, UID, Force-Area

Desired:

UID, Year, Mon, Day(/Week), Hour, Crime_description, Crime_type, LSOA_code, Force-Area

Datetime functionality

The current system takes the year-mon of crime events from Police UK and breaks it into separate Year, Month and (randomly allocated) timeframe (day or week) columns. A more robust way to handle this time data would be to combine it into a single datetime column using the pandas to_datetime function.
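A minimal sketch of that idea follows. It assumes the frame already holds numeric `Year`, `Mon` and `Day` columns (names taken from the column lists in the issue above); the helper name `add_datetime_column` is hypothetical. It relies on `pd.to_datetime` accepting a dataframe whose columns are named `year`, `month` and `day`.

```python
import pandas as pd


def add_datetime_column(frame, year_col='Year', month_col='Mon', day_col='Day'):
    """Return a copy of frame with a single 'datetime' column added.

    Hypothetical helper; assumes numeric year, month and day columns.
    """
    # pd.to_datetime assembles dates from columns named year/month/day.
    parts = frame[[year_col, month_col, day_col]].rename(
        columns={year_col: 'year', month_col: 'month', day_col: 'day'})
    return frame.assign(datetime=pd.to_datetime(parts))
```

Once the data is in a single datetime column, week numbers or other resolutions can be derived from it on demand rather than stored as separate columns.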

Reconcile capitalisation of crime severity score weight table and generated event files

Currently the capitalisation in the matched crime severity score file crime_des_CSSweights.csv (column Offence_Description) does not match the capitalisation in the synthetic events generated by the sampler/historic data.

In particular, Offence_Description in the above file is all lowercase, whereas the descriptions in the event files are inconsistently capitalised.

Can these be reconciled? The most sensible option is probably for both files to use all lowercase, which should be easy to accomplish by changing the case in the event generator.
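The lowercase approach could look roughly like the sketch below. The column names `Offence_Description` and `Crime_description` come from the issues above; the `Weight` column, the helper name `normalise_description` and all row values are made up purely for illustration.

```python
import pandas as pd


def normalise_description(series):
    """Lowercase and trim descriptions so they match across files."""
    return series.str.strip().str.lower()


# Made-up example rows, not real toolkit data.
events = pd.DataFrame(
    {'Crime_description': ['Theft From Shop', 'theft from shop ', 'BURGLARY']})
weights = pd.DataFrame(
    {'Offence_Description': ['theft from shop', 'burglary'],
     'Weight': [1.0, 2.0]})  # hypothetical column and values

events['Crime_description'] = normalise_description(events['Crime_description'])
merged = events.merge(weights, left_on='Crime_description',
                      right_on='Offence_Description', how='left')
```

After normalisation every event row finds a matching weight, whereas the inconsistently capitalised originals would leave gaps in the merge.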
