
Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights (ACM SIGSPATIAL 2019)

Generate Input

One important process is to transform raw input data into the form of input for a machine learning model. Here we employed multiple processes as follows:

  • Step 1: Run 1-CreateInputForAccidentPrediction.ipynb from /1-GenerateFeatureVector to generate raw feature vectors. Each vector represents a geographical region of size 5 km x 5 km (which we call a geohash) during a 15-minute time interval. This code uses the LSTW dataset for traffic event data, raw weather observation records for weather-related attributes (see data/Sample_Weather.tar.gz for sample data), and daylight information (see data/sample_daylight.csv for sample data).

  • Step 2: Run 2-CreateNaturalLanguageRepresentationForGeoHashes.ipynb to generate a description-to-vector representation for geographical regions. The main inputs to this process are LSTW and GloVe. A sample output can be found in data/geohash_to_text_vec.csv.

  • Step 3: Run 3-DataCleaningAndIntegration.ipynb for data cleaning and preparation for integration with POI data.

  • Step 4: Run 4-FinalTrainAndTestDataPreparation.ipynb to prepare the final train and test data. This includes creating sample entries and negative sampling for non-accident data samples. There are two versions of the code: single-threaded and multi-threaded. The multi-threaded version uses more system cores but also needs more memory, which makes it more suitable for servers. The single-threaded version is intended for desktop machines and for generating smaller train/test sets.
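The spatial and temporal binning behind Step 1 can be sketched as follows. This is only an illustration of the idea: the repository itself uses geohash strings, while this sketch uses a flat-grid approximation (assuming roughly 111 km per degree of latitude and ignoring longitude shrinkage); the function and constant names are ours, not the repository's.

```python
from datetime import datetime

KM_PER_DEG = 111.0   # rough km per degree of latitude (approximation)
CELL_KM = 5.0        # region size from the paper: 5 km x 5 km
INTERVAL_MIN = 15    # time-interval length from the paper: 15 minutes

def region_interval_key(lat, lng, ts):
    """Map one raw observation to a (row, col, date, interval) bin key."""
    row = int(lat * KM_PER_DEG // CELL_KM)
    col = int(lng * KM_PER_DEG // CELL_KM)
    minutes = ts.hour * 60 + ts.minute
    return (row, col, ts.date(), minutes // INTERVAL_MIN)

# Two nearby events in the same 15-minute window share a key:
k1 = region_interval_key(33.749, -84.388, datetime(2019, 6, 1, 10, 40))
k2 = region_interval_key(33.750, -84.389, datetime(2019, 6, 1, 10, 44))
```

All raw traffic, weather, and daylight records that fall into the same key are then aggregated into one feature vector for that region-interval.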

Implementations of these steps can be found in 1-GenerateFeatureVector. Also note that the sample data and code cover the cities used in the paper (Atlanta, Austin, Charlotte, Dallas, Houston, and Los Angeles).
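The negative sampling mentioned in Step 4 can be sketched as follows: for each positive (accident) region-interval sample, draw region-interval pairs with no recorded accident as negatives. The function name and the sampling ratio are illustrative assumptions, not the repository's actual parameters.

```python
import random

def negative_samples(positives, all_keys, ratio=3, seed=0):
    """Draw up to `ratio` negatives per positive from keys with no accident."""
    rng = random.Random(seed)                      # fixed seed for reproducibility
    candidates = [k for k in all_keys if k not in positives]
    n = min(len(candidates), ratio * len(positives))
    return rng.sample(candidates, n)

# Toy universe of (region, interval) keys with two positives:
pos = {("r1", 10), ("r2", 11)}
universe = [("r%d" % i, t) for i in range(5) for t in range(10, 14)]
negs = negative_samples(pos, universe, ratio=3)
```

The positives plus the sampled negatives together form the labeled entries that go into the final train and test sets.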

Sample Data Files For Train and Test

To train and test our proposed model and the baselines, you can use our pre-generated train and test files for six cities: Atlanta, Austin, Charlotte, Dallas, Houston, and Los Angeles. The time frame used to generate the sample data for these cities is the same as described in our paper. You can find these files in data/train_set.7z. Use 7za e train_set.7z to decompress this file and obtain four NumPy (.npy) files per city: two contain the train and test feature vectors, and two contain the train and test labels. These sample files are the result of the input generation process described above.
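Once decompressed, the per-city arrays can be loaded with NumPy. The file names below are illustrative (use the names found inside the archive); here we write dummy arrays first just to make the load pattern and the feature/label alignment check self-contained.

```python
import os
import tempfile
import numpy as np

tmp = tempfile.mkdtemp()
# Stand-ins for one city's extracted files (names are hypothetical):
np.save(os.path.join(tmp, "city_train_x.npy"), np.zeros((8, 16)))
np.save(os.path.join(tmp, "city_train_y.npy"), np.zeros(8))

X_train = np.load(os.path.join(tmp, "city_train_x.npy"))  # feature vectors
y_train = np.load(os.path.join(tmp, "city_train_y.npy"))  # accident labels

assert len(X_train) == len(y_train)  # one label per feature vector
```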

Deep Accident Prediction (DAP) Model

Our Deep Accident Prediction model comprises several important components: a Recurrent Component, an Embedding Component, a Description-to-Vector Component, a Points-of-Interest Component, and a Fully-Connected Component. The following image illustrates the model:

The implementation of this model can be found here: 2-DAP/DAP.ipynb.

Baseline Models

In terms of baselines, we employed the following models:

  • Logistic Regression (LR): Find sample code in 3-Baselines/Traditional_Models_Sklearn.py.
  • Gradient Boosting Classifier (GBC): Find sample code in 3-Baselines/Traditional_Models_Sklearn.py.
  • Feed-Forward Neural Network Model (DNN): An implementation of this model can be found in 3-Baselines/DNN.ipynb.
  • DAP Without Embedding Component (DAP-NoEmbed): An implementation of this model can be found in 2-DAP/DAP-NoEmbed.ipynb.
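The LR and GBC baselines boil down to standard scikit-learn classifiers fitted on the feature vectors. The actual scripts live in 3-Baselines/Traditional_Models_Sklearn.py; this is only a hedged sketch with synthetic data standing in for the real feature vectors, and the hyperparameters are defaults rather than the paper's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in: 200 samples, 10 features, easily separable labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy accident / no-accident labels

lr = LogisticRegression(max_iter=1000).fit(X, y)
gbc = GradientBoostingClassifier().fit(X, y)

acc_lr = lr.score(X, y)    # training accuracy, for illustration only
acc_gbc = gbc.score(X, y)
```

On the real data you would load the per-city .npy train/test arrays instead of the synthetic X and y, and evaluate on the held-out test split.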

System Requirements

We recommend using Python 3.x and installing the following packages to run the code properly:

pip install tensorflow==1.14.0
pip install keras==2.3.1
pip install keras_metrics
pip install keras_self_attention
pip install scikit-learn==0.20.0

Please note that you may choose to use other versions of TensorFlow and/or Keras, but make sure they are compatible with each other.

How to Run the Code?

All implementations are in Python, with the deep learning models developed in Keras using TensorFlow as the backend. The non-deep-learning baselines (i.e., LR and GBC) can run on CPU machines, but for the deep learning models we recommend GPU machines to speed up training.

Acknowledgment
