
car-destination-prediction's Introduction

Car destination prediction model

The goal of this project is to build a model that predicts the destination of a car trip from the trip's start date and time and its starting position.

System requirements

The project has been built using Python and Spark as the main technologies. To run the notebooks, you need to have the following installed:

  • Spark 2.2
  • Python 3.6. Also, these packages are required:
    • notebook
    • findspark
    • numpy
    • pandas
    • scikit-learn
    • python-geohash
    • matplotlib
    • gmaps

If you have Anaconda installed on your computer, you can easily get your Python environment ready by loading python-environment.yml, which contains all the dependencies. Simply run:

conda env create -f python-environment.yml

Although it is not necessary for processing the data or running the models, you will need a Google Maps JavaScript API key to visualize maps with gmaps in some of the notebooks. After you activate it in the Google Developers Console, add it to your environment with:

export GOOGLE_API_KEY=[Your fantastic API KEY goes here]
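
Inside a notebook, the key can then be read back from the environment. The following is a minimal sketch, assuming only the variable name shown above; the helper name get_google_api_key is hypothetical and not part of the project, and the gmaps call is left as a comment so the snippet runs even without gmaps installed.

```python
import os


def get_google_api_key(env=os.environ):
    """Return the Google Maps API key, or None if it has not been exported.

    `env` defaults to the process environment; passing a plain dict makes
    the function easy to try out without touching real settings.
    """
    key = env.get("GOOGLE_API_KEY", "").strip()
    return key or None


api_key = get_google_api_key()
if api_key is None:
    print("GOOGLE_API_KEY is not set; map visualizations will not work.")
# With a key available, gmaps is configured like this:
#     import gmaps
#     gmaps.configure(api_key=api_key)
```

Checking for the key up front gives a clear message instead of a blank map when the variable was never exported.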

Project structure

The project contains the following types of files:

  • Jupyter notebooks. They contain the code for the project implementation. The project is easiest to understand by reading them in this order:
    • data-cleansing.ipynb: Contains the code to read and explore the raw dataset, apply some data cleanup transformations, and visualize the data on maps. It produces the file processed-dataset.csv.
    • features-preparation: Normalizes the data, expands dimensionality, and in general computes new features which could be useful depending on the model chosen later. It produces featured-dataset.csv.
    • random-forest-model.ipynb: Implements the Random Forest prediction model.
    • k-nearest-model.ipynb: Implements the K-Nearest Neighbor prediction model.
  • Python script.
    • predict-destination.py: This script runs the models generated in the notebooks to predict the destination of a vehicle based on its starting position and time.
  • Models. The trained models are stored in the following files:
    • random_forest_model.pkl
    • k_nearest_model.pkl
  • Analysis documentation. A PDF file details all the analysis and decision making, and discusses the implementation code: predictive-analytics-connected-car.pdf
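
The stored models listed above can be loaded back into Python for experimentation. The sketch below assumes the .pkl files were written with the standard pickle module (the project may have used another serializer such as joblib) and that the forest and knn names map to the two files as shown; load_model and MODEL_FILES are hypothetical names, not taken from the project.

```python
import pickle
from pathlib import Path

# Assumed mapping from a short model name to the stored file.
MODEL_FILES = {
    "forest": "random_forest_model.pkl",
    "knn": "k_nearest_model.pkl",
}


def load_model(name, base_dir="."):
    """Unpickle the trained model called `name` from `base_dir`."""
    path = Path(base_dir) / MODEL_FILES[name]
    with path.open("rb") as handle:
        return pickle.load(handle)
```

For example, load_model("knn") would return the stored K-Nearest Neighbor model when run from the project directory. Unpickling a scikit-learn model requires scikit-learn to be installed, ideally in the same version used for training, and pickle files should only be loaded from sources you trust.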

Running the models

To ease the evaluation of the models, I've created a simple Python script so that you can play with different values and see the predictions.

Spark is not needed to run the script; only the numpy, scikit-learn and python-geohash Python packages are required. However, if you loaded the environment provided with the project, you'll have everything you need.

From a command line, type:

./predict-destination.py -h

to get help on how to use it:

usage: predict-destination.py [-h] {forest,knn} time latitude longitude

positional arguments:
  {forest,knn}  Predictive model to use, can be either forest or knn
  time          Start trip time, with the format "yyyy-MM-dd HH:mm:ss". It
                must be between quotation marks. For instance, you coud use:
                "2017-05-24 12:26:37"
  latitude      Latitude of the trip start position. For instance, you could
                use: 47.409291
  longitude     Longitude of the trip end position. For instance, you could
                use: 8.546942

optional arguments:
  -h, --help    show this help message and exit

For example, if you wanted to make a prediction using the K-Nearest Neighbor Model:

./predict-destination.py knn "2017-05-29 18:23:27" 32.989318 -97.263840
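
For readers curious how such an interface can be declared, here is a sketch reconstructed from the help output above using argparse. It is not the script's actual code: build_parser and parse_start_time are hypothetical names, and the time format string is the Python equivalent of the "yyyy-MM-dd HH:mm:ss" pattern given in the help text.

```python
import argparse
from datetime import datetime

# Python strptime equivalent of the documented "yyyy-MM-dd HH:mm:ss" pattern.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_start_time(text):
    """Convert the quoted time argument into a datetime, or reject it."""
    try:
        return datetime.strptime(text, TIME_FORMAT)
    except ValueError:
        raise argparse.ArgumentTypeError(
            'time must look like "2017-05-24 12:26:37", got %r' % text
        )


def build_parser():
    """Declare the four positional arguments shown in the help output."""
    parser = argparse.ArgumentParser(prog="predict-destination.py")
    parser.add_argument("model", choices=["forest", "knn"],
                        help="Predictive model to use")
    parser.add_argument("time", type=parse_start_time,
                        help="Start trip time")
    parser.add_argument("latitude", type=float,
                        help="Latitude of the trip start position")
    parser.add_argument("longitude", type=float,
                        help="Longitude of the trip start position")
    return parser


# Same arguments as the K-Nearest Neighbor example above.
args = build_parser().parse_args(
    ["knn", "2017-05-29 18:23:27", "32.989318", "-97.263840"]
)
print(args.model, args.time.isoformat(), args.latitude, args.longitude)
```

The time must be quoted on the command line because it contains a space; without the quotes the shell would split it into two arguments. A negative longitude such as -97.263840 is accepted as a positional value because the parser defines no options that look like negative numbers.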

And that's all. Enjoy the code! Feedback is welcome ;-)
