Predicting the Success of Pirate Attacks

This repo took shape during my Data Science course at Propulsion Academy in Zurich and was inspired by various in-class exercises. It gives a good overview of the content covered during the first month.

Image copyright: LEGO Group

Table of Contents
  1. Assignment Description
  2. Approach
  3. Results
  4. Conclusion
  5. Contact

Assignment Description

The task is to predict, from a data set of 802 pirate attacks, whether a given attack will be successful or not. This involves first exploring, cleaning and preprocessing the data, then feeding it into different models to compare their performance, and finally fine-tuning and explaining the best-performing model.

Approach

The process was the following:

First, the data was cleaned (part 2 of the Jupyter notebooks). Then some visualizations were created (part 3) to explore the data set and check for any clearly visible trends.

The data was then preprocessed (part 4), which meant one-hot encoding most of the variables, since much of the data is categorical. The selected features were fed into seven different untuned baseline models (part 5), and the best-performing model was fine-tuned and tested (part 6).
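One-hot encoding turns each categorical column into one 0/1 indicator column per category, so that models expecting numeric input can use it. In a notebook this is usually done with a library routine such as pandas' `get_dummies` or scikit-learn's `OneHotEncoder`; the minimal plain-Python sketch below only spells out the idea. The column names and values (`vessel_status`, `attack_type`) are hypothetical and not taken from the actual data set.

```python
from typing import Dict, List, Sequence


def one_hot_encode(rows: Sequence[Dict[str, str]], columns: Sequence[str]) -> List[Dict[str, int]]:
    """Expand each categorical column into one 0/1 indicator column per category."""
    # Collect the sorted set of categories observed in each column
    categories = {col: sorted({row[col] for row in rows}) for col in columns}
    encoded = []
    for row in rows:
        out: Dict[str, int] = {}
        for col in columns:
            for cat in categories[col]:
                # Indicator column named "<column>_<category>", 1 if the row has that category
                out[f"{col}_{cat}"] = int(row[col] == cat)
        encoded.append(out)
    return encoded


# Hypothetical example rows; the real data set's columns may differ
attacks = [
    {"vessel_status": "anchored", "attack_type": "boarding"},
    {"vessel_status": "steaming", "attack_type": "fired upon"},
    {"vessel_status": "anchored", "attack_type": "fired upon"},
]
for encoded_row in one_hot_encode(attacks, ["vessel_status", "attack_type"]):
    print(encoded_row)
```

Each category becomes its own column, which is why one-hot encoding many categorical variables can noticeably widen the feature matrix.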

As a final step, feature importance and SHAP values of the best model were explored (part 7) to explain how the model reaches its predictions.
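The notebooks explain the model through its feature importance and SHAP values, which are not reproduced here. As a simpler, swapped-in illustration of the same question ("which features does the model rely on?"), the sketch below computes permutation importance: shuffle one feature column at a time and measure how much the accuracy drops. The toy `predict` function, which only looks at a latitude-like first feature, and the data are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[List[float]]], List[int]],
    X: List[List[float]],
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy the rows, replacing feature j with the shuffled values
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy model: predicts success only from feature 0, ignores feature 1
def toy_predict(rows: List[List[float]]) -> List[int]:
    return [int(r[0] > 0) for r in rows]


X = [[-10.0, 3.0], [5.0, 1.0], [12.0, 7.0], [-3.0, 2.0],
     [8.0, 9.0], [-7.0, 4.0], [1.0, 6.0], [-1.0, 5.0]]
y = [0, 1, 1, 0, 1, 0, 1, 0]
print(permutation_importance(toy_predict, X, y))
```

Shuffling the ignored feature leaves the predictions unchanged, so its importance is exactly zero, while shuffling the feature the model depends on lowers the accuracy.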

Please note that the interactive visualizations in notebooks 3 and 7 require JavaScript and therefore will not render on GitHub. Instead, you can view them on nbviewer.

Results

The Random Forest Classifier performed best overall and was therefore fine-tuned. One of the goals of the fine-tuning, besides improving the performance, was to prevent overfitting. The model finally achieved a performance of about 65% balanced accuracy on the test set. The features with the biggest impact on the prediction were the longitude and latitude of the attack location.
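Balanced accuracy is the average of the recall obtained on each class, so a model cannot score well simply by always predicting the more frequent outcome. The minimal sketch below follows that definition (it matches scikit-learn's `balanced_accuracy_score` with its default `adjusted=False`); the labels are hypothetical and not taken from the data set.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average of per-class recall (for binary labels: mean of sensitivity and specificity)."""
    recalls = []
    for c in sorted(set(y_true)):
        # Indices of the samples whose true label is c
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels: 1 = successful attack, 0 = failed attack
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
always_success = [1] * 10

plain = sum(t == p for t, p in zip(y_true, always_success)) / len(y_true)
print(plain)                                        # 0.6
print(balanced_accuracy(y_true, always_success))    # 0.5
```

A classifier that always predicts "success" reaches 60% plain accuracy on these labels but only 50% balanced accuracy, the same as random guessing on a binary task.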

Conclusion

As the results show, it is partially possible to predict the success or failure of pirate attacks from the limited amount of data provided. Overfitting to the training data remains a problem, even after cleaning the data and partially constraining the model. This requires further work, and it might also be interesting to fine-tune a simpler model, such as a logistic regression. Better results could likely be achieved with additional data, especially since the data set covers a time span of only 18 months.

Contact

If you find this repo interesting or would like to suggest improvements, please get in touch. I would be happy to hear from you.

Contributors

alessine