sparkml-flights-delay

Predicting the arrival delay of commercial flights using Apache Spark MLlib

Getting started

The easiest way to run this project is to clone the repository locally, build a fat JAR with Maven, and execute the shell script found in the project's root directory.

mvn clean package
./run.sh

The explore stage can be activated or deactivated with the --explore flag (add or remove this flag inside the run.sh script).

The output should be similar to the following one:

[Image: project-demo]

You can also import it into your favourite IDE, but keep in mind that the program requires one argument: the dataset to process. Several valid datasets are available from Airline On-Time Statistics and Delay Causes.

Be aware that a run can take a long time on a large dataset (14 models are trained with 10-fold cross-validation). This is why we included a small tuning.csv file in the raw folder. Consider using this dataset first to check that the program works properly.

Validation process

The general workflow of the program is shown in the image below:

[Image: project-flow]

Hyperparameter tuning and model selection are carried out using cross-validation on the training dataset. In this stage, a grid search is performed over two models: Linear Regression and Random Forest (you can add your own by extending the CVTuningPipeline class). Finally, the test error of the best model is computed on the test set.
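
The selection procedure described above can be illustrated with a small, dependency-free sketch. It is not the project's Spark code and does not use CVTuningPipeline, whose API is not shown here. Everything in it is an assumption made for illustration: the data is synthetic, the function names (k_fold_indices, cross_validate, grid_search, run_demo) are hypothetical, and the two model families are simple stand-ins (one-feature ridge regression and k-nearest-neighbours regression) rather than Spark's Linear Regression and Random Forest. What it does show is the same three steps: score every grid candidate by k-fold cross-validation on the training set, refit the winner on the full training set, and measure its error once on the held-out test set.

```python
import random
from statistics import mean


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred)) ** 0.5


def fit_ridge(xs, ys, reg):
    """Stand-in model 1: one-feature ridge regression in closed form."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / (sxx + reg)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k):
    """Stand-in model 2: k-nearest-neighbours regression."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def cross_validate(fit, params, xs, ys, folds):
    """Mean RMSE over the folds, training on all folds except the held-out one."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train], **params)
        predictions = [model(xs[i]) for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], predictions))
    return mean(scores)


def grid_search(candidates, xs, ys, num_folds):
    """Return (cv_rmse, name, fit, params) for the lowest mean CV error."""
    folds = k_fold_indices(len(xs), num_folds)
    scored = [
        (cross_validate(fit, params, xs, ys, folds), name, fit, params)
        for name, fit, params in candidates
    ]
    return min(scored, key=lambda s: s[0])


def run_demo(seed=0):
    rng = random.Random(seed)
    # Synthetic data (not flight data): y = 2x + 1 plus Gaussian noise.
    xs = [rng.uniform(0, 10) for _ in range(100)]
    ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]
    train_x, test_x = xs[:80], xs[80:]
    train_y, test_y = ys[:80], ys[80:]

    # The grid: every (model family, hyperparameter value) pair is one candidate.
    candidates = [("ridge", fit_ridge, {"reg": r}) for r in (0.0, 1.0, 10.0)]
    candidates += [("knn", fit_knn, {"k": k}) for k in (1, 5, 15)]

    cv_error, name, fit, params = grid_search(
        candidates, train_x, train_y, num_folds=5
    )
    # Refit the winner on the whole training set, then score it once on the test set.
    model = fit(train_x, train_y, **params)
    test_error = rmse(test_y, [model(x) for x in test_x])
    return name, params, cv_error, test_error


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the grid has 6 candidates and 5 folds, so 30 models are fitted during selection plus one final refit. The same multiplication explains why a larger grid with more folds becomes expensive on a large dataset.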

Authors 🇪🇸 💙 🇮🇹

  • Fernando Díaz
  • Giorgio Ruffa

License

This project is licensed under the MIT License; see the LICENSE.md file for details.
