IDAO 2020 qualification phase

This repository contains my team's solution to the 2020 edition of the International Data Analysis Olympiad (IDAO). Our team is called Data O Plomo.

Overall, we ranked 2nd on track 1, 1st on track 2, and 1st overall. We used the same model for both tracks. The model is simple: essentially an autoregressive linear regression with a few extras.
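To make "autoregressive linear regression" concrete, here is a minimal sketch of the general technique using NumPy. It is not the code from our notebooks: the function names, the number of lags, and the plain least-squares fit are assumptions chosen for illustration, and none of the extras are included.

```python
import numpy as np


def make_lagged(series, n_lags):
    """Build a design matrix whose rows hold the n_lags previous values."""
    series = np.asarray(series, dtype=float)
    # Column i holds the value (n_lags - i) steps before the target.
    X = np.column_stack(
        [series[i : len(series) - n_lags + i] for i in range(n_lags)]
    )
    y = series[n_lags:]
    return X, y


def fit_ar(series, n_lags):
    """Fit y_t = b_0 + sum_k b_k * y_{t-k} by ordinary least squares."""
    X, y = make_lagged(series, n_lags)
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def forecast(series, coef, n_steps):
    """Roll the fitted model forward, feeding predictions back as inputs."""
    n_lags = len(coef) - 1
    history = list(np.asarray(series, dtype=float)[-n_lags:])
    preds = []
    for _ in range(n_steps):
        y_hat = coef[0] + np.dot(coef[1:], history[-n_lags:])
        preds.append(y_hat)
        history.append(y_hat)
    return np.array(preds)
```

For example, `forecast(series, fit_ar(series, n_lags=2), n_steps=10)` extrapolates ten steps from the last two observed values.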

Usage

First, unzip the data folder into data/. You should then have data/train.csv, data/Track 1, and data/Track 2 in the repository root.

We built several simple models, each contained in its own Jupyter notebook. You can either open and execute them manually, or run them programmatically with nbconvert.

jupyter nbconvert \
    --execute auto-regression.ipynb \
    --to notebook \
    --inplace \
    --ExecutePreprocessor.timeout=-1 \
    --debug
    
jupyter nbconvert \
    --execute cycle_regression.ipynb \
    --to notebook \
    --inplace \
    --ExecutePreprocessor.timeout=-1 \
    --debug
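If you prefer to drive both runs from Python, the same commands can be wrapped in a small helper. This is only a sketch: the helper names are made up for illustration, and it assumes the jupyter executable is available on your PATH.

```python
import subprocess

# The two notebooks executed by the commands above.
NOTEBOOKS = ["auto-regression.ipynb", "cycle_regression.ipynb"]


def nbconvert_command(notebook):
    """Return the argument list for executing one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--execute", notebook,
        "--to", "notebook",
        "--inplace",
        "--ExecutePreprocessor.timeout=-1",
        "--debug",
    ]


def run_notebooks(notebooks=NOTEBOOKS):
    """Execute each notebook in turn, stopping at the first failure."""
    for notebook in notebooks:
        # check=True raises CalledProcessError if nbconvert exits non-zero.
        subprocess.run(nbconvert_command(notebook), check=True)
```

Calling `run_notebooks()` from the repository root then executes both notebooks one after the other.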

Each notebook produces validation scores as well as a submission file, both of which are stored in the results directory. For instance, auto-regression.ipynb outputs results/ar_track_1.csv (the submission file) and results/ar_val_scores.csv (the validation scores).

Next, blend the submissions. This produces a submission file named track_1_blended.csv in the results directory.

python results/blend_track_1.py
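This README does not describe what blend_track_1.py does internally. As a rough illustration of what blending usually means, the sketch below takes a weighted average of the prediction columns of several submissions with pandas. The function name, the "id" column, and the equal default weights are all assumptions, not the actual script.

```python
import pandas as pd


def blend(submissions, weights=None, id_col="id"):
    """Weighted average of the prediction columns of several submissions.

    `id_col` is a hypothetical identifier column shared by all submissions.
    """
    if weights is None:
        # Default to a plain mean.
        weights = [1.0 / len(submissions)] * len(submissions)
    # Index by id so rows are aligned even if the files are ordered differently.
    frames = [s.set_index(id_col).sort_index() for s in submissions]
    blended = sum(w * f for w, f in zip(weights, frames))
    return blended.reset_index()
```

In practice the inputs would be read with `pd.read_csv` and the result written with `to_csv(..., index=False)`. Weights could also be derived from the validation scores, although that is a design choice this sketch leaves open.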

Finally, the submission for track 2 can be obtained by zipping the track_2 directory. The latter contains a file named ar_models.pkl which is produced by the auto-regression.ipynb notebook.

rm -f track_2/*.csv track_2/*.dat  # remove unnecessary artifacts
zip -jr results/track_2.zip track_2
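On systems without the zip command, a similar archive can be built with Python's standard library. The sketch below is an approximation, not a drop-in replacement: the function name is made up, and instead of deleting the .csv and .dat artifacts it simply leaves them out of the archive.

```python
import zipfile
from pathlib import Path


def zip_flat(src_dir, dest_zip, skip_suffixes=(".csv", ".dat")):
    """Zip every file under src_dir with directory names stripped (like zip -j)."""
    src_dir, dest_zip = Path(src_dir), Path(dest_zip)
    dest_zip.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(dest_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(src_dir.rglob("*")):
            # Skip directories and unwanted artifacts at any depth.
            if path.is_file() and path.suffix not in skip_suffixes:
                # arcname=path.name stores the file without its directory.
                archive.write(path, arcname=path.name)
    return dest_zip
```

With this helper, `zip_flat("track_2", "results/track_2.zip")` plays the role of the two shell commands above.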

Track 2 performance profiling

The goal of track 2 was to implement a model that could make predictions on the test set in under 60 seconds while using less than 500 MB of RAM. We measured the memory consumption of our track 2 script with the memory_profiler package.

cd track_2
pip install memory_profiler
mprof run python main.py
mprof plot --output ../results/track_2_memory_usage.png

[Figure: track_2_memory_usage (memory usage plot produced by mprof plot)]

As for speed, we used a rule of thumb: the Yandex machine that runs our code takes about 20 seconds longer than our machine. We therefore checked that our code ran in at most 40 seconds locally. For reference, our CPU is an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz.
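Both limits can also be checked from within Python. The following is a minimal sketch that uses the standard library's tracemalloc rather than memory_profiler. tracemalloc only counts allocations made through Python's allocator, so its peak is typically lower than the process memory reported by mprof, and tracing slows the code down, which makes the timing pessimistic. The constant and function names are made up for illustration.

```python
import time
import tracemalloc

LOCAL_TIME_LIMIT_S = 40.0  # 60 s limit minus the 20 s rule-of-thumb margin
MEMORY_LIMIT_MB = 500.0


def measure(func, *args, **kwargs):
    """Run func once; return (result, elapsed seconds, peak traced MB)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory returns (current, peak) in bytes.
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak_bytes / 1e6


def within_budget(elapsed_s, peak_mb):
    """True if a run fits both the local time limit and the memory limit."""
    return elapsed_s <= LOCAL_TIME_LIMIT_S and peak_mb <= MEMORY_LIMIT_MB
```

Wrapping the prediction entry point in `measure` and passing the last two values to `within_budget` gives a quick pass/fail check before submitting.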

Contributors

maxhalford, raphaelsty, vaysserobin
