Ubiquant market prediction

This is the code used to run experiments for the Ubiquant market prediction competition.

The code is structured as follows:

  • ubiquant_market_prediction/: codebase
  • scripts/: scripts used to:
    • run cross-validation
    • convert csv data to different formats

Dependencies are managed with poetry. To install them, run poetry install.

Cross-validation experiments are tracked with mlflow, using an SQLite database (mlruns.db).
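
For reference, pointing mlflow at the SQLite backend looks roughly like this (the experiment name and logged values below are placeholders, not taken from the repository):

import mlflow

# Use the SQLite file as the tracking backend.
mlflow.set_tracking_uri("sqlite:///mlruns.db")
mlflow.set_experiment("ubiquant-cross-validation")  # placeholder experiment name

with mlflow.start_run():
    mlflow.log_param("model", "lightgbm")      # example parameter
    mlflow.log_metric("cv_correlation", 0.12)  # example metric value

The runs can then be browsed with, e.g., mlflow ui --backend-store-uri sqlite:///mlruns.db.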

Project structure

The following directory structure was used during the project:

├── data # raw data from the competition
├── data_pickle # data converted to pickle
├── data_tensors # data converted to (N,T,F) tensors
├── experiments # output of run_validation.py
├── mlruns.db # tracker db
├── notebooks # notebooks used for quick experiments, analysis, etc
└── ubiquant_market_prediction # this repository

Config

The run_validation.py script requires a yaml configuration file as input (see scripts/example_configs/ for some examples). The config file format is defined by the ValidationConfig class in config.py.
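
As a purely illustrative sketch of the pattern (the real field names are defined by ValidationConfig in config.py and are not reproduced here), such a yaml config could be loaded like this:

import yaml
from dataclasses import dataclass

@dataclass
class ValidationConfig:  # illustrative stand-in for the class in config.py
    data_path: str       # hypothetical field
    model: str           # hypothetical field
    n_splits: int        # hypothetical field

with open("scripts/example_configs/example.yaml") as f:  # hypothetical file name
    config = ValidationConfig(**yaml.safe_load(f))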

Assuming you have already created the necessary files using the convert_csv_to_pickle.py or convert_df_to_tensor.py scripts, you can run a cross-validation experiment with:

poetry run python run_validation.py --validation_config <path/to/your/config>

Final submission

The final submission was a weighted average of LightGBM, an LSTM, and a modified version of a public MLP-based solution. The weights were found with a grid search maximizing the cross-validation score.
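
A rough sketch of such a weight search (the out-of-fold predictions are synthetic placeholders, and a single Pearson correlation stands in for the cross-validation score):

import itertools
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=1000)  # placeholder target
oof = {name: y_true + rng.normal(scale=2.0, size=1000)  # placeholder out-of-fold predictions
       for name in ("lightgbm", "lstm", "mlp")}

def score(pred, target):
    # Stand-in for the cross-validation score (a single Pearson correlation here).
    return np.corrcoef(pred, target)[0, 1]

grid = np.arange(0.0, 1.01, 0.05)
best = (-np.inf, None)
for w_lgb, w_lstm in itertools.product(grid, grid):
    w_mlp = 1.0 - w_lgb - w_lstm
    if w_mlp < 0:
        continue  # keep the weights on the simplex
    blend = w_lgb * oof["lightgbm"] + w_lstm * oof["lstm"] + w_mlp * oof["mlp"]
    s = score(blend, y_true)
    if s > best[0]:
        best = (s, (w_lgb, w_lstm, w_mlp))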

In one of the two submissions, the LightGBM models were trained online using supplemental_train.csv.
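
One way to do this kind of continued training in LightGBM is to pass the existing booster as init_model; a minimal sketch with synthetic stand-ins for the original and supplemental data:

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X_old, y_old = rng.normal(size=(1000, 10)), rng.normal(size=1000)  # stand-in for the original training data
X_new, y_new = rng.normal(size=(200, 10)), rng.normal(size=200)    # stand-in for supplemental_train.csv

params = {"objective": "regression", "learning_rate": 0.05, "verbosity": -1}

# Booster trained offline on the original data.
booster = lgb.train(params, lgb.Dataset(X_old, label=y_old), num_boost_round=200)

# "Online" update: keep the existing trees and boost further on the newer data.
booster = lgb.train(params, lgb.Dataset(X_new, label=y_new),
                    num_boost_round=50, init_model=booster)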

LSTM hidden states were reset before inference on the test set due to the time gap. Inference was stateful, using a helper class to keep track of the states while looping over the Ubiquant API (see rnn_inference_helper.py).
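
The actual logic lives in rnn_inference_helper.py; the following is only a rough sketch of the idea, assuming a PyTorch LSTM and hidden states keyed by investment_id:

import torch

class StatefulLSTMInference:
    """Keeps per-investment hidden states across test-time iterations (illustrative)."""

    def __init__(self, model: torch.nn.LSTM):
        self.model = model
        self.states = {}  # investment_id -> (h, c)

    @torch.no_grad()
    def step(self, investment_id: int, features: torch.Tensor) -> torch.Tensor:
        # features: (seq_len=1, batch=1, n_features), one new time step for one investment.
        state = self.states.get(investment_id)    # None on the first occurrence -> zero state
        output, new_state = self.model(features, state)
        self.states[investment_id] = new_state    # carry the state to the next time_id
        return output

    def reset(self):
        # Called before test-set inference, since the time gap makes old states stale.
        self.states.clear()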

What worked:

  • engineering temporal features (mean/std across investment ids); see the first sketch after this list
  • ensembling (different models, training periods, features, random seeds)
  • using a correlation loss for neural networks; see the second sketch after this list
  • cropping the target
  • filling missing data with 0s in the LSTM (better than ignoring them in backprop)
  • bagging (LightGBM)
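
For the temporal features mentioned above, a minimal sketch (assuming the competition's time_id/investment_id layout and features named f_0 ... f_299):

import pandas as pd

def add_cross_sectional_stats(df: pd.DataFrame, feature_cols: list[str]) -> pd.DataFrame:
    # For each time_id, compute the mean/std of each feature across all investment_ids,
    # then merge those statistics back onto every row of that time_id.
    stats = df.groupby("time_id")[feature_cols].agg(["mean", "std"])
    stats.columns = [f"{col}_{stat}" for col, stat in stats.columns]
    return df.merge(stats.reset_index(), on="time_id", how="left")

# Example usage with the competition feature names:
# df = add_cross_sectional_stats(df, [f"f_{i}" for i in range(300)])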
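
And for the correlation loss, a common formulation (a sketch, not necessarily the exact loss used here):

import torch

def correlation_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Negative Pearson correlation between predictions and targets within a batch;
    # minimizing it maximizes the correlation.
    pred = pred - pred.mean()
    target = target - target.mean()
    return -(pred * target).sum() / (pred.norm() * target.norm() + 1e-8)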

What didn't work:

  • attention
  • embedding of investment_id
  • using time_id as a feature
  • quantile scaling of features
  • stateful start at inference time (LSTM)
  • feature bagging (LightGBM)
