HFT-price-prediction

A project that uses tree-based machine learning models to predict whether an instrument's price will move up or down in high-frequency trading.

Project Background

A hands-on data science exercise from a high-frequency trading company.

Task

Build a model from the given data to predict whether the trading price will go up or down in the near future (a classification problem).

Data Explanation

Feature Columns

timestamp str, datetime string.
bid_price float, price of current bid in the market.
bid_qty float, quantity currently available at the bid price.
ask_price float, price of current ask in the market.
ask_qty float, quantity currently available at the ask price.
trade_price float, last traded price.
sum_trade_1s float, sum of quantity traded over the last second.
bid_advance_time float, seconds since bid price last advanced.
ask_advance_time float, seconds since ask price last advanced.
last_trade_time float, seconds since last trade.
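As a minimal sketch of how rows with these columns could be read, the snippet below parses CSV text into typed records using only the standard library. The `Quote` class, the helper names, and the sample rows are made up for illustration; the ask-side price column is taken to be `ask_price`, and the timestamp is kept as a string because its exact format is not specified.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

FLOAT_COLUMNS = [
    "bid_price", "bid_qty", "ask_price", "ask_qty", "trade_price",
    "sum_trade_1s", "bid_advance_time", "ask_advance_time", "last_trade_time",
]

@dataclass
class Quote:  # hypothetical record type, not part of the repository
    timestamp: str  # left as text; the datetime format is not documented
    bid_price: float
    bid_qty: float
    ask_price: float
    ask_qty: float
    trade_price: float
    sum_trade_1s: Optional[float]  # may be missing in the raw data
    bid_advance_time: float
    ask_advance_time: float
    last_trade_time: Optional[float]  # may be missing in the raw data

def to_float(text):
    """Convert a CSV cell to float, mapping empty cells to None."""
    text = text.strip()
    return float(text) if text else None

def read_quotes(handle):
    """Read every CSV row from an open file-like object into Quote records."""
    reader = csv.DictReader(handle)
    return [
        Quote(timestamp=row["timestamp"],
              **{name: to_float(row[name]) for name in FLOAT_COLUMNS})
        for row in reader
    ]

# Made-up sample rows, only to show the expected shape of the input.
SAMPLE = """timestamp,bid_price,bid_qty,ask_price,ask_qty,trade_price,sum_trade_1s,bid_advance_time,ask_advance_time,last_trade_time
2019-01-01 09:00:00.100,100.0,5,100.5,3,100.5,2,0.8,1.2,0.3
2019-01-01 09:00:00.600,100.0,4,100.5,3,100.5,,1.3,1.7,
"""

quotes = read_quotes(io.StringIO(SAMPLE))
print(quotes[0].ask_price, quotes[1].sum_trade_1s)
```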

Labels

_1s_side int
_3s_side int
_5s_side int
Each label indicates the type of the first event that will happen in the next x seconds, where:
0 -- No price change.
1 -- Bid price decreased.
2 -- Ask price increased.
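To make the label definition concrete, here is one possible way such a label could be derived from a time-ordered quote series. This is a sketch under assumptions that the source does not state: times are numeric seconds, prices are compared against the reference row, and a bid decrease is checked before an ask increase when both occur on the same row. The function name is hypothetical.

```python
def first_event_label(times, bids, asks, i, horizon):
    """Return 0, 1 or 2 for row i, looking at most `horizon` seconds ahead."""
    for j in range(i + 1, len(times)):
        if times[j] - times[i] > horizon:
            break  # later rows are outside the look-ahead window
        if bids[j] < bids[i]:
            return 1  # bid price decreased first
        if asks[j] > asks[i]:
            return 2  # ask price increased first
    return 0  # no price change within the horizon

# Made-up quotes: the bid drops at t=0.9 and the ask rises at t=3.2.
times = [0.0, 0.4, 0.9, 2.5, 3.2]
bids = [100.0, 100.0, 99.5, 99.5, 99.5]
asks = [100.5, 100.5, 100.5, 100.5, 101.0]

labels_1s = [first_event_label(times, bids, asks, i, 1.0) for i in range(len(times))]
labels_3s = [first_event_label(times, bids, asks, i, 3.0) for i in range(len(times))]
print(labels_1s)  # [1, 1, 0, 2, 0]
print(labels_3s)  # [1, 1, 2, 2, 0]
```

Because the labels look into the future, they are only available for historical data; a trained model is what supplies them for new rows.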

Process

Preprocessing

data type conversion: preprocessing()
data check: check_null()
missing value handling: fill_null(). The null check shows that most sum_trade_1s nulls occur when last_trade_time is larger than 1 second, in which case sum_trade_1s should be 0, so we assume that every null sum_trade_1s can be filled with 0. Under the same assumption, a null last_trade_time is filled with the previous record's last_trade_time plus the time elapsed since that record, provided the interval between the two records is shorter than 1 second.
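The filling rules described above can be sketched in plain Python as follows. This is not the repository's fill_null(); it assumes each row is a dict with a numeric time `t` in seconds (a hypothetical field), uses None for missing values, and leaves last_trade_time unfilled when the gap to the previous record is 1 second or more, since that case is not specified.

```python
def fill_null(rows):
    """Fill missing sum_trade_1s and last_trade_time in time-ordered rows."""
    filled = []
    prev = None
    for row in rows:
        row = dict(row)  # copy so the caller's data is not modified
        if row["sum_trade_1s"] is None:
            row["sum_trade_1s"] = 0.0  # assumption: no trade in the last second
        if row["last_trade_time"] is None and prev is not None:
            gap = row["t"] - prev["t"]
            if gap < 1.0 and prev["last_trade_time"] is not None:
                # carry the previous value forward by the elapsed time
                row["last_trade_time"] = prev["last_trade_time"] + gap
        filled.append(row)
        prev = row
    return filled

# Made-up rows to illustrate both rules.
rows = [
    {"t": 0.0, "sum_trade_1s": 5.0, "last_trade_time": 0.25},
    {"t": 0.5, "sum_trade_1s": None, "last_trade_time": None},
    {"t": 2.5, "sum_trade_1s": None, "last_trade_time": None},
]
filled = fill_null(rows)
print(filled[1])  # sum_trade_1s -> 0.0, last_trade_time -> 0.75
print(filled[2])  # sum_trade_1s -> 0.0, last_trade_time stays None (gap is 2 s)
```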

Feature Engineering

correlation filter: correlation_filter.filter(), remove columns that are highly correlated to reduce data redundancy.
logical feature engineering: feature_eng.basic_features(), build up some features based on trading logic.
time-rolling feature engineering: feature_eng.lag_rolling_features(), build up features by lagging and rolling of time-series.
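The correlation filter and the lag and rolling features can be illustrated with a small dependency-free sketch. The 0.95 threshold, the greedy keep-first strategy, and the row-count (rather than time-based) rolling window are assumptions made for this example; the repository's own functions may work differently.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return cov / sqrt(vx * vy)

def correlation_filter(columns, threshold=0.95):
    """Keep a column only if |corr| with every kept column is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

def lag(values, k):
    """Shift a series by k rows; the first k rows have no history."""
    return ([None] * k + list(values))[: len(values)]

def rolling_mean(values, window):
    """Mean over the current row and the previous window - 1 rows."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

# Made-up data: ask_price moves in lockstep with bid_price, so it is dropped.
columns = {
    "bid_price": [100.0, 100.5, 101.0, 100.5],
    "ask_price": [100.5, 101.0, 101.5, 101.0],
    "bid_qty": [5.0, 3.0, 8.0, 2.0],
}
kept = correlation_filter(columns)
print(kept)                              # ['bid_price', 'bid_qty']
print(lag(columns["bid_qty"], 1))        # [None, 5.0, 3.0, 8.0]
print(rolling_mean(columns["bid_qty"], 2))  # [None, 4.0, 5.5, 5.0]
```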

Feature Selection

feature_selection.select(): a hybrid approach that combines genetic algorithm selection with feature importance selection.
genetic algorithm selection: feature_selection.GA_features()
feature importance selection: feature_selection.rf_imp_features()
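The sketch below shows the general shape of such a hybrid selection. It is illustrative only: the fitness function and importance scores are made-up numbers standing in for a real model evaluation, and combining the two selections by taking their union is an assumption, since the source does not say how they are merged. All function names here are hypothetical.

```python
import random

def ga_select(n_features, fitness, generations=20, pop_size=12,
              mutation_rate=0.1, seed=0):
    """Evolve boolean feature masks and return the fittest one found."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]  # random bit-flip mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

def top_k(importances, k):
    """Names of the k features with the highest importance score."""
    return sorted(importances, key=importances.get, reverse=True)[:k]

FEATURES = ["bid_qty", "ask_qty", "sum_trade_1s",
            "bid_advance_time", "ask_advance_time", "last_trade_time"]
# Made-up scores; in practice these would come from cross-validated models.
TOY_GAIN = {"bid_qty": 0.30, "ask_qty": 0.28, "sum_trade_1s": 0.10,
            "bid_advance_time": 0.02, "ask_advance_time": 0.01,
            "last_trade_time": 0.05}
COST = 0.04  # made-up penalty per selected feature

def toy_fitness(mask):
    gain = sum(TOY_GAIN[name] for name, keep in zip(FEATURES, mask) if keep)
    return gain - COST * sum(mask)

best_mask = ga_select(len(FEATURES), toy_fitness)
ga_features = [name for name, keep in zip(FEATURES, best_mask) if keep]
importance_features = top_k(TOY_GAIN, 3)
selected = sorted(set(ga_features) | set(importance_features))  # assumed: union
print(selected)
```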

Modelling

An ensemble of LightGBM and random forest models.
random forest: model.random_forest()
LightGBM: model.lightgbm()
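One common way to ensemble two classifiers is to average their predicted class probabilities and pick the most likely class. The source does not say how the two models are combined, so the equal-weight averaging below is an assumption, and the probability values are made up rather than produced by real models.

```python
def average_probabilities(prob_a, prob_b, weight_a=0.5):
    """Blend two models' per-class probabilities row by row."""
    blended = []
    for row_a, row_b in zip(prob_a, prob_b):
        blended.append([weight_a * pa + (1.0 - weight_a) * pb
                        for pa, pb in zip(row_a, row_b)])
    return blended

def predict_labels(probabilities):
    """Index of the highest probability in each row (0, 1 or 2 here)."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]

# Made-up probabilities for three rows and the three label classes.
rf_prob = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]
lgbm_prob = [[0.5, 0.2, 0.3], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]

ensemble_labels = predict_labels(average_probabilities(rf_prob, lgbm_prob))
print(ensemble_labels)  # [0, 2, 1]
```

In the second and third rows the two models disagree, and the blended probabilities decide which class wins.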

Parameter Tuning

Grid search or genetic search is chosen for tuning the LightGBM model's parameters, depending on the search space.
grid search: model.GS_tune_lgbm()
genetic search: model.GA_tune_lgbm()
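A plausible reading of this rule is that small search spaces are enumerated exhaustively while large ones are handed to a genetic search. The sketch below shows that decision; the `max_grid` threshold of 200 combinations and the helper names are hypothetical, and the parameter values are examples only.

```python
from itertools import product
from math import prod

def grid_size(space):
    """Number of parameter combinations in a discrete search space."""
    return prod(len(values) for values in space.values())

def choose_search(space, max_grid=200):
    """Pick 'grid' for small spaces and 'genetic' for large ones."""
    return "grid" if grid_size(space) <= max_grid else "genetic"

def grid_candidates(space):
    """Yield every parameter combination as a dict."""
    names = list(space)
    for combo in product(*(space[name] for name in names)):
        yield dict(zip(names, combo))

small_space = {"num_leaves": [15, 31, 63], "learning_rate": [0.05, 0.1]}
large_space = {
    "num_leaves": [15, 31, 63, 127, 255],
    "learning_rate": [0.01, 0.03, 0.05, 0.1, 0.2],
    "n_estimators": [100, 200, 400, 800, 1600],
    "max_depth": [3, 5, 7, 9, -1],
    "min_child_samples": [5, 10, 20, 40, 80],
}

print(grid_size(small_space), choose_search(small_space))  # 6 grid
print(grid_size(large_space), choose_search(large_space))  # 3125 genetic
print(next(grid_candidates(small_space)))  # {'num_leaves': 15, 'learning_rate': 0.05}
```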

Performance

Out-of-sample classification accuracy is roughly 76-78%, which we consider acceptable for predicting short-term price movement.
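For reference, accuracy is the fraction of held-out rows whose predicted label matches the true label. The sketch below computes it and, as an optional point of comparison that the source does not report, the accuracy of always predicting the most frequent training label. All label values are made up.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [most_common] * len(y_test))

# Made-up labels, only to show the arithmetic.
y_train = [0, 0, 1, 2, 0, 1]
y_test = [0, 1, 0, 2, 0]
y_pred = [0, 1, 0, 0, 0]

print(accuracy(y_test, y_pred))            # 0.8
print(majority_baseline(y_train, y_test))  # 0.6
```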

hft-price-prediction's Issues

fit columns

I'm a little confused: to train the model, do I need the label values (_1s_side, _3s_side, _5s_side) filled in the CSV file?
Isn't that what the model proposes to answer?

Thanks for your time.
