
ndx-forecasing-using-lstm's Introduction

About the Project

Rolling LSTM modelling framework for stock data prediction using candlestick data, technical indicators, and a macroeconomic indicator.

Requirements & Run

  1. Install Python >= 3.9 and the latest pip, preferably using Miniconda.

  2. Install the required packages using pip: pip install -r requirements.txt

  3. Download the data and place it in data/raw. Alternatively, use the sample data already included in the repository.

  4. Run main.py


Description of modules

1. src.data.preprocessing

Input: raw datasets (csv): config[raw] -> data/raw

This module preprocesses and joins the datasets (OHLCV, Initial Claims (ICSA), and technical indicators)
and transforms the target variable.

Output: joined dataset (pkl, csv): config[prep][JoinedDfPkl] -> data/input/joined.pkl
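
The config[section][key] notation used throughout this README maps naturally onto Python's standard configparser module. The sketch below assumes that config.ini is read this way, which the README does not state. The [prep] keys and paths are taken from this document, while the inline SAMPLE string is only a hypothetical stand-in for the real file.

```python
import configparser

# Hypothetical excerpt of config.ini. Only these [prep] keys and paths
# are mentioned in this README; the real file likely contains more.
SAMPLE = """
[prep]
JoinedDfPkl = data/input/joined.pkl
WindowSplitDict = data/input/window_split.pkl
PredictionsArray = data/output/latest_preds.pkl
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)  # a real run would more likely call config.read("config.ini")

# Option lookups are case-insensitive by default, so the mixed-case key works.
joined_path = config["prep"]["JoinedDfPkl"]
window_split_path = config["prep"]["WindowSplitDict"]
print(joined_path)
print(window_split_path)
```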

2. src.data.windowSplit

Input: joined dataset (pkl) config[prep][JoinedDfPkl] -> data/input/joined.pkl

This module splits the dataset into train and test windows.
There are three parameters to consider: lookback, train-window, and test-window.
First, the data is divided into train-test windows so that the training period of each window
advances over the test period of the previous window (see diagram #1 below).

[Diagram #1: Windows-Train-Test-split]
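
The walk-forward scheme described above can be sketched as index arithmetic. This is a minimal illustration, not the code in src.data.windowSplit: the function name walk_forward_bounds is hypothetical, it assumes each window advances by exactly one test period, and it ignores the lookback rows for simplicity.

```python
def walk_forward_bounds(n_obs, train, test):
    """Return (train_start, train_end, test_end) index triples.

    Train rows are [train_start, train_end) and test rows are
    [train_end, test_end). Each window starts `test` rows after the
    previous one, so the next training period slides forward over the
    previous test period.
    """
    bounds = []
    start = 0
    while start + train + test <= n_obs:
        bounds.append((start, start + train, start + train + test))
        start += test  # assumed step: one full test period
    return bounds


# Toy sizes for readability (the README defaults are train=504, test=126).
for window in walk_forward_bounds(n_obs=20, train=8, test=4):
    print(window)
```

With these toy sizes, the test periods of consecutive windows tile the data without overlap, while the training periods overlap heavily.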

In addition, each train and test period is itself processed with a rolling-window approach (see diagram #2 below).

[Diagram #2: Windows-Train-Test-split]

This approach uses a lookback period, which allows the model to train on small batches of recent data.
With the default config, the code should generate arrays with the following dimensions:
Train window (features, targets): (N, train, look_back, n_feat), (N, train, look_forward, n_targets)
Test window (features, targets): (N, test, look_back, n_feat), (N, test, look_forward, n_targets)
where:
- N                  = resulting number of train-test windows
- look_back          = look-back period for feature matrix in each train window
- look_forward       = how many days ahead the model should predict the target (default = 1) (target period in the diagram above)
- n_feat, n_targets  = number of features / targets in joined dataset
- train, test        = train, test periods

Default settings example:
Train window dimensions (features, targets): (70, 504, 63, 19), (70, 504, 1, 1)
Test window dimensions (features, targets): (70, 126, 63, 19), (70, 126, 1, 1)
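
To make the four-dimensional shapes concrete, here is a small NumPy sketch that builds arrays with the same layout from toy data. It is an illustration under stated assumptions rather than the module's actual implementation: the names rolling_samples and window_split are hypothetical, windows are assumed to advance by one test period, and each sample for day t is assumed to use the look_back rows before t as features and the look_forward rows from t onward as targets.

```python
import numpy as np


def rolling_samples(features, targets, first, last, look_back, look_forward):
    """Build one sample per prediction day t in [first, last)."""
    # Features: the look_back rows strictly before day t.
    x = np.stack([features[t - look_back:t] for t in range(first, last)])
    # Targets: look_forward rows starting at day t.
    y = np.stack([targets[t:t + look_forward] for t in range(first, last)])
    return x, y


def window_split(features, targets, train, test, look_back, look_forward=1):
    """Stack train/test samples for every walk-forward window."""
    parts = {"x_train": [], "y_train": [], "x_test": [], "y_test": []}
    start = look_back  # the first sample needs look_back rows of history
    while start + train + test + look_forward - 1 <= len(features):
        split = start + train
        x, y = rolling_samples(features, targets, start, split, look_back, look_forward)
        parts["x_train"].append(x)
        parts["y_train"].append(y)
        x, y = rolling_samples(features, targets, split, split + test, look_back, look_forward)
        parts["x_test"].append(x)
        parts["y_test"].append(y)
        start += test  # assumed step: one full test period
    return {name: np.stack(arrays) for name, arrays in parts.items()}


# Toy data: 30 rows, 3 features, 1 target.
features = np.arange(30 * 3, dtype=float).reshape(30, 3)
targets = np.arange(30, dtype=float).reshape(30, 1)
split = window_split(features, targets, train=8, test=4, look_back=5)
for name, array in split.items():
    print(name, array.shape)
```

The printed shapes follow the (N, train or test, look_back, n_feat) and (N, train or test, look_forward, n_targets) pattern described above, here with N = 4.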

Output: window split dictionary (pkl): config[prep][WindowSplitDict] -> data/input/window_split.pkl

3. src.model.modelFitPredict

Input: window split dictionary (pkl): config[prep][WindowSplitDict] -> data/input/window_split.pkl

The module uses a Keras Sequential model:
it builds the model, trains it on the window data, and generates predictions using hyperparameters from config.ini.

Output:

  • numpy array of predictions (pkl): config[prep][PredictionsArray] -> data/output/latest_preds.pkl
  • data-to-evaluate (csv, pkl): data/output/model_eval_data_<timestamp>.pkl
  • model configuration (json): reports/model_config_<timestamp>.json

3.1. src.model.performanceMetrics

Input: data-to-evaluate (pkl): data/output/model_eval_data_<timestamp>.pkl

Calculates the Equity Line and the following performance metrics:
- Annualized Return Ratio
- Annualized Standard Deviation
- Information Ratio
- Maximum Loss Duration
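
The README does not define these metrics, so the sketch below uses common textbook definitions as an assumption: a compounded annualized return, the standard deviation of daily returns scaled by the square root of 252, their ratio as the information ratio, and the longest stretch spent below a previous equity peak (in years) as the maximum loss duration. The function name, the dictionary keys, and the 252-day year are illustrative choices and may differ from src.model.performanceMetrics.

```python
import numpy as np

TRADING_DAYS = 252  # assumed number of trading days per year


def performance_metrics(equity):
    """Compute summary metrics from a daily equity line (assumed definitions)."""
    equity = np.asarray(equity, dtype=float)
    returns = equity[1:] / equity[:-1] - 1.0
    years = len(returns) / TRADING_DAYS

    # Annualized return: total growth compounded to a one-year horizon.
    arc = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    # Annualized standard deviation of daily returns.
    asd = returns.std(ddof=1) * np.sqrt(TRADING_DAYS)
    # Information ratio, taken here as return per unit of risk.
    ir = arc / asd if asd > 0 else float("nan")

    # Maximum loss duration: longest run of days below the running peak.
    running_peak = np.maximum.accumulate(equity)
    longest = current = 0
    for below in equity < running_peak:
        current = current + 1 if below else 0
        longest = max(longest, current)

    return {
        "annualized_return": arc,
        "annualized_std": asd,
        "information_ratio": ir,
        "max_loss_duration": longest / TRADING_DAYS,
    }


metrics = performance_metrics([100.0, 110.0, 105.0, 108.0, 121.0])
for name, value in metrics.items():
    print(name, value)
```

The five-point equity line is only toy input; on such a short series the annualized figures are not meaningful and serve just to show the arithmetic.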

Output:

  • Performance metrics dictionary (json): reports/performance_metrics_<timestamp>.json
  • Equity Line array (pkl): data/output/eq_line_<timestamp>.pkl

4. src.visualization.plotResults

Input:

  • data-to-evaluate (pkl): data/output/model_eval_data_<timestamp>.pkl
  • window split dictionary (pkl): config[prep][WindowSplitDict] -> data/input/window_split.pkl
  • model configuration (json): reports/model_config_<timestamp>.json
  • Performance metrics dictionary (json): reports/performance_metrics_<timestamp>.json
  • Equity Line array (pkl): data/output/eq_line_<timestamp>.pkl

Visualizes the results, including the model configuration, a comparison of real vs. predicted data, and the performance metrics.

Output:

  • Equity Line plot (png): reports/figures/equity_line_<timestamp>.png
  • Predictions histogram (png): reports/figures/predictions_histogram_<timestamp>.png

Remarks

Further improvements to be included:

  • Averaging the results from many runs (a random seed currently cannot be set because of the large number of stochastic processes)

  • Hyperparameter tuning between windows

  • Real-time approach

License

MIT License | Copyright (c) 2021 Jan Androsiuk

ndx-forecasing-using-lstm's People

Contributors

janandrosiuk, slaniewski
