The ndx-forecasing-using-lstm from slaniewski

About the Project

Rolling LSTM modelling framework for stock data prediction using candlestick data, technical indicators, and a macroeconomic indicator.

Requirements & Run

Install Python >= 3.9 and the latest pip, preferably using Miniconda.
Install the required packages using pip: pip install -r requirements.txt
Download the data and place it in data/raw. Alternatively, use the sample data already included in the repository.
Run main.py

Example financial time series data: NDX OHLCV candlestick data (Yahoo Finance)
Initial Claims time series data: Initial Claims (ICSA), Federal Reserve Bank of St. Louis

Description of modules

1. src.data.preprocessing

Input: raw datasets (csv): config[raw] -> data/raw

This module preprocesses and joins the datasets (OHLCV, Initial Claims (ICSA), and technical indicators) and transforms the target variable.
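
The join step can be illustrated with a short pandas sketch. It is not the repository's code: the function name, the column names (Date, Close, ICSA, SMA_5, Target), the 5-day moving average used as a stand-in technical indicator, and the log-return target transformation are all assumptions made for illustration. The sketch attaches the most recent weekly ICSA value to each daily OHLCV row.

```python
import numpy as np
import pandas as pd

def join_ohlcv_with_claims(ohlcv, icsa):
    """Attach the latest available weekly ICSA value to each daily OHLCV row."""
    joined = pd.merge_asof(
        ohlcv.sort_values("Date"),
        icsa.sort_values("Date"),
        on="Date",
        direction="backward",  # use the last ICSA observation on or before each date
    )
    joined["SMA_5"] = joined["Close"].rolling(5).mean()  # stand-in technical indicator
    joined["Target"] = np.log(joined["Close"]).diff()    # assumed target: daily log return
    return joined.dropna().reset_index(drop=True)

# Synthetic data only, to show the mechanics
ohlcv = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "Close": np.arange(100.0, 110.0),
})
icsa = pd.DataFrame({
    "Date": pd.date_range("2023-12-30", periods=3, freq="7D"),
    "ICSA": [200, 210, 220],
})
joined = join_ohlcv_with_claims(ohlcv, icsa)
```

The backward direction means a daily row never sees an ICSA value dated after it, which is the property a forecasting dataset needs from this kind of join.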

Output: joined dataset (pkl, csv) config[prep][JoinedDfPkl] -> data/input/joined.pkl
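
The config[section][key] -> path notation used throughout this description maps naturally onto Python's standard configparser. The sketch below assumes that config.ini is an INI file read this way; the excerpt is hypothetical and contains only the [prep] keys and paths named in this README.

```python
import configparser

# Hypothetical excerpt of config.ini; only keys and paths named in the README appear here
SAMPLE_CONFIG = """
[prep]
JoinedDfPkl = data/input/joined.pkl
WindowSplitDict = data/input/window_split.pkl
PredictionsArray = data/output/latest_preds.pkl
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)  # for a file on disk, config.read("config.ini")

joined_path = config["prep"]["JoinedDfPkl"]
window_split_path = config["prep"]["WindowSplitDict"]
```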

2. src.data.windowSplit

Input: joined dataset (pkl) config[prep][JoinedDfPkl] -> data/input/joined.pkl

This module splits the dataset into train and test windows.
Three parameters control the split: lookback, train-window, and test-window.
First, the data is divided into train-test windows so that the training period of each window shifts forward to cover the test period of the previous window (see diagram #1 below).

Within each train and test period, the data is also handled using a rolling-window approach (see diagram #2 below).

This approach uses a lookback period, which lets the model train on small batches of recent data.
With the default config, the code should generate arrays with the following dimensions:
Train window (features, targets): (N, train, look_back, n_feat), (N, train, look_forward, n_targets)
Test window (features, targets): (N, test, look_back, n_feat), (N, test, look_forward, n_targets)
where:
- N                  = resulting number of train-test windows
- look_back          = look-back period for feature matrix in each train window
- look_forward       = number of days ahead for which the model predicts the target (default = 1; the target period in the diagram above)
- n_feat, n_targets  = number of features / targets in joined dataset
- train, test        = train, test periods

Default settings example:
Train window dimensions (features, targets): (70, 504, 63, 19), (70, 504, 1, 1)
Test window dimensions (features, targets): (70, 126, 63, 19), (70, 126, 1, 1)
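
The two-level split described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the function name rolling_windows, its arguments, and the choice to advance each window by exactly one test period are hypothetical, and the toy sizes are chosen only to keep the arrays small.

```python
import numpy as np

def rolling_windows(features, targets, train, test, look_back, look_forward=1):
    """Split aligned (T, n_feat) and (T, n_targets) arrays into rolling train/test windows."""
    def make_samples(start, count):
        # Sample s uses rows [s, s + look_back) as features
        # and the following look_forward rows as targets.
        rows = range(start, start + count)
        x = np.stack([features[s:s + look_back] for s in rows])
        y = np.stack([targets[s + look_back:s + look_back + look_forward] for s in rows])
        return x, y

    span = train + test + look_back + look_forward - 1  # rows needed by one full window
    n_windows = (len(features) - span) // test + 1
    x_tr, y_tr, x_te, y_te = [], [], [], []
    for w in range(n_windows):
        start = w * test  # assumption: each window advances by one test period
        x, y = make_samples(start, train)
        x_tr.append(x)
        y_tr.append(y)
        x, y = make_samples(start + train, test)
        x_te.append(x)
        y_te.append(y)
    return tuple(np.stack(a) for a in (x_tr, y_tr, x_te, y_te))

# Toy data: 40 rows, 2 features, 1 target
T, n_feat = 40, 2
features = np.arange(T * n_feat, dtype=float).reshape(T, n_feat)
targets = np.arange(T, dtype=float).reshape(T, 1)
x_train, y_train, x_test, y_test = rolling_windows(
    features, targets, train=8, test=4, look_back=3
)
```

With these toy sizes the result is 7 windows, with train arrays of shape (7, 8, 3, 2) and (7, 8, 1, 1) and test arrays of shape (7, 4, 3, 2) and (7, 4, 1, 1), which mirrors the layout of the default-settings example.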

Output: window split dictionary (pkl): config[prep][WindowSplitDict] -> data/input/window_split.pkl

3. src.model.modelFitPredict

Input: window split dictionary (pkl): config[prep][WindowSplitDict] -> data/input/window_split.pkl

This module uses a Keras Sequential model: it builds the model, trains it on the window data, and generates predictions using hyperparameters from config.ini.
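
A rolling fit-and-predict loop over such windows might look like the sketch below. The layer stack, the hyperparameter names and defaults, and the decision to train a fresh model for every window are assumptions for illustration; the repository reads its actual hyperparameters from config.ini, and its architecture may differ.

```python
import numpy as np
from tensorflow import keras

def build_model(look_back, n_feat, units=16, dropout=0.1, learning_rate=1e-3):
    """Small LSTM regressor; hyperparameter names and defaults are illustrative."""
    model = keras.Sequential([
        keras.Input(shape=(look_back, n_feat)),
        keras.layers.LSTM(units),
        keras.layers.Dropout(dropout),
        keras.layers.Dense(1),  # one-step-ahead target (look_forward = 1, n_targets = 1)
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate), loss="mse"
    )
    return model

def fit_predict(x_train, y_train, x_test, epochs=1, batch_size=32):
    """Train a fresh model on each window and predict that window's test period."""
    n_windows, _, look_back, n_feat = x_train.shape
    preds = []
    for w in range(n_windows):
        model = build_model(look_back, n_feat)
        y = y_train[w].reshape(len(y_train[w]), -1)  # (train, 1, 1) -> (train, 1)
        model.fit(x_train[w], y, epochs=epochs, batch_size=batch_size, verbose=0)
        preds.append(model.predict(x_test[w], verbose=0))
    return np.stack(preds)  # shape: (N, test, 1)

# Tiny random arrays in the window layout, only to exercise the shapes
rng = np.random.default_rng(0)
x_tr = rng.normal(size=(2, 12, 5, 3)).astype("float32")
y_tr = rng.normal(size=(2, 12, 1, 1)).astype("float32")
x_te = rng.normal(size=(2, 4, 5, 3)).astype("float32")
predictions = fit_predict(x_tr, y_tr, x_te)
```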

Output:

numpy array of predictions (pkl): config[prep][PredictionsArray] -> data/output/latest_preds.pkl
data-to-evaluate (csv, pkl): data/output/model_eval_data_<timestamp>.pkl
model configuration (json): reports/model_config_<timestamp>.json

3.1. src.model.performanceMetrics

Input: data-to-evaluate (pkl): data/output/model_eval_data_<timestamp>.pkl

Calculates the Equity Line and the following performance metrics:
- Annualized Return Ratio, 
- Annualized Standard Deviation, 
- Information Ratio, 
- Maximum Loss Duration
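
The README does not give formulas for these metrics, so the sketch below uses common conventions that should be read as assumptions: 252 trading days per year, the Information Ratio taken as annualized return divided by annualized standard deviation, and Maximum Loss Duration measured as the longest run of days spent below a previous equity peak, expressed in years. The function name and dictionary keys are hypothetical.

```python
import math

TRADING_DAYS = 252  # assumed number of trading days per year

def performance_metrics(equity):
    """Compute illustrative metrics from a daily equity line (sequence of positive floats)."""
    returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    n = len(returns)

    # Annualized return, compounded from the total return over n days
    arc = (equity[-1] / equity[0]) ** (TRADING_DAYS / n) - 1.0

    # Annualized standard deviation of daily returns (sample variance)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    asd = math.sqrt(variance) * math.sqrt(TRADING_DAYS)

    # Information ratio, assumed here to be return per unit of risk
    ir = arc / asd if asd > 0 else float("nan")

    # Maximum loss duration: longest run of days below a previous peak, in years
    peak, run, longest = equity[0], 0, 0
    for value in equity[1:]:
        if value >= peak:
            peak, run = value, 0
        else:
            run += 1
            longest = max(longest, run)

    return {"ARC": arc, "ASD": asd, "IR": ir, "MLD": longest / TRADING_DAYS}

# Toy equity line: two days are spent below the peak of 110 before a new high
equity = [100.0, 110.0, 105.0, 108.0, 112.0]
metrics = performance_metrics(equity)
```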

Output:

Performance metrics dictionary (json): reports/performance_metrics_<timestamp>.json
Equity Line array (pkl): data/output/eq_line_<timestamp>.pkl

4. src.visualization.plotResults

Input:

data-to-evaluate (pkl): data/output/model_eval_data_<timestamp>.pkl
window split dictionary (pkl): config[prep][WindowSplitDict] -> data/input/window_split.pkl
model configuration (json): reports/model_config_<timestamp>.json
Performance metrics dictionary (json): reports/performance_metrics_<timestamp>.json
Equity Line array (pkl): data/output/eq_line_<timestamp>.pkl

Visualizes the results, including the model configuration, a comparison of real vs. predicted data, and the performance metrics.

Output:

Equity Line plot (png): reports/figures/equity_line_<timestamp>.png
Predictions histogram (png): reports/figures/predictions_histogram_<timestamp>.png

Remarks

Further improvements to be included:

Averaging the results over many runs (a random seed cannot currently be set because of the large number of stochastic processes)
Hyperparameter tuning between windows
Real-time approach
