mjuchli / ctc-executioner

Master Thesis: Limit order placement with Reinforcement Learning

Python 1.22% Jupyter Notebook 95.93% JavaScript 0.02% TypeScript 0.04% TeX 2.78%

openai-gym openai-gym-environment openai-gym-agents execution-strategy reinforcement-learning order-placement limit-order-book match-engine dqn q-learning

ctc-executioner's Introduction

Order placement with Reinforcement Learning

CTC-Executioner is a tool that provides an on-demand execution/placement strategy for limit orders on cryptocurrency markets using Reinforcement Learning techniques. The underlying framework provides functionality for analysing order book data and deriving features from it. Those findings can then be used to dynamically update the decision-making process of the execution strategy.

The methods used are based on a research project (master thesis) currently being conducted at TU Delft.

Documentation

Comprehensive documentation and the underlying concepts are explained in the academic report.

For hands-on documentation and examples, see the Wiki.

Usage

Load orderbooks

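from ctc_executioner.orderbook import Orderbook

# Training data: build the order book from recorded events
orderbook = Orderbook()
orderbook.loadFromEvents('data/example-ob-train.tsv')
orderbook.summary()
orderbook.plot(show_bidask=True)

# Test data: a separate event file used for evaluation
orderbook_test = Orderbook()
orderbook_test.loadFromEvents('data/example-ob-test.tsv')
orderbook_test.summary()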

Create and configure environments

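import gym
import gym_ctc_executioner  # makes the "ctc-executioner-v0" id available to gym.make

# Training environment
env = gym.make("ctc-executioner-v0")
env.setOrderbook(orderbook)

# Test environment, backed by the test order book
env_test = gym.make("ctc-executioner-v0")
env_test.setOrderbook(orderbook_test)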

ctc-executioner's Issues

[Framework] Support different features in Environment

Allow various features to be defined in order to support Q-Learning and different DQN approaches.

[RL] Improve reward function

Instead of (p_0 - vwap_t), compare against p_0 - (max([p_0; p_t]) + min([p_0; p_t])) / 2, normalized between -1 and 1. This yields a stable reward for any kind of fluctuation.
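To make the proposed reward concrete, here is a minimal sketch. It assumes that [p_0; p_t] denotes the prices observed from the start of the execution up to step t, and that the normalization to [-1, 1] divides by half of the observed range; neither detail is specified in the issue, and midrange_reward is a hypothetical name.

```python
import numpy as np


def midrange_reward(prices):
    """Sketch of the proposed reward p_0 - (max + min) / 2, scaled to [-1, 1].

    prices: the series p_0, ..., p_t observed so far (assumed reading of
    [p_0; p_t]). Dividing by half the observed range is an assumption; it maps
    p_0 == max to +1 and p_0 == min to -1.
    """
    prices = np.asarray(prices, dtype=float)
    p0 = prices[0]
    hi, lo = prices.max(), prices.min()
    half_range = (hi - lo) / 2.0
    if half_range == 0.0:
        # Flat prices: no fluctuation, so return a neutral reward.
        return 0.0
    return float((p0 - (hi + lo) / 2.0) / half_range)
```

Because the result is scaled by the observed range, midrange_reward([100.0, 90.0]) and midrange_reward([100.0, 50.0]) both return 1.0, which illustrates the intended stability across fluctuations of different size.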

[Framework|RL] Introduce non-linearity and noise in artificial data set

Currently the artificial data set configurator only allows linear price trends (up/down/flat) to be created.
We should also be able to:

1. simulate a price trend according to some function f (e.g. sin, or a step function), and
2. introduce some sort of noise.

With that we can show that the model is capable of approximating more complex functions (1) and can generalize to the main pattern despite the noise (2).
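Both extensions can be sketched independently of the framework. The function below is hypothetical and not the configurator's API; it reuses the (p0, s, samples) signature of the 'priceFunction' config entry that appears in agent_keras_rl.py further down, and adds zero-mean Gaussian noise, which is only one possible choice of noise.

```python
import numpy as np


def artificial_prices(p0, samples, price_function, noise_std=0.0, seed=None):
    """Price series from a trend function f(p0, s, samples) plus optional Gaussian noise."""
    trend = np.array([price_function(p0, s, samples) for s in range(samples)], dtype=float)
    if noise_std > 0:
        rng = np.random.default_rng(seed)
        trend = trend + rng.normal(0.0, noise_std, size=samples)
    return trend


def sine(p0, s, samples):
    # (1) non-linear trend: the sine used in agent_keras_rl.py
    return p0 + 10 * np.sin(2 * np.pi * 10 * (s / samples))


def step(p0, s, samples):
    # (1) non-linear trend: price jumps by 10 halfway through
    return p0 + (10 if s >= samples // 2 else 0)


clean = artificial_prices(10000.0, 1000, sine)
# (2) the same trend with noise on top
noisy = artificial_prices(10000.0, 1000, sine, noise_std=2.0, seed=42)
stepped = artificial_prices(10000.0, 10, step)
```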

[Admin] Meeting on 14.03.18

Agenda

Research questions (material A, B, C):

rule out possible pitfalls and/or ambiguities
make sure we raise important/valid questions
be able to (re)define the questions precisely

Process/Pipeline (material E)

discuss the process so that I can work in a targeted way over the coming months
and make sure it covers the defined questions

Patterns in orders (material D)

discuss an attempt to classify the impact on order execution based on past order/trade behaviour

Schedule talk

Material

A) Understand the basic statistical definitions of the order execution problem: http://discovery.ucl.ac.uk/1359852/1/Chaiyakorn%20Yingsaeree%20-%20Thesis.pdf

Pages 179-180: 7.3 Framework for an order placement strategy
Pages 185-187: 7.4.3 Empirical unconditional model using density estimation
(I think these pages clearly define the difficulties of order execution and are therefore a good starting point to remind ourselves what the context is really about.)

B) Roughly be able to follow the demonstrated order executions on the artificial order books:
Section "Expected Execution on artificial prices" in https://github.com/backender/ctc-executioner/blob/master/notebooks/order_execution_behaviour.ipynb
e.g. understand that

BUY orders can be placed lower when the price falls
BUY orders must be placed higher when the price rises
SELL orders can be placed higher when the price rises
SELL orders must be placed lower when the price falls
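The four rules above follow from a simple fill condition. The sketch below assumes a limit order executes as soon as the price path touches its limit, ignoring queue position, volume and partial fills; limit_order_fills is a hypothetical helper, not part of the framework.

```python
def limit_order_fills(side, limit_price, prices):
    """Simplified check whether a limit order would execute along a price path.

    A BUY fills once the price reaches or drops below the limit; a SELL fills
    once the price reaches or rises above it.
    """
    if side == "BUY":
        return min(prices) <= limit_price
    if side == "SELL":
        return max(prices) >= limit_price
    raise ValueError("side must be 'BUY' or 'SELL'")


falling = [100, 98, 96, 94]
rising = [100, 102, 104, 106]

limit_order_fills("BUY", 95, falling)   # True: a low BUY fills when the price falls
limit_order_fills("BUY", 95, rising)    # False: when the price rises, the BUY must be placed higher
```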

C) Roughly be able to follow my RL approach demonstrated in https://github.com/backender/ctc-executioner/blob/master/notebooks/analysis_average_price.ipynb
e.g. understand that I segmented time and inventory in order to cancel and resubmit a LIMIT order with the unexecuted inventory at another price level (or submit a MARKET order once the time is used up)
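The cancel-and-resubmit scheme can be sketched as a loop over time steps. choose_level and limit_fill are hypothetical callbacks standing in for the agent's decision and the match engine; discretizing the inventory into state levels is left out for brevity.

```python
def execute_segmented(inventory, time_steps, choose_level, limit_fill):
    """Sketch of an execution split into time steps with resubmission.

    choose_level(t, inventory) -> price level chosen by the agent (hypothetical).
    limit_fill(t, level, inventory) -> quantity the LIMIT order filled in step t (hypothetical).
    """
    orders = []
    for t in range(time_steps):
        if inventory <= 0:
            break
        level = choose_level(t, inventory)
        # Cancel what is left from the previous step and resubmit the
        # unexecuted inventory as a new LIMIT order at the chosen level.
        filled = min(inventory, limit_fill(t, level, inventory))
        inventory -= filled
        orders.append(("LIMIT", t, level, filled))
    if inventory > 0:
        # Time is used up: execute the remainder with a MARKET order.
        orders.append(("MARKET", time_steps, None, inventory))
    return orders
```

For example, if every LIMIT order fills half of what remains, an inventory of 1.0 over two steps ends with two LIMIT fills of 0.5 and 0.25 and a final MARKET order for 0.25.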

D) Have a look at the order behaviour demonstrated in https://github.com/backender/ctc-executioner/blob/master/notebooks/understanding_events.ipynb
This is not documented yet, but just note that

patterns evolve in the volume map of created or cancelled limit orders
big trades might accelerate price movement in the short term

E) Have a look at the draft of the research objectives (https://github.com/backender/ctc-executioner/wiki#research-objectives) as well as the optimization process with which the questions could be answered (https://github.com/backender/ctc-executioner/wiki/5.-Optimization#process).

[Framework] Inventory decreases after partial execution for each step

Instead of decreasing the inventory only at the step in which a partial fill occurred, the environment decreases it again at every subsequent step.
For example, here the inventory should remain at 0.9.
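The difference between the expected and the observed behaviour can be reproduced with two toy bookkeeping functions. The buggy variant is only a guess at how such a symptom can arise (re-subtracting the cumulative executed quantity at every step); it is not taken from the environment's code.

```python
def track_inventory(initial, fills_per_step):
    """Expected bookkeeping: each step's fill is subtracted exactly once."""
    inventory = initial
    history = []
    for fill in fills_per_step:
        inventory -= fill
        history.append(inventory)
    return history


def track_inventory_buggy(initial, fills_per_step):
    """Reproduces the symptom: earlier partial fills are subtracted again at every later step."""
    inventory = initial
    executed = 0.0
    history = []
    for fill in fills_per_step:
        executed += fill        # cumulative executed quantity
        inventory -= executed   # BUG: re-subtracts all previous fills
        history.append(inventory)
    return history
```

With a single partial fill of 0.1 followed by two steps without fills, track_inventory(1.0, [0.1, 0.0, 0.0]) stays at 0.9, while the buggy variant drifts to roughly [0.9, 0.8, 0.7].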

[Framework] Create adapter for real-time market data

[Framework] Rename action related classes

Classes previously named Action actually represent an Execution.
Therefore, all the related files and classes should be renamed.

Preferably, a class called ExecutionSet should be created that holds a list of Execution objects, filled over the course of an episode. This is low priority though; for now, updating orders works.
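A possible shape for the proposed ExecutionSet, as a minimal sketch; the fields of Execution (price, qty) and the helper methods are assumptions, since the issue does not specify them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Execution:
    """One (partial) execution; the fields are illustrative assumptions."""
    price: float
    qty: float


@dataclass
class ExecutionSet:
    """Collects the executions produced over the course of one episode."""
    executions: List[Execution] = field(default_factory=list)

    def add(self, execution: Execution) -> None:
        self.executions.append(execution)

    def total_qty(self) -> float:
        return sum(e.qty for e in self.executions)

    def average_price(self) -> float:
        """Volume-weighted average price over all executions."""
        qty = self.total_qty()
        if qty == 0:
            raise ValueError("no executed quantity")
        return sum(e.price * e.qty for e in self.executions) / qty
```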

len() not defined for object type None

Hi, very cool repo! I have an issue: inside orderbook.py I am trying to load my data from an order book CSV file with pre-existing historical data, and generateDict will not actually generate a dict because of the clause if len(self.dictBook) <= index, while self.dictBook is defined as None at the beginning of the code. The same issue happens later on in a different part of the ipynb file (the remake with an external order book). Is there any way to fix this? I tried replacing it with self.dictbook={} and self.dictbook=[]; neither fixes the issue.

The error always occurs in this part of the code:
~/ctc-executioner/ctc_executioner/orderbook.py in getDictState(self, index)
195 if len(self.dictBook) <= index:
196 raise Exception('Index out of orderbook state.')
--> 197 return self.dictBook[list(self.dictBook.keys())[index]]
198
199 def summary(self):
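Some background on the error: len(None) raises a TypeError, and initializing dictBook to {} or [] only replaces it with the 'Index out of orderbook state.' exception, because no states have been loaded into the structure. Note also that attribute names are case-sensitive: the traceback uses self.dictBook, so an assignment to self.dictbook creates a separate attribute. The stand-in class below is hypothetical (not the repository's Orderbook) and only shows an explicit guard that reports the missing load step.

```python
from collections import OrderedDict


class MinimalOrderbook:
    """Hypothetical stand-in illustrating a guard in getDictState."""

    def __init__(self):
        self.dictBook = None  # same initial value as described in the issue

    def load_states(self, states):
        # states: iterable of (timestamp, state) pairs, kept in insertion order
        self.dictBook = OrderedDict(states)

    def getDictState(self, index):
        if self.dictBook is None:
            raise RuntimeError('Order book states not loaded; populate dictBook first.')
        if len(self.dictBook) <= index:
            raise IndexError('Index out of orderbook state.')
        return self.dictBook[list(self.dictBook.keys())[index]]
```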

[Framework] Refactor

Once the academic part is finished, I'll do a code refactoring.
I think the separation between the RL environment and its components is okay.
However, at least the following needs to be done:

Orderbook: either make use of an existing solution as suggested in #1, or remove unnecessary loading functions and reduce the current complexity caused by using 3 different data structures (list of OrderbookStates, dictionary of OrderbookStates, and dictionary of trades).
MatchEngine: make use of a compatible open source match engine or introduce concurrency.
Consistent naming: #17

[Research] (Re)define research questions

Research questions have to be redefined precisely. Let's try to define the questions such that the experiments done so far contribute substantially towards answering them.

Furthermore, a high-level process pipeline should be laid out that not only serves as an overview of the capabilities of the final product but also helps me work in a targeted way towards answering the research questions and building a fully functional executioner.

[RL] Extend feature set

As a first step towards extending the features used during the learning process, incorporate:

Volume
Fluctuation

In a subsequent step, and in combination with #5, the aim is to train on:

bids and asks from the previous order book states
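A minimal sketch of the two first-step features plus the lookback stacking, assuming each order book state is available as (price, size) arrays per side. 'Fluctuation' is not defined in the issue; the standard deviation of recent mid prices is used here as a stand-in, and both function names are hypothetical.

```python
import numpy as np


def book_features(bids, asks, mid_history, depth=5):
    """Volume and fluctuation features for one order book state.

    bids, asks: arrays of shape (levels, 2) holding (price, size), best level first.
    mid_history: recent mid prices; their standard deviation stands in for 'fluctuation'.
    """
    bids = np.asarray(bids, dtype=float)
    asks = np.asarray(asks, dtype=float)
    bid_volume = bids[:depth, 1].sum()   # size resting on the best `depth` bid levels
    ask_volume = asks[:depth, 1].sum()   # size resting on the best `depth` ask levels
    fluctuation = np.std(mid_history)
    return np.array([bid_volume, ask_volume, fluctuation])


def stack_lookback(states, lookback):
    """Stack the last `lookback` (levels, 2) states into one (lookback, levels, 2) array."""
    return np.stack(states[-lookback:])
```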

[Analysis] Understanding order book event data

The goal is to get an understanding of order book event data. With that, I will hopefully get an intuition for which features (e.g. signals) might be extracted and fed to the reinforcement learner.

A good starting point might be to replicate: http://rickyhan.com/jekyll/update/2017/09/24/visualizing-order-book.html
An even more detailed analysis has been done here: http://parasec.net/transmission/order-book-visualisation/
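The volume-map visualizations linked above come down to aggregating event quantities on a time-by-price grid. A minimal sketch with numpy.histogram2d, assuming the events are already available as parallel arrays of timestamps, prices and quantities (volume_map is a hypothetical name); the resulting grid can then be drawn as a heat map, e.g. with matplotlib's imshow.

```python
import numpy as np


def volume_map(timestamps, prices, quantities, time_bins, price_bins):
    """Aggregate order book events into a (time x price) volume grid.

    Each event adds its quantity to the cell of its time bin and price bin.
    Returns the grid plus the bin edges along both axes.
    """
    grid, t_edges, p_edges = np.histogram2d(
        np.asarray(timestamps, dtype=float),
        np.asarray(prices, dtype=float),
        bins=[time_bins, price_bins],
        weights=np.asarray(quantities, dtype=float),
    )
    return grid, t_edges, p_edges
```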

[Framework] PyLimitBook as order book implementation

Use the existing implementation from https://github.com/danielktaylor/PyLimitBook and fix the resulting breaking changes in other components, including the match engine.

[Research] Quote cancellation

Was a quote cancelled and replaced at a different price level, or cancelled without replacement?
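One way to answer this from event data is a matching heuristic. The sketch below treats a cancellation as 'replaced' if a new order on the same side with the same quantity but a different price arrives within a short window; the event schema (dicts with 'time', 'side', 'price', 'qty'), the one-second window and the exact-quantity rule are all assumptions.

```python
def classify_cancellation(cancel, later_orders, window=1.0):
    """Heuristically label a cancellation as 'replaced' or 'cancelled'."""
    for order in later_orders:
        same_side = order["side"] == cancel["side"]
        in_window = 0.0 <= order["time"] - cancel["time"] <= window
        same_qty = abs(order["qty"] - cancel["qty"]) < 1e-12
        moved = order["price"] != cancel["price"]
        if same_side and in_window and same_qty and moved:
            return "replaced"   # re-quoted at a different price level
    return "cancelled"          # no matching re-quote found
```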

[Framework] Visualize execution

In order to analyze executions and the involved order placements in greater detail, I think it makes sense to visualize the actions taken by the learner.
An extended order book plot that shows order placements and the resulting rewards might help.

[Admin] Meeting on 15.01.18

Progress

Framework
- Orders (incl. Types)
- Order book
- Match Engine
- Action State
- Action
- Action Space
Reward functions tested with Q-Learning
- Cumulative Reward
- Profit on backtest

Next steps

Extend the Action State, e.g. with more features
Use policy gradient, e.g. recurrent neural nets
Imitation Learning

Questions

Feedback on reward function
Feedback on evaluation techniques
The agent currently learns from random timestamps in the order book. Would sequential training and testing make sense?
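The two evaluation schemes can be contrasted in a few lines. The sketch assumes order book states are addressed by integer index; a sequential split keeps every test state strictly after the training states, so the agent is never evaluated on a period it has already trained on. Both helpers are hypothetical.

```python
import numpy as np


def random_episode_starts(n_states, episode_length, n_episodes, seed=None):
    """Current scheme: episodes start at random indices anywhere in the book."""
    rng = np.random.default_rng(seed)
    # upper bound is exclusive, so every episode fits inside the book
    return rng.integers(0, n_states - episode_length + 1, size=n_episodes)


def sequential_split(n_states, train_fraction=0.8):
    """Alternative: train on the earlier states, test only on the later ones."""
    cut = int(n_states * train_fraction)
    return np.arange(cut), np.arange(cut, n_states)
```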

[Research] Define time horizon

Currently I am concerned about the time horizon of order executions.
I have come to realize that previous research, where it specifies one at all, focuses on various time horizons within which the orders are to be executed:

>=1 day: [1]
>=1 hour: [2, 4]
>5 minutes: -
>=1 second: [3]
<1 second (HFT): [5]

[1] Reinforcement Learning for Optimized Trade Execution
[2] Optimal Trade Execution: An Evolutionary Approach
[3] Multiple Kernel Learning on the Limit Order Book
[4] Modeling Stock Order Flows and Learning Market-Making from Data
[5] “Market making” in an order book model and its impact on the spread

The ambition of this project has always been to optimize execution on a seconds/minutes basis (e.g. 0-5 minutes), which simply evolved from my personal needs as an individual trader.
However, this intention has to be backed by evidence that other traders also wish to execute their orders within this time horizon.

[Framework] Refactor ActionSpace

Use OpenAI/gym or NervanaSystems/coach in order to have a standardized reinforcement learning environment.

[RL] Build deep RL model

The idea is to lay out a deep Q-learning (DQL) setup that allows training a neural network and then predicting the limit level for a given state.

https://ai.intel.com/demystifying-deep-reinforcement-learning/
https://medium.freecodecamp.org/deep-reinforcement-learning-where-to-start-291fb0058c01
http://karpathy.github.io/2016/05/31/rl/
http://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/
https://keon.io/deep-q-learning/
https://github.com/farizrahman4u/qlearning4k
https://keon.io/deep-q-learning/#Implementing-Mini-Deep-Q-Network-DQN
https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html (good math explanation)
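At the core of such a setup is the standard DQN regression target y = r + gamma * max_a' Q_target(s', a'), with the bootstrap term dropped for terminal transitions. The sketch below computes it for a batch in numpy, with each action index standing for one limit level; it illustrates the textbook update rather than the code of this repository or of keras-rl.

```python
import numpy as np


def dqn_targets(q_values, next_q_target, actions, rewards, dones, gamma=0.99):
    """Regression targets for one DQN training batch.

    q_values:      (batch, n_actions) online-network predictions Q(s, .)
    next_q_target: (batch, n_actions) target-network predictions Q'(s', .)
    actions:       (batch,) index of the limit level chosen in each transition
    rewards, dones: (batch,) reward and terminal flag (1.0 if episode ended)
    Only the entry of the taken action is changed; the others keep their
    current prediction, so they contribute zero error.
    """
    targets = np.asarray(q_values, dtype=float).copy()
    best_next = np.asarray(next_q_target, dtype=float).max(axis=1)
    not_done = 1.0 - np.asarray(dones, dtype=float)
    rows = np.arange(len(actions))
    targets[rows, actions] = np.asarray(rewards, dtype=float) + gamma * best_next * not_done
    return targets
```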

[Framework] Order book event data

Accumulate order book event data and create an import function that generates an Orderbook from it.

[agent_keras_rl.py] ValueError: Error when checking : expected reshape_1_input to have 6 dimensions, but got array with shape (1, 1, 51, 10, 2)

Hello Mr. Marc Juchli,

Good day.

When I execute python3 agent_keras_rl.py

agent_keras_rl.py source code:

import logging
import numpy as np

from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

from ctc_executioner.order_side import OrderSide
from ctc_executioner.orderbook import Orderbook
from ctc_executioner.agent_utils.action_plot_callback import ActionPlotCallback
from ctc_executioner.agent_utils.live_plot_callback import LivePlotCallback

from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, LSTM, Reshape
from keras.optimizers import Adam, SGD
from keras import regularizers
from keras import optimizers
from collections import deque
import gym

#logging.basicConfig(level=logging.INFO)

from rl.callbacks import Callback
class EpsDecayCallback(Callback):
    def __init__(self, eps_poilcy, decay_rate=0.95):
        self.eps_poilcy = eps_poilcy
        self.decay_rate = decay_rate
    def on_episode_begin(self, episode, logs={}):
        self.eps_poilcy.eps *= self.decay_rate
        print('eps = %s' % self.eps_poilcy.eps)

def createModel():
    # Neural Net for Deep-Q learning Model
    model = Sequential()
    model.add(Reshape((env.observation_space.shape[0], env.observation_space.shape[1]*2), input_shape=(1, 1)+env.observation_space.shape))
    #model.add(Flatten(input_shape=(env.observation_space.shape[0], env.observation_space.shape[1], env.observation_space.shape[2])))
    model.add(LSTM(512, activation='tanh', recurrent_activation='tanh'))
    #model.add(Dense(4*env.bookSize*env.lookback))
    #model.add(Dense(env.bookSize*env.lookback))#, kernel_regularizer=regularizers.l2(0.01), activity_regularizer=regularizers.l1(0.01)))
    #model.add(Dense(4*env.bookSize))
    #model.add(Activation('relu'))
    #model.add(Flatten())
    model.add(Dense(len(env.levels)))
    model.add(Activation('linear'))
    #model.compile(optimizers.SGD(lr=.1), "mae")
    model.summary()
    return model

def loadModel(name):
    # load json and create model
    from keras.models import model_from_json
    json_file = open(name + '.json', 'r')
    loaded_model_json = json_file.read()
    json_file.close()
    model = model_from_json(loaded_model_json)
    model.load_weights(name + '.h5')
    print('Loaded model "' + name + '" from disk')
    return model

def saveModel(model, name):
    # serialize model to JSON
    model_json = model.to_json()
    with open(name + '.json', "w") as json_file:
        json_file.write(model_json)
    # serialize weights to HDF5
    model.save_weights(name + '.h5')
    print('Saved model "' + name + '" to disk')



# # Load orderbook
orderbook = Orderbook()
orderbook.loadFromEvents('data/events/ob-train.tsv')
orderbook_test = orderbook
orderbook.summary()

import datetime
orderbook = Orderbook()
config = {
    'startPrice': 10000.0,
    # 'endPrice': 9940.0,
    'priceFunction': lambda p0, s, samples: p0 + 10 * np.sin(2*np.pi*10 * (s/samples)),
    'levels': 50,
    'qtyPosition': 0.1,
    'startTime': datetime.datetime.now(),
    'duration': datetime.timedelta(seconds=1000),
    'interval': datetime.timedelta(seconds=1)
}
orderbook.createArtificial(config)
orderbook.summary()
#orderbook.plot(show_bidask=True)


import gym_ctc_executioner
env = gym.make("ctc-executioner-v0")
import gym_ctc_marketmaker
#env = gym.make("ctc-marketmaker-v0")
env.setOrderbook(orderbook)

#model = loadModel(name='model-sell-artificial-2')
#model = loadModel(name='model-sell-artificial-sine')
model = createModel()
nrTrain = 100000
nrTest = 10

policy = EpsGreedyQPolicy()
memory = SequentialMemory(limit=5000, window_length=1)
# nb_steps_warmup: the default value for that in the DQN OpenAI baselines implementation is 1000
dqn = DQNAgent(model=model, nb_actions=len(env.levels), memory=memory, nb_steps_warmup=100, target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

cbs_train = []
cbs_train = [LivePlotCallback(nb_episodes=20000, avgwindow=20)]
dqn.fit(env, nb_steps=nrTrain, visualize=True, verbose=2, callbacks=cbs_train)
saveModel(model=model, name='model-sell-artificial-sine')

#cbs_train = []
#cbs_test = []
#cbs_test = [ActionPlotCallback(nb_episodes=nrTest)]
#dqn.test(env, nb_episodes=nrTest, visualize=True, verbose=2, callbacks=cbs_test)

This leads to the following error message:

python3 agent_keras_rl.py 
Using TensorFlow backend.
Attempt to load from cache.
Order book in cache. Load...
Number of states: 4883
Duration: 5510.176
States per second: 0.8861785903027416
Change of price per second: 4.8450654909561335
Number of states: 1001
Duration: 1000.0
States per second: 1.001
Change of price per second: 0.39930722266504565
[2019-05-28 14:28:51,734] Making new env: ctc-executioner-v0
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
reshape_1 (Reshape)          (None, 51, 20)            0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 512)               1091584   
_________________________________________________________________
dense_1 (Dense)              (None, 101)               51813     
_________________________________________________________________
activation_1 (Activation)    (None, 101)               0         
=================================================================
Total params: 1,143,397
Trainable params: 1,143,397
Non-trainable params: 0
_________________________________________________________________
2019-05-28 14:28:52.824047: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
Training for 100000 steps ...
Traceback (most recent call last):
  File "agent_keras_rl.py", line 114, in <module>
    dqn.fit(env, nb_steps=nrTrain, visualize=True, verbose=2, callbacks=cbs_train)
  File "/home/dragon/quant/python/ctc-executioner/venv/lib/python3.6/site-packages/rl/core.py", line 160, in fit
    action = self.forward(observation)
  File "/home/dragon/quant/python/ctc-executioner/venv/lib/python3.6/site-packages/rl/agents/dqn.py", line 217, in forward
    q_values = self.compute_q_values(state)
  File "/home/dragon/quant/python/ctc-executioner/venv/lib/python3.6/site-packages/rl/agents/dqn.py", line 69, in compute_q_values
    q_values = self.compute_batch_q_values([state]).flatten()
  File "/home/dragon/quant/python/ctc-executioner/venv/lib/python3.6/site-packages/rl/agents/dqn.py", line 64, in compute_batch_q_values
    q_values = self.model.predict_on_batch(batch)
  File "/home/dragon/quant/python/ctc-executioner/venv/lib/python3.6/site-packages/keras/models.py", line 1039, in predict_on_batch
    return self.model.predict_on_batch(x)
  File "/home/dragon/quant/python/ctc-executioner/venv/lib/python3.6/site-packages/keras/engine/training.py", line 1946, in predict_on_batch
    self._feed_input_shapes)
  File "/home/dragon/quant/python/ctc-executioner/venv/lib/python3.6/site-packages/keras/engine/training.py", line 113, in _standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking : expected reshape_1_input to have 6 dimensions, but got array with shape (1, 1, 51, 10, 2)

I am using Ubuntu 18.04.2 x64 and Python version 3.6.7

Thanks
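A likely cause, judging from the traceback: keras-rl stacks window_length observations into one state and then adds a batch axis, so predict_on_batch receives shape (1, 1, 51, 10, 2). Keras also prepends a batch axis to the declared input_shape, so input_shape=(1, 1) + env.observation_space.shape makes it expect 6 dimensions. Declaring input_shape=(window_length,) + env.observation_space.shape, as in the keras-rl examples, should line the two up. The shape arithmetic is sketched below with numpy only (keras_rl_input_shape is a hypothetical helper); it has not been run against this repository.

```python
import numpy as np


def keras_rl_input_shape(observation_shape, window_length):
    """input_shape for the first layer of a keras-rl DQN model (batch axis excluded)."""
    return (window_length,) + tuple(observation_shape)


obs_shape = (51, 10, 2)  # env.observation_space.shape, per the error message
window_length = 1        # SequentialMemory(limit=5000, window_length=1)

original_input_shape = (1, 1) + obs_shape  # as declared in createModel()
fixed_input_shape = keras_rl_input_shape(obs_shape, window_length)

# The batch keras-rl passes to predict_on_batch: one state per batch.
batch = np.zeros((1,) + fixed_input_shape)  # shape (1, 1, 51, 10, 2)

# Keras prepends its own batch axis to input_shape, so it expects:
expected_ndim_original = 1 + len(original_input_shape)  # 6, hence the ValueError
expected_ndim_fixed = 1 + len(fixed_input_shape)        # 5, matches batch.ndim

# The Reshape target (51, 20) holds the same 1020 values per sample.
reshaped = batch.reshape((batch.shape[0], obs_shape[0], obs_shape[1] * obs_shape[2]))
```

In createModel() this would mean passing input_shape=(1,)+env.observation_space.shape to the Reshape layer; the Reshape target (51, 20) stays valid because both shapes hold 1020 values per sample.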