Comments (6)
@developeralgo8888, once good backtest generalisation results are obtained. Nevertheless, any ideas on developing a real-time trading interface are welcome.
from btgym.
Hi, @Kismuz ,
I am planning to take a deeper look into the GPS/imitation learning approach. Could you please share some of your experience with it? Does this direction show good potential for the trading task? Thank you so much!
from btgym.
@mysl,
My implementation of GPS is quite simple; the code can be found at btgym.research.gps and the notebook at examples/guided_a3c.ipynb.
First, there is a very simple Oracle class: for a backtesting training episode we know the data in advance, so we can estimate the optimal trading strategy. Doing that properly is somewhat complex, because we would need to solve an optimisation task taking into account all broker and account conditions. Instead, I implemented an 'advisor' that indicates whether it is time to buy, hold or sell. It uses a quite primitive algorithm: it just estimates local price peaks and emits a signal with some repetition. Those signals can be seen on the episode rendering chart if ExpertObserver is added to the strategy. So the Oracle scans the entire episode data just before the episode starts, estimates the advice and appends it to the observation step by step.
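A minimal sketch of such a peak-based advisor, for illustration only. It is not the code from btgym.research.gps: the function name `estimate_advice`, the `window` and `repeat` parameters and the string labels are all hypothetical, and the rule "buy at a local trough, sell at a local peak" is an assumption about how the signals are derived.

```python
from typing import List, Sequence

HOLD, BUY, SELL = "hold", "buy", "sell"  # hypothetical labels


def estimate_advice(prices: Sequence[float], window: int = 2, repeat: int = 2) -> List[str]:
    """Label every step of a fully known episode with buy/hold/sell advice.

    A step is treated as a local trough (buy) if its price is strictly below
    all prices within `window` steps on either side, and as a local peak
    (sell) if it is strictly above them. Each signal is repeated for
    `repeat` consecutive steps.
    """
    n = len(prices)
    advice = [HOLD] * n
    for i in range(window, n - window):
        neighbours = list(prices[i - window:i]) + list(prices[i + 1:i + window + 1])
        if prices[i] < min(neighbours):
            signal = BUY
        elif prices[i] > max(neighbours):
            signal = SELL
        else:
            continue
        # Emit the signal with some repetition, without running past the episode end.
        for j in range(i, min(i + repeat, n)):
            advice[j] = signal
    return advice


# The whole episode is scanned once, before it starts; the advice for step t
# would then be appended to the observation at step t.
episode_prices = [5, 4, 3, 2, 3, 4, 5, 6, 5, 4, 3]
episode_advice = estimate_advice(episode_prices)
```

With these prices the trough at step 3 yields two buy signals and the peak at step 7 yields two sell signals; every other step stays at hold.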
Next, we need to incorporate it into our loss. Oracle signals are encoded as action probabilities, so we can compare them against those emitted by the policy; I have found that it is sufficient to estimate the loss only on the buy and sell actions and omit the rest. We then simply add this loss to the base A3C loss, with some lambda weight that controls the strength the Oracle has over the algorithm.
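As a rough sketch, the combined loss described above could look like this in plain Python. The squared-error form of the comparison, the action order `(hold, buy, sell, close)` and the helper names are assumptions made for illustration; the actual loss in btgym.research.gps may differ.

```python
import math
from typing import List, Sequence

ACTIONS = ("hold", "buy", "sell", "close")  # assumed action order
GUIDED_ACTIONS = (1, 2)  # indices of buy and sell; the rest are omitted


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn policy logits into action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_loss(policy_logits: Sequence[float], expert_probs: Sequence[float]) -> float:
    """Mean squared error between policy and Oracle probabilities on buy/sell only."""
    pi = softmax(policy_logits)
    return sum((pi[k] - expert_probs[k]) ** 2 for k in GUIDED_ACTIONS) / len(GUIDED_ACTIONS)


def total_loss(base_loss: float, policy_logits: Sequence[float],
               expert_probs: Sequence[float], guided_lambda: float = 1.0) -> float:
    """Base A3C loss plus the guided term weighted by guided_lambda."""
    return base_loss + guided_lambda * guided_loss(policy_logits, expert_probs)


# An undecided policy (uniform logits) while the Oracle advises 'buy'.
example = total_loss(base_loss=1.0, policy_logits=[0.0, 0.0, 0.0, 0.0],
                     expert_probs=[0.0, 1.0, 0.0, 0.0], guided_lambda=2.0)
```

Setting `guided_lambda=0` recovers the plain base loss, so the same code path can be used with and without guidance.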
The trick here is to find a balance between guidance and actual learning.
This approach works and it works well.
Guided loss is especially beneficial at early stages of learning, when there is a danger of getting stuck at a local 'do nothing' solution. Guided loss effectively prevents that and almost doubles convergence speed.
It could be annealed to zero at later stages, as imperfect advice can prevent the policy from becoming optimal, especially with an advisor as primitive as mine.
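One possible way to anneal the guidance weight is a linear schedule, sketched below. The function `annealed_lambda` and its `anneal_steps` parameter are hypothetical and only illustrate the idea; nothing here is taken from the btgym code.

```python
def annealed_lambda(step: int, initial_lambda: float = 1.0, anneal_steps: int = 100_000) -> float:
    """Linearly decay the guidance weight from initial_lambda to zero.

    After `anneal_steps` training steps the weight stays at zero, so the
    policy is trained on the base loss alone.
    """
    if anneal_steps <= 0:
        return 0.0
    fraction_left = max(0.0, 1.0 - step / anneal_steps)
    return initial_lambda * fraction_left


# Weight at the start, halfway through and after the schedule has ended.
schedule = [annealed_lambda(s, initial_lambda=2.0, anneal_steps=100) for s in (0, 50, 100, 200)]
```

The returned value would be passed as the lambda weight of the guided term at each training step.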
I advise you to look at the code, as it is very simple, and to play with the notebook to get a feeling for the impact of GPS on training:
- try training on a bigger (one-year) dataset with guided_lambda=0 and see the gradients die and the policy do nothing; then set lambda to 1.0 - 5.0 and see the gradients remain consistent and the policy improve;
- on the synthetic sine wave dataset, set lower and higher lambdas and see how higher values can slow down convergence as the policy approaches optimal (this can be seen from how fast the episode length contracts: as the policy gets close to optimal, it can reach the target and terminate earlier, so the episode length should drop).
from btgym.
@Kismuz Thanks for the detailed explanation. I will take a look based on your advice. Does this approach help with the generalization issue you mentioned?
BTW, it looks like the recent ICLR 2018 best paper deals with nonstationary environments. Maybe that could be helpful in a trading context too?
from btgym.
https://arxiv.org/abs/1710.03641
from btgym.
One of my pillow-books now :)
from btgym.