
gym-anytrading's Introduction

gym-anytrading

AnyTrading is a collection of OpenAI Gym environments for reinforcement learning-based trading algorithms.

Trading algorithms are mostly implemented in two markets: FOREX and Stock. AnyTrading aims to provide Gym environments that improve and facilitate the procedure of developing and testing RL-based algorithms in this area. This purpose is achieved by implementing three Gym environments: TradingEnv, ForexEnv, and StocksEnv.

TradingEnv is an abstract environment which is defined to support all kinds of trading environments. ForexEnv and StocksEnv are simply two environments that inherit and extend TradingEnv. In the following sections, more explanations will be given about them, but before that, some environment properties should be discussed.

Note: For experts, it is recommended to check out the gym-mtsim project.

Installation

Via PIP

pip install gym-anytrading

From Repository

git clone https://github.com/AminHP/gym-anytrading
cd gym-anytrading
pip install -e .

## or

pip install --upgrade --no-deps --force-reinstall https://github.com/AminHP/gym-anytrading/archive/master.zip

Environment Properties

First of all, you can't simply expect an RL agent to do everything for you and just sit back in your chair in such complex trading markets! Things need to be simplified as much as possible so the agent can learn faster and more efficiently. In all trading algorithms, the first thing that should be done is to define actions and positions. In the two following subsections, I will explain these actions and positions and how to simplify them.

Trading Actions

If you search on the Internet for trading algorithms, you will find them using numerous actions such as Buy, Sell, Hold, Enter, Exit, etc. Referring to the first statement of this section, a typical RL agent can only solve a part of the main problem in this area. If you work in trading markets you will learn that deciding whether to hold, enter, or exit a pair (in FOREX) or stock (in Stocks) is a statistical decision depending on many parameters, such as your budget, the pairs or stocks you trade, your money distribution policy across multiple markets, etc. Considering all these parameters is a massive burden for an RL agent, and it may take years to develop such an agent! In that case, you certainly would not use this environment; you would build your own.

So after months of work, I finally found out that these actions just make things complicated with no real positive impact. In fact, they just increase the learning time, and an action like Hold will barely be used by a well-trained agent because it doesn't want to miss a single penny. Therefore, there is no need for so many actions, and Sell=0 and Buy=1 are adequate to train an agent just as well.
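
For reference, these two actions are exposed as an enum by the package (it is imported in the complete example later in this document). A minimal sanity check, assuming only the Actions enum described above:

from gym_anytrading.envs import Actions

# The two-action space discussed above.
print(Actions.Sell.value)  # expected: 0
print(Actions.Buy.value)   # expected: 1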

Trading Positions

If you're not familiar with trading positions, refer here. It's a very important concept and you should learn it as soon as possible.

In a simple vision: a Long position buys shares when prices are low and profits by holding them while their value goes up, and a Short position sells shares at a high value and uses that value to buy shares back at a lower value, keeping the difference as profit.

Again, in some trading algorithms, you may find numerous positions such as Short, Long, Flat, etc. As discussed earlier, I use only Short=0 and Long=1 positions.
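
To make the Long/Short arithmetic concrete, here is a small worked example in plain Python. The prices are invented for illustration, and the accounting ignores fees and the environment's internal bookkeeping:

from gym_anytrading.envs import Positions

print(Positions.Short.value, Positions.Long.value)  # expected: 0 1

# Long: buy low, sell high.
buy_price, sell_price = 100.0, 110.0
long_factor = sell_price / buy_price          # 1.10 -> your money grows by 10%

# Short: sell high, buy back low.
sell_price, buy_back_price = 110.0, 100.0
short_factor = sell_price / buy_back_price    # 1.10 -> again a 10% gain

print(long_factor, short_factor)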

Trading Environments

As I mentioned earlier, now it's time to introduce the three environments. Before creating this project, I spent a lot of time searching for a simple and flexible Gym environment for any trading market but didn't find one. Most of them were complex codebases with many unclear parameters that you couldn't simply look at and comprehend. So I decided to implement this project with a great focus on simplicity, flexibility, and comprehensiveness.

In the three following subsections, I will introduce our trading environments and in the next section, some IPython examples will be mentioned and briefly explained.

TradingEnv

TradingEnv is an abstract class which inherits gym.Env. This class aims to provide a general-purpose environment for all kinds of trading markets. Here I explain its public properties and methods. But feel free to take a look at the complete source code.

  • Properties:

df: An abbreviation for DataFrame. It's a pandas DataFrame which contains your dataset and is passed in the class' constructor.

prices: Real prices over time. Used to calculate profit and render the environment.

signal_features: Extracted features over time. Used to create Gym observations.

window_size: Number of ticks (current and previous ticks) returned as a Gym observation. It is passed in the class' constructor.

action_space: The Gym action_space property, containing the discrete values 0=Sell and 1=Buy.

observation_space: The Gym observation_space property. Each observation is a window on signal_features from index current_tick - window_size + 1 to current_tick. So _start_tick of the environment would be equal to window_size. In addition, the initial value of _last_trade_tick is window_size - 1.

shape: Shape of a single observation.

history: Stores the information of all steps.

  • Methods:

seed: Typical Gym seed method.

reset: Typical Gym reset method.

step: Typical Gym step method.

render: Typical Gym render method. Renders the information of the environment's current tick.

render_all: Renders the whole environment.

close: Typical Gym close method.

  • Abstract Methods:

_process_data: It is called in the constructor and returns prices and signal_features as a tuple. In different trading markets, different features need to be obtained. So this method enables our TradingEnv to be a general-purpose environment, while specific features can be returned for specific markets such as FOREX, Stocks, etc. (a minimal subclass sketch follows this list).

_calculate_reward: The reward function for the RL agent.

_update_profit: Calculates and updates the total profit which the RL agent has achieved so far. Profit indicates the number of currency units you end up with after starting with 1.0 unit (Profit = FinalMoney / StartingMoney).

max_possible_profit: The maximum possible profit that an RL agent can obtain regardless of trade fees.
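
As a rough illustration of how these abstract methods fit together, here is a minimal subclass sketch. It assumes the signatures used elsewhere in this document: _process_data takes no extra arguments, while _calculate_reward and _update_profit receive the chosen action. Treat it as a starting point, not a reference implementation:

from gym_anytrading.envs import TradingEnv


class MinimalTradingEnv(TradingEnv):

    def _process_data(self):
        # Use the 'Close' column both as the real prices and as the only signal feature.
        prices = self.df.loc[:, 'Close'].to_numpy()
        signal_features = prices.reshape(-1, 1)
        return prices, signal_features

    def _calculate_reward(self, action):
        # Toy reward: raw price change since the last trade tick.
        return self.prices[self._current_tick] - self.prices[self._last_trade_tick]

    def _update_profit(self, action):
        # Left as a no-op in this sketch; a real implementation recomputes
        # total profit from prices and fees, as ForexEnv and StocksEnv do.
        pass

    def max_possible_profit(self):
        # Crude stand-in: the single best fee-free trade. The built-in
        # environments chain every profitable move instead.
        return float(self.prices.max() / self.prices.min())


# Usage (hypothetical): MinimalTradingEnv(df=my_dataframe, window_size=10)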

ForexEnv

This is a concrete class which inherits TradingEnv and implements its abstract methods. Also, it has some specific properties for the FOREX market. For more information refer to the source code.

  • Properties:

frame_bound: A tuple which specifies the start and end of df. It is passed in the class' constructor.

unit_side: Specifies the side you start your trading from, with the string values left (default) and right. As you know, there are two sides in a FOREX currency pair. For example, in the EUR/USD pair, when you choose the left side, your currency unit is EUR and you start your trading with 1 EUR. It is passed in the class' constructor.

trade_fee: A default constant fee which is subtracted from the real prices on every trade (a worked numeric example follows this list).
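
The role of unit_side and trade_fee in the profit accounting can be made concrete with a small numeric sketch that mirrors the ForexEnv _update_profit code quoted in the issues section further down; the prices here are invented for illustration:

# Hypothetical EUR/USD prices at the last trade tick and the current tick.
last_trade_price = 1.1000
current_price = 1.1050
trade_fee = 0.0003      # the default constant fee described above
total_profit = 1.0      # start with 1.0 unit of your base currency

# unit_side='left' (start with 1 EUR): profit is updated when a Short is closed.
quantity = total_profit * (last_trade_price - trade_fee)
left_total_profit = quantity / current_price

# unit_side='right' (start with 1 USD): profit is updated when a Long is closed.
quantity = total_profit / last_trade_price
right_total_profit = quantity * (current_price - trade_fee)

print(left_total_profit, right_total_profit)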

StocksEnv

Same as ForexEnv but for the Stock market. For more information refer to the source code.

  • Properties:

frame_bound: A tuple which specifies the start and end of df. It is passed in the class' constructor.

trade_fee_bid_percent: A default constant fee percentage for bids. For example with trade_fee_bid_percent=0.01, you will lose 1% of your money every time you sell your shares.

trade_fee_ask_percent: A default constant fee percentage for asks. For example with trade_fee_ask_percent=0.005, you will lose 0.5% of your money every time you buy some shares.

Besides, you can create your own customized environment by extending TradingEnv or even ForexEnv or StocksEnv with your desired policies for calculating reward, profit, fee, etc.
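
For example, a custom reward policy can be dropped in by overriding _calculate_reward. The sketch below reuses the trade-detection logic quoted in the issues further down and rewards the closed position's realized price move; it is one possible policy, not the package's built-in reward:

from gym_anytrading.envs import StocksEnv, Actions, Positions
from gym_anytrading.datasets import STOCKS_GOOGL


class SignedRewardStocksEnv(StocksEnv):
    """Hypothetical subclass that swaps in a custom reward policy."""

    def _calculate_reward(self, action):
        trade = (
            (action == Actions.Buy.value and self._position == Positions.Short) or
            (action == Actions.Sell.value and self._position == Positions.Long)
        )
        if not trade:
            return 0.0

        # Price move since the last trade, signed by the position being closed.
        price_diff = self.prices[self._current_tick] - self.prices[self._last_trade_tick]
        return price_diff if self._position == Positions.Long else -price_diff


env = SignedRewardStocksEnv(df=STOCKS_GOOGL, window_size=10, frame_bound=(10, 300))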

Examples

Create an environment

import gymnasium as gym
import gym_anytrading

env = gym.make('forex-v0')
# env = gym.make('stocks-v0')
  • This will create the default environment. You can change any parameters such as dataset, frame_bound, etc.

Create an environment with custom parameters

I put two default datasets for FOREX and Stocks but you can use your own.

from gym_anytrading.datasets import FOREX_EURUSD_1H_ASK, STOCKS_GOOGL

custom_env = gym.make(
    'forex-v0',
    df=FOREX_EURUSD_1H_ASK,
    window_size=10,
    frame_bound=(10, 300),
    unit_side='right'
)

# custom_env = gym.make(
#     'stocks-v0',
#     df=STOCKS_GOOGL,
#     window_size=10,
#     frame_bound=(10, 300)
# )
  • It is to be noted that the first element of frame_bound should be greater than or equal to window_size, as illustrated below.
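
The reason is that an observation needs window_size ticks of history ending at the first tradable tick, so the usable data effectively starts at frame_bound[0] - window_size (the same arithmetic used by the _process_data examples later in this document). A quick check with the parameters above:

window_size = 10
frame_bound = (10, 300)

assert frame_bound[0] >= window_size, "frame_bound[0] must be >= window_size"

start = frame_bound[0] - window_size   # 0 -> the slice begins at the first row
end = frame_bound[1]                   # 300
print(start, end)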

Print some information

print("env information:")
print("> shape:", env.unwrapped.shape)
print("> df.shape:", env.unwrapped.df.shape)
print("> prices.shape:", env.unwrapped.prices.shape)
print("> signal_features.shape:", env.unwrapped.signal_features.shape)
print("> max_possible_profit:", env.unwrapped.max_possible_profit())

print()
print("custom_env information:")
print("> shape:", custom_env.unwrapped.shape)
print("> df.shape:", custom_env.unwrapped.df.shape)
print("> prices.shape:", custom_env.unwrapped.prices.shape)
print("> signal_features.shape:", custom_env.unwrapped.signal_features.shape)
print("> max_possible_profit:", custom_env.unwrapped.max_possible_profit())
env information:
> shape: (24, 2)
> df.shape: (6225, 5)
> prices.shape: (6225,)
> signal_features.shape: (6225, 2)
> max_possible_profit: 4.054407219413578

custom_env information:
> shape: (10, 2)
> df.shape: (6225, 5)
> prices.shape: (300,)
> signal_features.shape: (300, 2)
> max_possible_profit: 1.1228998536878634
  • Here max_possible_profit signifies that if the market didn't have trade fees, you could have earned 4.054407219413578 (or 1.1228998536878634) units of currency by starting with 1.0. In other words, in the first case your money is almost quadrupled.

Plot the environment

env.reset()
env.render()

[render output plot]

  • Short and Long positions are shown in red and green colors.
  • As you see, the starting position of the environment is always Short.

A complete example

import numpy as np
import matplotlib.pyplot as plt

import gymnasium as gym
import gym_anytrading
from gym_anytrading.envs import TradingEnv, ForexEnv, StocksEnv, Actions, Positions 
from gym_anytrading.datasets import FOREX_EURUSD_1H_ASK, STOCKS_GOOGL


env = gym.make('forex-v0', frame_bound=(50, 100), window_size=10)
# env = gym.make('stocks-v0', frame_bound=(50, 100), window_size=10)

observation, info = env.reset(seed=2023)
while True:
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

    # env.render()
    if done:
        print("info:", info)
        break

plt.cla()
env.unwrapped.render_all()
plt.show()
info: {'total_reward': 27.89616584777832, 'total_profit': 0.989812615901, 'position': <Positions.Long: 1>}

[render_all output plot]

  • You can use the render_all method to avoid rendering on each step and save time.
  • As you see, the first 10 points (window_size=10) on the plot don't have a position, because they aren't involved in calculating reward, profit, etc.; they just provide the first observations. So the environment's _start_tick and initial _last_trade_tick are 10 and 9.

More examples

Here are some examples that mix gym-anytrading with some well-known libraries, such as Stable-Baselines3 and QuantStats, and show how to utilize our trading environments in other RL or trading libraries.
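
As a rough idea of what such an integration looks like, here is a minimal Stable-Baselines3 sketch; it assumes stable-baselines3 >= 2.0 (which accepts Gymnasium environments), and the linked notebooks remain the authoritative examples:

import gymnasium as gym
import gym_anytrading
from stable_baselines3 import A2C

env = gym.make('stocks-v0', frame_bound=(50, 500), window_size=10)

# Train a small agent on the default dataset.
model = A2C('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10_000)

# Run one evaluation episode with the trained policy.
observation, info = env.reset(seed=2023)
while True:
    action, _states = model.predict(observation, deterministic=True)
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        print("info:", info)
        break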

Extend and manipulate TradingEnv

In case you want to process data and extract features outside the environment, it can simply be done in one of two ways:

Method 1 (Recommended):

def my_process_data(env):
    start = env.frame_bound[0] - env.window_size
    end = env.frame_bound[1]
    prices = env.df.loc[:, 'Low'].to_numpy()[start:end]
    signal_features = env.df.loc[:, ['Close', 'Open', 'High', 'Low']].to_numpy()[start:end]
    return prices, signal_features


class MyForexEnv(ForexEnv):
    _process_data = my_process_data


env = MyForexEnv(df=FOREX_EURUSD_1H_ASK, window_size=12, frame_bound=(12, len(FOREX_EURUSD_1H_ASK)))

Method 2:

def my_process_data(df, window_size, frame_bound):
    start = frame_bound[0] - window_size
    end = frame_bound[1]
    prices = df.loc[:, 'Low'].to_numpy()[start:end]
    signal_features = df.loc[:, ['Close', 'Open', 'High', 'Low']].to_numpy()[start:end]
    return prices, signal_features


class MyStocksEnv(StocksEnv):
    
    def __init__(self, prices, signal_features, **kwargs):
        self._prices = prices
        self._signal_features = signal_features
        super().__init__(**kwargs)

    def _process_data(self):
        return self._prices, self._signal_features

    
prices, signal_features = my_process_data(df=STOCKS_GOOGL, window_size=30, frame_bound=(30, len(STOCKS_GOOGL)))
env = MyStocksEnv(prices, signal_features, df=STOCKS_GOOGL, window_size=30, frame_bound=(30, len(STOCKS_GOOGL)))

Related Projects

  • A more complicated version of anytrading with five actions, three positions, and a better reward function is developed in the DI-engine project. It is a mid-level tool (somewhere between anytrading and mtsim), appropriate for semi-experts. More information and documentation can be found here.

gym-anytrading's People

Contributors

alex2782, aminhp, bionicles, sapiovesanunivision, super-pirata



gym-anytrading's Issues

Problem with DummyVecEnv wrapped inside a VecNormalize wrapper with render_all() method

The problem is something inside the DummyVecEnv which resets the environment automatically after it is done.

Also, there was a mistake in your code. Try this:

env_maker = lambda: gym.make('forex-v0', frame_bound=(100, 5000), window_size=10)
env = DummyVecEnv([env_maker])

# Training Env
policy_kwargs = dict(net_arch=[64, 'lstm',dict(vf=[128,128,128], pi=[64,64])])
model = A2C("MlpLstmPolicy", env, verbose=1, policy_kwargs=policy_kwargs)
model.learn(total_timesteps=1000)

# Testing Env 
env = env_maker()
observation = env.reset()

while True:
    observation = observation[np.newaxis, ...]
    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)
    # env.render()
    if done:
        print("info:", info)
        break

# Plotting results
plt.cla()
env.render_all()
plt.show()

Originally posted by @AminHP in #1 (comment)

I saw this reply on a similar problem I had with the render_all() method, though in my case I am using a VecNormalize() wrapper around my DummyVecEnv. In the quoted solution, a DummyVecEnv was made and used for training, and then another env was instantiated for prediction/testing that could be used with render_all. In my case this won't work, since I need VecNormalize to normalize observations and rewards.

env = make_vec_env(env_maker, n_envs=1, monitor_dir=log_dir)
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)

model = PPO2('MlpLstmPolicy', env, verbose=1, nminibatches=1, policy_kwargs=policy_kwargs,)
callback = SaveOnBestTrainingRewardCallback(check_freq=1000, log_dir=log_dir, env=env, verbose=1)
# model = PPO2('MlpLstmPolicy', env, verbose=1)

model.learn(total_timesteps=5000, callback=callback, log_interval=10)

env.norm_reward = False
env.training = False

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=1)
print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")

# I get the expected reward here using evaluate_policy()

plt.figure(figsize=(15,6))
plt.cla()
env.render_all()
plt.show()

# This part doesn't work because of the same error

What can I do to use render_all() method (Or any other attribute like env.history for that matter) while maintaining the VecNormalize() environment?

What is the purpose of this code? len(frame_bound) == 2 (Question)

What is the purpose of this code in forex_env: assert len(frame_bound) == 2? I am getting this assertion error.
My parameters are window_size: 10 and frame_bound: (50, 100).

def __init__(self, df, window_size, frame_bound, unit_side='left'):
    print("df: ", df, " window_size: ", window_size, " frame_bound: ", frame_bound)
    assert len(frame_bound) == 2
    assert unit_side.lower() in ['left', 'right']

    self.frame_bound = frame_bound
    self.unit_side = unit_side.lower()
    super().__init__(df, window_size)

    self.trade_fee = 0.0003  # unit

Request

I would really like to see Binary Options supported by this, and am working on completing this myself (by hiring someone), as I lack the knowledge / am struggling with how to complete it. Nadex is my preferred binary options exchange; however, I can see others benefiting from something such as IQ Option.

Thank you!

_update_profit - function inner workings

Greetings,

First of all, Many thanks to AminHP for sharing the project.

I have trouble understanding the ForexEnv's _update_profit function.

  1. I understand that as I am using the Euro as my base currency with EURUSD pair, I should use unit_side='left'. Am I correct with this assumption?

  2. The _total_profit variable is updated only when a Buy action is given and the existing position is Short. From these rules, I understand that the _update_profit function takes only short trades into account, calculating _total_profit from the latest short trades only. Is this assumption correct?

Would you please clarify: does the _update_profit function take into account profits from long trades with the Euro currency, and if it does, how does it work?

def _update_profit(self, action):
    trade = False
    if ((action == Actions.Buy.value and self._position == Positions.Short) or
            (action == Actions.Sell.value and self._position == Positions.Long)):
        trade = True

    if trade or self._done:
        current_price = self.prices[self._current_tick]
        last_trade_price = self.prices[self._last_trade_tick]

        if self.unit_side == 'left':
            if self._position == Positions.Short:
                # Here the _total_profit variable is updated only if the given
                # action is Buy and the existing position is Short.
                quantity = self._total_profit * (last_trade_price - self.trade_fee)
                self._total_profit = quantity / current_price

        elif self.unit_side == 'right':
            if self._position == Positions.Long:
                quantity = self._total_profit / last_trade_price
                self._total_profit = quantity * (current_price - self.trade_fee)

Clarifications on Frequency of Algo trades

[attached reward-over-time plot]

I am playing around with some financial data and testing various models, and I'd like to understand some things:

  1. Firstly, is there a way to plot all the training buy/sell positions against the prices the bot bought and sold at? I can only see reward over time plotted.

  2. In the attachment, can you explain why the reward jumps up from 378 to 379 on the x-axis when it goes from selling at a low price to buying at a high price? Or is price info hidden, and the bot is actually buying at a low price and selling at a high price in the background?

  3. My biggest challenge is how often the bot should run, because every time it is run, it generates a signal. So if I run it every day, it will generate a signal every day; if I run it every 4 days, it will generate a signal every 4 days, etc. It basically generates a signal based on the timesteps of your dataset. So if you have hourly data, is it best practice to run it every hour, and if you have daily data, is it best practice to run it every day? How can we change the reward function to penalize very short term trades and make it trade only every once in a while?

Alternate way of training the agent to reduce training time (Suggestion/Discussion)

Hello,
I've been thinking about whether it would be possible to integrate this kind of feature into gym-anytrading. Basically, it goes like this:

The problem

From what I know about RL, the policy gradients are initialized randomly and the agent is rewarded according to its actions. Within trading, this potentially means that it can take millions of iterations across the dataset before it even comes up with a strategy that is remotely successful; thereafter it spends time optimizing the strategy, which again can take a long time. In the end, you are presented with a model that is attempting to maximize its reward. Depending on the reward structure, this can mean that if you are maximizing total net worth you might end up with a model applying a scalping strategy, where you personally would have liked a model that was swing trading. So how can we potentially adjust for this and also make it faster in the process? Normally you would add several reward functions to reward it based on what kind of strategy you want it to employ; however, there might be another way.

The solution

A theoretical concept I have been playing around with in my head is that, instead of defining reward functions by net worth, Sortino ratio, etc., we could go into our dataset and place buy and sell markers, either manually or mathematically. These buy and sell markers are where you would ideally want the RL agent to enter and exit trades, and we reward the agent only when it trades at these points/prices. The obvious concern here is overfitting:
First, we have to address the fact that in all forms of ML you have to split your dataset into training and testing sets; this will allow people to see whether or not it's actually overfitting on the training dataset.
Second, a precision parameter could be set; this parameter would determine the range, in percent, around the specified buy and sell prices within which we would still reward the agent for buying and selling.

An example

Take a daily chart of Apple. If we are going to apply a swing trading strategy, the perfect buy entry would occur on the 23rd of March 2020 at a price of $212.61; we would mark this as our buy entry. The perfect sell exit would occur on the 13th of July 2020 at a price of $399.82; we then mark this as the sell exit. You would keep doing this, either manually or mathematically, across the entire dataset you want to train on.
Next, we would set a precision parameter; in this case we set it to 1.5, meaning that we will still reward the agent if it buys at a price within ±1.5% of $212.61 and sells at a price within ±1.5% of $399.82.

The impact on the model

So how would this impact our RL agent? My theory is that the agent will try to create a strategy that generates entry and exit signals according to these buy and sell points. This allows more controllability for the user, who can now specify which strategy they want to employ (swing trading, scalping, etc.). Besides controllability, the agent would presumably train faster, since it doesn't need to find out which entry and exit points generate the most reward (we did that for it); instead, it would spend its time going over the dataset to find signals that trigger within the precision range of our entries and exits. Of course, if these signals also trigger outside the range, it gets punished, so as to avoid it constantly generating buy and sell signals on every bar.

Final words

This is of course just my take on things, and I am posting it here for two reasons. Number 1: this could potentially become a unique feature of gym-anytrading (I haven't seen it elsewhere), provided the second reason I'm posting holds its ground.
Reason number 2 is to get feedback on this idea. I am still fairly new to RL, so if some of the experts out there think this won't work because of A, B, C, or D, then comment here and let's get a discussion going. After all, it is in everyone's best interest to make gym-anytrading as good as possible, even if it means implementing new ideas that might not have been tried before, since they could turn out to be the best ideas.

Model learns the opposite direction, worst possible reward

Hi, this is not an issue, but after days of trying to figure this out, I wanted to ask in case someone has advice for me. First I found this problem with my own custom env. I tried DQN, A2C, and PPO, and none of them know which way to go; they just fluctuate between the best and worst possible reward. The model does learn something, because when the reward is negative it is the worst possible outcome. Then I wanted to try your env, which is very clean and easy to understand, but I am having the exact same issue. Do you have any experience with something like this? I'm probably doing something wrong but couldn't find it. Thanks.

Clarification: Position Prediction

I would like to clarify something. I imported the standard environment 'stocks-v0' but then overlaid a custom environment on top of it with extra features beyond just OHLCV. I split my dataframe into training and testing. Let's say the shape[0] of my df is 100,000, with row 100,000 as the latest trade data row. If I train on the first X rows, then test from row X+1 up to row 99,999, is the position that the model spits out (1 or 0) for row 100,000?

[QUESTION] Stable Baseline render vectorized forex enviroment

Hello, I have a question...

I'm currently using the stable baselines library to train a model using your 'forex-v0' environment.

env = DummyVecEnv([lambda: gym.make('forex-v0', frame_bound=(10, 500), window_size=10)])
policy_kwargs = dict(net_arch=[64, 'lstm',dict(vf=[128,128,128], pi=[64,64])])
model = A2C("MlpLstmPolicy", env, verbose=1, policy_kwargs=policy_kwargs)
model.learn(total_timesteps=5000)

After training the model I perform a test using your code:

observation = env.reset()
while True:
        action = model.predict(observation)
        observation, reward, done, info = env.step(action)
        # env.render()
        if done:
            print("info:", info)
            break

# Plotting results
plt.cla()
env.render_all()
plt.show()

But unfortunately I get a "DummyVecEnv has no attribute render_all" error, which makes sense to me because the environment is now wrapped in a vectorized env.
The thing I don't understand is how I can call env.render_all() through the vectorized env.
My confusion is because when I call env.render() everything works fine, but not when I call env.render_all().

Clarifications: Confused on how to implement in production(live)

@AminHP I am having a hard time wrapping my head around how to implement this in a live environment for paper trading, just to hook everything up E2E.

env_maker = lambda: gym.make(
    'stocks-v0',
    df=test_df,
    window_size=window_size,
    frame_bound=(start_index, end_index)
)

The above snippet is how to create an environment for the agent/model to step through. But in order to create the environment, we have to pass in a DataFrame. In the real world, we won't know the current day's OHLCV until the markets close. So how would we be able to use a trained model in a current environment with up-to-date data and features (observations)? Unless its predicted actions are actually for the next day?

Side question: why, with observation = observation[np.newaxis, ...] while stepping through the env, do we have to reshape the observation before predicting? I don't think the observations (signal_features) are changing in the environment.

Thank you!

Extracting results for quantstats

Hello,
after creating a model and running results = model.learn(int(1000)), how do I use the results to compare with a benchmark in quantstats?
Currently the results don't hold the data that quantstats expects in order to be used with qs.reports.html(results, "SPY", output="D:\ReinforcementLearning\BaseLines\Trading\Myreport.html").

stable baselines with extended env

def my_process_data(df, window_size, frame_bound):
    prices = df.loc[:, 'NDX'].to_numpy()
    prices[frame_bound[0] - window_size]  # validate index (TODO: Improve validation)
    prices = prices[frame_bound[0]-window_size:frame_bound[1]]
    signal_features = df.to_numpy()#np.column_stack((prices, diff))
    return prices, signal_features

class MyForexEnv(StocksEnv):
    def __init__(self, prices, signal_features, **kwargs):
        self._prices = prices
        self._signal_features = signal_features
        super().__init__(**kwargs)
    def _process_data(self):
        return self._prices, self._signal_features

window_size = 30
start_index = window_size
end_index = len(df)

#env = MyForexEnv(df=df, window_size=10, frame_bound=(start_index, end_index))
prices, signal_features = my_process_data(df=df, window_size=window_size, frame_bound=(start_index, end_index))
env = MyForexEnv( prices, signal_features, df=df, window_size=window_size, frame_bound=(start_index, end_index))

env_maker = lambda: gym.make('env')

env = DummyVecEnv([env_maker])

I'm trying to use the extended env with stable baselines, but I keep getting errors:
TypeError: argument of type 'MyForexEnv' is not iterable
or

class MyForexEnv(StocksEnv):
    def __init__(self, prices = prices, signal_features = signal_features, **kwargs):
        self._prices = prices
        self._signal_features = signal_features
        super().__init__(**kwargs)
    def _process_data(self):
        return self._prices, self._signal_features
window_size = 30
start_index = window_size
end_index = len(df)
prices, signal_features = my_process_data(df=df, window_size=window_size, frame_bound=(start_index, end_index))
env_maker = lambda: gym.make(MyForexEnv,prices =prices, signal_features =signal_features, df=df, window_size=window_size, frame_bound=(start_index, end_index) )
env = DummyVecEnv([env_maker])

which returns:
TypeError: argument of type 'type' is not iterable

I have also tried using just the function inside the new class definition to get the data, but it makes no difference.
Or:

class MyForexEnv(gym.ActionWrapper):
    def __init__(self, env, prices = prices, signal_features = signal_features, **kwargs):
        self.trade_fee_bid_percent = 0.05
        self.trade_fee_ask_percent = 0.05
        self._prices = prices
        self._signal_features = signal_features
        super(MyForexEnv, self).__init__(env)
    def _process_data(self):
        return self._prices, self._signal_features
env = MyForexEnv(gym.make("stocks-v0"), prices, signal_features, df=df, window_size=window_size, frame_bound=(start_index, end_index))

It's still not working. Any idea?

flatten()

I'm finding it difficult to use this because of the need to flatten the signal_features into a single vector so as to simplify later shapes for higher order matrix multiplications.

Using this environment for Course Project

Hi @AminHP ,

I wanted to use this environment as a part of my course project for RL. So, I wanted to ask you if anything has been implemented in this environment prior to this. And if possible if you can give me some resources of algorithms implemented for this environment.

Best,
Kunal

Question: TODO on price validation

@AminHP First off, thank you for your amazing work on this. This has been very helpful for my understanding.

This is just a question on the TODO comment located in the stocks_env line 21 link and forex_env line 22 link.

What did you have in mind for "validating the indices"? Thank you.

[Question] Is having a high reward and low profit a normal case?

Hello,

Is the reward calculation OK? I have a high reward but a losing total profit.

[attached reward plot]

I am using stable baselines.

I am using these signal features.

def my_process_data(env):
    start = env.frame_bound[0] - env.window_size
    end = env.frame_bound[1]
    prices = env.df.loc[:, 'Close'].to_numpy()[start:end]
    # print(env.df)
    indi = Indicators(env.df)
    signal_features = env.df.loc[:, ['Close', 'Open', 'High', 'Low','Volume']].to_numpy()[start+1:end]
    #signal_features = env.df.loc[:, ['Close','Volume']].to_numpy()[start+1:end]
    
   
    rsi = indi.rsi(5,1)
    rsicolumn = rsi.to_numpy()[start:end].reshape(-1,1)
    print("rsi shape: ",rsicolumn.shape)
    signal_features = np.append(signal_features, rsicolumn, axis=1)
    
    # print(signal_features)
    return prices, signal_features      

I think there are serious issues with this ENV.

I was writing tests for this, and it's becoming more and more clear this gym has some serious deficiencies. I don't think anyone should be using it in production, and your README ideally would reflect that. At a base level, the only 2 actions and states are long or short, which is very wrong and messes with whatever algorithm is being used to train. Many algorithms depend on a Gaussian action space, i.e. -1 or [0, 0, 1], 0 or [0, 0, 0], 1 or [1, 0, 0].

DayTrade

Hi.

Hello @AminHP, how are you?
I've been studying for some time now, searching more and more about this world of trading, and moving more and more toward day trading (intraday).
In this case, for this project, would it be possible to use it for training on a specific time window during the day?
I have the data, but I'm not sure how to use it to start this training and see how it could be useful!
Do you have any suggestions on how to do it, i.e. use this data for a specific period of the day, or even an example to point me in the right direction on how to use this project and learn more?
Thank you, sir!

StocksEnv profit calculation does not consider short?

Not sure if this is the best place to ask a question.

In the _calculate_reward function, the reward does not seem to consider shorting. trade is True, but it does not add reward when we short and the price goes down, or remove reward if the price goes up while shorting.

if trade:
    current_price = self.prices[self._current_tick]
    last_trade_price = self.prices[self._last_trade_tick]
    price_diff = current_price - last_trade_price

    if self._position == Positions.Long:
        step_reward += price_diff

Shouldn't it be changed to:

if trade:
    current_price = self.prices[self._current_tick]
    last_trade_price = self.prices[self._last_trade_tick]
    price_diff = current_price - last_trade_price

    if self._position == Positions.Long:
        step_reward += price_diff
    else:
        step_reward -= price_diff # Change here to account for shorting

ValueError: Cannot feed value of shape (1, 3926, 2) for Tensor 'deepq/input/Ob:0', which has shape '(?, 3927, 2)'

Hello,
with the below code I am presented with the error stated in the title. I want my window to be as big as my df frame.

custom_env = gym.make('stocks-v0', df = data, window_size = 3927, frame_bound = (1, 3927))

The error:

Traceback (most recent call last):
  File "d:\ReinforcementLearning\BaseLines\Trading\RL Trading.py", line 131, in <module>
    results = model.learn(int(100000))
  File "D:\Anaconda\envs\RL\lib\site-packages\stable_baselines\deepq\dqn.py", line 216, in learn
    action = self.act(np.array(obs)[None], update_eps=update_eps, **kwargs)[0]
  File "D:\Anaconda\envs\RL\lib\site-packages\stable_baselines\deepq\build_graph.py", line 159, in act
    return _act(obs, stochastic, update_eps)
  File "D:\Anaconda\envs\RL\lib\site-packages\stable_baselines\common\tf_util.py", line 287, in <lambda>
    return lambda *args, **kwargs: func(*args, **kwargs)[0]
  File "D:\Anaconda\envs\RL\lib\site-packages\stable_baselines\common\tf_util.py", line 330, in __call__
    results = sess.run(self.outputs_update, feed_dict=feed_dict, **kwargs)[:-1]
  File "D:\Anaconda\envs\RL\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
    run_metadata_ptr)
  File "D:\Anaconda\envs\RL\lib\site-packages\tensorflow\python\client\session.py", line 1111, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 3926, 2) for Tensor 'deepq/input/Ob:0', which has shape '(?, 3927, 2)'

Change reward function

Would it be possible to make a function comparable to add_signals for changing the reward function? It would be nice to use custom KPIs as rewards, for example risk-adjusted return.

Kind regards.

Reward computation for stockEnv

Hi AminHP,

Really great work, the code is very pleasant to read. I have a question regarding the _calculate_reward function in StocksEnv: why is step_reward only updated when we sell a long position? As I understand it, buying after a short position should also generate a profit/loss, and thus the agent should be rewarded accordingly, but it is not taken into account if I'm correct. Forgive me if this is a noob question, I just got into finance and stock trading yesterday.

Best regards,

Expiration

Do you believe expiration to be a significant factor in forex?

Buy/Sell Step Logic Not Working in Stocks?

Hey!

Just found the repo and love it, but I am wondering what's going on re: the stocks environment.

According to the step logic in the trading_env super class:

        trade = False
        if ((action == Actions.Buy.value and self._position == Positions.Short) or
            (action == Actions.Sell.value and self._position == Positions.Long)):
            trade = True

...and after a reset, the initial position is a Short. So to me this reads that a Buy position will only be opened if the current position is a Short and a Buy action is generated. But when I review my render:

[attached render plot]

...you can see it starts with a bunch of red sells. How? Why? :'(

Also, using that same step logic, I would assume that if a Sell/Short position is already active, no other Short/Sells would be issued, but there are still consecutive red dots on that render, the same as green for buys. What do these dots denote exactly?

I have another question about the quantstats report too. If the render says "Total Profit: 0.6828566" or whatever it's profited, how come the quantstats report is so down??

[attached quantstats report]

Thanks! Love the work!

Have you tried using multiple CPUs on the example here with A2C?

I am trying to use multiple CPUs for the example provided in this link.

I tried to change the environment to use multiple CPUs.

env = DummyVecEnv([env_maker for i in range(16)])

But I have a problem with done and info in stable baselines. It seems they turned into arrays.

There is an error in this code. Any suggestions, or has any of you done this? It seems LSTMs in stable baselines are like this.

#env = env_maker()
#observation = env.reset()

while True:
    #observation = observation[np.newaxis, ...]

    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)

    # env.render()
    if done:
        print("info:", info)
        break

------------------------------

Error:

ValueError                                Traceback (most recent call last)
<ipython-input-27-2d78acbb8800> in <module>
     10 
     11     # env.render()
---> 12     if done:
     13         print("info:", info)
     14         break

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Regarding Reward and Profit

Hi,

I recently used anytrading with a custom environment and stable baseline's PPO2 algo.

After running the evaluation part 10 times my output was something like

info {'total_reward': 24392200.00000009, 'total_profit': 0.9844407417070604, 'position': 0}
info {'total_reward': 48881799.99999967, 'total_profit': 1.011612620710015, 'position': 0}
info {'total_reward': 51085300.00000165, 'total_profit': 1.013074701451891, 'position': 1}
info {'total_reward': 14793399.999999177, 'total_profit': 0.9767670021357563, 'position': 0}
info {'total_reward': 17957400.000001136, 'total_profit': 0.9815584135159401, 'position': 0}
info {'total_reward': -2354400.0000011073, 'total_profit': 0.9607471236716814, 'position': 1}
info {'total_reward': 20103799.9999998, 'total_profit': 0.9839828662099608, 'position': 0}
info {'total_reward': 19209400.000002127, 'total_profit': 0.9826626717429163, 'position': 1}
info {'total_reward': 14625800.00000124, 'total_profit': 0.9773373249065562, 'position': 1}
info {'total_reward': 53867999.99999998, 'total_profit': 1.0180095847348958, 'position': 1} 

As far as I understand profit, if total_profit is >1 it is a profit, otherwise <1 is a loss. My question is: why is the total_reward positive in cases where the total_profit is actually <1 (a loss)?

Also, it seems like it trades too frequently even when it shouldn't. Can we add an action like Wait, to wait for a bigger price difference or trend? (Sorry if there's a proper term for it, I am new to trading.)

Unable to plot in virtual Environment

QObject::moveToThread: Current thread (0x141cf30) is not the object's thread (0x1a84fd0).
Cannot move to target thread (0x141cf30)

qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/home/jothi/Software/btgym/venv/lib/python3.8/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: xcb, eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl.

Aborted (core dumped)

Running to 1 or no trade on evaluation

I am getting 1 or no trades on evaluation. I am just using the TF DQN sample code. The collect_step will trigger trades, but the evaluation step in compute_avg_return only makes 1 or 0 trades.


for _ in range(num_iterations):

  # Collect a few steps using collect_policy and save to the replay buffer.
  for _ in range(collect_steps_per_iteration):
    collect_step(train_env, agent.collect_policy, replay_buffer)

  # Sample a batch of data from the buffer and update the agent's network.
  experience, unused_info = next(iterator)
  train_loss = agent.train(experience).loss

  step = agent.train_step_counter.numpy()

  if step % log_interval == 0:
    print('Time = {0}, step = {1}: loss = {2}'.format(datetime.now(), step, train_loss))
  if step % eval_interval == 0:
    avg_return = compute_avg_return(eval_env, agent.policy, num_eval_episodes)
    print('Evaluate Time = {0}, step = {1}: Average Return = {2}'.format(datetime.now(), step, avg_return))
    returns.append(avg_return)

Muzero Integration

Hello, I was trying to integrate gym-anytrading with muzero-general, but I got this error:
File "Development/Python/Muzero-GymAnytrading/self_play.py", line 137, in play_game ), f"Observation should be 3 dimensionnal instead of len(n_obs): {len(n_obs)} dimensionnal. Got observation of shape: n_obs: {n_obs}" AssertionError: Observation should be 3 dimensionnal instead of len(n_obs): 4 dimensionnal. Got observation of shape: n_obs: (1, 1, 10, 2)

Do you know what it means and how to resolve it?

Thank you,
Marco.

Confusion over results if run multiple times

Hi,
Sorry if this is the wrong place to ask this but I couldn't find anywhere else. I love the package, excellent work, and I'm sure I am doing something wrong but I wonder if you can explain why when I run the same thing multiple times I get such wildly different results. For example, running this to train / test it 10 times:

import gym
import gym_anytrading
from gym_anytrading.envs import TradingEnv, ForexEnv, Actions, Positions 
from gym_anytrading.datasets import FOREX_EURUSD_1H_ASK, STOCKS_GOOGL

env = gym.make('forex-v0', frame_bound=(50, 100), window_size=10)

for i in range(10):
  observation = env.reset()
  while True:
      action = env.action_space.sample()
      observation, reward, done, info = env.step(action)
      if done:
          print("info:", info)
          break

I get:

info: {'total_reward': -50.99999999999439, 'total_profit': 0.9875980085384239, 'position': 0}
info: {'total_reward': 24.099999999995784, 'total_profit': 0.9886818462999193, 'position': 1}
info: {'total_reward': 24.499999999987313, 'total_profit': 0.9893252791394607, 'position': 0}
info: {'total_reward': 138.10000000000767, 'total_profit': 0.9953009801405461, 'position': 1}
info: {'total_reward': 107.10000000001328, 'total_profit': 0.9926679505350279, 'position': 1}
info: {'total_reward': 127.00000000000375, 'total_profit': 0.996177843192774, 'position': 0}
info: {'total_reward': -144.90000000000117, 'total_profit': 0.9813550423422519, 'position': 1}
info: {'total_reward': -128.90000000000293, 'total_profit': 0.9843355398695747, 'position': 0}
info: {'total_reward': 45.699999999999626, 'total_profit': 0.9912142586709967, 'position': 0}
info: {'total_reward': -39.39999999999389, 'total_profit': 0.9859639867316038, 'position': 1}

Wouldn't I expect them all to have the same position given it's the same data / training, or am I missing something fundamental here?

Thank you

Next Steps

After a good bit of looking I haven't been able to find a way to illustrate what the training process has come up with so that I can take it into next steps and write a trading script to use the methodology/model the computer came up with.

With custom imported data I've been able to get a relatively consistent 'Explained Variance' close to 1, which to my knowledge means that the model and the actual data have very small discrepancies, meaning the model could potentially be used to make a trading methodology with perhaps consistent wins.

My trouble is seeing exactly what the contents of the model the computer came up with are. Using quantstats I can easily see its performance over the test period, but that's not quite as useful as seeing exactly how the computer traded in order to achieve the quoted return data.

Any guidance or advice would be appreciated!

How to implement a continuous action space?

Say I wanted to train actions to be in a range between 0 and 1, where the number represents the percentage of the net worth I should have invested in the asset. The resulting action is then to buy or sell the difference.

Reproducibility of result calculation (total_profit) using fixed test data for ForexEnv

Hi, I played with the forex model using this gym.
I created an RL A2C model with stable-baselines3 and tested it against this gym. Somehow I always get a different 'total_profit' calculation, whilst 'max_possible_profit' is fixed. Can anybody advise on how I can tweak the code so that I get a consistent 'total_profit' result? I have tried to fix the seed using env.seed(32) and env.action_space.seed(32), but I still get a different result in the 'total_profit' calculation.

Rgds,
Harry

Examples for RLLIB

I am struggling with how to use this environment with Ray's RLlib.

Any idea or sample?

Issue with quantstats

Hi @AminHP
Trying your code in the example "a2c_quantstats.ipynb", the last part does not run and I get the following error:
File "", line 3, in
net_worth = pd.Series(env.history['total_profit'], index=df.index[start_index+1:end_index])

AttributeError: 'ForexEnv' object has no attribute 'history'

I cannot figure out how to fix it; do you have any suggestions?
Thank you

[QUESTION] Error in TradeEnv running check_env() from stable baselines

Hi there,

I've encountered an issue running the following code:

import gym
from gym_anytrading.envs import TradingEnv, ForexEnv, Actions, Positions 
from gym_anytrading.datasets import FOREX_EURUSD_1H_ASK
from stable_baselines.common.env_checker import check_env


env = gym.make('forex-v0', frame_bound=(10, 500), window_size=10)
check_env(env, warn=False, skip_render_check=False)

The output I get is:
AssertionError: The observation returned by the reset() method does not match the given observation space

I've debugged your TradingEnv class and didn't see any issue, so I thought the problem could be in check_env().

I've debugged check_env() as well, but everything seems fine there.
Then I went for the last test, which was running check_env() with the classic CartPole-v0 from gym; here check_env() didn't throw any exception and ran smoothly.

This is the code for the CartPole-v0:

import gym
from gym.envs.classic_control import CartPoleEnv
from stable_baselines.common.env_checker import check_env

env = gym.make('CartPole-v0')
check_env(env, warn=False, skip_render_check=False)

Do you have any clue why this is happening? I'm confused lol

Loading a saved model the right way

I just want to confirm if I am loading a saved model the right way on a test set.

So firstly I ran my preferred model and saved it

My env variable before the model load is only on the test set

Then I do

model = A2C.load(load_path, env=env)
obs = env.reset()
while True:
    obs = obs[np.newaxis, ...]
    action, _states = model.predict(obs)
    ........

Sorry I don't know how to indent lines of code in Github comments (I thought tab would do it)

Not only buy and sell actions

Hello, thanks for the great work. With all due respect, I believe your assumption is faulty and there needs to be a "do nothing" event. Imagine the market goes sideways and the price variance is smaller than the spread for longer than the window; in that case, no action is the best action.

What is total_reward and total_profit

Hi, I was going over the code where, when we render our results, we get result benchmarks such as

info {'total_reward': 8.100000000000023, 'total_profit': 0.7996927239889693, 'position': 1}

I just want to know what these parameters mean, especially total_reward and total_profit.

stable-baselines3 example

It would be very helpful if you could point me to some examples using stable-baselines3. I am still not sure how it compares to stable-baselines, as they have a big warning box comparing performance. Appreciated.
