
bsuite's Introduction

Behaviour Suite for Reinforcement Learning (bsuite)


radar plot


bsuite is a collection of carefully-designed experiments that investigate core capabilities of a reinforcement learning (RL) agent with two main objectives.

  1. To collect clear, informative and scalable problems that capture key issues in the design of efficient and general learning algorithms.
  2. To study agent behavior through its performance on these shared benchmarks.

This library automates evaluation and analysis of any agent on these benchmarks. It serves to facilitate reproducible and accessible research on the core issues in RL, and ultimately the design of superior learning algorithms.

Going forward, we hope to incorporate more excellent experiments from the research community, and commit to a periodic review of the experiments by a committee of prominent researchers.

For a more comprehensive overview, see the accompanying paper.

Technical overview

bsuite is a collection of experiments, defined in the experiments subdirectory. Each subdirectory corresponds to one experiment and contains:

  • A file defining an RL environment, which may be configurable to provide different levels of difficulty or different random seeds (for example).
  • A sequence of keyword arguments for this environment, defined in the SETTINGS variable found in the experiment's file.
  • A file defining plots used in the provided Jupyter notebook.
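
As a rough illustration of the layout described above, the sketch below pairs a toy environment class with a SETTINGS tuple of keyword arguments. ToyChain, its size and seed parameters, and load_setting are hypothetical names invented for this example; they are not taken from bsuite.

```python
from typing import Any, Dict, Tuple


class ToyChain:
    """Hypothetical stand-in environment; not a bsuite class."""

    def __init__(self, size: int, seed: int = 0):
        self.size = size
        self.seed = seed


# One dict of keyword arguments per environment setting (assumed values).
SETTINGS: Tuple[Dict[str, Any], ...] = tuple(
    {'size': size, 'seed': seed}
    for size in (5, 10, 20)
    for seed in (0, 1)
)


def load_setting(index: int) -> ToyChain:
    """Builds the environment for the setting at the given index."""
    return ToyChain(**SETTINGS[index])


env = load_setting(3)
print(env.size, env.seed)
```

Keeping the settings as plain keyword arguments means a single integer index is enough to identify one configuration of an experiment.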

bsuite works by logging results from "within" each environment when the environment is loaded via a load_and_record* function. This means any experiment will automatically output data in the correct format for analysis using the notebook, without any constraints on the structure of agents or algorithms.
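
The idea of logging from inside the environment can be sketched with a small wrapper. This is only an illustration under stated assumptions: CountdownEnv and RecordingWrapper are hypothetical classes written for this example, and the dict-based step result is a simplification rather than the interface bsuite uses.

```python
from typing import Any, Dict, List


class CountdownEnv:
    """Hypothetical toy environment: an episode ends after `length` steps."""

    def __init__(self, length: int = 3):
        self._length = length
        self._remaining = length

    def reset(self) -> Dict[str, Any]:
        self._remaining = self._length
        return {'reward': None, 'done': False}

    def step(self, action: int) -> Dict[str, Any]:
        self._remaining -= 1
        return {'reward': 1.0, 'done': self._remaining == 0}


class RecordingWrapper:
    """Wraps an environment and records results as the agent interacts."""

    def __init__(self, env: CountdownEnv):
        self._env = env
        self._episode_return = 0.0
        self.records: List[Dict[str, Any]] = []

    def reset(self) -> Dict[str, Any]:
        self._episode_return = 0.0
        return self._env.reset()

    def step(self, action: int) -> Dict[str, Any]:
        result = self._env.step(action)
        self._episode_return += result['reward']
        if result['done']:
            # Logging happens here, inside the wrapper, not in the agent.
            self.records.append({
                'episode': len(self.records) + 1,
                'episode_return': self._episode_return,
            })
        return result


env = RecordingWrapper(CountdownEnv(length=3))
for _ in range(2):
    result = env.reset()
    while not result['done']:
        result = env.step(action=0)
print(env.records)
```

Because the wrapper exposes the same reset and step methods as the environment it wraps, the calling code needs no logging logic of its own.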

We collate all of the results and analysis in a pre-made Jupyter notebook.

Getting started

If you are new to bsuite you can get started in our colab tutorial. This Jupyter notebook is hosted with a free cloud server, so you can start coding right away without installing anything on your machine. After this, you can follow the instructions below to get bsuite running on your local machine.


Installation

We have tested bsuite on Python 3.6 & 3.7. To install the dependencies:

  1. Optional: We recommend using a Python virtual environment to manage your dependencies, so as not to clobber your system installation:

    python3 -m venv bsuite
    source bsuite/bin/activate
    pip install --upgrade pip setuptools
  2. Install bsuite directly from PyPI:

    pip install bsuite
  3. Optional: To also install dependencies for the baselines examples (excluding OpenAI and Dopamine examples), run:

    pip install bsuite[baselines]


Complete descriptions of each environment and their corresponding experiments are found in the analysis/results.ipynb Jupyter notebook.

These environments all have small observation sizes, allowing for reasonable performance with a small network on a CPU.

Loading an environment

Environments are specified by a bsuite_id string, for example "deep_sea/7". This string denotes the experiment and the (index of the) environment settings to use, as described in the technical overview section.
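
To make the structure of the string concrete, here is a minimal sketch that splits a bsuite_id into its experiment name and settings index. The helper split_bsuite_id is a hypothetical function written for this example, not part of the bsuite API.

```python
from typing import Tuple


def split_bsuite_id(bsuite_id: str) -> Tuple[str, int]:
    """Splits an id such as 'deep_sea/7' into ('deep_sea', 7)."""
    experiment_name, _, setting_index = bsuite_id.partition('/')
    if not experiment_name or not setting_index.isdigit():
        raise ValueError(f'Malformed bsuite_id: {bsuite_id!r}')
    return experiment_name, int(setting_index)


print(split_bsuite_id('deep_sea/7'))
```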

For a full description of each environment and its corresponding experiment settings, see the paper.

import bsuite

env = bsuite.load_from_id('catch/0')

The sequence of bsuite_ids required to run all experiments can be accessed programmatically via:

from bsuite import sweep


This module also contains bsuite_ids for each experiment individually via uppercase constants corresponding to the experiment name, for example:


In addition, sequences of bsuite_ids with the same tag can be loaded via:

from bsuite import sweep


The TAGS variable groups bsuite environments together by their underlying tag, so all the basic tasks or scale tasks can be loaded with:
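
The grouping idea can be sketched without the library. In the example below, EXPERIMENT_IDS and EXPERIMENT_TAGS are hypothetical stand-ins with made-up contents; they do not reproduce the real sweep module, and ids_with_tag is a helper invented for this illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in data; the real ids and tags live in bsuite's sweep module.
EXPERIMENT_IDS: Dict[str, Tuple[str, ...]] = {
    'catch': ('catch/0', 'catch/1'),
    'catch_scale': ('catch_scale/0', 'catch_scale/1'),
    'deep_sea': ('deep_sea/0', 'deep_sea/1'),
}
EXPERIMENT_TAGS: Dict[str, Tuple[str, ...]] = {
    'catch': ('basic',),
    'catch_scale': ('scale',),
    'deep_sea': ('exploration',),
}


def ids_with_tag(tag: str) -> List[str]:
    """Returns every id whose experiment carries the given tag."""
    return [
        bsuite_id
        for name, ids in EXPERIMENT_IDS.items()
        if tag in EXPERIMENT_TAGS[name]
        for bsuite_id in ids
    ]


print(ids_with_tag('basic'))
print(ids_with_tag('scale'))
```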


Loading an environment with logging included

We include two implementations of automatic logging, available via:

We also include a terminal logger in logging/, exposed via bsuite.load_and_record_to_terminal.

It is easy to write your own logging mechanism, if you need to save results to a different storage system. See the CSV implementation for the simplest reference.
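
As a minimal sketch of what a custom logging mechanism might look like, the class below writes one JSON object per line to any text stream. It assumes a logger is simply an object with a write(data) method that receives a mapping of results; that interface and the name JsonLinesLogger are assumptions made for this example, so check the CSV implementation for the actual contract.

```python
import io
import json
from typing import Any, Mapping


class JsonLinesLogger:
    """Hypothetical logger that writes one JSON object per line."""

    def __init__(self, stream: io.TextIOBase):
        self._stream = stream

    def write(self, data: Mapping[str, Any]) -> None:
        # Assumed interface: each call receives one mapping of results.
        self._stream.write(json.dumps(dict(data), sort_keys=True) + '\n')


buffer = io.StringIO()  # Stands in for a file or another storage system.
logger = JsonLinesLogger(buffer)
logger.write({'episode': 1, 'total_return': 0.0})
logger.write({'episode': 2, 'total_return': 1.0})
print(buffer.getvalue(), end='')
```

Swapping the StringIO buffer for an open file handle or a network-backed stream would change where results are stored without touching the logger itself.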

Interacting with an environment

Our environments implement the Python interface defined in dm_env.

More specifically, all our environments accept a discrete, zero-based integer action (or equivalently, a scalar numpy array with shape ()).

To determine the number of actions for a specific environment, use

num_actions = env.action_spec().num_values

Each environment returns observations in the form of a numpy array.

We also expose a bsuite_num_episodes property for each environment in bsuite. This allows users to run exactly the number of episodes required for bsuite's analysis, which may vary between environments used in different experiments.

Example run loop for a hypothetical agent with a step() method.

for _ in range(env.bsuite_num_episodes):
  timestep = env.reset()
  while not timestep.last():
    action = agent.step(timestep)
    timestep = env.step(action)
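
For readers who want to try the loop without an agent at hand, the sketch below fills in the hypothetical pieces. ToyTimeStep, ToyEnv and RandomAgent are stand-ins written for this example; they only mimic the parts of the interface that the loop touches and are not bsuite or dm_env classes.

```python
import random
from typing import NamedTuple, Optional


class ToyTimeStep(NamedTuple):
    """Hypothetical stand-in for a dm_env TimeStep."""
    reward: Optional[float]
    is_last: bool

    def last(self) -> bool:
        return self.is_last


class ToyEnv:
    """Hypothetical environment with fixed-length episodes."""
    bsuite_num_episodes = 5  # Assumed value for the example.

    def __init__(self, episode_length: int = 4):
        self._episode_length = episode_length
        self._t = 0

    def reset(self) -> ToyTimeStep:
        self._t = 0
        return ToyTimeStep(reward=None, is_last=False)

    def step(self, action: int) -> ToyTimeStep:
        self._t += 1
        done = self._t >= self._episode_length
        return ToyTimeStep(reward=float(action), is_last=done)


class RandomAgent:
    """Picks a uniformly random zero-based integer action."""

    def __init__(self, num_actions: int, seed: int = 0):
        self._num_actions = num_actions
        self._rng = random.Random(seed)

    def step(self, timestep: ToyTimeStep) -> int:
        return self._rng.randrange(self._num_actions)


env = ToyEnv()
agent = RandomAgent(num_actions=3)
steps_taken = 0
for _ in range(env.bsuite_num_episodes):
    timestep = env.reset()
    while not timestep.last():
        action = agent.step(timestep)
        timestep = env.step(action)
        steps_taken += 1
print(steps_taken)
```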

Using bsuite in 'OpenAI Gym' format

To use bsuite with a codebase that uses the OpenAI Gym interface, use the GymFromDMEnv class in utils/

import bsuite
from bsuite.utils import gym_wrapper

env = bsuite.load_and_record_to_csv('catch/0', results_dir='/path/to/results')
gym_env = gym_wrapper.GymFromDMEnv(env)

Note that bsuite does not include Gym in its default dependencies, so you may need to pip install it separately.
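
To show what such an adapter has to do conceptually, here is a small sketch that maps a timestep onto the classic Gym (observation, reward, done, info) tuple. It is not how GymFromDMEnv is implemented; ToyTimeStep and to_gym_step are hypothetical names, and treating a missing reward as 0.0 is a choice made for this example.

```python
from typing import Any, Dict, NamedTuple, Optional, Tuple


class ToyTimeStep(NamedTuple):
    """Hypothetical stand-in for a dm_env TimeStep."""
    observation: Any
    reward: Optional[float]
    is_last: bool

    def last(self) -> bool:
        return self.is_last


def to_gym_step(timestep: ToyTimeStep) -> Tuple[Any, float, bool, Dict[str, Any]]:
    """Maps a timestep onto an (observation, reward, done, info) tuple."""
    # The first timestep of an episode carries no reward; use 0.0 instead.
    reward = 0.0 if timestep.reward is None else timestep.reward
    return timestep.observation, reward, timestep.last(), {}


print(to_gym_step(ToyTimeStep(observation=[0, 1], reward=None, is_last=False)))
print(to_gym_step(ToyTimeStep(observation=[1, 0], reward=1.0, is_last=True)))
```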

Baseline agents

We include implementations of several common agents in the baselines/ subdirectory, along with a minimal run-loop.

See the installation section for how to include the required dependencies at install time. These dependencies are not installed by default, since bsuite does not require users to use any specific machine learning library.

Running the entire suite of experiments

Each of the agents in the baselines folder contains a run script, which serves as an example. It can run against a single environment or against the entire suite of experiments by passing the --bsuite_id=SWEEP flag; this will start a pool of processes with which to run as many experiments in parallel as the host machine allows. On a 12-core machine, this will complete overnight for most agents. Alternatively, it is possible to run on Google Cloud Platform using the provided script, the steps of which are outlined below.

Running experiments on Google Cloud Platform does the following in order:

  1. Creates an instance with the specified specs (by default 64-core CPU optimized).
  2. Clones bsuite with git and installs it together with other dependencies.
  3. Runs the specified agent (currently limited to /baselines) on a specified environment.
  4. Copies the resulting SQLite file from the remote instance to /tmp/bsuite.db on your local machine.
  5. Shuts down the created instance.

In order to run the script, you first need to create a billing account. Then follow the instructions here to set up and initialize Cloud SDK. After completing gcloud init, you are ready to run bsuite on Google Cloud.

To do this, make the script executable and run it:

chmod +x

After the instance is created, the instance name will be printed. Then you can ssh into the instance by selecting Compute Engine -> Instances and clicking SSH. Note that this is not necessary, as the result will be copied to your local machine once it is ready. However, sshing might be convenient if you want to make local changes to agent and environments. In this case, after sshing, do


to activate the virtual environment. Then you can run agents via

python ~/bsuite/bsuite/baselines/dqn/ --bsuite_id=SWEEP

for instance.


bsuite comes with a ready-made analysis Jupyter notebook included in analysis/results.ipynb. This notebook loads and processes logged data, and produces the scores and plots for each experiment. We recommend using this notebook in conjunction with Colaboratory.

We provide an example of such a bsuite report here.

bsuite Report

You can use bsuite to generate an automated 1-page appendix that summarizes the core capabilities of your RL algorithm. This appendix is compatible with most major ML conference formats. For example output, run:

pdflatex bsuite/reports/neurips_2019/neurips_2019.tex

More examples of bsuite reports can be found in the reports/ subdirectory.


If you use bsuite in your work, please cite the accompanying paper:

    title={Behaviour Suite for Reinforcement Learning},
    author={Osband, Ian and
            Doron, Yotam and
            Hessel, Matteo and
            Aslanides, John and
            Sezener, Eren and
            Saraiva, Andre and
            McKinney, Katrina and
            Lattimore, Tor and
            {Sz}epesv{\'a}ri, Csaba and
            Singh, Satinder and
            Van Roy, Benjamin and
            Sutton, Richard and
            Silver, David and
            van Hasselt, Hado},
    booktitle={International Conference on Learning Representations},

bsuite's People


alexminnaar, aslanides, davidjanz, iosband, mtthss, rchen152, suryabhupa, tkoeppe, tomhennigan, yasuiniko, yilei, yotam



bsuite's Issues

Can't reproduce DQN performance

I noticed that you changed the optimizer and some hyper-parameters in DQN compared to those in the "Nature" paper. On my side, I can't reproduce the results with either of the two settings. Could you share a learning curve for "Breakout"? I have been struggling with hyper-parameter optimization for two months. Thanks.

Adding a pendulum environment/experiment

I wanted to propose adding a pendulum experiment to bsuite. I think it fits the targeted, simple, challenging, scalable, fast criteria outlined in the bsuite paper. Also, now that #8 has been merged, DMEnvFromGym can be used to convert the OpenAI pendulum environment to a bsuite environment without having to reimplement it, i.e. something like

env = DMEnvFromGym(gym.make('Pendulum-v0'))

If there is interest, I would be happy to work on it. Also, let me know if there are any concerns with implementing homegrown environments vs. importing them from third parties like OpenAI.

Rendering control environments

In dm_control there was the ability to render control tasks like cartpole swingup, even though the environment only had a dynamics-based observation space. It would be nice to have that ability here, especially since the environment is from dm_control.
It looks like the rendering function in the Gym wrapper just returns the last observation (in both human and rgb_array mode), which doesn't really work for a lot of tasks in bsuite when the observation is not an rgb_array.

Is there some way to grab RGB output for bsuite environments?

For now, I hacked in the cartpole-specific viewer from Gym. It works and it doesn't look horrible, but it's not exactly ideal.

BootDQN+ not matching claimed performance

Several runs on deep_sea/0 i.e., DeepSea with N=10 take longer than 100 episodes, some even longer than 2^10=1024 episodes when running the default_agent BootDQN with no modifications.

To reproduce, this is the code I am running in Colab with a GPU runtime:

# first install bsuite[baselines]
import bsuite
from bsuite.baselines import experiment
from import dqn
from import boot_dqn

SAVE_PATH_DQN = './logs/test_boot'
env = bsuite.load_and_record("deep_sea/0", save_path=SAVE_PATH_DQN, overwrite=True)
agent = boot_dqn.default_agent(
), env, num_episodes=env.bsuite_num_episodes)

I reran this multiple times and have had a few runs with > 1024 bad episodes.

Westworld host attribute matrix

Hi DeepMind.

No issue here; I just came here to say that this project scarily reminds me of Westworld's host attribute matrix 😄

Even some attributes are similar, such as exploration == curiosity.



Documentation: Clarify mapping from high-level agent properties to experiments and environments

Hello Ian and others!

I'm having a look at bsuite after Ian Osband's talk at the Simons Institute Deep RL workshop. After spending a few minutes browsing the documentation and source code here on GitHub I had a suggestion for improving the documentation.

My first question when browsing this project is "The radar plot on the readme is lovely, I wonder what experiments contribute to a good ____________ score". Where the blank is e.g. 'generalization'.

After browsing the source code for a few minutes this isn't immediately obvious. I can see a little bit of information regarding this at the example colab notebook. It would be nice to promote this mapping to a 'first class' member of the documentation somewhere :)

ValueError: Variable ppo2_model/pi/mlp_fc0/w/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?

When running in openai_ppo
python3 ./ --bsuite_id=SWEEP

I get this error:
ValueError: Variable ppo2_model/pi/mlp_fc0/w/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?

I ran the tests and they all pass.

  • python 3.6
  • tensorflow 1.14
  • Ubuntu 18.10

Here is the log:

[Last finished: bandit/15]: 3%|███▋ | 15/468 [34:10<88:11:07, 700.81s/it]
concurrent.futures.process._RemoteTraceback:
Traceback (most recent call last):
File "/usr/lib/python3.6/concurrent/futures/", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/lib/python3.6/concurrent/futures/", line 153, in _process_chunk
return [fn(*args) for args in chunk]
File "/usr/lib/python3.6/concurrent/futures/", line 153, in
return [fn(*args) for args in chunk]
File "./", line 81, in run
File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/", line 108, in learn
File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/", line 111, in init
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/", line 595, in apply_gradients
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/", line 135, in _create_slots
self._zeros_slot(v, "m", self._name)
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/", line 1153, in _zeros_slot
new_slot_variable = slot_creator.create_zeros_slot(var, op_name)
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/", line 183, in create_zeros_slot
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/", line 157, in create_slot_with_initializer
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/", line 65, in _create_slot_var
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/", line 1479, in get_variable
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/", line 1220, in get_variable
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/", line 547, in get_variable
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/", line 499, in _true_getter
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/", line 848, in _get_single_variable
ValueError: Variable ppo2_model/pi/mlp_fc0/w/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/", line 111, in init
File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/", line 108, in learn
File "./", line 81, in run


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "./", line 107, in
File "/home/amdfanboy/.local/lib/python3.6/site-packages/absl/", line 300, in run
_run_main(main, args)
File "/home/amdfanboy/.local/lib/python3.6/site-packages/absl/", line 251, in _run_main
File "./", line 100, in main
pool.map_mpi(run, bsuite_sweep)
File "/home/amdfanboy/github/bsuite/bsuite/baselines/utils/", line 53, in map_mpi
for bsuite_id in, bsuite_ids):
File "/usr/lib/python3.6/concurrent/futures/", line 366, in _chain_from_iterable_of_lists
for element in iterable:
File "/usr/lib/python3.6/concurrent/futures/", line 586, in result_iterator
yield fs.pop().result()
File "/usr/lib/python3.6/concurrent/futures/", line 425, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/", line 384, in __get_result
raise self._exception
ValueError: Variable ppo2_model/pi/mlp_fc0/w/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/", line 111, in init
File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/", line 108, in learn
File "./", line 81, in run


In deep_sea the threshold is set to 0.8 instead of 0.9 as the paper indicates.

The threshold is set to 0.8 by default instead of 0.9 as the paper indicates: "The summary ‘score’ computes the percentage of runs for which the average regret drops below 0.9 faster than the 2^N episodes expected by dithering."

def find_solution(df_in: pd.DataFrame,
                  sweep_vars: Sequence[Text] = None,
                  merge: bool = True,
                  thresh: float = 0.8) -> pd.DataFrame:

Question about DQN's loss

Hi, I have one simple question about DQN's loss here.

Why do you use tf.reduce_sum instead of tf.reduce_mean here??
Are there some reasons for it? Have experiments in the paper been done calculating loss which sums over the batch??

Sorry for asking such a simple question, but I would really appreciate it if you answer my question.

Anyway, this is a great project !!
Thank you :)

`Catch._observation` does not follow the other environments with `_get_observation`

I didn't find a common parent interface for all the bsuite environments, but a common pattern is to have a _get_observation method to collect the current observation.

Catch, however, is the only environment to have an _observation method in place of a _get_observation one.

Is there any specific reason why?
If not, would it be reasonable to homogenise the interface and make Catch compliant?

I usually use a simple interface to interoperate between gym, bsuite, dm_env and other common libraries, and the lack of a shared interface for bsuite.Environments is an obstacle.

See also #44 for a tentative edit of Catch.
It does not modify the parent Environment yet.

Edu broken after last commit

Good morning,

You inserted an error in the file in your last commit.
I report the error I receive below when executing the pip install command.

pip install \bsuite
    ERROR: Command errored out with exit status 1:
     command: 'C:\Users\Torquato\Anaconda3\envs\nlp\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Torquato\\Downloads\\bsuite\\'"'"'; __file__='"'"'C:\\Users\\Torquato\\Downloads\\bsuite\\'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);'"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info
         cwd: C:\Users\Torquato\Downloads\bsuite\
    Complete output (1 lines):
    error in bsuite setup command: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers.
ERROR: Command errored out with exit status 1: python egg_info Check the logs for full command output.

I believe that the line that causes the problem is highlighted in the code below.

baselines_jax_require = [
    'git+git://'   <------------------

Pip is not able to recognise that command as a valid requirement specifier.

Thank you very much.
Michelangelo Conserva

Cannot import Random

After installation, I was getting a 'cannot import Random' error; there was a conflict between your bsuite/baselines/random/ and the system random.Random class (as suggested here). After refactoring to random_baseline.RandomBaseline, everything seems to work.

DQN mnist & mountain car performance


While working on a PyTorch DQN agent for BSuite experiments, I noticed quite bad results on the mnist and mountain car experiments. I see that a similar question was addressed here, but the thread was closed.

To further investigate, I created a new conda environment, downloaded and installed a fresh copy of BSuite and ran the DQN agent from the baselines. The only settings I've changed were "bsuite_id" to "SWEEP" and the save path.

When you compare the results from both agents with the barplot on page 16 of the BSuite manuscript, you notice that both agents have worse performance on mnist and mountaincar and better performance on catch.

Were there any changes on the environments that I missed? The DQN agent from the manuscript did use the default parameters from the baseline directory, correct?


dependency on trfl breaks TF2

The public trfl builds don't support tf2.* and hence pip install bsuite[baselines] fails.

(base) ➜  Developer conda create -y -n bsuite python=3.6.9
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.7.12
  latest version: 4.8.3

Please update conda by running

    $ conda update -n base -c defaults conda

## Package Plan ##

  environment location: /usr/local/Caskroom/miniconda/base/envs/bsuite

  added / updated specs:
    - python=3.6.9

The following NEW packages will be INSTALLED:

  ca-certificates    pkgs/main/osx-64::ca-certificates-2020.1.1-0
  certifi            pkgs/main/osx-64::certifi-2020.4.5.1-py36_0
  libcxx             pkgs/main/osx-64::libcxx-4.0.1-hcfea43d_1
  libcxxabi          pkgs/main/osx-64::libcxxabi-4.0.1-hcfea43d_1
  libedit            pkgs/main/osx-64::libedit-3.1.20181209-hb402a30_0
  libffi             pkgs/main/osx-64::libffi-3.2.1-h475c297_4
  ncurses            pkgs/main/osx-64::ncurses-6.2-h0a44026_0
  openssl            pkgs/main/osx-64::openssl-1.1.1f-h1de35cc_0
  pip                pkgs/main/osx-64::pip-20.0.2-py36_1
  python             pkgs/main/osx-64::python-3.6.9-h359304d_0
  readline           pkgs/main/osx-64::readline-7.0-h1de35cc_5
  setuptools         pkgs/main/osx-64::setuptools-46.1.3-py36_0
  sqlite             pkgs/main/osx-64::sqlite-3.31.1-ha441bb4_0
  tk                 pkgs/main/osx-64::tk-8.6.8-ha441bb4_0
  wheel              pkgs/main/osx-64::wheel-0.34.2-py36_0
  xz                 pkgs/main/osx-64::xz-5.2.4-h1de35cc_4
  zlib               pkgs/main/osx-64::zlib-1.2.11-h1de35cc_3

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
# To activate this environment, use
#     $ conda activate bsuite
# To deactivate an active environment, use
#     $ conda deactivate

(base) ➜  Developer conda activate bsuite
(bsuite) ➜  Developer pip install git+[baselines]
Collecting bsuite[baselines]
  Cloning to /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/bsuite
  Running command git clone -q /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/bsuite
Processing /Users/omega/Library/Caches/pip/wheels/8e/28/49/fad4e7f0b9a1227708cbbee4487ac8558a7334849cb81c813d/absl_py-0.9.0-cp36-none-any.whl
Collecting dm_env
  Using cached dm_env-1.2-py3-none-any.whl (22 kB)
Collecting matplotlib
  Using cached matplotlib-3.2.1-cp36-cp36m-macosx_10_9_x86_64.whl (12.4 MB)
Collecting numpy
  Using cached numpy-1.18.2-cp36-cp36m-macosx_10_9_x86_64.whl (15.2 MB)
Collecting pandas
  Using cached pandas-1.0.3-cp36-cp36m-macosx_10_9_x86_64.whl (10.2 MB)
Collecting plotnine
  Using cached plotnine-0.6.0-py3-none-any.whl (4.1 MB)
Collecting scipy
  Using cached scipy-1.4.1-cp36-cp36m-macosx_10_6_intel.whl (28.5 MB)
Collecting scikit-image
  Using cached scikit_image-0.16.2-cp36-cp36m-macosx_10_6_intel.whl (30.4 MB)
Collecting six
  Using cached six-1.14.0-py2.py3-none-any.whl (10 kB)
Processing /Users/omega/Library/Caches/pip/wheels/7c/06/54/bc84598ba1daf8f970247f550b175aaaee85f68b4b0c5ab2c6/termcolor-1.1.0-cp36-none-any.whl
Collecting dm-sonnet
  Using cached dm_sonnet-2.0.0-py3-none-any.whl (254 kB)
Collecting dm-tree
  Using cached dm_tree-0.1.4-cp36-cp36m-macosx_10_9_x86_64.whl (93 kB)
Collecting tensorflow
  Using cached tensorflow-2.1.0-cp36-cp36m-macosx_10_11_x86_64.whl (120.8 MB)
Collecting tensorflow_probability
  Using cached tensorflow_probability-0.9.0-py2.py3-none-any.whl (3.2 MB)
Collecting trfl@ git+git://
  Cloning git:// to /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl
  Running command git clone -q git:// /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl
Collecting tqdm
  Using cached tqdm-4.45.0-py2.py3-none-any.whl (60 kB)
Collecting kiwisolver>=1.0.1
  Using cached kiwisolver-1.2.0-cp36-cp36m-macosx_10_9_x86_64.whl (60 kB)
Collecting python-dateutil>=2.1
  Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1
  Using cached pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
Collecting cycler>=0.10
  Using cached cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collecting pytz>=2017.2
  Using cached pytz-2019.3-py2.py3-none-any.whl (509 kB)
Collecting descartes>=1.1.0
  Using cached descartes-1.1.0-py3-none-any.whl (5.8 kB)
Collecting statsmodels>=0.9.0
  Using cached statsmodels-0.11.1-cp36-cp36m-macosx_10_13_x86_64.whl (8.4 MB)
Collecting patsy>=0.4.1
  Using cached patsy-0.5.1-py2.py3-none-any.whl (231 kB)
Collecting mizani>=0.6.0
  Using cached mizani-0.6.0-py2.py3-none-any.whl (61 kB)
Collecting PyWavelets>=0.4.0
  Using cached PyWavelets-1.1.1-cp36-cp36m-macosx_10_9_x86_64.whl (4.3 MB)
Collecting imageio>=2.3.0
  Using cached imageio-2.8.0-py3-none-any.whl (3.3 MB)
Collecting networkx>=2.0
  Using cached networkx-2.4-py3-none-any.whl (1.6 MB)
Collecting pillow>=4.3.0
  Using cached Pillow-7.1.1-cp36-cp36m-macosx_10_10_x86_64.whl (2.2 MB)
Processing /Users/omega/Library/Caches/pip/wheels/32/42/7f/23cae9ff6ef66798d00dc5d659088e57dbba01566f6c60db63/wrapt-1.12.1-cp36-cp36m-macosx_10_7_x86_64.whl
Collecting tabulate>=0.7.5
  Using cached tabulate-0.8.7-py3-none-any.whl (24 kB)
Collecting keras-applications>=1.0.8
  Using cached Keras_Applications-1.0.8-py3-none-any.whl (50 kB)
Processing /Users/omega/Library/Caches/pip/wheels/5c/2e/7e/a1d4d4fcebe6c381f378ce7743a3ced3699feb89bcfbdadadd/gast-0.2.2-cp36-none-any.whl
Collecting grpcio>=1.8.6
  Using cached grpcio-1.28.1-cp36-cp36m-macosx_10_9_x86_64.whl (2.6 MB)
Collecting protobuf>=3.8.0
  Using cached protobuf-3.11.3-cp36-cp36m-macosx_10_9_x86_64.whl (1.3 MB)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in /usr/local/Caskroom/miniconda/base/envs/bsuite/lib/python3.6/site-packages (from tensorflow->bsuite[baselines]) (0.34.2)
Collecting tensorboard<2.2.0,>=2.1.0
  Using cached tensorboard-2.1.1-py3-none-any.whl (3.8 MB)
Collecting keras-preprocessing>=1.1.0
  Using cached Keras_Preprocessing-1.1.0-py2.py3-none-any.whl (41 kB)
Collecting opt-einsum>=2.3.2
  Using cached opt_einsum-3.2.0-py3-none-any.whl (63 kB)
Collecting tensorflow-estimator<2.2.0,>=2.1.0rc0
  Using cached tensorflow_estimator-2.1.0-py2.py3-none-any.whl (448 kB)
Collecting astor>=0.6.0
  Using cached astor-0.8.1-py2.py3-none-any.whl (27 kB)
Collecting google-pasta>=0.1.6
  Using cached google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting cloudpickle>=1.2.2
  Using cached cloudpickle-1.3.0-py2.py3-none-any.whl (26 kB)
Collecting decorator
  Using cached decorator-4.4.2-py2.py3-none-any.whl (9.2 kB)
Collecting palettable
  Using cached palettable-3.3.0-py2.py3-none-any.whl (111 kB)
Collecting h5py
  Using cached h5py-2.10.0-cp36-cp36m-macosx_10_6_intel.whl (3.0 MB)
Requirement already satisfied: setuptools in /usr/local/Caskroom/miniconda/base/envs/bsuite/lib/python3.6/site-packages (from protobuf>=3.8.0->tensorflow->bsuite[baselines]) (46.1.3.post20200330)
Collecting google-auth-oauthlib<0.5,>=0.4.1
  Using cached google_auth_oauthlib-0.4.1-py2.py3-none-any.whl (18 kB)
Collecting google-auth<2,>=1.6.3
  Downloading google_auth-1.14.0-py2.py3-none-any.whl (88 kB)
     |████████████████████████████████| 88 kB 707 kB/s
Collecting werkzeug>=0.11.15
  Using cached Werkzeug-1.0.1-py2.py3-none-any.whl (298 kB)
Collecting markdown>=2.6.8
  Using cached Markdown-3.2.1-py2.py3-none-any.whl (88 kB)
Collecting requests<3,>=2.21.0
  Using cached requests-2.23.0-py2.py3-none-any.whl (58 kB)
Collecting requests-oauthlib>=0.7.0
  Using cached requests_oauthlib-1.3.0-py2.py3-none-any.whl (23 kB)
Collecting rsa<4.1,>=3.1.4
  Using cached rsa-4.0-py2.py3-none-any.whl (38 kB)
Collecting pyasn1-modules>=0.2.1
  Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting cachetools<5.0,>=2.0.0
  Using cached cachetools-4.1.0-py3-none-any.whl (10 kB)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/Caskroom/miniconda/base/envs/bsuite/lib/python3.6/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow->bsuite[baselines]) (2020.4.5.1)
Collecting idna<3,>=2.5
  Using cached idna-2.9-py2.py3-none-any.whl (58 kB)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
  Using cached urllib3-1.25.8-py2.py3-none-any.whl (125 kB)
Collecting chardet<4,>=3.0.2
  Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting oauthlib>=3.0.0
  Using cached oauthlib-3.1.0-py2.py3-none-any.whl (147 kB)
Collecting pyasn1>=0.1.3
  Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Building wheels for collected packages: bsuite, trfl
  Building wheel for bsuite ( ... done
  Created wheel for bsuite: filename=bsuite-0.0.0-py3-none-any.whl size=177123 sha256=1d0d8738f92032e854e3ec9211a76fbc031b484dbd62dc75b27acfaf729bab93
  Stored in directory: /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-ephem-wheel-cache-mkj55_9i/wheels/7b/5e/ac/15fb44dea4f625a5cf4801445436f8a50d023233f734fc7d41
  Building wheel for trfl ( ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/local/Caskroom/miniconda/base/envs/bsuite/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/'"'"'; __file__='"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);'"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-wheel-7q14rby_
       cwd: /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/
  Complete output (5 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  error: could not create 'build': File exists
  ERROR: Failed building wheel for trfl
  Running clean for trfl
Successfully built bsuite
Failed to build trfl
Installing collected packages: six, absl-py, dm-tree, numpy, dm-env, kiwisolver, python-dateutil, pyparsing, cycler, matplotlib, pytz, pandas, scipy, descartes, patsy, statsmodels, palettable, mizani, plotnine, PyWavelets, pillow, imageio, decorator, networkx, scikit-image, termcolor, wrapt, tabulate, dm-sonnet, h5py, keras-applications, gast, grpcio, protobuf, pyasn1, rsa, pyasn1-modules, cachetools, google-auth, idna, urllib3, chardet, requests, oauthlib, requests-oauthlib, google-auth-oauthlib, werkzeug, markdown, tensorboard, keras-preprocessing, opt-einsum, tensorflow-estimator, astor, google-pasta, tensorflow, cloudpickle, tensorflow-probability, trfl, tqdm, bsuite
    Running install for trfl ... error
    ERROR: Command errored out with exit status 1:
     command: /usr/local/Caskroom/miniconda/base/envs/bsuite/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/'"'"'; __file__='"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);'"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-record-_kglrl6r/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/Caskroom/miniconda/base/envs/bsuite/include/python3.6m/trfl
         cwd: /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/
    Complete output (5 lines):
    running install
    running build
    running build_py
    creating build
    error: could not create 'build': File exists
ERROR: Command errored out with exit status 1: /usr/local/Caskroom/miniconda/base/envs/bsuite/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/'"'"'; __file__='"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);'"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-record-_kglrl6r/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/Caskroom/miniconda/base/envs/bsuite/include/python3.6m/trfl Check the logs for full command output.
(bsuite) ➜  Developer python -c "import trfl"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'trfl'

`bsuite` using `np.int` which is deprecated.

When trying to run `from bsuite.environments import catch`, I run into the error:

AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:

Manually changing `np.int` to `int` did the job.
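
The manual fix can be applied across a checkout with a small search-and-replace. The sketch below uses only the standard library; the helper name `replace_np_int` is hypothetical and is not part of bsuite or NumPy. It rewrites only the bare alias and leaves sized types such as `np.int64` and `np.int_` untouched.

```python
import re

# Matches the removed alias `np.int` / `numpy.int`, but not `np.int32`,
# `np.int64` or `np.int_` (a digit or underscore after `int` is a word
# character, so the trailing \b does not match there).
_NP_INT = re.compile(r"\b(?:np|numpy)\.int\b")


def replace_np_int(source: str) -> str:
    """Return `source` with every bare `np.int` replaced by the builtin `int`."""
    return _NP_INT.sub("int", source)


example = "x = np.zeros(3, dtype=np.int); y = np.int64(1); z = np.int_(2)"
fixed = replace_np_int(example)
print(fixed)
```

Review the resulting diff before committing it, since a plain regex cannot tell code apart from comments or strings.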

Is bsuite maintained?

How is the 'generalization' score computed?

In the notebook, I only found the description of 6 scores (basis, noise, scale, exploration, memory, and credit assignment). How is the generalization score computed? Thank you!

Environment seeding

How do I set a seed in a bsuite environment instance? In the notebook, the output of sweep.SETTINGS has a seed attribute which is not None:

Loaded bsuite_id: bandit_noise/0.
bsuite_id=bandit_noise/0, settings={'noise_scale': 0.1, 'seed': 0}, num_episodes=10000
Loaded bsuite_id: bandit_noise/1.
bsuite_id=bandit_noise/1, settings={'noise_scale': 0.1, 'seed': 1}, num_episodes=10000
Loaded bsuite_id: bandit_noise/2.
bsuite_id=bandit_noise/2, settings={'noise_scale': 0.1, 'seed': 2}, num_episodes=10000
Loaded bsuite_id: bandit_noise/3.
bsuite_id=bandit_noise/3, settings={'noise_scale': 0.1, 'seed': 3}, num_episodes=10000

However, when I printed it again on my own computer, seed was None (if I do that in the notebook, seed is None but there is an extra mapping_seed which is not None).

I tried two methods to seed the environment: (1) setting sweep.SETTINGS[bsuite_id]['seed'] = 0; (2) calling env.seed() after wrapping it as an OpenAI gym env. Neither worked (multiple experiments, same seed, different results). A minimal example demonstrating that these two seeding methods do not work:

import random
import torch as t
import numpy as np
import bsuite
from bsuite import sweep
from bsuite.utils import gym_wrapper

def set_seed(seed, deterministic=True):
    if deterministic:
        t.backends.cudnn.deterministic = True
        t.backends.cudnn.benchmark = False

bsuite_id = 'cartpole_swingup/0'
raw_env = bsuite.load_from_id(bsuite_id)

# method 1
for episode in range(10):
    timestep = raw_env.reset()
    total_reward = 0
    while not timestep.last():
        action = np.random.choice(raw_env.action_spec().num_values)
        timestep = raw_env.step(action)
        total_reward += timestep.reward

# method 2
env = gym_wrapper.GymFromDMEnv(raw_env)
for episode in range(10):
    timestep = env.reset()
    total_reward = 0
    done = False
    while not done:
        action = np.random.choice(raw_env.action_spec().num_values)
        sn, r, done, _ = env.step(action)
        total_reward += r
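
One thing worth checking in a report like this is that every source of randomness is seeded, not only the environment. In the snippet as shown, actions are drawn with np.random.choice from NumPy's global generator, and set_seed is defined but never called. The sketch below illustrates the general point with a toy, library-free stand-in: `NoisyBandit` and `run` are hypothetical names, and nothing here shows how bsuite itself consumes a seed.

```python
import random


class NoisyBandit:
    """Hypothetical stand-in for an environment that owns its random state."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def step(self, action: int) -> float:
        # Reward is the chosen arm index plus noise from the env's own RNG.
        return action + self._rng.gauss(0.0, 0.1)


def run(env_seed, agent_seed, num_steps=100, num_actions=3) -> float:
    """Total reward of a random policy; deterministic given both seeds."""
    env = NoisyBandit(seed=env_seed)
    agent_rng = random.Random(agent_seed)  # the action sampler needs a seed too
    total = 0.0
    for _ in range(num_steps):
        total += env.step(agent_rng.randrange(num_actions))
    return total


first = run(env_seed=0, agent_seed=0)
second = run(env_seed=0, agent_seed=0)
print(first == second)
```

Passing `None` for either seed makes that generator seed itself from system entropy, so two runs are then no longer guaranteed to agree.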

The signature for `update` does not allow for sarsa or n-step methods?

Hi There!

Thanks very much for bsuite, it is a great resource for reproducible research.

I have a question on the framework.
I am setting up pedagogical implementations of canonical RL algorithms, among them SARSA.

Is there a design pattern you had in mind for n-step methods, or for any method that requires access to experience from longer transitions?
I am currently solving the issue for SARSA by computing the next action with the select_action method inside the update function.
What about n-step methods or model-based methods?
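
One possible pattern, sketched here under assumptions rather than taken from bsuite, is to let the agent keep its own short history: each call to update appends the latest transition to a bounded buffer, and a learning step happens once n transitions are available. The class `NStepBuffer` and its methods are hypothetical, and episode boundaries are deliberately left out to keep the sketch short.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Transition = Tuple[object, int, float]  # (observation, action, reward)


class NStepBuffer:
    """Hypothetical helper: keeps the last n transitions passed to `add`."""

    def __init__(self, n: int, discount: float):
        self._n = n
        self._discount = discount
        self._buffer: Deque[Transition] = deque(maxlen=n)

    def add(self, observation, action: int, reward: float) -> None:
        self._buffer.append((observation, action, reward))

    def n_step_return(
        self, bootstrap_value: float
    ) -> Optional[Tuple[object, int, float]]:
        """Return (obs, action, target) for the oldest transition, or None."""
        if len(self._buffer) < self._n:
            return None
        target = bootstrap_value
        # Fold rewards backwards: G = r_0 + g*r_1 + ... + g^n * bootstrap.
        for _, _, reward in reversed(self._buffer):
            target = reward + self._discount * target
        observation, action, _ = self._buffer[0]
        return observation, action, target


buf = NStepBuffer(n=3, discount=0.5)
for step, reward in enumerate([1.0, 2.0, 4.0]):
    buf.add(observation=step, action=0, reward=reward)
result = buf.n_step_return(bootstrap_value=8.0)
print(result)
```

Inside an agent, `add` would be called from update, and the bootstrap value would come from the agent's own estimate at the newest observation. A full implementation would also need to flush the buffer when an episode ends.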

bsuite_tutorial problem when building the PPO OpenAI baselines agent

I ran into a small problem when building the PPO OpenAI baselines agent in the bsuite_tutorial.

  • After I logged results to a CSV file using the following code,
import bsuite
from baselines.common.vec_env import dummy_vec_env
from baselines.ppo2 import ppo2
from bsuite.utils import gym_wrapper
import tensorflow as tf

SAVE_PATH_PPO = './demo_results/bsuite/ppo'

def _load_env():
    # bsuite_id inferred from the CSV file name reported below.
    raw_env = bsuite.load_and_record(
        bsuite_id='bandit_noise/0',
        save_path=SAVE_PATH_PPO, logging_mode='csv', overwrite=True)
    return gym_wrapper.GymFromDMEnv(raw_env)

env = dummy_vec_env.DummyVecEnv([_load_env])
  • I got a bsuite_id_-_bandit_noise-0.csv file like this:
  • When I ran the next cell, there was an assertion error.
ppo2.learn(
    env=env, network='mlp', lr=1e-3, gamma=.99,
    total_timesteps=10000, nsteps=100)

input shape is (1, 1)
AssertionError                            Traceback (most recent call last)
<ipython-input-2-d47907e196cf> in <module>
      1 ppo2.learn(
      2     env=env, network='mlp', lr=1e-3, gamma=.99,
----> 3     total_timesteps=10000, nsteps=100)

~/anaconda3/envs/drl/lib/python3.6/site-packages/baselines/ppo2/ in learn(network, env, total_timesteps, eval_env, seed, nsteps, ent_coef, lr, vf_coef, max_grad_norm, gamma, lam, log_interval, nminibatches, noptepochs, cliprange, save_interval, load_path, model_fn, **network_kwargs)
    177             # or if it's just worse than predicting nothing (ev =< 0)
    178 #             print( returns.shape,values.shape)
--> 179             ev = explained_variance(values, returns)
    180             logger.logkv("misc/serial_timesteps", update*nsteps)
    181             logger.logkv("misc/nupdates", update)

~/anaconda3/envs/drl/lib/python3.6/site-packages/baselines/common/ in explained_variance(ypred, y)
     35     """
---> 36     assert y.ndim == 1 and ypred.ndim == 1
     37     vary = np.var(y)
     38     return np.nan if vary==0 else 1 - np.var(y-ypred)/vary


  • I found this is due to the mismatched shapes of values (100, 1) and returns (10000, 1) just before the call to explained_variance(values, returns).

  • When I add one line in 'baselines/ppo2/', it seems to run correctly.

        # batch of steps to batch of rollouts
        mb_obs = np.asarray(mb_obs, dtype=self.obs.dtype)
        mb_rewards = np.asarray(mb_rewards, dtype=np.float32)
        mb_actions = np.asarray(mb_actions)
        mb_values = np.asarray(mb_values, dtype=np.float32)
        mb_values = mb_values.reshape(mb_rewards.shape)  # <<< add this line
        mb_neglogpacs = np.asarray(mb_neglogpacs, dtype=np.float32)
        mb_dones = np.asarray(mb_dones, dtype=np.bool)
        last_values = self.model.value(tf.constant(self.obs))._numpy()
  • Final result
Stepping environment...
| eplenmean               | nan            |
| eprewmean               | nan            |
| fps                     | 271            |
| loss/approxkl           | 2.5486004e-08  |
| loss/clipfrac           | 0.0            |
| loss/policy_entropy     | 2.3978922      |
| loss/policy_loss        | -2.7894964e-09 |
| loss/value_loss         | 0.061606925    |
| misc/explained_variance | 0              |
| misc/nupdates           | 100            |
| misc/serial_timesteps   | 10000          |
| misc/time_elapsed       | 37.5           |
| misc/total_timesteps    | 10000          |
  • P.S. I use TensorFlow 2.1.0 and checked out the tf2 branch after cloning baselines.

Tensorflow BOOT DQN agent loses performance after first iteration


I am observing strange behavior from the default TensorFlow Boot DQN agent that has me a bit baffled.
When running sweeps over multiple environments, the agent loses its expected behavior after the first iteration and does not seem to explore. I have tried to debug this for some time but have not found the cause.

Code for reproduction (double-checked in a newly installed env):

import bsuite
from bsuite.baselines.tf import boot_dqn
from bsuite import sweep
from bsuite.baselines import experiment

bsuite_id = "DEEP_SEA"
log_dir = "./logs/"
bsuite_sweep = getattr(sweep, bsuite_id)[:3]

for id in bsuite_sweep:
    env = bsuite.load_and_record(id, save_path=log_dir, overwrite=True)
    # Build the default agent from the environment specs, then train it.
    agent = boot_dqn.default_agent(
        env.observation_spec(), env.action_spec()
    )
    experiment.run(agent, env, num_episodes=300)

Iterations 2 and 3 do not reach the end of the chain in 300 episodes, nor over very long training horizons (see also the colab link for results).

In contrast, the JAX agent produces the expected results reliably in this loop (i.e., when replacing bsuite.baselines.tf with bsuite.baselines.jax).

The same can be observed in colab:


bsuite_tutorial.ipynb error - load bsuite environments as OpenAI gym

Hey there!
Thanks for open sourcing this tool for better understanding the behavior of RL agents :)

There seems to be an error in the colab tutorial when executing the "load bsuite environments as OpenAI gym" cell:

#@title Simple to load bsuite environments as OpenAI gym

from bsuite.utils import gym_wrapper
raw_env = bsuite.load_from_id(bsuite_id='memory_len/0')
env = gym_wrapper.GymWrapper(raw_env)
isinstance(env, gym.Env)


Shouldn't

env = gym_wrapper.GymWrapper(raw_env)

be

env = gym_wrapper.GymFromDMEnv(raw_env)

as the documentation indicates?

My apologies for not submitting a PR here; I was not able to access the Colab doc.

Have a nice day!

Converting OpenAI gym environments to bsuite environments

I thought that it might be useful to be able to use OpenAI gym environments within bsuite, since there are so many of them. I noticed that there is a wrapper that converts bsuite environments to OpenAI gym environments, so in my fork I made a reverse wrapper that converts an OpenAI gym environment to a bsuite environment. It's pretty untested right now, but if you are interested I would be happy to clean it up and make a PR. I think this could be a useful feature.
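
To make the idea concrete, here is a rough, dependency-free sketch of what such a reverse wrapper has to do: turn the Gym-style `(observation, reward, done, info)` tuple into timesteps tagged FIRST, MID or LAST. It is not the code from the fork. `StepType` and `TimeStep` are local stand-ins modelled on the dm_env types of the same name, and `CountdownGymEnv` and `TimeStepFromGym` are hypothetical names used only for illustration.

```python
import enum
from typing import Any, NamedTuple, Optional


class StepType(enum.Enum):
    FIRST = 0
    MID = 1
    LAST = 2


class TimeStep(NamedTuple):
    step_type: StepType
    reward: Optional[float]
    discount: Optional[float]
    observation: Any


class CountdownGymEnv:
    """Toy Gym-style env: reset() -> obs, step(a) -> (obs, reward, done, info)."""

    def __init__(self, horizon: int = 2):
        self._horizon = horizon
        self._t = 0

    def reset(self):
        self._t = 0
        return self._t

    def step(self, action):
        self._t += 1
        return self._t, 1.0, self._t >= self._horizon, {}


class TimeStepFromGym:
    """Hypothetical adapter from the Gym step tuple to TimeStep objects."""

    def __init__(self, gym_env):
        self._env = gym_env
        self._needs_reset = True

    def reset(self) -> TimeStep:
        self._needs_reset = False
        return TimeStep(StepType.FIRST, None, None, self._env.reset())

    def step(self, action) -> TimeStep:
        if self._needs_reset:  # stepping after episode end starts a new episode
            return self.reset()
        obs, reward, done, _ = self._env.step(action)
        if done:
            self._needs_reset = True
            return TimeStep(StepType.LAST, reward, 0.0, obs)
        return TimeStep(StepType.MID, reward, 1.0, obs)


env = TimeStepFromGym(CountdownGymEnv(horizon=2))
steps = [env.reset(), env.step(0), env.step(0), env.step(0)]
print([ts.step_type.name for ts in steps])
```

A real wrapper would also need to translate the Gym observation and action spaces into specs, which this sketch leaves out.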
