intellabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state-of-the-art Reinforcement Learning algorithms.

Home Page: https://intellabs.github.io/coach/

License: Apache License 2.0

Python 94.23% CSS 0.31% Jupyter Notebook 5.10% Makefile 0.24% Batchfile 0.04% HTML 0.01% Dockerfile 0.05% Shell 0.02%
coach openai-gym reinforcement-learning tensorflow rl carla imitation-learning mujoco roboschool deep-learning

coach's Introduction

⚠️ DISCONTINUATION OF PROJECT - This project will no longer be maintained by Intel. Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project. Intel no longer accepts patches to this project. If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.

Coach

Coach is a Python reinforcement learning framework containing implementations of many state-of-the-art algorithms.

It exposes a set of easy-to-use APIs for experimenting with new RL algorithms, and allows simple integration of new environments to solve. Basic RL components (algorithms, environments, neural network architectures, exploration policies, ...) are well decoupled, so that extending and reusing existing components is fairly painless.

Training an agent to solve an environment is as easy as running:

coach -p CartPole_DQN -r

Example environments: Fetch Slide, Pendulum, Starcraft, Doom Deathmatch, CARLA, MontezumaRevenge, Doom Health Gathering, PyBullet Minitaur, Gym Extensions Ant.

Benchmarks

One of the main challenges when building a research project, or a solution based on a published algorithm, is getting a concrete and reliable baseline that reproduces the algorithm's results as reported by its authors. To address this problem, we are releasing a set of benchmarks showing that Coach reliably reproduces many state-of-the-art algorithm results.

Installation

Note: Coach has only been tested on Ubuntu 16.04 LTS, and with Python 3.5.

For some information on installing on Ubuntu 17.10 with Python 3.6.3, please refer to the following issue: #54

In order to install Coach, a few prerequisites are required. The following will set up all the basics needed to get the user going with running Coach on top of OpenAI Gym environments:

# General
sudo -E apt-get install python3-pip cmake zlib1g-dev python3-tk python-opencv -y

# Boost libraries
sudo -E apt-get install libboost-all-dev -y

# Scipy requirements
sudo -E apt-get install libblas-dev liblapack-dev libatlas-base-dev gfortran -y

# PyGame
sudo -E apt-get install libsdl-dev libsdl-image1.2-dev libsdl-mixer1.2-dev libsdl-ttf2.0-dev \
libsmpeg-dev libportmidi-dev libavformat-dev libswscale-dev -y

# Dashboard
sudo -E apt-get install dpkg-dev build-essential python3.5-dev libjpeg-dev libtiff-dev libsdl1.2-dev libnotify-dev \
freeglut3 freeglut3-dev libsm-dev libgtk2.0-dev libgtk-3-dev libwebkitgtk-dev libwebkitgtk-3.0-dev \
libgstreamer-plugins-base1.0-dev -y

# Gym
sudo -E apt-get install libav-tools libsdl2-dev swig cmake -y

We recommend installing coach in a virtualenv:

sudo -E pip3 install virtualenv
virtualenv -p python3 coach_env
. coach_env/bin/activate

Finally, install coach using pip:

pip3 install rl_coach

Or alternatively, for a development environment, install coach from the cloned repository:

cd coach
pip3 install -e .

If a GPU is present, Coach's pip package will install tensorflow-gpu by default. If a GPU is not present, an Intel-optimized TensorFlow will be installed.

In addition to OpenAI Gym, several other environments were tested and are supported. Please follow the instructions in the Supported Environments section below in order to install more environments.

Getting Started

Tutorials and Documentation

Jupyter notebooks demonstrating how to run Coach from command line or as a library, implement an algorithm, or integrate an environment.

Framework documentation, algorithm description and instructions on how to contribute a new agent/environment.

Basic Usage

Running Coach

To allow reproducing results in Coach, we defined a mechanism called a preset. There are several available presets under the presets directory. To list all the available presets, use the -l flag.
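
For example, to print the list of available presets from the command line:

coach -l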

To run a preset, use:

coach -r -p <preset_name>

For example:

  • CartPole environment using Policy Gradients (PG):

    coach -r -p CartPole_PG
  • Basic level of Doom using Dueling network and Double DQN (DDQN) algorithm:

    coach -r -p Doom_Basic_Dueling_DDQN

Some presets apply to a group of environment levels, like the entire Atari or MuJoCo suites for example. To use these presets, the requested level should be defined using the -lvl flag.

For example:

  • Pong using the Neural Episodic Control (NEC) algorithm:

    coach -r -p Atari_NEC -lvl pong

There are several types of agents that can benefit from running them in a distributed fashion with multiple workers in parallel. Each worker interacts with its own copy of the environment but updates a shared network, which improves the data collection speed and the stability of the learning process. To specify the number of workers to run, use the -n flag.

For example:

  • Breakout using Asynchronous Advantage Actor-Critic (A3C) with 8 workers:

    coach -r -p Atari_A3C -lvl breakout -n 8

It is easy to create new presets for different levels or environments by following the same pattern as in presets.py.
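
As a rough sketch of what such a preset can look like (modeled on the preset class shown in one of the issues further down this page; the agent, environment, and exploration class names are illustrative and may not match the preset API of your Coach version):

# A sketch only -- class and attribute names below are illustrative.
class CartPole_A3C_Custom(Preset):
    def __init__(self):
        Preset.__init__(self, ActorCritic, GymVectorObservation, CategoricalExploration)
        self.env.level = 'CartPole-v0'     # the Gym level this preset targets
        self.learning_rate = 0.0001
        self.num_heatup_steps = 100        # random steps collected before training starts
        self.agent.discount = 0.99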

More usage examples can be found here.

Running Coach Dashboard (Visualization)

Training an agent to solve an environment can be tricky at times.

In order to debug the training process, Coach outputs several signals for each trained algorithm, so that algorithmic performance can be tracked.

While Coach trains an agent, a CSV file containing the relevant training signals will be saved to the 'experiments' directory. Coach's dashboard can then be used to dynamically visualize the training signals and track algorithmic behavior.

To use it, run:

dashboard
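
If you prefer to inspect the raw signals without the dashboard, the CSV files can also be read directly, for example with pandas. A minimal sketch, assuming an experiment path and a signal column name (both illustrative):

import pandas as pd

# Hypothetical path; Coach writes one CSV per worker under the 'experiments' directory.
signals = pd.read_csv('experiments/my_experiment/worker_0.csv')
print(signals.columns)                     # list the recorded training signals
print(signals['Training Reward'].tail())   # hypothetical signal name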

Coach Design

Distributed Multi-Node Coach

As of release 0.11.0, Coach supports horizontal scaling for training RL agents on multiple nodes. In release 0.11.0 this was tested on the ClippedPPO and DQN agents. For usage instructions please refer to the documentation here.

Batch Reinforcement Learning

Training and evaluating an agent from a dataset of experience, where no simulator is available, is supported in Coach. There are example presets and a tutorial.

Supported Environments

  • OpenAI Gym:

    Installed by default by Coach's installer (see note on MuJoCo version below).

  • ViZDoom:

    Follow the instructions described in the ViZDoom repository -

    https://github.com/mwydmuch/ViZDoom

    Additionally, Coach assumes that the environment variable VIZDOOM_ROOT points to the ViZDoom installation directory (see the sketch after this list).

  • Roboschool:

    Follow the instructions described in the roboschool repository -

    https://github.com/openai/roboschool

  • GymExtensions:

    Follow the instructions described in the GymExtensions repository -

    https://github.com/Breakend/gym-extensions

    Additionally, add the installation directory to the PYTHONPATH environment variable (see the sketch after this list).

  • PyBullet:

    Follow the instructions described in the Quick Start Guide (basically just - 'pip install pybullet')

  • CARLA:

    Download release 0.8.4 from the CARLA repository -

    https://github.com/carla-simulator/carla/releases

    Install the python client and dependencies from the release tarball:

    pip3 install -r PythonClient/requirements.txt
    pip3 install PythonClient
    

    Create a new CARLA_ROOT environment variable pointing to CARLA's installation directory (see the sketch after this list).

    A simple CARLA settings file (CarlaSettings.ini) is supplied with Coach, and is located in the environments directory.

  • Starcraft:

    Follow the instructions described in the PySC2 repository -

    https://github.com/deepmind/pysc2

  • DeepMind Control Suite:

    Follow the instructions described in the DeepMind Control Suite repository -

    https://github.com/deepmind/dm_control

  • Robosuite:

    Note: To use Robosuite-based environments, please install Coach from the latest cloned repository. It is not yet available as part of the rl_coach package on PyPI.

    Follow the instructions described in the robosuite documentation (see note on MuJoCo version below).

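Several of the environments above are located through environment variables: VIZDOOM_ROOT for ViZDoom, PYTHONPATH for GymExtensions, and CARLA_ROOT for CARLA. A minimal sketch of setting them before launching Coach (all paths are illustrative):

export VIZDOOM_ROOT=/path/to/ViZDoom
export PYTHONPATH=$PYTHONPATH:/path/to/gym-extensions
export CARLA_ROOT=/path/to/CARLA_0.8.4
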
Note on MuJoCo version

OpenAI Gym supports MuJoCo only up to version 1.5 (and corresponding mujoco-py version 1.50.x.x). The Robosuite simulation framework, however, requires MuJoCo version 2.0 (and corresponding mujoco-py version 2.0.2.9, as of robosuite version 1.2). Therefore, if you wish to run both Gym-based MuJoCo environments and Robosuite environments, it's recommended to have a separate virtual environment for each.

Please note that all Gym-Based MuJoCo presets in Coach (rl_coach/presets/Mujoco_*.py) have been validated only with MuJoCo 1.5 (including the reported benchmark results).

Supported Algorithms

Value Optimization Agents

Policy Optimization Agents

General Agents

Imitation Learning Agents

Hierarchical Reinforcement Learning Agents

Memory Types

Exploration Techniques

Citation

If you used Coach for your work, please use the following citation:

@misc{caspi_itai_2017_1134899,
  author       = {Caspi, Itai and
                  Leibovich, Gal and
                  Novik, Gal and
                  Endrawis, Shadi},
  title        = {Reinforcement Learning Coach},
  month        = dec,
  year         = 2017,
  doi          = {10.5281/zenodo.1134899},
  url          = {https://doi.org/10.5281/zenodo.1134899}
}

Contact

We'd be happy to get any questions or contributions through GitHub issues and PRs.

Please make sure to take a look here before filing an issue or proposing a PR.

The Coach development team can also be contacted over email.

Disclaimer

Coach is released as a reference code for research purposes. It is not an official Intel product, and the level of quality and support may not be as expected from an official product. Additional algorithms and environments are planned to be added to the framework. Feedback and contributions from the open source and RL research communities are more than welcome.

coach's People

Contributors

ajay191191, anabwan, balajismaniam, brollb, codyjhsieh, cxxgtxy, dandanelbaz, gal-leibovich, galnov, guyjacob, itaicaspi, itaicaspi-intel, jamescasbon, justinshenk, leopd, michaelbeale-il, mimoralea, nikhilbarhate99, nzmora, piesposito, redknightlois, ryanpeach, safrooze, scttl, shadiendrawis, thomelane, timokau, x77a1, yazgoo, zach-nervana

coach's Issues

CARLA Render on Multiple GPUs?

Does the default coach system render CARLA on multiple GPUs? Or does it just set it to the default one? I couldn't get it running to check, so I figured this may be faster to get an answer.

invalid object?

I just tested running coach from source: 4fe9cba

on both Ubuntu 14.04 and macOS High Sierra, and I get the exact same error:

python coach.py -p CartPole_DQN -r

/home/jtoy/anaconda3/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
Warning: failed to import the following packages - RoboSchool, GymExtensions, ViZDoom, CARLA, Neon
Please enter an experiment name: test
Using tensorflow framework
Traceback (most recent call last):
  File "coach.py", line 275, in <module>
    env_instance = create_environment(tuning_parameters)
  File "/home/jtoy/sandbox/touchnet/related_projects/coach/environments/__init__.py", line 32, in create_environment
    env = eval(env_type)(tuning_parameters)
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/pandas/core/computation/eval.py", line 267, in eval
    ret = eng_inst.evaluate()
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 75, in evaluate
    res = self._evaluate()
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 122, in _evaluate
    return ne.evaluate(s, local_dict=scope, truediv=truediv)
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/numexpr/necompiler.py", line 807, in evaluate
    zip(names, arguments)]
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/numexpr/necompiler.py", line 806, in <listcomp>
    signature = [(name, getType(arg)) for (name, arg) in
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/numexpr/necompiler.py", line 704, in getType
    raise ValueError("unknown type %s" % a.dtype.name)
ValueError: unknown type object

I installed all dependencies with pip install -r requirements_coach.txt

Is there an issue in master or am I missing something basic?

Dashboard boot up error

CMD:
(coach_env) tshh@tshh_fast_comp:~/coach$ python3 dashboard.py

ECHO:
2018-03-07 13:47:01,401 Starting Bokeh server version 0.12.7 (running on Tornado 5.0)
ERROR: __init__() got an unexpected keyword argument 'io_loop'

Testing Coach on Ubuntu 17.10 and Python 3.6.3

Hi,

I am currently testing Coach on Ubuntu 17.10 and Python 3.6.3.

With a few changes in the install.sh file, Coach seems to be fully functional.

Dashboard requirements:

  • replace python3.5-dev with python3.6-dev
  • Inside coach_env: pip install wxpython (instead of apt-get)

NGraph (install protobuf compiler before installing NGraph):

sudo apt install protobuf-compiler

Intel Optimized TensorFlow (replace CPU link)

pip install https://anaconda.org/intel/tensorflow/1.4.0/download/tensorflow-1.4.0-cp36-cp36m-linux_x86_64.whl

I haven't tested CARLA yet, but so far so good.

Needless to say, great work with Coach!

run carla env : observation assert False AssertionError

CARLA versions 0.7 and 0.71 both give the same error,
and tensorflow-cpu 14.1 and tensorflow-gpu 1.6 also give the same error.

(coach_env) sdc@frankwang-Tri01:~/Projects/coach$ python3 coach.py -p Carla_A3C -n 1
/home/sdc/Projects/coach/coach_env/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Warning: failed to import the following packages - RoboSchool, Neon, ViZDoom, PyBullet, GymExtensions
Please enter an experiment name: c

Using tensorflow framework
Traceback (most recent call last):
  File "coach.py", line 271, in <module>
    env_instance = create_environment(tuning_parameters)
  File "/home/sdc/Projects/coach/environments/__init__.py", line 32, in create_environment
    env = eval(env_type)(tuning_parameters)
  File "/home/sdc/Projects/coach/environments/carla_environment_wrapper.py", line 136, in __init__
    self.reset(True)
  File "/home/sdc/Projects/coach/environments/environment_wrapper.py", line 165, in reset
    self._restart_environment_episode(force_environment_reset)
  File "/home/sdc/Projects/coach/environments/carla_environment_wrapper.py", line 229, in _restart_environment_episode
    observation = self.step([1.0, 0])['observation']
  File "/home/sdc/Projects/coach/environments/environment_wrapper.py", line 140, in step
    self._update_state()
  File "/home/sdc/Projects/coach/environments/carla_environment_wrapper.py", line 185, in _update_state
    'measurements': [measurements.player_measurements.forward_speed],
  File "/home/sdc/Projects/coach/environments/environment_wrapper.py", line 81, in observation
    assert False
AssertionError


Results stored at: ./experiments/c/19_03_2018-23_30
Total runtime: 0:00:29.358656


Do you want to discard the experiment results (Warning: this cannot be undone)? (y/N)

crash when recovering Pendulum_DDPG training

CMD:
(coach_env) tshh@tshh_fast_comp:~/coach$ python3 coach.py -r -p Pendulum_DDPG -e Pendulum_DDPG_try -s 3600 -v -crd ~/coach/experiments/Pendulum_DDPG_try/07_03_2018-10_52

Part of ECHO:
Creating agent 0
Loading checkpoint: /home/tshh/coach/experiments/Pendulum_DDPG_try/07_03_2018-10_52/1.ckpt
Loading checkpoint: /home/tshh/coach/experiments/Pendulum_DDPG_try/07_03_2018-10_52/1.ckpt
2018-03-07 11:04:18.188301: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key actor/online/actor/online/network_0/observation/fc1/bias/Adam_1 not found in checkpoint
2018-03-07 11:04:18.188347: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key actor/online/actor/online/network_0/middleware_embedder/fc1/kernel/Adam not found in checkpoint
2018-03-07 11:04:18.188448: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key actor/online/actor/online/network_0/observation/fc1/bias/Adam not found in checkpoint
2018-03-07 11:04:18.188499: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key actor/online/actor/online/network_0/observation/fc1/kernel/Adam not found in checkpoint

PS:
Pendulum_DDPG training works.
I just deleted and reinstalled Coach today.

Error in 'create_environment'

Hi, I just followed the installation instructions on the website. My Coach version is '0.9'.
When using the example command 'python coach.py -p CartPole_DQN',
I got this error:

Warning: failed to import the following packages - Neon, ViZDoom, RoboSchool, GymExtensions, PyBullet
Please enter an experiment name: qwer
Using tensorflow framework
Traceback (most recent call last):
  File "coach.py", line 295, in <module>
    env_instance = create_environment(tuning_parameters)
  File "/home/terry/robot_proj/intel_coach/coach-0.9.0/environments/__init__.py", line 32, in create_environment
    env = eval(env_type)(tuning_parameters)
  File "/home/terry/anaconda3/envs/py35/lib/python3.5/site-packages/pandas/core/computation/eval.py", line 267, in eval
    ret = eng_inst.evaluate()
  File "/home/terry/anaconda3/envs/py35/lib/python3.5/site-packages/pandas/core/computation/engines.py", line 75, in evaluate
    res = self._evaluate()
  File "/home/terry/anaconda3/envs/py35/lib/python3.5/site-packages/pandas/core/computation/engines.py", line 122, in _evaluate
    return ne.evaluate(s, local_dict=scope, truediv=truediv)
  File "/home/terry/anaconda3/envs/py35/lib/python3.5/site-packages/numexpr/necompiler.py", line 789, in evaluate
    zip(names, arguments)]
  File "/home/terry/anaconda3/envs/py35/lib/python3.5/site-packages/numexpr/necompiler.py", line 788, in <listcomp>
    signature = [(name, getType(arg)) for (name, arg) in
  File "/home/terry/anaconda3/envs/py35/lib/python3.5/site-packages/numexpr/necompiler.py", line 686, in getType
    raise ValueError("unknown type %s" % a.dtype.name)
ValueError: unknown type object

I printed out 'env_type' and 'tuning_parameters'. They are:
GymEnvironmentWrapper
<presets.CartPole_DQN object at 0x7f30e7ce09e8>

[Update]
It turns out eval('GymEnvironmentWrapper') is the cause of this error. However, 'GymEnvironmentWrapper' is imported via 'from environments.gym_environment_wrapper import *' at the top of the script.

Thanks a lot if anybody has an idea.

Adding SELU activation function to tf general_network

Hi guys,
I really appreciate your awesome work! I would like you to add the selu activation function to your awesome project.

In the TF architecture, in general_network.py at line 39:

def get_activation_function(self, activation_function_string):
    activation_functions = {
        'relu': tf.nn.relu,
        'tanh': tf.nn.tanh,
        'sigmoid': tf.nn.sigmoid,
        'elu': tf.nn.elu,
        'selu': tf.nn.selu,         # <- New Self-normalizing neural networks
        'none': None
    }

The paper:
https://arxiv.org/abs/1706.02515

And the tf funct:
https://www.tensorflow.org/versions/master/api_docs/python/tf/nn/selu

Kind regards, keep going with this awesome project.
Adrian Brunetto

AWS wrapper

It would be good if there's a wrapper to run parallel jobs on AWS similar to rllab (basically have the docker file pulled from docker hub for this and set up everything with the AWS key).

"CreateSession still waiting for response from worker" using Multi-threaded Algorithms

Hey

I am trying to launch the example presets using the -n argument for multithreading.
For instance, using the following command

python coach.py -p CartPole_PG -n 2 -v

I get these errors

2018-03-09 13:21:46.936783: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2018-03-09 13:21:46.936836: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: naboo21
2018-03-09 13:21:46.936853: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: naboo21
2018-03-09 13:21:46.936903: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 387.26.0
2018-03-09 13:21:46.936942: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Modu

And then these infinitely

2018-03-09 13:21:58.067204: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2018-03-09 13:21:58.067340: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:worker/replica:0/task:1
2018-03-09 13:21:58.067353: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:worker/replica:0/task:2
2018-03-09 13:21:58.114851: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0

However, when I run the single threaded agent using

python coach.py -p CartPole_PG -v

It works well, a GPU is found.

When I run parallel_actor.py directly via

python3 ./parallel_actor.py --ps_hosts=localhost:34576 --worker_hosts=localhost:49588,localhost:38853,localhost:38037 --job_name=worker --load_json=./experiments/09_03_2018-13_01/run_dict_worker2.json

It also finds a GPU correctly.

It seems like the parallel_actor.py subprocess does not have visibility of the GPUs.
Any ideas?

dumping gifs not working?

Dumping GIFs seems to not work. I tried running with the -dg option with a couple of environments for over 30 minutes and over 400 episodes, and it never seems to output GIFs.
I ran these commands:
python3 coach.py -p CartPole_DQN -dg
python3 coach.py -p Breakout_A3C -dg

here you can see output:
https://cl.ly/173A092l112H

find experiments/
experiments/
experiments/asdsad
experiments/asdsad/20_03_2018-14_29
experiments/asdsad/20_03_2018-14_29/run_dict_worker3.json
experiments/asdsad/20_03_2018-14_29/run_dict_worker0.json
experiments/asdsad/20_03_2018-14_29/run_dict_worker1.json
experiments/asdsad/20_03_2018-14_29/run_dict_worker4.json
experiments/asdsad/20_03_2018-14_29/run_dict_worker2.json
experiments/asdads
experiments/asdads/20_03_2018-14_30
experiments/asdads/20_03_2018-14_30/run_dict.json
experiments/asdads/20_03_2018-14_30/worker_20_03_2018-14_30_0.csv
experiments/sadd
experiments/sadd/20_03_2018-14_29
experiments/sadd/20_03_2018-14_29/worker_20_03_2018-14_29_0.csv
experiments/sadd/20_03_2018-14_29/run_dict.json
experiments/gifs
experiments/gifs/20_03_2018-14_36
experiments/gifs/20_03_2018-14_36/run_dict.json
experiments/gifs/20_03_2018-14_36/worker_20_03_2018-14_36_0.csv
experiments/efewf
experiments/efewf/20_03_2018-14_36
experiments/efewf/20_03_2018-14_36/run_dict.json
experiments/efewf/20_03_2018-14_36/worker_20_03_2018-14_36_0.csv

Can’t use LSTM as Middleware

Hi

First of all, thank you for this convenient and cool library.

During my experiments, I couldn't figure out how to use the LSTM middleware.
I tested it on two different environments (CartPole and ViZDoom Basic) and I got this error message:

Traceback (most recent call last):
  File "./parallel_actor.py", line 165, in <module>
    agent.improve()
  File "/code/coach/agents/agent.py", line 503, in improve
    self.act()
  File "/code/coach/agents/agent.py", line 435, in act
    self.reset_game()
  File "/code/coach/agents/agent.py", line 201, in reset_game
    network.curr_rnn_c_in = network.middleware_embedder.c_init
AttributeError: 'NetworkWrapper' object has no attribute 'middleware_embedder'

I used the following preset:

class Doom_Basic_A3C(Preset):
    def __init__(self):
        Preset.__init__(self, ActorCritic, Doom, CategoricalExploration)
        self.env.level = 'basic'
        self.agent.policy_gradient_rescaler = 'A_Value'
        self.learning_rate = 0.0001
        self.num_heatup_steps = 100
        self.env.reward_scaling = 100.
        self.agent.discount = 0.99
        self.agent.apply_gradients_every_x_episodes = 1
        self.agent.num_steps_between_gradient_updates = 30
        self.agent.beta_entropy = 0.01
        self.clip_gradients = 40
        self.agent.middleware_type = MiddlewareTypes.LSTM

I run the experiment with this command:

python3 coach.py -v -p Doom_Basic_A3C -dg -e test_lstm_v1 -s 100 -n 16 -cp "num_training_iterations=1000; evaluation_episodes=100; evaluate_every_x_episodes=500"

Did I forget to configure something?

Thanks in advance!
BR, Gabriel

Install Script Assumes HTTPS_PROXY or https_proxy Is Set

The install.sh script won't work if you don't have to set the proxy environment variables.

https://github.com/NervanaSystems/coach/blob/2a3a6f4a68aeabd2b69dc215e89594dcd436e82d/install.sh#L3-L5

Which is then used in these two calls to pip:

https://github.com/NervanaSystems/coach/blob/2a3a6f4a68aeabd2b69dc215e89594dcd436e82d/install.sh#L145

https://github.com/NervanaSystems/coach/blob/2a3a6f4a68aeabd2b69dc215e89594dcd436e82d/install.sh#L151

If $HTTPS_PROXY is blank, then you'll get the following error:

+ echo 'Installing Coach requirements'
Installing Coach requirements
+ pip install -r ./requirements_coach.txt --proxy

Usage:   
  pip install [options] <requirement specifier> [package-index-options] ...
  pip install [options] -r <requirements file> [package-index-options] ...
  pip install [options] [-e] <vcs project url> ...
  pip install [options] [-e] <local project path> ...
  pip install [options] <archive url/path> ...

--proxy option requires an argument

The --proxy $HTTPS_PROXY arguments should be removed. Pip is able to automatically pick up the proxy environment variables.

PPO reference

This library looks great -- beautiful work.
The proper reference for PPO is the arxiv paper from my coauthors and me (https://arxiv.org/abs/1707.06347). As acknowledged by Heess et al. in their DPPO paper, their algorithm was based on my github code, which didn't have a corresponding publication at the time.

TensorFlow installation with Coach failed

Hi,

I tried to follow the installation instructions, but after a very long installation process (more than half an hour) I found TensorFlow was not working when I tried an example script:

python coach.py -r -p Pendulum_ClippedPPO -n 8
Traceback (most recent call last):
  File "coach.py", line 23, in <module>
    from architectures import *
  File "/home/viktor/DeepRL/coach/architectures/__init__.py", line 20, in <module>
    from architectures.tensorflow_components.general_network import *
  File "/home/viktor/DeepRL/coach/architectures/tensorflow_components/general_network.py", line 17, in <module>
    from architectures.tensorflow_components.embedders import *
  File "/home/viktor/DeepRL/coach/architectures/tensorflow_components/embedders.py", line 20, in <module>
    class InputEmbedder:
  File "/home/viktor/DeepRL/coach/architectures/tensorflow_components/embedders.py", line 21, in InputEmbedder
    def __init__(self, input_size, activation_function=tf.nn.relu, name="embedder"):
AttributeError: module 'tensorflow' has no attribute 'nn'

Also, I assumed from the description that it would install in the virtual env, but instead this installation has also broken my default TensorFlow-GPU installation. So:

  1. Why is the installation so long? Installing the OpenAI Baselines with just "pip install -e ." takes less than a minute, for example, the same as rllab.
  2. How can this TF installation bug be fixed?

Install Script Error When Headless and Installing Neon

When running the install.sh script in an automated/headless manner, I get this error:

./install.sh: line 171: [: -eq: unary operator expected

Ran with:

./install.sh -c -p -g -N -ne -d

When running the installation the headless way, $INSTALL_NEON never gets set.

Comparing it as if it were a string would have worked, since an unset variable would just become "" in a string comparison with ==. Since it's a numeric comparison with -eq, it causes an error.

InputEmbedder is not currently customizable

The only method for selecting the InputEmbedder is by specifying agent.embedder_complexity. Ideally, custom implementations should also be easy to specify; right now it is fairly involved. Do you guys have ideas about how this might be done in a general way? It looks like it might make sense to allow providing a custom input_mapping dictionary whose values override the default values provided in get_input_embedder.

Any benchmarking?

Is there any performance (speed and training curves) benchmarking? There are a great number of open-source RL alternatives on GitHub, but very few of them actually match (or even get close to) DeepMind's published results, for example.

OpenAI Gym Latest Version Incompatible with Example

The latest OpenAI gym doesn't come with BreakoutDeterministic-v3.

They mention it here: https://github.com/openai/gym#whats-new

2017-05-13: BACKWARDS INCOMPATIBILITY: The Atari environments are now at v4. To keep using the old v3 environments, keep gym <= 0.8.2 and atari-py <= 0.0.21. Note that the v4 environments will not give identical results to existing v3 results, although differences are minor.

To downgrade you can run:

pip install gym[atari]==0.8.2

This gets the Breakout_A3C preset to work. I haven't tried replacing the preset with BreakoutDeterministic-v4 yet.

Looks like Breakout wouldn't work without installing gym[atari] vs just gym anyway:

https://github.com/NervanaSystems/coach/blob/6009b73eb60f6a79daa2a0028f8feafda3635f7d/install.sh#L171

And it's for this preset:

https://github.com/NervanaSystems/coach/blob/e813eaf304d728a7655b5343ca1eb7cd2ba32e74/presets.py#L1123

Building CXX object error when installing with Neon support

sudo ./install.sh works fine when not installing neon support
On AWS Ubuntu 16.02, python 2.7.12, 3.5.2:

Error when the 'Install neon support' option is chosen:

/home/ubuntu/coach/mkl-dnn/src/cpu/jit_avx512_common_conv_kernel.cpp: In lambda function:
/home/ubuntu/coach/mkl-dnn/src/cpu/jit_avx512_common_conv_kernel.cpp:2666:42: error: assuming signed overflow does not occur when assuming that (X + c) < X is always false [-Werror=strict-overflow]
auto emit_fma_block = [&](int kh_step) {
^
cc1plus: all warnings being treated as errors
src/CMakeFiles/mkldnn.dir/build.make:782: recipe for target 'src/CMakeFiles/mkldnn.dir/cpu/jit_avx512_common_conv_kernel.cpp.o' failed
make[2]: *** [src/CMakeFiles/mkldnn.dir/cpu/jit_avx512_common_conv_kernel.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[ 40%] Linking CXX static library libmkldnn_gtest.a
[ 40%] Built target mkldnn_gtest
CMakeFiles/Makefile2:85: recipe for target 'src/CMakeFiles/mkldnn.dir/all' failed
make[1]: *** [src/CMakeFiles/mkldnn.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2

dump_output_csv rewrites log file on every update

It seems that the file generated by dump_output_csv is removed and rewritten every 10 episodes by default. Ideally the log file could be written to incrementally instead of from scratch so that the log file can both be up to date and also efficient.

On fast local storage, inefficient writes are not a big deal; however, I am currently writing my logs to NFS, where this inefficiency is definitely an issue.

Action Exploration and Monte Carlo Tree Search?

Hi,
first of all, very great job!!
Only some wishes about your RL library:

  1. Action Exploration Settings
  2. Monte Carlo Tree Search
    As far as I know, no RL library exists together with MCTS, which is the key method of AlphaGo. If Intel builds a standard structure for MCTS and RL, that would be fantastic. Then Coach will have the ability not only to rebuild AlphaGo but also other far more complicated algorithms. We are very excited to see that happen.

Best,

L. Lu

Wrong Python Interpreter If Installed With --no_virtual_environment

If you install with --no_virtual_environment then the calls in coach.py to run python call the system's Python 2 instead of 3.

Tested inside of a fresh ubuntu:16.04 Docker image. Installed with ./install.sh -c -p -g -N -ne -d.

Might be as simple as changing python to python3 in these two locations?

https://github.com/NervanaSystems/coach/blob/6009b73eb60f6a79daa2a0028f8feafda3635f7d/coach.py#L276

https://github.com/NervanaSystems/coach/blob/6009b73eb60f6a79daa2a0028f8feafda3635f7d/coach.py#L299

NEC agent recovery problem

I trained Pong_NEC for 3 days with '-s 3600' and got a positive total reward each episode.
I then tried to recover the agent using the '-crd' option from the experiment folder, and the
total reward per episode dropped to around -21 to -19.

So the DND dictionary needs to be saved and recovered as well, not only the tensors, right?

coach is missing top-level package for import-ability

coach is not organized with a top-level coach directory and is not pip installable, which makes it not importable unless the top-level script is in the coach directory.

Another issue with the current organization/install is that it makes running standard unit tests difficult since the unit tests can't import the coach modules. For now I have resorted to the following hack at the top of every unit test file:

import os
import sys
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))

When running with multiple workers, got this error: InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'main/online/network_0/observation/observation' with dtype float and shape [?,4,1]

I'm getting the following error when running this:
python3 coach.py -p CartPole_A3C -s 600 -n 2

Traceback (most recent call last):
  File "./parallel_actor.py", line 170, in <module>
    agent.improve()
  File "/home/user1/gitrepo/coach/agents/agent.py", line 479, in improve
    network.sync()
  File "/home/user1/gitrepo/coach/architectures/network_wrapper.py", line 95, in sync
    self.update_online_network()
  File "/home/user1/gitrepo/coach/architectures/network_wrapper.py", line 112, in update_online_network
    self.online_network.set_weights(self.global_network.get_weights(), rate)
  File "/home/user1/gitrepo/coach/architectures/tensorflow_components/architecture.py", line 322, in set_weights
    old_weights, new_weights = self.tp.sess.run([self.get_weights(), weights])
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 521, in run
    run_metadata=run_metadata)
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 892, in run
    run_metadata=run_metadata)
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 967, in run
    raise six.reraise(*original_exc_info)
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/six.py", line 686, in reraise
    raise value
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 952, in run
    return self._sess.run(*args, **kwargs)
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1024, in run
    run_metadata=run_metadata)
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 827, in run
    return self._sess.run(*args, **kwargs)
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'main/online/network_0/observation/observation' with dtype float and shape [?,4,1]
	 [[Node: main/online/network_0/observation/observation = Placeholder[dtype=DT_FLOAT, shape=[?,4,1], _device="/job:worker/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'main/online/network_0/observation/observation', defined at:
  File "./parallel_actor.py", line 102, in <module>
    exec('agent = ' + tuning_parameters.agent.type + '(env_instance, tuning_parameters, replicated_device=device, '
  File "<string>", line 1, in <module>
  File "/home/user1/gitrepo/coach/agents/actor_critic_agent.py", line 26, in __init__
    PolicyOptimizationAgent.__init__(self, env, tuning_parameters, replicated_device, thread_id, create_target_network)
  File "/home/user1/gitrepo/coach/agents/policy_optimization_agent.py", line 37, in __init__
    self.replicated_device, self.worker_device)
  File "/home/user1/gitrepo/coach/architectures/network_wrapper.py", line 71, in __init__
    self.global_network, network_is_local=True)
  File "/home/user1/gitrepo/coach/architectures/tensorflow_components/general_network.py", line 40, in __init__
    TensorFlowArchitecture.__init__(self, tuning_parameters, name, global_network, network_is_local)
  File "/home/user1/gitrepo/coach/architectures/tensorflow_components/architecture.py", line 70, in __init__
    self.get_model(tuning_parameters)
  File "/home/user1/gitrepo/coach/architectures/tensorflow_components/general_network.py", line 124, in get_model
    input_placeholder, embedding = input_embedder()
  File "/home/user1/gitrepo/coach/architectures/tensorflow_components/embedders.py", line 34, in __call__
    self.input = tf.placeholder("float", shape=(None,) + self.input_size, name=self.get_name())
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1599, in placeholder
    return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3091, in _placeholder
    "Placeholder", dtype=dtype, shape=shape, name=name)
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/user1/gitrepo/coach/coach_env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'main/online/network_0/observation/observation' with dtype float and shape [?,4,1]
	 [[Node: main/online/network_0/observation/observation = Placeholder[dtype=DT_FLOAT, shape=[?,4,1], _device="/job:worker/replica:0/task:0/device:CPU:0"]()]]

dashboard requirements

On Ubuntu 16.04 with the latest Tornado, the dashboard doesn't work. I get this error:
python3 dashboard.py
2018-03-31 20:46:20,060 Starting Bokeh server version 0.12.7 (running on Tornado 5.0.1)
ERROR: __init__() got an unexpected keyword argument 'io_loop'

Which specific version of Tornado do you need with Bokeh? It seems like this is a recent Tornado issue:
bokeh/bokeh#7308

end training after reaching certain criteria

I've trained a dozen or so networks with Coach now. To end training, I always hit Ctrl-C after running for a couple of days; I'm not sure if it would ever end if I just left it. Is there a way to end training early, such as after N episodes or N hours? If not, would this be useful to add?

Error running with presets

Hey, I tried to run DDPG without presets with this command

 python coach.py -at DDPG -ept AdditiveNoiseExploration -et CartPole-v0

and got these errors:

Traceback (most recent call last):
  File "coach.py", line 255, in <module>
    tuning_parameters = json_to_preset(json_run_dict_path)
  File "/home/jackie/code/coach/presets.py", line 27, in json_to_preset
    tuning_parameters = Preset(eval(run_dict['agent_type']), eval(run_dict['environment_type']),
  File "<string>", line 1, in <module>
NameError: name 'CartPole' is not defined

I know what eval() means. However, I can't find a way to fix this issue.
Any ideas? Just providing an example without presets would also be nice. Thanks.

How to get Coach working in a different framework?

Hi,

I want to ask how to get Coach to work with other frameworks like MXNet. I see the tensorflow_components and neon_components folders; presumably I need to write my own components?

It seems like that's all I need to do to include a new framework, I think. I would probably send a PR supporting MXNet / Gluon at some later time.

reusing and deploying models?

Hi, I've trained a few models inside of Coach now, and I love using Coach to quickly test out algorithms.
After I've trained these models, is there a way to reuse or deploy them in a live system?
Let's say I trained a few different algorithms with my env called "SolveMeaningOfLiveEnv" and PPO does best. Now I want to take the model and weights and use them in my production system with the Coach system. I read through the docs but didn't see this kind of functionality; did I miss it? Is it currently possible, or are there plans to enable this?

better support for custom environments

At the moment, the following steps are required to use an environment not included with coach:

  1. implement an environment which matches coach custom environment type
  2. import your environment inside of environments/__init__.py
  3. a new entry should be added to the EnvTypes enum mapping the environment name to the wrapper's class name, also in environments/__init__.py
  4. a new configuration class should be implemented for defining the environment's parameters and added to configurations.py
  5. a new preset should be added to presets.py

these steps are from here: http://coach.nervanasys.com/contributing/add_env/index.html

Many of these steps require modifying coach internal files. In order to track these changes in a version control system, coach itself must be forked and modified. This is inconvenient for multiple reasons:

  1. users can not simply run pip install coach --upgrade to get upgrades.
  2. users must continually rebase their changes on top of the latest coach changes and resolve any merge conflicts that occur.
  3. users may have environments defined in existing projects and git repositories. with the current set up, they will need to continue committing to their existing repositories and also make sure their coach fork matches their environment repository, and make separate commits to the environment wrapper if necessary.

Additionally, this pattern of modifying Coach's internals in order to add environments or run experiments is not very friendly to new users.

Here is an example from rllab documentation:

from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from examples.point_env import PointEnv
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

env = normalize(PointEnv())
policy = GaussianMLPPolicy(
    env_spec=env.spec,
)
baseline = LinearFeatureBaseline(env_spec=env.spec)
algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
)
algo.train()

The user is required to create a new environment wrapper following their environment pattern similar to our step 1. After that however, the user only needs to write a simple python script with a few imports and some configuration.

Looking through coach.py, it looks like it shouldn't be terribly difficult to provide a similar interface, though I am not entirely sure. What do you guys think?

inputs should be specified as a dictionary instead of list

Right now, the inputs to the neural networks are represented as lists and only converted into a dictionary at the last minute. The order of this list needs to be kept synchronized in multiple places, which makes understanding the code and debugging more difficult. It also makes it difficult to compose multiple concepts which expect particular orderings.

More specifically, tuning_parameters.InputTypes input order must match Agent.inputs and Agent.extract_batch.

This is also a helpful first step towards allowing the environment to supply multiple inputs. For example right now there can only be a single image input.

Error with Vizdoom

It seems there is something wrong with the ViZDoom environment.
I tried VIZDOOM_ROOT=/home/nina/ViZDoom python3 coach.py -p Doom_Basic_Dueling_DDQN, but received the following error.

Traceback (most recent call last):
  File "coach.py", line 271, in <module>
    env_instance = create_environment(tuning_parameters)
  File "/home/nina/test/RL/coach/environments/__init__.py", line 32, in create_environment
    env = eval(env_type)(tuning_parameters)
  File "/home/nina/test/RL/coach/environments/doom_environment_wrapper.py", line 132, in __init__
    self.reset()
  File "/home/nina/test/RL/coach/environments/environment_wrapper.py", line 171, in reset
    self._update_state()
  File "/home/nina/test/RL/coach/environments/doom_environment_wrapper.py", line 140, in _update_state
    'measurements': state.game_variables,
  File "/home/nina/test/RL/coach/environments/environment_wrapper.py", line 81, in observation
    assert False
AssertionError

windows play

Windows cmd run: python coach.py -r -p CartPole_PG
I get this error:
Warning: failed to import the following packages - RoboSchool, Neon, ViZDoom, PyBullet, GymExtensions
Please enter an experiment name:
If I just want to play the Gym 'Pendulum-v0' environment, not RoboSchool, Neon, ViZDoom, or PyBullet, what should I do?
Installed:
Intel® Distribution for Python (python3.6)
tensorflow 1.3
annoy 1.9.1
Pillow 4.3.0
