A2C

An implementation of Synchronous Advantage Actor-Critic (A2C) in TensorFlow. A2C is a variant of advantage actor-critic introduced by OpenAI in their published baselines. However, those baselines can be difficult to understand and modify, so this A2C is based on their implementation but written in a clearer and simpler way.

What's new compared to the OpenAI baselines?

  1. Support for Tensorboard visualization per running agent in an environment.
  2. Support for different policy networks in an easier way.
  3. Support for environments other than OpenAI gym in an easy way.
  4. Support for video generation of an agent acting in the environment.
  5. Simple code that is easy to modify, so you can begin experimenting right away. All you need to do is plug and play!

Asynchronous vs Synchronous Advantage Actor Critic

Asynchronous advantage actor-critic was introduced in Asynchronous Methods for Deep Reinforcement Learning. The difference between the two methods is that in the asynchronous version, each parallel agent updates the global network on its own. So, at any given time, the weights used by one agent may differ from the weights used by another, meaning each agent plays with a slightly different policy and thereby explores more of the environment. In the synchronous version, the updates from all parallel agents are collected before the global network is updated. To encourage exploration, stochastic noise is added to the probability distribution over actions predicted by each agent.
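The synchronous update described above can be sketched with a minimal NumPy illustration of n-step returns and advantages collected from parallel agents. The function and variable names below are illustrative, not taken from this repository:

```python
import numpy as np

def n_step_advantages(rewards, values, bootstrap_values, dones, gamma=0.99):
    """Compute n-step returns and advantages for a batch of parallel agents.

    rewards, values, dones: arrays of shape (n_steps, n_envs)
    bootstrap_values: value estimates for the state after the last step, shape (n_envs,)
    """
    n_steps, n_envs = rewards.shape
    returns = np.zeros((n_steps, n_envs))
    running = bootstrap_values.copy()
    for t in reversed(range(n_steps)):
        # Cut off the bootstrapped return where an episode ended at step t.
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    # The advantage is the n-step return minus the critic's value estimate.
    advantages = returns - values
    return returns, advantages
```

In synchronous A2C, all parallel environments contribute their rollouts to a single batch like this before one global gradient step, rather than each agent updating the network independently.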



Environments Supported

This implementation allows for using different environments; it is not restricted to OpenAI gym environments. If you want to attach the project to an environment other than those provided by gym, all you have to do is inherit from the base class BaseEnv in envs/base_env.py and implement all its methods in a plug-and-play fashion (see the gym environment example class). You also have to add the name of the new environment class to the env_name_parser() method in A2C.py.

The methods that should be implemented in a new environment class are:

  1. make() for creating the environment and returning a reference to it.
  2. step() for taking a step in the environment and returning a tuple (observation image, reward as a float, done as a boolean, any other info).
  3. reset() for resetting the environment to its initial state.
  4. get_observation_space() for returning an object with a shape attribute (a tuple) representing the shape of the observation space.
  5. get_action_space() for returning an object with an n attribute representing the number of possible actions in the environment.
  6. render() for rendering the environment, if appropriate.
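Under those assumptions, a toy custom environment might look like the sketch below. The real class would inherit from BaseEnv; the ObservationSpace and ActionSpace namedtuples here are hypothetical stand-ins for whatever objects the project expects, so check envs/base_env.py for the exact interface:

```python
from collections import namedtuple

# Hypothetical stand-ins for the space objects described above.
ObservationSpace = namedtuple('ObservationSpace', ['shape'])
ActionSpace = namedtuple('ActionSpace', ['n'])

class MyGridEnv:  # in the real project: class MyGridEnv(BaseEnv)
    """Toy 1-D environment sketching the required interface."""

    def make(self):
        self.position = 0
        return self

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves right; anything else stays put.
        self.position += 1 if action == 1 else 0
        done = self.position >= 3
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}

    def get_observation_space(self):
        return ObservationSpace(shape=(1,))

    def get_action_space(self):
        return ActionSpace(n=2)

    def render(self):
        print('position:', self.position)
```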

Policy Networks Supported

This implementation comes with the basic CNN policy network from the OpenAI baselines. However, it supports using different policy networks. All you have to do is inherit from the base class BasePolicy in models/base_policy.py and implement all its methods, again in a plug-and-play fashion :D (see the CNNPolicy example class). You also have to add the name of the new policy network class to the policy_name_parser() method in models/model.py.
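A new policy might follow a skeleton like the one below. This is a NumPy stand-in for illustration only: the real CNNPolicy builds a TensorFlow graph, and the exact constructor arguments and method names should be taken from models/base_policy.py:

```python
import numpy as np

class RandomPolicy:  # in the real project: class RandomPolicy(BasePolicy)
    """Minimal policy sketch: uniform-random actions, zero value estimates."""

    def __init__(self, num_actions, seed=0):
        self.num_actions = num_actions
        self.rng = np.random.RandomState(seed)

    def step(self, observations):
        # Return (actions, value estimates) for a batch of observations.
        batch_size = len(observations)
        actions = self.rng.randint(self.num_actions, size=batch_size)
        values = np.zeros(batch_size)
        return actions, values
```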

Tensorboard Visualization

This implementation allows for Tensorboard visualization. It displays time plots, per running agent, of the two most important signals in reinforcement learning: episode length and total reward per episode. All you have to do is launch Tensorboard from your experiment directory located in experiments/.

tensorboard --logdir=experiments/my_experiment/summaries


Video Generation

During training, you can generate videos of the trained agent acting (playing) in the environment. This is achieved by changing record_video_every in the configuration file from -1 to the desired number of episodes between generated videos. Videos are saved in your experiment directory.

During testing, videos are generated automatically if the optional monitor method is implemented in the environment. For the included gym environment, it is already implemented.

Usage

Main Dependencies

Python 3.x
tensorflow 1.3.0
numpy 1.13.1
gym 0.9.2
tqdm 4.15.0
bunch 1.0.1
matplotlib 2.0.2
Pillow 4.2.1

Run

python main.py config/test.json

The file test.json is just an example configuration containing all the parameters needed to train on an environment. You can create your own configuration file for training/testing.
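A configuration file might look roughly like the sketch below. The parameter names are illustrative guesses assembled from options mentioned elsewhere in this README and its issues (record_video_every, num_envs, num_stack, unroll_time_steps), so compare against the provided config/test.json before relying on them:

```json
{
  "env_name": "PongNoFrameskip-v4",
  "num_envs": 4,
  "num_stack": 4,
  "unroll_time_steps": 5,
  "record_video_every": -1,
  "experiment_dir": "experiments/my_experiment"
}
```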

In the project, two configuration files are provided as examples for training on Pong and Breakout Atari games.

Results

Model      Game      Average Score   Max Score
CNNPolicy  Pong      17              21
CNNPolicy  Breakout  650             850

Updates

  • Inference and training are working properly.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Reference Repository

OpenAI Baselines


Issues

LSTM policy

Does the current model implementation also include an LSTM policy?

Model doesn't make use of the GPU

I have started training the model on Breakout and it is a little slow. It only uses around 500 MB of GPU memory, and even when increasing the number of environments to 20, GPU usage stays the same. I think this may be why OpenAI coded their model the way they did: theirs uses at least around 7 GB for the ACER model. I need to check for A2C.

About updating.

Thank you for publishing your A2C code.
In the updating block, you are using torch's detach method, and it seems to me to be the same as what my code does with no_grad when calculating the advantage.
But my code doesn't learn at all. Is my idea wrong?
Thanks.

num_env problem

Hello, I have read the code carefully, and I have some doubts about num_env.

1: If this parameter is equal to 4, is it equivalent to training four models? Or is it something like accelerated training?

2: I used the OpenAI baselines and got one summary when using 8 num_envs to train one model, but with your code I get 4 summaries when num_envs is 4. I read OpenAI's logger code and yours, and found that OpenAI adds the info from all envs to one summary, while your code adds each env's info to its own FileWriter summary. Is that right? If I only want one summary, can I simply add all the info together? If not, how can I get only one summary when using multiple envs to train one model?

3: When I test Pong using A2C, it takes about 8k steps to converge, but when I use the OpenAI baselines it only takes about 500 steps to converge, which makes me very confused.

Any suggestions ?

Bests.

Does A2C support experience replay?

I read your code and implemented a version with experience replay.
However, I find that the loss explodes after a few frames (almost 1000): the value loss becomes very large and the action loss becomes very large negatively. Is it a code error, or does A2C not support experience replay in theory?

config parameters

Sir, can you please clarify the use of the unroll_time_steps and num_stack config parameters?

number_of_classes

self.img_height, self.img_width, self.num_classes = observation_space_params

Hello sir,

I was trying to understand your code but got confused: what is num_classes? Is it meant to be the number of channels in an input image (3 for RGB and 1 for gray-scale)? If so, I found it very confusing that you refer to it as num_classes throughout the project.

Help running code

I am not sure what I am doing wrong, but I am in the A2C folder and when I run:

(gym) teves@teves:~/A2C$ python main.py config/breakout.json
usage: main.py [-h] [--version] [--config CONFIG]
main.py: error: unrecognized arguments: config/breakout.json
Add a config file using '--config file_name.json'

or if I run:

(gym) teves@teves:~/A2C$ python main.py --config config/breakout.json
Add a config file using '--config file_name.json'

How shall I run this?

Time to converge

Could you elaborate in the Readme on how much time / how many episodes it takes to converge on the environments?

Update TF to version 2.

Since the TF 2.0 Keras API has been frozen for beta, it's possible to convert the code to TF2 without fear of having to deal with API changes in the future.
