
RLcycle

License: MIT

Code style: black

RLcycle (pronounced "recycle") is a framework for reinforcement learning (RL) agents. RLcycle provides ready-made RL agents, as well as reusable components for easy prototyping.

Currently, RLcycle provides:

  • DQN and its enhancements; distributional variants: C51, Quantile Regression, and Rainbow-DQN.
  • Noisy Networks for parameter space noise.
  • A2C (data parallel) and A3C (gradient parallel).
  • DDPG, both the Lillicrap et al. (2015) and Fujimoto et al. (2018) versions.
  • Soft Actor-Critic with automatic entropy coefficient tuning.
  • Prioritized Experience Replay and n-step updates for all off-policy algorithms (a minimal n-step sketch follows this list).
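As a rough illustration of what an n-step update bootstraps from (a minimal sketch, not RLcycle's actual implementation):

def n_step_return(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted n-step return; stops early if the episode terminated."""
    ret = 0.0
    for i, (reward, done) in enumerate(zip(rewards, dones)):
        ret += (gamma ** i) * reward
        if done:
            return ret  # no bootstrapping past a terminal state
    return ret + (gamma ** len(rewards)) * bootstrap_value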

RLcycle uses:

  • PyTorch for computations and building and optimizing models.
  • Hydra for configuring and building agents.
  • Ray for parallelizing learning.
  • WandB for logging training and testing.

See below for an introduction and guide to using RLcycle, performance benchmarks, and future plans.

Contributing

If you have any questions or suggestions, feel free to raise an issue or reach out at cjy2129 [at] columbia [dot] edu!

Getting Started

To install:

conda create --name myenv python=3.6.9 pip
conda activate myenv
git clone https://github.com/cyoon1729/RLcycle.git
cd RLcycle
pip install -U -r requirements.txt
pip install -e .
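If the install succeeded, the package should import cleanly (a quick sanity check; this assumes the project is importable as rlcycle after pip install -e .):

python -c "import rlcycle"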
0. A quick look at Hydra for managing configurations

Let's first take a look at one of the many useful things Hydra can do:
"""Instantiating classes from yaml files"""

# in ./examples/rectangle.yaml
shape:
    class: examples.shapes.Rectangle
    params:
        height: 5
        width: 4

Initializing shapes.Rectangle with Hydra as specified in the yaml file above:

"""Instantiating classes from yaml files"""

# in ./examples/shapes.py
class Rectangle:
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height
       
    def get_area(self):
        return self.width * self.height

# in ./examples/main.py
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="./examples/rectangle.yaml")
def main(cfg: DictConfig):
    shape = hydra.utils.instantiate(cfg.shape)  # build examples.shapes.Rectangle from the yaml
    print(shape.__class__.__name__)  # 'Rectangle'
    print(shape.get_area()) # 20
    
if __name__ == "__main__":
    main()

If you would like to know more about Hydra, check out their documentation!

1. Running experiments

Run the run_agent.py file and specify the experiment configuration as below:

python run_agent.py configs=atari/rainbow_dqn

Alternatively, you can specify the configuration (yaml) file in metaconfig.yaml.

# in ./metaconfig.yaml
defaults:
 - configs=atari/rainbow_dqn

To modify experiment arguments or hyperparameters, you can add the flags as below:

python run_agent.py configs=atari/rainbow_dqn configs.experiment_info.env.name=AlienNoFrameskip-v4

python run_agent.py configs=atari/rainbow_dqn configs.hyper_params.batch_size=64

python run_agent.py configs=pybullet/sac configs.hyper_params.batch_size=64
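Hydra also supports parameter sweeps via multirun (assuming the installed Hydra version provides the --multirun flag); for example, to run two batch sizes back to back:

python run_agent.py --multirun configs=atari/rainbow_dqn configs.hyper_params.batch_size=32,64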
2. Building configurations for RLcycle

Let's take `atari/rainbow_dqn.yaml` as an example:
experiment_info:
  experiment_name: Rainbow DQN
  agent: rlcycle.dqn_base.agent.DQNBaseAgent
  learner: rlcycle.dqn_base.learner.DQNLearner
  loss: rlcycle.dqn_base.loss.CategoricalLoss
  action_selector: rlcycle.dqn_base.action_selector.CategoricalActionSelector
  device: cuda
  log_wandb: True

  # Environment info
  env:
    name: "PongNoFrameskip-v4"
    is_atari: True
    is_discrete: True
    frame_stack: True

  # Experiment default arguments:
  total_num_episodes: 5000
  test_interval: 100  # Test every 100 episodes
  test_num: 5  # Number of episodes to test during test phase
  render_train: False  # Render all episode steps during training
  render_test: True # Render tests

defaults:
  - hyper_params: rainbow
  - models: duelingC51

Under experiment_info, we have the fundamental arguments for running RL experiments: which classes (agent, learner, loss) we want to use, along with the gym environment and experiment configurations.
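As a quick illustration, these fields can be read with OmegaConf once the yaml is loaded (the config path below is an assumption about the repository layout):

from omegaconf import OmegaConf

# assumed path; adjust to wherever the config lives in your checkout
cfg = OmegaConf.load("rlcycle/configs/atari/rainbow_dqn.yaml")
print(cfg.experiment_info.env.name)  # PongNoFrameskip-v4
print(cfg.experiment_info.agent)     # rlcycle.dqn_base.agent.DQNBaseAgent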

The defaults block points to rlcycle/configs/atari/hyper_params/rainbow.yaml for hyperparameters and rlcycle/configs/atari/models/duelingC51.yaml for model configurations. Taking a closer look at these files, we have:

hyper_params:
  batch_size: 64
  replay_buffer_size: 100000 
  use_per: False
  per_alpha: 0.5  # PER alpha value
  per_beta: 0.4  # PER beta value
  per_beta_max: 1.0
  per_beta_total_steps: 300000

  # Exploration configs
  eps: 1.0  # epsilon-greedy exploration
  eps_final: 0.0  # minimum epsilon value for exploration
  max_exploration_frame: 100000  # eps is annealed to eps_final by this frame at the latest

  # Others
  update_starting_point: 40000  # start updates once the replay buffer holds this many experiences
  gamma: 0.99
  tau: 0.005
  q_reg_coeff: 0.0
  gradient_clip: 10.0
  n_step: 3
  train_freq: 4

  # Optimizer
  learning_rate: 0.0000625 # 0.0003
  weight_decay: 0.0
  adam_eps: 0.00015

This is a fairly standard organization for RL experiment hyperparameters. The model configuration, duelingC51.yaml, looks like this:

model:
  class: rlcycle.common.models.value.DuelingCategoricalDQN
  params:
    model_cfg:
      state_dim: undefined
      action_dim: undefined
      num_atoms: 51
      v_min: -10
      v_max: 10

      use_conv: True
      use_noisy: True
      conv_features:
        feature1:
          class: rlcycle.common.models.layers.Conv2DLayer
          params:
             input_size: 4
             output_size: 32
             kernel_size: 8
             stride: 4
             activation_fn: relu
        feature2:
          class: rlcycle.common.models.layers.Conv2DLayer
          params:
             input_size: 32
             output_size: 64
             kernel_size: 4
             stride: 2
             activation_fn: relu
        feature3:
          class: rlcycle.common.models.layers.Conv2DLayer
          params:
             input_size: 64
             output_size: 64
             kernel_size: 3
             stride: 1
             activation_fn: relu   
             
      advantage:
        fc1:
          class: rlcycle.common.models.layers.FactorizedNoisyLinearLayer
          params: 
            input_size: undefined
            output_size: 512
            post_activation_fn: relu
        fc2:
          class: rlcycle.common.models.layers.FactorizedNoisyLinearLayer
          params: 
            input_size: 512
            output_size: undefined
            post_activation_fn: identity

      value:
        fc1:
          class: rlcycle.common.models.layers.FactorizedNoisyLinearLayer
          params: 
            input_size: undefined
            output_size: 512
            post_activation_fn: identity
        fc2:
          class: rlcycle.common.models.layers.FactorizedNoisyLinearLayer
          params: 
            input_size: 512
            output_size: 1
            post_activation_fn: identity

Here we define the parameters for the model and each of its layers. Note that fields with the value "undefined" are filled in inside the respective Python object. For a simpler model yaml configuration file, take a look at rlcycle/configs/atari/models/dqn.yaml.
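For intuition, the "undefined" fields are typically filled in from the environment before the model is built; a minimal sketch under that assumption (illustrative only, not the repository's exact code):

import gym
from omegaconf import OmegaConf

# assumed config path; the fill-in logic is illustrative only
model_cfg = OmegaConf.load("rlcycle/configs/atari/models/duelingC51.yaml")
env = gym.make("PongNoFrameskip-v4")
params = model_cfg.model.params.model_cfg
params.state_dim = list(env.observation_space.shape)  # raw Atari frames are (210, 160, 3) before wrappers
params.action_dim = int(env.action_space.n)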

Extra (Good to know): How RLcycle instantiates components (models, learners, agents, etc.)

Most of the components in RLcycle are instantiated via hydra.utils.instantiate, as illustrated in the section above. Examples:

  1. models:
"""in ./configs/lunarlander/models/dqn.yaml"""
model:
  class: rlcycle.common.models.value.DQNModel
  params:
    model_cfg:
      state_dim: undefined
      action_dim: undefined
      fc:
        input:
          class: rlcycle.common.models.layers.LinearLayer
          params: 
            input_size: undefined
            output_size: 128
            post_activation_fn: relu           
        hidden:
          hidden1:
            class: rlcycle.common.models.layers.LinearLayer
            params: 
              input_size: 128
              output_size: 128
              post_activation_fn: relu
          hidden2:
            class: rlcycle.common.models.layers.LinearLayer
            params: 
              input_size: 128
              output_size: 128
              post_activation_fn: relu
        output:
          class: rlcycle.common.models.layers.LinearLayer
          params:
            input_size: 128
            output_size: undefined
            post_activation_fn: identity

Building the above model:

"""in ./rlcycle/build.py"""
def build_model(model_cfg: DictConfig, device: torch.device):
    """Build model from DictConfigs via hydra.utils.instantiate()"""
    model = hydra.utils.instantiate(model_cfg)
    return model.to(device)
  2. learners:
# in ./rlcycle/dqn_base/agent.py
#....
self.learner = build_learner(
    self.experiment_info, self.hyper_params, self.model_cfg
)
# ...

# in ./rlcycle/build.py
def build_learner(
    experiment_info: DictConfig, hyper_params: DictConfig, model: DictConfig
):
    """Build learner from DictConfigs via hydra.utils.instantiate()"""
    learner_cfg = DictConfig(dict())
    learner_cfg["class"] = experiment_info.learner
    learner_cfg["params"] = dict(
        experiment_info=experiment_info,
        hyper_params=hyper_params,
        model_cfg=model
    )
    learner = hydra.utils.instantiate(learner_cfg)
    return learner
  3. agents:
# in ./tests/test_dqn.py
@hydra.main(config_path="../configs/lunarlander/dqn.yaml", strict=False)
def main(cfg: DictConfig):
    agent = build_agent(**cfg)
    agent.train()

# in ./rlcycle/build.py
def build_agent(
    experiment_info: DictConfig, hyper_params: DictConfig, model: DictConfig
):
    """Build agent from DictConfigs via hydra.utils.instantiate()"""
    agent_cfg = DictConfig(dict())
    agent_cfg["class"] = experiment_info.agent
    agent_cfg["params"] = dict(
        experiment_info=experiment_info,
        hyper_params=hyper_params,
        model_cfg=model
    )
    agent = hydra.utils.instantiate(agent_cfg)
    return agent
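
Under the hood, for the "class"/"params" layout used above, hydra.utils.instantiate behaves roughly like the following sketch (illustrative only, not Hydra's actual source):

import importlib

def instantiate_from_cfg(cfg):
    """Import the class named in cfg["class"] and call it with cfg["params"]."""
    module_path, class_name = cfg["class"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**cfg.get("params", {}))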

This should help you get started with building & running agents in RLcycle!

Benchmarks

Hyperparameters were not rigorously tuned; most follow those presented in the original papers, with some modifications to work around memory usage.

Click the dropdown below!

Atari PongNoFrameskip-v4

For more information, visit the WandB log.

[figure: atari-pong]

Atari BreakoutNoFrameskip-v4 (Will be ready soon)
PyBullet Reacher-v2

For more information, visit the WandB log.

[figure: reacher]

PyBullet HalfCheetah-v2 (Will be ready soon)

Future Plans

Below are some things I hope to incorporate into RLcycle:

  • TRPO and PPO (medium priority)
  • IQN (low priority)
  • Compatibility with my distributed RL framework distributedRL (i.e. Ape-X for all off-policy algorithms). (high priority)

References

Repositories

Papers

  1. "Human-level control through deep reinforcement learning." Mnih et al., 2015.
  2. "Dueling Network Architectures for Deep Reinforcement Learning." Wang et al., 2015.
  3. "Prioritized Experience Replay." Schaul et al., 2015.
  4. "Noisy Networks for Exploration." Fortunato et al., 2017.
  5. "A Distributional Perspective on Reinforcement Learning." Bellemare et al., 2017.
  6. "Rainbow: Combining Improvements in Deep Reinforcement Learning." Hessel et al., 2017.
  7. "Distributional Reinforcement Learning with Quantile Regression." Dabney et al., 2017.
  8. "Asynchronous Methods for Deep Reinforcement Learning." Mnih et al., 2016.
  9. "Continuous control with deep reinforcement learning." Lillicrap et al., 2015.
  10. "Addressing function approximation error in actor-critic methods." Fujimoto et al., 2018.
  11. "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor." Haarnoja et al., 2018.
  12. "Soft Actor-Critic Algorithms and Applications." Haarnoja et al., 2019.


rlcycle's Issues

Weird memory leak behavior

DQN-based algorithms show a memory leak while training on Atari. The leak doesn't always happen (it is inconsistent), or it appears suddenly in the middle of training.

Why are there two q_net in the sac2018.py file?

This is great work :)

I am confused about the sac2018.py implementation: why are there two q_net instances in sac2018.py? (code link)
I did not see this described in the original paper, Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.

In addition, I want to verify one thing. If I understand correctly, sac2018.py is the implementation of Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, and sac2019.py is the implementation of Soft Actor-Critic Algorithms and Applications. Looking forward to your reply.

Documentation

Hi,
Could you add some documentation or a how-to for your code? I was running your code, but I don't see where the learned model gets saved; I would like to test it.

Does the agent initialize only the critic synchronously?

The critic target parameters are copied:

    for target_param, param in zip(self.critic_target.parameters(), self.critic.parameters()):
        target_param.data.copy_(param.data)

Does the actor target need to be synced as well?
