Reinforcement Learning Exploration Baselines (RLeXplore)

RLeXplore is a set of PyTorch implementations of intrinsic-reward-driven exploration approaches in reinforcement learning, which can be plugged into arbitrary algorithms. In particular, RLeXplore is designed to be compatible with Stable-Baselines3, providing more stable exploration benchmarks.

Notice

This repo has been merged into a new project, Hsuanwu (https://github.com/RLE-Foundation/Hsuanwu), which provides more reasonable implementations.

Invoke the intrinsic reward module by:

from hsuanwu.xplore.reward import ICM, RIDE, ...

Module List

Module	Remark	Repr.	Visual	Reference
PseudoCounts	Count-based exploration	✔️	✔️	Never Give Up: Learning Directed Exploration Strategies
ICM	Curiosity-driven exploration	✔️	✔️	Curiosity-Driven Exploration by Self-Supervised Prediction
RND	Count-based exploration	❌	✔️	Exploration by Random Network Distillation
GIRM	Curiosity-driven exploration	✔️	✔️	Intrinsic Reward Driven Imitation Learning via Generative Model
NGU	Memory-based exploration	✔️	✔️	Never Give Up: Learning Directed Exploration Strategies
RIDE	Procedurally-generated environment	✔️	✔️	RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments
RE3	Entropy Maximization	❌	✔️	State Entropy Maximization with Random Encoders for Efficient Exploration
RISE	Entropy Maximization	❌	✔️	Rényi State Entropy Maximization for Exploration Acceleration in Reinforcement Learning
REVD	Divergence Maximization	❌	✔️	Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning

🐌: Developing.

Repr.: The method involves representation learning.

Visual: The method works well in visual RL.

Example

Because intrinsic reward methods differ widely in how they compute rewards, Hsuanwu adopts the following rules:

The environments are assumed to be vectorized;
The compute_irs function of each intrinsic reward module has a mandatory argument samples, which is a dict like:
- obs (n_steps, n_envs, *obs_shape) <class 'torch.Tensor'>
- actions (n_steps, n_envs, action_shape) <class 'torch.Tensor'>
- rewards (n_steps, n_envs) <class 'torch.Tensor'>
- next_obs (n_steps, n_envs, *obs_shape) <class 'torch.Tensor'>
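The layout of samples listed above can be illustrated with a minimal sketch. The helper name make_dummy_samples, the small sizes, and the random data are assumptions made for illustration only; they are not part of Hsuanwu.

```python
import torch as th


def make_dummy_samples(n_steps, n_envs, obs_shape, action_dim):
    """Hypothetical helper: build a dict with the documented keys and shapes."""
    return {
        # observations seen at each step in each parallel environment
        "obs": th.rand(n_steps, n_envs, *obs_shape),
        # actions taken; action_dim stands in for action_shape
        "actions": th.rand(n_steps, n_envs, action_dim),
        # one extrinsic reward per step per environment
        "rewards": th.rand(n_steps, n_envs),
        # observations that followed each action
        "next_obs": th.rand(n_steps, n_envs, *obs_shape),
    }


# Small sizes keep the example cheap to run.
samples = make_dummy_samples(n_steps=8, n_envs=2, obs_shape=(3, 8, 8), action_dim=1)
for key, value in samples.items():
    print(key, tuple(value.shape))
```

A module that needs only part of this data, such as RE3 below, can be given a dict containing just the keys it uses.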

Take RE3 for instance: it computes the intrinsic reward for each state based on the Euclidean distance between the state and its $k$-nearest neighbor within a mini-batch. Thus it suffices to provide obs data to compute the reward. The following code provides a usage example of RE3:

from hsuanwu.xplore.reward import RE3
from hsuanwu.env import make_dmc_env
import torch as th

if __name__ == '__main__':
    num_envs = 7
    num_steps = 128
    # create env
    env = make_dmc_env(env_id="cartpole_balance", num_envs=num_envs)
    print(env.observation_space, env.action_space)
    # create RE3 instance
    re3 = RE3(
        observation_space=env.observation_space,
        action_space=env.action_space
    )
    # compute intrinsic rewards
    obs = th.rand(size=(num_steps, num_envs, *env.observation_space.shape))
    intrinsic_rewards = re3.compute_irs(samples={'obs': obs})
    
    print(intrinsic_rewards.shape, type(intrinsic_rewards))
    print(intrinsic_rewards)

# Output:
# {'shape': [9, 84, 84]} {'shape': [1], 'type': 'Box', 'range': [-1.0, 1.0]}
# torch.Size([128, 7]) <class 'torch.Tensor'>
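To make the $k$-nearest-neighbor idea behind RE3 more concrete, here is a simplified sketch in plain PyTorch. It is not Hsuanwu's implementation: the function name knn_intrinsic_rewards, the fixed random linear projection used in place of a random encoder network, the log(1 + distance) form of the reward, and the default values of k and latent_dim are all assumptions chosen for illustration.

```python
import torch as th


def knn_intrinsic_rewards(obs, k=3, latent_dim=16, seed=0):
    """Sketch of a k-NN distance reward over a batch shaped (n_steps, n_envs, *obs_shape)."""
    n_steps, n_envs = obs.shape[:2]
    # Treat every (step, env) pair as one point in the mini-batch.
    flat = obs.reshape(n_steps * n_envs, -1)
    # Fixed random projection: an assumed stand-in for a random encoder.
    gen = th.Generator().manual_seed(seed)
    proj = th.randn(flat.shape[1], latent_dim, generator=gen)
    emb = flat @ proj
    # Pairwise Euclidean distances between all embedded points.
    dists = th.cdist(emb, emb)
    # Take the (k + 1)-th smallest distance, since the smallest is the
    # distance from each point to itself.
    knn_dist = th.kthvalue(dists, k + 1, dim=1).values
    # Larger distance to the k-th neighbor means a larger reward.
    return th.log(knn_dist + 1.0).reshape(n_steps, n_envs)


obs = th.rand(8, 2, 3, 8, 8)
rewards = knn_intrinsic_rewards(obs)
print(rewards.shape)
```

The returned tensor has one reward per step per environment, matching the (n_steps, n_envs) shape printed by the usage example above. States that lie far from their neighbors in the embedded space receive larger rewards, which is what pushes the agent toward less-visited regions.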
