
openrl-lab / openrl

602 stars · 7 watchers · 59 forks · 8.19 MB

Unified Reinforcement Learning Framework

Home Page: https://openrl-docs.readthedocs.io

License: Apache License 2.0

Python 99.74% Makefile 0.10% Shell 0.12% Dockerfile 0.04%
distributed-reinforcement-learning embodied google-research-football gym gymnasium multi-agent-reinforcement-learning pytorch reinforcement-learning robotics self-playing

openrl's People

Contributors

childtang, huangshiyu13, ifqrrrr, kingjuno, strivebfq, wentsechen, yiwenai, zrz-sh


openrl's Issues

optimize agent save

🐛 Bug

agent.save() is not well implemented: the saved file for the NLP task is too large.
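
One common cause of oversized checkpoints (an assumption about this report, not confirmed) is serializing the entire agent, optimizer state included, rather than just the policy weights. A minimal sketch of a slimmer save, with a hypothetical agent.net attribute:

import torch

def save_policy_only(agent, path: str) -> None:
    # Save only the network weights; `agent.net` is an illustrative
    # attribute name, not necessarily OpenRL's actual structure.
    torch.save(agent.net.state_dict(), path)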

To Reproduce

from openrl import ...

Relevant log output / Error message

No response

System Info

v0.0.7

Checklist

  • I have checked that there are no similar issues in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I have provided version numbers, operating system and environment, where applicable

[Bug]: Running the example code raises KeyError: 'reward'

🐛 Bug

Running the example code raises an error; it seems to happen when the environment is created.

To Reproduce

train_ppo.py

from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent

env = make("CartPole-v1", env_num=9) # 创建环境,并设置环境并行数为9
net = Net(env) # 创建神经网络
agent = Agent(net) # 初始化训练器
agent.train(total_time_steps=20000) # 开始训练,并设置环境运行总步数为20000

创建用于测试的环境,并设置环境并行数为9,设置渲染模式为group_human

env = make("CartPole-v1", env_num=9, render_mode="group_human")
agent.set_env(env) # 训练好的智能体设置需要交互的环境
obs, info = env.reset() # 环境进行初始化,得到初始的观测值和环境信息
while True:
action, _ = agent.act(obs) # 智能体根据环境观测输入预测下一个动作
# 环境根据动作执行一步,得到下一个观测值、奖励、是否结束、环境信息
obs, r, done, info = env.step(action)
if any(done): break
env.close() # 关闭测试环境

Relevant log output / Error message

File "/Users/env/venv/lib/python3.8/site-packages/openrl/envs/vec_env/wrappers/reward_wrapper.py", line 46, in step
    rewards, new_infos = self.reward_class.step_reward(data=extra_data)
  File "/Users/env/venv/lib/python3.8/site-packages/openrl/rewards/base_reward.py", line 15, in step_reward
    rewards = data["reward"].copy()
KeyError: 'reward'

System Info

MacOS 12.1 (21C52)
MacBook Pro (13-inch, 2020, Four Thunderbolt 3 ports)
2.3 GHz quad-core Intel Core i7
16 GB 3733 MHz LPDDR4X

Checklist

  • I have checked that there are no similar issues in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I have provided version numbers, operating system and environment, where applicable

[Feature Request] support arena set seed

🚀 Feature

Support setting a random seed for the arena, so that parallel and sequential modes produce the same results.
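
A minimal sketch of deterministic per-environment seeding, assuming gymnasium-style reset(seed=...): each worker derives its seed from a base seed and its rank, so parallel and sequential runs consume the same seed sequence.

def seed_envs(envs, base_seed: int) -> None:
    # Derive a distinct but reproducible seed per environment.
    for rank, env in enumerate(envs):
        env.reset(seed=base_seed + rank)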

Motivation

No response

Additional context

No response

Checklist

[Feature Request] Saving the best model, saving checkpoints, and custom stop conditions

🚀 Feature

1. Saving the best model during training;
2. Saving and loading checkpoints;
3. Custom stop conditions, such as the number of episodes or the mean reward.

Motivation

1. The final model may not be the best one;
2. Checkpoint support makes hyperparameter tuning easier;
3. ppo_agent currently seems to support only total time steps as the stop condition.

Additional context

1. stable_baselines3.common.eval_callback
2. stable_baselines3.common.checkpoint_callback
3. rllib
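
A rough sketch of a best-model hook in the spirit of stable_baselines3's EvalCallback; the on_evaluation entry point is hypothetical, not an existing OpenRL API:

class BestModelCallback:
    def __init__(self, save_path: str):
        self.best_mean_reward = float("-inf")
        self.save_path = save_path

    def on_evaluation(self, agent, mean_reward: float) -> None:
        # Save whenever evaluation improves on the best result so far.
        if mean_reward > self.best_mean_reward:
            self.best_mean_reward = mean_reward
            agent.save(self.save_path)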

Checklist

deepspeed support

🚀 Feature

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Link: https://github.com/microsoft/DeepSpeed
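
For reference, the usual integration point is deepspeed.initialize, which wraps an existing PyTorch model; the config path below is a placeholder:

import deepspeed

def wrap_with_deepspeed(model):
    # Returns an engine that handles distributed training details
    # (ZeRO sharding, mixed precision, etc.).
    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config="ds_config.json",  # placeholder path to a DeepSpeed config
    )
    return model_engine, optimizer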

Motivation

No response

Pitch

No response

Alternatives

No response

Additional context

No response

Checklist

add a2c algorithm

🚀 Feature

Add the A2C algorithm to OpenRL.
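
For reference, the core A2C objective in its standard formulation (not OpenRL code):

import torch

def a2c_loss(log_probs, values, returns, entropy,
             vf_coef: float = 0.5, ent_coef: float = 0.01):
    # Policy gradient weighted by the advantage, plus a value-regression
    # term, minus an entropy bonus that encourages exploration.
    advantages = returns - values.detach()
    policy_loss = -(log_probs * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    return policy_loss + vf_coef * value_loss - ent_coef * entropy.mean()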

Motivation

No response

Pitch

No response

Alternatives

No response

Additional context

No response

Checklist

Roadmap for OpenRL

This issue keeps track of OpenRL's updates across the next few versions:

v1.0.0

  • support multi-machine/node training

v0.2.1 (in progress)

v0.2.0

Released: v0.1.9, v0.1.8, v0.1.7, v0.1.6, v0.1.5, v0.1.4, v0.1.3, v0.1.2, v0.1.1, v0.1.0, v0.0.15, v0.0.14, v0.0.13, v0.0.12, v0.0.11, v0.0.10, v0.0.9

v0.0.8

  • improve code testing coverage
  • #40
  • #41
  • fix minor bugs

v0.0.7

v0.0.6

  • import Hugging Face models/datasets
  • dictionary observation support
  • multi-agent training support
  • dialog training support
  • fix existing bugs
  • add more examples
  • optimize Contributing docs
  • fix existing errors in openrl-docs

[Bug]: incorrect key “reward” of data in base_reward.py:L15

🐛 Bug

It should be "rewards" instead of "reward":

def step_reward(
    self, data: Dict[str, Any]
) -> Union[np.ndarray, List[Dict[str, Any]]]:
    print(data.keys())
    rewards = data["reward"].copy()
    infos = []

The printed keys are:

dict_keys(['values', 'action_log_probs', 'step', 'buffer', 'actions', 'obs', 'rewards', 'dones', 'infos'])
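
Given the printed keys, the one-line fix is to read the plural key:

rewards = data["rewards"].copy()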

To Reproduce

from openrl import ...

Relevant log output / Error message

Traceback (most recent call last):
  File "/data/workspace/openrl/examples/cartpole/train_ppo.py", line 39, in <module>
    agent = train()
  File "/data/workspace/openrl/examples/cartpole/train_ppo.py", line 17, in train
    agent.train(total_time_steps=20000)
  File "/data/workspace/openrl/openrl/runners/common/ppo_agent.py", line 134, in train
    driver.run()
  File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 227, in run
    self._inner_loop()
  File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 112, in _inner_loop
    rollout_infos = self.actor_rollout()
  File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 189, in actor_rollout
    obs, rewards, dones, infos = self.envs.step(actions, extra_data)
  File "/data/workspace/openrl/openrl/envs/vec_env/wrappers/vec_monitor_wrapper.py", line 37, in step
    returns = self.env.step(action, extra_data)
  File "/data/workspace/openrl/openrl/envs/vec_env/wrappers/reward_wrapper.py", line 46, in step
    rewards, new_infos = self.reward_class.step_reward(data=extra_data)
  File "/data/workspace/openrl/openrl/rewards/base_reward.py", line 16, in step_reward
    rewards = data["reward"].copy()
KeyError: 'reward'

System Info

No response

Checklist

  • I have checked that there are no similar issues in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I have provided version numbers, operating system and environment, where applicable

support for pettingzoo

🚀 Feature

PettingZoo is a library of multi-agent reinforcement learning environments.

Link: https://github.com/Farama-Foundation/PettingZoo
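
For context, PettingZoo's agent-environment-cycle (AEC) interface, per its documentation, looks like this:

from pettingzoo.classic import tictactoe_v3

env = tictactoe_v3.env()
env.reset(seed=42)
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    # Finished agents must step with a None action.
    action = None if termination or truncation else env.action_space(agent).sample()
    env.step(action)
env.close()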

Motivation

No response

Pitch

No response

Alternatives

No response

Additional context

No response

Checklist

[Feature Request] Add cpu number check in make

🚀 Feature

  • Add a CPU count check to the make function.
  • If the user requests more parallel environments than available CPUs in asynchronous mode, raise an error (see the sketch below).
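
A minimal sketch of the proposed check, assuming a make(env_id, env_num, asynchronous) signature as described in this issue:

import multiprocessing

def check_env_num(env_num: int, asynchronous: bool) -> None:
    cpu_count = multiprocessing.cpu_count()
    if asynchronous and env_num > cpu_count:
        raise ValueError(
            f"env_num={env_num} exceeds the {cpu_count} available CPUs "
            "in asynchronous mode."
        )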

Motivation

No response

Additional context

No response

Checklist

Add DDPG-Beta

🚀 Feature

Add the DDPG algorithm.
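
For reference, the soft target-network update at the heart of DDPG, in its standard formulation (not OpenRL code):

def soft_update(target, source, tau: float = 0.005) -> None:
    # Polyak averaging: target <- (1 - tau) * target + tau * source.
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * sp.data)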

Motivation

No response

Additional context

No response

Checklist

[Question] `.gitignore` seems not complete

❓ Question

When I use VS Code to view the source code, I find that .vscode/ is not ignored by .gitignore. It may cause inconvenience if anyone wants to submit a PR.

[Bug]: The variable 'step_rew_funcs' is not defined in BaseReward class.

🐛 Bug

The variable step_rew_funcs is not defined in the BaseReward class. In addition, the step function of the RewardWrapper class should also be modified accordingly:

if extra_data:
    extra_data.update({"actions": action})
    extra_data.update({"obs": obs})
    extra_data.update({"rewards": rewards})
    extra_data.update({"dones": dones})
    extra_data.update({"infos": infos})
    rewards, new_infos = self.reward_class.step_reward(data=extra_data)

    num_envs = len(infos)
    for i in range(num_envs):
        infos[i].update(new_infos[i])
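
A guess at the missing initialization (not verified against the OpenRL source); BaseReward would need to define the attribute before step_reward can use it:

class BaseReward:
    def __init__(self):
        # Referenced elsewhere in the class but never defined, per this report.
        self.step_rew_funcs = []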

To Reproduce

from openrl import ...

Relevant log output / Error message

Traceback (most recent call last):
  File "/data/workspace/openrl/examples/cartpole/train_ppo.py", line 39, in <module>
    agent = train()
  File "/data/workspace/openrl/examples/cartpole/train_ppo.py", line 17, in train
    agent.train(total_time_steps=20000)
  File "/data/workspace/openrl/openrl/runners/common/ppo_agent.py", line 134, in train
    driver.run()
  File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 227, in run
    self._inner_loop()
  File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 112, in _inner_loop
    rollout_infos = self.actor_rollout()
  File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 189, in actor_rollout
    obs, rewards, dones, infos = self.envs.step(actions, extra_data)
  File "/data/workspace/openrl/openrl/envs/vec_env/wrappers/vec_monitor_wrapper.py", line 37, in step
    returns = self.env.step(action, extra_data)
  File "/data/workspace/openrl/openrl/envs/vec_env/wrappers/reward_wrapper.py", line 50, in step
    infos[i].update(new_infos[i])
IndexError: list index out of range

System Info

No response

Checklist

  • I have checked that there are no similar issues in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I have provided version numbers, operating system and environment, where applicable

support for self-play training

🚀 Feature

Train agents via self-play.
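
A generic self-play loop sketch; the play_match and learn hooks are hypothetical, not OpenRL's API:

import copy
import random

def self_play_train(agent, num_iterations: int, snapshot_every: int = 10):
    opponent_pool = []
    for it in range(num_iterations):
        # Sample a past snapshot as the opponent (mirror-match until one exists).
        opponent = random.choice(opponent_pool) if opponent_pool else agent
        rollouts = agent.play_match(opponent)  # hypothetical rollout hook
        agent.learn(rollouts)                  # hypothetical update hook
        if it % snapshot_every == 0:
            opponent_pool.append(copy.deepcopy(agent))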

Motivation

No response

Pitch

No response

Alternatives

No response

Additional context

No response

Checklist

[Bug]: SyntaxError: invalid syntax

🐛 Bug

"E:\Users\xing\Anaconda3\lib\site-packages\openrl-0.0.7-py3.7.egg\openrl\envs\vec_env\wrappers\base_wrapper.py", line 263
return results[0], reward, *results[2:]
SyntaxError: invalid syntax

To Reproduce

from openrl.envs.common import make

Relevant log output / Error message

from openrl.envs.common import make
return results[0], reward, *results[2:]
                               ^
SyntaxError: invalid syntax
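
A likely cause, inferred from the egg path (which shows Python 3.7): unparenthesized starred expressions in return statements are only valid from Python 3.8 onward, so this line is a SyntaxError on 3.7. A parenthesized tuple is accepted from Python 3.5:

def step_results(results, reward):
    # `return results[0], reward, *results[2:]` fails to parse on 3.7;
    # the parenthesized form below works on older interpreters.
    return (results[0], reward, *results[2:])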

System Info

No response

Checklist

  • I have checked that there are no similar issues in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I have provided version numbers, operating system and environment, where applicable

[Bug]: KeyError: 'reward'

🐛 Bug

In base_reward.py, rewards = data["reward"].copy() fails because there is no "reward" key; reward_wrapper.py passes the data under "rewards". After changing the key, other problems remain...

To Reproduce

# train_ppo.py
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent

env = make("CartPole-v1", env_num=9)  # create the environment, with 9 parallel environments
net = Net(env)  # create the neural network
agent = Agent(net)  # initialize the trainer
agent.train(total_time_steps=20000)  # start training, for a total of 20000 environment steps

Relevant log output / Error message

Traceback (most recent call last):
  File "/home/user/code/python/train_ppo.py", line 9, in <module>
    agent.train(total_time_steps=20000)  # start training, for a total of 20000 environment steps
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/runners/common/ppo_agent.py", line 134, in train
    driver.run()
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/drivers/onpolicy_driver.py", line 227, in run
    self._inner_loop()
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/drivers/onpolicy_driver.py", line 112, in _inner_loop
    rollout_infos = self.actor_rollout()
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/drivers/onpolicy_driver.py", line 189, in actor_rollout
    obs, rewards, dones, infos = self.envs.step(actions, extra_data)
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/envs/vec_env/wrappers/vec_monitor_wrapper.py", line 37, in step
    returns = self.env.step(action, extra_data)
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/envs/vec_env/wrappers/reward_wrapper.py", line 46, in step
    rewards, new_infos = self.reward_class.step_reward(data=extra_data)
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/rewards/base_reward.py", line 18, in step_reward
    rewards = data["reward"].copy()
KeyError: 'reward'

System Info

No response

Checklist

  • I have checked that there are no similar issues in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I have provided version numbers, operating system and environment, where applicable
