
openrl-lab / openrl

602 stars · 7 watchers · 59 forks · 8.19 MB

Unified Reinforcement Learning Framework

Home Page: https://openrl-docs.readthedocs.io

License: Apache License 2.0

Python 99.74% Makefile 0.10% Shell 0.12% Dockerfile 0.04%
distributed-reinforcement-learning embodied google-research-football gym gymnasium multi-agent-reinforcement-learning pytorch reinforcement-learning robotics self-playing

openrl's People

Contributors

childtang, huangshiyu13, ifqrrrr, kingjuno, strivebfq, wentsechen, yiwenai, zrz-sh


openrl's Issues

optimize agent save

🐛 Bug

agent.save() is not well implemented: the saved file for the NLP task is too large.
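
One common cause of oversized checkpoints (an assumption about this report, not confirmed) is serializing the entire agent, optimizer state included, rather than just the policy weights. A minimal sketch of a slimmer save, with a hypothetical agent.net attribute:

import torch

def save_policy_only(agent, path: str) -> None:
    # Save only the network weights; `agent.net` is an illustrative
    # attribute name, not necessarily OpenRL's actual structure.
    torch.save(agent.net.state_dict(), path)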

To Reproduce

from openrl import ...

Relevant log output / Error message

No response

System Info

v0.0.7

Checklist

  • I have checked that there are no similar issues in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I have provided version numbers, operating system and environment, where applicable

[Bug]: Running the example code raises KeyError: 'reward'

🐛 Bug

Running the example code raises an error; it seems to happen when the environment is created.

To Reproduce

train_ppo.py

from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent

env = make("CartPole-v1", env_num=9) # 创建环境,并设置环境并行数为9
net = Net(env) # 创建神经网络
agent = Agent(net) # 初始化训练器
agent.train(total_time_steps=20000) # 开始训练,并设置环境运行总步数为20000

创建用于测试的环境,并设置环境并行数为9,设置渲染模式为group_human

env = make("CartPole-v1", env_num=9, render_mode="group_human")
agent.set_env(env) # 训练好的智能体设置需要交互的环境
obs, info = env.reset() # 环境进行初始化,得到初始的观测值和环境信息
while True:
action, _ = agent.act(obs) # 智能体根据环境观测输入预测下一个动作
# 环境根据动作执行一步,得到下一个观测值、奖励、是否结束、环境信息
obs, r, done, info = env.step(action)
if any(done): break
env.close() # 关闭测试环境

Relevant log output / Error message

File "/Users/env/venv/lib/python3.8/site-packages/openrl/envs/vec_env/wrappers/reward_wrapper.py", line 46, in step
    rewards, new_infos = self.reward_class.step_reward(data=extra_data)
  File "/Users/env/venv/lib/python3.8/site-packages/openrl/rewards/base_reward.py", line 15, in step_reward
    rewards = data["reward"].copy()
KeyError: 'reward'

System Info

MacOS 12.1 (21C52)
MacBook Pro (13-inch, 2020, Four Thunderbolt 3 ports)
2.3 GHz quad-core Intel Core i7
16 GB 3733 MHz LPDDR4X

Checklist

  • I have checked that there are no similar issues in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I have provided version numbers, operating system and environment, where applicable

[Feature Request] support arena set seed

🚀 Feature

Support setting a random seed for the arena, so that parallel and sequential modes produce the same results.
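
A minimal sketch of deterministic per-environment seeding, assuming gymnasium-style reset(seed=...): each worker derives its seed from a base seed and its rank, so parallel and sequential runs consume the same seed sequence.

def seed_envs(envs, base_seed: int) -> None:
    # Derive a distinct but reproducible seed per environment.
    for rank, env in enumerate(envs):
        env.reset(seed=base_seed + rank)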

Motivation

No response

Additional context

No response

Checklist

[Feature Request] Saving the best model, saving checkpoints, and custom stop conditions

🚀 Feature

1. Saving the best model during training;
2. Saving and loading checkpoints;
3. Custom stop conditions, such as the number of episodes or the mean reward.

Motivation

1. The final model may not be the best one;
2. Checkpoint support makes hyperparameter tuning easier;
3. ppo_agent currently seems to support only total time steps as the stop condition.

Additional context

1. stable_baselines3.common.eval_callback
2. stable_baselines3.common.checkpoint_callback
3. rllib
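
A rough sketch of a best-model hook in the spirit of stable_baselines3's EvalCallback; the on_evaluation entry point is hypothetical, not an existing OpenRL API:

class BestModelCallback:
    def __init__(self, save_path: str):
        self.best_mean_reward = float("-inf")
        self.save_path = save_path

    def on_evaluation(self, agent, mean_reward: float) -> None:
        # Save whenever evaluation improves on the best result so far.
        if mean_reward > self.best_mean_reward:
            self.best_mean_reward = mean_reward
            agent.save(self.save_path)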

Checklist

deepspeed support

🚀 Feature

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Link: https://github.com/microsoft/DeepSpeed
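
For reference, the usual integration point is deepspeed.initialize, which wraps an existing PyTorch model; the config path below is a placeholder:

import deepspeed

def wrap_with_deepspeed(model):
    # Returns an engine that handles distributed training details
    # (ZeRO sharding, mixed precision, etc.).
    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config="ds_config.json",  # placeholder path to a DeepSpeed config
    )
    return model_engine, optimizer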

Motivation

No response

Pitch

No response

Alternatives

No response

Additional context

No response

Checklist

add a2c algorithm

🚀 Feature

Add the A2C algorithm to OpenRL.
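
For reference, the core A2C objective in its standard formulation (not OpenRL code):

import torch

def a2c_loss(log_probs, values, returns, entropy,
             vf_coef: float = 0.5, ent_coef: float = 0.01):
    # Policy gradient weighted by the advantage, plus a value-regression
    # term, minus an entropy bonus that encourages exploration.
    advantages = returns - values.detach()
    policy_loss = -(log_probs * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    return policy_loss + vf_coef * value_loss - ent_coef * entropy.mean()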

Motivation

No response

Pitch

No response

Alternatives

No response

Additional context

No response

Checklist

Roadmap for OpenRL

This issue keeps track of OpenRL's updates across the next few versions:

v1.0.0

  • support multi-machine/node training

v0.2.1 (in progress)

v0.2.0

Released: v0.1.9, v0.1.8, v0.1.7, v0.1.6, v0.1.5, v0.1.4, v0.1.3, v0.1.2, v0.1.1, v0.1.0, v0.0.15, v0.0.14, v0.0.13, v0.0.12, v0.0.11, v0.0.10, v0.0.9

v0.0.8

  • improve code testing coverage
  • #40
  • #41
  • fix minor bugs

v0.0.7

v0.0.6

  • import Hugging Face models/datasets
  • dictionary observation support
  • multi-agent training support
  • dialog training support
  • fix existing bugs
  • add more examples
  • optimize Contributing docs
  • fix existing errors in openrl-docs

[Bug]: incorrect key “reward” of data in base_reward.py:L15

🐛 Bug

It should be "rewards" instead of "reward":

def step_reward(
    self, data: Dict[str, Any]
) -> Union[np.ndarray, List[Dict[str, Any]]]:
    print(data.keys())
    rewards = data["reward"].copy()
    infos = []

The printed keys are:

dict_keys(['values', 'action_log_probs', 'step', 'buffer', 'actions', 'obs', 'rewards', 'dones', 'infos'])
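
Given the printed keys, the one-line fix is to read the plural key:

rewards = data["rewards"].copy()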

To Reproduce

from openrl import ...

Relevant log output / Error message

Traceback (most recent call last):
  File "/data/workspace/openrl/examples/cartpole/train_ppo.py", line 39, in <module>
    agent = train()
  File "/data/workspace/openrl/examples/cartpole/train_ppo.py", line 17, in train
    agent.train(total_time_steps=20000)
  File "/data/workspace/openrl/openrl/runners/common/ppo_agent.py", line 134, in train
    driver.run()
  File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 227, in run
    self._inner_loop()
  File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 112, in _inner_loop
    rollout_infos = self.actor_rollout()
  File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 189, in actor_rollout
    obs, rewards, dones, infos = self.envs.step(actions, extra_data)
  File "/data/workspace/openrl/openrl/envs/vec_env/wrappers/vec_monitor_wrapper.py", line 37, in step
    returns = self.env.step(action, extra_data)
  File "/data/workspace/openrl/openrl/envs/vec_env/wrappers/reward_wrapper.py", line 46, in step
    rewards, new_infos = self.reward_class.step_reward(data=extra_data)
  File "/data/workspace/openrl/openrl/rewards/base_reward.py", line 16, in step_reward
    rewards = data["reward"].copy()
KeyError: 'reward'

System Info

No response

Checklist

  • I have checked that there are no similar issues in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I have provided version numbers, operating system and environment, where applicable

support for pettingzoo

🚀 Feature

PettingZoo is a library of multi-agent reinforcement learning environments.

Link: https://github.com/Farama-Foundation/PettingZoo
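
For context, PettingZoo's agent-environment-cycle (AEC) interface, per its documentation, looks like this:

from pettingzoo.classic import tictactoe_v3

env = tictactoe_v3.env()
env.reset(seed=42)
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    # Finished agents must step with a None action.
    action = None if termination or truncation else env.action_space(agent).sample()
    env.step(action)
env.close()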

Motivation

No response

Pitch

No response

Alternatives

No response

Additional context

No response

Checklist

[Feature Request] Add cpu number check in make

🚀 Feature

  • Add a CPU count check to the make function.
  • If the user requests more parallel environments than available CPUs in asynchronous mode, raise an error (see the sketch below).
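
A minimal sketch of the proposed check, assuming a make(env_id, env_num, asynchronous) signature as described in this issue:

import multiprocessing

def check_env_num(env_num: int, asynchronous: bool) -> None:
    cpu_count = multiprocessing.cpu_count()
    if asynchronous and env_num > cpu_count:
        raise ValueError(
            f"env_num={env_num} exceeds the {cpu_count} available CPUs "
            "in asynchronous mode."
        )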

Motivation

No response

Additional context

No response

Checklist

Add DDPG-Beta

🚀 Feature

Add the DDPG algorithm.
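
For reference, the soft target-network update at the heart of DDPG, in its standard formulation (not OpenRL code):

def soft_update(target, source, tau: float = 0.005) -> None:
    # Polyak averaging: target <- (1 - tau) * target + tau * source.
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * sp.data)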

Motivation

No response

Additional context

No response

Checklist

[Question] `.gitignore` seems not complete

❓ Question

When I use VS Code to view the source code, I find that .vscode/ is not ignored by .gitignore. It may cause inconvenience if anyone wants to submit a PR.

[Bug]: The variable 'step_rew_funcs' is not defined in BaseReward class.

🐛 Bug

The variable step_rew_funcs is not defined in the BaseReward class. In addition, the step function of the RewardWrapper class should also be modified accordingly:

if extra_data:
    extra_data.update({"actions": action})
    extra_data.update({"obs": obs})
    extra_data.update({"rewards": rewards})
    extra_data.update({"dones": dones})
    extra_data.update({"infos": infos})
    rewards, new_infos = self.reward_class.step_reward(data=extra_data)

    num_envs = len(infos)
    for i in range(num_envs):
        infos[i].update(new_infos[i])
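
A guess at the missing initialization (not verified against the OpenRL source); BaseReward would need to define the attribute before step_reward can use it:

class BaseReward:
    def __init__(self):
        # Referenced elsewhere in the class but never defined, per this report.
        self.step_rew_funcs = []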

To Reproduce

from openrl import ...

Relevant log output / Error message

Traceback (most recent call last):
  File "/data/workspace/openrl/examples/cartpole/train_ppo.py", line 39, in <module>
    agent = train()
  File "/data/workspace/openrl/examples/cartpole/train_ppo.py", line 17, in train
    agent.train(total_time_steps=20000)
  File "/data/workspace/openrl/openrl/runners/common/ppo_agent.py", line 134, in train
    driver.run()
  File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 227, in run
    self._inner_loop()
  File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 112, in _inner_loop
    rollout_infos = self.actor_rollout()
  File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 189, in actor_rollout
    obs, rewards, dones, infos = self.envs.step(actions, extra_data)
  File "/data/workspace/openrl/openrl/envs/vec_env/wrappers/vec_monitor_wrapper.py", line 37, in step
    returns = self.env.step(action, extra_data)
  File "/data/workspace/openrl/openrl/envs/vec_env/wrappers/reward_wrapper.py", line 50, in step
    infos[i].update(new_infos[i])
IndexError: list index out of range

System Info

No response

Checklist

  • I have checked that there are no similar issues in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I have provided version numbers, operating system and environment, where applicable

support for self-play training

🚀 Feature

Train agents via self-play.
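
A generic self-play loop sketch; the play_match and learn hooks are hypothetical, not OpenRL's API:

import copy
import random

def self_play_train(agent, num_iterations: int, snapshot_every: int = 10):
    opponent_pool = []
    for it in range(num_iterations):
        # Sample a past snapshot as the opponent (mirror-match until one exists).
        opponent = random.choice(opponent_pool) if opponent_pool else agent
        rollouts = agent.play_match(opponent)  # hypothetical rollout hook
        agent.learn(rollouts)                  # hypothetical update hook
        if it % snapshot_every == 0:
            opponent_pool.append(copy.deepcopy(agent))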

Motivation

No response

Pitch

No response

Alternatives

No response

Additional context

No response

Checklist

[Bug]: SyntaxError: invalid syntax

🐛 Bug

"E:\Users\xing\Anaconda3\lib\site-packages\openrl-0.0.7-py3.7.egg\openrl\envs\vec_env\wrappers\base_wrapper.py", line 263
return results[0], reward, *results[2:]
SyntaxError: invalid syntax

To Reproduce

from openrl.envs.common import make

Relevant log output / Error message

from openrl.envs.common import make
return results[0], reward, *results[2:]
                               ^
SyntaxError: invalid syntax
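
A likely cause, inferred from the egg path (which shows Python 3.7): unparenthesized starred expressions in return statements are only valid from Python 3.8 onward, so this line is a SyntaxError on 3.7. A parenthesized tuple is accepted from Python 3.5:

def step_results(results, reward):
    # `return results[0], reward, *results[2:]` fails to parse on 3.7;
    # the parenthesized form below works on older interpreters.
    return (results[0], reward, *results[2:])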

System Info

No response

Checklist

  • I have checked that there are no similar issues in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I have provided version numbers, operating system and environment, where applicable

[Bug]: KeyError: 'reward'

🐛 Bug

In base_reward.py, rewards = data["reward"].copy() fails because there is no "reward" key; reward_wrapper.py passes the data under "rewards". After changing the key, other problems remain...

To Reproduce

# train_ppo.py
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent

env = make("CartPole-v1", env_num=9)  # create the environment, with 9 parallel environments
net = Net(env)  # create the neural network
agent = Agent(net)  # initialize the trainer
agent.train(total_time_steps=20000)  # start training, for a total of 20000 environment steps

Relevant log output / Error message

Traceback (most recent call last):
  File "/home/user/code/python/train_ppo.py", line 9, in <module>
    agent.train(total_time_steps=20000)  # start training, for a total of 20000 environment steps
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/runners/common/ppo_agent.py", line 134, in train
    driver.run()
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/drivers/onpolicy_driver.py", line 227, in run
    self._inner_loop()
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/drivers/onpolicy_driver.py", line 112, in _inner_loop
    rollout_infos = self.actor_rollout()
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/drivers/onpolicy_driver.py", line 189, in actor_rollout
    obs, rewards, dones, infos = self.envs.step(actions, extra_data)
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/envs/vec_env/wrappers/vec_monitor_wrapper.py", line 37, in step
    returns = self.env.step(action, extra_data)
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/envs/vec_env/wrappers/reward_wrapper.py", line 46, in step
    rewards, new_infos = self.reward_class.step_reward(data=extra_data)
  File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/rewards/base_reward.py", line 18, in step_reward
    rewards = data["reward"].copy()
KeyError: 'reward'

System Info

No response

Checklist

  • I have checked that there are no similar issues in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I have provided version numbers, operating system and environment, where applicable
