openrl-lab / openrl
Unified Reinforcement Learning Framework
Home Page: https://openrl-docs.readthedocs.io
License: Apache License 2.0
agent.save() is not well implemented. The saved file for the NLP task is too large.
Version: v0.0.7
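One plausible direction, as a minimal sketch assuming the policy is a PyTorch module (the agent.net.module attribute here is hypothetical, not openrl's confirmed layout): save only the state_dict instead of pickling whole objects, which is usually much smaller for NLP-sized models.

import torch

def save_small(agent, path: str) -> None:
    # Persist only the parameters, not optimizer state or the full module.
    torch.save(agent.net.module.state_dict(), path)

def load_small(agent, path: str) -> None:
    state = torch.load(path, map_location="cpu")
    agent.net.module.load_state_dict(state)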
Running the example code raises an error; it looks like the error happens when creating the environment.
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent
env = make("CartPole-v1", env_num=9)  # create the environment, with 9 parallel envs
net = Net(env)  # create the neural network
agent = Agent(net)  # initialize the trainer
agent.train(total_time_steps=20000)  # start training, for a total of 20000 env steps
env = make("CartPole-v1", env_num=9, render_mode="group_human")
agent.set_env(env)  # give the trained agent the environment to interact with
obs, info = env.reset()  # reset the environment to get the initial observation and info
while True:
    action, _ = agent.act(obs)  # the agent predicts the next action from the observation
    # the environment steps once, returning the next observation, reward, done flag, and info
    obs, r, done, info = env.step(action)
    if any(done):
        break
env.close()  # close the test environment
File "/Users/env/venv/lib/python3.8/site-packages/openrl/envs/vec_env/wrappers/reward_wrapper.py", line 46, in step
rewards, new_infos = self.reward_class.step_reward(data=extra_data)
File "/Users/env/venv/lib/python3.8/site-packages/openrl/rewards/base_reward.py", line 15, in step_reward
rewards = data["reward"].copy()
KeyError: 'reward'
MacOS 12.1 (21C52)
MacBook Pro (13-inch, 2020, Four Thunderbolt 3 ports)
2.3 GHz quad-core Intel Core i7
16 GB 3733 MHz LPDDR4X
add MAT algorithm
paper link: Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Add Connect3 for self-play development
[Feature Request]add drone environment: https://github.com/utiasDSL/gym-pybullet-drones
[Feature Request] add quadruped robot dog environment
abstract data generator
[Feature Request] Support setting the seed in arena
Ensure that parallel and sequential modes produce the same results; see the sketch below.
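A minimal sketch of the kind of seeding helper this would need (the helper name and worker_rank argument are hypothetical; openrl's arena may expose its own seed API):

import os
import random

import numpy as np
import torch

def set_seed(seed: int, worker_rank: int = 0) -> None:
    # Give each parallel worker a distinct but deterministic sub-seed,
    # so parallel and sequential runs draw the same random streams.
    seed = seed + worker_rank
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)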
add SAC algorithm
[Feature Request] merge the dm_control envs into gymnasium
1. Saving the best model during training;
2. Saving and loading checkpoints;
3. Custom stop conditions, such as number of episodes, mean reward, etc.
1. The final model may not be the best model;
2. Checkpoint support makes hyperparameter tuning easier;
3. ppo_agent currently seems to support only total time steps as the stop condition.
1. stable_baselines3.common.callbacks.EvalCallback,
2. stable_baselines3.common.callbacks.CheckpointCallback,
3. rllib (a usage sketch of the stable_baselines3 callbacks follows below).
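For reference, a minimal sketch of how stable-baselines3 wires these callbacks up (real SB3 API, assuming a recent SB3 that works with gymnasium; the paths and frequencies are placeholders):

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

train_env = gym.make("CartPole-v1")
eval_env = gym.make("CartPole-v1")

# Save the best model whenever an evaluation run improves on the previous best.
eval_cb = EvalCallback(eval_env, best_model_save_path="./best/", eval_freq=1000)
# Periodically dump checkpoints so training can be resumed for tuning.
ckpt_cb = CheckpointCallback(save_freq=5000, save_path="./ckpts/", name_prefix="ppo")

model = PPO("MlpPolicy", train_env)
model.learn(total_timesteps=20000, callback=[eval_cb, ckpt_cb])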
[Feature Request]support JiDi evaluation
[Feature Request] Add AWR algorithm
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Links: https://github.com/microsoft/DeepSpeed
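For context, a minimal sketch of the DeepSpeed training-step pattern that would have to be fitted into openrl's trainers (the config values are illustrative, and the script is meant to run under the deepspeed launcher):

import deepspeed
import torch

model = torch.nn.Linear(8, 2)  # stand-in for a policy network
ds_config = {"train_batch_size": 32, "fp16": {"enabled": False}}

# Wraps the model for distributed training and builds the optimizer.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

def train_step(batch, targets):
    loss = torch.nn.functional.cross_entropy(engine(batch), targets)
    engine.backward(loss)  # handles loss scaling and gradient allreduce
    engine.step()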
Add off-policy algorithms, taking DQN as an example (a sketch of the core replay-buffer ingredient follows below).
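A minimal sketch of the central off-policy ingredient, a replay buffer with uniform sampling (generic Python, not openrl's actual buffer API):

import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO buffer with uniform random sampling."""

    def __init__(self, capacity: int = 100_000):
        self.storage = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size: int):
        batch = random.sample(list(self.storage), batch_size)
        # Transpose the list of transitions into batched fields.
        obs, actions, rewards, next_obs, dones = zip(*batch)
        return obs, actions, rewards, next_obs, dones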
add A2C algorithm in openrl
Add Retro Environment
Tutorials on how to add a custom environment (a minimal sketch follows below);
add an introduction to OpenRL Wrappers.
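For reference, a minimal custom environment in the Gymnasium API, which openrl's make builds on (registering a custom env with openrl itself may differ; this only shows the plain gymnasium side):

import gymnasium as gym
import numpy as np
from gymnasium import spaces

class GuessZeroEnv(gym.Env):
    """Toy env: reward 1 for action 0, episode ends after 10 steps."""

    def __init__(self):
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self._t = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._t = 0
        return np.zeros(1, dtype=np.float32), {}

    def step(self, action):
        self._t += 1
        reward = 1.0 if action == 0 else 0.0
        terminated = self._t >= 10
        return np.zeros(1, dtype=np.float32), reward, terminated, False, {}

# Register with gymnasium so the env can be created by id.
gym.register(id="GuessZero-v0", entry_point=GuessZeroEnv)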
This issue will keep track of OpenRL's updates over the next few versions:
v1.0.0
v0.2.1 (in progress)
v0.2.0
v0.1.9
v0.1.8
v0.1.7
v0.1.6
v0.1.5
v0.1.4
v0.1.3
v0.1.2
v0.1.1
v0.1.0
v0.0.15
v0.0.14
v0.0.13
v0.0.12
v0.0.11
v0.0.10
v0.0.9
v0.0.8
v0.0.7
v0.0.6
It should be "rewards" instead of "reward".
def step_reward(
    self, data: Dict[str, Any]
) -> Union[np.ndarray, List[Dict[str, Any]]]:
    print(data.keys())
    rewards = data["reward"].copy()
    infos = []
dict_keys(['values', 'action_log_probs', 'step', 'buffer', 'actions', 'obs', 'rewards', 'dones', 'infos'])
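Given the keys printed above, a sketch of the corrected method (a minimal patch of the quoted code, not confirmed upstream; note that infos must end up with one entry per parallel env, as a later report in this thread shows):

from typing import Any, Dict, List, Union

import numpy as np

def step_reward(
    self, data: Dict[str, Any]
) -> Union[np.ndarray, List[Dict[str, Any]]]:
    # The vec-env wrapper stores per-env rewards under "rewards", not "reward".
    rewards = data["rewards"].copy()
    # Assume the leading dimension indexes the parallel envs.
    infos = [dict() for _ in range(len(rewards))]
    return rewards, infos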
Traceback (most recent call last):
File "/data/workspace/openrl/examples/cartpole/train_ppo.py", line 39, in <module>
agent = train()
File "/data/workspace/openrl/examples/cartpole/train_ppo.py", line 17, in train
agent.train(total_time_steps=20000)
File "/data/workspace/openrl/openrl/runners/common/ppo_agent.py", line 134, in train
driver.run()
File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 227, in run
self._inner_loop()
File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 112, in _inner_loop
rollout_infos = self.actor_rollout()
File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 189, in actor_rollout
obs, rewards, dones, infos = self.envs.step(actions, extra_data)
File "/data/workspace/openrl/openrl/envs/vec_env/wrappers/vec_monitor_wrapper.py", line 37, in step
returns = self.env.step(action, extra_data)
File "/data/workspace/openrl/openrl/envs/vec_env/wrappers/reward_wrapper.py", line 46, in step
rewards, new_infos = self.reward_class.step_reward(data=extra_data)
File "/data/workspace/openrl/openrl/rewards/base_reward.py", line 16, in step_reward
rewards = data["reward"].copy()
KeyError: 'reward'
[Feature Request] Implement checkpoint for selfplay
PettingZoo is a multi-agent environment library.
Link: pettingzoo
Add an ignore list to make lint
add SMAC envs
paper link: The StarCraft Multi-Agent Challenge
add behavior cloning method
Here the "leaky_relu" is written twice.
add QMIX
Add vdn algorithm, including vdn_net, vdn_module, etc.
Add DDPG algorithm
Make registration of gym- and gymnasium-based envs easier
The pinned gymnasium version is too old; please update it.
Log callback info to wandb
The variable 'step_rew_funcs' is not defined in the BaseReward class. In addition, the 'step' function of the RewardWrapper class should also be modified accordingly.
if extra_data:
    extra_data.update({"actions": action})
    extra_data.update({"obs": obs})
    extra_data.update({"rewards": rewards})
    extra_data.update({"dones": dones})
    extra_data.update({"infos": infos})
rewards, new_infos = self.reward_class.step_reward(data=extra_data)
num_envs = len(infos)
for i in range(num_envs):
    infos[i].update(new_infos[i])
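One hedged way to harden the wrapper side against the IndexError below (a sketch; merge_step_infos is a hypothetical helper, not openrl code): only merge the new infos when the reward class returned one entry per env.

from typing import Any, Dict, List, Optional

def merge_step_infos(
    infos: List[Dict[str, Any]],
    new_infos: Optional[List[Dict[str, Any]]],
) -> None:
    # Skip merging when the reward class returned nothing or a list
    # of the wrong length, instead of raising IndexError.
    if new_infos is None or len(new_infos) != len(infos):
        return
    for i in range(len(infos)):
        infos[i].update(new_infos[i])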
Traceback (most recent call last):
File "/data/workspace/openrl/examples/cartpole/train_ppo.py", line 39, in <module>
agent = train()
File "/data/workspace/openrl/examples/cartpole/train_ppo.py", line 17, in train
agent.train(total_time_steps=20000)
File "/data/workspace/openrl/openrl/runners/common/ppo_agent.py", line 134, in train
driver.run()
File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 227, in run
self._inner_loop()
File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 112, in _inner_loop
rollout_infos = self.actor_rollout()
File "/data/workspace/openrl/openrl/drivers/onpolicy_driver.py", line 189, in actor_rollout
obs, rewards, dones, infos = self.envs.step(actions, extra_data)
File "/data/workspace/openrl/openrl/envs/vec_env/wrappers/vec_monitor_wrapper.py", line 37, in step
returns = self.env.step(action, extra_data)
File "/data/workspace/openrl/openrl/envs/vec_env/wrappers/reward_wrapper.py", line 50, in step
infos[i].update(new_infos[i])
IndexError: list index out of range
Add Super Mario Environment
[Feature Request] selfplay: support more than two players
[Feature Request] Add A2PO algorithm
Train agents via self-play
"E:\Users\xing\Anaconda3\lib\site-packages\openrl-0.0.7-py3.7.egg\openrl\envs\vec_env\wrappers\base_wrapper.py", line 263
return results[0], reward, *results[2:]
SyntaxError: invalid syntax
from openrl.envs.common import make
from openrl.envs.common import make
from openrl.envs.common import make
return results[0], reward, *results[2:]
^
SyntaxError: invalid syntax
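The cause of this SyntaxError is the Python version: unparenthesized iterable unpacking in a return statement only became legal in Python 3.8, and this egg was installed under Python 3.7. A backwards-compatible fix is to parenthesize the tuple:

# Valid on Python 3.7 and earlier as well as on 3.8+:
return (results[0], reward, *results[2:])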
[Feature Request] optimize OfflineEnv to shuffle trajectories
[Feature Request] improve test coverage
[Feature Request] add Envpool environment
Envpool: https://github.com/sail-sg/envpool
add isaac gym
optimize replay buffer
In base_reward.py, rewards = data["reward"].copy() fails because there is no "reward" key; reward_wrapper.py uses "rewards". After fixing that, there are still other problems...
# train_ppo.py
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent
env = make("CartPole-v1", env_num=9)  # create the environment, with 9 parallel envs
net = Net(env)  # create the neural network
agent = Agent(net)  # initialize the trainer
agent.train(total_time_steps=20000)  # start training, for a total of 20000 env steps
Traceback (most recent call last):
File "/home/user/code/python/train_ppo.py", line 9, in <module>
agent.train(total_time_steps=20000)  # start training, for a total of 20000 env steps
File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/runners/common/ppo_agent.py", line 134, in train
driver.run()
File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/drivers/onpolicy_driver.py", line 227, in run
self._inner_loop()
File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/drivers/onpolicy_driver.py", line 112, in _inner_loop
rollout_infos = self.actor_rollout()
File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/drivers/onpolicy_driver.py", line 189, in actor_rollout
obs, rewards, dones, infos = self.envs.step(actions, extra_data)
File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/envs/vec_env/wrappers/vec_monitor_wrapper.py", line 37, in step
returns = self.env.step(action, extra_data)
File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/envs/vec_env/wrappers/reward_wrapper.py", line 46, in step
rewards, new_infos = self.reward_class.step_reward(data=extra_data)
File "/home/user/anaconda3/envs/OpenRL/lib/python3.9/site-packages/openrl/rewards/base_reward.py", line 18, in step_reward
rewards = data["reward"].copy()
KeyError: 'reward'
[Feature Request] build Docker images with GitHub Actions