
maddpg's Introduction

MADDPG

This is a PyTorch implementation of MADDPG on the Multi-Agent Particle Environment (MPE). The corresponding paper is Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.

Requirements

Quick Start

$ python main.py --scenario-name=simple_tag --evaluate-episodes=10

Run main.py directly and the algorithm will be evaluated on the 'simple_tag' scenario for 10 episodes, using the pretrained model.

Note

  • We have trained the agent on the 'simple_tag' scenario, but the model we provide is not the best one, because we did not want to spend more time on training; you can keep training it for better performance.

  • There are 4 agents in simple_tag: 3 predators and 1 prey. We use MADDPG to train the predators to catch the prey. The prey's actions can be controlled by you; in our case we make them random.

  • The default setting of the Multi-Agent Particle Environment (MPE) is sparse reward; you can change it to dense reward by replacing 'shape=False' with 'shape=True' in multiagent-particle-envs/multiagent/scenarios/simple_tag.py.
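The effect of that switch can be illustrated with a minimal sketch. The function name and reward constants below are illustrative, not copied from simple_tag.py:

```python
import numpy as np

def predator_reward(predator_pos, prey_pos, collided, shape=False):
    # Sketch of an MPE-style predator reward; names and constants are
    # illustrative, not the exact code in simple_tag.py.
    rew = 0.0
    if shape:
        # dense shaping: penalise distance to the prey at every step
        rew -= 0.1 * np.linalg.norm(np.asarray(predator_pos) - np.asarray(prey_pos))
    if collided:
        # sparse component: bonus only when the prey is actually caught
        rew += 10.0
    return rew
```

With shape=False the predators get feedback only on collisions, which makes early exploration much harder.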

maddpg's People

Contributors

starry-sky6688


maddpg's Issues

Handling of done

In the q_target computation of MADDPG.train, there is no handling of whether done is True or False, and the replay buffer does not store done values either. Isn't that problematic?
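For reference, the standard way to use done in the TD target (a generic sketch, not this repository's code) masks out the bootstrap term on terminal transitions:

```python
def td_target(reward, q_next, done, gamma=0.95):
    # Generic TD target with a done mask (not this repository's code).
    # On a terminal transition (done = 1) the bootstrap term is dropped,
    # so the target reduces to the immediate reward.
    return reward + gamma * (1.0 - done) * q_next

print(td_target(1.0, 5.0, done=0.0, gamma=0.5))  # 3.5
print(td_target(1.0, 5.0, done=1.0, gamma=0.5))  # 1.0
```

In MPE the episodes are cut off at a fixed horizon rather than genuinely terminating, which is one reason an implementation might skip the mask; for environments with true terminal states it does matter.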

Package not found

The multiagent package is not found in utils.py:
def make_env(args):
from multiagent.environment import MultiAgentEnv
import multiagent.scenarios as scenarios

Did I get something wrong?
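A common fix (assuming the standard OpenAI MPE repository) is to install the package in editable mode so the multiagent module becomes importable:

```shell
# Install the MPE package so `from multiagent.environment import
# MultiAgentEnv` resolves (assumes the standard OpenAI repository)
git clone https://github.com/openai/multiagent-particle-envs.git
cd multiagent-particle-envs
pip install -e .
```

Alternatively, adding the cloned repository's root directory to PYTHONPATH has the same effect.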

The simple_spread scenario

Hello! The algorithm works on simple_tag, but on simple_spread it converges to a very poor policy. Have you tested it on this scenario? (I have run OpenAI's official code, which does learn a good policy, but it is implemented in TensorFlow, and I worry that debugging later modifications there would be troublesome.)

On the actor parameter update

I see that MADDPG updates the actor using
[screenshot: 微信截图_20220328214211]
while the critic network simply concatenates the states and actions:
[screenshot: 微信截图_20220328214223]
But the paper's pseudocode seems to use a product of gradients?
[screenshot: 微信截图_20220328214655]
I do not fully understand this part and hope you can explain. Thanks!
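The two forms are consistent: minimising loss = -Q(s, μ_θ(s)) lets autograd apply exactly the product ∇_a Q · ∇_θ μ from the paper's pseudocode via the chain rule. A tiny scalar sketch (a toy actor and critic, not the repository's networks):

```python
# Toy 1-D "actor" and "critic" to show that the gradient of
# J(theta) = Q(s, mu(theta, s)) equals (dQ/da) * (dmu/dtheta),
# the product appearing in the paper's pseudocode.
s, theta = 1.5, 1.0

def mu(theta, s):           # linear actor: a = theta * s
    return theta * s

def Q(s, a):                # concave critic, peaked at a = 3
    return -(a - 3.0) ** 2

a = mu(theta, s)
chain_rule = -2.0 * (a - 3.0) * s                 # (dQ/da) * (da/dtheta)
eps = 1e-6                                        # finite-difference check
numeric = (Q(s, mu(theta + eps, s)) - Q(s, mu(theta - eps, s))) / (2 * eps)
print(chain_rule)         # 4.5
```

So backpropagating through the critic with the actor's action plugged in computes the same gradient that the pseudocode writes out explicitly.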

The simple_world_comm problem

Running simple_tag works fine, but running simple_world_comm raises AttributeError: 'MultiDiscrete' object has no attribute 'n'. Is there a good way to fix this? Thank you!

The other two agents fail to learn a correct policy

I replaced the ANN with an SNN, keeping the interfaces consistent with your code, but the return plateaus around 600. It turns out only one agent learns a correct policy while the other two spin in place. Could you tell me what might cause this?

About the max_action parameter

Hello, I see that when the actor-critic networks are defined, a max_action is defined and assigned the value of high_action. In the other files high_action is 1, so max_action in the network definition does not seem to change the actions or outputs at all. Why define a max_action?
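One plausible reading (an assumption about the intent, not confirmed by the author): max_action rescales the tanh-bounded actor output to the environment's action range, so with high_action = 1 it is indeed a no-op, but it keeps the network reusable for environments with other bounds:

```python
import math

def bounded_action(raw_output, max_action=1.0):
    # tanh bounds the raw network output to (-1, 1); max_action then
    # rescales it to the environment's action range. With max_action = 1
    # this changes nothing, matching the observation in the question.
    return max_action * math.tanh(raw_output)
```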

Adjusting the architecture when the number of agents grows

When the number of agents exceeds 10, the critic network's input becomes very large and learning is inefficient. Should the critic's architecture be modified (for example, more hidden units or more layers)? Advice appreciated!

About the adversary's reward

Hello, I see that in your code the adversary moves randomly. When actually running the code, the adversary mostly jitters in place and only changes its trajectory noticeably after being bumped by the trained agents, effectively being pushed along.
Could a DDPG network be added so that the adversary also chooses actions with high reward for itself? If so, how should the two sides' networks be trained: by defining separate reward values for our side and the adversary?
And after adding it, should the reward-vs-episode curves be plotted separately for the two sides?
Thanks!

About a GPU version and the scenarios

1. Is there a GPU implementation?
2. Is this code only for the simple_tag scenario?
3. Is num_adversaries only meaningful in simple_tag? In other scenarios, such as simple_spread, should it be set to 0?

Why does it fail in the reference environment?

Thank you very much for the code. One small question: it runs in other environments, but in simple_reference and simple_world_comm it reports that 'MultiDiscrete' has no attribute 'n'. Do you have a solution?

Hello, a question about model saving

In the maddpg file, the save_model function names checkpoints by the save count, but the init function loads the model without regard to the save count. How should I understand this?

AttributeError: 'MultiDiscrete' object has no attribute 'n'

The work is wonderful! But when I run "python main.py --scenario-name=simple_world_comm --evaluate-episodes=10", it shows me this error. It seems the code cannot handle scenarios with more than two kinds of agents. Is there anything I can do to solve the problem? Thank you!

Communication with agent.silent = False

Setting agent.silent = False raises AttributeError: 'MultiDiscrete' object has no attribute 'n'.
Changing it back to agent.silent = True runs fine.
Is this problem related to the MADDPG architecture used here?
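The crash happens because the code assumes every action space is Discrete, which is the only space type with a .n attribute. A hedged sketch of a more general dimension lookup; the .n/.nvec attribute names are standard gym, while the high/low branch is an assumption about MPE's own MultiDiscrete class:

```python
import numpy as np

def action_dim(space):
    # Discrete exposes .n; gym's MultiDiscrete exposes .nvec;
    # MPE's own MultiDiscrete stores per-dimension low/high bounds.
    if hasattr(space, "n"):
        return int(space.n)
    if hasattr(space, "nvec"):
        return int(np.sum(space.nvec))
    if hasattr(space, "high") and hasattr(space, "low"):
        return int(np.sum(np.asarray(space.high) - np.asarray(space.low) + 1))
    raise TypeError("unsupported action space: %r" % (space,))
```

Routing every space through a helper like this, instead of reading .n directly, is one way to make simple_world_comm and silent=False runnable; the networks then also need output heads sized for the larger action vector.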

On using parameter sharing

Hello! If I want to apply parameter sharing in MADDPG, should I create just one actor network and one critic network, or one actor network and n critic networks (where n is the number of agents)?
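A common parameter-sharing setup (a general pattern, not something this repository implements): one shared actor and one shared critic for all n agents, with a one-hot agent id appended to each observation so the shared networks can still specialise per agent:

```python
import numpy as np

def with_agent_id(obs, agent_idx, n_agents):
    # Append a one-hot agent id so a single shared network can tell
    # which agent's observation it is processing.
    one_hot = np.zeros(n_agents)
    one_hot[agent_idx] = 1.0
    return np.concatenate([np.asarray(obs, dtype=float), one_hot])

x = with_agent_id([0.2, -0.1], agent_idx=1, n_agents=3)
# x has length 2 + 3 = 5: the original obs followed by the one-hot id
```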

The learning curve improves, then degrades

For a navigation task, training 3 agents to cooperatively approach the landmarks,

[plot: training curve]

the resulting curve rises and then falls. What could cause this? Not enough training episodes?
For cooperative tasks, a single shared reward is usually used. Are there any requirements on its design? For example, in the navigation task the reward is, for each landmark, the minimum distance from any agent to it. Why can't it instead be, for each agent, the minimum distance to any landmark?

Any advice would be appreciated.
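The difference between the two reward candidates can be seen directly (a minimal sketch with made-up positions): the min over agents per landmark penalises uncovered landmarks, while the min over landmarks per agent is maximised even when all agents crowd the same landmark:

```python
import numpy as np

def reward_per_landmark(agents, landmarks):
    # for each landmark, distance of the closest agent (the MPE design)
    return -sum(min(np.linalg.norm(a - l) for a in agents) for l in landmarks)

def reward_per_agent(agents, landmarks):
    # for each agent, distance of its closest landmark (the alternative)
    return -sum(min(np.linalg.norm(a - l) for l in landmarks) for a in agents)

agents = [np.array([0.0, 0.0]) for _ in range(3)]   # all agents on landmark 0
landmarks = [np.array([0.0, 0.0]), np.array([5.0, 0.0]), np.array([0.0, 5.0])]
print(reward_per_landmark(agents, landmarks))  # -10.0: two landmarks uncovered
print(reward_per_agent(agents, landmarks))     # -0.0: looks optimal despite that
```

Under the per-agent variant, all agents collapsing onto one landmark is a degenerate optimum, so the team never learns to spread out.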

About training time

After how many episodes (on the figure's x-axis) was the provided model obtained, and what final reward does it reach?
Was it obtained by running all time-steps = 2000000 in one go, or by repeatedly loading the model saved after each run and running the program multiple times?

About the opponent's policy

Hello, it looks like only our agents use the MADDPG policy and no policy is set for the opponent. Does that mean the opponent moves randomly?

The adversary's policy

Based on your code, can a DDPG learning policy be added for the adversary while our side uses MADDPG?
