
maddpg's Introduction

MADDPG

This is a PyTorch implementation of MADDPG on the Multi-Agent Particle Environment (MPE). The corresponding paper is Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.

Requirements

Quick Start

$ python main.py --scenario-name=simple_tag --evaluate-episodes=10

Run main.py directly and the algorithm will be evaluated on the 'simple_tag' scenario for 10 episodes, using the pretrained model.

Note

  • We have trained the agent on the 'simple_tag' scenario, but the model we provide is not the best one, because we did not want to spend more time on training; you can keep training it for better performance.

  • There are 4 agents in simple_tag: 3 predators and 1 prey. We use MADDPG to train the predators to catch the prey. The prey's actions can be controlled by you; in our case we make them random.

  • The default setting of the Multi-Agent Particle Environment (MPE) is sparse reward; you can change it to dense reward by replacing 'shape=False' with 'shape=True' in multiagent-particle-envs/multiagent/scenarios/simple_tag.py.
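The effect of that switch can be illustrated with a minimal sketch. The function name and reward constants below are illustrative, not copied from simple_tag.py:

```python
import numpy as np

def predator_reward(predator_pos, prey_pos, collided, shape=False):
    # Sketch of an MPE-style predator reward; names and constants are
    # illustrative, not the exact code in simple_tag.py.
    rew = 0.0
    if shape:
        # dense shaping: penalise distance to the prey at every step
        rew -= 0.1 * np.linalg.norm(np.asarray(predator_pos) - np.asarray(prey_pos))
    if collided:
        # sparse component: bonus only when the prey is actually caught
        rew += 10.0
    return rew
```

With shape=False the predators get feedback only on collisions, which makes early exploration much harder.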

maddpg's People

Contributors

starry-sky6688


maddpg's Issues

Handling of done

In the q_target computation of MADDPG.train, there is no handling of whether done is True or False, and the replay buffer does not store done values either. Isn't that problematic?
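For reference, the standard way to use done in the TD target (a generic sketch, not this repository's code) masks out the bootstrap term on terminal transitions:

```python
def td_target(reward, q_next, done, gamma=0.95):
    # Generic TD target with a done mask (not this repository's code).
    # On a terminal transition (done = 1) the bootstrap term is dropped,
    # so the target reduces to the immediate reward.
    return reward + gamma * (1.0 - done) * q_next

print(td_target(1.0, 5.0, done=0.0, gamma=0.5))  # 3.5
print(td_target(1.0, 5.0, done=1.0, gamma=0.5))  # 1.0
```

In MPE the episodes are cut off at a fixed horizon rather than genuinely terminating, which is one reason an implementation might skip the mask; for environments with true terminal states it does matter.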

Package not found

The multiagent package is not found in utils.py:
def make_env(args):
from multiagent.environment import MultiAgentEnv
import multiagent.scenarios as scenarios

Did I get something wrong?
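A common fix (assuming the standard OpenAI MPE repository) is to install the package in editable mode so the multiagent module becomes importable:

```shell
# Install the MPE package so `from multiagent.environment import
# MultiAgentEnv` resolves (assumes the standard OpenAI repository)
git clone https://github.com/openai/multiagent-particle-envs.git
cd multiagent-particle-envs
pip install -e .
```

Alternatively, adding the cloned repository's root directory to PYTHONPATH has the same effect.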

The simple_spread scenario

Hello! The algorithm works on simple_tag, but on simple_spread it converges to a very poor policy. Have you tested it on this scenario? (I have run OpenAI's official code, which does learn a good policy, but it is implemented in TensorFlow, and I worry that debugging later modifications there would be troublesome.)

On the actor parameter update

I see that MADDPG updates the actor using
[screenshot: 微信截图_20220328214211]
while the critic network simply concatenates the states and actions:
[screenshot: 微信截图_20220328214223]
But the paper's pseudocode seems to use a product of gradients?
[screenshot: 微信截图_20220328214655]
I do not fully understand this part and hope you can explain. Thanks!
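The two forms are consistent: minimising loss = -Q(s, μ_θ(s)) lets autograd apply exactly the product ∇_a Q · ∇_θ μ from the paper's pseudocode via the chain rule. A tiny scalar sketch (a toy actor and critic, not the repository's networks):

```python
# Toy 1-D "actor" and "critic" to show that the gradient of
# J(theta) = Q(s, mu(theta, s)) equals (dQ/da) * (dmu/dtheta),
# the product appearing in the paper's pseudocode.
s, theta = 1.5, 1.0

def mu(theta, s):           # linear actor: a = theta * s
    return theta * s

def Q(s, a):                # concave critic, peaked at a = 3
    return -(a - 3.0) ** 2

a = mu(theta, s)
chain_rule = -2.0 * (a - 3.0) * s                 # (dQ/da) * (da/dtheta)
eps = 1e-6                                        # finite-difference check
numeric = (Q(s, mu(theta + eps, s)) - Q(s, mu(theta - eps, s))) / (2 * eps)
print(chain_rule)         # 4.5
```

So backpropagating through the critic with the actor's action plugged in computes the same gradient that the pseudocode writes out explicitly.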

The simple_world_comm problem

Running simple_tag works fine, but running simple_world_comm raises AttributeError: 'MultiDiscrete' object has no attribute 'n'. Is there a good way to fix this? Thank you!

The other two agents fail to learn a correct policy

I replaced the ANN with an SNN, keeping the interfaces consistent with your code, but the return plateaus around 600. It turns out only one agent learns a correct policy while the other two spin in place. Could you tell me what might cause this?

About the max_action parameter

Hello, I see that when the actor-critic networks are defined, a max_action is defined and assigned the value of high_action. In the other files high_action is 1, so max_action in the network definition does not seem to change the actions or outputs at all. Why define a max_action?
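One plausible reading (an assumption about the intent, not confirmed by the author): max_action rescales the tanh-bounded actor output to the environment's action range, so with high_action = 1 it is indeed a no-op, but it keeps the network reusable for environments with other bounds:

```python
import math

def bounded_action(raw_output, max_action=1.0):
    # tanh bounds the raw network output to (-1, 1); max_action then
    # rescales it to the environment's action range. With max_action = 1
    # this changes nothing, matching the observation in the question.
    return max_action * math.tanh(raw_output)
```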

Adjusting the architecture when the number of agents grows

When the number of agents exceeds 10, the critic network's input becomes very large and learning is inefficient. Should the critic's architecture be modified (for example, more hidden units or more layers)? Advice appreciated!

About the adversary's reward

Hello, I see that in your code the adversary moves randomly. When actually running the code, the adversary mostly jitters in place and only changes its trajectory noticeably after being bumped by the trained agents, effectively being pushed along.
Could a DDPG network be added so that the adversary also chooses actions with high reward for itself? If so, how should the two sides' networks be trained: by defining separate reward values for our side and the adversary?
And after adding it, should the reward-vs-episode curves be plotted separately for the two sides?
Thanks!

About a GPU version and the scenarios

1. Is there a GPU implementation?
2. Is this code only for the simple_tag scenario?
3. Is num_adversaries only meaningful in simple_tag? In other scenarios, such as simple_spread, should it be set to 0?

Why does it fail in the reference environment?

Thank you very much for the code. One small question: it runs in other environments, but in simple_reference and simple_world_comm it reports that 'MultiDiscrete' has no attribute 'n'. Do you have a solution?

Hello, a question about model saving

In the maddpg file, the save_model function names checkpoints by the save count, but the init function loads the model without regard to the save count. How should I understand this?

AttributeError: 'MultiDiscrete' object has no attribute 'n'

The work is wonderful! But when I run "python main.py --scenario-name=simple_world_comm --evaluate-episodes=10", it shows me this error. It seems the code cannot handle scenarios with more than two kinds of agents. Is there anything I can do to solve the problem? Thank you!

Communication with agent.silent = False

Setting agent.silent = False raises AttributeError: 'MultiDiscrete' object has no attribute 'n'.
Changing it back to agent.silent = True runs fine.
Is this problem related to the MADDPG architecture used here?
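The crash happens because the code assumes every action space is Discrete, which is the only space type with a .n attribute. A hedged sketch of a more general dimension lookup; the .n/.nvec attribute names are standard gym, while the high/low branch is an assumption about MPE's own MultiDiscrete class:

```python
import numpy as np

def action_dim(space):
    # Discrete exposes .n; gym's MultiDiscrete exposes .nvec;
    # MPE's own MultiDiscrete stores per-dimension low/high bounds.
    if hasattr(space, "n"):
        return int(space.n)
    if hasattr(space, "nvec"):
        return int(np.sum(space.nvec))
    if hasattr(space, "high") and hasattr(space, "low"):
        return int(np.sum(np.asarray(space.high) - np.asarray(space.low) + 1))
    raise TypeError("unsupported action space: %r" % (space,))
```

Routing every space through a helper like this, instead of reading .n directly, is one way to make simple_world_comm and silent=False runnable; the networks then also need output heads sized for the larger action vector.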

On using parameter sharing

Hello! If I want to apply parameter sharing in MADDPG, should I create just one actor network and one critic network, or one actor network and n critic networks (where n is the number of agents)?
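A common parameter-sharing setup (a general pattern, not something this repository implements): one shared actor and one shared critic for all n agents, with a one-hot agent id appended to each observation so the shared networks can still specialise per agent:

```python
import numpy as np

def with_agent_id(obs, agent_idx, n_agents):
    # Append a one-hot agent id so a single shared network can tell
    # which agent's observation it is processing.
    one_hot = np.zeros(n_agents)
    one_hot[agent_idx] = 1.0
    return np.concatenate([np.asarray(obs, dtype=float), one_hot])

x = with_agent_id([0.2, -0.1], agent_idx=1, n_agents=3)
# x has length 2 + 3 = 5: the original obs followed by the one-hot id
```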

The learning curve improves, then degrades

For a navigation task, training 3 agents to cooperatively approach the landmarks,

[plot: training curve]

the resulting curve rises and then falls. What could cause this? Not enough training episodes?
For cooperative tasks, a single shared reward is usually used. Are there any requirements on its design? For example, in the navigation task the reward is, for each landmark, the minimum distance from any agent to it. Why can't it instead be, for each agent, the minimum distance to any landmark?

Any advice would be appreciated.
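The difference between the two reward candidates can be seen directly (a minimal sketch with made-up positions): the min over agents per landmark penalises uncovered landmarks, while the min over landmarks per agent is maximised even when all agents crowd the same landmark:

```python
import numpy as np

def reward_per_landmark(agents, landmarks):
    # for each landmark, distance of the closest agent (the MPE design)
    return -sum(min(np.linalg.norm(a - l) for a in agents) for l in landmarks)

def reward_per_agent(agents, landmarks):
    # for each agent, distance of its closest landmark (the alternative)
    return -sum(min(np.linalg.norm(a - l) for l in landmarks) for a in agents)

agents = [np.array([0.0, 0.0]) for _ in range(3)]   # all agents on landmark 0
landmarks = [np.array([0.0, 0.0]), np.array([5.0, 0.0]), np.array([0.0, 5.0])]
print(reward_per_landmark(agents, landmarks))  # -10.0: two landmarks uncovered
print(reward_per_agent(agents, landmarks))     # -0.0: looks optimal despite that
```

Under the per-agent variant, all agents collapsing onto one landmark is a degenerate optimum, so the team never learns to spread out.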

About training time

After how many episodes (on the figure's x-axis) was the provided model obtained, and what final reward does it reach?
Was it obtained by running all time-steps = 2000000 in one go, or by repeatedly loading the model saved after each run and running the program multiple times?

About the opponent's policy

Hello, it looks like only our agents use the MADDPG policy and no policy is set for the opponent. Does that mean the opponent moves randomly?

The adversary's policy

Based on your code, can a DDPG learning policy be added for the adversary while our side uses MADDPG?
