xuehy / pytorch-maddpg

A pytorch implementation of MADDPG (multi-agent deep deterministic policy gradient)

Python 100.00%

pytorch-rl multiagent-reinforcement-learning

pytorch-maddpg's Introduction

An implementation of MADDPG

1. Introduction

This is a PyTorch implementation of the multi-agent deep deterministic policy gradient (MADDPG) algorithm.
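
As a rough illustration of the structure MADDPG is usually described with (decentralised actors, centralised critics), the sketch below wires up the input and output shapes only. Random linear maps stand in for neural networks, and the sizes `n_agents`, `obs_dim` and `act_dim` are hypothetical values chosen for the example, not taken from this repo.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a neural network."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


rng = random.Random(0)
n_agents, obs_dim, act_dim = 2, 4, 2  # hypothetical sizes

# Decentralised actors: each one sees only its own observation.
actors = [make_linear(obs_dim, act_dim, rng) for _ in range(n_agents)]

# Centralised critics: each one sees every agent's observation and action.
critic_in = n_agents * (obs_dim + act_dim)
critics = [make_linear(critic_in, 1, rng) for _ in range(n_agents)]

observations = [[rng.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(n_agents)]
actions = [apply_linear(actors[i], observations[i]) for i in range(n_agents)]

# Joint input shared by all critics: all observations, then all actions.
joint = [v for obs in observations for v in obs] + [v for act in actions for v in act]
q_values = [apply_linear(critics[i], joint)[0] for i in range(n_agents)]
```

At execution time only the actors are needed, so each agent can act from its local observation; the joint input is used during training only.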

The experimental environment is a modified version of Waterworld based on MADRL.

2. Environment

The main features (different from MADRL) of the modified Waterworld environment are:

Evaders and poisons now bounce off the walls, obeying physical rules.
Evaders, pursuers and poisons now have the same size, so that random actions lead to average rewards around 0.
Exactly n_coop agents are needed to catch food.
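
The first and last rules above can be sketched in a few lines. This is a minimal illustration, not the environment's actual code: the `Ball` class, the square arena of side `size`, and the helper names are assumptions made for the example, and the bounce is modelled as a perfectly elastic reflection with a time step small enough that one reflection per axis suffices.

```python
from dataclasses import dataclass


@dataclass
class Ball:
    """A hypothetical circular object (evader or poison) in a square arena."""
    x: float
    y: float
    vx: float
    vy: float
    radius: float


def step_with_bounce(ball: Ball, dt: float, size: float = 1.0) -> Ball:
    """Advance the ball by dt and reflect its velocity at the walls."""
    x, y = ball.x + ball.vx * dt, ball.y + ball.vy * dt
    vx, vy = ball.vx, ball.vy
    lo, hi = ball.radius, size - ball.radius
    # Mirror the position back inside and flip the velocity component
    # normal to the wall (a perfectly elastic bounce).
    if x < lo:
        x, vx = 2 * lo - x, -vx
    elif x > hi:
        x, vx = 2 * hi - x, -vx
    if y < lo:
        y, vy = 2 * lo - y, -vy
    elif y > hi:
        y, vy = 2 * hi - y, -vy
    return Ball(x, y, vx, vy, ball.radius)


def is_caught(n_touching: int, n_coop: int) -> bool:
    """Food is caught only when exactly n_coop pursuers touch it."""
    return n_touching == n_coop
```

For example, with `n_coop = 2`, `is_caught(1, 2)` and `is_caught(3, 2)` are both false, which is what forces the agents to coordinate.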

3. Dependency

pytorch
visdom
python==3.6.1 (Anaconda or Miniconda is recommended)
opencv (only required if you need to render the environments)
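
A quick way to check whether the packages above are visible to the current interpreter is sketched below. The import names are assumptions based on how these packages are normally imported (`torch` for pytorch, `cv2` for opencv); the script only looks the modules up and does not import them.

```python
import importlib.util
import sys

# Import names are assumptions: PyTorch imports as "torch", OpenCV as "cv2".
REQUIRED = ["torch", "visdom"]
OPTIONAL = ["cv2"]  # only needed for rendering


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def report():
    """Print the interpreter version and any missing modules."""
    print("python", ".".join(map(str, sys.version_info[:3])))
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_modules(names)
        print(label, "missing:", ", ".join(missing) if missing else "none")
```

Calling `report()` inside the conda environment prints one line per group, so a missing package shows up before `main.py` is started.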

4. Install

Install MADRL.
Replace the madrl_environments/pursuit directory with the one in this repo.
Run python main.py.

If scene rendering is enabled, installing opencv through conda-forge is recommended.

5. Results

two agents, cooperation = 2

The two agents need to cooperate to catch the food, which gives a reward of 10.

the average

one agent, cooperation = 1

6. TODO

Reproduce the experiments in the paper with competitive environments.

pytorch-maddpg's Issues

Question about the way the actor is updated

In my understanding, the actor does not have a loss of its own; the chain rule is what is needed to update it. Why is there a loss for the actor in your code? Could you give me a hint as to why you update the actor this way?
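
For readers with the same question, the toy sketch below shows why the two views usually coincide in DDPG-style methods: minimising the scalar -Q(s, mu(s)) with respect to the actor parameters gives the same update as ascending Q through the chain rule dQ/da * da/dtheta. The one-parameter actor and quadratic critic are made up for the illustration and are not the networks in this repo; whether the repo's actor loss has exactly this form has not been checked here.

```python
def actor(theta, s):
    """Toy deterministic policy: a = theta * s."""
    return theta * s


def critic(s, a):
    """Toy critic: Q(s, a) = -(a - 2 s)^2, maximised at a = 2 s."""
    return -(a - 2.0 * s) ** 2


def chain_rule_gradient(theta, s):
    """dQ/dtheta = dQ/da * da/dtheta, written out by hand."""
    a = actor(theta, s)
    dq_da = -2.0 * (a - 2.0 * s)
    da_dtheta = s
    return dq_da * da_dtheta


def actor_loss(theta, s):
    """The scalar an autograd framework would minimise: -Q(s, mu(s))."""
    return -critic(s, actor(theta, s))


def numeric_loss_gradient(theta, s, eps=1e-6):
    """Central finite difference of the actor loss w.r.t. theta."""
    return (actor_loss(theta + eps, s) - actor_loss(theta - eps, s)) / (2 * eps)


theta, s = 0.5, 1.5
# Descending the loss is the same step as ascending Q along the chain rule.
grad_q = chain_rule_gradient(theta, s)
grad_loss = numeric_loss_gradient(theta, s)
```

Here `grad_loss` is the negative of `grad_q` up to rounding, so a gradient-descent step on the "actor loss" is a gradient-ascent step on Q.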

Running on Thread?

I haven't looked at the code. Is it possible to train the model with multiple threads?

About OrnsteinUhlenbeckProcess

Hello! I am studying MADDPG through your code, thank you for sharing it.
I have one question.
I could not find OrnsteinUhlenbeckProcess being used in main.py or maddpg.py.
Are you not using this process?
Thanks.
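
For context, an Ornstein-Uhlenbeck process is often used in DDPG-style code to generate temporally correlated exploration noise. The sketch below is a generic Euler-Maruyama discretisation written for this note; the class name `OUNoise`, its parameters and their default values are illustrative assumptions and do not describe the OrnsteinUhlenbeckProcess class in this repo or whether it is used.

```python
import math
import random


class OUNoise:
    """Discretised Ornstein-Uhlenbeck process (Euler-Maruyama step).

    Parameter names and defaults are illustrative assumptions.
    """

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = random.Random(seed)
        self.state = [mu] * size

    def reset(self):
        """Restart the process at its long-run mean."""
        self.state = [self.mu] * len(self.state)

    def sample(self):
        """Advance one step and return a copy of the new state."""
        # x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

In a typical setup the sample would be added to the actor's output before the action is clipped, with `reset()` called at the start of each episode.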

ConnectionRefusedError

When I run it, the following appears:
`------------------------------------------------------------
/home/clb/anaconda3/envs/tensorflow/bin/python /home/clb/桌面/pytorch-maddpg-master/main.py
Setting up a new session...
Exception in user code:

Traceback (most recent call last):
File "/home/clb/anaconda3/envs/tensorflow/lib/python3.6/site-packages/urllib3-1.25.8-py3.6.egg/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/home/clb/anaconda3/envs/tensorflow/lib/python3.6/site-packages/urllib3-1.25.8-py3.6.egg/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/home/clb/anaconda3/envs/tensorflow/lib/python3.6/site-packages/urllib3-1.25.8-py3.6.egg/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused