replicable-marl / marllib

One repository is all that is necessary for Multi-agent Reinforcement Learning (MARL)

Home Page: https://marllib.readthedocs.io

License: MIT License

Python 92.58% CMake 0.03% Shell 0.08% C++ 6.50% C 0.46% Dockerfile 0.09% Jupyter Notebook 0.24% Jsonnet 0.01%
deep-reinforcement-learning multi-agent-reinforcement-learning pytorch ray rllib

marllib's People

Contributors

hccz95, hhhusiyi-monash, ivan-zhong, janrope, mrvgao, pku-yyang, theohhhu, wwxfromtju


marllib's Issues

UnsupportedSpaceException and TuneError

Hi, I saw the previous TuneError issue and switched to the new APIs, but the bug still persists.

I ran the code below:

from marllib import marl
env = marl.make_env(environment_name="mpe", map_name="simple_spread")
iddpg = marl.algos.iddpg(hyperparam_source="mpe")
model = marl.build_model(env, iddpg, {"core_arch": "mlp", "encode_layer": "128-256"})
iddpg.fit(env, model, stop={"timesteps_total": 1000000}, checkpoint_freq=100, share_policy="group")

The error is as follows:

(pid=125495) File "/home/hjl/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/ddpg/ddpg_tf_policy.py", line 436, in validate_spaces
(pid=125495) raise UnsupportedSpaceException(
(pid=125495) ray.rllib.utils.error.UnsupportedSpaceException: Action space (Discrete(5)) of <ray.rllib.policy.policy_template.IDDPGTorchPolicy object at 0x7f62a9d889d0> is not supported for DDPG.
(IDDPGTrainer pid=125496)
Traceback (most recent call last):
File "/home/hjl/桌面/代码测试/main.py", line 9, in
iddpg.fit(env, model, stop={"timesteps_total": 1000000}, checkpoint_freq=100, share_policy="group")
File "/home/hjl/anaconda3/envs/marllib/lib/python3.8/site-packages/MARLlib-master/marllib/marl/init.py", line 309, in fit
run_il(self.config_dict, env_instance, model_class, stop=stop)
File "/home/hjl/anaconda3/envs/marllib/lib/python3.8/site-packages/MARLlib-master/marllib/marl/algos/run_il.py", line 196, in run_il
results = POlICY_REGISTRY[exp_info["algorithm"]](model, exp_info, run_config, env_info, stop_config,
File "/home/hjl/anaconda3/envs/marllib/lib/python3.8/site-packages/MARLlib-master/marllib/marl/algos/scripts/iddpg.py", line 109, in run_iddpg
results = tune.run(IDDPGTrainer,
File "/home/hjl/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/tune.py", line 624, in run
raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [IDDPGTrainer_mpe_simple_spread_cb637_00000])
(RolloutWorker pid=125495)

Could anyone help me?
Thanks!!!
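A possible workaround, purely my assumption from the error message rather than anything from this thread: DDPG-family algorithms need a continuous (Box) action space, while simple_spread defaults to Discrete(5), so either the MPE task would have to be created with continuous actions or a discrete-action algorithm used instead. A minimal sketch, assuming make_env forwards a continuous_actions override to the MPE config:

from marllib import marl

# Assumption: MPE tasks accept a continuous_actions override, which gives the
# Box action space that DDPG-style learners (iddpg/maddpg) require.
env = marl.make_env(environment_name="mpe", map_name="simple_spread",
                    continuous_actions=True)
iddpg = marl.algos.iddpg(hyperparam_source="mpe")
model = marl.build_model(env, iddpg, {"core_arch": "mlp", "encode_layer": "128-256"})
iddpg.fit(env, model, stop={"timesteps_total": 1000000},
          checkpoint_freq=100, share_policy="group")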

Documentation about the hyperparameters

A comment in marl/algos/hyperparams/finetuned/mpe/maddpg.yaml suggests

# Detailed explanation for each hyper parameter can be found in ray/rllib/agents/ddpg/ddpg.py

However, it looks like Ray has since updated its documentation; there are no longer detailed explanations in ray/rllib/agents/ddpg/ddpg.py.
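As a stopgap, purely my own suggestion rather than anything from this issue, the defaults and their inline comments can still be read from the Ray version MARLlib pins (ray 1.8.x). A minimal sketch, assuming that version still exposes DEFAULT_CONFIG for DDPG:

import inspect
from ray.rllib.agents import ddpg

# Print the default DDPG hyperparameters for the installed Ray version,
# and locate the installed package; ddpg.py next to it holds the commented defaults.
print(ddpg.DEFAULT_CONFIG)
print(inspect.getsourcefile(ddpg))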

how to render trained model

Hello.
After training, I confirmed that the logs and checkpoints were saved.
Can I render the trained model using these files? How can I do that?
(I want to reload the saved results and reproduce them.)
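For reference, a minimal sketch of rendering from a saved run, modelled on MARLlib's load_and_render_model example; the restore_path keys are my assumption of that layout, and the paths are placeholders for your own params.json and checkpoint file:

from marllib import marl

env = marl.make_env(environment_name="mpe", map_name="simple_spread")
mappo = marl.algos.mappo(hyperparam_source="mpe")
model = marl.build_model(env, mappo, {"core_arch": "mlp", "encode_layer": "128-256"})

# Placeholder paths: point these at the files written by your own training run.
mappo.render(env, model,
             local_mode=True,
             restore_path={"params_path": "exp_results/<your_run>/params.json",
                           "model_path": "exp_results/<your_run>/checkpoint_000100/checkpoint-100"})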

Petting zoo Sisl

Hi

Can you please provide support for petting zoo Sisl environments like waterworld, multi walker etc

Thanks

[Solved][Bug]executing load_and_render_model.py

I'm new to MARLlib and am currently in the process of understanding all the great things it can do :)
Unfortunately, when executing python load_and_render_model.py from the examples directory, I get the following error:

2023-06-13 16:57:15,259 ERROR trial_runner.py:1124 -- Trial MAPPOTrainer_mpe_simple_spread_95240_00000: Error processing restore.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/ray/tune/trial_runner.py", line 1117, in _process_trial_restore
    self.trial_executor.fetch_result(trial)
  File "/opt/conda/lib/python3.9/site-packages/ray/tune/ray_trial_executor.py", line 788, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/opt/conda/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/ray/worker.py", line 1625, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::MAPPOTrainer.restore_from_object() (pid=144888, repr=MAPPOTrainer)
  File "/opt/conda/lib/python3.9/site-packages/ray/tune/trainable.py", line 433, in restore_from_object
    self.restore(checkpoint_path)
  File "/opt/conda/lib/python3.9/site-packages/ray/tune/trainable.py", line 411, in restore
    self.load_checkpoint(checkpoint_path)
  File "/opt/conda/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 830, in load_checkpoint
    self.__setstate__(extra_data)
  File "/opt/conda/lib/python3.9/site-packages/ray/rllib/agents/trainer_template.py", line 289, in __setstate__
    Trainer.__setstate__(self, state)
  File "/opt/conda/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 1813, in __setstate__
    self.workers.local_worker().restore(state["worker"])
  File "/opt/conda/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1274, in restore
    objs = pickle.loads(objs)
TypeError: an integer is required (got type bytes)

I'd appreciate any pointer to what is maybe going wrong. Thank you !

Mamujoco v4

Hi

Currently, the supported version of MuJoCo is v2, which lacks many features available in v4. Could you please update MuJoCo to v4?

Thank you.

TuneError

When I run the command below:

python3 marl/main.py --algo_config=qmix [--finetuned] --env_config=smac with env_args.map_name=3m

I got this error:

(pid=341594) [2023-03-20 15:46:49,756 E 341594 341920] raylet_client.cc:159: IOError: Broken pipe [RayletClient] Failed to disconnect from raylet.
Traceback (most recent call last):
File "marl/main.py", line 53, in
run_vd(config_dict)
File "/media/user/APPS/AI Safety/MultiAgents RL/Algorithms/MARLlib/marl/algos/run_vd.py", line 218, in run_vd
results = POlICY_REGISTRY[config_dict["algorithm"]](config_dict, common_config, env_info_dict, stop)
File "/media/user/APPS/AI Safety/MultiAgents RL/Algorithms/MARLlib/marl/algos/scripts/vdn_qmix_iql.py", line 78, in run_joint_q
results = tune.run(Trainer,
File "/home/user/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/tune.py", line 624, in run
raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [QMIX_grouped_smac_3m_43fcd_00000])
(pid=341594) CloseHandler: 127.0.0.1:51406 disconnected

Could anyone help me with that?

I'm using:
Ubuntu: 20.04
Python: 3.8.16 (conda)
torch: 1.13.1+cu117
ray: 1.8.0

PettingZoo version and missing dependencies

To replicate the results of your work in the MPE environment, the following packages need to be installed (they are not mentioned in your documentation):

pip install gym==0.21.0
pip install pettingzoo==1.21.0
pip install supersuit==3.3.0
pip install icecream

From progress.csv to result.csv

Hello,
maybe I've overlooked this or it's basic knowledge, but while looking at your example results I noticed how different the CSV looks.
If I understood correctly (and my small training run worked), I "only" get a progress.csv and a result.json, and the progress.csv contains a lot of information that is really hard to read. How did you produce your result.csv, and which data did you take out of the progress.csv?

Thank you, and sorry if this is an inconvenience.
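Not an answer from the maintainers, just a sketch of one way to distill progress.csv into a smaller file; the column names are standard RLlib/Tune metrics and are an assumption here, so adjust them to whatever appears in your own progress.csv header:

import pandas as pd

# Load the raw Tune progress log and keep only a few headline metrics.
df = pd.read_csv("progress.csv")
cols = ["timesteps_total", "episode_reward_mean",
        "episode_reward_min", "episode_reward_max"]
df[cols].to_csv("result.csv", index=False)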

ValueError: The parameter loc has invalid values

With local_mode enabled the code does not report an error, but with local_mode=False the following error is reported every time the 36th iteration is reached:
Failure # 1 (occurred at 2023-05-09_21-08-13)
Traceback (most recent call last):
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\tune\trial_runner.py", line 890, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\tune\ray_trial_executor.py", line 788, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray_private\client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\worker.py", line 1625, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::VDA2CTrainer.train() (pid=23324, ip=127.0.0.1, repr=VDA2CTrainer)
File "E:\Linghao\MARLlib-sy_dev_0\marllib\marl\algos\core\VD\vda2c.py", line 65, in value_mix_actor_critic_loss
dist = dist_class(logits, model)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\rllib\models\torch\torch_action_dist.py", line 186, in init
self.dist = torch.distributions.normal.Normal(mean, torch.exp(log_std))
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\torch\distributions\normal.py", line 50, in init
super(Normal, self).init(batch_shape, validate_args=validate_args)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\torch\distributions\distribution.py", line 53, in init
raise ValueError("The parameter {} has invalid values".format(param))
ValueError: The parameter loc has invalid values

The above exception was the direct cause of the following exception:

ray::VDA2CTrainer.train() (pid=23324, ip=127.0.0.1, repr=VDA2CTrainer)
File "python\ray\_raylet.pyx", line 558, in ray._raylet.execute_task
File "python\ray\_raylet.pyx", line 596, in ray._raylet.execute_task
File "python\ray\_raylet.pyx", line 565, in ray._raylet.execute_task
File "python\ray\_raylet.pyx", line 569, in ray._raylet.execute_task
File "python\ray\_raylet.pyx", line 519, in ray._raylet.execute_task.function_executor
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\_private\function_manager.py", line 576, in actor_method_executor
return method(__ray_actor, *args, **kwargs)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\util\tracing\tracing_helper.py", line 451, in _resume_span
return method(self, *_args, **_kwargs)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\rllib\agents\trainer.py", line 682, in train
raise e
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\rllib\agents\trainer.py", line 668, in train
result = Trainable.train(self)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\tune\trainable.py", line 283, in train
result = self.step()
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\util\tracing\tracing_helper.py", line 451, in _resume_span
return method(self, *_args, **_kwargs)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\rllib\agents\trainer_template.py", line 206, in step
step_results = next(self.train_exec_impl)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\util\iter.py", line 756, in next
return next(self.built_iterator)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
for item in it:
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\util\iter.py", line 843, in apply_filter
for item in it:
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\util\iter.py", line 843, in apply_filter
for item in it:
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\util\iter.py", line 791, in apply_foreach
result = fn(item)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\rllib\execution\train_ops.py", line 230, in call
results = policy.learn_on_loaded_batch(
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\rllib\policy\torch_policy.py", line 632, in learn_on_loaded_batch
return self.learn_on_batch(batch)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\rllib\utils\threading.py", line 21, in wrapper
return func(self, *a, **k)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\rllib\policy\torch_policy.py", line 529, in learn_on_batch
grads, fetches = self.compute_gradients(postprocessed_batch)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\rllib\policy\policy_template.py", line 336, in compute_gradients
return parent_cls.compute_gradients(self, batch)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\rllib\utils\threading.py", line 21, in wrapper
return func(self, *a, **k)
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\rllib\policy\torch_policy.py", line 709, in compute_gradients
tower_outputs = self._multi_gpu_parallel_grad_calc(
File "C:\Users\DELL\miniconda3\envs\marllib4\lib\site-packages\ray\rllib\policy\torch_policy.py", line 1083, in _multi_gpu_parallel_grad_calc
raise last_result[0] from last_result[1]
ValueError: The parameter loc has invalid values
In tower 0 on device cpu

There should be no conflicting packages at the moment

Updating of critic model in COMA etc

Hi.

I am reading through MARLlib's implementation for a related academic project. In looking at the COMA model implementation, I see functions to update the critic (and actor) models by calling backward() and step() on the gradients. However, I cannot see how and where these are ever called, so it's not clear to me how you are getting RLLib to update the critic weights. Is this done implicitly because cc_rnn subclasses both nn.Module and TorchModel through the parent base_rnn?

Thanks for producing this library as well. It's a mammoth effort :)

Regarding inferencing the learnt policy

Hi,
We have created our custom environment and wrapped it in a Gym class. After training with MAPPO, we got the .pkl files. Can you elaborate on how to run inference with the learned policy?
We already have a visualization of the env using pygame and just want to load the learned policies and see them play.

Thanks in advance.

Enhancing the model architecture customization?

Hi! We are trying to apply MARLlib to fulfill our task. The observation from our custom environment can be a little too complicated for a simple MLP/CNN encoder. We want to apply a pretrained model to improve the feature extraction.

Furthermore, the decision network, which is currently RNN-based or MLP-based, can only be tuned through a few parameters in the config file. It would be great if we could directly use a self-designed torch model (which would also solve the problem of loading a pretrained model), or at least have the full customization ability that Ray offers.

Is there any plan to enhance these capabilities?

Upgrade Ray and Gym

It would be great to upgrade Ray to v2.5 and Gym to Gymnasium to ensure compatibility.

What version of MAgent is compatible with MARLlib?

According to the docs about Environments, it should be fine to just use pip install pettingzoo[magent]. However, the latest version of PettingZoo has moved MAgent to a dedicated project, so it's unusable now.

I've compared the code in envs/base_env/magent.py with pettingzoo's previous version.

from pettingzoo.magent import adversarial_pursuit_v3, battle_v3, battlefield_v3, combined_arms_v5, gather_v3, \
    tiger_deer_v3
REGISTRY = {}
REGISTRY["adversarial_pursuit"] = adversarial_pursuit_v3.parallel_env
REGISTRY["battle"] = battle_v3.parallel_env
REGISTRY["battlefield"] = battlefield_v3.parallel_env
REGISTRY["combined_arms"] = combined_arms_v5.parallel_env
REGISTRY["gather"] = gather_v3.parallel_env
REGISTRY["tiger_deer"] = tiger_deer_v3.parallel_env

It seems like we have to use a version before 1.15.0 (e.g. 1.14.0). Yet, after pip install pettingzoo[magent]==1.14.0 and running a test, it reports: "cannot import name 'aec_to_parallel' from 'pettingzoo.utils.conversions'".

I've tried newer versions of MAgent and changed magent.py accordingly, but they raise other problems.

So which version exactly has been tested?

MetaDrive env: "AttributeError: 'NoneType' object has no attribute 'reset'"

Hello,

I am trying to run an algorithm on a MetaDrive environment as follows:

python marl/main.py --algo_config=mappo --env_config=metadrive with env_args.map_name=Roundabout

I have installed the MetaDrive environment as per the documentation:

pip install metadrive-simulator==0.2.3

But I get the following error after training for some time:

2023-01-31 09:56:28,733 ERROR trial_runner.py:924 -- Trial MAPPOTrainer_metadrive_Roundabout_45631_00000: Error processing event.
Traceback (most recent call last):
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 890, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 788, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/worker.py", line 1625, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::MAPPOTrainer.train_buffered() (pid=20470, ip=192.168.0.98, repr=MAPPOTrainer)
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trainable.py", line 224, in train_buffered
    result = self.train()
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 682, in train
    raise e
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 668, in train
    result = Trainable.train(self)
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trainable.py", line 283, in train
    result = self.step()
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 240, in step
    evaluation_metrics = self.evaluate()
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 958, in evaluate
    self.evaluation_workers.local_worker().sample()
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 753, in sample
    batches = [self.input_reader.next()]
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 103, in next
    batches = [self.get_data()]
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 233, in get_data
    item = next(self._env_runner)
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 586, in _env_runner
    base_env.poll()
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/env/base_env.py", line 422, in poll
    obs[i], rewards[i], dones[i], infos[i] = env_state.poll()
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/env/base_env.py", line 478, in poll
    self.reset()
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/env/base_env.py", line 528, in reset
    self.last_obs = self.env.reset()
  File "/home/oscar/msc/MARLlib/envs/base_env/metadrive.py", line 69, in reset
    original_obs = self.env.reset()
  File "/home/oscar/anaconda3/envs/marllib/lib/python3.8/site-packages/metadrive/envs/base_env.py", line 291, in reset
    self.engine.reset()
AttributeError: 'NoneType' object has no attribute 'reset'

Any guidance would be appreciated,
Thanks.

[Solved] colab tutorial error

It seems that your Colab tutorial has some problems with the environment installation. What is your Python version? This is not an urgent problem, but it affects beginners trying to learn the project. (* ̄︶ ̄)

Question about the patches

Hi, I'm working with this wonderful repo.
I am new to RLlib as well as MARLlib, and I want to know why and how the 'patches' are used, as shown in the screenshots below (images omitted).
Thanks for your help!

[Bug][Solved]opponent next action inference in centralized Q critic

Is the centralized_critic_q postprocessing only implemented for MADDPG algorithm?

# grab the opponent next action manually
all_opponent_batch_next_action_ls = []
for opp_index in range(opponent_agents_num):
    opp_policy = opponent_batch_list[opp_index][0]
    opp_batch = copy.deepcopy(opponent_batch[opp_index])
    input_dict = {}
    input_dict["obs"] = {}
    input_dict["obs"]["obs"] = opp_batch["new_obs"][:,
                               action_mask_dim: action_mask_dim + obs_dim]
    seq_lens = opp_batch["seq_lens"]
    state_ls = []
    start_point = 0
    for seq_len in seq_lens:
        state = convert_to_torch_tensor(opp_batch["state_out_0"][start_point], policy.device)
        start_point += seq_len
        state_ls.append(state)
    state = [torch.stack(state_ls, 0)]
    input_dict = convert_to_torch_tensor(input_dict, policy.device)
    seq_lens = convert_to_torch_tensor(seq_lens, policy.device)
    opp_next_action, _ = opp_policy.model.policy_model(input_dict, state, seq_lens)
    opp_next_action = convert_to_numpy(opp_next_action)
    all_opponent_batch_next_action_ls.append(opp_next_action)
sample_batch["next_opponent_actions"] = np.stack(
    all_opponent_batch_next_action_ls, 1)

To the best of my knowledge of RLlib, the implementation above only works under strict conditions: it only works for DDPG/TD3 policies with both normalize_actions and clip_actions set to False. For other policy types it breaks down, because (see the sketch after this list):

  1. policy.model doesn't have the attribute policy_model.
  2. In the sample batch, SampleBatch.ACTIONS holds the actual actions taken while interacting with the environment. For example, for discrete action spaces, SampleBatch.ACTIONS contains integers, not action logits.
  3. The mapping from action logits to an action distribution is missing.
  4. Action unsquashing (unnormalizing actions from [-1, +1] back to [low, high]) is missing.
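A minimal sketch of what points 3 and 4 could look like, assuming Ray 1.8's torch policy API; this is my own illustration, not code from MARLlib, and the variable names come from the snippet above:

from ray.rllib.utils.numpy import convert_to_numpy
from ray.rllib.utils.spaces.space_utils import unsquash_action

# Point 3: map the model output (logits) to an action distribution and sample from it.
action_dist = opp_policy.dist_class(opp_next_action, opp_policy.model)
sampled = action_dist.deterministic_sample()

# Point 4: unsquash from the normalized [-1, +1] range back to the env's [low, high].
opp_next_action = unsquash_action(convert_to_numpy(sampled),
                                  opp_policy.action_space_struct)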

How to save SMAC replays

Hi !

Thanks a lot for the nice library that I am still discovering.
I am currently trying to save SMAC replays after having successfully trained a MAPPO algorithm on the "3m" map.
However, I can't figure out where the call to SMAC's "save_replay()" should be done when calling the render() method.
Could you help me, and maybe add this point to the documentation, as it may be useful for other users?

Thank you !
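Not an official answer, just the rough idea I would try: SMAC's own StarCraft2Env exposes save_replay(), so the replay has to be requested from the underlying SMAC env after the evaluation episodes have been played. A hypothetical, MARLlib-independent sketch using SMAC directly:

from smac.env import StarCraft2Env

# Placeholder setup: drive this env with your trained policy, then dump the replay.
sc2 = StarCraft2Env(map_name="3m", replay_dir="replays")
sc2.reset()
# ... run one or more evaluation episodes via sc2.step(actions) ...
sc2.save_replay()  # writes a .SC2Replay file for the episodes just played
sc2.close()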

ERROR:AttributeError: 'MAA2CTrainer' object has no attribute '_local_ip'

Something goes wrong when I run the example:
python marl/main.py --algo_config=maa2c --env_config=mpe with env_args.map_name=simple_adversary

Here is the log:
/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/pettingzoo/utils/conversions.py:91: UserWarning: The observation_spaces dictionary is deprecated. Use the observation_space function instead.
warnings.warn(
/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/pettingzoo/utils/conversions.py:105: UserWarning: The action_spaces dictionary is deprecated. Use the action_space function instead.
warnings.warn(
use fc encoder
2022-11-10 10:48:46,928 WARNING sample.py:401 -- DeprecationWarning: wrapping <function run_cc.. at 0x7f4908387700> with tune.function() is no longer needed
2022-11-10 10:48:47,099 WARNING worker.py:496 -- ray.get_gpu_ids() will always return the empty list when called from the driver. This is because Ray does not manage GPU allocations to the driver process.
:task_name:bundle_reservation_check_func
:actor_name:MAA2CTrainer
2022-11-10 10:48:47,183 WARNING deprecation.py:38 -- DeprecationWarning: simple_optimizer has been deprecated. This will raise an error in the future!
2022-11-10 10:48:47,183 INFO trainer.py:770 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
[2022-11-10 10:48:47,194 E 10507 10507] core_worker.cc:1561: Pushed Error with JobID: 01000000 of type: task with message: ray::MAA2CTrainer.__init__() (pid=10507, ip=10.31.217.80, repr=MAA2CTrainer)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 137, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 623, in __init__
super().__init__(config, logger_creator)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trainable.py", line 107, in __init__
self.setup(copy.deepcopy(self.config))
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 147, in setup
super().setup(config)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 776, in setup
self._init(self.config, self.env_creator)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 171, in _init
self.workers = self._make_workers(
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 858, in _make_workers
return WorkerSet(
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 110, in init
self._local_worker = self._make_worker(
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 406, in _make_worker
worker = cls(
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 584, in init
self._build_policy_map(
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1384, in build_policy_map
self.policy_map.create_policy(name, orig_cls, obs_space, act_space,
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/policy/policy_map.py", line 143, in create_policy
self[policy_id] = class
(observation_space, action_space,
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/policy/policy_template.py", line 241, in init
dist_class, logit_dim = ModelCatalog.get_action_dist(
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/models/catalog.py", line 287, in get_action_dist
raise NotImplementedError("Unsupported args: {} {}".format(
NotImplementedError: Unsupported args: Discrete(5) None at time: 1.66805e+09
2022-11-10 10:48:47,195 ERROR actor.py:746 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::MAA2CTrainer.__init__() (pid=10507, ip=10.31.217.80)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 137, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 623, in __init__
super().__init__(config, logger_creator)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trainable.py", line 107, in __init__
self.setup(copy.deepcopy(self.config))
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 147, in setup
super().setup(config)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 776, in setup
self._init(self.config, self.env_creator)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 171, in _init
self.workers = self._make_workers(
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 858, in _make_workers
return WorkerSet(
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 110, in init
self._local_worker = self._make_worker(
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 406, in _make_worker
worker = cls(
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 584, in init
self._build_policy_map(
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1384, in build_policy_map
self.policy_map.create_policy(name, orig_cls, obs_space, act_space,
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/policy/policy_map.py", line 143, in create_policy
self[policy_id] = class
(observation_space, action_space,
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/policy/policy_template.py", line 241, in init
dist_class, logit_dim = ModelCatalog.get_action_dist(
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/models/catalog.py", line 287, in get_action_dist
raise NotImplementedError("Unsupported args: {} {}".format(
NotImplementedError: Unsupported args: Discrete(5) None
[2022-11-10 10:48:47,197 E 10507 10507] core_worker.cc:1561: Pushed Error with JobID: 01000000 of type: task with message: ray::MAA2CTrainer.get_auto_filled_metrics()::Exiting (pid=10507, ip=10.31.217.80, repr=MAA2CTrainer)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trainable.py", line 179, in get_auto_filled_metrics
NODE_IP: self._local_ip,
AttributeError: 'MAA2CTrainer' object has no attribute '_local_ip' at time: 1.66805e+09
[2022-11-10 10:48:47,701 E 10507 10507] core_worker.cc:1561: Pushed Error with JobID: 01000000 of type: task with message: ray::MAA2CTrainer.train_buffered()::Exiting (pid=10507, ip=10.31.217.80, repr=MAA2CTrainer)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trainable.py", line 224, in train_buffered
result = self.train()
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 682, in train
raise e
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 668, in train
result = Trainable.train(self)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trainable.py", line 283, in train
result = self.step()
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 206, in step
step_results = next(self.train_exec_impl)
AttributeError: 'MAA2CTrainer' object has no attribute 'train_exec_impl' at time: 1.66805e+09
Traceback (most recent call last):
File "marl/main.py", line 42, in
run_cc(config_dict)
File "/home/zyy/Documents/rl/MARLlib/marl/algos/run_cc.py", line 182, in run_cc
results = POlICY_REGISTRY[config_dict["algorithm"]](config_dict, common_config, env_info_dict, stop)
File "/home/zyy/Documents/rl/MARLlib/marl/algos/scripts/maa2c.py", line 45, in run_maa2c
results = tune.run(MAA2CTrainer,
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/tune.py", line 603, in run
_report_progress(runner, progress_reporter)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/tune.py", line 68, in _report_progress
reporter.report(trials, done, sched_debug_str, executor_debug_str)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/progress_reporter.py", line 520, in report
print(self._progress_str(trials, done, *sys_info))
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/progress_reporter.py", line 279, in _progress_str
user_metrics = self._infer_user_metrics(trials, self._infer_limit)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/progress_reporter.py", line 325, in _infer_user_metrics
if not t.last_result:
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trial.py", line 433, in last_result
self._get_default_result_or_future()
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trial.py", line 409, in _get_default_result_or_future
self._default_result_or_future = ray.get(
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/worker.py", line 1625, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::MAA2CTrainer.get_auto_filled_metrics()::Exiting (pid=10507, ip=10.31.217.80, repr=MAA2CTrainer)
File "/home/zyy/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trainable.py", line 179, in get_auto_filled_metrics
NODE_IP: self._local_ip,
AttributeError: 'MAA2CTrainer' object has no attribute '_local_ip'

Trial Pending

I am trying to run MADDPG on MPE (simple_adversary). However, the run gets stuck, with the log showing that the trial is pending.

I saw a similar issue here: ray-project/ray#16425. It says that this is due to resource allocation.

I am using a 16-core CPU and a GTX 1650S GPU, so I tried to set the ray.yaml file as follows:

num_workers: 2
num_gpus: 1
num_cpus_per_worker: 8
num_gpus_per_worker: 0.5

I also tried several different options such as:

num_workers: 2
num_gpus: 1
num_cpus_per_worker: 1
num_gpus_per_worker: 0.3

However, no matter how I alter these settings, the trial still stays pending.
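A likely explanation, offered as my assumption rather than anything from this thread: Tune keeps a trial pending when the requested resources exceed what the machine offers. Both configs above ask for more than one GPU in total (1 for the trainer plus 2 x 0.5 or 2 x 0.3 for the workers), and the first also asks for 17 CPUs (1 driver plus 2 x 8 workers) on a 16-core machine. A setting that fits within 1 GPU and 16 CPUs would look roughly like:

num_workers: 2
num_gpus: 0.5
num_cpus_per_worker: 1
num_gpus_per_worker: 0.25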

Heterogeneous policy?

Can the configuration files be changed so that each agent uses a different algorithm during training? For example, one agent uses the IPPO algorithm while another agent uses the IQL algorithm.

VMAS integration

Hello,

We have recently developed a vectorised version of MPE with more environments and robotics scenarios.

https://github.com/proroklab/VectorizedMultiAgentSimulator

It is by default compatible with the VectorEnv RLLib interface.
Would this work straight away in your framework? Are you interested in adding it to the list of supported envs?

Have a look at the project and let me know.

Best,

Matteo

Custom Environment

Hi, in the context of my research I made my own environment and I am using RLlib to solve it, but with not much success so far. I came across this project, find it amazing, and have two questions:

  1. Can the MARLlib algorithms solve a custom environment? From the documentation it seems they are only available for specific environments. (A sketch of what a custom-environment wrapper could look like follows below.)

  2. Are these algorithms implemented following the RLlib API? Would the RLlib team have interest in integrating them into the project?

Best, and thank you.
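As the sketch promised above: custom environments generally reach MARLlib through an RLlib MultiAgentEnv wrapper. The class below is a hypothetical minimal example, written against Ray 1.8 / gym 0.21 as pinned by MARLlib; the per-agent "obs" dict layout mirrors what MARLlib's built-in wrappers expose, and all names here are placeholders rather than an official API:

import numpy as np
from gym.spaces import Box, Dict, Discrete
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class MyTwoAgentEnv(MultiAgentEnv):
    # Toy two-agent environment in the shape MARLlib-style wrappers use.

    def __init__(self, env_config=None):
        self.agents = ["agent_0", "agent_1"]
        self.num_agents = len(self.agents)
        # Per-agent observation wrapped in a dict under the "obs" key.
        self.observation_space = Dict({"obs": Box(-1.0, 1.0, shape=(4,), dtype=np.float32)})
        self.action_space = Discrete(3)
        self._t = 0

    def reset(self):
        self._t = 0
        return {a: {"obs": np.zeros(4, dtype=np.float32)} for a in self.agents}

    def step(self, action_dict):
        self._t += 1
        obs = {a: {"obs": np.random.uniform(-1, 1, 4).astype(np.float32)} for a in self.agents}
        rew = {a: 0.0 for a in self.agents}
        done = {"__all__": self._t >= 25}
        return obs, rew, done, {}

The repository's examples show how a class like this can be registered so that marl.make_env can build it.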

Integrating with VMAS vectorized simulator

Hello,

In my lab we have created a MARL simulator and benchmarking platform called VMAS:
https://github.com/proroklab/VectorizedMultiAgentSimulator.

It is a vectorized simulator using pytorch which contains all the Multiagent Particle Environments scenarios and an additional set of 12 multi-robot scenarios.

Have a look at the repo, it would be nice to make our environments available in your project and it should be pretty easy since we support the RLLib VectorEnv interface.

Can I use MARLlib in my custom environment?

Hello there, I'm really impressed with the work that your project has accomplished so far. However, I noticed that it seems to only support the 15 environments that are currently mentioned in the readme and guide. I'm interested in using your framework for my work and I'm wondering if it would be possible to implement support for additional or custom environments. Alternatively, would it be something that you plan to include in your future work?

Hyperparameters in sy_dev vs main branch

Hi,

It seems a lot of the finetuned hyperparameters from the main branch are missing in the sy_dev branch. Are the hyperparameters from the main branch still valid in the new sy_dev branch as well?

Thank you.

Problem on implementation of HAPPO

After testing HAPPO, I found that in happo_surrogate_loss no other agents are considered for each agent. I wonder whether this is a problem.

AttributeError: module 'gym.wrappers' has no attribute 'Monitor'

Got this error after installing MARLlib in a fresh conda environment 😥

The script I ran is the first example in README.md. I had also run marllib/patch/add_patch.py beforehand.

PS: It can be fixed by manually downgrading gym to <=0.21.0, but I'm not sure whether that breaks something else.

Error: No matching distribution found for gym==1.21.0

The error is described below:

(marllib) [[email protected] scripts]$ pip install gym==1.21.0
ERROR: Could not find a version that satisfies the requirement gym==1.21.0 (from versions: 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.2.10, 0.2.11, 0.2.12, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.4.5, 0.4.6, 0.4.8, 0.4.9, 0.4.10, 0.5.0, 0.5.1, 0.5.2, 0.5.3, 0.5.4, 0.5.5, 0.5.6, 0.5.7, 0.6.0, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.7.4, 0.8.0.dev0, 0.8.0, 0.8.1, 0.8.2, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.9.4, 0.9.5, 0.9.6, 0.9.7, 0.10.0, 0.10.1, 0.10.2, 0.10.3, 0.10.4, 0.10.5, 0.10.8, 0.10.9, 0.10.11, 0.11.0, 0.12.0, 0.12.1, 0.12.4, 0.12.5, 0.12.6, 0.13.0, 0.13.1, 0.14.0, 0.15.3, 0.15.4, 0.15.6, 0.15.7, 0.16.0, 0.17.0, 0.17.1, 0.17.2, 0.17.3, 0.18.0, 0.18.3, 0.19.0, 0.20.0, 0.21.0, 0.22.0, 0.23.0, 0.23.1, 0.24.0, 0.24.1, 0.25.0, 0.25.1, 0.25.2, 0.26.0, 0.26.1, 0.26.2)
ERROR: No matching distribution found for gym==1.21.0

Do you mean 0.21.0?

Example Policy is very poor

Hi there, I ran the example given in the README, and the model basically doesn't learn at all. At time step 0 the mean reward was -116.198, with episode_reward_max -69 and episode_reward_min -191.415; by the end of training (313 iterations, 1,001,600 timesteps) the minimum reward is lower (-206.354), the maximum reward is only marginally higher (-60.4847), and the mean reward is barely changed (-105.824).

Is this an expected result? I am using the model exactly as provided in the README file.

Backward compatibility

Hi! I'm glad to see the latest upgrade of MARLlib with up-to-date documentation. But I'm a bit confused about the relationship between the new API-based usage and the previous console-based usage, so I want to confirm a few things:

  1. I don't see main.py anymore; does this mean that the console-based usage is completely deprecated?
  2. Besides the change in usage, are there any algorithm-related improvements? (Is there detailed version info?)
  3. What is the relationship between the previously required 4 config files and the new API? Also, it seems like there are some new configuration options, for example:
# build agent model based on env + algorithms + user preference
model = marl.build_model(env, mappo, {"core_arch": "mlp", "encode_layer": "128-256"})

# start training
mappo.fit(env, model, stop={"timesteps_total": 1000000}, checkpoint_freq=100, share_policy="group")

Is there full documentation of what exactly can be configured, or should I just refer to Ray's documentation?

Regarding inference of the learned policy after training

Hello,
I tried the method mentioned in one of the issues for inference, but I am running into problems loading the environment with the config files saved after training. I trained a custom environment with MAPPO and saved the checkpoints. Any help on this would be appreciated.

Documentation outdated and installation irreproducible

There is a large mismatch between the README and the docs. The docs seem to be for an older version of the project? E.g., the example training script from the docs does not work anymore (I believe it has been replaced by the main.py file so that the code can be run from the terminal?).

Also, what version of PettingZoo are you using? I am trying to run the MAgent examples, but by now there is an MAgent2 library which has taken over the PettingZoo[magent] repo. I have tried to find the older version of PettingZoo that matches your code to run the main.py file, but I cannot find it, and I keep getting errors because of this. Can you provide it, please, and also update your docs/readme asap? This is important for the long-term health of your project.

Thank you.

how can I use metadrive environment?

I want to use the MetaDrive environment,
but it shows the same error: "No module named 'metadrive.envs'".
I'm using

gym 0.20.0
metadrive 1.4.9.1
metadrive-simulator 0.2.3 (as written in the document)

Can you give me some help? Thank you.

Issues with MARLlib

Hello,

I am trying to run MARLlib and am running into issues. After following the installation for the PettingZoo MPE environment I am running into the following error:

ray.tune.error.TuneError: ('Trials did not complete', [IPPOTrainer_mpe_simple_adversary_227a8_00000])

This is caused by the following error:

(pid=8044)   File "path/to/.virtualenvs/venv/lib/python3.8/site-packages/ray/rllib/models/catalog.py", line 287, in get_action_dist
(pid=8044)     raise NotImplementedError("Unsupported args: {} {}".format(
(pid=8044) NotImplementedError: Unsupported args: Box(0.0, 1.0, (5,), float32) None

I am running it with the following config:

{
   "local_mode":false,
   "algorithm":"ppo",
   "env":"mpe",
   "env_args":{
      "map_name":"simple_adversary",
      "continuous_actions":true,
      "max_cycles":25
   },
   "algo_args":{
      "use_gae":true,
      "lambda":1.0,
      "kl_coeff":0.2,
      "batch_episode":10,
      "num_sgd_iter":5,
      "vf_loss_coeff":1.0,
      "lr":0.0005,
      "entropy_coeff":0.01,
      "clip_param":0.3,
      "vf_clip_param":10.0,
      "batch_mode":"complete_episodes"
   },
   "model_arch_args":{
      
   },
   "share_policy":"group",
   "evaluation_interval":10,
   "framework":"torch",
   "num_workers":0,
   "num_gpus":0,
   "num_cpus_per_worker":1,
   "num_gpus_per_worker":0,
   "stop_iters":9999999,
   "stop_timesteps":2000000,
   "stop_reward":999999,
   "seed":123,
   "mask_flag":false,
   "global_state_flag":false,
   "opp_action_in_cc":true
}

I am using WSL2 with a virtual environment inside. This is my pip list:

Package              Version    Editable project location
-------------------- ---------- -------------------------------
aiosignal            1.3.1
asttokens            2.2.1
async-timeout        4.0.2
attrs                22.2.0
certifi              2022.12.7
charset-normalizer   3.0.1
click                8.1.3
cloudpickle          2.2.1
colorama             0.4.6
contourpy            1.0.7
cycler               0.11.0
distlib              0.3.6
dm-tree              0.1.8
executing            1.2.0
filelock             3.9.0
fonttools            4.38.0
frozenlist           1.3.3
grpcio               1.51.3
gym                  0.21.0
gym-notices          0.0.8
gymnasium            0.27.1
gymnasium-notices    0.0.1
icecream             2.1.3
idna                 3.4
imageio              2.26.0
importlib-metadata   4.13.0
importlib-resources  5.12.0
jax-jumpy            0.2.0
jsonschema           4.17.3
kiwisolver           1.4.4
lz4                  4.3.2
marllib              1.0.0      /path/to/MARLlib
matplotlib           3.7.0
msgpack              1.0.4
networkx             3.0
numpy                1.24.2
packaging            23.0
pandas               1.5.3
PettingZoo           1.22.3
Pillow               9.4.0
pip                  22.3.1
pkgutil_resolve_name 1.3.10
platformdirs         3.0.0
protobuf             3.20.3
pygame               2.1.3.dev8
Pygments             2.14.0
pyparsing            3.0.9
pyrsistent           0.19.3
python-dateutil      2.8.2
pytz                 2022.7.1
PyWavelets           1.4.1
PyYAML               6.0
ray                  1.8.0
redis                4.5.1
requests             2.28.2
scikit-image         0.19.3
scipy                1.10.1
setuptools           65.5.1
six                  1.16.0
SuperSuit            3.7.1
tabulate             0.9.0
tensorboardX         2.6
tifffile             2023.2.3
torch                1.9.1
typing_extensions    4.5.0
urllib3              1.26.14
virtualenv           20.19.0
wheel                0.38.4
zipp                 3.15.0

What could cause this error? It seems to be an incompatibility issue with gym.

Thanks in advance!

ValueError: illegal action space

When I use QMIX with the MPE environment like this:
python marl/main.py --algo_config=qmix --env_config=mpe with env_args.map_name=simple_spread

it produces the problem below:
ValueError: illegal action space

I checked the value of space_act: it is Discrete(5) and appears to be an instance of Discrete, but the result of isinstance(space_act, Discrete) is False.

Why does this happen? Looking forward to your help.
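A common cause of this symptom, offered as an assumption rather than a confirmed answer: isinstance returns False when the Discrete class the environment was built with and the Discrete class being checked against come from two different installed copies or versions of gym. A quick diagnostic sketch (space_act is the variable from the report above):

import inspect
from gym.spaces import Discrete

# If these two paths differ, the action space was created by a different gym
# installation than the one MARLlib imports, and isinstance will return False
# even though the space prints as Discrete(5).
print(inspect.getfile(type(space_act)))
print(inspect.getfile(Discrete))
print(type(space_act) is Discrete)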

problem with action space

Hi,

Brilliant repo on MARL benchmarks!

I encountered issues with the action space when running training on MPE maps. On mpe_cooperative I run python marl/main.py --algo_config="a2c" --finetuned --env_config="mpe" with env_args.map_name="simple_spread" and get the following error (I followed the installation instructions, with Python 3.8.0 and gym==0.21.0):

Failure # 1 (occurred at 2022-12-09_06-28-12)
Traceback (most recent call last):
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 890, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 788, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/worker.py", line 1627, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::IA2CTrainer.__init__() (pid=25756, ip=10.103.0.40)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 137, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 623, in __init__
    super().__init__(config, logger_creator)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trainable.py", line 107, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 147, in setup
    super().setup(config)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 776, in setup
    self._init(self.config, self.env_creator)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 171, in _init
    self.workers = self._make_workers(
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 858, in _make_workers
    return WorkerSet(
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 110, in __init__
    self._local_worker = self._make_worker(
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 406, in _make_worker
    worker = cls(
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 539, in __init__
    policy_dict = _determine_spaces_for_multi_agent_dict(
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1486, in _determine_spaces_for_multi_agent_dict
    raise ValueError(
ValueError: `action_space` not provided in PolicySpec for shared_policy and env does not have an action space OR no spaces received from other workers' env(s) OR no `action_space` specified in config!

In addition, I get the following error when running on mpe_mixed:

Failure # 1 (occurred at 2022-12-09_06-26-45)
Traceback (most recent call last):
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 890, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 788, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/worker.py", line 1627, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::IA2CTrainer.__init__() (pid=24584, ip=10.103.0.40)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 137, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 623, in __init__
    super().__init__(config, logger_creator)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/tune/trainable.py", line 107, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 147, in setup
    super().setup(config)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 776, in setup
    self._init(self.config, self.env_creator)
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 171, in _init
    self.workers = self._make_workers(
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 858, in _make_workers
    return WorkerSet(
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 110, in __init__
    self._local_worker = self._make_worker(
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 406, in _make_worker
    worker = cls(
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 584, in __init__
    self._build_policy_map(
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1384, in _build_policy_map
    self.policy_map.create_policy(name, orig_cls, obs_space, act_space,
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/policy/policy_map.py", line 143, in create_policy
    self[policy_id] = class_(observation_space, action_space,
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/policy/policy_template.py", line 241, in __init__
    dist_class, logit_dim = ModelCatalog.get_action_dist(
  File "/home/yansong/anaconda3/envs/marllib/lib/python3.8/site-packages/ray/rllib/models/catalog.py", line 287, in get_action_dist
    raise NotImplementedError("Unsupported args: {} {}".format(
NotImplementedError: Unsupported args: Discrete(5) None

Any thoughts on these?

Great Thanks.

Regarding using trained policies

Hello,

We have created our custom environment and wrapped it in a Gym class. After training with MAPPO, we got checkpoint files including params and .pkl files.

We now want to use the trained policy to evaluate specific observations that we feed to it. How should we go about this?
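For what it's worth, a rough sketch of the generic RLlib route, offered as an assumption rather than MARLlib's documented API: once a Trainer has been restored from a checkpoint, its policies can be queried directly. Here trainer is assumed to be the restored Trainer object from your run, and the policy id and observation are placeholders:

# Hypothetical: query a restored RLlib trainer for actions on hand-picked observations.
policy = trainer.get_policy("shared_policy")        # placeholder policy id
obs = policy.observation_space.sample()             # replace with your own observation
action, state_out, info = policy.compute_single_action(obs, explore=False)
print(action)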

Older Version of MuJoCo

Could we have the latest versions of MuJoCo? For example Half_Cheetah_v3, since this one has different properties from the previous versions.

Whether it can be deployed on Windows

Hi, thank you very much for your work, which has inspired me a lot, but I still have some questions.
First, I want to deploy this project on Windows, but there are some errors. The error at mul_manager = multiprocessing.Manager() in marl\algos\utils\centralized_critic_hetero.py looks like the following:

d:\廖文华\code\MARLlib
Backend TkAgg is interactive backend. Turning interactive mode on.
d:\廖文华\code\MARLlib
Backend TkAgg is interactive backend. Turning interactive mode on.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "d:\廖文华\code\MARLlib\marl\main.py", line 9, in <module>
    from marl.algos.run_il import run_il
  File "d:\廖文华\code\MARLlib\marl\algos\run_il.py", line 15, in <module>
    from marl.algos.scripts import POlICY_REGISTRY
  File "d:\廖文华\code\MARLlib\marl\algos\scripts\__init__.py", line 13, in <module>
    from marl.algos.scripts.happo import run_happo
  File "d:\廖文华\code\MARLlib\marl\algos\scripts\happo.py", line 4, in <module>
    from marl.algos.core.CC.happo import HAPPOTrainer
  File "d:\廖文华\code\MARLlib\marl\algos\core\CC\happo.py", line 24, in <module>
    from marl.algos.utils.centralized_critic_hetero import (
  File "d:\廖文华\code\MARLlib\marl\algos\utils\centralized_critic_hetero.py", line 16, in <module>
    mul_manager = multiprocessing.Manager()
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\multiprocessing\context.py", line 57, in Manager
    m.start()
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\multiprocessing\managers.py", line 579, in start
    self._process.start()
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "C:\ProgramData\Anaconda3\envs\muti_agent\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

I don't know why this happens or how to solve it; could you please give me some advice?
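For reference, the RuntimeError text itself points at the standard Windows fix, which I'd summarize as an assumption rather than an official answer: on Windows, multiprocessing spawns new processes by re-importing the main module, so anything that creates a process at import time (such as the module-level multiprocessing.Manager() call mentioned above) must be reached only from under an if __name__ == '__main__': guard. A minimal illustration of the guard:

import multiprocessing


def main():
    # Create the manager only once we are actually running as the main program.
    mul_manager = multiprocessing.Manager()
    shared = mul_manager.dict()
    print(shared)


if __name__ == '__main__':
    multiprocessing.freeze_support()  # harmless when the script is not frozen
    main()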

Second, could you please show your pip list or conda list? I want to see the versions of some libraries in your project. Thanks!!!

Trained baseline models missing + some results missing?

Would it be possible to provide the trained models (weights) as well?

Moreover, when looking at the results, I could not find the results for the MAgent Gather game, though it is mentioned in the docs and in the MAgent env file.
