
benchmarl's Introduction

BenchMARL


python benchmarl/run.py algorithm=mappo task=vmas/balance


Watch the talk on multi-agent simulation and learning in BenchMARL and TorchRL.

What is BenchMARL 🧐?

BenchMARL is a Multi-Agent Reinforcement Learning (MARL) training library created to enable reproducibility and benchmarking across different MARL algorithms and environments. Its mission is to present a standardized interface that allows easy integration of new algorithms and environments to provide a fair comparison with existing solutions. BenchMARL uses TorchRL as its backend, which grants it high performance and state-of-the-art implementations. It also uses hydra for flexible and modular configuration, and its data reporting is compatible with marl-eval for standardised and statistically strong evaluations.

BenchMARL core design tenets are:

  • Reproducibility through systematic grounding and standardization of configuration
  • Standardised and statistically-strong plotting and reporting
  • Experiments that are independent of the algorithm, environment, and model choices
  • Breadth over the MARL ecosystem
  • Easy implementation of new algorithms, environments, and models
  • Leveraging the know-how and infrastructure of TorchRL, without reinventing the wheel

Why would I BenchMARL 🤔?

Why would you BenchMARL, I see you ask. Well, you can BenchMARL to compare different algorithms, environments, and models, to check how your new research stacks up against existing solutions, or simply to get a picture of the MARL landscape when you are new to the domain.

Table of contents

How to use

Notebooks

  • Running BenchMARL experiments (notebook, available on Colab).

Install

Install TorchRL

You can install TorchRL from PyPI.

pip install torchrl

For more details, or for installing nightly versions, see the TorchRL installation guide.

Install BenchMARL

You can just install it from PyPI

pip install benchmarl

Or clone it locally to access the configs and scripts

git clone https://github.com/facebookresearch/BenchMARL.git
pip install -e BenchMARL

Install environments

All environment dependencies are optional in BenchMARL and can be installed separately.

VMAS
pip install vmas
PettingZoo
pip install "pettingzoo[all]"
MeltingPot
pip install dm-meltingpot
SMACv2

Follow the instructions on the environment repository.

Here is how we install it on Linux.

Run

Experiments are launched with a default configuration that can be overridden in many ways. To learn how to customize and override configurations please refer to the configuring section.

Command line

To launch an experiment from the command line you can do

python benchmarl/run.py algorithm=mappo task=vmas/balance

Example

Thanks to hydra, you can run benchmarks as multi-runs like:

python benchmarl/run.py -m algorithm=mappo,qmix,masac task=vmas/balance,vmas/sampling seed=0,1

Example

The default implementation for hydra multi-runs is sequential, but parallel and slurm launchers are also available.
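
As a sketch (and assuming the corresponding hydra launcher plugin, e.g. hydra-joblib-launcher, is installed), a multi-run could be dispatched in parallel by overriding the launcher:

python benchmarl/run.py -m algorithm=mappo,qmix task=vmas/balance seed=0,1 hydra/launcher=joblib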

Script

You can also load and launch your experiments from within a script

from benchmarl.algorithms import MappoConfig
from benchmarl.environments import VmasTask
from benchmarl.experiment import Experiment, ExperimentConfig
from benchmarl.models.mlp import MlpConfig

experiment = Experiment(
    task=VmasTask.BALANCE.get_from_yaml(),
    algorithm_config=MappoConfig.get_from_yaml(),
    model_config=MlpConfig.get_from_yaml(),
    critic_model_config=MlpConfig.get_from_yaml(),
    seed=0,
    config=ExperimentConfig.get_from_yaml(),
)
experiment.run()

Example

You can also run multiple experiments in a Benchmark.

from benchmarl.algorithms import MappoConfig, MasacConfig, QmixConfig
from benchmarl.benchmark import Benchmark
from benchmarl.environments import VmasTask
from benchmarl.experiment import ExperimentConfig
from benchmarl.models.mlp import MlpConfig

benchmark = Benchmark(
    algorithm_configs=[
        MappoConfig.get_from_yaml(),
        QmixConfig.get_from_yaml(),
        MasacConfig.get_from_yaml(),
    ],
    tasks=[
        VmasTask.BALANCE.get_from_yaml(),
        VmasTask.SAMPLING.get_from_yaml(),
    ],
    seeds={0, 1},
    experiment_config=ExperimentConfig.get_from_yaml(),
    model_config=MlpConfig.get_from_yaml(),
    critic_model_config=MlpConfig.get_from_yaml(),
)
benchmark.run_sequential()

Example

Concept

The goal of BenchMARL is to bring different MARL environments and algorithms under the same interfaces to enable fair and reproducible comparison and benchmarking. BenchMARL is a full-pipeline unified training library with the goal of enabling users to run any comparison they want across our algorithms and tasks in just one line of code. To achieve this, BenchMARL interconnects components from TorchRL, which provides an efficient and reliable backend.

The library has a default configuration for each of its components. While parts of this configuration are supposed to be changed (for example experiment configurations), other parts (such as tasks) should not be changed to allow for reproducibility. To aid in this, each version of BenchMARL is paired with a default configuration.

Let's now introduce each component in the library.

Experiment. An experiment is a training run in which an algorithm, a task, and a model are fixed. Experiments are configured by passing these values alongside a seed and the experiment hyperparameters. The experiment hyperparameters cover both on-policy and off-policy algorithms, discrete and continuous actions, and probabilistic and deterministic policies (as they are agnostic of the algorithm or task used). An experiment can be launched from the command line or from a script. See the run section for more information.

Benchmark. In the library we call benchmark a collection of experiments that can vary in tasks, algorithm, or model. A benchmark shares the same experiment configuration across all of its experiments. Benchmarks allow comparing different MARL components in a standardized way. A benchmark can be launched from the command line or from a script. See the run section for more information.

Algorithms. Algorithms are an ensemble of components (e.g., loss, replay buffer) which determine the training strategy. Here is a table with the algorithms currently implemented in BenchMARL:

| Name | On/Off policy | Actor-critic | Full-observability in critic | Action compatibility | Probabilistic actor |
|------|---------------|--------------|------------------------------|----------------------|---------------------|
| MAPPO | On | Yes | Yes | Continuous + Discrete | Yes |
| IPPO | On | Yes | No | Continuous + Discrete | Yes |
| MADDPG | Off | Yes | Yes | Continuous | No |
| IDDPG | Off | Yes | No | Continuous | No |
| MASAC | Off | Yes | Yes | Continuous + Discrete | Yes |
| ISAC | Off | Yes | No | Continuous + Discrete | Yes |
| QMIX | Off | No | NA | Discrete | No |
| VDN | Off | No | NA | Discrete | No |
| IQL | Off | No | NA | Discrete | No |

Tasks. Tasks are scenarios from a specific environment which constitute the MARL challenge to solve. They differ in many aspects; here is a table with the current environments in BenchMARL:

| Environment | Tasks | Cooperation | Global state | Reward function | Action space | Vectorized |
|-------------|-------|-------------|--------------|-----------------|--------------|------------|
| VMAS | 27 | Cooperative + Competitive | No | Shared + Independent + Global | Continuous + Discrete | Yes |
| SMACv2 | 15 | Cooperative | Yes | Global | Discrete | No |
| MPE | 8 | Cooperative + Competitive | Yes | Shared + Independent | Continuous + Discrete | No |
| SISL | 2 | Cooperative | No | Shared | Continuous | No |
| MeltingPot | 49 | Cooperative + Competitive | Yes | Independent | Discrete | No |

Note

BenchMARL uses the TorchRL MARL API for grouping agents. In competitive environments like MPE, for example, teams will be in different groups. Each group has its own loss, models, buffers, and so on. Parameter sharing options refer to sharing within the group. See the example on creating a custom algorithm for more info.

Models. Models are neural networks used to process data. They can be used as actors (policies) or, when requested, as critics. We provide a set of base models (layers) and a SequenceModel to concatenate different layers. All the models can be used with or without parameter sharing within an agent group. Here is a table of the models implemented in BenchMARL

| Name | Decentralized | Centralized with local inputs | Centralized with global input |
|------|---------------|-------------------------------|-------------------------------|
| MLP | Yes | Yes | Yes |
| GNN | Yes | Yes | No |
| CNN | Yes | Yes | Yes |
| Deepsets | Yes | Yes | Yes |

And the ones that are a work in progress:

| Name | Decentralized | Centralized with local inputs | Centralized with global input |
|------|---------------|-------------------------------|-------------------------------|
| RNN (GRU and LSTM) | Yes | Yes | Yes |

Fine-tuned public benchmarks

Warning

This section is a work in progress. We are constantly fine-tuning our experiments to give users access to state-of-the-art benchmarks. If you would like to collaborate in this effort, please reach out to us.

In the fine_tuned folder we are collecting some tested hyperparameters for specific environments to enable users to bootstrap their benchmarking. You can just run the scripts in this folder to automatically use the proposed hyperparameters.

We will tune benchmarks for you and publish the config and benchmarking plots on Wandb publicly

Currently available ones are:

  • VMAS (configuration and run script in the fine_tuned/vmas folder)

In the following, we report a table of the results:

VMAS: sample efficiency curves (all tasks), performance profiles, and aggregate scores (plots published publicly on Wandb).

Reporting and plotting

Reporting and plotting is compatible with marl-eval. If experiment.create_json=True (this is the default in the experiment config) a file named {experiment_name}.json will be created in the experiment output folder with the format of marl-eval. You can load and merge these files using the utils in eval_results to create beautiful plots of your benchmarks. No more struggling with matplotlib and latex!

Example

Example plots: aggregate scores, sample efficiency curves, performance profiles.
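
As a minimal, library-agnostic sketch (the output paths and the exact nesting of the marl-eval schema are assumptions here; the utilities in eval_results handle this for you), the per-experiment JSON files could be merged before plotting like this:

import json
from pathlib import Path

# Collect the {experiment_name}.json files from the experiment output folders
# (the "outputs" path is illustrative).
json_files = Path("outputs").rglob("*.json")

merged = {}
for file in json_files:
    with open(file) as f:
        data = json.load(f)
    # Assumed marl-eval nesting: {environment: {task: {algorithm: {run: ...}}}}.
    # A shallow merge per task is enough when experiments do not overlap.
    for env_name, tasks in data.items():
        for task_name, algorithms in tasks.items():
            merged.setdefault(env_name, {}).setdefault(task_name, {}).update(algorithms)

with open("merged_results.json", "w") as f:
    json.dump(merged, f)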

Extending

One of the core tenets of BenchMARL is allowing users to leverage the existing algorithm and tasks implementations to benchmark their newly proposed solution.

For this reason we expose standard interfaces with simple abstract methods for algorithms, tasks and models. To introduce your solution in the library, you just need to implement the abstract methods exposed by these base classes which use objects from the TorchRL library.

Here is an example on how you can create a custom algorithm Example.

Here is an example on how you can create a custom task Example.

Here is an example on how you can create a custom model Example.

Configuring

As highlighted in the run section, the project can be configured either in the script itself or via hydra. We suggest reading the hydra documentation to get familiar with all of its functionality.

Each component in the project has a corresponding yaml configuration in the BenchMARL conf tree. Components' configurations are loaded from these files into python dataclasses that act as schemas for validation of parameter names and types. That way we keep the best of both worlds: separation of all configuration from code and strong typing for validation! You can also directly load and validate configuration yaml files without using hydra from a script by calling ComponentConfig.get_from_yaml().

Experiment

Experiment configurations are in benchmarl/conf/config.yaml. Running custom experiments is extremely simplified by the Hydra configurations. The default configuration for the library is contained in the benchmarl/conf folder.

When running an experiment you can override its hyperparameters like so

python benchmarl/run.py task=vmas/balance algorithm=mappo experiment.lr=0.03 experiment.evaluation=true experiment.train_device="cpu"

Experiment hyperparameters are loaded from benchmarl/conf/experiment/base_experiment.yaml into a dataclass ExperimentConfig defining their domain. This makes it so that all and only the parameters expected are loaded with the right types. You can also directly load them from a script by calling ExperimentConfig.get_from_yaml().

Here is an example of overriding experiment hyperparameters from hydra Example or from a script Example.
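
For the script route, a minimal sketch (assuming ExperimentConfig is importable from benchmarl.experiment, as in the script examples above) looks like:

from benchmarl.experiment import ExperimentConfig

# Load defaults from benchmarl/conf/experiment/base_experiment.yaml and
# override fields on the dataclass before passing it to an Experiment.
experiment_config = ExperimentConfig.get_from_yaml()
experiment_config.lr = 0.03
experiment_config.evaluation = True
experiment_config.train_device = "cpu"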

Algorithm

You can override an algorithm configuration when launching BenchMARL.

python benchmarl/run.py task=vmas/balance algorithm=masac algorithm.num_qvalue_nets=3 algorithm.target_entropy=auto algorithm.share_param_critic=true

Available algorithms and their default configs can be found at benchmarl/conf/algorithm. They are loaded into a dataclass AlgorithmConfig, present for each algorithm, defining their domain. This makes it so that all and only the parameters expected are loaded with the right types. You can also directly load them from a script by calling YourAlgorithmConfig.get_from_yaml().

Here is an example of overriding algorithm hyperparameters from hydra Example or from a script Example.
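
From a script, the same overrides can be applied on the algorithm dataclass (a sketch, using the MASAC fields shown in the command above):

from benchmarl.algorithms import MasacConfig

# Defaults come from benchmarl/conf/algorithm/masac.yaml.
algorithm_config = MasacConfig.get_from_yaml()
algorithm_config.num_qvalue_nets = 3
algorithm_config.share_param_critic = True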

Task

You can override a task configuration when launching BenchMARL. However, this is not recommended for benchmarking, as tasks should have a fixed version and fixed parameters for reproducibility.

python benchmarl/run.py task=vmas/balance algorithm=mappo task.n_agents=4

Available tasks and their default configs can be found at benchmarl/conf/task. They are loaded into a dataclass TaskConfig, defining their domain. Tasks are enumerations under the environment name. For example, VmasTask.NAVIGATION represents the navigation task in the VMAS simulator. This allows autocompletion and seeing all available tasks at once. You can also directly load them from a script by calling YourEnvTask.TASK_NAME.get_from_yaml().

Here is an example of overriding task hyperparameters from hydra Example or from a script Example.
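
From a script, a task's loaded configuration can be tweaked as well (discouraged for benchmarking, as noted above). This sketch assumes the loaded parameters are exposed as a plain config dict on the task:

from benchmarl.environments import VmasTask

# get_from_yaml() attaches the defaults from benchmarl/conf/task/vmas/balance.yaml.
task = VmasTask.BALANCE.get_from_yaml()
task.config["n_agents"] = 4  # assumption: the loaded config is a plain dict on the task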

Model

You can override the model configuration when launching BenchMARL. By default an MLP model will be loaded with the default config. You can change it like so:

python benchmarl/run.py task=vmas/balance algorithm=mappo model=layers/mlp model.layer_class="torch.nn.Linear" "model.num_cells=[32,32]" model.activation_class="torch.nn.ReLU"

Available models and their configs can be found at benchmarl/conf/model/layers. They are loaded into a dataclass ModelConfig, defining their domain. You can also directly load them from a script by calling YourModelConfig.get_from_yaml().

Here is an example of overriding model hyperparameters from hydra Example or from a script Example.
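
From a script (a sketch mirroring the command-line override above):

from benchmarl.models.mlp import MlpConfig

# Defaults come from benchmarl/conf/model/layers/mlp.yaml.
model_config = MlpConfig.get_from_yaml()
model_config.num_cells = [32, 32]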

Sequence model

You can compose layers into a sequence model. Available layer names are in the benchmarl/conf/model/layers folder.

python benchmarl/run.py task=vmas/balance algorithm=mappo model=sequence "model.intermediate_sizes=[256]" "model/layers@model.layers.l1=mlp" "model/layers@model.layers.l2=mlp" "+model/layers@model.layers.l3=mlp" "model.layers.l3.num_cells=[3]"

Add a layer with "+model/layers@model.layers.l3=mlp".

Remove a layer with "~model.layers.l2".

Configure a layer with "model.layers.l1.num_cells=[3]".

Here is an example of creating a sequence model from hydra Example or from a script Example.
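
From a script, a sequence model is built by composing layer configs. In this sketch the class name SequenceModelConfig and its model_configs/intermediate_sizes fields are assumptions based on the hydra keys above:

from benchmarl.models import SequenceModelConfig  # assumed export
from benchmarl.models.mlp import MlpConfig

# Two MLP layers with a 256-wide intermediate representation between them.
model_config = SequenceModelConfig(
    model_configs=[MlpConfig.get_from_yaml(), MlpConfig.get_from_yaml()],
    intermediate_sizes=[256],
)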

Features

BenchMARL has several features:

  • A test CI with integration and training test routines that are run for all simulators and algorithms
  • Integration in the official TorchRL ecosystem for dedicated support

Logging

BenchMARL is compatible with the TorchRL loggers. A list of logger names can be provided in the experiment config. Examples of available options are: wandb, csv, mlflow, tensorboard, or any other option available in TorchRL. You can specify the loggers in the yaml config files or in the script arguments like so:

python benchmarl/run.py algorithm=mappo task=vmas/balance "experiment.loggers=[wandb]"

The wandb logger is fully compatible with experiment restoring and will automatically resume the run of the loaded experiment.
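
Programmatically, the same loggers can be set on the experiment config (a sketch):

from benchmarl.experiment import ExperimentConfig

experiment_config = ExperimentConfig.get_from_yaml()
experiment_config.loggers = ["wandb", "csv"]  # any TorchRL-supported logger names
experiment_config.create_json = True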

Checkpointing

Experiments can be checkpointed every experiment.checkpoint_interval collected frames. Experiments will use an output folder for logging and checkpointing which can be specified in experiment.save_folder. If this is left unspecified, the default will be the hydra output folder (if using hydra) or (otherwise) the current directory where the script is launched. The output folder will contain a folder for each experiment with the corresponding experiment name. Their checkpoints will be stored in a "checkpoints" folder within the experiment folder.

python benchmarl/run.py task=vmas/balance algorithm=mappo experiment.max_n_iters=3 experiment.on_policy_collected_frames_per_batch=100 experiment.checkpoint_interval=100

To load from a checkpoint, pass the absolute checkpoint file name to experiment.restore_file.

python benchmarl/run.py task=vmas/balance algorithm=mappo experiment.max_n_iters=6 experiment.on_policy_collected_frames_per_batch=100 experiment.restore_file="/hydra/experiment/folder/checkpoint/checkpoint_300.pt"

Example
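
The same checkpointing options can be set from a script (the restore path below is illustrative):

from benchmarl.experiment import ExperimentConfig

experiment_config = ExperimentConfig.get_from_yaml()
experiment_config.checkpoint_interval = 100
experiment_config.restore_file = "/hydra/experiment/folder/checkpoint/checkpoint_300.pt"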

Callbacks

Experiments optionally take a list of Callbacks, which have several methods you can implement to see what's going on during training, such as on_batch_collected, on_train_end, and on_evaluation_end.

Example
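
A minimal custom callback could look like the sketch below (the method signatures are assumptions based on the hook names above; check benchmarl/experiment/callback.py for the exact interface). The instance would then be passed to the Experiment via its list of callbacks.

from benchmarl.experiment.callback import Callback


class PrintingCallback(Callback):
    # Called after every collection phase with the collected batch (assumed to be a TensorDict).
    def on_batch_collected(self, batch):
        print(f"Collected a batch of shape {batch.batch_size}")

    # Called after every evaluation with the list of evaluation rollouts.
    def on_evaluation_end(self, rollouts):
        print(f"Evaluated {len(rollouts)} rollouts")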

Citing BenchMARL

If you use BenchMARL in your research please use the following BibTeX entry:

@article{bettini2023benchmarl,
      title={BenchMARL: Benchmarking Multi-Agent Reinforcement Learning},
      author={Matteo Bettini and Amanda Prorok and Vincent Moens},
      year={2023},
      journal={arXiv preprint arXiv:2312.01472},
}

License

BenchMARL is licensed under the MIT License. See LICENSE for details.

benchmarl's People

Contributors

kaleabtessera, matteobettini, mchoilab


benchmarl's Issues

Error Vmas Transport

python fine_tuned/vmas/vmas_run.py algorithm=mappo task=vmas/transport

Error executing job with overrides: ['algorithm=mappo', 'task=vmas/transport']
Traceback (most recent call last):
File "/hdd3/marl_new/BenchMARL/fine_tuned/vmas/vmas_run.py", line 31, in
hydra_experiment()
File "/home/iitdpc/miniconda3/envs/marl_bench/lib/python3.9/site-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/home/iitdpc/miniconda3/envs/marl_bench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/home/iitdpc/miniconda3/envs/marl_bench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/home/iitdpc/miniconda3/envs/marl_bench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/home/iitdpc/miniconda3/envs/marl_bench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/home/iitdpc/miniconda3/envs/marl_bench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 458, in
lambda: hydra.run(
File "/home/iitdpc/miniconda3/envs/marl_bench/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/home/iitdpc/miniconda3/envs/marl_bench/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/home/iitdpc/miniconda3/envs/marl_bench/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/hdd3/marl_new/BenchMARL/fine_tuned/vmas/vmas_run.py", line 25, in hydra_experiment
experiment: Experiment = load_experiment_from_hydra(cfg, task_name=task_name)
File "/hdd3/marl_new/BenchMARL/benchmarl/hydra_config.py", line 37, in load_experiment_from_hydra
return Experiment(
File "/hdd3/marl_new/BenchMARL/benchmarl/experiment/experiment.py", line 323, in init
self._setup()
File "/hdd3/marl_new/BenchMARL/benchmarl/experiment/experiment.py", line 344, in _setup
self._setup_collector()
File "/hdd3/marl_new/BenchMARL/benchmarl/experiment/experiment.py", line 447, in _setup_collector
self.collector = SyncDataCollector(
File "/hdd3/marl_new/BenchMARL/torchrl_custom/collectors/collectors.py", line 636, in init
self._tensordict = env.reset()
File "/hdd3/marl_new/BenchMARL/torchrl_custom/envs/common.py", line 1495, in reset
return self._reset_proc_data(tensordict, tensordict_reset)
File "/hdd3/marl_new/BenchMARL/torchrl_custom/envs/transforms/transforms.py", line 765, in _reset_proc_data
self._reset_check_done(tensordict, tensordict_reset)
File "/hdd3/marl_new/BenchMARL/torchrl_custom/envs/common.py", line 1547, in _reset_check_done
raise RuntimeError(
RuntimeError: The done entry 'done' was (partially) True after a call to reset() in env TransformedEnv(
env=VmasEnv(num_envs=600, n_agents=4, batch_size=torch.Size([600]), device=cuda:0) (scenario=transport),
transform=Compose(
RewardSum(keys=[('agents', 'reward')]))).

timesteps

Hi,

How are timesteps controlled for VMAS?
I ran the balance env with MAPPO with tuned hyperparameters. In wandb it shows 165 steps.
But when I plotted with marl-eval I can see 1e7 timesteps.

Also, the plot is slightly different from the one given for tuned MAPPO + Balance.

Beginner questions

Hello,

I'll start with a disclaimer saying that I am a novice when it comes to reinforcement learning and RL frameworks. My goal is to determine if applying specific structural changes to the MultiAgentMLP and to the loss in various algorithms would lead to different outcomes for specific tasks. For this purpose I wanted to work with an environment where the agent reward has a shared and an individual component, and have the agents not share parameters, critics, or observations. I think that I managed to set up such a baseline by using the simple_reference environment, setting share_policy_params: False and share_param_critic: False, and using the MADDPG/IDDPG algorithm.

However, I am confronted with 2 questions:

  1. What would be the best approach to implement logging per agent so that we can see individual losses and rewards even if the agents are part of the same group?
  2. Did I understand correctly that in the DDPG loss, the loss_actor, for example, is first computed per agent (as it has the shape batch_size x n_agents) and then reduced to an average, a single number that is used to calculate the gradients for both agents' MLPs? I was guessing that each agent would have its own loss component drive its MLP's gradients.

LSTM in benchmarl

Hello.
I am eager to see dead agents (a variable number of agents) and LSTM support in your benchmark.
All the best.

Install error "No matching distribution found for torchrl>=0.2.0"

Hi, I am receiving the following error when trying to install: "ERROR: Could not find a version that satisfies the requirement torchrl>=0.2.0 (from benchmarl) (from versions: 0.0.1a0, 0.0.1b0, 0.0.1rc0, 0.0.2a0, 0.0.3, 0.0.4a0, 0.0.4b0, 0.0.4, 0.0.5, 0.1.0, 0.1.1)
ERROR: No matching distribution found for torchrl>=0.2.0"

new environments

hey,
Could you please, if possible, add these new envs to the BenchMARL envs?
I developed two new envs and I think they can make the envs more realistic:
navigation and discovery with obstacles, where you can change the size of the obstacles:
new_envs.zip

Vmas supported tasks

Hi,

Are only the tasks below supported for VMAS?

    vmas_balance_config
    vmas_navigation_config
    vmas_sampling_config
    vmas_simple_adversary_config
    vmas_simple_crypto_config
    vmas_simple_push_config
    vmas_simple_reference_config
    vmas_simple_speaker_listener_config
    vmas_simple_spread_config
    vmas_simple_tag_config
    vmas_simple_world_comm_config
    vmas_transport_config
    vmas_wheel_config

Using pictures instead of simple shapes

Hi.
I suggest adding an image attribute to the Agent and Landmark (goal) classes so that we can use a nice image for our agent or landmark in rendering.
Thank you

using a gym environment in benchmarl

Hi everyone,

I am new to benchmarl and would like to create/integrate my own environment in order to train it in benchmarl.

I followed this tutorial on how to create a new task, but I do not exactly understand how to integrate my own environment.

Is it necessary to add the environment to torchrl like you did here?

Can I just import my gym environment and use its constructor as in this line? Or do I have to modify my environment to be able to use it in Benchmarl?

Mis-matched TD keys causing RT Error when training with 'collect_with_grad':True

This error occurs when running on an NVIDIA Tesla P100. I have also tested this on an Apple M3, where the error is not thrown.

Experiment Config:
Algorithm: maddpg, Task: vmas/navigation

Loaded config:

experiment:
sampling_device: cuda
train_device: cuda
buffer_device: cuda
share_policy_params: false
prefer_continuous_actions: true
collect_with_grad: true
gamma: 0.9
lr: 0.0005
adam_eps: 1.0e-06
clip_grad_norm: true
clip_grad_val: 5.0
soft_target_update: true
polyak_tau: 0.005
hard_target_update_frequency: 5
exploration_eps_init: 0.8
exploration_eps_end: 0.01
exploration_anneal_frames: null
max_n_iters: 1000
max_n_frames: null
on_policy_collected_frames_per_batch: 6000
on_policy_n_envs_per_worker: 10
on_policy_n_minibatch_iters: 45
on_policy_minibatch_size: 400
off_policy_collected_frames_per_batch: 6000
off_policy_n_envs_per_worker: 10
off_policy_n_optimizer_steps: 1000
off_policy_train_batch_size: 128
off_policy_memory_size: 1000000
off_policy_init_random_frames: 0
evaluation: true
render: false
evaluation_interval: 60000
evaluation_episodes: 100
evaluation_deterministic_actions: true
loggers: []
create_json: true
save_folder: null
restore_file: null
checkpoint_interval: 600000
checkpoint_at_end: false
keep_checkpoints_num: 3
algorithm:
share_param_critic: true
loss_function: l2
delay_value: true
use_tanh_mapping: true
task:
max_steps: 100
n_agents: 3
collisions: true
agents_with_same_goal: 1
observe_all_goals: false
shared_rew: false
split_goals: false
lidar_range: 0.35
agent_radius: 0.1
model:
  name: mlp
  num_cells:
  - 256
  - 256
  layer_class: torch.nn.Linear
  activation_class: torch.nn.Tanh
  activation_kwargs: null
  norm_class: null
  norm_kwargs: null
critic_model:
  name: mlp
  num_cells:
  - 256
  - 256
  layer_class: torch.nn.Linear
  activation_class: torch.nn.Tanh
  activation_kwargs: null
  norm_class: null
  norm_kwargs: null
seed: 0

Full Hydra Stack Trace:
`Traceback (most recent call last):
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/_torch_func.py", line 425, in _stack
keys = _check_keys(list_of_tensordicts, strict=True)
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/utils.py", line 1523, in _check_keys
raise KeyError(
KeyError: "got keys {'action', 'episode_reward', 'info', 'observation', 'param', 'reward'} and {'action', 'episode_reward', 'info', 'observation', 'param'} which are incompatible"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/mnt/storage/scratch/vd20433/BenchMARL/benchmarl/run.py", line 42, in
hydra_experiment()
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in
lambda: hydra.run(
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/mnt/storage/scratch/vd20433/BenchMARL/benchmarl/run.py", line 38, in hydra_experiment
experiment.run()
File "/mnt/storage/scratch/vd20433/BenchMARL/benchmarl/experiment/experiment.py", line 553, in run
raise err
File "/mnt/storage/scratch/vd20433/BenchMARL/benchmarl/experiment/experiment.py", line 545, in run
self._collection_loop()
File "/mnt/storage/scratch/vd20433/BenchMARL/benchmarl/experiment/experiment.py", line 575, in _collection_loop
batch = self.rollout_env.rollout(
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/torchrl/envs/common.py", line 2567, in rollout
out_td = torch.stack(tensordicts, len(batch_size), out=out)
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/base.py", line 388, in torch_function
return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/_torch_func.py", line 496, in _stack
out = {
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/_torch_func.py", line 497, in
key: stack_fn(key, values, is_not_init, is_tensor)
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/_torch_func.py", line 494, in stack_fn
return _stack(values, dim, maybe_dense_stack=maybe_dense_stack)
File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/_torch_func.py", line 432, in _stack
raise RuntimeError(
RuntimeError: The sets of keys in the tensordicts to stack are exclusive. Consider using LazyStackedTensorDict.maybe_dense_stack instead.`

Plotting

Hi,

Is it possible to run multiple experiments with tuned hyperparameters like below?
python fine_tuned/vmas/vmas_run.py -m algorithm=mappo,maddpg task=vmas/balance seed=0,1,2
(Two algos on balance with three seeds)

If not, can you suggest how multiple runs can be plotted in a single plot if I have .json in "marl_eval" format for each run? I tried but was unable to get a single plot with multiple json files.

Thanks.

adding other vmas environment

Hey!
This repo is amazing and I like it.
I have a question: why aren't VMAS envs like discovery and flocking added to this repo?
It would make this repo unique and awesome!!

Vmas/Kinematic Bicycle

Error executing job with overrides: ['algorithm=mappo', 'task=vmas/debug/kinematic_bicycle']
Traceback (most recent call last):
File "/hdd3/marl_new/new_bench_marl/BenchMARL/fine_tuned/vmas/vmas_run.py", line 25, in hydra_experiment
experiment: Experiment = load_experiment_from_hydra(cfg, task_name=task_name)
File "/hdd3/marl_new/new_bench_marl/BenchMARL/benchmarl/hydra_config.py", line 37, in load_experiment_from_hydra
return Experiment(
File "/hdd3/marl_new/new_bench_marl/BenchMARL/benchmarl/experiment/experiment.py", line 324, in init
self._setup()
File "/hdd3/marl_new/new_bench_marl/BenchMARL/benchmarl/experiment/experiment.py", line 344, in _setup
self._setup_algorithm()
File "/hdd3/marl_new/new_bench_marl/BenchMARL/benchmarl/experiment/experiment.py", line 421, in _setup_algorithm
self.losses = {
File "/hdd3/marl_new/new_bench_marl/BenchMARL/benchmarl/experiment/experiment.py", line 422, in
group: self.algorithm.get_loss_and_updater(group)[0]
File "/hdd3/marl_new/new_bench_marl/BenchMARL/benchmarl/algorithms/common.py", line 122, in get_loss_and_updater
policy_for_loss=self.get_policy_for_loss(group),
File "/hdd3/marl_new/new_bench_marl/BenchMARL/benchmarl/algorithms/common.py", line 183, in get_policy_for_loss
group: self._get_policy_for_loss(
File "/hdd3/marl_new/new_bench_marl/BenchMARL/benchmarl/algorithms/mappo.py", line 141, in _get_policy_for_loss
actor_module = model_config.get_model(
File "/hdd3/marl_new/new_bench_marl/BenchMARL/benchmarl/models/common.py", line 250, in get_model
return self.associated_class()(
File "/hdd3/marl_new/new_bench_marl/BenchMARL/benchmarl/models/mlp.py", line 55, in init
self.mlp = MultiAgentMLP(
File "/home/iitdpc/miniconda3/envs/mb_meta/lib/python3.10/site-packages/torchrl/modules/models/multiagent.py", line 162, in init
[
File "/home/iitdpc/miniconda3/envs/mb_meta/lib/python3.10/site-packages/torchrl/modules/models/multiagent.py", line 163, in
MLP(
File "/home/iitdpc/miniconda3/envs/mb_meta/lib/python3.10/site-packages/torchrl/modules/models/models.py", line 214, in init
layers = self._make_net(device)
File "/home/iitdpc/miniconda3/envs/mb_meta/lib/python3.10/site-packages/torchrl/modules/models/models.py", line 225, in _make_net
create_on_device(
File "/home/iitdpc/miniconda3/envs/mb_meta/lib/python3.10/site-packages/torchrl/modules/models/utils.py", line 122, in create_on_device
return module_class(*args, device=device, **kwargs)
File "/home/iitdpc/miniconda3/envs/mb_meta/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 96, in init
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
RuntimeError: Trying to create tensor with negative dimension -2: [-2, 256]

[DO NOT CLOSE] Library TODOs and call for contributions

Hello people!

In this issue I will list the things I would really like to have in BenchMARL and will tick them off as they are implemented!

It is also a really good place to find something you would like to contribute.

Features

  • Parameter sharing between actors and critics (started in #95, still need to implement sharing in PPO)
  • support for variable number of agents (dead agents) (#103)
  • support for turn-based environments (#76)
  • Parallel collection for non-vectorized environments
  • Improve checkpointing (just keep last checkpoints) (#102)
  • Prioritized replay buffers

Models

Algorithms

  • MPO and V-MPO
  • HARL algorithms (#52)

Environments

Suggestion: depend on id-marl-eval for their JSON logger

Hi there 👋

Just a suggestion, but we've recently integrated our MarlEvalLogger from Mava into the id-marl-eval pypi package. We originally got this logger from this repo and made minimal changes to it so that it should be general enough to work with all repos. So if you would like to switch to depending on that logger we're happy to take that (small) maintenance burden off your shoulders.

Links to marl eval logger, a quick readme about the tools and how we use it in mava

If you've got any questions about it I'm more than happy to answer them 😄

MADDPG Config

Thank you for the amazing work you've put into VMAS and BenchMARL.

I tried and failed to reproduce the results from proroklab/VectorizedMultiAgentSimulator#62 and I am not sure if I am missing some important piece of configuration.

When I run:
python benchmarl/run.py task=pettingzoo/simple_reference algorithm=mappo
the logs appear to indicate that the training works: the reward increases until it reaches a plateau and the videos show the agents moving.

However, when I try to use MADDPG by running:
python benchmarl/run.py task=pettingzoo/simple_reference algorithm=maddpg
the training process proceeds, but the agents are not exploring the environment or communicating.
This is not specific to this task, as it happens for simple_spread as well.
If I try to use MADDPG with VMAS/navigation it works as expected.
I experimented with different versions of PettingZoo and BenchMARL, but it didn't seem to make a difference, so I'm thinking that I may be missing something.

My config is the default:

algorithm_config:
  desc: null
  value:
    delay_value: true
    loss_function: l2
    share_param_critic: true
    use_tanh_mapping: true
algorithm_name:
  desc: null
  value: maddpg
continuous_actions:
  desc: null
  value: true
environment_name:
  desc: null
  value: pettingzoo
experiment_config:
  desc: null
  value:
    adam_eps: 1.0e-06
    checkpoint_interval: 300000.0
    clip_grad_norm: true
    clip_grad_val: 5.0
    create_json: true
    evaluation: true
    evaluation_deterministic_actions: true
    evaluation_episodes: 200
    evaluation_interval: 60000
    exploration_anneal_frames: 1000000
    exploration_eps_end: 0.01
    exploration_eps_init: 0.8
    gamma: 0.9
    hard_target_update_frequency: 5
    loggers:
    - wandb
    lr: 5.0e-05
    max_n_frames: 10000000
    max_n_iters: null
    off_policy_collected_frames_per_batch: 6000
    off_policy_init_random_frames: 0
    off_policy_memory_size: 1000000
    off_policy_n_envs_per_worker: 60
    off_policy_n_optimizer_steps: 1000
    off_policy_train_batch_size: 128
    on_policy_collected_frames_per_batch: 60000
    on_policy_minibatch_size: 4096
    on_policy_n_envs_per_worker: 600
    on_policy_n_minibatch_iters: 45
    polyak_tau: 0.005
    prefer_continuous_actions: true
    render: true
    restore_file: null
    sampling_device: cuda
    save_folder: artifacts
    share_policy_params: false
    soft_target_update: true
    train_device: cuda
model_config:
  desc: null
  value:
    activation_class: torch.nn.modules.activation.Tanh
    activation_kwargs: null
    layer_class: torch.nn.modules.linear.Linear
    norm_class: null
    norm_kwargs: null
    num_cells:
    - 256
    - 256
model_name:
  desc: null
  value: mlp
on_policy:
  desc: null
  value: false
seed:
  desc: null
  value: 0
task_config:
  desc: null
  value:
    continuous_actions: true
    local_ratio: 0.5
    max_cycles: 100
    task: simple_reference_v3
task_name:
  desc: null
  value: simple_reference

Visualization

Hi

Over how many runs is the plot std dev taken (the shaded part in the plots)? And are these different runs with different seeds?
And these different runs are with different seeds?


Vmas/transport task fails to start.

Starting a training via the CLI 'python benchmarl/run.py algorithm=mappo task=vmas/transport' gives the following error after a while (changing max_steps in transport.yaml delays when the error is raised):

BenchMARL/.venv/lib/python3.10/site-packages/torchrl/envs/common.py", line 1527, in _reset_check_done
raise RuntimeError(
RuntimeError: Env done entry 'done' was (partially) True after reset on specified '_reset' dimensions. This is not allowed.

Request for Example of AEC API Usage with Agent Masking in Petting Zoo

I've been exploring the BenchMARL library and am impressed with its capabilities and design. Great work!

I am currently interested in implementing a multi-agent reinforcement learning scenario using the AEC (Agent-Environment Cycle) API in petting zoo, particularly for environments that require sequential turn-based actions like in a Chess game. In this context, I need to apply masking at the agent level rather than action masking.

Could you provide an example or guidance on how to adapt the AEC API for such a use case? Any examples of AEC API usage with agent masking in a Chess-like environment would be incredibly helpful.

Thank you for your assistance and for the excellent work on BenchMARL.

Suggestion of integrating HARL algorithms

Hello. Thank you for your amazing work. I appreciate the efforts to provide a unified library of MARL algorithms and environments for benchmarking and reproducibility. To better achieve this goal, I suggest integrating HARL algorithms, which achieve SOTA results on various benchmarks and are theoretically underpinned. Their papers have been accepted to JMLR and ICLR 2024 (spotlight). As they represent important advancements in MARL and are now increasingly used as baselines, integrating them should be helpful to the adoption of this library.

No module named 'pettingzoo.utils.all_modules'

I'm kinda stuck.

python version: 3.9.16
torchrl version: 0.2.1
pettingzoo version: 1.24.2
benchmarl version: 1.0.0

I have these versions, but when I try to run python benchmarl/run.py algorithm=mappo task=pettingzoo/simple_spread I get weird errors like:

ModuleNotFoundError: No module named 'multi_agent_ale_py', and when I tried using pip install multi_agent_ale_py I get this error: Could not build wheels for multi_agent_ale_py, which is required to install pyproject.toml-based projects

Thanks for the help :(

Properly Citing BenchMARL

I'm currently writing up findings for a study which utilized BenchMARL, and couldn't find any instructions in the repo on how to properly cite the package in my work, nor an associated paper to cite in its stead. Is there a desired method for us to give proper credit to the software, or would just mentioning it somewhere in our paper suffice?

evaluation

Hi.
What does it mean when, in a scenario, eval_mean_reward is increasing and the critic loss is also increasing?
Can we call it over-fitting?

[Bug] Multiwalker fails to run

If you try to run multiwalker, like this:

python benchmarl/run.py algorithm=ippo task=pettingzoo/multiwalker

You get this error:

File "/home/kale-ab/miniconda3/envs/benchmarl/lib/python3.10/site-packages/pettingzoo/sisl/multiwalker/multiwalker_base.py", line 411, in reset
    self._generate_terrain(self.hardcore)
  File "/home/kale-ab/miniconda3/envs/benchmarl/lib/python3.10/site-packages/pettingzoo/sisl/multiwalker/multiwalker_base.py", line 730, in _generate_terrain
    for i in range(self.terrain_length):
TypeError: 'float' object cannot be interpreted as an integer

This is because terrain_length is passed in as the incorrect type.
