This repository contains code for the paper "Stateful Active Facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning". https://arxiv.org/abs/2210.03022
Add support for a CNN architecture for all implemented algorithms in the code base. Appropriate reshaping might be needed; in that case, make sure runner.py is compatible with it as well.
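A minimal sketch of what such an encoder could look like, assuming PyTorch; the class name, layer sizes, and the channels-last heuristic are illustrative, not the final design:

```python
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """Illustrative CNN encoder; layer sizes are placeholders."""

    def __init__(self, in_channels: int, hidden_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.proj = nn.LazyLinear(hidden_dim)  # infers the flattened size on first call

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # runner.py may deliver observations as (batch, H, W, C);
        # this heuristic reshapes channels-last inputs to NCHW.
        if obs.dim() == 4 and obs.shape[-1] in (1, 3):
            obs = obs.permute(0, 3, 1, 2)
        return self.proj(self.conv(obs))
```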
We still need to implement additional PPO training details to get the full performance out of IPPO and MAPPO [1,2]. The following should be implemented:
Feature Pruning: Form a state by concatenating the environment-provided global state with the agent's local observation, then prune out redundant information. This is highly environment-specific, so we might need to change the obs_to_state_wrapper to account for it. No change is needed elsewhere.
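A rough sketch of the kind of pruning the wrapper could do; the function name, the index table, and the environment key are hypothetical placeholders, since which dimensions are redundant depends entirely on the environment:

```python
import numpy as np

# Hypothetical per-environment index sets of redundant state dimensions.
REDUNDANT_STATE_DIMS = {"some_env": np.array([0, 1])}  # placeholder

def obs_to_state(global_state: np.ndarray, local_obs: np.ndarray, env_name: str) -> np.ndarray:
    """Concatenate the global state with the agent's local observation,
    then drop dimensions that duplicate information already present."""
    state = np.concatenate([global_state, local_obs], axis=-1)
    drop = REDUNDANT_STATE_DIMS.get(env_name)
    if drop is not None:
        state = np.delete(state, drop, axis=-1)
    return state
```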
Value Normalization: Regress the value network output toward normalized value targets. This was found to help training significantly for MAPPO.
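A simplified sketch of a running-statistics value normalizer (the MAPPO paper uses a PopArt-style normalizer; this version only tracks a running mean and variance, and all names are illustrative). The critic is trained against normalize(targets), and its output is passed through denormalize before computing advantages:

```python
import torch

class RunningValueNorm:
    """Track running mean/var of value targets; regress the critic to
    normalized targets and denormalize its output for advantage estimation."""

    def __init__(self, eps: float = 1e-5):
        self.mean, self.var, self.count = 0.0, 1.0, eps

    def update(self, targets: torch.Tensor) -> None:
        batch_mean = targets.mean().item()
        batch_var = targets.var(unbiased=False).item()
        batch_count = targets.numel()
        delta = batch_mean - self.mean
        tot = self.count + batch_count
        # Chan et al. parallel update of running mean and variance.
        self.mean += delta * batch_count / tot
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta ** 2 * self.count * batch_count / tot) / tot
        self.count = tot

    def normalize(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) / (self.var ** 0.5 + 1e-8)

    def denormalize(self, x: torch.Tensor) -> torch.Tensor:
        return x * (self.var ** 0.5 + 1e-8) + self.mean
```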
Recurrent-MAPPO: MAPPO that operates with RNNs (e.g., a GRU) instead of simple MLPs.
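An illustrative GRU-based actor trunk, assuming PyTorch; the class name and layer sizes are placeholders, and the hidden state would need to be carried through rollouts and reset at episode boundaries:

```python
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    """Illustrative GRU-based actor; replaces the MLP trunk with a recurrent core."""

    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq: torch.Tensor, h0=None):
        # obs_seq: (batch, time, obs_dim); h0: (1, batch, hidden_dim) or None
        x = self.encoder(obs_seq)
        x, h_n = self.gru(x, h0)
        return self.head(x), h_n  # per-step action logits plus final hidden state
```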
Frame stacking: Provide the agent with a stack of recent observations instead of only the current one.
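A minimal frame-stacking helper, assuming NumPy observations stacked along the last (channel) axis; the class name and padding strategy are illustrative:

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keep the last k observations and expose them as one stacked array."""

    def __init__(self, k: int):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, obs: np.ndarray) -> np.ndarray:
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(obs)  # pad the stack with the first frame
        return np.concatenate(self.frames, axis=-1)

    def step(self, obs: np.ndarray) -> np.ndarray:
        self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1)
```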
Integrate the MARLGrid environment into the codebase, including its coordination and heterogeneity levels.
Put all relevant files under the folder src/envs/marlgrid/. Once the environment code is ready, make sure it's callable from src/envs/__init__.py using the get_env function.
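A sketch of how the get_env dispatch could look; the MarlGridEnv class name, its constructor arguments, and the surrounding registry are assumptions about code that doesn't exist yet:

```python
# src/envs/__init__.py (sketch)
def get_env(name: str, **kwargs):
    if name == "marlgrid":
        # MarlGridEnv is a hypothetical entry point under src/envs/marlgrid/;
        # coordination/heterogeneity levels would be passed through kwargs.
        from src.envs.marlgrid import MarlGridEnv
        return MarlGridEnv(**kwargs)
    raise ValueError(f"Unknown environment: {name}")
```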