theohhhu / updet
Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers', ICLR 2021 (spotlight)
License: MIT License
For 7m-5m-3m transfer learning, when I use the UPDeT model combined with QMIX, there is a dimension mismatch problem.
Does only the UPDeT model combined with VDN support transfer learning?
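A minimal sketch of why the two mixing schemes behave differently under transfer (hypothetical module names, not the repo's code; shapes assumed): VDN's mixer is a parameter-free sum, so it is indifferent to the number of agents, while QMIX's hypernetwork produces a mixing weight matrix whose shape bakes `n_agents` into the trained parameters.

```python
import torch
import torch.nn as nn

class VDNMixer(nn.Module):
    """Parameter-free sum: works for any number of agents."""
    def forward(self, agent_qs):                    # (batch, n_agents)
        return agent_qs.sum(dim=1, keepdim=True)    # (batch, 1)

class QMixer(nn.Module):
    """The first hypernetwork layer emits n_agents * embed weights,
    so the team size is fixed once the mixer is trained."""
    def __init__(self, n_agents, state_dim, embed=32):
        super().__init__()
        self.n_agents, self.embed = n_agents, embed
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed)

    def forward(self, agent_qs, state):
        # (batch, 1, n_agents) x (batch, n_agents, embed)
        w1 = torch.abs(self.hyper_w1(state)).view(-1, self.n_agents, self.embed)
        return torch.bmm(agent_qs.unsqueeze(1), w1)

vdn = VDNMixer()
q7 = vdn(torch.ones(1, 7))   # 7 agents: sums to 7
q5 = vdn(torch.ones(1, 5))   # 5 agents: the same module still works

qmix = QMixer(n_agents=7, state_dim=10)
ok = qmix(torch.ones(1, 7), torch.zeros(1, 10))   # fine with 7 agents
mismatch = False
try:
    qmix(torch.ones(1, 5), torch.zeros(1, 10))    # transfer to 5 agents
except RuntimeError:
    mismatch = True  # bmm shape mismatch: weights expect 7 agents
```

This is why transferring UPDeT with VDN works out of the box, whereas with QMIX the mixer (not the UPDeT agent itself) would need to be resized or retrained.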
Hello, your paper helped me a lot, and your charts are very good. Could you open-source the plotting code? I would be very grateful.
Hi!
I find your idea very interesting, but the only part I do not understand is how you group actions.
In Figure 7, regarding the testing environment on SMAC, no specific grouping is stated.
Could you help me understand?
Thanks!
Hi, I am very interested in this project. Do you have any plan to release the source code?
4m_vs_5m
I tried to debug your code.
1. VSCode configuration (I could not configure the parameter required in the README.md: with env_args.map_name=5m_vs_6m):
{
"version": "0.2.0",
"configurations": [
{
"name": "Python: Current File",
"type": "python",
"request": "launch",
"program": "${file}",
"console": "integratedTerminal",
"args": ["--config=qmix", "--env-config=sc2"]
}
]
}
2. Running on CPU.
3. I found a tensor size mismatch in the _build_inputs_transformer method of basic_controller.py:
arranged_obs.size()
torch.Size([1, 3, 30])
Clearly, arranged_obs has 90 elements when flattened.
The following line tries to reshape it to (-1, 11, 5):
reshaped_obs = arranged_obs.view(-1, 1 + (self.args.enemy_num - 1) + self.args.ally_num, self.args.token_dim)
This obviously cannot work. Could you please take a look when you have time? Screenshot below.
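The mismatch can be reproduced in isolation (numbers follow the report above: 1 + (enemy_num - 1) + ally_num = 11 tokens of token_dim = 5, which are the 5m_vs_6m defaults; I have not verified them against the repo):

```python
import torch

# The reported observation: 1 x 3 x 30 = 90 elements when flattened.
arranged_obs = torch.zeros(1, 3, 30)
tokens = 1 + (6 - 1) + 5        # 11 entity tokens (assumed 5m_vs_6m config)
token_dim = 5                   # per-token feature size for Marines

failed = False
try:
    # view(-1, 11, 5) needs the element count to be a multiple of 55;
    # 90 is not, so PyTorch raises a RuntimeError.
    arranged_obs.view(-1, tokens, token_dim)
except RuntimeError:
    failed = True
```

The usual cause is that `ally_num` / `enemy_num` / `token_dim` in the config do not match the map actually being launched, so the flattened observation size is not `tokens * token_dim`.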
Traceback (most recent call last):
  File "main.py", line 19, in <module>
    ex = Experiment("pymarl")
  File "/home/username/anaconda3/lib/python3.7/site-packages/sacred/experiment.py", line 75, in __init__
    _caller_globals=caller_globals)
  File "/home/username/anaconda3/lib/python3.7/site-packages/sacred/ingredient.py", line 57, in __init__
    gather_sources_and_dependencies(_caller_globals)
  File "/home/username/anaconda3/lib/python3.7/site-packages/sacred/dependencies.py", line 487, in gather_sources_and_dependencies
    sources = gather_sources(globs, experiment_path)
  File "/home/username/anaconda3/lib/python3.7/site-packages/sacred/dependencies.py", line 440, in get_sources_from_imported_modules
    return get_sources_from_modules(iterate_imported_modules(globs), base_path)
  File "/home/username/anaconda3/lib/python3.7/site-packages/sacred/dependencies.py", line 409, in get_sources_from_modules
    filename = os.path.abspath(mod.__file__)
  File "/home/username/anaconda3/lib/python3.7/posixpath.py", line 371, in abspath
    path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType
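The failure can be reproduced without sacred (my assumption, based on the traceback: some imported module, typically a namespace package, has `__file__` set to `None`, and sacred passes it straight to `os.path.abspath`):

```python
import os
import types

# Fake module mimicking what sacred sees for a namespace package.
mod = types.ModuleType("fake_namespace_pkg")
mod.__file__ = None

err = None
try:
    os.path.abspath(mod.__file__)   # the call on dependencies.py line 409
except TypeError as e:
    err = str(e)                    # same TypeError as in the traceback
```

A commonly suggested workaround is upgrading sacred, since later versions skip modules without a usable `__file__`.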
Perhaps there's a problem with some module's path variable; how can I solve it?
Thanks
When I install with
pip install -r requirements.txt
the message
smac 1.0.0 has requirement pygame>=2.0.0, but you'll have pygame 1.9.4 which is incompatible.
pops up.
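One way to resolve the conflict (a sketch, assuming nothing in the repo actually needs the older pygame) is to upgrade pygame to the version smac requires:

```shell
# Satisfy smac's pygame>=2.0.0 requirement, then install the rest.
pip install "pygame>=2.0.0"
pip install -r requirements.txt
```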
I am wondering where in your released code the observation-entities and action-groups are matched.
Hello, I'm interested in your work. I want to reproduce the transfer-learning result. As you mentioned, the model can be deployed to other scenarios without changing its architecture, and a figure is given. I want to reproduce it.
# --- Defaults ---
# --- pymarl options ---
runner: "episode" # Runs 1 env for an episode
mac: "basic_mac" # Basic controller
env: "sc2" # Environment name
env_args: {} # Arguments for the environment
batch_size_run: 1 # Number of environments to run in parallel
test_nepisode: 20 # Number of episodes to test for
test_interval: 2000 # Test after {} timesteps have passed
test_greedy: True # Use greedy evaluation (if False, will set epsilon floor to 0)
log_interval: 2000 # Log summary of stats after every {} timesteps
runner_log_interval: 2000 # Log runner stats (not test stats) every {} timesteps
learner_log_interval: 2000 # Log training stats every {} timesteps
t_max: 10000 # Stop running after this many timesteps
use_cuda: True # Use gpu by default unless it isn't available
buffer_cpu_only: True # If true we won't keep all of the replay buffer in vram
# --- Logging options ---
use_tensorboard: True # Log results to tensorboard
save_model: True # Save the models to disk
save_model_interval: 2000000 # Save models after this many timesteps
checkpoint_path: "" # Load a checkpoint from this path
evaluate: False # Evaluate model for test_nepisode episodes and quit (no training)
load_step: 0 # Load model trained on this many timesteps (0 if choose max possible)
save_replay: False # Saving the replay of the model loaded from checkpoint_path
local_results_path: "results" # Path for local results
# --- RL hyperparameters ---
gamma: 0.99
batch_size: 32 # Number of episodes to train on
buffer_size: 32 # Size of the replay buffer
lr: 0.0005 # Learning rate for agents
critic_lr: 0.0005 # Learning rate for critics
optim_alpha: 0.99 # RMSProp alpha
optim_eps: 0.00001 # RMSProp epsilon
grad_norm_clip: 10 # Reduce magnitude of gradients above this L2 norm
# --- Agent parameters. Should be set manually. ---
agent: "updet" # Options [updet, transformer_aggregation, rnn]
rnn_hidden_dim: 64 # Size of hidden state for default rnn agent
obs_agent_id: False # Include the agent's one_hot id in the observation
obs_last_action: False # Include the agent's last action (one_hot) in the observation
# --- Transformer parameters. Should be set manually. ---
token_dim: 5 # Marines. For other unit types (e.g. Zealot) this number can be different (6).
emb: 32 # embedding dimension of transformer
heads: 3 # head number of transformer
depth: 2 # block number of transformer
ally_num: 8 # number of ally (5m_vs_6m)
enemy_num: 8 # number of enemy (5m_vs_6m)
# --- Experiment running params ---
repeat_id: 1
label: "default_label"
This is the config I used to train 8m, and I changed ally_num and enemy_num to 5. Should I change checkpoint_path? Is the figure you gave showing the win rate during training? How can I get the same one?
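For what it's worth, in standard PyMARL the transfer run would point checkpoint_path at the previous run's saved models; a sketch of the relevant fields (the path below is hypothetical, not from the repo):

```yaml
# Load the 8m model and continue training on the new team size.
checkpoint_path: "results/models/<your_8m_run>"  # hypothetical run directory
load_step: 0        # 0 loads the latest saved model
evaluate: False     # keep training rather than only evaluating
ally_num: 5
enemy_num: 5
```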
Hi, I want to know why, when I run 5m_vs_6m, the win rate is only about 20%-40% with QMIX; the same happens with 'rnn' and 'updet'. I also use SC2 version 4.10, run 2M steps, torch==1.4.1, and made no changes to the code.
Has anyone else seen this? How do I reproduce these results? Thanks in advance if you can answer.
Hello, I am very interested in your work! I have studied the code, especially the class "TransformerAggregationAgent", but I have not found where you implement the policy decoupling. The only thing I found is
q_agg = torch.mean(outputs, 1)
q = self.q_linear(q_agg)
I am confused that you calculate the mean along one dimension and then map the result back to the actions. Could you please explain the motivation for this part? Really looking forward to your reply.
Thanks!
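As the shapes suggest (this is a minimal sketch with assumed dimensions, not the repo's exact code), the mean in the quoted snippet is taken over the *token* (entity) dimension, pooling the per-entity transformer embeddings into one vector, which `q_linear` then projects to per-action values:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, n_tokens, emb, n_actions = 4, 11, 32, 12   # assumed sizes
outputs = torch.randn(batch, n_tokens, emb)        # per-entity embeddings
q_linear = nn.Linear(emb, n_actions)

q_agg = torch.mean(outputs, 1)   # mean over tokens -> (batch, emb)
q = q_linear(q_agg)              # pooled embedding -> (batch, n_actions)
```

In other words, the aggregation variant trades the per-entity action mapping for a single pooled representation; the policy-decoupling behaviour described in the paper lives in the non-aggregation UPDeT agent, where entity embeddings are matched to action groups.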