
timechamber's Introduction

TimeChamber: A Massively Parallel Large Scale Self-Play Framework


TimeChamber is a large-scale self-play framework built on parallel simulation. Self-play algorithms typically require a lot of hardware, especially in 3D physically simulated environments. We provide a self-play framework that achieves fast training and evaluation with ONLY ONE GPU. TimeChamber is developed with the following key features:

  • Parallel Simulation: TimeChamber is built within Isaac Gym, a fast GPU-based simulation platform that supports running thousands of environments in parallel on a single GPU. For example, on one NVIDIA Laptop RTX 3070Ti GPU, TimeChamber can reach 80,000+ mean FPS by running 4,096 environments in parallel.
  • Parallel Evaluation: TimeChamber can quickly compute the ELO ratings of dozens of policies (a measure of their relative strength), and it supports multi-player ELO calculation via multi-elo. Inspired by Vectorization techniques for fast population-based training, we leverage vectorized models to evaluate different policies in parallel (a sketch of the basic Elo update follows this list).
  • Prioritized Fictitious Self-Play Benchmark: We implement a classic PPO self-play algorithm on top of rl_games, with a prioritized player pool to avoid strategy cycles and improve the diversity of the training policy.
  • Competitive Multi-Agent Tasks: Inspired by OpenAI RoboSumo and ASE, we introduce three competitive multi-agent tasks (Ant Sumo, Ant Battle, and Humanoid Strike) as examples. The efficiency of our self-play framework has been tested on these tasks. After days of training, our agents discover interesting physical skills such as pulling and jumping. Contributions of your own environments are welcome!
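
For reference, the standard two-player Elo update behind such ratings works as sketched below. This is illustrative only; TimeChamber's actual ratings are computed with multi-elo and its own vectorized evaluation code:

# Illustrative standard two-player Elo update (not TimeChamber's actual code).
def elo_update(rating_a, rating_b, score_a, k=32.0):
    # score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Example: a 1200-rated policy beats another 1200-rated policy.
print(elo_update(1200.0, 1200.0, 1.0))  # -> (1216.0, 1184.0)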

Installation


Download and follow the installation instructions of Isaac Gym: https://developer.nvidia.com/isaac-gym
Ensure that Isaac Gym works on your system by running one of the examples from the python/examples directory, such as joint_monkey.py. If you have any trouble running the samples, please follow the troubleshooting steps described in the Isaac Gym Preview Release 3/4 installation instructions.
Then install this repo:

pip install -e .

Quick Start


Tasks

Source code for the tasks can be found in timechamber/tasks. The detailed state/action/reward settings are documented there. More interesting tasks will come soon.

Humanoid Strike

Humanoid Strike is a 3D environment with two simulated humanoid physics characters. Each character is equipped with a sword and shield and has 37 degrees of freedom. The game is restarted if an agent goes outside the arena. The winner is determined at the terminating step by comparing how much damage each player dealt to its opponent with how much damage it received.

Ant Sumo

Ant Sumo is a 3D environment with simulated physics in which pairs of ant agents compete against each other. To win, an agent has to push its opponent out of the ring. Every agent has 100 hp. At each step, if an agent's body touches the ground, its hp is reduced by 1. An agent whose hp reaches 0 is eliminated.
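
As a rough, hypothetical sketch of the hp mechanic described above (tensor names and shapes are assumptions, not the task's actual buffers), the per-step update could look like this:

import torch

num_envs, num_agents = 4096, 2
hp = torch.full((num_envs, num_agents), 100.0)

# Assumed boolean contact flags from the simulator: True where a body touches the ground.
body_touches_ground = torch.zeros((num_envs, num_agents), dtype=torch.bool)

hp -= body_touches_ground.float()   # lose 1 hp per step of ground contact
eliminated = hp <= 0                # agents whose hp reaches 0 are eliminated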

Ant Battle

Ant Battle is an extension of Ant Sumo that supports more than two agents competing against each other. The battle ring shrinks over time, and any agent that goes out of the ring is eliminated.
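
A similarly hedged sketch of the shrinking-ring check (the radius, shrink rate, and tensor shapes are illustrative assumptions):

import torch

ring_radius = 5.0         # assumed initial ring radius
shrink_per_step = 0.001   # assumed shrink rate per simulation step

agent_xy = torch.zeros((4096, 4, 2))   # (num_envs, num_agents, 2) planar positions

ring_radius = max(ring_radius - shrink_per_step, 0.0)
out_of_ring = agent_xy.norm(dim=-1) > ring_radius   # eliminated when outside the ring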

Self-Play Training

To train a policy for one of the tasks, run, for example:

# run self-play training for Humanoid Strike task
python train.py task=MA_Humanoid_Strike headless=True
# run self-play training for Ant Sumo task
python train.py task=MA_Ant_Sumo train=MA_Ant_SumoPPO headless=True
# run self-play training for Ant Battle task
python train.py task=MA_Ant_Battle train=MA_Ant_BattlePPO headless=True

Key arguments to the training script follow the IsaacGymEnvs configuration and command line arguments. Other training arguments follow the rl_games config parameters; you can change them in timechamber/tasks/train/*.yaml. There are some arguments specific to self-play training (a sketch of how they interact follows the list):

  • num_agents: Number of agents in the Ant Battle environment; it should be larger than 1.
  • op_checkpoint: Path to the checkpoint used to load the initial opponent policy. If it is empty, the opponent agent uses a random policy.
  • update_win_rate: Win-rate threshold above which the current policy is added to the opponent's player pool.
  • player_pool_length: Maximum size of the player pool; the oldest policies are evicted first (FIFO).
  • games_to_check: Training warm-up; the player pool is not updated until the current policy has played this many games.
  • max_update_steps: If the current policy's update iterations exceed this number, the current policy is added to the opponent player pool.
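
Taken together, these arguments control when the current policy is snapshotted into the opponent player pool. A hypothetical sketch of that logic (function and variable names are illustrative, not the repository's actual code):

from copy import deepcopy

def maybe_update_player_pool(pool, policy, win_rate, games_played, update_steps,
                             update_win_rate, games_to_check, max_update_steps,
                             player_pool_length):
    warmed_up = games_played >= games_to_check          # games_to_check warm-up
    good_enough = win_rate >= update_win_rate           # update_win_rate threshold
    forced = update_steps >= max_update_steps           # max_update_steps fallback
    if warmed_up and (good_enough or forced):
        pool.append(deepcopy(policy))                   # freeze a copy of the current policy
        if len(pool) > player_pool_length:              # FIFO eviction of the oldest opponent
            pool.pop(0)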

Policies Evaluation

To evaluate your policies, run, for example:

# run testing for Ant Sumo policy
python train.py task=MA_Ant_Sumo train=MA_Ant_SumoPPO test=True num_envs=4 minibatch_size=32 headless=False checkpoint='models/ant_sumo/policy.pth'
# run testing for Humanoid Strike policy
python train.py task=MA_Humanoid_Strike train=MA_Humanoid_StrikeHRL test=True num_envs=4 minibatch_size=32 headless=False checkpoint='models/Humanoid_Strike/policy.pth' op_checkpoint='models/Humanoid_Strike/policy_op.pth'

You can set the opponent agent's policy using op_checkpoint. If it is empty, the opponent agent uses the same policy as checkpoint.
We use vectorized models to accelerate policy evaluation. Put your policies into the checkpoint directory and let them compete against each other in parallel:

# run testing for Ant Sumo policy
python train.py task=MA_Ant_Sumo train=MA_Ant_SumoPPO test=True headless=True checkpoint='models/ant_sumo' player_pool_type=vectorized

There are some arguments specific to self-play evaluation; you can change them in timechamber/tasks/train/*.yaml:

  • games_num: Total number of evaluation episodes.
  • record_elo: Set to True to record the ELO rating of your policies; after evaluation, you can check elo.jpg in your checkpoint directory.
  • init_elo: Initial ELO rating of each policy.

Building Your Own Task

You can build your own task following IsaacGymEnvs. Make sure the obs shape is correct and that info contains win, lose and draw:

import isaacgym
import timechamber
import torch

envs = timechamber.make(
    seed=0,
    task="MA_Ant_Sumo",
    num_envs=2,
    sim_device="cuda:0",
    rl_device="cuda:0",
)
# The obs shape should be (num_agents * num_envs, num_obs);
# the training agent's observations are obs[:num_envs].
print("Observation space is", envs.observation_space)
print("Action space is", envs.action_space)
obs = envs.reset()
for _ in range(20):
    # random actions for num_agents * num_envs = 2 * 2 actors
    obs, reward, done, info = envs.step(
        torch.rand((2 * 2,) + envs.action_space.shape, device="cuda:0")
    )
# info:
# {'win': tensor([Bool, Bool])
# 'lose': tensor([Bool, Bool])
# 'draw': tensor([Bool, Bool])}
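
If you are implementing the task yourself, here is a hypothetical sketch of how a custom task could populate the info dict in its terminating steps (the class name, method, and score buffers are placeholders, not TimeChamber's actual base classes):

import torch

class MyCompetitiveTask:
    # Hypothetical skeleton; only the info structure mirrors the example above.
    def build_info(self, my_score, op_score, done):
        # One bool per environment; the flags are only meaningful where done is True.
        win = done & (my_score > op_score)
        lose = done & (my_score < op_score)
        draw = done & (my_score == op_score)
        return {'win': win, 'lose': lose, 'draw': draw}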

Citing

If you use TimeChamber in your research, please use the following citation:

@misc{InspirAI,
  author = {Huang Ziming and Ziyi Liu and Wu Yutong and Flood Sung},
  title = {TimeChamber: A Massively Parallel Large Scale Self-Play Framework},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/inspirai/TimeChamber}},
}

timechamber's People

Contributors

zeldahuang, ziyiliubird


timechamber's Issues

Maybe a Bug

The following error occurred:

NotADirectoryError: [Errno 20] Not a directory: '/home/lzy/lzy/MARL/self-play/TimeChamber/timechamber/models/ant_sumo/policy.pth/../elo.jpg'

when I run policy evaluation:

python train.py task=MA_Ant_Sumo test=True headless=True checkpoint='models/ant_sumo/policy.pth'

Maybe this line has a problem: TimeChamber/timechamber/learning/ppo_sp_player.py, line 286.

The bug can be fixed by replacing

plt.savefig(self.params['load_path'] + '/../elo.jpg')

with

parent_path = os.path.dirname(self.params['load_path'])
plt.savefig(os.path.join(parent_path, 'elo.jpg'))

How much time does training take?

In the Humanoid Strike task's training config file, it seems that the number of epochs has to reach 100,000, which I assume may take about 10 days or more.
Could you give the specific number of training days for this?

numEnvs

How long does it take to train Humanoid Strike? I calculated that my computer would take about a month to train; is that normal? My GPU is an RTX 4080.
Why is training one epoch slower than before after setting numEnvs=8192?
What parameters should I change to make training faster?
Please let me know, thank you very much.

ModuleNotFoundError: No module named 'timechamber.ase'

When I run the code, there is an error:

Traceback (most recent call last):
  File "train.py", line 47, in <module>
    from timechamber.ase import ase_agent
ModuleNotFoundError: No module named 'timechamber.ase'

I find that there is no 'ase' directory in timechamber; how can I get ase?

Bug

if self.player_pool_type == 'multi_thread':
    return PFSPPlayerProcessPool(max_length=self.max_his_player_num,
elif self.player_pool_type == 'multi_process':
    return PFSPPlayerThreadPool(max_length=self.max_his_player_num,

Maybe lines 58 and 59 should be exchanged with lines 61 and 62? As written, 'multi_thread' returns a process pool and 'multi_process' returns a thread pool.

Suggestion to use RNN for multi-agent tasks

Hello,
Good job on the amazing work. I noticed that for the MA Humanoid Strike task you used a reward design similar to the one used for the boxing agents in this paper:

https://dl.acm.org/doi/abs/10.1145/3450626.3459761

I was thinking that strategic behavior might emerge between the sword fighters, and the results could be better, if you added a memory module (LSTM, GRU, transformer) as in the paper. Also, as far as I understand, in the literature on multi-agent partially observable MDPs, each agent should condition on the history of its observations when taking actions, both to maintain a more accurate belief about the global state and to account for the non-stationary environment.

Thanks

Can't find isaacgym module

Hello,
When I try to run train.py, the following message appears:

Traceback (most recent call last):
  File "train.py", line 33, in <module>
    import isaacgym
ModuleNotFoundError: No module named 'isaacgym'

But the thing is, Isaac Gym is installed and works on my computer; I can run IsaacGymEnvs without problems. Thanks a lot for any help.

Issue with contact forces when computing r damage in strike task

Hello,

When trying to inspect the value of each reward term for the strike task, I noticed that r_damage can be triggered even when the agents are not hitting each other. After further investigation, I found that the contact buffers can also include self-collisions in addition to contacts between different agents; for example, if an agent's hand collides with its own head, that contact is included in the buffer and used in the reward as if the agent had been hit by the opponent, which is not the case.

This could be solved by filtering the contact buffer by the bodies involved in the contact, but I saw that this functionality is not available right now in isaacgym. The workaround I can think of is to first detect whether there was contact between specific bodies by measuring the distance between them; once the distance is lower than some threshold, you assume there was contact, and then you check the magnitude of the contact force.
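
A hedged sketch of that workaround (the tensor names, shapes, and the 0.05 m threshold are assumptions for illustration, not the task's actual buffers):

import torch

sword_pos = torch.zeros((4096, 3))           # opponent sword-tip position per env
body_pos = torch.zeros((4096, 15, 3))        # this agent's rigid-body positions
contact_force = torch.zeros((4096, 15, 3))   # net contact forces from the simulator

near_sword = (body_pos - sword_pos.unsqueeze(1)).norm(dim=-1) < 0.05
hit_force = contact_force.norm(dim=-1) * near_sword.float()   # mask out self-collisions
r_damage_proxy = hit_force.sum(dim=-1)                        # count only plausible opponent hits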

Hope I was clear.

