dhdev0 / muzero-unplugged

PyTorch implementation of MuZero Unplugged for Gym environments. It supports a wide range of action and observation spaces, both discrete and continuous.

License: GNU General Public License v3.0

Python 92.12% Jupyter Notebook 7.47% Dockerfile 0.41%
deep-learning deep-reinforcement-learning gym lstm machine-learning neural-network python3 pytorch reinforcement-learning transformer arxiv arxiv-papers gym-environments monte-carlo-tree-search muzero resnetv1 resnetv2 rl muzero-unplugged

MuZero Unplugged

PyTorch implementation of MuZero Unplugged. It is based on MuZero and incorporates the new features of MuZero Unplugged.

MuZero Unplugged is an extension of the original MuZero algorithm.

Key features:

  • The Demonstration buffer, which is a collection of expert demonstrations that guide the agent's learning. These demonstrations can come from human players or other agents.
  • The Reanalyze buffer, which is a collection of the agents' own demonstrations.
  • The introduction of these two new buffers allows the algorithm to work with environments that do not have a simulator, or to reduce the amount of simulation needed as it does not need to wait for the simulator to generate new states.
  • The ability to create your own expert demonstrations by experimenting with the simulation.
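
To make the Reanalyze idea concrete, the sketch below recomputes an n-step value target for a stored game from refreshed value estimates. It illustrates the kind of bootstrapped target described in the MuZero papers and is not taken from this repository; the function `n_step_value_target`, its arguments, and the sample numbers are all hypothetical.

```python
def n_step_value_target(rewards, values, t, n, gamma):
    """Return the n-step bootstrapped value target for position t.

    rewards[i] is the reward observed after acting in position i.
    values[i] is the (possibly refreshed) value estimate for position i;
    values may hold one extra entry for the position after the last action.
    """
    horizon = min(t + n, len(rewards))
    target = 0.0
    for i in range(t, horizon):
        target += (gamma ** (i - t)) * rewards[i]
    # Bootstrap from the value estimate at the horizon when one is available
    # (past the end of a finished game there is nothing to bootstrap from).
    if horizon < len(values):
        target += (gamma ** (horizon - t)) * values[horizon]
    return target


# A stored 4-step game: the rewards never change, but re-evaluating it with a
# newer network gives fresh value estimates and therefore a fresh target.
rewards = [0.0, 0.0, 1.0, 0.0]
old_values = [0.1, 0.2, 0.3, 0.0, 0.0]
new_values = [0.5, 0.6, 0.8, 0.0, 0.0]

old_target = n_step_value_target(rewards, old_values, t=0, n=2, gamma=0.5)
new_target = n_step_value_target(rewards, new_values, t=0, n=2, gamma=0.5)
```

The stored trajectory is reused as-is; only the value estimates change, which is why no new simulator steps are needed to produce updated training targets.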

MuZero -> MuZero Unplugged -> Stochastic MuZero

Getting started

Local Installation

PIP dependencies: requirements.txt

git clone https://github.com/DHDev0/Muzero-unplugged.git

cd Muzero-unplugged

pip install -r requirements.txt

If you run into difficulties, refer to the first cell of the Tutorial or use the Dockerfile.

Docker

Build image (build time: 22 min, memory consumption: 8.75 GB):

docker build -t muzero_unplugged .

(do not forget the ending dot)

Start container:

docker run --cpus 2 --gpus 1 -p 8888:8888 muzero_unplugged
#or
docker run --cpus 2 --gpus 1 --memory 2000M -p 8888:8888 muzero_unplugged
#or
docker run --cpus 2 --gpus 1 --memory 2000M -p 8888:8888 --storage-opt size=15g muzero_unplugged

The docker run command starts JupyterLab at https://localhost:8888/lab?token=token (you need the token) with all the dependencies needed for CPU and GPU (Nvidia) compute.

Option meaning:
--cpus 2 -> number of CPU cores allocated (2)
--gpus 1 -> number of GPUs allocated (1)
--storage-opt size=15g -> allocated storage capacity of 15 GB (does not work on Windows WSL)
--memory 2000M -> allocated RAM capacity of 2 GB
-p 8888:8888 -> opens port 8888 for JupyterLab (default port of the Dockerfile)

Stop the container:

docker stop $(docker ps -q --filter ancestor=muzero_unplugged)

Delete the image:

docker rmi -f muzero_unplugged

Dependency

Language:

  • Python 3.8 to 3.10 (bounded by the backward compatibility of Ray and PyTorch)

Library:

  • torch 1.13.0
  • torchvision 0.14.0
  • ray 2.0.1
  • gymnasium 0.27.0
  • matplotlib >=3.0
  • numpy 1.21.5

More details at: requirements.txt

Usage

Jupyter Notebook

For a practical example, see the Tutorial.

CLI

Set your config file (example): https://github.com/DHDev0/Muzero-unplugged/blob/main/config/

First, cd to the project folder:

cd Muzero-unplugged

Construct your dataset through experimentation:

python muzero_cli.py human_buffer config/name_config.json

Training:

python muzero_cli.py train config/name_config.json

Training with report:

python muzero_cli.py train report config/name_config.json

Inference (play the game with a specific model):

python muzero_cli.py play config/name_config.json

Training and inference:

python muzero_cli.py train play config/name_config.json

Benchmark model:

python muzero_cli.py benchmark config/name_config.json

Training + report + inference + benchmark:

python muzero_cli.py train report play benchmark config/name_config.json
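
If you prefer to launch these commands from a script, the modes can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository: `build_cli_command` and `KNOWN_MODES` are names introduced here, and the list of modes is simply the set that appears in the commands above.

```python
import sys

# Mode names taken from the commands listed above.
KNOWN_MODES = ("human_buffer", "train", "report", "play", "benchmark")


def build_cli_command(modes, config_path, python=sys.executable):
    """Build the argument list for one muzero_cli.py invocation."""
    if not modes:
        raise ValueError("at least one mode is required")
    unknown = [m for m in modes if m not in KNOWN_MODES]
    if unknown:
        raise ValueError(f"unknown mode(s): {unknown}")
    return [python, "muzero_cli.py", *modes, config_path]


command = build_cli_command(["train", "report"], "config/name_config.json")
print(" ".join(command[1:]))
# To actually launch it from the project folder:
# import subprocess; subprocess.run(command, check=True)
```

Building an argument list rather than a single shell string avoids quoting problems when the config path contains spaces.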

Features

Core MuZero features:

  • Works with any Gymnasium environment/game. (any combination of continuous and/or discrete action and observation spaces)
  • MLP network for game state observation. (Multilayer perceptron)
  • LSTM network for game state observation. (LSTM)
  • Transformer decoder for game state observation. (Transformer)
  • Residual network for RGB observation using render. (Resnet-v2 + MLP)
  • Residual LSTM network for RGB observation using render. (Resnet-v2 + LSTM)
  • MCTS with 0 simulations (use of the prior) or any number of simulations.
  • Model weights automatically saved at the best self-play average reward.
  • Prioritized or uniform sampling from the replay buffer.
  • Illegal moves are handled with a negative reward.
  • Loss scaled by the importance sampling ratio.
  • Custom "Loss function" class to apply transformations and losses to labels/predictions.
  • Load your pretrained model from its tag number.
  • Single player mode.
  • Training / inference report. (not live, generated at the end of training)
  • Single/multi GPU or single/multi CPU for inference, training and self-play.
  • Supports mixed precision for training and inference. (torch_type: bfloat16, float16, float32, float64)
  • PyTorch gradient scaler for mixed precision in training.
  • Tutorial with Jupyter notebook.
  • Pretrained weights for cartpole. (you will find the weights, report and config file)
  • Commented with links/page references to the paper.
  • Supports Windows, Linux and macOS.
  • Fixes the PyTorch linear layer initialization. (refer to: https://tinyurl.com/ykrmcnce)
  • Support of Gymnasium 0.27.0
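
The prioritized sampling and importance-sampling items above can be illustrated with the standard prioritized-replay formulas. This is a generic sketch under stated assumptions, not the repository's replay buffer: the function names, the `alpha` and `beta` exponents, and the example priorities are hypothetical, and priorities are assumed to be strictly positive.

```python
import random


def sampling_probabilities(priorities, alpha=1.0):
    """Turn raw priorities into a distribution: P(i) proportional to p_i ** alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probabilities, beta=1.0):
    """Weights w_i = (1 / (N * P(i))) ** beta, normalised by the largest weight."""
    n = len(probabilities)
    raw = [(1.0 / (n * p)) ** beta for p in probabilities]
    largest = max(raw)
    return [w / largest for w in raw]


def sample_batch(priorities, batch_size, alpha=1.0, beta=1.0, rng=random):
    """Sample indices by priority and return (indices, per-sample loss weights)."""
    probs = sampling_probabilities(priorities, alpha)
    weights = importance_weights(probs, beta)
    indices = rng.choices(range(len(priorities)), weights=probs, k=batch_size)
    return indices, [weights[i] for i in indices]


priorities = [1.0, 1.0, 2.0]  # one (assumed positive) priority per stored position
probs = sampling_probabilities(priorities)
weights = importance_weights(probs)
indices, loss_weights = sample_batch(priorities, batch_size=4, rng=random.Random(0))
```

Setting `alpha=0` makes every position equally likely, which recovers uniform sampling. The per-sample weights are what would multiply each loss term so that frequently sampled positions do not dominate the gradient.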

New MuZero Unplugged add-on features:

  • The ability to accommodate any number of players, given player cycle information.
  • A reanalyze buffer (offline learning) and reanalyze ratio functionality.
  • The capability to construct human play datasets through experimentation (CLI only).
  • The capability to load human play datasets into the Demonstration buffer or Replay buffer for training.
  • The ability to specify the number of sampled actions that MCTS should use.
  • A priority scale on the neural network and replay buffer priority.
  • Various options for bounding, saving, and deleting games from the reanalyze buffer.
  • The reanalyze_fraction_mode, which allows a statistical or quantitative switch between new game data and reanalyze data according to the ratio of reanalyze buffer to replay buffer.
  • The implementation of a scaling parameter of the value loss.
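
One way to read the statistical versus quantitative switch is sketched below. This is an interpretation, not the repository's implementation: the function `split_batch`, the mode strings `"statistical"` and `"quantitative"`, and the `reanalyze_ratio` argument are assumptions made for illustration, and the actual config values may differ.

```python
import random


def split_batch(batch_size, reanalyze_ratio, mode, rng=random):
    """Return (n_from_reanalyze_buffer, n_from_replay_buffer) for one batch."""
    if not 0.0 <= reanalyze_ratio <= 1.0:
        raise ValueError("reanalyze_ratio must be in [0, 1]")
    if mode == "quantitative":
        # Fixed split: the same number of reanalyzed samples in every batch.
        n_reanalyze = round(batch_size * reanalyze_ratio)
    elif mode == "statistical":
        # Random split: each slot is filled from the reanalyze buffer with
        # probability reanalyze_ratio, so the ratio only holds on average.
        n_reanalyze = sum(rng.random() < reanalyze_ratio for _ in range(batch_size))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return n_reanalyze, batch_size - n_reanalyze


fixed = split_batch(batch_size=8, reanalyze_ratio=0.75, mode="quantitative")
drawn = split_batch(batch_size=8, reanalyze_ratio=0.75, mode="statistical",
                    rng=random.Random(0))
```

Under this reading, the quantitative mode gives an exact, reproducible mix in every batch, while the statistical mode matches the ratio only in expectation.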

TODO:

  • Hyperparameter search. (pseudo-code available in self_play.py)
  • Training and deployment on cloud clusters using Kubeflow, Airflow or Ray for AWS, GCP and Azure.

How to make your own custom gym environment?

Refer to the Gym documentation

Once your custom environment is registered in Gym, you can call it from MuZero.

Authors

Subjects

Deep reinforcement learning

License

GPL-3.0 license
