Partially Observable Process Gym (POPGym)

POPGym is designed to benchmark memory in deep reinforcement learning. It contains a set of environments and a collection of memory model baselines.

Setup

Packages will be published to PyPI soon. Until then, to install the environments:

git clone https://github.com/smorad/popgym
cd popgym
pip install .

To install the baselines and their dependencies, first install Ray:

pip install ray[rllib]

You must install Ray first, as ray-2.0.0 erroneously pins an old version of gym and will cause dependency conflicts. This has been patched upstream but did not make it into the latest release. Once Ray is installed, install POPGym:

git clone https://github.com/smorad/popgym
cd popgym
pip install ".[baselines]"

POPGym Environments

POPGym contains Partially Observable Markov Decision Process (POMDP) environments following the OpenAI Gym interface, where every single environment is procedurally generated. We find that much of RL is a huge pain-in-the-rear to get up and running, so our environments follow a few basic tenets:

  1. Painless setup - popgym environments require only gym, numpy, and mazelib as core dependencies, and can be installed with a single pip install.
  2. Laptop-sized tasks - None of our environments have large observation spaces or require GPUs to render. Well-designed models should be able to solve a majority of tasks in less than a day.
  3. True generalization - It is possible for memoryless agents to receive high rewards on environments by memorizing the layout of each level. To avoid this, all environments are heavily randomized.
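The Gym-style reset/step loop these tenets assume can be sketched with a self-contained toy environment. The class below is hypothetical and only illustrates the interface; it is not part of POPGym:

```python
import random

class RepeatPreviousToy:
    """Hypothetical stand-in for a POPGym environment: the agent is
    rewarded for outputting the observation it saw on the previous
    step. Real POPGym environments expose this same reset()/step()
    interface through OpenAI Gym."""

    def __init__(self, episode_len=10):
        self.episode_len = episode_len

    def reset(self):
        self.t = 0
        self.prev_obs = random.randint(0, 3)
        return self.prev_obs

    def step(self, action):
        # Reward the agent for repeating the previous observation
        reward = 1.0 if action == self.prev_obs else 0.0
        self.prev_obs = random.randint(0, 3)  # draw the next observation
        self.t += 1
        return self.prev_obs, reward, self.t >= self.episode_len, {}

env = RepeatPreviousToy()
obs, total, done = env.reset(), 0.0, False
while not done:
    # Policy: repeat the observation we just saw, which is always correct here
    obs, reward, done, info = env.step(obs)
    total += reward
print(total)  # perfect score: one reward per step
```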

Environment Overview

The environments are split into set or sequence tasks. Ordering matters in sequence tasks (e.g. the order of the button presses in Simon matters), and does not matter in set tasks (e.g. the "count" in blackjack does not change if you swap o_{t-1} and o_{t-k}). We provide a table of the environments below. The frames per second (FPS) was computed by running the popgym-perf-test.ipynb notebook on the Google Colab free tier, stepping and resetting a single environment for 100k timesteps. We also provide the same benchmark run on a Macbook Air (2020). With multiprocessing, environment FPS scales roughly linearly with the number of processes.

| Environment | Problem Class | Temporal Ordering | Colab FPS | Macbook Air (2020) FPS |
|---|---|---|---|---|
| Battleship | Long-term memory | None | 117,158 | 235,402 |
| Concentration | Long-term memory | Weak | 47,515 | 157,217 |
| Higher/Lower | Card counting | None | 24,312 | 76,903 |
| Labyrinth Escape | Navigation | Strong | 1,399 | 41,122 |
| Labyrinth Explore | Navigation | Strong | 1,374 | 30,611 |
| Minesweeper | Long-term memory | None | 8,434 | 32,003 |
| Multiarmed Bandit | Noisy dynamics | None | 48,751 | 469,325 |
| Autoencode | Long-term memory | Strong | 121,756 | 251,997 |
| Repeat First | Simple | None | 23,895 | 155,201 |
| Repeat Previous | Simple | Strong | 50,349 | 136,392 |
| Stateless Cartpole | Control | Strong | 73,622 | 218,446 |
| Noisy Stateless Cartpole | Noisy dynamics | Strong | 6,269 | 66,891 |
| Stateless Pendulum | Control | Strong | 8,168 | 26,358 |
| Noisy Stateless Pendulum | Noisy dynamics | Strong | 6,808 | 20,090 |

Feel free to rerun this benchmark using the same Colab notebook.
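The set vs. sequence distinction above can be made concrete with two toy scoring functions (hypothetical, for illustration only; these are not the environments' actual reward functions):

```python
# A "set" task's outcome depends only on which observations occurred;
# a "sequence" task's outcome also depends on their order.

def set_task_score(observations):
    # e.g. a blackjack-style count: the order of past cards is irrelevant
    return sum(observations)

def sequence_task_score(observations, target):
    # e.g. Simon: the exact order of presses must match
    return 1.0 if observations == target else 0.0

a = [1, 2, 3]
b = [3, 2, 1]  # same set of observations, different order

assert set_task_score(a) == set_task_score(b)        # order-invariant
assert sequence_task_score(a, [1, 2, 3]) == 1.0
assert sequence_task_score(b, [1, 2, 3]) == 0.0      # order matters
```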

Environment Descriptions

Concentration

The quintessential memory game, sometimes known simply as "memory". A deck of cards is shuffled and placed face-down. The agent picks two cards to flip face up; if the cards match in rank, they are removed from play and the agent receives a reward. If they do not match, they are placed back face-down. The agent must remember where it has seen cards in the past.

Higher/Lower

Guess whether the next card drawn from the deck is higher or lower than the previously drawn card. The agent should keep a running count, as in blackjack, and modify its bets accordingly, though this game is significantly simpler than blackjack.
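A minimal sketch of such a counting policy, assuming a standard 52-card deck of 13 ranks (illustrative only, not the actual environment code): the policy tracks which ranks remain and guesses whichever direction covers more of the remaining cards.

```python
import random

def play_higher_lower(seed=0):
    """Play one deck of higher/lower with a card-counting policy and
    return the fraction of correct guesses (ties count as incorrect)."""
    rng = random.Random(seed)
    deck = [rank for rank in range(13) for _ in range(4)]  # 13 ranks, 4 suits
    rng.shuffle(deck)
    remaining = {rank: 4 for rank in range(13)}  # the agent's memory
    prev = deck[0]
    remaining[prev] -= 1
    correct = 0
    for card in deck[1:]:
        higher = sum(n for rank, n in remaining.items() if rank > prev)
        lower = sum(n for rank, n in remaining.items() if rank < prev)
        guess_higher = higher >= lower  # bet on the more likely direction
        if (card > prev and guess_higher) or (card < prev and not guess_higher):
            correct += 1
        remaining[card] -= 1
        prev = card
    return correct / 51

acc = play_higher_lower()
print(acc)
```

A memoryless policy cannot track `remaining`, which is exactly the advantage the environment is designed to measure.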

Battleship

One-player battleship. Select a grid square to launch an attack, and receive confirmation of whether you hit the target. The agent should use memory to remember which grid squares were hits and which were misses, allowing it to complete the episode sooner.

Multiarmed Bandit

Over an episode, solve a multiarmed bandit problem by maximizing the expected reward. The agent should use memory to keep a running mean and variance for each bandit.
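A sketch of how an agent's memory might track each arm's statistics online, using Welford's algorithm for a numerically stable running mean and variance (illustrative only, not POPGym code):

```python
class RunningStats:
    """Online mean and sample variance via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# One tracker per arm; here we update a single arm with four rewards
stats = RunningStats()
for reward in [1.0, 0.0, 1.0, 1.0]:
    stats.update(reward)
print(stats.mean, stats.variance)  # 0.75 0.25
```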

Minesweeper

Classic minesweeper, but with reduced vision range. The agent only has vision of the surroundings near its last sweep. The agent must use memory to remember where the bombs are.

Repeat Previous

Output the (t-k)th observation for a reward.
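The memory this task demands is exactly a k-step delay buffer, which can be sketched with a bounded deque (hypothetical loop, not the actual environment code):

```python
from collections import deque
import random

k = 3                       # assumed delay; the real env picks its own k
buffer = deque(maxlen=k)    # holds the last k observations
score = 0

rng = random.Random(0)
observations = [rng.randint(0, 9) for _ in range(20)]

for t, obs in enumerate(observations):
    if len(buffer) == k:
        action = buffer[0]  # the observation from step t - k
        score += int(action == observations[t - k])
    buffer.append(obs)

print(score)  # correct on every step once the buffer is full: 17 of 20
```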

Repeat First

Output the zeroth observation for a reward.

Autoencode

The agent receives k observations, then must output them in the same order.

Stateless Cartpole

Classic cartpole, except the velocity and angular velocity magnitudes are hidden. The agent must use memory to compute rates of change.
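The required memory can be as small as one past observation: with the previous position remembered, the hidden velocity is recoverable by finite differencing (an illustrative sketch; the timestep and positions below are made up):

```python
dt = 0.02                              # assumed control timestep
positions = [0.00, 0.01, 0.03, 0.06]   # observed cart positions over time

prev = None
velocities = []
for x in positions:
    if prev is not None:
        # Backward difference: estimate of the hidden velocity
        velocities.append((x - prev) / dt)
    prev = x

print(velocities)
```

A memoryless agent sees only the current position and cannot form this difference, which is why the task requires memory.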

Stateless Pendulum

Classic pendulum, but the velocity and angular velocity are hidden from the agent. The agent must use memory to compute rates of change.

Labyrinth Escape

Escape randomly-generated labyrinths. The agent must remember wrong turns it has taken to find the exit.

Labyrinth Explore

Explore as much of the labyrinth as possible in the time given. The agent must remember where it has been to maximize reward.
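The memory both labyrinth tasks demand amounts to a visited set. The toy agent below explores an open grid by preferring unvisited cells (illustrative only; the real environments are procedurally generated mazes with walls):

```python
def explore(width, height, steps):
    """Greedy exploration of an open grid, remembering visited cells.
    Returns the number of distinct cells visited."""
    pos = (0, 0)
    visited = {pos}  # the agent's memory
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(steps):
        candidates = [
            (pos[0] + dx, pos[1] + dy)
            for dx, dy in moves
            if 0 <= pos[0] + dx < width and 0 <= pos[1] + dy < height
        ]
        # Prefer cells we have not seen; fall back to any legal move
        unvisited = [c for c in candidates if c not in visited]
        pos = unvisited[0] if unvisited else candidates[0]
        visited.add(pos)
    return len(visited)

print(explore(4, 4, 30))  # covers all 16 cells
```

Without the `visited` set the agent would revisit cells at random, which is exactly the failure mode of a memoryless policy on these tasks.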

POPGym Baselines

We implement the following baselines as RLlib custom models:

  1. MLP
  2. Positional MLP
  3. Framestacking
  4. Temporal Convolution
  5. Elman Networks
  6. Long Short-Term Memory
  7. Gated Recurrent Units
  8. Independently Recurrent Neural Networks
  9. Fast Autoregressive Transformers
  10. Fast Weight Programmers
  11. Legendre Memory Units
  12. Diagonal State Space Models
  13. Differentiable Neural Computers

To add your own custom model, inherit from BaseModel and implement the initial_state and memory_forward functions, and define your model configuration using MODEL_CONFIG. To use any of these models, or your own, in Ray, simply add it to the Ray config:

import ray
from popgym.baselines.ray_models.ray_lstm import LSTM
config = {
   ...
   "model": {
      "custom_model": LSTM,
      "custom_model_config": {"hidden_size": 128},
   },
}
ray.tune.run("PPO", config=config)

Each model defines a MODEL_CONFIG whose keys and values you can override via custom_model_config. See ppo.py for an in-depth example.

Contributing

Steps to follow:

  1. Fork this repo in github
  2. Clone your fork to your machine
  3. Move your environment into the forked repo
  4. Install precommit in the fork (see below)
  5. Write a unittest in tests/, see other tests for examples
  6. Add your environment to ALL_ENVS in popgym/__init__.py
  7. Make sure you don't break any tests by running pytest tests/
  8. Git commit and push to your fork
  9. Open a pull request on github
# Step 4: Install pre-commit in the fork
pip install pre-commit
cd popgym  # the fork you cloned in step 2
pre-commit install
