Hierarchies of Reward Machines - Formalism & Environments

Implementation of the formalism and the environments described in the paper Hierarchies of Reward Machines. The implementation of the policy and hierarchy learning algorithms can be found here.

Overview
Installation
Usage
1. CraftWorld Tasks
2. WaterWorld Tasks
Citation
References

Disclaimer: In line with our previous work (Furelos-Blanco et al., 2021), we used the term hierarchy of subgoal automata instead of hierarchy of reward machines during the initial stages of the work. Hence, the code uses the former term, and the package is named gym_hierarchical_subgoal_automata.

Overview

The environments in this repository are described in the Hierarchies of Reward Machines paper. Below, we describe some implementation decisions that may help if you wish to extend the code or to spot the differences with respect to the codebases we build upon.

The CraftWorld environments are built on top of a modified version of Minigrid, which is installed from this repository. All our modifications are confined to the gym_minigrid/minigrid.py file:

Added colors white, pink, cyan and brown. See the COLORS and COLOR_TO_IDX global variables.
Added objects iron, table, cow, sugarcane, wheat, chicken, redstone, rabbit, squid and workbench. See the OBJECT_TO_IDX global variable and the new object classes.

The WaterWorld code is based on the one by Toro Icarte et al. (2018).

Installation

The code has been tested using Python 3.7 on Linux and macOS. We recommend using a virtual environment, since installing the requirements of this package may affect your current installation. To install the package, run the following commands:

$ cd hrm-formalism-envs
$ pip install -e .

To visualize the reward machines that compose a hierarchy, you should install Graphviz:

# Ubuntu
$ sudo apt install graphviz

# macOS
$ brew install graphviz
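After installing, a quick sanity check can confirm that both the package and the Graphviz `dot` executable are visible from Python. This is a minimal sketch using only the standard library; it assumes Graphviz places a `dot` executable on your `PATH` (as the standard Graphviz packages do), and it only checks that the package can be located, not that it works. `check_installation` is a hypothetical helper, not part of the package.

```python
import importlib.util
import shutil


def check_installation(package="gym_hierarchical_subgoal_automata"):
    """Report whether `package` is importable and Graphviz's `dot` is on PATH."""
    return {
        # find_spec returns None when the package cannot be located
        "package": importlib.util.find_spec(package) is not None,
        # shutil.which returns None when the executable is not on PATH
        "graphviz_dot": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_installation().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```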

Usage

The repository contains implementations for different CraftWorld and WaterWorld tasks, as well as for the hierarchies formalism we present in the aforementioned paper. The example.py and test.py files in the root of this repository illustrate how environments are created and how the methods for traversing a hierarchy (among other things) are used.

To create an environment, use the following Python code:

import gym, gym_hierarchical_subgoal_automata
env = gym.make(ENV_ID, params={"environment_seed": SEED})  # ENV_ID and SEED are placeholders

where ENV_ID is the identifier of the environment and SEED is an integer used to seed the random initialization of the environment. Unless random_restart: True is specified inside the params dictionary, the environment is always reset to the same initial state. Once the environment is created, you can call env.play() to interact with it manually.
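The parameters described above can be assembled by a small helper, which makes the role of environment_seed and random_restart explicit. This is a minimal sketch: `make_params` and `make_env` are hypothetical helpers (not part of the package), and `make_env` only wraps the `gym.make` call shown above.

```python
def make_params(seed, random_restart=False, **extra):
    """Build the `params` dict passed to gym.make.

    `environment_seed` and `random_restart` are the keys described above; any
    further keys (e.g. `grid_params`, `state_format`, `avoid_black`) are passed
    through unchanged.
    """
    if not isinstance(seed, int):
        raise TypeError("environment_seed must be an integer")
    params = {"environment_seed": seed, **extra}
    if random_restart:
        # Without this key the environment resets to the same initial state.
        params["random_restart"] = True
    return params


def make_env(env_id, seed, random_restart=False, **extra):
    """Hypothetical wrapper around the gym.make call shown above."""
    import gym
    import gym_hierarchical_subgoal_automata  # noqa: F401  (imported as in the snippet above)

    return gym.make(env_id, params=make_params(seed, random_restart, **extra))
```

For example, `make_env("CraftWorldBook-v0", seed=0).play()` would open the Book task for manual interaction, always starting from the same initial state.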

In the following sections we describe the identifiers of each task and the additional parameters that can be specified within the params dictionary.

CraftWorld Tasks

Task Identifiers

The identifiers for the tasks used in the paper are:

Task	Id
Batter	`CraftWorldBatter-v0`
Bucket	`CraftWorldBucket-v0`
Compass	`CraftWorldCompass-v0`
Leather	`CraftWorldLeather-v0`
Paper	`CraftWorldPaper-v0`
Quill	`CraftWorldQuill-v0`
Sugar	`CraftWorldSugar-v0`
Book	`CraftWorldBook-v0`
Map	`CraftWorldMap-v0`
MilkBucket	`CraftWorldMilkBucket-v0`
BookAndQuill	`CraftWorldBookAndQuill-v0`
MilkBucketAndSugar	`CraftWorldMilkBucketAndSugar-v0`
Cake	`CraftWorldCake-v0`

Grid Types

The params for each type of grid used in the paper are given below. Note that the grid_params dictionary must be placed within the params dictionary exemplified above, i.e. env = gym.make(ENV_ID, params={"grid_params": {...}}).

Open Plan (OP)

"grid_params": {
    "grid_type": "open_plan", "width": 7, "height": 7, "use_lava": False, "max_objs_per_class": 1
}

Open Plan + Lava (OPL)

"grid_params": {
    "grid_type": "open_plan", "width": 7, "height": 7, "use_lava": True, "num_lava": 1, "max_objs_per_class": 1
}

Four Rooms (FR)

"grid_params": {
    "grid_type": "four_rooms", "size": 13, "use_lava": False, "max_objs_per_class": 2
}

Four Rooms + Lava (FRL)

"grid_params": {
    "grid_type": "four_rooms", "size": 13, "use_lava": True, "max_objs_per_class": 2
}
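Because grid_params must be nested inside params, it can be convenient to keep the four configurations above as presets. This is a minimal sketch: `GRID_PRESETS` and `craftworld_params` are hypothetical names, and the preset values are copied verbatim from the four blocks above.

```python
import copy

# The four grid configurations listed above, keyed by their abbreviations.
GRID_PRESETS = {
    "OP": {"grid_type": "open_plan", "width": 7, "height": 7,
           "use_lava": False, "max_objs_per_class": 1},
    "OPL": {"grid_type": "open_plan", "width": 7, "height": 7,
            "use_lava": True, "num_lava": 1, "max_objs_per_class": 1},
    "FR": {"grid_type": "four_rooms", "size": 13,
           "use_lava": False, "max_objs_per_class": 2},
    "FRL": {"grid_type": "four_rooms", "size": 13,
            "use_lava": True, "max_objs_per_class": 2},
}


def craftworld_params(grid, seed):
    """Nest a copy of a grid preset under `grid_params` inside `params`."""
    if grid not in GRID_PRESETS:
        raise KeyError(f"unknown grid type {grid!r}; expected one of {sorted(GRID_PRESETS)}")
    return {
        "environment_seed": seed,
        # deepcopy so that callers cannot mutate the shared preset
        "grid_params": copy.deepcopy(GRID_PRESETS[grid]),
    }
```

The result is meant to be passed as `gym.make("CraftWorldBook-v0", params=craftworld_params("FRL", seed=0))`.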

Observation Format

The format of the observations can be modified using the state_format parameter, specified within the params dictionary when creating the environment. There are three possible values:

tabular - An integer representing the agent's position in the grid.
one_hot - The same position as tabular, encoded as a one-hot vector.
full_obs - The usual Minigrid observation, but applied to the whole grid (i.e., not egocentric): one matrix for the object ids, one for the color ids and one for the status of the objects.
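The relation between tabular and one_hot can be illustrated with a small sketch. It assumes a row-major cell ordering (index = y * width + x), which is an assumption for illustration only: the package may order cells differently. `tabular_index` and `one_hot` are hypothetical helpers.

```python
def tabular_index(x, y, width):
    """Map a grid cell (x, y) to a single integer.

    ASSUMPTION: row-major ordering (y * width + x); the package may differ.
    """
    return y * width + x


def one_hot(index, num_cells):
    """Return a list of length `num_cells` with a 1 at `index` and 0 elsewhere."""
    if not 0 <= index < num_cells:
        raise ValueError("index out of range")
    vec = [0] * num_cells
    vec[index] = 1
    return vec
```

Under this assumption, cell (2, 3) of a 7x7 grid has tabular index 23, and its one-hot encoding is a vector of length 49 with a single 1 at position 23.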

WaterWorld Tasks

Task Identifiers

The identifiers for the tasks used in the paper are:

Task	Id
RG	`WaterWorldRG-v0`
BC	`WaterWorldBC-v0`
MY	`WaterWorldMY-v0`
RG&BC	`WaterWorldRGAndBC-v0`
BC&MY	`WaterWorldBCAndMY-v0`
RG&MY	`WaterWorldRGAndMY-v0`
RGB	`WaterWorldRGB-v0`
CMY	`WaterWorldCMY-v0`
RGB&CMY	`WaterWorldRGBAndCMY-v0`

Scenario Types

There are two scenarios: without dead-ends (WOD) and with dead-ends (WD). Dead-ends are graphically represented by black balls that the agent must avoid. By default, the environment has no dead-ends (i.e., it is WOD). To enable dead-ends, set the avoid_black flag to True inside the params dictionary, i.e. env = gym.make(ENV_ID, params={"avoid_black": True, ...}).
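The task identifiers in the table follow a regular pattern ("&" becomes "And"), and the scenario only toggles avoid_black. This is a minimal sketch: `waterworld_id` and `waterworld_params` are hypothetical helpers, and the naming rule is inferred from the table above rather than taken from the package.

```python
WATERWORLD_TASKS = ["RG", "BC", "MY", "RG&BC", "BC&MY", "RG&MY", "RGB", "CMY", "RGB&CMY"]


def waterworld_id(task):
    """Convert a task name from the table into its gym id ('&' becomes 'And')."""
    return f"WaterWorld{task.replace('&', 'And')}-v0"


def waterworld_params(seed, scenario="WOD"):
    """Build params for the scenario without (WOD) or with (WD) dead-ends."""
    if scenario not in ("WOD", "WD"):
        raise ValueError("scenario must be 'WOD' or 'WD'")
    params = {"environment_seed": seed}
    if scenario == "WD":
        # Black balls become dead-ends only when avoid_black is True.
        params["avoid_black"] = True
    return params
```

For instance, the RG&BC task with dead-ends would be created with `gym.make(waterworld_id("RG&BC"), params=waterworld_params(0, "WD"))`.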

Citation

If you use this code in your work, please use the following citation:

@inproceedings{FurelosBlancoLJBR23,
  author       = {Daniel Furelos-Blanco and
                  Mark Law and
                  Anders Jonsson and
                  Krysia Broda and
                  Alessandra Russo},
  title        = {{Hierarchies of Reward Machines}},
  booktitle    = {Proceedings of the 40th International Conference on Machine Learning (ICML)},
  year         = {2023}
}

Remember to cite the original papers where the domains were proposed (see Overview for details).

References

Toro Icarte, R.; Klassen, T. Q.; Valenzano, R. A.; and McIlraith, S. A. 2018. Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning (ICML).
Furelos-Blanco, D.; Law, M.; Jonsson, A.; Broda, K.; and Russo, A. 2021. Induction and Exploitation of Subgoal Automata for Reinforcement Learning. Journal of Artificial Intelligence Research 70.
Furelos-Blanco, D.; Law, M.; Jonsson, A.; Broda, K.; and Russo, A. 2023. Hierarchies of Reward Machines. Proceedings of the 40th International Conference on Machine Learning (ICML).
