e3b's Introduction

Exploration via Elliptical Episodic Bonuses

This repo contains code for the E3B algorithm described in the NeurIPS 2022 paper "Exploration via Elliptical Episodic Bonuses" by Mikael Henaff, Roberta Raileanu, Minqi Jiang and Tim Rocktäschel.

E3B is an exploration algorithm designed for contextual MDPs, where the environment changes every episode. Examples of contextual MDPs include procedurally generated environments such as MiniGrid, MiniHack, NetHack and ProcGen, as well as embodied AI settings such as Habitat, where the agent finds itself in a new indoor space each episode.

The algorithm is simple to implement and operates using an elliptical bonus computed at the episode level, in a feature space induced by an inverse dynamics model.
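As a rough illustration of the episodic elliptical bonus, here is a minimal Python/NumPy sketch. It assumes the features φ(s) have already been computed by the embedding network, which in E3B is trained with an inverse dynamics objective. Following the paper, the bonus for state s_t is φ(s_t)^T C^{-1} φ(s_t). Here C = λI + Σ φ(s_i) φ(s_i)^T, summed over the states already visited in the current episode, and C is reset at every episode boundary. The class name EllipticalEpisodicBonus and the ridge value λ = 0.1 are hypothetical choices made for this sketch; they are not this repo's API.

```python
import numpy as np


class EllipticalEpisodicBonus:
    """Hypothetical sketch of an episodic elliptical bonus b = phi^T C^{-1} phi.

    Features phi are plain arrays here; in E3B they would come from an
    embedding trained with an inverse dynamics loss. `ridge` (lambda) is an
    assumed value, not necessarily the one used in this repo.
    """

    def __init__(self, feature_dim: int, ridge: float = 0.1):
        self.feature_dim = feature_dim
        self.ridge = ridge
        self.reset()

    def reset(self) -> None:
        # Start of a new episode: C = ridge * I, so C^{-1} = I / ridge.
        self.cov_inv = np.eye(self.feature_dim) / self.ridge

    def step(self, phi) -> float:
        """Return the bonus for phi, then fold phi phi^T into C."""
        phi = np.asarray(phi, dtype=np.float64).reshape(-1)
        u = self.cov_inv @ phi
        bonus = float(phi @ u)  # phi^T C^{-1} phi, computed before the update
        # Sherman-Morrison rank-one update of C^{-1} for C <- C + phi phi^T,
        # which avoids re-inverting C at every step.
        self.cov_inv -= np.outer(u, u) / (1.0 + bonus)
        return bonus
```

With ridge=0.1, calling step([1.0, 0.0]) twice in the same episode returns 10.0 and then about 0.91. The second value is smaller because that feature direction has already been covered. A feature in an unvisited direction, such as [0.0, 1.0], still receives 10.0. Calling reset() at the start of the next episode restores the full bonus, which is what makes the bonus episodic rather than cumulative over training.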

Running the code

Code to run E3B on MiniHack and ViZDoom uses IMPALA as the base RL algorithm and is contained in the minihack folder. Code to run E3B on Habitat uses DD-PPO and is in the habitat-lab folder. Please see the README files in each folder for further instructions.

Citation

If you use this code in your work, please cite the following:

@inproceedings{E3B,
  title     =     {Exploration via Elliptical Episodic Bonuses},
  author    =     {Mikael Henaff and Roberta Raileanu and Minqi Jiang and Tim Rocktäschel},
  booktitle =     {Advances in Neural Information Processing Systems (NeurIPS)},
  year      =     {2022}
}

Acknowledgements

This repo is built on the TorchBeast codebase. We also use parts of the RIDE and NovelD codebases for baselines.

License

The majority of this project is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: NovelD is licensed under the Apache 2.0 license.

e3b's Issues

Atari exploration problem

I have tried several different hyperparameter settings, but I have been unable to get my model to converge on some hard-exploration Atari games (Gravitar and H.E.R.O.), even after 5 billion timesteps. Can you suggest effective hyperparameters for these games, or explain why my model might not be converging?

Here are the hyperparameters I've tested: intweight (3e-07, 2.0) and reward_norm (all, intr, ext). I tried both extremes, and the inverse dynamics loss seems to stay at roughly the same value without decreasing.
