This repository contains the code for reproducing the results published in Continual Auxiliary Task Learning. The paper was published at NeurIPS 2021.
- Matthew McLeod
- Chunlok Lo
- Andrew Jacobsen
- Matthew Schlegel
- Raksha Kumaraswamy
- Adam White
- Martha White
Learning auxiliary tasks, such as multiple predictions about the world, can provide many benefits to reinforcement learning systems. A variety of off-policy learning algorithms have been developed to learn such predictions, but as yet there is little work on how to adapt the behavior to gather useful data for those off-policy predictions. In this work, we investigate a reinforcement learning system designed to learn a collection of auxiliary tasks, with a behavior policy learning to take actions to improve those auxiliary predictions. We highlight the inherent non-stationarity in this continual auxiliary task learning problem, for both prediction learners and the behavior learner. We develop an algorithm based on successor features that facilitates tracking under non-stationary rewards, and prove the separation into learning successor features and rewards provides convergence rate improvements. We conduct an in-depth study into the resulting multi-prediction learning system.
The entry point for running the code is parallel/toml_parallel.jl. This script parallelizes our sweeps and works with all the listed config files.
Julia can be downloaded from https://julialang.org/downloads/, and details on the language can be found in its documentation. We only guarantee this code works for versions up to v1.5.x.
- Install Julia version 1.6.x from https://julialang.org/downloads/
- Add `julia` to your path
- `cd` to the `curiosity` directory
- Change branch to NeurIPS2021
- Instantiate the project:

```
%> julia --project
julia> ]instantiate
```
- To run an experiment from its config:

```
julia --project parallel/toml_parallel.jl <<config_file.toml>>
```
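As a rough sketch, a config might look something like the following. The key names here are assumptions based on typical Reproduce.jl-style configs, not copied from this repo; consult the repo's own config files for the real schema.

```toml
# Hypothetical sketch of an experiment config (key names are assumptions).
[config]
save_dir = "results/my_experiment"   # folder where results are placed

[sweep_args]
# parameter settings swept over in parallel
alpha = [0.1, 0.01]
seed = "1:5"
```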
This will run the experiment and place the results in the folder defined in the TOML file. Running on a larger cluster is possible, but requires more setup; see Reproduce.jl for an example and details.
- Figure 2:
- Figure 3:
- Figure 4:
- Figure 5:
- Figure 6:
- Figure 7:
- Figure 8:
Analyzing data with Reproduce.jl and ReproducePlotUtils.jl.
You can see details on how to analyze data with Reproduce.jl and ReproducePlotUtils.jl in the plotting directory. The overall procedure is to:
- Construct an `ItemCollection` from the experiment directory; this contains all the settings run in the experiment.
- Search for the subset of settings you want using `search`.
- Plot this subset.
There are various utilities to find the best parameter setting given a set of sweep arguments.
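The workflow above can be sketched as follows. This is a minimal, hedged sketch: the function names follow the Reproduce.jl API as described above, but exact signatures and the results path are assumptions that may differ between versions.

```julia
using Reproduce
using ReproducePlotUtils

# Load all settings run in the experiment from its results directory
# ("results/my_experiment" is a placeholder path).
ic = ItemCollection("results/my_experiment")

# Search for the subset of settings of interest
# (the filter keys here are hypothetical examples).
sub_ic = search(ic, Dict("alpha" => 0.1))

# ...then plot this subset using the utilities in the plotting directory.
```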