Here I'd like to share some random thoughts on this package, organized around three aspects:
- The existing core components in the current version (v0.3.0).
- What is missing to support distributed reinforcement learning algorithms?
- The ideal way to do reinforcement learning research.
Feel free to correct me if I have misunderstood anything here.
What do we have?
RLSetup
`RLSetup` is used to organize all the necessary information in the training process. It combines the agent (`learner` and `policy` here), the environment and some parameters (like `stoppingcriterion`, `callbacks`...) together. Then we can call `learn!` for training and `run!` for testing.
Comments:
- The concept of `RLSetup` is very common and useful in software development (a very similar concept is TestSuite), and it keeps the parameters of the `callback!` function (which I'll describe soon) consistent, because everything we need in a callback is wrapped in an `RLSetup`! My only concern is that different algorithms may need different kinds of parameters for (distributed) training and testing, and it is a little vague to cover all these cases with a single `RLSetup` concept. It would be better to move the extra parameters (like `stoppingcriterion`, `callbacks`...) into the `learn!` and `run!` functions, and only keep the core components like `learner`, `buffer`, `policy` in the `RLSetup`.
- `stoppingcriterion` and `callbacks` seem to share some similarities. I tried to generalize these two here. I haven't tested whether there's any performance decrease. Doing so would also allow `stoppingcriterion` to contain multiple criteria. A concrete sketch of both ideas follows this list.
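To make this concrete, here is a minimal sketch of the proposed split (all names and signatures are my own assumptions, not the package's current API), with stopping criteria generalized so that several of them can be combined:

```julia
# A hypothetical sketch of the proposed split, not the current API.
struct RLSetup{L,P,B,E}
    learner::L
    policy::P
    buffer::B
    environment::E
end

# An illustrative criterion; any object implementing `shouldstop` works.
mutable struct StopAfterNSteps
    N::Int
    counter::Int
end
shouldstop(c::StopAfterNSteps, rls) = (c.counter += 1) > c.N

# The extra parameters move into `learn!` as keyword arguments; a vector
# of criteria stops training as soon as any one of them fires.
function learn!(rls::RLSetup; stoppingcriteria = [StopAfterNSteps(10^5, 0)],
                callbacks = [])
    while !any(c -> shouldstop(c, rls), stoppingcriteria)
        # one interaction step and one learner update would go here
        foreach(cb -> cb(rls), callbacks)
    end
end
```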
callbacks
`callbacks` are useful for debugging and statistics. Currently, to define a customized callback, we need to do something like this:
```julia
# 1. define a struct
mutable struct ReduceEpsilonPerEpisode
    ϵ0::Float64
    counter::Int64
end

# 2. extend the `callback!` function
function callback!(c::ReduceEpsilonPerEpisode, rlsetup, sraw, a, r, done)
    if done
        if c.counter == 1
            c.ϵ0 = rlsetup.policy.ϵ
        end
        c.counter += 1
        rlsetup.policy.ϵ = c.ϵ0 / c.counter
    end
end
```
Comments:
- I found that it is sometimes a little verbose to define a new struct. For example, to log the loss of each step I had to create an empty struct and print the necessary info in the extended `callback!` function. I attempted to modify the callbacks a little to turn them into closures here. But sometimes closures are not that efficient (see the discussion in JuliaLang/julia#15276), so there's a tradeoff here. (I also noticed that in recent versions of Flux.jl, some closure-based optimisers were changed to struct-based methods.) A sketch of the closure variant follows this list.
- Also, the `callback!` function can be further simplified with a more general definition `callback!(c, rlsetup, sraw, a, r, done) = callback!(c, rlsetup)`, considering that we don't need `sraw, a, r, done` in most cases.
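For reference, the closure-based variant of the example above might look like this (a sketch; note that `ϵ0` and `counter` get boxed because they are reassigned inside the closure, which is exactly the performance issue discussed in JuliaLang/julia#15276):

```julia
# A sketch of the closure-based variant: the callback state lives in
# captured variables instead of a struct.
function make_reduce_epsilon_per_episode()
    ϵ0 = 0.0
    counter = 1
    (rlsetup, sraw, a, r, done) -> begin
        if done
            counter == 1 && (ϵ0 = rlsetup.policy.ϵ)
            counter += 1
            rlsetup.policy.ϵ = ϵ0 / counter
        end
    end
end
```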
Learner and Policy
The two core functions around a learner are `selectaction` and `update`.
- `selectaction(learner, policy, state)` is called in each step (inside `step!`) to generate an action. (Maybe calling it an actor would be better?)
- `update(learner, buffer)` is called inside `learn!` to update a learner.
And we already have several well-tested learners.
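To make that contract explicit, one possible direction (a sketch under my own naming assumptions, not the package's current definitions) is an abstract type whose required methods fail loudly until implemented:

```julia
# A sketch of an explicit learner interface; not the current definitions.
abstract type AbstractLearner end

# Required: generate an action for the current state (the "actor" role).
selectaction(learner::AbstractLearner, policy, state) =
    error("selectaction not implemented for $(typeof(learner))")

# Required: update the learner from the experience buffer.
update(learner::AbstractLearner, buffer) =
    error("update not implemented for $(typeof(learner))")
```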
Comments:
- For me, the concept of `learner` is not very clear in the package (I mean it is too generic here, and maybe we can decompose it into several common components?).
- I find that the policy is sometimes included in a learner (an example is deepactorcritic.jl).
- We'd better draw a clear line between learners and actors.
Buffer
Here a buffer is used for experience replay. One of the most useful buffers is `ArrayStateBuffer`. It uses a circular buffer to store experiences.
Comments:
- I tried to make the buffers more general here, but I'm still not very satisfied with the implementation. Also see the discussions here and here. I'll document this part in detail in the next section. A minimal circular-buffer sketch follows.
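For reference, a minimal circular buffer along the lines described above might look like this (a self-contained sketch, not the package's `ArrayStateBuffer`):

```julia
# A minimal circular buffer sketch: fixed capacity, oldest entries
# overwritten first.
mutable struct CircularBuffer{T}
    data::Vector{T}
    capacity::Int
    first::Int      # index of the oldest element
    length::Int     # number of stored elements
end
CircularBuffer{T}(capacity::Int) where {T} =
    CircularBuffer{T}(Vector{T}(undef, capacity), capacity, 1, 0)

function Base.push!(b::CircularBuffer, x)
    idx = mod1(b.first + b.length, b.capacity)
    b.data[idx] = x
    if b.length < b.capacity
        b.length += 1
    else
        b.first = mod1(b.first + 1, b.capacity)  # drop the oldest element
    end
    b
end

Base.length(b::CircularBuffer) = b.length
Base.getindex(b::CircularBuffer, i::Int) = b.data[mod1(b.first + i - 1, b.capacity)]
```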
Traces
To be honest, I haven't looked into the applications of this part. But from reading the source code, I'm wondering if it could be integrated into the concept of a buffer. @jbrea
Environment
Environment-related code has been split into ReinforcementLearningBase. As @JobJob suggested, we'd better create a new repo (like Plots.jl, I guess?) to support different backends. And we can have many different wrappers to easily introduce new environments. Preprocessors can also be merged into wrappers; a rough sketch is below. I'll make an example repo later and have more discussions there.
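For example, a preprocessor could itself be expressed as a wrapper. A rough sketch (I'm assuming an `interact!(env, action)` style of interface returning state, reward and done; adjust to whatever ReinforcementLearningBase settles on):

```julia
# A rough sketch: a preprocessor expressed as an environment wrapper.
# The interact!(env, action) -> (state, reward, done) interface is an
# assumption, not a confirmed API.
struct PreprocessedEnv{E,F}
    env::E
    preprocess::F   # any function from raw states to processed states
end

function interact!(w::PreprocessedEnv, action)
    s, r, done = interact!(w.env, action)
    w.preprocess(s), r, done
end
```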
Conclusion
In my humble opinion, the components listed above are clear enough to solve many typical RL problems on a single machine. For continuous action space problems, @jbrea will take a look later. The only work left is to reorganize the source code a little and clearly define some abstract structs to guide developers on how to implement new algorithms. Some highlights in this repo are:
- Model comparison. This part will be very important in the future and needs to be enhanced to support distributed algorithms.
- A lot of very useful predefined callbacks.
What is missing?
To compete with many other RL packages, there's still a long way to go, and one of the most important parts is supporting distributed RL algorithms.
Typically, there are two directions to scale up deep reinforcement learning.
- To parallelize the computation of gradients.
- To distribute the generation and selection of experiences.
For the first one, we need an efficient parameter server and a standalone resource manager to dispatch computations. (I'm not very experienced in this field; you guys may add more details here.) Some questions in mind (see the rough sketch after this list):
- How to communicate between learners and actors? Pub-sub or poll?
- How to do fault tolerance? Maybe we can borrow some ideas from Ray.
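As a starting point for that discussion, here is a very rough poll-style parameter server built only on the Distributed standard library (all names are mine; a real implementation would also need versioning, batching and the fault tolerance mentioned above):

```julia
using Distributed

# A very rough poll-style parameter server sketch; illustrative only.
mutable struct ParameterServer
    params::Vector{Float64}
    lock::ReentrantLock
end
ParameterServer(n::Int) = ParameterServer(zeros(n), ReentrantLock())

# Learners push gradients; the lock keeps concurrent updates safe.
push_gradients!(ps::ParameterServer, grads; lr = 0.01) =
    lock(() -> (ps.params .-= lr .* grads), ps.lock)

# Actors poll for the latest parameters.
pull_params(ps::ParameterServer) = lock(() -> copy(ps.params), ps.lock)

# Workers would reach a server living on process 1 via remotecall, e.g.
# remotecall_fetch(() -> pull_params(PS), 1)  # PS: a global on process 1
```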
For the second one, I think we need to carefully design the API first. Although there are many implementations, here in Dopamine and here in Ray, none of them can be directly ported to Julia (and I believe we can have more efficient implementations). Some critical points are:
- Shared memory or not?
I have had a long discussion about this with @jbrea before. Obviously it's more efficient to treat the next start state as the end state of the current transition, but I found that it makes the code much more complicated (forgive my programming skills in Julia; maybe we can find a way to address it). Also, in the Distributed Prioritized Experience Replay paper, the last sentence of "Adding Data" in Appendix F (Implementation) states: "Note that, since we store both the start and the end state with each transition, we are storing some data twice: this costs more RAM, but simplifies the code." So I guess I'm not the only one... The two layouts are sketched after this list.
- Generalized enough for a (distributed) prioritized buffer
There are many practical issues to be addressed (one idea for the metadata question is also sketched after this list):
  - How to easily add more metadata for each transition (id, priority, rank order, last active time...)?
  - How to queue batches from each actor?
  - What is the general way to update a distributed buffer?
  - Should async be supported?
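To ground both points above, the sketch below shows the two storage layouts side by side, plus one way to keep per-transition metadata extensible (all names are illustrative, not existing APIs):

```julia
# Layout 1: store both states with each transition. Costs more RAM
# (states are duplicated) but sampling a minibatch is trivial.
struct FullTransition{S,A}
    state::S
    action::A
    reward::Float64
    nextstate::S     # duplicated: also the next transition's start state
    done::Bool
end

# Layout 2: share states between consecutive transitions. Transition i
# implicitly spans states[i] -> states[i + 1], which saves memory but
# needs careful bookkeeping at episode boundaries.
struct SharedStateBuffer{S,A}
    states::Vector{S}
    actions::Vector{A}
    rewards::Vector{Float64}
    done::Vector{Bool}
end

# Extensible metadata: wrap any transition with a NamedTuple, so fields
# like priority or last-active time can be added without touching the
# buffer type.
struct AnnotatedTransition{T,M<:NamedTuple}
    transition::T
    meta::M
end

t = AnnotatedTransition(FullTransition(1, 2, 1.0, 2, false),
                        (id = 42, priority = 0.5))
t.meta.priority   # 0.5
```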
Multi-agent
Although multi-agent scenarios are not considered in most existing packages, we'd better think about them at an early stage.
Model-Based Algorithms
- How/when should we train/update an environment model?
- What is the relationship between an environment model and the learner/policy?
Compared with Ray
According to the paper about Ray, there are three system layers:
- Global Control Store
- Bottom-up Distributed Scheduler
- In-memory Object Store
For me, the first and second parts are relatively easy to understand and re-implement, but the third part is especially difficult for me to figure out how to do in Julia. If I understand it correctly, Arrow/Plasma is used so that processes on one node can avoid serialization/deserialization. I've checked the package Arrow.jl; it seems to offer only data transformation, and I still don't know how to manage a big shared-memory Object Store in Julia across processes like the one in Ray.
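The closest built-in analogue I know of is the SharedArrays standard library: a single memory-mapped array visible to all local worker processes without serialization. It only covers bits types on one machine, so it is far from a full Plasma-style object store, but a minimal sketch looks like this:

```julia
using Distributed
addprocs(2)
@everywhere using SharedArrays

# One array backed by shared memory, visible to every local worker.
s = SharedArray{Float64}(10_000)
@sync @distributed for i in eachindex(s)
    s[i] = i   # each worker writes directly into the shared segment
end
```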
For the rllib part, the different levels of abstraction are really worth learning from.
```
Agent
└── Optimizer
    └── Policy Evaluator
        ├── Policy Graph
        └── Env Creator
            └── Base Env, Multi-Agent Env, Async Env...
```
So for me, I'm most experienced with the Env Creator part, and I can also help design the API of the other parts. But at the system design level, I really feel that I have a lot to learn.
What's the ideal way to do RL research in Julia?
- Easy to implement/reproduce the results of popular algorithms.
I emphasize implementation here because so many RL packages just provide a function with a lot of parameters and hide all the details inside (as if to say, "Hey look, I've implemented so many fancy algorithms here", when in fact it's pretty hard to figure out what is going on inside). One thing I really enjoy while learning and using Julia is that I can easily check the source code to figure out the mechanisms inside and then make improvements.
- Flexible enough to reuse existing packages.
Like rllib (in Ray), we don't want to limit users to any specific DL framework. The core components should always be replaceable.
- Easy to scale.
TODO List