
ReinforcementLearning.jl's Introduction


ReinforcementLearning.jl, as the name says, is a package for reinforcement learning research in Julia.

Our design principles are:

  • Reusability and extensibility: Provide elaborately designed components and interfaces to help users implement new algorithms.
  • Easy experimentation: Make it easy for new users to run benchmark experiments, compare different algorithms, evaluate and diagnose agents.
  • Reproducibility: Facilitate reproducibility from traditional tabular methods to modern deep reinforcement learning algorithms.

๐Ÿน Get Started

julia> ] add ReinforcementLearning

julia> using ReinforcementLearning

julia> run(
           RandomPolicy(),
           CartPoleEnv(),
           StopAfterNSteps(1_000),
           TotalRewardPerEpisode()
       )

The above simple example demonstrates the four core components in a general reinforcement learning experiment: the Policy (RandomPolicy), the Environment (CartPoleEnv), the Stop Condition (StopAfterNSteps(1_000)), and the Hook (TotalRewardPerEpisode()).

Check out the tutorial page to learn how these four components are assembled together to solve many interesting problems. We also write blog posts occasionally to explain the implementation details of some algorithms. Among them, the most recommended one is An Introduction to ReinforcementLearning.jl, which explains the design idea of this package.


🌲 Project Structure

ReinforcementLearning.jl itself is just a wrapper around several other subpackages. The relationship between them is depicted below:

+-----------------------------------------------------------------------------------+
|                                                                                   |
|  ReinforcementLearning.jl                                                         |
|                                                                                   |
|      +------------------------------+                                             |
|      | ReinforcementLearningBase.jl |                                             |
|      +----|-------------------------+                                             |
|           |                                                                       |
|           |     +--------------------------------------+                          |
|           +---->+ ReinforcementLearningEnvironments.jl |                          |
|           |     +--------------------------------------+                          |
|           |                                                                       |
|           |     +------------------------------+                                  |
|           +---->+ ReinforcementLearningCore.jl |                                  |
|                 +----|-------------------------+                                  |
|                      |                                                            |
|                      |     +-----------------------------+                        |
|                      +---->+ ReinforcementLearningZoo.jl |                        |
|                            +----|------------------------+                        |
|                                 |                                                 |
|                                 |     +-------------------------------------+     |
|                                 +---->+ DistributedReinforcementLearning.jl |     |
|                                       +-------------------------------------+     |
|                                                                                   |
+------|----------------------------------------------------------------------------+
       |
       |     +-------------------------------------+
       +---->+ ReinforcementLearningExperiments.jl |
       |     +-------------------------------------+
       |
       |     +----------------------------------------+
       +---->+ ReinforcementLearningAnIntroduction.jl |
             +----------------------------------------+

✋ Getting Help

Are you looking for help with ReinforcementLearning.jl? Here are ways to find help:

  1. Read the online documentation! Most likely the answer is already provided in an example or in the API documents. Search using the search bar in the upper left.
  2. Chat with us in the Julia Slack in the #reinforcement-learning channel.
  3. Post a question in the Julia discourse forum in the category "Machine Learning" and use "reinforcement-learning" as a tag.
  4. For issues with unexpected behavior or defects in ReinforcementLearning.jl, please open an issue on the ReinforcementLearning GitHub page with a minimal working example and steps to reproduce.

🖖 Supporting

ReinforcementLearning.jl is an MIT-licensed open source project whose ongoing development is made possible by many contributors in their spare time. However, modern reinforcement learning research requires huge computing resources, which are unaffordable for individual contributors. So if you or your organization can provide computing resources to some degree and would like to cooperate in some way, please contact us!

This package is written in pure Julia. Please consider supporting the JuliaLang org if you find this package useful. ❤

โœ๏ธ Citing

If you use ReinforcementLearning.jl in a scientific publication, we would appreciate references to CITATION.bib.

✨ Contributors

Thanks goes to these wonderful people (emoji key):


jbrea

💻 📖 🚧

Jun Tian

💻 📖 🚧 🤔

Aman Bhatia

📖

Alexander Terenin

💻

Sid-Bhatia-0

💻

norci

💻 🚧

Sriram

💻

Pavan B Govindaraju

💻

Alex Lewandowski

💻

Raj Ghugare

💻

Roman Bange

💻

Felix Chalumeau

💻

Rishabh Varshney

💻

Zachary Sunberg

💻 📖 🚧 🤔

Jonathan Laurent

🤔

Andriy Drozdyuk

📖

Ritchie Lee

🐛

Xirui Zhao

💻

Nerd

📖

Albin Heimerson

💻 📖 🚧

michelangelo21

🐛

GuoYu Yang

📖 💻 🐛

Prasidh Srikumar

💻

Ilan Coulon

💻

Jinrae Kim

📖 🐛

luigiannelli

🐛

Jacob Boerma

💻

Xavier Valcarce

🐛

Ashwani Rathee

💻

Goran Nakerst

💻

ultradian

📖

Ikko Ashimine

📖

Krishna Bhogaonker

🐛

Philipp A. Kienscherf

🐛

Stefan Krastanov

📖

LaarsOman

📖

Bo Lu

💻

Peter Chen

💻 📖

Shuhua Gao

💻 💬

johannes-fischer

💻

Tom Marty

🐛 💻

Abhinav Bhatia

🐛 💻

Harley Wiltzer

💻 📖 🐛

Dylan Asmar

💻

andreyzhitnikov

🐛

Andrea PIERRÉ

📖

Mo8it

💻

Benoît Legat

📖

Henri Dehaybe

💻 📖

NPLawrence

💻

Bileam Scheuvens

📖

Jarbus

🐛

tyleringebrand

🐛

baedan

💻

ll7

📖

Matthew LeMay

📖

Ludvig Killingberg

💻

This project follows the all-contributors specification. Contributions of any kind are welcome!


ReinforcementLearning.jl's Issues

Visualize episodes

It would be nice to have functions to visualize episodes for a chosen policy and environment.

Failed to precompile ReinforcementLearning

Getting the following:

(@v1.4) pkg> add ReinforcementLearning
   Updating registry at `~/.julia/registries/General`
   Updating git-repo `https://github.com/JuliaRegistries/General.git`
  Resolving package versions...
  Installed Zlib_jll ──── v1.2.11+14
  Installed OpenSSL_jll ─ v1.1.1+4
Downloading artifact: Zlib
######################################################################## 100.0%#   Updating `~/.julia/environments/v1.4/Project.toml`
 [no changes]
   Updating `~/.julia/environments/v1.4/Manifest.toml`
  [458c3c95] ↑ OpenSSL_jll v1.1.1+2 ⇒ v1.1.1+4
  [83775a58] ↑ Zlib_jll v1.2.11+10 ⇒ v1.2.11+14

julia> using ReinforcementLearning
[ Info: Precompiling ReinforcementLearning [158674fc-8238-5cab-b5ba-03dfc80d1318]
┌ Warning: Package ReinforcementLearning does not have Dates in its dependencies:
│ - If you have ReinforcementLearning checked out for development and have
│   added Dates as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with ReinforcementLearning
└ Loading Dates into ReinforcementLearning from project dependency, future warnings for ReinforcementLearning are suppressed.
WARNING: could not import ReinforcementLearningBase.interact! into ReinforcementLearning
WARNING: could not import ReinforcementLearningBase.getstate into ReinforcementLearning
WARNING: could not import ReinforcementLearningBase.plotenv into ReinforcementLearning
WARNING: could not import ReinforcementLearningBase.actionspace into ReinforcementLearning
WARNING: could not import ReinforcementLearningBase.sample into ReinforcementLearning
ERROR: LoadError: LoadError: UndefVarError: TrackedArray not defined
Stacktrace:
 [1] getproperty(::Module, ::Symbol) at ./Base.jl:26
 [2] top-level scope at /Users/tcf/.julia/packages/ReinforcementLearning/qSdCS/src/learner/dqn.jl:70
 [3] include(::Module, ::String) at ./Base.jl:377
 [4] include(::String) at /Users/tcf/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:1
 [5] top-level scope at /Users/tcf/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:38
 [6] include(::Module, ::String) at ./Base.jl:377
 [7] top-level scope at none:2
 [8] eval at ./boot.jl:331 [inlined]
 [9] eval(::Expr) at ./client.jl:449
 [10] top-level scope at ./none:3
in expression starting at /Users/tcf/.julia/packages/ReinforcementLearning/qSdCS/src/learner/dqn.jl:70
in expression starting at /Users/tcf/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:38
ERROR: Failed to precompile ReinforcementLearning [158674fc-8238-5cab-b5ba-03dfc80d1318] to /Users/tcf/.julia/compiled/v1.4/ReinforcementLearning/6l2TO_avIbM.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1272
 [3] _require(::Base.PkgId) at ./loading.jl:1029
 [4] require(::Base.PkgId) at ./loading.jl:927
 [5] require(::Module, ::Symbol) at ./loading.jl:922

AbstractActionSelector not exported

In src/components/action_selectors/abstract_action_selector.jl only AbstractDiscreteActionSelector is exported. However, it is useful to be able to use AbstractActionSelector for creating my own action selectors, as is done for the other abstract structs. Would it be possible to include an export for AbstractActionSelector?

Support SEED RL (Scalable and Efficient Deep-RL)

After reading the paper SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference, I think it is doable in Julia based on my current understanding. It may be our first attempt to support distributed RL.


For actors, we already have MultiThreadEnv to run multiple environments in parallel. In the original paper, gRPC (streaming) is used to transfer observations/actions between actors and learners. I'm not sure whether that is implemented in gRPC.jl or not. I guess using remotecall may not be very efficient. ArrayChannels.jl may also be helpful here.

For the learner part:


It is said that there are three types of threads:

  1. Inference threads
  2. Data prefetching threads
  3. Training threads

The data prefetching threads are the easiest to understand: simply fetch completed trajectories, update the prioritized buffer, and then generate training data (sending it to devices). They work just like the extract_experience function in our package.

The training thread (the main Python thread) takes the prefetched trajectories, computes gradients using the training TPU cores and applies the gradients on the models of all TPU cores (inference and training) synchronously. The ratio of inference and training cores can be adjusted for maximum throughput and utilization.

As it is said in the paper, inference and training steps are synchronous. So now we have to control the following parts:

  1. the speed of generating training data (otherwise it will consume too much GPU memory); maybe a channel with limited length? (see the sketch after this list)
  2. the ratio of inference to training; just run inference on several batches after each training step? I tried to implement Ape-X before and found it difficult to control the speed.
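
To make point 1 above concrete, here is a minimal sketch (not part of the package; all names are made up) of throttling data generation with a bounded Channel: the producer blocks as soon as a fixed number of batches are waiting, so inference can never run far ahead of training.

# Toy sketch: throttle batch generation with a bounded Channel.
const MAX_PENDING_BATCHES = 4            # hypothetical limit on in-flight batches

batches = Channel{Vector{Float32}}(MAX_PENDING_BATCHES)

producer = @async begin
    for step in 1:20
        batch = rand(Float32, 32)        # stand-in for a real rollout/inference batch
        put!(batches, batch)             # blocks while the channel is full
    end
    close(batches)
end

for batch in batches                     # the training side consumes at its own pace
    sleep(0.01)                          # stand-in for a gradient update
end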

Maybe we can start with R2D2 on a single machine first, and then try SEED RL?

(also cc @jbrea 😃)

Support julia 1.4

The current version is limited to 1.3.

I tried master on julia 1.4 and the example seems to work fine (I have not run the test suite).

Would it be possible to relax the compatibility constraint on the julia version?
Thanks!

Classic environments in separate package?

Opening this up because of JuliaReinforcementLearning/CommonRLInterface.jl#18 (comment). This package has some classic environments implemented, as well as a lot of wrapped environments. In that sense, this package does function as a "glue" package (a one-stop shop as mentioned in the README).

As discussed in the linked comment, there are some features that could be added to the classic environments here. I was considering doing that as a PR, but then I thought it might make more sense to split them into their own package. I could do that in FluxML/Gym.jl (I think we can also take ownership of this repo if preferred), since it already supports the rendering logic, which is the most involved part. Then this package could just be a "glue"/wrapper package.

What are folks' thoughts on this approach?

Question about AbstractEnv API

In the documentation for AbstractEnv, you write the following remark:

So why don't we adopt the step! method like OpenAI Gym here? The reason is that the async manner will simplify a lot of things here.

Would you care to elaborate on what you mean here?

Knet backend does not work on gpu

I tried the quick example in the documentation with Knet and gpu. It seems to_device does not work for Knet_gpu:

backend  = :Knet
device = :Knet_gpu

ERROR: MethodError: no method matching *(::Array{Float32,2}, ::KnetArray{Float32,2})

Compatibility issue in ReinforcementLearning & Flux

If I install ReinforcementLearning first, then install Flux, and then update, ReinforcementLearning will be downgraded to v0.3.0 and can no longer be compiled.

log:

(v1.4) pkg> status
Status `~/external-libraries/.julia/environments/v1.4/Project.toml`
  [158674fc] ReinforcementLearning v0.5.0

(v1.4) pkg> add Flux
  Resolving package versions...
   Updating `~/external-libraries/.julia/environments/v1.4/Project.toml`
  [587475ba] + Flux v0.10.4
   Updating `~/external-libraries/.julia/environments/v1.4/Manifest.toml`
 [no changes]

(v1.4) pkg> update
   Updating registry at `~/.julia/registries/General`
   Updating `~/external-libraries/.julia/environments/v1.4/Project.toml`
  [587475ba] ↑ Flux v0.10.4 ⇒ v0.11.0
  [158674fc] ↓ ReinforcementLearning v0.5.0 ⇒ v0.3.0
  ...

julia> using ReinforcementLearning
[ Info: Precompiling ReinforcementLearning [158674fc-8238-5cab-b5ba-03dfc80d1318]
┌ Warning: Package ReinforcementLearning does not have Dates in its dependencies:
│ - If you have ReinforcementLearning checked out for development and have
│   added Dates as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with ReinforcementLearning
└ Loading Dates into ReinforcementLearning from project dependency, future warnings for ReinforcementLearning are suppressed.
WARNING: could not import ReinforcementLearningBase.interact! into ReinforcementLearning
WARNING: could not import ReinforcementLearningBase.getstate into ReinforcementLearning
WARNING: could not import ReinforcementLearningBase.plotenv into ReinforcementLearning
WARNING: could not import ReinforcementLearningBase.actionspace into ReinforcementLearning
WARNING: could not import ReinforcementLearningBase.sample into ReinforcementLearning
ERROR: LoadError: LoadError: LoadError: UndefVarError: functorm not defined
Stacktrace:
 [1] @treelike(::LineNumberNode, ::Module, ::Vararg{Any,N} where N) at /home/aistudio/.julia/packages/Flux/IjMZL/src/functor.jl:58
 [2] include(::Module, ::String) at ./Base.jl:377
 [3] include(::String) at /home/aistudio/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:1
 [4] top-level scope at /home/aistudio/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:31
 [5] include(::Module, ::String) at ./Base.jl:377
 [6] top-level scope at none:2
 [7] eval at ./boot.jl:331 [inlined]
 [8] eval(::Expr) at ./client.jl:449
 [9] top-level scope at ./none:3
in expression starting at /home/aistudio/.julia/packages/ReinforcementLearning/qSdCS/src/flux.jl:12
in expression starting at /home/aistudio/.julia/packages/ReinforcementLearning/qSdCS/src/flux.jl:12
in expression starting at /home/aistudio/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:31
ERROR: Failed to precompile ReinforcementLearning [158674fc-8238-5cab-b5ba-03dfc80d1318] to /home/aistudio/.julia/compiled/v1.4/ReinforcementLearning/6l2TO_6euJX.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1272
 [3] _require(::Base.PkgId) at ./loading.jl:1029
 [4] require(::Base.PkgId) at ./loading.jl:927
 [5] require(::Module, ::Symbol) at ./loading.jl:922

Roadmap of v0.9

Reorganize code structure

  1. Move RLBase, RLCore, RLEnvs, RLZoo, Docs into this package
  2. Update docs
  3. Update CUDA to v3
  4. Evaluate all existing experiments
  5. Make a report

Random Thoughts on v0.3.0

Here I'd like to share some random thoughts on this package in the following three aspects:

  1. Existing core components in the current version (v0.3.0)
  2. What is missing to support distributed reinforcement learning algorithms?
  3. The ideal way to do reinforcement learning research.

Feel free to correct me if I misunderstand anything here.

What do we have?

RLSetup

RLSetup is used to organize all the necessary information in the training process. It combines the agent (learner and policy here), the environment, and some parameters (like stoppingcriterion, callbacks, ...) together. Then we can call learn! for training and run! for testing.

Comments:

  1. The concept of RLSetup is very common and useful in software development (a very similar concept is a TestSuite), and it makes the parameters of the callback! function (which I'll describe soon) consistent, because everything we need in a callback is wrapped in the RLSetup. My only concern is that different algorithms may need different kinds of parameters for (distributed) training and testing, and it is a little vague to cover all these cases with a single RLSetup concept. It would be better to move the extra parameters (like stoppingcriterion, callbacks, ...) into the learn! and run! functions and only keep the core components like learner, buffer, and policy in the RLSetup.
  2. stoppingcriterion and callbacks seem to share some similarities. I tried to generalize these two here (see the sketch below). I haven't tested whether there is any performance decrease. Doing so also lets stoppingcriterion contain multiple criteria.
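
To illustrate point 2 (only a sketch of the idea with made-up names, not the package's actual API), a stopping criterion can be written as just another callback that flips a flag checked by the training loop:

# Hypothetical: a step-limit criterion expressed in callback form.
mutable struct MaxStepsCallback
    limit::Int
    count::Int
    stop::Bool
end
MaxStepsCallback(limit) = MaxStepsCallback(limit, 0, false)

function callback!(c::MaxStepsCallback, rlsetup, sraw, a, r, done)
    c.count += 1
    c.stop = c.count >= c.limit          # the loop would stop once this is true
end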

callbacks

callbacks are useful for debugging and statistics. Currently, to define a customized callback, we need to do something like this:

# 1. define a struct
mutable struct ReduceEpsilonPerEpisode
    ϵ0::Float64
    counter::Int64
end
# 2. extend the `callback!` function
function callback!(c::ReduceEpsilonPerEpisode, rlsetup, sraw, a, r, done)
    if done
        if c.counter == 1
            c.ϵ0 = rlsetup.policy.ϵ
        end
        c.counter += 1
        rlsetup.policy.ϵ = c.ϵ0 / c.counter
    end
end

Comments:

  1. I found that it is sometimes a little verbose to define a new struct. For example, to log the loss of each step I had to create an empty struct and print the necessary info in the extended callback! function. I attempted to modify the callbacks a little to turn them into closures here (see the closure sketch below). But sometimes closures are not that efficient (see the discussion in JuliaLang/julia#15276), so there's a tradeoff here. (I also noticed that in recent versions of Flux.jl, some closure-based optimisers were changed to struct-based methods.)
  2. Also, the callback! function can be further simplified with a more general definition callback!(c, rlsetup, sraw, a, r, done) = callback!(c, rlsetup), considering that we don't need sraw, a, r, done in most cases.
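
For comparison (a sketch only, not how the package currently works), the same ReduceEpsilonPerEpisode logic written as a closure avoids the extra struct, at the cost of the closure-performance caveat mentioned in point 1:

function make_reduce_epsilon_callback()
    ϵ0 = 0.0
    counter = 1
    # the returned closure plays the role of callback!
    return function (rlsetup, sraw, a, r, done)
        if done
            counter == 1 && (ϵ0 = rlsetup.policy.ϵ)
            counter += 1
            rlsetup.policy.ϵ = ϵ0 / counter
        end
    end
end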

Learner and Policy

Two core functions around a learner are selectaction and update.

  • selectaction(learner, policy, state) is called in each step (inside step!) to generate an action. (Maybe calling it actor would be better?)
  • update(learner, buffer) is called inside learn! to update a learner.

And we already have several well-tested learners.
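
For illustration only (this is not one of the package's learners; the struct is made up), a trivial learner that satisfies the two-function interface above could look like:

struct RandomDiscreteLearner             # hypothetical learner
    nactions::Int
end

# called in each step (inside step!) to produce an action
selectaction(learner::RandomDiscreteLearner, policy, state) = rand(1:learner.nactions)

# called inside learn!; a random learner has nothing to update
update(learner::RandomDiscreteLearner, buffer) = nothing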

Comments:

  1. For me, the concept of a learner is not very clear in the package (I mean it is too generic here; maybe we can decompose it into several common components?).
  2. I find that the policy is sometimes included in a learner (an example is deepactorcritic.jl).
  3. We'd better draw a clear line between learners and actors.

Buffer

Here a buffer is used for experience replay. One of the most useful buffers is ArrayStateBuffer, which uses a circular buffer to store experiences.
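
Here is a toy version of the circular-buffer idea (just to illustrate the mechanism; this is not the actual ArrayStateBuffer implementation): once capacity is reached, new experiences overwrite the oldest ones.

mutable struct ToyCircularBuffer{T}
    data::Vector{T}
    capacity::Int
    next::Int                            # index of the next slot to write
    full::Bool
end

ToyCircularBuffer{T}(capacity::Int) where {T} =
    ToyCircularBuffer{T}(Vector{T}(undef, capacity), capacity, 1, false)

function Base.push!(b::ToyCircularBuffer, x)
    b.data[b.next] = x
    b.next = b.next == b.capacity ? 1 : b.next + 1
    b.full = b.full || b.next == 1
    return b
end

Base.length(b::ToyCircularBuffer) = b.full ? b.capacity : b.next - 1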

Comments:

  1. I tried to make the buffers more general here, but I'm still not very satisfied with the implementations. Also see the discussions here and here. I'll document this part in detail in the next section.

Traces

To be honest, I haven't looked into the applications of this part, but from reading the source code I'm wondering whether it could be integrated into the concept of a buffer. @jbrea

Environment

Environment-related code has been split into ReinforcementLearningBase. As @JobJob suggested, we'd better create a new repo (like Plots.jl, I guess?) to support different backends, and we can have many different wrappers to easily introduce new environments. Preprocessors can also be merged into wrappers. I'll make an example repo later and have more discussions there.

Conclusion

In my humble opinion, the components listed above are clear enough to solve many typical RL problems on a single machine. For continuous-action-space problems, @jbrea will take a look later. The only work left is to reorganize the source code a little and clearly define some abstract structs to guide developers on how to implement new algorithms. Some highlights in this repo are:

  1. Model comparison. This part will be very important in the future and needs to be enhanced to support distributed algorithms.
  2. A lot of predefined callbacks are very useful.

What are missing?

To compete with many other packages in RL, there's still a long way to go, and one of the most important parts is to support distributed RL algorithms.

Typically, there are two directions to scale up deep reinforcement learning.

  1. To parallelize the computation of gradients.
  2. To distribute the generation and selection of experiences.

For the first one, we need an efficient parameter server and a standalone resource manager to dispatch computations. (I'm not very experienced in this field; you guys may add more details here.) Some questions I have in mind are:

  1. How to communicate between learners and actors? Pub-sub or poll?
  2. How to do failure tolerance? Maybe we can borrow some ideas from Ray.

For the second one, I think we need to carefully design the API first. Although there are many implementations, here in Dopamine and here in Ray, none of them can be directly ported to Julia (and I believe we can have more efficient implementations). Some critical points are:

  1. Shared memory or not?
    I have had a long discussion about it with @jbrea before. Obviously it's more efficient to treat the next start state as the end state of the current transition, but I found that it makes the code much more complicated (forgive my programming skills in Julia; maybe we can find a way to address it). Also, in the Distributed Prioritized Experience Replay paper, the last sentence of "Adding Data" in Appendix F (Implementation) states: "Note that, since we store both the start and the end state with each transition, we are storing some data twice: this costs more RAM, but simplifies the code." So I guess I'm not the only one...
  2. Generalized enough for a (Distributed) Prioritized Buffer
    There are many practical issues to be addressed.
    1. How to easily add more metadata for each transition (id, priority, rank order, last active time, ...)? (see the sketch after this list)
    2. How to queue batches from each actor?
    3. What's the general way to update a distributed buffer?
    4. Support async?
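
To make point 1 of this sub-list concrete (a purely hypothetical field layout, not part of the package), a prioritized transition could carry its metadata alongside the usual fields:

struct PrioritizedTransition{S,A}
    state::S
    action::A
    reward::Float32
    next_state::S
    terminal::Bool
    # metadata discussed above
    id::Int
    priority::Float32
    last_active::Float64                 # timestamp of last use
end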

Multi-agent

Although multi-agent scenarios are not considered in most existing packages, we'd better think about them at an early stage.

Model Based Algorithms

  • How/when to train/update an environment model?
  • What is the relationship between an environment model and the learner/policy?

Compared with Ray

According to the paper about Ray, there are three system layers:

  1. Global Control Store
  2. Bottom-up Distributed Scheduler
  3. In-memory Object Store

For me, the first and second parts are relatively easy to understand and re-implement, but for the third part it is especially difficult to figure out how to do it in Julia. If I understand it correctly, Arrow/Plasma is used by processes on one node to avoid serialization/deserialization. I've checked the Arrow.jl package; it seems there's only data transformation, and I still don't know how to manage a big shared-memory object store in Julia across processes like the one in Ray.

For the rllib part, the different levels of abstractions are really worth learning from.

Agent
└── Optimizer
       └── Policy Evaluator
               └── Policy Graph
                    └── Env Creator
                          └── Base Env, Multi-Agent Env, Async Env...

So for me, I'm more skilled in implementing the Env Creator part and I can also offer help to design the API of the other parts. But for the system design level, I really feel that I have a lot to learn.

What's the ideal way to do RL research in Julia?

  1. Easy to implement/reproduce the results of popular algorithms.
    I emphasize implementation here because so many RL packages just provide a function with a lot of parameters and hide a lot of details there (just like saying, "Hey look, I've implemented so many fancy algorithms here", while in fact it's pretty hard to figure out what it is doing inside). One thing I really enjoy while learning and using Julia is that I can easily check the source code to figure out the mechanisms inside and then make improvements.
  2. Flexible to reuse existing packages.
    Like rllib (in Ray), we don't want to limit users to any specific DL framework. The core components are always replaceable.
  3. Easy to scale.

TODO List

  • Clearly define all the interfaces (do consider the distributed cases) and have a discussion here.
  • Finalize the implementation details of Buffer.
  • A new reinforcement learning environments repo with different backends.

StopAfterEpisode with progress meter

Following the quick example, I experimented with two consecutive calls to run. This resulted in total rewards per episode larger than 200 for some episodes in the second call. This cannot be correct, and I attribute it to the stop condition, which is based on the number of steps and therefore might fire at a moment when an episode is still ongoing, thus confusing TotalRewardPerEpisode in the second call.

Therefore I switched to StopAfterEpisode as the stopping criterion. In this setting I no longer see rewards above 200 on subsequent calls to run, but it seems that the progress meter is not working correctly in this case.
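
For reference, the switch looks roughly like the quick example from the README with the stop condition replaced (a sketch only; the exact constructor name has varied across versions):

run(
    RandomPolicy(),
    CartPoleEnv(),
    StopAfterEpisode(100),               # stop on an episode boundary instead of a step count
    TotalRewardPerEpisode()
)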

I am on v0.4.0 #master.

Improve code coverage

Although it is not that easy to test the correctness of different algorithms for each PR (unlike FB, we lack enough computation resources...), at least we can run several episodes and make sure that the pipeline is not broken.

Add reproducible examples for Atari environments

I noticed that people seemed to be more interested in getting started with playing Atari games (like the one here). So I'll spend some spare time on this part in the next several weeks before moving on to implement new algorithms.

Hopefully, #36, #35 and #28 will also be addressed.

  • Preprocessor for Atari Environments
  • extend CNN
  • save/plot
  • compare speed with dopamine

loadenvironment error

Hi! I was playing with the package, trying out the usage examples from the docs. I am getting UndefVarError for example 2 on line 3, which says loadenvironment("cartpole")

ERROR: UndefVarError: loadenvironment not defined
Stacktrace:
 [1] top-level scope at none:0

I also could not find any reference to the loadenvironment function in the codebase, except in the docs. Has it been replaced with a new function? Any hints?

ERROR: KeyError: key "ArnoldiMethod" not found

I get

  Resolving package versions...
ERROR: KeyError: key "ArnoldiMethod" not found
Stacktrace:
 [1] getindex at .\dict.jl:467 [inlined]
 [2] #37 at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\manifest.jl:215 [inlined]
 [3] _all(::Pkg.Types.var"#37#40"{Dict{String,Bool}}, ::Dict{String,Base.UUID}, ::Colon) at .\reduce.jl:828
 [4] all at .\reduce.jl:823 [inlined]
 [5] destructure(::Dict{Base.UUID,Pkg.Types.PackageEntry}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\manifest.jl:215
 [6] write_manifest at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\manifest.jl:233 [inlined]
 [7] write_manifest at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\manifest.jl:231 [inlined]
 [8] write_env(::Pkg.Types.EnvCache; update_undo::Bool) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\Types.jl:1367
 [9] write_env at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\Types.jl:1366 [inlined]
 [10] add(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}, ::Array{Base.UUID,1}; preserve::Pkg.Types.PreserveLevel, platform::Pkg.BinaryPlatforms.Windows) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\Operations.jl:1136
 [11] add(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}; preserve::Pkg.Types.PreserveLevel, platform::Pkg.BinaryPlatforms.Windows, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\API.jl:189
 [12] add(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\API.jl:140
 [13] #add#21 at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\API.jl:67 [inlined]
 [14] add(::Array{Pkg.Types.PackageSpec,1}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\API.jl:67
 [15] do_cmd!(::Pkg.REPLMode.Command, ::REPL.LineEditREPL) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\REPLMode\REPLMode.jl:404
 [16] do_cmd(::REPL.LineEditREPL, ::String; do_rethrow::Bool) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\REPLMode\REPLMode.jl:382
 [17] do_cmd at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\REPLMode\REPLMode.jl:377 [inlined]
 [18] (::Pkg.REPLMode.var"#24#27"{REPL.LineEditREPL,REPL.LineEdit.Prompt})(::REPL.LineEdit.MIState, ::Base.GenericIOBuffer{Array{UInt8,1}}, ::Bool) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\REPLMode\REPLMode.jl:546
 [19] #invokelatest#1 at .\essentials.jl:710 [inlined]
 [20] invokelatest at .\essentials.jl:709 [inlined]
 [21] run_interface(::REPL.Terminals.TextTerminal, ::REPL.LineEdit.ModalInterface, ::REPL.LineEdit.MIState) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\LineEdit.jl:2355
 [22] run_frontend(::REPL.LineEditREPL, ::REPL.REPLBackendRef) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\REPL\src\REPL.jl:1143
 [23] (::REPL.var"#38#42"{REPL.LineEditREPL,REPL.REPLBackendRef})() at .\task.jl:356

when trying to add this package.

Params empty - no tracking

Flux returns tracked Params:

julia> using Flux
julia> params(Dense(1, 1))
Params([Float32[-0.78894603] (tracked), Float32[0.0] (tracked)])

But when using ReinforcementLearning there is no longer tracking:

julia> using ReinforcementLearning
julia> params(Dense(1, 1))
Params([])
(v1.3) pkg> st
    Status `~/.julia/environments/v1.3/Project.toml`
  [c52e3926] Atom v0.11.1
  [6e4b80f9] BenchmarkTools v0.4.3
  [c5f51814] CUDAdrv v3.1.0
  [be33ccc6] CUDAnative v2.4.0
  [3a865a2d] CuArrays v1.2.1
  [587475ba] Flux v0.9.0
  [9ee9e592] IOLogging v0.2.0
  [b6b21f68] Ipopt v0.6.0
  [5fb14364] OhMyREPL v0.5.3
  [91a5bcdd] Plots v0.27.0
  [c36e90e8] PowerModels v0.13.0
  [158674fc] ReinforcementLearning v0.4.0 #master (https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl.git)
  [25e41dd2] ReinforcementLearningEnvironments v0.1.2
  [295af30f] Revise v2.2.2
  [2913bbd2] StatsBase v0.32.0
  [e88e6eb3] Zygote v0.3.4

depends on HDF5?

[ Info: Precompiling ReinforcementLearning [158674fc-8238-5cab-b5ba-03dfc80d1318]                                                                                   
ERROR: LoadError: HDF5 is not properly installed. Please run Pkg.build("HDF5") and restart Julia.                                                                   

How to add this dependency in the project config?

How to define a new environment?

[The question was first asked on slack]

This part is still missing in the latest doc.

The interfaces of AbstractEnvironment are not well organized. This needs improvement!

To define a single-player environment, the following interfaces must be implemented:

Here an observation can be of any type, as long as the following interfaces are provided:

If the actions in the action space are not always legal at each step, then the following interfaces must defined:

To define multi-player environments, the following interfaces are also needed:
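
Until the documentation covers this, here is a minimal single-player sketch assuming RLBase-style interface names (action_space, state, state_space, reward, is_terminated, reset!, and calling the environment with an action); treat the exact names as assumptions rather than a reference:

using ReinforcementLearningBase

# A toy one-shot coin-flip environment.
mutable struct CoinFlipEnv <: AbstractEnv
    reward::Float64
    done::Bool
end
CoinFlipEnv() = CoinFlipEnv(0.0, false)

ReinforcementLearningBase.action_space(env::CoinFlipEnv) = (1, 2)      # 1 = heads, 2 = tails
ReinforcementLearningBase.state(env::CoinFlipEnv) = env.done ? 2 : 1
ReinforcementLearningBase.state_space(env::CoinFlipEnv) = (1, 2)
ReinforcementLearningBase.reward(env::CoinFlipEnv) = env.reward
ReinforcementLearningBase.is_terminated(env::CoinFlipEnv) = env.done
ReinforcementLearningBase.reset!(env::CoinFlipEnv) = (env.reward = 0.0; env.done = false)

# Acting on the environment.
function (env::CoinFlipEnv)(action)
    env.reward = action == rand(1:2) ? 1.0 : -1.0
    env.done = true
end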

Add dueling DQN

The algorithm for dueling DQN is the same as DQN, except that the network is split up after the fully connected layers in model(). So I have made changes in src/experiments/atari.jl and src/experiments/rl_envs.jl.

warning and error

I run Julia 1.4.2 with updated packages and get the following (an error I first thought was in Flux, but using Flux completes without errors):

[ Info: Precompiling ReinforcementLearning [158674fc-8238-5cab-b5ba-03dfc80d1318]
┌ Warning: Package ReinforcementLearning does not have Dates in its dependencies:
│ - If you have ReinforcementLearning checked out for development and have
│ added Dates as a dependency but haven't updated your primary
│ environment's manifest file, try Pkg.resolve().
│ - Otherwise you may need to report an issue with ReinforcementLearning
└ Loading Dates into ReinforcementLearning from project dependency, future warnings for ReinforcementLearning are suppressed.
WARNING: could not import ReinforcementLearningBase.interact! into ReinforcementLearning
WARNING: could not import ReinforcementLearningBase.getstate into ReinforcementLearning
WARNING: could not import ReinforcementLearningBase.plotenv into ReinforcementLearning
WARNING: could not import ReinforcementLearningBase.actionspace into ReinforcementLearning
WARNING: could not import ReinforcementLearningBase.sample into ReinforcementLearning

ERROR: LoadError: LoadError: LoadError: UndefVarError: functorm not defined
Stacktrace:
[1] @treelike(::LineNumberNode, ::Module, ::Vararg{Any,N} where N) at /Users/jonnorberg/.julia/packages/Flux/IjMZL/src/functor.jl:58
[2] include(::Module, ::String) at ./Base.jl:377
[3] include(::String) at /Users/jonnorberg/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:1
[4] top-level scope at /Users/jonnorberg/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:31
[5] include(::Module, ::String) at ./Base.jl:377
[6] top-level scope at none:2
[7] eval at ./boot.jl:331 [inlined]
[8] eval(::Expr) at ./client.jl:449
[9] top-level scope at ./none:3
in expression starting at /Users/jonnorberg/.julia/packages/ReinforcementLearning/qSdCS/src/flux.jl:12
in expression starting at /Users/jonnorberg/.julia/packages/ReinforcementLearning/qSdCS/src/flux.jl:12
in expression starting at /Users/jonnorberg/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:31

ERROR: Failed to precompile ReinforcementLearning [158674fc-8238-5cab-b5ba-03dfc80d1318] to /Users/jonnorberg/.julia/compiled/v1.4/ReinforcementLearning/6l2TO_1iI0H.ji.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1272
[3] _require(::Base.PkgId) at ./loading.jl:1029
[4] require(::Base.PkgId) at ./loading.jl:927
[5] require(::Module, ::Symbol) at ./loading.jl:922
