JuliaReinforcementLearning / ReinforcementLearningEnvironments.jl
A one-stop package for different reinforcement learning environments
License: Other
JuliaReinforcementLearning/CommonRLInterface.jl#17 reminds me that we need a general grid-world example similar to https://github.com/maximecb/gym-minigrid (though it has been on my todo list for a really long time...).
That way, https://github.com/JuliaReinforcementLearning/ReinforcementLearningEnvironments.jl/issues/54 and Maze.jl could all rely on it.
I tried making an environment similar to OpenAI Gym's FrozenLake (code attached):
frozenlake.zip
Episodes terminate correctly when I run:
RLBase.test_runnable!(Env)
but not when I run:
run(RandomPolicy(action_space(Env)), Env, StopAfterEpisode(1))
What could be the reason for this?
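For what it's worth, one way to narrow this down (assuming the newer interface, where the run loop only ends an episode once is_terminated(env) returns true) is to step the env with random actions and watch the terminal flag:

# Hypothetical debugging loop: if this never exits, `run` cannot end the episode either.
let n = 0
    reset!(Env)
    while !is_terminated(Env)
        Env(rand(action_space(Env)))
        n += 1
        n > 10_000 && error("no termination after 10_000 steps")
    end
    println("terminated after $n steps")
end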
Hi, I was trying to sample random observations for the CartPole env. I specify the type to be Float32 during initialization, but the returned observations are Float64.
Here is an MWE:
julia> using ReinforcementLearningEnvironments, ReinforcementLearningBase
julia> env = CartPoleEnv(T=Float32)
CartPoleEnv{Float32}(gravity=9.8,masscart=1.0,masspole=0.1,totalmass=1.1,halflength=0.5,polemasslength=0.05,forcemag=10.0,tau=0.02,thetathreshold=0.20943952,xthreshold=2.4,max_steps=200)
julia> get_observation_space(env)
MultiContinuousSpace{Array{Float32,1}}(Float32[-4.8, -1.0f38, -0.41887903, -1.0f38], Float32[4.8, 1.0f38, 0.41887903, 1.0f38])
julia> rand(get_observation_space(env))
4-element Array{Float64,1}:
2.005482937920169
-9.852210595964816e37
0.09814665975717973
2.2353768849015113e37
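Until the underlying sampling respects the element type, a possible workaround is to draw the sample in the eltype of the bounds yourself. A minimal sketch, assuming the space stores its bounds in low/high fields (an assumption about the struct layout):

# Hypothetical workaround: sample uniformly between the bounds in their own eltype.
sample_in(space) = space.low .+ rand(eltype(space.low), length(space.low)) .* (space.high .- space.low)
sample_in(get_observation_space(env))  # should return an Array{Float32,1}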
A struct would be helpful for extracting specific fields from the observation result. See example usage here: https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/master/src/patches/ReinforcementLearningEnvironments.jl
┌ Warning: Package ReinforcementLearningEnvironments does not have PyCall in its dependencies:
│ - If you have ReinforcementLearningEnvironments checked out for development and have
│ added PyCall as a dependency but haven't updated your primary
│ environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with ReinforcementLearningEnvironments
└ Loading PyCall into ReinforcementLearningEnvironments from project dependency, future warnings for ReinforcementLearningEnvironments are suppressed.
Something's not quite right with how the optional dependency on OpenAI Gym is set up. PyCall is declared in [extras], so I'm not sure what the issue is. I'll make a PR if I find it.
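For context, the usual pattern here is Requires.jl: the gym glue code only loads after the user imports PyCall themselves. A minimal sketch of that pattern (the module body and include path are illustrative, not the package's actual layout):

# Sketch of conditional loading with Requires.jl; "gym.jl" is a hypothetical file name.
module ReinforcementLearningEnvironments

using Requires

function __init__()
    @require PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0" include("gym.jl")
end

end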
The DefaultStateStyleEnv wrapper seems to have no effect on the state() function. It changes the return value of DefaultStateStyle(E) properly, but state(E) later falls back to first(ss), i.e. the first state style.
How to reproduce:
using ReinforcementLearning
env = TicTacToeEnv()
s_env = state(env)
println(s_env)
E = DefaultStateStyleEnv{Observation{Int}()}(env) # error probably somewhere here
println("DefaultStateStyle(E) is ", DefaultStateStyle(E), " so it has changed")
s = state(E)
println("But state(E) return string: ", s)
s_E_int = state(E, Observation{Int}())
s_env_int = state(env, Observation{Int}())
println("These work properly: ", s_E_int, s_env_int)
returns:
...
...
...
DefaultStateStyle(E) is Observation{Int64}() so it has changed
But state(E) return string: ...
...
...
These work properly: 1 1
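A plausible fix, sketched below, would be to route the zero-argument state call through the wrapper's declared default style; this is a guess at the missing dispatch, not the actual library source:

# Hypothetical patch: make plain state(E) use the wrapper's DefaultStateStyle
# instead of falling back to the wrapped env's first state style.
RLBase.state(env::DefaultStateStyleEnv) = RLBase.state(env, RLBase.DefaultStateStyle(env))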
It doesn't seem that complicated to support the newly released OpenSpiel, which uses pybind11 for its Python wrapper.
Hello,
I'm trying to write a simple env that detects up/down moves in an array stored in a DataFrame. This is extremely easy in Python's gym, but I have a question: I don't understand how to return the observation. I have read the guide and the CartPole example, but I can't figure it out. Is RLBase.get_state(env::AcrobotEnv) equivalent to the observation returned by gym's step?
I also noticed that the README lists GymEnv | PyCall.jl, but there is no example of how to use it. (Can I just use a gym-wrapped environment here? Would it slow down performance?)
I also wonder why the action space is missing from this example: https://juliareinforcementlearning.org/blog/how_to_write_a_customized_environment/
EDIT: found in an example:
RLBase.observe(env::MazeEnv) =
(
reward = Float64(env.position == env.goal),
terminal = env.position == env.goal,
state = (env.position[2] - 1) * env.NX + env.position[1],
)
but I still don't understand how the state works, or why observe isn't defined in any env.
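For anyone hitting the same confusion: in newer versions of the interface, the single observe(env) named tuple was split into separate state, reward, and is_terminated queries, and state(env) plays the role of gym's returned observation. A minimal sketch of a custom env in that style (UpDownEnv, its fields, and its reward rule are all hypothetical, purely for illustration):

# A minimal sketch assuming the newer RLBase interface; all names and the
# reward logic are illustrative, not part of any real environment.
using ReinforcementLearningBase
const RLBase = ReinforcementLearningBase

mutable struct UpDownEnv <: RLBase.AbstractEnv
    data::Vector{Float64}
    t::Int
    action::Int   # last action: 1 = predict up, 2 = predict down
end

UpDownEnv(data) = UpDownEnv(data, 1, 1)

RLBase.action_space(::UpDownEnv) = Base.OneTo(2)
RLBase.state(env::UpDownEnv) = env.data[env.t]        # the "observation" in gym terms
RLBase.is_terminated(env::UpDownEnv) = env.t >= length(env.data)
RLBase.reset!(env::UpDownEnv) = (env.t = 1; env)

function RLBase.reward(env::UpDownEnv)
    env.t == 1 && return 0.0
    went_up = env.data[env.t] > env.data[env.t-1]
    (env.action == 1) == went_up ? 1.0 : -1.0
end

# Stepping: the env is a callable that consumes an action.
function (env::UpDownEnv)(action)
    env.action = action
    env.t += 1
end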
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix on this issue. I'll open a PR within a few hours, please be patient!
Making Observation mutable would allow hooks to modify observations on the fly, but I'm not sure that is a good design.
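To make the trade-off concrete, this is the kind of in-place edit a mutable observation would enable (the struct and the hook logic are hypothetical):

# Hypothetical: a hook rescales the reward on the fly. Assigning to the field
# is only legal because the struct is mutable, and every downstream consumer
# then silently sees the modified observation.
mutable struct Obs{T}
    state::Vector{T}
    reward::T
    terminal::Bool
end

scale_reward!(obs::Obs, k) = (obs.reward *= k; obs)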
This PR adds 1) Algorithmic, 2) Atari, and 3) Toy-text envs. I think the Atari part is already ported here (https://github.com/JuliaReinforcementLearning/ArcadeLearningEnvironment.jl). Maybe I should start with the Toy-text envs; they seem to be simpler than the algorithmic ones. Correct me if I am wrong @findmyway. Thank you :)
The rng in OpenSpielEnv is redundant for games without a chance node. Better to set it to nothing.
ReinforcementLearningEnvironments depends on:
[621f4979] + AbstractFFTs v0.5.0
[79e6a3ab] + Adapt v1.0.1
[7d9fca2a] + Arpack v0.4.0
[68821587] + Arpack_jll v3.5.0+2
[b99e7846] + BinaryProvider v0.5.8
[fa961155] + CEnum v0.2.0
[3895d2a7] + CUDAapi v3.1.0
[c5f51814] + CUDAdrv v6.0.0
[be33ccc6] + CUDAnative v2.10.2
[e66e0078] + CompilerSupportLibraries_jll v0.2.0+1
[3a865a2d] + CuArrays v1.7.2
[9a962f9c] + DataAPI v1.1.0
[864edb3b] + DataStructures v0.17.10
[31c24e10] + Distributions v0.22.5
[1a297f60] + FillArrays v0.8.5
[0c68f7d7] + GPUArrays v2.0.1
[28b8d3ca] + GR v0.47.0
[929cbde3] + LLVM v1.3.3
[1914dd2f] + MacroTools v0.5.4
[e1d29d7a] + Missings v0.4.3
[872c559c] + NNlib v0.6.6
[4536629a] + OpenBLAS_jll v0.3.7+6
[efe28fd5] + OpenSpecFun_jll v0.5.3+2
[bac558e1] + OrderedCollections v1.1.0
[90014a1f] + PDMats v0.9.11
[1fd47b50] + QuadGK v2.3.1
[e575027e] + ReinforcementLearningBase v0.6.5
[25e41dd2] + ReinforcementLearningEnvironments v0.2.3
[ae029012] + Requires v1.0.1
[79098fc4] + Rmath v0.6.1
[f50d1b31] + Rmath_jll v0.2.2+0
[a2af1166] + SortingAlgorithms v0.3.1
[276daf66] + SpecialFunctions v0.10.0
[2913bbd2] + StatsBase v0.32.2
[4c63d2b9] + StatsFuns v0.9.4
[a759f4b9] + TimerOutputs v0.5.3
It's not outrageous, but if the dependency on the GPU stuff / NNlib could be taken out, it would make my package lighter...
I'm thinking about making a bridge from POMDPs.jl to this package. Are there any expectations/requirements on what the output of observe is? Should it be an AbstractVector, for example?
Just created an interesting package for a tutorial:
https://github.com/JuliaReinforcementLearning/SnakeGames.jl
It can be used to test single player or zero-sum two player algorithms.
Inspired by https://agrishchenko.wixsite.com/snakesai/rules
Though all the environments here are written against ReinforcementLearningBase.jl, it's relatively easy to also support the minimal interfaces provided in CommonRLInterface. A sketch:
struct CommonEnvWrapper{T<:RLBase.AbstractEnv} <: CommonRL.AbstractCommonEnv
    env::T
end

# Note the ::Type argument: `convert(CommonRL.AbstractCommonEnv, env::...)` is
# not a valid method signature, since the first "argument" is a qualified type.
Base.convert(::Type{CommonRL.AbstractCommonEnv}, env::RLBase.AbstractEnv) = CommonEnvWrapper(env)

function CommonRL.step!(env::CommonEnvWrapper, action)
    env.env(action)
    obs = RLBase.observe(env.env)
    RLBase.get_state(obs), RLBase.get_reward(obs), RLBase.get_terminal(obs), obs
end

function CommonRL.reset!(env::CommonEnvWrapper)
    RLBase.reset!(env.env)
    obs = RLBase.observe(env.env)
    RLBase.get_state(obs), RLBase.get_reward(obs), RLBase.get_terminal(obs), obs
end

# Forward to the wrapped env, not the wrapper itself.
CommonRL.actions(env::CommonEnvWrapper) = RLBase.get_action_space(env.env)
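Hypothetical usage, to show the round trip (assumes an RLBase env such as CartPoleEnv is available):

# Wrap an RLBase env and drive it through the CommonRL API.
cenv = convert(CommonRL.AbstractCommonEnv, CartPoleEnv())
CommonRL.reset!(cenv)
s, r, done, obs = CommonRL.step!(cenv, rand(CommonRL.actions(cenv)))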
What do you think? @zsunberg
Hi,
I am experiencing an error while trying to train a basic DQN on MountainCarEnv with a CircularArrayBuffer. It seems that some functions are not found by Julia: update!(cb::CircularArrayBuffer{T,N}, data::AbstractArray), _buffer_frame(cb::CircularArrayBuffer, I::Vector{Int}), and _buffer_frame(cb::CircularArrayBuffer, i::Int).
I managed to run my code by manually adding those functions to my Julia file, copying them from ReinforcementLearningCore.jl/src/utils/base.jl:
# Map a logical frame index to its physical index in the circular buffer.
@inline function _buffer_frame(cb::CircularArrayBuffer, i::Int)
    n = capacity(cb)
    idx = cb.first + i - 1
    if idx > n
        idx - n
    else
        idx
    end
end

_buffer_frame(cb::CircularArrayBuffer, I::Vector{Int}) = map(i -> _buffer_frame(cb, i), I)

# Write `data` into the most recently pushed frame of the buffer.
function RL.update!(cb::CircularArrayBuffer{T,N}, data::AbstractArray) where {T,N}
    select_last_dim(cb.buffer, _buffer_frame(cb, cb.length)) .= data
    cb
end
The strange thing is that I do not experience this issue when I run the exact same code using CartPoleEnv.
I am having trouble understanding what is happening, and I hope this helps point out something to improve in the source code.
Some experiments listed in "List of built-in experiments" are not in the source folder https://github.com/JuliaReinforcementLearning/ReinforcementLearningZoo.jl/tree/master/src/experiments, for example JuliaRL_VPG_Pendulum. Where can I find them?
See the gym implementation, and how we typically implement these classic examples here.
Should seed! be part of this interface to allow for easily reproducing exact trajectories? Or is there another standard way of seeding the environment's rng?
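One common pattern (an assumption about the constructors here, not a confirmed part of the interface) is to inject an explicitly seeded rng when the env is built, which makes trajectories replayable without a separate seed! method:

# Hypothetical: pass a seeded rng into the env so rollouts are reproducible.
using Random
env = CartPoleEnv(rng = MersenneTwister(123))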
I see the API has changed.
I am trying to use Float32 as the type for MountainCarEnv (basically setting the argument T = Float32) and I am having an issue.
It seems that CartPoleEnv, which is very similar to MountainCarEnv, works perfectly with different types, so the issue might come from a difference in how the two environments are defined: in CartPole, the parameters are directly converted to the chosen type, but I can't find a line doing the same in MountainCarEnv.
Am I the only one to get this? I hope this suggestion can help fix the issue!
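For reference, a sketch of the kind of constructor-time conversion CartPoleEnv does and MountainCarEnv appears to be missing; the struct and field names below are illustrative, not the actual source:

# Hypothetical parameter struct: every physical constant is converted to T,
# so the whole simulation stays in the requested precision.
struct CarParams{T}
    min_pos::T
    max_pos::T
    max_speed::T
    goal_pos::T
    gravity::T
end

CarParams(; T = Float64, min_pos = -1.2, max_pos = 0.6,
            max_speed = 0.07, goal_pos = 0.5, gravity = 0.0025) =
    CarParams{T}(min_pos, max_pos, max_speed, goal_pos, gravity)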