timoklein / alphazero-gym
AlphaZero for continuous control tasks
License: MIT License
Is there any sample code?
I cannot install this project on macOS 11.2.3 using the latest version of conda. Maybe a requirements.txt would be better.
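For reference, a minimal sketch of what such a requirements.txt could look like; the packages and version pins below are assumptions about a typical PyTorch/gym setup, not this repository's actual dependency list:

```
# Hypothetical requirements.txt sketch -- packages and versions are assumptions,
# not this repository's actual dependencies.
torch>=1.8
gym>=0.18
numpy
hydra-core>=1.0
```

A pinned file like this can be generated from a working environment with `pip freeze > requirements.txt`.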
Hi,
I want to use the A0 single-player method on my custom environment.
My state is an image with shape [2, 50, 50], and the action space is discrete. I changed the DiscretePolicy self.trunk from a fully connected layer to a CNN, but I'm really confused about why it doesn't work. The policy loss increases and it seems like the agent cannot learn anything.
I would appreciate any suggestions.
Sincerely
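A minimal sketch of the kind of CNN trunk swap being described, assuming the trunk only has to map the observation to a hidden feature vector; the layer sizes and the interface are assumptions, not the repo's actual DiscretePolicy code:

```python
import torch
import torch.nn as nn

class ConvTrunk(nn.Module):
    """Hypothetical CNN trunk for [2, 50, 50] image observations.
    Layer sizes are illustrative, not the repository's actual architecture."""

    def __init__(self, in_channels: int = 2, hidden_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),  # 50x50 -> 25x25
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),           # 25x25 -> 13x13
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened feature size with a dummy forward pass.
        with torch.no_grad():
            n_flat = self.conv(torch.zeros(1, in_channels, 50, 50)).shape[1]
        self.fc = nn.Linear(n_flat, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 2, 50, 50) float tensor, ideally scaled to [0, 1]
        return torch.relu(self.fc(self.conv(x)))
```

If a trunk along these lines still doesn't learn, it is worth checking that the pixel inputs are normalized and that the learning rate, originally tuned for a small fully connected trunk, isn't too large for the CNN.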
I was trying to run 'run_continuous.py' to test the code but I got some errors with the config files.
MCTSContinous.yaml is missing a field "model".
RMSProp.yaml is missing a field "params".
Both are required arguments and I cannot figure out how to solve the problem.
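The filenames and the "missing field" errors look like Hydra-style structured configs in which some values are meant to be filled in at runtime rather than in the YAML itself. A minimal sketch of that mechanism, assuming a Hydra entry point and cfg.model / cfg.optimizer keys that may not match the repo's actual config layout:

```python
import hydra
from hydra.utils import instantiate
from omegaconf import DictConfig

# Hypothetical entry point: config_path/config_name and the cfg.model /
# cfg.optimizer keys are assumptions, not the repository's actual layout.
@hydra.main(config_path="config", config_name="run_continuous")
def main(cfg: DictConfig) -> None:
    model = instantiate(cfg.model)
    # A field such as "params" that is required/MISSING in the YAML can be
    # supplied as an override at instantiation time instead of in the file.
    optimizer = instantiate(cfg.optimizer, params=model.parameters())
    print(type(model).__name__, type(optimizer).__name__)

if __name__ == "__main__":
    main()
```

If the repo instead expects those fields to be set directly in MCTSContinous.yaml and RMSProp.yaml, concrete values would have to go there; the sketch only illustrates the usual way a torch optimizer's "params" field gets filled in.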
I executed the run_continuous.py file for the continuous agent and found that the policy loss increased approximately linearly with training episodes until it stabilized. Why does the policy loss not decrease?
I tried tuning some hyperparameters, such as n_rollouts and hidden_dimensions, but that did not reduce the policy loss either. The episode reward also didn't improve further over the course of training. Is that normal behavior for this repo?
I recently finished reading this repo's code and found that the SAC-style entropy bonus on the state value is only added at the final output step.
This made me wonder:
If the target is to find an action that maximizes environment reward plus entropy, why not account for the entropy during planning as well?
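For concreteness, a sketch in my own notation (not the repo's) of the distinction the question is about: the maximum-entropy return credits the entropy bonus at every step, whereas adding it only at the leaf of the search gives a different quantity.

```latex
% Entropy bonus at every step of the return (maximum-entropy objective)
V^{\text{soft}}(s_0) = \mathbb{E}\!\left[\sum_{t \ge 0} \gamma^t \bigl(r_t + \alpha\,\mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr)\bigr)\right]

% Entropy bonus only at the leaf state s_T (what the question describes)
V^{\text{leaf}}(s_0) = \mathbb{E}\!\left[\sum_{t=0}^{T-1} \gamma^t r_t \;+\; \gamma^T \bigl(V(s_T) + \alpha\,\mathcal{H}\bigl(\pi(\cdot \mid s_T)\bigr)\bigr)\right]
```

Propagating the per-step entropy terms through the tree backups would make the search optimize the first quantity rather than the second, which is what the question is suggesting.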
I noticed the statement in the README: "If your laptop is decent it shouldn't take more than an hour." I have no idea how long Pendulum usually takes to converge (few of the papers I have read plot a learning curve for Pendulum).
More to the point, what does "convergence" mean in the context of Pendulum?
Do you mean that with your algorithm, Pendulum converges to a score near 0 (the best possible) within one hour of training?
Personally, with plain SAC I can get Pendulum to converge to a score in roughly the [-500, -200) range within one minute on a laptop with a GeForce 940M GPU, and it is hard to improve the score beyond that, even with more training time.