
alphazero-gym's People

Contributors: dependabot[bot], timoklein, tmoer


alphazero-gym's Issues

Does Pendulum really need up to 1 hour to converge?

I noticed this statement in the README: "If your laptop is decent it shouldn't take more than an hour." I have no idea how long Pendulum typically takes to converge, since few of the papers I have read plot a learning curve for it.
What, exactly, does "convergence" mean in the context of Pendulum?
Do you mean that with your algorithm the pendulum converges to a return near 0 (the best score) within one hour of training?

Personally, with vanilla SAC I can get Pendulum to converge to a return in roughly [-500, -200) within a minute on a laptop with a GeForce 940M GPU, and it is hard to push the score any higher even with more training time.

Should the entropy bonus also be calculated during planning?

I recently finished reading the code in this repo and found that the SAC entropy bonus on the state value is only added at the final output step.

This makes me wonder: if the goal is to find the action with the best environment reward plus maximum entropy, why not also account for the entropy bonus during planning?
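To make the question concrete, here is a rough sketch of what I mean, written against a hypothetical node structure (reward, gamma, visit_count, and q_value are my own assumed attributes, not this repo's API): the entropy term is mixed into the value that is backed up along the search path, rather than added only at the final output step.

```python
def backup_with_entropy(node_path, log_probs, leaf_value, alpha=0.2):
    """Back up an entropy-regularized value along an MCTS search path.

    node_path  : list of (node, action_index) pairs visited in this simulation
    log_probs  : log pi(a|s) of the action taken at each node on the path
    leaf_value : bootstrapped value estimate at the leaf
    alpha      : entropy temperature (assumed hyperparameter)
    """
    value = leaf_value
    # Walk the path backwards, folding the entropy bonus into every backup.
    for (node, a), log_p in zip(reversed(node_path), reversed(log_probs)):
        value = node.reward[a] - alpha * log_p + node.gamma * value
        node.visit_count[a] += 1
        # Incremental mean of the soft (entropy-regularized) return.
        node.q_value[a] += (value - node.q_value[a]) / node.visit_count[a]
    return value
```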

Config Error

I was trying to run run_continuous.py to test the code, but I got some errors with the config files.

MCTSContinous.yaml is missing the field "model"
RMSProp.yaml is missing the field "params"

Both are required arguments, and I cannot figure out how to resolve this.
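In case it helps reproduce the problem, a quick way to check which required keys are actually present is to load the configs directly. This assumes the configs are plain YAML files readable with PyYAML; the file names are simply the ones from the error messages above.

```python
import yaml  # PyYAML

# File names taken from the error messages; adjust paths to your checkout.
for path, required in [
    ("MCTSContinous.yaml", ["model"]),
    ("RMSProp.yaml", ["params"]),
]:
    with open(path) as f:
        cfg = yaml.safe_load(f) or {}
    missing = [key for key in required if key not in cfg]
    print(f"{path}: missing {missing}")
```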

Why does the policy loss increase instead of decreasing?

I ran run_continuous.py for the continuous agent and found that the policy loss increases roughly linearly with the number of training episodes until it plateaus. Why does the policy loss not decrease?
I tried tuning some hyperparameters, such as n_rollouts and hidden_dimensions, but that did not reduce the policy loss, and the episode reward also stopped improving over the course of training. Is this normal behaviour for this repo?
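For context, my understanding is that an AlphaZero-style policy loss is a cross-entropy against the MCTS visit-count distribution, roughly like the sketch below (this is my own reading, not necessarily the exact loss used in this repo). Because the visit-count target changes as the search itself improves, the raw loss value is measured against a moving target, which is part of what I am trying to understand here.

```python
import torch

def visit_count_policy_loss(log_probs, visit_counts):
    """Cross-entropy between the network policy and the MCTS visit distribution.

    log_probs    : log pi(a_i | s) for the actions considered at the root, shape [K]
    visit_counts : MCTS visit counts n(s, a_i) for those actions, shape [K]
    """
    target = visit_counts / visit_counts.sum()  # empirical search policy
    return -(target * log_probs).sum()          # cross-entropy to the target

# Example with dummy numbers:
log_probs = torch.log(torch.tensor([0.5, 0.3, 0.2]))
visit_counts = torch.tensor([10.0, 25.0, 5.0])
print(visit_count_policy_loss(log_probs, visit_counts))
```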

Using a custom environment with this code

Hi,
I want to run the single-player A0 method on my custom environment.

My state is an image with shape [2, 50, 50], and the action space is discrete. I changed DiscretePolicy's self.trunk from a fully connected layer to a CNN, but I am confused about why it does not work: the policy loss increases and the agent does not seem to learn anything.
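For reference, here is a minimal sketch of the kind of CNN trunk I mean; the layer sizes are illustrative rather than my exact code, and the policy/value heads of DiscretePolicy are assumed to stay unchanged.

```python
import torch
import torch.nn as nn

# Illustrative trunk for [2, 50, 50] observations; layer sizes are guesses.
trunk = nn.Sequential(
    nn.Conv2d(2, 32, kernel_size=5, stride=2), nn.ReLU(),   # -> [32, 23, 23]
    nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),  # -> [64, 11, 11]
    nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.ReLU(),  # -> [64, 5, 5]
    nn.Flatten(),
    nn.Linear(64 * 5 * 5, 256), nn.ReLU(),
)

obs = torch.zeros(1, 2, 50, 50)  # one dummy observation
features = trunk(obs)            # shape [1, 256], fed to the policy/value heads
```

One thing I am unsure about is whether the image should be scaled to [0, 1] before going into the trunk, since unscaled pixel values could also explain the poor learning.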

I would appreciate any suggestions.

Sincerely

[Screenshot attached: 2023-08-07 17-08-27]
