Comments (9)
Actually, you cannot use the Boltzmann policy for policy gradient methods, because its interface lacks the gradient of the log-probability. I will put it on the to-do list.
For (deep) actor-critic, there is a Boltzmann policy that is not based on Q-functions: the "BoltzmannTorchPolicy".
In principle, you could use that. However, you would need to handle the discrete state space yourself, which may not be easy (I have never tried a deep actor-critic on a grid world, for obvious reasons... but I understand the curiosity to try things).
from mushroom-rl.
Actually, you cannot use the Boltzmann policy for policy gradient methods, as the interface is lacking the gradient of the logarithm.
Ok, thanks for clarifying! When I tried last night, I discovered that the Boltzmann policy has no ._approximator and wasn't working.
I need this categorical policy for discrete actions and discrete state spaces for my research and I'm happy to implement it myself. How would you recommend doing so?
To be clear, I don't think anything should need to be deep in gridworld. Tabular PG and Tabular AC methods should (at least in principle) be applicable to gridworld, right?
As a workaround, would the following give a PG agent for a discrete state space and a discrete action space?
Use a BoltzmannTorchPolicy with a torch approximator that is an S x A matrix. Then, in each state s, the policy slices the corresponding row from the matrix, applies a softmax to that row, and samples from the resulting Categorical distribution.
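To make the proposal concrete, here is a minimal sketch of the row-slice, softmax, sample step. It is written with NumPy rather than torch to keep it dependency-light, and it does not use BoltzmannTorchPolicy or any mushroom-rl class; the sizes and the names `logits` and `draw_action` are illustrative assumptions only.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

n_states, n_actions = 5, 4                # hypothetical sizes
rng = np.random.default_rng(0)
logits = np.zeros((n_states, n_actions))  # the S x A parameter matrix

def draw_action(state):
    # Slice the row for this state, softmax it, sample a categorical action.
    probs = softmax(logits[state])
    action = int(rng.choice(n_actions, p=probs))
    return action, probs

action, probs = draw_action(2)
```

With all-zero logits every row is a uniform distribution, so the initial policy explores all actions equally.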
To explain why, for my research, I want to test policy gradient and actor-critic methods against value-based approaches in tabular domains with discrete action spaces. Is there a way to do this using mushroom-rl?
I'm happy to implement whatever I need to myself, if you give me an outline of what needs to change where (and what pitfalls to watch out for)!
I just tried this myself and hit the following error inside REINFORCE:
    self.sum_d_log_pi = np.zeros(self.policy.weights_size)
AttributeError: 'BoltzmannTorchPolicy' object has no attribute 'weights_size'
Specifically, it stems from the method:
def _init_update(self):
    self.sum_d_log_pi = np.zeros(self.policy.weights_size)
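For context on why a weights_size attribute matters here: the name sum_d_log_pi suggests an accumulator with one entry per policy parameter, which is what the textbook episodic REINFORCE estimator needs. The formulas below are the standard estimator and the standard log-gradient of a tabular softmax with parameters \theta_{s,a}; they are not taken from the mushroom-rl source, whose implementation may differ (for example by using a baseline).

```latex
\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N}
  \left( \sum_{t=0}^{T_i - 1} \nabla_\theta \log \pi_\theta(a_t^i \mid s_t^i) \right) R_i,
\qquad
\frac{\partial \log \pi_\theta(a \mid s)}{\partial \theta_{s', a'}}
  = \mathbb{1}[s = s'] \left( \mathbb{1}[a = a'] - \pi_\theta(a' \mid s) \right)
```

Here R_i is the return of episode i and N is the number of episodes per update. For an S x A table the gradient vector has S * A entries, so a policy used this way would need to report that size and provide the log-gradient.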
The simplest approach is to implement the ParametricPolicy interface with an appropriate policy. As far as I know, this will allow standard policy gradient to work. If it does not, you may want to change the policy gradient algorithms to support your setting, or implement another approximator that supports integer inputs.
Note that you can define the policy however you want; there is no need to use any of the mushroom tools (though they can be helpful in more complex scenarios).
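As a rough illustration of what such a policy could look like, here is a standalone tabular Boltzmann policy. It deliberately does not subclass anything from mushroom-rl: the method names (`weights_size`, `get_weights`, `set_weights`, `diff_log`, `draw_action`, `reset`) are chosen to mirror what I understand the ParametricPolicy interface to expect, but that is an assumption, and the exact signatures should be checked against the installed version. The class name `TabularBoltzmannPolicy` is hypothetical.

```python
import numpy as np

class TabularBoltzmannPolicy:
    """Softmax policy over an S x A table of logits (illustrative sketch)."""

    def __init__(self, n_states, n_actions, seed=None):
        self._n_states = n_states
        self._n_actions = n_actions
        self._logits = np.zeros((n_states, n_actions))
        self._rng = np.random.default_rng(seed)

    @property
    def weights_size(self):
        # One parameter per (state, action) pair.
        return self._logits.size

    def get_weights(self):
        return self._logits.ravel().copy()

    def set_weights(self, weights):
        w = np.asarray(weights, dtype=float)
        self._logits = w.reshape(self._n_states, self._n_actions).copy()

    def _probs(self, state):
        # Accept either a plain int or a length-1 array as the state.
        s = int(np.asarray(state).ravel()[0])
        row = self._logits[s]
        e = np.exp(row - row.max())
        return s, e / e.sum()

    def __call__(self, state, action):
        # Probability of taking `action` in `state`.
        _, p = self._probs(state)
        return p[int(np.asarray(action).ravel()[0])]

    def draw_action(self, state):
        _, p = self._probs(state)
        return np.array([self._rng.choice(self._n_actions, p=p)])

    def diff_log(self, state, action):
        # d log pi(a|s) / d theta: (one_hot(a) - pi(.|s)) in row s, zero elsewhere.
        s, p = self._probs(state)
        a = int(np.asarray(action).ravel()[0])
        grad = np.zeros_like(self._logits)
        grad[s] = -p
        grad[s, a] += 1.0
        return grad.ravel()

    def reset(self):
        pass
```

A quick sanity check before plugging a policy like this into any algorithm is to compare `diff_log` against a finite-difference gradient of the log-probability.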
For deep actor-critic, you can use the torch Boltzmann policy and define a network that makes sense for an integer input. In general, this does not seem to be a very good idea; however, I will not comment on this point further, as it is out of the scope of mushroom and a very particular setting. Probably, you cannot expect that a deep actor-critic approach will have amazing results on grid worlds...
Probably, you cannot expect that a deep actor-critic approach will have amazing results on grid worlds...
I think you're misunderstanding what I want to do.
The goal is simple: REINFORCE in Gridworld using a Categorical policy. No deep learning required. This is maybe the simplest application of REINFORCE and I'm finding it surprisingly difficult to implement.
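For reference, the library-free version of that goal can be sketched in a few lines. This is only an illustration under stated assumptions: a hypothetical 1-D corridor stands in for the gridworld (start at the left end, reward 1 on reaching the right end), there is no baseline, the gamma^t weighting of the gradient is omitted as is common in practice, and none of the names come from mushroom-rl.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def run_episode(theta, rng, n_states, max_steps=50):
    """Corridor: start in state 0, reward 1 on reaching the last state."""
    s, traj = 0, []
    for _ in range(max_steps):
        a = int(rng.choice(2, p=softmax(theta[s])))  # 0 = left, 1 = right
        s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        traj.append((s, a, r))
        s = s_next
        if s == n_states - 1:
            break
    return traj

def reinforce(n_states=4, n_episodes=300, alpha=0.1, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((n_states, 2))  # S x A table of logits
    for _ in range(n_episodes):
        traj = run_episode(theta, rng, n_states)
        g, grad = 0.0, np.zeros_like(theta)
        # Walk backwards so g is the discounted return from each step.
        for s, a, r in reversed(traj):
            g = r + gamma * g
            p = softmax(theta[s])
            grad[s] -= g * p       # -pi(.|s) part of grad log pi
            grad[s, a] += g        # one_hot(a) part of grad log pi
        theta += alpha * grad      # gradient ascent on the return
    return theta

theta = reinforce()
```

After training, `softmax(theta[s])` gives the learned action probabilities for state s.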
The solution is described in the post above: implement a Boltzmann policy using the ParametricPolicy interface.
In general, we don't support policy search approaches for finite state spaces. There are many reasons for this choice. You can try to adapt the existing code following the solution above, but I cannot guarantee it will work.
My comment on deep actor-critic is that these approaches, even without deep networks, are unlikely to work. They would also be fairly complex to implement in this setting, requiring many complicated assumptions.
Classical actor-critic, by contrast, can be ported in a similar way once you get standard policy search to work.
Ok thank you.