Question: Does LSPI work for cartpole constructed in gym rather than using our own environment? about mushroom-rl HOT 4 CLOSED

kishanpb commented on July 19, 2024

Question: Does LSPI work for cartpole constructed in gym rather than using our own environment?

from mushroom-rl.

Comments (4)

boris-il-forte commented on July 19, 2024

You can use CartPole from gym through the Gym class of mushroom_rl, which simply interfaces any OpenAI Gym environment with mushroom_rl.

Gym cartpole and mushroom cartpole are different environments. Look at the documentation of mushroom_rl to find the related paper.
Maybe you are not using enough features (try GaussianRBF.generate, which generates a uniform grid of centers over the state space).
Another problem may be exploration: if your initial policy doesn't explore the state space sufficiently, LSPI may fail. A common trick is to reuse the learned policy to collect a better dataset.
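
The "uniform grid" suggestion can be illustrated without the library. The sketch below is a plain-Python stand-in, not mushroom_rl's GaussianRBF.generate: the helper names `uniform_grid_centers` and `rbf_features`, the single shared `scale`, and the state bounds are all assumptions chosen for illustration. It places three centers per dimension across a 4-dimensional CartPole-like state and evaluates one Gaussian feature per center.

```python
import itertools
import math


def uniform_grid_centers(n_centers, low, high):
    """Return every point of a uniform grid with n_centers[d] points along dimension d."""
    axes = []
    for n, lo, hi in zip(n_centers, low, high):
        if n == 1:
            # A single center sits in the middle of the interval.
            axes.append([(lo + hi) / 2.0])
        else:
            step = (hi - lo) / (n - 1)
            axes.append([lo + k * step for k in range(n)])
    # Cartesian product of the per-dimension axes gives the full grid.
    return list(itertools.product(*axes))


def rbf_features(state, centers, scale):
    """Evaluate one Gaussian bump per center; every bump uses all state dimensions."""
    features = []
    for center in centers:
        sq_dist = sum(((s - c) / scale) ** 2 for s, c in zip(state, center))
        features.append(math.exp(-sq_dist))
    return features


# Assumed bounds for (cart position, cart velocity, pole angle, pole velocity).
low = [-4.0, -3.0, -math.pi, -3.0]
high = [4.0, 3.0, math.pi, 3.0]

centers = uniform_grid_centers([3, 3, 3, 3], low, high)
phi = rbf_features([0.0, 0.0, 0.0, 0.0], centers, scale=1.0)

print(len(centers))  # number of RBF features in the grid
print(max(phi))      # activation of the center closest to the state
```

Three centers per dimension already yield 3^4 = 81 features, so the grid resolution has to be traded off against the amount of data each fit sees.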

kishanpb commented on July 19, 2024

I am using the learned policy to get a new dataset as follows:

    # Train
    core.learn(n_episodes=1500, n_episodes_per_fit=100)

This essentially uses the most recently learned policy to generate a new dataset after every 100 episodes, until 1500 episodes have been executed.
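
To make the collect-and-fit schedule concrete, here is a small plain-Python sketch. It does not call mushroom_rl; `fit_schedule` is a hypothetical helper, and it assumes that each fit is triggered after a full batch of `n_episodes_per_fit` episodes and that `n_episodes` is an exact multiple of the batch size.

```python
def fit_schedule(n_episodes, n_episodes_per_fit):
    """List the (first, last) episode index of each batch that triggers one fit."""
    n_fits = n_episodes // n_episodes_per_fit
    return [
        (k * n_episodes_per_fit, (k + 1) * n_episodes_per_fit - 1)
        for k in range(n_fits)
    ]


schedule = fit_schedule(n_episodes=1500, n_episodes_per_fit=100)

print(len(schedule))  # number of policy updates
print(schedule[0])    # batch collected with the initial policy
print(schedule[-1])   # batch collected with the policy from the previous fit
```

Under these assumptions there are 15 fits, and only the first batch is collected with the initial policy, so poor exploration in that batch can affect every later one.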

I'll try this suggestion:

Maybe you are not using a sufficient amount of features (try to use generate from GaussianRBF, it will generate a uniform grid in the space).

kishanpb commented on July 19, 2024

This is hard! I tried various bases. Nothing seems to work! Any more help?

    import numpy as np
    from mushroom_rl.features.basis import GaussianRBF, PolynomialBasis

    # basis 1: constant term plus RBFs centered on a hand-built 4-D grid
    basis = [PolynomialBasis()]

    s1 = np.array([-4, -3, 0, 3, 4])
    s2 = np.array([-1, 0, 1])
    s3 = np.array([-2 * np.pi, -np.pi, 0, np.pi, 2 * np.pi]) * .25
    s4 = np.array([-1, 0, 1])
    s = np.array(np.meshgrid(s1, s2, s3, s4)).T.reshape(-1, 4)
    for i in s:
        basis.append(GaussianRBF(i, np.array([1.])))

    # basis 2: constant term plus three hand-picked RBF centers
    basis = [PolynomialBasis()]
    s = ([1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 0])
    s = np.array(s)
    for i in s:
        basis.append(GaussianRBF(i, np.array([2.])))

    # basis 3: generated RBF grid plus constant term
    basis = GaussianRBF.generate(n_centers=[3, 3, 3, 3],
                                 low=[-4, -3, -np.pi, -3],
                                 high=[4, 3, np.pi, 3],
                                 dimensions=[1, 1, 1, 1])
    basis.append(PolynomialBasis())

    # basis 4: polynomial features up to degree 10
    basis = PolynomialBasis.generate(max_degree=10, input_size=4)

boris-il-forte commented on July 19, 2024

Basis 3 is wrong. Remove the dimensions parameter so that it uses all dimensions.
Basis 4 uses unreasonable parameters: a polynomial of degree 10 is far more complex than needed. Try a lower-degree polynomial (degree 2 or 3).

If nothing works, try changing the algorithm: use SARSA(λ) or DQN.
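
A quick count shows why degree 10 is excessive for a 4-dimensional state. The sketch below assumes that a polynomial basis of maximum degree d contains every monomial of total degree at most d (constant included), which gives C(n + d, d) features for n inputs; `n_monomials` is a hypothetical helper, not a mushroom_rl function.

```python
import math


def n_monomials(max_degree, input_size):
    """Count monomials of total degree <= max_degree in input_size variables."""
    return math.comb(input_size + max_degree, max_degree)


for degree in (2, 3, 10):
    print(degree, n_monomials(degree, input_size=4))
```

Under that assumption, degree 2 gives 15 features and degree 3 gives 35, while degree 10 gives 1001, all of which must be estimated from the samples in a single batch.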
