
License: MIT License

Language: Jupyter Notebook (100.00%)
Topics: reinforcement-learning, upside-down-reinforcement-learning, reinforcement-learning-algorithms, continuous-action-space, discrete-action-space, cartpole-environment, upside-down, pytorch, machine-learning, machine-learning-algorithms

upside-down-reinforcement-learning's Introduction

Upside-Down-Reinforcement-Learning DOI

Upside-Down Reinforcement Learning (⅂ꓤ) implementation in PyTorch.
Based on the paper published by Jürgen Schmidhuber: ⅂ꓤ-Paper

This repository contains both a discrete-action-space and a continuous-action-space implementation for the OpenAI Gym CartPole environment (the continuous implementation uses a continuous version of the environment).

The notebooks include the training of a behavior function as well as an evaluation part where you can test the trained behavior function: feed it a desired reward that the agent should achieve within a desired time horizon, and it selects the actions (see the sketch below).
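As an illustration, here is a minimal sketch of such a command-conditioned behavior function; the class name, layer sizes, and command scaling below are assumptions for the example, not the exact architecture used in the notebooks.

import torch
import torch.nn as nn

# Illustrative sketch (not the exact notebook architecture): the command
# (desired return, desired horizon) is scaled, concatenated to the state,
# and mapped to action logits.
class BehaviorFunction(nn.Module):
    def __init__(self, state_size, action_size, hidden_size=64, command_scale=0.02):
        super().__init__()
        self.command_scale = command_scale
        self.net = nn.Sequential(
            nn.Linear(state_size + 2, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, action_size),
        )

    def forward(self, state, desired_return, desired_horizon):
        command = torch.stack([desired_return, desired_horizon], dim=-1) * self.command_scale
        return self.net(torch.cat([state, command], dim=-1))

# Example: ask for a return of 200 within 200 steps in CartPole (4-dim state, 2 actions).
bf = BehaviorFunction(state_size=4, action_size=2)
logits = bf(torch.zeros(1, 4), torch.tensor([200.0]), torch.tensor([200.0]))
action = torch.distributions.Categorical(logits=logits).sample().item()

Scaling the command keeps the (potentially large) desired return in a range comparable to the state features.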

Plots for the discrete CartPole Environment: [plot]

Plots for the continuous CartPole Environment: [plot]

Plots for the LunarLander Environment: [plot]

TODO:

  • Test some of the possible improvements mentioned in the paper (Section 6, Future Research Directions).

Author

  • Sebastian Dittert

Feel free to use this code for your own projects or research. For citation, use the DOI or cite as:

@misc{Upside-Down,
  author = {Dittert, Sebastian},
  title = {PyTorch Implementation of Upside-Down RL},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/BY571/Upside-Down-Reinforcement-Learning}},
}


upside-down-reinforcement-learning's Issues

Save model before plots

Please move:

torch.save(bf.state_dict(), name)

so that it sits between these two lines:

rewards, average, d, h, loss = run_upside_down(max_episodes=200)
plt.figure(figsize=(15,8))
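In other words, the suggested ordering would be (a sketch that reuses the notebook's existing run_upside_down, bf, and name; saving first means a later plotting failure cannot lose the trained weights):

rewards, average, d, h, loss = run_upside_down(max_episodes=200)
torch.save(bf.state_dict(), name)  # save the behavior function before any plotting
plt.figure(figsize=(15,8))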

I just lost the results of a 4-day experimental run due to this error:

Episode: 7000 | Rewards: 32.87 | Mean_100_Rewards: 0.38 | Loss: 0.6333
qt.qpa.screen: QXcbConnection: Could not connect to display :50.0
Could not connect to any X display

This happened because of a bug in the x2go server that is exposed when the internet connection is interrupted and a plot function is called.

PS: I was able to construct a graph (attached) of mean reward as a function of episode number by copying the STDOUT log and parsing out the mean reward. As is apparent, I increased the number of episodes to 7000 for this experiment. At about 200 episodes the reward peaked and then gradually declined to 0. Any idea why this would happen? The "game" I had it play was very simple: track a curve that is the sum of three sine waves of varying frequency and amplitude, with a 256-time-step history available to help classify the action.

[attached plot: mean reward vs. episode number]

Can't get results for LunarLander

Hi thanks for sharing your code and implementation.

However, when running your notebook with LunarLander-v2, even with 1,000 epochs it doesn't seem to be learning anything:

[training plot]

Can you share the hyperparameters necessary to reproduce the LunarLander?

Thank you.

No grad

In the inner for loop of Algorithm 1, where you do sampling exploration, you don't want to accumulate gradients, right? So you could add 'with torch.no_grad()' there?
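For reference, a minimal sketch of what that change could look like (sample_action is a hypothetical helper; it assumes the behavior function bf maps state, desired return, and desired horizon to action logits):

import torch

# Hypothetical sketch: during exploration the behavior function is only sampled,
# not trained, so the forward pass can run without gradient tracking.
def sample_action(bf, state, desired_return, desired_horizon):
    with torch.no_grad():  # avoid accumulating gradients while collecting episodes
        logits = bf(state, desired_return, desired_horizon)
    return torch.distributions.Categorical(logits=logits).sample().item()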

Parameter

You call:
sampling_exploration(buffer)
but the function's parameter requires an int value.
