Git Product home page Git Product logo

cartpole_v0's Introduction

CartPole_v0

5 different techniques are used to solved the control problem of balancing a pole on a cart. Environment is taken from OpenAI gym CartPole_v0.

  • Random Search (Actor only)
  • Hill Climbing (Actor only)
  • Q-learning (Critic only)
  • Sarsa (Critic only)
  • Monte Carlo (Critic only)

Code

Running

python Main.py

Dependencies

  • gym
  • itertools
  • math
  • matplotlib
  • networkx
  • numpy

Detailed Description

Problem Statement and Environment

The goal is to move the cart to the left and right in a way that the pole on top of it does not fall down. The states of the environment are composed of 4 elements - cart position (x), cart speed (xdot), pole angle (theta) and pole angular velocity (thetadot). For each time step when the pole is still on the cart we get a reward of 1. The problem is considered to be solved if for 100 consecutive episodes the average reward is at least 195.

If we translate this problem into reinforcement learning terminology:

  • action space is 0 (left) and 1 (right)
  • state space is a set of all 4-element lists with all possible combinations of values of x, xdot, theta, thetadot

Solution #1 - Random Search

The policy/action will be simply determined by the following decision rule:

if w_0 * x + w_1 * xdot + w_2 * theta + w_3 * thetadot + b > 0: 
  return 1
else:
  return 0

Each episode we randomly generate the weights w = (w_0, w_1, w_2, w_3, b) and use the above decision rule to move the cart. If we stumble upon a winner (w that leads to 200 steps) then we save it for later. From certain episode, we switch to a "greedy mode" and only pick randomly the previous winning policies. If a policy wins again, we keep it in the winners list. If it does not, we remove it.

Solution #2 - Hill Climbing

We use the same decision rule as in Random Search but instead of generating a new w each episode, we simply wait until a good enough policy is found ( > 150 timesteps) and then apply the following logic until a winner is found.

while reward(w_old) < 200
  w_new = w_old + random_noise
  if reward(w_new) > reward(w_old):
    w_old = w_new
  else:
    pass

For the following 3 methods it is necessary to discretize the sample space. To do this we select for each element a list of threshold points and based on these we discretize the continuous observations. A sufficient discretization for all three algorithms is for example the following:

    threshold_points = [[1, 0, 1],  # x
                        [-0.5, 0, 0.5],  # xdot
                        [-math.radians(5), 0, math.radians(5)],  # theta
                        [-math.radians(5), 0, math.radians(5)]]  # thetadot
                      

It results in a sample space of 256 states.

Solution #3 - Q learning

The Q value function update is the following and can be done online without the need to wait until the end of episode.

Q(S_t,A_t) += alpha * (R_t + discount_factor * max Q(S_{t+1}, a) - Q(S_t,A_t))

Solution #4 - Sarsa

The Sarsa update has the following form:

Q(S_t,A_t) += alpha * (R_t + discount_factor * Q(S_{t+1}, A_{t+1}) - Q(S_t,A_t))

Solution #5 - Monte Carlo

Monte Carlo approximates the Q function with the sample mean of observed returns

Q(S,A) = np.mean([observed returns starting with an state-action value (S,A)])

Examples and evaluations

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Links & Resources

cartpole_v0's People

Contributors

jankrepl avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.