
Reinforcement Learning Algorithms Tutorial (Python)

This repository walks you through the theoretical fundamentals of typical reinforcement learning methods (model-free algorithms), with intuitive (but mathematical) explanations and concise Python code.

Table of Contents

  1. Q-Learning
  2. Policy Gradient method (on-policy)
  3. Actor Critic method
  4. PPO (Proximal Policy Optimization) (on-policy)
  5. DDPG (Deep Deterministic Policy Gradient) (off-policy)
  6. SAC (Soft Actor-Critic) (off-policy)

All these examples are written in Python from scratch, without any RL (reinforcement learning) libraries such as RLlib or Stable Baselines.
See here (a Minecraft example) for scripts built with the RLlib library.

Example Environment (CartPole-v1)

All examples use the widely used CartPole environment (env).

The specification of this environment (CartPole-v1) - its actions, states (observations), and rewards - is as follows. (You can also inspect these values programmatically; see the sketch after the lists below.)

Action Space - Type : Discrete(2)

  • 0 : Push cart to the left
  • 1 : Push cart to the right

Observation Space - Type : Box(-num, num, (4,), float32)

  • Cart Position (-4.8, 4.8)
  • Cart Velocity (-inf, inf)
  • Pole Angle (-0.418, 0.418) [rad]
  • Pole Angular Velocity (-inf, inf)
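
These specifications can also be checked programmatically. The following is a minimal sketch, assuming the Gym 0.26+ API used in the sample code below (note that the printed bounds may show inf as a large float32 value, depending on your Gym version):

import gym

env = gym.make("CartPole-v1")
# Discrete(2) : two discrete actions (0 and 1)
print(env.action_space)
# Box with shape (4,) : the four observation bounds listed above
print(env.observation_space)
print(env.observation_space.low)
print(env.observation_space.high)
env.close()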

Reward - Type : float32

A reward of 1.0 is returned on every step.
A single episode is truncated after 500 steps, so if you completely succeed, the maximum total reward in a single episode is 500.0.
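
To see this cap in practice, you can accumulate the rewards over one episode. The following is a minimal sketch with a random policy (which usually fails long before 500 steps, so the total will typically be far below 500.0):

import gym
import random

env = gym.make("CartPole-v1")
s, _ = env.reset()
done = False
total_reward = 0.0
while not done:
  # Take a random action (0 or 1) and accumulate the step reward
  s, r, term, trunc, _ = env.step(random.randint(0, 1))
  total_reward += r
  done = term or trunc
print("total reward in this episode: {}".format(total_reward))
env.close()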

Sample Code to run CartPole

source code (Python)

import gym
import random

def pick_sample():
  # Pick an action (0 or 1) at random
  return random.randint(0, 1)

env = gym.make("CartPole-v1")
for i in range(1):
  print("start episode {}".format(i))
  done = False
  s, _ = env.reset()
  while not done:
    a = pick_sample()
    # step() returns: state, reward, terminated, truncated, info
    s, r, term, trunc, _ = env.step(a)
    done = term or trunc
    print("action: {},  reward: {}".format(a, r))
    print("state: {}, {}, {}, {}".format(s[0], s[1], s[2], s[3]))
env.close()

output result

start episode 0
action: 0,  reward: 1.0
state: 0.006784938861824417, -0.18766506871206354, 0.0287443864274386, 0.27414982492533896
action: 0,  reward: 1.0
state: 0.0030316374875831464, -0.383185104857609, 0.03422738292594538, 0.5757584135859465
action: 1,  reward: 1.0
state: -0.004632064609569034, -0.18855925062821827, 0.04574255119766431, 0.2940515065957076

Note : Call render() when you want to visualize the current state, as follows.
CartPole rendering
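
With Gym 0.26 and later (the API used in the sample above), the render mode is specified when the environment is created, and frames are drawn automatically on each step when render_mode="human" is set. A minimal sketch:

import gym
import random

# render_mode="human" opens a window and draws each step automatically
env = gym.make("CartPole-v1", render_mode="human")
s, _ = env.reset()
done = False
while not done:
  s, r, term, trunc, _ = env.step(random.randint(0, 1))
  done = term or trunc
env.close()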

Tsuyoshi Matsuzaki @ Microsoft
