Git Product home page Git Product logo

ai-reinforcement-learning-07-model-free-prediction's Introduction

AI-Reinforcement-Learning-07-Model-Free-Prediction

Lecture 4: Model-Free Prediction by David Silver

Website: http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html

Videos: https://www.bilibili.com/video/av32149008/?p=4

  • Monte-Carlo Reinforcement Learning

    • MC methods learn directly from episodes of experience
    • MC is model-free:
      • no knowledge of MDP transitions / rewards
    • MC learns from complete episodes:
      • no bootstrapping
    • MC uses the simplest possible idea: value = mean return
    • Caveat: can only apply MC to episodic MDPs
      • All episodes must terminate
    • Catalogue
      • First-Visit Monte-Carlo Policy Evaluatio
      • Every-Visit Monte-Carlo Policy Evaluatio
    • Incremental Monte-Carlo Updates
      • Update V(s) incrementally after episode
      • For each state St with return Gt
        • N(St) = N(St) + 1
        • V(St) = V(St) + 1 / N(St) * (Gt - V(St)
      • In non-stationary problems, it can be useful to track a running mean, i.e. forget old episodes
        • (St) = V(St) + α (Gt − V(St))
  • Temporal-Difference Learning

    • TD methods learn directly from episodes of experience
    • TD is model-free: no knowledge of MDP transitions / rewards
    • TD learns from incomplete episodes, by bootstrapping
    • TD updates a guess towards a guess
    • TD(0)
      • Update value V(St) toward estimated return
        • V(St) = α ( Rt+1 + γ * V(St+1) - V(St))
    • TD(λ)
      • The λ-return Gλt combines all n-step returns G(n)
      • Using weight (1 − λ)λ^(n−1)
    • Forward-view TD(λ)
      • Update value function towards the λ-return
      • Forward-view looks into the future to compute Gλt
      • Like MC, can only be computed from complete episode
    • Backward View TD(λ)
      • Update online, every step, from incomplete sequences
      • Keep an eligibility trace for every state s
      • Update value V(s) for every state s
      • In proportion to TD-error δt and eligibility trace

  • Advantages and Disadvantages of MC vs. TD

    • TD can learn before knowing the final outcome
      • TD can learn online after every step
      • MC must wait until end of episode before return is known
    • TD can learn without the final outcome
      • TD can learn from incomplete sequences
      • MC can only learn from complete sequences
      • TD works in continuing (non-terminating) environments
      • MC only works for episodic (terminating) environments
    • MC has high variance, zero bias
      • Good convergence properties
      • (even with function approximation)
      • Not very sensitive to initial value
      • Very simple to understand and use
    • TD has low variance, some bias
      • Usually more efficient than MC
      • TD(0) converges to vπ(s)
      • (but not always with function approximation)
      • More sensitive to initial value
  • Bias/Variance Trade-Off

    • Return Gt is unbiased estimate of vπ(St)
    • True TD target Rt+1 + γ * V(St+1) is unbiased estimate of vπ(St)
    • TD target is biased estimate of vπ(St)
    • TD target is much lower variance than the return:
      • Return depends on many random actions, transitions, rewards
      • TD target depends on one random action, transition, reward
  • Bootstrapping: update involves an estimat

    • MC does not bootstrap
    • DP bootstraps
    • TD bootstraps
  • Sampling: update samples an expectation

    • MC samples
    • DP does not sample
    • TD samples

ai-reinforcement-learning-07-model-free-prediction's People

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.