
q-learning_and_double-q-learning's Introduction

Balancing-Pendulum-with-Q-Learning

Q-Learning

Q-learning is an off-policy reinforcement learning algorithm that seeks the best action to take in the current state. It is considered off-policy because the update learns from actions that lie outside the current policy, such as random exploratory actions, so the agent does not have to follow the policy it is learning. More specifically, Q-learning seeks to learn a policy that maximizes the total reward.

Important Terms in Q-Learning

  • States: The State, S, represents the current position of an agent in an environment.
  • Action: The Action, A, is the step taken by the agent when it is in a particular state.
  • Rewards: For every action, the agent will get a positive or negative reward.
  • Episodes: An episode ends when the agent reaches a terminating state and can’t take a new action.
  • Q-Values: Used to determine how good an Action, A, taken at a particular State, S, is. Written Q(S, A).

Bellman Equation

The Bellman Equation expresses the value of a state, or of an action taken in that state, as the immediate reward plus the discounted value of the best that can follow, which tells us how good it is to be in that state. This equation is used to update the Q-Table. In each state, the optimal action is the one with the highest Q-Value.
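Written out, the Bellman optimality equation for Q-Values and the tabular update that Q-learning derives from it take the standard form below. The symbols α (learning rate), γ (discount factor) and R (reward) are notation assumed here for illustration; the original README does not name them.

```latex
% Bellman optimality equation for the action-value function
Q^{*}(S, A) = \mathbb{E}\left[ R + \gamma \max_{a'} Q^{*}(S', a') \,\middle|\, S, A \right]

% Q-learning update applied to one Q-Table entry after observing (S, A, R, S')
Q(S, A) \leftarrow Q(S, A) + \alpha \left[ R + \gamma \max_{a'} Q(S', a') - Q(S, A) \right]
```

The bracketed term in the second line is the temporal-difference error: the gap between the current table entry and the one-step Bellman target.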

Q-Learning Pseudo code
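The following is a minimal sketch of tabular Q-learning with an epsilon-greedy behaviour policy, not the repository's own implementation. `ChainEnv` is a hypothetical toy environment standing in for the pendulum, and the hyperparameters (`alpha`, `gamma`, `epsilon`, episode counts) are assumed values chosen only so the example runs quickly.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: walk right along a chain to reach the goal."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.n_actions = 2  # 0 = left, 1 = right
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else -0.01  # small step cost, bonus at the goal
        return self.state, reward, done


def greedy_action(q_row, rng):
    """Pick an action with the highest Q-value, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def train_q_learning(env, episodes=500, alpha=0.1, gamma=0.95,
                     epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    q_table = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy behaviour: explore with probability epsilon.
            if rng.random() < epsilon:
                action = rng.randrange(env.n_actions)
            else:
                action = greedy_action(q_table[state], rng)
            next_state, reward, done = env.step(action)
            # Off-policy target: bootstrap from the greedy value of the next state.
            best_next = 0.0 if done else max(q_table[next_state])
            td_target = reward + gamma * best_next
            q_table[state][action] += alpha * (td_target - q_table[state][action])
            state = next_state
            if done:
                break
    return q_table


env = ChainEnv()
q_table = train_q_learning(env)
# Greedy action in every non-terminal state after training.
greedy_policy = [max(range(env.n_actions), key=lambda a: q_table[s][a])
                 for s in range(env.n_states - 1)]
```

The target always uses the maximum over next-state actions, regardless of which action the epsilon-greedy policy takes next; that is what makes the method off-policy.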

Reward Stats while Training Q-Learning

Problem with Q-Learning

The most important part of Q-Learning, the max Q(S', a') term, is at the same time its biggest problem. In fact, it is the reason the algorithm performs poorly in some stochastic environments: because the max operator is applied to noisy estimates, Q-Learning can overestimate the Q-Values of certain actions.
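The overestimation can be reproduced in isolation, without any environment. The sketch below assumes a single state with ten actions whose true values are all zero and whose rewards carry unit-variance Gaussian noise; these numbers are illustrative assumptions, not taken from the repository. Each action's value is estimated from a few samples, and the maximum of those estimates is averaged over many trials.

```python
import random

rng = random.Random(0)
n_actions, n_samples, n_trials = 10, 5, 2000

total_max = 0.0
for _ in range(n_trials):
    # Every action has true value 0; each estimate is a mean of noisy rewards.
    estimates = [
        sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
        for _ in range(n_actions)
    ]
    # Taking the max over noisy estimates picks out the luckiest one.
    total_max += max(estimates)

avg_max_estimate = total_max / n_trials
true_max_value = 0.0  # the quantity the max is supposed to estimate
```

Although no action is actually better than any other, `avg_max_estimate` comes out clearly positive, which is the same bias that the max in the Q-learning target introduces.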

Solution - Double Q-Learning

The proposed solution is to maintain two Q-value functions, QA and QB, each of which is updated using the other's estimate for the next state. To update QA, find the action a' that maximises QA in the next state (a' = argmax over a of QA(s’, a)), then use the value QB(s’, a') in the target for QA(s, a). The update for QB swaps the roles of the two functions.
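One step of this scheme can be sketched as follows. The coin flip that decides which table is updated follows the usual formulation of Double Q-Learning; the function name `double_q_update`, the list-of-lists table layout, the hyperparameter values and the `AlwaysA` stub are assumptions made for illustration.

```python
import random


def double_q_update(q_a, q_b, state, action, reward, next_state, done,
                    alpha=0.1, gamma=0.99, rng=random):
    """Update one of the two tables in place; return True if q_a was updated."""
    # A fair coin decides which table learns; the other one only evaluates.
    if rng.random() < 0.5:
        learner, evaluator = q_a, q_b
    else:
        learner, evaluator = q_b, q_a
    if done:
        next_value = 0.0
    else:
        # Select a' with the table being updated ...
        row = learner[next_state]
        best_action = max(range(len(row)), key=row.__getitem__)
        # ... but take its value from the other table.
        next_value = evaluator[next_state][best_action]
    td_target = reward + gamma * next_value
    learner[state][action] += alpha * (td_target - learner[state][action])
    return learner is q_a


class AlwaysA:
    """Hypothetical stub so the example deterministically updates q_a."""

    def random(self):
        return 0.0


q_a = [[0.0, 0.0], [0.0, 5.0]]  # q_a is optimistic about action 1 in state 1
q_b = [[0.0, 0.0], [1.0, 2.0]]  # q_b rates the same action lower
updated_a = double_q_update(q_a, q_b, state=0, action=0, reward=1.0,
                            next_state=1, done=False,
                            alpha=0.5, gamma=0.9, rng=AlwaysA())
```

In this example QA selects action 1 in the next state, but the target uses QB's value of 2.0 for it rather than QA's own 5.0, so an overestimate in one table is not fed straight back into itself.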

Double Q-Learning Pseudo code

Reward Stats while Training Double Q-Learning

Pendulum Balancing

[Figure: Pendulum_gif]

Contributors

bhanuprakashpebbeti
