
a2c's Introduction

  1. Take the state as input.
  2. Compute the action probabilities for that state and sample an action according to those probabilities.
  3. Store the probabilities.
  4. Take the chosen action.
  5. Store the reward after each action.
  6. Repeat 1-5 until the episode ends.
  7. Calculate the discounted rewards for each step in the trajectory.
  8. Compute the gradients.
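Step 7 is where it is easiest to slip up, so here is a minimal sketch of it in plain Python. The function name `discounted_returns` and the default `gamma=0.99` are assumptions made for illustration and are not taken from the repository's code.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return already computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 3-step episode with a reward of 1 at every step.
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Earlier steps get larger values because they are credited with everything that follows them, discounted by `gamma` per step.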

The objective function is:

img_3.png

The grads can be derived from:

img.png where $$G_t$$ is the discounted reward collected as a consequence of the action taken at step t.
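The images are not reproduced in this text. As a sketch, assuming the standard REINFORCE formulation, the objective and its gradient are usually written as below. The symbols $$\pi_\theta$$ (policy), $$\gamma$$ (discount factor), $$r_t$$ (reward) and $$T$$ (last step of the episode) are notation assumed here, not taken from the original images.

$$
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^{T} \gamma^{t} r_t\right]
$$

$$
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\right],
\qquad G_t = \sum_{k=t}^{T} \gamma^{k-t} r_k
$$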

This was the implementation of vanilla Policy Gradient; now let's look at A2C, which stands for Advantage Actor Critic. In the vanilla implementation a trajectory often contains both good actions and bad actions. The two cancel each other out, and the agent doesn't learn which actions were actually good and which were bad.

So in A2C we introduce a Critic, which tells us how good the action taken at a particular step was. It is basically the difference between how much we actually got and how much we could have expected to get in this state. This separates individual steps from one another instead of judging the whole trajectory at once. This is the advantage part.
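A minimal sketch of that per-step difference, assuming the common definition of the advantage as the observed discounted return minus the Critic's value estimate. The function name `advantages` and the numbers are hypothetical.

```python
def advantages(returns, values):
    """A_t = G_t - V(s_t): how much better a step turned out than the critic expected."""
    if len(returns) != len(values):
        raise ValueError("returns and values must have the same length")
    return [g - v for g, v in zip(returns, values)]


# Hypothetical numbers: the critic expected a return of 1.0 at every step.
print(advantages([1.75, 1.5, 1.0], [1.0, 1.0, 1.0]))  # [0.75, 0.5, 0.0]
```

A positive entry marks a step that went better than expected and a negative one a step that went worse, so each step is judged on its own.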

The changed objective function is:

img_2.png

The grads will be:

img_4.png
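The image is not reproduced in this text. As a sketch, assuming the usual advantage form of the policy gradient, the update replaces the discounted return $$G_t$$ with an advantage $$A_t$$. The symbols $$\pi_\theta$$ (actor), $$V_\phi$$ (critic) and $$A_t$$ are notation assumed here, not taken from the original image.

$$
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, A_t\right],
\qquad A_t = G_t - V_\phi(s_t)
$$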

Implementation:

  • There will be 2 NNs: one predicting the Q values for each state, and one predicting the state value of the current state.
  • The function we want to maximize is the expected difference between the predicted max reward from this state and the actual reward we get.
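One common way to turn these two networks into training signals is sketched below in plain Python, with lists standing in for network outputs. It assumes the actor supplies log-probabilities of the actions taken and the critic supplies state values. The function name `a2c_losses` and the mean-squared-error critic loss are assumptions, not necessarily what the repository's code does.

```python
import math


def a2c_losses(log_probs, returns, values):
    """Return (actor_loss, critic_loss) for one trajectory, averaged over steps."""
    n = len(log_probs)
    if n == 0 or not (n == len(returns) == len(values)):
        raise ValueError("inputs must be non-empty and the same length")
    advs = [g - v for g, v in zip(returns, values)]
    # Actor: minimizing -log pi(a|s) * A raises the probability of
    # actions whose advantage is positive. A is treated as a constant.
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advs)) / n
    # Critic: mean squared error between predicted value and observed return.
    critic_loss = sum(a * a for a in advs) / n
    return actor_loss, critic_loss


# Hypothetical 2-step trajectory: each action was taken with probability 0.5.
actor, critic = a2c_losses([math.log(0.5)] * 2, [2.0, 1.0], [1.0, 1.0])
print(actor, critic)
```

In a real implementation the same two quantities would be computed on framework tensors so that gradients flow into the actor and critic parameters.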

Change the name of the environment to try out different environments.
