
Team members: 張嘉哲、林暘竣

Homework2 - Policy Gradient

Please complete the homework as a team, and
mention in your report who contributed which parts.

Introduction

In this assignment, we will solve the classic control problem - CartPole.

CartPole is an environment which contains a pendulum attached by an un-actuated joint to a cart, and the goal is to prevent it from falling over. You can apply a force of +1 or -1 to the cart. A reward of +1 is provided for every timestep that the pendulum remains upright.
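The environment above can be explored with a short random-agent rollout. This is a sketch, not part of the assignment code; it assumes OpenAI gym is installed, and it tolerates both the classic 4-tuple and the newer 5-tuple `step` return values:

```python
import gym

# Create the CartPole environment and run one episode with random actions.
env = gym.make("CartPole-v0")
obs = env.reset()
if isinstance(obs, tuple):      # newer gym: reset() returns (obs, info)
    obs = obs[0]

total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()      # 0 = push left, 1 = push right
    step_out = env.step(action)
    obs, reward = step_out[0], step_out[1]
    # classic gym: (obs, reward, done, info); newer: (obs, reward, terminated, truncated, info)
    done = step_out[2] if len(step_out) == 4 else (step_out[2] or step_out[3])
    total_reward += reward                  # +1 for every timestep the pole stays upright

print("episode return:", total_reward)
```

A random policy typically balances the pole for only a few dozen timesteps; the policy-gradient agent you build in this assignment should do much better.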

Setup

  • OpenAI gym
  • TensorFlow
  • Numpy
  • Scipy
  • IPython Notebook

If you already have some of the above libraries installed, manage the dependencies yourself.

If you are setting up a new (possibly virtual) environment, the preferred way to install the above dependencies is Anaconda, a Python distribution that includes many of the most popular Python packages for science, math, engineering, and data analysis.

  1. Install Anaconda: Follow the instructions on the Anaconda download site.
  2. Install TensorFlow: See anaconda section of TensorFlow installation page.
  3. Install OpenAI gym: Follow the official installation documents here.

Prerequisites

If you are unfamiliar with Numpy or IPython, you should read the materials from CS231n.

Also, knowing the basics of TensorFlow is required to complete this assignment.

For introductory material on TensorFlow, see the official TensorFlow tutorials.

Feel free to skip these materials if you are already familiar with these libraries.

How to Start

  1. Start IPython: After you clone this repository and install all the dependencies, start the IPython notebook server from the home directory.
  2. Open the assignment: Open HW2_Policy_Graident.ipynb, and it will walk you through completing the assignment.

To-Do

  • [+20] Construct a 2-layer neural network to represent policy

  • [+30] Compute the surrogate loss

  • [+20] Compute the accumulated discounted rewards at each timestep

  • [+10] Use baseline to reduce the variance

  • [+10] Modify the code and write a report to compare the variance and performance before and after adding baseline (with figures is better)

  • [+10] In the function process_paths of the class PolicyOptimizer, why do we need to normalize the advantages? That is, what is the purpose of this line:

    p["advantages"] = (a - a.mean()) / (a.std() + 1e-8)

    Include the answer in your report.
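The reward-processing steps in the list above (accumulated discounted rewards, baseline subtraction, advantage normalization) can be sketched in plain numpy. The names here (discount_cumsum, the constant mean baseline) are illustrative assumptions, not the assignment's actual code, which may use a learned or state-dependent baseline:

```python
import numpy as np

def discount_cumsum(rewards, gamma):
    """Accumulated discounted reward at each timestep: R_t = sum_k gamma^k * r_{t+k}."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

rewards = np.ones(5)                    # CartPole gives +1 per surviving timestep
returns = discount_cumsum(rewards, gamma=0.99)

baseline = returns.mean()               # a crude constant baseline for illustration
a = returns - baseline                  # advantages = returns - baseline

# The normalization line asked about in the report question:
normalized = (a - a.mean()) / (a.std() + 1e-8)
```

After this step the advantages have (approximately) zero mean and unit standard deviation, which is what your report should explain the effect of.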

Other

  • Office hours: 2-3 pm in 資電館 with YenChen Lin.
  • Due on Oct. 17 before class.
