Reinforcement Learning Algorithms

I will use this repository to implement various reinforcement learning algorithms (and also imitation learning), because I'm sick of reading about them but not really understanding them. Hopefully this repository will help me understand them better. I will also implement supporting code as needed, such as simple custom scenarios like GridWorld, or I can use OpenAI gym. Click on the links to get to the appropriate algorithms. Each sub-directory has its own README with results and usage instructions.

Here are the algorithms currently implemented or in progress:

Note: "Vanilla Policy Gradients" refers to the REINFORCE algorithm, also known as Monte Carlo Policy Gradient. Sometimes it's called an actor-critic method and other times it's not. Even if it's considered an actor-critic method, the usual way we think of actor-critic involves a TD update rather than waiting until the end of an episode to get returns.
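To make the distinction in the note concrete, here is a minimal sketch (not the code from this repository) that contrasts the Monte Carlo returns REINFORCE waits for with the one-step TD targets an actor-critic method would typically use. The function names, the `gamma` default, and the `next_values` list (a critic's estimates of V(s_{t+1}), set to 0 at the terminal state) are all assumptions made for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards.

    These are only available once the episode has finished.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_targets(rewards, next_values, gamma=0.99):
    """One-step TD targets r_t + gamma * V(s_{t+1}).

    next_values[t] is a (hypothetical) critic estimate of V(s_{t+1}),
    with 0.0 for the terminal state. Each target needs only one step
    of experience, so updates can happen before the episode ends.
    """
    return [r + gamma * v for r, v in zip(rewards, next_values)]


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(discounted_returns(rewards, gamma=0.5))
    print(td_targets(rewards, [2.0, 2.0, 0.0], gamma=0.5))
```

The Monte Carlo version has no bias from a learned value function but higher variance. The TD version trades some bias for lower variance and the ability to update at every step.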

Requirements

Right now the code is designed for Python 2.7, but it should be compatible with Python 3.5+, except possibly where the bash scripts can't tell which Python version I'm using.

In short:

  • Python 2.7.x
  • TensorFlow 1.2.0

GPU and TensorFlow

(Update 06/16/17, these are out of date ... just install with pip and preferably virtualenv. It's so much easier.)

I installed TensorFlow 1.0.1 from source. For the configuration script, I used CUDA 8.0, cuDNN 5.1.5, and compute capability 6.1.

Compiling from source means I can get faster CPU instructions. This requires bazel plus extra compiler options. I used:

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

This resulted in a ton of warning messages but I ended up with:

Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 884.276s, Critical Path: 672.19s

and things seem to be working. Then run the command:

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

This produces a wheel, which we then install with pip. But be careful about the pip from Anaconda vs the pip that comes with the default Python; I use Anaconda. And make sure you're not in either the tensorflow or the bazel directories!
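One way to check which interpreter is active, and whether TensorFlow is importable from it, is a small script like the following. This is only a sketch: the function name `describe_environment` and the returned dictionary keys are made up for illustration. Running pip as `python -m pip install ...` with that same interpreter avoids installing the wheel into the wrong Python.

```python
import importlib
import sys


def describe_environment():
    """Report the running interpreter and the TensorFlow version, if any."""
    info = {
        "executable": sys.executable,  # path reveals Anaconda vs system Python
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        "tensorflow_version": None,
    }
    try:
        tf = importlib.import_module("tensorflow")
        info["tensorflow_version"] = getattr(tf, "__version__", "unknown")
    except Exception:
        # Not installed for this interpreter, or the import failed
        # (for example when run from inside the tensorflow source directory).
        pass
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("%s: %s" % (key, value))
```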

Track the GPU usage with nvidia-smi. Unfortunately, that only reports a single snapshot, but we can instead run:

while true; do nvidia-smi --query-gpu=utilization.gpu --format=csv >> gpu_utilization.log; sleep 10; done;

Or something like that. It will record the output every 10 seconds and dump it into the log file. Ideally, GPU usage should be as high as possible (100% or close to it).
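To summarize the resulting log, a short script along these lines could compute the average utilization. It is a sketch under one assumption: that each nvidia-smi call appends a CSV header line starting with `utilization` followed by one value line per GPU, such as `37 %`. The function names are hypothetical, and the parsing may need adjusting if your driver prints a different format.

```python
def parse_gpu_log(lines):
    """Return the utilization percentages found in nvidia-smi CSV lines."""
    values = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the CSV header repeated on every call
        # (assumed to start with "utilization").
        if not line or line.startswith("utilization"):
            continue
        # Assumed value format: a number followed by " %", e.g. "37 %".
        values.append(float(line.rstrip("%").strip()))
    return values


def mean_utilization(path="gpu_utilization.log"):
    """Average GPU utilization over all samples in the log file."""
    with open(path) as f:
        values = parse_gpu_log(f)
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    sample = ["utilization.gpu [%]\n", "37 %\n",
              "utilization.gpu [%]\n", "100 %\n"]
    print(parse_gpu_log(sample))
```

With several GPUs, every sample contributes one line per device, so the mean is taken over all devices together.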

References

I have read a number of reinforcement learning papers to help me out. A list of papers, with summaries for a few of them, is in my paper notes repository.
