
tensorflow-deepq's Introduction

This repository is now obsolete!

Check out the new simpler, better performing and more complete implementation that we released at OpenAI:

https://github.com/openai/baselines

(scroll for docs of the obsolete version)

[demo animation: the DeepQ controller playing the Karpathy game]

Reinforcement Learning using TensorFlow

Quick start

Check out the Karpathy game in the notebooks folder.

The image above depicts a strategy learned by the DeepQ controller. Available actions are accelerating up, down, left, or right. The reward signal is +1 for the green fellas, -1 for the red ones, and -5 for the orange ones.

Requirements

  • future==0.15.2
  • euclid==0.1
  • inkscape (for animation gif creation)

How does this all fit together?

tf_rl has controllers and simulators which can be pieced together using simulate function.
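
For orientation, here is a rough sketch of how the pieces connect. The constructor arguments are deliberately omitted because they depend on the simulation and network you choose; the Karpathy game notebook shows a working configuration.

from tf_rl.controller import DiscreteDeepQ
from tf_rl.simulation import KarpathyGame
from tf_rl.models import MLP
from tf_rl import simulate

# Wiring (arguments elided on purpose -- see the Karpathy game notebook):
#
#   game       = KarpathyGame(...)     # a simulation
#   brain      = MLP(...)              # network mapping observations to action values
#   controller = DiscreteDeepQ(...)    # a controller built around the network
#
#   simulate(game, controller, fps=60, action_every=1)
#
# simulate() then drives the loop: observe -> act -> collect reward -> store -> train.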

Using the human controller

Want to have some fun controlling the simulation by yourself? You got it! Use tf_rl.controller.HumanController in your simulation.

To issue commands, run in a terminal:

python3 tf_rl/controller/human_controller.py

For it to work you also need to have a redis server running locally.

Writing your own controller

To write your own controller, define a controller class with 3 functions (a minimal skeleton follows the list):

  • action(self, observation): given an observation (usually a tensor of numbers), returns the action to perform.
  • store(self, observation, action, reward, newobservation): called each time a transition from observation to newobservation is observed. The transition is a consequence of action and has the associated reward.
  • training_step(self): if your controller requires training, this is the place to do it. It should not take too long, because it will be called roughly once per action execution.
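
A minimal, illustrative skeleton (not part of the library; the observation and action types depend on your simulation):

import random

class RandomController(object):
    """Illustrative controller: ignores observations and acts randomly."""

    def __init__(self, num_actions):
        self.num_actions = num_actions

    def action(self, observation):
        # A learning controller would derive the action from the observation.
        return random.randint(0, self.num_actions - 1)

    def store(self, observation, action, reward, newobservation):
        # A learning controller would append this transition to its replay memory.
        pass

    def training_step(self):
        # A learning controller would run one (cheap) training update here.
        pass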

Writing your own simulation

To write your own simulation, define a simulation class with 5 functions (a minimal example follows the list):

  • observe(self): returns the current observation.
  • collect_reward(self): returns the reward accumulated since the last time this function was called.
  • perform_action(self, action): updates internal state to reflect the fact that action was executed.
  • step(self, dt): updates internal state as if dt of simulation time has passed.
  • to_html(self, info=[]): generates an HTML visualization of the game. info can optionally be passed and is a list of strings that should be displayed along with the visualization.
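
A minimal, illustrative simulation (a toy 1-D world, not part of the library):

import numpy as np

class LineWorld(object):
    """Toy simulation: an agent accelerates left or right and is rewarded for staying near the origin."""

    def __init__(self):
        self.position, self.velocity = 5.0, 0.0
        self.accumulated_reward = 0.0

    def observe(self):
        return np.array([self.position, self.velocity])

    def collect_reward(self):
        reward, self.accumulated_reward = self.accumulated_reward, 0.0
        return reward

    def perform_action(self, action):
        # action 0 accelerates left, action 1 accelerates right
        self.velocity += -1.0 if action == 0 else 1.0

    def step(self, dt):
        self.position += self.velocity * dt
        self.accumulated_reward -= abs(self.position) * dt

    def to_html(self, info=[]):
        return "<pre>position: %.2f<br/>%s</pre>" % (self.position, "<br/>".join(info))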

Creating GIFs based on simulation

The simulate method accepts a save_path argument, which is a folder where all the consecutive images will be stored. To turn them into a GIF, use scripts/make_gif.sh PATH, where PATH is the same path you passed to the save_path argument.
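
For example (the folder name is a placeholder):

simulate(game, controller, fps=30, save_path="/tmp/karpathy_frames")
# afterwards, from the repository root:
#   ./scripts/make_gif.sh /tmp/karpathy_frames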

tensorflow-deepq's People

Contributors

benderv, geffy, iaroslav-ai, jinkunw, johnallen, lsqshr, lupino, saturnism, siemanko, snurkabill, stas-sl


tensorflow-deepq's Issues

Question about observation list in karpathy_game.py

Hello, I am a newbie in reinforcement learning, and I think this project is a wonderful resource for learning RL.

I have been going through your source code, and the line
https://github.com/nivwusquorum/tensorflow-deepq/blob/master/tf_rl/simulation/karpathy_game.py#L202
is kind of confusing for me.

From my understanding, each observation of the hero's line consists of
[ type_of_object (wall/friend/enemy), object_speed (x,y), dist_from_hero ]
so type_of_object should be something like [0,1,0], [1,0,0] or [0,0,1].
In the line linked above, however, it assigns [1,1,1], which sounds like it treats all types (wall/friend/enemy) equally.

Is it my misunderstanding? Please help me understand.
Thank you.

-Taeksoo

Having trouble running example on MacOS

Hi -

I've installed TensorFlow and I can run their examples. I suspect I'm missing a path or an initialization step.

This is my version of Python:
Python 2.7.10 :: Anaconda 2.4.0 (x86_64)

When I invoke your sample command it throws an error on the first line. I get similar import problems when I use Jupyter to open the Karpathy notebook:

python tf_rl/controller/human_controller.py
Traceback (most recent call last):
File "tf_rl/controller/human_controller.py", line 1, in
from tf_rl.utils.getch import getch
File "/Users/mesozoic/Documents/MachineLearning/google/tensorflow-deepq/tf_rl/init.py", line 1, in
from .simulate import simulate
File "/Users/mesozoic/Documents/MachineLearning/google/tensorflow-deepq/tf_rl/simulate.py", line 7, in
from tf_rl.utils.event_queue import EventQueue
File "/Users/mesozoic/Documents/MachineLearning/google/tensorflow-deepq/tf_rl/utils/event_queue.py", line 3, in
from queue import PriorityQueue
ImportError: No module named queue

Thanks!
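
(Note: the traceback above is characteristic of Python 2, where the standard-library module is named Queue; it was renamed to queue in Python 3. A possible compatibility shim, assuming that is the only blocker:)

try:
    from queue import PriorityQueue   # Python 3
except ImportError:
    from Queue import PriorityQueue   # Python 2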

Saved Models, tf.train.Saver

Firstly, let me just say that I love this project, the code is so easy to read and understand, and it blends two things I really wanted to experiment with!

I'm getting to the stage where I want to save/restore model variables after training. I notice that you have a saved_model folder with a .ckpt file for the karpathy game, but I do not see any way to load it using the current notebook.

I've read up on tf.train.Saver, and have tried to use its save and restore functions with partial success (my q_network's weights seem to get restored fine, but the target_q_network's weights do not). Do you have a version of the karpathy_game notebook that was used to create/restore your saved model? If not, could you please advise on what I should keep in mind when setting up tf.train.Saver() and using the save/restore methods?

(EDIT: As is pretty typical for me, I asked this prematurely - I figured out my mistake after a little more fiddling with the code. I had initialised the Saver before the controller was set up, so of course it couldn't save all the variables used by the controller... Sorry!)
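
(For anyone hitting the same issue: tf.train.Saver() only captures the variables that already exist when it is constructed, so create it after the controller. Roughly, with a placeholder checkpoint path:)

import tensorflow as tf

session = tf.InteractiveSession()
# ... build the model and the DiscreteDeepQ controller first,
#     so that all of their variables exist ...
saver = tf.train.Saver()   # created last: sees every variable defined above
saver.save(session, "saved_models/model.ckpt")
# saver.restore(session, "saved_models/model.ckpt")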

Performance decrease after recent commits

Hi! Thanks for the useful examples of TensorFlow usage.

I'm playing with the Deep Q-Learning example and noticed that after some recent commits the performance of DiscreteDeepQ dropped ~5x. I'm wondering what caused it? Is it because of maintaining 2 copies of the q-network? Is there an option to update the weights less frequently? Sorry, I don't understand what is going on there very well yet.

high dimension states

Great project! I'm looking to use this with a Kinect v2 camera for a robotics application. I have 26 different joints, each with x, y, z coordinates, that will be my state space. Looking through the code it looks like the state is just a single int. Can you give me some guidance on how to feed in all these states?

I know with standard Q-learning you have an S-A pair which are both just single values. Is it possible to have 3 actions? In my example it would be motor 1, motor 2, motor 3.
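
(One common approach, sketched here under the assumption that the controller accepts a flat observation vector, is to concatenate all joint coordinates into a single array and size the network input accordingly:)

import numpy as np

joint_positions = np.zeros((26, 3))       # 26 joints, each with x, y, z (placeholder data)
observation = joint_positions.flatten()   # 78-dimensional state vector fed to the controller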

ImportError: No module named 'tf_rl'

I am trying to run this game in the browser from the Windows cmd, but I don't understand why I have this problem.
python version 3.5
jupyter
How do I solve this?

from __future__ import print_function

import numpy as np
import tempfile
import tensorflow as tf

from tf_rl.controller import DiscreteDeepQ, HumanController
from tf_rl.simulation import KarpathyGame
from tf_rl import simulate
from tf_rl.models import MLP

ImportError Traceback (most recent call last)
in ()
5 import tensorflow as tf
6
----> 7 from tf_rl.controller import DiscreteDeepQ, HumanController
8 from tf_rl.simulation import KarpathyGame
9 from tf_rl import simulate

ImportError: No module named 'tf_rl'
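
(A likely cause is that the notebook was not started from the repository root, so the tf_rl package is not on the module search path. One possible workaround, with a hypothetical path:)

import sys
sys.path.append(r"C:\path\to\tensorflow-deepq")   # hypothetical location of the cloned repo

from tf_rl.controller import DiscreteDeepQ, HumanController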

Links to theory for continuous branch

Hi,

A non-technical question, I hope it's OK to ask here on GitHub...

I am working on continuous robot control problems and was wondering which approach you are following for the continuous branch. I guess it is the Asynchronous Advantage Actor-Critic (A3C) approach from the 2016 Mnih paper here. However, that method is actually not Q-learning but a variation of a policy gradient method. Yet many variables in your controller code suggest that DeepQ learning is applied, so I am a bit confused. Could you confirm that the code tries to reproduce the A3C method from that paper?

undefined symbol: PyClass_Type

envy@ub1404:/os_pri/github/tensorflow-deepq$ PYTHONPATH=/os_pri/github/tensorflow-deepq:/home/envy/os_pri/github/tensorflow/_python_build:$PYTHONPATH python3 tf_rl/controller/human_controller.py
Traceback (most recent call last):
File "tf_rl/controller/human_controller.py", line 1, in
from tf_rl.utils.getch import getch
File "/home/envy/os_pri/github/tensorflow-deepq/tf_rl/utils/init.py", line 1, in
import tensorflow as tf
File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/init.py", line 23, in
from tensorflow.python import *
File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/python/init.py", line 49, in
from tensorflow import contrib
File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/contrib/init.py", line 23, in
from tensorflow.contrib import layers
File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/contrib/layers/init.py", line 67, in
from tensorflow.contrib.layers.python.framework.tensor_util import *
File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/contrib/layers/python/framework/tensor_util.py", line 21, in
from tensorflow.python.framework.ops import Tensor
File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/python/framework/ops.py", line 39, in
from tensorflow.python.framework import versions
File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/python/framework/versions.py", line 22, in
from tensorflow.python import pywrap_tensorflow
File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/python/pywrap_tensorflow.py", line 28, in
_pywrap_tensorflow = swig_import_helper()
File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
File "/usr/lib/python3.4/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
ImportError: /home/envy/os_pri/github/tensorflow/_python_build/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: PyClass_Type
envy@ub1404:/os_pri/github/tensorflow-deepq$

DoublePendulum notebook outdated

In ./notebooks/DoublePendulumn.ipynb, when I run

try:
    simulate(d, fps=30, actions_per_simulation_second=1, speed=1.0, simulation_resultion=0.01)
except KeyboardInterrupt:
    print("Interrupted")

It complains

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-7b5361d33da5> in <module>()
      1 try:
----> 2     simulate(d, fps=30, actions_per_simulation_second=1, speed=1.0, simulation_resultion=0.01)
      3 except KeyboardInterrupt:
      4     print("Interrupted")

TypeError: simulate() got an unexpected keyword argument 'actions_per_simulation_second'

The current simulate(...) has an interface quite different from what is being called here:

def simulate(simulation,
             controller= None,
             fps=60,
             visualize_every=1,
             action_every=1,
             simulation_resolution=None,
             wait=False,
             disable_training=False,
             save_path=None):

New features to add: PolicyGradient, OpenAI Gym, Keras, Asynchronous Training

Since I am using this repo in my work as well, here are some features I think might be useful:

  1. Continuous control: a couple of recent papers show that deep NNs enable complex high-dimensional control with policy gradients. It seems the continuous branch was started but has been inactive for a while (is it totally abandoned?).
  2. Support OpenAI Gym to make continuous control easier and benchmarked.
  3. Support Keras to shrink the TensorFlow-related code base and make it easier to program more complex networks.
  4. Asynchronous training with multiple agents.

The latter three are supported in Asynchronous RL in Tensorflow + Keras + OpenAI's Gym. However, I prefer the architecture here, which keeps things well structured.

it always gets stuck at ...

envy@ub1404:~/os_pri/github/tensorflow-deepq$ python3 tf_rl/controller/human_controller.py
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally

TF version, circular import dependencies, AttributeError: 'module' object has no attribute 'ops'

Hi,

I have a problem with executing the Karpathy game notebook. Cell 9 gives rise to the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-16dd03e0e8b6> in <module>()
      6 else:
      7     # Tensorflow business - it is always good to reset a graph before creating a new controller.
----> 8     tf.ops.reset_default_graph()
      9     session = tf.InteractiveSession()
     10 
AttributeError: 'module' object has no attribute 'ops'

This usually happens when there are circular import dependencies, however, I could not find any. I am using TensorFlow 0.71 and Python 2.7.6. The error also occurs with Python 3.4. Which Python and TensorFlow versions are you using?

Thanks a lot for the nice code!

Deep Q Controller Possible Error

Thanks for the head start on RL with your DeepQ work. I am relatively new to RL and I was trying for the longest time to get a system to converge using your DeepQ controller, but it kept tending to 0 in total reward. My environment gives positive and negative rewards, but it almost always "converged" to 0 total reward (lowest energy?).

After re-reviewing many RL examples and TensorFlow, I think I found the issue which was surprising to say the least. I think it is related to TensorFlow's automatic calculation of the gradients. I feel the error is in this line:

temp_diff = self.value_given_action - self.future_reward

In my mind this is the difference (Y(x) - Yexpected). The derivative of this is ultimately +Y(x)/dCost, and I think this forces the solution "away" from the minimum. This in turn increases the cost and forced my system to eventually decide to take no action at all (the best convergence it could find). So I reversed the terms, in line with most of the literature, to (Yexpected - Y(x)) and sure enough, the reward would grow positive and converge.

This may not affect examples where the data is all positive and you possibly stop early enough with a slow learning rate. So changing this line to read as follows may improve the algorithm. It might also fix your continuous solution if that had +/- rewards:

temp_diff = self.future_reward - self.value_given_action

Need some help... Is it possible to extend this to a multi-agent version?

Really thankful for your work. The code is neat and simple, which helps a Python beginner like me a lot.
After running the example in the notebooks folder, I'm now trying to extend it to a multi-agent version by simply replicating multiple copies of the DiscreteDeepQ.
However, I encounter some errors with the name scope in DiscreteDeepQ.
I tried to solve the issue by adding an agent id to the name scope. I'm not quite sure if that's the 'correct' way to solve it.
Would it mess up the stored network data? Or can the problem be solved by this simple fix?

Ps. The program runs successfully, but the agents seem to just move randomly...

Discrete Hill python notebook

I can't seem to replicate your results in the game notebook. Is your IPython notebook output for that outdated, or is there something subtle going on? I simply ran your game notebook, but it doesn't seem to learn at all. I haven't seriously studied your code, so some help with debugging is appreciated.

i don't understand how to use this

I'm so sorry to be asking this.
I tried for an hour and I can't figure out how to use or run it. Could someone please point me in the right direction? I got as far as moving it into the site-packages directory of Python, I am running a redis server even though I have no idea how to use it, and I think I have all the modules installed. I really would like to play around with this and experiment with different games, but I would rather not start from scratch until I'm sure this is what I want to learn.

where does the tensorflow model get saved

I tried to find tf.train.Saver or something similar in the code, but couldn't locate it. So, where does the model (*.ckpt) get saved?
I can write my own saver, but since there is already a saved model for the karpathy game, I was wondering if I am missing something?

Add more drawing options

In simulate(...), it is assumed that the environment is drawn with SVG strings. However, this might not generalise to complex game environments. For example, I'm drawing in 3D at the moment and it is easy to use matplotlib (maybe pyopengl in the future?).

Is it possible to make the environment drawing interface more general? My thought is something like:

...
simulation.setup_draw() # Initialise figure handles, axes for reuse
...

for frame_no ...
    ...
    simulation.draw() # Draw things by plot, scatter, etc or Ipython display()
    ...

In this way, all the figure handles and drawing details are handled by the environment class, e.g. KarpathyGame.

Please let me know if you like this idea. If so, I can make another pull request and see how it goes.
