qlearning4k's Introduction

qlearning4k

Q-learning for Keras

Qlearning4k is a reinforcement learning add-on for the Python deep learning library Keras. It's simple and ideal for rapid prototyping.

Example:

from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import sgd
from qlearning4k.games import Catch
from qlearning4k import Agent

nb_frames = 1
grid_size = 10
hidden_size = 100

model = Sequential()
model.add(Flatten(input_shape=(nb_frames, grid_size, grid_size)))
model.add(Dense(hidden_size, activation='relu'))
model.add(Dense(hidden_size, activation='relu'))
model.add(Dense(3))
model.compile(sgd(lr=.2), "mse")

game = Catch(grid_size)
agent = Agent(model)
agent.train(game)
agent.play(game)

Reinforcement learning 101:

Reinforcement learning is all about training an agent to behave in an environment (such as a video game) so as to optimize some quantity (say, the game score), by performing actions in the environment (pressing buttons on the controller) and observing what happens. For every action it takes, the agent gets a positive, negative, or zero reward from the environment. These rewards tell the agent what effect its action had on the environment, and the agent learns to take actions that are likely to result in a higher cumulative reward.
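
In code, that interaction loop looks roughly like this (a minimal sketch; the game and agent objects and their methods here are hypothetical, not this library's API):

def run_episode(game, agent):
    # One episode of agent-environment interaction (hypothetical interfaces).
    state = game.reset()                     # start a new round, observe the initial state
    total_reward = 0
    while not game.is_over():
        action = agent.choose_action(state)  # e.g. decide which button to press
        state, reward = game.step(action)    # the environment reacts: new state + reward
        total_reward += reward               # reward can be positive, negative or zero
    return total_reward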

You have probably already seen DeepMind's Q-learning model play Atari Breakout like a boss; if not, have a look: Google DeepMind's Deep Q-learning playing Atari Breakout. The related papers can be found under "References", but if you are not a PDF person, I suggest Nervana's post on the topic.

OK, now let's do the math.

Consider a game G. For simplicity, let G be a function, which takes a game state S and an action a as input and outputs a new state S' and a reward r. S is probably a bunch of numbers that describe a state of the game. a is an action, a number which represents, say, a button on the game controller.

G(S, a) = [S', r]

A function which returns 2 values? Um, let's break it down:

Gs(S, a) = S'

Gr(S, a) = r

Neat.
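
In Python, a function that "returns two values" simply returns a tuple, so G and its two halves can be sketched like this (toy code; Gs and Gr stand in for whatever game logic you have):

def Gs(S, a):
    ...   # game logic: compute the next state S'

def Gr(S, a):
    ...   # game logic: compute the immediate reward r

def G(S, a):
    return Gs(S, a), Gr(S, a)   # G(S, a) = [S', r]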

Now we need our agent, powered by a neural network model M, to master this game G. Ideally, we would like a function F(S) which returns the best action to take for a given state:

F(S) = a

But the neural network model is structured differently: for a given state, it returns the expected maximum score for each action (called the Q value, hence the term Q-learning):

M(S) = {q1, q2, q3, ... qn}

Here q1, q2, ... qn are the expected max-scores (Q values) for each of the possible actions a1, a2, ... an respectively. So the agent simply takes the action with the highest Q value. In the context of this repo, M is just a Keras model which takes S as input and outputs the Q values. The number of Q values output by the model equals the number of possible actions in the game. In a game of Catch, you can go left, go right, or stay still: 3 actions, which means 3 Q values, one for each action.

Now you can redefine the required function F in terms of the neural network M:

F(S) = argmax(M(S))

Cool!
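
With a Keras model, F is just an argmax over the predicted Q values. A minimal sketch (assuming S is a numpy array shaped like the model's input, e.g. (nb_frames, grid_size, grid_size) in the Catch example above):

import numpy as np

def F(model, S):
    # Return the index of the action with the highest predicted Q value.
    q_values = model.predict(S[np.newaxis])[0]   # M(S) = [q1, q2, ..., qn]
    return int(np.argmax(q_values))              # argmax(M(S))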

Now where do we get these Q values from to train the model? From the Q function.

Let's officially define the Q function:

Q(S, a) = the maximum score your agent will get by the end of the game if it performs action a when the game is in state S. We know that on performing action a, the game will jump to a new state S', also giving the agent an immediate reward r.

S' = Gs(S, a)

r = Gr(S, a)

Note that when the game is at state S', the agent will take an action a' according to its neural network model M:

a' = F(S') = argmax(M(S'))

Time to define Q again:

Q(S, a) = immediate reward from this state + max-score from the next state onwards

Formally:

Q(S, a) = r + Q(S', a')

Hmm, a recursive function.

Now let's give more weight to the immediate reward by introducing a discount factor, gamma:

Q(S, a) = r + gamma * Q(S', a')

You've got your Ss and your Qs. All that's left is to train your model.
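
Concretely, for each transition (S, a, r, S') collected while playing, you nudge the model's prediction for action a towards r + gamma * max(M(S')). A minimal sketch of a single update step (illustrative names; the library's Memory class does a batched, optimized version of this):

import numpy as np

def q_update(model, S, a, r, S_next, game_over, gamma=0.9):
    # One Q-learning update on a single transition.
    targets = model.predict(S[np.newaxis])[0]            # current Q estimates for state S
    if game_over:
        targets[a] = r                                   # no future reward after a terminal state
    else:
        q_next = model.predict(S_next[np.newaxis])[0]    # M(S')
        targets[a] = r + gamma * np.max(q_next)          # Q(S, a) = r + gamma * max Q(S', a')
    return model.train_on_batch(S[np.newaxis], targets[np.newaxis])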


Using the library:

Check out the examples; they should get you started. You can easily get it to work with your own games, or any third-party game, by wrapping it in a class that implements all the functions of the Game class in games.py.
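
A rough skeleton of such a wrapper might look like the following (the method names here are only indicative; copy the exact names and signatures from the Game class in games.py):

from qlearning4k.games import Game  # check games.py for the exact import path

class MyGame(Game):

    @property
    def name(self):
        return "MyGame"

    @property
    def nb_actions(self):
        return 3                # number of actions the agent can choose from

    def reset(self):
        pass                    # start a new round

    def play(self, action):
        pass                    # apply the chosen action to the game

    def get_state(self):
        pass                    # return the current state (e.g. the frame) as a numpy array

    def get_score(self):
        return 0                # immediate reward for the last action

    def is_over(self):
        return False            # True once the round has ended

    def is_won(self):
        return False            # True if the agent won the round

Here is a basic manual of the main stuff (a usage example follows after the parameter lists):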

Agent

__init__

  • model : Keras model. Should be shape-compatible with the game, and already compiled.

  • memory_size : Size of your agent's memory. Set to -1 for infinite memory.

  • nb_frames : How many past frames should the agent remember?

train

  • game : The game to train on

  • nb_epoch : Number of epochs to train

  • batch_size : Batch size for training

  • gamma : Discount factor

  • epsilon : Exploration factor. Can be a float or a tuple/list of two floats. If a tuple/list, the exploration factor will drop from the first value to the second over the course of training.

  • epsilon_rate : Rate at which epsilon should drop. If it is 0.4, for example, epsilon will reach the lower value by the time 40% of the training is complete.

  • reset_memory : Should memory be reset after each epoch?

play

  • game : The game to play

  • nb_epoch : Number of epochs to play
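
Putting those parameters together, a typical session looks something like this (the parameter values are only illustrative):

game = Catch(grid_size)
agent = Agent(model=model, memory_size=1000, nb_frames=1)
agent.train(game, nb_epoch=1000, batch_size=32, gamma=0.9,
            epsilon=(1.0, 0.1), epsilon_rate=0.4, reset_memory=False)
agent.play(game, nb_epoch=10)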


Requirements:

  • Keras
  • numpy

Installation

git clone http://www.github.com/farizrahman4u/qlearning4k.git
cd qlearning4k
python setup.py install

Alternatively by using pip:

pip install git+http://www.github.com/farizrahman4u/qlearning4k.git

Running the examples:

cd examples
python test_catch.py
python test_snake.py

References:

  • Mnih et al., Playing Atari with Deep Reinforcement Learning (2013)
  • Mnih et al., Human-level control through deep reinforcement learning, Nature (2015)

TODO:

  • Add items to the TODO list
  • Remove the item that says "Add items to the TODO list" and the one after it from the TODO list.

qlearning4k's People

Contributors

farizrahman4u, jandaldrop, oscarrl, petrbel, shagunsodhani, tonyztan


qlearning4k's Issues

Implementation of allowed states in the game abstract class

Currently all state transitions are allowed. It would be nice if the base class exposed a method we could override to return the set of allowed next states; right now this must be learned by the algorithm through the score. I have forked the code and may be able to propose an implementation. I don't think it is too hard, but I need to study the core training class to see how to implement it.

Two examples not working

Hi, I have problems with both test_catch.py and test_snake.py:

  • python version : Anaconda3/python3.5
  • platform : macOS
  • backend : Tensorflow

In test_catch.py

Traceback (most recent call last):
  File "test_catch.py", line 20, in <module>
    agent.train(catch, batch_size=10, nb_epoch=1000, epsilon=.1)
  File "/Users/Rozen_mac/anaconda3/lib/python3.5/site-packages/qlearning4k-0.0.1-py3.5.egg/qlearning4k/agent.py",line 62, in train
  File "/Users/Rozen_mac/anaconda3/lib/python3.5/site-packages/qlearning4k-0.0.1-py3.5.egg/qlearning4k/agent.py",line 37, in check_game_compatibility
AttributeError: 'Sequential' object has no attribute 'input_layers_node_indices'

In test_snake.py

Traceback (most recent call last):
  File "test_snake.py", line 25, in <module>
    agent.train(snake, batch_size=64, nb_epoch=10000, gamma=0.8)
  File "/Users/Rozen_mac/anaconda3/lib/python3.5/site-packages/qlearning4k-0.0.1-py3.5.egg/qlearning4k/agent.py",line 62, in train
  File "/Users/Rozen_mac/anaconda3/lib/python3.5/site-packages/qlearning4k-0.0.1-py3.5.egg/qlearning4k/agent.py",line 37, in check_game_compatibility
AttributeError: 'Sequential' object has no attribute 'input_layers_node_indices'

PEP8 coding style

Hi,
I noticed that tabs are used for indentation instead of spaces. My PyCharm cries a lot, as it isn't PEP8-compliant.

Would you accept a pull request that transforms the code so it is correctly styled according to PEP8? Besides the tab/space problem, I'd love to add/remove blank lines. I believe it would help other developers work with your code, as it is the commonly required standard.

Btw, great code, I'm looking forward to playing with it more!

Regards
Petr

ValueError

Hey,
First of all, great work on this, very cool lib to use!

I just have a little issue: it seems that the latest Keras API is causing a problem with examples/test_snake.py

When I test it I get the error message:

ValueError: Negative dimension size caused by subtracting 3 from 2 for 'Conv2D_1' (op: 'Conv2D') with input shapes:

minor problem in Example:

In the example in README.md, the variable hidden_size appears to be undefined.
Very minor problem, but since I noticed it I might as well report it.
Thanks, interesting work.

GPU performance question

I noticed that training on CPU-only is much faster than on GPU. With GPU, usage is about 18% according to nvidia-smi.

Is this because it trains for each epoch, and has to send the data to the card? I see the fast-mode in the code, and it seems to be getting activated with the Theano backend. I cannot figure out if I've got something misconfigured.

Various algorithms?

Will this repo be the place where multiple reinforcement learning algorithms (Q-learning, A3C, ...) are implemented for Keras?

License

Hi, I want to know what the license of this library is. Sorry for the dumb question, but I don't see license information here, so I have to ask. Thank you very much.

NaN Loss on train

I have been getting NaN losses when I try to train my agent on a game. I tracked it back to the get_batch function in memory - Y (the output of the model's predictions) turns to all NaNs about halfway through the first epoch. I haven't been able to figure it out from there, though.

Any suggestion would be much appreciated. This package is fantastic!

IndexError: too many indices for array

Running into an issue running the test example programs.

Korys-MacBook-Air:examples korymathewson$ python test_snake.py
Using Theano backend.
Traceback (most recent call last):
File "test_snake.py", line 23, in
agent.train(snake, batch_size=64, nb_epoch=10000, gamma=0.8)
File "build/bdist.macosx-10.6-intel/egg/qlearning4k/agent.py", line 92, in train
IndexError: too many indices for array

Not 100% sure what the error is, before I started digging in I thought I would post some details.

The layer "sequential_1 has multiple inbound nodes, with different input shapes. Hence the notion of "input shape" is ill-defined for the layer. Use `get_input_shape_at(node_index)` instead.

Hi, I am running
$ python test_catch.py
and here is what I get:

Epoch 1000/1000 | Loss 0.0104 | Epsilon 0.10 | Win count 793
Traceback (most recent call last):
File "test_catch.py", line 21, in
agent.play(catch)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\qlearning4k-0.0.1-py3.6.egg\qlearning4k\agent.py", line 105, in play
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\qlearning4k-0.0.1-py3.6.egg\qlearning4k\agent.py", line 38, in check_game_compatibility
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\keras\engine\topology.py", line 1028, in input_shape
' has multiple inbound nodes, '
AttributeError: The layer "sequential_1 has multiple inbound nodes, with different input shapes. Hence the notion of "input shape" is ill-defined for the layer. Use get_input_shape_at(node_index) instead.

test_snake.py 'Negative Dimension Size caused by...

Using TensorFlow backend.
Traceback (most recent call last):
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 654, in _call_cpp_shape_fn_impl
input_tensors_as_shapes, status)
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\contextlib.py", line 66, in exit
next(self.gen)
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Negative dimension size caused by subtracting 3 from 2 for 'Conv2D_1' (op: 'Conv2D') with input shapes: [?,2,8,16], [3,3,16,32].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "test_snake.py", line 13, in
model.add(Convolution2D(32, nb_row=3, nb_col=3, activation='relu'))
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\keras-1.2.2-py3.5.egg\keras\models.py", line 332, in add
output_tensor = layer(self.outputs[0])
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\keras-1.2.2-py3.5.egg\keras\engine\topology.py", line 572, in call
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\keras-1.2.2-py3.5.egg\keras\engine\topology.py", line 635, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\keras-1.2.2-py3.5.egg\keras\engine\topology.py", line 166, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\keras-1.2.2-py3.5.egg\keras\layers\convolutional.py", line 475, in call
filter_shape=self.W_shape)
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\keras-1.2.2-py3.5.egg\keras\backend\tensorflow_backend.py", line 2691, in conv2d
x = tf.nn.conv2d(x, kernel, strides, padding=padding)
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 397, in conv2d
data_format=data_format, name=name)
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
op_def=op_def)
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\tensorflow\python\framework\ops.py", line 2632, in create_op
set_shapes_for_outputs(ret)
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\tensorflow\python\framework\ops.py", line 1911, in set_shapes_for_outputs
shapes = shape_func(op)
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\tensorflow\python\framework\ops.py", line 1861, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 595, in call_cpp_shape_fn
require_shape_fn)
File "C:\Users\Me\Anaconda3\envs\tf_rl\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 659, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Negative dimension size caused by subtracting 3 from 2 for 'Conv2D_1' (op: 'Conv2D') with input shapes: [?,2,8,16], [3,3,16,32].

I get the above error when I try to run test_snake.py, with Keras version 1.2.2 and Tensorflow-gpu version 1.3.0

Thoughts?

Exception - You are attempting to share a same 'BatchNormalization' layer

Hello,

Great Q-learning example!

I'm getting the following error running on the latest dev builds of Theano and Keras.

D:\Development\DeepLearning\QLearning4k\examples>python test_snake.py
Using Theano backend.
Using gpu device 0: GeForce GTX 970 (CNMeM is disabled, cuDNN 4007)
Epoch 001/10000 | Loss 0.0000 | Epsilon 1.00 | Win count 0
Epoch 002/10000 | Loss 0.0000 | Epsilon 1.00 | Win count 0
Epoch 003/10000 | Loss 0.0000 | Epsilon 1.00 | Win count 0
Epoch 004/10000 | Loss 0.0000 | Epsilon 1.00 | Win count 1
Epoch 005/10000 | Loss 0.0000 | Epsilon 1.00 | Win count 1
Epoch 006/10000 | Loss 0.0000 | Epsilon 1.00 | Win count 1
Epoch 007/10000 | Loss 0.0000 | Epsilon 1.00 | Win count 1
Traceback (most recent call last):
File "test_snake.py", line 24, in
agent.train(snake, batch_size=64, nb_epoch=10000, gamma=0.8)
File "C:\WinPython34\python-3.4.4.amd64\lib\site-packages\qlearning4k\agent.py", line 91, in train
batch = self.memory.get_batch(model=model, batch_size=batch_size, gamma=gamma)
File "C:\WinPython34\python-3.4.4.amd64\lib\site-packages\qlearning4k\memory.py", line 35, in get_batch
return self.get_batch_fast(model, batch_size, gamma)
File "C:\WinPython34\python-3.4.4.amd64\lib\site-packages\qlearning4k\memory.py", line 106, in get_batch_fast
self.set_batch_function(model, self.input_shape, batch_size, model.output_shape[-1], gamma)
File "C:\WinPython34\python-3.4.4.amd64\lib\site-packages\qlearning4k\memory.py", line 88, in set_batch_function
Y = model(X)
File "C:\WinPython34\python-3.4.4.amd64\lib\site-packages\keras\engine\topology.py", line 500, in call
return self.call(x, mask)
File "C:\WinPython34\python-3.4.4.amd64\lib\site-packages\keras\models.py", line 164, in call
return self.model.call(x, mask)
File "C:\WinPython34\python-3.4.4.amd64\lib\site-packages\keras\engine\topology.py", line 1951, in call
output_tensors, output_masks, output_shapes = self.run_internal_graph(inputs, masks)
File "C:\WinPython34\python-3.4.4.amd64\lib\site-packages\keras\engine\topology.py", line 2093, in run_internal_graph
output_tensors = to_list(layer.call(computed_tensor, computed_mask))
File "C:\WinPython34\python-3.4.4.amd64\lib\site-packages\keras\layers\normalization.py", line 116, in call
raise Exception('You are attempting to share a '
Exception: You are attempting to share a same BatchNormalization layer across different data flows. This is not possible. You should use mode=2 in BatchNormalization, which has a similar behavior but is shareable (see docs for a description of the behavior).

Changing to mode=2 produces another error...

TypeError: float() argument must be a string or a number

envy@ub1404:/os_pri/github/qlearning4k/examples$ python test_catch.py
Using Theano backend.
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
/home/envy/.local/lib/python2.7/site-packages/theano/tensor/signal/downsample.py:5: UserWarning: downsample module has been moved to the pool module.
warnings.warn("downsample module has been moved to the pool module.")
Epoch 001/1000 | Loss 0.0000 | Epsilon 0.10 | Win count 1
Traceback (most recent call last):
File "test_catch.py", line 20, in
agent.train(catch, batch_size=10, nb_epoch=1000, epsilon=.1)
File "/home/envy/.local/lib/python2.7/site-packages/qlearning4k/agent.py", line 94, in train
loss += float(model.train_on_batch(inputs, targets))
TypeError: float() argument must be a string or a number
envy@ub1404:/os_pri/github/qlearning4k/examples$
