
deeplearningflappybird's Introduction

Using Deep Q-Network to Learn How To Play Flappy Bird

7-minute version: DQN for Flappy Bird

Overview

This project follows the Deep Q-Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that this learning algorithm can be further generalized to the notorious Flappy Bird.

Installation Dependencies:

  • Python 2.7 or 3
  • TensorFlow 0.7
  • pygame
  • OpenCV-Python

How to Run?

git clone https://github.com/yenchenlin1994/DeepLearningFlappyBird.git
cd DeepLearningFlappyBird
python deep_q_network.py

What is Deep Q-Network?

It is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.

For those who are interested in deep reinforcement learning, I highly recommend reading the following post:

Demystifying Deep Reinforcement Learning

Deep Q-Network Algorithm

The pseudo-code for the Deep Q Learning algorithm, as given in [1], can be found below:

Initialize replay memory D to size N
Initialize action-value function Q with random weights
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ϵ select a random action a_t
        otherwise select a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in emulator and observe reward r_t and state s_(t+1)
        Store transition (s_t, a_t, r_t, s_(t+1)) in D
        Sample a random minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D
        Set y_j :=
            r_j                                        for terminal s_(j+1)
            r_j + γ * max_(a') Q(s_(j+1), a'; θ)       for non-terminal s_(j+1)
        Perform a gradient descent step on (y_j - Q(s_j, a_j; θ))^2 with respect to θ
    end for
end for
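
The update rule above maps almost line-for-line onto Python. Below is a minimal, illustrative sketch (not the repository's code): q_values, train_step, and frame_step are hypothetical stand-ins so the snippet runs on its own; in the real project the network is a TensorFlow graph and the emulator is the wrapped pygame game.

import random
from collections import deque

import numpy as np

def q_values(state):                          # stand-in: one Q value per action
    return np.random.randn(2)

def train_step(states, actions, targets):     # stand-in: one gradient step on (y - Q)^2
    pass

def frame_step(action):                       # stand-in: returns next_state, reward, terminal
    return np.zeros((80, 80, 4)), 0.1, False

GAMMA, EPSILON, BATCH = 0.99, 0.1, 32
D = deque(maxlen=50000)                       # replay memory D
s_t = np.zeros((80, 80, 4))                   # initial state s_1

for t in range(1000):
    # With probability ϵ pick a random action, otherwise the greedy one
    if random.random() <= EPSILON:
        a_t = random.randrange(2)
    else:
        a_t = int(np.argmax(q_values(s_t)))

    s_t1, r_t, terminal = frame_step(a_t)
    D.append((s_t, a_t, r_t, s_t1, terminal))       # store transition in D

    if len(D) >= BATCH:
        minibatch = random.sample(list(D), BATCH)
        # y_j = r_j for terminal s_(j+1), else r_j + γ * max_a' Q(s_(j+1), a')
        targets = [r if done else r + GAMMA * float(np.max(q_values(s1)))
                   for (_, _, r, s1, done) in minibatch]
        train_step([m[0] for m in minibatch], [m[1] for m in minibatch], targets)

    s_t = s_t1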

Experiments

Environment

Since the deep Q-network is trained on the raw pixel values observed from the game screen at each time step, [3] finds that removing the background that appears in the original game makes it converge faster. This process is visualized in the following figure:

Network Architecture

According to [1], I first preprocessed the game screens with the following steps (a minimal sketch follows the list):

  1. Convert the image to grayscale
  2. Resize the image to 80x80
  3. Stack the last 4 frames to produce an 80x80x4 input array for the network
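
A minimal sketch of these steps, using OpenCV and NumPy; the repository additionally binarizes the grayscale image (see the thresholding call quoted in the issues below), which is included here as well:

import cv2
import numpy as np

def preprocess_first_frame(frame):
    """Steps 1-3 above, applied to the very first frame of an episode."""
    gray = cv2.cvtColor(cv2.resize(frame, (80, 80)), cv2.COLOR_BGR2GRAY)  # grayscale + resize
    _, gray = cv2.threshold(gray, 1, 255, cv2.THRESH_BINARY)              # binarize
    return np.stack((gray, gray, gray, gray), axis=2)                     # 80x80x4 stack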

The architecture of the network is shown in the figure below. The first layer convolves the input image with an 8x8x4x32 kernel at a stride size of 4. The output is then put through a 2x2 max pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2. We then max pool again. The third layer convolves with a 3x3x64x64 kernel at a stride of 1. We then max pool one more time. The last hidden layer consists of 256 fully connected ReLU nodes.

The final output layer has the same dimensionality as the number of valid actions which can be performed in the game, where the 0th index always corresponds to doing nothing. The values at this output layer represent the Q function given the input state for each valid action. At each time step, the network performs whichever action corresponds to the highest Q value, following an ϵ-greedy policy.
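
For reference, the layer stack described above can be written compactly as a tf.keras model. This is a sketch only: it assumes 'same' padding, and the repository itself builds the graph with low-level TensorFlow ops rather than Keras layers.

import tensorflow as tf

def build_q_network(num_actions: int = 2) -> tf.keras.Model:
    """Sketch of the convolutional Q-network described in the text."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(80, 80, 4)),
        tf.keras.layers.Conv2D(32, 8, strides=4, padding="same", activation="relu"),
        tf.keras.layers.MaxPool2D(2, padding="same"),
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same", activation="relu"),
        tf.keras.layers.MaxPool2D(2, padding="same"),
        tf.keras.layers.Conv2D(64, 3, strides=1, padding="same", activation="relu"),
        tf.keras.layers.MaxPool2D(2, padding="same"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(num_actions),   # one Q value per valid action; index 0 = do nothing
    ])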

Training

At first, I initialize all weight matrices randomly using a normal distribution with a standard deviation of 0.01, then set the replay memory to a maximum size of 50,000 experiences.

I start training by choosing actions uniformly at random for the first 10,000 time steps, without updating the network weights. This allows the system to populate the replay memory before training begins.
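
In code, this setup amounts to something like the sketch below; REPLAY_MEMORY and OBSERVE follow the values stated here and in the FAQ, while the deque-based memory and the NumPy initializer are assumptions for illustration:

from collections import deque

import numpy as np

REPLAY_MEMORY = 50000    # maximum number of stored experiences
OBSERVE = 10000          # purely random steps before any weight update

D = deque(maxlen=REPLAY_MEMORY)   # replay memory; oldest experiences are evicted first

def init_weights(shape):
    # weights drawn from a normal distribution with standard deviation 0.01
    return np.random.normal(loc=0.0, scale=0.01, size=shape).astype(np.float32)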

Note that unlike [1], which initializes ϵ = 1, I linearly anneal ϵ from 0.1 to 0.0001 over the course of the next 3,000,000 frames. The reason I set it this way is that the agent can choose an action every 0.03 s (FPS = 30) in our game, so a high ϵ makes it flap too much, keeping it at the top of the screen until it clumsily bumps into a pipe. This makes the Q function converge relatively slowly, since the agent only starts to encounter other situations once ϵ is low. In other games, however, initializing ϵ to 1 is more reasonable.
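
The annealing schedule is a simple linear interpolation, roughly as in this sketch (the constant names and values match the FAQ section below):

INITIAL_EPSILON = 0.1     # exploration rate when annealing starts
FINAL_EPSILON = 0.0001    # exploration rate after annealing
EXPLORE = 3000000         # number of frames over which ϵ is annealed
OBSERVE = 10000           # random-play steps before annealing starts

def anneal(epsilon, t):
    """Called once per time step t; returns the updated ϵ."""
    if epsilon > FINAL_EPSILON and t > OBSERVE:
        epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
    return epsilon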

During training, at each time step the network samples minibatches of size 32 from the replay memory and performs a gradient step on the loss function described above, using the Adam optimizer with a learning rate of 0.000001. After annealing finishes, the network continues to train indefinitely, with ϵ fixed at 0.0001.

FAQ

Checkpoint not found

Change the first line of saved_networks/checkpoint to

model_checkpoint_path: "saved_networks/bird-dqn-2920000"

How to reproduce?

  1. Comment out these lines

  2. Modify deep_q_network.py's parameters as follows:

OBSERVE = 10000
EXPLORE = 3000000
FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.1

References

[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, 518(7540):529-533, 2015.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop, 2013.

[3] Kevin Chen. Deep Reinforcement Learning for Flappy Bird. Report | YouTube result

Disclaimer

This work is based heavily on the following repos:

  1. sourabhv/FlapPyBird (https://github.com/sourabhv/FlapPyBird)
  2. asrivat1/DeepLearningVideoGames

deeplearningflappybird's People

Contributors

chengchingwen, cyhsutw, jgyllinsky, katyprogrammer, kellyfu, yenchenlin


deeplearningflappybird's Issues

are those calculations right??

In the figure,
I wonder whether your calculations are valid.

I mean,

  1. input 80 x 80 x 4 -- conv. w/ 8 x 8 x 4 x 32, stride 4 --> output 19 x 19 x 32
    (because (80 - 8) / 4 + 1) => your result was 20 x 20 x 32

  2. input 10 x 10 x 32 -- conv. w/ 4 x 4 x 32 x 64, stride 2 --> output 4 x 4 x 64
    (because (10 - 4) / 2 + 1) => your result was 5 x 5 x 64

  3. and...
    input 3 x 3 x 64 -- conv. w/ 3 x 3 x 64 x 64 --> your result was 3 x 3 x 64 (is this possible?)

Am I wrong?
Since I am a newbie in this area, please teach me if I have misunderstood.
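
The discrepancy comes down to padding: the arithmetic above assumes 'valid' (unpadded) convolutions, while the sizes in the figure are what 'same' padding produces. A standalone check (illustrative, not project code):

import math

def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution for 'valid' vs 'same' padding."""
    if padding == "valid":
        return (size - kernel) // stride + 1
    return math.ceil(size / stride)      # 'same' padding

print(conv_out(80, 8, 4, "valid"))   # 19 -> the result computed above
print(conv_out(80, 8, 4, "same"))    # 20 -> the result in the figure
print(conv_out(10, 4, 2, "valid"))   # 4
print(conv_out(10, 4, 2, "same"))    # 5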

Confusion about the input

Is the input really the last 4 frames, or is it just one frame stacked four times? The code below seems to indicate that a single frame is stacked into four to serve as the input.

do_nothing[0] = 1
x_t, r_0, terminal = game_state.frame_step(do_nothing)
x_t = cv2.cvtColor(cv2.resize(x_t, (80, 80)), cv2.COLOR_BGR2GRAY)
ret, x_t = cv2.threshold(x_t,1,255,cv2.THRESH_BINARY)
s_t = np.stack((x_t, x_t, x_t, x_t), axis=2)
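
For context: the quoted lines only build the very first state, where no earlier frames exist yet, so that single frame is duplicated four times. On later steps a new frame replaces the oldest one in the stack, along the lines of the sketch below (dummy arrays stand in for the real frames, and the repository's exact code may differ):

import numpy as np

s_t = np.zeros((80, 80, 4), dtype=np.uint8)    # previous stacked state
x_t1 = np.zeros((80, 80), dtype=np.uint8)      # newest preprocessed frame

x_t1 = np.reshape(x_t1, (80, 80, 1))
s_t1 = np.append(x_t1, s_t[:, :, :3], axis=2)  # newest frame in, oldest frame out
assert s_t1.shape == (80, 80, 4)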

Another AI flappy bird using genetic programming (evolutionary computation)

Appreciate this excellent work. I got a lot of inspiration from this work on pygame.

I have managed to train an AI for a more difficult version of Flappy Bird: the horizontal distance between adjacent pipes and the gap between the upper and lower pipes are random within a certain range rather than fixed. Instead of neural networks and reinforcement learning, I use evolutionary strategies and Cartesian genetic programming, which attempts to build the control function (a math expression) directly using only basic arithmetic operators. With a small population of size 10, the bird can learn to fly quite well in typically fewer than 50 generations, which seems to be much more efficient than simple neuroevolution.

I implement this algorithm with Python and pygame. For those who are interested, please check my GitHub repository. A demo is here.

training

Please, will it stop after the training finishes?

Setting the Difficulty Level of the Game

Hi,

Thanks for your nice code and documentation.

I saw the report from Kevin Chen where he experimented with three difficulty levels (easy, medium, hard) of the game. Can you please tell me which difficulty level the game is set to in your code, and how to change the difficulty level if I want to?

I guess it's related to the value of PIPEGAPSIZE in wrapped_flappy_bird.py; currently it's set to 100. Is that hard mode? Can I change the difficulty level by increasing or decreasing PIPEGAPSIZE? If so, are there specific values for those modes?

Thanks!

Why use pooling?

Could you enlighten me on the reason why you use pooling in this architecture? As far as I know, pooling can result in a network that is insensitive to the location of an object in the image. Thanks in advance.

reading file issue

Why can't I read the png files from assets?
Is that a problem with my modules?

How are the +1 and -1 rewards used?

I see from here that all the rewards are added to the deque. We need to sample the +1 and -1 rewards from the deque to use them, so don't you think this may be slow?

In other words: since the transitions with reward +1 or -1 are also put in the deque, isn't the probability of sampling them quite low, which makes the feedback very slow?

Thank you @yenchenlin

parse_card: can't find card 0


Traceback (most recent call last):
  File "deep_q_network.py", line 8, in <module>
    import wrapped_flappy_bird as game
  File "game/wrapped_flappy_bird.py", line 19, in <module>
    IMAGES, SOUNDS, HITMASKS = flappy_bird_utils.load()
  File "game/flappy_bird_utils.py", line 42, in load
    SOUNDS['die']    = pygame.mixer.Sound('assets/audio/die' + soundExt)
MemoryError

why deep learning is used in this game

As I found on the internet, this game can be solved without deep learning [https://github.com/chncyhn/flappybird-qlearning-bot].
So can you help me understand what is more beneficial about using deep learning in this game rather than simply using Q-learning?

Game acceleration

Would it be possible to accelerate the game to save training time?

FINAL_EPSILON = 0.0001 ,maybe INITIAL_EPSILON = 1?

FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.0001
epsilon = INITIAL_EPSILON

# so in this condition "epsilon" will never be updated
if epsilon > FINAL_EPSILON and t > OBSERVE:
    epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE

I've hit a new issue: I cannot run it.

2016-12-03 15:39:16.578 Python[10293:65253] 15:39:16.578 WARNING: 140: This application, or a library it uses, is using the deprecated Carbon Component Manager for hosting Audio Units. Support for this will be removed in a future release. Also, this makes the host incompatible with version 3 audio units. Please transition to the API's in AudioComponent.h.
Traceback (most recent call last):
  File "deep_q_network.py", line 8, in <module>
    import wrapped_flappy_bird as game
  File "game/wrapped_flappy_bird.py", line 19, in <module>
    IMAGES, SOUNDS, HITMASKS = flappy_bird_utils.load()
  File "game/flappy_bird_utils.py", line 21, in load
    pygame.image.load('assets/sprites/0.png').convert_alpha(),
pygame.error: Failed loading libpng.dylib: dlopen(libpng.dylib, 2): image not found

pygame.error: File is not a Windows BMP file???????

Traceback (most recent call last):
  File "deep_q_network.py", line 8, in <module>
    import wrapped_flappy_bird as game
  File "game/wrapped_flappy_bird.py", line 19, in <module>
    IMAGES, SOUNDS, HITMASKS = flappy_bird_utils.load()
  File "game/flappy_bird_utils.py", line 21, in load
    pygame.image.load('assets/sprites/0.png').convert_alpha(),
pygame.error: File is not a Windows BMP file

Ignore this

Oops, sorry. I typed the issue in the wrong repo. I'm trying to figure out how to delete the issue.

EDIT: Turns out you can't delete issues on GitHub. Really sorry.

How to handle png files?

How do I handle the png files? I got the following message and have no idea how to solve it. I tried installing PIL and Pillow, but that didn't work.

$ python deep_q_network.py
Traceback (most recent call last):
  File "deep_q_network.py", line 8, in <module>
    import wrapped_flappy_bird as game
  File "game/wrapped_flappy_bird.py", line 19, in <module>
    IMAGES, SOUNDS, HITMASKS = flappy_bird_utils.load()
  File "game/flappy_bird_utils.py", line 21, in load
    pygame.image.load('assets/sprites/0.png').convert_alpha(),
pygame.error: File is not a Windows BMP file

memory leak?

It runs well but gets slower and slower as memory usage increases.

import Tensorflow as tf, some error occurred

Hey there, if anyone else is getting the same error, please uninstall the TensorFlow version on Windows using pip uninstall tensorflow and then re-install TensorFlow.
You might use other versions of TensorFlow too, if it's not working with 1.5.0.
You can also downgrade with pip install tensorflow==1.1.
BTW, amazing stuff man, kudos!

AttributeError

After py deep_q_network.py:

AttributeError: module 'tensorflow' has no attribute 'InteractiveSession'

Then pip install --upgrade tensorflow==0.7

It returns:

ERROR: Could not find a version that satisfies the requirement tensorflow==0.7 (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.3.0rc0, 2.3.0rc1, 2.3.0rc2, 2.3.0)

ERROR: No matching distribution found for tensorflow==0.7

How can I run this?

Crashes on launch on Mac

Hi,

DeepLearningFlappyBird crashes on launch on Mac OS X El Capitan; here is the error log:

tensorflow.python.framework.errors.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for bird-dqn-30000

Pickling the queue

I've implemented DQL for Flappy Bird in Keras and I find that pickling more than 50000 experiences takes over 11 GB of storage due to the inefficiency of pickle (or cPickle, for that matter), while the actual size of the queue is around 5 GB according to sys.getsizeof() (there is no better alternative for getting the size of Python objects).
Did you face this issue? I would imagine using a database like SQLite would be more efficient.

Unable to open file 'assets/audio/die.ogg', please help >>. .<<

[root@localhost DeepLearningFlappyBird]# python3.6 deep_q_network.py
ALSA lib pulse.c:243:(pulse_connect) PulseAudio: Unable to connect: Connection refused

libpng warning: iCCP: known incorrect sRGB profile   (repeated 10 times)
Traceback (most recent call last):
  File "deep_q_network.py", line 8, in <module>
    import wrapped_flappy_bird as game
  File "game/wrapped_flappy_bird.py", line 19, in <module>
    IMAGES, SOUNDS, HITMASKS = flappy_bird_utils.load()
  File "game/flappy_bird_utils.py", line 42, in load
    SOUNDS['die'] = pygame.mixer.Sound('assets/audio/die' + soundExt)
pygame.error: Unable to open file 'assets/audio/die.ogg'
[root@localhost DeepLearningFlappyBird]#

Checkpoint not found

I tried to run this, and the following appeared in the terminal (on Ubuntu):

W tensorflow/core/kernels/io.cc:228] Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for bird-dqn-30000

So I went to the "saved_networks" folder, and "bird-dqn-30000" was there.

Inside the checkpoint file, the content was:

model_checkpoint_path: "bird-dqn-30000"
all_model_checkpoint_paths: "bird-dqn-10000"
all_model_checkpoint_paths: "bird-dqn-20000"
all_model_checkpoint_paths: "bird-dqn-30000"

I changed the first line to:

model_checkpoint_path: "saved_networks/bird-dqn-30000"
all_model_checkpoint_paths: "bird-dqn-10000"
all_model_checkpoint_paths: "bird-dqn-20000"
all_model_checkpoint_paths: "bird-dqn-30000"

Worked fine.

I don't know if this is a real issue, or just something messy with my system. Just want to let you know.

question on freezing target network

Hi @yenchenlin1994, love your implementation!
I went through your code and I can't seem to find where you've frozen the target network.
Unless I'm missing something in my excess-caffeine-induced brain fade, you continue to update the target every batch?
Wouldn't that hurt your convergence rate badly?

libpng warning: iCCP: known incorrect sRGB profile

I want to know what's wrong with this, thanks very much.

libpng warning: iCCP: known incorrect sRGB profile   (repeated 15 times)
Traceback (most recent call last):
  File "deep_q_network.py", line 215, in <module>
    main()
  File "deep_q_network.py", line 212, in main
    playGame()
  File "deep_q_network.py", line 209, in playGame
    trainNetwork(s, readout, h_fc1, sess)
  File "deep_q_network.py", line 82, in trainNetwork
    readout_action = tf.reduce_sum(tf.multiply(readout, a), reduction_indices=1)
AttributeError: 'module' object has no attribute 'multiply'
