floodsung / drl-flappybird

Playing Flappy Bird Using Deep Reinforcement Learning (Based on Deep Q Learning DQN using Tensorflow)

Python 100.00%

drl-flappybird's Introduction

Playing Flappy Bird Using Deep Reinforcement Learning (Based on Deep Q Learning DQN)

Includes both the NIPS 2013 version and the Nature version of DQN.

I rewrote the code from another repo to make it much simpler, so that the Deep Q Network algorithm from DeepMind is easier to understand.

The code of DQN is only 160 lines long.

To run the code, just type python FlappyBirdDQN.py

Since the DQN code is a self-contained class, you can use it to play other games.

About the code

As a reinforcement learning problem, we know we need to obtain observations and output actions, while the 'brain' does the processing work.

Therefore, the BrainDQN.py code is easy to understand. It has three interfaces:

getInitState() for initialization
getAction()
setPerception(nextObservation,action,reward,terminal)

The game interface just needs to accept an action and output observation, reward, terminal.
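The interaction between the brain and the game can be sketched as a simple loop. This is a minimal illustration, not the repository's code: RandomBrain, ToyGame and play are hypothetical stand-ins, and passing the first observation to getInitState() is an assumption. Only the three interface names come from the list above.

```python
import random


class RandomBrain:
    """Hypothetical stand-in that exposes the three interface names."""

    def __init__(self, num_actions):
        self.num_actions = num_actions
        self.current_state = None
        self.steps = 0

    def getInitState(self, observation):
        # Assumption: the first observation becomes the starting state.
        self.current_state = observation

    def getAction(self):
        # A real DQN brain would choose from Q-values; this one acts randomly.
        return random.randrange(self.num_actions)

    def setPerception(self, nextObservation, action, reward, terminal):
        # A real DQN brain would store the transition and train here.
        self.current_state = nextObservation
        self.steps += 1


class ToyGame:
    """Hypothetical game that ends after a fixed number of frames."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        # Feed the action to the game, get observation, reward, terminal.
        self.t += 1
        terminal = self.t >= self.length
        reward = -1 if terminal else 1
        return self.t, reward, terminal


def play(brain, game, max_steps):
    """Run one episode and return the number of steps played."""
    brain.getInitState(game.reset())
    played = 0
    for _ in range(max_steps):
        action = brain.getAction()
        observation, reward, terminal = game.step(action)
        brain.setPerception(observation, action, reward, terminal)
        played += 1
        if terminal:
            break
    return played


brain = RandomBrain(num_actions=2)
steps_played = play(brain, ToyGame(length=5), max_steps=100)
```

Because the brain only sees observations, rewards and terminal flags, swapping ToyGame for another game leaves the brain untouched.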

Disclaimer

This work is based on the repo: yenchenlin1994/DeepLearningFlappyBird

drl-flappybird's Issues

Setting Difficulty level of the Game

Hi,

Thanks for your nice code and documentation.

I saw the report by Kevin Chen [http://cs229.stanford.edu/proj2015/362_report.pdf], where he experimented with three difficulty levels of the game (easy, medium, hard). Could you tell me which difficulty level the game is set to in your code, and how I can change it?

I guess it is related to the value of PIPEGAPSIZE in wrapped_flappy_bird.py, which is currently set to 100. Is that hard mode? Can I change the difficulty level by increasing or decreasing PIPEGAPSIZE? If so, are there specific values for each mode?

Thanks!

New observation update problem

Hello @songrotek, the code here seems to keep the oldest 3 frames forever, which means the algorithm is not using the newest 4 frames to represent the state.
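The reported behaviour can be illustrated with plain lists of frame ids. This is only a sketch: buggy_update is an assumed pattern that would produce the symptom described, not a quote of the repository's code, and sliding_update is one possible way to keep the newest 4 frames.

```python
def buggy_update(state, new_frame):
    # Puts the new frame first and keeps positions 1..3 unchanged,
    # so only position 0 is ever replaced.
    return [new_frame] + state[1:]


def sliding_update(state, new_frame):
    # Drops the oldest frame (position 0) and appends the new one last.
    return state[1:] + [new_frame]


# Frame id 0 stands for the initial frame repeated four times.
buggy_state = [0, 0, 0, 0]
sliding_state = [0, 0, 0, 0]

for frame_id in range(1, 7):
    buggy_state = buggy_update(buggy_state, frame_id)
    sliding_state = sliding_update(sliding_state, frame_id)

print(buggy_state)    # three copies of the initial frame remain
print(sliding_state)  # the four most recent frames
```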

How are the 1 and -1 rewards used?

From the code here, I see that all transitions, including those with reward 1 and -1, are added to the same deque, so these rewards are only used when they happen to be sampled from it. Isn't the probability of sampling the 1 and -1 transitions very low, so that the feedback is slow?

@songrotek Thank you.
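The situation described in this question can be sketched with a uniformly sampled replay memory. The memory size, batch size and reward schedule below are hypothetical values chosen for illustration, not the repository's settings.

```python
import random
from collections import deque

MEMORY_SIZE = 1000  # assumed capacity
BATCH_SIZE = 32     # assumed minibatch size

replay_memory = deque(maxlen=MEMORY_SIZE)

# Hypothetical reward schedule: mostly 0, with an occasional 1 or -1.
for step in range(1000):
    if step % 50 == 0:
        reward = 1
    elif step % 50 == 25:
        reward = -1
    else:
        reward = 0
    replay_memory.append((step, reward))

# Number of stored transitions that carry a 1 or -1 reward.
sparse_in_memory = sum(1 for _, r in replay_memory if r != 0)

# Uniform sampling: every stored transition is equally likely to be drawn.
minibatch = random.sample(list(replay_memory), BATCH_SIZE)
sparse_in_batch = sum(1 for _, r in minibatch if r != 0)
```

With uniform sampling, the expected share of 1 and -1 transitions in a minibatch equals their share in the memory, so rare rewards are indeed seen rarely in each batch.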

Why does the program only use two states?

I read the code here.
Why does the program use only the current state and the next state?
Why does using only these two states work?
Thank you, @songrotek.
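For context, the standard one-step Q-learning target, which DQN uses, depends only on a single transition $(s, a, r, s')$. The notation below is the usual textbook form, not taken from this repository:

$$
y =
\begin{cases}
r & \text{if } s' \text{ is terminal} \\
r + \gamma \max_{a'} Q(s', a') & \text{otherwise}
\end{cases}
\qquad
L = \bigl(y - Q(s, a)\bigr)^2
$$

Here $\gamma$ is the discount factor. The value of everything that happens after $s'$ is summarized by $\max_{a'} Q(s', a')$, which is why the current state and the next state are enough for one update. Each state may itself be a stack of several frames, as the issue about the newest 4 frames suggests.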

Not found: saved_networks/network-dqn-10000

I reused your BrainDQN_Nature module almost unchanged, but when I run it I get the following error:
tensorflow.python.framework.errors.NotFoundError: saved_networks/network-dqn-10000.tempstate14721420171424531239
[[Node: save/save = SaveSlices[T=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/save/tensor_names, save/save/shapes_and_slices, Variable, Variable/Adam, Variable/Adam_1, Variable_1, Variable_1/Adam, Variable_1/Adam_1, Variable_10, Variable_11, Variable_12, Variable_13, Variable_14, Variable_15, Variable_16, Variable_17, Variable_18, Variable_19, Variable_2, Variable_2/Adam, Variable_2/Adam_1, Variable_3, Variable_3/Adam, Variable_3/Adam_1, Variable_4, Variable_4/Adam, Variable_4/Adam_1, Variable_5, Variable_5/Adam, Variable_5/Adam_1, Variable_6, Variable_6/Adam, Variable_6/Adam_1, Variable_7, Variable_7/Adam, Variable_7/Adam_1, Variable_8, Variable_8/Adam, Variable_8/Adam_1, Variable_9, Variable_9/Adam, Variable_9/Adam_1, beta1_power, beta2_power)]]
What is causing this problem, and where should I change the code? Thanks!

Why do you need copyTargetQNetwork?

I don't understand the purpose of copyTargetQNetwork. Why do we need QValueT to evaluate QValue_batch? Is it to make the training process more stable?
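The general idea of a target network, as used in the Nature version of DQN, can be sketched with a tiny table in place of a neural network. TabularQ, copy_target, train_step and the constants are hypothetical names and values for illustration. The mapping to copyTargetQNetwork and QValueT is an assumption about their role, not a description of the repository's implementation.

```python
import copy

GAMMA = 0.99     # assumed discount factor
UPDATE_TIME = 3  # assumed number of training steps between copies


class TabularQ:
    """Hypothetical stand-in for a Q-network: one row of Q-values per state."""

    def __init__(self, num_states, num_actions):
        self.q = [[0.0] * num_actions for _ in range(num_states)]


def copy_target(online, target):
    # Plays the role assumed for copyTargetQNetwork:
    # overwrite the target parameters with the online ones.
    target.q = copy.deepcopy(online.q)


def train_step(online, target, s, a, r, s_next, terminal, lr=0.5):
    # The target value comes from the frozen copy, so it does not move
    # while the online values are being updated.
    y = r if terminal else r + GAMMA * max(target.q[s_next])
    online.q[s][a] += lr * (y - online.q[s][a])


online = TabularQ(num_states=2, num_actions=2)
target = TabularQ(num_states=2, num_actions=2)

target_history = []
for step in range(1, 7):
    train_step(online, target, s=0, a=1, r=1.0, s_next=0, terminal=False)
    if step % UPDATE_TIME == 0:
        copy_target(online, target)
    target_history.append(target.q[0][1])
```

Between copies the target values stay fixed, so the regression target does not shift on every update. This is the stabilizing effect the question refers to.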
