drl-dqn-atari-pong

Deep Q-Learning algorithms on Atari Pong

Summary

    The goal of this project is to find out how accurate and efficient Deep Q-Learning (DQN) can be on the Atari 2600 game of Pong in the OpenAI Gym environment. On top of basic DQN, improvements to the same algorithm were tested: Multi-step DQN, Double DQN and Dueling DQN. The results shown on the graph below indicate that basic DQN reaches human-like performance after only ~110 played games and high accuracy after 300 games. The improved versions of DQN considered in this project showed even greater efficiency and accuracy.

Pong Gif Pong Gif

Basic DQN: Episode 1 vs Episode 216

Environment

    OpenAI Gym provides an Atari 2600 emulator in which you can test your reinforcement learning algorithms on 59 different games. Deep reinforcement learning is used because the input is an RGB picture of the current frame (210x160x3). Since the full picture is computationally too expensive to process, it is converted to grayscale. Next, the image is downsampled and cropped to the playable area, whose final size is 84x84x1. https://gym.openai.com/envs/Pong-v0/

Grayscale, downsampling and cropped
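The preprocessing pipeline above can be sketched in a few lines of NumPy. This is only an illustration: the crop offsets and the nearest-neighbor resize are my assumptions, and the actual project may use a library resize (e.g. OpenCV) instead.

```python
import numpy as np

def preprocess(frame):
    """Turn a 210x160x3 RGB Atari frame into an 84x84x1 grayscale input.

    Sketch only: crop offsets and resize method are illustrative, not
    necessarily what the project uses.
    """
    # RGB -> grayscale via standard luminance weights
    gray = frame.astype(np.float32) @ np.array([0.299, 0.587, 0.114],
                                               dtype=np.float32)
    # crop roughly to the playable area (drop scoreboard and borders)
    play_area = gray[34:194, :]  # 160x160; offsets are an assumption
    # nearest-neighbor downsample to 84x84
    rows = np.linspace(0, play_area.shape[0] - 1, 84).astype(int)
    cols = np.linspace(0, play_area.shape[1] - 1, 84).astype(int)
    small = play_area[np.ix_(rows, cols)]
    return small[..., None]  # add channel axis -> 84x84x1
```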


    In Pong, every game is played until one side earns 21 points. A point is scored when the other side fails to return the ball. In terms of rewards, our agent receives -1 if it misses the ball, +1 if the opponent misses the ball, and 0 in every other case. After one side collects 21 points, the agent's total reward for the game is calculated. The minimum total reward is therefore -21, human-like performance is anything above 0, and +21 is the best possible outcome.
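The reward bookkeeping described above pins the total reward to the range [-21, +21]; a minimal sketch of that accounting (function name is mine):

```python
def episode_reward(point_outcomes):
    """Total reward for one game of Pong.

    point_outcomes: +1 each time the opponent misses the ball,
    -1 each time our agent misses. The game ends when either
    side reaches 21 points.
    """
    agent_points = opponent_points = total = 0
    for r in point_outcomes:
        total += r
        if r > 0:
            agent_points += 1
        else:
            opponent_points += 1
        if agent_points == 21 or opponent_points == 21:
            break
    return total
```

An agent that never returns the ball scores -21; a perfect agent scores +21.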

DQN

    For the DQN implementation and the choice of hyperparameters, I mostly followed Mnih et al. I improved on the basic DQN by implementing several variations: Double Q-learning, Dueling networks and Multi-step learning. You can find them summarized by Hessel et al. For more details about each improved version of DQN, check out the corresponding papers.
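The update rule shared by all of these variants bootstraps a one-step Bellman target from the target network; a hedged NumPy sketch (names and shapes are mine, not the project's):

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step DQN targets for a batch of transitions.

    y_i = r_i                                   if the episode ended
        = r_i + gamma * max_a Q_target(s'_i, a) otherwise

    next_q_values: (batch, n_actions) Q-values from the target network.
    dones: 1.0 for terminal transitions, 0.0 otherwise.
    """
    bootstrap = next_q_values.max(axis=1) * (1.0 - dones)
    return rewards + gamma * bootstrap
```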

Results

    Efficiency and accuracy are the two main factors in judging the results. Efficiency means how quickly the agent reaches a human-like level, and accuracy represents how close the agent gets to the maximum total reward of +21. The graphs show the mean total reward (over the last 40 games) after each game. The agent was trained on each variation of the algorithm for up to 500 games.
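The smoothing used for the graphs, a mean over the last 40 finished games, can be sketched as (function name is mine):

```python
def mean_last_n(rewards, n=40):
    """Mean total reward over the last n finished games.

    Uses fewer games at the start of training, when fewer than n
    have been played.
    """
    window = rewards[-n:]
    return sum(window) / len(window)
```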

Optimizers

    The Adam and RMSProp optimizers were tested in this project. The graph below compares the two. RMSProp clearly outperformed Adam in these tests, although more runs are needed for reliable averages before giving a final verdict. Other optimizers, such as SGD or Adamax, could be tested in the future.
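For reference, RMSProp scales each parameter's step by a running root-mean-square of its recent gradients; a minimal NumPy sketch of a single update (hyperparameter values are illustrative, not the ones used in training):

```python
import numpy as np

def rmsprop_step(w, grad, state, lr=1e-4, rho=0.99, eps=1e-8):
    """One RMSProp update.

    state accumulates an exponential moving average of grad**2;
    dividing by its square root normalizes the step size per parameter.
    """
    state = rho * state + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(state) + eps)
    return w, state
```

For example, repeatedly applying this step to f(w) = w^2 (gradient 2w) walks w toward 0 at a roughly constant per-step rate, which is the behavior that distinguishes it from plain SGD.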

  • #ff7043 Basic DQN Adam
  • #bbbbbb Basic DQN RMSProp
  • #0077bb 2-step DQN Adam
  • #009988 2-step DQN RMSProp

Algorithms

    A few selected variations of the implemented algorithms are shown below. Although 2-step DQN and Double DQN appear to outperform Dueling DQN in efficiency, keep in mind that these results need to be averaged over many runs, as both Double DQN and 2-step DQN showed high variance (both better and worse than Dueling DQN). As for accuracy, Dueling DQN combined with the other DQN variations showed the best results. For information about viewing all of the data, see the next section.

  • #ff7043 Basic DQN Adam
  • #ee3377 2-step Dueling DQN RMSProp
  • #009988 2-step Dueling Double DQN RMSProp
  • #0077bb 2-step Double DQN RMSProp
  • #bbbbbb 2-step DQN RMSProp

  • Mean total reward in last 10 games

    • Best efficiency recorded: 2-step DQN RMSProp - after 79 games
    • Best accuracy recorded: 2-step Dueling Double DQN RMSProp - 20.30 score (after 444 games)
  • Mean total reward in last 40 games

    • Best efficiency recorded: 2-step DQN RMSProp - after 93 games
    • Best accuracy recorded: 2-step Dueling Double DQN RMSProp - 19.48 score (after 473 games)
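The "2-step" variants in the results above replace the one-step bootstrap with a two-step return; a sketch of that target for a single transition (names and the terminal handling are my assumptions):

```python
def two_step_target(r0, r1, max_next_q, done0, done1, gamma=0.99):
    """Two-step DQN target:

    y = r_t                                          if s_{t+1} is terminal
      = r_t + g*r_{t+1}                              if s_{t+2} is terminal
      = r_t + g*r_{t+1} + g^2 * max_a Q(s_{t+2}, a)  otherwise
    """
    if done0:
        return r0
    if done1:
        return r0 + gamma * r1
    return r0 + gamma * r1 + gamma ** 2 * max_next_q
```

Looking two rewards ahead propagates reward information faster than the one-step target, which is consistent with the efficiency gains observed above.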

Rest of the data and TensorBoard

    The rest of the training data can be found in /content/runs. If you wish to inspect it and compare runs, I recommend TensorBoard. After installing it, point it at the directory where the data is stored:

tensorboard --logdir="full\path\to\data" --host=127.0.0.1

and open http://localhost:6006 in your browser. For installation instructions and further questions, visit the TensorBoard GitHub repository.

Telegram bot

    Since every run of up to 500 games takes about 3.5-4.5 hours, I implemented a Telegram bot that sends me updates on how training is going. It can be created in a few steps.

  • The first step is to create a new bot by sending the '/newbot' command to BotFather and following its instructions. BotFather will give you a TOKEN_ID.
  • The second step is to send any message to your bot and open 'https://api.telegram.org/botTOKEN_ID/getUpdates', replacing TOKEN_ID with your token. There you will find CHAT_ID (result -> 0 -> message -> from -> id = CHAT_ID). Replace CHAT_ID and TOKEN_ID in telegram_bot.py with your values and you are good to go.
from telegram import Bot  # from the python-telegram-bot package

def telegram_send(message):
    chat_id = "CHAT_ID"  # your chat id from the getUpdates response
    token = "TOKEN_ID"   # your bot token from BotFather
    bot = Bot(token=token)
    bot.send_message(chat_id=chat_id, text=message)

Future improvements

    For further improvements in efficiency and accuracy, a couple of directions remain: averaging results over many more runs to reduce variance, and testing additional optimizers such as SGD or Adamax.

Contributors

leonjovanovic
