
tnt-battlesnake's People

Contributors

fkluger, frederikschubert, sunnihu


tnt-battlesnake's Issues

Experiment with different state representation

Currently the state is encoded as a (width + 1, height) tensor that contains an encoding of the different entities at their positions. The last column additionally encodes the health points of the agent snake.

An example state looks like this:

32   32   32   32   32   32   32  67
32   00   42   00   00   31   32  00
32   00   00   00   00   32   32  00
32  -45   32   33   00   33   32  00
32   32   32   32   32   32   32  00

This encodes the following state:

  • width = 7
  • height = 5
  • own health = 67
  • fruit @ [2, 1]
  • agent snake @ [[1, 3], [2, 3], [3, 3]] = [head, body, tail]
  • enemy snake @ [[5, 1], [5, 2], [5, 3]] = [head, body, tail]

It might make training easier if the different entities were represented as separate channels of the state: one channel each for the agent snake, the fruits, and the obstacles (enemy snakes and walls). For this the observe method has to be modified.
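A minimal sketch of such a channel split, using hypothetical entity codes (the real codes used by the observe method may differ from these):

```python
import numpy as np

# Hypothetical entity codes; the actual values in the encoding differ.
AGENT, FRUIT, ENEMY, WALL = 1, 2, 3, 4

def observe_channels(grid):
    """Split an integer-coded (height, width) grid into three binary
    channels: agent snake, fruits, and obstacles (enemy snakes + walls)."""
    agent = (grid == AGENT).astype(np.float32)
    fruit = (grid == FRUIT).astype(np.float32)
    obstacles = np.isin(grid, (ENEMY, WALL)).astype(np.float32)
    return np.stack([agent, fruit, obstacles], axis=-1)
```

The health column would be dropped before the split (or fed to the network separately), since it is a scalar rather than a spatial feature.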

Implement Double-Q Agent

Problem

The DQN agent overestimates Q-Values because of the max operation.

Solution

Use a copy of the network that is updated less frequently to evaluate the target Q-Value, while still selecting the greedy action with the existing network. Decoupling action selection from action evaluation in this way reduces the overestimation.

Steps

  1. Implement double_dqn.py here where the second network is updated every n training steps of the first network.
    • Updating the second network means copying the weights from the first network to it.
  2. Change this line to use the second network.
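The target computation from the steps above can be sketched with numpy; q_online_next and q_target_next stand for the two networks' predictions on a batch of next states (the function name and signature are illustrative). In Keras, updating the second network amounts to target_model.set_weights(model.get_weights()).

```python
import numpy as np

def double_q_targets(rewards, q_online_next, q_target_next, dones, gamma=0.99):
    """rewards, dones: shape (batch,); q_*_next: shape (batch, num_actions)."""
    # Select the greedy next action with the first (online) network ...
    best_actions = np.argmax(q_online_next, axis=1)
    # ... but evaluate that action with the less frequently updated network.
    evaluated = q_target_next[np.arange(len(rewards)), best_actions]
    return rewards + gamma * evaluated * (1.0 - dones)
```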

Perform multiple replays

Performing one replay of (for example) 64 samples per new observation seems to slow down training significantly. We could instead perform multiple replays in succession every n observations to speed up the training iterations.
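One way to schedule this; the constants and the agent.replay call are placeholders for the existing training code, not taken from the repository:

```python
# Instead of one replay per observation, run several replays in
# succession every REPLAY_INTERVAL observations.
REPLAY_INTERVAL = 4    # n: observations between training phases
REPLAYS_PER_PHASE = 4  # replays performed back to back
BATCH_SIZE = 64

def maybe_replay(agent, step):
    if step > 0 and step % REPLAY_INTERVAL == 0:
        for _ in range(REPLAYS_PER_PHASE):
            agent.replay(BATCH_SIZE)
```

On average this performs the same number of gradient steps per observation, but amortizes the per-replay overhead (sampling, batching, session setup) over fewer training phases.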

Implement Dueling network

Problem

For some states it does not matter which action is taken. Still, the network has to approximate the Q-Value of every action in every state, so to arrive at correct Q-Values for such a state it has to update its weights once for each possible action.

Solution

Separate the Value of the state s from the Value of the action a according to the following formula:

Q(s, a) = V(s) + A(s, a) - mean(A(s, a'))

Intuitively, this computes the Q-Value from the Value V of the state and the advantage A of an action over the other actions in that state.

Steps

  1. Implement dueling_dqn.py here.
    • An implementation can be found here.
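The aggregation in the formula above can be sketched independently of the network; in Keras it could live in a Lambda layer on top of two separate Dense output streams (that wiring is an assumption, not the repository's code):

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine V(s) and A(s, a) as Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

    value: shape (batch, 1); advantages: shape (batch, num_actions).
    """
    return value + advantages - advantages.mean(axis=1, keepdims=True)
```

Subtracting the mean advantage makes the decomposition identifiable: adding a constant to V and subtracting it from all advantages would otherwise leave Q unchanged.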

Stop and continue training

A mechanism should be implemented to stop and continue training. For this, the following data should be persisted regularly and restored on restart:

  • DQN weights (Use model.save(...) and keras.models.load_model(...))
  • Episode count

Additionally, a file should be created that documents the hyperparameters of each training run.
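A sketch of such a checkpoint routine, using the model.save / keras.models.load_model calls mentioned above; the directory and file names are illustrative:

```python
import json
import os

def save_checkpoint(model, episode, hyperparameters, directory="checkpoints"):
    """Persist the DQN weights plus the episode count and hyperparameters."""
    os.makedirs(directory, exist_ok=True)
    model.save(os.path.join(directory, "dqn.h5"))  # weights + architecture
    with open(os.path.join(directory, "state.json"), "w") as f:
        json.dump({"episode": episode, "hyperparameters": hyperparameters}, f)

def load_checkpoint(directory="checkpoints"):
    """Restore the model and training state written by save_checkpoint."""
    from keras.models import load_model  # deferred so the sketch stays importable
    model = load_model(os.path.join(directory, "dqn.h5"))
    with open(os.path.join(directory, "state.json")) as f:
        state = json.load(f)
    return model, state["episode"], state["hyperparameters"]
```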

Implement prioritized replay

Problem

Sampling observations with uniform probability leads to an overrepresentation of frequently occurring states, while rare but informative observations are replayed too seldom.

Solution

Assign a priority to each observation in the memory. The priority p of an observation can be defined via the error in the estimation of its Q-Value, where epsilon is a small constant that guarantees a minimum priority and alpha controls the amount of prioritization (alpha = 0 recovers uniform sampling).

p = (error + epsilon)^alpha

For efficient insertion and retrieval we will use a Sum-Tree.

Steps

  1. Implement experienced_replay.py using the Sum-Tree here.
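A minimal sketch of the Sum-Tree and the priority function; the capacity handling (assumed to be a power of two) and the default constants are assumptions:

```python
import numpy as np

class SumTree:
    """Leaves store priorities, inner nodes the sum of their children,
    so updating a priority and sampling proportional to priority are
    both O(log n)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)  # inner nodes + leaves
        self.data = [None] * capacity
        self.write = 0  # next leaf slot, wraps around to overwrite the oldest

    def add(self, priority, sample):
        index = self.write + self.capacity - 1  # leaf position in the tree
        self.data[self.write] = sample
        self.update(index, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, index, priority):
        change = priority - self.tree[index]
        self.tree[index] = priority
        while index != 0:  # propagate the change up to the root
            index = (index - 1) // 2
            self.tree[index] += change

    def total(self):
        return self.tree[0]  # sum of all priorities

    def get(self, value):
        """Return (tree_index, priority, sample) for a value in [0, total())."""
        index = 0
        while index < self.capacity - 1:  # descend until a leaf is reached
            left = 2 * index + 1
            if value <= self.tree[left]:
                index = left
            else:
                value -= self.tree[left]
                index = left + 1
        return index, self.tree[index], self.data[index - self.capacity + 1]

def priority(error, epsilon=0.01, alpha=0.6):
    """Priority of an observation from its Q-Value estimation error."""
    return (abs(error) + epsilon) ** alpha
```

To sample a batch, draw uniform values from [0, total()) and call get for each; the returned tree index is kept so the priority can be updated after the observation has been replayed.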

Create Tensorboard summaries

The following summaries will be useful for debugging the training:

  • Loss
  • Rewards
  • Episode length
  • Action distribution
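The per-episode bookkeeping for these summaries could look like the sketch below; the aggregated values would then be written out with a TensorBoard summary writer. EpisodeStats is a hypothetical helper, not part of the code base:

```python
import collections

class EpisodeStats:
    """Collect loss, rewards, episode length and action counts per episode."""

    def __init__(self):
        self.rewards = []
        self.losses = []
        self.actions = collections.Counter()

    def record_step(self, reward, loss, action):
        self.rewards.append(reward)
        if loss is not None:  # no loss on steps without a replay
            self.losses.append(loss)
        self.actions[action] += 1

    def summaries(self):
        steps = len(self.rewards)
        return {
            "reward": sum(self.rewards),
            "loss": sum(self.losses) / max(len(self.losses), 1),
            "episode_length": steps,
            "action_distribution": {a: n / steps for a, n in self.actions.items()},
        }
```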

For evaluation the summaries should also include:
