tnt-battlesnake's Issues
Perform multiple replays
Performing one replay of (for example) 64 samples per new observation seems to slow down the training significantly. We could try to perform multiple replays in succession every n observations for faster training iterations.
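A minimal sketch of such a schedule, assuming a list-based replay memory and a hypothetical `train_on_batch` callback (neither name comes from the repository). Instead of replaying after every observation, it replays several batches in a row every `replay_every` observations:

```python
import random


def train_loop(num_observations, replay_every=4, replays_per_step=4,
               batch_size=64, train_on_batch=None):
    """Collect observations and replay in bursts; returns the number of replays."""
    memory = []
    replay_calls = 0
    for step in range(1, num_observations + 1):
        # Placeholder transition; a real loop would store what the environment returns.
        memory.append(("state", "action", 0.0, "next_state", False))
        # Replay only every `replay_every` observations, once enough samples exist.
        if step % replay_every == 0 and len(memory) >= batch_size:
            for _ in range(replays_per_step):
                batch = random.sample(memory, batch_size)
                if train_on_batch is not None:
                    train_on_batch(batch)
                replay_calls += 1
    return replay_calls
```

Whether this actually speeds up training depends on how expensive a single replay is compared to an environment step, so both parameters would need to be tuned.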
Add "simulation" method to battlesnake_env
Implement proper rendering in battlesnake_env
Implement Double-Q Agent
Problem
The DQN agent overestimates Q-Values because of the max operation.
Solution
Use a copy of the network that is updated less frequently for the computation of the target Q-Value and estimate the Q-Value with the existing network.
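As a rough illustration, the target computation could look like the sketch below. It follows the usual Double DQN formulation, in which the existing (online) network picks the best next action and the less frequently updated copy evaluates it. The function name, its arguments and the plain-list Q-values are assumptions made for this example and are not taken from the repository.

```python
def double_q_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Target Q-value for one transition.

    q_online_next: Q-values of the next state from the existing (online) network.
    q_target_next: Q-values of the next state from the less frequently updated copy.
    """
    if done:
        # No future reward after a terminal state.
        return reward
    # The online network selects the action ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and the copy provides the value of that action.
    return reward + gamma * q_target_next[best_action]
```

Because the action is not chosen by the same network that evaluates it, the max operation no longer picks up the overestimation of a single network as easily.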
Steps
Create Tensorboard summaries
The following summaries will be useful for debugging the training:
- Loss
- Rewards
- Episode length
- Action distribution
For evaluation the summaries should also include:
- State - Q-Value distribution pairs
- State relevancy map
Implement Dueling network
Problem
For some states the action to take does not matter. Still the network has to approximate the Q-Value for each action for each state. This means that to arrive at a correct Q-Value for such a state, the network has to update its weights once for each possible action.
Solution
Separate the Value of the state s and the Value of the action a according to the following formula.
Q(s, a) = V(s) + A(s, a) - mean(A(s, a'))
Intuitively it computes the Q-Value from the Value of the state and the advantage A of an action over the other actions in that state.
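The formula can be tried out on plain numbers. The helper below is only a sketch of the aggregation step. In the actual network, V and A would be the outputs of two separate streams, and the function name is an assumption.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean(A(s, a'))."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]
```

For a state in which the action does not matter, all advantages are equal, so every Q-Value collapses to V(s) and a single update of V(s) moves all of them at once.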
Steps
Initialize using a random agent
To prevent falling into a local minimum, we should fill the replay memory with observations of a random agent.
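A sketch of how the memory could be prefilled. `ToyEnv` is a hypothetical stand-in that only mimics a `reset()`/`step()` interface. The real battlesnake_env may expose a different API, so the names and return values here are assumptions.

```python
import random


class ToyEnv:
    """Hypothetical environment: every episode ends after a fixed number of steps."""
    actions = [0, 1, 2]  # assumed discrete action set

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return self.t, 0.0, done  # (next_state, reward, done)


def prefill_memory(env, memory, num_observations, rng=random):
    """Fill `memory` with transitions generated by uniformly random actions."""
    state = env.reset()
    while len(memory) < num_observations:
        action = rng.choice(env.actions)
        next_state, reward, done = env.step(action)
        memory.append((state, action, reward, next_state, done))
        # Start a new episode whenever the current one has ended.
        state = env.reset() if done else next_state
    return memory
```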
Experiment with different state representation
Currently the state is encoded as a (width + 1, height) tensor that holds an encoding of each entity at its position. The additional last column encodes the health points of the agent snake.
An example state looks like this:
32 32 32 32 32 32 32 67
32 00 42 00 00 31 32 00
32 00 00 00 00 32 32 00
32 -45 32 33 00 33 32 00
32 32 32 32 32 32 32 00
This encodes the following state:
width = 7
height = 5
own health = 67
fruit @ [2, 1]
agent snake @ [[1, 3], [2, 3], [3, 3]] = [head, body, tail]
enemy snake @ [[5, 1], [5, 2], [5, 3]] = [head, body, tail]
It might make training easier if the different entities are represented as different channels of the state. There would be a channel for the agent snake, fruits and obstacles (enemy snakes and walls). For this the observe method has to be modified.
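A sketch of the channel representation, built from entity positions rather than from the current integer encoding. The function names, the channel order and the binary 0/1 values are assumptions. The health points are left out here and would have to be passed to the network separately.

```python
def border_cells(width, height):
    """All cells on the outer border of the board, i.e. the walls in the example."""
    return [(x, y) for y in range(height) for x in range(width)
            if x in (0, width - 1) or y in (0, height - 1)]


def to_channels(width, height, agent, fruits, obstacles):
    """Return three height x width grids: agent snake, fruits, obstacles."""
    channels = [[[0] * width for _ in range(height)] for _ in range(3)]
    for channel, cells in enumerate((agent, fruits, obstacles)):
        for x, y in cells:
            channels[channel][y][x] = 1
    return channels


# The example state from above: enemy snake and walls share the obstacle channel.
example = to_channels(
    width=7, height=5,
    agent=[(1, 3), (2, 3), (3, 3)],
    fruits=[(2, 1)],
    obstacles=border_cells(7, 5) + [(5, 1), (5, 2), (5, 3)],
)
```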
Implement prioritized replay
Problem
Sampling observations with uniform probability leads to an overrepresentation of states that occur more often.
Solution
Assign a priority to each observation in the memory. The priority p of an observation could be defined via the error in the estimation of its Q-Value. epsilon is a small positive constant that guarantees a minimum sampling probability for every observation, and alpha defines the amount of prioritization that is used.
p = (error + epsilon)^alpha
For efficient insertion and retrieval we will use a Sum-Tree.
Steps
- Implement experienced_replay.py using the Sum-Tree here.
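For orientation, a minimal Sum-Tree could look like the sketch below. It is not the implementation referenced above. The class layout, the method names and the default values of epsilon and alpha are assumptions. Leaves store priorities, inner nodes store the sum of their children, so both an update and a lookup take O(log n).

```python
def priority(error, epsilon=0.01, alpha=0.6):
    """p = (error + epsilon)^alpha, using the absolute error."""
    return (abs(error) + epsilon) ** alpha


class SumTree:
    def __init__(self, capacity):
        self.capacity = capacity
        # tree[1] is the root; the leaves live at capacity .. 2 * capacity - 1.
        self.tree = [0.0] * (2 * capacity)
        self.data = [None] * capacity
        self.next_index = 0

    def total(self):
        return self.tree[1]

    def add(self, p, item):
        """Store an item, overwriting the oldest one once the tree is full."""
        index = self.next_index
        self.data[index] = item
        self.update(index, p)
        self.next_index = (index + 1) % self.capacity

    def update(self, index, p):
        """Set the priority of a leaf and recompute the sums above it."""
        node = index + self.capacity
        self.tree[node] = p
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def get(self, value):
        """Find the leaf whose priority interval contains value in [0, total)."""
        node = 1
        while node < self.capacity:
            left = 2 * node
            if value < self.tree[left]:
                node = left
            else:
                value -= self.tree[left]
                node = left + 1
        index = node - self.capacity
        return index, self.tree[node], self.data[index]
```

Sampling an observation then amounts to drawing a uniform number in [0, total()) and calling `get` with it, so observations with a larger priority are returned proportionally more often.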
Stop and continue training
A mechanism should be implemented to stop and continue training. For this the following data should be persisted regularly and restored:
- DQN weights (use model.save(...) and keras.models.load_model(...))
- Episode count
In addition, a file that documents the hyperparameters has to be created for each training run.
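The non-weight part of this state could be persisted as a small JSON file, as in the sketch below. The file name, the function names and the dictionary layout are assumptions. The network weights themselves would be written next to it with model.save(...) and restored with keras.models.load_model(...).

```python
import json
from pathlib import Path

STATE_FILE = "training_state.json"  # assumed file name


def save_training_state(directory, episode, hyperparameters):
    """Write the episode count and hyperparameters to <directory>/training_state.json."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    state = {"episode": episode, "hyperparameters": hyperparameters}
    # Write to a temporary file first so an interrupted save does not corrupt the old state.
    tmp_path = directory / (STATE_FILE + ".tmp")
    tmp_path.write_text(json.dumps(state, indent=2))
    tmp_path.replace(directory / STATE_FILE)


def load_training_state(directory):
    """Return the saved state, or None if no training has been saved yet."""
    path = Path(directory) / STATE_FILE
    if not path.exists():
        return None
    return json.loads(path.read_text())
```

On startup, the training script would call `load_training_state` and continue from the stored episode count if a state exists, otherwise start from episode 0.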
Implement rule based agent based on the dqn agent
https://github.com/fkluger/tnt-battlesnake/blob/master/dqn/agent/dqn_agent.py
Maybe use this one as a starting point:
https://github.com/Hawstein/snake-ai