zackattack614 / blackbird
Board game self-learning algorithm.
License: MIT License
children_probs = [ (child.N ** (1/self.temperature)) / sum([child.N ** (1/self.temperature) for child in self.root.children]) for child in self.root.children]
It's not clear in this expression which iterator child belongs to when child.N is evaluated.
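One way to remove the ambiguity is to compute the scaled counts once and normalise by a single shared total. The sketch below assumes the visit counts are passed in as a plain list; the function name visit_probabilities and the uniform fallback for an unvisited root are illustrative choices, not part of BlackBird.

```python
def visit_probabilities(visit_counts, temperature=1.0):
    """Map visit counts N to probabilities N^(1/T) / sum(N^(1/T))."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    exponent = 1.0 / temperature
    # Scale each count exactly once; no nested loop reuses the name `child`.
    scaled = [n ** exponent for n in visit_counts]
    total = sum(scaled)
    if total == 0:
        # Assumption: with no visits recorded, fall back to a uniform policy.
        return [1.0 / len(visit_counts)] * len(visit_counts)
    return [s / total for s in scaled]


# Usage: the counts would come from [child.N for child in self.root.children].
probs = visit_probabilities([1, 3], temperature=1.0)
```

Hoisting the sum also avoids recomputing it once per child.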
Consider the following scenario:
A has children b,c,d
We explore A. Expand the children and explore c.
A.N = 1
c.N = 1
b.N = 0
d.N = 0
The U values for b and d haven't been updated to account for the change in c.N's value.
Also, sum(child.N for child in parent.children) = parent.N.
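A possible fix is to compute U on demand from the parent's current visit count instead of caching it. The sketch below assumes an AlphaZero-style exploration term, U = c * P * sqrt(parent.N) / (1 + N); that formula, the C_PUCT constant, and the Node class here are assumptions for illustration and may not match BlackBird's actual definitions.

```python
import math

C_PUCT = 1.0  # hypothetical exploration constant


class Node:
    def __init__(self, prior, parent=None):
        self.prior = prior
        self.parent = parent
        self.children = []
        self.N = 0
        if parent is not None:
            parent.children.append(self)

    @property
    def U(self):
        # Recomputed on every access, so a visit to any sibling (which
        # also raises parent.N) is reflected immediately.
        if self.parent is None:
            return 0.0
        return C_PUCT * self.prior * math.sqrt(self.parent.N) / (1 + self.N)


# The scenario above: A has children b, c, d and c was explored once.
A = Node(prior=1.0)
b, c, d = (Node(prior=1 / 3, parent=A) for _ in range(3))
A.N, c.N = 1, 1
```

With this layout b.U and d.U already reflect A.N = 1 without any explicit update step.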
A chess BoardState class needs to exist for BlackBird to learn the game.
The class should inherit from GameState, and override all functions.
https://github.com/jordan-singer/BlackBird/blob/master/src/Network.py#L219
The teacherPolicy variable is not used anywhere.
Given a game state written in a protobuf, the corresponding BoardState object should be able to deserialize and return a full game state to train on.
The MCTS algorithm doesn't back up the number of plays if we hit an end-game state. This occasionally results in almost no exploration: we can iterate to a game end that the AI thinks has good value (whether or not it does), and then we keep going down that branch and quit.
The simulations should not stop just because we stumbled upon an end game.
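A minimal sketch of the intended behaviour: every playout backs up a visit, whether the selected leaf is terminal or not, and the loop always runs the full playout budget. The Node fields, the sign flip between plies (a two-player zero-sum assumption), and the select_leaf and evaluate callables are all hypothetical stand-ins.

```python
class Node:
    def __init__(self, parent=None, terminal_value=None):
        self.parent = parent
        self.terminal_value = terminal_value  # None unless the game ended here
        self.N = 0
        self.W = 0.0


def back_up(leaf, value):
    """Add one visit and the (sign-alternating) value to every ancestor."""
    node = leaf
    while node is not None:
        node.N += 1
        node.W += value
        value = -value  # assumption: players alternate, zero-sum game
        node = node.parent


def run_playouts(select_leaf, evaluate, max_playouts):
    for _ in range(max_playouts):
        leaf = select_leaf()
        if leaf.terminal_value is not None:
            value = leaf.terminal_value  # end game: use the true outcome
        else:
            value = evaluate(leaf)
        back_up(leaf, value)  # visits are counted in both cases
```

Because terminal visits now raise N, the exploration term for that branch shrinks and later playouts can move elsewhere.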
Serialization of policy and evaluation shouldn't be handled by the gamestate class.
Loss, as defined here, is just the first element of a column vector. It should use reduce_sum over the vector, not just return one element of that vector.
An end user should be able to modify the architecture of their neural network via a GUI.
The current learning setup provides only a constant learning rate for the network's loss calculation. This rate should decrease over time, in some clever fashion.
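One simple option is exponential decay. The sketch below is framework-independent; the function name and all parameter values are illustrative assumptions, and a step-wise or plateau-based schedule would be an equally reasonable choice.

```python
def decayed_learning_rate(initial_rate, step, decay_steps, decay_rate):
    """Exponential decay: the rate is multiplied by decay_rate every decay_steps."""
    return initial_rate * decay_rate ** (step / decay_steps)


# Usage with hypothetical settings: halve the rate every 1000 training steps.
rates = [decayed_learning_rate(0.1, s, 1000, 0.5) for s in (0, 1000, 2000)]
```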
Should this read example.State.Player?
The BlackBird.TrainingExample class should be removed, and replaced with a tf.data.Dataset. The Network.train() method should use a Dataset object.
The win/loss/draw counts vs random, old, and standard MCTS should be logged in the TrainingStatisticsFact table.
Dirichlet noise should only be applied in self-play in order to aid in exploration in training, not during network evaluation or official play.
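A sketch of gating the noise behind a flag, using only the standard library. The alpha and epsilon defaults are placeholder values, and the function names and the self_play flag are hypothetical; BlackBird's own parameter names may differ.

```python
import random


def add_dirichlet_noise(priors, alpha=0.3, epsilon=0.25, rng=random):
    """Mix the priors with a Dirichlet(alpha) sample built from gamma draws."""
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total == 0:
        return list(priors)  # degenerate draw: leave the priors untouched
    noise = [g / total for g in gammas]
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


def root_priors(priors, self_play):
    # Noise is applied only when generating training games.
    return add_dirichlet_noise(priors) if self_play else list(priors)
```

During evaluation or official play, root_priors(priors, self_play=False) returns the network's priors unchanged.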
How the code is now:
The problem here is that we (1) chose a node, then (2) updated the value of that node to be the average of its children's values. This doesn't make sense, since an intelligent network would never have chosen all of those children.
Instead, we should only be backing up the value of the move we thought was realistic to make.
A BoardState's Board member object should have a constant plane of how many times that position has been seen in the game's history.
This is helpful, for example, in informing BlackBird how close it is to a triple repetition in chess.
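As a rough illustration, the repetition count can be broadcast into a board-sized layer. This sketch assumes positions are stored as hashable values (for example tuples) in a per-game history list; the function name and arguments are hypothetical.

```python
from collections import Counter


def repetition_plane(history, current, rows, cols):
    """Return a rows x cols layer filled with how often `current` occurs in history."""
    count = Counter(history)[current]
    return [[count] * cols for _ in range(rows)]


# Usage: a position seen twice yields a layer of 2s.
p1, p2 = ("x", "o"), ("o", "x")
plane = repetition_plane([p1, p2, p1], p1, rows=2, cols=2)
```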
The softmax function is applied twice in the network's policy head; it should only be applied once. Also note that the output size of the policy is hard-coded to 9, rather than a variable size representing the shape of the board.
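The effect of the double application can be seen without any framework: a second softmax treats the probabilities as logits and flattens the distribution. The sketch below also derives the policy size from a board shape rather than hard-coding 9; the one-move-per-cell mapping is an assumption that fits tic-tac-toe-like games only.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_size(board_shape):
    # Assumption: one move per board cell.
    rows, cols = board_shape
    return rows * cols


once = softmax([2.0, 0.0, 0.0])
twice = softmax(once)  # the bug: softmax applied to probabilities
```

Here max(twice) is noticeably smaller than max(once), so the policy head loses confidence it should have kept.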
We iterate over the entire state history to generate rewards, not just the states in that game.
That is done every game. It just iterates over the entire history and adds ~random rewards to the list.
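A sketch of the intended behaviour: rewards are computed only for the states of the game that just finished, and only those new pairs are appended to the history. The data layout (a list of players to move, with winner set to None for a draw) and both function names are assumptions.

```python
def rewards_for_game(game_players, winner):
    """One reward per state of this game, from the mover's perspective."""
    rewards = []
    for player in game_players:
        if winner is None:
            rewards.append(0.0)   # draw
        elif player == winner:
            rewards.append(1.0)
        else:
            rewards.append(-1.0)
    return rewards


def append_game(history, game_states, game_players, winner):
    # Earlier entries in `history` are left untouched.
    rewards = rewards_for_game(game_players, winner)
    history.extend(zip(game_states, rewards))
```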
Training games that are generated on a client computer should be able to be published for a centralized server to train the next network on.
To ensure that game states are as compact as possible before transferring over the wire to a central repository, game states should be serialized in a ProtoBuf. Current state is JSON serialization, which is much less efficient.
BlackBird needs a rating system so that performance across training sessions can be measured.
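The issue does not name a rating scheme; Elo is one common candidate. The sketch below shows the standard Elo update, with the K-factor of 32 and the function names chosen purely for illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating_a, rating_b, score_a, k=32):
    """score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


# Usage: a new network beats an equally rated older one.
new_rating = update_rating(1500, 1500, 1.0)
```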
BoardState arrays should include historical game state data. This will affect the shape of the neural network input, and how data is serialized.
To sample the best branch for exploration + exploitation, the relative expected values of all of the nodes need to be compared after every update.
This code
selected_node = self.root
is only called once. Once it is set to the root, all of the subsequent playouts dive deeper into the same branch:
    while current_playouts < self.max_playouts:
        while any(selected_node.children):
            children_QU = [child.Q + child.U for child in selected_node.children]
            selected_node = selected_node.children[np.argmax(children_QU)]
It should instead start from the root again and recheck the values to make sure that it is exploring the optimal path, and not something it discovered to suck.
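A minimal sketch of restarting selection at the root on every playout. The Node class stores Q and U as plain attributes to keep the example short, and expand_and_back_up is a hypothetical callback standing in for expansion, evaluation, and backup.

```python
class Node:
    def __init__(self, Q=0.0, U=0.0):
        self.Q = Q
        self.U = U
        self.children = []


def select_leaf(root):
    """Walk from the root, always following the child with the highest Q + U."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: child.Q + child.U)
    return node


def run_playouts(root, max_playouts, expand_and_back_up):
    for _ in range(max_playouts):
        leaf = select_leaf(root)   # selection restarts at the root each time
        expand_and_back_up(leaf)   # may change Q/U anywhere along the path
```

If a playout lowers the value of a branch, the next call to select_leaf can pick a different child of the root instead of continuing down the same path.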
BlackBird's network should be able to train from game states stored in the blackbird.db file.
After each training session, a network is created. This network should have a codename associated with it in the form <adjective>_<noun>, e.g. "pretty_paperclip."
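A small sketch of generating such names. The word lists are placeholders (only "pretty" and "paperclip" come from the example above), and the function name is hypothetical.

```python
import random

# Placeholder vocabularies; a real list would be much longer.
ADJECTIVES = ("pretty", "quiet", "brave")
NOUNS = ("paperclip", "lantern", "falcon")


def make_codename(rng=random):
    """Return a name of the form <adjective>_<noun>."""
    return f"{rng.choice(ADJECTIVES)}_{rng.choice(NOUNS)}"


codename = make_codename()
```

Collisions are possible with short lists, so a caller may want to check the name against existing networks before saving.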
Games generated by local clients should be inserted into the SQLite3 database by default. A toggle in parameters.yaml should exist for this feature.
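A sketch of how the toggle might gate the insert, using the standard sqlite3 module. The GameFact table, its columns, and the store_games_locally key are hypothetical names; params stands for the dictionary loaded from parameters.yaml.

```python
import sqlite3


def store_game(connection, serialized_game, params):
    """Insert one serialized game unless the toggle is switched off."""
    if not params.get("store_games_locally", True):  # on by default
        return False
    connection.execute(
        "CREATE TABLE IF NOT EXISTS GameFact "
        "(id INTEGER PRIMARY KEY, payload BLOB)"
    )
    connection.execute(
        "INSERT INTO GameFact (payload) VALUES (?)", (serialized_game,)
    )
    connection.commit()
    return True


# Usage with an in-memory database; a real client would open blackbird.db.
conn = sqlite3.connect(":memory:")
skipped = store_game(conn, b"game-1", {"store_games_locally": False})
stored = store_game(conn, b"game-2", {})
```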