Comments (5)
I would rather recommend that @lightvector try the method presented in the paper Large Memory Layers with Product Keys (see also Lample's and LeCun's tweets). The paper applies the method to transformers for NLP tasks: a memory-augmented 12-layer transformer outperformed a baseline 24-layer transformer. Increasing model capacity without slowing down inference is very attractive, since speed is crucial for game AI, yet the 15b models are underfit compared to the 40b models. The human approach to games involves not only developing intuition and reading/simulation skill but also accumulating knowledge in memory (or in other storage media, like books); memory layers let a network accumulate knowledge within itself. The added capacity could let the network remember rare shapes and refutations of abandoned josekis, and handle liberties and ladders better (less relevant for KataGo). As McAllester pointed out, the discretization/sparse-hard-attention mechanism employed in key-value memory layers potentially reduces forgetting, so the randomness injected into the opening for the sake of training-data diversity may become unnecessary. Board games also have a more discrete flavor than images, which memory lookups may capture better than linear transformations, and they involve more reasoning, which makes them closer to visual question answering (VQA) tasks, where memory-augmented networks have also been used.
Here is my ready-for-action summary of the product-key memory method introduced in the paper, adapted to CNNs according to my understanding:
- replace two of the convolutional layers (in the paper, the 6th and 12th layers of a 16-layer model are replaced), each mapping d channels to d channels, with the following: a convolutional layer from d channels to dq channels; after batch normalization, the dq-dimensional vector at each of the 361 board intersections is split into two dq/2-dimensional query vectors. The memory is indexed by pairs of subkeys, where each subkey can take |C| (learnable) possible values. For each query vector, find its k nearest subkeys (by inner product); the Cartesian product of the two top-k lists gives k^2 candidate product keys, of which the k with the highest summed scores are kept. Retrieve the memory contents (learnable d-dimensional vectors) for those k keys and average them with softmax weights on their scores.
In the main experiments in the paper, d = 1024 or 1600, dq = 512, k = 32, |C| = 512, memory size = 512^2.
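To make the lookup step concrete, here is a minimal NumPy sketch of a single product-key memory read for one query vector, following the paper's scheme as I understand it (all shapes and names here are illustrative, not from any actual KataGo code; the real layer would be batched over all 361 intersections and learned end-to-end):

```python
import numpy as np

def product_key_lookup(query, subkeys1, subkeys2, values, k):
    """One memory read in the style of product-key memories.

    query:    (dq,) vector, split into two halves q1, q2
    subkeys1: (C, dq/2) learnable first-half subkeys
    subkeys2: (C, dq/2) learnable second-half subkeys
    values:   (C*C, d) learnable memory slots, row-major over (i, j)
    """
    q1, q2 = np.split(query, 2)
    s1 = subkeys1 @ q1                      # (C,) scores vs. first subkeys
    s2 = subkeys2 @ q2                      # (C,) scores vs. second subkeys
    top1 = np.argsort(-s1)[:k]              # k best first subkeys
    top2 = np.argsort(-s2)[:k]              # k best second subkeys
    # A product key's score is the sum of its two subkey scores, so the
    # overall top-k is guaranteed to lie inside this k*k candidate grid.
    cand = s1[top1][:, None] + s2[top2][None, :]
    flat = np.argsort(-cand.ravel())[:k]    # top-k within the grid
    i, j = np.unravel_index(flat, (k, k))
    slot = top1[i] * subkeys2.shape[0] + top2[j]  # flat slot indices
    w = np.exp(cand[i, j] - cand[i, j].max())     # softmax over top-k scores
    w /= w.sum()
    return w @ values[slot]                 # (d,) weighted sum of values

# Toy usage with the shapes scaled way down from the paper's settings:
rng = np.random.default_rng(0)
C, dq, d, k = 8, 4, 5, 2
out = product_key_lookup(rng.normal(size=dq),
                         rng.normal(size=(C, dq // 2)),
                         rng.normal(size=(C, dq // 2)),
                         rng.normal(size=(C * C, d)), k)
```

The point of the factorization is that scoring costs O(|C|) per query half instead of O(|C|^2) over all product keys, while the memory still holds |C|^2 addressable slots.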
from katago.
If we're still interested in this, it doesn't have to be only me trying all this stuff! That's too much of a bottleneck. :)
A way anyone else could contribute here, even without a lot of GPUs (a single one would do), would be to do supervised learning on KataGo training data: implement the desired method in TensorFlow side-by-side with the existing one, and show that the new one works a lot better, reaching strong levels on the existing data or achieving lower loss. With the method already working in TensorFlow, it would then be pretty trivial to drop the architecture into a subsequent real self-play run.
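The comparison protocol above can be sketched in miniature: train two variants on the same fixed dataset, evaluate both on the same held-out split, and keep whichever achieves lower loss. Everything below is a toy stand-in (closed-form ridge regression, synthetic data); in the real setting the two "variants" would be the baseline net and the new architecture trained on actual KataGo data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a fixed supervised dataset: targets follow a
# hidden linear rule plus noise.
X = rng.normal(size=(512, 16))
w_true = rng.normal(size=16)
y = X @ w_true + 0.1 * rng.normal(size=512)

# Same train/validation split for both variants.
X_tr, y_tr, X_va, y_va = X[:384], y[:384], X[384:], y[384:]

def fit_and_eval(ridge):
    """A 'model variant' is just a ridge strength here; in the real
    comparison it would be the baseline vs. the candidate network."""
    w = np.linalg.solve(X_tr.T @ X_tr + ridge * np.eye(16), X_tr.T @ y_tr)
    return np.mean((X_va @ w - y_va) ** 2)  # held-out MSE

baseline_loss = fit_and_eval(ridge=100.0)   # heavily-shrunk "baseline"
candidate_loss = fit_and_eval(ridge=0.01)   # better-suited "candidate"
```

The decision rule is then simply: adopt the candidate if `candidate_loss < baseline_loss` by a convincing margin on data neither variant trained on.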
from katago.
I had to double-check to make sure, but this isn't a new type of neural network architecture per se. It's a method for selecting efficient hyperparameters (jointly scaling depth, width, and input resolution) for a given neural network architecture.
In other words, there is nothing preventing SE (or GCNet for that matter) AND EfficientNet from being used at the same time.
However, that being said, image recognition tasks are conceptually much simpler than Go. They are trained on labeled datasets and evaluated on a known test set, whereas in Go the training target is constantly changing as the net learns to play. I haven't read nearly enough of the paper to know whether that is a problem for the EfficientNet approach, though.
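To make the "hyperparameter method, not architecture" distinction concrete, here is a minimal sketch of the compound-scaling rule from the EfficientNet paper: one coefficient phi scales depth, width, and resolution together. The base values 20/256/224 are made up for illustration, and for Go the input resolution is fixed at the board size, so in practice only depth and width would scale:

```python
# Compound scaling: depth *= ALPHA**phi, width *= BETA**phi,
# resolution *= GAMMA**phi. The bases below are the ones grid-searched
# in the EfficientNet paper, chosen so ALPHA * BETA**2 * GAMMA**2 ~= 2
# (i.e., each +1 in phi roughly doubles the FLOP cost).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(base_depth, base_width, base_resolution, phi):
    """Return (depth, width, resolution) for scaling coefficient phi."""
    return (round(base_depth * ALPHA ** phi),
            round(base_width * BETA ** phi),
            round(base_resolution * GAMMA ** phi))
```

Because the rule only resizes an existing base network, it composes freely with blocks like SE or GCNet inside that network, which is the point being made above.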
from katago.
Trying out AutoML before applying EfficientNet seems reasonable here, though experiments with auxiliary targets/features and other RL-accelerating tricks have higher priority.
from katago.
Possibly relevant in the sense that it affects the model code and training data format: KataGo has added some new targets to help learn Japanese rules and is also experimenting with dropping batch norm entirely. By the end of this year, once the end-of-year run has actually started for real and is a healthy way along (despite some current delays in finalizing implementation details), I'll of course push all the new model code, trained models, and data.
from katago.