Comments (5)
Variance grows as (block index + 1): every time you have a (+) in the residual trunk, you are adding two things that we idealize as uncorrelated random vectors, and when you add two uncorrelated things, the variance of the sum is simply the sum of the variances. Since the output of each block is idealized to have variance 1 due to the normalization within that block, the total variance of the trunk activations after block i is i + 1.
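A minimal numpy sketch of this claim (the "blocks" here are just idealized as independent unit-variance noise, not actual KataGo layers):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two uncorrelated unit-variance "activations" (idealized block outputs).
a = rng.standard_normal(n)
b = rng.standard_normal(n)

# Var(a + b) = Var(a) + Var(b) when a and b are uncorrelated.
print(np.var(a + b))  # close to 2.0

# Accumulating over a residual trunk: after k skip connections, the
# running sum of k + 1 unit-variance terms has variance ~ k + 1.
trunk = rng.standard_normal(n)  # "input", variance 1
for k in range(1, 5):
    trunk = trunk + rng.standard_normal(n)  # add an uncorrelated block output
    print(k, np.var(trunk))                 # ~ k + 1
```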
from katago.
What if I calculate the real variance of the input instead? What will happen?
I updated https://github.com/lightvector/KataGo/blob/master/docs/KataGoMethods.md#fixed-variance-initialization-and-one-batch-norm with an additional diagram to make this more clear.
The point of this is to choose an initialization scale so that the entire operation the net performs is variance-preserving at initialization. If you use the real variance of the net's input instead of assuming it is 1, then the rule for each K would probably be to make the output of each normalized layer equal that real variance, so that the variance scale is constant through the whole net from input to output.
If you still idealize the properties of all the layers and sums, then it makes no difference: it simply scales the variances of all activations in the neural net proportionally, and the K for each normalization stays the same.
If you use the actual empirical variance of every layer in the net instead of idealizing it, and normalize by the empirical value (taking into account the effect of doing so on subsequent layers properly), and continually update it throughout training, then you basically have batch normalization, or something very similar to it, depending on your implementation details.
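A toy sketch of the fixed-K idea under the idealization above (this is an illustration, not KataGo's actual code): before block i the trunk's idealized variance is i + 1, so a fixed constant K_i = 1/sqrt(i + 1) rescales the block's input to variance 1 with no running statistics, in contrast to batch norm's empirical estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

trunk = rng.standard_normal(n)  # net input, idealized variance 1
for i in range(6):
    # Fixed scale chosen purely from the block index, never from data.
    k = 1.0 / np.sqrt(i + 1)
    block_in = k * trunk                       # variance ~1 by construction
    assert abs(np.var(block_in) - 1.0) < 0.05
    # Idealize the block body as variance-preserving and uncorrelated
    # with the trunk, so the skip connection adds one unit of variance.
    trunk = trunk + rng.standard_normal(n)

print(np.var(trunk))  # ~7, i.e. (number of blocks) + 1
```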
NestedBottleneckResBlock is different from the description in the introduction. Because use_repvgg_linear=True, it adds one more conv1x1 in NormActConv.
this is the code:
if self.conv1x1 is not None:
    out = self.conv(out) + self.conv1x1(out)
else:
    out = self.conv(out)
Yes, there are some details like that; good catch. In this case it is still equivalent at inference time to not having the 1x1 convolution at all, and in fact the C++ code doesn't have any 1x1 convolution there in the net that gets exported for self-play. You can add the 1x1 convolution weight directly into the center cell of the 3x3 convolution weights, and then it is exactly equivalent to perform just the 3x3 convolution, with the center cell of the 3x3 convolution effectively having a higher learning rate.
Edit: So mostly, you can consider this a pretty unimportant detail. The training of the net is almost the same if you set use_repvgg_linear=False, so it's not really worth mentioning in a section that discusses nested bottleneck residual blocks - whether we have the extra 1x1 or not is pretty orthogonal to what the overall architecture with the bottleneck blocks is accomplishing.
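The folding described above can be checked numerically. This is a single-channel numpy sketch (KataGo's actual layers are multi-channel PyTorch convolutions; conv2d, w3, and w1 here are illustrative names): because convolution is linear, adding the 1x1 weight into the center cell of the 3x3 kernel gives exactly the same output as running both branches and summing.

```python
import numpy as np

def conv2d(x, w):
    """'Same' cross-correlation of a 2D input with an odd-sized kernel,
    zero padding, stride 1."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
w3 = rng.standard_normal((3, 3))   # hypothetical 3x3 conv weight
w1 = rng.standard_normal((1, 1))   # hypothetical parallel 1x1 weight

# Train-time form: conv3x3(x) + conv1x1(x), as in the quoted snippet.
two_branch = conv2d(x, w3) + conv2d(x, w1)

# Inference-time fold: add the 1x1 weight into the center of the 3x3.
w_fused = w3.copy()
w_fused[1, 1] += w1[0, 0]
fused = conv2d(x, w_fused)

assert np.allclose(two_branch, fused)  # exactly equivalent
```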