
Comments (5)

lightvector avatar lightvector commented on August 15, 2024

The variance is proportional to the block index + 1, because every time you have a (+), you are adding two things that we idealize as uncorrelated random vectors. When you add two uncorrelated things, the variance of the sum is simply the sum of their variances. Since the output of each block is idealized to have variance 1 due to the normalization within that block, the total variance of the activations in the trunk is the block index + 1.
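A minimal simulation of this idealization, assuming each block's output is an independent unit-variance Gaussian and that the trunk starts at variance 1 before block 0 (the function name and the 0-based indexing are illustrative choices, not taken from the KataGo code):

```python
import random
import statistics


def trunk_variances(num_blocks, num_samples=20000, seed=0):
    """Empirical trunk variance before any block and after each residual add.

    Assumption: every block contributes an independent, zero-mean,
    unit-variance output, which is the idealization described above.
    """
    rng = random.Random(seed)
    # Trunk activations before the first block: variance about 1.
    trunk = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    variances = [statistics.pvariance(trunk)]
    for _ in range(num_blocks):
        # The (+) of a residual connection: trunk plus an uncorrelated block output.
        trunk = [x + rng.gauss(0.0, 1.0) for x in trunk]
        variances.append(statistics.pvariance(trunk))
    return variances


if __name__ == "__main__":
    for index, variance in enumerate(trunk_variances(4)):
        print(f"after {index} blocks: variance ~ {variance:.2f} (ideal {index + 1})")
```

Entry i of the returned list should land close to i + 1, up to sampling noise.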

from katago.

Nightbringers avatar Nightbringers commented on August 15, 2024

What if I calculate the real variance of the input? What will happen?


lightvector avatar lightvector commented on August 15, 2024

I updated https://github.com/lightvector/KataGo/blob/master/docs/KataGoMethods.md#fixed-variance-initialization-and-one-batch-norm with an additional diagram to make this more clear.

The point of this is to choose an initialization scale so that the entire operation the net is performing is variance-preserving at initialization. If you use the real variance of the input to the net instead of assuming the variance is 1, then the rule for each K would probably be to make the variance of the output of the normalized layer equal to whatever that real variance is, so that the variance scale is constant through the whole net from input to output.

If you still idealize the properties of all the layers and sums, then it makes no difference, because that will simply scale the variances of all activations in the neural net proportionally, and the K for each normalization will be the same.
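A small sketch of why the input variance cancels, assuming K is the scalar that rescales the trunk activations at a given block index back to the input's variance scale. Both function names and this definition of K are illustrative assumptions, not KataGo's actual API:

```python
import math


def trunk_variance(block_index, input_variance=1.0):
    """Idealized trunk variance: (block index + 1) times the input variance."""
    return (block_index + 1) * input_variance


def normalization_scale(block_index, input_variance=1.0):
    """Hypothetical K: the multiplier that brings the trunk back to the input variance.

    Multiplying activations by K multiplies their variance by K**2, so
    K = sqrt(target_variance / current_variance).
    """
    target_variance = input_variance
    return math.sqrt(target_variance / trunk_variance(block_index, input_variance))


if __name__ == "__main__":
    for input_variance in (1.0, 7.3):
        scales = [round(normalization_scale(i, input_variance), 4) for i in range(4)]
        print(input_variance, scales)
```

Under this idealization K reduces to 1 / sqrt(block index + 1) for any input variance, because the same factor appears in both the numerator and the denominator.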

If you use the actual empirical variance of every layer in the net instead of idealizing it, and normalize by the empirical value (taking into account the effect of doing so on subsequent layers properly), and continually update it throughout training, then you basically have batch normalization, or something very similar to it, depending on your implementation details.


Nightbringers avatar Nightbringers commented on August 15, 2024

NestedBottleneckResBlock differs from the description in the introduction. Because use_repvgg_linear=True, an extra 1x1 convolution is added in NormActConv.

This is the code:
    if self.conv1x1 is not None:
        out = self.conv(out) + self.conv1x1(out)
    else:
        out = self.conv(out)


lightvector avatar lightvector commented on August 15, 2024

Yes, there are some details like that, good catch. In this case, though, it is still equivalent at inference time to not having the 1x1 convolution at all, and in fact the C++ code doesn't have any 1x1 convolution there for the net that gets exported for self-play. You can add the 1x1 convolution weight directly into the center cell of the 3x3 convolution weights, and then it is exactly equivalent to performing just the 3x3 convolution, with the center cell of the 3x3 convolution having a higher learning rate.

Edit: So mostly, you can consider this a pretty unimportant detail. The training of the net is almost the same if you set use_repvgg_linear=False, so it's not really worth mentioning in a section that discusses nested bottleneck residual blocks - whether we have the extra 1x1 or not is pretty orthogonal to what the overall architecture with the bottleneck blocks is accomplishing.
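The weight merge described in this comment can be checked with a small pure-Python sketch. It uses a single channel and zero padding for simplicity; with real multi-channel layers the 1x1 weight would be an output-by-input channel matrix added to the center position of the 3x3 kernel. The helper names here are hypothetical and not taken from the KataGo code:

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    height, width = len(image), len(image[0])
    size = len(kernel)
    pad = size // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in range(size):
                for dx in range(size):
                    yy, xx = y + dy - pad, x + dx - pad
                    # Positions outside the image count as zeros.
                    if 0 <= yy < height and 0 <= xx < width:
                        total += kernel[dy][dx] * image[yy][xx]
            out[y][x] = total
    return out


def merge_1x1_into_3x3(kernel3x3, weight1x1):
    """Return a copy of the 3x3 kernel with the 1x1 weight added to its center cell."""
    merged = [row[:] for row in kernel3x3]
    merged[1][1] += weight1x1
    return merged


image = [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0], [2.0, 1.0, 1.0]]
kernel3x3 = [[0.5, -1.0, 0.25], [1.0, 2.0, -0.5], [0.0, 1.5, -2.0]]
weight1x1 = 0.75

# Training-time form: conv(out) + conv1x1(out).
branch3x3 = conv2d_same(image, kernel3x3)
branch1x1 = conv2d_same(image, [[weight1x1]])
two_branch = [[a + b for a, b in zip(row_a, row_b)]
              for row_a, row_b in zip(branch3x3, branch1x1)]

# Inference-time form: a single 3x3 convolution with the merged kernel.
merged = conv2d_same(image, merge_1x1_into_3x3(kernel3x3, weight1x1))
```

Both forms produce the same output because convolution is linear in its weights, and a 1x1 kernel touches only the position that the center cell of the 3x3 kernel touches.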

