I'm trying to train an image recognition network which should be invariant under SO(2).

escnn's conv, BN, relu is not equivariant? about escnn HOT 3 CLOSED

quva-lab commented on May 29, 2024

escnn's conv, BN, relu is not equivariant?

Comments (3)

Gabri95 commented on May 29, 2024

Hi @Guptajakala

I suspect the problem is that you are testing the invariance of your model's output when the model's output is not invariant but equivariant.
Indeed, your output type contains n_feat copies of a regular representation, i.e. your 16*4 dimensional output splits into 16 blocks of size 4. The 4 channels within each block permute when the input rotates.
Your test does not account for this.

To properly check for equivariance, you should "rotate back" the output out by n_rot.
You could do that by wrapping out in a new GeometricTensor with type out_type and then using the transform_fibers method.
In other words, you should replace out with

GeometricTensor(out, out_type).transform_fibers(gspace.fibergroup.element(-n_rot)).tensor

This is a bit verbose since you unwrapped the GeometricTensors and used cv2 to rotate.
The code is a bit shorter if you use one of our pooling operators (which return GeometricTensors) and loop over gspace.testing_elements() (which already returns a list of GroupElements).
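
To illustrate the "rotate back" idea without escnn, here is a minimal pure-Python sketch. It is a toy stand-in, not the library's implementation: it assumes the group C4 (90 degree rotations), a hypothetical 2x2 kernel and a 4x4 ramp image, and it builds one "regular" block of 4 channels by correlating with the 4 rotated copies of the kernel and max-pooling over space. All function names (rot90, max_response, regular_features, rotate_back) are made up for this example.

```python
def rot90(m):
    """Rotate a square matrix (list of lists) by 90 degrees counter-clockwise."""
    n = len(m)
    return [[m[j][n - 1 - i] for j in range(n)] for i in range(n)]


def max_response(img, ker):
    """Valid cross-correlation of img with ker, followed by a global spatial max."""
    n, k = len(img), len(ker)
    return max(
        sum(img[i + a][j + b] * ker[a][b] for a in range(k) for b in range(k))
        for i in range(n - k + 1)
        for j in range(n - k + 1)
    )


def regular_features(img, ker):
    """One block of 4 channels: channel r uses the kernel rotated r times."""
    feats = []
    for _ in range(4):
        feats.append(max_response(img, ker))
        ker = rot90(ker)
    return feats


def rotate_back(feats, n_rot):
    """Undo the cyclic channel shift caused by rotating the input n_rot times."""
    size = len(feats)
    return [feats[(s + n_rot) % size] for s in range(size)]


# Hypothetical toy data: a horizontal ramp image and an edge-like kernel.
image = [[j for j in range(4)] for _ in range(4)]
kernel = [[1, -1], [1, -1]]

out = regular_features(image, kernel)
rotated = image
for n_rot in range(1, 4):
    rotated = rot90(rotated)
    out_rot = regular_features(rotated, kernel)
    # First flag: naive invariance check. Second flag: check after rotating back.
    print(n_rot, out_rot == out, rotate_back(out_rot, n_rot) == out)
```

In this toy setting the naive comparison fails while the comparison after rotate_back succeeds, which is the same pattern the transform_fibers call is meant to handle in escnn.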

Hope this helps,
Gabriele

Guptajakala commented on May 29, 2024

@Gabri95 There is an AdaptiveAvgPool2d pooling layer at the end. After the equivariant conv, suppose the feature shape is (B,D,H,W). After pooling, isn't it (B,D,1,1) and thus invariant? I guess even if I "rotate back", the single scalar left in each channel's HW dimensions doesn't make any difference?

Gabri95 commented on May 29, 2024

Hi @Guptajakala

In this way, the output will be (approx) invariant to translations but not to rotations.
This is because the D channels in the output of shape (B, D,1,1) will rotate when the input rotates, since you chose out_type = enn.FieldType(gspace, [gspace.regular_repr]*n_feat).
What you say would be correct if you used out_type = enn.FieldType(gspace, [gspace.trivial_repr]*n_feat).

When using out_type = enn.FieldType(gspace, [gspace.regular_repr]*n_feat), you can think of the channels dimension as being features over the rotation subgroup.
Check our tutorial notebook for a more intuitive description of the features of Steerable CNNs.
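
As a rough illustration of why spatial pooling alone does not give rotation invariance for regular fields, the sketch below models a spatially pooled feature vector as blocks of 4 channels (again assuming C4, with made-up numbers and hypothetical helper names). A rotation of the input is modelled as a cyclic shift inside each block; taking the maximum over each block, which is the idea behind pooling over the group channels, removes that dependence.

```python
def shift_blocks(features, block_size, steps):
    """Cyclically shift the channels inside each block (toy model of a rotation)."""
    out = []
    for start in range(0, len(features), block_size):
        block = features[start:start + block_size]
        out.extend(block[(i - steps) % block_size] for i in range(block_size))
    return out


def group_max_pool(features, block_size):
    """Keep one value per block: the maximum over the group channels."""
    return [
        max(features[start:start + block_size])
        for start in range(0, len(features), block_size)
    ]


# Hypothetical pooled output of shape (D,) with D = 2 blocks * 4 channels.
features = [-2, 0, 2, 0, 5, 1, 3, 4]
pooled = group_max_pool(features, 4)

for steps in range(4):
    moved = shift_blocks(features, 4, steps)
    # First flag: are the raw channels unchanged? Second: is the pooled output unchanged?
    print(steps, moved == features, group_max_pool(moved, 4) == pooled)
```

The raw channels change for every non-trivial shift, while the per-block maximum does not. This is only a conceptual model; in escnn the same effect would be obtained by choosing trivial output fields or by pooling over the group channels before comparing outputs.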

Hope this helps!
Gabriele
