Comments (6)
@mli0603 see Section 4.2, 'Importance of positional encodings', in the paper for the architectural choices on where to pass positional encodings. There is also Table 3, in which the second row corresponds to the vanilla Transformer, where we pass positional encodings once at the transformer input; that is the variant you are referring to (also used in the demo colab). As we explain in the text, passing the encodings directly in attention leads to a significant performance boost.
from detr.
In the code you pointed to, the positional encodings are added in the first line of the function, see https://github.com/facebookresearch/detr/blob/master/models/transformer.py#L154. Can you elaborate?
@szagoruyko The residual connection is what I am referring to. If you look at line 157, the residual connection is made from the original input instead of the position-encoded input. I.e., currently the code reads
src = src + self.dropout1(src2)
while I think both this paper and the original Transformer paper describe it as
src = q + self.dropout1(src2)
where q is the input with the positional encoding added.
Does this clear things up? Sorry if my previous description was confusing.
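To make the two residual variants concrete, here is a minimal sketch assuming a standard `torch.nn.MultiheadAttention`. The function names `self_attn_block` and `self_attn_block_alt` are illustrative, not DETR's actual code; the first mirrors what line 157 does today, the second the variant asked about here.

```python
import torch
from torch import nn

def self_attn_block(src, pos, attn, dropout):
    # What the DETR code currently does: query/key carry the positional
    # encoding, but the residual is taken from the raw features `src`.
    q = k = src + pos
    src2 = attn(q, k, value=src)[0]
    return src + dropout(src2)  # residual from src

def self_attn_block_alt(src, pos, attn, dropout):
    # The variant discussed above: residual taken from the
    # position-encoded input q instead.
    q = k = src + pos
    src2 = attn(q, k, value=src)[0]
    return q + dropout(src2)  # residual from q
```

With dropout disabled, the two outputs differ by exactly `pos`, since the attention output itself is identical in both variants; only the residual path changes.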
@szagoruyko Another issue I see is that the positional encoding is added to src for every encoder/decoder layer in the for loop (https://github.com/facebookresearch/detr/blob/master/models/transformer.py#L76) by with_pos_embed (https://github.com/facebookresearch/detr/blob/master/models/transformer.py#L154). Is this necessary? In the paper, the positional encoding is only added once, which makes more sense to me.
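The per-layer re-injection being questioned can be sketched as follows. The `with_pos_embed` helper matches the linked line; the `encoder` loop is a simplified stand-in for DETR's encoder, not the actual implementation.

```python
def with_pos_embed(tensor, pos):
    # Re-adds the (fixed) positional encoding; a no-op when pos is None.
    return tensor if pos is None else tensor + pos

def encoder(src, pos, layers):
    # The same pos tensor is re-injected at the input of every layer,
    # rather than being added once before layer 0.
    out = src
    for layer in layers:
        out = layer(out, pos=pos)
    return out
```

The "once at the input" variant from the paper's Table 3 would instead compute `src = src + pos` a single time and run the loop without `pos`.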
@szagoruyko Thanks for your comment on the pos encoding! I am sorry, I completely misunderstood that paragraph in the paper. It makes sense now.
I still wonder whether there is an explicit design choice behind taking the residual connection from the image features src directly rather than from the position-encoded features q (my first question above). Maybe it is just too minor to make a difference? I really appreciate it.
That paragraph now makes much more sense. There also seems to be another deviation from the original Transformer: you apply the positional encoding only to the key and query, but not to the value. I did not see in the paper whether that choice also improves performance.
Even in the original Transformer paper, the value output from the encoder does not get the position-embedding treatment, so it makes sense to avoid it for all the values. Since we add the position (to k, q) in every layer, it is consistent not to add it to the value in any of them.
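The key/query-only choice can be sketched as a small module, again assuming a standard `torch.nn.MultiheadAttention`; the class name is illustrative, not from the DETR codebase.

```python
import torch
from torch import nn

class EncoderSelfAttn(nn.Module):
    # Positional encodings enter only the attention *addresses*
    # (query and key). The *content* being aggregated (value) stays
    # position-free, so the output is a weighted sum of raw features.
    def __init__(self, d_model=256, nhead=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead)

    def forward(self, src, pos):
        q = k = src + pos                     # where to attend
        out, _ = self.attn(q, k, value=src)   # what to mix: raw src
        return out
```

Here `pos` shifts the attention weights but never leaks into the mixed content, which is the consistency argument made above.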