
Comments (6)

szagoruyko commented on August 23, 2024

@mli0603 See section 4.2, 'Importance of positional encodings', in the paper for the architectural choices about where to pass positional encodings. There is also Table 3, in which the second row corresponds to a vanilla Transformer where we pass positional encodings once at the transformer input, the variant you are referring to (also used in the demo colab). As we explain in the text, passing the encodings into attention directly leads to a significant performance boost.
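A minimal PyTorch sketch of the design described above. Names like `with_pos_embed` follow the linked transformer.py, but the layer is heavily simplified and the hyperparameters are made up for illustration:

```python
import torch
import torch.nn as nn

class EncoderLayerSketch(nn.Module):
    """Simplified DETR-style encoder layer: positional encodings are
    injected into the attention inputs (q, k) at every layer, rather
    than being added once to the input sequence."""
    def __init__(self, d_model=32, nhead=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead)
        self.norm = nn.LayerNorm(d_model)

    def with_pos_embed(self, tensor, pos):
        # pos goes into queries and keys only, not into values
        return tensor if pos is None else tensor + pos

    def forward(self, src, pos=None):
        q = k = self.with_pos_embed(src, pos)
        src2, _ = self.self_attn(q, k, value=src)
        src = src + src2          # residual taken over src, not over q
        return self.norm(src)

seq_len, batch, d_model = 5, 2, 32
layer = EncoderLayerSketch(d_model)
out = layer(torch.randn(seq_len, batch, d_model),
            pos=torch.randn(seq_len, batch, d_model))
print(out.shape)  # torch.Size([5, 2, 32])
```

Because `pos` is passed into `forward` rather than baked into `src`, the same encodings can be re-supplied at every layer, which is the per-layer injection the paper's Table 3 compares against the once-at-input variant.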

from detr.

szagoruyko commented on August 23, 2024

In the code you pointed to, the positional encodings are added in the first line of the function; see https://github.com/facebookresearch/detr/blob/master/models/transformer.py#L154. Can you elaborate?


mli0603 commented on August 23, 2024

@szagoruyko The residual connection is what I am referring to. If you look at line 157, the residual connection is taken from the original input instead of the position-encoded input. That is, the code currently reads

src = src + self.dropout1(src2)

while I think both this paper and the original Transformer paper describe it as

src = q + self.dropout1(src2)

where q is the input with position encoding.

Does this clear things up a little bit? Sorry if the previous description was too confusing.
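The gap between the two variants can be checked with scalar stand-ins (plain numbers in place of tensors; this is only an illustration of the arithmetic, not the actual code):

```python
# Stand-ins: src is the input feature, pos the positional encoding,
# src2 the attention output that feeds the residual connection.
src, pos, src2 = 1.0, 0.25, 0.5

q = src + pos            # position-encoded input used as query/key
out_code = src + src2    # what the code does: residual over src
out_alt = q + src2       # the variant in the question: residual over q

# The two variants differ by exactly the positional encoding.
assert out_alt - out_code == pos
```

So the choice determines whether the positional signal accumulates in the residual stream or stays confined to the attention inputs.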


mli0603 commented on August 23, 2024

@szagoruyko Another issue I see is that the positional encoding is added to src in every encoder/decoder layer of the for loop (https://github.com/facebookresearch/detr/blob/master/models/transformer.py#L76) via with_pos_embed (https://github.com/facebookresearch/detr/blob/master/models/transformer.py#L154). Is this necessary? In the paper, the positional encoding is only added once, which makes more sense to me.
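A dependency-free sketch of the loop in question (`toy_layer` is a made-up stand-in for an encoder layer; it only records where `pos` enters):

```python
def with_pos_embed(x, pos):
    # Mirrors the helper in transformer.py: add pos if it is given.
    return x if pos is None else [xi + pi for xi, pi in zip(x, pos)]

def encoder_forward(layers, src, pos):
    output = src
    for layer in layers:
        output = layer(output, pos)  # same pos re-supplied every layer
    return output

injections = []
def toy_layer(src, pos):
    q = k = with_pos_embed(src, pos)  # pos enters q/k here, each layer
    injections.append(q)
    return src  # the residual stream is left unchanged in this toy

src = [0.0, 0.0]
pos = [1.0, 1.0]
out = encoder_forward([toy_layer] * 3, src, pos)
assert len(injections) == 3  # pos was injected in all 3 layers
assert out == src            # the stream itself never absorbs pos
```

The point of the sketch: because the residual connection is over `src`, re-adding `pos` per layer does not compound; each layer sees the same positional offset on its q/k.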


mli0603 commented on August 23, 2024

@szagoruyko Thanks for your comment on pos encoding! I am sorry I completely misunderstood that paragraph in the paper. It now makes sense.

I still wonder if there was any explicit design choice for the residual connection, between connecting from the image features src directly and from the position-encoded features q (my first question above). Maybe it is just too minor to make a difference? I really appreciate it.


dashesy commented on August 23, 2024

That paragraph now makes much more sense. It seems there is also another deviation from the original Transformer: you apply the positional encoding only to the key and query, but not to the value. I did not see in the paper whether that choice also improves performance.

Even in the original paper, the value output of the encoder does not get the position-embedding treatment, so it makes sense to avoid it for all the values. Since you add the position (to k, q) in all layers, it is consistent not to add it to the value in any of them.
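A small PyTorch sketch of that asymmetry (hyperparameters are arbitrary): the positional encoding shapes where attention looks, while the values being mixed stay position-free.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=16, num_heads=2)
src = torch.randn(5, 1, 16)   # (seq, batch, dim) content features
pos = torch.randn(5, 1, 16)   # positional encodings

q = k = src + pos               # pos decides *where* to attend
out, _ = attn(q, k, value=src)  # but *what* is mixed is pos-free content
print(out.shape)  # torch.Size([5, 1, 16])
```

With `value=src`, the attention output is a convex combination of content features only, so no positional signal leaks into the stream that later layers (or the decoder) consume as values.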

