Comments (3)

bowenc0221 commented on July 19, 2024

The slowdown comes mainly from using MSDeformAttn as the pixel decoder; multi-scale features and masked attention contribute much less. Using BiFPN as the pixel decoder can improve speed by approximately 25%, and using FPN can probably improve it by 40%.

Mask2Former is highly modular: you can change the backbone, pixel decoder, or Transformer decoder independently. If you care about speed, I would suggest making changes in the following order:

  1. Pixel decoder (use a more efficient FPN instantiation)
  2. Backbone (use a more efficient backbone design)
  3. Transformer decoder (reduce the number of decoder layers, or use single-scale low-resolution features)

Masked attention does not increase computation much and it is the most important component, so I would not suggest removing it to increase speed.
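As a rough sketch of step 1 (and optionally step 3), the snippet below builds detectron2-style `KEY VALUE` overrides. It assumes the config keys `MODEL.SEM_SEG_HEAD.PIXEL_DECODER_NAME` and `MODEL.MASK_FORMER.DEC_LAYERS` and the FPN-style decoder name `BasePixelDecoder` as they appear in the repository's YAML configs; please verify them against your checkout. The helper `speed_overrides` is hypothetical and not part of the repository.

```python
def speed_overrides(pixel_decoder="BasePixelDecoder", dec_layers=None):
    """Build a flat [KEY, VALUE, ...] list of config overrides.

    Hypothetical helper. The key names and the default decoder name are
    assumptions taken from the Mask2Former YAML configs; check your version.
    """
    # Step 1: swap the MSDeformAttn pixel decoder for a lighter FPN-style one.
    opts = ["MODEL.SEM_SEG_HEAD.PIXEL_DECODER_NAME", pixel_decoder]
    # Step 3 (optional): use fewer Transformer decoder layers.
    if dec_layers is not None:
        opts += ["MODEL.MASK_FORMER.DEC_LAYERS", str(dec_layers)]
    return opts


if __name__ == "__main__":
    # The resulting list could be passed to cfg.merge_from_list(...)
    # or appended to the training command line.
    print(" ".join(speed_overrides(dec_layers=7)))
```

A model trained with the default pixel decoder would most likely need retraining after such a change, since the pixel decoder weights differ. The backbone (step 2) is left out because the right override depends on which backbone you pick.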

lucasjinreal commented on July 19, 2024

@bowenc0221 Thanks, that's helpful. Do you mean that removing the original pixel decoder, which is based on masked attention, would speed things up but might reduce accuracy a lot?

bowenc0221 commented on July 19, 2024

No, masked attention is only used to replace cross-attention in the Transformer decoder (it does not matter what pixel decoder you use). I mean that masked attention is not much slower than cross-attention.
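To illustrate why the extra cost is small, here is a minimal pure-Python sketch of the idea, not the repository's implementation. Masked attention is ordinary cross-attention with a per-query boolean mask applied to the logits before the softmax. The function names and the `keep` argument are made up for this example. Falling back to full attention when a query's mask is empty is an assumption modeled on the behaviour described for Mask2Former.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(logits, keep=None):
    """Attention weights per query.

    logits: [num_queries][num_pixels] query-key scores.
    keep:   same shape, True where the query may attend (e.g. inside its
            predicted mask). None means plain cross-attention.
    """
    weights = []
    for q, row in enumerate(logits):
        # Assumption: a query whose mask is empty attends everywhere,
        # which also avoids a softmax over all -inf entries.
        if keep is not None and any(keep[q]):
            row = [x if k else -math.inf for x, k in zip(row, keep[q])]
        weights.append(softmax(row))
    return weights


if __name__ == "__main__":
    logits = [[0.0, 0.0, 0.0, 0.0]]
    print(attention_weights(logits))                                # cross-attention
    print(attention_weights(logits, [[True, True, False, False]]))  # masked attention
```

Compared with cross-attention, the only additional work in this sketch is one elementwise masking pass over the logits, which is consistent with masked attention being not much slower.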
