I have modified the backbone of Mask2Former to Vmamba, which requires the input size o

How should I fix the input size during testing? (mask2former, open issue)

klkl2164 commented on July 19, 2024

How should I fix the input size during testing?

from mask2former.

Comments (3)

zhengyuan-xie commented on July 19, 2024

Same question. I resize the images in the forward function during the inference period, but it is not elegant :(
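For reference, the resize-in-forward workaround can be sketched roughly as below. This is only a minimal sketch under assumptions: `ResizeAtInference` is a hypothetical wrapper name, the wrapped model is assumed to take a plain `B x C x H x W` tensor and return dense per-pixel logits, and `size=(512, 512)` is a placeholder for whatever fixed size the backbone needs. Mask2Former's real detectron2 forward takes a list of dicts, so this would need adapting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResizeAtInference(nn.Module):
    """Hypothetical wrapper: resize inputs to a fixed size, then resize outputs back."""

    def __init__(self, model, size=(512, 512)):
        super().__init__()
        self.model = model  # assumed: tensor in, dense logits (B, K, h, w) out
        self.size = size    # assumed fixed input size required by the backbone

    @torch.no_grad()
    def forward(self, images):
        # Remember the original spatial size so predictions can be mapped back.
        _, _, h, w = images.shape
        x = F.interpolate(images, size=self.size, mode="bilinear", align_corners=False)
        logits = self.model(x)
        # Resize the predictions back to the original resolution.
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)


# Usage with a stand-in model (a 1x1 conv producing 5 "class" channels).
wrapper = ResizeAtInference(nn.Conv2d(3, 5, kernel_size=1), size=(224, 224))
out = wrapper(torch.randn(2, 3, 300, 417))
print(out.shape)
```

The downside is that resizing changes the aspect ratio and the scale the model sees, which is probably why it feels inelegant.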


klkl2164 commented on July 19, 2024

I use HUST's ViM as the backbone (https://github.com/hustvl/Vim/blob/main/vim/models_mamba.py), in which PatchEmbed specifies the input size. I followed the Swin Transformer and added a padding operation, so inputs of non-fixed size can be used. Fortunately, neither ViM nor Mask2Former's pixel decoder places many requirements on the input size. You can try modifying PatchEmbed in this way:
```python
import torch.nn as nn
import torch.nn.functional as F
from timm.models.layers import to_2tuple


class PatchEmbedfromswintransformer(nn.Module):
    """Patch embedding that pads the input so non-fixed image sizes are accepted."""

    def __init__(self, img_size=224, patch_size=16, stride=16, in_chans=3, embed_dim=768, norm_layer=None, flatten=True):
        super().__init__()
        img_size = to_2tuple(img_size)
        patch_size = to_2tuple(patch_size)
        self.img_size = img_size  # nominal size; forward() also accepts other sizes
        self.patch_size = patch_size
        self.grid_size = ((img_size[0] - patch_size[0]) // stride + 1, (img_size[1] - patch_size[1]) // stride + 1)
        self.num_patches = self.grid_size[0] * self.grid_size[1]
        self.flatten = flatten

        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=stride)
        self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()

    def forward(self, x):
        """Forward function."""
        # Pad on the right and bottom so W and H become multiples of the patch size.
        _, _, H, W = x.size()
        if W % self.patch_size[1] != 0:
            x = F.pad(x, (0, self.patch_size[1] - W % self.patch_size[1]))
        if H % self.patch_size[0] != 0:
            x = F.pad(x, (0, 0, 0, self.patch_size[0] - H % self.patch_size[0]))

        x = self.proj(x)  # B C Wh Ww

        if self.flatten:
            x = x.flatten(2).transpose(1, 2)  # BCHW -> BNC
        x = self.norm(x)

        return x
```
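To make the padding arithmetic concrete, here is a small pure-Python sketch of what the `forward` above does to the spatial size. The helper names `padded_size` and `token_grid` are made up for illustration, and it assumes `stride == patch_size` as in the defaults.

```python
def padded_size(h, w, patch_size=(16, 16)):
    """Return (H, W, pad_h, pad_w) after padding bottom/right to patch multiples."""
    ph, pw = patch_size
    pad_h = (ph - h % ph) % ph  # 0 when h is already a multiple of ph
    pad_w = (pw - w % pw) % pw
    return h + pad_h, w + pad_w, pad_h, pad_w


def token_grid(h, w, patch_size=(16, 16)):
    """Number of patches along each axis, assuming stride == patch_size."""
    H, W, _, _ = padded_size(h, w, patch_size)
    return H // patch_size[0], W // patch_size[1]


# A 500 x 333 image is padded to 512 x 336, giving a 32 x 21 patch grid.
print(padded_size(500, 333))  # (512, 336, 12, 3)
print(token_grid(500, 333))   # (32, 21)
```

So the number of tokens now depends on the input size (672 here) instead of the fixed `num_patches` computed in `__init__`.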


zhengyuan-xie commented on July 19, 2024

Thanks!
