pytorch-deformable-convolution-v2's Introduction

PyTorch-Deformable-Convolution-v2

Use Deformable Convolution v2 (DCNv2) without the pain.

If you are curious about how to visualize the offsets (red points), refer to offset_visualization.py.

Usage

import torch.nn as nn
from dcn import DeformableConv2d

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        ...
        # Used in place of nn.Conv2d
        self.conv = DeformableConv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1)
        ...

Experiment

You can simply reproduce the results of my experiment on Google Colab.

Refer to experiment.ipynb!

Task

Scaled-MNIST Handwritten Digit Classification

Model

A simple CNN with five convolutional layers

import torch
import torch.nn as nn
from dcn import DeformableConv2d

class MNISTClassifier(nn.Module):
    def __init__(self, deformable=False):
        super().__init__()

        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1, bias=True)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1, bias=True)
        self.conv3 = nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1, bias=True)
        # Only the last two layers are swapped for DCNv2 when deformable=True
        conv = DeformableConv2d if deformable else nn.Conv2d
        self.conv4 = conv(32, 32, kernel_size=3, stride=1, padding=1, bias=True)
        self.conv5 = conv(32, 32, kernel_size=3, stride=1, padding=1, bias=True)

        self.pool = nn.MaxPool2d(2)
        self.gap = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(32, 10)
        
    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = self.pool(x) # [14, 14]
        x = torch.relu(self.conv2(x))
        x = self.pool(x) # [7, 7]
        x = torch.relu(self.conv3(x))
        x = torch.relu(self.conv4(x))
        x = torch.relu(self.conv5(x))
        x = self.gap(x)
        x = x.flatten(start_dim=1)
        x = self.fc(x)
        return x
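
When comparing the two variants of this model, it may help to know how many parameters each swapped layer carries. The sketch below is only an estimate: it assumes that DeformableConv2d consists of the main convolution plus two auxiliary convolutions of the same kernel size (one predicting 2*k*k offsets and one predicting k*k modulation values, both with bias). That layout is an assumption, not something stated in this README, and the helper names are hypothetical.

```python
def conv2d_params(in_ch, out_ch, k, bias=True):
    """Number of learnable parameters in a k x k 2-D convolution."""
    return in_ch * out_ch * k * k + (out_ch if bias else 0)


def deformable_conv2d_params(in_ch, out_ch, k, bias=True):
    """Assumed DCNv2 layout: main conv + offset conv + modulation-mask conv."""
    main = conv2d_params(in_ch, out_ch, k, bias)
    offsets = conv2d_params(in_ch, 2 * k * k, k)  # (dy, dx) per kernel point
    mask = conv2d_params(in_ch, k * k, k)         # one scalar per kernel point
    return main + offsets + mask


regular = conv2d_params(32, 32, 3)
deformable = deformable_conv2d_params(32, 32, 3)
print(regular, deformable)
```

Under these assumptions, each deformable layer has a bit less than twice the parameters of the regular layer it replaces.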

Training

  • Optimizer: Adam
  • Learning Rate: 1e-3
  • Learning Rate Scheduler: StepLR(step_size=1, gamma=0.7)
  • Batch Size: 64
  • Epochs: 14
  • Augmentation: NONE
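
With StepLR(step_size=1, gamma=0.7), the learning rate is multiplied by 0.7 after every epoch. A minimal sketch of the resulting schedule, using the closed form rather than the PyTorch scheduler itself (the function name step_lr is hypothetical):

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Learning rate at a given epoch under a step decay schedule."""
    return base_lr * gamma ** (epoch // step_size)


# Settings listed above: lr=1e-3, gamma=0.7, step_size=1, 14 epochs
schedule = [step_lr(1e-3, 0.7, 1, epoch) for epoch in range(14)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.2e}")
```

By the last epoch the learning rate has dropped by roughly two orders of magnitude.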

Test

In the paper, the authors state that DCNv2 considerably enhances a network's ability to model geometric transformations.

I verified this with scale augmentation.

All images in the MNIST test set are rescaled by factors of x0.5, x0.6, ..., x1.4, x1.5.
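
The exact rescaling procedure is in experiment.ipynb and is not described here. As a rough illustration only, the sketch below rescales an image about its centre with nearest-neighbour sampling while keeping the 28x28 canvas. Both choices (fixed canvas, nearest-neighbour) are assumptions, and rescale_about_center is a hypothetical helper.

```python
# The eleven scale factors listed above: 0.5, 0.6, ..., 1.5
SCALES = [s / 10 for s in range(5, 16)]


def rescale_about_center(img, scale):
    """Nearest-neighbour rescale of a 2-D list image, keeping the canvas size."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: which source pixel lands on output pixel (y, x)?
            sy = round(cy + (y - cy) / scale)
            sx = round(cx + (x - cx) / scale)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


# Stand-in for one MNIST digit: an all-ones 28x28 image
image = [[1] * 28 for _ in range(28)]
scaled = {s: rescale_about_center(image, s) for s in SCALES}
```

Factors below 1 shrink the digit and leave a zero border, while factors above 1 crop it to the original canvas.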

Results

Model        Top-1 Accuracy (%)
w/o DCNv2    90.03
w/ DCNv2     92.90

References

mxnet implementation

To-Do List

  • Support ONNX conversion

pytorch-deformable-convolution-v2's People

Contributors

developer0hye, redcof

pytorch-deformable-convolution-v2's Issues

About the kernel size

Hi, your code is concise and beautiful, but it does not seem to handle non-square kernel sizes given as a tuple, such as (3, 6). It might be better to add a check for tuples.
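
One way such a check could look is sketched below. The helpers to_pair and dcn_aux_channels are hypothetical and not part of this repository. The sketch assumes the offset branch needs two channels per kernel point and the modulation branch one.

```python
def to_pair(value):
    """Return (height, width) from an int or a 2-element tuple/list."""
    if isinstance(value, int):
        return (value, value)
    pair = tuple(value)
    if len(pair) != 2:
        raise ValueError(f"expected an int or a pair, got {value!r}")
    return pair


def dcn_aux_channels(kernel_size):
    """Channel counts for the (assumed) offset and modulation branches."""
    kh, kw = to_pair(kernel_size)
    return 2 * kh * kw, kh * kw


print(dcn_aux_channels(3))       # square kernel
print(dcn_aux_channels((3, 6)))  # non-square kernel
```

The same normalization would apply to stride, padding and dilation if they are also allowed to be tuples.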

NaN values in offset

In my model, the offset values become extremely large in magnitude and eventually turn into NaN, so my network does not train. How can I fix this? I see you use clamp() to limit the offset values, but it does not help in this case.

About visualization

What do the red dots mean? Are they the sampling locations?
When visualizing conv4, conv3 is a vanilla convolution, so it is reasonable to use the resize function to compute the receptive field of output_conv3. But when visualizing conv5, conv4 is a deformable convolution. Does that make the conv5 visualization incorrect?
Am I misunderstanding something?

How to understand offset in torchvision.ops.deform_conv2d?

As we know, the sampling grid of a standard 3x3 conv is {(-1,-1),...,(1,1)}. In a deformable conv with kernel size 3x3, the sampling grid is {(-1+p1_x,-1+p1_y),...,(1+p9_x,1+p9_y)}. For torchvision.ops.deform_conv2d, does each value in the offset map denote {p1_x,...,p9_x;p1_y,...,p9_y}, or is it {(-1+p1_x),...,(1+p9_x);(-1+p1_y),...,(1+p9_y)}? I look forward to your reply.
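
As far as I can tell from the torchvision kernel, the offset values are displacements that get added to the regular grid (the first reading above), and the channels are interleaved per kernel point as (dy, dx) pairs rather than stored as all x values followed by all y values. Treat both points as assumptions and verify them against your installed version, for example by passing all-zero offsets and comparing with a plain convolution. The pure-Python sketch below only illustrates that assumed layout for a single output location. sampling_points is a hypothetical helper, and it returns positions as (y, x) relative to the kernel centre.

```python
def sampling_points(offsets, kernel_size=3, dilation=1):
    """Sampling positions (y, x) for one output location.

    offsets: flat list of 2 * k * k values, assumed to be interleaved as
    (dy_0, dx_0, dy_1, dx_1, ...) with kernel points in row-major order.
    """
    half = (kernel_size - 1) // 2
    points = []
    for idx in range(kernel_size * kernel_size):
        i, j = divmod(idx, kernel_size)  # row and column of the kernel point
        base_y = (i - half) * dilation   # regular grid position
        base_x = (j - half) * dilation
        dy, dx = offsets[2 * idx], offsets[2 * idx + 1]
        points.append((base_y + dy, base_x + dx))
    return points


# Zero offsets reproduce the regular 3x3 grid (-1,-1) ... (1,1)
print(sampling_points([0.0] * 18))
```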

Reproducibility/Setting seed

Hi,
Thanks for your code.
The issue is not directly related to your code but to the torchvision.ops.deform_conv2d function that it wraps.
I was wondering whether it is possible to preserve reproducibility. Even after setting all the common torch/numpy/random seeds, I don't get the same results when using torchvision.ops.deform_conv2d.
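
For reference, a typical seeding helper is sketched below. seed_everything is a hypothetical name, and numpy and torch are imported only if they are installed. Seeding alone may not be enough: one possible cause, which I have not verified, is that the GPU backward pass of deform_conv2d accumulates gradients in a non-deterministic order. Comparing two runs on the CPU is one way to check this.

```python
import random


def seed_everything(seed):
    """Seed the common random number generators (a sketch, not exhaustive)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


seed_everything(0)
first = random.random()
seed_everything(0)
second = random.random()
print(first == second)
```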

Thanks again!

What is the torch version for this dcn?

First of all, thanks for your good work. I use torch 1.8.0, but the torchvision.ops.deform_conv2d() function has no 'mask' parameter. Is there any solution?
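
Note that deform_conv2d lives in torchvision, so the torchvision version matters more than the torch version here. I believe the mask argument arrived in torchvision 0.9, but please check the release notes for your setup. A quick way to test an installed function is to inspect its signature. accepts_param is a hypothetical helper, and the two stand-in functions below only imitate the old and new signatures.

```python
import inspect


def accepts_param(func, name):
    """Return True if func has a parameter called name."""
    return name in inspect.signature(func).parameters


# Stand-ins for illustration only. With torchvision installed you would run:
#   from torchvision.ops import deform_conv2d
#   accepts_param(deform_conv2d, "mask")
def old_style(input, offset, weight, bias=None):
    pass


def new_style(input, offset, weight, bias=None, mask=None):
    pass


print(accepts_param(old_style, "mask"), accepts_param(new_style, "mask"))
```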

How does this implementation differ from the one by Chengdazhi?

The implementation by Chengdazhi is old (2 years ago) and has build requirements. Your implementation, however, is simple and straightforward (thanks for that! :)). I just want to know what the difference between these two implementations is. Has PyTorch included operators in its latest version so that we no longer need to build anything from scratch for deformable convolutions?

DCNv2 performs worse than regular convolution.

Hey,

I'm working on vascular detection using representation learning based on medical ultrasound data.
My aim is to analyse if the use of DCNv2 has benefits compared to the use of regular convolution.

For that I created auto-encoders with and without deformable convolutional layers.

So far, using deformable convolution has led to worse results than using regular convolution.
That's not what I expected after reading multiple papers and articles that praise DCN (e.g. DCN for MRI classification, the msracver DCNv2 implementation, a towardsdatascience article).

I was hoping someone could give me some hints on how to improve the architecture of the auto-encoder so that training with deformable convolution yields better results.

Training specifications:

training_dataset size=45.000 % patches
patch_size=(24,24) % in pixels
batch_size=128
latent_space_dimension=128
epochs=500
learning_rate=0.001
normalize=false % patches have values between 0 and 255
loss_criterion=Mean_Squared_Error
optimizer=torch.optim.SGD()

Architectures:

The results of the training runs have been uploaded to Weights and Biases (see links).

regular convolution

Results of training using regular convolution.

class Cnn53maxNormDropModelEncoder(torch.nn.Module):
    def __init__(self, num_classes, bias):
        super(Cnn53maxNormDropModelEncoder, self).__init__()

        self.conv1 = self.conv1_block(1, 32)
        self.pool1 = torch.nn.MaxPool2d((2, 2), return_indices=True)
        self.conv2 = self.conv2_block(32, 64)
        self.pool2 = torch.nn.MaxPool2d((2, 2), return_indices=True)
        self.fc1 = torch.nn.Linear(1024, 128)
        self.relu = torch.nn.LeakyReLU()
        self.norm = torch.nn.BatchNorm1d(128)
        self.drop = torch.nn.Dropout(p=0.15)
        self.fc2 = torch.nn.Linear(128, num_classes)

    @staticmethod
    def conv1_block(in_c, out_c):
        return torch.nn.Sequential(
            torch.nn.Conv2d(in_c, out_c, kernel_size=(5, 5)),
            torch.nn.LeakyReLU()
        )

    @staticmethod
    def conv2_block(in_c, out_c):
        return torch.nn.Sequential(
            torch.nn.Conv2d(in_c, out_c, kernel_size=(3, 3)),
            torch.nn.LeakyReLU()
        )

    def forward(self, x):                   # 24
        out = self.conv1(x)                 # 20
        size1 = out.size()
        out, indices1 = self.pool1(out)     # 10

        out = self.conv2(out)               # 8
        size2 = out.size()
        out, indices2 = self.pool2(out)     # 4

        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu(out)

        out = self.norm(out)
        out = self.drop(out)
        out = self.fc2(out)

        return out, size1, size2, indices1, indices2


class Cnn53maxNormDropModelDecoder(torch.nn.Module):
    def __init__(self, num_classes, bias):
        super(Cnn53maxNormDropModelDecoder, self).__init__()
        self.fc2 = torch.nn.Linear(num_classes, 128)
        self.drop = torch.nn.Dropout(p=0.15)
        self.norm = torch.nn.BatchNorm1d(128)
        self.relu = torch.nn.LeakyReLU()
        self.fc1 = torch.nn.Linear(128, 1024)
        self.pool2 = torch.nn.MaxUnpool2d((2, 2))
        self.conv2 = self.conv2_block(64, 32)
        self.pool1 = torch.nn.MaxUnpool2d((2, 2))
        self.conv1 = self.conv1_block(in_c=32, out_c=1)

    @staticmethod
    def conv1_block(in_c, out_c):
        return torch.nn.Sequential(
            torch.nn.LeakyReLU(),
            torch.nn.ConvTranspose2d(in_c, out_c, kernel_size=(5, 5))
        )

    @staticmethod
    def conv2_block(in_c, out_c):
        return torch.nn.Sequential(
            torch.nn.LeakyReLU(),
            torch.nn.ConvTranspose2d(in_c, out_c, kernel_size=(3, 3))
        )

    def forward(self, encoded, size1, size2, pool1, pool2):
        out = self.fc2(encoded)
        out = self.drop(out)
        out = self.norm(out)

        out = self.relu(out)
        out = self.fc1(out)
        out = out.view(out.size(0), 64, 4, 4)

        out = self.pool2(out, pool2, output_size=size2)
        out = self.conv2(out)

        out = self.pool1(out, pool1, output_size=size1)
        out = self.conv1(out)

        return out


class Cnn53maxNormDropModel(torch.nn.Module):

    def __init__(self, model_encoder, model_decoder):
        super(Cnn53maxNormDropModel, self).__init__()
        self.encoder = model_encoder
        self.decoder = model_decoder

    def forward(self, x):
        encoded, size1, size2, indices1, indices2 = self.encoder(x)
        decoded = self.decoder(encoded, size1, size2, indices1, indices2)
        return decoded

deformable convolution

Swapping second regular convolutional layer for deformable convolution.
Results of training using deformable convolution.

class Dcn53dmaxNormDropModelEncoder(torch.nn.Module):
    def __init__(self, num_classes, bias):
        super(Dcn53dmaxNormDropModelEncoder, self).__init__()

        self.conv1 = self.conv1_block(1, 32)
        self.pool1 = torch.nn.MaxPool2d((2, 2), return_indices=True)
        self.conv2 = self.conv2_block(32, 64, bias)
        self.pool2 = torch.nn.MaxPool2d((2, 2), return_indices=True)
        self.fc1 = torch.nn.Linear(1024, 128)
        self.relu = torch.nn.LeakyReLU()
        self.norm = torch.nn.BatchNorm1d(128)
        self.drop = torch.nn.Dropout(p=0.15)
        self.fc2 = torch.nn.Linear(128, num_classes)

    @staticmethod
    def conv1_block(in_c, out_c):
        return torch.nn.Sequential(
            torch.nn.Conv2d(in_c, out_c, kernel_size=(5, 5)),
            torch.nn.LeakyReLU()
        )

    @staticmethod
    def conv2_block(in_c, out_c, bias):
        return torch.nn.Sequential(
            DeformableConv2d(in_c, out_c, kernel_size=3, stride=1, padding=0, bias=bias),
            torch.nn.LeakyReLU()
        )

    def forward(self, x):                   # 24
        out = self.conv1(x)                 # 20
        size1 = out.size()
        out, indices1 = self.pool1(out)     # 10

        out = self.conv2(out)               # 8
        size2 = out.size()
        out, indices2 = self.pool2(out)     # 4

        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu(out)

        out = self.norm(out)
        out = self.drop(out)
        out = self.fc2(out)

        return out, size1, size2, indices1, indices2


class Dcn53dmaxNormDropModelDecoder(torch.nn.Module):
    def __init__(self, num_classes, bias):
        super(Dcn53dmaxNormDropModelDecoder, self).__init__()
        self.fc2 = torch.nn.Linear(num_classes, 128)
        self.drop = torch.nn.Dropout(p=0.15)
        self.norm = torch.nn.BatchNorm1d(128)
        self.relu = torch.nn.LeakyReLU()
        self.fc1 = torch.nn.Linear(128, 1024)
        self.pool2 = torch.nn.MaxUnpool2d((2, 2))
        self.conv2 = self.conv2_block(64, 32, bias)
        self.pool1 = torch.nn.MaxUnpool2d((2, 2))
        self.conv1 = self.conv1_block(in_c=32, out_c=1)

    @staticmethod
    def conv1_block(in_c, out_c):
        return torch.nn.Sequential(
            torch.nn.LeakyReLU(),
            torch.nn.ConvTranspose2d(in_c, out_c, kernel_size=(5, 5))
        )

    @staticmethod
    def conv2_block(in_c, out_c, bias):
        return torch.nn.Sequential(
            torch.nn.LeakyReLU(),
            DeformableConv2d(in_c, out_c, kernel_size=3, stride=1, padding=2, bias=bias),
        )

    def forward(self, encoded, size1, size2, pool1, pool2):
        out = self.fc2(encoded)
        out = self.drop(out)
        out = self.norm(out)

        out = self.relu(out)
        out = self.fc1(out)
        out = out.view(out.size(0), 64, 4, 4)

        out = self.pool2(out, pool2, output_size=size2)
        out = self.conv2(out)

        out = self.pool1(out, pool1, output_size=size1)
        out = self.conv1(out)

        return out


class Dcn53dmaxNormDropModel(torch.nn.Module):

    def __init__(self, model_encoder, model_decoder):
        super(Dcn53dmaxNormDropModel, self).__init__()
        self.encoder = model_encoder
        self.decoder = model_decoder

    def forward(self, x):
        encoded, size1, size2, indices1, indices2 = self.encoder(x)
        decoded = self.decoder(encoded, size1, size2, indices1, indices2)
        return decoded

More training results:

I've also tried architectures without pooling that instead use a total of 5 convolutional layers with kernel size (5,5).
So I replaced:

    def forward(self, x):                   # 24
        out = self.conv1(x)                 # 20
        size1 = out.size()
        out, indices1 = self.pool1(out)     # 10

        out = self.conv2(out)               # 8
        size2 = out.size()
        out, indices2 = self.pool2(out)     # 4

with the following:

    def forward(self, x):                   # 24
        out = self.conv1(x)                 # 20
        out = self.conv2(out)               # 16
        out = self.conv3(out)               # 12
        out = self.conv4(out)               # 8
        out = self.conv5(out)               # 4

where each self.convX(out) is a block containing convolution, batch normalization and LeakyReLU.
Full architecture attached (without deformable: Cnn55555NormLeaky; with deformable: DcnEnc5d5d555NormLeaky).
models_sandbox.txt

Training results with regular convolutional layers are fine. Results CNN
Training results with two deformable layers for conv1 and conv2 (only in the encoder) are significantly worse. Results DCN

I'm just replacing the regular convolutional layers with deformable convolution. To me it seems as if there's something wrong with the deformable convolution layer but I don't know what I should change to make it work.

I've attached the DCNv2 implementation I'm using. It's the implementation by developer0hye.
dcn2d.txt

I'm relatively new to ML and might just overlook something pretty trivial...
Can you help me?
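
As a sanity check on the architectures in this issue, the spatial sizes annotated in the forward passes can be reproduced with the standard output-size formulas. The sketch assumes DeformableConv2d follows the same size formula as a regular convolution, and the helper names are hypothetical. It shows that the decoder's DeformableConv2d with padding=2 yields the same size as the ConvTranspose2d it replaces, so the shapes line up. It says nothing about why the results differ.

```python
def conv_out(n, k, stride=1, padding=0):
    """Output length of a convolution along one spatial axis."""
    return (n + 2 * padding - k) // stride + 1


def conv_transpose_out(n, k, stride=1, padding=0):
    """Output length of a transposed convolution along one spatial axis."""
    return (n - 1) * stride - 2 * padding + k


# Pooled encoder: 24 -> conv 5x5 -> 2x2 max-pool -> conv 3x3 -> 2x2 max-pool
after_conv1 = conv_out(24, 5)           # 20
after_pool1 = after_conv1 // 2          # 10
after_conv2 = conv_out(after_pool1, 3)  # 8
after_pool2 = after_conv2 // 2          # 4
flattened = 64 * after_pool2 ** 2       # 1024, the input width of fc1

# Decoder step after the first unpooling (4 -> 8): both variants reach 10
regular_decoder = conv_transpose_out(after_conv2, 3)      # ConvTranspose2d, k=3
deformable_decoder = conv_out(after_conv2, 3, padding=2)  # DeformableConv2d, padding=2

# Pooling-free encoder: five 5x5 convolutions without padding
sizes = [24]
for _ in range(5):
    sizes.append(conv_out(sizes[-1], 5))

print(flattened, regular_decoder, deformable_decoder, sizes)
```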

Visualization for deeper layers

Hi, thank you for your precise code for deformable convolutions. I was wondering if you would be able to help implement visualization for multi-layer deformable CNNs. The current visualization code has only one layer of deformable CNN. Does the current visualization script work if all the layers are replaced with deformable CNNs?

How to cite 'PyTorch-Deformable-Convolution-v2'?

Hey,

I've used your implementation in a work of mine.
Now I want to give credit to you. Unfortunately I don't know how to cite this repository.
GitHub has a feature that allows you to define how someone should cite your repository.

You might consider using that feature. :)

Best regards
DezzardHD

Deformable convolution visualization

Hi,
Thanks for your simple implementation! I wonder if we can visualize the learned sampling locations just as the authors have shown in their paper. Something like this:
[Image: Screenshot 2021-09-13 at 10 20 07]
I think it is related to your learned offsets and modulation mask, but I am not sure how to visualize them as in the paper.
