Don't feel pain to use Deformable Convolution

License: MIT License

Python 41.66% Jupyter Notebook 58.34%

pytorch dcnv2 dcnv1 dcn deformable-convolutional-networks deformable-convolution object-detection segmentation deep-learning convolutional-neural-networks

pytorch-deformable-convolution-v2's Introduction

PyTorch-Deformable-Convolution-v2

Don't feel pain to use Deformable Convolution v2 (DCNv2)

If you are curious about how the offsets (red points) are visualized, refer to offset_visualization.py.

Usage

import torch.nn as nn
from dcn import DeformableConv2d

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        ...  # other layers
        self.conv = DeformableConv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1)
        ...  # other layers

Experiment

You can easily reproduce the results of my experiment on Google Colab.

Refer to experiment.ipynb!

Task

Scaled-MNIST Handwritten Digit Classification

Model

Simple CNN Model including 5 conv layers

import torch
import torch.nn as nn
from dcn import DeformableConv2d

class MNISTClassifier(nn.Module):
    def __init__(self, deformable=False):
        super(MNISTClassifier, self).__init__()

        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1, bias=True)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1, bias=True)
        self.conv3 = nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1, bias=True)
        # Only conv4 and conv5 are swapped for DCNv2 when deformable=True
        conv = DeformableConv2d if deformable else nn.Conv2d
        self.conv4 = conv(32, 32, kernel_size=3, stride=1, padding=1, bias=True)
        self.conv5 = conv(32, 32, kernel_size=3, stride=1, padding=1, bias=True)
        
        self.pool = nn.MaxPool2d(2)
        self.gap = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(32, 10)
        
    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = self.pool(x) # [14, 14]
        x = torch.relu(self.conv2(x))
        x = self.pool(x) # [7, 7]
        x = torch.relu(self.conv3(x))
        x = torch.relu(self.conv4(x))
        x = torch.relu(self.conv5(x))
        x = self.gap(x)
        x = x.flatten(start_dim=1)
        x = self.fc(x)
        return x

Training

Optimizer: Adam
Learning Rate: 1e-3
Learning Rate Scheduler: StepLR(step_size=1, gamma=0.7)
Batch Size: 64
Epochs: 14
Augmentation: NONE
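With `StepLR(step_size=1, gamma=0.7)` the learning rate is multiplied by 0.7 after every epoch, assuming `scheduler.step()` is called once per epoch (the usual pattern; the notebook is not reproduced here). The arithmetic below lists the per-epoch learning rates for the 14 epochs above; the helper name `step_lr_schedule` is illustrative.

```python
from typing import List


def step_lr_schedule(base_lr: float, gamma: float, step_size: int, epochs: int) -> List[float]:
    """Learning rate in effect during each epoch, assuming scheduler.step() is called once per epoch."""
    return [base_lr * gamma ** (epoch // step_size) for epoch in range(epochs)]


lrs = step_lr_schedule(base_lr=1e-3, gamma=0.7, step_size=1, epochs=14)
for epoch, lr in enumerate(lrs, start=1):
    print(f"epoch {epoch:2d}: lr = {lr:.2e}")

# The matching PyTorch setup would be (sketch; `model` is not defined here):
#   optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
#   scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.7)
```

The 14th epoch therefore runs at about 9.7e-6, just under 1% of the starting rate.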

Test

In the DCNv2 paper, the authors state that DCNv2 considerably enhances the network's ability to model geometric transformations.

I tested this claim with scale augmentation.

All images in the MNIST test set are scale-augmented with factors x0.5, x0.6, ..., x1.4, x1.5.
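The README does not say how a rescaled digit is fitted back into a 28x28 image. One plausible approach, assumed here, is to resize with bilinear interpolation and then center-pad (zoom out) or center-crop (zoom in). The function name `rescale_and_fit` is hypothetical, and the zero padding assumes unnormalized images whose background is 0.

```python
import torch
import torch.nn.functional as F

# Scale factors 0.5, 0.6, ..., 1.5 (rounded to avoid floating-point drift)
SCALES = [round(0.5 + 0.1 * i, 1) for i in range(11)]


def rescale_and_fit(img: torch.Tensor, scale: float, size: int = 28) -> torch.Tensor:
    """img: [N, C, H, W]. Rescale the content by `scale`, then center-pad or center-crop to size x size."""
    new = max(1, round(size * scale))
    out = F.interpolate(img, size=(new, new), mode="bilinear", align_corners=False)
    if new < size:
        pad = size - new
        left = pad // 2
        # (left, right, top, bottom); zero padding assumes a black background
        out = F.pad(out, (left, pad - left, left, pad - left))
    elif new > size:
        start = (new - size) // 2
        out = out[..., start:start + size, start:start + size]
    return out
```

Evaluating each scale separately (rather than a random scale per image) makes it easy to see where each model starts to fail.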

Results

Model	Top-1 Accuracy (%)
w/o DCNv2	90.03
w/ DCNv2	92.90

References

mxnet implementation

To Do Lists

Support ONNX conversion

pytorch-deformable-convolution-v2's Issues

About the kernel size

Hi, your code is concise and beautiful, but it does not seem to handle non-square kernel sizes given as a tuple, such as (3, 6). It might be better to add a check for tuple inputs.

In my model, the offset values become extremely large or small and eventually turn into NaN, so my network stops working. How can I fix this? I see that you use the clamp() function to limit the offset values, but it does not help in this case.
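One way to support non-square kernels, as this issue suggests, is to normalize `kernel_size` to a `(k_h, k_w)` pair before sizing the offset and modulator convolutions. Since `torchvision.ops.deform_conv2d` takes the kernel height and width from the weight tensor, the main change is the channel count of those auxiliary convolutions. A sketch with hypothetical helper names (`to_pair` is similar in spirit to the `_pair` helper in `torch.nn.modules.utils`):

```python
from typing import Tuple, Union


def to_pair(value: Union[int, Tuple[int, int]]) -> Tuple[int, int]:
    """Accept 3 or (3, 6) style arguments."""
    if isinstance(value, int):
        return (value, value)
    if len(value) != 2:
        raise ValueError(f"expected an int or a pair of ints, got {value!r}")
    return (int(value[0]), int(value[1]))


def dcn_aux_channels(kernel_size) -> Tuple[int, int]:
    """Channels for the offset conv (2 per tap: dy, dx) and the modulator conv (1 per tap)."""
    k_h, k_w = to_pair(kernel_size)
    taps = k_h * k_w
    return 2 * taps, taps


print(dcn_aux_channels(3))       # (18, 9)
print(dcn_aux_channels((3, 6)))  # (36, 18)
```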

Why initialize the parameters to zero?

Thanks for your code.
I noticed that the code initializes the offset conv parameters to zero. Why?

How to calculate FLOPs of deformable convolution?

How can I calculate the FLOPs of a deformable convolution?
Thank you!
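There is no single official count, but a rough multiply-accumulate (MAC) estimate can be assembled from the parts of a DCNv2 layer: the main convolution, the offset and modulator convolutions, and the bilinear sampling. The sketch below assumes batch size 1, groups=1, no bias, auxiliary convolutions with the same kernel/stride/padding as the main one, and a cost of 4 MACs per bilinear sample plus 1 multiply for modulation; profilers may count differently, and FLOPs are often reported as 2 x MACs.

```python
from typing import Dict, Tuple


def dcnv2_macs(c_in: int, c_out: int, kernel_size: Tuple[int, int], h_out: int, w_out: int) -> Dict[str, int]:
    """Rough MAC count of one DCNv2 layer under the assumptions stated above."""
    k_h, k_w = kernel_size
    taps = k_h * k_w
    positions = h_out * w_out
    per_out_channel = c_in * taps * positions  # MACs per output channel of a dense k_h x k_w conv
    counts = {
        "main_conv": c_out * per_out_channel,
        "offset_conv": 2 * taps * per_out_channel,       # 2 * taps output channels
        "modulator_conv": taps * per_out_channel,        # taps output channels
        "sampling": c_in * taps * positions * (4 + 1),   # bilinear (4) + modulation (1)
    }
    counts["total"] = sum(counts.values())
    return counts


# Example: conv4 of MNISTClassifier (32 -> 32 channels, 3x3 kernel, 7x7 feature map)
print(dcnv2_macs(32, 32, (3, 3), 7, 7))
```

For this configuration the estimate is exactly twice the MACs of the plain 3x3 convolution (903,168 vs. 451,584).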

About visualization

What do the red dots mean? Do they mark the sampling locations?
When visualizing conv4, conv3 is a vanilla convolution, so it is reasonable to use a resize function to compute the receptive field of conv3's output. But when visualizing conv5, conv4 is a deformable convolution. Isn't something wrong with the conv5 visualization?
Or am I misunderstanding something?

I think there is a problem with your visualization code:

    offsets_y = offsets[:, :9]
    offsets_x = offsets[:, 9:]

It should be:

    offsets_y = offsets[:, ::2]
    offsets_x = offsets[:, 1::2]
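The suggested fix matches how `torchvision.ops.deform_conv2d` lays out its offset channels: they are interleaved per kernel tap, so channel 2k holds the vertical offset dy and channel 2k+1 the horizontal offset dx of tap k (taps in row-major order). A contiguous `[:9]` / `[9:]` split therefore mixes y and x components. The toy example below uses a flat Python list standing in for the 18 offset channels at one output location; on a tensor the same slices apply along dim 1. The offset values are hypothetical.

```python
KERNEL_TAPS = 3 * 3

# Toy offsets for one output location: tap k has dy = k and dx = -k (hypothetical values)
offsets = []
for k in range(KERNEL_TAPS):
    offsets.extend([k, -k])  # interleaved layout: dy_0, dx_0, dy_1, dx_1, ...

offsets_y = offsets[::2]   # dy of taps 0..8
offsets_x = offsets[1::2]  # dx of taps 0..8

wrong_y = offsets[:9]      # [0, 0, 1, -1, 2, -2, 3, -3, 4]: dy and dx mixed together
wrong_x = offsets[9:]

print(offsets_y)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(offsets_x)  # [0, -1, -2, -3, -4, -5, -6, -7, -8]
```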

Some questions about this nice work

modulator = 2. * torch.sigmoid(self.modulator_conv(x))
What is the reason for 2. * torch.sigmoid?
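The repository does not document the reason, so the following is an interpretation. If the modulator convolution is zero-initialized like the offset convolution, its output starts at 0 and `2 * sigmoid(0) = 1`, so every sampling point starts with weight exactly 1 and the layer begins training as the underlying regular convolution. A plain `sigmoid` (range 0 to 1; as I understand it, the DCNv2 paper uses this with an initial value of 0.5) would instead halve every sample at initialization. The factor 2 also widens the range to (0, 2). The helper names below are illustrative.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def modulator(z: float, scale: float = 2.0) -> float:
    """Modulation scalar as in `scale * torch.sigmoid(...)`; z is the modulator conv output."""
    return scale * sigmoid(z)


print(modulator(0.0))             # 1.0: zero-initialized conv -> plain convolution behaviour
print(modulator(0.0, scale=1.0))  # 0.5: plain sigmoid halves every sample at init
print(modulator(-10.0), modulator(10.0))  # close to the bounds 0 and 2
```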

What's the difference from torchvision.ops.DeformConv2d?

torchvision provides torchvision.ops.deform_conv.
What is the difference?

How to understand offset in torchvision.ops.deform_conv2d?

As we know, the sampling grid of a standard 3×3 conv is {(-1,-1), ..., (1,1)}. In a deformable conv with a 3×3 kernel, the sampling grid is {(-1+p1_x, -1+p1_y), ..., (1+p9_x, 1+p9_y)}. For torchvision.ops.deform_conv2d, does each value in the offset map denote {p1_x, ..., p9_x; p1_y, ..., p9_y}, or {(-1+p1_x), ..., (1+p9_x); (-1+p1_y), ..., (1+p9_y)}? Looking forward to your reply.
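In `torchvision.ops.deform_conv2d` the offset tensor holds only the learned displacements (the p_x, p_y terms above); the regular grid (-1, -1), ..., (1, 1) is added internally, so tap k at output position p0 samples p0 + p_k + Δp_k (ignoring stride, padding and dilation). The pure-Python sketch below computes these locations and samples a fractional location with bilinear interpolation. It illustrates the formula and is not torchvision's kernel; the function names are hypothetical.

```python
import math

# Regular 3x3 grid p_k as (dy, dx), in row-major tap order
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def sampling_locations(p0, offsets):
    """p0 = (y, x); offsets = interleaved [dy_0, dx_0, dy_1, dx_1, ...] holding only the learned shifts."""
    y0, x0 = p0
    return [(y0 + gy + offsets[2 * k], x0 + gx + offsets[2 * k + 1]) for k, (gy, gx) in enumerate(GRID)]


def bilinear(image, y, x):
    """Bilinearly sample a 2-D list; neighbours outside the image contribute 0."""
    y_low, x_low = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y_low, y_low + 1):
        for xx in (x_low, x_low + 1):
            if 0 <= yy < len(image) and 0 <= xx < len(image[0]):
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * image[yy][xx]
    return value


print(sampling_locations((2, 2), [0.0] * 18))  # zero offsets: plain 3x3 grid from (1, 1) to (3, 3)

image = [[float(10 * r + c) for c in range(5)] for r in range(5)]
print(bilinear(image, 1.5, 2.0))  # halfway between image[1][2]=12 and image[2][2]=22 -> 17.0
```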

Reproducibility / setting seeds

Hi,
Thanks for your code.
This issue is not directly related to your code but to the torchvision.ops.deform_conv2d function that it wraps.
I was wondering whether it is possible to preserve reproducibility. Even when I set all the common torch/numpy/random seeds, I don't get the same results when using torchvision.ops.deform_conv2d.

Thanks again!

Which torch version does this DCN require?

First, thanks for your good work. I use torch 1.8.0, but torchvision.ops.deform_conv2d() has no 'mask' parameter. Is there any solution?

How does this implementation differ from the one by Chengdazhi?

The implementation by Chengdazhi is old (from two years ago) and has build requirements. Your implementation, however, is simple, easy, and straightforward (thanks for that! :) ). I'd just like to know the difference between these two implementations. Also, has PyTorch included these operators in its latest version, so that we no longer need to build anything from scratch for deformable convolutions?

What's the meaning of modulator = 2. * torch.sigmoid(self.modulator_conv(x))?

In dcn.py, function forward(self, x):

modulator = 2. * torch.sigmoid(self.modulator_conv(x))

Why is the result of the sigmoid multiplied by 2? What does it mean?

DCNv2 performs worse than regular convolution.

Hey,

I'm working on vascular detection using representation learning based on medical ultrasound data.
My aim is to analyse if the use of DCNv2 has benefits compared to the use of regular convolution.

For that I created auto-encoders with and without deformable convolutional layers.

So far, using deformable convolution has led to worse results than using regular convolution.
That's not what I expected after reading multiple papers and articles praising DCN (e.g. DCN for MRI classification, msracver DCNv2 implementation, towardsdatascience article).

I was hoping someone could give me some hints on how to improve the architecture of the auto-encoder so that training with deformable convolution yields better results.

Training specifications:

training_dataset_size = 45,000 patches
patch_size = (24, 24) pixels
batch_size = 128
latent_space_dimension = 128
epochs = 500
learning_rate = 0.001
normalize = false (patch values lie between 0 and 255)
loss_criterion = Mean Squared Error
optimizer = torch.optim.SGD()

Architectures:

The results of the training runs have been uploaded to Weights and Biases (see links).

regular convolution

Results of training using regular convolution.

import torch


class Cnn53maxNormDropModelEncoder(torch.nn.Module):
    def __init__(self, num_classes, bias):
        super(Cnn53maxNormDropModelEncoder, self).__init__()

        self.conv1 = self.conv1_block(1, 32)
        self.pool1 = torch.nn.MaxPool2d((2, 2), return_indices=True)
        self.conv2 = self.conv2_block(32, 64)
        self.pool2 = torch.nn.MaxPool2d((2, 2), return_indices=True)
        self.fc1 = torch.nn.Linear(1024, 128)
        self.relu = torch.nn.LeakyReLU()
        self.norm = torch.nn.BatchNorm1d(128)
        self.drop = torch.nn.Dropout(p=0.15)
        self.fc2 = torch.nn.Linear(128, num_classes)

    @staticmethod
    def conv1_block(in_c, out_c):
        return torch.nn.Sequential(
            torch.nn.Conv2d(in_c, out_c, kernel_size=(5, 5)),
            torch.nn.LeakyReLU()
        )

    @staticmethod
    def conv2_block(in_c, out_c):
        return torch.nn.Sequential(
            torch.nn.Conv2d(in_c, out_c, kernel_size=(3, 3)),
            torch.nn.LeakyReLU()
        )

    def forward(self, x):                   # 24
        out = self.conv1(x)                 # 20
        size1 = out.size()
        out, indices1 = self.pool1(out)     # 10

        out = self.conv2(out)               # 8
        size2 = out.size()
        out, indices2 = self.pool2(out)     # 4

        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu(out)

        out = self.norm(out)
        out = self.drop(out)
        out = self.fc2(out)

        return out, size1, size2, indices1, indices2


class Cnn53maxNormDropModelDecoder(torch.nn.Module):
    def __init__(self, num_classes, bias):
        super(Cnn53maxNormDropModelDecoder, self).__init__()
        self.fc2 = torch.nn.Linear(num_classes, 128)
        self.drop = torch.nn.Dropout(p=0.15)
        self.norm = torch.nn.BatchNorm1d(128)
        self.relu = torch.nn.LeakyReLU()
        self.fc1 = torch.nn.Linear(128, 1024)
        self.pool2 = torch.nn.MaxUnpool2d((2, 2))
        self.conv2 = self.conv2_block(64, 32)
        self.pool1 = torch.nn.MaxUnpool2d((2, 2))
        self.conv1 = self.conv1_block(in_c=32, out_c=1)

    @staticmethod
    def conv1_block(in_c, out_c):
        return torch.nn.Sequential(
            torch.nn.LeakyReLU(),
            torch.nn.ConvTranspose2d(in_c, out_c, kernel_size=(5, 5))
        )

    @staticmethod
    def conv2_block(in_c, out_c):
        return torch.nn.Sequential(
            torch.nn.LeakyReLU(),
            torch.nn.ConvTranspose2d(in_c, out_c, kernel_size=(3, 3))
        )

    def forward(self, encoded, size1, size2, pool1, pool2):
        out = self.fc2(encoded)
        out = self.drop(out)
        out = self.norm(out)

        out = self.relu(out)
        out = self.fc1(out)
        out = out.view(out.size(0), 64, 4, 4)

        out = self.pool2(out, pool2, output_size=size2)
        out = self.conv2(out)

        out = self.pool1(out, pool1, output_size=size1)
        out = self.conv1(out)

        return out


class Cnn53maxNormDropModel(torch.nn.Module):

    def __init__(self, model_encoder, model_decoder):
        super(Cnn53maxNormDropModel, self).__init__()
        self.encoder = model_encoder
        self.decoder = model_decoder

    def forward(self, x):
        encoded, size1, size2, indices1, indices2 = self.encoder(x)
        decoded = self.decoder(encoded, size1, size2, indices1, indices2)
        return decoded

deformable convolution

Swapping the second regular convolutional layer (conv2 in both the encoder and the decoder) for a deformable convolution.
Results of training using deformable convolution.

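import torch
from dcn import DeformableConv2d  # assumes the DCNv2 implementation is importable as dcn.py


class Dcn53dmaxNormDropModelEncoder(torch.nn.Module):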
    def __init__(self, num_classes, bias):
        super(Dcn53dmaxNormDropModelEncoder, self).__init__()

        self.conv1 = self.conv1_block(1, 32)
        self.pool1 = torch.nn.MaxPool2d((2, 2), return_indices=True)
        self.conv2 = self.conv2_block(32, 64, bias)
        self.pool2 = torch.nn.MaxPool2d((2, 2), return_indices=True)
        self.fc1 = torch.nn.Linear(1024, 128)
        self.relu = torch.nn.LeakyReLU()
        self.norm = torch.nn.BatchNorm1d(128)
        self.drop = torch.nn.Dropout(p=0.15)
        self.fc2 = torch.nn.Linear(128, num_classes)

    @staticmethod
    def conv1_block(in_c, out_c):
        return torch.nn.Sequential(
            torch.nn.Conv2d(in_c, out_c, kernel_size=(5, 5)),
            torch.nn.LeakyReLU()
        )

    @staticmethod
    def conv2_block(in_c, out_c, bias):
        return torch.nn.Sequential(
            DeformableConv2d(in_c, out_c, kernel_size=3, stride=1, padding=0, bias=bias),
            torch.nn.LeakyReLU()
        )

    def forward(self, x):                   # 24
        out = self.conv1(x)                 # 20
        size1 = out.size()
        out, indices1 = self.pool1(out)     # 10

        out = self.conv2(out)               # 8
        size2 = out.size()
        out, indices2 = self.pool2(out)     # 4

        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu(out)

        out = self.norm(out)
        out = self.drop(out)
        out = self.fc2(out)

        return out, size1, size2, indices1, indices2


class Dcn53dmaxNormDropModelDecoder(torch.nn.Module):
    def __init__(self, num_classes, bias):
        super(Dcn53dmaxNormDropModelDecoder, self).__init__()
        self.fc2 = torch.nn.Linear(num_classes, 128)
        self.drop = torch.nn.Dropout(p=0.15)
        self.norm = torch.nn.BatchNorm1d(128)
        self.relu = torch.nn.LeakyReLU()
        self.fc1 = torch.nn.Linear(128, 1024)
        self.pool2 = torch.nn.MaxUnpool2d((2, 2))
        self.conv2 = self.conv2_block(64, 32, bias)
        self.pool1 = torch.nn.MaxUnpool2d((2, 2))
        self.conv1 = self.conv1_block(in_c=32, out_c=1)

    @staticmethod
    def conv1_block(in_c, out_c):
        return torch.nn.Sequential(
            torch.nn.LeakyReLU(),
            torch.nn.ConvTranspose2d(in_c, out_c, kernel_size=(5, 5))
        )

    @staticmethod
    def conv2_block(in_c, out_c, bias):
        return torch.nn.Sequential(
            torch.nn.LeakyReLU(),
            DeformableConv2d(in_c, out_c, kernel_size=3, stride=1, padding=2, bias=bias),
        )

    def forward(self, encoded, size1, size2, pool1, pool2):
        out = self.fc2(encoded)
        out = self.drop(out)
        out = self.norm(out)

        out = self.relu(out)
        out = self.fc1(out)
        out = out.view(out.size(0), 64, 4, 4)

        out = self.pool2(out, pool2, output_size=size2)
        out = self.conv2(out)

        out = self.pool1(out, pool1, output_size=size1)
        out = self.conv1(out)

        return out


class Dcn53dmaxNormDropModel(torch.nn.Module):

    def __init__(self, model_encoder, model_decoder):
        super(Dcn53dmaxNormDropModel, self).__init__()
        self.encoder = model_encoder
        self.decoder = model_decoder

    def forward(self, x):
        encoded, size1, size2, indices1, indices2 = self.encoder(x)
        decoded = self.decoder(encoded, size1, size2, indices1, indices2)
        return decoded

More training results:

I've also tried architectures without pooling, using instead a total of five convolutional layers with kernel size (5,5).
So I replaced:

    def forward(self, x):                   # 24
        out = self.conv1(x)                 # 20
        size1 = out.size()
        out, indices1 = self.pool1(out)     # 10

        out = self.conv2(out)               # 8
        size2 = out.size()
        out, indices2 = self.pool2(out)     # 4

with the following:

    def forward(self, x):                   # 24
        out = self.conv1(x)                 # 20
        out = self.conv2(out)               # 16
        out = self.conv3(out)               # 12
        out = self.conv4(out)               # 8
        out = self.conv5(out)               # 4

where each self.convX(out) is a block containing convolution, batch normalization and LeakyReLU.
Full architecture attached. (without deformable: Cnn55555NormLeaky AND with deformable: DcnEnc5d5d555NormLeaky)
models_sandbox.txt

Training results with regular convolutional layers are fine. Results CNN
Training results with two deformable layers for conv1 and conv2 (only in the encoder) are significantly worse. Results DCN

I'm only replacing the regular convolutional layers with deformable convolutions. To me it seems as if something is wrong with the deformable convolution layer, but I don't know what to change to make it work.

I've attached the DCNv2 implementation I'm using. It's the implementation by developer0hye.
dcn2d.txt

I'm relatively new to ML and might just be overlooking something trivial...
Can you help me?

Visualization for deeper layers

Hi, thank you for your precise code for deformable convolutions. I was wondering whether you could help implement visualization for multi-layer deformable CNNs. Your current visualization code covers only one deformable layer. Does the current visualization script still work if all the layers are replaced with deformable convolutions?

TypeError: deform_conv2d() got an unexpected keyword argument 'mask'

For anyone facing this issue: deform_conv2d() got an unexpected keyword argument 'mask'

Please note that you need to upgrade your torchvision package to 0.9.1:

pip install torchvision==0.9.1
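If the same code has to run in several environments, one option is to check the installed torchvision version before relying on `mask` (added to `deform_conv2d` in torchvision 0.9). A small sketch; the helper names `parse_version` and `supports_deform_mask` are hypothetical, and local suffixes such as `+cu101` are ignored.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '0.9.1', '0.8.2+cu101' or '0.15.0a0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_deform_mask(torchvision_version: str) -> bool:
    """True if torchvision.ops.deform_conv2d should accept the `mask` argument (0.9 or newer)."""
    return parse_version(torchvision_version) >= (0, 9)


# Typical use: supports_deform_mask(torchvision.__version__)
print(supports_deform_mask("0.8.2+cu101"))  # False: upgrade torchvision
print(supports_deform_mask("0.9.1"))        # True
```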

How to cite 'PyTorch-Deformable-Convolution-v2'?

Hey,

I've used your implementation in a work of mine.
Now I want to give credit to you. Unfortunately I don't know how to cite this repository.
GitHub has a feature that allows you to define how someone should cite your repository.

You might consider using that feature. :)

Best regards
DezzardHD

About clamping offsets

Thanks for your repo!
I'm curious about the line you commented out:
https://github.com/developer0hye/Simple-PyTorch-Deformable-Convolution-v2/blob/5578b559e497cc0fd0dc452f6dab9c421f1f8463/dcn.py#L50
Is clamping offsets necessary? Will it stabilize the training?
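Whether clamping helps is task-dependent and is not settled by the repository; the commented-out line only shows it was tried. A sketch of the idea, plus one side effect worth knowing: `torch.clamp` passes no gradient to elements outside the range, so a clamped offset stops receiving updates from the loss, and clamping cannot repair an offset that has already become NaN (consistent with the report above that clamp() did not help). The bound (a quarter of the larger spatial side) and the helper name `clamp_offsets` are assumptions for illustration.

```python
import torch


def clamp_offsets(offset: torch.Tensor, height: int, width: int, fraction: float = 0.25) -> torch.Tensor:
    """Limit offsets to +/- fraction * max(height, width) pixels (fraction is an assumed hyperparameter)."""
    max_offset = fraction * max(height, width)
    return offset.clamp(-max_offset, max_offset)


offset = torch.tensor([-10.0, 0.5, 10.0], requires_grad=True)
clamped = clamp_offsets(offset, height=8, width=8)  # bound = 2.0
clamped.sum().backward()
print(clamped.detach())  # values -2.0, 0.5, 2.0
print(offset.grad)       # gradients 0, 1, 0: clamped entries get no gradient
```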

Error when setting stride to 2

Hello, thanks for your repo.

There is a problem when I set stride to 2.

Can you figure out what the problem is? (I cannot solve it...)
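The issue does not include the error message, so this is only a likely explanation. The offset and modulator tensors must have the same spatial size as the output of the deformable convolution. If the offset/modulator convolutions use stride 2 but `deform_conv2d` is called without `stride=...` (so it defaults to 1), the sizes disagree and the call fails. The standard output-size formula makes the mismatch concrete; the helper name `conv_out_size` is illustrative.

```python
def conv_out_size(size: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Spatial output size of a convolution (same formula as torch.nn.Conv2d)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


offset_map = conv_out_size(28, kernel=3, stride=2, padding=1)  # offset conv built with stride 2
deform_out = conv_out_size(28, kernel=3, stride=1, padding=1)  # deform_conv2d left at stride 1
print(offset_map, deform_out)  # 14 28: the offset map does not match the output size
```

Passing the same stride to `deform_conv2d` as to the offset and modulator convolutions keeps both sizes equal.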

Deformable convolution visualization

Hi,
Thanks for your simple implementation! I wonder whether we can visualize the learned sampling locations the way the authors show them in the paper.

I think this is related to the learned offsets and modulation mask, but I am not sure how to visualize them as in the paper.

DCNv2 modulator weight

PyTorch-Deformable-Convolution-v2/dcn.py, line 54 in 4d958eb:

modulator = 2. * torch.sigmoid(self.modulator_conv(x))

I have a question about the modulator weight. Why do you multiply the result of sigmoid() by 2 instead of 1? Shouldn't the modulator range from 0 to 1?

How to use DCNv1 only

I just want to use DCN version 1. How can I do this?

developer0hye / pytorch-deformable-convolution-v2