miguelvr / dropblock
Implementation of DropBlock: A regularization method for convolutional networks in PyTorch.
License: MIT License
Hi,
did you implement the scheduled DropBlock mentioned in the paper?
"In our experiments, we use a linear scheme of decreasing the value of keep_prob, which tends to work well across many hyperparameter settings. This linear scheme is similar to ScheduledDropPath."
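For reference, the package exposes a LinearScheduler wrapper (it also appears in the traced-module example further down this page); a minimal sketch of wiring it up, with illustrative values for block_size and nr_steps:

import torch
from dropblock import DropBlock2D, LinearScheduler

# drop_prob ramps linearly from 0.0 to 0.25 over the first 5000 steps
drop_block = LinearScheduler(
    DropBlock2D(block_size=5, drop_prob=0.),
    start_value=0.,
    stop_value=0.25,
    nr_steps=5000,
)

for batch in range(3):  # stand-in training loop
    drop_block.step()   # advance the schedule once per iteration
    out = drop_block(torch.randn(4, 16, 32, 32))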
Do you have a pre-trained model released?
Or is it OK to just change the dropout to DropBlock in a pre-trained ResNet-50?
In this part, drop_prob in DropBlock2D(block_size=3, drop_prob=0.) is set to 0. and stop_value is set to 0.25. But in the example in resnet-cifar10.py (https://github.com/miguelvr/dropblock/tree/master/examples), drop_prob in DropBlock2D is set as follows:
The parameters drop_prob and stop_value both take the identical setting "drop_prob", whereas in the first example drop_prob and stop_value are 0. and 0.25 respectively. Which one is right? Thank you very much.
Description:
When the Bernoulli distribution samples two ones within their own block size range, the block mask gets a negative value.
Example:
import torch
from dropblock import DropBlock2D

db = DropBlock2D(block_size=2, drop_prob=0.1)
mask = torch.tensor([[[1., 0., 0., 0., 0.],
                      [0., 1., 0., 0., 0.],
                      [0., 0., 0., 0., 0.],
                      [0., 0., 0., 0., 0.],
                      [0., 0., 0., 0., 0.]]])
block_mask = db._compute_block_mask(mask)
block_mask
value:
tensor([[[ 1.,  1.,  1.,  1.,  1.,  1.],
         [ 1.,  0.,  0.,  1.,  1.,  1.],
         [ 1.,  0., -1.,  0.,  1.,  1.],
         [ 1.,  1.,  0.,  0.,  1.,  1.],
         [ 1.,  1.,  1.,  1.,  1.,  1.],
         [ 1.,  1.,  1.,  1.,  1.,  1.]]])
expected result:
tensor([[[ 1.,  1.,  1.,  1.,  1.,  1.],
         [ 1.,  0.,  0.,  1.,  1.,  1.],
         [ 1.,  0.,  0.,  0.,  1.,  1.],
         [ 1.,  1.,  0.,  0.,  1.,  1.],
         [ 1.,  1.,  1.,  1.,  1.,  1.],
         [ 1.,  1.,  1.,  1.,  1.,  1.]]])
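One possible fix (a sketch, not necessarily how it was resolved upstream) is to clamp the summed seed mask before inverting it, so overlapping seeds cannot drive the block mask negative:

import torch
import torch.nn.functional as F

def compute_block_mask(mask, block_size):
    # mask: (N, H, W) Bernoulli seeds; expand each seed into a
    # block_size x block_size block with a box-filter convolution
    summed = F.conv2d(mask[:, None, :, :],
                      torch.ones((1, 1, block_size, block_size)),
                      padding=block_size // 2)
    # overlapping seeds make summed > 1; clamp before inverting so the
    # mask stays in {0, 1} instead of going to -1 as reported above
    # (for even block_size the output is one pixel larger, as above)
    return (1 - summed.clamp(max=1)).squeeze(1)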
Calling model.eval() does not shut off DropBlock. How can I put it in evaluation mode for testing?
Thanks
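Since DropBlock2D is an nn.Module, the usual PyTorch pattern applies: forward gates on self.training, so model.eval() disables the layer as long as it is registered as a submodule that eval() recurses into. A sketch of that pattern (a generic module, not this repo's exact code):

import torch
import torch.nn as nn

class DropBlockLike(nn.Module):
    def __init__(self, drop_prob):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        # no-op in eval mode or when drop_prob is zero, like nn.Dropout
        if not self.training or self.drop_prob == 0.:
            return x
        # ... sample and apply the block mask here ...
        return x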
In the benchmark there is
- scheduled dropblock with block_size=5 and increasing drop_prob from 0.0 to 0.25 over 5000 iterations
At https://github.com/miguelvr/dropblock/blob/master/examples/resnet-cifar10.py#L30, should it be stop_value=0.25?
Because currently in the code start_value=0 and stop_value=drop_prob, where drop_prob equals 0.0 (according to config.yml). So the DropBlock probability will be zero all the time, right?
dropblock/dropblock/dropblock.py, line 46 in 0ecbb63
Modifying this line to `mask = (torch.rand(x.shape[0], *x.shape[2:]).to(x.device) < gamma).float()` can make DropBlock run faster.
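A self-contained sketch of the suggested change (gamma and the tensor shapes are illustrative); thresholding a uniform sample against gamma draws from the same Bernoulli distribution:

import torch

x = torch.randn(16, 64, 32, 32)
gamma = 0.02  # illustrative seed probability

# one way to draw the seeds: explicit Bernoulli sampling
mask_bernoulli = torch.bernoulli(
    torch.full((x.shape[0], *x.shape[2:]), gamma, device=x.device))

# suggested: threshold a uniform sample instead; same distribution
mask_rand = (torch.rand(x.shape[0], *x.shape[2:], device=x.device) < gamma).float()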
dropblock/dropblock/dropblock.py, line 78 in 04b759f
Is the block mask computed with a convolution in order to handle overlapping blocks? And the output of _compute_block_mask must have the same size as x, right?
But when I try block_size=7, the output of _compute_block_mask is [N, 1, 1], and when I try block_size=9 I get an error.
In _compute_block_mask, after the conv2d the height and width are mask_size + 2*(block_size//2 + 1) - block_size + 1, while the height and width of the input x are mask_size + block_size//2. The former must be larger than the latter, so block_size//2 cannot be higher than 3?
Or does it turn off by default?
To ensure the output has the same scale during training and testing, should the output be scaled by 1/(1-p) during training?
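A runnable sketch of the usual renormalization (the mask and shapes here are illustrative); rescaling by the kept fraction is the mask-aware analogue of the classic 1/(1-p) dropout factor:

import torch

x = torch.randn(4, 8, 16, 16)
block_mask = (torch.rand(4, 16, 16) > 0.1).float()  # illustrative block mask

out = x * block_mask[:, None, :, :]
# rescale by the kept fraction so the train-time expectation matches
# eval time; with an elementwise i.i.d. mask this reduces to 1 / (1 - p)
out = out * block_mask.numel() / block_mask.sum()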
The paper says, "We found that applying DropBlock in skip connections in addition to the convolution layers increases the accuracy." But in the example file resnet-cifar10.py provided in this repo, DropBlock is plugged in at different places. Thank you so much for your help.
File "/opt/conda/lib/python3.6/site-packages/dropblock/dropblock.py", line 140, in forward
out = x * block_mask[:, None, :, :, :]
Encountered when using DropBlock3D.
Hi, how could I modify the code so that each feature channel gets its own independent DropBlock mask?
Thanks a lot!
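One way to do it (a sketch, assuming the stock implementation samples one mask per image and broadcasts it over channels) is to sample the seeds over (N, C, H, W) instead of (N, H, W):

import torch
import torch.nn.functional as F

def per_channel_block_mask(x, gamma, block_size):
    # seeds over (N, C, H, W): every channel gets an independent mask
    seeds = (torch.rand_like(x) < gamma).float()
    # expand seeds into blocks; odd block_size keeps the spatial size
    block = F.max_pool2d(seeds, kernel_size=block_size,
                         stride=1, padding=block_size // 2)
    return 1 - block

x = torch.randn(2, 3, 16, 16)
mask = per_channel_block_mask(x, gamma=0.05, block_size=3)
out = x * mask * mask.numel() / mask.sum()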
Description:
A maxpool operation can be used for the block mask calculation and might be more efficient to compute than a convolution.
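A minimal sketch of that idea: any seed inside the pooling window marks the whole window, so the block expansion needs no handling of overlaps (values below are illustrative):

import torch
import torch.nn.functional as F

seeds = (torch.rand(4, 1, 32, 32) < 0.02).float()  # Bernoulli seeds
# max pooling turns each seed into a block_size x block_size block
block_mask = 1 - F.max_pool2d(seeds, kernel_size=5, stride=1, padding=2)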
Hi! Check out PyTorch Lightning as an option for your backend! We're looking for awesome projects implemented in Lightning.
Your project will be really easy to maintain on Lightning!
Can Dropblock be used for image segmentation tasks?
dropblock/dropblock/dropblock.py, line 50 in 0a1f2ab
Hello, there are different padding strategies in DropBlock2D and DropBlock3D; is something wrong? One uses `padding=int(np.ceil(self.block_size / 2) + 1)` and the other `padding=int(np.ceil(self.block_size // 2) + 1)`.
When I implemented my own version of DropBlock, I found that Bernoulli sampling can be extremely slow. Therefore I recommend replacing the Bernoulli draw with a comparison against a uniform random matrix.
An implementation might look like this:
mask = (torch.rand(x.shape[0], *mask_sizes) < gamma).float()
In the cifar10 example, the drop_prob in config.yaml is 0. instead of 0.25; is that right?
When you say to only add it in the convolution feature extraction layer only. I just want to make sure I understand correctly what you meant. Did you mean (if I take the U-Net as an example):
Thank you very much,
Originally posted by @Eric2Hamel in #18 (comment)
import torch
from dropblock import DropBlock2D

dropout = DropBlock2D(0.2, block_size=3)
input = torch.randn((1, 1, 8, 8))
output = dropout(input)
49 block_mask = self._compute_block_mask(mask)
50 # apply block mask
---> 51 out = x * block_mask[:, None, :, :]
52
53 # scale output
RuntimeError: The size of tensor a (8) must match the size of tensor b (9) at non-singleton dimension 3
Hello, in the paper they say that they also apply DropBlock to residual connections. I am searching for an example of this so as not to make a mistake. Can you please give an example of how exactly that should be done?
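A sketch of one way to read the paper's suggestion in a ResNet-style block (a hypothetical module, not taken from this repo's example file):

import torch
import torch.nn as nn
from dropblock import DropBlock2D

class BasicBlockWithDropBlock(nn.Module):
    # DropBlock on the residual branch and on the skip connection,
    # per the paper's remark; hyperparameters are illustrative
    def __init__(self, channels, drop_prob=0.1, block_size=3):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        self.dropblock = DropBlock2D(drop_prob=drop_prob, block_size=block_size)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.dropblock(self.bn2(self.conv2(out)))
        identity = self.dropblock(x)  # DropBlock on the skip connection too
        return self.relu(out + identity)

block = BasicBlockWithDropBlock(16)
y = block(torch.randn(2, 16, 32, 32))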
Awesome work!!! Have you tested the speed of your implementation of DropBlock?
Thank you for your excellent work on DropBlock. However, I get some errors when I run the code. They occur at line 34 of resnet-cifar10.py, self.layer1 = self._make_layer(block, 64, layers[0]). I am puzzled about it.
It seems that there is no version for Python 3.8.
As the title says.
Can someone help me answer it?
Thanks!
Thanks for the code!
I was wondering if you get the same results as traditional dropout when block_size=1.
Based on my experiments, using F.dropout2d for traditional dropout, I cannot confirm this.
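One way to probe this empirically is to compare the fraction of zeroed activations under each scheme (a diagnostic sketch assuming the current max-pool-based implementation; note that F.dropout2d zeroes whole channels, a different scheme from elementwise dropout, while this repo's DropBlock appears to share one spatial mask across channels):

import torch
import torch.nn.functional as F
from dropblock import DropBlock2D

x = torch.ones(8, 16, 64, 64)
p = 0.2

db = DropBlock2D(drop_prob=p, block_size=1)
db.train()
frac_db = (db(x) == 0).float().mean().item()
frac_do = (F.dropout(x, p=p, training=True) == 0).float().mean().item()

print(f"DropBlock(block_size=1) zero fraction: {frac_db:.3f}")
print(f"F.dropout zero fraction:               {frac_do:.3f}")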
All network layers in this repo can be traced with torch.jit.trace(), but their control flow won't work correctly in traced modules (e.g. training/eval mode is not respected, nor is the check in dropblock/dropblock/scheduler.py, line 16 in 16a518a).
Since tracing this code does not necessarily emit warnings (see the example below), I think this incompatibility should be documented here to make sure no one mistakenly trains traced networks. On the other hand, networks that are traced in eval mode after a complete training run should work as intended, as long as they are never put back in training mode.
Example code that produces a wrong TracedModule without any warning (using PyTorch 1.0.0):
import torch
from dropblock import DropBlock2D, LinearScheduler
drop_block = LinearScheduler(
DropBlock2D(block_size=3, drop_prob=0.),
start_value=0.,
stop_value=0.25,
nr_steps=5
)
x = torch.randn(1, 1, 8, 8)
traced = torch.jit.trace(drop_block, x)
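Per the note above, tracing after training with the module in eval mode should bake the no-op eval path into the graph (a sketch continuing the snippet above):

drop_block.eval()                        # freeze the identity eval path
traced = torch.jit.trace(drop_block, x)  # safe as long as the traced module
                                         # is never put back in training mode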
Hello, thanks for your nice code!
I found there were 2 inconsistencies with the original paper, and they are very easy to fix indeed:
1. gamma: its computation differs from the formula given in the paper.
2. In the original paper, all the block_masks are complete squares (or cubes), since the seed masks are only sampled on the central part of the feature map; the paper also samples independent masks per channel, while your implementation uses the same one.
I just figured them out; actually I do not know whether these are effective tricks, as there are insufficient details discussed in the paper :)
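For reference, the paper computes the seed probability as

gamma = (1 - keep_prob) / block_size^2 * feat_size^2 / (feat_size - block_size + 1)^2

where the second factor accounts for seeds only being sampled in the central region where a full block fits.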
python 3.6
pytorch 0.4.1
Collecting dropblock
Downloading https://files.pythonhosted.org/packages/42/e9/ea1afa72c7114685e6e971e23d68151eea00de171c2c7a6b9872c600be33/dropblock-0.1.0-py3-none-any.whl
Requirement already satisfied: numpy in /data/guoxiaobao/Anaconda3/envs/pytorch/lib/python3.6/site-packages (from dropblock)
Collecting torch==0.4.1 (from dropblock)
Downloading https://files.pythonhosted.org/packages/49/0e/e382bcf1a6ae8225f50b99cc26effa2d4cc6d66975ccf3fa9590efcbedce/torch-0.4.1-cp36-cp36m-manylinux1_x86_64.whl (519.5MB)
100% |████████████████████████████████| 519.5MB 2.7kB/s
Installing collected packages: torch, dropblock
Found existing installation: torch 0.4.0
Uninstalling torch-0.4.0:
Successfully uninstalled torch-0.4.0
Successfully installed dropblock-0.1.0 torch-0.4.1
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dropblock import DropBlock2D
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'DropBlock2D'
how can I fix this problem?
Here, shouldn't feat_size be sqrt(width * height)?
Can you please also include DropBlock1D implementation to use it for time-series? Thank you very much.
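A minimal 1D sketch along the lines of the 2D version (a hypothetical class, not part of this package; it assumes an odd block_size and an (N, C, L) input):

import torch
import torch.nn as nn
import torch.nn.functional as F

class DropBlock1D(nn.Module):
    def __init__(self, drop_prob, block_size):
        super().__init__()
        self.drop_prob = drop_prob
        self.block_size = block_size

    def forward(self, x):
        if not self.training or self.drop_prob == 0.:
            return x
        gamma = self.drop_prob / self.block_size
        # seeds shared across channels, mirroring the 2D implementation
        seeds = (torch.rand(x.shape[0], 1, x.shape[2],
                            device=x.device) < gamma).float()
        # odd block_size keeps the sequence length unchanged
        block_mask = 1 - F.max_pool1d(seeds, kernel_size=self.block_size,
                                      stride=1, padding=self.block_size // 2)
        out = x * block_mask
        return out * block_mask.numel() / block_mask.sum()

db = DropBlock1D(drop_prob=0.1, block_size=5)
db.train()
y = db(torch.randn(4, 8, 128))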
Hi,
Thanks for your work on this!!!
I am reading your code and found this line:
dropblock/dropblock/dropblock.py, line 63 in 7fb8fbf
It seems that you generate a random mask and then apply a max pooling operation to it. However, after testing the max pooling operation, I found that the number of 1s in the mask is not preserved by this operation:
import torch
import torch.nn.functional as F

mask1 = torch.randint(0, 2, (1, 1, 256, 256)).float()
mask2 = F.max_pool2d(mask1, kernel_size=(5, 5), stride=1, padding=2)
print(mask1.sum())
print(mask2.sum())
The results are:
32715
65536
It seems that the proportion of positive mask entries is changed by this operation. Could you please tell me how you make sure the masked area stays the same after max pooling?