torch-pruning's People

Contributors

eltociear, flyzxm5177, ghimiredhikura, hollylee2000, horseee, hovavalon, hyunseok-kim0, jonnykong, nus-lv-admin, pleb631, serjio42, trouble404, vainf, xiwuchen

torch-pruning's Issues

nparams_to_prune not defined in structured.py

nparams_to_prune is not defined before the first return statement
of the PReLUPruning class in structured.py:

class PReLUPruning(BasePruningFunction):
    @staticmethod
    def prune_params(layer: nn.PReLU, idxs: list) -> nn.Module:
        if layer.num_parameters == 1:
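            # BUG (as reported above): nparams_to_prune is not defined in this branch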
            return layer, nparams_to_prune
        keep_idxs = list(set(range(layer.num_parameters)) - set(idxs))
        layer.num_parameters = layer.num_parameters - len(idxs)
        layer.weight = torch.nn.Parameter(layer.weight.data.clone()[keep_idxs])
        return layer

    @staticmethod
    def calc_nparams_to_prune(layer: nn.PReLU, idxs: Sequence[int]) -> int:
        nparams_to_prune = 0 if layer.num_parameters == 1 else len(idxs)
        return nparams_to_prune

After pruning, ResNet18 has negative channels (RuntimeError: Given groups=1, expected weight to be at least 1 at dimension 0, but got weight of size [0, 119, 3, 3] instead)

Hi,
This is a follow-up to the previous issue #7, which was fixed earlier. However, after the model is successfully pruned (the parameters are down to 4.7M from the initial 21.8M), the model fails to do a forward pass and I get this error:

Traceback (most recent call last):
  File "d:\Codes\face\python\FV\Pruning\prune.py", line 67, in <module>
    out = model(img_fake)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "d:\codes\face\python\FV\models.py", line 212, in forward
    x = self.layer4(x)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\container.py", line 117, in forward
    input = module(input)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "d:\codes\face\python\FV\models.py", line 139, in forward
    out = self.conv1(out)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\conv.py", line 419, in forward
    return self._conv_forward(input, self.weight)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\conv.py", line 416, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, expected weight to be at least 1 at dimension 0, but got weight of size [0, 119, 3, 3] instead

By looking at the model after the pruning I noticed :

      (bn0): BatchNorm2d(119, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(119, -105, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(-105, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(-105, 151, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(151, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(119, 151, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(151, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

The Conv2d layers have negative output/input channels, and so do the BatchNorm layers! This seems to be what is causing the issue.

How to prune VGGNet-like networks that incorporate linear layers?

Hi @VainF,
What's the best way to prune VGGNet-like architectures?
I found myself re-adding the classifier, since adding the linear layers to the list of prunable layers also prunes the classifier at the end.
Currently I'm doing:

import torch
import torch.nn as nn
import torch_pruning as tp
import random

def random_prune(model, example_inputs, output_transform):
    model.cpu().eval()
    prunable_module_type = ( nn.Conv2d, nn.BatchNorm2d, nn.Linear )
    prunable_modules = [ m for m in model.modules() if isinstance(m, prunable_module_type) ]
    ori_size = tp.utils.count_params( model )
    DG = tp.DependencyGraph().build_dependency( model, example_inputs=example_inputs, output_transform=output_transform )
    for layer_to_prune in prunable_modules:
        # select a layer
        if isinstance( layer_to_prune, nn.Conv2d ):
            prune_fn = tp.prune_conv
        elif isinstance(layer_to_prune, nn.BatchNorm2d):
            prune_fn = tp.prune_batchnorm
        elif isinstance(layer_to_prune, nn.Linear):
            prune_fn = tp.prune_linear
            
        ch = tp.utils.count_prunable_channels( layer_to_prune )
        rand_idx = random.sample( list(range(ch)), min( ch//2, 10 ) )
        plan = DG.get_pruning_plan( layer_to_prune, prune_fn, rand_idx)
        plan.exec()

    print(model)
    with torch.no_grad():
        out = model( example_inputs )
        if output_transform:
            out = output_transform(out)
        print( "  Params: %s => %s"%( ori_size, tp.utils.count_params(model) ) )
        print( "  Output: ", out.shape )
        print("------------------------------------------------------\n")
    return model

Here is the toy model I'm using at the moment:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, self.fc1.in_features)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

And this is how it's used:

example_inputs = torch.randn(1, 3, 32, 32)
output_transform=None
net2 = random_prune(net, example_inputs=example_inputs, output_transform=output_transform)
net2.fc3 = nn.Linear(net2.fc2.out_features, 10)
print(net2)

What's the best way of pruning VGG-like networks in this case?
Thank you very much in advance
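A minimal sketch of one possible workaround, assuming the goal is only to keep the final classifier intact: exclude fc3 from the prunable list instead of re-creating it afterwards. It reuses the same calls as the snippet above and is just an illustration, not an official recommendation; when fc2 is pruned, the dependency graph still adjusts fc3's in_features, so only its 10 output classes are protected.

import random
import torch.nn as nn
import torch_pruning as tp

def random_prune_keep_classifier(model, example_inputs, output_transform=None):
    # Same flow as random_prune() above, but the final classifier (fc3 in the toy
    # model) is excluded from the prunable list, so it never loses output classes.
    model.cpu().eval()
    prunable_module_type = (nn.Conv2d, nn.BatchNorm2d, nn.Linear)
    prunable_modules = [m for m in model.modules()
                        if isinstance(m, prunable_module_type) and m is not model.fc3]
    DG = tp.DependencyGraph().build_dependency(
        model, example_inputs=example_inputs, output_transform=output_transform)
    for layer_to_prune in prunable_modules:
        if isinstance(layer_to_prune, nn.Conv2d):
            prune_fn = tp.prune_conv
        elif isinstance(layer_to_prune, nn.BatchNorm2d):
            prune_fn = tp.prune_batchnorm
        else:
            prune_fn = tp.prune_linear
        ch = tp.utils.count_prunable_channels(layer_to_prune)
        rand_idx = random.sample(list(range(ch)), min(ch // 2, 10))
        DG.get_pruning_plan(layer_to_prune, prune_fn, rand_idx).exec()
    return model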

Error while trying to prune the LeNet-5 architecture with the MNIST dataset

DG.build_dependency(model, example_inputs=th.randn(1,1,28,28))
  File "/usr/local/lib/python3.6/dist-packages/torch_pruning/dependency.py", line 309, in build_dependency
    self.update_index()
  File "/usr/local/lib/python3.6/dist-packages/torch_pruning/dependency.py", line 315, in update_index
    self._set_fc_index_transform( node )
  File "/usr/local/lib/python3.6/dist-packages/torch_pruning/dependency.py", line 437, in _set_fc_index_transform
    feature_channels = _get_in_node_out_channels(fc_node.inputs[0])
IndexError: list index out of range

Refactor into PyTorch BasePruningMethod?

Dear @VainF,

Awesome work with this library.
I wondered if you could refactor your library to match the prune API provided by PyTorch.

It would make integration into upstream libraries simpler.

Best,
T.C

Feature Request - Grouped convolutions

Hi,

@VainF Thank you very much for this project, great work!

I was wondering if you are planning to add support for conv layers with an arbitrary groups parameter (currently they are only supported when groups=in_channels=out_channels, a known issue in the README)?

Thank you in advance!

Issues in assigning different pruning indices to different conv layers

I set pruning_prob to 0.3. When we prune only the second conv layer (or both the first and second conv layers), the channel dimension comes out to be 32, but it should be 45, since 64 - int(64*0.3) = 45.

To illustrate, I have given below:
-> the original architecture
-> the architecture when only the first conv layer in each residual block is pruned (this part is fine)
-> the architecture when only the second conv layer in each residual block is pruned (this is the problem)

Here I have explained this only for the first residual block, but the same happens for the later blocks as well.

I think it is pruning twice, since 45 - int(45*0.3) = 32, which is what we are getting.
Please resolve the issue.

---------------------Before Pruning----------------------------------------

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (linear): Linear(in_features=512, out_features=10, bias=True)
)

--------------------------After Pruning | only first conv layer in residual block------------------------

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 45, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(45, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(45, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 45, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(45, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(45, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 90, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(90, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 90, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(90, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 180, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(180, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(180, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 180, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(180, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(180, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 359, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(359, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(359, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 359, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(359, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(359, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (linear): Linear(in_features=512, out_features=10, bias=True)
)


----------------- After pruning | only second conv layer in residual block ------------------------

ResNet(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (1): BasicBlock(
      (conv1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 63, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(63, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(32, 63, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(63, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(63, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 63, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(63, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(63, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 126, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(126, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(63, 126, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(126, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(126, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 126, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(126, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(126, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 252, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(252, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(126, 252, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(252, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(252, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 252, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(252, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (linear): Linear(in_features=252, out_features=10, bias=True)
)

Problem in assigning different pruning indices to different layers in ResNet-56

I was trying to prune ResNet-56.

The code for the model is given below:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import init
import math

class DownsampleA(nn.Module):  

  def __init__(self, nIn, nOut, stride):
    super(DownsampleA, self).__init__() 
    self.avg = nn.AvgPool2d(kernel_size=1, stride=stride)   

  def forward(self, x):   
    x = self.avg(x)  
    return torch.cat((x, x.mul(0)), 1)  

class DownsampleC(nn.Module):     

  def __init__(self, nIn, nOut, stride):
    super(DownsampleC, self).__init__()
    assert stride != 1 or nIn != nOut
    self.conv = nn.Conv2d(nIn, nOut, kernel_size=1, stride=stride, padding=0, bias=False)

  def forward(self, x):
    x = self.conv(x)
    return x

class ResNetBasicblock(nn.Module):
  """
  ResNet basicblock (https://github.com/facebook/fb.resnet.torch/blob/master/models/resnet.lua)
  """
  expansion = 1
  def __init__(self, inplanes, planes, stride=1, downsample=None):
    super(ResNetBasicblock, self).__init__()

    self.conv_a = nn.Conv2d(inplanes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
    self.bn_a = nn.BatchNorm2d(planes)

    self.conv_b = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
    self.bn_b = nn.BatchNorm2d(planes)

    self.downsample = downsample

  def forward(self, x):
    residual = x

    basicblock = self.conv_a(x)
    basicblock = self.bn_a(basicblock)
    basicblock = F.relu(basicblock, inplace=True)

    basicblock = self.conv_b(basicblock)
    basicblock = self.bn_b(basicblock)

    if self.downsample is not None:
      residual = self.downsample(x)
    
    return F.relu(residual + basicblock, inplace=True)

class CifarResNet(nn.Module):
  """
  ResNet optimized for the Cifar dataset, as specified in
  https://arxiv.org/abs/1512.03385.pdf
  """
  def __init__(self, block, depth, num_classes):
    """ Constructor
    Args:
      depth: number of layers.
      num_classes: number of classes
      base_width: base width
    """
    super(CifarResNet, self).__init__()

    #Model type specifies number of layers for CIFAR-10 and CIFAR-100 model
    assert (depth - 2) % 6 == 0, 'depth should be one of 20, 32, 44, 56, 110'
    layer_blocks = (depth - 2) // 6
    print ('CifarResNet : Depth : {} , Layers for each block : {}'.format(depth, layer_blocks))

    self.num_classes = num_classes

    self.conv_1_3x3 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False)
    self.bn_1 = nn.BatchNorm2d(16)

    self.inplanes = 16
    self.stage_1 = self._make_layer(block, 16, layer_blocks, 1)
    self.stage_2 = self._make_layer(block, 32, layer_blocks, 2)
    self.stage_3 = self._make_layer(block, 64, layer_blocks, 2)
    self.avgpool = nn.AvgPool2d(8)
    self.classifier = nn.Linear(64*block.expansion, num_classes)

    for m in self.modules():
      if isinstance(m, nn.Conv2d):
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))
        #m.bias.data.zero_()
      elif isinstance(m, nn.BatchNorm2d):
        m.weight.data.fill_(1)
        # m.bias.data.zero_()
      elif isinstance(m, nn.Linear):
        init.kaiming_normal(m.weight)
        # m.bias.data.zero_()

  def _make_layer(self, block, planes, blocks, stride=1):
    downsample = None
    if stride != 1 or self.inplanes != planes * block.expansion:
      downsample = DownsampleA(self.inplanes, planes * block.expansion, stride)

    layers = []
    layers.append(block(self.inplanes, planes, stride, downsample))
    self.inplanes = planes * block.expansion
    for i in range(1, blocks):
      layers.append(block(self.inplanes, planes))

    return nn.Sequential(*layers)

  def forward(self, x):
    x = self.conv_1_3x3(x)
    x = F.relu(self.bn_1(x), inplace=True)
    x = self.stage_1(x)
    x = self.stage_2(x)
    x = self.stage_3(x)
    x = self.avgpool(x)
    x = x.view(x.size(0), -1)
    return self.classifier(x)

def resnet56(num_classes=10):
  """Constructs a ResNet-56 model for CIFAR-10 (by default)
  Args:
    num_classes (uint): number of classes
  """
  model = CifarResNet(ResNetBasicblock, 56, num_classes)
  return model

When we prune the very first layer of the ResNet (named 'conv_1_3x3' here, which comes before the residual blocks), the second conv layer of every residual block is also pruned along with 'conv_1_3x3', because they are connected. But when I try to assign different indices to these different conv layers, they all get assigned the same indices.

What I mean is, say
we have conv-layer-2 of resnet block-1 -> LYR_X (name to refer to later in the description)
and
conv-layer-2 of resnet block-2 -> LYR_Y (name to refer to later in the description);
they are also connected by the skip connections, as this is a ResNet.

I generate a pruning plan for pruning 'conv_1_3x3',
say at indices [2,3,4].

Due to the dependency graph, LYR_X and LYR_Y also get assigned the same pruning indices [2,3,4],
BUT
I want to assign different pruning indices to LYR_X and LYR_Y:
LYR_X -> [3,5,9]
LYR_Y -> [2,6,8]

Earlier you suggested manually changing the indices:

A temporary fix:
You can create a pruning plan, and modify the index of pruning_conv and pruning_related_xxx manually.

So I tried doing this:

pruning_plan.plan[0][1][:] = [2,8] # -> conv_1_3x3
pruning_plan.plan[5][1][:] = [3,6] # -> LYR_X 

print(pruning_plan.plan[0], pruning_plan.plan[5])

but for both layers I was getting [3,6],
instead of different indices for different layers.

What I have observed is that it assigns the last assigned indices to all the layers;
here that is [3,6].

Can you please tell me how I can assign different indices to different layers?
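A hedged guess at what is going on, based only on the snippet above: if the entries of pruning_plan.plan that came from the same dependency chain all reference the same Python list object, the in-place slice assignment plan[...][1][:] = ... mutates every entry at once, which would explain why both show [3,6]. A small sketch to check for this and, if so, to rebind fresh lists instead (the exact structure of the plan entries is an assumption):

# Sketch only: assumes each entry of pruning_plan.plan is a (dependency, index_list)
# pair, as suggested by the snippet above, and that entries may share one list object.
shared = pruning_plan.plan[0][1] is pruning_plan.plan[5][1]
print("entries share one index list:", shared)

if shared:
    dep0, _ = pruning_plan.plan[0]
    dep5, _ = pruning_plan.plan[5]
    # Rebind brand-new lists instead of mutating the shared one in place.
    pruning_plan.plan[0] = (dep0, [2, 8])  # -> conv_1_3x3
    pruning_plan.plan[5] = (dep5, [3, 6])  # -> LYR_X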

Error when pruning keeps only a single filter

Error

   File "/home/reda/miniconda3/envs/yolov5/lib/python3.8/site-packages/torch_pruning/dependency.py", line 375, in get_pruning_plan
    _fix_denpendency_graph(root_node, pruning_fn, idxs)
  File "/home/reda/miniconda3/envs/yolov5/lib/python3.8/site-packages/torch_pruning/dependency.py", line 368, in _fix_denpendency_graph
    if len(new_indices)==0:
TypeError: object of type 'int' has no len()

What the node looks like

<Node: (6.m.0.cv2.conv (Conv2d(128, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)), <MkldnnConvolutionBackward object at 0x7f05ceabb2e0>)>

What this looks like is that pruning this layer leaves a single-channel Conv2d, and then new_indices takes the value of a single int rather than a list.

Bug with rounding

Hi, @VainF. I've found a bug in the rounding function round_pruning_amount from strategy.py.
The failing case has parameters total_parameters=30, n_to_prune=1, round_to=8. The current function code returns -2, which raises an error; the correct return value is 6.
My suggestion is to add an extra condition to line 14 of strategy.py:
if (compensation < round_to // 2 or n_to_prune + compensation < round_to) and after_pruning > round_to:

pruning Resnet18 fails (KeyError: Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False))

Hi, thanks a lot for your kind and great contribution.
I am currently trying to prune a custom ResNet18 model which was trained for face recognition.
The model is pretty much the same as the normal ResNet18, with some minor differences (you can see the actual model definition here).

Here's my model, if you are interested:
<bound method Module.__repr__ of ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (prelu): PReLU(num_parameters=1)
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): IRBlock(
      (bn0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=64, out_features=4, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=4, out_features=64, bias=True)
          (3): Sigmoid()
        )
      )
    )
    (1): IRBlock(
      (bn0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=64, out_features=4, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=4, out_features=64, bias=True)
          (3): Sigmoid()
        )
      )
    )
  )
  (layer2): Sequential(
    (0): IRBlock(
      (bn0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=128, out_features=8, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=8, out_features=128, bias=True)
          (3): Sigmoid()
        )
      )
    )
    (1): IRBlock(
      (bn0): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=128, out_features=8, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=8, out_features=128, bias=True)
          (3): Sigmoid()
        )
      )
    )
  )
  (layer3): Sequential(
    (0): IRBlock(
      (bn0): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=256, out_features=16, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=16, out_features=256, bias=True)
          (3): Sigmoid()
        )
      )
    )
    (1): IRBlock(
      (bn0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=256, out_features=16, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=16, out_features=256, bias=True)
          (3): Sigmoid()
        )
      )
    )
  )
  (layer4): Sequential(
    (0): IRBlock(
      (bn0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=512, out_features=32, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=32, out_features=512, bias=True)
          (3): Sigmoid()
        )
      )
    )
    (1): IRBlock(
      (bn0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=512, out_features=32, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=32, out_features=512, bias=True)
          (3): Sigmoid()
        )
      )
    )
  )
  (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dropout): Dropout(p=0.5, inplace=False)
  (fc): Linear(in_features=25088, out_features=512, bias=True)
  (bn3): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)>

I used your prune_model() function from examples/prune_resnet18_cifar10.py#L83 and only changed resnet.BasicBlock to IRBlock and the input size from 32 to 112 for my model; the rest is the same.
Here is the whole script:

import torch
import numpy as np
import torch_pruning as pruning
from models import resnet18, load_model, BasicBlock, Bottleneck, IRBlock, SEBlock, ResNet

def prune_model(model):
    model.cpu()
    # my resnet18 was trained on 112x112 images, so we changed 32 to 112
    DG = pruning.DependencyGraph().build_dependency( model, torch.randn(1, 3, 112, 112))
    def prune_conv(conv, pruned_prob):
        weight = conv.weight.detach().cpu().numpy()
        out_channels = weight.shape[0]
        L1_norm = np.sum(weight, axis=(1, 2, 3))
        num_pruned = int(out_channels * pruned_prob)
        prune_index = np.argsort(L1_norm)[:num_pruned].tolist() # remove filters with small L1-Norm
        plan = DG.get_pruning_plan(conv, pruning.prune_conv, prune_index)
        plan.exec()
    
    block_prune_probs = [0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3]
    blk_id = 0
    for m in model.modules():
        if isinstance( m, IRBlock):
            prune_conv( m.conv1, block_prune_probs[blk_id] )
            prune_conv( m.conv2, block_prune_probs[blk_id] )
            blk_id+=1
    return model    

# load the resnet18 model : 
model = resnet18(pretrained=False, use_se=True)
model = load_model(model, 'BEST_checkpoint_r18.tar')
model.eval()
# prune  the model   
prune_model(model)

But upon running this snippet of code, I get this error:

Traceback (most recent call last):
  File "d:\Codes\face\python\FV\Pruning\prune.py", line 52, in <module>
    prune_model(model)
  File "d:\Codes\face\python\FV\Pruning\prune.py", line 46, in prune_model
    prune_conv( m.conv1, block_prune_probs[blk_id] )
  File "d:\Codes\fac_ver\python\FV\Pruning\prune.py", line 39, in prune_conv
    plan = DG.get_pruning_plan(conv, pruning.prune_conv, prune_index)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch_pruning\dependency.py", line 328, in get_pruning_plan
    root_node = self.module_to_node[module]
KeyError: Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)

Could you kindly tell me what I'm missing here?
Thanks a lot in advance

Just specifying the first layer in a transformer, the whole model is getting pruned

By specifying just the first layer, the pruning tool prunes the whole model. A code snippet:

pruning_idxs = strategy(model.electra.embeddings.word_embeddings.weight, amount=0.4)
pruning_plan = DG.get_pruning_plan(model.electra.embeddings.word_embeddings, tp.prune_embedding, idxs=pruning_idxs)

No idea why it's pruning layers other than the specified one.

Rounding bug

Hi, @VainF. There is one more bug, found in Torch-Pruning/torch_pruning/prune/strategy.py in def round_pruning_amount.
The thing is, when you hit this case:

round_pruning_amount(total_parameters=16, n_to_prune=1, round_to=16)
>> 16

after that pruning step the layer is left with no parameters at all, which is an error. The right pruning behavior is to return a pruning amount of 0.
So, I propose adding a correction line in def round_pruning_amount after line 11 of Torch-Pruning/torch_pruning/prune/strategy.py:
elif total_parameters <= round_to: return 0

We could also replace the entire rounding function with clearer but rougher rounding logic:

def round_pruning_amount(total_parameters, n_to_prune, round_to):
    """round the parameter amount after pruning to an integer multiple of `round_to`.
    """
    n_remain = round_to*max(int(total_parameters - n_to_prune)//round_to, 1)
    return max(total_parameters - n_remain, 0)
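A quick sanity check of the proposed replacement (run against the definition above) on the two cases reported in this and the previous rounding issue:

# Assumes the round_pruning_amount definition proposed above is in scope.
print(round_pruning_amount(total_parameters=16, n_to_prune=1, round_to=16))  # 0 -> the layer is not emptied
print(round_pruning_amount(total_parameters=30, n_to_prune=1, round_to=8))   # 6 -> 24 channels remain, a multiple of 8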

Both variants will correct the existing bug. What do you think?

About L1 norm sorting

Thank you for your wonderful work!
It seems that there is no sparse training to determine how to select channels; the weights are simply sorted. If a filter's weights contain both positive and negative values and go through the following code:
L1_norm = np.sum(weight, axis=(1,2,3))
the result can also be close to 0.
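A tiny illustration of the point (hypothetical numbers): a filter with large but sign-balanced weights gets an artificially small score unless the absolute value is taken first.

import numpy as np

weight = np.array([[[[ 0.9, -0.9],
                     [ 0.8, -0.8]]]])          # one filter with large, sign-balanced weights

print(np.sum(weight, axis=(1, 2, 3)))          # [0.]  -> looks unimportant
print(np.sum(np.abs(weight), axis=(1, 2, 3)))  # [~3.4] -> the usual per-filter L1 norm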

KeyError on Simple Model

Hey there! Thanks for the repo and the great work

I tried it with a simple model:

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        
        self.conv1 =  nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)

    def forward(self, x):
        print("x: ", x.shape)
        y = self.conv1(x)
        print("y: ", y.shape)
        x = F.relu(y)
        z = self.conv2(x)
        print("z: ", z.shape)
        return x


net = Net()

inn = torch.randn((1,3, 256, 256))
out = net(inn)



import torch_pruning as tp
net.eval().cpu()

input_tensor = inn.clone().cpu()
# 1. setup strategy (L1 Norm)
strategy = tp.strategy.L1Strategy() # or tp.strategy.RandomStrategy()

# 2. build layer dependency for resnet18
DG = tp.DependencyGraph()
DG.build_dependency(net, example_inputs=input_tensor)
print("modules!: ", net.modules())
excluded_layers = [ ] # list(model.model[-1].modules())
num_params_before_pruning = tp.utils.count_params( net )
for m in net.modules():

    if isinstance( m, torch.nn.Conv2d ):
        print("m.groups: ", m.groups)
        if m.groups < 2:
            prune_fn = tp.prune_conv
            idxss = strategy(m.weight, amount=0.1)
            pruning_plan = DG.get_pruning_plan( m, prune_fn, idxs=idxss )
            print(pruning_plan)
            pruning_plan.exec()
    else:
        continue


num_params_after_pruning = tp.utils.count_params( net )
print( "  Params: %s => %s"%( num_params_before_pruning, num_params_after_pruning))

Running this code gives me a KeyError (as in the title).


Am I doing something wrong here?

Thanks in advance

[Not an Issue] Thank you

Hi all,

This is not an issue, but a thank you for this amazing project.

I have tested several PyTorch pruning libraries and written my own, and so far this is the best library that really provides what it promises: a smaller/faster model without too much accuracy loss, even for complicated architectures.

So thank you :)

Feel free to close this issue after you read our appreciation.

ShuffleNet architecture

Hello Gongfan,
I would like to learn from you how I could extend your code to work on the shufflenet (1_0, 1_5, 2_0) architectures. Could you please provide pointers in your code where the change must be introduced?
Thank you.

Peter

PS: I have spent some time playing with your code, but I could not figure out how to incorporate the channel split and channel shuffle operations of the ShuffleNet architecture into the pruning operations.

IndexError at fc_node.inputs[0] in _set_fc_index_transform

Sometimes I get an error of the form:

File "/workspace/code/Development/Workflow.py", line 843, in run_single_pruning_experiment
amount=pruning_spec["amount"], params= run_params[0])
File "/workspace/code/Development/Workflow.py", line 645, in prune_conv_layers_of_model
DG.build_dependency(model, example_inputs=inp)
File "/opt/conda/lib/python3.7/site-packages/torch_pruning/dependency.py", line 341, in build_dependency
self.update_index()
File "/opt/conda/lib/python3.7/site-packages/torch_pruning/dependency.py", line 512, in update_index
self._set_fc_index_transform( node )
File "/opt/conda/lib/python3.7/site-packages/torch_pruning/dependency.py", line 523, in _set_fc_index_transform
feature_channels = _get_out_channels_of_in_node(fc_node.inputs[0])
IndexError: list index out of range

I don't understand the reason for the problem; I don't change my model, yet sometimes it works and sometimes it doesn't. So there is probably a bug in the code. Has anyone else faced this problem?

Can't build dependency graph for model with multiple inputs.

I have a model which takes 2 inputs, an image and embeddings.
Here are the example inputs that I have:

in1 = torch.rand(size=(1, 3, 256, 256))  
in2 = torch.rand(size=(512, 1))
out = model(in1, in2)

This is how I pass the 2 inputs. Now, for building the dependency graph, here is what I've tried:

strategy = tp.strategy.L1Strategy()
example_inputs=(Xt, embeds)

DG = tp.DependencyGraph()
DG.build_dependency(G, example_inputs=example_inputs)

Also, for ONNX and TensorFlow export I've faced the same problem, but I solved it by unpacking the inputs with "*":

out = G(*example_inputs)  # this works

But DG.build_dependency(G, example_inputs=*example_inputs) gives an error (it is not even valid Python syntax, since "*" cannot be used inside a keyword argument). Let me know if something is unclear.

Model size of the pruned model

I'm just starting out on model pruning and your work really helps a lot. I would really like to know how you calculated the pruned model's size. Thank you.
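Not speaking for the author, but one common way to measure the size of a pruned model is to count its remaining parameters and/or save its state_dict and check the file size. A minimal sketch (the float32 assumption and the file name are mine):

import os
import torch

def model_size_report(model, path="pruned_model.pth"):
    # Parameter count and its rough in-memory size, assuming float32 weights.
    n_params = sum(p.numel() for p in model.parameters())
    print("Params: %.2fM (~%.2f MB as float32)" % (n_params / 1e6, n_params * 4 / 1024 ** 2))

    # On-disk size of the saved state_dict.
    torch.save(model.state_dict(), path)
    print("Checkpoint size: %.2f MB" % (os.path.getsize(path) / 1024 ** 2))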

torch.jit.trace of pruned model fails when L2Norm is involved

Hello,

I am attempting to use Torch_Pruning to prune SSD model.

Note that I use this fork:
https://github.com/dkurt/ssd.pytorch/tree/opencv_support

Once I prune away some of the conv filters in the vgg layers, I get the following error from torch.jit.trace on the pruned model:

$ python prune_TP_git_issue.py --model ssd300_mAP_77.43_v2.pth
. . .
File "prune_TP_git_issue.py", line 62, in
model_output = torch.jit.trace(model, torch_image)
File "/home/tfs/venv_ssd.pytorch/lib/python3.6/site-packages/torch/jit/init.py", line 882, in trace
check_tolerance, _force_outplace, _module_class)
File "/home/tfs/venv_ssd.pytorch/lib/python3.6/site-packages/torch/jit/init.py", line 1034, in trace_module
module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, _force_outplace)
File "/home/tfs/venv_ssd.pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in call
result = self._slow_forward(*input, **kwargs)
File "/home/tfs/venv_ssd.pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
result = self.forward(*input, **kwargs)
File "/nas4/tfs/ssd.pytorch/ssd.py", line 89, in forward
s = self.L2Norm(x)
File "/home/tfs/venv_ssd.pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in call
result = self._slow_forward(*input, **kwargs)
File "/home/tfs/venv_ssd.pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
result = self.forward(*input, **kwargs)
File "/nas4/tfs/ssd.pytorch/layers/modules/l2norm.py", line 23, in forward
out = self.weight.view(1, -1, 1, 1) * x
RuntimeError: The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 1

Attaching the python file.
prune_TP_git_issue.ssd.pytorch.py.zip

Recursion error in update_index

The recursion does not seem to have a termination condition. When running prune_resnet18_cifar10.py, this error is raised inside update_index; what is going on?
maximum recursion depth exceeded while calling a Python object

Default pruning_dimension is not supported for non tensor example inputs

From build_dependency:

if pruning_dim >= 0:
    pruning_dim = pruning_dim - len(example_inputs.size())

pruning_dim is 1 by default, and even though using a list or a dictionary of inputs is supported, an exception occurs here since example_inputs is a list and doesn't have a size attribute.

I think it could be fixed this way:

if isinstance(example_inputs, torch.Tensor):
    pruning_dim = pruning_dim - len(example_inputs.size())
elif isinstance(example_inputs, (tuple, list)):
    pruning_dim = pruning_dim - len(example_inputs[0].size())
else:
    raise Exception("pruning with non negative dimension is not supported for input of type {}".format(str(type(example_inputs))))

If anyone familiar with the DependencyGraph code has a better idea, I would be glad to hear it.

Help: object has no attribute 'name'

I am trying to prune the EfficientNet implementation by lukemelas and I am getting:
AttributeError: 'SwishImplementationBackward' object has no attribute 'name'
I tried a lot of solutions that came to my mind, but all of them failed.
I also tried to find some documentation on

<class 'AccumulateGrad'>
<class 'ViewBackward'>
<class 'ViewBackward'>
<class 'MeanBackward1'>
<class 'ViewBackward'>

but I couldn't find any.
Swish looks like this, by the way:

class SwishImplementation(torch.autograd.Function):
    @staticmethod
    def forward(ctx, i):
        result = i * torch.sigmoid(i)
        ctx.save_for_backward(i)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        i = ctx.saved_variables[0]
        sigmoid_i = torch.sigmoid(i)
        return grad_output * (sigmoid_i * (1 + i * (1 - sigmoid_i)))


class MemoryEfficientSwish(nn.Module):
    def forward(self, x):
        return SwishImplementation.apply(x)

I appreciate any help on this... (I've spent about 5 hours on this, haha... I'm pretty sure I'm going to have another after this, but I want to give it a try.)

Also, if anyone has succeeded in filter pruning on EfficientNet, I would love to hear about your experience...

thanks

How do I compress the fully connected model

Hello, is it possible to compress such a model?

class DNN(nn.Module):    # DNN network
    def __init__(self, input_size, num_classes, HIDDEN_UNITS):
        super().__init__()
        self.fc1 = nn.Linear(input_size, HIDDEN_UNITS)
        self.fc2 = nn.Linear(HIDDEN_UNITS, num_classes)
    def forward(self, x):
        x = F.relu(self.fc1(x))
        y_hat = self.fc2(x)
        return y_hat
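A minimal sketch of how this DNN might be pruned with the same DependencyGraph/prune_linear calls used elsewhere in these issues; it assumes the DNN class above is defined (with its torch.nn / torch.nn.functional imports), and the sizes and the choice to prune half of fc1's outputs are only an example.

import torch
import torch_pruning as tp

model = DNN(input_size=784, num_classes=10, HIDDEN_UNITS=256)

DG = tp.DependencyGraph()
DG.build_dependency(model, example_inputs=torch.randn(1, 784))

# Prune half of fc1's hidden units; fc2.in_features is adjusted automatically
# through the dependency graph.
idxs = list(range(0, 256, 2))
plan = DG.get_pruning_plan(model.fc1, tp.prune_linear, idxs=idxs)
plan.exec()

print(model)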

How to assign 'example_inputs'? I get this error:

File "E:\workspace\Graduatio-Project\Torch-Pruning-master\torch_pruning\dependency.py", line 443, in _set_fc_index_transform
    stride = fc_in_features // feature_channels
ZeroDivisionError: integer division or modulo by zero
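The traceback suggests that the number of feature channels feeding the fully connected layer was computed as 0. A first thing to check is that example_inputs is a dummy batch with exactly the shape (including the batch dimension) that the model's forward() expects, as in the other issues here; the shape below is only an example.

import torch
import torch_pruning as tp

DG = tp.DependencyGraph()
# `model` is your network instance; replace the 1x3x224x224 shape with the
# input shape your model actually expects.
DG.build_dependency(model, example_inputs=torch.randn(1, 3, 224, 224))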

About weight mismatch

Hi @VainF! Thank you for your work, but I have a strange problem:
after pruning my own model, the weights of the model no longer match.
(RuntimeError: Given groups=1, weight of size [31, 32, 3, 3], expected input[16, 31, 38, 38] to have 32 channels, but got 31 channels instead)
Therefore, I checked the layers of the model and the results are as follows:
(0): Conv2d(3, 31, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): PReLU(num_parameters=1)
(2): Conv2d(32, 31, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): PReLU(num_parameters=1)
(4): Conv2d(32, 31, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): PReLU(num_parameters=1)
(6): Conv2d(32, 31, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): PReLU(num_parameters=1)
(8): Conv2d(32, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): PReLU(num_parameters=1)

The code I used is as follows:

import torch
import torch.nn as nn
import numpy as np
import torch_pruning as tp

def pruning():
    Model_saved = './Weight/'
    model = torch.load(Model_saved + 'Model.pth').cpu()

    DG = tp.DependencyGraph()
    DG.build_dependency(model, example_inputs=torch.randn(1, 3, 38, 38).float().cpu())

    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            mode = tp.prune_conv
        elif isinstance(m, nn.Linear):
            mode = tp.prune_linear
        elif isinstance(m, nn.BatchNorm2d):
            mode = tp.prune_batchnorm
        else:
            continue

        weight = m.weight.detach().cpu().numpy()
        out_channels = weight.shape[0]
        L1_norm = np.sum(np.abs(weight))
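        # note: summing over all axes gives a single scalar here; per-filter scores
        # are usually computed with axis=(1, 2, 3), as in the other issues above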
        num_pruned = int(out_channels * 0.2)
        prune_index = np.argsort(L1_norm)[:num_pruned].tolist()
        pruning_plan = DG.get_pruning_plan(m, mode, idxs=prune_index)
        print(pruning_plan)
        pruning_plan.exec()
    return model

In fact, all the Inception blocks in my model ran into this problem.
I'm not sure of the specific reason.
I hope I can get your advice; thank you for your help.

Global unstructured pruning

Hi, how can I implement global unstructured pruning using this library? It seems I can only prune individual layers and not the entire model at once.

Thanks

Are quantized networks supported?

Hi,
I'm curious to know whether quantized versions of networks are supported. I tried that today and faced this issue:

QuantizedResnet18 took 35.105 ms [min/max: 35.1/35.1] ms for one forward pass!
Size (MB): 22.23 (initial 87.9)
Number of Parameters: 0.0M
normal resnet took 3624.206 ms [min/max: 3624.2/3624.2] ms 
start of pruning...
Traceback (most recent call last):
  File "d:\Codes\face\python\FV\Pruning\prune.py", line 91, in <module>
    model = prune_model(model)
  File "d:\Codes\face\python\FV\Pruning\prune.py", line 76, in prune_model
    prune_conv( m.conv1, block_prune_probs[blk_id] )
  File "d:\Codes\face\python\FV\Pruning\prune.py", line 58, in prune_conv
    weight = conv.weight.detach().cpu().numpy()
AttributeError: 'function' object has no attribute 'detach'

Seems like quantized operators are not supported. Is that true, or am I missing something?
Thanks in advance

RecursionError during pruning

Hi, thanks for a wonderful tool. I am trying to test it out with a pretrained model from here. However, I am encountering the following error:

module name: encode1.conv0
pruning_idxs: [4, 5, 8, 9, 13, 14, 16, 17, 18, 19, 21, 23, 24, 26, 28, 30, 31, 32, 35, 36, 37, 38, 39, 40, 41, 43, 46, 48, 49, 53, 56, 58]
Traceback (most recent call last):
  File "/home/nikhil/projects/green_comp_neuro/FastSurfer/FastSurferCNN/torch_prune_test.py", line 159, in <module>
    load_pretrained(pretrained_ckpt, params_model, model, dummy_data, save_path)
  File "/home/nikhil/projects/green_comp_neuro/FastSurfer/FastSurferCNN/torch_prune_test.py", line 120, in load_pretrained
    model = torch_prune(model, dummy_data, params_model['prune_type'], params_model['prune_percent'])
  File "/home/nikhil/projects/green_comp_neuro/FastSurfer/FastSurferCNN/torch_prune_test.py", line 92, in torch_prune
    pruning_plan = DG.get_pruning_plan( module, tp.prune_conv, idxs=pruning_idxs )
  File "../../Torch-Pruning/torch_pruning/dependency.py", line 398, in get_pruning_plan
    _fix_denpendency_graph(root_node, pruning_fn, idxs)
  File "../../Torch-Pruning/torch_pruning/dependency.py", line 397, in _fix_denpendency_graph
    _fix_denpendency_graph(dep.broken_node, dep.handler, new_indices)
  File "../../Torch-Pruning/torch_pruning/dependency.py", line 397, in _fix_denpendency_graph
    _fix_denpendency_graph(dep.broken_node, dep.handler, new_indices)
  File "../../Torch-Pruning/torch_pruning/dependency.py", line 397, in _fix_denpendency_graph
    _fix_denpendency_graph(dep.broken_node, dep.handler, new_indices)
  [Previous line repeated 990 more times]
  File "../../Torch-Pruning/torch_pruning/dependency.py", line 387, in _fix_denpendency_graph
    new_indices = dep.index_transform(indices)
  File "../../Torch-Pruning/torch_pruning/dependency.py", line 148, in __call__
    if self.reverse==True:
RecursionError: maximum recursion depth exceeded in comparison

The network architecture is based on this paper. Here is a figure showing the details:
[figure: network architecture]

Below is my test script that uses the model definition and pretrained weights from the model repo

# IMPORTS
import argparse
import nibabel as nib
import numpy as np
from datetime import datetime
import time
import sys
import os
import glob
import os.path as op
import logging
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
import torch.nn.functional as F

from torch.autograd import Variable
from torch.utils.data.dataloader import DataLoader
from torchvision import transforms, utils

from scipy.ndimage.filters import median_filter, gaussian_filter
from skimage.measure import label, regionprops

from collections import OrderedDict
from os import makedirs

from models.networks import FastSurferCNN
import pandas as pd

# torch-pruning
sys.path.append('../../Torch-Pruning')
import torch_pruning as tp

def options_parse():
    """
    Command line option parser
    """
    parser = argparse.ArgumentParser()

    # Options for model parameters setup (only change if model training was changed)
    parser.add_argument('--num_filters', type=int, default=64,
                        help='Filter dimensions for DenseNet (all layers same). Default=64')
    parser.add_argument('--num_classes_ax_cor', type=int, default=79,
                        help='Number of classes to predict in axial and coronal net, including background. Default=79')
    parser.add_argument('--num_classes_sag', type=int, default=51,
                        help='Number of classes to predict in sagittal net, including background. Default=51')
    parser.add_argument('--num_channels', type=int, default=7,
                        help='Number of input channels. Default=7 (thick slices)')
    parser.add_argument('--kernel_height', type=int, default=5, help='Height of Kernel (Default 5)')
    parser.add_argument('--kernel_width', type=int, default=5, help='Width of Kernel (Default 5)')
    parser.add_argument('--stride', type=int, default=1, help="Stride during convolution (Default 1)")
    parser.add_argument('--stride_pool', type=int, default=2, help="Stride during pooling (Default 2)")
    parser.add_argument('--pool', type=int, default=2, help='Size of pooling filter (Default 2)')

    sel_option = parser.parse_args()

    return sel_option

def torch_prune(model,dummy_data,prune_type,prune_percent):

    print(f'compressing model with prune type: {prune_type}, sparsity: {prune_percent}')

    # 1. setup strategy (L1 Norm)
    strategy = tp.strategy.L1Strategy() # or tp.strategy.RandomStrategy()

    # 2. build layer dependency for resnet18
    DG = tp.DependencyGraph()
    DG.build_dependency(model, example_inputs=dummy_data)

    # 3. get a pruning plan from the dependency graph.
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Conv2d):
            print(f'module name: {name}')
           
            pruning_idxs = strategy(module.weight, amount=prune_percent) # or manually selected pruning_idxs=[2, 6, 9, ...]
            print(f'pruning_idxs: {pruning_idxs}')
            pruning_plan = DG.get_pruning_plan( module, tp.prune_conv, idxs=pruning_idxs )
            print(pruning_plan)

            # 4. execute this plan (prune the model)
            pruning_plan.exec()

    # return the pruned model so that `model = torch_prune(...)` does not end up as None
    return model


def load_pretrained(pretrained_ckpt, params_model, model):
    model_state = torch.load(pretrained_ckpt, map_location=params_model["device"])
    new_state_dict = OrderedDict()

    # FastSurfer model specific configs
    for k, v in model_state["model_state_dict"].items():

        if k[:7] == "module." and not params_model["model_parallel"]:
            new_state_dict[k[7:]] = v

        elif k[:7] != "module." and params_model["model_parallel"]:
            new_state_dict["module." + k] = v

        else:
            new_state_dict[k] = v

    model.load_state_dict(new_state_dict)
    model.eval()
    
    return model

if __name__ == "__main__":

    args = options_parse() 

    plane = "Axial"
    pretrained_ckpt = f'../checkpoints/{plane}_Weights_FastSurferCNN/ckpts/Epoch_30_training_state.pkl'

    # Put it onto the GPU or CPU
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    # Set up model for axial and coronal networks
    params_model = {'num_channels': args.num_channels, 'num_filters': args.num_filters,
                      'kernel_h': args.kernel_height, 'kernel_w': args.kernel_width,
                      'stride_conv': args.stride, 'pool': args.pool,
                      'stride_pool': args.stride_pool, 'num_classes': args.num_classes_ax_cor,
                      'kernel_c': 1, 'kernel_d': 1,
                      'model_parallel': False,
                      'device': device
                      }

    # Select the model
    model = FastSurferCNN(params_model)
    model.to(device)
 
    # Load pretrained weights
    model = load_pretrained(pretrained_ckpt, params_model, model)

    # Prune model
    dummy_data = torch.ones(1, 7, 256, 256)
    model = torch_prune(model, dummy_data, prune_type='L1', prune_percent=0.5)

    # Save pruned model
    # save_path = f'./{plane}_pruned.pth'
    # torch.save(model, save_path)

I will appreciate any help or suggestions! Thanks!
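
Not a confirmed fix, but one quick diagnostic, assuming the graph is genuinely deep rather than cyclic: raise Python's recursion limit before pruning. If the error persists even with a very large limit, the traversal is probably looping through the dense skip connections rather than just running deep.

import sys

# The dependency traversal in dependency.py is recursive, so a deep or densely
# connected graph can exhaust Python's default limit of ~1000 stack frames.
sys.setrecursionlimit(100000)

model = torch_prune(model, dummy_data, prune_type='L1', prune_percent=0.5)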

Is it necessary to transfer model to cpu?

Hello. In torch_pruning/dependency.py there is a line model.eval().cpu(). Because of this I can't use the RAFT model (an optical flow model) that I'm currently researching (it fails on

raise RuntimeError("module must have its parameters and buffers "
                                   "on device {} (device_ids[0]) but found one of "
                                   "them on device: {}".format(self.src_device_obj, t.device))

even if I'm transferring it to the CPU myself). But if I comment out the aforementioned model.eval().cpu() line, then the program passes through
DG.build_dependency(model, example_inputs=[torch.randn(1, 3, 440, 1024), torch.randn(1, 3, 440, 1024)])
just fine. So, is the model.eval().cpu() line necessary in torch_pruning? Does torch_pruning work only on the CPU?

Thanks in advance.
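
A hedged workaround sketch (assuming the CPU move is only needed while the graph is traced): keep the dependency building and pruning on the CPU and move the pruned model back to the GPU afterwards; wrappers such as nn.DataParallel are best applied only after pruning.

import torch
import torch_pruning as tp

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Trace and prune while the model lives on the CPU (un-wrap DataParallel first) ...
model = model.cpu()
DG = tp.DependencyGraph()
DG.build_dependency(
    model,
    example_inputs=[torch.randn(1, 3, 440, 1024), torch.randn(1, 3, 440, 1024)],
)
# ... build and exec() the pruning plans here ...

# ... then move the (now smaller) pruned model back to the GPU and re-wrap it.
model = model.to(device)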

Functionality to add rounding of filters number for pruning

Hi. I think it would be useful to add functionality for rounding the number of pruned channels to a given multiple (32 or 16, for example). I've implemented it locally in the prune/strategy.py script, and it really accelerates inference speed!
If it would be useful to others, I can try to open a pull request with this functionality this week. Any thoughts?
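
For reference, a minimal sketch of what such a rounding strategy could look like (the class name and the round_to parameter are hypothetical; the actual pull request may differ):

import torch

class RoundedL1Strategy:
    """L1-magnitude strategy that rounds the number of kept channels to a
    multiple of `round_to`, so pruned layer widths stay hardware friendly
    (e.g. multiples of 16 or 32)."""

    def __call__(self, weights: torch.Tensor, amount: float = 0.2, round_to: int = 16):
        out_channels = weights.shape[0]
        l1_norm = torch.norm(weights.reshape(out_channels, -1), p=1, dim=1)

        n_to_prune = int(out_channels * amount)
        # Keep a multiple of `round_to` channels (never fewer than `round_to`).
        n_keep = max(round_to, ((out_channels - n_to_prune) // round_to) * round_to)
        n_to_prune = out_channels - n_keep

        if n_to_prune <= 0:
            return []
        # Prune the channels with the smallest L1 norms.
        return torch.argsort(l1_norm)[:n_to_prune].tolist()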

Prune Conv to FC bug

Hi @VainF,
After pruning conv -> linear, shouldn't the shape of the linear layer be (8, 12) instead of (8, 15)? Each of the conv's 4 output channels maps to 2×2 = 4 flattened features, so removing one channel should remove 4 input features, not 1. Here is a minimal working example to reproduce:

import sys, os
sys.path.append(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_pruning as tp

def seed_everything(seed: int):
    import random, os
    import numpy as np
    import torch
    
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = True
    
seed_everything(42)

class NN(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=2, out_channels=4, kernel_size=3)
        self.linear1 = nn.Linear(in_features=16, out_features=8)
        
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = torch.flatten(x)
        x = self.linear1(x)
        x = F.relu(x)
        return x

x = torch.randn(1,2,4,4)
model = NN()
strategy = tp.strategy.RandomStrategy()
idxs = strategy(model.conv1.weight, amount=0.25)
DG = tp.DependencyGraph()
DG.build_dependency(model, example_inputs=x)

print(model.conv1.weight.shape) # (4, 2, 3, 3)
print(model.linear1.weight.shape) # (8, 16) 

pruning_plan = DG.get_pruning_plan(model.conv1, tp.prune_conv, idxs=idxs)
pruning_plan.exec()

print(model.conv1.weight.shape) # Expected: (3, 2, 3, 3) / Res: (3, 2, 3, 3)
print(model.linear1.weight.shape) # Expected: (8, 12) / Res: (8, 15) 

Keyerror: MkldnnConvolutionBackward with 1D Layers

Hello,

I encountered an issue when trying to use pruning.DependencyGraph function with a CNN that has Conv1d, BatchNorm1d and F.avg_pool1d.

CNN Class

class Conv1DNet(nn.Module):
    def __init__(self, num_classes=10):
        super(Conv1DNet, self).__init__()
 
        self.conv1 = nn.Conv1d(1, 64, kernel_size=3, stride=2, padding=1, bias=False)
        self.bn1 = nn.BatchNorm1d(64)
        self.linear = nn.Linear(512, num_classes)
 
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.avg_pool1d(out, 2*16*281)
        feature = out.view(out.size(0), -1)
        out = self.linear(feature)
        return out

Code run

model = Conv1DNet()

DG = pruning.DependencyGraph(model, fake_input=torch.randn(1,1, 144000))

pruning_plan = DG.get_pruning_plan(model.conv1, pruning.prune_conv, idxs=[2, 6, 9] )
print(pruning_plan)

pruning_plan.exec()

Output

Traceback (most recent call last):
  File "issue_torch_pruning.py", line 27, in <module>
    DG = pruning.DependencyGraph(model, fake_input=torch.randn(1,1, 144000) )
  File "C:\Users\User\Anaconda3\envs\env-gpu\lib\site-packages\torch_pruning\dependency.py", line 186, in __init__
    self.build_dependency(model, fake_input)
  File "C:\Users\User1\Anaconda3\envs\env-gpu\lib\site-packages\torch_pruning\dependency.py", line 321, in build_dependency
    self._traverse_graph(out.grad_fn)
  File "C:\Users\User1\Anaconda3\envs\env-gpu\lib\site-packages\torch_pruning\dependency.py", line 437, in _traverse_graph
    _recursively_detect_dependencies(begin_node, 0)
  File "C:\Users\User1\Anaconda3\envs\env-gpu\lib\site-packages\torch_pruning\dependency.py", line 434, in _recursively_detect_dependencies
    _recursively_detect_dependencies(u[0], path_id)
  File "C:\Users\User1\Anaconda3\envs\env-gpu\lib\site-packages\torch_pruning\dependency.py", line 434, in _recursively_detect_dependencies
    _recursively_detect_dependencies(u[0], path_id)
  File "C:\Users\User1\Anaconda3\envs\env-gpu\lib\site-packages\torch_pruning\dependency.py", line 407, in _recursively_detect_dependencies
    node_module = self.grad_fn_to_module[node]
KeyError: <MkldnnConvolutionBackward object at 0x000001A1AE08B448>

Thanks for your help !

Resnet Pruning Confusion

Could you please clarify: when pruning a ResNet, if we prune conv1 in a residual block and remove, say, indices (1, 3, 4) in that layer (according to the L1 norm),
do the layers that depend on this layer get pruned at indices (1, 3, 4) only,
OR
will the L1 norm be applied again for the dependent layers?
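
One way to check this empirically (a sketch; the layer path follows the torchvision ResNet-18 layout) is to print the pruning plan, which lists every dependent layer together with the exact indices that will be removed from it:

import torch
import torch_pruning as tp
from torchvision.models import resnet18

model = resnet18()
DG = tp.DependencyGraph()
DG.build_dependency(model, example_inputs=torch.randn(1, 3, 224, 224))

# Ask to remove filters 1, 3, 4 from the first conv of a residual block and
# inspect how those indices propagate to the dependent BN / conv layers.
plan = DG.get_pruning_plan(model.layer1[0].conv1, tp.prune_conv, idxs=[1, 3, 4])
print(plan)

As far as the dependency code suggests (it passes the indices through index_transform rather than re-ranking them), the dependent layers are pruned at the propagated channel positions; the L1 criterion is only applied once, to select the indices for the layer you start from.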

Unable to prune

When I try to prune https://github.com/chenjun2hao/DDRNet.pytorch
I get the following error.

def prune_model(model):
    model.cpu()
    DG = tp.DependencyGraph().build_dependency(model, torch.randn((1, 3, 1024, 2048)))

    def prune_conv(conv, amount=0.2):
        strategy = tp.strategy.L1Strategy()
        pruning_index = strategy(conv.weight, amount=amount)
        print(pruning_index)
        # weight = conv.weight.detach().cpu().numpy()
        # out_channels = weight.shape[0]
        # L1_norm = np.sum( np.abs(weight), axis=(1,2,3))
        # num_pruned = int(out_channels * amount)
        # pruning_index = np.argsort(L1_norm)[:num_pruned].tolist() # remove filters with small L1-Norm
        plan = DG.get_pruning_plan(conv, tp.prune_conv, pruning_index)
        plan.exec()

    prunable_modules = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    for layer_to_prune in prunable_modules:
        print(layer_to_prune)
        prune_conv(layer_to_prune, 0.5)

    return model

=> loading final_layer.bn1.num_batches_tracked from pretrained model
=> loading final_layer.conv1.weight from pretrained model
=> loading final_layer.bn2.weight from pretrained model
=> loading final_layer.bn2.bias from pretrained model
=> loading final_layer.bn2.running_mean from pretrained model
=> loading final_layer.bn2.running_var from pretrained model
=> loading final_layer.bn2.num_batches_tracked from pretrained model
=> loading final_layer.conv2.weight from pretrained model
=> loading final_layer.conv2.bias from pretrained model

Number of Parameters before pruner: 5.7M
torch.Size([1, 3, 128, 256])
Traceback (most recent call last):
  File "./tools/pruner_new.py", line 142, in <module>
    main()
  File "./tools/pruner_new.py", line 134, in main
    prune_model(model)
  File "./tools/pruner_new.py", line 84, in prune_model
    prune_conv(layer_to_prune, 0.8)
  File "./tools/pruner_new.py", line 63, in prune_conv
    plan = DG.get_pruning_plan(conv, tp.prune_conv, pruning_index)
  File "/algdata02/yiming.yu/DDRNet.pytorch_pruner/envp_20210903/lib/python3.7/site-packages/torch_pruning/dependency.py", line 378, in get_pruning_plan
    root_node = self.module_to_node[module]
KeyError: Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
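
A hedged workaround (assuming the failing conv simply never appears in the traced graph, e.g. because it is not reached by the forward pass for the given example input): skip layers the dependency graph does not know about instead of letting get_pruning_plan raise the KeyError.

prunable_modules = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
for layer_to_prune in prunable_modules:
    # DG.module_to_node only contains modules that were reached while tracing
    # example_inputs; anything else cannot be pruned through the graph.
    if layer_to_prune not in DG.module_to_node:
        print('skipping (not in dependency graph):', layer_to_prune)
        continue
    prune_conv(layer_to_prune, 0.5)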

KeyError while attempting to prune yolov5 model

Hello,

I am trying to prune yolov5 model using Torch_Pruning. It fails with error message:
KeyError: Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))

Detailed Traceback:
Traceback (most recent call last):
  File "models/prune_TP_git_issue.py", line 117, in <module>
    new_model = prune_model(model, img)
  File "models/prune_TP_git_issue.py", line 64, in prune_model
    prune_conv(mm3, SPARSITY)
  File "models/prune_TP_git_issue.py", line 48, in prune_conv
    raise e
  File "models/prune_TP_git_issue.py", line 46, in prune_conv
    plan = DG.get_pruning_plan(conv, tp.prune_conv, prune_index)
  File "/home/tfs/venv_yolov5_Torch_Pruning/lib/python3.6/site-packages/torch_pruning-0.2.2-py3.6.egg/torch_pruning/dependency.py", line 330, in get_pruning_plan
KeyError: Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))

Attaching the prune_TP.py code.
(a) save the code under models dir
(b) download pre-trained model and place it under weights dir
(c) invoke the code from the main dir as follows:
$ python models/prune_TP_git_issue.py

Also attaching the logfile from this run. It looks like DG.module_to_node dictionary is not built correctly - it seems to have only 1 entry:
------------ BEGIN : DG.module_to_node ---------------------
{Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1)): <Node: (model.24.m.2 (Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))), None)>}
------------ END : DG.module_to_node ---------------------

prune_TP_git_issue.yolov5.py.zip
log.prune_TP_git_issue.txt

Not able to handle Conv-FC Dependency

I used the model below, trained it, and pruned all the conv layers with probability 0.3 using the L1 norm, as given in the code.

I gave the same kind of input as in the ResNet example,
but it is not handling the Conv-FC dependency.

class my_model(nn.Module):
  def __init__(self):
    super(my_model,self).__init__()
    self.conv1 = nn.Conv2d(3,16,kernel_size=3,stride=1,padding=1)
    self.conv2 = nn.Conv2d(16,32,kernel_size=3,stride=1,padding=1)
    self.conv3 = nn.Conv2d(32,64,kernel_size=3,stride=1,padding=1)
    self.pool = nn.MaxPool2d(2, 2)
    self.fc1 = nn.Linear(4*4*64,64)
    self.fc2 = nn.Linear(64,10)
  def forward(self,inp):
    ab = self.pool(F.relu(self.conv1(inp)))
    ab = self.pool(F.relu(self.conv2(ab)))
    ab = self.pool(F.relu(self.conv3(ab)))
    ab = ab.view(ab.shape[0],-1)
    ab = F.relu(self.fc1(ab))
    ab = F.relu(self.fc2(ab))
    return ab
    

I got this warning
Warning: Unrecognized Conv-FC Dependency. Please handle the dependency manually
Warning: Unrecognized Conv-FC Dependency. Please handle the dependency manually

Can you please help with this?
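
A possible manual workaround, sketched in plain PyTorch rather than a library API: since fc1 = nn.Linear(4*4*64, 64) and the flattened features are channel-major, each conv3 output channel owns a contiguous block of 4*4 = 16 columns of fc1, and those columns can be dropped by hand after the conv pruning. The helper below and the index list [2, 5, 7] are illustrative, not from the original code.

import torch.nn as nn

def prune_fc_inputs_for_conv(fc: nn.Linear, pruned_channel_idxs, features_per_channel=4 * 4):
    # Rebuild `fc` without the input columns that belonged to the pruned conv
    # channels. Assumes the conv output was flattened channel-major (the default
    # for .view / torch.flatten), so channel c occupies columns
    # [c * features_per_channel, (c + 1) * features_per_channel).
    remove_cols = set()
    for c in pruned_channel_idxs:
        remove_cols.update(range(c * features_per_channel, (c + 1) * features_per_channel))
    keep_cols = [i for i in range(fc.in_features) if i not in remove_cols]

    new_fc = nn.Linear(len(keep_cols), fc.out_features, bias=fc.bias is not None)
    new_fc.weight.data = fc.weight.data[:, keep_cols].clone()
    if fc.bias is not None:
        new_fc.bias.data = fc.bias.data.clone()
    return new_fc

# Example (hypothetical indices): if channels [2, 5, 7] were pruned from conv3,
# shrink fc1 to match before running a forward pass.
# model.fc1 = prune_fc_inputs_for_conv(model.fc1, [2, 5, 7])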

how to get the definition of model class after pruning

Thanks for the excellent work! In general, though, deployment needs the definition of the model class. Shipping only the model weights and architecture via torch.save(model, save_path) can cause trouble on other machines (for example, when the class definition is not importable there). Is there a way to get the pruned model's definition? Thanks :D
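
One common way around this (a general PyTorch approach, not a feature of this library): export the pruned model to TorchScript, which serializes the architecture together with the weights, so the deployment machine does not need the Python class definition. A minimal sketch, assuming an image model with a 1x3x224x224 input:

import torch

model.eval()
example = torch.randn(1, 3, 224, 224)  # adjust to your model's expected input

# torch.jit.trace records the executed graph and bundles it with the weights,
# so torch.jit.load() works on another machine without the original class.
scripted = torch.jit.trace(model, example)
scripted.save("pruned_model_scripted.pt")

# On the deployment machine:
# restored = torch.jit.load("pruned_model_scripted.pt")
# out = restored(example)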

Yolov5 and Detectron R101FPN model for pruning

@JonnyKong @VainF thanks for sharing your repo, I have the following queries:

  1. Can we perform pruning on the yolov5 v4 s/l/m versions of the model?
  2. Can a Detectron pretrained model such as FasterRCNN-R101FPN be pruned, and if so, what steps should be followed?

Thanks in advance.
