
only_train_once's Introduction

Only Train Once (OTO): Automatic One-Shot DNN Training And Compression Framework



This repository is the (deprecated) PyTorch implementation of Only-Train-Once (OTO). OTO is an automatic, architecture-agnostic DNN training and compression framework (via structured pruning and erasing operators). With OTO, users can train a general DNN, either from scratch or from a pretrained checkpoint, to achieve both high performance and a slimmer architecture simultaneously in a one-shot manner (without fine-tuning).

Publications

Please see the Citation section for our series of works and their BibTeX entries.


In addition, we recommend our following efficient ML works.

Installation

We recommend running the framework under PyTorch >= 2.0. Install it with pip or clone the repository with git.

pip install only_train_once

or

git clone https://github.com/tianyic/only_train_once.git

Quick Start

We provide an example of how to use the OTO framework. More detailed explanations can be found in the tutorials.

Minimal usage example.

import torch
from sanity_check.backends import densenet121
from only_train_once import OTO

# Create OTO instance
model = densenet121()
dummy_input = torch.zeros(1, 3, 32, 32)
oto = OTO(model=model.cuda(), dummy_input=dummy_input.cuda())

# Create HESSO optimizer
optimizer = oto.hesso(variant='sgd', lr=0.1, target_group_sparsity=0.7)

# Train the DNN as normal via HESSO.
# `trainloader` and `max_epoch` are assumed to be defined by the user.
model.train()
model.cuda()
criterion = torch.nn.CrossEntropyLoss()
for epoch in range(max_epoch):
    for X, y in trainloader:
        X, y = X.cuda(), y.cuda()
        y_pred = model(X)
        f = criterion(y_pred, y)
        optimizer.zero_grad()
        f.backward()
        optimizer.step()

# A compressed densenet will be generated. 
oto.construct_subnet(out_dir='./')

How the pruning mode in OTO works.

  • Pruning Zero-Invariant Group Partition. OTO first automatically analyzes the dependencies inside the target DNN to build a pruning dependency graph. It then partitions the DNN's trainable variables into so-called Pruning Zero-Invariant Groups (PZIGs). A PZIG is a minimally removable structure of the DNN; it can largely be interpreted as the minimal group of variables that must be pruned together.
  • Hybrid Structured Sparse Optimizer. A structured sparsity optimization problem is formulated. A hybrid structured sparse optimizer (HESSO, DHSPG, or LHSPG) is then employed to determine which PZIGs are redundant and which are important for the model prediction. The selected hybrid optimizer explores group sparsity more reliably and typically achieves higher generalization performance than other sparse optimizers.

  • Construct pruned model. The structures corresponding to redundant PZIGs (being zero) are removed to form the pruned model. Due to the property of PZIGs, the pruned model returns the exact same output as the full model. Therefore, no further fine-tuning is required.
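
The zero-invariance idea can be illustrated with a toy example. The sketch below is not OTO code: it uses a hypothetical two-layer network written with plain Python lists, and it assumes that a "group" is one hidden unit (a row of W1 plus its bias). When that group is entirely zero, the unit outputs zero, so removing it together with the matching column of the next layer leaves the output unchanged.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def linear(W, b, x):
    # One output per row of W: dot(row, x) + bias.
    return [sum(w * a for w, a in zip(row, x)) + bias for row, bias in zip(W, b)]

def forward(W1, b1, W2, b2, x):
    return linear(W2, b2, relu(linear(W1, b1, x)))

# Hypothetical weights; hidden unit 1 (row 1 of W1 and b1[1]) is a zero group.
W1 = [[0.5, -1.0], [0.0, 0.0], [2.0, 0.3]]
b1 = [0.1, 0.0, -0.2]
W2 = [[1.0, 4.0, -0.5], [0.2, -3.0, 0.7]]
b2 = [0.05, -0.05]
x = [1.5, -0.4]

# Keep only the hidden units whose group is not entirely zero.
keep = [i for i in range(len(W1)) if any(W1[i]) or b1[i]]

# Prune the rows of W1/b1 and the matching input columns of W2 together.
W1_pruned = [W1[i] for i in keep]
b1_pruned = [b1[i] for i in keep]
W2_pruned = [[row[i] for i in keep] for row in W2]

full_output = forward(W1, b1, W2, b2, x)
pruned_output = forward(W1_pruned, b1_pruned, W2_pruned, b2, x)
print(full_output == pruned_output)
```

In a real DNN the variables that must be removed together span more operators (for example batch normalization and skip connections), which is why the grouping is derived from a dependency graph rather than by hand.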


Sanity Check

The sanity check provides tests for the pruning mode of OTO on various DNNs, from CNNs to LLMs. Passing the sanity check indicates that OTO is compatible with the target DNN.

python sanity_check/sanity_check.py

Note that some tests require additional dependencies. Comment out the tests you do not need. We highly recommend running a sanity check on any new customized DNN to test compatibility.

Visualization

The visual_examples provide visualizations of pruning dependency graphs and erasing dependency graphs. Visualization is a frequently used debugging tool when applying OTO to a new, unseen DNN raises errors.

To do list

  • Add more explanations into the current repository.

  • Release a technical report on the HESSO optimizer, which is not yet discussed in our papers.

  • Release refactored DHSPG and LHSPG.

  • Release the full pipeline of LoRAShear (upon business administration).

  • Provide more tutorials to cover the experiments in the pruning mode. Main experiments in OTOv2 can be found at otov2_branch.

  • Release official erasing mode after the review process of OTOv3.

  • Provide documentation of the OTO API.

Welcome Contribution

We would greatly appreciate contributions in any form, such as bug fixes, new features, and new tutorials, from our open-source community.

We are humbled to serve the AI community, and we look forward to working with the community to make DNN training and compression more automatic and convenient.

Open for collaboration.

We are open to and happy about collaborations. Feel free to reach out to [email protected] if you have any interesting ideas.

Legacy OTOv2 repository

The previous OTOv2 repo has been moved into legacy_branch for academic replication.

Citation

If you find the repo useful, please kindly star this repository and cite our papers:

For OTOv3 preprint
@article{chen2023otov3,
  title={OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators},
  author={Chen, Tianyi and Ding, Tianyu and Zhu, Zhihui and Chen, Zeyu and Wu, HsiangTao and Zharkov, Ilya and Liang, Luming},
  journal={arXiv preprint arXiv:2312.09411},
  year={2023}
}

For LoRAShear preprint
@article{chen2023lorashear,
  title={LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery},
  author={Chen, Tianyi and Ding, Tianyu and Yadav, Badal and Zharkov, Ilya and Liang, Luming},
  journal={arXiv preprint arXiv:2310.18356},
  year={2023}
}

For AdaHSPG+ publication in TMLR (theoretical optimization paper)
@article{dai2023adahspg,
  title={An adaptive half-space projection method for stochastic optimization problems with group sparse regularization},
  author={Dai, Yutong and Chen, Tianyi and Wang, Guanyi and Robinson, Daniel P},
  journal={Transactions on machine learning research},
  year={2023}
}

For OTOv2 publication in ICLR 2023
@inproceedings{chen2023otov2,
  title={OTOv2: Automatic, Generic, User-Friendly},
  author={Chen, Tianyi and Liang, Luming and Tianyu, DING and Zhu, Zhihui and Zharkov, Ilya},
  booktitle={International Conference on Learning Representations},
  year={2023}
}

For OTOv1 publication in NeurIPS 2021
@inproceedings{chen2021otov1,
  title={Only Train Once: A One-Shot Neural Network Training And Pruning Framework},
  author={Chen, Tianyi and Ji, Bo and Tianyu, DING and Fang, Biyi and Wang, Guanyi and Zhu, Zhihui and Liang, Luming and Shi, Yixin and Yi, Sheng and Tu, Xiao},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}
}

only_train_once's People

Contributors

c0ngtri123, iamanigeeit, miocio-nora, nadav-out, tianyic, xloem


only_train_once's Issues

ONNX opset setting

Hi @tianyic ,

I am running OTO on my project. However, it fails with the following error: "RuntimeError: Exporting the operator im2col to ONNX opset version 9 is not supported. Support for this operator was added in version 11, try exporting with this version."

I use PyTorch 1.10.1, which supports opset 13. However, I could not find an interface for setting the opset version. How can I fix this problem?

Thanks.

Problem with compress model

Thanks for sharing the code.
However, I ran into a problem when compressing a model.
The code below produces an error.

attentionNet [network]

import numpy as np
import torch
import torch.nn as nn
from torch.nn.parameter import Parameter
import torch.nn.functional as F
from only_train_once import OTO
from modelDefinition.attentionGen import attentionNet

model = attentionNet()
dummy_input = torch.zeros(1, 3, 128, 128).cuda()
oto = OTO(model=model, dummy_input=dummy_input)
oto.visualize_zigs(view=False)
oto.random_set_zero_groups() # Randomly set a subset of ZIGs to be zero.
oto.compress()

Error message

workspace/only_train_once/only_train_once/graph/graph.py", line 377, in compress
    pruned_onnx_param = numpy_param[:, incoming_cc.non_zero_group_idxes, ...]
IndexError: index 1 is out of bounds for axis 1 with size 1

So I added the code below, and the error went away.
Does this make sense?

#only_train_once/only_train_once/graph/graph.py#
filtered_non_zero_group_idxes = [x for x in incoming_cc.non_zero_group_idxes if x < numpy_param.shape[1]]
incoming_cc.non_zero_group_idxes = filtered_non_zero_group_idxes

Please let me know how to solve the problem, thank you.

torch     1.13.0
only-train-once     2.0.13
onnx     1.13.1

two problems about algorithm convergence

First problem: why is convergence so slow? Below is my optimizer setting:
opt = oto.dhspg(variant='sgd', lr=0.01, first_momentum=0.9, lmbda=1e-2, lmbda_amplify=20, hat_lmbda_coeff=100, target_group_sparsity=0.3, weight_decay=1e-4, start_pruning_steps=20 * len(train_loader), epsilon=0.9)
With 150 epochs in total, group_sparsity has only reached 0.14 after 120 epochs. How can I improve this?

Second problem: why is the final group_sparsity greater than target_group_sparsity? Below is my optimizer setting:
opt = oto.dhspg(variant='adam', target_group_sparsity=0.3, weight_decay=1e-4, start_pruning_steps=20 * len(train_loader), epsilon=0.9)
With 150 epochs in total, the group_sparsity after 120 epochs is 0.634 (greater than 0.3). What is the problem?

I look forward to your response. Thanks a lot!
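
For readers comparing the logged value against target_group_sparsity, here is a minimal sketch of how a group sparsity figure can be computed. It assumes that group sparsity means the fraction of groups whose norm is zero; the function name and the example groups are hypothetical and are not taken from the OTO source.

```python
import math

def group_sparsity(groups, tol=0.0):
    """Fraction of groups whose Euclidean norm is <= tol (assumed definition)."""
    if not groups:
        return 0.0
    zero_groups = sum(
        1 for g in groups if math.sqrt(sum(v * v for v in g)) <= tol
    )
    return zero_groups / len(groups)

# Hypothetical flattened parameter groups: three of the five are entirely zero.
groups = [[0.0, 0.0], [0.3, -0.1], [0.0, 0.0], [1.2, 0.4], [0.0, 0.0]]
target = 0.3
achieved = group_sparsity(groups)
print(achieved, achieved > target)
```

Logging this value once per epoch next to the target makes it easy to see whether sparsity is still rising, has stalled, or has overshot.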

Allowing dummy_input to be tuple / dict

I know I should open a pull request, but this is a quick edit. The current code is:

def _get_trace_graph(self, model, dummy_input, optimized_onnx=False):
    # Run the Pytorch graph to get a trace and generate a graph from it
    trace_graph = None
    with torch.no_grad():
        trace_graph, _ = torch.jit._get_trace_graph(model, dummy_input)

I modified it to work with sequences of tensors and with dicts that map argument names to tensors. This is very common when running model(**batch).

import inspect
    def _get_trace_graph(self, model, dummy_input, optimized_onnx=False):
        # Run the Pytorch graph to get a trace and generate a graph from it
        trace_graph = None
        with torch.no_grad():
            if isinstance(dummy_input, dict):
                forward_args = inspect.signature(model.forward).parameters.keys()
                input_tensors = []
                for argname in forward_args:
                    if argname not in ['args', 'kwargs']:
                        if argname in dummy_input:
                            input_tensor = dummy_input[argname]
                            input_tensors.append(input_tensor)
                            print(argname, input_tensor.shape)
                        else:
                            input_tensors.append(None)
                input_tensors = tuple(input_tensors)
            elif isinstance(dummy_input, torch.Tensor):
                input_tensors = (dummy_input,)
            else:
                input_tensors = tuple(dummy_input)
            trace_graph, _ = torch.jit._get_trace_graph(model, args=input_tensors)
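
The dict-to-positional mapping above can be tried without PyTorch. The sketch below is a simplified variant, not the patch itself: `dict_to_positional` and `toy_forward` are hypothetical names, and variadic parameters are skipped by parameter kind rather than by the literal names 'args' and 'kwargs'.

```python
import inspect

def dict_to_positional(forward_fn, inputs):
    """Order the values of `inputs` by the parameter order of `forward_fn`."""
    ordered = []
    for name, param in inspect.signature(forward_fn).parameters.items():
        if param.kind in (inspect.Parameter.VAR_POSITIONAL,
                          inspect.Parameter.VAR_KEYWORD):
            continue  # skip *args / **kwargs
        ordered.append(inputs.get(name))  # None for arguments not supplied
    return tuple(ordered)

# Hypothetical forward signature, similar to what model(**batch) would call.
def toy_forward(input_ids, attention_mask=None, labels=None, **kwargs):
    return input_ids, attention_mask, labels

batch = {"attention_mask": [1, 1, 0], "input_ids": [101, 7, 102]}
positional = dict_to_positional(toy_forward, batch)
print(positional)
```

Note that the dict order does not matter: the values are placed according to the forward signature, and missing arguments become None.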

Is it normal to have different test results on same model and data?

Hello @tianyic,

I was running the sanity_check tests on test_convnexttiny.py and got different results despite using a fixed dummy_input.

dummy_input = 0.5 * torch.ones(size=(1, 3, 224, 224), dtype=torch.float32)

Test Run 1

Maximum output difference :  1.452776312828064
Size of full model        :  0.10655930824577808 GBs
Size of compress model    :  0.01566738821566105 GBs
FLOP  reduction (%)       :  0.5265925908672965
Param reduction (%)       :  0.8535776291678294

Test Run 2

OTO graph constructor
graph build
Maximum output difference :  1.4196803569793701
Size of full model        :  0.10655930824577808 GBs
Size of compress model    :  0.02001185156404972 GBs
FLOP  reduction (%)       :  0.5471079567314201
Param reduction (%)       :  0.8127864514599561

Test Run 3

OTO graph constructor
graph build
Maximum output difference :  1.5195997953414917
Size of full model        :  0.10655930824577808 GBs
Size of compress model    :  0.04547972418367863 GBs
FLOP  reduction (%)       :  0.4412873512719674
Param reduction (%)       :  0.5736049577741684

The difference is quite large, so I want to ask whether this is normal.
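
One possible source of run-to-run variation is random group selection: other reports on this page call oto.random_set_zero_groups() before compressing. Whether that explains the numbers above is an assumption. The sketch below only shows the general point that an unseeded random choice of groups differs between runs while a seeded one does not; `pick_zero_groups` is a hypothetical helper, not an OTO function.

```python
import random

def pick_zero_groups(num_groups, fraction, seed=None):
    """Randomly choose which group indices to zero out."""
    rng = random.Random(seed)  # seed=None draws from OS entropy
    count = int(num_groups * fraction)
    return sorted(rng.sample(range(num_groups), count))

seeded_a = pick_zero_groups(20, 0.5, seed=0)
seeded_b = pick_zero_groups(20, 0.5, seed=0)
unseeded = pick_zero_groups(20, 0.5)

print(seeded_a == seeded_b)   # same seed, same selection
print(seeded_a == unseeded)   # usually differs from run to run
```

In a PyTorch script, fixing the seeds of torch, numpy, and random before building the OTO instance plays the same role.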

Problems when using OTOv2 on Yolov6

There are some problems when generating ZIGs from YOLOv6s.
When I run
dummy_input = torch.zeros(1, 3, 640, 640)
self.oto = OTO(model=model.cuda(), dummy_input=dummy_input.cuda())
self.oto.visualize_zigs()
the ZIGs are generated incorrectly: the Concat is not grouped with the adjacent Stem vertices, but instead forms a ZIG of its own with no parameters inside. This causes some bugs.

File "E:\anaconda\envs\ai_learn\lib\site-packages\only_train_once\graph\graph.py", line 579, in params_groups
channel_num = cc_param_groups['shapes'][0][0]
IndexError: list index out of range

Is there any way to fix the grouping problem directly? Alternatively, if I set the Concat ZIG's channel_num = 1, will it cause other problems in later steps?

How to set epsilon param of dhspg?

In the tutorials, epsilon is set to 0.95, but the paper's experiments and theoretical analysis recommend a value in the range [0.0, 0.05], which is confusing.
optimizer = oto.dhspg(
    variant='sgd',
    lr=0.1,
    target_group_sparsity=0.7,
    weight_decay=1e-4,
    start_pruning_steps=50 * len(trainloader), # start pruning after 50 epochs
    epsilon=0.95)

Below is my code:
opt = oto.dhspg(
    variant='sgd',
    lr=0.01,
    target_group_sparsity=0.3,
    weight_decay=1e-4,
    start_pruning_steps=100 * len(train_loader), # start pruning after 100 epochs
    epsilon=0.02)

Which one is reasonable? Or are both reasonable?

When quantizing the YOLOv8 model and calling the code optimizer = oto.dhspg(), an error is thrown.

When quantizing the YOLOv8 model and calling optimizer = oto.dhspg(), it throws the following error:

Traceback (most recent call last):
  File "yolov8_test.py", line 32, in <module>
    optimizer = oto.dhspg(
  File "/home/sta/ldz/project/quantization/yolov8/only_train_once/only_train_once/__init__.py", line 42, in dhspg
    self._optimizer = DHSPG(
  File "/home/sta/ldz/project/quantization/yolov8/only_train_once/only_train_once/optimizer/dhspg.py", line 62, in __init__
    super(DHSPG, self).__init__(params, defaults)
  File "/home/sta/anaconda3/envs/yolov8/lib/python3.8/site-packages/torch/optim/optimizer.py", line 187, in __init__
    raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list

optimizer

Very good work. I have a question: can I use other optimizers, such as SGD?

The compressed onnx model can't run inference.

When checking the compressed model accuracy in tutorial 01, I get the following error:


InvalidArgument                           Traceback (most recent call last)
Cell In[5], line 7
      2 testloader = torch.utils.data.DataLoader(testset, batch_size=1, shuffle=False, num_workers=4)
      4 # acc1_full, acc5_full = check_accuracy(model, testloader)
      5 # print("Full model: Acc 1: {acc1}, Acc 5: {acc5}".format(acc1=acc1_full, acc5=acc5_full))
----> 7 acc1_compressed, acc5_compressed = check_accuracy_onnx("ResNet_compressed.onnx", testloader)
      8 print("Compressed model: Acc 1: {acc1}, Acc 5: {acc5}".format(acc1=acc1_compressed, acc5=acc5_compressed))

File e:\github\only_train_once\tutorials\utils\utils.py:57, in check_accuracy_onnx(model_path, testloader, two_input)
     54 total = 0
     56 for X, y in testloader:
---> 57     outputs = ort_sess.run(None, {'input.1': X.numpy()})[0]
     58     prec1, prec5 = accuracy_topk(torch.tensor(outputs), y.data, topk=(1, 5))
     59     correct1 += prec1.item()

File ~\AppData\Roaming\Python\Python38\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:200, in Session.run(self, output_names, input_feed, run_options)
    198     output_names = [output.name for output in self._outputs_meta]
    199 try:
--> 200     return self._sess.run(output_names, input_feed, run_options)
    201 except C.EPFail as err:
    202     if self._enable_fallback:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gemm node. Name:'Gemm_47' Status Message: GEMM: Dimension mismatch, W: {10,15} K: 512 N:10

Problem omega decrease, but group_sparsity not increase

I ran into this case when pruning a face recognition model from the insightface repo. I prune only the backbone; the head is used for softmax classification. When I print the results, omega decreases but group_sparsity does not increase, while the loss converges (the loss is computed between the backbone's embedding and the original embedding of the image). I cannot tell whether the OTO optimizer has converged or whether my loss is correct when applying OTO. I tried increasing the number of epochs, but the behavior is the same.

Here is the architecture:
IResNet_zig.gv.pdf

Confused about Fig 3a in OTO v1 paper

Hello,

From the definition of $\mathcal{S}(\mathbf{x})$, and assuming $\epsilon = 0 ,\ \mathcal{I}^{0}= \{ 1 \}$, the half space should be the whole region on the right. The projection should be straight onto $[x]_1=0$ or $[x]_2=0$.

What am I missing?
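
For reference while reading the figure, here is a minimal sketch of a group-wise half-space step as we understand it from the OTO papers. The exact rule is an assumption here: a group of the trial point is set to zero as a whole when its inner product with the current iterate falls below epsilon times the squared norm of that group. The function name and the numbers are hypothetical.

```python
def half_space_project(x_k, x_trial, groups, epsilon=0.0):
    """Zero every group of x_trial that leaves the half-space defined by x_k."""
    projected = list(x_trial)
    for group in groups:
        inner = sum(x_trial[i] * x_k[i] for i in group)
        norm_sq = sum(x_k[i] ** 2 for i in group)
        if inner < epsilon * norm_sq:
            for i in group:
                projected[i] = 0.0  # the whole group is zeroed, not one coordinate
    return projected

x_k = [1.0, 1.0, 2.0]         # current iterate
x_trial = [-0.5, 0.2, 1.5]    # trial point after a gradient step
groups = [[0, 1], [2]]        # group 0 has two variables, group 1 has one

result = half_space_project(x_k, x_trial, groups)
print(result)
```

Under this reading, a two-variable group that crosses the boundary is mapped to the origin of its subspace rather than onto a single axis such as $[x]_1=0$.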

TypeError: 'torch._C.Node' object is not subscriptable

I am using torch==1.13.1.

The code fails at this line:
op_params = {k: torch_node[k] for k in torch_node.attributeNames()}

It raises: TypeError: 'torch._C.Node' object is not subscriptable

I also tried the following alternatives to get the graph, but all of them failed:
# op_params = {k: getattr(torch_node.schema, k) for k in torch_node.attributeNames()}
# op_params = {k: torch_node.attribute(k) for k in torch_node.attributeNames()}

The google-protobuf version that onnx depends on is v2.6.1, but it conflicts with Python 3; maybe only Python 2 works!

When I use the OTO code to train my model, I encounter the following error:

Traceback (most recent call last):
  File "oto_train.py", line 216, in <module>
    main(args.cfg)
  File "oto_train.py", line 181, in main
    oto.compress()
  File "/mnt/home/oToV2/only_train_once/__init__.py", line 67, in compress
    dynamic_axes=dynamic_axes)
  File "/mnt/home/oToV2/only_train_once/compression/compression.py", line 27, in automated_compression
    import onnx
  File "/opt/conda/lib/python3.7/site-packages/onnx/__init__.py", line 9, in <module>
    from onnx.external_data_helper import load_external_data_for_model, write_external_data_tensors
  File "/opt/conda/lib/python3.7/site-packages/onnx/external_data_helper.py", line 10, in <module>
    from .onnx_pb import TensorProto, ModelProto
  File "/opt/conda/lib/python3.7/site-packages/onnx/onnx_pb.py", line 8, in <module>
    from .onnx_ONNX_REL_1_7_ml_pb2 import * # noqa
  File "/opt/conda/lib/python3.7/site-packages/onnx/onnx_ONNX_REL_1_7_ml_pb2.py", line 9, in <module>
    from google.protobuf import reflection as _reflection
  File "/opt/conda/lib/python3.7/site-packages/google/protobuf/reflection.py", line 68, in <module>
    from google.protobuf.internal import python_message
  File "/opt/conda/lib/python3.7/site-packages/google/protobuf/internal/python_message.py", line 72, in <module>
    from google.protobuf.internal import decoder
  File "/opt/conda/lib/python3.7/site-packages/google/protobuf/internal/decoder.py", line 167, in <module>
    _DecodeVarint = _VarintDecoder((1 << 64) - 1, long)
NameError: name 'long' is not defined

Judging from the error, google-protobuf v2.6.1 may only work with Python 2, because the long type exists only in Python 2. But Python 2 is outdated, so how can I solve this problem?
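
A quick way to confirm which protobuf version is actually installed in the environment is sketched below. It uses importlib.metadata, which requires Python 3.8 or newer (newer than the Python 3.7 shown in the traceback), and the helper names are hypothetical. As far as we know, Python 3 support arrived with the protobuf 3.x line, so a 2.x version in a Python 3 environment is a likely culprit; treat that as an assumption to verify.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def major(version):
    """Leading integer component of a dotted version string."""
    return int(version.split(".")[0])

version = installed_version("protobuf")
if version is None:
    print("protobuf is not installed in this environment")
elif major(version) < 3:
    print("protobuf", version, "predates the 3.x line; consider upgrading")
else:
    print("protobuf", version, "is installed")
```

If an old version is reported, upgrading protobuf inside the same environment and re-importing onnx is the first thing to try.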

setting on imagenet

Following the optimizer setting for resnet18 on cifar10 in the tutorials, I apply the following setting for resnet50 on imagenet:
optimizer = oto.dhspg(
    variant='sgd',
    lr=0.1,
    target_group_sparsity=0.4,
    weight_decay=1e-4,
    start_pruning_steps=50 * len(trainloader), # start pruning after 50 epochs
    epsilon=0.95)

However, I get a lower accuracy of about 68. What should I change for resnet50 on imagenet, e.g., epsilon or start_pruning_steps?
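
Because start_pruning_steps is expressed in optimizer steps, the same "50 epochs" translates into very different step counts on CIFAR-10 and ImageNet. The sketch below shows the arithmetic; `epochs_to_steps` is a hypothetical helper, the batch size of 256 for ImageNet is an assumption, and the per-epoch count mirrors how len(trainloader) behaves for a DataLoader without drop_last.

```python
import math

def epochs_to_steps(epochs, dataset_size, batch_size, drop_last=False):
    """Number of optimizer steps in `epochs` passes over the dataset."""
    if drop_last:
        steps_per_epoch = dataset_size // batch_size
    else:
        steps_per_epoch = math.ceil(dataset_size / batch_size)
    return epochs * steps_per_epoch

# CIFAR-10 training split with batch size 64.
cifar_steps = epochs_to_steps(50, 50_000, 64)
# ImageNet-1k training split with an assumed batch size of 256.
imagenet_steps = epochs_to_steps(50, 1_281_167, 256)

print(cifar_steps, imagenet_steps)
```

Checking these numbers against the total number of training steps helps confirm that pruning starts, and finishes, well before the learning-rate schedule ends.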

Error when testing yolov5

Thank you for your contribution. I added a detection head to yolov5, and when I tested it with OTO I got an error.

import torch
from only_train_once import OTO
import unittest
import os
import onnxruntime as ort
import numpy as np
from models.experimental import attempt_load

OUT_DIR = './cache'

class TestYolov5(unittest.TestCase):
    def test_sanity(self, dummy_input=torch.rand(1, 3, 640, 384, device='cuda:0')):
        model = attempt_load('baseline_model/best.pt', map_location=torch.device('cuda:0'), fuse=False)
        # All parameters in the pretrained Yolov5 are not trainable.
        for _, param in model.named_parameters():
            param.requires_grad = True

        oto = OTO(model, dummy_input)
        oto.mark_unprunable_by_node_ids(
            # ['node-229', 'node-329', 'node-443', 'node-553']
            ['node-229', 'node-581', 'node-471', 'node-359']
        )
        oto.visualize(view=False, out_dir=OUT_DIR)
        optimizer = oto.hesso(
            variant='sgd',
            lr=0.1
        )
        oto.random_set_zero_groups()
        # YOLOv5 has some trouble to directly load torch model
        oto.construct_subnet(
            out_dir=OUT_DIR,
            ckpt_format='onnx'
        )

        # dummy_input lives on the GPU, so move it to the CPU before converting to numpy
        full_sess = ort.InferenceSession(oto.full_group_sparse_model_path)
        full_output = full_sess.run(None, {'onnx::Cast_0': dummy_input.cpu().numpy()})
        compressed_sess = ort.InferenceSession(oto.compressed_model_path)
        compressed_output = compressed_sess.run(None, {'onnx::Cast_0': dummy_input.cpu().numpy()})

        max_output_diff = np.max(np.abs(full_output[0] - compressed_output[0]))
        print("Maximum output difference " + str(max_output_diff.item()))
        self.assertLessEqual(max_output_diff, 1e-3)

        full_model_size = os.stat(oto.full_group_sparse_model_path)
        compressed_model_size = os.stat(oto.compressed_model_path)
        print("Size of full model     : ", full_model_size.st_size / (1024 ** 3), "GBs")
        print("Size of compress model : ", compressed_model_size.st_size / (1024 ** 3), "GBs")

os.makedirs(OUT_DIR, exist_ok=True)
    
if __name__ == '__main__':
    unittest.main()

Output

OTO graph constructor
graph build
/home/code/yolo/models/yolo.py:63: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
/home/code/yolo/models/yolo.py:85: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  y, x = torch.tensor(y, device=d, dtype=t), torch.tensor(x, device=d, dtype=t)
/home/code/yolo/models/yolo.py:120: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
/home/code/yolo/models/yolo.py:149: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  y, x = torch.tensor(y, device=d, dtype=t), torch.tensor(x, device=d, dtype=t)
/home/miniconda3/envs/yolo/lib/python3.8/site-packages/only_train_once/graph/graph.py:429: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(torch.__version__) >= LooseVersion('1.9.0') and \
/home/miniconda3/envs/yolo/lib/python3.8/site-packages/only_train_once/graph/graph.py:430: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  LooseVersion(torch.__version__) <= LooseVersion('1.11.10'):
/home/miniconda3/envs/yolo/lib/python3.8/site-packages/only_train_once/graph/graph.py:432: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  elif LooseVersion(torch.__version__) >= LooseVersion('1.13.0'):
E
======================================================================
ERROR: test_sanity (__main__.TestYolov5)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "train_oto.py", line 18, in test_sanity
    oto = OTO(model, dummy_input)
  File "/home/miniconda3/envs/yolo/lib/python3.8/site-packages/only_train_once/__init__.py", line 17, in __init__
    self.initialize(model=self._model, dummy_input=self._dummy_input, skip_patterns=self._skip_patterns, strict_out_nodes=self._strict_out_nodes)
  File "/home/miniconda3/envs/yolo/lib/python3.8/site-packages/only_train_once/__init__.py", line 36, in initialize
    self._graph = Graph(model, dummy_input, skip_patterns=skip_patterns, strict_out_nodes=strict_out_nodes)
  File "/home/miniconda3/envs/yolo/lib/python3.8/site-packages/only_train_once/graph/graph.py", line 61, in __init__
    self.build(model, dummy_input)
  File "/home/miniconda3/envs/yolo/lib/python3.8/site-packages/only_train_once/graph/graph.py", line 70, in build
    trace_graph = self._get_trace_graph(model, dummy_input, optimized_onnx=self.trace_onnx)
  File "/home/miniconda3/envs/yolo/lib/python3.8/site-packages/only_train_once/graph/graph.py", line 435, in _get_trace_graph
    raise "Torch {} is not supported because of some bug in _optimize_trace.".format(torch.__version__)
TypeError: exceptions must derive from BaseException

----------------------------------------------------------------------
Ran 1 test in 4.361s

FAILED (errors=1)
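
The TypeError at the end of this traceback hides the real message. In Python 3, raising a plain string is itself an error, so the intended text ("Torch ... is not supported ...") never reaches the user. Judging from the version checks shown in the warnings (>= 1.9.0 and <= 1.11.10, or >= 1.13.0), a torch version between those ranges appears to take this branch; that reading is an assumption. The sketch below reproduces the pattern with hypothetical function names and a hypothetical version string.

```python
def check_version_buggy(version):
    # Mirrors the pattern in the traceback: raising a plain string.
    raise "Torch {} is not supported.".format(version)

def check_version_fixed(version):
    # Wrapping the message in an exception class preserves it.
    raise RuntimeError("Torch {} is not supported.".format(version))

results = []
for check in (check_version_buggy, check_version_fixed):
    try:
        check("1.12.1")  # hypothetical version string
    except Exception as err:
        results.append((type(err).__name__, str(err)))
        print(type(err).__name__, "->", err)
```

The first call reports a TypeError about BaseException, while the second surfaces the intended version message.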

Models with no ZIG?

Hi @tianyic ,

I am trying to use OTO on speech models (FastSpeech2) and rewrote parts of the model to make sure all the PyTorch ops are supported in ONNX.

However, I found that nothing was pruned. When I run

oto = OTO(model, dummy_input=dummy_input)
optimizers = [oto.hesso(**args.optim_conf)]

I get
hesso.total_num_groups = 0
Target redundant groups per period: [0]

Does this mean there are no zero-invariant groups in the model? This is strange, because there are conv layers in transformer encoder/decoder. Reference code

Any help appreciated, thanks!

OTO on detection models

Hi, thanks for your work. Have you tried OTO on detection models, such as the YOLO series? And is it possible for you to release a tutorial about that in the future? Thanks.

hard to reproduce the results on VGG16-bn for Cifar10

We use an RTX 4090 for training. Following the tutorial, we trained more than 15 times with different seed settings (including many runs with the initial seed 42). However, the results of these experiments are close to each other: the best run reached 93.0% accuracy, which is still lower than the reported 93.2%.
Here is the experiment environment:
numpy 1.23.4 python 3.8.15
pytorch 1.13.0 cuda 11.6.2
only_train_once 2.0.16

Could you share any suggestions and environment details so that we can continue reproducing this experiment?
Should we keep repeating the experiment with seed 42, or continue trying other seeds?
ACC with several runs

How to resume?

Hi @tianyic,
I am running OTO on my project. However, when training resumes from a checkpoint, it fails with "ValueError: loaded state dict has a different number of parameter groups".

Incorrect training when using DDP

I modified resnet18_cifar10.py to train with DDP as follows:

import sys

sys.path.append('..')
from sanity_check.backends.resnet_cifar10 import resnet18_cifar10
from only_train_once import OTO
import torch
from torch.utils.data import DataLoader
from torch import distributed
import numpy as np
import random
import os


def setup_seed(seed, cuda_deterministic=True):
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    if cuda_deterministic:  # slower, more reproducible
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    else:  # faster, less reproducible
        torch.backends.cudnn.deterministic = False
        torch.backends.cudnn.benchmark = True


assert torch.__version__ >= "1.12.0", "In order to enjoy the features of the new torch, \
we have upgraded the torch to 1.12.0. torch before than 1.12.0 may not work in the future."

try:
    rank = int(os.environ["RANK"])
    local_rank = int(os.environ["LOCAL_RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    distributed.init_process_group("nccl")
except KeyError:
    rank = 0
    local_rank = 0
    world_size = 1
    distributed.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:12584",
        rank=rank,
        world_size=world_size,
    )

setup_seed(seed=2048, cuda_deterministic=False)
torch.cuda.set_device(local_rank)

model = resnet18_cifar10().cuda()
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = torch.nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[local_rank])
dummy_input = torch.rand(1, 3, 32, 32)
oto = OTO(model=model, dummy_input=dummy_input.cuda())

# A ResNet_zig.gv.pdf will be generated to display the dependency graph.
oto.visualize(view=False, out_dir='../cache')

from torchvision.datasets import CIFAR10
import torchvision.transforms as transforms

trainset = CIFAR10(root='cifar10', train=True, download=True, transform=transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, 4),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]))
testset = CIFAR10(root='cifar10', train=False, download=True, transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]))

trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True, num_workers=0)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False, num_workers=0)

optimizer = oto.hesso(
    variant='sgd',
    lr=0.1,
    weight_decay=1e-4,
    target_group_sparsity=0.7,
    start_pruning_step=30 * len(trainloader),
    pruning_periods=10,
    pruning_steps=30 * len(trainloader)
)

from utils.utils import check_accuracy

max_epoch = 100
model.cuda()
criterion = torch.nn.CrossEntropyLoss()
# Every 75 epochs, decay lr by 10.0
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=75, gamma=0.1)
fp16 = False

amp = torch.cuda.amp.grad_scaler.GradScaler(growth_interval=100)
for epoch in range(max_epoch):
    # set_epoch exists only on DistributedSampler; the default samplers do not have it.
    if isinstance(trainloader.sampler, torch.utils.data.DistributedSampler):
        trainloader.sampler.set_epoch(epoch)
    f_avg_val = 0.0
    model.train()
    for X, y in trainloader:
        X = X.cuda()
        y = y.cuda()
        with torch.cuda.amp.autocast(enabled=fp16):
            y_pred = model(X)
            f = criterion(y_pred, y)
        optimizer.zero_grad()
        if fp16:
            amp.scale(f).backward()
            amp.step(optimizer)
            amp.update()
        else:
            f.backward()
            optimizer.step()
        # Accumulate a Python float so each batch's autograd graph can be freed.
        f_avg_val += f.item()
    # Step the scheduler after the epoch's optimizer updates.
    lr_scheduler.step()
    group_sparsity, param_norm, _ = optimizer.compute_group_sparsity_param_norm()
    norm_important, norm_redundant, num_grps_important, num_grps_redundant = optimizer.compute_norm_groups()
    accuracy1, accuracy5 = check_accuracy(model, testloader)
    f_avg_val = f_avg_val / len(trainloader)

    print(
        "Ep: {ep}, loss: {f:.2f}, norm_all:{param_norm:.2f}, grp_sparsity: {gs:.2f}, acc1: {acc1:.4f}, norm_import: {norm_import:.2f}, norm_redund: {norm_redund:.2f}, num_grp_import: {num_grps_import}, num_grp_redund: {num_grps_redund}" \
            .format(ep=epoch, f=f_avg_val, param_norm=param_norm, gs=group_sparsity, acc1=accuracy1, \
                    norm_import=norm_important, norm_redund=norm_redundant, num_grps_import=num_grps_important,
                    num_grps_redund=num_grps_redundant
                    ))

oto.construct_subnet(out_dir='../cache')

full_model = torch.load(oto.full_group_sparse_model_path).cpu()
compressed_model = torch.load(oto.compressed_model_path).cpu()

full_output = full_model(dummy_input)
compressed_output = compressed_model(dummy_input)

max_output_diff = torch.max(torch.abs(full_output - compressed_output))
print("Maximum output difference " + str(max_output_diff.item()))
full_model_size = os.stat(oto.full_group_sparse_model_path)
compressed_model_size = os.stat(oto.compressed_model_path)
print("Size of full model     : ", full_model_size.st_size / (1024 ** 3), "GBs")
print("Size of compress model : ", compressed_model_size.st_size / (1024 ** 3), "GBs")

I see the following error related to OTO:
oto = OTO(model=model, dummy_input=dummy_input.cuda(local_rank))
File "/app/trinc/pruning/only_train_once/tutorials/../only_train_once/__init__.py", line 17, in __init__
self.initialize(model=self._model, dummy_input=self._dummy_input, skip_patterns=self._skip_patterns, strict_out_nodes=self._strict_out_nodes)
File "/app/trinc/pruning/only_train_once/tutorials/../only_train_once/__init__.py", line 36, in initialize
self._graph = Graph(model, dummy_input, skip_patterns=skip_patterns, strict_out_nodes=strict_out_nodes)
File "/app/trinc/pruning/only_train_once/tutorials/../only_train_once/graph/graph.py", line 61, in __init__
self.build(model, dummy_input)
File "/app/trinc/pruning/only_train_once/tutorials/../only_train_once/graph/graph.py", line 70, in build
trace_graph = self._get_trace_graph(model, dummy_input, optimized_onnx=self.trace_onnx)
File "/app/trinc/pruning/only_train_once/tutorials/../only_train_once/graph/graph.py", line 393, in _get_trace_graph
trace_graph, _ = torch.jit._get_trace_graph(model, dummy_input)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py", line 1296, in _get_trace_graph
outs = ONNXTracedModule(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py", line 138, in forward
graph, out = torch._C._create_graph_by_tracing(
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py", line 129, in wrapper
outs.append(self.inner(*trace_inputs))
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 1519, in forward
inputs, kwargs = self._pre_forward(*inputs, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 1420, in _pre_forward
self._sync_buffers()
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 2040, in _sync_buffers
self._sync_module_buffers(authoritative_rank)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 2044, in _sync_module_buffers
self._default_broadcast_coalesced(authoritative_rank=authoritative_rank)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 2066, in _default_broadcast_coalesced
self._distributed_broadcast_coalesced(bufs, bucket_size, authoritative_rank)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 1981, in _distributed_broadcast_coalesced
dist._broadcast_coalesced(
RuntimeError: Tried to trace <torch.torch.classes.c10d.ProcessGroup object at 0x3fb8ab60> but it is not part of the active trace. Modules that are called during a trace must be registered as submodules of the thing being traced.
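
The traceback shows the tracer entering DistributedDataParallel.forward, which broadcasts buffers through the process group. One workaround that may be worth trying is to hand the underlying module to the tracer instead of the wrapper. The sketch below is not part of OTO and has not been confirmed by its authors: `unwrap_for_tracing` is a hypothetical helper, and the stand-in classes exist only so the example runs without a GPU or process group.

```python
WRAPPER_CLASS_NAMES = {"DistributedDataParallel", "DataParallel"}


def unwrap_for_tracing(model):
    """Hypothetical helper: return the wrapped module for DDP/DataParallel wrappers."""
    if type(model).__name__ in WRAPPER_CLASS_NAMES:
        # Both PyTorch wrappers expose the original model as `.module`.
        return model.module
    return model


# Stand-in classes so the sketch runs without torch, a GPU, or a process group.
class PlainModel:
    pass


class DistributedDataParallel:
    def __init__(self, module):
        self.module = module


plain = PlainModel()
wrapped = DistributedDataParallel(plain)
print(unwrap_for_tracing(wrapped) is plain)  # True
print(unwrap_for_tracing(plain) is plain)    # True
```

In the script above this would correspond to passing `unwrap_for_tracing(model)` to `OTO(...)`, or to constructing the OTO object before wrapping the model in DistributedDataParallel. Whether training and compression then behave correctly under DDP is an open question here.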

Error in Concat-Split

I am trying to compress mixnet.py. Could you check the concat-split case again? I tried leaving the conv layers before the concat and after the split unpruned, but the split still returns false. Thank you.
MixNet_zig.gv.pdf

How did you design your supernet search space in OTOv3?

Sorry for asking a vague question about OTOv3 as a whole, but I'm in a rush with my thesis and I only just came across your paper. I just need some pointers. I also have some extra questions:

  • What encoding scheme did you use to embed different operations within a network architecture? Can it be replaced with another encoding scheme?

Meaning of omega in compute_group_sparsity_omega( )

Hey! Thanks for publishing such a nice package to perform the experiments of your paper.

I am doing some experiments with OTO and was wondering what is the meaning of your omega metric and how exactly it is formulated. I can't quite grasp it from your code.

Understanding pruning_steps

Hello @tianyic ,

I see that pruning_steps determines how many steps we take before pruning:

self.pruning_period_duration = self.pruning_steps // self.pruning_periods # How many pruning steps for each period

  1. Is it correct to say that if pruning_periods = 1, then pruning_steps is the number of steps before we prune once? If it makes more sense to prune after every epoch, I will set pruning_steps = steps_per_epoch.

  2. What does pruning_periods do?

Thanks in advance!
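
The quoted line can be worked through with concrete numbers. This is a minimal sketch in plain Python, not OTO's implementation: `pruning_schedule` is a hypothetical helper, only the integer division mirrors the quoted line, and laying the periods back to back from `start_pruning_step` is an assumption made for illustration.

```python
def pruning_schedule(start_pruning_step, pruning_steps, pruning_periods):
    """Hypothetical helper: split `pruning_steps` into equal periods.

    Only the integer division mirrors the quoted line; the placement of
    the period boundaries is an assumption made for illustration.
    """
    if pruning_periods < 1:
        raise ValueError("pruning_periods must be at least 1")
    # Same integer division as the quoted line.
    period_duration = pruning_steps // pruning_periods
    # Assumed layout: period k starts at start_pruning_step + k * period_duration.
    period_starts = [
        start_pruning_step + k * period_duration for k in range(pruning_periods)
    ]
    return period_duration, period_starts


steps_per_epoch = 782  # e.g. ceil(50000 / 64) batches for CIFAR10 at batch size 64
duration, starts = pruning_schedule(
    start_pruning_step=30 * steps_per_epoch,
    pruning_steps=30 * steps_per_epoch,
    pruning_periods=10,
)
print(duration)    # 2346
print(starts[:3])  # [23460, 25806, 28152]
```

With pruning_periods = 1 the division returns pruning_steps unchanged, so a single period spans all of the pruning steps.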

Can not prune YOLOv8 model

Hi @tianyic,
I'm trying to prune this model with the HESSO optimizer, but the compressed model is the same size as the original model.
I tried a different dataset (larger and more diverse) and different values of target group sparsity (the minimum was 0.1), but the problem persists.
I suspect my model architecture is not prunable, because each ZIG set contains only one convolution layer.

DetectionModel_pruning_dependency.pdf

Grouped Conv compress ERROR

I found that OTO compresses some nodes of a model incorrectly. The errors appear around grouped convolutions. The weight shapes around the original Conv_7 are 16x3x3x3x1 -> 16x1x3x3 (grouped conv) -> 24x16x1x1, ..., but after compression they become 16x3x3x3x1 -> 12x1x3x3 (grouped conv) -> 24x16x1x1.
image
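
The mismatch in the reported shapes can be checked mechanically. The following is a sketch under stated assumptions, not code from OTO: `check_depthwise_chain` is a hypothetical helper, the middle layer is assumed to be depthwise (groups equal to its channel count), and the first shape is taken as the 4-D 16x3x3x3 even though the report writes 16x3x3x3x1.

```python
def check_depthwise_chain(conv_in, depthwise, conv_out):
    """Hypothetical checker for a conv -> depthwise conv -> conv chain.

    Each argument is a weight shape (out_channels, in_channels_per_group, kH, kW).
    The middle layer is assumed to be depthwise: groups == out_channels.
    """
    problems = []
    dw_channels = depthwise[0]
    if depthwise[1] != 1:
        problems.append("middle layer is not depthwise")
    # The producer must emit exactly as many channels as the depthwise conv has groups.
    if conv_in[0] != dw_channels:
        problems.append(
            f"producer has {conv_in[0]} output channels, depthwise expects {dw_channels}"
        )
    # The consumer must read exactly as many channels as the depthwise conv emits.
    if conv_out[1] != dw_channels:
        problems.append(
            f"consumer expects {conv_out[1]} input channels, depthwise yields {dw_channels}"
        )
    return problems


original = check_depthwise_chain((16, 3, 3, 3), (16, 1, 3, 3), (24, 16, 1, 1))
compressed = check_depthwise_chain((16, 3, 3, 3), (12, 1, 3, 3), (24, 16, 1, 1))
print(original)  # []
for problem in compressed:
    print(problem)
```

Under these assumptions the original chain is consistent, while the compressed chain is inconsistent on both sides of the grouped conv: its neighbours still carry 16 channels.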

How to load compressed model ?

The compressed model cannot be loaded into the original model class because its shapes differ, so it cannot be deployed; it can only be used to check accuracy after training.

It would help if the shapes of the compressed model were saved so that a matching model class could be built.

Get pytorch model prune shape

Please help. When I run your code with vgg16, I get a PyTorch model and an ONNX model: the PyTorch model keeps the original shapes, while the ONNX model has the pruned shapes. I want to configure quantization in PyTorch rather than from ONNX. How can I obtain a PyTorch model with the pruned shapes so that I can configure quantization in PyTorch? Thank you.

oto.compress() failure

When trying to compress my trained custom model (training appears to succeed), I get the following stack trace:

  File "/home/algernone/git-reps/human-detection/person_detection/train_with_pruning_oto.py", line 384, in main
    oto.compress()
  File "/home/algernone/.pyenv/versions/python3.9/lib/python3.9/site-packages/only_train_once/__init__.py", line 62, in compress
    _, self.compressed_model_path, self.full_model_path = automated_compression(
  File "/home/algernone/.pyenv/versions/python3.9/lib/python3.9/site-packages/only_train_once/compression/compression.py", line 21, in automated_compression
    torch.onnx.export(
  File "/home/algernone/.pyenv/versions/python3.9/lib/python3.9/site-packages/torch/onnx/utils.py", line 506, in export
    _export(
  File "/home/algernone/.pyenv/versions/python3.9/lib/python3.9/site-packages/torch/onnx/utils.py", line 1553, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/algernone/.pyenv/versions/python3.9/lib/python3.9/site-packages/torch/onnx/utils.py", line 1165, in _model_to_graph
    _C._jit_pass_onnx_assign_output_shape(
RuntimeError: Expected a sequence type, but received a non-iterable type in graph output index 0

I understand that the problem lies in my model architecture, but maybe you have already encountered it or have some idea how to tackle it.

I expected that if we were able to wrap our model in an OTO object, it would also be convertible to ONNX format, but that does not appear to be the case.
I think it needs to be said that my model's output is of form list[list[Tensor]]
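
A workaround that is sometimes tried for nested outputs is to return a flat tuple of tensors from the forward pass before export. Whether this resolves the specific error above is not confirmed by anything in this thread. The sketch below shows only the flattening step in plain Python: `flatten_outputs` is a hypothetical helper, and the strings are stand-ins for tensors.

```python
def flatten_outputs(outputs):
    """Hypothetical helper: flatten arbitrarily nested lists/tuples into a flat tuple."""
    flat = []
    for item in outputs:
        if isinstance(item, (list, tuple)):
            # Recurse into nested containers and keep the left-to-right order.
            flat.extend(flatten_outputs(item))
        else:
            flat.append(item)
    return tuple(flat)


# Stand-ins for tensors; in a real model these would be torch.Tensor objects.
nested = [["t0", "t1"], ["t2"], [["t3"]]]
print(flatten_outputs(nested))  # ('t0', 't1', 't2', 't3')
```

In practice such a helper would be called inside a thin wrapper module's forward, and the original nesting would have to be restored by the caller after inference.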

irreproducible results for resnet18 tutorial notebook

Hello, and thanks for your work.
I ran your tutorial notebook to reproduce your results (resnet18 with 0.7 sparsity and roughly the same accuracy), but I obtained 0.0 sparsity after 300 epochs.
image
I tried changing some parameters such as epsilon and start_pruning_steps, but in the best case I obtained a sparsity below 0.1 after the same 300 epochs.
Could you tell me what's the matter?
Thanks

Protocol Buffer Version Conflict with local

[libprotobuf FATAL google/protobuf/stubs/common.cc:87] This program was compiled against version 2.6.1 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.17.1). Contact the program author for an update. If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library. (Version verification failed in "google/protobuf/descriptor.pb.cc".)
terminate called after throwing an instance of 'google::protobuf::FatalException'
what(): This program was compiled against version 2.6.1 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.17.1). Contact the program author for an update. If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library. (Version verification failed in "google/protobuf/descriptor.pb.cc".)
Aborted (core dumped)

Prune vision transformer

When I try to prune a vision transformer based model, this error occurs: Segmentation fault (core dumped).
I would appreciate it if you could answer!

New tutorials problem resnet50_imagenet.ipynb

if accuracy1 > best_acc_1:
    best_acc_1 = accuracy1
    torch.save(model, os.path.join(ckpt_dir, 'best_epoch_' + str(epoch) + '_' + str(t) + '.pt'))

The code that computes best_acc_1 may have a problem.
If an epoch before start_prune_epoch has already reached the highest accuracy, best_acc_1 will not be updated after pruning unless the pruned model exceeds that earlier accuracy.
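
One way to address this is to track the best accuracy separately for the epochs before and after pruning begins. The following is a minimal sketch, not code from the tutorial: `BestAccuracyTracker` is a hypothetical class, the accuracies are made up, and treating start_prune_epoch as the boundary between the two phases is an assumption.

```python
class BestAccuracyTracker:
    """Hypothetical helper: track the best accuracy separately per phase."""

    def __init__(self, start_prune_epoch):
        self.start_prune_epoch = start_prune_epoch
        self.best = {
            "before_pruning": float("-inf"),
            "after_pruning": float("-inf"),
        }

    def update(self, epoch, accuracy):
        """Return True when `accuracy` is a new best for the current phase."""
        phase = "before_pruning" if epoch < self.start_prune_epoch else "after_pruning"
        if accuracy > self.best[phase]:
            self.best[phase] = accuracy
            return True
        return False


tracker = BestAccuracyTracker(start_prune_epoch=2)
history = [0.90, 0.95, 0.70, 0.80, 0.78]  # made-up accuracies, one per epoch
saved = [epoch for epoch, acc in enumerate(history) if tracker.update(epoch, acc)]
print(saved)  # [0, 1, 2, 3]
```

With a single running maximum, only epochs 0 and 1 would trigger a checkpoint here, so no pruned model would ever be saved.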

oto.compress failed with "xs.append(param.data.view(cc.num_groups, -1))" in graphy.py

@tianyic Hi, when I tried OTO with the following case, oto.compress failed. Could you please give some advice?

import torch
import torch.nn as nn
from only_train_once import OTO


class DemoNet(nn.Module):

    def __init__(self) -> None:
        super().__init__()

        
        self.fc = nn.Sequential(
            nn.Linear(1024, 512),
            nn.Linear(512, 256)
        )

    def forward(self, x):

        # x: [1, 512, 2, 81]
        x = x.view(x.size(0), -1, 1, x.size(3)).permute(0, 3, 1, 2).contiguous()
        x = x.squeeze(-1)
        return self.fc(x)

if __name__ == "__main__":
    
    model = DemoNet()
    model.eval()
    fake_input = torch.randn((1, 512, 2, 81))
    print(f"{model(fake_input).shape}")
    oto = OTO(model=model, dummy_input=fake_input)
    oto.compress()

When I replaced the model in the tutorial with my own other model, this error occurred

This is my code:
student_model = STU_IAT().cuda()
#student_model = STU_IAT()
if config.pretrain_dir is not None:
    student_model.load_state_dict(torch.load(config.pretrain_dir))

from sanity_check.backends.resnet_cifar10 import resnet18_cifar10
#model = resnet18_cifar10().cuda()
dummy_input = torch.rand(1, 3, 400, 320)
oto = OTO(model=student_model.cuda(), dummy_input=dummy_input.cuda())
oto.visualize(view=False, out_dir='../cache')
print("@@@")

# Data Setting

train_dataset = lowlight_loader(images_path=config.img_path, normalize=config.normalize)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=config.batch_size, shuffle=True, num_workers=8,
pin_memory=True)
val_dataset = lowlight_loader(images_path=config.img_val_path, mode='test', normalize=config.normalize)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1, shuffle=False, num_workers=8, pin_memory=True)

optimizer = torch.optim.Adam(student_model.parameters(), lr=config.lr, weight_decay=config.weight_decay)

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=config.num_epochs)

device = next(student_model.parameters()).device
print('the device is:', device)
hard_loss = F.mse_loss
alpha = 0.2
soft_loss = F.smooth_l1_loss

ssim_high = 0
psnr_high = 0

example_inputs = torch.randn(1, 3, 400, 320).to(device)

optimizer = oto.hesso(
    variant='sgd',
    lr=0.1,
    weight_decay=1e-4,
    target_group_sparsity=0.7,
    start_pruning_step=10 * len(train_loader),
    pruning_periods=10,
    pruning_steps=10 * len(train_loader)
)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

student_model.train()
print('######## Start IAT Training #########')
image
Why does this happen? The error message indicates that the problem is at the step oto = OTO(model=student_model.cuda(), dummy_input=dummy_input.cuda()).
