

Pruned-OpenVINO-YOLO

简体中文 (a Simplified Chinese version of this README is available)

Prerequisite

Install mish-cuda first: https://github.com/JunnYu/mish-cuda (testing platform: Win10 + RTX 3090 + CUDA 11.2)

If you can't install it on your device, you can also try https://github.com/thomasbrandon/mish-cuda.
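One common way to install it is directly from the repo via pip (an assumption based on typical mish-cuda setups; check the repo's README for your platform):

pip install git+https://github.com/JunnYu/mish-cuda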


Introduction

When deploying YOLOv3/v4 on OpenVINO, the full model has low FPS, while the tiny model has low accuracy and poor stability. The full model structure is designed to detect 80 or more classes in complex scenes; in actual use there are often only a few classes, and the scenes are not that complicated. This tutorial shares how to prune YOLOv3/v4 models and then deploy them on OpenVINO. With little loss of accuracy, the frame rate can be increased several times over on Intel inference devices. On Intel GPU devices, it can even run inference on four video channels simultaneously while meeting basic real-time requirements.

The general process is as follows:

[Figure: overall workflow from baseline training through pruning to OpenVINO deployment]

The following takes the YOLOv3-SPP and YOLOv4 models as examples to introduce the details of baseline training, model pruning, and deployment on OpenVINO.

Note: My dataset consists of the two classes person + car extracted from COCO2014, plus images I selected and labeled from the UA-DETRAC dataset: 54,647 training images and 22,998 test images.

Baseline training

Baseline training simply means training normally on your own dataset until the model reaches suitable accuracy.

Recommended:

Note: As the above-mentioned projects do not support YOLOv4 very well, the training results for YOLOv4 may be slightly worse.

YOLOv3-SPP baseline training result

| P | R | mAP@0.5 | Params | Size of .weights | Inference_time (Tesla P100) | BFLOPS |
|---|---|---------|--------|------------------|-----------------------------|--------|
| 0.554 | 0.709 | 0.667 | 62.5M | 238M | 17.4ms | 65.69 |

YOLOv4 baseline training result

| P | R | mAP@0.5 | Params | Size of .weights | Inference_time (Tesla T4) | BFLOPS |
|---|---|---------|--------|------------------|---------------------------|--------|
| 0.587 | 0.699 | 0.669 | 62.5M | 244M | 28.3ms | 59.57 |

Model pruning

I use this repo: yolov3-channel-and-layer-pruning

Thanks to tanluren and zbyuan for their great work.

This model pruning project is based on ultralytics/yolov3. The folder named Pruneyolov3v4 contains the version of the pruning code I used; it is based on the June 2020 version of ultralytics/yolov3 and is for reference only. Usage is the same as yolov3-channel-and-layer-pruning. Because more tricks are applied during training, the trained mAP@0.5 will be slightly higher, and P and R will not be too far apart.

If you have any questions about the model pruning part, you can also ask them at yolov3-channel-and-layer-pruning; tanluren and zbyuan will be more professional. I only used part of the pruning strategy, and I share my pruning results here.

Sparsity training

Note: You can use python -c "from models import *; convert('cfg/yolov3.cfg', 'weights/last.pt')" to convert a .pt file to a .weights file.

A .pt file includes epoch information. If you convert it to .weights, you can train the model from epoch 0.
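For instance, a minimal sketch of inspecting that epoch bookkeeping, assuming the checkpoint is the usual ultralytics-style dict (key names may vary by version):

```python
import torch

# .pt checkpoints store training state alongside the weights; a raw
# .weights file keeps only the parameters, so a converted model
# restarts its schedule from epoch 0.
ckpt = torch.load('weights/last.pt', map_location='cpu')
print(ckpt.get('epoch'))  # e.g. 299 for a finished 300-epoch run
```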

python train.py --cfg cfg/my_cfg.cfg --data data/my_data.data --weights weights/last.weights --epochs 300 --batch-size 32 -sr --s 0.001 --prune 1
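For intuition, the -sr --s flags implement network-slimming-style sparsity: an L1 penalty on the BN scale factors (gamma). A minimal sketch of the idea, applied after loss.backward() (the repo's actual implementation may differ in detail):

```python
import torch
import torch.nn as nn

def add_bn_l1_subgradient(model, s=0.001):
    # Add the subgradient of s * sum(|gamma|) to every BN scale factor.
    # This is what gradually pushes unimportant channels' gammas to zero.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.data.add_(s * torch.sign(m.weight.data))
```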

Important note!

  • Try not to interrupt the training process; finish training in one run. ultralytics/yolov3 has a problem where metrics drop sharply and discontinuously when training resumes after an interruption.

  • During sparsity training, mAP@0.5 will gradually decrease at first and slowly climb back after the learning rate drops in the later stage of training. You can first set s to 0.001. If mAP@0.5 drops sharply in the first few epochs (P, R, and mAP@0.5 may even drop to 0), adjust s to a smaller value such as 0.0001, but this also means more epochs may be needed to sparsify fully.

The picture below is my TensorBoard diagram from sparsity training of the YOLOv4 model:

[Figure: TensorBoard curves during sparsity training of the YOLOv4 model]

Although mAP declined in the early stage of sparsity training, it never fell below 0.4, indicating that the selected s value was appropriate. However, training became abnormal around epoch 230: P increased sharply, R decreased sharply (at one point close to 0), and mAP@0.5 also fell sharply. This does not normally happen in the middle and late stages of training. Even if you encounter a situation like mine, don't panic: if the indicators trend back to normal, there is no lasting effect. If they fail to recover for a long time, you may need to retrain.

  • I generally set epochs to 300 to ensure sufficient sparsity. You can adjust it for your own dataset; insufficient sparsity will greatly reduce the subsequent pruning effect.

  • By observing the bn_weights/hist graph under HISTOGRAMS in TensorBoard, you can check whether sparsification is proceeding during training (a logging sketch follows the figure below).

    It can be seen that most of the gamma values (the BN scale factors) are gradually pressed close to 0 during sparsity training.

[Figure: TensorBoard bn_weights/hist during sparsity training]
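If you want to reproduce such histograms yourself, here is a minimal sketch using hypothetical helper names (not the repo's exact logging code):

```python
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

def log_bn_histograms(model, writer: SummaryWriter, epoch: int):
    # Log every BN layer's gamma values; in TensorBoard's HISTOGRAMS tab
    # the mass of each histogram should migrate toward zero as sparsity
    # training progresses.
    for idx, m in enumerate(model.modules()):
        if isinstance(m, nn.BatchNorm2d):
            writer.add_histogram(f'bn_weights/hist/{idx}', m.weight.data, epoch)
```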

  • The gamma weight distribution of each BN layer after sparsity training (under HISTOGRAMS in TensorBoard, generated after the last epoch completes) is used to judge whether the sparsity is sufficient.

The figure below is the result of YOLOv4's sparsity training for 300 epochs. Most of the gamma weights tend toward 0, and the closer they cluster to 0, the more thorough the sparsification. The figure below can already be considered an acceptable sparsity result and is for reference only.

[Figure: BN gamma weight distribution after 300 epochs of YOLOv4 sparsity training]

TensorBoard also provides the gamma weight distribution of the BN layers before sparsity training, which can be used as a comparison:

[Figure: BN gamma weight distribution before sparsity training]

After sparsity training, the mAP@0.5 of YOLOv3-SPP dropped by 4 points, and that of YOLOv4 dropped by 7 points.

YOLOv3-SPP after sparsity training

| Model | P | R | mAP@0.5 |
|-------|---|---|---------|
| Sparsity training | 0.525 | 0.67 | 0.624 |

YOLOv4 after sparsity training

| Model | P | R | mAP@0.5 |
|-------|---|---|---------|
| Sparsity training | 0.665 | 0.570 | 0.595 |

Model pruning

Pruning can start once sparsification is sufficient. It divides into channel pruning and layer pruning, both of which are evaluated based on the gamma weights of the BN layers, so whether sparsity training was sufficient directly affects the pruning result. Channel pruning greatly reduces the number of model parameters and the size of the weight file; its speed-up on desktop GPUs may not be as obvious as on embedded devices. Layer pruning has a more universal acceleration effect. After pruning is complete, fine-tune the model to restore accuracy.

The following uses YOLOv3-SPP and YOLOv4 as examples to show how to find a suitable pruning point (maintaining a high mAP@0.5 under the greatest possible pruning strength), which I call the "optimal pruning point":

Channel pruning

python slim_prune.py --cfg cfg/my_cfg.cfg --data data/my_data.data --weights weights/last.pt --global_percent 0.8 --layer_keep 0.01
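Conceptually, the Global percent maps to a single threshold over the gammas of all prunable BN layers. A minimal sketch of that mapping (illustrative only; the real slim_prune.py also protects certain layers, and --layer_keep guarantees a minimum fraction of channels per layer):

```python
import torch
import torch.nn as nn

def global_gamma_threshold(model, global_percent=0.8):
    # Collect |gamma| from all BN layers and take the value below which
    # `global_percent` of channels fall; channels under the threshold
    # become candidates for removal.
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    k = int(gammas.numel() * global_percent)
    return torch.sort(gammas).values[k].item()
```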

When setting the global channel pruning ratio (Global percent), you can use a coarse-to-fine strategy, starting with large intervals and then gradually subdividing, to approach the "optimal pruning point". For example, first try Global percent values of 0.7, 0.8, and 0.9. At 0.7 and 0.8 the model is compressed while accuracy barely declines, even slightly exceeding the model after sparsity training. At 0.9, however, P rises sharply while R and mAP@0.5 drop sharply, so it can be inferred that 0.9 just exceeds the "optimal pruning point". The interval is then subdivided into 0.88 and 0.89. At both values the metrics are identical to three decimal places and very close to the sparsity-trained model, but 0.89 gives better compression. Pushing further to 0.91, 0.92, and 0.93: at 0.91, P has risen to its limit of 1 while R and mAP@0.5 are close to 0, and beyond 0.91, P, R, and mAP@0.5 are all nearly 0. This means the key channels have been cut off.

So it can be determined that a Global percent of 0.89 is the "optimal pruning point".

Parameters of the YOLOv3-SPP model after sparsity training under different global channel pruning ratios

| Global percent | P | R | mAP@0.5 | Params | Size of .weights | Inference_time (Tesla P100) | BFLOPS |
|----------------|---|---|---------|--------|------------------|-----------------------------|--------|
| 0.7 | 0.572 | 0.659 | 0.627 | 15.7M | 59.8M | 16.7ms | 25.13 |
| 0.8 | 0.575 | 0.656 | 0.626 | 7.8M | 30M | 16.7ms | 18.07 |
| 0.88 | 0.574 | 0.652 | 0.621 | 2.7M | 10.2M | 16.6ms | 13.27 |
| 0.89 | 0.574 | 0.652 | 0.621 | 2.6M | 10.1M | 16.5ms | 13.23 |
| 0.9 | 0.859 | 0.259 | 0.484 | 2.5M | 9.41M | 16.3ms | 12.71 |
| 0.91 | 1 | 0.00068 | 0.14 | 2.1M | 9.02M | 16.4ms | 11.69 |
| 0.92 | 0 | 0 | 0.00118 | 1.9M | 7.15M | 16.1ms | 10.99 |
| 0.93 | 0 | 0 | 0 | 1.7M | 6.34M | 16.5ms | 10.37 |

Parameters of the YOLOv4 model after sparsity training under different global channel pruning ratios

| Global percent | P | R | mAP@0.5 | Params | Size of .weights | Inference_time (Tesla T4) | BFLOPS |
|----------------|---|---|---------|--------|------------------|---------------------------|--------|
| 0.5 | 0.693 | 0.559 | 0.594 | 19.8M | 75.8M | 18.0ms | 26.319 |
| 0.6 | 0.697 | 0.552 | 0.584 | 12.8M | 49.1M | 17.7ms | 20.585 |
| 0.7 | 0.699 | 0.55 | 0.581 | 7.1M | 27.0M | 17.6ms | 15.739 |
| 0.8 | 0.696 | 0.544 | 0.578 | 3.0M | 11.6M | 16.4ms | 11.736 |
| 0.82 | 0.697 | 0.542 | 0.575 | 2.4M | 9.49M | 16.5ms | 11.033 |
| 0.84 | 0.698 | 0.54 | 0.574 | 2.0M | 7.84M | 16.5ms | 10.496 |
| 0.86 | 0.698 | 0.54 | 0.571 | 1.7M | 6.58M | 16.4ms | 9.701 |
| 0.88 | 0.706 | 0.536 | 0.57 | 1.5M | 6.09M | 16.4ms | 8.692 |
| 0.89 | 0.787 | 0.0634 | 0.204 | 1.3M | 5.36M | 16.5ms | 8.306 |
| 0.9 | 0.851 | 0.00079 | 0.0329 | 1.2M | 4.79M | 16.5ms | 7.927 |

In the same way, it can be judged that a Global percent of 0.88 is YOLOv4's "optimal pruning point" for channel pruning.

After channel pruning, we can perform layer pruning.

Layer pruning

python layer_prune.py --cfg cfg/my_cfg.cfg --data data/my_data.data --weights weights/last.pt --shortcuts 12

The shortcuts parameter is the number of Res units to cut, shown as Cut Resunit in the table below.

Parameters of YOLOv3-SPP (Global percent 0.89) under different layer pruning strengths

| Cut Resunit | P | R | mAP@0.5 | Params | Size of .weights | Inference_time (Tesla P100) | BFLOPS |
|-------------|---|---|---------|--------|------------------|-----------------------------|--------|
| 16 | 0.492 | 0.421 | 0.397 | 2.3M | 8.97M | 10.4ms | 12.39 |
| 17 | 0.48 | 0.365 | 0.342 | 2.2M | 8.55M | 9.7ms | 11.79 |
| 18 | 0.547 | 0.166 | 0.205 | 2.1M | 7.99M | 9.1ms | 11.02 |
| 19 | 0.561 | 0.0582 | 0.108 | 2.0M | 7.82M | 8.9ms | 10.06 |
| 20 | 0.631 | 0.0349 | 0.0964 | 1.9M | 7.43M | 8.2ms | 9.93 |

Analyzing the table above, each additional Res unit cut tends to raise P while R and mAP@0.5 fall, which matches the theoretical expectation described for channel pruning. Generally speaking, a good model's P and R should both be high and close to each other. When 18 Res units are cut, both R and mAP@0.5 have dropped significantly and there is already a large gap between R and P, so the optimal pruning point has been exceeded. Increasing the number of pruned Res units further drives R and mAP@0.5 toward 0. To maximize the acceleration effect you should cut as many Res units as possible, so cutting 17 Res units (51 layers in total) is clearly the best choice that preserves accuracy as much as possible; that is the "optimal pruning point".

At the same time, the Inference_time column reflects the obvious acceleration that layer pruning brings compared with the baseline model.
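For intuition, layer pruning scores each Res unit by the gamma statistics of its BN layers and removes the weakest ones. A minimal sketch of the selection step, assuming mean |gamma| as the importance score (the repo's layer_prune.py may weight things differently):

```python
def weakest_res_units(unit_gamma_means, n_cut):
    # unit_gamma_means[i] is the mean |gamma| over Res unit i's BN layers.
    # Return the indices of the n_cut weakest units; each cut Res unit
    # removes 3 layers from the cfg (two convs plus the shortcut).
    order = sorted(range(len(unit_gamma_means)),
                   key=lambda i: unit_gamma_means[i])
    return sorted(order[:n_cut])
```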

Parameters of YOLOv4 (Global percent 0.88) under different layer pruning strengths

| Cut Resunit | P | R | mAP@0.5 | Params | Size of .weights | Inference_time (Tesla T4) | BFLOPS |
|-------------|---|---|---------|--------|------------------|---------------------------|--------|
| 14 | 0.686 | 0.473 | 0.507 | 1.5M | 5.78M | 12.1ms | 8.467 |
| 17 | 0.704 | 0.344 | 0.419 | 1.4M | 5.39M | 11.0ms | 7.834 |
| 18 | 0.678 | 0.31 | 0.377 | 1.3M | 5.33M | 10.9ms | 7.815 |
| 19 | 0.781 | 0.0426 | 0.121 | 1.3M | 5.22M | 10.5ms | 7.219 |
| 20 | 0.765 | 0.0113 | 0.055 | 1.2M | 4.94M | 10.4ms | 6.817 |

In the same way, it can be judged that a global channel pruning ratio of 0.88 with 18 Res units cut (i.e., 54 layers removed) is the "optimal pruning point" for YOLOv4.

Model fine-tuning

python train.py --cfg cfg/prune_0.85_my_cfg.cfg --data data/my_data.data --weights weights/prune_0.85_last.weights --epochs 100 --batch-size 32

A warmup is applied in the first few epochs of fine-tuning, which helps restore the accuracy of the pruned model. The default is 6 epochs; if you think that is too many, you can modify the train.py code yourself.
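A minimal sketch of what such a warmup typically looks like, assuming a linear ramp (the exact schedule in train.py may differ):

```python
def apply_warmup(optimizer, base_lr, epoch, warmup_epochs=6):
    # Ramp the learning rate linearly over the first warmup_epochs so
    # the heavily pruned network settles before full-size updates.
    if epoch < warmup_epochs:
        lr = base_lr * (epoch + 1) / warmup_epochs
        for group in optimizer.param_groups:
            group['lr'] = lr
```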

Using the default warmup of 6 epochs, the fine-tuning results are as follows:

Comparison of YOLOv3-SPP baseline model and the model after pruning and fine-tuning

| Model | P | R | mAP@0.5 | Params | Size of .weights | Inference_time (Tesla P100) | BFLOPS |
|-------|---|---|---------|--------|------------------|-----------------------------|--------|
| baseline | 0.554 | 0.709 | 0.667 | 62.5M | 238M | 17.4ms | 65.69 |
| After finetune | 0.556 | 0.663 | 0.631 | 2.2M | 8.55M | 9.7ms | 11.79 |

[Figure: distribution of the absolute values of the BN-layer weights of YOLOv3-SPP after pruning (left) and after fine-tuning (right)]

So far, the whole model pruning process for YOLOv3-SPP is complete. After pruning, the model loses 3 points of accuracy, total parameters and weight file size are reduced by 96.4%, BFLOPS are reduced by 82%, and inference speed on a Tesla P100 GPU increases by 44%.

Comparison of YOLOv4 baseline model and the model after pruning and fine-tuning

| Model | P | R | mAP@0.5 | Params | Size of .weights | Inference_time (Tesla T4) | BFLOPS |
|-------|---|---|---------|--------|------------------|---------------------------|--------|
| baseline | 0.587 | 0.699 | 0.669 | 62.5M | 244M | 28.3ms | 59.57 |
| After finetune | 0.565 | 0.626 | 0.601 | 1.3M | 5.33M | 10.9ms | 7.815 |

[Figure: distribution of the absolute values of the BN-layer weights of YOLOv4 after pruning (left) and after fine-tuning (right)]

So far, the whole model pruning process for YOLOv4 is complete. After pruning, the model loses 7 points of accuracy, total parameters and weight file size are reduced by 98%, BFLOPS are reduced by 87%, and inference speed on a Tesla T4 GPU increases by 61%.

Model training in PyTorch and Darknet differs in many details, and fine-tuning under the Darknet framework often gives better results. Note that you only need the pruned .cfg file; you do not need to load pre-trained weights!

Deployment of the model after pruning on OpenVINO

There are many optimization algorithms for YOLO models, but converting a model to the OpenVINO IR format relies on TensorFlow 1.x, which is designed around static graphs, so the TensorFlow code has to be adjusted whenever the model structure changes. To simplify this process, I made a tool that parses the pruned model's cfg file and generates the corresponding TensorFlow code. With this tool, a pruned model can be deployed on OpenVINO quickly.

Repositories: https://github.com/TNTWEN/OpenVINO-YOLO-Automatic-Generation
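For illustration, the structural information such a generator needs is all contained in the cfg file; a minimal, self-contained parser sketch (not the tool's actual code):

```python
def parse_cfg(path):
    # Parse a Darknet cfg into an ordered list of blocks such as
    # {'type': 'convolutional', 'filters': '18', ...}. The pruned
    # channel counts live in these blocks, which is what lets a
    # generator rebuild the network in another framework.
    blocks, block = [], None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            if line.startswith('['):
                if block is not None:
                    blocks.append(block)
                block = {'type': line[1:-1]}
            else:
                key, value = line.split('=', 1)
                block[key.strip()] = value.strip()
    if block is not None:
        blocks.append(block)
    return blocks
```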

Under OpenVINO, the pruned model achieves a 2-3x frame-rate increase when inferring on Intel CPU, GPU, HDDL, and NCS2. We can also use video splicing: four channels of 416×416 video are stitched into one 832×832 frame, so OpenVINO can run YOLO on four video channels simultaneously while meeting basic real-time requirements.
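A minimal sketch of the splicing step (hypothetical helper; detections must afterwards be mapped back to the quadrant they came from):

```python
import numpy as np

def splice_four(frames):
    # frames: four 416x416x3 images, one per video channel.
    # Stitch them into a single 832x832 mosaic so one inference pass
    # covers all four channels.
    top = np.hstack(frames[:2])
    bottom = np.hstack(frames[2:])
    return np.vstack([top, bottom])
```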

This tool also has the potential to be compatible with other YOLO optimization algorithms: it only needs the cfg file and weight file of the optimized model to complete the conversion.

Thank you for using this project, and I hope it helps you!


Issues

about the training

I trained my own model with the default values you provide. It shows me this:
[three screenshots attached]

about dataset

Impressed by your awesome research!
I wonder how to create a new dataset that only includes specific classes from the original COCO dataset. Could you please give me some tips? Thank you!

A question about YOLOv4 training

I see from your results that you trained on the person and car classes from the COCO dataset, and your YOLOv4 model reached Recall 0.699 and mAP 0.669. May I ask whether your YOLOv4 weights started from AlexeyAB's official yolov4.weights?
Thank you for your research.

About out of the box models

Hello!

Is it possible to download already-pruned YOLOv3/v4 models from you? I would like to compare them with my models.
Thanks in advance!

Multi-GPU training on a single machine

Hi, I have several GTX TITAN X cards in one machine. When I use --device to enable multi-GPU training, I get "UserWarning: Single-Process Multi-GPU is not the recommended mode for DDP", followed by "RuntimeError: CUDA error: an illegal memory access was encountered".
If I train on a single card, even batch=4 runs out of memory: "CUDA out of memory. Tried to allocate 338.00 MiB (GPU 0; 11.93 GiB total capacity; 10.94 GiB already allocated; 22.81 MiB free; 11.35 GiB reserved in total by PyTorch)"

tiny pruning

Hello,

Would this work (with some modifications) to prune tiny YOLOv3/v4?

Thanks

About the training

Hello. Thank you for sharing your awesome research.

I am trying to train yolov3-spp on the COCO2017 dataset, but this error came out:
File "C:\Users\Pruned-OpenVINO-YOLO\Pruneyolov3v4\utils\datasets.py", line 275, in init
raise Exception('Error loading data from %s. See %s' % (path, help_url))
Exception: Error loading data from ..\coco\train2017.txt. See https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data

The organization of my directory is shown below, which matches the URL mentioned in the message. Could you give me some information about it? Thank you!
├─cfg
├─data
│ ├─coco
│ │ ├─annotations
│ │ ├─images
│ │ │ └─train2017
│ │ └─labels
│ │ ├─train2017
│ │ └─val2017

yolov4.weight

Hello, I would like to ask whether yolov4.weights cannot be used for training. I get this error:
WARNING: non-finite loss, ending training tensor([nan, nan, 0., nan], device='cuda:0')
Thank you!

Convert pruned model to onnx

Hi,

I have used your project to prune a yolov4-tiny model successfully, and I want to convert my pruned model (in .pt format) to ONNX to deploy on my system. But when I set ONNX_EXPORT to True, I get the following error:
File "workspaces/Pruned-OpenVINO-YOLO/Pruneyolov3v4/models.py", line 207, in forward p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous() # prediction RuntimeError: shape '[1, 3, 11, 64, 64]' is invalid for input of size 8448

I think this error occurs because the pruned model's filter counts in some layers differ from the original model.

I attached the configuration files of my model before and after pruning. Please help me resolve this problem.
Thanks

yolov4-tiny.txt
prune_0.71_keep_0.01_yolov4-tiny.txt

Getting error while using model.convert function

@TNTWEN I tried converting a quantized yolov4-tiny .pt model to .weights and got the error below:

RuntimeError: Error(s) in loading state_dict for Darknet:
Unexpected key(s) in state_dict: "module_list.0.Conv2d.activation_quantizer.scale", "module_list.0.Conv2d.activation_quantizer.zero_point", "module_list.0.Conv2d.activation_quantizer.range_tracker.min_val", "module_list.0.Conv2d.activation_quantizer.range_tracker.max_val", "module_list.0.Conv2d.activation_quantizer.range_tracker.first_a", "module_list.0.Conv2d.weight_quantizer.scale", "module_list.0.Conv2d.weight_quantizer.zero_point", "module_list.0.Conv2d.weight_quantizer.range_tracker.min_val", "module_list.0.Conv2d.weight_quantizer.range_tracker.max_val", "module_list.0.Conv2d.weight_quantizer.range_tracker.first_w", "module_list.1.Conv2d.activation_quantizer.scale", "module_list.1.Conv2d.activation_quantizer.zero_point", "module_list.1.Conv2d.activation_quantizer.range_tracker.min_val", "module_list.1.Conv2d.activation_quantizer.range_tracker.max_val", "module_list.1.Conv2d.activation_quantizer.range_tracker.first_a", "module_list.1.Conv2d.weight_quantizer.scale", "module_list.1.Conv2d.weight_quantizer.zero_point", "module_list.1.Conv2d.weight_quantizer.range_tracker.min_val", "module_list.1.Conv2d.weight_quantizer.range_tracker.max_val", "module_list.1.Conv2d.weight_quantizer.range_tracker.first_w", "module_list.2.Conv2d.activation_quantizer.scale", "module_list.2.Conv2d.activation_quantizer.zero_point", "module_list.2.Conv2d.activation_quantizer.range_tracker.min_val", "module_list.2.Conv2d.activation_quantizer.range_tracker.max_val", "module_list.2.Conv2d.activati...

Is the quantized model .weights conversion supported in this repository?

Thanks

Shapefile

Hi, may I know how we get the shape file? I'm getting an error with the "# Read image shapes" code below.

Code from datasets.py: [screenshot]
Error: [screenshot]
I then tried commenting out the "# Read image shapes" code as shown: [screenshot]
And as a result I got this: [screenshot]

It looks like it can now read the images but not the labels. Any idea how to solve this, please?

what are the different flags used while pruning

@TNTWEN
I tried sparsity training the yolov4-tiny weights using the following command
!python train.py --cfg cfg/yolov4-tiny-custom.cfg --data data/obj.data --weights weights/yolov4-tiny-custom_4000.weights --epochs 200 --batch-size 32 -sr --s 0.001 --prune 1
What does the --prune flag stand for?

In the channel pruning section
python slim_prune.py --cfg cfg/my_cfg.cfg --data data/my_data.data --weights weights/last.pt --global_percent 0.8 --layer_keep 0.01

What do the --global_percent and --layer_keep flags stand for?

About trained from AlexeyAB

May I ask: when I do sparse training, can I use AlexeyAB-trained .weights?

In other words, basic training is done with the YOLO framework maintained by AlexeyAB, and sparse training then starts from that well-trained AlexeyAB .weights model.

Because when I use your code for basic training, it often fails to converge (YOLOv4).
