
libtorch-yolov5's Introduction

Introduction

A LibTorch inference implementation of the yolov5 object detection algorithm. Both GPU and CPU are supported.

Dependencies

  • Ubuntu 16.04
  • CUDA 10.2
  • OpenCV 3.4.12
  • LibTorch 1.6.0

TorchScript Model Export

Please refer to the official document here: ultralytics/yolov5#251

Mandatory update: developers need to modify the following line in the original export.py from yolov5:

# line 29
model.model[-1].export = False

Add GPU support: note that the current export script in yolov5 uses the CPU by default; export.py needs to be modified as follows to support the GPU:

# line 28
img = torch.zeros((opt.batch_size, 3, *opt.img_size)).to(device='cuda')  
# line 31
model = attempt_load(opt.weights, map_location=torch.device('cuda'))

Export a trained yolov5 model:

cd yolov5
export PYTHONPATH="$PWD"  # add path
python models/export.py --weights yolov5s.pt --img 640 --batch 1  # export

Setup

$ cd /path/to/libtorch-yolov5
$ wget https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.6.0.zip
$ unzip libtorch-cxx11-abi-shared-with-deps-1.6.0.zip
$ mkdir build && cd build
$ cmake .. && make
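
If the build succeeds, the exported TorchScript file can also be sanity-checked independently of this repo. Below is a minimal loader sketch using only the stock torch::jit API; the weight path and input shape are examples matching the export step above, not part of this repo:

#include <torch/script.h>
#include <iostream>
#include <vector>

int main() {
    try {
        // Load the exported TorchScript module and switch to inference mode.
        torch::jit::script::Module module = torch::jit::load("../weights/yolov5s.torchscript.pt");
        module.eval();

        // Dummy forward pass with the export-time input shape (batch 1, 3x640x640).
        std::vector<torch::jit::IValue> inputs;
        inputs.emplace_back(torch::zeros({1, 3, 640, 640}));
        module.forward(inputs);
        std::cout << "Model loaded and forward pass succeeded." << std::endl;
    } catch (const c10::Error& e) {
        std::cerr << "Error loading the model: " << e.what() << std::endl;
        return -1;
    }
    return 0;
}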

To run inference on examples in the ./images folder:

# CPU
$ ./libtorch-yolov5 --source ../images/bus.jpg --weights ../weights/yolov5s.torchscript.pt --view-img
# GPU
$ ./libtorch-yolov5 --source ../images/bus.jpg --weights ../weights/yolov5s.torchscript.pt --gpu --view-img
# Profiling
$ CUDA_LAUNCH_BLOCKING=1 ./libtorch-yolov5 --source ../images/bus.jpg --weights ../weights/yolov5s.torchscript.pt --gpu --view-img
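
A note on profiling: CUDA kernel launches are asynchronous, so wall-clock timings taken around forward() can be misleading; CUDA_LAUNCH_BLOCKING=1 forces synchronous launches so time is attributed to the right call. The same effect can be achieved in code by synchronizing explicitly. A sketch, independent of this repo's Detector class, using the current CUDA stream:

#include <torch/script.h>
#include <c10/cuda/CUDAStream.h>
#include <chrono>
#include <vector>

// Time one forward pass, flushing pending GPU work before reading the clock.
double TimedForwardMs(torch::jit::script::Module& module, const torch::Tensor& input) {
    std::vector<torch::jit::IValue> inputs{input};
    auto start = std::chrono::steady_clock::now();
    module.forward(inputs);
    // Wait until all kernels queued on the current stream have finished.
    c10::cuda::getCurrentCUDAStream().synchronize();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}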

Demo

Bus

Zidane

FAQ

  1. terminate called after throwing an instance of 'c10::Error' what(): isTuple() INTERNAL ASSERT FAILED

    • Make sure "model.model[-1].export = False" is set when running the export script.

  2. Why does the first "inference takes" entry in the log report such a long time?

    • The first inference is slower because of the initial optimization that the JIT (just-in-time compiler) performs on your code, similar to the "warm up" phase in other JIT compilers. Typically, production services warm up a model on representative inputs before marking it as available.

    • The first cycle may therefore take noticeably longer. The yolov5 Python version runs inference once on an empty image before the actual detection pipeline. You can modify the code to process the same image multiple times, or to process a video, to obtain valid timings.
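
Such a warm-up pass can be replicated on the C++ side. A minimal sketch, assuming a loaded torch::jit module and the export-time input shape of 1x3x640x640:

#include <torch/script.h>

// Run a few dummy forward passes so the JIT finishes its optimization
// before any real timing starts. The shape assumes the export settings
// above (batch 1, 3 channels, 640x640).
void WarmUp(torch::jit::script::Module& module, torch::Device device, int iterations = 3) {
    torch::NoGradGuard no_grad;  // inference only, no autograd bookkeeping
    auto dummy = torch::zeros({1, 3, 640, 640}).to(device);
    for (int i = 0; i < iterations; ++i) {
        module.forward({dummy});
    }
}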

References

  1. https://github.com/ultralytics/yolov5
  2. Question about the code in non_max_suppression
  3. https://github.com/walktree/libtorch-yolov3
  4. https://pytorch.org/cppdocs/index.html
  5. https://github.com/pytorch/vision
  6. PyTorch.org - CUDA SEMANTICS
  7. PyTorch.org - add synchronization points
  8. PyTorch - why first inference is slower

libtorch-yolov5's People

Contributors

liej6799, yasenh


libtorch-yolov5's Issues

Does anyone run GPU inference successfully?

I could not run inference with GPU enabled. I followed the instructions to modify the export.py code and exported the TorchScript model with GPU support, but when inferring with LibTorch it cannot load the weights.

Does anyone know how to solve it?

My OS is Windows 10, and I am able to run the CPU TorchScript model.

Thanks in advance.

Performance difference between running model in python vs c++?

Hi @yasenh ,
The code is beautifully written in C++. I tried as well but could not run it end to end successfully, due to my lack of expertise with C++ and LibTorch. Could you also provide some statistics showing whether running the model in C++ improves performance compared to running the same model in Python on GPU?

My guess is that there should not be much difference, but if one exists, where does it come from? I know I am asking a lot, but if you could analyze it, it would be really useful.

Python and libtorch model prediction results are inconsistent

Hello, I have updated to YOLOv5 version 4.0. I found that the prediction results of the Python model are slightly different from the results predicted by the LibTorch model; with version 3.1 the results are the same. What is the reason? Can you help me, thank you!

Inference speed

After a simple modification to your code to read local files in a loop, I tested the yolov5s model and found that the inference speed is only two or three frames per second, and GPU utilization is very low. So I would like to ask: does this project not support loading the model once and then running prediction repeatedly? The modified part of the code is shown below. Looking forward to your reply, thanks!

// load input image
std::vector<cv::String> filenames;
cv::String folder = "/home/xavier/dataset/DF";
cv::glob(folder, filenames);
for (size_t i = 0; i < filenames.size(); ++i)
{
    cv::Mat img = cv::imread(filenames[i]);
    //std::cout << "******" << filenames[i] << std::endl;
    if (img.empty())
    {
        std::cerr << "Error loading the image!\n";
        return -1;
    }

    // load network
    std::string weights = opt["weights"].as<std::string>();
    auto detector = Detector(weights, device_type);

    // set up threshold
    float conf_thres = opt["conf-thres"].as<float>();
    float iou_thres = opt["iou-thres"].as<float>();

    // inference
    auto result = detector.Run(img, conf_thres, iou_thres);

    // visualize detections
    if (opt["view-img"].as<bool>()) {
        Demo(img, result[0], class_names);
    }
    //cv::destroyAllWindows();
}
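
A likely cause is visible in the snippet above: the Detector, which loads the TorchScript weights, is constructed inside the file loop, so the model is re-loaded for every image. A minimal sketch of the fix, hoisting the one-time setup out of the loop (the names follow the snippet above):

// One-time setup: load the network and thresholds before the loop.
std::string weights = opt["weights"].as<std::string>();
auto detector = Detector(weights, device_type);
float conf_thres = opt["conf-thres"].as<float>();
float iou_thres = opt["iou-thres"].as<float>();

for (size_t i = 0; i < filenames.size(); ++i) {
    cv::Mat img = cv::imread(filenames[i]);
    if (img.empty()) continue;

    // Per-image work is now only inference and visualization.
    auto result = detector.Run(img, conf_thres, iou_thres);
    if (opt["view-img"].as<bool>()) {
        Demo(img, result[0], class_names);
    }
}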

Detecting an image with no recognizable content causes a crash

Unhandled exception at 0x00007FFB68A2A799 (in Demo.exe): Microsoft C++ exception: c10::Error at memory location 0x00000012BCF9BE60.

The break jumps into kernel_lambda.h:

auto operator()(Parameters... args) -> decltype(std::declval<FuncType>()(std::forward<Parameters>(args)...)) {
    return kernel_func_(std::forward<Parameters>(args)...);
}

Debugging shows the crash is first triggered here:
// get the max classes score at each result (e.g. elements 5-84)
std::tuple<torch::Tensor, torch::Tensor> max_classes = torch::max(det.slice(1, item_attr_size, item_attr_size + num_classes), 1);
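
A likely cause: when no candidate passes the confidence threshold, det is empty, and torch::max over the empty slice throws. A minimal guard, sketched with the names used above:

// Skip images with no remaining candidates before calling torch::max,
// which throws when the sliced tensor has zero rows.
if (det.size(0) == 0) {
    continue;  // nothing detected in this image
}
std::tuple<torch::Tensor, torch::Tensor> max_classes =
    torch::max(det.slice(1, item_attr_size, item_attr_size + num_classes), 1);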

./libtorch-yolov5 error

Hi,
Thank you for your work. When I run "./libtorch-yolov5 /data_1/train_project/OBJ_Detection/yolov5-forward/module/torchscript.pt /data_1/train_project/OBJ_Detection/yolov5-forward/img/000240_01046820200606110918_0035_670_3cls.jpg -gpu",
the following error occurs:

terminate called after throwing an instance of 'c10::Error'
what(): isTuple() INTERNAL ASSERT FAILED at /data_1/train_project/OBJ_Detection/yolov5-forward/libtorch/include/ATen/core/ivalue_inl.h:723, please report a bug to PyTorch. Expected Tuple but got GenericList (toTuple at /data_1/train_project/OBJ_Detection/yolov5-forward/libtorch/include/ATen/core/ivalue_inl.h:723)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x6a (0x7f7d00dfaaaa in /data_1/train_project/OBJ_Detection/yolov5-forward/libtorch/lib/libc10.so)
frame #1: c10::IValue::toTuple() const & + 0x121 (0x559bed24f2b3 in ./libtorch-yolov5)
frame #2: + 0xef9c (0x559bed245f9c in ./libtorch-yolov5)
frame #3: + 0x4176b (0x559bed27876b in ./libtorch-yolov5)
frame #4: __libc_start_main + 0xe7 (0x7f7cabd8eb97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #5: + 0xcc6a (0x559bed243c6a in ./libtorch-yolov5)

How can I solve this problem?

Found a problem: the first two images are very slow in actual runs

I am not sure why. I feed images one at a time. With some models the first image takes 7.8 seconds and the second 1.2 seconds; with other models the first image takes a few hundred milliseconds while the second can take as long as 30 seconds. Strangely, after the first two images everything is normal, and the overall time is only a few tens of milliseconds.

Has anyone run into the same problem and found the cause? I have no idea where to start investigating the inference time!

batch_size

Hello, thanks for your code. Did you test batch inference?

I did all the steps but in the make step I get this error

[ 33%] Building CXX object CMakeFiles/libtorch-yolov5.dir/src/detector.cpp.o
In file included from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/ArrayRef.h:19:0,
from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/MemoryFormat.h:5,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/core/TensorBody.h:5,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/Tensor.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/Context.h:4,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/ATen.h:5,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/csrc/api/include/torch/types.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/script.h:3,
from /home/alikarimi/libtorch-yolov5/include/detector.h:5,
from /home/alikarimi/libtorch-yolov5/src/detector.cpp:1:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/C++17.h:24:2: error: #error You need C++14 to compile PyTorch
#error You need C++14 to compile PyTorch
^~~~~
In file included from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/Exception.h:5:0,
from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h:5,
from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h:6,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/ATen.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/csrc/api/include/torch/types.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/script.h:3,
from /home/alikarimi/libtorch-yolov5/include/detector.h:5,
from /home/alikarimi/libtorch-yolov5/src/detector.cpp:1:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
inline decltype(auto) str(const Args&... args) {
^~~~
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected ‘)’ before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:8: error: expected unqualified-id before ‘decltype’
inline decltype(auto) str(const Args&... args) {
^~~~~~~~
In file included from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h:5:0,
from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h:6,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/ATen.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/csrc/api/include/torch/types.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/script.h:3,
from /home/alikarimi/libtorch-yolov5/include/detector.h:5,
from /home/alikarimi/libtorch-yolov5/src/detector.cpp:1:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h: In member function ‘void c10::Device::validate()’:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h:96:5: error: ‘str’ is not a member of ‘c10’
TORCH_CHECK(index_ == -1 || index_ >= 0,
^
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h:98:5: error: ‘str’ is not a member of ‘c10’
TORCH_CHECK(!is_cpu() || index_ <= 0,
^
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h: In member function ‘void* c10::Allocator::raw_allocate(size_t)’:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h:163:5: error: ‘str’ is not a member of ‘c10’
AT_ASSERT(dptr.get() == dptr.get_context());
^
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h:163:5: error: ‘str’ is not a member of ‘c10’
AT_ASSERT(dptr.get() == dptr.get_context());
^
.....

Why is the detection speed slow on GPU?

Dear author, I use yolov5s.pt to detect images. It takes 289 ms per frame on CPU and 127 ms per frame on an RTX 3070 graphics card. Why is it so slow on GPU? The PyTorch (Python) version of yolov5 takes 11 ms per frame on the same RTX 3070.

I look forward to your help!

Export ONNX with CUDA

Hi!
I have modified the "export.py to support GPU,but still receive the following error:

RuntimeError: Input, output and indices must be on the current device

Do you have any suggestions on how this issue can be resolved?
Thanks!

Performance on Win10 with GPU

My device is an i5 CPU with a GTX 1080 Ti 11GB GPU, and I have successfully compiled and run on Windows 10 with GPU, but why does inference take that much time (Release mode)? I already commented out the warm-up part in main.cpp, and it still takes around 500 ms to process a single image. When I use the same model for detection in Python, it runs much more efficiently, at 20 FPS. I don't know whether something is wrong with my configuration or whether some other issue in the C++ project decreases the performance.

Question about model.model[-1].export = False

I have a question: when exporting the model, model.model[-1].export = False is set, but the Detect layer still applies a convolution to each feature map. If the exported model did not include the Detect layer, wouldn't its output be inconsistent with the trained model?

There is no clip_coords step in your code

Hi, when I use your code, I found a problem. In the Python version of yolov5 there is a clip_coords function in /utils/general.py (line 240) which clips xyxy bounding boxes to the image shape (height, width). Sometimes my predicted box values fall outside the image bounds, so I added a clip-coords step to your detector.cpp. I wonder if I'm doing the right thing. Thank you for sharing.
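
For reference, a minimal sketch of such a clipping step on the detection tensor, assuming the first four columns are x1, y1, x2, y2 in pixels (mirroring clip_coords in the Python version):

#include <torch/torch.h>
#include <opencv2/core.hpp>

// Clamp xyxy boxes to the image bounds, as clip_coords does in utils/general.py.
// det is assumed to be an [n, 6] tensor whose first four columns are x1, y1, x2, y2.
void ClipCoords(torch::Tensor& det, const cv::Size& img_size) {
    det.select(1, 0).clamp_(0, img_size.width);   // x1
    det.select(1, 1).clamp_(0, img_size.height);  // y1
    det.select(1, 2).clamp_(0, img_size.width);   // x2
    det.select(1, 3).clamp_(0, img_size.height);  // y2
}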

Error in cmake building

Hi @yasenh
I installed all dependencies and did the setup as you described in the repo. But when I tried to build with cmake (cmake .. && make), I got this error:
[Screenshot from 2020-12-29 05-30-36]

Can you please tell me whats the problem ?
Thank you

After switching to video detection, the second frame's inference takes hundreds of times longer

I rewrote main.cpp for video detection; the main code is as follows:

VideoCapture capture;
std::cout << "finish load network and open the video" << std::endl;

capture.open("/home/****/libtorch-yolov5/test.mp4");
if (!capture.isOpened())
{
    std::cout << "can not open ...\n" << std::endl;
    return -1;
}
Mat frame;
namedWindow("output", WINDOW_AUTOSIZE);

// set up threshold
float conf_thres = 0.4;  // opt["conf-thres"].as<float>();
float iou_thres = 0.5;   // opt["iou-thres"].as<float>();

for (;;)
{
    capture >> frame;
    //Mat pic;
    if (frame.empty()) break;
    //imshow("output", frame);
    std::cout << "start forward" << std::endl;
    auto result = detector.Run(frame, conf_thres, iou_thres);
    Demo(frame, result, class_names);
    imshow("output", frame);
    if (waitKey(33) >= 0) break;
}

capture.release();
cv::destroyAllWindows();
//return 0;

When the program runs, the first frame after loading the model is displayed almost immediately with correct detection boxes, and that step is fast; but the second frame takes hundreds of times longer, in the inference stage. After that, times drop back down. It happens on every run and with different videos too. I logged the times:
----------New Frame----------
img size:1080x1920
pre-process takes : 4 ms
inference takes : 137 ms <-------------------------------------------------------137
post-process takes : 19 ms
start forward
----------New Frame----------
img size:1080x1920
pre-process takes : 5 ms
inference takes : 7869 ms <--------------------------------------------------------7869
post-process takes : 24 ms
start forward
----------New Frame----------
img size:1080x1920
pre-process takes : 3 ms
inference takes : 8 ms <------------------------------------------------------------8
post-process takes : 25 ms
start forward
----------New Frame----------
img size:1080x1920
pre-process takes : 4 ms
inference takes : 8 ms <-------------------------------------------------------------8
post-process takes : 23 ms

What could be causing this?
Note: I did not know what the warm-up was for, so I removed it.

Error when running `PostProcessing()`


  • libtorch 1.5.0 debug.
  • Visual studio 2019
  • Windows 10

In the first step, PostProcess() with temp_img works fine.
But in the second step, PostProcess() with my custom image raises the error.

When I set half_ to false, the error is gone. Why does this error occur, and how can I run this code with half_ enabled?
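
A common pattern when half_ is enabled is that the raw model output is FP16 while the CPU-side post-processing expects FP32. A minimal sketch of converting before post-processing (half_ and the tensor extraction are assumptions based on this issue and the code elsewhere in this repo):

// If the model ran in half precision, convert detections back to float32
// before any CPU-side post-processing, which may not support FP16.
torch::Tensor detections = output.toTuple()->elements()[0].toTensor();
if (detections.scalar_type() == torch::kHalf) {
    detections = detections.to(torch::kFloat32);
}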

run yolov5 v4.0 error

run: ./libtorch-yolov5 --source ../images/bus.jpg --weights ../weights/yolov5s.torchscript.pt --gpu --view-img
error:
terminate called after throwing an instance of 'torch::jit::ErrorReport'
what():

aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor):
Expected at most 12 arguments but found 13 positional arguments.

Why are 13 positional arguments found?

system is NVIDIA Jetson Xavier NX and docker
opencv 4.4.0
libtorch 1.6.0
cuda 10.2
yolov5 v4.0

Time of post-processing is way too long

In the Python code, the prediction time for one image with the yolov5x model is 40 ms, including both obtaining pred and NMS.
In the C++ code, pred takes about 20 ms, but NMS reaches 50 ms. In fact, the most time-consuming part of the NMS step is the GPU-to-CPU data transfer; the actual NMS computation is not that expensive, so there should be a way to optimize this. Any pointers would be appreciated.
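
One common mitigation, sketched under the assumption that the raw output is an [n, 85] tensor with the objectness score in column 4 (the yolov5 layout): filter candidates by confidence on the GPU first, so only a small tensor crosses to the CPU:

// Drop low-confidence candidates on the GPU, then move the much smaller
// surviving tensor to the CPU for the rest of post-processing.
torch::Tensor PreFilter(const torch::Tensor& pred, float conf_thres) {
    auto mask = pred.select(1, 4) > conf_thres;                     // boolean mask over rows
    auto kept = pred.index_select(0, torch::nonzero(mask).squeeze(1));
    return kept.to(torch::kCPU);                                    // single small transfer
}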

std::bad_alloc

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

How to debug?

Hi, thanks so much for creating this repo; it is really awesome.

My question is how to debug with libtorch? Now I face the problem of "segmentation fault(core dumped)" after running the warm-up.


I tried to debug with VSCode, but I could not go deep into libtorch library.

Is it compulsory to use debug version of libtorch?

auto detections = output.toTuple()->elements()[0].toTensor();

Execution reaches auto detections = output.toTuple()->elements()[0].toTensor(); and breaks with an error:

inline c10::intrusive_ptr<ivalue::Tuple> IValue::toTuple() const & {
    AT_ASSERT(isTuple(), "Expected Tuple but got ", tagKind());
    return toIntrusivePtr<ivalue::Tuple>();
}

No obj problem

Hello,
Let me post a question about your project. In detect.cpp:

// if none remain then process next image
if (det.size(1) == 0) {
    continue;
}

Shouldn't det.size(1) == 0 be det.size(0) == 0? Is that right?

No CUDA for inference

Hi, just wondering if it is possible to build without CUDA? I don't have an NVIDIA GPU, so I want to run inference on the CPU.

Modify LetterboxImage error

Hello, thank you very much for your open source project; it has helped me a lot. I have a question:
When the model input image size is 640×640, the accuracy of the prediction result changes and the inference time becomes longer, so I modified LetterboxImage (following the Python version) to use a model input size of 640×480. But then the following error is reported:

terminate called after throwing an instance of 'std::runtime_error'
what(): The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/models/yolo.py", line 45, in forward
_35 = (_4).forward(_34, )
_36 = (_2).forward((_3).forward(_35, ), _29, )
_37 = (_0).forward(_33, _35, (_1).forward(_36, ), )
~~~~~~~~~~~ <--- HERE
_38, _39, _40, _41, = _37
return (_41, [_38, _39, _40])
File "code/torch/models/yolo.py", line 75, in forward
_52 = torch.sub(_51, CONSTANTS.c3, alpha=1)
_53 = torch.to(CONSTANTS.c4, dtype=6, layout=0, device=torch.device("cpu"), pin_memory=None, non_blocking=False, copy=False, memory_format=None)
_54 = torch.mul(torch.add(_52, _53, alpha=1), torch.select(CONSTANTS.c5, 0, 0))
~~~~~~~~~ <--- HERE
_55 = torch.slice(y, 4, 0, 2, 1)
_56 = torch.expand(torch.view(_54, [3, 80, 80, 2]), [1, 3, 80, 80, 2], implicit=True)

Traceback of TorchScript, original code (most recent call last):
/home/****/PycharmProjects/paper_yolov5/models/yolo.py(57): forward
/home/****/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(709): _slow_forward
/home/****/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(725): _call_impl
/home/****/PycharmProjects/paper_yolov5/models/yolo.py(137): forward_once
/home/****/PycharmProjects/paper_yolov5/models/yolo.py(121): forward
/home/****/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(709): _slow_forward
/home/****/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(725): _call_impl
/home/****/anaconda3/lib/python3.8/site-packages/torch/jit/_trace.py(934): trace_module
/home/****/anaconda3/lib/python3.8/site-packages/torch/jit/_trace.py(733): trace
/home/****/PycharmProjects/paper_yolov5/models/export.py(57): <module>
RuntimeError: The size of tensor a (60) must match the size of tensor b (80) at non-singleton dimension 3

How to solve it, thank you!
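
The traced TorchScript module bakes in the tracing-time input shape (note the CONSTANTS and the hard-coded [3, 80, 80, 2] view in the serialized code above), so a 640×480 input cannot be fed to a model exported at 640×640 without re-exporting. A letterbox that always pads to the export size sidesteps this. A minimal sketch, not the repo's actual LetterboxImage:

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Resize with preserved aspect ratio, then pad to the fixed export size.
// Returns {scale, pad_w, pad_h} so boxes can be mapped back afterwards.
std::vector<float> LetterboxToFixedSize(const cv::Mat& src, cv::Mat& dst,
                                        const cv::Size& out_size) {
    float scale = std::min(out_size.width  / static_cast<float>(src.cols),
                           out_size.height / static_cast<float>(src.rows));
    int new_w = static_cast<int>(src.cols * scale);
    int new_h = static_cast<int>(src.rows * scale);
    float pad_w = (out_size.width  - new_w) / 2.0f;
    float pad_h = (out_size.height - new_h) / 2.0f;

    cv::Mat resized;
    cv::resize(src, resized, cv::Size(new_w, new_h));
    int top    = static_cast<int>(std::round(pad_h - 0.1f));
    int bottom = static_cast<int>(std::round(pad_h + 0.1f));
    int left   = static_cast<int>(std::round(pad_w - 0.1f));
    int right  = static_cast<int>(std::round(pad_w + 0.1f));
    // Pad with the gray value used by yolov5's Python letterbox.
    cv::copyMakeBorder(resized, dst, top, bottom, left, right,
                       cv::BORDER_CONSTANT, cv::Scalar(114, 114, 114));
    return {scale, pad_w, pad_h};
}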

Exporting the model from .pt to TorchScript is unsuccessful

Hi
python models/export.py --weights yolov5s.pt --img 640 --batch 1

Fusing layers...
Model Summary: 120 layers, 7.06617e+06 parameters, 7.06617e+06 gradients
Traceback (most recent call last):
File "models/export.py", line 41, in
y = model(img) # dry run

.
.
.

type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'Detect' object has no attribute 'm'

Why is CPU inference faster than GPU? What could be the reason?

The model yolov5s.pt was exported with models/export.py (unmodified, from the latest yolov5).
CPU model
Export:

python models\export.py --device cpu

Run:

Run once on empty image
----------New Frame----------
pre-process takes : 60 ms
inference takes : 4630 ms
post-process takes : 69 ms
----------New Frame----------
pre-process takes : 77 ms
inference takes : 3762 ms
post-process takes : 155 ms

GPU model
Export:

python models\export.py --device 0

Run:

Run once on empty image
----------New Frame----------
pre-process takes : 40 ms
inference takes : 2766 ms
post-process takes : 1 ms
----------New Frame----------
pre-process takes : 32 ms
inference takes : 10285 ms
post-process takes : 11 ms

isTuple() INTERNAL ASSERT FAILED

When I run the code, I get this error. How can I solve it?

terminate called after throwing an instance of 'c10::Error'
what(): isTuple() INTERNAL ASSERT FAILED at "/dxd/libtorch-yolov5/libtorch/include/ATen/core/ivalue_inl.h":842, please report a bug to PyTorch. Expected Tuple but got GenericList
Exception raised from toTuple at /dxd/libtorch-yolov5/libtorch/include/ATen/core/ivalue_inl.h:842 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x69 (0x7f569de58eb9 in /dxd/libtorch-yolov5/libtorch/lib/libc10.so)
frame #1: c10::IValue::toTuple() const & + 0xe5 (0x4206cd in ./libtorch-yolov5)
frame #2: ./libtorch-yolov5() [0x41916a]
frame #3: ./libtorch-yolov5() [0x4316f6]
frame #4: __libc_start_main + 0xf0 (0x7f5654d44840 in /lib/x86_64-linux-gnu/libc.so.6)
frame #5: ./libtorch-yolov5() [0x4176b9]

Aborted (core dumped)

Post-processing takes too long

Model forwarding takes only ~5 ms to infer the input blob, but post-processing takes about 50 ms. The PyTorch (Python) implementation takes only 15 ms for inference and post-processing combined, yet here post-processing takes far longer. Is there any way to optimize post-processing for low latency?

How to use 224×224 images

I tried changing std::vector<float> pad_info = LetterboxImage(img_input, img_input, cv::Size(640, 640)); to std::vector<float> pad_info = LetterboxImage(img_input, img_input, cv::Size(224, 224)); and ran into problems.

Memory leak issues, the program will die

Hi
Memory leaks occur after 3 consecutive runs of ./libtorch-yolov5 --source ../images/bus.jpg --weights ../weights/yolov5s.torchscript.pt --gpu --view-img

If the memory is not released, the program will die.

Thanks
