# MTCNN_TensorRT

A C++ implementation of the MTCNN face detection algorithm, accelerated with the NVIDIA TensorRT inference SDK.

This repository is based on https://github.com/AlphaQi/MTCNN-light.git

## Notes

2018/11/14: I have ported most of the computation to the GPU using the OpenCV CUDA wrapper and CUDA kernels I wrote myself. See the all_gpu branch for details; note that you need OpenCV 3.0+ built with CUDA support to run the projects. On my GTX 1080 GPU this is about 5-10 times faster than the master branch.

2018/10/2: Good news! Now you can run the whole MTCNN using TensorRT 3.0 or 4.0!

I adopt the original models from the official project https://github.com/kpzhang93/MTCNN_face_detection_alignment and make the following modification. TensorRT does not support the PReLU layer, which is widely used in MTCNN. One solution is to add a plugin (custom) layer, but experiments show that this breaks TensorRT's CBR fusion and is very slow. Instead, I replace each PReLU with a ReLU layer, Scale layers, and an ElementWise addition layer (as illustrated below); this adds only a little computation and does not affect CBR fusion. The weights of the Scale layers are derived from the original PReLU weights.

*Figure: replacing PReLU with ReLU, Scale, and ElementWise addition layers.*
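For intuition, the replacement rests on a one-line identity: for a per-channel slope a, PReLU(x) = a\*x + (1-a)\*ReLU(x) (check both signs of x). The sketch below verifies this numerically; it is a hedged illustration of the algebra, not the repository's exact layer wiring, which may arrange the Scale and ElementWise layers differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdio>

// PReLU as used in the original MTCNN Caffe models.
float prelu(float x, float a) { return x >= 0.f ? x : a * x; }

// The same function built from TensorRT-friendly primitives:
// Scale (weight a), ReLU, Scale (weight 1-a), ElementWise SUM.
float prelu_replacement(float x, float a) {
    float scaled_input = a * x;                 // Scale layer, weight a
    float relu_out     = std::max(0.f, x);      // ReLU layer
    float scaled_relu  = (1.f - a) * relu_out;  // Scale layer, weight 1-a
    return scaled_input + scaled_relu;          // ElementWise addition layer
}

int main() {
    const float a = 0.25f;  // slope taken from the original PReLU weights
    for (float x = -5.f; x <= 5.f; x += 0.25f)
        assert(std::fabs(prelu(x, a) - prelu_replacement(x, a)) < 1e-6f);
    std::puts("PReLU and the ReLU/Scale/ElementWise replacement agree.");
    return 0;
}
```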

## Required environments

  1. OpenCV (on Ubuntu, `sudo apt-get install libopencv-dev` installs it)
  2. CUDA 9.0
  3. TensorRT 3.04 or TensorRT 4.16 (I have only tested these two versions)
  4. CMake >= 3.5
  5. A camera, to run the camera test.

## Build

  1. Replace the TensorRT and CUDA paths in CMakeLists.txt.
  2. Configure the detection parameters in mtcnn.cpp (minimum face size, the NMS thresholds, etc.; see the sketch after this list).
  3. Choose the running mode (camera test or single-image test).
  4. `cmake .`
  5. `make -j`
  6. `./main`
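For reference, the tunables in step 2 typically look like the sketch below. The variable names here are hypothetical; check mtcnn.cpp for the real ones.

```cpp
// Hypothetical parameter block -- see mtcnn.cpp for the actual names.
int   min_face_size      = 60;                  // smallest face (pixels) the cascade reports
float score_threshold[3] = {0.8f, 0.8f, 0.8f};  // per-stage (PNet/RNet/ONet) confidence cut-offs
float nms_threshold[3]   = {0.7f, 0.7f, 0.7f};  // per-stage non-maximum-suppression IoU limits
```

Smaller minimum face sizes enlarge the image pyramid and cost more time; looser NMS thresholds keep more overlapping candidates.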

## Results

In single-image test mode the result looks like this:

*Figure: detection result on the sample image.*

## Speed

On my computer with an NVIDIA GT 730 graphics card (a very low-end GPU) and an Intel i5-6500 CPU, the image above takes 20 to 30 ms with the minimum face size set to 60 pixels.

## TODO

Implement the whole pipeline with GPU computing.


## Issues

### A few questions on the GPU version

First of all, good job on the MTCNN GPU version!

Why does the image size need to be configured up front in the constructor?
Is there a reason why you commented out mtcnn::~mtcnn()?

### Output information

Hello, I have a question: besides the bounding boxes and the 5 facial landmarks, is there any other output information available?

### Intuition of replacing PReLU

Can you explain how scaling, applying ReLU, then scaling again and adding element-wise is equivalent to PReLU?

### Bug: the single-image sample is not working

I have a fresh JetPack 4.5 install on a Jetson Nano; it is basically Ubuntu 18.04 aarch64 with the full NVIDIA stack (CUDA, TensorRT, etc.).

```
Start generating TenosrRT runtime models
terminate called after throwing an instance of 'std::out_of_range'
  what(): basic_string::replace: __pos (which is 15) > this->size() (which is 0)
The program has unexpectedly finished.
```

When debugging, the problem turns out to be at line 44 in mtcnn.cpp:

```cpp
pnet_engine = new Pnet_engine[scales_.size()];
simpleFace_ = (Pnet**)malloc(sizeof(Pnet*) * scales_.size());
for (size_t i = 0; i < scales_.size(); i++) {
    int changedH = (int)ceil(row * scales_.at(i));
    int changedW = (int)ceil(col * scales_.at(i));
    pnet_engine[i].init(changedH, changedW);  // <-- crashes here when changedH/changedW are negative
```

I was just calling it with the attached photo:

```cpp
image_test("/home/jetson/git/MTCNN_FaceDetection_TensorRT/4.jpg");
```
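For what it's worth, a defensive sketch (hypothetical, not the repository's code) that would surface this failure earlier: check that the image actually loaded and that every pyramid level has positive dimensions before initializing the engines.

```cpp
cv::Mat img = cv::imread(path);  // returns an empty Mat on a bad path
if (img.empty()) {
    std::cerr << "could not read image: " << path << std::endl;
    return;
}
for (size_t i = 0; i < scales_.size(); i++) {
    int changedH = (int)ceil(row * scales_.at(i));
    int changedW = (int)ceil(col * scales_.at(i));
    if (changedH <= 0 || changedW <= 0)
        continue;  // skip degenerate pyramid levels instead of crashing
    pnet_engine[i].init(changedH, changedW);
}
```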

### Problem about model training

Thanks for sharing.
I have a question about what data augmentation strategy you used when training this MTCNN model. Also, if you trained it on a public dataset, could you tell me which one?

### Core dumped

```
terminate called after throwing an instance of 'std::out_of_range'
  what(): basic_string::replace: __pos (which is 15) > this->size() (which is 0)
Aborted (core dumped)
```

### Int8Calibrator

I am using TensorRT 5 and trying to add code for INT8 quantization. I tried adding the following lines in baseEngine.cpp, but I get an error:

```cpp
builder->setInt8Mode(true);
IInt8Calibrator* calibrator;
builder->setInt8Calibrator(calibrator);
```

```
WARNING: Int8 mode specified but no calibrator specified. Please ensure that you supply Int8 scales for the network layers manually.
ERROR: Calibration failure occured with no scaling factors detected. This could be due to no int8 calibrator or insufficient custom scales for network layers. Please see int8 sample to setup calibration correctly.
```
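For reference, TensorRT's INT8 path needs a live calibrator object; in the snippet above `calibrator` is an uninitialized pointer, so no calibration ever runs. Below is a minimal sketch assuming the TensorRT 5.x C++ API (IInt8EntropyCalibrator2; older builds expose IInt8EntropyCalibrator with the same method signatures). `loadNextCalibrationBatch` is a hypothetical helper you would implement to feed preprocessed calibration images.

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <vector>

class EntropyCalibrator : public nvinfer1::IInt8EntropyCalibrator2 {
public:
    EntropyCalibrator(int batchSize, size_t inputBytes)
        : mBatchSize(batchSize), mInputBytes(inputBytes) {
        cudaMalloc(&mDeviceInput, mInputBytes);
    }
    ~EntropyCalibrator() override { cudaFree(mDeviceInput); }

    int getBatchSize() const override { return mBatchSize; }

    bool getBatch(void* bindings[], const char* /*names*/[], int /*nbBindings*/) override {
        std::vector<float> hostBatch;
        if (!loadNextCalibrationBatch(hostBatch))  // hypothetical data feeder
            return false;                          // no more batches: calibration is done
        cudaMemcpy(mDeviceInput, hostBatch.data(), mInputBytes, cudaMemcpyHostToDevice);
        bindings[0] = mDeviceInput;                // one input binding assumed
        return true;
    }

    // Returning nullptr forces a fresh calibration instead of reading a cache.
    const void* readCalibrationCache(size_t& length) override { length = 0; return nullptr; }
    void writeCalibrationCache(const void* /*cache*/, size_t /*length*/) override {}

private:
    // Stub: fill `out` with mInputBytes worth of preprocessed image data and
    // return true, or return false once the calibration set is exhausted.
    bool loadNextCalibrationBatch(std::vector<float>& out) { (void)out; return false; }

    int mBatchSize;
    size_t mInputBytes;
    void* mDeviceInput{nullptr};
};

// Usage sketch: pass a live object, not a dangling pointer.
//   EntropyCalibrator calib(1, 1 * 3 * 12 * 12 * sizeof(float));
//   builder->setInt8Mode(true);
//   builder->setInt8Calibrator(&calib);
```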

### Core dump from detect function

I get an OpenCV error:

```
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in GpuMat, file cuda_gpu_mat.cpp, line 152
```

I see that the assertion can come from all the places where we create a new image with `Rect temp((*it).y1, (*it).x1, (*it).y2-(*it).y1, (*it).x2-(*it).x1)`, but OpenCV's rectangle constructor is `Rect_(_Tp _x, _Tp _y, _Tp _width, _Tp _height)`.
Why are x and y flipped? How can I fix this crash?
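Whatever coordinate convention the boxes use, the assertion fires because the ROI extends outside the image. A defensive sketch (hypothetical, not the repository's code) is to intersect the rectangle with the image bounds before cropping:

```cpp
#include <opencv2/core.hpp>

// Clamp a candidate ROI to the image bounds; cv::Rect's & operator
// computes the intersection, so the result can never leave the image.
cv::Rect clampToImage(const cv::Rect& roi, const cv::Size& imgSize) {
    return roi & cv::Rect(0, 0, imgSize.width, imgSize.height);
}
```

The clamped rectangle can come back empty, so callers should still check `rect.area() > 0` before constructing the GpuMat.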

### Stack smashing detected

Thanks in advance.
While running the main file, the program throws a stack smashing error while generating the TensorRT runtime models.

The error is thrown in the caffeToGIEModel function in pnet_rt.cpp (line 46), and the process terminates with signal 6 (SIGABRT).

Also, can you explain, or provide a link for, gLogger, which is used as a parameter to createInferBuilder() [baseEngine.cpp, line 32]?
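For reference, gLogger is not part of TensorRT itself; it is the ILogger object that every TensorRT sample defines and passes to createInferBuilder() so the library can report messages back. A minimal sketch, assuming the pre-8.x signature (TensorRT 8 adds noexcept and AsciiChar):

```cpp
#include <NvInfer.h>
#include <iostream>

// Minimal TensorRT logger: createInferBuilder() requires an ILogger
// so the library can hand warnings and errors back to the application.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity <= Severity::kWARNING)  // drop INFO/VERBOSE chatter
            std::cerr << msg << std::endl;
    }
} gLogger;

// Usage: nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
```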

### Error when running multiple processes

Thanks for sharing! I have the demo running, but when I use multiple processes I hit the following errors:

```
[TensorRT] ERROR: ../rtSafe/cuda/reformat.cu (925) - Cuda Error in NCHWToNCHHW2: 400 (invalid resource handle)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
```

Any guidance would be appreciated, thanks!

### Could not parse layer type PReLU

Thanks for your code.
After cmake and make, I ran it and the result is as follows:

```
Beging parsing Pnet model...
could not parse layer type PReLU
End parsing Pnet model
Segmentation fault (core dumped)
```

I use TensorRT 2.1, cuDNN 7.0, CUDA 8.0, and OpenCV 2.4.13.
Could you tell me how to solve this problem?

### PReLU replacement: "Weights for scale layer" doesn't exist

Hi,
Was the model trained with the ReLU+Scale combination, or was it trained with PReLU and you replaced PReLU with the equivalent operations only in the prototxt file?
I have a model that was trained with PReLU, but replacing it with the ReLU+Scale combination in the prototxt gives me a "Weights for scale layer" doesn't exist error in TensorRT. Any idea how to solve the issue?

### Inference speed too slow

I ran your demo in a TensorRT 5.0 Docker image and found that inference on your 4.jpg was too slow.
My environment: Ubuntu 16.04 + CUDA 9.0 + cuDNN 7.3.1 + TensorRT 5.0.
Here is the log:

```
Start generating TenosrRT runtime models
End generating TensorRT runtime models
first model inference time is 0.842
first model inference time is 0.511
first model inference time is 0.396
first model inference time is 0.313
first model inference time is 0.296
first model inference time is 0.266
first model inference time is 0.254
first time is 3.134
second time is 13.168
third time is 7.437
first model inference time is 0.612
first model inference time is 0.431
first model inference time is 0.344
first model inference time is 0.282
first model inference time is 0.266
first model inference time is 0.251
first model inference time is 0.269
first time is 2.672
second time is 15.089
third time is 7.409
time is 25.31
```

Do you have any idea about this?
