
libdeepvac's Introduction

libdeepvac

Use a PyTorch-trained model in a C++ project.

Introduction

At the MLab ("cloud alchemist") laboratory, we use DeepVAC to train new models and this project to deploy them.

As a Linux library, libdeepvac provides value in four areas:

  • Underneath, it wraps inference engines: LibTorch today, with TensorRT, NCNN, and TNN planned;
  • On top, it provides the Deepvac class, which users subclass to implement their own models;
  • In the modules directory, MLab provides C++ implementations of classic networks;
  • In the utils directory, MLab provides C++ implementations of common network helper functions.

Modules implemented in libdeepvac

C++ implementations of SOTA networks

Class              Network      Purpose
SyszuxFaceRetina   RetinaNet    face detection
SyszuxOcrPse       PSENet       text detection
SyszuxOcrDB        DBNet        text detection
SyszuxSegEsp       ESPNetV2     semantic segmentation
SyszuxClsMobile    MobileNetV3  classification
SyszuxDetectYolo   YOLOv5       object detection
SyszuxClsResnet    ResNet50     classification

Helper functions

Class/function   Purpose
AlignFace        face alignment
nms              non-maximum suppression (NMS) of detection boxes
PriorBox         generates prior boxes for object detection

We will keep adding C++ implementations of SOTA networks under the modules and utils directories. If you need a C++ implementation of a particular network, please file a request in this project's issues.
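As a concrete illustration of what the nms helper does, here is a minimal, self-contained greedy NMS in C++. This is a sketch of the generic algorithm (the Box struct and the function signature are ours, for illustration only), not libdeepvac's actual implementation:

```cpp
#include <algorithm>
#include <vector>

// A detection box: corners (x1, y1)-(x2, y2) plus a confidence score.
struct Box { float x1, y1, x2, y2, score; };

// Intersection-over-Union of two boxes.
static float iou(const Box& a, const Box& b) {
    float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
    float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
    float iw = std::max(0.0f, ix2 - ix1), ih = std::max(0.0f, iy2 - iy1);
    float inter = iw * ih;
    float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (areaA + areaB - inter);
}

// Greedy NMS: keep the highest-scoring box, drop boxes that overlap it
// beyond the IoU threshold, then repeat with the remaining boxes.
std::vector<Box> nms(std::vector<Box> boxes, float iouThreshold) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& b : boxes) {
        bool keep = true;
        for (const Box& k : kept) {
            if (iou(b, k) > iouThreshold) { keep = false; break; }
        }
        if (keep) kept.push_back(b);
    }
    return kept;
}
```

Boxes are visited in descending score order, and each box survives only if its IoU with every already-kept box stays below the threshold.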

Supported build platforms

libdeepvac can be built on the following platforms:

  • x86_64 GNU/Linux (a.k.a. AMD64 GNU/Linux)
  • aarch64 GNU/Linux (a.k.a. ARM64 GNU/Linux)
  • macOS

We are unlikely to extend this list in the future.

Supported build targets

libdeepvac can be built for the following target platforms:

  • x86_64 GNU/Linux (a.k.a. AMD64 GNU/Linux)
  • x86_64 GNU/Linux with CUDA (a.k.a. AMD64 GNU/Linux with CUDA)
  • aarch64 GNU/Linux (a.k.a. ARM64 GNU/Linux)
  • Android
  • iOS
  • Nvidia Jetson Xavier NX (Volta, 384 CUDA cores, 48 Tensor cores, 6-core CPU, 8 GB)
  • Nvidia Jetson AGX Xavier (Volta, 512 CUDA cores, 8-core CPU, 32 GB)
  • Nvidia Jetson TX2 (Pascal, 256 CUDA cores, 2-core + 4-core CPU, 8 GB)
  • Nvidia Jetson TX2 NX (Pascal, 256 CUDA cores, 2-core + 4-core CPU, 4 GB)

Dependencies

Building libdeepvac requires a C++14 compiler, CMake, OpenCV, and LibTorch.
The simplest and most efficient setup is the MLab HomePod we provide; it is also the setup we recommend.

How to build libdeepvac

libdeepvac is built with CMake.

Build switches

Before building libdeepvac, familiarize yourself with the following CMake options:

  • BUILD_STATIC (default ON) — ON builds libdeepvac as a static library, OFF as a shared library. With OFF, linking the static OpenCV libraries causes hidden-symbol problems; link the shared OpenCV libraries instead.
  • USE_STATIC_LIBTORCH (default OFF) — ON links the static libtorch libraries, OFF the shared ones. MLab HomePod ships the shared libtorch libraries.
  • USE_MKL (default OFF) — use Intel MKL as the LAPACK/BLAS implementation. When OFF, point SYSTEM_LAPACK_LIBRARIES at another implementation such as OpenBLAS or Eigen.
  • SYSTEM_LAPACK_LIBRARIES (default "", typically "-lblas -llapack") — the LAPACK/BLAS libraries to link when USE_MKL is OFF. The corresponding development packages must be installed in the system paths.
  • USE_CUDA (default OFF) — enable CUDA. Requires CUDA-capable hardware and an installed CUDA Toolkit.
  • USE_TENSORRT (default OFF) — enable TensorRT. Requires CUDA-capable hardware and an installed TensorRT development package.
  • USE_NUMA (default OFF) — link against -lnuma.
  • USE_LOADER (default OFF) — enable the image loader. Requires a C++17 compiler.
  • GARRULOUS_GEMFIELD (default OFF) — enable debug logging.
  • BUILD_ALL_EXAMPLES (default OFF) — build all examples.

Downloading dependencies

Skip this section if you are on MLab HomePod 2.0 pro (or later).
In a custom environment, you need to download at least the OpenCV and libtorch libraries.

You can also build these dependencies from source yourself on MLab HomePod 2.0 pro.

CMake commands

The paths used in the commands below are based on MLab HomePod 2.0 pro (adjust them to your own environment).

Preparation

# update to latest libdeepvac
gemfield@homepod2:/opt/gemfield/libdeepvac$ git pull --rebase
# create build directory
gemfield@homepod2:/opt/gemfield/libdeepvac$ mkdir build
gemfield@homepod2:/opt/gemfield/libdeepvac$ cd build

CMake

libdeepvac provides a number of CMake switches to support different software/hardware stacks:

  • On an x86_64 GPU server, with CUDA, the static libtorch libraries, and MKL as the BLAS/LAPACK implementation (supported on MLab HomePod 2.0 pro):
cmake -DUSE_MKL=ON -DUSE_CUDA=ON -DUSE_STATIC_LIBTORCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/opt/gemfield/libtorch;/opt/gemfield/opencv4deepvac/" -DCMAKE_INSTALL_PREFIX=../install ..
  • On an x86_64 GPU server, with CUDA, the shared libtorch libraries, and MKL as the BLAS/LAPACK implementation (supported on MLab HomePod 2.0 pro):
cmake -DUSE_MKL=ON -DUSE_CUDA=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/opt/gemfield/opencv4deepvac;/opt/conda/lib/python3.8/site-packages/torch/" -DCMAKE_INSTALL_PREFIX=../install ..
  • On an x86_64 GPU server, with TensorRT, the static libtorch libraries, and MKL as the BLAS/LAPACK implementation:
cmake -DUSE_MKL=ON -DUSE_CUDA=ON -DUSE_MAGMA=ON -DUSE_STATIC_LIBTORCH=ON -DUSE_TENSORRT=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/opt/gemfield/opencv4deepvac/;/opt/gemfield/libtorch" -DCMAKE_INSTALL_PREFIX=../install ..
  • On an Nvidia Jetson Xavier NX, with TensorRT and the system BLAS/LAPACK libraries:
cmake -DUSE_CUDA=ON -DUSE_NUMA=ON -DUSE_TENSORRT=ON -DSYSTEM_LAPACK_LIBRARIES="-lblas -llapack" -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/opt/gemfield/opencv4deepvac/;/opt/gemfield/libtorch" -DCMAKE_INSTALL_PREFIX=../install ..

Build

cmake --build . --config Release
make install

How to use the libdeepvac library

How do you use the prebuilt libdeepvac library in your own project?

1. Add find_package(Deepvac REQUIRED)

In your project's CMakeLists.txt, add

find_package(Deepvac REQUIRED)

Since a libdeepvac-based project necessarily also depends on OpenCV and libtorch, the following two find_package calls are required as well:

find_package(Torch REQUIRED)
find_package(OpenCV REQUIRED)

2. Header-directory CMake variables provided by libdeepvac

In your project's CMakeLists.txt, you can use the following CMake variables:

  • DEEPVAC_INCLUDE_DIRS: header directories of the libdeepvac library;
  • DEEPVAC_LIBTORCH_INCLUDE_DIRS: header directories of the libtorch library;
  • DEEPVAC_TENSORRT_INCLUDE_DIRS: header directories of the TensorRT library;
  • DEEPVAC_CV_INCLUDE_DIRS: header directories of the OpenCV library;

3. Library CMake variables provided by libdeepvac

  • DEEPVAC_LIBRARIES: the libdeepvac library;
  • DEEPVAC_LIBTORCH_CPU_LIBRARIES: the CPU build of libtorch;
  • DEEPVAC_LIBTORCH_CUDA_LIBRARIES: the CUDA build of libtorch;
  • DEEPVAC_LIBTORCH_DEFAULT_LIBRARIES: the default libtorch build (CPU or CUDA, whichever was used at build time);
  • DEEPVAC_LIBCUDA_LIBRARIES: the Nvidia CUDA runtime libraries;
  • DEEPVAC_TENSORRT_LIBRARIES: the Nvidia TensorRT runtime libraries;
  • DEEPVAC_CV_LIBRARIES: the OpenCV libraries;

Example:

# headers
target_include_directories(${your_target} PRIVATE "${DEEPVAC_LIBTORCH_INCLUDE_DIRS};${DEEPVAC_TENSORRT_INCLUDE_DIRS};${CMAKE_CURRENT_SOURCE_DIR}/include")

# libraries
target_link_libraries(${your_target} ${DEEPVAC_LIBRARIES} ${DEEPVAC_LIBTORCH_CUDA_LIBRARIES} ${DEEPVAC_LIBCUDA_LIBRARIES} ${DEEPVAC_CV_LIBRARIES})
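Putting the three steps together, a minimal consumer CMakeLists.txt might look like the sketch below. The project name, target name, and source file are placeholders, and which libtorch variable you link depends on your build:

```cmake
cmake_minimum_required(VERSION 3.10)
project(my_deepvac_app CXX)

set(CMAKE_CXX_STANDARD 14)

# Locate the prebuilt libraries (point CMAKE_PREFIX_PATH at your installs).
find_package(Deepvac REQUIRED)
find_package(Torch REQUIRED)
find_package(OpenCV REQUIRED)

add_executable(my_app main.cpp)
target_include_directories(my_app PRIVATE
    "${DEEPVAC_INCLUDE_DIRS};${DEEPVAC_LIBTORCH_INCLUDE_DIRS};${DEEPVAC_CV_INCLUDE_DIRS}")
target_link_libraries(my_app
    ${DEEPVAC_LIBRARIES}
    ${DEEPVAC_LIBTORCH_DEFAULT_LIBRARIES}
    ${DEEPVAC_CV_LIBRARIES})
```

DEEPVAC_LIBTORCH_DEFAULT_LIBRARIES is used here so the sketch works for both CPU and CUDA builds; swap in DEEPVAC_LIBTORCH_CUDA_LIBRARIES plus DEEPVAC_LIBCUDA_LIBRARIES for an explicitly CUDA build.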

Benchmark

libdeepvac will provide benchmarks for different target platforms and inference engines; at the moment only the LibTorch engine is covered.

1. Benchmark steps for x86-64 Linux + LibTorch

  • Get the code

# on MLab HomePod 2.0 standard edition
git clone https://github.com/DeepVAC/libdeepvac && cd libdeepvac

# on MLab HomePod 2.0 pro
cd /opt/gemfield/libdeepvac && git pull --rebase

  • Build

# create the build directory
mkdir build
cd build
# cmake (when building against the shared LibTorch libraries)
cmake -DGARRULOUS_GEMFIELD=ON -DUSE_MKL=ON -DUSE_CUDA=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/opt/gemfield/opencv4deepvac/;/opt/conda/lib/python3.8/site-packages/torch/" -DCMAKE_INSTALL_PREFIX=../install ..
# cmake (when building against the static LibTorch libraries)
cmake -DGARRULOUS_GEMFIELD=ON -DUSE_MKL=ON -DUSE_CUDA=ON -DUSE_STATIC_LIBTORCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/opt/gemfield/opencv4deepvac/;/opt/gemfield/libtorch/" -DCMAKE_INSTALL_PREFIX=../install ..
# build
make -j4

  • Run the benchmark

./bin/test_resnet_benchmark cuda:0 <your_torch_script.pt> <a_imagenet_test.jpg>

2. NA

Demo

SYSZUX-FACE implements face detection on top of this project.

libdeepvac's People

Contributors

1icas · buptlihang · gemfield · mhgl · wyh163


libdeepvac's Issues

The examples include CUDA headers and fail to build in a CPU-only environment

As the title says: on a machine without CUDA, even with USE_CUDA=OFF and BUILD_ALL_EXAMPLES=OFF, the examples are still built, and the build fails with a missing-header error:

In file included from /root/installs/libdeepvac/examples/src/test_resnet_benchmark.cpp:11:
/usr/libtorch/include/c10/cuda/CUDAStream.h:6:10: fatal error: cuda_runtime_api.h: No such file or directory
 #include <cuda_runtime_api.h>

It seems the set of examples to build should be decided from the state of BUILD_ALL_EXAMPLES and USE_CUDA?
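One possible direction for a fix is sketched below; the directory layout and subdirectory names are hypothetical, not the project's actual CMake code:

```cmake
# Hypothetical CMakeLists.txt fragment: gate CUDA-dependent examples
# on both switches, so CPU-only machines never touch CUDA headers.
if(BUILD_ALL_EXAMPLES)
    add_subdirectory(examples/cpu)        # always safe to build
    if(USE_CUDA)
        add_subdirectory(examples/cuda)   # e.g. test_resnet_benchmark
    endif()
endif()
```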

Investigating libtorch performance

Test procedure

Following these sections of https://zhuanlan.zhihu.com/p/363319763:

  • "PyTorch resnet50 benchmark steps"
  • "LibTorch resnet50 benchmark steps"

Note: MLab HomePod 1.0 ships PyTorch 1.8.1.

Model under test: resnet50

Environment

Host OS: Ubuntu 20.04
Software: MLab HomePod 1.0
CPU: Intel(R) Core(TM) i9-9820X CPU @ 3.30GHz
GPU: NVIDIA RTX 2080 Ti
GPU driver: NVIDIA-SMI 450.102.04, Driver Version: 450.102.04, CUDA Version: 11.0

Scenario       CPU util   Memory (GB)   GPU util   GPU memory (GB)   Threads
C++ libtorch   ~105%      ~4.1          100%       ~3.7              25
PyTorch        ~132%      ~3.9          ~95%       ~9                32

input shape = 224x224

Scenario       forward time (ms)
C++ libtorch   3.950
PyTorch        6.4478

input shape = 640x640

Scenario       forward time (ms)
C++ libtorch   11.242
PyTorch        10.4656

input shape = 1280x720

Scenario       forward time (ms)
C++ libtorch   24.678
PyTorch        20.8056

input shape = 1280x1280

Scenario       forward time (ms)
C++ libtorch   38.094
PyTorch        37.6037

Incomplete detection on images with many faces

Detection currently works well on images with fewer than about 30 faces.
On images with many faces the results degrade and faces are easily missed.
The main variables involved are top_k and keep_top_k.
To detect around 100 faces, increase both values appropriately (a reference value is 150).
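The effect of such a cap can be sketched with a small, self-contained C++ helper (illustrative only; keepTopK is our name, not a libdeepvac API): candidates beyond the top_k highest scores are discarded before NMS, so a too-small top_k silently drops real faces.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Keep only the top_k highest scores, in descending order. This is an
// illustrative stand-in for the top_k/keep_top_k capping performed in
// detection post-processing.
std::vector<float> keepTopK(std::vector<float> scores, std::size_t top_k) {
    if (scores.size() > top_k) {
        // Partially sort so the top_k largest scores come first.
        std::partial_sort(scores.begin(), scores.begin() + top_k, scores.end(),
                          std::greater<float>());
        scores.resize(top_k);  // everything past top_k is dropped
    } else {
        std::sort(scores.begin(), scores.end(), std::greater<float>());
    }
    return scores;
}
```

With 100 candidate faces and top_k = 50, half the detections would be discarded here regardless of confidence, which is why raising top_k and keep_top_k (e.g. to 150) recovers the missed faces.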

consulting about libtorch memory leak

Hi, I am using libtorch-cxx11-abi-shared-with-deps-1.6.0+cpu.zip, downloaded from the PyTorch official website. However, I have experienced a memory leak during inference. I checked with valgrind; the leak happens at the location below:

==107045== 69,664 bytes in 1 blocks are possibly lost in loss record 87,916 of 88,026
==107045==    at 0x483C855: malloc (vg_replace_malloc.c:380)
==107045==    by 0x95D7857: mm_account_ptr_by_tid..0 (in /opt/e2e-streamingASR/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
==107045==    by 0x95D6CB1: mkl_serv_malloc (in /opt/e2e-streamingASR/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
==107045==    by 0x8B9E3F6: mkl_serv_domain_get_max_threads (in /opt/e2e-streamingASR/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
==107045==    by 0x531C8B8: at::init_num_threads() (in /opt/e2e-streamingASR/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
==107045==    by 0x5D26184: THFloatTensor_equalImpl(c10::TensorImpl*, c10::TensorImpl*) [clone .constprop.349] (in /opt/e2e-streamingASR/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
==107045==    by 0x5D26336: THFloatTensor_equal(c10::TensorImpl*, c10::TensorImpl*) (in /opt/e2e-streamingASR/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
==107045==    by 0x5BBCE31: at::native::legacy::cpu::_th_equal(at::Tensor const&, at::Tensor const&) (in /opt/e2e-streamingASR/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
==107045==    by 0x771B27D: torch::autograd::VariableType::equal(at::Tensor const&, at::Tensor const&) (in /opt/e2e-streamingASR/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
==107045==    by 0x7D826CD: torch::jit::(anonymous namespace)::tensorEqual(at::Tensor const&, at::Tensor const&) (in /opt/e2e-streamingASR/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
==107045==    by 0x7D85134: torch::jit::(anonymous namespace)::attributesEqualCSE(torch::jit::Node const*, torch::jit::Node const*) [clone .constprop.364] (in /opt/e2e-streamingASR/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
==107045==    by 0x7D85D97: torch::jit::EqualNode::operator()(torch::jit::Node const*, torch::jit::Node const*) const (in /opt/e2e-streamingASR/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)

I wonder, have you experienced a similar problem when using libtorch? If so, could you please give me some insight into how to solve it?

Summary of libdeepvac performance test data

Raspberry Pi 4B

  • Module: retinaface;
  • Device: CPU
  • Input: the classic test1.jpg (gemfield's class photo);
  • Engine: LibTorch
  • Results:
gemfield model load time: 0.64975
img2cvmat time: 0.00924529
begin: 0
process time: 92.8428
cv circle and write time: 0.0086217
begin: 1
process time: 92.9481
cv circle and write time: 0.00842767
begin: 2
process time: 92.0108
cv circle and write time: 0.00839388
begin: 3
process time: 91.9794
cv circle and write time: 0.00842672
