Tencent / ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

License: Other

CMake 0.88% Shell 0.04% C++ 49.61% C 36.52% Batchfile 0.01% GLSL 8.27% Python 4.67% SWIG 0.01%

inference high-preformance simd arm-neon deep-learning artificial-intelligence android ios ncnn vulkan

ncnn's Introduction

ncnn

ncnn is a high-performance neural network inference computing framework optimized for mobile platforms. It has been designed from the start with deployment and use on mobile phones in mind. ncnn has no third-party dependencies. It is cross-platform and runs faster than all known open-source frameworks on mobile phone CPUs. With the efficient ncnn implementation, developers can easily deploy deep learning models to mobile platforms, create intelligent apps, and bring artificial intelligence to your fingertips. ncnn is currently used in many Tencent applications, such as QQ, Qzone, WeChat, and Pitu.

Technical discussion QQ group: 637093648 (lots of experts), answer: 卷卷卷卷卷 (full)	Telegram Group https://t.me/ncnnyes	Discord Channel https://discord.gg/YRsxgmF
Pocky QQ group (MLIR YES!): 677104663 (lots of experts), answer: multi-level intermediate representation
"They still don't know how good pnnx is" QQ group: 818998520 (new group!)

Download & Build status

https://github.com/Tencent/ncnn/releases/latest

	how to build ncnn library on Linux / Windows / macOS / Raspberry Pi3, Pi4 / POWER / Android / NVIDIA Jetson / iOS / WebAssembly / AllWinner D1 / Loongson 2K1000
	Source
	Build for Android Build for Termux on Android
	Android
	Android shared
	Build for iOS on macOS with xcode
	iOS
	iOS-Simulator
	Build for macOS
	macOS
	Mac-Catalyst
	watchOS
	watchOS-Simulator
	tvOS
	tvOS-Simulator
	visionOS
	visionOS-Simulator
	Apple xcframework
	Build for Linux / NVIDIA Jetson / Raspberry Pi3, Pi4 / POWER
	Ubuntu 20.04
	Ubuntu 22.04
	Build for Windows x64 using VS2017
	VS2015
	VS2017
	VS2019
	VS2022
	Build for WebAssembly
	WebAssembly
	Build for ARM Cortex-A family with cross-compiling Build for Hisilicon platform with cross-compiling Build for AllWinner D1 Build for Loongson 2K1000 Build for QNX
	Linux (arm)
	Linux (aarch64)
	Linux (mips)
	Linux (mips64)
	Linux (ppc64)
	Linux (riscv64)
	Linux (loongarch64)

Supports most commonly used CNN networks

Classical CNN: VGG AlexNet GoogleNet Inception ...
Practical CNN: ResNet DenseNet SENet FPN ...
Light-weight CNN: SqueezeNet MobileNetV1 MobileNetV2/V3 ShuffleNetV1 ShuffleNetV2 MNasNet ...
Face Detection: MTCNN RetinaFace scrfd ...
Detection: VGG-SSD MobileNet-SSD SqueezeNet-SSD MobileNetV2-SSDLite MobileNetV3-SSDLite ...
Detection: Faster-RCNN R-FCN ...
Detection: YOLOv2 YOLOv3 MobileNet-YOLOv3 YOLOv4 YOLOv5 YOLOv7 YOLOX ...
Detection: NanoDet
Segmentation: FCN PSPNet UNet YOLACT ...
Pose Estimation: SimplePose ...

HowTo

use ncnn with alexnet with detailed steps, recommended for beginners :)

ncnn component usage guide with alexnet (Chinese version), with detailed steps, strongly recommended for newcomers :)

use netron for ncnn model visualization

out-of-the-box web model conversion

ncnn low-level operation api

ncnn param and model file spec

ncnn operation param weight table

how to implement custom layer step by step

FAQ

ncnn throw error

ncnn produce wrong result

ncnn vulkan

Features

Supports convolutional neural networks, including multiple inputs and multi-branch structures, and can compute only part of the branches
No third-party library dependencies; does not rely on BLAS / NNPACK or any other computing framework
Pure C++ implementation, cross-platform, supports Android, iOS and so on
Careful assembly-level ARM NEON optimization for extremely high calculation speed
Sophisticated memory management and data structure design, very low memory footprint
Supports multi-core parallel computing acceleration and ARM big.LITTLE CPU scheduling optimization
Supports GPU acceleration via the next-generation low-overhead Vulkan API
Extensible model design, supports 8-bit quantization and half-precision floating point storage, can import caffe/pytorch/mxnet/onnx/darknet/keras/tensorflow(mlir) models
Supports loading network models by direct zero-copy reference to memory
Custom layer implementations can be registered to extend the framework
Well, it is strong, not afraid of being stuffed with 卷 QvQ

supported platform matrix

✅ = known to work and runs fast with good optimization
✔️ = known to work, but speed may not be fast enough
❔ = should work, not confirmed
/ = not applicable

	Windows	Linux	Android	macOS	iOS
intel-cpu	✔️	✔️	❔	✔️	/
intel-gpu	✔️	✔️	❔	❔	/
amd-cpu	✔️	✔️	❔	✔️	/
amd-gpu	✔️	✔️	❔	❔	/
nvidia-gpu	✔️	✔️	❔	❔	/
qcom-cpu	❔	✔️	✅	/	/
qcom-gpu	❔	✔️	✔️	/	/
arm-cpu	❔	❔	✅	/	/
arm-gpu	❔	❔	✔️	/	/
apple-cpu	/	/	/	✔️	✅
apple-gpu	/	/	/	✔️	✔️
ibm-cpu	/	✔️	/	/	/

Project examples

https://github.com/magicse/ncnn-colorization-siggraph17

https://github.com/mizu-bai/ncnn-fortran Call ncnn from Fortran
https://github.com/k2-fsa/sherpa Use ncnn for real-time speech recognition (i.e., speech-to-text); also support embedded devices and provide mobile Apps (e.g., Android App)

License

BSD 3 Clause

ncnn's Issues

How to compile the example?

How do I compile the example in the code? Do I need to import the generated lib into it?

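Core dump when running caffe2ncnn

I downloaded a Caffe model from the internet, and a core dump occurs during conversion.
The tool was compiled directly on Ubuntu.
./caffe2ncnn style_modle/deploy.prototxt style_modle/bvlc_reference_caffenet.caffemodel

Also, what is ncnn2mem used for?

Hello, I am using Android Studio (AS). How should I compile this project, and what does the process look like?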

What is the implementation of the convolution algorithm under x86?

I am confused by it.
@nihui

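How do I build it as a VS project?

How can I use VS on Windows to study the source code? How do I compile it?

With my own network model, the computed results differ considerably from the MATLAB results

As the title says. Could you help point out where the problem might be?

I have checked the input and output sizes of each ncnn layer and briefly looked at the weight values of a few FC layers; they are correct.

Could there be a problem in the code below?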

int detect_faceIDnet(const cv::Mat& bgr, float *feat)
{
    ncnn::Mat in = ncnn::Mat::from_pixels(bgr.data, ncnn::Mat::PIXEL_BGR, bgr.cols, bgr.rows);
    in.substract_mean_normalize(mean_vals, 0);

    ncnn::Extractor ex = faceIDnet.create_extractor();
    ex.set_light_mode(true);
    ex.set_num_threads(4);
    ex.input("data", in);
    ncnn::Mat out;
    ex.extract("eltwise_fc1", out);
    for (int j = 0; j < FEAT_NUM; j++)
    {
        const float* prob = out.data + out.cstep * j;
        feat[j] = prob[0];
        //cout << feat[j] << " ";
    }
    //cout << endl;
    return 0;
}

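Why does the program crash immediately once OpenMP is added?

It compiles, but crashes as soon as it runs. Android platform.

After creating a new test.cpp, how do I compile it into an executable?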

Compile source code and example in linux

Compile source code

Requires the protobuf library. Please install it before you compile the source code.

cd $root
mkdir build && cd build
cmake ..
make -j
make install

It will install include files and libncnn.a in $root/build/install.

Compile example

I don't know how to build it with cmake, so I wrote a Makefile that builds it as follows.

g++ -o test -I$root/build/install/include test.cpp $root/build/install/lib/libncnn.a -lopencv_core -lopencv_highgui -lopencv_imgproc -fopenmp -pthread

How to run the example project?

I am using Android Studio and don't know which platform your example project was developed on. How can I add it to an Android project?

Inference on batch data

For NCWH data with N > 1 input images, how does ncnn::Extractor handle it? Can ncnn::Mat convert multiple images?

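The pure C version performs much worse than TensorFlow

On a single-core MIPS CPU, running the pure C squeezenet demo with each framework, TensorFlow takes 3.9 s per frame and NCNN takes 19.8 s per frame. Isn't that gap rather large?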

CMAKE_SYSTEM_NAME is 'Android' but 'NVIDIA Nsight Tegra Visual Studio Edition' is not installed.

This error occurs when building the Android version on Win 8.1. I have installed NVIDIA Nsight Tegra Visual Studio Edition, but the error persists. How can I fix it?

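Setting up an Objective-C (oc) project

ncnn presumably needs files to initialize from; the Android project has the relevant files in assets.
For Objective-C, should I follow the Linux example?
net.load_param("model.param");
net.load_model("model.bin");
How are these two files generated?

Will performance be improved further in the future, for example by adopting the Winograd algorithm?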

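How to compile examples/squeezenet.cpp

I am trying to run the squeezenet.cpp demo directly on a Mac.
1. Running cmake . and then make produces the error below; after manually adding "platform.h" under src, the error in (2) appears.

2. A new error appears

3. After adding include_directories(${OpenCV_INCLUDE_DIRS}) to CMakeLists.txt, yet another error appears:

4. After specifying the absolute path to libncnn.a, another new error appears

The second error is the same as the fourth, so I am back where I started /(ㄒoㄒ)/~~
I am a beginner and would appreciate any guidance...

Will converting TensorFlow pb models to ncnn models be supported in the future?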

Conv and Pooling differs from caffe when stride = 2

@nihui Conv output is the same as Caffe's when stride = 1, but differs when stride = 2. Why?
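
One possible cause, not confirmed in this thread, is the rounding rule used for output sizes. Caffe rounds down for convolution and rounds up for pooling, so an implementation that applies one rule where Caffe applies the other would match at stride 1 and drift at stride 2. The sketch below is plain C++ with hypothetical helper names (`out_size_floor`, `out_size_ceil`); it is not ncnn code, and it leaves out the extra clipping step Caffe applies to padded pooling.

```cpp
#include <cassert>

// Output length along one axis when the division is rounded down
// (the rule Caffe applies to convolution).
int out_size_floor(int in, int kernel, int stride, int pad)
{
    assert(stride > 0 && in + 2 * pad >= kernel);
    return (in + 2 * pad - kernel) / stride + 1;
}

// Output length along one axis when the division is rounded up
// (the rule Caffe applies to pooling, without its padding clip).
int out_size_ceil(int in, int kernel, int stride, int pad)
{
    assert(stride > 0 && in + 2 * pad >= kernel);
    return (in + 2 * pad - kernel + stride - 1) / stride + 1;
}
```

For a 224-wide input with a 3-wide kernel and no padding, both helpers return 222 at stride 1, while at stride 2 the floor rule returns 111 and the ceil rule returns 112.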

What does '不怕被塞卷' ("not afraid of being stuffed with 卷") mean?

Hi guys, any guesses what this line means?

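The Android squeezenet demo runs in 160 ms, on par with Caffe. Is this normal?

As the title says.
What speeds did you get in your own tests? An interview in 新智源 said it could be 2 to 4 times faster, but in my runs ncnn and Caffe both take 160 ms.
Did I miss some build option during compilation?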

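Would computing convolution via img2col be more efficient than direct convolution?

Hello author,
Your work is excellent. While reading your code, however, I noticed that convolution is computed the traditional way, and you also mentioned that direct convolution saves memory. I would like to ask: would unrolling via img2col into a matrix multiplication be more computationally efficient? Thanks!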
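
The trade-off behind this question can be seen in a minimal single-channel sketch. It is plain C++ with illustrative function names, not ncnn's implementation: the direct version reads the image in place, while the im2col version first copies every k x k patch into a matrix so that the convolution becomes a dense matrix-vector product. Whether the second form is faster in practice depends on how well the matrix product is optimized and on the memory available for the extra buffer.

```cpp
#include <cassert>
#include <vector>

// Direct "valid" convolution (cross-correlation, stride 1, no padding)
// of an h x w single-channel image with a k x k kernel.
std::vector<float> conv_direct(const std::vector<float>& img, int h, int w,
                               const std::vector<float>& ker, int k)
{
    assert(h >= k && w >= k);
    const int oh = h - k + 1, ow = w - k + 1;
    std::vector<float> out(oh * ow, 0.f);
    for (int y = 0; y < oh; y++)
        for (int x = 0; x < ow; x++)
            for (int i = 0; i < k; i++)
                for (int j = 0; j < k; j++)
                    out[y * ow + x] += img[(y + i) * w + (x + j)] * ker[i * k + j];
    return out;
}

// im2col: copy every k x k patch into one row of an (oh*ow) x (k*k) matrix.
std::vector<float> im2col(const std::vector<float>& img, int h, int w, int k)
{
    assert(h >= k && w >= k);
    const int oh = h - k + 1, ow = w - k + 1;
    std::vector<float> cols(oh * ow * k * k);
    for (int y = 0; y < oh; y++)
        for (int x = 0; x < ow; x++)
            for (int i = 0; i < k; i++)
                for (int j = 0; j < k; j++)
                    cols[(y * ow + x) * k * k + i * k + j] = img[(y + i) * w + (x + j)];
    return cols;
}

// Convolution expressed as a matrix-vector product over the im2col buffer.
std::vector<float> conv_im2col(const std::vector<float>& img, int h, int w,
                               const std::vector<float>& ker, int k)
{
    const std::vector<float> cols = im2col(img, h, w, k);
    const int rows = (h - k + 1) * (w - k + 1);
    std::vector<float> out(rows, 0.f);
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < k * k; c++)
            out[r] += cols[r * k * k + c] * ker[c];
    return out;
}
```

For a 4 x 4 image and a 3 x 3 kernel, the image holds 16 floats while the im2col buffer holds 36, which illustrates the memory cost that direct convolution avoids.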

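How do I build ncnn on the hisi 3519?

Hello, how do I build ncnn on the hisi 3519? It is an armv7 architecture. Also, how can I make sure the ARM NEON and assembly code paths are used?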

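Half-precision and 8-bit quantized computation

Hello! I read your code and found that the current version only supports half-precision floating point and 8-bit quantization for storage, without providing computation in these two types. I have recently been working on half-precision and 8-bit quantized computation as well. Do you plan to provide half-precision floating-point and 8-bit quantized computation?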
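
The distinction between quantized storage and quantized computation can be sketched as follows. This is an assumption-laden illustration in plain C++, using a simple symmetric linear scheme and hypothetical names (`QuantizedWeights`, `quantize`, `dequantize`); it is not ncnn's actual format. Weights are kept as 8-bit codes plus one scale, then expanded back to float before any arithmetic, so the file shrinks but the math still runs in float.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct QuantizedWeights
{
    std::vector<int8_t> q; // 8-bit codes, one byte per weight
    float scale;           // real value is approximately code * scale
};

// Symmetric linear quantization: map [-max_abs, max_abs] onto [-127, 127].
QuantizedWeights quantize(const std::vector<float>& w)
{
    float max_abs = 0.f;
    for (float v : w)
        max_abs = std::fmax(max_abs, std::fabs(v));

    QuantizedWeights out;
    out.scale = max_abs > 0.f ? max_abs / 127.f : 1.f;
    for (float v : w)
        out.q.push_back(static_cast<int8_t>(std::lround(v / out.scale)));
    return out;
}

// Storage-only use: expand the codes back to float before computing.
std::vector<float> dequantize(const QuantizedWeights& qw)
{
    std::vector<float> w;
    for (int8_t c : qw.q)
        w.push_back(c * qw.scale);
    return w;
}
```

Each restored weight differs from the original by at most about half of `scale`. Computing directly on the 8-bit codes, which is what the question asks for, would additionally need integer kernels and a wider accumulator.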

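Could you describe some of the methods NCNN uses to speed up computation?

NCNN does appear to be much faster than frameworks such as Caffe; I hope @nihui can describe some of them.
Also, I did not quite understand the x86 convolution method. Could you briefly explain how it is implemented?

Can something like Caffe's blob reshape be supported?

When the height and width of the input layer vary, or the input layer has several sizes as with an image pyramid, what is the best way to handle this with ncnn?

Could the author provide an iOS demo?

There is little documentation, and without a demo I don't know how to use it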

How to use this? Any Doc or Samples?

Nice job. It would be even better if a site or docs were provided along with the README or wiki.

On iOS, make fails when following the wiki steps. How can this be solved? The problem is shown in the figure below

app problem

error as following:
undefined reference to 'ncnn:Net::load_param(unsigned char const*)'
undefined reference to 'ncnn:Net::load_model(unsigned char const*)'
undefined reference to 'ncnn:Net::from_pixels(unsigned char const*,int,int,int)'
........
undefined reference to 'ncnn:Net::Net()'
undefined reference to 'ncnn:Net::~Net()'
error:linker command failed with exit code 1 (use -v to see invocation)

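How do I compile the tools?

I want to convert a Caffe model, which should require caffe2ncnn from tools. How do I compile it? caffe2ncnn.cpp shouldn't need cross-compiling, should it?
Also, what is the file ncnn2mem.cpp for?

What is ncnn2mem for? How was squeezenet_v1.1.param generated?

What is ncnn2mem used for?
Also, I tried ./caffe2ncnn squeezenet_v1.1.prototxt squeezenet_v1.1.caffemodel
and only got ncnn.bin and ncnn.proto. How was squeezenet_v1.1.param in the example produced?

Does it work on the HiSilicon 3519?

Is there a Python version?

As the title says.

About model optimization

I finally got the .so file compiled, but running it on Android takes a full 4 s.
A while ago my test with Caffe took 2.5 s.
Could you describe the exact process for porting to an Android project, so I can check whether I missed a step?

caffe2ncnn error, protobuf related

Converting the squeezenet from the example with caffe2ncnn works fine, but when I tried converting a face verification Caffe model, the error below occurred

A beginner who doesn't know how to install it

Is there a more detailed installation document or tutorial?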

does ncnn support caffemodel loading?

As in the example, the model is loaded using:
squeezenet.load_param("squeezenet_v1.1.param");
squeezenet.load_model("squeezenet_v1.1.bin");

does ncnn support caffemodel loading? how to load caffemodel?

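How to build a library with NEON support on ARM Linux

The static library I built with make on Linux is x86 by default, and the conv layer is conv_x86 too. What do I need to do to build a library with NEON support on Linux?

How to compile the caffe2ncnn tool

Compiling caffe2ncnn.cpp on Ubuntu keeps failing, even though I have added all the header files. Is there anything else I need to pay special attention to?

ncnn2mem compile error

/tools/../src/layer.h:22:22: fatal error: platform.h: No such file or directory
That is the error reported.
I would like to know the exact build procedure.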

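A question about low-precision matrix multiplication

@nihui Hello. For the way NCNN quantizes FLOAT data, have you considered the overflow problem that would arise later if parameters quantized this way are used for (Fix_Point) matrix multiplication?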
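
The overflow concern can be made concrete with a small sketch. It is plain C++ under stated assumptions, not ncnn code: operands are taken to be 8-bit codes in the symmetric range [-127, 127], and the helper names (`dot_int8`, `max_safe_terms`) are hypothetical. Each product can reach 127 * 127 = 16129, so the accumulator width decides how many terms can be summed safely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dot product of two int8 vectors, widening each product to 32 bits
// before it is added to a 32-bit accumulator.
int32_t dot_int8(const std::vector<int8_t>& a, const std::vector<int8_t>& b)
{
    assert(a.size() == b.size());
    int32_t acc = 0;
    for (std::size_t i = 0; i < a.size(); i++)
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    return acc;
}

// Largest number of worst-case products (127 * 127) that fit in an
// accumulator whose maximum representable value is acc_max.
int64_t max_safe_terms(int64_t acc_max)
{
    return acc_max / (127 * 127);
}
```

Under these assumptions a 16-bit accumulator (maximum 32767) is only guaranteed to hold 2 worst-case products, while a 32-bit accumulator holds 133144, which is why integer kernels commonly accumulate in 32 bits.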

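Request to add group convolution and depth-wise convolution

RT. The low-compute mobilenet and shufflenet both need group convolution / depth-wise convolution. From the API, group conv does not seem to be supported; I hope support can be added and optimized.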
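
For readers unfamiliar with the requested operation, here is a minimal sketch of a grouped 1x1 convolution at a single pixel. It is plain C++ with a hypothetical function name (`grouped_pointwise`) and is not ncnn's API. Each output channel reads only the input channels of its own group, and depth-wise convolution is the special case where the number of groups equals the number of channels.

```cpp
#include <cassert>
#include <vector>

// Grouped 1x1 convolution at a single pixel.
// in:      one value per input channel (c_in values)
// weights: c_out * (c_in / groups) values, stored row by row per output channel
// Each output channel only reads the input channels of its own group.
std::vector<float> grouped_pointwise(const std::vector<float>& in,
                                     const std::vector<float>& weights,
                                     int c_out, int groups)
{
    const int c_in = static_cast<int>(in.size());
    assert(groups > 0 && c_in % groups == 0 && c_out % groups == 0);
    const int in_per_group = c_in / groups;
    const int out_per_group = c_out / groups;
    assert(static_cast<int>(weights.size()) == c_out * in_per_group);

    std::vector<float> out(c_out, 0.f);
    for (int oc = 0; oc < c_out; oc++)
    {
        const int g = oc / out_per_group; // group this output channel belongs to
        for (int k = 0; k < in_per_group; k++)
            out[oc] += weights[oc * in_per_group + k] * in[g * in_per_group + k];
    }
    return out;
}
```

With 4 input and 4 output channels, `groups = 1` needs 16 weights and `groups = 4` (depth-wise) needs only 4, which is where the low compute cost of these networks comes from.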

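The performance does not seem to be fully realized

On an RK3288 with all 4 cores in use, squeezenet takes over 220 ms, which is no real advantage compared with TensorFlow.

How is this related to, or different from, Core ML?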

Where is the project website?

Awesome!
Could you post the project website and some blog posts about how to install and use it on Android or iOS?

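MTCNN test results are completely different

I ran MTCNN's PNet and RNet and the results differ greatly from the reference results; feeding a single face image to RNet also gives a very low score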

const float mean_vals[3] = {127.5f, 127.5f, 127.5f};
const float norm_vals[3] = {0.0078125, 0.0078125, 0.0078125};

int hs = ceil(img_hscales[i]);
int ws = ceil(img_wscales[i]);
ncnn::Mat pnet_img = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR, img_w, img_h, ws, hs);
pnet_img.substract_mean_normalize(mean_vals,norm_vals);
ncnn::Extractor Pnet_ex = Pnet.create_extractor();
Pnet_ex.set_light_mode(true);
Pnet_ex.input("data", pnet_img);
ncnn::Mat score, loc;
Pnet_ex.extract("prob1", score);
Pnet_ex.extract("conv4-2", loc);

if(*(score_data+i)>=thresh)...
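
As a sanity check on the preprocessing in this post, the arithmetic requested from `substract_mean_normalize` is (pixel - mean) * norm per channel. The sketch below reproduces that arithmetic in plain C++ with a hypothetical helper name (`normalize`); it does not call ncnn and does not diagnose the reported mismatch, which could also come from other steps such as channel order or resizing.

```cpp
#include <cassert>
#include <vector>

// Applies (pixel - mean) * norm to every value, mirroring the per-channel
// arithmetic that the post's mean_vals and norm_vals ask for.
std::vector<float> normalize(const std::vector<unsigned char>& pixels,
                             float mean, float norm)
{
    std::vector<float> out;
    out.reserve(pixels.size());
    for (unsigned char p : pixels)
        out.push_back((static_cast<float>(p) - mean) * norm);
    return out;
}
```

With mean 127.5 and norm 0.0078125 (that is, 1/128), pixel 0 maps to -0.99609375 and pixel 255 maps to 0.99609375, so the inputs land just inside the range [-1, 1].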