Comments (6)
Use the latest 20240410 release.
Pass fp16=0 when converting with pnnx.
Disable fp16 in ncnn: https://github.com/Tencent/ncnn/wiki/FAQ-ncnn-produce-wrong-result#disable-fp16
That should give you exactly matching results.
from ncnn.
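As a rough illustration of what disabling fp16 buys, the sketch below uses numpy (not ncnn itself) to show the rounding that fp16 storage introduces; in the C++ API the corresponding switches are `net.opt.use_fp16_packed`, `net.opt.use_fp16_storage` and `net.opt.use_fp16_arithmetic`, set to false before `load_param`:

```python
import numpy as np

# fp32 confidence value as reported by pytorch later in this thread
conf = np.float32(0.900195)

# Storing it as fp16 (ncnn's default on ARM) snaps it onto the fp16 grid,
# whose spacing just below 1.0 is 2**-11 ~= 0.00049.
conf_fp16 = np.float16(conf)
print(float(conf_fp16))                    # 0.900390625
print(float(np.spacing(np.float16(0.9))))  # 0.00048828125, the fp16 step size
```

Deviations of that magnitude in the boxes and confidences are therefore expected whenever any fp16 storage or arithmetic is in the pipeline; with all three options off, results stay bit-exact in fp32.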
With ncnn-20240102 and fp16 disabled, the detection results match pytorch. Disabling fp16 makes inference 50% slower, though. Can fp16 be enabled and still give correct results?
Use the 20240410 release.
With the model from https://github.com/Qengineering/YoloV8-ncnn-Raspberry-Pi-4, which strips part of the output decoding from the exported graph, detection results are correct even with fp16 enabled. Are some operators in the original model's output head unsupported?
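For context, that model keeps YOLOv8's box decoding (the DFL step, a softmax over reg_max=16 distance bins per box side) out of the net and does it in plain fp32 code. A minimal numpy sketch of that decode step (names are mine, not from either repo):

```python
import numpy as np

def dfl_decode(logits, reg_max=16):
    """Decode one anchor's DFL distributions into 4 box distances.

    logits: (4, reg_max) raw scores for left/top/right/bottom.
    Doing the softmax + expectation outside the net keeps this
    fp16-sensitive step in fp32.
    """
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    prob = e / e.sum(axis=1, keepdims=True)
    return prob @ np.arange(reg_max, dtype=np.float32)      # expected distance

# Uniform logits decode to the mean bin index, (0 + ... + 15) / 16 = 7.5
print(dfl_decode(np.zeros((4, 16))))  # [7.5 7.5 7.5 7.5]
```

Moving this step out of the graph sidesteps whichever head operators fp16 handles poorly, which would explain why the stripped model is correct with fp16 enabled.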
With 20240410 and fp16 enabled, I do get correct results, but with some loss of precision.
pytorch detection results (half=False, fp16 disabled):
Original model:
[class] [x_center] [y_center] [width] [height] [confidence]
20 0.578238 0.674553 0.595038 0.432779 0.900195
33 0.287819 0.258129 0.104904 0.100881 0.671818
33 0.113862 0.0640004 0.222977 0.128001 0.64798
33 0.248576 0.161419 0.0472315 0.0305715 0.539291
14 0.599585 0.415717 0.0515121 0.0925999 0.472335
ncnn model:
20 0.578125 0.675 0.59375 0.433594 0.902344
33 0.1125 0.0645264 0.224219 0.129053 0.731445
33 0.287891 0.257812 0.104883 0.101074 0.689941
33 0.248437 0.161426 0.0470703 0.0304688 0.547363
14 0.6 0.416016 0.0527344 0.0919922 0.474854
C++ detection results:
Kunpeng-920, fp16 enabled
20 = 0.89453 at 417.00 451.00 887.00 x 442.00
33 = 0.72852 at 352.00 211.00 157.00 x 99.00
33 = 0.72705 at 0.00 0.00 337.00 x 127.00
33 = 0.65088 at 337.00 145.00 68.00 x 35.00
14 = 0.63623 at 861.00 373.00 75.00 x 90.00
Kunpeng-920, fp16 disabled
20 = 0.89414 at 416.00 451.00 888.00 x 442.00
33 = 0.73525 at 352.00 210.00 157.00 x 99.00
33 = 0.72803 at 0.00 0.00 337.00 x 128.00
33 = 0.65210 at 337.00 145.00 69.00 x 35.00
14 = 0.62812 at 861.00 373.00 75.00 x 90.00
Intel(R) Xeon(R) Gold 6240, fp16 disabled
20 = 0.89415 at 416.00 451.00 888.00 x 442.00
33 = 0.73525 at 352.00 210.00 157.00 x 99.00
33 = 0.72803 at 0.00 0.00 337.00 x 128.00
33 = 0.65210 at 337.00 145.00 69.00 x 35.00
14 = 0.62812 at 861.00 373.00 75.00 x 90.00
The C++ results with fp16 disabled also deviate slightly from running the ncnn model in pytorch with fp16 disabled.
With fp16 disabled, the Kunpeng and x64 results are exactly identical.
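Note the two outputs use different conventions: pytorch prints normalized (x_center, y_center, width, height), while the C++ demo prints pixel (left, top, width x height). A small helper for converting between them so the tables can be compared (the image size below is a hypothetical placeholder, it is not given in this thread):

```python
def to_pixel_box(cx, cy, w, h, img_w, img_h):
    """Normalized (cx, cy, w, h) -> pixel (left, top, width, height)."""
    left = (cx - w / 2) * img_w
    top = (cy - h / 2) * img_h
    return left, top, w * img_w, h * img_h

# First pytorch detection above, with an assumed 640x640 input
print(to_pixel_box(0.578238, 0.674553, 0.595038, 0.432779, 640, 640))
```

Any remaining offset of a pixel or two after conversion comes from the fp16/fp32 rounding discussed above plus letterbox padding, not from a wrong box.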
That deviation is just fp16 vs fp32 rounding; it does not affect practical use.
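To put a number on that, here is a quick check (not a rigorous accuracy measurement) over the five Kunpeng-920 confidences reported above, with and without fp16:

```python
# Confidence columns copied from the two Kunpeng-920 tables above
fp16 = [0.89453, 0.72852, 0.72705, 0.65088, 0.63623]
fp32 = [0.89414, 0.73525, 0.72803, 0.65210, 0.62812]

# Largest absolute confidence difference between the two runs
max_diff = max(abs(a - b) for a, b in zip(fp16, fp32))
print(round(max_diff, 5))  # 0.00811
```

The worst case is under 0.01, i.e. well below any sensible confidence threshold, which is consistent with the error being pure fp16 rounding.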
Related Issues (20)
- benchmark test shows low CPU utilization
- Manually created net runs inference much slower
- I have 3 GPUs, but get_gpu_count()=1 HOT 8
- pnnx and ncnn outputs are inconsistent
- I convert onnx to ncnn successfully, but my inference output is all nan, e.g. the output of net.extract() is all nan. HOT 15
- On Kunpeng 920, the int8-quantized yolov8n model is 50% slower than the default fp16 HOT 5
- Segfault when running ./ncnn2table on the second stage of the mtcnn model
- Bad performance for int8 inference on XuanTie 906 (RISC-V) HOT 1
- The intrinsic code does not reflect the algorithm's original design; is there a plan to upgrade the intrinsic code and set vset to ta,mu
- Can simplestl be used in an RTOS?
- [ncnn-android-yolov8] How to handle real-time detect when the view set orientation to "landscape" ? HOT 1
- pnnx converts the model fine, but the model output differs from the onnx model output HOT 4
- EfficientPhys onnx-to-ncnn model conversion error HOT 3
- Converted my own yolov8 model to ncnn and deployed on Windows; the interface matches up, but no results come out
- Can the matrix multiplication API be used standalone? HOT 1
- Model conversion fails; how do I track down the cause
- how do I get the fossilize file .foz out of the vulkan driver? HOT 2
- Floating-point exception when running on the Loongson education Pi 2k1000
- a minor issue in prebuilt ncnn-android libs