
ppl.nn's People

Contributors

a1trl9, alcanderian, czhappy, gilsaia, grimoire, haojiangzheng, icthu, jianfei-wangg, jiaomingjun, johnxusjtu, kyu-junyi, litianjian, liuhaoss, lzhangzz, mochiset, moran232, nihui, openppl-public, ouonline, si-xu, sunnyligithub, tangyanf, violetevergardenyyh, watersounds, xupinjie, ysh329, yxpandjay, zchrissirhcz, zhiqwang, zichentian


ppl.nn's Issues

Suggestion regarding git

A suggestion: could the online `git clone` of hpcc during the build be changed to an offline approach? The current method is rather unfriendly to intranet environments with no network access.
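
A possible workaround, sketched here rather than an official feature: the build pulls hpcc through CMake's standard FetchContent machinery, which honors FETCHCONTENT_SOURCE_DIR_<NAME> for pre-downloaded sources. Clone hpcc once on a connected machine, copy it into the intranet environment, and point the build at it (the repository URL and local path below are assumptions; verify against your checkout's deps.cmake):

```bash
# On a machine with network access (URL assumed from deps.cmake):
git clone https://github.com/openppl-public/hpcc.git
# Copy the checkout to the offline host, then build against it;
# FETCHCONTENT_SOURCE_DIR_HPCC is standard CMake, not a ppl.nn flag.
./build.sh -DFETCHCONTENT_SOURCE_DIR_HPCC=/path/to/local/hpcc
```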

The inference performance of pplnn seems inconsistent for the same model

Hi guys! I used pplnn to test the inference performance of a resnet50 model, but I got noticeably different results when I ran pplnn five times consecutively. The average latency of a single query ranges from 0.37 ms to 2.03 ms, which is unexpected and confusing.

Device: RTX 3090
CUDA: 11.1
Command:

pplnn --warmuptimes 100 --runningtimes 100 --enable-profiling --dims 1_3_224_224 --in-shapes 1_3_224_224 --onnx-model model.onnx

Logs:
(screenshot of the resnet50 profiling results omitted)
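
A hedged thought rather than a confirmed diagnosis: short GPU runs are sensitive to clock ramp-up, and the --min-profiling-time flag that appears in other reports on this page (flag names may vary between pplnn versions) lengthens the measurement window, which tends to stabilize the reported average:

```bash
# Same model and shapes as above, but with a 10-second minimum profiling window.
pplnn --onnx-model model.onnx --dims 1_3_224_224 --in-shapes 1_3_224_224 \
      --enable-profiling --warmuptimes 100 --min-profiling-time 10
```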

PRelu / Dropout / Upsample: ParseNodeInfo for node unsupported

(base) [qiao@VM-238-190-centos ~/ppl.nn]$ ./pplnn-build/tools/pplnn --onnx-model test.onnx --mm-policy mem --min-profiling-time 10 --warmuptimes 5 --core-binding --disable-avx512
[INFO][2021-07-08 20:21:45.381][pplnn.cc:700] ppl.nn version: 513e612
[INFO][2021-07-08 20:21:45.381][pplnn.cc:127] ***** register X86Engine *****
[ERROR][2021-07-08 20:21:45.384][graph_parser.cc:169] can not find param parser info of type[:PRelu]
[ERROR][2021-07-08 20:21:45.384][graph_parser.cc:195] ParseNodeInfo for node[] failed: unsupported
[ERROR][2021-07-08 20:21:45.384][graph_parser.cc:262] ParseGraphNode failed.
[ERROR][2021-07-08 20:21:45.384][model_parser.cc:76] parse graph failed: unsupported
[ERROR][2021-07-08 20:21:45.385][runtime_builder_impl.cc:50] parse graph failed: unsupported
[ERROR][2021-07-08 20:21:45.385][onnx_runtime_builder_factory.cc:59] init OnnxRuntimeBuilder failed: unsupported
[ERROR][2021-07-08 20:21:45.385][pplnn.cc:714] create OnnxRuntimeBuilder failed.
(base) [qiao@VM-238-190-centos ~/ppl.nn]$ ./pplnn-build/tools/pplnn --onnx-model ~/test.onnx --mm-policy mem --min-profiling-time 10 --warmuptimes 5 --core-binding --disable-avx512
[INFO][2021-07-08 20:49:47.035][pplnn.cc:700] ppl.nn version: 513e612
[INFO][2021-07-08 20:49:47.035][pplnn.cc:127] ***** register X86Engine *****
[ERROR][2021-07-08 20:49:47.039][graph_parser.cc:169] can not find param parser info of type[:Neg]
[ERROR][2021-07-08 20:49:47.039][graph_parser.cc:195] ParseNodeInfo for node[] failed: unsupported
[ERROR][2021-07-08 20:49:47.039][graph_parser.cc:262] ParseGraphNode failed.
[ERROR][2021-07-08 20:49:47.039][model_parser.cc:76] parse graph failed: unsupported
[ERROR][2021-07-08 20:49:47.039][runtime_builder_impl.cc:50] parse graph failed: unsupported
[ERROR][2021-07-08 20:49:47.039][onnx_runtime_builder_factory.cc:59] init OnnxRuntimeBuilder failed: unsupported
[ERROR][2021-07-08 20:49:47.040][pplnn.cc:714] create OnnxRuntimeBuilder failed.
(base) [qiao@VM-238-190-centos ~/ppl.nn]$ ./pplnn-build/tools/pplnn --onnx-model ~/test.onnx --mm-policy mem --min-profiling-time 10 --warmuptimes 5 --core-binding --disable-avx512
[INFO][2021-07-08 20:55:12.258][pplnn.cc:700] ppl.nn version: 513e612
[INFO][2021-07-08 20:55:12.258][pplnn.cc:127] ***** register X86Engine *****
[ERROR][2021-07-08 20:55:12.266][graph_parser.cc:169] can not find param parser info of type[:Dropout]
[ERROR][2021-07-08 20:55:12.266][graph_parser.cc:195] ParseNodeInfo for node[] failed: unsupported
[ERROR][2021-07-08 20:55:12.266][graph_parser.cc:262] ParseGraphNode failed.
[ERROR][2021-07-08 20:55:12.266][model_parser.cc:76] parse graph failed: unsupported
[ERROR][2021-07-08 20:55:12.267][runtime_builder_impl.cc:50] parse graph failed: unsupported
[ERROR][2021-07-08 20:55:12.267][onnx_runtime_builder_factory.cc:59] init OnnxRuntimeBuilder failed: unsupported
[ERROR][2021-07-08 20:55:12.268][pplnn.cc:714] create OnnxRuntimeBuilder failed.
(base) [qiao@VM-238-190-centos ~/ppl.nn]$ ./pplnn-build/tools/pplnn --onnx-model ~/test.onnx --mm-policy mem --min-profiling-time 10 --warmuptimes 5 --core-binding --disable-avx512
[INFO][2021-07-08 20:58:17.767][pplnn.cc:700] ppl.nn version: 513e612
[INFO][2021-07-08 20:58:17.767][pplnn.cc:127] ***** register X86Engine *****
[ERROR][2021-07-08 20:58:17.770][graph_parser.cc:169] can not find param parser info of type[:Upsample]
[ERROR][2021-07-08 20:58:17.770][graph_parser.cc:195] ParseNodeInfo for node[] failed: unsupported
[ERROR][2021-07-08 20:58:17.770][graph_parser.cc:262] ParseGraphNode failed.
[ERROR][2021-07-08 20:58:17.770][model_parser.cc:76] parse graph failed: unsupported
[ERROR][2021-07-08 20:58:17.770][runtime_builder_impl.cc:50] parse graph failed: unsupported
[ERROR][2021-07-08 20:58:17.771][onnx_runtime_builder_factory.cc:59] init OnnxRuntimeBuilder failed: unsupported
[ERROR][2021-07-08 20:58:17.771][pplnn.cc:714] create OnnxRuntimeBuilder failed.

Hi, are the PRelu, Dropout, and Upsample layers failing because of a problem in the ONNX model files, or are they currently unsupported?

[x86-compile] error: impossible constraint in ‘asm’

I tried to compile the latest master branch.

CPU                        | result
Core i5-9500 (no AVX-512)  | error: impossible constraint in ‘asm’
Xeon 6130 (AVX-512)        | pass

I found that the latest commit adds AVX-512 support. If this is a bug, will ppl support more CPUs (without AVX-512), and is there a macro to separate the AVX-512 code paths?
Thanks.
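
For reference, a quick way to check whether a given host exposes AVX-512 at all (standard Linux procfs, nothing ppl-specific):

```bash
# Lists the AVX-512 feature flags advertised by the CPU, if any.
grep -o 'avx512[a-z]*' /proc/cpuinfo | sort -u
```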

WSL compile error: failed to convert GOTPCREL relocation; relink with --no-relax

I want to compile the classification sample under the path ppl.nn/pplnn-build/samples/cpp/run_model, but I get this issue.
The compile log is:
```
[ 1%] Built target pplcommon_static
Consolidate compiler generated dependencies of target PPLCUDAKernel
[ 10%] Built target PPLCUDAKernel
[ 18%] Built target libprotobuf
[ 43%] Built target PPLKernelX86
Consolidate compiler generated dependencies of target pplnn_static
[100%] Built target pplnn_static
Consolidate compiler generated dependencies of target classification
[100%] Linking CXX executable classification
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
collect2: error: ld returned 1 exit status
samples/cpp/run_model/CMakeFiles/classification.dir/build.make:150: recipe for target 'samples/cpp/run_model/classification' failed
make[2]: *** [samples/cpp/run_model/classification] Error 1
CMakeFiles/Makefile2:1040: recipe for target 'samples/cpp/run_model/CMakeFiles/classification.dir/all' failed
make[1]: *** [samples/cpp/run_model/CMakeFiles/classification.dir/all] Error 2
Makefile:135: recipe for target 'all' failed
make: *** [all] Error 2

```
My CUDA version is 11.0.
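
A hedged workaround suggested by the linker message itself: pass --no-relax through CMake's standard linker-flags variable (this is generic CMake, not a ppl.nn-specific option):

```bash
# build.sh forwards -D arguments to cmake; -Wl,--no-relax hands the flag to ld.
./build.sh -DCMAKE_EXE_LINKER_FLAGS="-Wl,--no-relax"
```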

How to quantize to INT8 / FP16

How do I quantize to INT8 / FP16?

Define_string_opt("--quantization", g_flag_quantization, "", "declare **json file** saved quantization information");

What does the JSON file look like?

Why can't the sample model tests/testdata/conv.onnx be profiled?

I built the ppl.nn project and tried a test with ./pplnn-build/tools/pplnn --onnx-model tests/testdata/conv.onnx; it works normally:

[INFO][2021-07-05 21:47:39.764][pplnn.cc:683] ppl.nn version: 7dd75a1077867fc9a762449953417088446ae2f8-dirty
[INFO][2021-07-05 21:47:39.764][pplnn.cc:110] ***** register X86Engine *****
[INFO][2021-07-05 21:47:39.764][simple_graph_partitioner.cc:90] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-07-05 21:47:39.764][pplnn.cc:523] ----- input info -----
[INFO][2021-07-05 21:47:39.764][pplnn.cc:526] input[0]:
[INFO][2021-07-05 21:47:39.764][pplnn.cc:527]     name: input
[INFO][2021-07-05 21:47:39.764][pplnn.cc:534]     dim(s): 1 3 4 4
[INFO][2021-07-05 21:47:39.764][pplnn.cc:536]     DataType: FLOAT32
[INFO][2021-07-05 21:47:39.764][pplnn.cc:537]     DataFormat: NDARRAY
[INFO][2021-07-05 21:47:39.764][pplnn.cc:538]     NumBytesIncludePadding: 192
[INFO][2021-07-05 21:47:39.764][pplnn.cc:539]     NumBytesExcludePadding: 192
[INFO][2021-07-05 21:47:39.764][pplnn.cc:542] ----- output info -----
[INFO][2021-07-05 21:47:39.764][pplnn.cc:545] output[0]:
[INFO][2021-07-05 21:47:39.764][pplnn.cc:546]     name: 5
[INFO][2021-07-05 21:47:39.764][pplnn.cc:553]     dim(s): 1 3 5 5
[INFO][2021-07-05 21:47:39.764][pplnn.cc:555]     DataType: FLOAT32
[INFO][2021-07-05 21:47:39.764][pplnn.cc:556]     DataFormat: N16CX
[INFO][2021-07-05 21:47:39.764][pplnn.cc:557]     NumBytesIncludePadding: 1600
[INFO][2021-07-05 21:47:39.764][pplnn.cc:558]     NumBytesExcludePadding: 300
[INFO][2021-07-05 21:47:39.764][pplnn.cc:561] ----------------------
[INFO][2021-07-05 21:47:39.764][pplnn.cc:791] Run() costs: 0.010000 ms.
[INFO][2021-07-05 21:47:39.764][pplnn.cc:799] Run ok

When I try to run in profiling mode, it gets stuck somewhere and never returns. 😅 The code is too new to me, so it is hard to find the cause, please help!

Command: ./pplnn-build/tools/pplnn --onnx-model tests/testdata/conv.onnx --enable-profiling --warmuptimes 3

[INFO][2021-07-05 21:52:35.459][pplnn.cc:683] ppl.nn version: 7dd75a1077867fc9a762449953417088446ae2f8-dirty
[INFO][2021-07-05 21:52:35.459][pplnn.cc:110] ***** register X86Engine *****
[INFO][2021-07-05 21:52:35.459][simple_graph_partitioner.cc:90] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-07-05 21:52:35.459][pplnn.cc:523] ----- input info -----
[INFO][2021-07-05 21:52:35.459][pplnn.cc:526] input[0]:
[INFO][2021-07-05 21:52:35.459][pplnn.cc:527]     name: input
[INFO][2021-07-05 21:52:35.459][pplnn.cc:534]     dim(s): 1 3 4 4
[INFO][2021-07-05 21:52:35.459][pplnn.cc:536]     DataType: FLOAT32
[INFO][2021-07-05 21:52:35.459][pplnn.cc:537]     DataFormat: NDARRAY
[INFO][2021-07-05 21:52:35.459][pplnn.cc:538]     NumBytesIncludePadding: 192
[INFO][2021-07-05 21:52:35.459][pplnn.cc:539]     NumBytesExcludePadding: 192
[INFO][2021-07-05 21:52:35.459][pplnn.cc:542] ----- output info -----
[INFO][2021-07-05 21:52:35.459][pplnn.cc:545] output[0]:
[INFO][2021-07-05 21:52:35.459][pplnn.cc:546]     name: 5
[INFO][2021-07-05 21:52:35.459][pplnn.cc:553]     dim(s): 1 3 5 5
[INFO][2021-07-05 21:52:35.459][pplnn.cc:555]     DataType: FLOAT32
[INFO][2021-07-05 21:52:35.459][pplnn.cc:556]     DataFormat: N16CX
[INFO][2021-07-05 21:52:35.459][pplnn.cc:557]     NumBytesIncludePadding: 1600
[INFO][2021-07-05 21:52:35.459][pplnn.cc:558]     NumBytesExcludePadding: 300
[INFO][2021-07-05 21:52:35.459][pplnn.cc:561] ----------------------
[INFO][2021-07-05 21:52:35.459][pplnn.cc:791] Run() costs: 0.010000 ms.
[INFO][2021-07-05 21:52:35.459][pplnn.cc:799] Run ok
[INFO][2021-07-05 21:52:35.459][pplnn.cc:803] Warm up start for 3 times.
[INFO][2021-07-05 21:52:35.459][pplnn.cc:810] Warm up end.
[INFO][2021-07-05 21:52:35.459][pplnn.cc:818] Profiling start

Performance issue compared to MXNet

Hi guys, I tested OpenPPL with different batch sizes and compared its inference performance to MXNet. IMHO, if the batch size increases exponentially, the inference latency should increase nearly exponentially too, but the results in the table below do not follow this rule. So I suspect that either my test command is wrong, or the latency reported by OpenPPL is for a single sample instead of a single query.

By the way, I found that when --dims is specified, the --in-shapes option does not seem to work properly.

Environment

MXNet: 1.6.0
OpenPPL: 7dd75a1
TensorRT: 8.0
Device: GTX 1080
CUDA: 10.2

The table below shows latency (ms) of ResNet50_v1b, FP32
OpenPPL command: pplnn --warmuptimes 400 --runningtimes 100 --enable-profiling --dims bs_3_224_224 --in-shapes bs_3_224_224 --onnx-model model.onnx
MXNet inference code: link

batch size |        1 |        2 |         4 |         8 |        16 |        32
MXNet      | 5.814193 | 7.570517 | 11.836981 | 20.500102 | 36.853303 | 69.709606
OpenPPL    | 1.730683 | 1.818604 |  2.219570 |  2.222927 |  2.294234 |  2.958766
TensorRT   | 2.66706  | 4.01048  |  6.84147  | 12.4254   | 23.0325   | 43.8126

What does PPLNN stand for?

PPLNN is short for "PPLNN is a Primitive Library for Neural Network".

So the first P stands for PPLNN and the second P stands for Primitive, is that correct?

Executing the engine's run function fails with error CUDA_ERROR_INVALID_IMAGE

Log as follows:
[INFO][2021-11-03 10:37:40.451][simple_graph_partitioner.cc:108] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-11-03 10:37:40.456][opt_graph.cc:206] Create 206 TensorImpl
[INFO][2021-11-03 10:37:40.458][opt_graph.cc:317] added 171 new bridge kernels
[INFO][2021-11-03 10:37:40.483][algo_conv_hmma.cc:116] Compiling Conv_0
[INFO][2021-11-03 10:37:45.786][algo_conv_hmma.cc:116] Compiling Conv_4
[INFO][2021-11-03 10:37:56.366][algo_conv_hmma.cc:116] Compiling Conv_5
[INFO][2021-11-03 10:38:04.654][algo_conv_hmma.cc:116] Compiling Conv_9
[INFO][2021-11-03 10:38:11.969][algo_conv_hmma.cc:116] Compiling Conv_9
[INFO][2021-11-03 10:38:11.979][algo_conv_hmma.cc:116] Compiling Conv_10
[INFO][2021-11-03 10:38:19.334][algo_conv_hmma.cc:116] Compiling Conv_14
[INFO][2021-11-03 10:38:22.449][algo_conv_hmma.cc:116] Compiling Conv_14
[INFO][2021-11-03 10:38:22.458][algo_conv_hmma.cc:116] Compiling Conv_16
[INFO][2021-11-03 10:38:22.462][algo_conv_hmma.cc:116] Compiling Conv_20
[INFO][2021-11-03 10:38:22.471][algo_conv_hmma.cc:116] Compiling Conv_22
[INFO][2021-11-03 10:38:22.473][algo_conv_hmma.cc:116] Compiling Conv_26
[INFO][2021-11-03 10:38:25.715][algo_conv_hmma.cc:116] Compiling Conv_26
[INFO][2021-11-03 10:38:25.719][algo_conv_hmma.cc:116] Compiling Conv_27
[INFO][2021-11-03 10:38:32.929][algo_conv_hmma.cc:116] Compiling Conv_31
[INFO][2021-11-03 10:38:36.166][algo_conv_hmma.cc:116] Compiling Conv_31
[INFO][2021-11-03 10:38:36.169][algo_conv_hmma.cc:116] Compiling Conv_33
[INFO][2021-11-03 10:38:36.171][algo_conv_hmma.cc:116] Compiling Conv_37
[INFO][2021-11-03 10:38:36.178][algo_conv_hmma.cc:116] Compiling Conv_39
[INFO][2021-11-03 10:38:43.604][algo_conv_hmma.cc:116] Compiling Conv_43
[INFO][2021-11-03 10:38:45.240][algo_conv_hmma.cc:116] Compiling Conv_43
[INFO][2021-11-03 10:38:45.243][algo_conv_hmma.cc:116] Compiling Conv_44
[INFO][2021-11-03 10:38:46.794][algo_conv_hmma.cc:116] Compiling Conv_48
[INFO][2021-11-03 10:38:48.354][algo_conv_hmma.cc:116] Compiling Conv_48
[INFO][2021-11-03 10:38:48.357][algo_conv_hmma.cc:116] Compiling Conv_50
[INFO][2021-11-03 10:38:48.358][algo_conv_hmma.cc:116] Compiling Conv_54
[INFO][2021-11-03 10:38:48.361][algo_conv_hmma.cc:116] Compiling Conv_56
[INFO][2021-11-03 10:38:48.362][algo_conv_hmma.cc:116] Compiling Conv_60
[INFO][2021-11-03 10:38:50.086][algo_conv_hmma.cc:116] Compiling Conv_60
[INFO][2021-11-03 10:38:50.090][algo_conv_hmma.cc:116] Compiling Conv_61
[INFO][2021-11-03 10:38:51.999][algo_conv_hmma.cc:116] Compiling Conv_65
[INFO][2021-11-03 10:38:54.211][algo_conv_hmma.cc:116] Compiling Conv_67
[INFO][2021-11-03 10:38:54.212][algo_conv_hmma.cc:116] Compiling Conv_71
[INFO][2021-11-03 10:38:57.143][algo_conv_hmma.cc:116] Compiling Conv_71
[INFO][2021-11-03 10:38:57.145][algo_conv_hmma.cc:116] Compiling Conv_72
[INFO][2021-11-03 10:38:59.957][algo_conv_hmma.cc:116] Compiling Conv_76
[INFO][2021-11-03 10:39:02.925][algo_conv_hmma.cc:116] Compiling Conv_76
[INFO][2021-11-03 10:39:02.927][algo_conv_hmma.cc:116] Compiling Conv_78
[INFO][2021-11-03 10:39:02.929][algo_conv_hmma.cc:116] Compiling Conv_82
[INFO][2021-11-03 10:39:02.929][algo_conv_hmma.cc:116] Compiling Conv_82
[INFO][2021-11-03 10:39:02.931][algo_conv_hmma.cc:116] Compiling Conv_84
[INFO][2021-11-03 10:39:02.933][algo_conv_hmma.cc:116] Compiling Conv_88
[INFO][2021-11-03 10:39:02.934][algo_conv_hmma.cc:116] Compiling Conv_90
[INFO][2021-11-03 10:39:02.936][algo_conv_hmma.cc:116] Compiling Conv_94
[INFO][2021-11-03 10:39:05.646][algo_conv_hmma.cc:116] Compiling Conv_95
[INFO][2021-11-03 10:39:08.061][algo_gemm.cc:113] Compiling Gemm_98
[INFO][2021-11-03 10:39:09.090][opt_graph.cc:592] deleted 167 bridge kernels
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 215] engine config as follows:
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 216] forward type : 0
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 217] mem mode : 0
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 227] input[0]: input
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 237] dim(s): 1 3 224 224
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 238] DataType: FLOAT32
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 239] DataFormat: NDARRAY
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 240] BytesIncludePadding: 602112
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 241] BytesExcludePadding: 602112
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 247] output[0]: probs
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 257] dim(s): 0 1000
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 258] DataType: FLOAT32
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 259] DataFormat: NDARRAY
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 260] BytesIncludePadding: 0
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 261] BytesExcludePadding: 0
[2021-11-03 10:39:17.024 +08:00] [infer-engine-log] [---I---] [thread 30647] [base_model.cpp loadModel 88] model /disk26b/zhaojd/github_codes/infer_engine_serving/models//test_openppl/1/model.bin loaded success success.
[2021-11-03 10:39:17.034 +08:00] [infer-engine-log] [---I---] [thread 30647] [prometheus_metrics.cpp init 31] prometheus binding address is 0.0.0.0:10098
[2021-11-03 10:39:17.036 +08:00] [infer-engine-log] [---I---] [thread 30647] [server_manager.cpp ServerManager 26] add resource monitor timer task to worker group success.
[2021-11-03 10:39:17.036 +08:00] [infer-engine-log] [---I---] [thread 30647] [server_manager.cpp ServerManager 36] add model update timer task to worker group success
start service....
[2021-11-03 10:39:17.037 +08:00] [infer-engine-log] [---I---] [thread 30647] [infer_engine_server.cpp start 29] http service binding to port 10099
[2021-11-03 10:39:27.038 +08:00] [infer-engine-log] [---I---] [thread 32918] [metrics.cpp resourceMonitorTimerTask 26] monitor task timeout, begin to update ...
[... the "monitor task timeout, begin to update ..." line repeats every 10 seconds until 10:52:17 ...]
[2021-11-03 10:52:19.931 +08:00] [infer-engine-log] [---I---] [thread 32922] [http_service.cpp operator() 143] 7505367914-->Enter modelInferHandler
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [task_info.cpp printTaskInfo 69] task 7505367914 info as follows:
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [task_info.cpp printTaskInfo 70] task_type : model_infer
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [task_info.cpp printTaskInfo 71] model_name : test_openppl
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [task_info.cpp printTaskInfo 72] model_version: 1
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [task_info.cpp printTaskInfo 73] model_action :
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [task_info.cpp printTaskInfo 79] image shape : 224x224x3
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [http_service.cpp execTask 116] 7505367914-->execute model_infer task
[2021-11-03 10:52:19.939 +08:00] [infer-engine-log] [---I---] [thread 32909] [model_manager.cpp processModelInferTask 227] 7505367914-->Exec process_model_infer_task
[2021-11-03 10:52:27.316 +08:00] [infer-engine-log] [---I---] [thread 32918] [metrics.cpp resourceMonitorTimerTask 26] monitor task timeout, begin to update ...
[... the "monitor task timeout, begin to update ..." line repeats until 10:54:35 ...]

error: cuModuleLoadDataEx(&module_, source_code_.second.c_str(), 0, 0, 0) failed with error CUDA_ERROR_INVALID_IMAGE
terminate called after throwing an instance of 'std::system_error'
what(): Resource deadlock avoided

Found two unsupported ops

Built the master branch on CentOS 7, then ran the command:
pplnn --use-x86 --onnx-model xxx.onnx

```
[INFO][2021-11-26 15:01:38.832][pplnn.cc:797] ppl.nn version: 63efb93-dirty
[INFO][2021-11-26 15:01:38.832][pplnn.cc:278] ***** register X86Engine *****
[ERROR][2021-11-26 15:01:39.001][graph_parser.cc:175] unsupported op: domain[], type[MaxPool], version[9]
[ERROR][2021-11-26 15:01:39.001][graph_parser.cc:202] ParseNodeInfo for node[MaxPool_2] failed: unsupported
[ERROR][2021-11-26 15:01:39.001][graph_parser.cc:267] ParseGraphNode failed.
[ERROR][2021-11-26 15:01:39.002][model_parser.cc:80] parse graph failed: unsupported
[ERROR][2021-11-26 15:01:39.002][runtime_builder_impl.cc:47] parse graph failed: unsupported
[ERROR][2021-11-26 15:01:39.002][onnx_runtime_builder_factory.cc:58] init RuntimeBuilder failed: unsupported
[ERROR][2021-11-26 15:01:39.016][pplnn.cc:817] create RuntimeBuilder failed.
```

```
[INFO][2021-11-26 15:01:51.409][pplnn.cc:797] ppl.nn version: 63efb93-dirty
[INFO][2021-11-26 15:01:51.409][pplnn.cc:278] ***** register X86Engine *****
[ERROR][2021-11-26 15:01:51.598][graph_parser.cc:175] unsupported op: domain[], type[CumSum], version[11]
[ERROR][2021-11-26 15:01:51.598][graph_parser.cc:202] ParseNodeInfo for node[CumSum_181] failed: unsupported
[ERROR][2021-11-26 15:01:51.598][graph_parser.cc:267] ParseGraphNode failed.
[ERROR][2021-11-26 15:01:51.598][model_parser.cc:80] parse graph failed: unsupported
[ERROR][2021-11-26 15:01:51.599][runtime_builder_impl.cc:47] parse graph failed: unsupported
[ERROR][2021-11-26 15:01:51.599][onnx_runtime_builder_factory.cc:58] init RuntimeBuilder failed: unsupported
[ERROR][2021-11-26 15:01:51.614][pplnn.cc:817] create RuntimeBuilder failed.
```

testIsaAVX512 execution fails

Thanks for your great work. I integrated ppl into my project; when I run the program in debug mode, it fails with the exception below:
(screenshot of the exception omitted)

The x86 engine runs fine, but the CUDA engine cannot run: it hangs at Compiling Conv_0 until all 64 GB of memory is exhausted

[DEBUG][2022-02-26 11:23:12.125][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_127_Fused]
[DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_139_Fused]
[DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_151_Fused]
[DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_163_Fused]
[DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_176_Fused]
[DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_185_Fused]
[INFO][2022-02-26 11:23:12.127][engine_graph_partitioner.cc:103] total partition(s) of graph[torch-jit-export]: 1.
[DEBUG][2022-02-26 11:23:12.153][opt_graph.cc:186] Can not reshape safely for node[Resize_170]
[DEBUG][2022-02-26 11:23:12.154][opt_graph.cc:186] Can not reshape safely for node[Resize_158]
[DEBUG][2022-02-26 11:23:12.155][opt_graph.cc:186] Can not reshape safely for node[Resize_146]
[DEBUG][2022-02-26 11:23:12.156][opt_graph.cc:186] Can not reshape safely for node[Resize_134]
[DEBUG][2022-02-26 11:23:12.156][reshape_concat.cc:43] ERROR: input[1]'s dim[2]'s value[1] != input[0]'s dim[2]'s value[37].
[DEBUG][2022-02-26 11:23:12.156][opt_graph.cc:186] Can not reshape safely for node[Concat_171]
[DEBUG][2022-02-26 11:23:12.172][opt_graph.cc:186] Can not reshape safely for node[Resize_183]
[DEBUG][2022-02-26 11:23:12.172][opt_graph.cc:186] Can not reshape safely for node[Resize_192]
[DEBUG][2022-02-26 11:23:12.173][opt_graph.cc:200] Create 305 TensorImpl
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_172] and nextnode[Relu_173]
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_124] and nextnode[Relu_125]
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_136] and nextnode[Relu_137]
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_148] and nextnode[Relu_149]
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_160] and nextnode[Relu_161]
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_120] and nextnode[Add_121]
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_120] and nextnode[Relu_122]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_118] and nextnode[Relu_119]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_116] and nextnode[Relu_117]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_113] and nextnode[Add_114]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_113] and nextnode[Relu_115]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_111] and nextnode[Relu_112]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_109] and nextnode[Relu_110]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_105] and nextnode[Add_107]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_105] and nextnode[Relu_108]
[DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_103] and nextnode[Relu_104]
[DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_101] and nextnode[Relu_102]
[DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_98] and nextnode[Add_99]
[DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_98] and nextnode[Relu_100]
[DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_96] and nextnode[Relu_97]
[DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_94] and nextnode[Relu_95]
[DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_91] and nextnode[Add_92]
[DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_91] and nextnode[Relu_93]
[DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_89] and nextnode[Relu_90]
[DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_87] and nextnode[Relu_88]
[DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_84] and nextnode[Add_85]
[DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_84] and nextnode[Relu_86]
[DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_82] and nextnode[Relu_83]
[DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_80] and nextnode[Relu_81]
[DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_77] and nextnode[Add_78]
[DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_77] and nextnode[Relu_79]
[DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_75] and nextnode[Relu_76]
[DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_73] and nextnode[Relu_74]
[DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_70] and nextnode[Add_71]
[DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_70] and nextnode[Relu_72]
[DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_68] and nextnode[Relu_69]
[DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_66] and nextnode[Relu_67]
[DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_62] and nextnode[Add_64]
[DEBUG][2022-02-26 11:23:12.179][fs_conv.cc:80] Fuse node[Conv_62] and nextnode[Relu_65]
[DEBUG][2022-02-26 11:23:12.180][fs_conv.cc:80] Fuse node[Conv_60] and nextnode[Relu_61]
[DEBUG][2022-02-26 11:23:12.180][fs_conv.cc:80] Fuse node[Conv_58] and nextnode[Relu_59]
[DEBUG][2022-02-26 11:23:12.181][fs_conv.cc:80] Fuse node[Conv_55] and nextnode[Add_56]
[DEBUG][2022-02-26 11:23:12.182][fs_conv.cc:80] Fuse node[Conv_55] and nextnode[Relu_57]
[DEBUG][2022-02-26 11:23:12.182][fs_conv.cc:80] Fuse node[Conv_53] and nextnode[Relu_54]
[DEBUG][2022-02-26 11:23:12.182][fs_conv.cc:80] Fuse node[Conv_51] and nextnode[Relu_52]
[DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_48] and nextnode[Add_49]
[DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_48] and nextnode[Relu_50]
[DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_46] and nextnode[Relu_47]
[DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_44] and nextnode[Relu_45]
[DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_41] and nextnode[Add_42]
[DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_41] and nextnode[Relu_43]
[DEBUG][2022-02-26 11:23:12.184][fs_conv.cc:80] Fuse node[Conv_39] and nextnode[Relu_40]
[DEBUG][2022-02-26 11:23:12.184][fs_conv.cc:80] Fuse node[Conv_37] and nextnode[Relu_38]
[DEBUG][2022-02-26 11:23:12.184][fs_conv.cc:80] Fuse node[Conv_33] and nextnode[Add_35]
[DEBUG][2022-02-26 11:23:12.184][fs_conv.cc:80] Fuse node[Conv_33] and nextnode[Relu_36]
[DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_31] and nextnode[Relu_32]
[DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_29] and nextnode[Relu_30]
[DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_26] and nextnode[Add_27]
[DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_26] and nextnode[Relu_28]
[DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_24] and nextnode[Relu_25]
[DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_22] and nextnode[Relu_23]
[DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_19] and nextnode[Add_20]
[DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_19] and nextnode[Relu_21]
[DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_17] and nextnode[Relu_18]
[DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_15] and nextnode[Relu_16]
[DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_11] and nextnode[Add_13]
[DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_11] and nextnode[Relu_14]
[DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_9] and nextnode[Relu_10]
[DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_7] and nextnode[Relu_8]
[DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_4] and nextnode[Relu_5]
[DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_2] and nextnode[Relu_3]
[DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_0] and nextnode[Relu_1]
[INFO][2022-02-26 11:23:12.192][opt_graph.cc:311] added 261 new bridge kernels
[INFO][2022-02-26 11:23:12.724][algo_conv_hmma.cc:126] Compiling Conv_0

Compile failed: fatal error: Python.h: No such file or directory

First, thanks for your work.

When I compiled openppl in Docker, it returned an error:

/root/ppl.nn/deps/pybind11/include/pybind11/detail/common.h:186:10: fatal error: Python.h: No such file or directory

My base image environment:

os: Ubuntu20.04
python: 3.8.10

I tried installing libpython3.8-dev, which places Python.h in /usr/include/python3.8/. After that I recompiled, and it still reports the error Python.h: No such file or directory.
However, I have successfully compiled openppl on my host, host environment:

os: Ubuntu16.04
chip: x64
Pybind has been preinstalled 

Next, I tried to pip install pybind and to delete the hpcc pybind declaration in deps.cmake; it is still the same error. Maybe some packages' versions are too old.

So, could you release a docker image or help me solve this problem?
Thanks!
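
For what it's worth, a hedged sketch of what usually resolves this inside a container. The variable name is a standard CMake/pybind11 hint rather than a ppl.nn-specific flag, and the stale build directory matters because CMake caches the failed Python lookup:

```bash
apt-get update && apt-get install -y python3-dev    # provides Python.h
rm -rf pplnn-build                                  # clear the cached CMake configure
./build.sh -DPYTHON_EXECUTABLE="$(which python3)"
```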

Ubuntu 18.04 compiler error and fix

When I compile the latest code on
ubuntu 18.04
cmake 3.19.3

using ./build.sh -DHPCC_USE_OPENMP=ON

it shows an error like this:
(screenshot of the compiler error omitted)

When I add the header #include <cmath> to the file src/ppl/nn/engines/x86/impls/test/utils/check.h, it works fine.

unsupported op with opset14

When I convert a PyTorch model to an ONNX model with opset 14 and then run it with ppl (x86), I get the following error:

[ERROR][2022-02-18 21:37:08.903][graph_parser.cc:175] unsupported op: domain[], type[Add], version[14]
[ERROR][2022-02-18 21:37:08.903][graph_parser.cc:202] ParseNodeInfo for node[Add_2] failed: unsupported
[ERROR][2022-02-18 21:37:08.903][graph_parser.cc:267] ParseGraphNode failed.
[ERROR][2022-02-18 21:37:08.903][model_parser.cc:80] parse graph failed: unsupported
[ERROR][2022-02-18 21:37:08.904][runtime_builder_impl.cc:47] parse graph failed: unsupported
[ERROR][2022-02-18 21:37:08.904][onnx_runtime_builder_factory.cc:58] init RuntimeBuilder failed: unsupported

After I changed the opset from 14 to 11, the error above was gone, but I got a segmentation fault with no further information.

core dump while use pplnn for x86 benchmarking

System Info

OS: Ubuntu 16.04
Compiler: GCC-8.2
CPU: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
The ONNX model uses opset 11 and can run with TensorRT and ONNX Runtime.

Error Message as follows:


(perf) λ 96fbdb7483e1 /work/github/ppl.nn/pplnn-build/tools {master} bash test.sh
[INFO][2022-01-29 02:22:15.166][pplnn.cc:1053] ppl.nn version: cf85289
[INFO][2022-01-29 02:22:15.210][pplnn.cc:332] ***** register X86Engine *****
[INFO][2022-01-29 02:22:15.294][engine_graph_partitioner.cc:103] total partition(s) of graph[paddle-onnx]: 1.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
test.sh: line 8: 56259 Aborted (core dumped) ./pplnn --use-x86 --onnx-model ./models/MobileNetV1.onnx --mm-policy mem --enable-profiling --min-profiling-seconds 10 --warmup-iterations

cmake error: could not find git for clone of hpcc-populate

Hi @openppl-public, I tried to compile ppl.nn on Ubuntu 20.04 (GCC 9.3.0, CUDA 11.3, CMake 3.16.3) and got the following error:


# ./build.sh -DHPCC_USE_CUDA=ON
mkdir: cannot create directory '/workspace/github/ppl.nn/pplnn-build': File exists
cmd -> cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/workspace/github/ppl.nn/pplnn-build/install -DHPCC_USE_CUDA=ON .. && make -j24 && make install
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Populating hpcc
CMake Error at /usr/share/cmake-3.16/Modules/ExternalProject.cmake:2421 (message):
  error: could not find git for clone of hpcc-populate
Call Stack (most recent call first):
  /usr/share/cmake-3.16/Modules/ExternalProject.cmake:3236 (_ep_add_download_command)
  CMakeLists.txt:13 (ExternalProject_Add)


-- Configuring incomplete, errors occurred!
See also "/workspace/github/ppl.nn/deps/hpcc-subbuild/CMakeFiles/CMakeOutput.log".
CMake Error at /usr/share/cmake-3.16/Modules/FetchContent.cmake:903 (message):
  CMake step for hpcc failed: 1
Call Stack (most recent call first):
  /usr/share/cmake-3.16/Modules/FetchContent.cmake:1006 (__FetchContent_directPopulate)
  cmake/deps.cmake:23 (FetchContent_Populate)
  CMakeLists.txt:27 (include)


-- Configuring incomplete, errors occurred!
See also "/workspace/github/ppl.nn/pplnn-build/CMakeFiles/CMakeOutput.log".

How do I fix it?

Thanks
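
The error text says CMake could not find git when trying to clone hpcc-populate, so a minimal fix is to install git and retry from a clean state (Ubuntu commands shown; adjust for your distribution):

```bash
apt-get update && apt-get install -y git
rm -rf pplnn-build deps     # drop the failed FetchContent state
./build.sh -DHPCC_USE_CUDA=ON
```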

Error when running the official mask_rcnn sample

As the title says: converting mask_rcnn.onnx from mmdetection works fine, but running the ppl Python sample to infer with the mask_rcnn model reports an error:

[less_kernel.cc:81] unsupported data_type: FLOAT64

After some quick debugging, the input data is fp32, so could the exported ONNX contain float64? But didn't the official sample run through?
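
Before digging further, it may help to confirm whether the exported graph actually contains float64 tensors. A hedged sketch using the onnx Python package (the model filename is an assumption):

```bash
python3 -c "
import onnx
m = onnx.load('mask_rcnn.onnx')
dbl = onnx.TensorProto.DOUBLE
# Report any initializers or declared tensors whose element type is float64.
for t in m.graph.initializer:
    if t.data_type == dbl:
        print('float64 initializer:', t.name)
for v in list(m.graph.input) + list(m.graph.value_info):
    if v.type.tensor_type.elem_type == dbl:
        print('float64 tensor:', v.name)
"
```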

How should a CMake project import pplnn?

I want to build the C++ sample code on its own.

File structure:

  |--build
  |-- classification.cpp
  |-- CMakeLists.txt
  |-- libs
  |-- main.cpp

My CMakeLists.txt is written like this:

cmake_minimum_required(VERSION 3.0.0)
project(ppl_classify VERSION 0.1.0)

find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
include_directories(${PPLNN_INCLUDE_DIRECTORIES})

add_executable(classification classification.cpp)
target_link_libraries(classification PUBLIC pplnn_static ${OpenCV_LIBS})

When I run cmake .., line 26 still reports an error: 'ppl/nn/models/onnx/onnx_runtime_builder_factory.h' file not found clang(pp_file_not_found)

#include "ppl/nn/models/onnx/onnx_runtime_builder_factory.h"
#include "ppl/nn/engines/x86/engine_factory.h"
#include "ppl/nn/engines/x86/x86_engine_options.h"

I suspect my CMakeLists.txt is written incorrectly. How should it be written?
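
A minimal sketch of one way to make this configure, not an authoritative recipe: ${PPLNN_INCLUDE_DIRECTORIES} is referenced but never defined in the CMakeLists.txt above, which is why the header cannot be found. Assuming ppl.nn was built with ./build.sh so that headers and libraries land under pplnn-build/install (all paths below are assumptions):

```bash
PPLNN=/path/to/ppl.nn/pplnn-build/install
# Define the variable the CMakeLists.txt references, and add the library
# directory so the bare pplnn_static name resolves at link time.
cmake .. \
  -DPPLNN_INCLUDE_DIRECTORIES="$PPLNN/include" \
  -DCMAKE_EXE_LINKER_FLAGS="-L$PPLNN/lib"
```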

Compile error on macOS

clang: 12.0
When running build.sh, errors occur:

```
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/bytedance/Desktop/ppl.nn/pplnn-build
Scanning dependencies of target pplcommon_static
Consolidate compiler generated dependencies of target pplcommon_static
[ 1%] Building CXX object ppl.common-build/CMakeFiles/pplcommon_static.dir/src/ppl/common/log.cc.o
In file included from /Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:1:
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.h:54:17: error: class member cannot be redeclared
LogMessage& operator<<(long long ll);
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.h:51:17: note: previous declaration is here
LogMessage& operator<<(int64_t i64);
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.h:55:17: error: class member cannot be redeclared
LogMessage& operator<<(unsigned long long ull);
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.h:52:17: note: previous declaration is here
LogMessage& operator<<(uint64_t u64);
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:75:1: warning: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Wformat]
DEF_READ_OPERATOR_FUNC(int64_t, "%ld");
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
%lld
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:61:44: note: expanded from macro 'DEF_READ_OPERATOR_FUNC'
auto len = snprintf(buf, 128, fmt, value);
~~~ ^~~~~
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:76:1: warning: format specifies type 'unsigned long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
DEF_READ_OPERATOR_FUNC(uint64_t, "%lu");
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
%llu
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:61:44: note: expanded from macro 'DEF_READ_OPERATOR_FUNC'
auto len = snprintf(buf, 128, fmt, value);
~~~ ^~~~~
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:78:1: error: redefinition of 'operator<<'
DEF_READ_OPERATOR_FUNC(long long, "%lld");
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:59:29: note: expanded from macro 'DEF_READ_OPERATOR_FUNC'
LogMessage& LogMessage::operator<<(Type value) {
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:75:1: note: previous definition is here
DEF_READ_OPERATOR_FUNC(int64_t, "%ld");
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:59:29: note: expanded from macro 'DEF_READ_OPERATOR_FUNC'
LogMessage& LogMessage::operator<<(Type value) {
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:79:1: error: redefinition of 'operator<<'
DEF_READ_OPERATOR_FUNC(unsigned long long, "%llu");
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:59:29: note: expanded from macro 'DEF_READ_OPERATOR_FUNC'
LogMessage& LogMessage::operator<<(Type value) {
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:76:1: note: previous definition is here
DEF_READ_OPERATOR_FUNC(uint64_t, "%lu");
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:59:29: note: expanded from macro 'DEF_READ_OPERATOR_FUNC'
LogMessage& LogMessage::operator<<(Type value) {
^
2 warnings and 4 errors generated.
make[2]: *** [ppl.common-build/CMakeFiles/pplcommon_static.dir/src/ppl/common/log.cc.o] Error 1
make[1]: *** [ppl.common-build/CMakeFiles/pplcommon_static.dir/all] Error 2
make: *** [all] Error 2
```
Should I delete all the redeclared class members?

Models in the ONNX model zoo cannot run in ppl.nn

Hi folks. I tried running models from the ONNX model zoo and found that many pretrained models cannot run with ppl.nn; the error varies across models, e.g.:

./pplnn-build/tools/pplnn --onnx-model resnet50-v1-7.onnx
[INFO][2021-07-04 04:43:10.810][pplnn.cc:683] ppl.nn version: v0.1.0-dirty
[WARNING][2021-07-04 04:43:12.138][engine.cc:192] Default input dims for dynamic graph are 1_3_224_224, we recommend using '--dims' to set a suitable training shape.
[INFO][2021-07-04 04:43:12.138][pplnn.cc:88] ***** register CudaEngine *****
[ERROR][2021-07-04 04:43:12.211][model_parser.cc:46] unsupported opset [:8]
[ERROR][2021-07-04 04:43:12.214][runtime_builder_impl.cc:33] parse graph failed: unsupported
[ERROR][2021-07-04 04:43:12.214][onnx_runtime_builder_factory.cc:42] init OnnxRuntimeBuilder failed: unsupported
[ERROR][2021-07-04 04:43:12.219][pplnn.cc:697] create OnnxRuntimeBuilder failed.

or

./pplnn-build/tools/pplnn --onnx-model efficientnet-lite4-11.onnx
[INFO][2021-07-04 04:40:22.270][pplnn.cc:683] ppl.nn version: v0.1.0-dirty
[WARNING][2021-07-04 04:40:23.590][engine.cc:192] Default input dims for dynamic graph are 1_3_224_224, we recommend using '--dims' to set a suitable training shape.
[INFO][2021-07-04 04:40:23.590][pplnn.cc:88] ***** register CudaEngine *****
[ERROR][2021-07-04 04:40:24.453][simple_graph_partitioner.cc:73] cannot find implementation of op[:MatMul]
[ERROR][2021-07-04 04:40:24.453][utils.cc:412] partitioning graph[tf2onnx] failed: not found
[ERROR][2021-07-04 04:40:24.453][runtime_builder_impl.cc:39] process graph failed: not found
[ERROR][2021-07-04 04:40:24.453][onnx_runtime_builder_factory.cc:42] init OnnxRuntimeBuilder failed: not found
[ERROR][2021-07-04 04:40:24.455][pplnn.cc:697] create OnnxRuntimeBuilder failed.

...

What are the restrictions on ONNX versions and opsets, and do you have a support matrix for the operators?
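For models stuck on an old opset, one workaround is to upgrade them with ONNX's version converter before feeding them to pplnn. A minimal sketch, assuming the model's operators survive the conversion (file names taken from the first log above):

import onnx
from onnx import version_converter

# pplnn rejected opset 8; rewrite the model at opset 11
model = onnx.load("resnet50-v1-7.onnx")
converted = version_converter.convert_version(model, 11)
onnx.save(converted, "resnet50-v1-7-opset11.onnx")

Note that this only addresses the opset error; the missing MatMul implementation in the efficientnet log looks like a separate operator-support gap.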

mmdetection model Faster_rcnn failed with pplnn

I followed the documentation for converting models with OpenMMLab and generated a pretrained faster_rcnn model, but when I run this model with x86 pplnn, there is an error saying:

[INFO][2021-07-06 08:48:49.498][pplnn.cc:683] ppl.nn version: 7dd75a1077867fc9a762449953417088446ae2f8-dirty
[INFO][2021-07-06 08:48:49.498][pplnn.cc:110] ***** register X86Engine *****
[INFO][2021-07-06 08:48:49.761][simple_graph_partitioner.cc:90] total partition(s) of graph[torch-jit-export]: 1.
[ERROR][2021-07-06 08:48:50.556][kernel.cc:14] reshape kernel[Expand_1100] failed: invalid value
[ERROR][2021-07-06 08:48:50.556][kernel.cc:47] BeforeExecute() of kernel[Expand_1100] failed: invalid value
[ERROR][2021-07-06 08:48:50.556][scheduler_common.cc:153] exec kernel[Expand_1100] failed: invalid value
[ERROR][2021-07-06 08:48:50.556][sequential_scheduler.cc:99] execute kernel[Expand_1100] failed: invalid value
[ERROR][2021-07-06 08:48:50.556][pplnn.cc:784] Run() failed: invalid value

MobileNet executes successfully. Is there anything I can do to make this model run correctly?

About build time with cuda

I tried to build ppl.nn on a server, and the compile process seems to be stuck at

...
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/PPLKernelX86.dir/src/ppl/kernel/x86/int64/reorder/reorder_n16cx_ndarray_int64_avx.cpp.o
[ 50%] Linking CXX static library libPPLKernelX86.a
[ 50%] Built target PPLKernelX86
Scanning dependencies of target test_conv2d
Scanning dependencies of target test_fc
Scanning dependencies of target test_gemm
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/test_fc.dir/test/test_fc.cpp.o
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/test_fc.dir/__/__/__/__/__/__/tools/simple_flags.cc.o
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/test_conv2d.dir/test/test_conv2d.cpp.o
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/test_gemm.dir/__/__/__/__/__/__/tools/simple_flags.cc.o
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/test_gemm.dir/test/test_gemm.cpp.o
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/test_conv2d.dir/__/__/__/__/__/__/tools/simple_flags.cc.o
[ 50%] Linking CXX executable test_fc
[ 50%] Built target test_fc
[ 50%] Linking CXX executable test_conv2d
[ 51%] Linking CXX executable test_gemm
[ 51%] Built target test_conv2d
[ 51%] Built target test_gemm

I observed that it only uses 1 CPU core, the memory usage is very high (99% of 128GB), and the compile process froze in test_gemm for hours. Does it normally take a very long time to build?
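If the freeze really is in the test binaries, a possible workaround is to skip them at configure time. A sketch, assuming your checkout exposes a PPLNN_BUILD_TESTS option (check the top-level CMakeLists.txt before relying on it):

./build.sh <your original flags> -DPPLNN_BUILD_TESTS=OFF

Also note that a single huge translation unit like test_gemm.cpp cannot be compiled in parallel across cores, which would explain the one-core, high-memory behavior.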

Mask R-CNN failed with pplnn

The model was converted from the mmdetection library. When I try to execute it with pplnn, it shows errors:

[INFO][2021-07-14 17:18:19.999][pplnn.cc:703] ppl.nn version: 5d56662bf5a288898f0dd5b90f763459cc86f47a
[WARNING][2021-07-14 17:18:21.873][engine.cc:209] Default input dims for dynamic graph are 1_3_224_224, we recommend using '--dims' to set a suitable training shape.
[INFO][2021-07-14 17:18:21.873][pplnn.cc:104] ***** register CudaEngine *****
[INFO][2021-07-14 17:18:22.320][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export]: 1.
[ERROR][2021-07-14 17:18:22.322][reshape_reshape.cc:66] infer shape failed.
[ERROR][2021-07-14 17:18:22.338][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.339][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.340][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.341][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.341][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.342][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.342][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.342][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.342][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.343][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.343][reshape_unsqueeze.cc:36] axes overflow.
[ERROR][2021-07-14 17:18:22.343][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.344][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.344][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.345][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.345][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.345][reshape_split.cc:59] splited axis and sum of split point not match.
[INFO][2021-07-14 17:18:22.346][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export1]: 1.
[INFO][2021-07-14 17:18:22.346][opt_graph.cc:204] Create 2 TensorImpl
[INFO][2021-07-14 17:18:22.346][opt_graph.cc:316] added 2 new bridge kernels
[INFO][2021-07-14 17:18:22.346][opt_graph.cc:478] deleted 1 bridge kernels
[INFO][2021-07-14 17:18:22.347][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export2]: 1.
[INFO][2021-07-14 17:18:22.347][opt_graph.cc:204] Create 20 TensorImpl
[INFO][2021-07-14 17:18:22.347][opt_graph.cc:316] added 21 new bridge kernels
[INFO][2021-07-14 17:18:22.347][opt_graph.cc:478] deleted 14 bridge kernels
[ERROR][2021-07-14 17:18:22.348][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.348][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.348][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.349][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.349][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.349][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.350][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.350][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.350][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.389][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.389][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.390][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.390][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.390][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.391][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.391][reshape_unsqueeze.cc:36] axes overflow.
[ERROR][2021-07-14 17:18:22.391][reshape_unsqueeze.cc:36] axes overflow.
[INFO][2021-07-14 17:18:22.392][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export3]: 1.
[INFO][2021-07-14 17:18:22.392][opt_graph.cc:204] Create 2 TensorImpl
[INFO][2021-07-14 17:18:22.392][opt_graph.cc:316] added 2 new bridge kernels
[INFO][2021-07-14 17:18:22.392][opt_graph.cc:478] deleted 1 bridge kernels
[INFO][2021-07-14 17:18:22.392][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export4]: 1.
[INFO][2021-07-14 17:18:22.393][opt_graph.cc:204] Create 20 TensorImpl
[INFO][2021-07-14 17:18:22.393][opt_graph.cc:316] added 21 new bridge kernels
[INFO][2021-07-14 17:18:22.408][opt_graph.cc:478] deleted 14 bridge kernels
[ERROR][2021-07-14 17:18:22.408][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.409][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.409][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.409][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.410][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.410][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.410][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.411][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.411][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.413][reshape_split.cc:59] splited axis and sum of split point not match.
[INFO][2021-07-14 17:18:22.426][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export5]: 1.
[ERROR][2021-07-14 17:18:22.426][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.427][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.427][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.427][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.427][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.428][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.428][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.428][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.429][reshape_concat.cc:42] input shape not match.
[INFO][2021-07-14 17:18:22.429][opt_graph.cc:204] Create 135 TensorImpl
[INFO][2021-07-14 17:18:22.430][opt_graph.cc:316] added 174 new bridge kernels
[INFO][2021-07-14 17:18:22.433][opt_graph.cc:478] deleted 153 bridge kernels
[INFO][2021-07-14 17:18:22.434][opt_graph.cc:204] Create 2263 TensorImpl
[INFO][2021-07-14 17:18:22.660][opt_graph.cc:316] added 2626 new bridge kernels
[INFO][2021-07-14 17:20:05.963][opt_graph.cc:478] deleted 2547 bridge kernels
[ERROR][2021-07-14 17:20:06.007][scheduler_common.cc:170] exec kernel[Pad_146] failed: invalid value
[ERROR][2021-07-14 17:20:06.007][sequential_scheduler.cc:116] execute kernel[Pad_146] failed: invalid value
[ERROR][2021-07-14 17:20:06.007][pplnn.cc:804] Run() failed: invalid value

I'm running it with real image data. Does pplnn support Mask R-CNN, or what should I do to execute it successfully?
Thanks a lot!
The model was generated by this command:

python ../tools/deployment/pytorch2onnx.py ../configs/mask_rcnn/mask_rcnn_r50_fpn_mstrain-poly_3x_coco.py \
mask_rcnn_r50_fpn_mstrain-poly_3x_coco_20210524_201154-21b550bb.pth \
--output-file mask_rcnn.onnx --simplify --dynamic-export
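Since the warning in the log explicitly recommends '--dims', it may help to pass the shape the model was actually exported and trained with instead of the default 1_3_224_224. A sketch, where 1_3_800_1344 is only a hypothetical mmdetection test shape; substitute your own:

./pplnn-build/tools/pplnn --onnx-model mask_rcnn.onnx \
    --dims 1_3_800_1344 --in-shapes 1_3_800_1344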

pplnn failed with CUDA

I tried CUDA 10.2 and CUDA 11.0; both failed at
"[ERROR][2021-07-05 06:38:36.723][buffered_cuda_allocator.cc:91] cuMemAddressReserve failed: operation not supported"
What should I do to avoid this error?

My environment is CentOS 8 with GCC 8.4, and the GPU device is a Tesla T4.
BTW, the x86 version runs successfully.

PPL is not support your GPU device right now

[INFO][2021-09-28 03:27:34.466][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-09-28 03:27:34.466][opt_graph.cc:202] Create 4 TensorImpl
[INFO][2021-09-28 03:27:34.466][opt_graph.cc:313] added 4 new bridge kernels
[ERROR][2021-09-28 03:27:34.467][opt_graph.cc:472] PPL is not support your GPU device right now.
[ERROR][2021-09-28 03:27:34.467][opt_graph.cc:630] Selec algos for each kernel failed: unsupported
[ERROR][2021-09-28 03:27:34.467][engine.cc:58] OptGraph DoOptimeize failed: unsupported
[ERROR][2021-09-28 03:27:34.467][engine.cc:68] DoOptimize failed: unsupported
[ERROR][2021-09-28 03:27:34.467][utils.cc:257] process graph[torch-jit-export] by engine[cuda] failed: unsupported
[ERROR][2021-09-28 03:27:34.467][utils.cc:467] GenPartitionsInfo failed:unsupported
[ERROR][2021-09-28 03:27:34.467][runtime_builder_impl.cc:51] process graph failed: unsupported
[ERROR][2021-09-28 03:27:34.467][onnx_runtime_builder_factory.cc:58] init RuntimeBuilder failed: unsupported
ERROR: create RuntimeBuilder failed.

Lower performance when compiled with clang (Darwin)

Continued with work #20:
I tried with an iMac 2018 (Intel Core i7 [email protected]); it shows abnormal performance compared with other inference engines (OpenVINO / TNN):

  • Model: MobileNetV1
  • Input: images with [3, 224, 224]
  • TNN / OpenVINO run at 10~30 ms per image (already warmed up)
  • pplnn runs at 100+ ms per image.
$ ./pplnn-build/tools/pplnn --onnx-model data/mobilenet_1.0_224.onnx \
                --reshaped-inputs data/input-1_3_224_224-fp32.dat \
                --mm-policy perf \
                --warmuptimes 100 \
                --core-binding \
                --enable-profiling
[INFO][2021-07-23 14:16:09.793][pplnn.cc:708] ppl.nn version: eb685a9da839b3c74b4c1e36b571c4c652cfba0c
[INFO][2021-07-23 14:16:09.802][pplnn.cc:131] ***** register X86Engine *****
[INFO][2021-07-23 14:16:09.861][simple_graph_partitioner.cc:107] total partition(s) of graph[./mobilenet_1.0_224.onnx]: 1.
[INFO][2021-07-23 14:16:10.200][pplnn.cc:548] ----- input info -----
[INFO][2021-07-23 14:16:10.200][pplnn.cc:551] input[0]:
[INFO][2021-07-23 14:16:10.200][pplnn.cc:552]     name: input
[INFO][2021-07-23 14:16:10.200][pplnn.cc:559]     dim(s): 1 3 224 224
[INFO][2021-07-23 14:16:10.200][pplnn.cc:561]     DataType: FLOAT32
[INFO][2021-07-23 14:16:10.200][pplnn.cc:562]     DataFormat: NDARRAY
[INFO][2021-07-23 14:16:10.200][pplnn.cc:563]     NumBytesIncludePadding: 602112
[INFO][2021-07-23 14:16:10.200][pplnn.cc:564]     NumBytesExcludePadding: 602112
[INFO][2021-07-23 14:16:10.200][pplnn.cc:567] ----- output info -----
[INFO][2021-07-23 14:16:10.200][pplnn.cc:570] output[0]:
[INFO][2021-07-23 14:16:10.200][pplnn.cc:571]     name: prob_Y
[INFO][2021-07-23 14:16:10.200][pplnn.cc:578]     dim(s): 1 1000 1 1
[INFO][2021-07-23 14:16:10.200][pplnn.cc:580]     DataType: FLOAT32
[INFO][2021-07-23 14:16:10.200][pplnn.cc:581]     DataFormat: NDARRAY
[INFO][2021-07-23 14:16:10.200][pplnn.cc:582]     NumBytesIncludePadding: 4000
[INFO][2021-07-23 14:16:10.200][pplnn.cc:583]     NumBytesExcludePadding: 4000
[INFO][2021-07-23 14:16:10.200][pplnn.cc:586] ----------------------
[INFO][2021-07-23 14:16:10.200][pplnn.cc:820] Run() costs: 232.927002 ms.
[INFO][2021-07-23 14:16:10.200][pplnn.cc:828] Run ok
[INFO][2021-07-23 14:16:10.200][pplnn.cc:832] Warm up start for 100 times.
[INFO][2021-07-23 14:16:20.537][pplnn.cc:839] Warm up end.
[INFO][2021-07-23 14:16:20.537][pplnn.cc:847] Profiling start
[INFO][2021-07-23 14:16:21.546][pplnn.cc:863] Duration: 1009.201000 ms
[INFO][2021-07-23 14:16:21.546][pplnn.cc:873] Average run cost: 112.133444 ms.
[INFO][2021-07-23 14:16:21.546][pplnn.cc:876] Profiling End

I've done profiling with tools/pplnn; it shows that pplnn spends 85% of its time on nopw instructions in the conv2d of MobileNetV1.
The source code is src/ppl/nn/engines/x86/impls/src/ppl/kernel/x86/fp32/conv2d/gemm_direct/fma/conv2d_n16cx_gemm_direct_kernel_fp32_fma.cpp

objdump shows:

$ objdump -D conv2d_n16cx_gemm_direct_kernel_fp32_fma.cpp.o | wc -l
   17796
$ objdump -D conv2d_n16cx_gemm_direct_kernel_fp32_fma.cpp.o | grep nopw | wc -l
    8931

conv2d_n16cx_gemm_direct_kernel_fp32_fma.cpp.S:
...
    4e3a: 4c 89 e0                     	movq	%r12, %rax
    4e3d: 48 8b 5f 18                  	movq	24(%rdi), %rbx
    4e41: 4c 8b 16                     	movq	(%rsi), %r10
    4e44: 49 83 fa 10                  	cmpq	$16, %r10
    4e48: 0f 8c 87 b9 00 00            	jl	0x107d5 <__ZN3ppl6kernel3x8647conv2d_n16cx_gemm_direct_fp32_fma_blk1x6_kernelILb0ELi16ELi6EEEvPKxS4_+0xbab5>
    4e4e: 66 2e 0f 1f 84 00 00 00 00 00	nopw	%cs:(%rax,%rax)
    4e58: 66 2e 0f 1f 84 00 00 00 00 00	nopw	%cs:(%rax,%rax)
    4e62: 66 2e 0f 1f 84 00 00 00 00 00	nopw	%cs:(%rax,%rax)
... (nopw 4e63 - ffeb) 
    ffec: 66 2e 0f 1f 84 00 00 00 00 00	nopw	%cs:(%rax,%rax)
    fff6: 66 2e 0f 1f 84 00 00 00 00 00	nopw	%cs:(%rax,%rax)
   10000: c5 7c 10 33                  	vmovups	(%rbx), %ymm14
   10004: c5 7c 10 7b 20               	vmovups	32(%rbx), %ymm15
...
... (nopw 157de - 1fff4) 

I think maybe clang tried to align the instruction blocks, but the padding is too large for this kernel, and I didn't find any clang options that might be causing this problem (yeah, I've tried -O0 and compiling only this file).

Any ideas?
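One experiment that might isolate the codegen issue: wipe the build and rebuild with GCC for comparison. A sketch, assuming build.sh forwards its options to cmake, which honors CC/CXX on a fresh configure:

rm -rf pplnn-build
CC=gcc CXX=g++ ./build.sh -DHPCC_USE_X86_64=ON

If the GCC build doesn't emit the huge nopw runs, that would point squarely at clang's alignment padding.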

error: no matching function for call to ‘ppl::nn::OnnxRuntimeBuilderFactory::Create

While installing mmdeploy, running cmake --build . -- -j$(nproc) && cmake --install . reports an error:
/home/zcc/mmdeploy/csrc/net/ppl/ppl_net.cpp:77:89: error: no matching function for call to ‘ppl::nn::OnnxRuntimeBuilderFactory::Create(char*, std::__cxx11::basic_string::size_type, ppl::nn::Engine**, std::vectorppl::nn::Engine*::size_type)’
onnx.data(), onnx.size(), engines.data(), engines.size())));
^

Windows build error

Hello:
Building on Windows 10 with VS2017 fails with an undefined-identifier error:
E0020: identifier "_mm512_floor_ps" is undefined (pplkernelx86_static, ppl.nn\src\ppl\nn\engines\x86\impls\src\ppl\kernel\x86\common\math_avx512.h, line 103)

Inference on vgg16 with batch size 32 took more than 1 hour before running actual inference

Steps to reproduce

./pplnn-build/tools/pplnn --in-shapes 32_3_224_224 --dims 32_3_224_224 --warmuptimes 200 --runningtimes 200 --onnx-model vgg16.onnx
[INFO][2021-07-05 08:31:30.885][pplnn.cc:683] ppl.nn version: v0.1.0-dirty
[INFO][2021-07-05 08:31:32.207][pplnn.cc:88] ***** register CudaEngine *****
[INFO][2021-07-05 08:31:32.940][simple_graph_partitioner.cc:90] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-07-05 08:31:33.295][opt_graph.cc:187] Create 71 TensorImpl
[INFO][2021-07-05 08:31:33.295][opt_graph.cc:299] added 56 new bridge kernels
[INFO][2021-07-05 09:46:30.989][opt_graph.cc:461] deleted 52 bridge kernels
[INFO][2021-07-05 09:46:46.325][pplnn.cc:523] ----- input info -----
[INFO][2021-07-05 09:46:46.326][pplnn.cc:526] input[0]:
[INFO][2021-07-05 09:46:46.326][pplnn.cc:527]     name: input.1
[INFO][2021-07-05 09:46:46.326][pplnn.cc:534]     dim(s): 32 3 224 224
[INFO][2021-07-05 09:46:46.326][pplnn.cc:536]     DataType: FLOAT32
[INFO][2021-07-05 09:46:46.326][pplnn.cc:537]     DataFormat: NDARRAY
[INFO][2021-07-05 09:46:46.326][pplnn.cc:538]     NumBytesIncludePadding: 19267584
[INFO][2021-07-05 09:46:46.326][pplnn.cc:539]     NumBytesExcludePadding: 19267584
[INFO][2021-07-05 09:46:46.326][pplnn.cc:542] ----- output info -----
[INFO][2021-07-05 09:46:46.326][pplnn.cc:545] output[0]:
[INFO][2021-07-05 09:46:46.326][pplnn.cc:546]     name: 70
[INFO][2021-07-05 09:46:46.326][pplnn.cc:553]     dim(s): 32 1000
[INFO][2021-07-05 09:46:46.326][pplnn.cc:555]     DataType: FLOAT32
[INFO][2021-07-05 09:46:46.326][pplnn.cc:556]     DataFormat: NDARRAY
[INFO][2021-07-05 09:46:46.326][pplnn.cc:557]     NumBytesIncludePadding: 128000
[INFO][2021-07-05 09:46:46.326][pplnn.cc:558]     NumBytesExcludePadding: 128000
[INFO][2021-07-05 09:46:46.326][pplnn.cc:561] ----------------------
[INFO][2021-07-05 09:46:46.326][pplnn.cc:791] Run() costs: 9175.929688 ms.
[INFO][2021-07-05 09:46:46.326][pplnn.cc:799] Run ok

As shown in the log, the run started at 08:31 and inference only began at 09:46, so preparation took 75 minutes. Is this normal? The model was imported from torchvision and exported to ONNX:

import torch
import torchvision

# export VGG16 with a fixed batch size of 32
dummy_input = torch.randn(32, 3, 224, 224)
model = torchvision.models.vgg16(pretrained=True)
model.eval()
torch.onnx.export(model, dummy_input, "vgg16.onnx", opset_version=11)

Also, when testing with batch size = 1, the time is pretty normal.

# ./pplnn-build/tools/pplnn --onnx-model vgg16.onnx --in-shapes 1_3_224_224 --dims 1_3_224_224 --warmuptimes 100 --runningtimes 100
[INFO][2021-07-05 05:21:44.428][pplnn.cc:683] ppl.nn version: v0.1.0-dirty
[INFO][2021-07-05 05:21:46.437][pplnn.cc:88] ***** register CudaEngine *****
[INFO][2021-07-05 05:21:47.230][simple_graph_partitioner.cc:90] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-07-05 05:21:47.511][opt_graph.cc:187] Create 71 TensorImpl
[INFO][2021-07-05 05:21:47.511][opt_graph.cc:299] added 56 new bridge kernels
[INFO][2021-07-05 05:24:30.634][opt_graph.cc:461] deleted 52 bridge kernels
[INFO][2021-07-05 05:24:31.300][pplnn.cc:523] ----- input info -----
[INFO][2021-07-05 05:24:31.300][pplnn.cc:526] input[0]:
[INFO][2021-07-05 05:24:31.300][pplnn.cc:527]     name: input.1
[INFO][2021-07-05 05:24:31.300][pplnn.cc:534]     dim(s): 1 3 224 224
[INFO][2021-07-05 05:24:31.300][pplnn.cc:536]     DataType: FLOAT32
[INFO][2021-07-05 05:24:31.300][pplnn.cc:537]     DataFormat: NDARRAY
[INFO][2021-07-05 05:24:31.300][pplnn.cc:538]     NumBytesIncludePadding: 602112
[INFO][2021-07-05 05:24:31.300][pplnn.cc:539]     NumBytesExcludePadding: 602112
[INFO][2021-07-05 05:24:31.300][pplnn.cc:542] ----- output info -----
[INFO][2021-07-05 05:24:31.300][pplnn.cc:545] output[0]:
[INFO][2021-07-05 05:24:31.300][pplnn.cc:546]     name: 70
[INFO][2021-07-05 05:24:31.300][pplnn.cc:553]     dim(s): 1 1000
[INFO][2021-07-05 05:24:31.300][pplnn.cc:555]     DataType: FLOAT32
[INFO][2021-07-05 05:24:31.300][pplnn.cc:556]     DataFormat: NDARRAY
[INFO][2021-07-05 05:24:31.300][pplnn.cc:557]     NumBytesIncludePadding: 4000
[INFO][2021-07-05 05:24:31.300][pplnn.cc:558]     NumBytesExcludePadding: 4000
[INFO][2021-07-05 05:24:31.300][pplnn.cc:561] ----------------------
[INFO][2021-07-05 05:24:31.300][pplnn.cc:791] Run() costs: 344.269989 ms.
[INFO][2021-07-05 05:24:31.300][pplnn.cc:799] Run ok
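Judging from the timestamps, the 75 minutes are spent between "added 56 new bridge kernels" (08:31:33) and "deleted 52 bridge kernels" (09:46:30), i.e. inside the CUDA engine's per-shape kernel selection. If your build exposes it, the pplnn tool's --quick-select option is supposed to skip the exhaustive algorithm search at the cost of slower kernels. A hedged sketch:

./pplnn-build/tools/pplnn --onnx-model vgg16.onnx \
    --in-shapes 32_3_224_224 --dims 32_3_224_224 --quick-select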

Linux build problems

Problems encountered when running pplnn's build.sh on tlinux:

  1. pplnn_cpu/src/ppl/nn/engines/x86/impls/src/ppl/kernel/x86/common/simd_tools.cpp:15:45: error: '_MM_DENORMALS_ZERO_ON' was not declared in this scope
  2. from /home/shiweifan/qiao/pplnn_cpu/src/ppl/nn/engines/x86/impls/src/ppl/kernel/x86/fp32/arithmetic/sse/arithmetic_fp32_sse.cpp:1:
    /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/nmmintrin.h:31:3: error: #error "SSE4.2 instruction set not enabled"
    I modified CMakeLists.txt:
    if(CMAKE_COMPILER_IS_GNUCXX)
    add_compile_options(-msse4.2)
    message(STATUS "optional:-msse4.2")
    endif(CMAKE_COMPILER_IS_GNUCXX)
    and then hit the next problem:
    Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/PPLKernelX86.dir/src/ppl/kernel/x86/fp32/arithmetic/avx/arithmetic_fp32_avx.cpp.o
    c++: error: unrecognized command line option '-mtune-ctrl=256_unaligned_load_optimal,256_unaligned_store_optimal'

Has anyone seen similar problems, and how can they be resolved?
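For what it's worth, both '_MM_DENORMALS_ZERO_ON' not being declared and the unrecognized '-mtune-ctrl=...' option point at the system GCC 4.8.5 being too old for the x86 kernels. A common workaround on CentOS-like systems, assuming a devtoolset software collection is installed (the version and path here are only an example):

source /opt/rh/devtoolset-7/enable   # pick up a newer gcc/g++
rm -rf pplnn-build && ./build.sh -DHPCC_USE_X86_64=ON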

Win32 cmake error

Win64 builds fine, but the Win32 cmake step fails to generate the project files.
I tried updating cmake to the latest version, but the error persists.

PS D:\ppl.nn> cmake --version
cmake version 3.23.0-rc2

The command and errors are as follows:

PS D:\ppl.nn> .\build.bat -G "Visual Studio 16 2019" -A Win32 -DHPCC_USE_X86_64=ON

D:\ppl.nn>md pplnn-build

D:\ppl.nn>cd pplnn-build

D:\ppl.nn\pplnn-build>cmake -G "Visual Studio 16 2019" -A Win32 -DHPCC_USE_X86_64=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install ..
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19043.
-- The C compiler identification is MSVC 19.29.30038.1
-- The CXX compiler identification is MSVC 19.29.30038.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30037/bin/Hostx64/x86/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30037/bin/Hostx64/x86/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Populating hpcc
CMake Error: Error: generator platform: Win32
Does not match the platform used previously:
Either remove the CMakeCache.txt file and CMakeFiles directory or choose a different binary directory.
CMake Error at C:/Program Files/CMake/share/cmake-3.23/Modules/FetchContent.cmake:1076 (message):
  CMake step for hpcc failed: 1
Call Stack (most recent call first):
  C:/Program Files/CMake/share/cmake-3.23/Modules/FetchContent.cmake:1217:EVAL:2 (__FetchContent_directPopulate)
  C:/Program Files/CMake/share/cmake-3.23/Modules/FetchContent.cmake:1217 (cmake_language)
  cmake/deps.cmake:59 (FetchContent_Populate)
  CMakeLists.txt:39 (include)


-- Configuring incomplete, errors occurred!
See also "D:/ppl.nn/pplnn-build/CMakeFiles/CMakeOutput.log".

D:\ppl.nn\pplnn-build>cmake --build . -j --config Release
Microsoft (R) Build Engine version 16.10.2+857e5a733 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

MSBUILD : error MSB1009: The project file does not exist.
Switch: ALL_BUILD.vcxproj

D:\ppl.nn\pplnn-build>cmake --build . --target install -j --config Release
Microsoft (R) Build Engine version 16.10.2+857e5a733 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

MSBUILD : error MSB1009: The project file does not exist.
Switch: install.vcxproj

D:\ppl.nn\pplnn-build>cd ..
PS D:\ppl.nn>
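The key line is "Does not match the platform used previously": the hpcc sub-build fetched under pplnn-build still caches the earlier x64 configuration. Per the error message's own advice, removing the stale build tree before reconfiguring for Win32 should clear it:

Remove-Item -Recurse -Force pplnn-build
.\build.bat -G "Visual Studio 16 2019" -A Win32 -DHPCC_USE_X86_64=ON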

centernet runs with memory error.

My GPU is a Tesla T4, and the sample model runs normally.
When I use CenterNet with --mm-policy=mem, it reports an error like the one below, but it still produces an output.
(screenshot of the error output not included)
When I use --mm-policy=perf, it gets an out-of-memory error like this:
(screenshot of the error output not included)
Both runs seem to end with memory errors. Is this error familiar to your team, or how can I avoid it?

No module named 'pyppl'

Ubuntu 16.04.6
cmake: 3.16.6
Anaconda virtual environment: PyTorch 1.7 + cu101

First, git clone https://github.com/openppl-public/ppl.nn.git

Then ./build.sh -DHPCC_USE_X86_64=ON -DPPLNN_ENABLE_PYTHON_API=ON

Finally, run the command PYTHONPATH=./pplnn-build/install python ./samples/python/maskrcnn_onnx/run_maskrcnn_onnx.py

It raises the error:
(screenshot showing: ModuleNotFoundError: No module named 'pyppl')
How can I get Python to find the module named 'pyppl'?
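pyppl is a compiled extension placed somewhere under the install tree, so the interpreter must be pointed at the directory that actually contains the pyppl package. A minimal sketch, assuming it landed under pplnn-build/install/lib (locate the real path first, e.g. with find pplnn-build/install -name "pyppl*"):

import sys

# make the directory containing the 'pyppl' package importable;
# adjust the path to wherever your build actually installed it
sys.path.insert(0, "./pplnn-build/install/lib")

from pyppl import nn as pplnn  # import style used by the bundled samples
print(pplnn)  # should now import without ModuleNotFoundError

If the package sits elsewhere (some builds put it directly under install), point PYTHONPATH or sys.path there instead.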
