
ppl.nn's People

Contributors

a1trl9, alcanderian, czhappy, gilsaia, grimoire, haojiangzheng, icthu, jianfei-wangg, jiaomingjun, johnxusjtu, kyu-junyi, litianjian, liuhaoss, lzhangzz, mochiset, moran232, nihui, openppl-public, ouonline, si-xu, sunnyligithub, tangyanf, violetevergardenyyh, watersounds, xupinjie, ysh329, yxpandjay, zchrissirhcz, zhiqwang, zichentian


ppl.nn's Issues

Suggestion regarding git

A suggestion: could the online `git clone` of hpcc during the build be changed to an offline approach? The current method is rather unfriendly to intranet environments with no network access.
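
A possible workaround, sketched here rather than an official feature: the build pulls hpcc through CMake's standard FetchContent machinery, which honors FETCHCONTENT_SOURCE_DIR_<NAME> for pre-downloaded sources. Clone hpcc once on a connected machine, copy it into the intranet environment, and point the build at it (the repository URL and local path below are assumptions; verify against your checkout's deps.cmake):

```bash
# On a machine with network access (URL assumed from deps.cmake):
git clone https://github.com/openppl-public/hpcc.git
# Copy the checkout to the offline host, then build against it;
# FETCHCONTENT_SOURCE_DIR_HPCC is standard CMake, not a ppl.nn flag.
./build.sh -DFETCHCONTENT_SOURCE_DIR_HPCC=/path/to/local/hpcc
```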

The inference performance of pplnn seems inconsistent for the same model

Hi guys! I used pplnn to test the inference performance of a resnet50 model, but I got noticeably different results when I ran pplnn five times consecutively. The average latency of a single query ranges from 0.37 ms to 2.03 ms, which is unexpected and confusing.

Device: RTX 3090
CUDA: 11.1
Command:

pplnn --warmuptimes 100 --runningtimes 100 --enable-profiling --dims 1_3_224_224 --in-shapes 1_3_224_224 --onnx-model model.onnx

Logs:
(screenshot of the resnet50 profiling results omitted)
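
A hedged thought rather than a confirmed diagnosis: short GPU runs are sensitive to clock ramp-up, and the --min-profiling-time flag that appears in other reports on this page (flag names may vary between pplnn versions) lengthens the measurement window, which tends to stabilize the reported average:

```bash
# Same model and shapes as above, but with a 10-second minimum profiling window.
pplnn --onnx-model model.onnx --dims 1_3_224_224 --in-shapes 1_3_224_224 \
      --enable-profiling --warmuptimes 100 --min-profiling-time 10
```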

PRelu / Dropout / Upsample: ParseNodeInfo for node unsupported

(base) [qiao@VM-238-190-centos ~/ppl.nn]$ ./pplnn-build/tools/pplnn --onnx-model test.onnx --mm-policy mem --min-profiling-time 10 --warmuptimes 5 --core-binding --disable-avx512
[INFO][2021-07-08 20:21:45.381][pplnn.cc:700] ppl.nn version: 513e612
[INFO][2021-07-08 20:21:45.381][pplnn.cc:127] ***** register X86Engine *****
[ERROR][2021-07-08 20:21:45.384][graph_parser.cc:169] can not find param parser info of type[:PRelu]
[ERROR][2021-07-08 20:21:45.384][graph_parser.cc:195] ParseNodeInfo for node[] failed: unsupported
[ERROR][2021-07-08 20:21:45.384][graph_parser.cc:262] ParseGraphNode failed.
[ERROR][2021-07-08 20:21:45.384][model_parser.cc:76] parse graph failed: unsupported
[ERROR][2021-07-08 20:21:45.385][runtime_builder_impl.cc:50] parse graph failed: unsupported
[ERROR][2021-07-08 20:21:45.385][onnx_runtime_builder_factory.cc:59] init OnnxRuntimeBuilder failed: unsupported
[ERROR][2021-07-08 20:21:45.385][pplnn.cc:714] create OnnxRuntimeBuilder failed.
(base) [qiao@VM-238-190-centos ~/ppl.nn]$ ./pplnn-build/tools/pplnn --onnx-model ~/test.onnx --mm-policy mem --min-profiling-time 10 --warmuptimes 5 --core-binding --disable-avx512
[INFO][2021-07-08 20:49:47.035][pplnn.cc:700] ppl.nn version: 513e612
[INFO][2021-07-08 20:49:47.035][pplnn.cc:127] ***** register X86Engine *****
[ERROR][2021-07-08 20:49:47.039][graph_parser.cc:169] can not find param parser info of type[:Neg]
[ERROR][2021-07-08 20:49:47.039][graph_parser.cc:195] ParseNodeInfo for node[] failed: unsupported
[ERROR][2021-07-08 20:49:47.039][graph_parser.cc:262] ParseGraphNode failed.
[ERROR][2021-07-08 20:49:47.039][model_parser.cc:76] parse graph failed: unsupported
[ERROR][2021-07-08 20:49:47.039][runtime_builder_impl.cc:50] parse graph failed: unsupported
[ERROR][2021-07-08 20:49:47.039][onnx_runtime_builder_factory.cc:59] init OnnxRuntimeBuilder failed: unsupported
[ERROR][2021-07-08 20:49:47.040][pplnn.cc:714] create OnnxRuntimeBuilder failed.
(base) [qiao@VM-238-190-centos ~/ppl.nn]$ ./pplnn-build/tools/pplnn --onnx-model ~/test.onnx --mm-policy mem --min-profiling-time 10 --warmuptimes 5 --core-binding --disable-avx512
[INFO][2021-07-08 20:55:12.258][pplnn.cc:700] ppl.nn version: 513e612
[INFO][2021-07-08 20:55:12.258][pplnn.cc:127] ***** register X86Engine *****
[ERROR][2021-07-08 20:55:12.266][graph_parser.cc:169] can not find param parser info of type[:Dropout]
[ERROR][2021-07-08 20:55:12.266][graph_parser.cc:195] ParseNodeInfo for node[] failed: unsupported
[ERROR][2021-07-08 20:55:12.266][graph_parser.cc:262] ParseGraphNode failed.
[ERROR][2021-07-08 20:55:12.266][model_parser.cc:76] parse graph failed: unsupported
[ERROR][2021-07-08 20:55:12.267][runtime_builder_impl.cc:50] parse graph failed: unsupported
[ERROR][2021-07-08 20:55:12.267][onnx_runtime_builder_factory.cc:59] init OnnxRuntimeBuilder failed: unsupported
[ERROR][2021-07-08 20:55:12.268][pplnn.cc:714] create OnnxRuntimeBuilder failed.
(base) [qiao@VM-238-190-centos ~/ppl.nn]$ ./pplnn-build/tools/pplnn --onnx-model ~/test.onnx --mm-policy mem --min-profiling-time 10 --warmuptimes 5 --core-binding --disable-avx512
[INFO][2021-07-08 20:58:17.767][pplnn.cc:700] ppl.nn version: 513e612
[INFO][2021-07-08 20:58:17.767][pplnn.cc:127] ***** register X86Engine *****
[ERROR][2021-07-08 20:58:17.770][graph_parser.cc:169] can not find param parser info of type[:Upsample]
[ERROR][2021-07-08 20:58:17.770][graph_parser.cc:195] ParseNodeInfo for node[] failed: unsupported
[ERROR][2021-07-08 20:58:17.770][graph_parser.cc:262] ParseGraphNode failed.
[ERROR][2021-07-08 20:58:17.770][model_parser.cc:76] parse graph failed: unsupported
[ERROR][2021-07-08 20:58:17.770][runtime_builder_impl.cc:50] parse graph failed: unsupported
[ERROR][2021-07-08 20:58:17.771][onnx_runtime_builder_factory.cc:59] init OnnxRuntimeBuilder failed: unsupported
[ERROR][2021-07-08 20:58:17.771][pplnn.cc:714] create OnnxRuntimeBuilder failed.

Hi, are the PRelu, Dropout, and Upsample layers failing because of a problem in the ONNX model files, or are they currently unsupported?

[x86-compile] error: impossible constraint in ‘asm’

I tried to compile the latest master branch.

CPU                        | result
Core i5-9500 (no AVX-512)  | error: impossible constraint in ‘asm’
Xeon 6130 (AVX-512)        | pass

I found that the latest commit adds AVX-512 support. If this is a bug, will ppl support more CPUs (without AVX-512), and is there a macro to separate the AVX-512 code paths?
Thanks.
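
For reference, a quick way to check whether a given host exposes AVX-512 at all (standard Linux procfs, nothing ppl-specific):

```bash
# Lists the AVX-512 feature flags advertised by the CPU, if any.
grep -o 'avx512[a-z]*' /proc/cpuinfo | sort -u
```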

WSL compile error: failed to convert GOTPCREL relocation; relink with --no-relax

I want to compile the classification sample under the path ppl.nn/pplnn-build/samples/cpp/run_model, but I get this issue.
The compile log is:
```
[ 1%] Built target pplcommon_static
Consolidate compiler generated dependencies of target PPLCUDAKernel
[ 10%] Built target PPLCUDAKernel
[ 18%] Built target libprotobuf
[ 43%] Built target PPLKernelX86
Consolidate compiler generated dependencies of target pplnn_static
[100%] Built target pplnn_static
Consolidate compiler generated dependencies of target classification
[100%] Linking CXX executable classification
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
collect2: error: ld returned 1 exit status
samples/cpp/run_model/CMakeFiles/classification.dir/build.make:150: recipe for target 'samples/cpp/run_model/classification' failed
make[2]: *** [samples/cpp/run_model/classification] Error 1
CMakeFiles/Makefile2:1040: recipe for target 'samples/cpp/run_model/CMakeFiles/classification.dir/all' failed
make[1]: *** [samples/cpp/run_model/CMakeFiles/classification.dir/all] Error 2
Makefile:135: recipe for target 'all' failed
make: *** [all] Error 2

```
My CUDA version is 11.0.
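
A hedged workaround suggested by the linker message itself: pass --no-relax through CMake's standard linker-flags variable (this is generic CMake, not a ppl.nn-specific option):

```bash
# build.sh forwards -D arguments to cmake; -Wl,--no-relax hands the flag to ld.
./build.sh -DCMAKE_EXE_LINKER_FLAGS="-Wl,--no-relax"
```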

How to quantize to INT8 / FP16

How do I quantize to INT8 / FP16?

Define_string_opt("--quantization", g_flag_quantization, "", "declare **json file** saved quantization information");

What does the JSON file look like?

Why can't the sample model tests/testdata/conv.onnx be profiled?

I built the ppl.nn project and tried a test with ./pplnn-build/tools/pplnn --onnx-model tests/testdata/conv.onnx; it works normally:

[INFO][2021-07-05 21:47:39.764][pplnn.cc:683] ppl.nn version: 7dd75a1077867fc9a762449953417088446ae2f8-dirty
[INFO][2021-07-05 21:47:39.764][pplnn.cc:110] ***** register X86Engine *****
[INFO][2021-07-05 21:47:39.764][simple_graph_partitioner.cc:90] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-07-05 21:47:39.764][pplnn.cc:523] ----- input info -----
[INFO][2021-07-05 21:47:39.764][pplnn.cc:526] input[0]:
[INFO][2021-07-05 21:47:39.764][pplnn.cc:527]     name: input
[INFO][2021-07-05 21:47:39.764][pplnn.cc:534]     dim(s): 1 3 4 4
[INFO][2021-07-05 21:47:39.764][pplnn.cc:536]     DataType: FLOAT32
[INFO][2021-07-05 21:47:39.764][pplnn.cc:537]     DataFormat: NDARRAY
[INFO][2021-07-05 21:47:39.764][pplnn.cc:538]     NumBytesIncludePadding: 192
[INFO][2021-07-05 21:47:39.764][pplnn.cc:539]     NumBytesExcludePadding: 192
[INFO][2021-07-05 21:47:39.764][pplnn.cc:542] ----- output info -----
[INFO][2021-07-05 21:47:39.764][pplnn.cc:545] output[0]:
[INFO][2021-07-05 21:47:39.764][pplnn.cc:546]     name: 5
[INFO][2021-07-05 21:47:39.764][pplnn.cc:553]     dim(s): 1 3 5 5
[INFO][2021-07-05 21:47:39.764][pplnn.cc:555]     DataType: FLOAT32
[INFO][2021-07-05 21:47:39.764][pplnn.cc:556]     DataFormat: N16CX
[INFO][2021-07-05 21:47:39.764][pplnn.cc:557]     NumBytesIncludePadding: 1600
[INFO][2021-07-05 21:47:39.764][pplnn.cc:558]     NumBytesExcludePadding: 300
[INFO][2021-07-05 21:47:39.764][pplnn.cc:561] ----------------------
[INFO][2021-07-05 21:47:39.764][pplnn.cc:791] Run() costs: 0.010000 ms.
[INFO][2021-07-05 21:47:39.764][pplnn.cc:799] Run ok

When I try to run in profiling mode, it gets stuck somewhere and never returns. 😅 The code is too new to me, so it is hard to find the cause, please help!

Command: ./pplnn-build/tools/pplnn --onnx-model tests/testdata/conv.onnx --enable-profiling --warmuptimes 3

[INFO][2021-07-05 21:52:35.459][pplnn.cc:683] ppl.nn version: 7dd75a1077867fc9a762449953417088446ae2f8-dirty
[INFO][2021-07-05 21:52:35.459][pplnn.cc:110] ***** register X86Engine *****
[INFO][2021-07-05 21:52:35.459][simple_graph_partitioner.cc:90] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-07-05 21:52:35.459][pplnn.cc:523] ----- input info -----
[INFO][2021-07-05 21:52:35.459][pplnn.cc:526] input[0]:
[INFO][2021-07-05 21:52:35.459][pplnn.cc:527]     name: input
[INFO][2021-07-05 21:52:35.459][pplnn.cc:534]     dim(s): 1 3 4 4
[INFO][2021-07-05 21:52:35.459][pplnn.cc:536]     DataType: FLOAT32
[INFO][2021-07-05 21:52:35.459][pplnn.cc:537]     DataFormat: NDARRAY
[INFO][2021-07-05 21:52:35.459][pplnn.cc:538]     NumBytesIncludePadding: 192
[INFO][2021-07-05 21:52:35.459][pplnn.cc:539]     NumBytesExcludePadding: 192
[INFO][2021-07-05 21:52:35.459][pplnn.cc:542] ----- output info -----
[INFO][2021-07-05 21:52:35.459][pplnn.cc:545] output[0]:
[INFO][2021-07-05 21:52:35.459][pplnn.cc:546]     name: 5
[INFO][2021-07-05 21:52:35.459][pplnn.cc:553]     dim(s): 1 3 5 5
[INFO][2021-07-05 21:52:35.459][pplnn.cc:555]     DataType: FLOAT32
[INFO][2021-07-05 21:52:35.459][pplnn.cc:556]     DataFormat: N16CX
[INFO][2021-07-05 21:52:35.459][pplnn.cc:557]     NumBytesIncludePadding: 1600
[INFO][2021-07-05 21:52:35.459][pplnn.cc:558]     NumBytesExcludePadding: 300
[INFO][2021-07-05 21:52:35.459][pplnn.cc:561] ----------------------
[INFO][2021-07-05 21:52:35.459][pplnn.cc:791] Run() costs: 0.010000 ms.
[INFO][2021-07-05 21:52:35.459][pplnn.cc:799] Run ok
[INFO][2021-07-05 21:52:35.459][pplnn.cc:803] Warm up start for 3 times.
[INFO][2021-07-05 21:52:35.459][pplnn.cc:810] Warm up end.
[INFO][2021-07-05 21:52:35.459][pplnn.cc:818] Profiling start

Performance issue compared to MXNet

Hi guys, I tested OpenPPL with different batch sizes and compared its inference performance to MXNet. IMHO, if the batch size increases exponentially, the inference latency should increase nearly exponentially too, but the results in the table below do not follow this rule. So I suspect that either my test command is wrong, or the latency reported by OpenPPL is for a single sample instead of a single query.

By the way, I found that when --dims is specified, the --in-shapes option does not seem to work properly.

Environment

MXNet: 1.6.0
OpenPPL: 7dd75a1
TensorRT: 8.0
Device: GTX 1080
CUDA: 10.2

The table below shows latency (ms) of ResNet50_v1b, FP32
OpenPPL command: pplnn --warmuptimes 400 --runningtimes 100 --enable-profiling --dims bs_3_224_224 --in-shapes bs_3_224_224 --onnx-model model.onnx
MXNet inference code: link

batch size |        1 |        2 |         4 |         8 |        16 |        32
MXNet      | 5.814193 | 7.570517 | 11.836981 | 20.500102 | 36.853303 | 69.709606
OpenPPL    | 1.730683 | 1.818604 |  2.219570 |  2.222927 |  2.294234 |  2.958766
TensorRT   | 2.66706  | 4.01048  |  6.84147  | 12.4254   | 23.0325   | 43.8126

What does PPLNN stand for?

PPLNN is short for "PPLNN is a Primitive Library for Neural Network".

So the first P stands for PPLNN and the second P stands for Primitive, is that correct?

Executing the engine's run function fails with error CUDA_ERROR_INVALID_IMAGE

Log as follows:
[INFO][2021-11-03 10:37:40.451][simple_graph_partitioner.cc:108] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-11-03 10:37:40.456][opt_graph.cc:206] Create 206 TensorImpl
[INFO][2021-11-03 10:37:40.458][opt_graph.cc:317] added 171 new bridge kernels
[INFO][2021-11-03 10:37:40.483][algo_conv_hmma.cc:116] Compiling Conv_0
[INFO][2021-11-03 10:37:45.786][algo_conv_hmma.cc:116] Compiling Conv_4
[INFO][2021-11-03 10:37:56.366][algo_conv_hmma.cc:116] Compiling Conv_5
[INFO][2021-11-03 10:38:04.654][algo_conv_hmma.cc:116] Compiling Conv_9
[INFO][2021-11-03 10:38:11.969][algo_conv_hmma.cc:116] Compiling Conv_9
[INFO][2021-11-03 10:38:11.979][algo_conv_hmma.cc:116] Compiling Conv_10
[INFO][2021-11-03 10:38:19.334][algo_conv_hmma.cc:116] Compiling Conv_14
[INFO][2021-11-03 10:38:22.449][algo_conv_hmma.cc:116] Compiling Conv_14
[INFO][2021-11-03 10:38:22.458][algo_conv_hmma.cc:116] Compiling Conv_16
[INFO][2021-11-03 10:38:22.462][algo_conv_hmma.cc:116] Compiling Conv_20
[INFO][2021-11-03 10:38:22.471][algo_conv_hmma.cc:116] Compiling Conv_22
[INFO][2021-11-03 10:38:22.473][algo_conv_hmma.cc:116] Compiling Conv_26
[INFO][2021-11-03 10:38:25.715][algo_conv_hmma.cc:116] Compiling Conv_26
[INFO][2021-11-03 10:38:25.719][algo_conv_hmma.cc:116] Compiling Conv_27
[INFO][2021-11-03 10:38:32.929][algo_conv_hmma.cc:116] Compiling Conv_31
[INFO][2021-11-03 10:38:36.166][algo_conv_hmma.cc:116] Compiling Conv_31
[INFO][2021-11-03 10:38:36.169][algo_conv_hmma.cc:116] Compiling Conv_33
[INFO][2021-11-03 10:38:36.171][algo_conv_hmma.cc:116] Compiling Conv_37
[INFO][2021-11-03 10:38:36.178][algo_conv_hmma.cc:116] Compiling Conv_39
[INFO][2021-11-03 10:38:43.604][algo_conv_hmma.cc:116] Compiling Conv_43
[INFO][2021-11-03 10:38:45.240][algo_conv_hmma.cc:116] Compiling Conv_43
[INFO][2021-11-03 10:38:45.243][algo_conv_hmma.cc:116] Compiling Conv_44
[INFO][2021-11-03 10:38:46.794][algo_conv_hmma.cc:116] Compiling Conv_48
[INFO][2021-11-03 10:38:48.354][algo_conv_hmma.cc:116] Compiling Conv_48
[INFO][2021-11-03 10:38:48.357][algo_conv_hmma.cc:116] Compiling Conv_50
[INFO][2021-11-03 10:38:48.358][algo_conv_hmma.cc:116] Compiling Conv_54
[INFO][2021-11-03 10:38:48.361][algo_conv_hmma.cc:116] Compiling Conv_56
[INFO][2021-11-03 10:38:48.362][algo_conv_hmma.cc:116] Compiling Conv_60
[INFO][2021-11-03 10:38:50.086][algo_conv_hmma.cc:116] Compiling Conv_60
[INFO][2021-11-03 10:38:50.090][algo_conv_hmma.cc:116] Compiling Conv_61
[INFO][2021-11-03 10:38:51.999][algo_conv_hmma.cc:116] Compiling Conv_65
[INFO][2021-11-03 10:38:54.211][algo_conv_hmma.cc:116] Compiling Conv_67
[INFO][2021-11-03 10:38:54.212][algo_conv_hmma.cc:116] Compiling Conv_71
[INFO][2021-11-03 10:38:57.143][algo_conv_hmma.cc:116] Compiling Conv_71
[INFO][2021-11-03 10:38:57.145][algo_conv_hmma.cc:116] Compiling Conv_72
[INFO][2021-11-03 10:38:59.957][algo_conv_hmma.cc:116] Compiling Conv_76
[INFO][2021-11-03 10:39:02.925][algo_conv_hmma.cc:116] Compiling Conv_76
[INFO][2021-11-03 10:39:02.927][algo_conv_hmma.cc:116] Compiling Conv_78
[INFO][2021-11-03 10:39:02.929][algo_conv_hmma.cc:116] Compiling Conv_82
[INFO][2021-11-03 10:39:02.929][algo_conv_hmma.cc:116] Compiling Conv_82
[INFO][2021-11-03 10:39:02.931][algo_conv_hmma.cc:116] Compiling Conv_84
[INFO][2021-11-03 10:39:02.933][algo_conv_hmma.cc:116] Compiling Conv_88
[INFO][2021-11-03 10:39:02.934][algo_conv_hmma.cc:116] Compiling Conv_90
[INFO][2021-11-03 10:39:02.936][algo_conv_hmma.cc:116] Compiling Conv_94
[INFO][2021-11-03 10:39:05.646][algo_conv_hmma.cc:116] Compiling Conv_95
[INFO][2021-11-03 10:39:08.061][algo_gemm.cc:113] Compiling Gemm_98
[INFO][2021-11-03 10:39:09.090][opt_graph.cc:592] deleted 167 bridge kernels
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 215] engine config as follows:
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 216] forward type : 0
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 217] mem mode : 0
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 227] input[0]: input
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 237] dim(s): 1 3 224 224
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 238] DataType: FLOAT32
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 239] DataFormat: NDARRAY
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 240] BytesIncludePadding: 602112
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 241] BytesExcludePadding: 602112
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 247] output[0]: probs
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 257] dim(s): 0 1000
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 258] DataType: FLOAT32
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 259] DataFormat: NDARRAY
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 260] BytesIncludePadding: 0
[2021-11-03 10:39:17.023 +08:00] [infer-engine-log] [---I---] [thread 30647] [openppl_engine.cpp printModelInfo 261] BytesExcludePadding: 0
[2021-11-03 10:39:17.024 +08:00] [infer-engine-log] [---I---] [thread 30647] [base_model.cpp loadModel 88] model /disk26b/zhaojd/github_codes/infer_engine_serving/models//test_openppl/1/model.bin loaded success success.
[2021-11-03 10:39:17.034 +08:00] [infer-engine-log] [---I---] [thread 30647] [prometheus_metrics.cpp init 31] prometheus binding address is 0.0.0.0:10098
[2021-11-03 10:39:17.036 +08:00] [infer-engine-log] [---I---] [thread 30647] [server_manager.cpp ServerManager 26] add resource monitor timer task to worker group success.
[2021-11-03 10:39:17.036 +08:00] [infer-engine-log] [---I---] [thread 30647] [server_manager.cpp ServerManager 36] add model update timer task to worker group success
start service....
[2021-11-03 10:39:17.037 +08:00] [infer-engine-log] [---I---] [thread 30647] [infer_engine_server.cpp start 29] http service binding to port 10099
[2021-11-03 10:39:27.038 +08:00] [infer-engine-log] [---I---] [thread 32918] [metrics.cpp resourceMonitorTimerTask 26] monitor task timeout, begin to update ...
[... the "monitor task timeout, begin to update ..." line repeats every 10 seconds until 10:52:17 ...]
[2021-11-03 10:52:19.931 +08:00] [infer-engine-log] [---I---] [thread 32922] [http_service.cpp operator() 143] 7505367914-->Enter modelInferHandler
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [task_info.cpp printTaskInfo 69] task 7505367914 info as follows:
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [task_info.cpp printTaskInfo 70] task_type : model_infer
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [task_info.cpp printTaskInfo 71] model_name : test_openppl
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [task_info.cpp printTaskInfo 72] model_version: 1
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [task_info.cpp printTaskInfo 73] model_action :
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [task_info.cpp printTaskInfo 79] image shape : 224x224x3
[2021-11-03 10:52:19.938 +08:00] [infer-engine-log] [---I---] [thread 32922] [http_service.cpp execTask 116] 7505367914-->execute model_infer task
[2021-11-03 10:52:19.939 +08:00] [infer-engine-log] [---I---] [thread 32909] [model_manager.cpp processModelInferTask 227] 7505367914-->Exec process_model_infer_task
[2021-11-03 10:52:27.316 +08:00] [infer-engine-log] [---I---] [thread 32918] [metrics.cpp resourceMonitorTimerTask 26] monitor task timeout, begin to update ...
[... the "monitor task timeout, begin to update ..." line repeats until 10:54:35 ...]

error: cuModuleLoadDataEx(&module_, source_code_.second.c_str(), 0, 0, 0) failed with error CUDA_ERROR_INVALID_IMAGE
terminate called after throwing an instance of 'std::system_error'
what(): Resource deadlock avoided

Found two unsupported ops

Built the master branch on CentOS 7, then ran the command:
pplnn --use-x86 --onnx-model xxx.onnx

```
[INFO][2021-11-26 15:01:38.832][pplnn.cc:797] ppl.nn version: 63efb93-dirty
[INFO][2021-11-26 15:01:38.832][pplnn.cc:278] ***** register X86Engine *****
[ERROR][2021-11-26 15:01:39.001][graph_parser.cc:175] unsupported op: domain[], type[MaxPool], version[9]
[ERROR][2021-11-26 15:01:39.001][graph_parser.cc:202] ParseNodeInfo for node[MaxPool_2] failed: unsupported
[ERROR][2021-11-26 15:01:39.001][graph_parser.cc:267] ParseGraphNode failed.
[ERROR][2021-11-26 15:01:39.002][model_parser.cc:80] parse graph failed: unsupported
[ERROR][2021-11-26 15:01:39.002][runtime_builder_impl.cc:47] parse graph failed: unsupported
[ERROR][2021-11-26 15:01:39.002][onnx_runtime_builder_factory.cc:58] init RuntimeBuilder failed: unsupported
[ERROR][2021-11-26 15:01:39.016][pplnn.cc:817] create RuntimeBuilder failed.
```

```
[INFO][2021-11-26 15:01:51.409][pplnn.cc:797] ppl.nn version: 63efb93-dirty
[INFO][2021-11-26 15:01:51.409][pplnn.cc:278] ***** register X86Engine *****
[ERROR][2021-11-26 15:01:51.598][graph_parser.cc:175] unsupported op: domain[], type[CumSum], version[11]
[ERROR][2021-11-26 15:01:51.598][graph_parser.cc:202] ParseNodeInfo for node[CumSum_181] failed: unsupported
[ERROR][2021-11-26 15:01:51.598][graph_parser.cc:267] ParseGraphNode failed.
[ERROR][2021-11-26 15:01:51.598][model_parser.cc:80] parse graph failed: unsupported
[ERROR][2021-11-26 15:01:51.599][runtime_builder_impl.cc:47] parse graph failed: unsupported
[ERROR][2021-11-26 15:01:51.599][onnx_runtime_builder_factory.cc:58] init RuntimeBuilder failed: unsupported
[ERROR][2021-11-26 15:01:51.614][pplnn.cc:817] create RuntimeBuilder failed.
```

testIsaAVX512 execution fails

Thanks for your great work. I integrated ppl into my project; when I run the program in debug mode, it fails with the exception below:
(screenshot of the exception omitted)

The x86 engine runs fine, but the CUDA engine cannot run: it hangs at Compiling Conv_0 until all 64 GB of memory is exhausted

[DEBUG][2022-02-26 11:23:12.125][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_127_Fused]
[DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_139_Fused]
[DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_151_Fused]
[DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_163_Fused]
[DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_176_Fused]
[DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_185_Fused]
[INFO][2022-02-26 11:23:12.127][engine_graph_partitioner.cc:103] total partition(s) of graph[torch-jit-export]: 1.
[DEBUG][2022-02-26 11:23:12.153][opt_graph.cc:186] Can not reshape safely for node[Resize_170]
[DEBUG][2022-02-26 11:23:12.154][opt_graph.cc:186] Can not reshape safely for node[Resize_158]
[DEBUG][2022-02-26 11:23:12.155][opt_graph.cc:186] Can not reshape safely for node[Resize_146]
[DEBUG][2022-02-26 11:23:12.156][opt_graph.cc:186] Can not reshape safely for node[Resize_134]
[DEBUG][2022-02-26 11:23:12.156][reshape_concat.cc:43] ERROR: input[1]'s dim[2]'s value[1] != input[0]'s dim[2]'s value[37].
[DEBUG][2022-02-26 11:23:12.156][opt_graph.cc:186] Can not reshape safely for node[Concat_171]
[DEBUG][2022-02-26 11:23:12.172][opt_graph.cc:186] Can not reshape safely for node[Resize_183]
[DEBUG][2022-02-26 11:23:12.172][opt_graph.cc:186] Can not reshape safely for node[Resize_192]
[DEBUG][2022-02-26 11:23:12.173][opt_graph.cc:200] Create 305 TensorImpl
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_172] and nextnode[Relu_173]
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_124] and nextnode[Relu_125]
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_136] and nextnode[Relu_137]
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_148] and nextnode[Relu_149]
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_160] and nextnode[Relu_161]
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_120] and nextnode[Add_121]
[DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_120] and nextnode[Relu_122]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_118] and nextnode[Relu_119]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_116] and nextnode[Relu_117]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_113] and nextnode[Add_114]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_113] and nextnode[Relu_115]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_111] and nextnode[Relu_112]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_109] and nextnode[Relu_110]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_105] and nextnode[Add_107]
[DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_105] and nextnode[Relu_108]
[DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_103] and nextnode[Relu_104]
[DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_101] and nextnode[Relu_102]
[DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_98] and nextnode[Add_99]
[DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_98] and nextnode[Relu_100]
[DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_96] and nextnode[Relu_97]
[DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_94] and nextnode[Relu_95]
[DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_91] and nextnode[Add_92]
[DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_91] and nextnode[Relu_93]
[DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_89] and nextnode[Relu_90]
[DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_87] and nextnode[Relu_88]
[DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_84] and nextnode[Add_85]
[DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_84] and nextnode[Relu_86]
[DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_82] and nextnode[Relu_83]
[DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_80] and nextnode[Relu_81]
[DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_77] and nextnode[Add_78]
[DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_77] and nextnode[Relu_79]
[DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_75] and nextnode[Relu_76]
[DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_73] and nextnode[Relu_74]
[DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_70] and nextnode[Add_71]
[DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_70] and nextnode[Relu_72]
[DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_68] and nextnode[Relu_69]
[DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_66] and nextnode[Relu_67]
[DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_62] and nextnode[Add_64]
[DEBUG][2022-02-26 11:23:12.179][fs_conv.cc:80] Fuse node[Conv_62] and nextnode[Relu_65]
[DEBUG][2022-02-26 11:23:12.180][fs_conv.cc:80] Fuse node[Conv_60] and nextnode[Relu_61]
[DEBUG][2022-02-26 11:23:12.180][fs_conv.cc:80] Fuse node[Conv_58] and nextnode[Relu_59]
[DEBUG][2022-02-26 11:23:12.181][fs_conv.cc:80] Fuse node[Conv_55] and nextnode[Add_56]
[DEBUG][2022-02-26 11:23:12.182][fs_conv.cc:80] Fuse node[Conv_55] and nextnode[Relu_57]
[DEBUG][2022-02-26 11:23:12.182][fs_conv.cc:80] Fuse node[Conv_53] and nextnode[Relu_54]
[DEBUG][2022-02-26 11:23:12.182][fs_conv.cc:80] Fuse node[Conv_51] and nextnode[Relu_52]
[DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_48] and nextnode[Add_49]
[DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_48] and nextnode[Relu_50]
[DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_46] and nextnode[Relu_47]
[DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_44] and nextnode[Relu_45]
[DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_41] and nextnode[Add_42]
[DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_41] and nextnode[Relu_43]
[DEBUG][2022-02-26 11:23:12.184][fs_conv.cc:80] Fuse node[Conv_39] and nextnode[Relu_40]
[DEBUG][2022-02-26 11:23:12.184][fs_conv.cc:80] Fuse node[Conv_37] and nextnode[Relu_38]
[DEBUG][2022-02-26 11:23:12.184][fs_conv.cc:80] Fuse node[Conv_33] and nextnode[Add_35]
[DEBUG][2022-02-26 11:23:12.184][fs_conv.cc:80] Fuse node[Conv_33] and nextnode[Relu_36]
[DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_31] and nextnode[Relu_32]
[DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_29] and nextnode[Relu_30]
[DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_26] and nextnode[Add_27]
[DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_26] and nextnode[Relu_28]
[DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_24] and nextnode[Relu_25]
[DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_22] and nextnode[Relu_23]
[DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_19] and nextnode[Add_20]
[DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_19] and nextnode[Relu_21]
[DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_17] and nextnode[Relu_18]
[DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_15] and nextnode[Relu_16]
[DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_11] and nextnode[Add_13]
[DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_11] and nextnode[Relu_14]
[DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_9] and nextnode[Relu_10]
[DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_7] and nextnode[Relu_8]
[DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_4] and nextnode[Relu_5]
[DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_2] and nextnode[Relu_3]
[DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_0] and nextnode[Relu_1]
[INFO][2022-02-26 11:23:12.192][opt_graph.cc:311] added 261 new bridge kernels
[INFO][2022-02-26 11:23:12.724][algo_conv_hmma.cc:126] Compiling Conv_0

Compile failed: fatal error: Python.h: No such file or directory

First, thanks for your work.

When I compiled openppl in Docker, it returned an error:

/root/ppl.nn/deps/pybind11/include/pybind11/detail/common.h:186:10: fatal error: Python.h: No such file or directory

My base image environment:

os: Ubuntu20.04
python: 3.8.10

I tried installing libpython3.8-dev, which places Python.h in /usr/include/python3.8/. After that I recompiled, and it still reports the error Python.h: No such file or directory.
However, I have successfully compiled openppl on my host, host environment:

os: Ubuntu16.04
chip: x64
Pybind has been preinstalled 

Next, I tried to pip install pybind and to delete the hpcc pybind declaration in deps.cmake; it is still the same error. Maybe some packages' versions are too old.

So, could you release a docker image or help me solve this problem?
Thanks!
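
For what it's worth, a hedged sketch of what usually resolves this inside a container. The variable name is a standard CMake/pybind11 hint rather than a ppl.nn-specific flag, and the stale build directory matters because CMake caches the failed Python lookup:

```bash
apt-get update && apt-get install -y python3-dev    # provides Python.h
rm -rf pplnn-build                                  # clear the cached CMake configure
./build.sh -DPYTHON_EXECUTABLE="$(which python3)"
```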

Ubuntu 18.04 compiler error and fix

When I compile the latest code on
ubuntu 18.04
cmake 3.19.3

using ./build.sh -DHPCC_USE_OPENMP=ON

it shows an error like this:
(screenshot of the compiler error omitted)

When I add the header #include <cmath> to the file src/ppl/nn/engines/x86/impls/test/utils/check.h, it works fine.

unsupported op with opset14

When I convert a PyTorch model to an ONNX model with opset 14 and then run it with ppl (x86), I get the following error:

[ERROR][2022-02-18 21:37:08.903][graph_parser.cc:175] unsupported op: domain[], type[Add], version[14]
[ERROR][2022-02-18 21:37:08.903][graph_parser.cc:202] ParseNodeInfo for node[Add_2] failed: unsupported
[ERROR][2022-02-18 21:37:08.903][graph_parser.cc:267] ParseGraphNode failed.
[ERROR][2022-02-18 21:37:08.903][model_parser.cc:80] parse graph failed: unsupported
[ERROR][2022-02-18 21:37:08.904][runtime_builder_impl.cc:47] parse graph failed: unsupported
[ERROR][2022-02-18 21:37:08.904][onnx_runtime_builder_factory.cc:58] init RuntimeBuilder failed: unsupported

After I changed the opset from 14 to 11, the error above was gone, but I got a segmentation fault with no further information.

core dump while use pplnn for x86 benchmarking

System Info

OS: Ubuntu 16.04
Compiler: GCC-8.2
CPU: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
The ONNX model uses opset 11 and can run with TensorRT and ONNX Runtime.

Error Message as follows:


(perf) λ 96fbdb7483e1 /work/github/ppl.nn/pplnn-build/tools {master} bash test.sh
[INFO][2022-01-29 02:22:15.166][pplnn.cc:1053] ppl.nn version: cf85289
[INFO][2022-01-29 02:22:15.210][pplnn.cc:332] ***** register X86Engine *****
[INFO][2022-01-29 02:22:15.294][engine_graph_partitioner.cc:103] total partition(s) of graph[paddle-onnx]: 1.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
test.sh: line 8: 56259 Aborted (core dumped) ./pplnn --use-x86 --onnx-model ./models/MobileNetV1.onnx --mm-policy mem --enable-profiling --min-profiling-seconds 10 --warmup-iterations

cmake error: could not find git for clone of hpcc-populate

Hi @openppl-public, I tried to compile ppl.nn on Ubuntu 20.04 (GCC 9.3.0, CUDA 11.3, CMake 3.16.3) and got the following error:


# ./build.sh -DHPCC_USE_CUDA=ON
mkdir: cannot create directory '/workspace/github/ppl.nn/pplnn-build': File exists
cmd -> cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/workspace/github/ppl.nn/pplnn-build/install -DHPCC_USE_CUDA=ON .. && make -j24 && make install
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Populating hpcc
CMake Error at /usr/share/cmake-3.16/Modules/ExternalProject.cmake:2421 (message):
  error: could not find git for clone of hpcc-populate
Call Stack (most recent call first):
  /usr/share/cmake-3.16/Modules/ExternalProject.cmake:3236 (_ep_add_download_command)
  CMakeLists.txt:13 (ExternalProject_Add)


-- Configuring incomplete, errors occurred!
See also "/workspace/github/ppl.nn/deps/hpcc-subbuild/CMakeFiles/CMakeOutput.log".
CMake Error at /usr/share/cmake-3.16/Modules/FetchContent.cmake:903 (message):
  CMake step for hpcc failed: 1
Call Stack (most recent call first):
  /usr/share/cmake-3.16/Modules/FetchContent.cmake:1006 (__FetchContent_directPopulate)
  cmake/deps.cmake:23 (FetchContent_Populate)
  CMakeLists.txt:27 (include)


-- Configuring incomplete, errors occurred!
See also "/workspace/github/ppl.nn/pplnn-build/CMakeFiles/CMakeOutput.log".

How do I fix it?

Thanks
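
The error text says CMake could not find git when trying to clone hpcc-populate, so a minimal fix is to install git and retry from a clean state (Ubuntu commands shown; adjust for your distribution):

```bash
apt-get update && apt-get install -y git
rm -rf pplnn-build deps     # drop the failed FetchContent state
./build.sh -DHPCC_USE_CUDA=ON
```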

Error when running the official mask_rcnn sample

As the title says: converting mask_rcnn.onnx from mmdetection works fine, but running the ppl Python sample to infer with the mask_rcnn model reports an error:

[less_kernel.cc:81] unsupported data_type: FLOAT64

After some quick debugging, the input data is fp32, so could the exported ONNX contain float64? But didn't the official sample run through?
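
Before digging further, it may help to confirm whether the exported graph actually contains float64 tensors. A hedged sketch using the onnx Python package (the model filename is an assumption):

```bash
python3 -c "
import onnx
m = onnx.load('mask_rcnn.onnx')
dbl = onnx.TensorProto.DOUBLE
# Report any initializers or declared tensors whose element type is float64.
for t in m.graph.initializer:
    if t.data_type == dbl:
        print('float64 initializer:', t.name)
for v in list(m.graph.input) + list(m.graph.value_info):
    if v.type.tensor_type.elem_type == dbl:
        print('float64 tensor:', v.name)
"
```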

How should a CMake project import pplnn?

I want to build the C++ sample code on its own.

File structure:

  |--build
  |-- classification.cpp
  |-- CMakeLists.txt
  |-- libs
  |-- main.cpp

My CMakeLists.txt is written like this:

cmake_minimum_required(VERSION 3.0.0)
project(ppl_classify VERSION 0.1.0)

find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
include_directories(${PPLNN_INCLUDE_DIRECTORIES})

add_executable(classification classification.cpp)
target_link_libraries(classification PUBLIC pplnn_static ${OpenCV_LIBS})

When I run cmake .., line 26 still reports an error: 'ppl/nn/models/onnx/onnx_runtime_builder_factory.h' file not found clang(pp_file_not_found)

#include "ppl/nn/models/onnx/onnx_runtime_builder_factory.h"
#include "ppl/nn/engines/x86/engine_factory.h"
#include "ppl/nn/engines/x86/x86_engine_options.h"

I suspect my CMakeLists.txt is written incorrectly. How should it be written?
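
A minimal sketch of one way to make this configure, not an authoritative recipe: ${PPLNN_INCLUDE_DIRECTORIES} is referenced but never defined in the CMakeLists.txt above, which is why the header cannot be found. Assuming ppl.nn was built with ./build.sh so that headers and libraries land under pplnn-build/install (all paths below are assumptions):

```bash
PPLNN=/path/to/ppl.nn/pplnn-build/install
# Define the variable the CMakeLists.txt references, and add the library
# directory so the bare pplnn_static name resolves at link time.
cmake .. \
  -DPPLNN_INCLUDE_DIRECTORIES="$PPLNN/include" \
  -DCMAKE_EXE_LINKER_FLAGS="-L$PPLNN/lib"
```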

Compile error on macOS

clang: 12.0
When running build.sh, errors occur:

```
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/bytedance/Desktop/ppl.nn/pplnn-build
Scanning dependencies of target pplcommon_static
Consolidate compiler generated dependencies of target pplcommon_static
[ 1%] Building CXX object ppl.common-build/CMakeFiles/pplcommon_static.dir/src/ppl/common/log.cc.o
In file included from /Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:1:
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.h:54:17: error: class member cannot be redeclared
LogMessage& operator<<(long long ll);
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.h:51:17: note: previous declaration is here
LogMessage& operator<<(int64_t i64);
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.h:55:17: error: class member cannot be redeclared
LogMessage& operator<<(unsigned long long ull);
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.h:52:17: note: previous declaration is here
LogMessage& operator<<(uint64_t u64);
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:75:1: warning: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Wformat]
DEF_READ_OPERATOR_FUNC(int64_t, "%ld");
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
%lld
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:61:44: note: expanded from macro 'DEF_READ_OPERATOR_FUNC'
auto len = snprintf(buf, 128, fmt, value);
~~~ ^~~~~
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:76:1: warning: format specifies type 'unsigned long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
DEF_READ_OPERATOR_FUNC(uint64_t, "%lu");
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
%llu
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:61:44: note: expanded from macro 'DEF_READ_OPERATOR_FUNC'
auto len = snprintf(buf, 128, fmt, value);
~~~ ^~~~~
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:78:1: error: redefinition of 'operator<<'
DEF_READ_OPERATOR_FUNC(long long, "%lld");
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:59:29: note: expanded from macro 'DEF_READ_OPERATOR_FUNC'
LogMessage& LogMessage::operator<<(Type value) {
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:75:1: note: previous definition is here
DEF_READ_OPERATOR_FUNC(int64_t, "%ld");
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:59:29: note: expanded from macro 'DEF_READ_OPERATOR_FUNC'
LogMessage& LogMessage::operator<<(Type value) {
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:79:1: error: redefinition of 'operator<<'
DEF_READ_OPERATOR_FUNC(unsigned long long, "%llu");
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:59:29: note: expanded from macro 'DEF_READ_OPERATOR_FUNC'
LogMessage& LogMessage::operator<<(Type value) {
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:76:1: note: previous definition is here
DEF_READ_OPERATOR_FUNC(uint64_t, "%lu");
^
/Users/bytedance/Desktop/ppl.nn/deps/ppl.common/src/ppl/common/log.cc:59:29: note: expanded from macro 'DEF_READ_OPERATOR_FUNC'
LogMessage& LogMessage::operator<<(Type value) {
^
2 warnings and 4 errors generated.
make[2]: *** [ppl.common-build/CMakeFiles/pplcommon_static.dir/src/ppl/common/log.cc.o] Error 1
make[1]: *** [ppl.common-build/CMakeFiles/pplcommon_static.dir/all] Error 2
make: *** [all] Error 2
```
Should I delete all the redeclared class members?

Models in the ONNX model zoo cannot run in ppl.nn

Hi folks. I tried running models from the ONNX model zoo and found that many pretrained models cannot run with ppl.nn; the error varies across models, e.g.:

./pplnn-build/tools/pplnn --onnx-model resnet50-v1-7.onnx
[INFO][2021-07-04 04:43:10.810][pplnn.cc:683] ppl.nn version: v0.1.0-dirty
[WARNING][2021-07-04 04:43:12.138][engine.cc:192] Default input dims for dynamic graph are 1_3_224_224, we recommend using '--dims' to set a suitable training shape.
[INFO][2021-07-04 04:43:12.138][pplnn.cc:88] ***** register CudaEngine *****
[ERROR][2021-07-04 04:43:12.211][model_parser.cc:46] unsupported opset [:8]
[ERROR][2021-07-04 04:43:12.214][runtime_builder_impl.cc:33] parse graph failed: unsupported
[ERROR][2021-07-04 04:43:12.214][onnx_runtime_builder_factory.cc:42] init OnnxRuntimeBuilder failed: unsupported
[ERROR][2021-07-04 04:43:12.219][pplnn.cc:697] create OnnxRuntimeBuilder failed.

or

./pplnn-build/tools/pplnn --onnx-model efficientnet-lite4-11.onnx
[INFO][2021-07-04 04:40:22.270][pplnn.cc:683] ppl.nn version: v0.1.0-dirty
[WARNING][2021-07-04 04:40:23.590][engine.cc:192] Default input dims for dynamic graph are 1_3_224_224, we recommend using '--dims' to set a suitable training shape.
[INFO][2021-07-04 04:40:23.590][pplnn.cc:88] ***** register CudaEngine *****
[ERROR][2021-07-04 04:40:24.453][simple_graph_partitioner.cc:73] cannot find implementation of op[:MatMul]
[ERROR][2021-07-04 04:40:24.453][utils.cc:412] partitioning graph[tf2onnx] failed: not found
[ERROR][2021-07-04 04:40:24.453][runtime_builder_impl.cc:39] process graph failed: not found
[ERROR][2021-07-04 04:40:24.453][onnx_runtime_builder_factory.cc:42] init OnnxRuntimeBuilder failed: not found
[ERROR][2021-07-04 04:40:24.455][pplnn.cc:697] create OnnxRuntimeBuilder failed.

...

What are the restrictions on ONNX versions and opsets, and do you have a support matrix for the operators?
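For models stuck on an old opset, one workaround is to upgrade them with ONNX's version converter before feeding them to pplnn. A minimal sketch, assuming the model's operators survive the conversion (file names taken from the first log above):

import onnx
from onnx import version_converter

# pplnn rejected opset 8; rewrite the model at opset 11
model = onnx.load("resnet50-v1-7.onnx")
converted = version_converter.convert_version(model, 11)
onnx.save(converted, "resnet50-v1-7-opset11.onnx")

Note that this only addresses the opset error; the missing MatMul implementation in the efficientnet log looks like a separate operator-support gap.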

mmdetection model Faster_rcnn failed with pplnn

I followed the documentation for converting models with OpenMMLab and generated a pretrained faster_rcnn model, but when I run this model with x86 pplnn, there is an error saying:

[INFO][2021-07-06 08:48:49.498][pplnn.cc:683] ppl.nn version: 7dd75a1077867fc9a762449953417088446ae2f8-dirty
[INFO][2021-07-06 08:48:49.498][pplnn.cc:110] ***** register X86Engine *****
[INFO][2021-07-06 08:48:49.761][simple_graph_partitioner.cc:90] total partition(s) of graph[torch-jit-export]: 1.
[ERROR][2021-07-06 08:48:50.556][kernel.cc:14] reshape kernel[Expand_1100] failed: invalid value
[ERROR][2021-07-06 08:48:50.556][kernel.cc:47] BeforeExecute() of kernel[Expand_1100] failed: invalid value
[ERROR][2021-07-06 08:48:50.556][scheduler_common.cc:153] exec kernel[Expand_1100] failed: invalid value
[ERROR][2021-07-06 08:48:50.556][sequential_scheduler.cc:99] execute kernel[Expand_1100] failed: invalid value
[ERROR][2021-07-06 08:48:50.556][pplnn.cc:784] Run() failed: invalid value

MobileNet executes successfully. Is there anything I can do to make this model run correctly?

About build time with cuda

I tried to build ppl.nn on a server, and the compile process seems to be stuck at

...
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/PPLKernelX86.dir/src/ppl/kernel/x86/int64/reorder/reorder_n16cx_ndarray_int64_avx.cpp.o
[ 50%] Linking CXX static library libPPLKernelX86.a
[ 50%] Built target PPLKernelX86
Scanning dependencies of target test_conv2d
Scanning dependencies of target test_fc
Scanning dependencies of target test_gemm
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/test_fc.dir/test/test_fc.cpp.o
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/test_fc.dir/__/__/__/__/__/__/tools/simple_flags.cc.o
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/test_conv2d.dir/test/test_conv2d.cpp.o
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/test_gemm.dir/__/__/__/__/__/__/tools/simple_flags.cc.o
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/test_gemm.dir/test/test_gemm.cpp.o
[ 50%] Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/test_conv2d.dir/__/__/__/__/__/__/tools/simple_flags.cc.o
[ 50%] Linking CXX executable test_fc
[ 50%] Built target test_fc
[ 50%] Linking CXX executable test_conv2d
[ 51%] Linking CXX executable test_gemm
[ 51%] Built target test_conv2d
[ 51%] Built target test_gemm

I observed that it only uses 1 CPU core, the memory usage is very high (99% of 128GB), and the compile process froze in test_gemm for hours. Does it normally take a very long time to build?
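If the freeze really is in the test binaries, a possible workaround is to skip them at configure time. A sketch, assuming your checkout exposes a PPLNN_BUILD_TESTS option (check the top-level CMakeLists.txt before relying on it):

./build.sh <your original flags> -DPPLNN_BUILD_TESTS=OFF

Also note that a single huge translation unit like test_gemm.cpp cannot be compiled in parallel across cores, which would explain the one-core, high-memory behavior.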

Mask R-CNN failed with pplnn

The model was converted from the mmdetection library. When I try to execute it with pplnn, it shows errors:

[INFO][2021-07-14 17:18:19.999][pplnn.cc:703] ppl.nn version: 5d56662bf5a288898f0dd5b90f763459cc86f47a
[WARNING][2021-07-14 17:18:21.873][engine.cc:209] Default input dims for dynamic graph are 1_3_224_224, we recommend using '--dims' to set a suitable training shape.
[INFO][2021-07-14 17:18:21.873][pplnn.cc:104] ***** register CudaEngine *****
[INFO][2021-07-14 17:18:22.320][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export]: 1.
[ERROR][2021-07-14 17:18:22.322][reshape_reshape.cc:66] infer shape failed.
[ERROR][2021-07-14 17:18:22.338][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.339][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.340][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.341][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.341][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.342][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.342][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.342][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.342][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.343][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.343][reshape_unsqueeze.cc:36] axes overflow.
[ERROR][2021-07-14 17:18:22.343][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.344][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.344][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.345][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.345][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.345][reshape_split.cc:59] splited axis and sum of split point not match.
[INFO][2021-07-14 17:18:22.346][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export1]: 1.
[INFO][2021-07-14 17:18:22.346][opt_graph.cc:204] Create 2 TensorImpl
[INFO][2021-07-14 17:18:22.346][opt_graph.cc:316] added 2 new bridge kernels
[INFO][2021-07-14 17:18:22.346][opt_graph.cc:478] deleted 1 bridge kernels
[INFO][2021-07-14 17:18:22.347][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export2]: 1.
[INFO][2021-07-14 17:18:22.347][opt_graph.cc:204] Create 20 TensorImpl
[INFO][2021-07-14 17:18:22.347][opt_graph.cc:316] added 21 new bridge kernels
[INFO][2021-07-14 17:18:22.347][opt_graph.cc:478] deleted 14 bridge kernels
[ERROR][2021-07-14 17:18:22.348][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.348][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.348][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.349][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.349][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.349][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.350][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.350][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.350][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.389][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.389][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.390][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.390][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.390][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.391][reshape_add.cc:39] unbroadcastable input.
[ERROR][2021-07-14 17:18:22.391][reshape_unsqueeze.cc:36] axes overflow.
[ERROR][2021-07-14 17:18:22.391][reshape_unsqueeze.cc:36] axes overflow.
[INFO][2021-07-14 17:18:22.392][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export3]: 1.
[INFO][2021-07-14 17:18:22.392][opt_graph.cc:204] Create 2 TensorImpl
[INFO][2021-07-14 17:18:22.392][opt_graph.cc:316] added 2 new bridge kernels
[INFO][2021-07-14 17:18:22.392][opt_graph.cc:478] deleted 1 bridge kernels
[INFO][2021-07-14 17:18:22.392][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export4]: 1.
[INFO][2021-07-14 17:18:22.393][opt_graph.cc:204] Create 20 TensorImpl
[INFO][2021-07-14 17:18:22.393][opt_graph.cc:316] added 21 new bridge kernels
[INFO][2021-07-14 17:18:22.408][opt_graph.cc:478] deleted 14 bridge kernels
[ERROR][2021-07-14 17:18:22.408][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.409][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.409][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.409][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.410][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.410][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.410][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.411][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.411][reshape_split.cc:59] splited axis and sum of split point not match.
[ERROR][2021-07-14 17:18:22.413][reshape_split.cc:59] splited axis and sum of split point not match.
[INFO][2021-07-14 17:18:22.426][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export5]: 1.
[ERROR][2021-07-14 17:18:22.426][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.427][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.427][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.427][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.427][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.428][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.428][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.428][reshape_concat.cc:42] input shape not match.
[ERROR][2021-07-14 17:18:22.429][reshape_concat.cc:42] input shape not match.
[INFO][2021-07-14 17:18:22.429][opt_graph.cc:204] Create 135 TensorImpl
[INFO][2021-07-14 17:18:22.430][opt_graph.cc:316] added 174 new bridge kernels
[INFO][2021-07-14 17:18:22.433][opt_graph.cc:478] deleted 153 bridge kernels
[INFO][2021-07-14 17:18:22.434][opt_graph.cc:204] Create 2263 TensorImpl
[INFO][2021-07-14 17:18:22.660][opt_graph.cc:316] added 2626 new bridge kernels
[INFO][2021-07-14 17:20:05.963][opt_graph.cc:478] deleted 2547 bridge kernels
[ERROR][2021-07-14 17:20:06.007][scheduler_common.cc:170] exec kernel[Pad_146] failed: invalid value
[ERROR][2021-07-14 17:20:06.007][sequential_scheduler.cc:116] execute kernel[Pad_146] failed: invalid value
[ERROR][2021-07-14 17:20:06.007][pplnn.cc:804] Run() failed: invalid value

I'm running it with real image data. Does pplnn support Mask R-CNN, or what should I do to execute it successfully?
Thanks a lot!
The model was generated by this command:

python ../tools/deployment/pytorch2onnx.py ../configs/mask_rcnn/mask_rcnn_r50_fpn_mstrain-poly_3x_coco.py \
mask_rcnn_r50_fpn_mstrain-poly_3x_coco_20210524_201154-21b550bb.pth \
--output-file mask_rcnn.onnx --simplify --dynamic-export
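Since the warning in the log explicitly recommends '--dims', it may help to pass the shape the model was actually exported and trained with instead of the default 1_3_224_224. A sketch, where 1_3_800_1344 is only a hypothetical mmdetection test shape; substitute your own:

./pplnn-build/tools/pplnn --onnx-model mask_rcnn.onnx \
    --dims 1_3_800_1344 --in-shapes 1_3_800_1344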

pplnn failed with CUDA

I tried CUDA 10.2 and CUDA 11.0; both failed at
"[ERROR][2021-07-05 06:38:36.723][buffered_cuda_allocator.cc:91] cuMemAddressReserve failed: operation not supported"
What should I do to avoid this error?

My environment is CentOS 8 with GCC 8.4, and the GPU device is a Tesla T4.
BTW, the x86 version runs successfully.

PPL is not support your GPU device right now

[INFO][2021-09-28 03:27:34.466][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-09-28 03:27:34.466][opt_graph.cc:202] Create 4 TensorImpl
[INFO][2021-09-28 03:27:34.466][opt_graph.cc:313] added 4 new bridge kernels
[ERROR][2021-09-28 03:27:34.467][opt_graph.cc:472] PPL is not support your GPU device right now.
[ERROR][2021-09-28 03:27:34.467][opt_graph.cc:630] Selec algos for each kernel failed: unsupported
[ERROR][2021-09-28 03:27:34.467][engine.cc:58] OptGraph DoOptimeize failed: unsupported
[ERROR][2021-09-28 03:27:34.467][engine.cc:68] DoOptimize failed: unsupported
[ERROR][2021-09-28 03:27:34.467][utils.cc:257] process graph[torch-jit-export] by engine[cuda] failed: unsupported
[ERROR][2021-09-28 03:27:34.467][utils.cc:467] GenPartitionsInfo failed:unsupported
[ERROR][2021-09-28 03:27:34.467][runtime_builder_impl.cc:51] process graph failed: unsupported
[ERROR][2021-09-28 03:27:34.467][onnx_runtime_builder_factory.cc:58] init RuntimeBuilder failed: unsupported
ERROR: create RuntimeBuilder failed.

Lower performance when compiled with clang (Darwin)

Continued with work #20:
I tried with an iMac 2018 (Intel Core i7 [email protected]); it shows abnormal performance compared with other inference engines (OpenVINO / TNN):

  • Model: MobileNetV1
  • Input: images with [3, 224, 224]
  • TNN / OpenVINO run at 10~30 ms per image (already warmed up)
  • pplnn runs at 100+ ms per image.
$ ./pplnn-build/tools/pplnn --onnx-model data/mobilenet_1.0_224.onnx \
                --reshaped-inputs data/input-1_3_224_224-fp32.dat \
                --mm-policy perf \
                --warmuptimes 100 \
                --core-binding \
                --enable-profiling
[INFO][2021-07-23 14:16:09.793][pplnn.cc:708] ppl.nn version: eb685a9da839b3c74b4c1e36b571c4c652cfba0c
[INFO][2021-07-23 14:16:09.802][pplnn.cc:131] ***** register X86Engine *****
[INFO][2021-07-23 14:16:09.861][simple_graph_partitioner.cc:107] total partition(s) of graph[./mobilenet_1.0_224.onnx]: 1.
[INFO][2021-07-23 14:16:10.200][pplnn.cc:548] ----- input info -----
[INFO][2021-07-23 14:16:10.200][pplnn.cc:551] input[0]:
[INFO][2021-07-23 14:16:10.200][pplnn.cc:552]     name: input
[INFO][2021-07-23 14:16:10.200][pplnn.cc:559]     dim(s): 1 3 224 224
[INFO][2021-07-23 14:16:10.200][pplnn.cc:561]     DataType: FLOAT32
[INFO][2021-07-23 14:16:10.200][pplnn.cc:562]     DataFormat: NDARRAY
[INFO][2021-07-23 14:16:10.200][pplnn.cc:563]     NumBytesIncludePadding: 602112
[INFO][2021-07-23 14:16:10.200][pplnn.cc:564]     NumBytesExcludePadding: 602112
[INFO][2021-07-23 14:16:10.200][pplnn.cc:567] ----- output info -----
[INFO][2021-07-23 14:16:10.200][pplnn.cc:570] output[0]:
[INFO][2021-07-23 14:16:10.200][pplnn.cc:571]     name: prob_Y
[INFO][2021-07-23 14:16:10.200][pplnn.cc:578]     dim(s): 1 1000 1 1
[INFO][2021-07-23 14:16:10.200][pplnn.cc:580]     DataType: FLOAT32
[INFO][2021-07-23 14:16:10.200][pplnn.cc:581]     DataFormat: NDARRAY
[INFO][2021-07-23 14:16:10.200][pplnn.cc:582]     NumBytesIncludePadding: 4000
[INFO][2021-07-23 14:16:10.200][pplnn.cc:583]     NumBytesExcludePadding: 4000
[INFO][2021-07-23 14:16:10.200][pplnn.cc:586] ----------------------
[INFO][2021-07-23 14:16:10.200][pplnn.cc:820] Run() costs: 232.927002 ms.
[INFO][2021-07-23 14:16:10.200][pplnn.cc:828] Run ok
[INFO][2021-07-23 14:16:10.200][pplnn.cc:832] Warm up start for 100 times.
[INFO][2021-07-23 14:16:20.537][pplnn.cc:839] Warm up end.
[INFO][2021-07-23 14:16:20.537][pplnn.cc:847] Profiling start
[INFO][2021-07-23 14:16:21.546][pplnn.cc:863] Duration: 1009.201000 ms
[INFO][2021-07-23 14:16:21.546][pplnn.cc:873] Average run cost: 112.133444 ms.
[INFO][2021-07-23 14:16:21.546][pplnn.cc:876] Profiling End

I've done profiling with tools/pplnn; it shows that pplnn spends 85% of its time on nopw instructions in the conv2d of MobileNetV1.
The source code is src/ppl/nn/engines/x86/impls/src/ppl/kernel/x86/fp32/conv2d/gemm_direct/fma/conv2d_n16cx_gemm_direct_kernel_fp32_fma.cpp

objdump shows:

$ objdump -D conv2d_n16cx_gemm_direct_kernel_fp32_fma.cpp.o | wc -l
   17796
$ objdump -D conv2d_n16cx_gemm_direct_kernel_fp32_fma.cpp.o | grep nopw | wc -l
    8931

conv2d_n16cx_gemm_direct_kernel_fp32_fma.cpp.S:
...
    4e3a: 4c 89 e0                     	movq	%r12, %rax
    4e3d: 48 8b 5f 18                  	movq	24(%rdi), %rbx
    4e41: 4c 8b 16                     	movq	(%rsi), %r10
    4e44: 49 83 fa 10                  	cmpq	$16, %r10
    4e48: 0f 8c 87 b9 00 00            	jl	0x107d5 <__ZN3ppl6kernel3x8647conv2d_n16cx_gemm_direct_fp32_fma_blk1x6_kernelILb0ELi16ELi6EEEvPKxS4_+0xbab5>
    4e4e: 66 2e 0f 1f 84 00 00 00 00 00	nopw	%cs:(%rax,%rax)
    4e58: 66 2e 0f 1f 84 00 00 00 00 00	nopw	%cs:(%rax,%rax)
    4e62: 66 2e 0f 1f 84 00 00 00 00 00	nopw	%cs:(%rax,%rax)
... (nopw 4e63 - ffeb) 
    ffec: 66 2e 0f 1f 84 00 00 00 00 00	nopw	%cs:(%rax,%rax)
    fff6: 66 2e 0f 1f 84 00 00 00 00 00	nopw	%cs:(%rax,%rax)
   10000: c5 7c 10 33                  	vmovups	(%rbx), %ymm14
   10004: c5 7c 10 7b 20               	vmovups	32(%rbx), %ymm15
...
... (nopw 157de - 1fff4) 

I think maybe clang tried to align the instruction blocks, but the padding is too large for this kernel, and I didn't find any clang options that might be causing this problem (yeah, I've tried -O0 and compiling only this file).

Any ideas?
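One experiment that might isolate the codegen issue: wipe the build and rebuild with GCC for comparison. A sketch, assuming build.sh forwards its options to cmake, which honors CC/CXX on a fresh configure:

rm -rf pplnn-build
CC=gcc CXX=g++ ./build.sh -DHPCC_USE_X86_64=ON

If the GCC build doesn't emit the huge nopw runs, that would point squarely at clang's alignment padding.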

error: no matching function for call to ‘ppl::nn::OnnxRuntimeBuilderFactory::Create

While installing mmdeploy, running cmake --build . -- -j$(nproc) && cmake --install . reports an error:
/home/zcc/mmdeploy/csrc/net/ppl/ppl_net.cpp:77:89: error: no matching function for call to ‘ppl::nn::OnnxRuntimeBuilderFactory::Create(char*, std::__cxx11::basic_string::size_type, ppl::nn::Engine**, std::vectorppl::nn::Engine*::size_type)’
onnx.data(), onnx.size(), engines.data(), engines.size())));
^

Windows build error

Hello:
Building on Windows 10 with VS2017 fails with an undefined-identifier error:
E0020: identifier "_mm512_floor_ps" is undefined (pplkernelx86_static, ppl.nn\src\ppl\nn\engines\x86\impls\src\ppl\kernel\x86\common\math_avx512.h, line 103)

Inference on vgg16 with batch size 32 took more than 1 hour before running actual inference

Steps to reproduce

./pplnn-build/tools/pplnn --in-shapes 32_3_224_224 --dims 32_3_224_224 --warmuptimes 200 --runningtimes 200 --onnx-model vgg16.onnx
[INFO][2021-07-05 08:31:30.885][pplnn.cc:683] ppl.nn version: v0.1.0-dirty
[INFO][2021-07-05 08:31:32.207][pplnn.cc:88] ***** register CudaEngine *****
[INFO][2021-07-05 08:31:32.940][simple_graph_partitioner.cc:90] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-07-05 08:31:33.295][opt_graph.cc:187] Create 71 TensorImpl
[INFO][2021-07-05 08:31:33.295][opt_graph.cc:299] added 56 new bridge kernels
[INFO][2021-07-05 09:46:30.989][opt_graph.cc:461] deleted 52 bridge kernels
[INFO][2021-07-05 09:46:46.325][pplnn.cc:523] ----- input info -----
[INFO][2021-07-05 09:46:46.326][pplnn.cc:526] input[0]:
[INFO][2021-07-05 09:46:46.326][pplnn.cc:527]     name: input.1
[INFO][2021-07-05 09:46:46.326][pplnn.cc:534]     dim(s): 32 3 224 224
[INFO][2021-07-05 09:46:46.326][pplnn.cc:536]     DataType: FLOAT32
[INFO][2021-07-05 09:46:46.326][pplnn.cc:537]     DataFormat: NDARRAY
[INFO][2021-07-05 09:46:46.326][pplnn.cc:538]     NumBytesIncludePadding: 19267584
[INFO][2021-07-05 09:46:46.326][pplnn.cc:539]     NumBytesExcludePadding: 19267584
[INFO][2021-07-05 09:46:46.326][pplnn.cc:542] ----- output info -----
[INFO][2021-07-05 09:46:46.326][pplnn.cc:545] output[0]:
[INFO][2021-07-05 09:46:46.326][pplnn.cc:546]     name: 70
[INFO][2021-07-05 09:46:46.326][pplnn.cc:553]     dim(s): 32 1000
[INFO][2021-07-05 09:46:46.326][pplnn.cc:555]     DataType: FLOAT32
[INFO][2021-07-05 09:46:46.326][pplnn.cc:556]     DataFormat: NDARRAY
[INFO][2021-07-05 09:46:46.326][pplnn.cc:557]     NumBytesIncludePadding: 128000
[INFO][2021-07-05 09:46:46.326][pplnn.cc:558]     NumBytesExcludePadding: 128000
[INFO][2021-07-05 09:46:46.326][pplnn.cc:561] ----------------------
[INFO][2021-07-05 09:46:46.326][pplnn.cc:791] Run() costs: 9175.929688 ms.
[INFO][2021-07-05 09:46:46.326][pplnn.cc:799] Run ok

As shown in the log, the run started at 08:31 and inference only began at 09:46, so preparation took 75 minutes. Is this normal? The model was imported from torchvision and exported to ONNX:

import torch
import torchvision

# export VGG16 with a fixed batch size of 32
dummy_input = torch.randn(32, 3, 224, 224)
model = torchvision.models.vgg16(pretrained=True)
model.eval()
torch.onnx.export(model, dummy_input, "vgg16.onnx", opset_version=11)

Also, when testing with batch size = 1, the time is pretty normal.

# ./pplnn-build/tools/pplnn --onnx-model vgg16.onnx --in-shapes 1_3_224_224 --dims 1_3_224_224 --warmuptimes 100 --runningtimes 100
[INFO][2021-07-05 05:21:44.428][pplnn.cc:683] ppl.nn version: v0.1.0-dirty
[INFO][2021-07-05 05:21:46.437][pplnn.cc:88] ***** register CudaEngine *****
[INFO][2021-07-05 05:21:47.230][simple_graph_partitioner.cc:90] total partition(s) of graph[torch-jit-export]: 1.
[INFO][2021-07-05 05:21:47.511][opt_graph.cc:187] Create 71 TensorImpl
[INFO][2021-07-05 05:21:47.511][opt_graph.cc:299] added 56 new bridge kernels
[INFO][2021-07-05 05:24:30.634][opt_graph.cc:461] deleted 52 bridge kernels
[INFO][2021-07-05 05:24:31.300][pplnn.cc:523] ----- input info -----
[INFO][2021-07-05 05:24:31.300][pplnn.cc:526] input[0]:
[INFO][2021-07-05 05:24:31.300][pplnn.cc:527]     name: input.1
[INFO][2021-07-05 05:24:31.300][pplnn.cc:534]     dim(s): 1 3 224 224
[INFO][2021-07-05 05:24:31.300][pplnn.cc:536]     DataType: FLOAT32
[INFO][2021-07-05 05:24:31.300][pplnn.cc:537]     DataFormat: NDARRAY
[INFO][2021-07-05 05:24:31.300][pplnn.cc:538]     NumBytesIncludePadding: 602112
[INFO][2021-07-05 05:24:31.300][pplnn.cc:539]     NumBytesExcludePadding: 602112
[INFO][2021-07-05 05:24:31.300][pplnn.cc:542] ----- output info -----
[INFO][2021-07-05 05:24:31.300][pplnn.cc:545] output[0]:
[INFO][2021-07-05 05:24:31.300][pplnn.cc:546]     name: 70
[INFO][2021-07-05 05:24:31.300][pplnn.cc:553]     dim(s): 1 1000
[INFO][2021-07-05 05:24:31.300][pplnn.cc:555]     DataType: FLOAT32
[INFO][2021-07-05 05:24:31.300][pplnn.cc:556]     DataFormat: NDARRAY
[INFO][2021-07-05 05:24:31.300][pplnn.cc:557]     NumBytesIncludePadding: 4000
[INFO][2021-07-05 05:24:31.300][pplnn.cc:558]     NumBytesExcludePadding: 4000
[INFO][2021-07-05 05:24:31.300][pplnn.cc:561] ----------------------
[INFO][2021-07-05 05:24:31.300][pplnn.cc:791] Run() costs: 344.269989 ms.
[INFO][2021-07-05 05:24:31.300][pplnn.cc:799] Run ok
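Judging from the timestamps, the 75 minutes are spent between "added 56 new bridge kernels" (08:31:33) and "deleted 52 bridge kernels" (09:46:30), i.e. inside the CUDA engine's per-shape kernel selection. If your build exposes it, the pplnn tool's --quick-select option is supposed to skip the exhaustive algorithm search at the cost of slower kernels. A hedged sketch:

./pplnn-build/tools/pplnn --onnx-model vgg16.onnx \
    --in-shapes 32_3_224_224 --dims 32_3_224_224 --quick-select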

Linux build problems

Problems encountered when running pplnn's build.sh on tlinux:

  1. pplnn_cpu/src/ppl/nn/engines/x86/impls/src/ppl/kernel/x86/common/simd_tools.cpp:15:45: error: '_MM_DENORMALS_ZERO_ON' was not declared in this scope
  2. from /home/shiweifan/qiao/pplnn_cpu/src/ppl/nn/engines/x86/impls/src/ppl/kernel/x86/fp32/arithmetic/sse/arithmetic_fp32_sse.cpp:1:
    /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/nmmintrin.h:31:3: error: #error "SSE4.2 instruction set not enabled"
    I modified CMakeLists.txt:
    if(CMAKE_COMPILER_IS_GNUCXX)
    add_compile_options(-msse4.2)
    message(STATUS "optional:-msse4.2")
    endif(CMAKE_COMPILER_IS_GNUCXX)
    and then hit the next problem:
    Building CXX object src/ppl/nn/engines/x86/impls/CMakeFiles/PPLKernelX86.dir/src/ppl/kernel/x86/fp32/arithmetic/avx/arithmetic_fp32_avx.cpp.o
    c++: error: unrecognized command line option '-mtune-ctrl=256_unaligned_load_optimal,256_unaligned_store_optimal'

Has anyone seen similar problems, and how can they be resolved?
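For what it's worth, both '_MM_DENORMALS_ZERO_ON' not being declared and the unrecognized '-mtune-ctrl=...' option point at the system GCC 4.8.5 being too old for the x86 kernels. A common workaround on CentOS-like systems, assuming a devtoolset software collection is installed (the version and path here are only an example):

source /opt/rh/devtoolset-7/enable   # pick up a newer gcc/g++
rm -rf pplnn-build && ./build.sh -DHPCC_USE_X86_64=ON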

Win32 cmake error

Win64 builds fine, but the Win32 cmake step fails to generate the project files.
I tried updating cmake to the latest version, but the error persists.

PS D:\ppl.nn> cmake --version
cmake version 3.23.0-rc2

The command and errors are as follows:

PS D:\ppl.nn> .\build.bat -G "Visual Studio 16 2019" -A Win32 -DHPCC_USE_X86_64=ON

D:\ppl.nn>md pplnn-build

D:\ppl.nn>cd pplnn-build

D:\ppl.nn\pplnn-build>cmake -G "Visual Studio 16 2019" -A Win32 -DHPCC_USE_X86_64=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install ..
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19043.
-- The C compiler identification is MSVC 19.29.30038.1
-- The CXX compiler identification is MSVC 19.29.30038.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30037/bin/Hostx64/x86/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30037/bin/Hostx64/x86/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Populating hpcc
CMake Error: Error: generator platform: Win32
Does not match the platform used previously:
Either remove the CMakeCache.txt file and CMakeFiles directory or choose a different binary directory.
CMake Error at C:/Program Files/CMake/share/cmake-3.23/Modules/FetchContent.cmake:1076 (message):
  CMake step for hpcc failed: 1
Call Stack (most recent call first):
  C:/Program Files/CMake/share/cmake-3.23/Modules/FetchContent.cmake:1217:EVAL:2 (__FetchContent_directPopulate)
  C:/Program Files/CMake/share/cmake-3.23/Modules/FetchContent.cmake:1217 (cmake_language)
  cmake/deps.cmake:59 (FetchContent_Populate)
  CMakeLists.txt:39 (include)


-- Configuring incomplete, errors occurred!
See also "D:/ppl.nn/pplnn-build/CMakeFiles/CMakeOutput.log".

D:\ppl.nn\pplnn-build>cmake --build . -j --config Release
Microsoft (R) Build Engine version 16.10.2+857e5a733 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

MSBUILD : error MSB1009: The project file does not exist.
Switch: ALL_BUILD.vcxproj

D:\ppl.nn\pplnn-build>cmake --build . --target install -j --config Release
Microsoft (R) Build Engine version 16.10.2+857e5a733 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

MSBUILD : error MSB1009: The project file does not exist.
Switch: install.vcxproj

D:\ppl.nn\pplnn-build>cd ..
PS D:\ppl.nn>
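The key line is "Does not match the platform used previously": the hpcc sub-build fetched under pplnn-build still caches the earlier x64 configuration. Per the error message's own advice, removing the stale build tree before reconfiguring for Win32 should clear it:

Remove-Item -Recurse -Force pplnn-build
.\build.bat -G "Visual Studio 16 2019" -A Win32 -DHPCC_USE_X86_64=ON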

centernet runs with memory error.

My GPU is a Tesla T4, and the sample model runs normally.
When I use CenterNet with --mm-policy=mem, it reports an error like the one below, but it still produces an output.
(screenshot of the error output not included)
When I use --mm-policy=perf, it gets an out-of-memory error like this:
(screenshot of the error output not included)
Both runs seem to end with memory errors. Is this error familiar to your team, or how can I avoid it?

No module named 'pyppl'

Ubuntu 16.04.6
cmake: 3.16.6
Anaconda virtual environment: PyTorch 1.7 + cu101

First, git clone https://github.com/openppl-public/ppl.nn.git

Then ./build.sh -DHPCC_USE_X86_64=ON -DPPLNN_ENABLE_PYTHON_API=ON

Finally, run the command PYTHONPATH=./pplnn-build/install python ./samples/python/maskrcnn_onnx/run_maskrcnn_onnx.py

It raises the error:
(screenshot showing: ModuleNotFoundError: No module named 'pyppl')
How can I get Python to find the module named 'pyppl'?
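pyppl is a compiled extension placed somewhere under the install tree, so the interpreter must be pointed at the directory that actually contains the pyppl package. A minimal sketch, assuming it landed under pplnn-build/install/lib (locate the real path first, e.g. with find pplnn-build/install -name "pyppl*"):

import sys

# make the directory containing the 'pyppl' package importable;
# adjust the path to wherever your build actually installed it
sys.path.insert(0, "./pplnn-build/install/lib")

from pyppl import nn as pplnn  # import style used by the bundled samples
print(pplnn)  # should now import without ModuleNotFoundError

If the package sits elsewhere (some builds put it directly under install), point PYTHONPATH or sys.path there instead.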
