
dl-infer-perf

Performance analysis of deep learning inference over the PyTorch and TensorFlow front ends with the TensorRT, XLA, and TVM optimizers.

Environments

TVM

docker: nvidia/cuda:11.1.1-devel-ubuntu18.04

compile TVM with LLVM (clang+llvm-11.0.1-x86_64-linux-gnu-ubuntu-16.04)

XLA

docker: nvcr.io/nvidia/tensorflow:20.07-tf2-py3

TensorRT

docker: nvcr.io/nvidia/tensorrt:19.09-py3
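
Each environment can be started as a container along the following lines (a sketch, assuming the NVIDIA Container Toolkit is installed; the mount path is illustrative):

docker run --gpus all -it --rm -v $(pwd):/scratch/dl-infer-perf nvcr.io/nvidia/tensorrt:19.09-py3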

Candidates

model

  • vgg16
  • mobilenet
  • resnet50
  • inception

optimizer

  • TVM
  • XLA
  • TensorRT

front-end (DL framework)

  • ONNX
  • PyTorch
  • TensorFlow

Status per optimizer and front end (#N refers to a tracked issue below, ✅ means working):

            ONNX    PyTorch    TensorFlow
baseline
TVM         #4      ✅ 1.4     ✅ 1.12
XLA         -       -
TensorRT    #5      -

Usage

Run per optimizer & front end

usage: python3 infer_perf/<script.py> <model_name> <options>

e.g. run TensorFlow resnet50 with XLA enabled:
python3 infer_perf/to_xla.py resnet50 --xla

Run various jobs together

usage: python3 executor <job.json> --report <output.csv>

e.g.
python3 executor torch2tvm.json --report result.csv
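
A job file is a JSON description of the runs to execute. A minimal sketch, with field names inferred from the executor log quoted in the issues below (the exact schema is an assumption):

[
    {
        "name": "torch2tvm",
        "model": "mobilenet",
        "batch_size": 2,
        "params": {"backend": "cuda"}
    }
]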

Run with benchmark server

usage: python3 executor <job.json> --server <bm_server>

Framework

Code Format

yapf infer_perf/*.py -i --style yapf.style


dl-infer-perf's Issues

pytorch->tvm cuda oom

Getting an error when running multiple tasks together via the executor:

name:torch2tvm model:mobilenet batch_size:2 params:{'backend': 'cuda'}
Traceback (most recent call last):
  [bt] (8) /scratch/tvm/build/libtvm.so(TVMFuncCall+0x5f) [0x7f1aab82354f]
  [bt] (7) /scratch/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::relay::backend::RelayBuildModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x3a0) [0x7f1aab64d600]
  [bt] (6) /scratch/tvm/build/libtvm.so(tvm::relay::backend::RelayBuildModule::BuildRelay(tvm::IRModule, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, tvm::runtime::NDArray> > > const&)+0x1d2e) [0x7f1aab64c4de]
  [bt] (5) /scratch/tvm/build/libtvm.so(tvm::build(tvm::Map<tvm::runtime::String, tvm::IRModule, void, void> const&, tvm::Target const&)+0xdf) [0x7f1aab0b716f]
  [bt] (4) /scratch/tvm/build/libtvm.so(tvm::build(tvm::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)+0x584) [0x7f1aab0b6824]
  [bt] (3) /scratch/tvm/build/libtvm.so(tvm::codegen::Build(tvm::IRModule, tvm::Target)+0x62f) [0x7f1aab1561bf]
  [bt] (2) /scratch/tvm/build/libtvm.so(tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::IRModule, tvm::Target)>::AssignTypedLambda<tvm::runtime::Module (*)(tvm::IRModule, tvm::Target)>(tvm::runtime::Module (*)(tvm::IRModule, tvm::Target), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const+0x298) [0x7f1aab15b458]
  [bt] (1) /scratch/tvm/build/libtvm.so(tvm::codegen::BuildCUDA(tvm::IRModule, tvm::Target)+0x2be) [0x7f1aab797f5e]
  [bt] (0) /scratch/tvm/build/libtvm.so(+0x1a40fdb) [0x7f1aab81ffdb]
  File "/scratch/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/scratch/tvm/python/tvm/autotvm/measure/measure_methods.py", line 676, in tvm_callback_cuda_compile
    ptx = nvcc.compile_cuda(code, target=target, arch=AutotvmGlobalScope.current.cuda_target_arch)
  File "/scratch/tvm/python/tvm/contrib/nvcc.py", line 92, in compile_cuda
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

but the same job runs fine standalone: python3 infer_perf/torch2tvm.py mobilenet --batch=2 --size=2
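
The OSError comes from fork() failing (ENOMEM) when TVM shells out to nvcc after several builds have already run in the same executor process. One hedged workaround, sketched here and not part of the repo, is to launch each job as its own Python process so memory is released between jobs (the flags mirror the command above):

# sketch: run each benchmark job in a fresh process so memory is freed between jobs
import subprocess

jobs = [
    ["python3", "infer_perf/torch2tvm.py", "mobilenet", "--batch=2", "--size=2"],
    ["python3", "infer_perf/torch2tvm.py", "resnet50", "--batch=2", "--size=2"],
]
for cmd in jobs:
    subprocess.run(cmd, check=True)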

tf->tvm cuda mobilenet fail

python infer_perf/tf2tvm.py mobilenet --backend=cuda --size=256

Traceback (most recent call last):
  File "infer_perf/tf2tvm.py", line 69, in <module>
    duration = util.simple_bench(runner, args.size)
  File "/scratch/dl-infer-perf/infer_perf/util.py", line 7, in simple_bench
    runner(data_size)
  File "infer_perf/tf2tvm.py", line 48, in runner
    module.run()
  File "/scratch/tvm/python/tvm/contrib/graph_runtime.py", line 206, in run
    self._run()
  File "/scratch/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (3) /scratch/tvm/build/libtvm.so(TVMFuncCall+0x5f) [0x7f6b518cc54f]
  [bt] (2) /scratch/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::detail::PackFuncVoidAddr_<4, tvm::runtime::CUDAWrappedFunc>(tvm::runtime::CUDAWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0xb6) [0x7f6b5197f5d6]
  [bt] (1) /scratch/tvm/build/libtvm.so(tvm::runtime::CUDAWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void**) const+0x7f7) [0x7f6b5197f4c7]
  [bt] (0) /scratch/tvm/build/libtvm.so(+0x1af35e2) [0x7f6b5197b5e2]
  File "/scratch/tvm/src/runtime/cuda/cuda_module.cc", line 105
  File "/scratch/tvm/src/runtime/library_module.cc", line 78
TVMError:
---------------------------------------------------------------
An internal invariant was violated during the execution of TVM.
Please read TVM's error reporting guidelines.
More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
---------------------------------------------------------------

  Check failed: ret == 0 (-1 vs. 0) : CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX
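
CUDA_ERROR_INVALID_PTX at cuModuleLoadData usually means the PTX was generated for a compute capability that does not match the GPU or driver. One possible knob in this TVM version, given that the nvcc callback in tracebacks above reads AutotvmGlobalScope.current.cuda_target_arch, is to pin the CUDA architecture before building (a sketch; the helper's location and the sm_70 value are assumptions to adjust for the actual GPU):

# sketch: pin the arch used when TVM invokes nvcc (pick the arch of the actual GPU)
from tvm.autotvm.measure.measure_methods import set_cuda_target_arch

set_cuda_target_arch("sm_70")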

onnx->trt fail on mobilenet

root@ubuntu1804-lts-base:/scratch/dev/dev/dl-infer-perf# python3 infer_perf/onnx2trt.py mobilenet
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
  File "infer_perf/onnx2trt.py", line 80, in <module>
    runner = onnx2trt_runner(args.model, batch_size=args.batch)
  File "infer_perf/onnx2trt.py", line 60, in onnx2trt_runner
    context = engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'
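
"Network must have at least one output" usually means the ONNX parse failed silently, so no engine gets built and engine is None. A sketch for surfacing the parser errors with the TensorRT Python API shipped in this container (the file name and the explicit-batch flag are illustrative, not the repo's code):

# sketch: check ONNX parser errors instead of assuming a valid network/engine
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("mobilenet.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed; no output was marked on the network")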

onnx->tvm fails on mobilenet/inception/resnet50

(tvm-onnx-env) root@ubuntu1804-lts-base /s/dl-infer-perf# python infer_perf/onnx2tvm.py resnet50 --batch 64
Cannot find config for target=cuda -keys=cuda,gpu -max_num_threads=1024 -model=unknown -thread_warp_size=32, workload=('dense_small_batch.cuda', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
Traceback (most recent call last):
  File "infer_perf/onnx2tvm.py", line 70, in <module>
    backend=args.backend)
  File "infer_perf/onnx2tvm.py", line 42, in onnx2tvm_runner
    module.set_input(input_name, data)
  File "/scratch/tvm/python/tvm/contrib/graph_runtime.py", line 182, in set_input
    v.copyfrom(value)
  File "/scratch/tvm/python/tvm/runtime/ndarray.py", line 147, in copyfrom
    source_array.shape, shape
ValueError: array shape do not match the shape of NDArray (64, 3, 224, 224) vs (1, 3, 224, 224)
(tvm-onnx-env) root@ubuntu1804-lts-base /s/dl-infer-perf# python infer_perf/onnx2tvm.py resnet50 --batch 64
Traceback (most recent call last):
  File "infer_perf/onnx2tvm.py", line 70, in <module>
    backend=args.backend)
  File "infer_perf/onnx2tvm.py", line 38, in onnx2tvm_runner
    lib = relay.build(mod, target, params=params)
  File "/scratch/tvm/python/tvm/relay/build_module.py", line 269, in build
    graph_json, mod, params = bld_mod.build(mod, target, target_host, params)
  File "/scratch/tvm/python/tvm/relay/build_module.py", line 132, in build
    self._build(mod, target, target_host)
  File "/scratch/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /scratch/tvm/build/libtvm.so(tvm::relay::backend::RelayBuildModule::Optimize(tvm::IRModule, tvm::Map<tvm::Integer, tvm::Target, void, void> const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, tvm::runtime::NDArray> > > const&)+0xeb2) [0x7fa63d6ae172]
  [bt] (7) /scratch/tvm/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule) const+0x69) [0x7fa63cb27d09]
  [bt] (6) /scratch/tvm/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x30b) [0x7fa63cc51f7b]
  [bt] (5) /scratch/tvm/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x24e) [0x7fa63cc51ebe]
  [bt] (4) /scratch/tvm/build/libtvm.so(tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x1b7) [0x7fa63cc52b07]
  [bt] (3) /scratch/tvm/build/libtvm.so(+0x1827e0d) [0x7fa63d66de0d]
  [bt] (2) /scratch/tvm/build/libtvm.so(tvm::relay::TypeInferencer::Infer(tvm::GlobalVar, tvm::relay::Function)+0x67) [0x7fa63d66d0e7]
  [bt] (1) /scratch/tvm/build/libtvm.so(tvm::relay::TypeSolver::Solve()+0x1348) [0x7fa63d50be98]
  [bt] (0) /scratch/tvm/build/libtvm.so(+0x16c1a12) [0x7fa63d507a12]
  [bt] (8) /scratch/tvm/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x30b) [0x7fa63cc51f7b]
  [bt] (7) /scratch/tvm/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x24e) [0x7fa63cc51ebe]
  [bt] (6) /scratch/tvm/build/libtvm.so(tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x1b7) [0x7fa63cc52b07]
  [bt] (5) /scratch/tvm/build/libtvm.so(+0x1827e0d) [0x7fa63d66de0d]
  [bt] (4) /scratch/tvm/build/libtvm.so(tvm::relay::TypeInferencer::Infer(tvm::GlobalVar, tvm::relay::Function)+0x67) [0x7fa63d66d0e7]
  [bt] (3) /scratch/tvm/build/libtvm.so(tvm::relay::TypeSolver::Solve()+0x37a) [0x7fa63d50aeca]
  [bt] (2) /scratch/tvm/build/libtvm.so(tvm::runtime::TypedPackedFunc<bool (tvm::runtime::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>::AssignTypedLambda<bool (*)(tvm::runtime::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>(bool (*)(tvm::runtime::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const+0x4cc) [0x7fa63d12728c]
  [bt] (1) /scratch/tvm/build/libtvm.so(tvm::relay::ReshapeRel(tvm::runtime::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)+0x614) [0x7fa63d41f504]
  [bt] (0) /scratch/tvm/build/libtvm.so(+0x15a7b22) [0x7fa63d3edb22]
  File "/scratch/tvm/src/relay/analysis/type_solver.cc", line 624
TVMError:
---------------------------------------------------------------
An internal invariant was violated during the execution of TVM.
Please read TVM's error reporting guidelines.
More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
---------------------------------------------------------------
  Check failed: false == false: [15:24:46] /scratch/tvm/src/relay/op/tensor/transform.cc:701:
---------------------------------------------------------------
An internal invariant was violated during the execution of TVM.
Please read TVM's error reporting guidelines.
More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
---------------------------------------------------------------

  Check failed: oshape_sum == data_shape_sum (2048 vs. 131072) : Input tensor shape and reshaped shape are not compatible
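
Both tracebacks point at the batch dimension: the exported ONNX graph appears to fix the input (and a downstream reshape) at batch size 1, so feeding --batch 64 trips either the NDArray shape check or the ReshapeRel check. A sketch of pinning the batch size at import time, assuming the input tensor is named "input" and uses NCHW 224x224 (both assumptions about the exported model):

# sketch: make the Relay input shape match the batch fed at run time
import onnx
from tvm import relay

batch = 64
onnx_model = onnx.load("resnet50.onnx")  # illustrative path
mod, params = relay.frontend.from_onnx(onnx_model, shape={"input": (batch, 3, 224, 224)})

If the graph itself hardcodes the reshape target (the 2048 vs. 131072 mismatch above), the model would need to be re-exported with a dynamic or matching batch dimension.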
