
TLCBench

Benchmark scripts for TVM

Content

  • Requirements
  • Intel CPU
  • Nvidia GPU

Requirements

Tested with:
  • TVM commit id: 91e07e1f3a7 (Feb. 5, 2021)
  • mxnet==1.7.0
  • gluonnlp==0.10.0

Intel CPU

Results on AWS c5.9xlarge (Intel Xeon Platinum 8124M @ 3.00 GHz, 18 cores)

  • AutoTVM
-------------------------------------------------------------
Network Name       Batch size   Mean Inference Time (std dev)
-------------------------------------------------------------
resnet_50          1            5.40 ms             (0.08 ms)
mobilenet_v2       1            1.33 ms             (0.05 ms)
bert               1            31.31 ms            (0.11 ms)
-------------------------------------------------------------
  • AutoScheduler
-------------------------------------------------------------
Network Name       Batch size   Mean Inference Time (std dev)
-------------------------------------------------------------
resnet_50          1            5.30 ms             (0.05 ms)
mobilenet_v2       1            0.91 ms             (0.02 ms)
bert               1            16.52 ms            (0.16 ms)
-------------------------------------------------------------

Benchmark All Networks

The following commands read pre-tuned logs from the directory saved_logs/latest and benchmark the latency of all networks. A simplified sketch of the flow inside these benchmark scripts follows the commands.

  • Commands for AutoTVM
python3 benchmark_autotvm.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest
  • Commands for AutoScheduler
python3 benchmark_autoscheduler.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest
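
For reference, the AutoTVM benchmark flow looks roughly like the sketch below. This is a simplified sketch, not the exact code in benchmark_autotvm.py: get_network is a hypothetical helper, the input name "data" and the log path are assumptions, and some module names differ across TVM versions (graph_executor was called graph_runtime in older releases).

import numpy as np
import tvm
from tvm import relay, autotvm
from tvm.contrib import graph_executor  # tvm.contrib.graph_runtime on older TVM versions

# Hypothetical helper: returns a Relay module, params, and input shape for a network.
mod, params, input_shape = get_network("resnet_50", batch_size=1)

target = "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
log_file = "saved_logs/latest/resnet_50.log"  # assumed log path

# Apply the best pre-tuned schedules from the log while compiling.
with autotvm.apply_history_best(log_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)

dev = tvm.cpu()
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("data", np.random.uniform(size=input_shape).astype("float32"))  # input name assumed

# Measure mean latency and standard deviation over repeated runs.
ftimer = module.module.time_evaluator("run", dev, number=100, repeat=3)
prof_res = np.array(ftimer().results) * 1000  # seconds -> milliseconds
print("Mean: %.2f ms (std %.2f ms)" % (np.mean(prof_res), np.std(prof_res)))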

Benchmark One Network

The following commands read pre-tuned logs from directory saved_logs/latest and benchmark the latency for one network. You can replace "resnet_50" below with "mobilenet_v2" or "bert".

  • Commands for AutoTVM
python3 benchmark_autotvm.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest
  • Commands for AutoScheduler
python3 benchmark_autoscheduler.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"  --logdir saved_logs/latest

Tuning

The following commands perform auto-tuning for one or all networks and save the tuning logs to the directory tmp_logs. After tuning, you can benchmark with these logs by rerunning the benchmark commands above with the last argument replaced by --logdir tmp_logs. An illustrative sketch of the auto-scheduler tuning flow follows the commands.

  • Commands for AutoTVM
# Tune one network
python3 tune_autotvm.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
# Tune all networks
python3 tune_autotvm.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
  • Commands for AutoScheduler
# Tune one network
python3 tune_autoscheduler.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
# Tune all networks
python3 tune_autoscheduler.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
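
Auto-scheduler tuning roughly follows the flow below. This is an illustrative sketch, not the exact contents of tune_autoscheduler.py; get_network is a hypothetical helper, and the log path and trial budget are assumptions.

import tvm
from tvm import relay, auto_scheduler

mod, params, _ = get_network("resnet_50", batch_size=1)  # hypothetical helper
target = "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
log_file = "tmp_logs/resnet_50.json"  # assumed log path

# Extract tuning tasks (and their weights) from the Relay program.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)

# Jointly tune all tasks and append the measurement records to the log file.
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=20000,  # assumed budget; the scripts may use a different value
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
)
tuner.tune(tune_option)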

Nvidia GPU

Results on AWS g4dn.4xlarge (NVIDIA T4)

  • AutoTVM
-------------------------------------------------------------
Network Name       Batch size   Mean Inference Time (std dev)
-------------------------------------------------------------
resnet_50          1            3.54 ms             (0.02 ms)
mobilenet_v2       1            0.74 ms             (0.00 ms)
bert               1            89.06 ms            (1.22 ms)
-------------------------------------------------------------
  • AutoScheduler
-------------------------------------------------------------
Network Name       Batch size   Mean Inference Time (std dev)
-------------------------------------------------------------
resnet_50          1            2.90 ms             (0.01 ms)
mobilenet_v2       1            0.57 ms             (0.00 ms)
bert               1            9.95 ms             (0.01 ms)
-------------------------------------------------------------

Benchmark All Networks

The following commands read pre-tuned logs from the directory saved_logs/latest and benchmark the latency of all networks. A sketch of how auto-scheduler logs are applied during compilation follows the commands.

  • Commands for AutoTVM
python3 benchmark_autotvm.py --network all --target "cuda -model=t4" --logdir saved_logs/latest
  • Commands for AutoScheduler
python3 benchmark_autoscheduler.py --network all --target "cuda -model=t4" --logdir saved_logs/latest
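
For auto-scheduler logs, the records are replayed at compile time roughly as sketched below. This is a simplified sketch, not the exact code in benchmark_autoscheduler.py; get_network is a hypothetical helper and the input name and log path are assumptions.

import numpy as np
import tvm
from tvm import relay, auto_scheduler
from tvm.contrib import graph_executor  # tvm.contrib.graph_runtime on older TVM versions

mod, params, input_shape = get_network("resnet_50", batch_size=1)  # hypothetical helper
target = "cuda -model=t4"
log_file = "saved_logs/latest/resnet_50.json"  # assumed log path

# Replay the best auto-scheduler records and enable the auto-scheduler codegen path.
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)

dev = tvm.cuda()  # tvm.gpu(0) on older TVM versions
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("data", np.random.uniform(size=input_shape).astype("float32"))  # input name assumed
ftimer = module.module.time_evaluator("run", dev, number=100, repeat=3)
print("Mean latency: %.2f ms" % (np.mean(ftimer().results) * 1000))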

Benchmark One Network

The following commands read pre-tuned logs from directory saved_logs/latest and benchmark the latency for one network. You can replace "resnet_50" below with "mobilenet_v2" or "bert".

  • Commands for AutoTVM
python3 benchmark_autotvm.py --network resnet_50 --target "cuda -model=t4" --logdir saved_logs/latest
  • Commands for AutoScheduler
python3 benchmark_autoscheduler.py --network resnet_50 --target "cuda -model=t4"  --logdir saved_logs/latest

Tuning

The following commands perform auto-tuning for one or all networks and save the tuning logs to the directory tmp_logs. After tuning, you can benchmark with these logs by rerunning the benchmark commands above with the last argument replaced by --logdir tmp_logs. An illustrative sketch of the AutoTVM tuning flow follows the commands.

  • Commands for AutoTVM
# Tune one network
python3 tune_autotvm.py --network resnet_50 --target "cuda -model=t4"
# Tune all networks
python3 tune_autotvm.py --network all --target "cuda -model=t4"
  • Commands for AutoScheduler
# Tune one network
python3 tune_autoscheduler.py --network resnet_50 --target "cuda -model=t4"
# Tune all networks
python3 tune_autoscheduler.py --network all --target "cuda -model=t4"
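
AutoTVM tuning proceeds per task, roughly as sketched below. This is illustrative only; get_network is a hypothetical helper, and the tuner choice, trial count, and log path are assumptions rather than the exact settings in tune_autotvm.py.

from tvm import autotvm
from tvm.autotvm.tuner import XGBTuner

mod, params, _ = get_network("resnet_50", batch_size=1)  # hypothetical helper
target = "cuda -model=t4"
log_file = "tmp_logs/resnet_50.log"  # assumed log path

# Extract tunable tasks (conv2d, dense, ...) from the Relay program.
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=1, min_repeat_ms=100),
)

for task in tasks:
    tuner = XGBTuner(task)
    tuner.tune(
        n_trial=min(1000, len(task.config_space)),  # assumed budget
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file(log_file)],
    )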


tlcbench's Issues

LLVM Version

Hi,

I was trying to reproduce the performance results on a g4dn.4xlarge instance but got the following error:

~/TLCBench$ python3 benchmark_autoscheduler.py --network bert --target "cuda -model=t4"  --logdir saved_logs/latest
Benchmark bert-B1-float32 ...
python3: /tmp/build/80754af9/llvmdev_1560805331146/work/lib/ExecutionEngine/MCJIT/MCJIT.cpp:204: virtual void llvm::MCJIT::generateCodeForModule(llvm::Module*): Assertion `M->getDataLayout() == getDataLayout() && "DataLayout Mismatch"' failed.
Aborted (core dumped)

My speculation is that the issue is caused by an LLVM version incompatibility. Would you mind sharing the LLVM version that you are using? I have been using LLVM 10.

Roadmap for a Reproducible TVM Benchmark

Motivation

Currently, TVM lacks an up-to-date and reproducible benchmark. The only existing benchmark is hosted at tvm/apps/benchmark, but it is outdated and has several flaws.

  1. The results were obtained two years ago.
  2. The deep learning models are old; it does not include newer models (e.g., BERT, EfficientNet).
  3. The input format is TVM's internal Relay format; it does not accept models from high-level frameworks (e.g., PyTorch, MXNet) or an open exchange format (e.g., ONNX).
  4. It does not cover Intel CPUs.
  5. It only provides pre-tuned configurations from TopHub, but not the scripts used to generate them.

This repo aims to build a new, open, and reproducible benchmark for TVM. When the repo is ready, we can run evaluation nightly and auto-tuning weekly or monthly.

Approach

As a first step, we target three models, three hardware platforms, and four code generation strategies.
To make comparison with other frameworks easier, we choose ONNX as the input model format.

  • models: resnet-50, mobilenet v2 and BERT with batch size 1
  • hardware platforms: NVIDIA GPU, Intel CPU, ARM CPU
  • code generation strategies: autotvm, auto-scheduler, tvm + manual library, ONNX-runtime.

All logs generated during auto-tuning should be uploaded for future reference.

Roadmap

Task 1: Add autotvm benchmark

reference: the old autotvm benchmark

  • Implement auto-tuning scripts by following the tutorials
  • Implement evaluation scripts by following the old benchmark
  • Use ONNX as the input format by following the frontend tutorials. You can find models in the ONNX model zoo or other reliable sources; a rough sketch of the import step is shown below.
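
As a rough illustration of the ONNX import step into Relay (the model file name, input name, and shape below are placeholders that depend on the chosen model):

import onnx
from tvm import relay

onnx_model = onnx.load("resnet50-v2-7.onnx")  # assumed model file from the ONNX model zoo
shape_dict = {"data": (1, 3, 224, 224)}       # input name and shape depend on the model
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)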

Task 2: Add auto-scheduler benchmark

  • Implement auto-tuning scripts by following the tutorials
  • Implement evaluation scripts by following the old autotvm benchmark

Task 3: Add ONNX-runtime benchmark

reference: https://github.com/microsoft/onnxruntime
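
A minimal sketch of an ONNX Runtime latency measurement, assuming a local ONNX model file and an input shape that match the chosen model (both are placeholders):

import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("resnet50-v2-7.onnx")  # assumed model file
input_name = sess.get_inputs()[0].name
data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")  # shape depends on the model

# Warm up, then time repeated runs.
for _ in range(10):
    sess.run(None, {input_name: data})
times = []
for _ in range(100):
    start = time.perf_counter()
    sess.run(None, {input_name: data})
    times.append((time.perf_counter() - start) * 1000)
print("Mean: %.2f ms (std %.2f ms)" % (np.mean(times), np.std(times)))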

Task 4: Add tvm + manual library benchmark

reference: https://tvm.apache.org/docs/tutorials/frontend/using_external_lib.html
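
For the manual-library path, TVM can offload selected operators to vendor libraries through target options, roughly as in the sketch below (get_network is a hypothetical helper; whether cuDNN/cuBLAS kernels are actually used depends on how TVM was built and on which operators match):

import tvm
from tvm import relay

mod, params, _ = get_network("resnet_50", batch_size=1)  # hypothetical helper
# Append -libs to route matching conv2d/dense ops to cuDNN/cuBLAS.
target = "cuda -model=t4 -libs=cudnn,cublas"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)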
