
TLCBench

Benchmark scripts for TVM

Content

  • Requirements
  • Intel CPU
  • Nvidia GPU

Requirements

Tested with:
  • TVM commit id: 91e07e1f3a7 (Feb. 5, 2021)
  • mxnet==1.7.0
  • gluonnlp==0.10.0

Intel CPU

Results on AWS c5.9xlarge (Intel Xeon Platinum 8124M @ 3.00 GHz, 18 cores)

  • AutoTVM
-------------------------------------------------------------
Network Name       Batch size   Mean Inference Time (std dev)
-------------------------------------------------------------
resnet_50          1            5.40 ms             (0.08 ms)
mobilenet_v2       1            1.33 ms             (0.05 ms)
bert               1            31.31 ms            (0.11 ms)
-------------------------------------------------------------
  • AutoScheduler
-------------------------------------------------------------
Network Name       Batch size   Mean Inference Time (std dev)
-------------------------------------------------------------
resnet_50          1            5.30 ms             (0.05 ms)
mobilenet_v2       1            0.91 ms             (0.02 ms)
bert               1            16.52 ms            (0.16 ms)
-------------------------------------------------------------

Benchmark All Networks

The following commands read pre-tuned logs from the directory saved_logs/latest and benchmark the latency of all networks. A simplified sketch of the flow inside these benchmark scripts follows the commands.

  • Commands for AutoTVM
python3 benchmark_autotvm.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest
  • Commands for AutoScheduler
python3 benchmark_autoscheduler.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest
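
For reference, the AutoTVM benchmark flow looks roughly like the sketch below. This is a simplified sketch, not the exact code in benchmark_autotvm.py: get_network is a hypothetical helper, the input name "data" and the log path are assumptions, and some module names differ across TVM versions (graph_executor was called graph_runtime in older releases).

import numpy as np
import tvm
from tvm import relay, autotvm
from tvm.contrib import graph_executor  # tvm.contrib.graph_runtime on older TVM versions

# Hypothetical helper: returns a Relay module, params, and input shape for a network.
mod, params, input_shape = get_network("resnet_50", batch_size=1)

target = "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
log_file = "saved_logs/latest/resnet_50.log"  # assumed log path

# Apply the best pre-tuned schedules from the log while compiling.
with autotvm.apply_history_best(log_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)

dev = tvm.cpu()
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("data", np.random.uniform(size=input_shape).astype("float32"))  # input name assumed

# Measure mean latency and standard deviation over repeated runs.
ftimer = module.module.time_evaluator("run", dev, number=100, repeat=3)
prof_res = np.array(ftimer().results) * 1000  # seconds -> milliseconds
print("Mean: %.2f ms (std %.2f ms)" % (np.mean(prof_res), np.std(prof_res)))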

Benchmark One Network

The following commands read pre-tuned logs from directory saved_logs/latest and benchmark the latency for one network. You can replace "resnet_50" below with "mobilenet_v2" or "bert".

  • Commands for AutoTVM
python3 benchmark_autotvm.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest
  • Commands for AutoScheduler
python3 benchmark_autoscheduler.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"  --logdir saved_logs/latest

Tuning

The following commands perform auto-tuning for one or all networks and save the tuning logs to the directory tmp_logs. After tuning, you can benchmark with these logs by rerunning the benchmark commands above with the last argument replaced by --logdir tmp_logs. An illustrative sketch of the auto-scheduler tuning flow follows the commands.

  • Commands for AutoTVM
# Tune one network
python3 tune_autotvm.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
# Tune all networks
python3 tune_autotvm.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
  • Commands for AutoScheduler
# Tune one network
python3 tune_autoscheduler.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
# Tune all networks
python3 tune_autoscheduler.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
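
Auto-scheduler tuning roughly follows the flow below. This is an illustrative sketch, not the exact contents of tune_autoscheduler.py; get_network is a hypothetical helper, and the log path and trial budget are assumptions.

import tvm
from tvm import relay, auto_scheduler

mod, params, _ = get_network("resnet_50", batch_size=1)  # hypothetical helper
target = "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
log_file = "tmp_logs/resnet_50.json"  # assumed log path

# Extract tuning tasks (and their weights) from the Relay program.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)

# Jointly tune all tasks and append the measurement records to the log file.
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=20000,  # assumed budget; the scripts may use a different value
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
)
tuner.tune(tune_option)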

Nvidia GPU

Results on AWS g4dn.4xlarge (NVIDIA T4)

  • AutoTVM
-------------------------------------------------------------
Network Name       Batch size   Mean Inference Time (std dev)
-------------------------------------------------------------
resnet_50          1            3.54 ms             (0.02 ms)
mobilenet_v2       1            0.74 ms             (0.00 ms)
bert               1            89.06 ms            (1.22 ms)
-------------------------------------------------------------
  • AutoScheduler
-------------------------------------------------------------
Network Name       Batch size   Mean Inference Time (std dev)
-------------------------------------------------------------
resnet_50          1            2.90 ms             (0.01 ms)
mobilenet_v2       1            0.57 ms             (0.00 ms)
bert               1            9.95 ms             (0.01 ms)
-------------------------------------------------------------

Benchmark All Networks

The following commands read pre-tuned logs from the directory saved_logs/latest and benchmark the latency of all networks. A sketch of how auto-scheduler logs are applied during compilation follows the commands.

  • Commands for AutoTVM
python3 benchmark_autotvm.py --network all --target "cuda -model=t4" --logdir saved_logs/latest
  • Commands for AutoScheduler
python3 benchmark_autoscheduler.py --network all --target "cuda -model=t4" --logdir saved_logs/latest
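
For auto-scheduler logs, the records are replayed at compile time roughly as sketched below. This is a simplified sketch, not the exact code in benchmark_autoscheduler.py; get_network is a hypothetical helper and the input name and log path are assumptions.

import numpy as np
import tvm
from tvm import relay, auto_scheduler
from tvm.contrib import graph_executor  # tvm.contrib.graph_runtime on older TVM versions

mod, params, input_shape = get_network("resnet_50", batch_size=1)  # hypothetical helper
target = "cuda -model=t4"
log_file = "saved_logs/latest/resnet_50.json"  # assumed log path

# Replay the best auto-scheduler records and enable the auto-scheduler codegen path.
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)

dev = tvm.cuda()  # tvm.gpu(0) on older TVM versions
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("data", np.random.uniform(size=input_shape).astype("float32"))  # input name assumed
ftimer = module.module.time_evaluator("run", dev, number=100, repeat=3)
print("Mean latency: %.2f ms" % (np.mean(ftimer().results) * 1000))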

Benchmark One Network

The following commands read pre-tuned logs from directory saved_logs/latest and benchmark the latency for one network. You can replace "resnet_50" below with "mobilenet_v2" or "bert".

  • Commands for AutoTVM
python3 benchmark_autotvm.py --network resnet_50 --target "cuda -model=t4" --logdir saved_logs/latest
  • Commands for AutoScheduler
python3 benchmark_autoscheduler.py --network resnet_50 --target "cuda -model=t4"  --logdir saved_logs/latest

Tuning

The following commands perform auto-tuning for one or all networks and save the tuning logs to the directory tmp_logs. After tuning, you can benchmark with these logs by rerunning the benchmark commands above with the last argument replaced by --logdir tmp_logs. An illustrative sketch of the AutoTVM tuning flow follows the commands.

  • Commands for AutoTVM
# Tune one network
python3 tune_autotvm.py --network resnet_50 --target "cuda -model=t4"
# Tune all networks
python3 tune_autotvm.py --network all --target "cuda -model=t4"
  • Commands for AutoScheduler
# Tune one network
python3 tune_autoscheduler.py --network resnet_50 --target "cuda -model=t4"
# Tune all networks
python3 tune_autoscheduler.py --network all --target "cuda -model=t4"
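
AutoTVM tuning proceeds per task, roughly as sketched below. This is illustrative only; get_network is a hypothetical helper, and the tuner choice, trial count, and log path are assumptions rather than the exact settings in tune_autotvm.py.

from tvm import autotvm
from tvm.autotvm.tuner import XGBTuner

mod, params, _ = get_network("resnet_50", batch_size=1)  # hypothetical helper
target = "cuda -model=t4"
log_file = "tmp_logs/resnet_50.log"  # assumed log path

# Extract tunable tasks (conv2d, dense, ...) from the Relay program.
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=1, min_repeat_ms=100),
)

for task in tasks:
    tuner = XGBTuner(task)
    tuner.tune(
        n_trial=min(1000, len(task.config_space)),  # assumed budget
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file(log_file)],
    )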


tlcbench's Issues

LLVM Version

Hi,

I was trying to reproduce the performance results on a g4dn.4xlarge instance but got the following error:

~/TLCBench$ python3 benchmark_autoscheduler.py --network bert --target "cuda -model=t4"  --logdir saved_logs/latest
Benchmark bert-B1-float32 ...
python3: /tmp/build/80754af9/llvmdev_1560805331146/work/lib/ExecutionEngine/MCJIT/MCJIT.cpp:204: virtual void llvm::MCJIT::generateCodeForModule(llvm::Module*): Assertion `M->getDataLayout() == getDataLayout() && "DataLayout Mismatch"' failed.
Aborted (core dumped)

My speculation is that the issue is caused by an LLVM version incompatibility. Would you mind sharing the LLVM version that you are using? I have been using LLVM 10.

Roadmap for a Reproducible TVM Benchmark

Motivation

Currently, TVM lacks an up-to-date and reproducible benchmark. The only existing benchmark is hosted at tvm/apps/benchmark, but it is outdated and has several flaws.

  1. The results were obtained two years ago.
  2. The deep learning models are old; it does not include newer models (e.g., BERT, EfficientNet).
  3. The input format is TVM's internal Relay format; it does not accept models from high-level frameworks (e.g., PyTorch, MXNet) or an open exchange format (e.g., ONNX).
  4. It does not cover Intel CPUs.
  5. It only provides pre-tuned configurations from TopHub, but not the scripts used to generate them.

This repo aims to build a new, open, and reproducible benchmark for TVM. When the repo is ready, we can run evaluation nightly and auto-tuning weekly or monthly.

Approach

As a first step, we target three models, three hardware platforms, and four code generation strategies.
To make comparison with other frameworks easier, we choose ONNX as the input model format.

  • models: resnet-50, mobilenet v2 and BERT with batch size 1
  • hardware platforms: NVIDIA GPU, Intel CPU, ARM CPU
  • code generation strategies: autotvm, auto-scheduler, tvm + manual library, ONNX-runtime.

All logs generated during auto-tuning should be uploaded for future reference.

Roadmap

Task 1: Add autotvm benchmark

reference: the old autotvm benchmark

  • Implement auto-tuning scripts by following the tutorials
  • Implement evaluation scripts by following the old benchmark
  • Use ONNX as the input format by following the frontend tutorials. You can find models in the ONNX model zoo or other reliable sources; a rough sketch of the import step is shown below.
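
As a rough illustration of the ONNX import step into Relay (the model file name, input name, and shape below are placeholders that depend on the chosen model):

import onnx
from tvm import relay

onnx_model = onnx.load("resnet50-v2-7.onnx")  # assumed model file from the ONNX model zoo
shape_dict = {"data": (1, 3, 224, 224)}       # input name and shape depend on the model
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)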

Task 2: Add auto-scheduler benchmark

  • Implement auto-tuning scripts by following the tutorials
  • Implement evaluation scripts by following the old autotvm benchmark

Task 3: Add ONNX-runtime benchmark

reference: https://github.com/microsoft/onnxruntime
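
A minimal sketch of an ONNX Runtime latency measurement, assuming a local ONNX model file and an input shape that match the chosen model (both are placeholders):

import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("resnet50-v2-7.onnx")  # assumed model file
input_name = sess.get_inputs()[0].name
data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")  # shape depends on the model

# Warm up, then time repeated runs.
for _ in range(10):
    sess.run(None, {input_name: data})
times = []
for _ in range(100):
    start = time.perf_counter()
    sess.run(None, {input_name: data})
    times.append((time.perf_counter() - start) * 1000)
print("Mean: %.2f ms (std %.2f ms)" % (np.mean(times), np.std(times)))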

Task 4: Add tvm + manual library benchmark

reference: https://tvm.apache.org/docs/tutorials/frontend/using_external_lib.html
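
For the manual-library path, TVM can offload selected operators to vendor libraries through target options, roughly as in the sketch below (get_network is a hypothetical helper; whether cuDNN/cuBLAS kernels are actually used depends on how TVM was built and on which operators match):

import tvm
from tvm import relay

mod, params, _ = get_network("resnet_50", batch_size=1)  # hypothetical helper
# Append -libs to route matching conv2d/dense ops to cuDNN/cuBLAS.
target = "cuda -model=t4 -libs=cudnn,cublas"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)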
