amperecomputingai / ampere_model_library

AML's goal is to make benchmarking of various AI architectures on Ampere CPUs a pleasurable experience :)

Home Page: https://hub.docker.com/u/amperecomputingai

License: Apache License 2.0

Python 94.49% Jupyter Notebook 4.61% Shell 0.90%
aarch64 ampere arm64 armv8-a artificial-intelligence computer-vision inference machine-learning mlperf-inference model-zoo natural-language-processing onnxruntime pytorch tensorflow dlrm large-language-models llama2 yolov8

ampere_model_library's Introduction

Ampere Model Library

AML's goal is to make benchmarking of various AI architectures on Ampere CPUs a pleasurable experience :)

This means we want the library to be quick to set up and to get you the numbers you are interested in. On top of that, we want the code to be readable and well structured, so it's easy to inspect exactly what is being measured. If you feel we are not there yet, please let us know right away by raising an issue! Thank you :)

AML setup

Visit our Docker Hub (https://hub.docker.com/u/amperecomputingai) for our selection of frameworks.

sudo apt update && sudo apt install -y docker.io
sudo docker run --privileged=true -it amperecomputingai/pytorch:latest
# we also offer onnxruntime and tensorflow

You should see the Ampere Docker welcome prompt in your terminal.

Now, inside the Docker container, run:

git clone --recursive https://github.com/AmpereComputingAI/ampere_model_library.git
cd ampere_model_library
bash setup_deb.sh
source set_env_variables.sh

You are good to go! 👌

Examples

The go-to solution is the benchmark.py script.

The benchmark script lets you quickly evaluate the performance of your Ampere system using the following models:

  • ResNet-50 v1.5
  • Whisper medium EN
  • DLRM
  • BERT large
  • YOLO v8s

It's incredibly user-friendly and designed to assist you in getting the best out of your system.

After completing setup with Ampere Optimized PyTorch (see AML setup), it's as easy as:

python3 benchmark.py --no-interactive  # remove --no-interactive if you want a quick estimation of performance

Evaluation results

Running particular AI architectures

Architectures are categorized by the task they were originally envisioned for. Therefore, you will find ResNet and VGG under computer_vision and BERT under natural_language_processing. The usual workflow is to first set up AML (see AML setup), source the environment variables by running source set_env_variables.sh, and then run run.py or a similarly named Python file in the directory of the architecture you want to benchmark. Some models require additional setup steps to be completed first; these should be described in the README.md file in their respective directories.
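To see which entry points exist for a given task category, a small helper along these lines may be handy. It is only a sketch: the function name find_run_scripts and the run*.py glob pattern are assumptions made for illustration (the pattern is meant to catch run.py and files such as run_mlperf.py), not part of AML itself.

```python
from pathlib import Path


def find_run_scripts(repo_root, category):
    """List run.py-style files under one task category, relative to repo_root.

    The "run*.py" pattern is an assumption; adjust it if a model directory
    names its entry point differently.
    """
    root = Path(repo_root)
    category_dir = root / category
    # rglob searches the category directory recursively.
    return sorted(p.relative_to(root) for p in category_dir.rglob("run*.py"))


if __name__ == "__main__":
    # Run from the root of the cloned ampere_model_library repository.
    for script in find_run_scripts(".", "computer_vision"):
        print(script)
```

Any script it lists should still be checked against the README.md in its directory for additional setup steps.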

ResNet-50 v1.5

ResNet-50 architecture

Note that the example uses PyTorch - we recommend using Ampere Optimized PyTorch for best results (see AML setup).

source set_env_variables.sh
IGNORE_DATASET_LIMITS=1 AIO_IMPLICIT_FP16_TRANSFORM_FILTER=".*" AIO_NUM_THREADS=32 python3 computer_vision/classification/resnet_50_v15/run.py -m resnet50 -p fp32 -b 16 -f pytorch

The command above runs the model with 32 threads and a batch size of 16. Implicit conversion to the FP16 datatype is applied; to stay in fp32 precision, leave the AIO_IMPLICIT_FP16_TRANSFORM_FILTER variable unset.

PSA: you can adjust the verbosity of AIO debug messages by setting AIO_DEBUG_MODE to a value from 0 to 4 (where 0 is the most peaceful).
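If you launch runs from a Python script rather than the shell, the environment variables described above can be assembled programmatically. The sketch below is illustrative only: build_resnet50_run is a hypothetical helper, not an AML function. The variable names, values, and command-line flags are copied from the ResNet-50 command in this section.

```python
import os
import shlex


def build_resnet50_run(num_threads=32, batch_size=16, implicit_fp16=True,
                       debug_mode=None, base_env=None):
    """Return (env, argv) mirroring the ResNet-50 command shown above."""
    env = dict(os.environ if base_env is None else base_env)
    env["IGNORE_DATASET_LIMITS"] = "1"
    env["AIO_NUM_THREADS"] = str(num_threads)
    if implicit_fp16:
        # ".*" is the filter value used in the command above.
        env["AIO_IMPLICIT_FP16_TRANSFORM_FILTER"] = ".*"
    else:
        # Leaving the variable unset keeps the run in fp32 precision.
        env.pop("AIO_IMPLICIT_FP16_TRANSFORM_FILTER", None)
    if debug_mode is not None:
        if debug_mode not in range(5):
            raise ValueError("AIO_DEBUG_MODE must be between 0 and 4")
        env["AIO_DEBUG_MODE"] = str(debug_mode)
    argv = [
        "python3", "computer_vision/classification/resnet_50_v15/run.py",
        "-m", "resnet50", "-p", "fp32", "-b", str(batch_size), "-f", "pytorch",
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = build_resnet50_run(implicit_fp16=False, debug_mode=0)
    print(shlex.join(argv))
    # To launch for real, from the repository root after AML setup:
    #   import subprocess; subprocess.run(argv, env=env, check=True)
```

Keeping the environment in a dictionary makes it easy to compare an fp32 run with an implicit-FP16 run by toggling a single argument.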

Whisper tiny EN

Whisper architecture

Note that the example uses PyTorch - we recommend using Ampere Optimized PyTorch for best results (see AML setup).

source set_env_variables.sh
AIO_IMPLICIT_FP16_TRANSFORM_FILTER=".*" AIO_NUM_THREADS=32 python3 speech_recognition/whisper/run.py -m tiny.en

The command above runs the model with 32 threads. Implicit conversion to the FP16 datatype is applied; to stay in fp32 precision, leave the AIO_IMPLICIT_FP16_TRANSFORM_FILTER variable unset.

LLaMA2 7B

Transformer vs LLaMA

Note that the example uses PyTorch - we recommend using Ampere Optimized PyTorch for best results (see AML setup).

Before running this example, you need to be granted access to the LLaMA2 model by Meta. See the Meta and Hugging Face (HF) pages to learn more.

source set_env_variables.sh
wget https://github.com/tloen/alpaca-lora/raw/main/alpaca_data.json
AIO_IMPLICIT_FP16_TRANSFORM_FILTER=".*" AIO_NUM_THREADS=32 python3 natural_language_processing/text_generation/llama2/run.py -m meta-llama/Llama-2-7b-chat-hf --dataset_path=alpaca_data.json

The command above runs the model with 32 threads. Implicit conversion to the FP16 datatype is applied; to stay in fp32 precision, leave the AIO_IMPLICIT_FP16_TRANSFORM_FILTER variable unset.

YOLO v8 large

YOLO object detection

Note that the example uses PyTorch - we recommend using Ampere Optimized PyTorch for best results (see AML setup).

source set_env_variables.sh
wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8l.pt
AIO_IMPLICIT_FP16_TRANSFORM_FILTER=".*" AIO_NUM_THREADS=32 python3 computer_vision/object_detection/yolo_v8/run.py -m yolov8l.pt -p fp32 -f pytorch

The command above runs the model with 32 threads. Implicit conversion to the FP16 datatype is applied; to stay in fp32 precision, leave the AIO_IMPLICIT_FP16_TRANSFORM_FILTER variable unset.

BERT large

BERT embeddings

Note that the example uses PyTorch - we recommend using Ampere Optimized PyTorch for best results (see AML setup).

source set_env_variables.sh
wget -O bert_large_mlperf.pt "https://zenodo.org/records/3733896/files/model.pytorch?download=1"
AIO_IMPLICIT_FP16_TRANSFORM_FILTER=".*" AIO_NUM_THREADS=32 python3 natural_language_processing/extractive_question_answering/bert_large/run_mlperf.py -m bert_large_mlperf.pt -p fp32 -f pytorch

The command above runs the model with 32 threads. Implicit conversion to the FP16 datatype is applied; to stay in fp32 precision, leave the AIO_IMPLICIT_FP16_TRANSFORM_FILTER variable unset.
