This repository contains the libraries below. They can be installed independently of each other.
- ort_moe: Mixture of Experts implementation in PyTorch
- torch_ort: ONNX Runtime package that accelerates PyTorch models
- torch_ort_infer: ONNX Runtime package that accelerates inference for PyTorch models
The Mixture of Experts layer implementation is available in the ort_moe folder.
- ort_moe/docs/moe.md provides a brief overview of the implementation.
- A simple MoE tutorial is provided here.
- Note: ONNX Runtime (and the prerequisites listed below) is not required to run the MoE layer. It is integrated with stand-alone PyTorch.
cd ort_moe
pip install build # Install PyPA build
python -m build
ONNX Runtime for PyTorch accelerates PyTorch model training using ONNX Runtime.
This repository contains the source code for the package, as well as instructions for running the package.
You need a machine with at least one NVIDIA or AMD GPU to run ONNX Runtime for PyTorch.
You can install and run torch-ort in your local environment, or with Docker.
By default, torch-ort depends on PyTorch 1.9.0, ONNX Runtime 1.9.0 and CUDA 10.2.
- Install CUDA 10.2
- Install CuDNN 7.6
- Install torch-ort
  pip install torch-ort
- Run post-installation script for ORTModule
  python -m torch_ort.configure
Get install instructions for other combinations in the Get Started Easily section at https://www.onnxruntime.ai/ under the Optimize Training tab.
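To confirm which versions ended up in the environment after installation, a small check like the one below can help. It is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical, and the distribution names `torch` and `torch-ort` are assumptions about what pip installed; the ONNX Runtime distribution name varies by build, so it is left out here.

```python
from importlib import metadata


def installed_versions(distributions):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to your environment.
    for name, version in installed_versions(["torch", "torch-ort"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing the printed versions against the defaults listed above is a quick way to spot a mismatched combination before training.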
- Clone this repo
  git clone git@github.com:pytorch/ort.git
- Install extra dependencies
  pip install wget pandas scikit-learn transformers
- Run the training script
  python ./ort/tests/bert_for_sequence_classification.py
from torch_ort import ORTModule
model = ORTModule(model)
# PyTorch training script follows
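When the same training script must also run on machines without torch-ort installed, the wrapping step can be made optional. The sketch below relies only on the `ORTModule` import shown above; the helper name `maybe_wrap_with_ort` and its `use_ort` flag are hypothetical.

```python
def maybe_wrap_with_ort(model, use_ort=True):
    """Wrap `model` in ORTModule when requested and available, else return it unchanged."""
    if not use_ort:
        return model
    try:
        # Imported lazily so the script still starts without torch-ort.
        from torch_ort import ORTModule
    except ImportError:
        print("torch-ort is not installed; falling back to plain PyTorch.")
        return model
    return ORTModule(model)
```

Calling `model = maybe_wrap_with_ort(model)` would then take the place of the unconditional `ORTModule(model)` line, and the rest of the training script stays the same.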
import torch
from torch_ort.optim import FusedAdam
class NeuralNet(torch.nn.Module):
    ...

# Only GPU is supported currently.
device = "cuda"
model = NeuralNet(...).to(device)

ort_fused_adam_optimizer = FusedAdam(
    model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=0.01, eps=1e-8
)
loss = model(...).sum()
loss.backward()
ort_fused_adam_optimizer.step()
ort_fused_adam_optimizer.zero_grad()
For detailed documentation see FusedAdam
For a full working example see FusedAdam Test Example
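For orientation, the hyperparameters passed to `FusedAdam` above (`lr`, `betas`, `eps`, `weight_decay`) carry the same names as in the textbook Adam update. The sketch below applies that update to a single scalar parameter in plain Python, assuming decoupled (AdamW-style) weight decay. It only illustrates what the hyperparameters mean; it is not the fused kernel, and the weight decay mode that `FusedAdam` actually uses is described in its documentation. The function name `adam_step` is hypothetical.

```python
import math


def adam_step(param, grad, m, v, step, lr=1e-3, betas=(0.9, 0.999),
              weight_decay=0.01, eps=1e-8):
    """One textbook Adam update for a scalar; returns (param, m, v)."""
    beta1, beta2 = betas
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)
    # Decoupled weight decay (an assumption here), then the Adam step.
    param = param - lr * weight_decay * param
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


param, m, v = 1.0, 0.0, 0.0
for step in range(1, 4):
    grad = 2 * param  # gradient of the toy loss param ** 2
    param, m, v = adam_step(param, grad, m, v, step)
    print(step, param)
```

A fused optimizer performs this kind of per-parameter arithmetic for many tensors in a small number of GPU kernel launches, which is where the speed-up over a per-tensor loop comes from.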
import torch
from torch.utils.data import DataLoader
from torch_ort.utils.data import LoadBalancingDistributedSampler

class MyDataset(torch.utils.data.Dataset):
    ...

def collate_fn(data):
    ...
    return samples, label_list

samples = [...]
labels = [...]
dataset = MyDataset(samples, labels)

# complexity_fn, model, loss_fn and optimizer are assumed to be defined elsewhere.
data_sampler = LoadBalancingDistributedSampler(
    dataset, complexity_fn=complexity_fn, world_size=2, rank=0, shuffle=False
)
train_dataloader = DataLoader(dataset, batch_size=2, sampler=data_sampler, collate_fn=collate_fn)

for batched_data, batched_labels in train_dataloader:
    optimizer.zero_grad()
    loss = loss_fn(model(batched_data), batched_labels)
    loss.backward()
    optimizer.step()
For detailed documentation see LoadBalancingDistributedSampler
For a full working example see LoadBalancingDistributedSampler Test Example
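The idea behind balancing load by sample complexity can be sketched in plain Python. The version below sorts sample indices by a complexity score and deals them out round-robin, so that at each position every rank receives samples of similar cost. This is a simplified illustration under that assumption, not the actual algorithm in `LoadBalancingDistributedSampler`. The function name `balanced_indices` is hypothetical, and string length stands in for complexity.

```python
def balanced_indices(samples, complexity_fn, world_size, rank):
    """Return the sample indices assigned to `rank` after sorting by complexity."""
    # Sort indices from cheapest to most expensive sample.
    order = sorted(range(len(samples)), key=lambda i: complexity_fn(samples[i]))
    # Deal sorted indices round-robin: neighbours in the sorted order, which have
    # similar complexity, land on different ranks at the same position.
    return order[rank::world_size]


samples = ["a", "bbbbbb", "cc", "ddddd", "eee", "ffff"]
for rank in range(2):
    print(rank, balanced_indices(samples, len, world_size=2, rank=rank))
```

With two ranks, rank 0 receives the samples of length 1, 3 and 5 and rank 1 those of length 2, 4 and 6, so neither rank waits long for the other at any step.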
To see torch-ort in action, see https://github.com/microsoft/onnxruntime-training-examples, which shows you how to train the most popular HuggingFace models.
ONNX Runtime for PyTorch is now extended to support PyTorch model inference using ONNX Runtime.
It is available via the torch-ort-infer Python package. This preview package enables OpenVINO™ Execution Provider for ONNX Runtime by default for accelerating inference on various Intel® CPUs, Intel® integrated GPUs, and Intel® Movidius™ Vision Processing Units (VPUs).
This repository contains the source code for the package, as well as instructions for running the package.
- Ubuntu 18.04, 20.04
- Python* 3.7, 3.8 or 3.9
By default, torch-ort-infer depends on PyTorch 1.12 and ONNX Runtime OpenVINO EP 1.12.
- Install torch-ort-infer with OpenVINO dependencies.
  pip install torch-ort-infer[openvino]
- Run post-installation script
  python -m torch_ort.configure
To see OpenVINO™ integration with Torch-ORT in action, see the demos, which show you how to run inference on some of the most popular Deep Learning models.
from torch_ort import ORTInferenceModule
model = ORTInferenceModule(model)
# PyTorch inference script follows
Users can configure different options for a given Execution Provider to run inference. As an example, OpenVINO™ Execution Provider options can be configured as shown below:
from torch_ort import ORTInferenceModule, OpenVINOProviderOptions
provider_options = OpenVINOProviderOptions(backend = "GPU", precision = "FP16")
model = ORTInferenceModule(model, provider_options = provider_options)
# PyTorch inference script follows
If the user specifies no provider options, OpenVINO™ Execution Provider is enabled with the following options by default:
backend = "CPU"
precision = "FP32"
For more details on APIs, see usage.md.
Note: Support for Intel® MyriadX VPU is experimental in this preview.
Below is an example of how you can leverage OpenVINO™ integration with Torch-ORT in a simple NLP use case. A pretrained BERT model fine-tuned on the CoLA dataset from the HuggingFace model hub is used to predict grammar correctness on a given input text.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import numpy as np
from torch_ort import ORTInferenceModule
tokenizer = AutoTokenizer.from_pretrained(
"textattack/bert-base-uncased-CoLA")
model = AutoModelForSequenceClassification.from_pretrained(
"textattack/bert-base-uncased-CoLA")
# Wrap model in ORTInferenceModule to prepare the model for inference using OpenVINO Execution Provider on CPU
model = ORTInferenceModule(model)
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
# Post processing
logits = output.logits
logits = logits.detach().cpu().numpy()
# predictions
pred = np.argmax(logits, axis=1).flatten()
print("Grammar correctness label (0=unacceptable, 1=acceptable)")
print(pred)
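If a confidence score is wanted alongside the predicted label, the logits can be passed through a softmax. The sketch below does this in plain Python on a hard-coded, made-up pair of logits so that it runs without the model; in the example above the values would come from `logits[0].tolist()`. The helper name `softmax` and the `LABELS` list are illustrative, following the 0=unacceptable, 1=acceptable mapping printed above.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


LABELS = ["unacceptable", "acceptable"]  # index matches the predicted class id

example_logits = [-1.2, 2.3]  # made-up values for illustration only
probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(f"{LABELS[best]} (p={probs[best]:.3f})")
```

The index of the largest probability is the same as the `np.argmax` result on the raw logits, so this only adds a score and does not change the prediction.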
This project has an MIT license, as found in the LICENSE file.