NOS (torch-nos) is a fast and flexible PyTorch inference server, specifically designed for optimizing and running inference of popular foundational AI models.
- 👩‍💻 Easy-to-use: Built for PyTorch and designed to optimize, serve and auto-scale PyTorch models in production without compromising on developer experience.
- 🥷 Flexible: Run and serve several foundational AI models (Stable Diffusion, CLIP, Whisper) in a single place.
- 🔌 Pluggable: Plug your front-end to NOS with out-of-the-box high-performance gRPC/REST APIs, avoiding all kinds of ML model deployment hassles.
- 🚀 Scalable: Optimize and scale models easily for maximum HW performance without a PhD in ML, distributed systems or infrastructure.
- 📦 Extensible: Easily hack and add custom models, optimizations, and HW-support in a Python-first environment.
- ⚙️ HW-accelerated: Take full advantage of your underlying HW (GPUs, ASICs) without compromise.
- ☁️ Cloud-agnostic: Run on any cloud HW (AWS, GCP, Azure, Lambda Labs, On-Prem) with our ready-to-use inference server containers.
NOS inherits its name from Nitrous Oxide System, the performance-enhancing system typically used in racing cars. NOS is designed to be modular and easy to extend.
- [Jan 2024] ✍️ [blog] Serving LLMs on a budget with SkyServe.
- [Jan 2024] 📚 [docs] NOS x SkyPilot Integration page!
- [Jan 2024] ✍️ [blog] The "Getting started with NOS" tutorials are available!
- [Dec 2023] 🛝 [repo] We open-sourced the NOS playground to help you get started with more examples built on NOS!
NOS provides an OpenAI-compatible server with streaming support so that you can connect your favorite LLM client.
gRPC API ⚡:

from nos.client import Client
client = Client("[::]:50051")
model = client.Module("meta-llama/Llama-2-7b-chat-hf")
response = model.chat(message="Tell me a story of 1000 words with emojis", _stream=True)

REST API:

curl \
  -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "messages": [{"role": "user", "content": "Tell me a story of 1000 words with emojis"}],
    "temperature": 0.7, "stream": true
  }'
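When `"stream": true` is set, an OpenAI-compatible endpoint normally answers with server-sent events, one `data: {...}` line per chunk and a final `data: [DONE]`. The sketch below shows one way a client could reassemble the streamed text. It assumes NOS follows that OpenAI chunk layout (`choices[].delta.content`); the helper name `iter_stream_text` and the sample lines are hypothetical and are not part of the NOS API.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank keep-alive lines and anything that is not a data event.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The OpenAI streaming format ends the stream with a literal [DONE].
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical sample of what a streamed response body might look like.
sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Once upon "}}]}',
    'data: {"choices": [{"delta": {"content": "a time"}}]}',
    "data: [DONE]",
]
story = "".join(iter_stream_text(sample_lines))
print(story)
```

In a real client, `lines` would be the decoded lines of the HTTP response body rather than a hard-coded list.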
Build MidJourney Discord bots in seconds.
gRPC API ⚡:

from nos.client import Client
client = Client("[::]:50051")
sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
image, = sdxl(prompts=["hippo with glasses in a library, cartoon styling"],
              width=1024, height=1024, num_images=1)

REST API:

curl \
  -X POST http://localhost:8000/v1/infer \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "stabilityai/stable-diffusion-xl-base-1-0",
    "inputs": {
      "prompts": ["hippo with glasses in a library, cartoon styling"],
      "width": 1024,
      "height": 1024,
      "num_images": 1
    }
  }'
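The same REST call can be issued from Python without extra dependencies. The sketch below only builds the request object, mirroring the URL, header and JSON body of the curl command. The helper `build_infer_request` and its `base_url` default are hypothetical names introduced for illustration; the optional `method` key follows the CLIP example in this README.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_infer_request(
    model_id: str,
    inputs: Dict[str, Any],
    method: Optional[str] = None,
    base_url: str = "http://localhost:8000",  # assumed local server address
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /v1/infer endpoint."""
    body: Dict[str, Any] = {"model_id": model_id, "inputs": inputs}
    # Only include "method" when a specific model method is requested.
    if method is not None:
        body["method"] = method
    return urllib.request.Request(
        url=f"{base_url}/v1/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_infer_request(
    "stabilityai/stable-diffusion-xl-base-1-0",
    {
        "prompts": ["hippo with glasses in a library, cartoon styling"],
        "width": 1024,
        "height": 1024,
        "num_images": 1,
    },
)

# Sending needs a running NOS server, so it is left commented out here:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test before a server is available.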
Build scalable semantic search of images/videos in minutes.
gRPC API ⚡:

from nos.client import Client
client = Client("[::]:50051")
clip = client.Module("openai/clip-vit-base-patch32")
txt_vec = clip.encode_text(texts=["fox jumped over the moon"])

REST API:

curl \
  -X POST http://localhost:8000/v1/infer \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "openai/clip-vit-base-patch32",
    "method": "encode_text",
    "inputs": {
      "texts": ["fox jumped over the moon"]
    }
  }'
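Once text and image embeddings are available, semantic search usually comes down to ranking candidates by cosine similarity against the query vector. The sketch below illustrates that step with toy 3-dimensional vectors. It assumes embeddings arrive as plain lists of floats and that image vectors were computed separately; the file names, vectors and helper names are made up for illustration and are not NOS output.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy stand-ins for a text embedding and three image embeddings.
txt_vec = [1.0, 0.0, 1.0]
image_vecs = {
    "fox.jpg": [0.9, 0.1, 0.8],
    "car.jpg": [0.0, 1.0, 0.0],
    "moon.jpg": [0.2, 0.0, 1.0],
}
ranking = rank_by_similarity(txt_vec, image_vecs)
print([name for name, _ in ranking])
```

For large collections, a vector index would replace this linear scan, but the scoring idea stays the same.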
Perform real-time audio transcription using Whisper.
gRPC API ⚡:

from pathlib import Path
from nos.client import Client
client = Client("[::]:50051")
model = client.Module("openai/whisper-small.en")
with client.UploadFile(Path("audio.wav")) as remote_path:
    response = model(path=remote_path)
    # {"chunks": ...}

REST API:

curl \
  -X POST http://localhost:8000/v1/infer/file \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'model_id=openai/whisper-small.en' \
  -F 'file=@audio.wav'
Run classical computer-vision tasks in 2 lines of code.
gRPC API ⚡:

from PIL import Image
from nos.client import Client
client = Client("[::]:50051")
model = client.Module("yolox/medium")
response = model(images=[Image.open("image.jpg")])

REST API:

curl \
  -X POST http://localhost:8000/v1/infer/file \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'model_id=yolox/medium' \
  -F 'file=@image.jpg'
Want to run models not supported by NOS? You can easily add your own models following the examples in the NOS Playground.
- Tutorials
- Quickstart
- Models
- Concepts: Architecture Overview, ModelSpec, ModelManager, Runtime Environments
- Demos: Building a Discord Image Generation Bot, Video Search Demo
This project is licensed under the Apache-2.0 License.
NOS collects anonymous usage data using Sentry. This helps us understand how the community is using NOS and prioritize features. You can opt out of telemetry by setting NOS_TELEMETRY_ENABLED=0.
We welcome contributions! Please see our contributing guide for more information.
- 💬 Send us an email at [email protected] or join our Discord for help.
- 📣 Follow us on Twitter and LinkedIn to keep up-to-date on our products.