The nos from vkantchev

Website | Docs | Blog | Discord

What is NOS?

NOS (torch-nos) is a fast and flexible PyTorch inference server, specifically designed for optimizing and running inference of popular foundational AI models.

Why use NOS?

👩‍💻 Easy-to-use: Built for PyTorch and designed to optimize, serve and auto-scale PyTorch models in production without compromising on developer experience.
🥷 Flexible: Run and serve several foundational AI models (Stable Diffusion, CLIP, Whisper) in a single place.
🔌 Pluggable: Plug your front-end to NOS with out-of-the-box high-performance gRPC/REST APIs, avoiding all kinds of ML model deployment hassles.
🚀 Scalable: Optimize and scale models easily for maximum HW performance without a PhD in ML, distributed systems or infrastructure.
📦 Extensible: Easily hack and add custom models, optimizations, and HW-support in a Python-first environment.
⚙️ HW-accelerated: Take full advantage of your underlying HW (GPUs, ASICs) without compromise.
☁️ Cloud-agnostic: Run on any cloud HW (AWS, GCP, Azure, Lambda Labs, On-Prem) with our ready-to-use inference server containers.

NOS inherits its name from Nitrous Oxide System, the performance-enhancing system typically used in racing cars. NOS is designed to be modular and easy to extend.

🔥 What's New

[Jan 2024] ✍️ [blog] Serving LLMs on a budget with SkyServe.
[Jan 2024] 📚 [docs] NOS x SkyPilot Integration page!
[Jan 2024] ✍️ [blog] Getting started with NOS tutorials are now available!
[Dec 2023] 🛝 [repo] We open-sourced the NOS playground to help you get started with more examples built on NOS!

What can NOS do?

💬 Chat / LLM Agents (ChatGPT-as-a-Service)

NOS provides an OpenAI-compatible server with streaming support so that you can connect your favorite LLM client.

gRPC API ⚡

from nos.client import Client

client = Client("[::]:50051")

model = client.Module("meta-llama/Llama-2-7b-chat-hf")
response = model.chat(message="Tell me a story of 1000 words with emojis", _stream=True)

REST API

curl \
-X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "messages": [{"role": "user", "content": "Tell me a story of 1000 words with emojis"}],
    "temperature": 0.7, "stream": true
  }'
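
Because the server is described as OpenAI-compatible, a streamed reply to the REST request above would plausibly arrive as server-sent events in the OpenAI chat-completions chunk format. The sketch below assumes that format (lines of `data: {json}` ending with `data: [DONE]`). The helper name `iter_stream_text` and the sample lines are illustrative and not part of NOS.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming lines.

    Assumption: each event line looks like `data: {...}` and the stream
    ends with `data: [DONE]`, as in the OpenAI chat-completions API.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not an event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical sample of what a streamed response might contain.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Once upon "}}]}',
    'data: {"choices": [{"delta": {"content": "a time"}}]}',
    "data: [DONE]",
]
story = "".join(iter_stream_text(sample))
```

In practice the lines would come from the HTTP response body of the streaming request rather than from a hard-coded list.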

🏞️ Image Generation (Stable-Diffusion-as-a-Service)

Build MidJourney discord bots in seconds.

gRPC API ⚡

from nos.client import Client

client = Client("[::]:50051")

sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
image, = sdxl(prompts=["hippo with glasses in a library, cartoon styling"],
              width=1024, height=1024, num_images=1)

REST API


curl \
-X POST http://localhost:8000/v1/infer \
-H 'Content-Type: application/json' \
-d '{
    "model_id": "stabilityai/stable-diffusion-xl-base-1-0",
    "inputs": {
        "prompts": ["hippo with glasses in a library, cartoon styling"],
        "width": 1024,
        "height": 1024,
        "num_images": 1
    }
}'
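
The same REST call can be assembled from Python with only the standard library. This is a minimal sketch that mirrors the JSON body of the curl command above. The helper name `build_infer_request` and the `base_url` default are assumptions for illustration, and the optional `method` field follows the CLIP REST example in this README. The response schema is not described here, so the sketch stops at building the request.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_infer_request(
    model_id: str,
    inputs: Dict[str, Any],
    method: Optional[str] = None,
    base_url: str = "http://localhost:8000",  # assumed local server address
) -> urllib.request.Request:
    """Build a POST request for the /v1/infer endpoint (hypothetical helper)."""
    body: Dict[str, Any] = {"model_id": model_id, "inputs": inputs}
    if method is not None:
        body["method"] = method  # e.g. "encode_text" for CLIP
    return urllib.request.Request(
        url=f"{base_url}/v1/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_infer_request(
    "stabilityai/stable-diffusion-xl-base-1-0",
    {
        "prompts": ["hippo with glasses in a library, cartoon styling"],
        "width": 1024,
        "height": 1024,
        "num_images": 1,
    },
)

# Sending it requires a running NOS server:
# with urllib.request.urlopen(request) as resp:
#     raw_body = resp.read()
```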

🧠 Text & Image Embedding (CLIP-as-a-Service)

Build scalable semantic search of images/videos in minutes.

gRPC API ⚡

from nos.client import Client

client = Client("[::]:50051")

clip = client.Module("openai/clip-vit-base-patch32")
txt_vec = clip.encode_text(texts=["fox jumped over the moon"])

REST API


curl \
-X POST http://localhost:8000/v1/infer \
-H 'Content-Type: application/json' \
-d '{
    "model_id": "openai/clip-vit-base-patch32",
    "method": "encode_text",
    "inputs": {
        "texts": ["fox jumped over the moon"]
    }
}'
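
Once text and image embeddings are available, semantic search reduces to ranking candidates by similarity to the query vector. The sketch below uses cosine similarity in pure Python on toy vectors. The function names are hypothetical, and the three-dimensional vectors only stand in for real CLIP embeddings, whose exact return type is not specified here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(candidates)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy vectors standing in for a text embedding and three image embeddings.
query_vec = [1.0, 0.0, 0.0]
image_vecs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
ranking = rank_by_similarity(query_vec, image_vecs)
```

For large collections a vector index would replace the linear scan, but the scoring idea stays the same.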

🎙️ Audio Transcription (Whisper-as-a-Service)

Perform real-time audio transcription using Whisper.

gRPC API ⚡

from pathlib import Path
from nos.client import Client

client = Client("[::]:50051")

model = client.Module("openai/whisper-small.en")
with client.UploadFile(Path("audio.wav")) as remote_path:
    response = model(path=remote_path)
# {"chunks": ...}

REST API


curl \
-X POST http://localhost:8000/v1/infer/file \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'model_id=openai/whisper-small.en' \
-F 'file=@audio.wav'

🧐 Object Detection (YOLOX-as-a-Service)

Run classical computer-vision tasks in 2 lines of code.

gRPC API ⚡

from PIL import Image
from nos.client import Client

client = Client("[::]:50051")

model = client.Module("yolox/medium")
response = model(images=[Image.open("image.jpg")])

REST API


curl \
-X POST http://localhost:8000/v1/infer/file \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'model_id=yolox/medium' \
-F 'file=@image.jpg'

⚒️ Custom models

Want to run models not supported by NOS? You can easily add your own models following the examples in the NOS Playground.

📚 Documentation

Tutorials
Quickstart
Models
Concepts: Architecture Overview, ModelSpec, ModelManager, Runtime Environments
Demos: Building a Discord Image Generation Bot, Video Search Demo

📄 License

This project is licensed under the Apache-2.0 License.

📡 Telemetry

NOS collects anonymous usage data using Sentry. This is used to help us understand how the community is using NOS and to help us prioritize features. You can opt-out of telemetry by setting NOS_TELEMETRY_ENABLED=0.
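
When NOS is launched from a Python script, the opt-out can be applied by passing an adjusted environment to the child process. This is a small sketch: the helper name `env_with_telemetry_disabled` is hypothetical, and only the variable name `NOS_TELEMETRY_ENABLED` comes from the text above.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_telemetry_disabled(
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of an environment with NOS telemetry switched off."""
    env = dict(os.environ if base is None else base)
    env["NOS_TELEMETRY_ENABLED"] = "0"
    return env


# Example with an explicit base environment.
env = env_with_telemetry_disabled({"PATH": "/usr/bin"})
```

The resulting dictionary can be handed to `subprocess.run(..., env=env)`. Which process reads the variable (client, server container, or both) is not stated here, so setting it for every process that runs NOS is the cautious choice.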

🤝 Contributing

We welcome contributions! Please see our contributing guide for more information.

🔗 Quick Links

💬 Send us an email at [email protected] or join our Discord for help.
📣 Follow us on Twitter and LinkedIn to keep up-to-date on our products.
