autonomi-ai / nos

⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud, or on AI hardware.
Home Page: https://docs.nos.run/
License: Apache License 2.0
InferenceClient("localhost:50051") takes significantly longer to establish a connection, with WaitForServer() taking over 30 seconds. The default InferenceClient("[::]:50051"), however, connects near-instantly (< 1 ms).
cc @mkornacker
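To narrow down where the delay comes from, a raw TCP probe can time how long each endpoint takes to accept a connection, independent of gRPC. A minimal sketch — the `wait_for_server` helper below is hypothetical and is not nos's actual `WaitForServer()` implementation:

```python
import socket
import time

def wait_for_server(address: str, timeout: float = 30.0) -> float:
    """Hypothetical readiness probe for timing comparisons only --
    NOT nos's actual WaitForServer() implementation.
    Returns seconds elapsed until a TCP connect to host:port succeeds."""
    host, _, port = address.rpartition(":")
    start = time.perf_counter()
    deadline = start + timeout
    while time.perf_counter() < deadline:
        try:
            with socket.create_connection((host or "localhost", int(port)), timeout=0.5):
                return time.perf_counter() - start
        except OSError:
            time.sleep(0.05)  # server not accepting yet; retry shortly
    raise TimeoutError(f"no server reachable at {address} within {timeout}s")
```

Running this against the same port with `localhost` vs. an explicit IP would show whether the gap is in name resolution or in the gRPC channel setup itself.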
Should provide any metadata Marcel needs for writing a UDF.
Similar to #17 but for conda environments
Documentation should be inline.
We currently build a large Docker image (11 GB) as the base GPU image.
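One common way to shrink such images is a multi-stage build that installs heavy dependencies in a `devel` builder stage and copies only the resulting packages into a slimmer `runtime` stage. A rough sketch — the base-image tags, package list, and paths below are illustrative, not nos's actual Dockerfile:

```dockerfile
# Builder stage: full CUDA devel image with compilers (illustrative tags)
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip \
    && rm -rf /var/lib/apt/lists/*
# Install into an isolated prefix so it can be copied wholesale
RUN pip3 install --no-cache-dir --prefix=/install torch torchvision

# Runtime stage: slimmer CUDA runtime image, no compilers or build caches
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
COPY --from=builder /install /usr/local
```

The `devel`-to-`runtime` split alone typically saves several GB, since toolchains and apt/pip caches never reach the final image.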
Use gRPC-Gateway to serve a REST API via a reverse proxy, with buf support and OpenAPI v2 integration.
References:
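With buf, the gateway stubs and the OpenAPI v2 spec can come out of a single generation config. A sketch of a possible `buf.gen.yaml` — plugin references, versions, and output paths here are illustrative assumptions:

```yaml
# Illustrative buf.gen.yaml -- not nos's actual config
version: v1
plugins:
  - plugin: buf.build/grpc-ecosystem/gateway
    out: gen/go
    opt: paths=source_relative
  - plugin: buf.build/grpc-ecosystem/openapiv2
    out: gen/openapiv2
```

Running `buf generate` would then emit both the reverse-proxy handlers and a swagger/OpenAPI v2 JSON for the REST surface.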
Benchmark decorators with a separate make test-benchmark target.
Create publicly accessible MkDocs for nos
Register models as part of the nos hub registry, with full build-time and runtime spec.
@hub.register(
    name="<org_name>/detection2d-detr-resnet-50",
    build_spec=DevelopmentConfig(
        conda="autonomi-ai/nos-base-dev",  # build-time conda env
        resources=ResourceConfig(cpu=8, memory="8Gi", gpu=0.25, gpu_memory="4Gi"),  # build-time resources
    ),
    runtime_spec=RuntimeConfig(
        conda="autonomi-ai/nos-base-runtime",  # runtime conda env
        resources=ResourceConfig(cpu=2, memory="4Gi", gpu=0.25, gpu_memory="4Gi"),  # runtime resources
    ),
)
nos serve -m stability-ai/stable-diffusion-v2: Serve optimized nos model (blocking)
nos serve -d stability-ai/stable-diffusion-v2: Serve optimized nos model (daemon/detached)
nos serve -c deployment.yml: Serve a collection of models (blocking)
nos serve -d -c deployment.yml: Serve a collection of models (daemon/detached)

Docker can take a path to the env as an argument.
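A deployment.yml for the collection case might look roughly like the following — the schema below is a guess for illustration only, not a documented nos format:

```yaml
# Illustrative only -- not a documented nos schema
models:
  - model: stability-ai/stable-diffusion-v2
  - model: openai/clip-vit-base-patch32
```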
Avoid having the client generate the pb2 files; instead, generate the pb files on make dist and add them to the wheel file.
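A `dist` target could invoke `grpcio-tools` so the generated `*_pb2.py` files ship inside the wheel. A sketch of such a Makefile fragment — the proto filenames and output paths are assumptions:

```make
# Illustrative Makefile fragment; proto paths are assumptions
dist: grpc-gen
	python -m build

grpc-gen:
	python -m grpc_tools.protoc -I proto \
	    --python_out=nos/proto --grpc_python_out=nos/proto \
	    proto/*.proto
```

Shipping pre-generated stubs also pins the client to a known protoc/grpcio-tools version, avoiding skew across user environments.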
We'd need this for users when reporting bugs and also for internal benchmarking purposes.
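A sketch of such an environment dump using only the standard library — the `system_info` helper is hypothetical, not an existing nos API:

```python
import platform
import sys

def system_info() -> dict:
    """Hypothetical helper (not an existing nos API) gathering environment
    details to attach to bug reports and internal benchmark logs."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    try:
        import torch  # optional: record torch/CUDA details when available
        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        pass
    return info
```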
Reduce bloat; move init, ready, id, etc. into the subclass. Right now we only have an inference runtime, but future releases might include runtimes for benchmarking, compilation, etc.
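One way to structure that split, sketched under the assumption of a shared lifecycle base class — all names here are illustrative, not existing nos classes:

```python
class Runtime:
    """Hypothetical base: shared lifecycle (init/ready/id) lives here."""
    def __init__(self, runtime_id: str):
        self._id = runtime_id
        self._ready = False

    def init(self) -> None:
        self._ready = True

    def ready(self) -> bool:
        return self._ready

    def id(self) -> str:
        return self._id


class InferenceRuntime(Runtime):
    """Serving-specific logic only; lifecycle comes from the base."""
    def __init__(self):
        super().__init__("inference")


class BenchmarkRuntime(Runtime):
    """A future benchmarking runtime reuses the same base unchanged."""
    def __init__(self):
        super().__init__("benchmark")
```

New runtime kinds (compilation, benchmarking) then only add their own behavior instead of re-implementing init/ready/id.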
If we're able to build checksums for layer-wise weights, we should be able to only download the diffs and speed up model downloads significantly. This is particularly helpful if you're fine-tuning models (especially the last few layers, or changing only parts of an ensemble model).
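The idea can be sketched with per-layer SHA-256 digests over raw weight bytes — these helpers are illustrative, not an existing nos API; in practice the bytes would come from each tensor in a serialized state dict:

```python
import hashlib

def layer_checksums(weights: dict) -> dict:
    """Map each layer name to a SHA-256 digest of its raw weight bytes.
    (Illustrative sketch, not an existing nos API.)"""
    return {name: hashlib.sha256(blob).hexdigest() for name, blob in weights.items()}

def changed_layers(local: dict, remote: dict) -> list:
    """Layers whose remote digest differs from the local one (or is new
    locally) -- only these need to be downloaded."""
    return [name for name, digest in remote.items() if local.get(name) != digest]
```

For a fine-tuned model where only the head changed, the diff would contain just those layers, so the download cost scales with what changed rather than with total model size.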
Currently hitting dependency issues when trying to install nos in the base nos environment (cloudpickle, grpc, av etc.). Need to resolve these before v0.1.
We want to be able to query for model info from the CLI, including: