autonomi-ai / nos

⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud, or on AI hardware.
Home Page: https://docs.nos.run/
License: Apache License 2.0
InferenceClient("localhost:50051") takes significantly longer to establish a connection, with WaitForServer() taking over 30 seconds. The default InferenceClient("[::]:50051"), however, connects near-instantly (< 1 ms).
cc @mkornacker
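To narrow down where the delay comes from, a raw TCP probe can time how long each endpoint takes to accept a connection, independent of gRPC. A minimal sketch — the `wait_for_server` helper below is hypothetical and is not nos's actual `WaitForServer()` implementation:

```python
import socket
import time

def wait_for_server(address: str, timeout: float = 30.0) -> float:
    """Hypothetical readiness probe for timing comparisons only --
    NOT nos's actual WaitForServer() implementation.
    Returns seconds elapsed until a TCP connect to host:port succeeds."""
    host, _, port = address.rpartition(":")
    start = time.perf_counter()
    deadline = start + timeout
    while time.perf_counter() < deadline:
        try:
            with socket.create_connection((host or "localhost", int(port)), timeout=0.5):
                return time.perf_counter() - start
        except OSError:
            time.sleep(0.05)  # server not accepting yet; retry shortly
    raise TimeoutError(f"no server reachable at {address} within {timeout}s")
```

Running this against the same port with `localhost` vs. an explicit IP would show whether the gap is in name resolution or in the gRPC channel setup itself.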
Should provide any metadata Marcel needs for writing a UDF.
Similar to #17 but for conda environments
Documentation should be inline.
We currently build a large Docker image (11 GB) as the base GPU image.
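One common way to shrink such images is a multi-stage build that installs heavy dependencies in a `devel` builder stage and copies only the resulting packages into a slimmer `runtime` stage. A rough sketch — the base-image tags, package list, and paths below are illustrative, not nos's actual Dockerfile:

```dockerfile
# Builder stage: full CUDA devel image with compilers (illustrative tags)
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip \
    && rm -rf /var/lib/apt/lists/*
# Install into an isolated prefix so it can be copied wholesale
RUN pip3 install --no-cache-dir --prefix=/install torch torchvision

# Runtime stage: slimmer CUDA runtime image, no compilers or build caches
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
COPY --from=builder /install /usr/local
```

The `devel`-to-`runtime` split alone typically saves several GB, since toolchains and apt/pip caches never reach the final image.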
Use gRPC-Gateway to serve a REST API via a reverse proxy, with buf support and OpenAPI v2 integration.
References:
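With buf, the gateway stubs and the OpenAPI v2 spec can come out of a single generation config. A sketch of a possible `buf.gen.yaml` — plugin references, versions, and output paths here are illustrative assumptions:

```yaml
# Illustrative buf.gen.yaml -- not nos's actual config
version: v1
plugins:
  - plugin: buf.build/grpc-ecosystem/gateway
    out: gen/go
    opt: paths=source_relative
  - plugin: buf.build/grpc-ecosystem/openapiv2
    out: gen/openapiv2
```

Running `buf generate` would then emit both the reverse-proxy handlers and a swagger/OpenAPI v2 JSON for the REST surface.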
Benchmark decorators with a separate make test-benchmark target.
Create publicly accessible MkDocs for nos
Register models as part of the nos hub registry, with full build-time and runtime spec.
@hub.register(
    name="<org_name>/detection2d-detr-resnet-50",
    build_spec=DevelopmentConfig(
        conda="autonomi-ai/nos-base-dev",  # build-time conda env
        resources=ResourceConfig(cpu=8, memory="8Gi", gpu=0.25, gpu_memory="4Gi"),  # build-time resources
    ),
    runtime_spec=RuntimeConfig(
        conda="autonomi-ai/nos-base-runtime",  # runtime conda env
        resources=ResourceConfig(cpu=2, memory="4Gi", gpu=0.25, gpu_memory="4Gi"),  # runtime resources
    ),
)
nos serve -m stability-ai/stable-diffusion-v2: Serve optimized nos model (blocking)
nos serve -d stability-ai/stable-diffusion-v2: Serve optimized nos model (daemon/detached)
nos serve -c deployment.yml: Serve a collection of models (blocking)
nos serve -d -c deployment.yml: Serve a collection of models (daemon/detached)

Docker can take a path to the env as an argument.
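A deployment.yml for the collection case might look roughly like the following — the schema below is a guess for illustration only, not a documented nos format:

```yaml
# Illustrative only -- not a documented nos schema
models:
  - model: stability-ai/stable-diffusion-v2
  - model: openai/clip-vit-base-patch32
```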
Avoid having the client generate the pb2 files; instead, generate the pb files on make dist and add them to the wheel file.
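A `dist` target could invoke `grpcio-tools` so the generated `*_pb2.py` files ship inside the wheel. A sketch of such a Makefile fragment — the proto filenames and output paths are assumptions:

```make
# Illustrative Makefile fragment; proto paths are assumptions
dist: grpc-gen
	python -m build

grpc-gen:
	python -m grpc_tools.protoc -I proto \
	    --python_out=nos/proto --grpc_python_out=nos/proto \
	    proto/*.proto
```

Shipping pre-generated stubs also pins the client to a known protoc/grpcio-tools version, avoiding skew across user environments.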
We'd need this for users when reporting bugs and also for internal benchmarking purposes.
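A sketch of such an environment dump using only the standard library — the `system_info` helper is hypothetical, not an existing nos API:

```python
import platform
import sys

def system_info() -> dict:
    """Hypothetical helper (not an existing nos API) gathering environment
    details to attach to bug reports and internal benchmark logs."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    try:
        import torch  # optional: record torch/CUDA details when available
        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        pass
    return info
```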
Reduce bloat; move init, ready, id, etc. into the subclass. Right now we only have an inference runtime, but future releases might include runtimes for benchmarking, compilation, etc.
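One way to structure that split, sketched under the assumption of a shared lifecycle base class — all names here are illustrative, not existing nos classes:

```python
class Runtime:
    """Hypothetical base: shared lifecycle (init/ready/id) lives here."""
    def __init__(self, runtime_id: str):
        self._id = runtime_id
        self._ready = False

    def init(self) -> None:
        self._ready = True

    def ready(self) -> bool:
        return self._ready

    def id(self) -> str:
        return self._id


class InferenceRuntime(Runtime):
    """Serving-specific logic only; lifecycle comes from the base."""
    def __init__(self):
        super().__init__("inference")


class BenchmarkRuntime(Runtime):
    """A future benchmarking runtime reuses the same base unchanged."""
    def __init__(self):
        super().__init__("benchmark")
```

New runtime kinds (compilation, benchmarking) then only add their own behavior instead of re-implementing init/ready/id.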
If we're able to build checksums for layer-wise weights, we should be able to only download the diffs and speed up model downloads significantly. This is particularly helpful if you're fine-tuning models (especially the last few layers, or changing only parts of an ensemble model).
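The idea can be sketched with per-layer SHA-256 digests over raw weight bytes — these helpers are illustrative, not an existing nos API; in practice the bytes would come from each tensor in a serialized state dict:

```python
import hashlib

def layer_checksums(weights: dict) -> dict:
    """Map each layer name to a SHA-256 digest of its raw weight bytes.
    (Illustrative sketch, not an existing nos API.)"""
    return {name: hashlib.sha256(blob).hexdigest() for name, blob in weights.items()}

def changed_layers(local: dict, remote: dict) -> list:
    """Layers whose remote digest differs from the local one (or is new
    locally) -- only these need to be downloaded."""
    return [name for name, digest in remote.items() if local.get(name) != digest]
```

For a fine-tuned model where only the head changed, the diff would contain just those layers, so the download cost scales with what changed rather than with total model size.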
Currently hitting dependency issues when trying to install nos in the base nos environment (cloudpickle, grpc, av etc.). Need to resolve these before v0.1.
We want to be able to query for model info from the CLI, including: