AIKit is a quick, easy, and local or cloud-agnostic way to get started with hosting and deploying large language models (LLMs) for inference. No GPU, internet access, or additional tools are needed to get started except for Docker!
AIKit uses LocalAI under the hood to run inference. LocalAI provides a drop-in replacement REST API that is OpenAI API compatible, so you can use any OpenAI API compatible client, such as Kubectl AI, Chatbot-UI, and many more, to send requests to open-source LLMs powered by AIKit!
📖 For full documentation, please see the AIKit website!
- 🐳 No GPU, Internet access or additional tools needed except for Docker!
- 🤏 Minimal image size, resulting in fewer vulnerabilities and a smaller attack surface, thanks to a custom distroless-based image
- 🚀 Easy-to-use declarative configuration (see the sketch after this list)
- ✨ OpenAI API compatible to use with any OpenAI API compatible client
- 📸 Multi-modal model support
- 🖼️ Image generation support with Stable Diffusion
- 🦙 Support for GGUF (`llama`), GPTQ (`exllama` or `exllama2`), EXL2 (`exllama2`), GGML (`llama-ggml`), and Mamba models
- 🚢 Kubernetes deployment ready (see the example manifest below)
- 📦 Supports multiple models with a single image
- 🖥️ Supports GPU-accelerated inferencing with NVIDIA GPUs
- 🔐 Signed images for `aikit` and pre-made models
- 🌈 Support for non-proprietary and self-hosted container registries to store model images
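For a taste of the declarative configuration, the sketch below is a minimal aikitfile for building a model image. The `#syntax` directive, `apiVersion: v1alpha1`, and the `models` list follow AIKit's documented format; treat the Hugging Face source URL as illustrative:

```yaml
#syntax=ghcr.io/sozercan/aikit:latest
apiVersion: v1alpha1
models:
  # "name" is what clients pass as the model; "source" is where the weights are fetched from
  - name: llama-2-7b-chat
    source: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
```

An image can then be built with BuildKit, for example `docker buildx build . -t my-model -f aikitfile.yaml --load`, and run just like the pre-made images below.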
You can get started with AIKit quickly on your local machine without a GPU!
```bash
docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama2:7b
```

Then send a request to the OpenAI-compatible chat completions endpoint:

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "llama-2-7b-chat",
    "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]
  }'
```
Output should be similar to:
{"created":1701236489,"object":"chat.completion","id":"dd1ff40b-31a7-4418-9e32-42151ab6875a","model":"llama-2-7b-chat","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"\nKubernetes is a container orchestration system that automates the deployment, scaling, and management of containerized applications in a microservices architecture."}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
That's it! 🎉 The API is OpenAI compatible, so this is a drop-in replacement for any OpenAI API compatible client.
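For example, here is a minimal sketch using the official `openai` Python package (v1+) against the container started above; the `api_key` value is a placeholder, since the local server does not check it:

```python
from openai import OpenAI

# Point the client at the local AIKit container instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",  # placeholder; the local server does not validate keys
)

response = client.chat.completions.create(
    model="llama-2-7b-chat",
    messages=[{"role": "user", "content": "explain kubernetes in a sentence"}],
)
print(response.choices[0].message.content)
```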
AIKit comes with pre-made models that you can use out-of-the-box!
CPU:

| Model | Optimization | Parameters | Command | License |
|---|---|---|---|---|
| 🦙 Llama 2 | Chat | 7B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama2:7b` | Llama 2 |
| 🦙 Llama 2 | Chat | 13B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama2:13b` | Llama 2 |
| 🐬 Orca 2 | | 13B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/orca2:13b` | Microsoft Research |
| Mixtral | Instruct | 8x7B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b` | Apache |
| Phi 2 | Instruct | 2.7B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/phi2:2.7b` | MIT |
NVIDIA CUDA (the `--gpus all` flag requires the NVIDIA Container Toolkit on the host):

| Model | Optimization | Parameters | Command | License |
|---|---|---|---|---|
| 🦙 Llama 2 | Chat | 7B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama2:7b-cuda` | Llama 2 |
| 🦙 Llama 2 | Chat | 13B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama2:13b-cuda` | Llama 2 |
| 🐬 Orca 2 | | 13B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/orca2:13b-cuda` | Microsoft Research |
| Mixtral | Instruct | 8x7B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b-cuda` | Apache |
| Phi 2 | Instruct | 2.7B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/phi2:2.7b-cuda` | MIT |
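Since the pre-made models are ordinary OCI images listening on port 8080, deploying one to Kubernetes is a standard Deployment plus Service. A minimal sketch follows; the resource names are illustrative, not part of AIKit:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aikit-llama2  # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aikit-llama2
  template:
    metadata:
      labels:
        app: aikit-llama2
    spec:
      containers:
        - name: llama2
          image: ghcr.io/sozercan/llama2:7b
          ports:
            - containerPort: 8080  # OpenAI-compatible API endpoint
---
apiVersion: v1
kind: Service
metadata:
  name: aikit-llama2
spec:
  selector:
    app: aikit-llama2
  ports:
    - port: 8080
      targetPort: 8080
```

After applying the manifest, `kubectl port-forward service/aikit-llama2 8080:8080` exposes the same OpenAI-compatible endpoint used in the quick start above.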
📖 For more information and how to create your own images, please see the AIKit website.