Git Product home page Git Product logo

awesome-local-ai's Introduction

If you tried Jan Desktop and liked it, please also check out the following awesome collection of open source and/or local AI tools and solutions.

Your contributions are always welcome!

Lists

Inference Engine

Repository Description Supported model formats CPU/GPU Support UI language Platform Type
llama.cpp - Inference of LLaMA model in pure C/C++ GGML/GGUF Both C/C++ Text-Gen
Nitro - 3MB inference engine embeddable in your apps. Uses Llamacpp and more Both Both Text-Gen
ollama - CLI and local server. Uses Llamacpp Both Both Text-Gen
koboldcpp - A simple one-file way to run various GGML models with KoboldAI's UI GGML Both C/C++ Text-Gen
LoLLMS - Lord of Large Language Models Web User Interface. Nearly ALL Both Python Text-Gen
ExLlama - A more memory-efficient rewrite of the HF transformers implementation of Llama AutoGPTQ/GPTQ GPU Python/C++ Text-Gen
vLLM - vLLM is a fast and easy-to-use library for LLM inference and serving. GGML/GGUF Both Python Text-Gen
SGLang - 3-5x higher throughput than vLLM (Control flow, RadixAttention, KV cache reuse) Safetensor / AWQ / GPTQ GPU Python Text-Gen
LmDeploy - LMDeploy is a toolkit for compressing, deploying, and serving LLMs. Pytorch / Turbomind Both Python/C++ Text-Gen
Tensorrt-llm - Inference efficiently on NVIDIA GPUs Python / C++ runtimes Both Python/C++ Text-Gen
CTransformers - Python bindings for the Transformer models implemented in C/C++ using GGML library GGML/GPTQ Both C/C++ Text-Gen
llama-cpp-python - Python bindings for llama.cpp GGUF Both Python Text-Gen
llama2.rs - A fast llama2 decoder in pure Rust GPTQ CPU Rust Text-Gen
ExLlamaV2 - A fast inference library for running LLMs locally on modern consumer-class GPUs GPTQ/EXL2 GPU Python/C++ Text-Gen
LoRAX - Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs Safetensor / AWQ / GPTQ GPU Python/Rust Text-Gen
text-generation-inference - Inference serving toolbox with optimized kernels for each LLM architecture Safetensors / AWQ / GPTQ Both Python/Rust Text-Gen

Inference UI

  • oobabooga - A Gradio web UI for Large Language Models.
  • LM Studio - Discover, download, and run local LLMs.
  • LocalAI - LocalAI is a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inferencing.
  • FireworksAI - Experience the world's fastest LLM inference platform deploy your own at no additional cost.
  • faradav - Chat with AI Characters Offline, Runs locally, Zero-configuration.
  • GPT4All - A free-to-use, locally running, privacy-aware chatbot.
  • LLMFarm - llama and other large language models on iOS and MacOS offline using GGML library.
  • LlamaChat - LlamaChat allows you to chat with LLaMa, Alpaca and GPT4All models1 all running locally on your Mac.
  • LLM as a Chatbot Service - LLM as a Chatbot Service.
  • FuLLMetalAi - Fullmetal.Ai is a distributed network of self-hosted Large Language Models (LLMs).
  • Automatic1111 - Stable Diffusion web UI.
  • ComfyUI - A powerful and modular stable diffusion GUI with a graph/nodes interface.
  • Wordflow - Run, share, and discover AI prompts in your browsers
  • petals - Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading.
  • ChatUI - Open source codebase powering the HuggingChat app.
  • AI-Mask - Browser extension to provide model inference to web apps. Backed by web-llm and transformers.js
  • everything-rag - Interact with (virtually) any LLM on Hugging Face Hub with an asy-to-use, 100% local Gradio chatbot.
  • LmScript - UI for SGLang and Outlines

Platforms / full solutions

  • H2OAI - H2OGPT The fastest, most accurate AI Cloud Platform.
  • BentoML - BentoML is a framework for building reliable, scalable, and cost-efficient AI applications.
  • Predibase - Serverless LoRA Fine-Tuning and Serving for LLMs.

Developer tools

  • Jan Framework - At its core, Jan is a cross-platform, local-first and AI native application framework that can be used to build anything.
  • Pinecone - Long-Term Memory for AI.
  • PoplarML - PoplarML enables the deployment of production-ready, scalable ML systems with minimal engineering effort.
  • Datature - The All-in-One Platform to Build and Deploy Vision AI.
  • One AI - MAKING GENERATIVE AI BUSINESS-READY.
  • Gooey.AI - Create Your Own No Code AI Workflows.
  • Mixo.io - AI website builder.
  • Safurai - AI Code Assistant that saves you time in changing, optimizing, and searching code.
  • GitFluence - The AI-driven solution that helps you quickly find the right command. Get started with Git Command Generator today and save time.
  • Haystack - A framework for building NLP applications (e.g. agents, semantic search, question-answering) with language models.
  • LangChain - A framework for developing applications powered by language models.
  • gpt4all - A chatbot trained on a massive collection of clean assistant data including code, stories and dialogue.
  • LMQL - LMQL is a query language for large language models.
  • LlamaIndex - A data framework for building LLM applications over external data.
  • Phoenix - Open-source tool for ML observability that runs in your notebook environment, by Arize. Monitor and fine tune LLM, CV and tabular models.
  • trypromptly - Create AI Apps & Chatbots in Minutes.
  • BentoML - BentoML is the platform for software engineers to build AI products.
  • LiteLLM - Call all LLM APIs using the OpenAI format.

User Tools

  • llmcord.py - Discord LLM Chatbot - Talk to LLMs with your friends!

Agents

  • SuperAGI - Opensource AGI Infrastructure.
  • Auto-GPT - An experimental open-source attempt to make GPT-4 fully autonomous.
  • BabyAGI - Baby AGI is an autonomous AI agent developed using Python that operates through OpenAI and Pinecone APIs.
  • AgentGPT -Assemble, configure, and deploy autonomous AI Agents in your browser.
  • HyperWrite - HyperWrite helps you work smarter, faster, and with ease.
  • AI Agents - AI Agent that Power Up Your Productivity.
  • AgentRunner.ai - Leverage the power of GPT-4 to create and train fully autonomous AI agents.
  • GPT Engineer - Specify what you want it to build, the AI asks for clarification, and then builds it.
  • GPT Prompt Engineer - Automated prompt engineering. It generates, tests, and ranks prompts to find the best ones.
  • MetaGPT - The Multi-Agent Framework: Given one line requirement, return PRD, design, tasks, repo.
  • Open Interpreter - Let language models run code. Have your agent write and execute code.
  • CrewAI - Cutting-edge framework for orchestrating role-playing, autonomous AI agents.

Training

  • FastChat - An open platform for training, serving, and evaluating large language models.
  • DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  • BMTrain - Efficient Training for Big Models.
  • Alpa - Alpa is a system for training and serving large-scale neural networks.
  • Megatron-LM - Ongoing research training transformer models at scale.
  • Ludwig - Low-code framework for building custom LLMs, neural networks, and other AI models.
  • Nanotron - Minimalistic large language model 3D-parallelism training.
  • TRL - Language model alignment with reinforcement learning.
  • PEFT - Parameter efficient fine-tuning (LoRA, DoRA, model merger and more)

LLM Leaderboard

Research

  • Attention Is All You Need (2017): Presents the original transformer model. it helps with sequence-to-sequence tasks, such as machine translation. [Paper]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018): Helps with language modeling and prediction tasks. [Paper]
  • FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022): Mechanism to improve transformers. [paper]
  • Improving Language Understanding by Generative Pre-Training (2019): Paper is authored by OpenAI on GPT. [paper]
  • Cramming: Training a Language Model on a Single GPU in One Day (2022): Paper focus on a way too increase the performance by using minimum computing power. [paper]
  • LaMDA: Language Models for Dialog Applications (2022): LaMDA is a family of Transformer-based neural language models by Google. [paper]
  • Training language models to follow instructions with human feedback (2022): Use human feedback to align LLMs. [paper]
  • TurboTransformers: An Efficient GPU Serving System For Transformer Models (PPoPP'21) [paper]
  • Fast Distributed Inference Serving for Large Language Models (arXiv'23) [paper]
  • An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs (arXiv'23) [paper]
  • Accelerating LLM Inference with Staged Speculative Decoding (arXiv'23) [paper]
  • ZeRO: Memory optimizations Toward Training Trillion Parameter Models (SC'20) [paper]
  • TensorGPT: Efficient Compression of the Embedding Layer in LLMs based on the Tensor-Train Decomposition 2023 [Paper]

Community

awesome-local-ai's People

Contributors

0xsage avatar astrabert avatar gintasz avatar henryh0x1 avatar lucasavila00 avatar merveenoyan avatar mikebirdtech avatar mralaminh avatar osanseviero avatar pacoccino avatar tgaddair avatar vince-lam avatar whoabuddy avatar xiaohk avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

awesome-local-ai's Issues

Alphabetize lists

When adding a new entry I put it on the bottom by default, but I think the lists would be easier to read if alphabetized.

RWKV in the list

Hi,
rwkv.cpp allows you to quantize all the model under the RWKV family directly on your PC CPU or GPU
inference is both with CPU or GPU
RWKV-Runner is the full LMStudio alike solution for 1 click use of these best multi-languages LLM

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.