🧺 RAGstack

Deploy a private ChatGPT alternative hosted within your VPC. Connect it to your organization's knowledge base and use it as a corporate oracle. Supports open-source LLMs like Llama 2, Falcon, and GPT4All.

Retrieval-augmented generation (RAG) is a technique where the capabilities of a large language model (LLM) are augmented by retrieving information from other systems and inserting it into the LLM’s context window via a prompt. This gives the LLM information beyond what was in its training data, which is necessary for almost every enterprise use case. Examples include data from current web pages, data from SaaS apps like Confluence or Salesforce, and data from documents like sales contracts and PDFs.

For these use cases, RAG is often a better fit than fine-tuning the model: it’s cheaper, it’s faster, and it’s more reliable, since the source of the information is provided with each response.
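As a rough illustration of the "insert into the context window" step described above, here is a minimal sketch in Python. It is not RAGstack's actual prompt code: the function name `build_rag_prompt`, the passage dictionaries, and the prompt wording are all assumptions made for this example.

```python
def build_rag_prompt(question, passages):
    """Insert retrieved passages into the prompt ahead of the user's question."""
    # Number each passage so the answer can point back to its source.
    context_lines = [
        f"[{i}] ({p['source']}) {p['text']}"
        for i, p in enumerate(passages, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical passages, standing in for whatever the retrieval step returns.
passages = [
    {"source": "contract.pdf", "text": "The renewal term is 12 months."},
    {"source": "wiki/pricing", "text": "Enterprise plans are billed annually."},
]
prompt = build_rag_prompt("How long is the renewal term?", passages)
print(prompt)
```

The resulting string is what gets sent to the LLM. Because every passage carries its source label, the response can be traced back to the documents it was based on.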

RAGstack deploys the following resources for retrieval-augmented generation:

Open-source LLM

GPT4All: When you run locally, RAGstack will download and deploy Nomic AI's gpt4all model, which runs on consumer CPUs.
Falcon-7b: On the cloud, RAGstack deploys Technology Innovation Institute's falcon-7b model onto a GPU-enabled GKE cluster.
Llama 2: On the cloud, RAGstack can also deploy the 7B parameter version of Meta's Llama 2 model onto a GPU-enabled GKE cluster.

Vector database

Qdrant: Qdrant is an open-source vector database written in Rust. It is fast and can be self-hosted.
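To make the role of the vector database concrete, the sketch below shows the core operation it provides: given a query embedding, return the stored items whose embeddings are most similar. This is a plain-Python linear scan written for illustration only. It does not use the Qdrant client or its API, a real vector database would use an approximate index rather than scanning every point, and the three-dimensional vectors and payload strings are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, points, k=2):
    """Return the k (score, payload) pairs most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, p["vector"]), p["payload"])
        for p in points
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy embeddings; real ones come from an embedding model and have
# hundreds of dimensions.
points = [
    {"vector": [1.0, 0.0, 0.0], "payload": "chunk about contracts"},
    {"vector": [0.0, 1.0, 0.0], "payload": "chunk about pricing"},
    {"vector": [0.9, 0.1, 0.0], "payload": "chunk about renewals"},
]
results = top_k([1.0, 0.0, 0.0], points, k=2)
for score, payload in results:
    print(f"{score:.3f}  {payload}")
```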

Server + UI

A simple server and UI that handle PDF upload, so that you can chat over your PDFs using Qdrant and the open-source LLM of your choice.
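Before an uploaded document can be searched, its text usually has to be split into pieces small enough to embed and to fit in a prompt. This README does not describe how RAGstack's server does that, so the following is only a sketch of one common approach, fixed-size chunks with overlap. The function name `chunk_text` and the default sizes are assumptions.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

Each chunk would then be embedded and stored in the vector database along with a reference to the PDF it came from.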

Run locally

Copy ragstack-ui/local.env into ragstack-ui/.env
Run scripts/local/run-dev. This will download ggml-gpt4all-j-v1.3-groovy.bin into server/llm/local/ and run the server, LLM, and Qdrant vector database locally.

All services will be ready once you see the following message:

INFO:     Application startup complete.

Deploy to Google Cloud

To deploy the RAG stack using Falcon-7B running on GPUs to your own Google Cloud project, go through the following steps:

Run scripts/gcp/deploy-gcp.sh. This will prompt you for your GCP project ID, service account key file, and region, as well as some other parameters (model, HuggingFace token, etc.).
If you get an error on the Falcon-7B deployment step, run the following commands and then run scripts/gcp/deploy-gcp.sh again:

gcloud config set compute/zone YOUR-REGION-HERE
gcloud container clusters get-credentials gpu-cluster
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml

The deployment script was implemented using Terraform.

You can run the frontend by creating a .env file in ragstack-ui and setting VITE_SERVER_URL to the URL of the ragstack-server instance in your Google Cloud Run.
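The frontend configuration step amounts to writing a one-line .env file. A small sketch of that in Python follows. The helper name `write_ui_env` and the example URL are placeholders invented for this illustration, so substitute the URL your own deployment reports.

```python
from pathlib import Path


def write_ui_env(ui_dir, server_url):
    """Write a .env file in ui_dir that points the frontend at server_url."""
    env_path = Path(ui_dir) / ".env"
    env_path.write_text(f"VITE_SERVER_URL={server_url}\n", encoding="utf-8")
    return env_path


if __name__ == "__main__":
    # Placeholder URL; replace with the address of your ragstack-server.
    path = write_ui_env("ragstack-ui", "https://ragstack-server.example.com")
    print(path.read_text(encoding="utf-8"))
```

The same file format applies to the other cloud deployments. Only the URL changes.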

Deploy to AWS

To deploy the RAG stack using Falcon-7B running on GPUs to your own AWS EC2 instances (using ECS), go through the following steps:

Run scripts/aws/deploy-aws.sh. This will prompt you for your AWS credentials as well as some other parameters (model, HuggingFace token, etc.).

The deployment script was implemented using Terraform.

You can run the frontend by creating a .env file in ragstack-ui and setting VITE_SERVER_URL to the URL of the ALB instance.

Deploy to Azure

To deploy the RAG stack using Falcon-7B running on GPUs to your own AKS cluster, go through the following steps:

Run ./azure/deploy-aks.sh. This will prompt you for your Azure subscription as well as some other parameters (model, HuggingFace token, etc.).

The deployment script was implemented using Terraform.

You can run the frontend by creating a .env file in ragstack-ui and setting VITE_SERVER_URL to the URL of the ragstack-server service in your AKS cluster.

Please note that this AKS deployment uses a node pool with NVIDIA Tesla T4 accelerators, which are not available in all subscriptions.

Roadmap

✅ GPT4all support
✅ Falcon-7b support
✅ Deployment on GCP
✅ Deployment on AWS
✅ Deployment on Azure
🚧 Llama-2-40b support

Credits

The code for containerizing Falcon 7B is from Het Trivedi's tutorial repo. See also his Medium article on how to dockerize Falcon.
