Git Product home page Git Product logo

ludoplex / rag-stack Goto Github PK

View Code? Open in Web Editor NEW

This project forked from psychic-api/rag-stack

1.0 0.0 0.0 128.96 MB

๐Ÿค– Deploy a private ChatGPT alternative hosted within your VPC. ๐Ÿ”ฎ Connect it to your organization's knowledge base and use it as a corporate oracle. Supports open-source LLMs like Llama 2, Falcon, and GPT4All.

Home Page: https://www.chatmyfiles.com/

License: MIT License

Shell 2.77% JavaScript 1.86% Python 34.79% PowerShell 1.53% TypeScript 45.27% CSS 0.67% Makefile 0.14% HTML 0.79% HCL 10.81% Dockerfile 1.10% ASP.NET 0.29%

rag-stack's Introduction

๐Ÿงบ RAGstack

Deploy a private ChatGPT alternative hosted within your VPC. Connect it to your organization's knowledge base and use it as a corporate oracle. Supports open-source LLMs like Llama 2, Falcon, and GPT4All.

Discord Issues Twitter

Retrieval Augmented Generation (RAG) is a technique where the capabilities of a large language model (LLM) are augmented by retrieving information from other systems and inserting them into the LLMโ€™s context window via a prompt. This gives LLMs information beyond what was provided in their training data, which is necessary for almost every enterprise use case. Examples include data from current web pages, data from SaaS apps like Confluence or Salesforce, and data from documents like sales contracts and PDFs.

RAG works better than fine-tuning the model because itโ€™s cheaper, itโ€™s faster, and itโ€™s more reliable since the source of information is provided with each response.

RAGstack deploys the following resources for retrieval-augmented generation:

Open-source LLM

  • GPT4All: When you run locally, RAGstack will download and deploy Nomic AI's gpt4all model, which runs on consumer CPUs.

  • Falcon-7b: On the cloud, RAGstack deploys Technology Innovation Institute's falcon-7b model onto a GPU-enabled GKE cluster.

  • LLama 2: On the cloud, RAGstack can also deploy the 7B paramter version of Meta's Llama 2 model onto a GPU-enabled GKE cluster.

Vector database

  • Qdrant: Qdrant is an open-source vector database written in Rust, so it's highly performant and self-hostable.

Server + UI

Simple server and UI that handles PDF upload, so that you can chat over your PDFs using Qdrant and the open-source LLM of choice. Screenshot 2023-08-02 at 9 22 27 PM

Run locally

  1. Copy ragstack-ui/local.env into ragstack-ui/.env
  2. Run scripts/local/run-dev. This will download ggml-gpt4all-j-v1.3-groovy.bin into server/llm/local/ and run the server, LLM, and Qdrant vector database locally.

All services will be ready once you see the following message:

INFO:     Application startup complete.

Deploy to Google Cloud

To deploy the RAG stack using Falcon-7B running on GPUs to your own google cloud instance, go through the following steps:

  1. Run scripts/gcp/deploy-gcp.sh. This will prompt you for your GCP project ID, service account key file, and region as well as some other parameters (model, HuggingFace token etc).
  2. If you get an error on the Falcon-7B deployment step, run the following commands and then run scripts/gcp/deploy-gcp.sh again:
gcloud config set compute/zone YOUR-REGION-HERE
gcloud container clusters get-credentials gpu-cluster
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml

The deployment script was implemented using Terraform.

  1. You can run the frontend by creating a .env file in ragstack-ui and setting VITE_SERVER_URL to the url of the ragstack-server instance in your Google Cloud run.

Deploy to AWS

To deploy the RAG stack using Falcon-7B running on GPUs to your own AWS EC2 instances (using ECS), go through the following steps:

  1. Run scripts/aws/deploy-aws.sh. This will prompt you for your AWS credentials as well as some other parameters (model, HuggingFace token etc).

The deployment script was implemented using Terraform.

  1. You can run the frontend by creating a .env file in ragstack-ui and setting VITE_SERVER_URL to the url of the ALB instance.

Deploy to Azure

To deploy the RAG stack using Falcon-7B running on GPUs to your own AKS, go through the following steps:

  1. Run ./azure/deploy-aks.sh. This will prompt you for your AKS subscription as well as some other parameters (model, HuggingFace token etc).

The deployment script was implemented using Terraform.

  1. You can run the frontend by creating a .env file in ragstack-ui and setting VITE_SERVER_URL to the url of the ragstack-server service in your AKS.

Please note that this AKS deployment is using node pool with NVIDIA Tesla T4 Accelerator which is not in all subscriptions available

Roadmap

  • โœ… GPT4all support
  • โœ… Falcon-7b support
  • โœ… Deployment on GCP
  • โœ… Deployment on AWS
  • โœ… Deployment on Azure
  • ๐Ÿšง Llama-2-40b support

Credits

The code for containerizing Falcon 7B is from Het Trivedi's tutorial repo. Check out his Medium article on how to dockerize Falcon here!

rag-stack's People

Contributors

ambardhesi avatar ayan-bandyopadhyay avatar glucaci avatar jasonwcfan avatar

Stargazers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.