
Ask Poddy

A screenshot of the Ask Poddy web app showing a chat between the user and the AI

Ask Poddy (named after "Poddy", the RunPod mascot) is a user-friendly Retrieval-Augmented Generation (RAG) web application designed to showcase the ease of setting up OpenAI-compatible APIs using open-source models running serverless on RunPod. Built with Next.js, React, Tailwind, Vercel AI SDK, and LangChain, it uses Meta-Llama-3-8B-Instruct for the LLM and multilingual-e5-large-instruct for text embeddings.

This tutorial will guide you through deploying Ask Poddy in your own environment, enabling it to answer RunPod-related questions by leveraging the open-source workers worker-vllm and worker-infinity-embedding.

Concept

Ask Poddy is designed to demonstrate the integration of serverless OpenAI-compatible APIs with open-source models. The application runs locally (though it could also be deployed to the cloud), while the computational heavy lifting is handled by serverless endpoints on RunPod. This architecture allows seamless use of existing OpenAI-compatible tools and frameworks without needing to develop custom APIs.

Here's how RAG works in Ask Poddy (a code sketch of the flow follows the list):

Diagram showing how the RAG process works

  1. User: Asks a question.
  2. Embedding: The question is sent to LangChain, which uses the worker-infinity-embedding endpoint to convert it into an embedding with the multilingual-e5-large-instruct model.
  3. Vector Store: Performs a similarity search to find the documents most relevant to the question.
  4. AI SDK: The retrieved documents and the user's question are sent to the worker-vllm endpoint.
  5. worker-vllm: Generates an answer using the Meta-Llama-3-8B-Instruct model.
  6. User: Receives the answer.
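
The following TypeScript sketch illustrates this flow end to end. It is not the actual Ask Poddy source: it assumes a previously populated HNSWLib store (see step 5 of the tutorial below), the environment variables from step 4, and the OpenAI-compatible /openai/v1 routes exposed by both workers.

// Hypothetical sketch of the six-step RAG flow above, not the Ask Poddy implementation.
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { OpenAIEmbeddings } from "@langchain/openai";
import OpenAI from "openai";

// Steps 2-3: embed the question via worker-infinity-embedding and search the store.
const embeddings = new OpenAIEmbeddings({
  model: "intfloat/multilingual-e5-large-instruct",
  apiKey: process.env.RUNPOD_API_KEY,
  configuration: {
    baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_EMBEDDING}/openai/v1`,
  },
});
const store = await HNSWLib.load("./vectorstore", embeddings); // store path is an assumption
const question = "How do I create a serverless endpoint?";
const context = await store.similaritySearch(question, 4);

// Steps 4-5: send the retrieved documents plus the question to worker-vllm.
const llm = new OpenAI({
  apiKey: process.env.RUNPOD_API_KEY,
  baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_VLLM}/openai/v1`,
});
const answer = await llm.chat.completions.create({
  model: "meta-llama/Meta-Llama-3-8B-Instruct",
  messages: [
    {
      role: "system",
      content: `Answer using only this context:\n${context
        .map((doc) => doc.pageContent)
        .join("\n---\n")}`,
    },
    { role: "user", content: question },
  ],
});

// Step 6: the user receives the answer.
console.log(answer.choices[0].message.content);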

Tip

You can choose any other LLM that is compatible with vLLM.



Tutorial: Setting Up "Ask Poddy" in Your Environment

Prerequisites

  • A RunPod account (for the API key and serverless endpoints)
  • git, Node.js, and npm installed locally

1. Clone the Repository

  1. Clone the Ask Poddy repository and go into the cloned directory:
git clone https://github.com/blib-la/ask-poddy.git
cd ask-poddy
  2. Clone the RunPod docs repository into ask-poddy/data/runpod-docs:
git clone https://github.com/runpod/docs.git ./data/runpod-docs

Note

The RunPod docs repository contains the RunPod documentation that Ask Poddy will use to answer questions.

  3. Copy the img folder from ./data/runpod-docs/static/img to ./public:
cp -R ./data/runpod-docs/static/img ./public/

Note

This makes it possible for Ask Poddy to include images from the RunPod documentation.


2. Install Dependencies

Navigate to the ask-poddy directory and install the dependencies:

npm install

3. Set Up RunPod Serverless Endpoints

3.1 Network Volumes

  1. Create two network volumes, 15 GB each, in the data center where you will deploy the serverless endpoints:
    • Volume for embeddings: infinity_embeddings
    • Volume for the LLM: vllm_llama3

Note

Using network volumes ensures that the models and embeddings are stored persistently, allowing for faster subsequent requests as the data does not need to be downloaded or recreated each time.

3.2 Worker-vLLM Endpoint

  1. Follow the guide for setting up the vLLM endpoint with worker-vllm: vLLM Setup Guide. Make sure to use the meta-llama/Meta-Llama-3-8B-Instruct model instead of the one mentioned in the guide, and select the network volume vllm_llama3 when creating the endpoint.
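
Once the endpoint is running, any OpenAI-compatible client can talk to it. Here is a minimal sketch using the openai npm package, assuming worker-vllm's OpenAI-compatible base path /openai/v1 and the environment variables configured in step 4 below:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.RUNPOD_API_KEY,
  baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_VLLM}/openai/v1`,
});

// Send a chat completion request to the serverless vLLM worker.
const completion = await client.chat.completions.create({
  model: "meta-llama/Meta-Llama-3-8B-Instruct",
  messages: [{ role: "user", content: "What is RunPod?" }],
});
console.log(completion.choices[0].message.content);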

3.3 Worker-Infinity-Embedding Endpoint

  1. Create a template using this guide: Create Template
  2. Use the Docker image runpod/worker-infinity-embedding:stable-cuda12.1.0 from worker-infinity-embedding and set the environment variable MODEL_NAMES to intfloat/multilingual-e5-large-instruct.
  3. Deploy a serverless endpoint using this guide: Deploy Endpoint, and make sure to select the network volume infinity_embeddings when creating the endpoint.
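
The embedding endpoint can be smoke-tested the same way, assuming worker-infinity-embedding also exposes the OpenAI-compatible /openai/v1 route:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.RUNPOD_API_KEY,
  baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_EMBEDDING}/openai/v1`,
});

// Request an embedding for a single input string.
const response = await client.embeddings.create({
  model: "intfloat/multilingual-e5-large-instruct",
  input: "How do I create a serverless endpoint?",
});
console.log(response.data[0].embedding.length); // dimensionality of the returned vector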

4. Configure Environment Variables

  1. Generate your RunPod API key.
  2. Find the endpoint IDs shown underneath the titles of the deployed serverless endpoints.

Screenshot showing the ID of the worker underneath the title

  3. Create a .env.local file in the project root with the following variables:
RUNPOD_API_KEY=your_runpod_api_key
RUNPOD_ENDPOINT_ID_VLLM=your_vllm_endpoint_id
RUNPOD_ENDPOINT_ID_EMBEDDING=your_embedding_endpoint_id

5. Populate the Vector Store

To populate the vector store, run the following command:

npm run populate

Note

The first run will take some time as the worker downloads the embeddings model (multilingual-e5-large-instruct). Subsequent requests will use the downloaded model stored in the network volume.

This command reads all markdown documents from the ask-poddy/data/runpod-docs/ folder, creates embeddings using the embedding endpoint running on RunPod, and stores these embeddings in the local vector store (a code sketch of this flow follows the list below):

Diagram showing how the vector store gets populated with documents

  1. Documents: The markdown documents from the ask-poddy/data/runpod-docs/ folder are read by LangChain.
  2. Chunks: LangChain converts the documents into smaller chunks, which are then sent to the worker-infinity-embedding endpoint.
  3. worker-infinity-embedding: Receives chunks, generates embeddings using the multilingual-e5-large-instruct model, and sends them back.
  4. Vector Store: LangChain saves these embeddings in the local vector store (HNSWlib).
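
A minimal sketch of this populate flow with LangChain JS is shown below. The module paths follow the current LangChain packages, and the ./vectorstore output path is an assumption for illustration, not necessarily what Ask Poddy uses:

// Hypothetical sketch of the populate step, not the Ask Poddy implementation.
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "@langchain/openai";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";

// 1. Documents: read every markdown file from the docs folder.
const loader = new DirectoryLoader("./data/runpod-docs", {
  ".md": (path) => new TextLoader(path),
});
const docs = await loader.load();

// 2. Chunks: split the documents into smaller pieces for embedding.
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 100 });
const chunks = await splitter.splitDocuments(docs);

// 3. worker-infinity-embedding: generate embeddings via the OpenAI-compatible route.
const embeddings = new OpenAIEmbeddings({
  model: "intfloat/multilingual-e5-large-instruct",
  apiKey: process.env.RUNPOD_API_KEY,
  configuration: {
    baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_EMBEDDING}/openai/v1`,
  },
});

// 4. Vector Store: embed the chunks and persist the HNSWLib index locally.
const store = await HNSWLib.fromDocuments(chunks, embeddings);
await store.save("./vectorstore"); // output path is an assumption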

Tip

A vector store is a database that stores embeddings (vector representations of text) to enable efficient similarity search. This is crucial for the RAG process as it allows the system to quickly retrieve relevant documents based on the user's question.


6. Start the Local Web Server

  1. Start the local web server:
npm run dev
  2. Open http://localhost:3000 to access the UI.

7. Ask Poddy

Now that everything is running, you can ask RunPod-related questions, such as:

  • What is RunPod?
  • How do I create a serverless endpoint?
  • What are the benefits of using a network volume?
  • How can I become a host for the community cloud?
  • Can RunPod help my startup to get going?

Note

The first run will take some time as the worker downloads the LLM (Meta-Llama-3-8B-Instruct). Subsequent requests will use the downloaded model stored in the network volume.
