
Ask Poddy

A screenshot of the Ask Poddy web app showing a chat between the user and the AI

Ask Poddy (named after "Poddy", the RunPod mascot) is a user-friendly Retrieval-Augmented Generation (RAG) web application designed to showcase the ease of setting up OpenAI-compatible APIs using open-source models running serverless on RunPod. Built with Next.js, React, Tailwind, Vercel AI SDK, and LangChain, it uses Meta-Llama-3-8B-Instruct for the LLM and multilingual-e5-large-instruct for text embeddings.

This tutorial will guide you through deploying Ask Poddy in your own environment, enabling it to answer RunPod-related questions by leveraging the open-source workers worker-vllm and worker-infinity-embedding.

Concept

Ask Poddy is designed to demonstrate the integration of serverless OpenAI-compatible APIs with open-source models. The application runs locally (though it could also be deployed to the cloud), while the computational heavy lifting is handled by serverless endpoints on RunPod. This architecture allows seamless use of existing OpenAI-compatible tools and frameworks without needing to develop custom APIs.

Here's how RAG works in Ask Poddy (a code sketch of the flow follows the list):

Diagram showing how the RAG process works

  1. User: Asks a question.
  2. Embedding: The question is sent to LangChain, which uses the worker-infinity-embedding endpoint to convert it into an embedding with the multilingual-e5-large-instruct model.
  3. Vector Store: Performs a similarity search to find the documents most relevant to the question.
  4. AI SDK: The retrieved documents and the user's question are sent to the worker-vllm endpoint.
  5. worker-vllm: Generates an answer using the Meta-Llama-3-8B-Instruct model.
  6. User: Receives the answer.
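
The following TypeScript sketch illustrates this flow end to end. It is not the actual Ask Poddy source: it assumes a previously populated HNSWLib store (see step 5 of the tutorial below), the environment variables from step 4, and the OpenAI-compatible /openai/v1 routes exposed by both workers.

// Hypothetical sketch of the six-step RAG flow above, not the Ask Poddy implementation.
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { OpenAIEmbeddings } from "@langchain/openai";
import OpenAI from "openai";

// Steps 2-3: embed the question via worker-infinity-embedding and search the store.
const embeddings = new OpenAIEmbeddings({
  model: "intfloat/multilingual-e5-large-instruct",
  apiKey: process.env.RUNPOD_API_KEY,
  configuration: {
    baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_EMBEDDING}/openai/v1`,
  },
});
const store = await HNSWLib.load("./vectorstore", embeddings); // store path is an assumption
const question = "How do I create a serverless endpoint?";
const context = await store.similaritySearch(question, 4);

// Steps 4-5: send the retrieved documents plus the question to worker-vllm.
const llm = new OpenAI({
  apiKey: process.env.RUNPOD_API_KEY,
  baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_VLLM}/openai/v1`,
});
const answer = await llm.chat.completions.create({
  model: "meta-llama/Meta-Llama-3-8B-Instruct",
  messages: [
    {
      role: "system",
      content: `Answer using only this context:\n${context
        .map((doc) => doc.pageContent)
        .join("\n---\n")}`,
    },
    { role: "user", content: question },
  ],
});

// Step 6: the user receives the answer.
console.log(answer.choices[0].message.content);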

Tip

You can choose any other LLM that is compatible with vLLM.



Tutorial: Setting Up "Ask Poddy" in Your Environment

Prerequisites

  • A RunPod account (for the API key and serverless endpoints)
  • git, Node.js, and npm installed locally

1. Clone the Repository

  1. Clone the Ask Poddy repository and go into the cloned directory:
git clone https://github.com/blib-la/ask-poddy.git
cd ask-poddy
  2. Clone the RunPod docs repository into ask-poddy/data/runpod-docs:
git clone https://github.com/runpod/docs.git ./data/runpod-docs

Note

The RunPod docs repository contains the RunPod documentation that Ask Poddy will use to answer questions.

  3. Copy the img folder from ./data/runpod-docs/static/img to ./public:
cp -R ./data/runpod-docs/static/img ./public/

Note

This makes it possible for Ask Poddy to include images from the RunPod documentation.


2. Install Dependencies

Navigate to the ask-poddy directory and install the dependencies:

npm install

3. Set Up RunPod Serverless Endpoints

3.1 Network Volumes

  1. Create two network volumes, 15 GB each, in the data center where you will deploy the serverless endpoints:
    • Volume for embeddings: infinity_embeddings
    • Volume for the LLM: vllm_llama3

Note

Using network volumes ensures that the models and embeddings are stored persistently, allowing for faster subsequent requests as the data does not need to be downloaded or recreated each time.

3.2 Worker-vLLM Endpoint

  1. Follow the guide for setting up the vLLM endpoint with worker-vllm: vLLM Setup Guide. Make sure to use the meta-llama/Meta-Llama-3-8B-Instruct model instead of the one mentioned in the guide, and select the network volume vllm_llama3 when creating the endpoint.
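
Once the endpoint is running, any OpenAI-compatible client can talk to it. Here is a minimal sketch using the openai npm package, assuming worker-vllm's OpenAI-compatible base path /openai/v1 and the environment variables configured in step 4 below:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.RUNPOD_API_KEY,
  baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_VLLM}/openai/v1`,
});

// Send a chat completion request to the serverless vLLM worker.
const completion = await client.chat.completions.create({
  model: "meta-llama/Meta-Llama-3-8B-Instruct",
  messages: [{ role: "user", content: "What is RunPod?" }],
});
console.log(completion.choices[0].message.content);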

3.3 Worker-Infinity-Embedding Endpoint

  1. Create a template using this guide: Create Template
  2. Use the Docker image runpod/worker-infinity-embedding:stable-cuda12.1.0 from worker-infinity-embedding and set the environment variable MODEL_NAMES to intfloat/multilingual-e5-large-instruct.
  3. Deploy a serverless endpoint using this guide: Deploy Endpoint, and make sure to select the network volume infinity_embeddings when creating the endpoint.
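
The embedding endpoint can be smoke-tested the same way, assuming worker-infinity-embedding also exposes the OpenAI-compatible /openai/v1 route:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.RUNPOD_API_KEY,
  baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_EMBEDDING}/openai/v1`,
});

// Request an embedding for a single input string.
const response = await client.embeddings.create({
  model: "intfloat/multilingual-e5-large-instruct",
  input: "How do I create a serverless endpoint?",
});
console.log(response.data[0].embedding.length); // dimensionality of the returned vector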

4. Configure Environment Variables

  1. Generate your RunPod API key.
  2. Find the endpoint IDs shown underneath the titles of the deployed serverless endpoints.

Screenshot showing the ID of the worker underneath the title

  3. Create a .env.local file in the project root with the following variables:
RUNPOD_API_KEY=your_runpod_api_key
RUNPOD_ENDPOINT_ID_VLLM=your_vllm_endpoint_id
RUNPOD_ENDPOINT_ID_EMBEDDING=your_embedding_endpoint_id

5. Populate the Vector Store

To populate the vector store, run the following command:

npm run populate

Note

The first run will take some time as the worker downloads the embeddings model (multilingual-e5-large-instruct). Subsequent requests will use the downloaded model stored in the network volume.

This command reads all markdown documents from the ask-poddy/data/runpod-docs/ folder, creates embeddings using the embedding endpoint running on RunPod, and stores these embeddings in the local vector store (a code sketch of this flow follows the list below):

Diagram showing how the vector store gets populated with documents

  1. Documents: The markdown documents from the ask-poddy/data/runpod-docs/ folder are read by LangChain.
  2. Chunks: LangChain converts the documents into smaller chunks, which are then sent to the worker-infinity-embedding endpoint.
  3. worker-infinity-embedding: Receives chunks, generates embeddings using the multilingual-e5-large-instruct model, and sends them back.
  4. Vector Store: LangChain saves these embeddings in the local vector store (HNSWlib).
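
A minimal sketch of this populate flow with LangChain JS is shown below. The module paths follow the current LangChain packages, and the ./vectorstore output path is an assumption for illustration, not necessarily what Ask Poddy uses:

// Hypothetical sketch of the populate step, not the Ask Poddy implementation.
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "@langchain/openai";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";

// 1. Documents: read every markdown file from the docs folder.
const loader = new DirectoryLoader("./data/runpod-docs", {
  ".md": (path) => new TextLoader(path),
});
const docs = await loader.load();

// 2. Chunks: split the documents into smaller pieces for embedding.
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 100 });
const chunks = await splitter.splitDocuments(docs);

// 3. worker-infinity-embedding: generate embeddings via the OpenAI-compatible route.
const embeddings = new OpenAIEmbeddings({
  model: "intfloat/multilingual-e5-large-instruct",
  apiKey: process.env.RUNPOD_API_KEY,
  configuration: {
    baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_EMBEDDING}/openai/v1`,
  },
});

// 4. Vector Store: embed the chunks and persist the HNSWLib index locally.
const store = await HNSWLib.fromDocuments(chunks, embeddings);
await store.save("./vectorstore"); // output path is an assumption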

Tip

A vector store is a database that stores embeddings (vector representations of text) to enable efficient similarity search. This is crucial for the RAG process as it allows the system to quickly retrieve relevant documents based on the user's question.


6. Start the Local Web Server

  1. Start the local web server:
npm run dev
  2. Open http://localhost:3000 to access the UI.

7. Ask Poddy

Now that everything is running, you can ask RunPod-related questions, such as:

  • What is RunPod?
  • How do I create a serverless endpoint?
  • What are the benefits of using a network volume?
  • How can I become a host for the community cloud?
  • Can RunPod help my startup to get going?

Note

The first run will take some time as the worker downloads the LLM (Meta-Llama-3-8B-Instruct). Subsequent requests will use the downloaded model stored in the network volume.
