Llama Langchain RAG Project

Course: CSCI-GA.2565
Institution: New York University
Term: Spring 2024

Overview

The Llama Langchain RAG project is a just-for-fun application for fans of the sitcom Friends. It combines Retrieval-Augmented Generation (RAG) with a large language model (LLM): LLaMA 2, fine-tuned with the LoRA technique on Replicate, aims to give detailed, contextually grounded answers to complex queries about the show's content, plot, and characters. The app is deployed with Streamlit, keeps a session chat history, and lets you choose among multiple LLaMA 2 API endpoints on Replicate.

Try our app: friends-rag.streamlit.app/

Sample queries you can use: evaluation.txt

Note on Model Initialization: After a period of inactivity, the first prediction request to a fine-tuned model such as "Finetuned LLaMA2" or "Finetuned LLaMA2 with RAG" takes longer (expect 3 to 5 minutes) because of a "cold boot," where the model needs to be fetched and loaded. Subsequent requests respond much faster. More details on cold boots can be found here.

Note: This is the production version of the application and is optimized for deployment. Running it locally may require modifications to suit the development environment.

Getting Started

Prerequisites

Relevant API key(s) (optional; e.g. for the embedding model)
Python 3.11 or higher
Git Large File Storage (LFS) for handling large datasets and model files

Installation

Install dependencies.
- [Optional but recommended]
  - Create a Python virtual environment with
```
   python -m venv .venv
```
  - Activate it with
```
   source .venv/bin/activate
```
- Install dependencies with
```
   pip install -r requirements.txt
```
Create the Chroma DB:

python populate_database.py
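
As a rough illustration of what happens before the data is embedded, the sketch below turns JSONL question-answer records into plain-text documents. It is not taken from populate_database.py: the field names `question` and `answer` and the function name `qa_lines_to_documents` are assumptions made for this example, so adjust them to the actual schema of files such as s1_s2.jsonl.

```python
import json
from typing import Iterable


def qa_lines_to_documents(lines: Iterable[str]) -> list[str]:
    """Convert JSONL lines into one text document per Q&A pair.

    Assumes each non-empty line is a JSON object with hypothetical
    "question" and "answer" keys.
    """
    documents = []
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines, which are common at the end of JSONL files.
            continue
        record = json.loads(line)
        documents.append(f"Q: {record['question']}\nA: {record['answer']}")
    return documents


# Placeholder records for illustration only, not real project data.
sample_lines = [
    '{"question": "example question 1", "answer": "example answer 1"}',
    "",
    '{"question": "example question 2", "answer": "example answer 2"}',
]
documents = qa_lines_to_documents(sample_lines)
```

Each resulting string could then be passed to whatever embedding and vector-store step the script uses. To read from disk, pass an open file handle instead of a list, since a file object is also an iterable of lines.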

Set up inference before running queries:
- Case 1: To run the base Llama 2 model locally, install Ollama and run ollama serve in a separate terminal.
- Case 2: To run inference locally against our models on Replicate, set REPLICATE_API_TOKEN as an environment variable.
- Case 3: Alternatively, try our deployed project on Streamlit: friends-rag.streamlit.app.
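
For Case 2, a quick check like the following can confirm that the token is visible to Python before you start the app. This is a minimal sketch; the helper name `replicate_token_is_set` is hypothetical and not part of the project.

```python
import os
from typing import Mapping, Optional


def replicate_token_is_set(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if REPLICATE_API_TOKEN is present and non-blank.

    `env` defaults to the process environment; passing a dict makes
    the check easy to test without touching real variables.
    """
    env = os.environ if env is None else env
    return bool(env.get("REPLICATE_API_TOKEN", "").strip())


if not replicate_token_is_set():
    print("REPLICATE_API_TOKEN is not set; Replicate inference will fail.")
```

The check only verifies that a value exists. It does not validate the token against Replicate.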
To test a query against the Chroma DB, run the command below. It returns an answer based on RAG and the selected model:

python query_data.py "Which role does Adam Goldberg play?"
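
To make the RAG step more concrete, here is a minimal sketch of how retrieved text can be combined with the question into a single prompt for the model. The function name `build_rag_prompt`, the prompt wording, and the chunk separator are assumptions for illustration; the actual template in query_data.py may differ.

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a prompt that places the retrieved context before the question."""
    # Join the chunks in retrieval order, separated by a visible divider.
    context = "\n\n---\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Placeholder chunks stand in for whatever the vector database returns.
prompt = build_rag_prompt(
    "Which role does Adam Goldberg play?",
    ["(retrieved chunk 1)", "(retrieved chunk 2)"],
)
print(prompt)
```

Keeping the prompt assembly in one small function makes it easy to compare answers with and without retrieved context, since the base model can be sent the bare question instead.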

Start the App locally:

streamlit run app.py

If a file exceeds GitHub's recommended maximum file size of 50 MB, you may need to use Git Large File Storage.

Configuration & Features:

- Fine-tuning usually relies on a domain-specific dataset. For this project, we curated our own question-answer pairs dataset for fine-tuning and RAG.
- Domain-related files (txt and jsonl), such as trivia.txt and s1_s2.jsonl, are stored in the data folder. Using Langchain, a vector database was created in the chroma folder from this data for RAG. More content can be added as needed.
- The front end and deployment are implemented with Streamlit.
- You can select between different Llama2 chat API endpoints (base LLaMA2, finetuned LLaMA2, base with RAG, finetuned with RAG).
- Each model (base LLaMA2, finetuned LLaMA2, base with RAG, finetuned with RAG) runs on Replicate.
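
The four selectable options differ along two axes: whether the model is fine-tuned and whether retrieval is used. One way to represent that, shown here only as a sketch, is a small lookup table. The `ModelOption` class and the two base-model labels are hypothetical; the app's real configuration, including the Replicate endpoint identifiers, is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    finetuned: bool  # True if the LoRA fine-tuned model is used
    use_rag: bool    # True if retrieved context is added to the prompt


# Labels for the fine-tuned options follow the names used in this README;
# the base-model labels are assumed for illustration.
MODEL_OPTIONS = {
    "LLaMA2": ModelOption(finetuned=False, use_rag=False),
    "Finetuned LLaMA2": ModelOption(finetuned=True, use_rag=False),
    "LLaMA2 with RAG": ModelOption(finetuned=False, use_rag=True),
    "Finetuned LLaMA2 with RAG": ModelOption(finetuned=True, use_rag=True),
}

rag_options = [name for name, option in MODEL_OPTIONS.items() if option.use_rag]
```

A table like this keeps the selection logic in one place, so the UI only needs to pass the chosen label to the code that decides whether to query the vector database.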

The frontend was refactored from a16z's implementation of their LLaMA2 chatbot.
