
ChatPDF

(Screenshot of the ChatPDF interface, taken 2024-04-23.)

ChatPDF is a Next.js project bootstrapped with create-next-app.

Live deployment link: https://chat-to-pdf.vercel.app/.

Features Walkthrough Demo: https://drive.google.com/file/d/1fSoZTbu8a_s8J6-6utZfVOLvijlDH97j/view?usp=sharing

Local Development 💻

Here is a guide to run ChatPDF locally:

Clone the repository:

git clone https://github.com/trangiabach/chat-pdf.git

Install required packages:

npm install

Run the development server:

npm run dev

Open http://localhost:3000 and ChatPDF should be up and running!

Features 🙌

  • Upload multiple PDFs
  • Support multiple LLMs from OpenAI and Anthropic
  • Chat with any uploaded PDF, switchable even mid-chat
  • Switch to any available model, even mid-chat
  • Refresh and stop chats
  • Remove unwanted PDFs
  • Adjust model temperatures
  • Ingest and understand large PDFs via token splitting and vector-embedding retrieval
  • Token-by-token streaming UI
  • Theme switching
  • Mobile-responsive layout
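
The token-splitting feature above can be sketched as a simple splitter with overlap. This is only an illustration (the function name and character-based sizing are assumptions; the project itself may use a Langchain text splitter with real token counts):

```typescript
// Illustrative chunker: splits text into overlapping windows.
// chunkSize/overlap are measured in characters here for simplicity;
// a real implementation would count tokens (roughly ~4 chars per token).
export function splitText(text: string, chunkSize = 16000, overlap = 800): string[] {
  if (chunkSize <= overlap) throw new Error("chunkSize must exceed overlap");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both sides.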

Architecture ⚙️

ChatPDF is partitioned into the following:

  • Client/Frontend Framework: Next.js (React), TypeScript
  • Styling: Tailwind CSS
  • Components Library: shadcn/ui
  • Server/Backend Framework: ChatPDF uses Next.js serverless API routes, which create API endpoints as Node.js serverless functions. These functions are defined under app/api/*
  • Storage: A bucket to store PDF files using Supabase Storage
  • Vector Store: A vector store to store embeddings of PDF files (for RAG) using Supabase pgvector
  • LLM Interface: Uses Langchain to interface with multiple LLMs and create RAG features efficiently
  • LLM UI: Uses Vercel AI SDK to handle streaming and state management in the UI efficiently
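
As a rough sketch, a serverless route under app/api/* is just an exported handler. The shape below is illustrative only; `retrieveContext` is a hypothetical stand-in for the real Langchain/pgvector and streaming calls, not the repo's actual code:

```typescript
// Hypothetical sketch of a route handler like app/api/chat/route.ts.
// retrieveContext stands in for the real embedding + pgvector search,
// and the plain JSON response stands in for the real token stream.
type ChatBody = { query: string; model: string; pdfId: string };

async function retrieveContext(query: string, pdfId: string): Promise<string> {
  // Placeholder: the real route embeds the query and searches pgvector.
  return `context for "${query}" from ${pdfId}`;
}

export async function POST(req: Request): Promise<Response> {
  const { query, model, pdfId } = (await req.json()) as ChatBody;
  const context = await retrieveContext(query, pdfId);
  // The real route streams tokens via the Vercel AI SDK instead.
  return Response.json({ model, answer: `Answered using: ${context}` });
}
```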

Notes 📝

Due to the allotted time and the assumptions given, there are certain limitations that can be improved upon:

  • Since no authentication is required and hence there is no notion of ownership, all uploaded PDFs are public to all users.
  • Each query to ChatPDF is provided with a context of at most 3 text chunks of up to 4000 tokens each. This keeps the context plus prompt within the smallest model context limit of about 16000 tokens (for GPT-3.5-Turbo); all other models can handle much larger context windows.
  • Some type errors remain because certain packages are out of sync or lack type support.
  • The API logic could be moved to a dedicated Python or TS/JS server, with the existing Next.js serverless API routes acting as an abstraction layer.
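
The 3-chunk limit above is a simple budget calculation. A sketch of the arithmetic, using the numbers from the note (the function name and prompt reserve are illustrative):

```typescript
// Why 3 chunks: keep the retrieved context within the smallest model's
// window, after reserving room for the prompt and chat history.
export function maxChunks(
  contextLimit: number,   // e.g. ~16000 tokens for GPT-3.5-Turbo
  chunkTokens: number,    // e.g. 4000 tokens per chunk
  promptReserve: number   // tokens left for prompt + history
): number {
  return Math.floor((contextLimit - promptReserve) / chunkTokens);
}
```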

Roadmap 📈

We can address some of these limitations and improve ChatPDF:

  • Introduce Authentication, either manually implemented or via Auth Providers such as Auth0, Clerk
  • With authentication, implement user-specific PDFs uploading/management, chat sessions/history
  • Adaptive chunking strategies based on document length and models' context limits, to increase the quality of text embeddings
  • Adaptive retriever-parameter generation (topK, etc.) based on model context limits
  • Agent/Multi-step queries to improve quality of response
  • Select multiple PDFs to chat with at once
  • Introduce more models, particularly open-source ones
  • Settings to adjust desired chunk length and overlap for users
  • Improve prompt and prompt customization for users
  • More linter rules
  • LLM Observability via Langfuse
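
The adaptive-chunking idea could look something like the heuristic below. This is purely a sketch of one possible approach, not a committed design; the function name and default reserve are assumptions:

```typescript
// One possible heuristic: size chunks so that topK of them, plus a
// reserve for the prompt and chat history, fit the target model's window.
export function adaptiveChunkSize(
  contextLimit: number,
  topK: number,
  promptReserve = 2000
): number {
  const budget = contextLimit - promptReserve;
  if (budget <= 0 || topK <= 0) throw new Error("invalid budget or topK");
  return Math.floor(budget / topK);
}
```

Larger-context models would then get larger chunks automatically instead of the fixed 4000-token size.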

Approach 🔎

ChatPDF has two main functions:

  1. PDF processing and embedding (the embed endpoint lives in app/api/pdfs/embed)

    • Uploaded PDFs are sent to the server side, where PDFJS is used to extract text.
    • The text is then split into smaller documents (<= 4000 tokens each).
    • These documents are embedded, together with the corresponding PDF's metadata, and upserted into Supabase's pgvector store.
    • The PDF itself is stored in a Supabase bucket so it can later be displayed on the frontend.
  2. RAG (the RAG endpoint lives in app/api/chat)

    • The user sends a query, along with their desired model, the PDF file to chat with, and the chat history.
    • The query is embedded, and the top 3 most similar text chunks from the PDF are retrieved and combined into a context.
    • The context is included in the prompt along with the chat history and sent to the LLM.
    • The LLM streams its response token-by-token back to the frontend.
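
The retrieval step in (2) boils down to nearest-neighbour search over chunk embeddings. Here is a self-contained toy version using plain cosine similarity (in the real app, pgvector performs this search server-side; the types and names below are illustrative):

```typescript
// Toy in-memory version of the pgvector similarity search:
// score every chunk embedding against the query embedding, take the top k.
type Chunk = { text: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

export function topK(query: number[], chunks: Chunk[], k = 3): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

The k retrieved chunks are then concatenated into the context string used in the prompt.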

Contributors

  • trangiabach
