FlaskGPT is a minimal ChatGPT clone that leverages the LangChain library to provide an interactive graphical user interface (GUI) for querying a JSON file, specifically the `resume.json` file from the JSON Resume project. This project is indebted to Thomas Davis for the use of his `resume.json` file.

The main objective of FlaskGPT is to enable users to ask questions interactively using a Large Language Model (LLM) in combination with the `resume.json` file, which has been converted into a vector database. The project is implemented using Flask and is designed to work with an OpenAI model or a Llama2-based LLM (GGUF file format).

FlaskGPT is inspired by Himanshu Sekhar Das's InfoGPT, a similar project built with Streamlit. While InfoGPT offers a more extensive feature set and supports a wider range of file formats, FlaskGPT aims to provide a minimalistic, Flask-based alternative.
- Interactive GUI for querying the `resume.json` file.
- Utilizes a Large Language Model (LLM) to generate responses.
- Supports the OpenAI API and Llama2-based LLM models (GGUF files only).
- Real-time token output using Server-Sent Events (SSE).
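The SSE feature can be sketched with a minimal Flask route. The `/stream` endpoint and the hard-coded token list below are illustrative stand-ins for FlaskGPT's actual LLM output, not its real code:

```python
from flask import Flask, Response

app = Flask(__name__)

# Hypothetical token source standing in for the LLM's streaming output.
TOKENS = ["FlaskGPT", " streams", " tokens", " as", " they", " arrive."]

@app.route("/stream")
def stream():
    def generate():
        # Each SSE message is a "data:" line terminated by a blank line;
        # the browser receives the messages one at a time.
        for token in TOKENS:
            yield f"data: {token}\n\n"
    return Response(generate(), mimetype="text/event-stream")
```

On the client side, an `EventSource("/stream")` listener appends each `event.data` payload to the page as it arrives.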
- Clone the repository:

  ```shell
  git clone https://github.com/benbaker76/FlaskGPT.git
  cd FlaskGPT
  ```

- I recommend using a virtual environment:

  ```shell
  python -m venv .venv
  source .venv/bin/activate
  ```

- Install the required dependencies:

  ```shell
  pip install -r requirements.txt
  ```

- Set up your OpenAI API credentials by following the instructions in the OpenAI documentation. Make sure your `OPENAI_API_KEY` environment variable is set:

  ```shell
  export OPENAI_API_KEY=<your secret key>
  ```

- Start the Flask server:

  ```shell
  python3 FlaskGPT.py
  ```

- Access the FlaskGPT GUI by opening a web browser and navigating to http://localhost:5000.
- Launch the FlaskGPT GUI as described in the "Getting Started" section.
- Use the interface to ask questions based on the `resume.json` file.
- Enjoy real-time token output and responses from the Large Language Model.
This work is based on the principle of Retrieval-Augmented Generation (RAG), an approach in the field of Natural Language Processing (NLP) that combines three key components: retrieval, augmentation, and generation. It is designed to improve the performance and capabilities of language models.
Token Limits: Vector databases play a crucial role in enabling large language models (LLMs) to manage extensive data within token constraints. These databases store and index vector representations of text, allowing LLMs to retrieve relevant segments efficiently. For instance, in this demo, the default OpenAI model, `gpt-3.5-turbo`, has a maximum context length of 4097 tokens. Vector databases facilitate optimal data utilization by helping LLMs produce coherent responses while working within these limitations.
Text Splitting and Embedding: In this process, we break the document's content down into smaller chunks that retain the document's metadata. After splitting, we use a vector embedding model to convert each chunk into a vector representation, which is stored in the vector database.
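As a rough illustration of the splitting step, here is a toy character-based splitter with overlap. LangChain's actual text splitters are more sophisticated, and the chunk sizes below are arbitrary:

```python
def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Toy splitter: fixed-size character chunks with overlap, so content
    straddling a chunk boundary still appears intact in one chunk."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk would then be passed to an embedding model, and the resulting vector stored in the vector database alongside the chunk's text and metadata.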
Retrieval: Retrieval refers to the process of accessing and obtaining specific information or data from a database. It involves searching for and returning relevant items or documents based on user queries or criteria.
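Conceptually, retrieval from a vector database reduces to nearest-neighbour search over embeddings. A minimal sketch, using toy two-dimensional vectors in place of real embeddings:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, index, top_k=2):
    """index: list of (text, vector) pairs; returns the top_k most
    similar chunks to the query vector."""
    scored = sorted(index, key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in scored[:top_k]]
```

In the real application the query is embedded with the same model used for the document chunks, and the vector database performs this search efficiently at scale.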
Output Generation: Output generation occurs after relevant context has been obtained from the retriever component. This context is passed as input to the LLM, which generates a response based on the user's query. This step is crucial for providing accurate and contextually relevant answers. LangChain chains the LLM, retriever, and prompts together to produce the final answer, which is streamed through the Flask UI.
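The generation step can be sketched as stuffing the retrieved chunks into a prompt template before calling the model. The template below is a hypothetical example, not FlaskGPT's actual prompt:

```python
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """'Stuff'-style prompting: concatenate the retrieved chunks into a
    single context block and insert it into the template."""
    context = "\n---\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

The completed prompt is what the LLM actually sees; LangChain automates this templating and chaining, and the model's streamed reply is forwarded to the browser over SSE.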
Contributions to FlaskGPT are welcome! If you'd like to contribute, please fork the repository and create a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.