Chatbot leveraging RAG for Financial Document (10Q, 10K) Summarization

Technical Problem Formulation

Problem statement: Given a PDF document and a query, retrieve the passages of the document that are relevant to the query and synthesize them into an accurate answer.
Data Ingestion and Processing: Reading PDFs of financial reports and splitting them into chunks so that long documents can be processed efficiently.
Retrieval-Augmented Generation (RAG): Combining document retrieval with the generative capabilities of the chosen language model.
Large Language Models: Evaluating several models, including GPT-3.5-turbo, Llama 2, Gemma 1.1, etc.
Conversation Chain and Prompt Design: Crafting a prompt template designed for concise two-sentence financial summaries.
User interface: Designing a chatbot-like user interface.
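
The ingestion, retrieval, and prompt steps listed above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the project's implementation: the function names (chunk_text, retrieve, build_prompt), the chunk sizes, the bag-of-words cosine similarity used in place of a real embedding model, and the prompt wording are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (hypothetical sizes)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def _bag_of_words(text):
    # Lowercased token counts; a stand-in for a learned embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = _bag_of_words(query)
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(q, _bag_of_words(c)),
        reverse=True,
    )
    return ranked[:top_k]


# Assumed wording; the README only states that summaries are two sentences.
PROMPT_TEMPLATE = (
    "Use only the context below to answer the question in two concise sentences.\n"
    "If the context is insufficient, say so.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def build_prompt(query, retrieved_chunks):
    """Fill the template with the retrieved chunks and the user query."""
    context = "\n---\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=query)
```

The string returned by build_prompt is what would be sent to the chosen language model. In a real pipeline the bag-of-words scoring would typically be replaced by embeddings stored in a vector index.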

Different answers generated by GPT-4 (without RAG).

GPT-4 gives different results when asked the same question multiple times, which indicates hallucination. We did not observe this behavior in our RAG system.

System architecture

Modified from blog.goopenai

LLMs and Experiments

GPT-3.5
Llama 2
Gemma 1.1
Flan-T5

Evaluation

Metrics:

Generation Metrics: Faithfulness and Answer Relevancy.
Retrieval: Context Recall and Context Precision.

Description:

Faithfulness: Measures the factual consistency of the generated answer against the given context. The answer is regarded as faithful if every claim it makes can be inferred from the given context.
Answer relevancy: Scores how relevant the answer is to the given question. Answers with incomplete, redundant, or unnecessary information are penalized.
Context recall: Measures the extent to which the retrieved context aligns with the annotated answer, which is treated as the ground truth.
Context precision: Evaluates whether the ground-truth relevant items present in the retrieved contexts are ranked near the top.
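
To make the descriptions above concrete, here is a minimal sketch of how three of these scores can be computed once the per-item judgments are available. It assumes the common ratio-based formulations (similar to those used by the Ragas library, which this README does not name explicitly), and it takes the judgments as plain booleans; in practice they are usually produced by an LLM judge. Answer relevancy is left out because it needs an embedding model. All function and parameter names are hypothetical.

```python
def faithfulness(claim_supported):
    """Fraction of answer claims that can be inferred from the context.

    claim_supported: list of booleans, one per claim in the answer.
    """
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)


def context_recall(gt_sentence_attributed):
    """Fraction of ground-truth sentences attributable to the retrieved context.

    gt_sentence_attributed: list of booleans, one per ground-truth sentence.
    """
    if not gt_sentence_attributed:
        return 0.0
    return sum(gt_sentence_attributed) / len(gt_sentence_attributed)


def context_precision(relevance_by_rank):
    """Mean of precision@k over the ranks k that hold a relevant chunk.

    relevance_by_rank: list of booleans ordered from the top-ranked chunk down.
    The score is higher when relevant chunks appear earlier in the ranking.
    """
    hits = 0
    total = 0.0
    for k, relevant in enumerate(relevance_by_rank, start=1):
        if relevant:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0
```

For example, a ranking with relevant chunks at positions 1 and 3 scores higher on context precision than one with relevant chunks at positions 2 and 3, even though both contain two relevant chunks.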

Installation

pip install -r requirements.txt

Run

python index.py

Code / Notebooks:

Notebooks referenced from Kaggle

Llama 2: rag_llama2.ipynb
Gemma 1.1: rag_gemma1.ipynb
GPT-3.5: pages/chatbot/chatbot_model.py
Evaluation: eval.py

Contributors

Contributor 1	Contributor 2	Contributor 3
Harshit Kumar	Sarthak Khandelwal	Alexander Leon

Repository: sarthakforwet/financial_document_summarization_through_rag