financial_document_summarization_through_rag's Introduction

Chatbot leveraging Retrieval Augmented Generation (RAG) for Financial Document (10-Q, 10-K) Summarization

Technical Problem Formulation

  • Problem statement: Given a PDF document and a query, retrieve the passages of the document that are relevant to the query and synthesize them into an accurate answer.
  • Data Ingestion and Processing: Read PDFs of financial reports and split the text into chunks so that long documents can be processed efficiently.
  • Retrieval-Augmented Generation (RAG): Combine document retrieval with the generative capabilities of the chosen language model.
  • Large Language Models: Evaluate several models, including GPT-3.5-turbo, Llama 2, and Gemma 1.1.
  • Conversation Chain and Prompt Design: Craft a prompt template designed for concise two-sentence financial summaries.
  • User interface: Design a chatbot-like user interface.
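
The chunking, retrieval, and prompt steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's implementation: the chunk sizes, the bag-of-words cosine scoring (a stand-in for whatever embedding-based retriever the project uses), the prompt wording, and all function names are assumptions made for the example.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def bag_of_words(text):
    """Lowercased token counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:top_k]


# Hypothetical prompt wording; the project's actual template is not shown here.
PROMPT_TEMPLATE = (
    "You are a financial analyst. Using only the context below, "
    "answer the question in two concise sentences.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def build_prompt(query, chunks, top_k=2):
    """Fill the template with the retrieved context and the user query."""
    context = "\n---\n".join(retrieve(query, chunks, top_k))
    return PROMPT_TEMPLATE.format(context=context, question=query)
```

The string returned by `build_prompt` would then be sent to the chosen language model. Grounding the answer in retrieved chunks, rather than in the model's own memory, is the core idea of RAG.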

Different answers generated by GPT4 (w/o RAG).

GPT-4 gives different results when asked the same question multiple times, which is a sign of hallucination. We did not observe this behavior in our RAG system.

System architecture

Modified from blog.goopenai

LLMs and Experiments

  1. GPT-3.5
  2. Llama 2
  3. Gemma 1.1
  4. Flan-T5

Evaluation

Metrics:

  • Generation Metrics: Faithfulness and Answer Relevancy.
  • Retrieval Metrics: Context Recall and Context Precision.

Description:

  • Faithfulness: Measures the factual consistency of the generated answer against the given context. The answer is regarded as faithful if every claim it makes can be inferred from the given context.
  • Answer relevancy: Scores how relevant the answer is to the given question. Answers with incomplete, redundant, or unnecessary information are penalized.
  • Context recall: Measures the extent to which the retrieved context aligns with the annotated answer, which is treated as the ground truth.
  • Context precision: Evaluates whether the ground-truth relevant items present in the retrieved contexts are ranked near the top.
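
To make the descriptions above concrete, here is a minimal sketch of how three of these scores could be computed once per-item verdicts are available. It assumes the verdicts (for example, whether a claim is supported by the context) have already been produced by a human annotator or an LLM judge. The function names and formulas are illustrative assumptions that follow the common ratio-based definitions, not necessarily the exact computation used in this project. Answer relevancy is left out because it typically needs an embedding model.

```python
def faithfulness(claim_supported):
    """Fraction of answer claims that can be inferred from the context.

    claim_supported: list of booleans, one per claim in the answer.
    """
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)


def context_recall(statement_attributed):
    """Fraction of ground-truth statements attributable to the retrieved context.

    statement_attributed: list of booleans, one per ground-truth statement.
    """
    if not statement_attributed:
        return 0.0
    return sum(statement_attributed) / len(statement_attributed)


def context_precision(chunk_relevant):
    """Mean of precision@k over the ranks k that hold a relevant chunk.

    chunk_relevant: list of booleans in retrieval order (rank 1 first).
    The score is higher when relevant chunks appear earlier in the ranking.
    """
    hits = 0
    total = 0.0
    for k, relevant in enumerate(chunk_relevant, start=1):
        if relevant:
            hits += 1
            total += hits / k  # precision@k at this relevant rank
    return total / hits if hits else 0.0
```

For instance, a ranking with relevant chunks at positions 1 and 3 scores lower on context precision than one with relevant chunks at positions 1 and 2, even though both retrieve the same number of relevant chunks.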

Installation

pip install -r requirements.txt

Run

python index.py

Code / Notebooks:

Notebooks referenced from Kaggle

Contributors

  • Harshit Kumar
  • Sarthak Khandelwal
  • Alexander Leon
