This project forked from predlico/aragog

ARAGOG- Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research papers dataset. Includes modular code for easy experimentation and reusability.

Home Page: https://arxiv.org/abs/2404.01037

ARAGOG - Advanced Retrieval Augmented Generation Output Grading 🕷️

This repository contains the code, data, and analysis for our study [link later] on advanced Retrieval-Augmented Generation (RAG) techniques. It's part of our scientific paper investigating the efficacy of various RAG techniques in enhancing the precision and contextual relevance of LLMs.

Repository Structure

  • eval_questions/: Contains a JSON file with 107 QA pairs used in the evaluation.
  • papers_for_questions/: Holds a collection of AI-ArXiv papers that were utilized for creating the 107 QA pairs.
  • resources/: Includes essential resources like the prompt template and configuration files. Note: Actual config files need API keys and other settings to be filled out.
  • main.py: The main script where experiments are defined and executed.
  • res_analysis.ipynb: A Jupyter notebook for in-depth analysis of the final experimental results.
  • utils.py: Helper functions supporting various operations within the repository.
  • vector_db.py: Scripts for setting up different vector databases, such as Classic VDB, Sentence-window, and Document Summary.
  • final_results.xlsx: Spreadsheet containing the final results from our experiments, shared for transparency and scientific verification.

Getting Started

To replicate our experiments or analyze our results, first fill in the necessary API keys and other configuration by creating a .env file (see .sample.env). The .env file is listed in .gitignore for security.
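
Before running anything, it can help to confirm that .env defines every variable that .sample.env lists. The following is a minimal sketch of such a check, not part of the repository: the helper names env_keys and missing_keys are hypothetical, and it assumes both files use plain NAME=value lines with optional # comments.

```python
from pathlib import Path


def env_keys(path):
    """Return the set of variable names defined in a dotenv-style file.

    A missing file is treated as defining no variables.
    """
    file_path = Path(path)
    if not file_path.exists():
        return set()
    keys = set()
    for raw in file_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional shell-style "export " prefix.
        if line.startswith("export "):
            line = line[len("export "):]
        name, sep, _value = line.partition("=")
        if sep:
            keys.add(name.strip())
    return keys


def missing_keys(sample_path=".sample.env", env_path=".env"):
    """List names present in the sample file but absent from the real one."""
    return sorted(env_keys(sample_path) - env_keys(env_path))


if __name__ == "__main__":
    missing = missing_keys()
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

The check compares names only, so a variable that is present but left empty in .env is not reported.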

Set up the Python environment using venv, pyenv, or your favourite Python environment manager. Call the environment aragog or anything you like.

  • python3 -m venv aragog, then activate it with source aragog/bin/activate (Mac/Linux) or aragog\Scripts\activate (Windows).
  • OR pyenv with pyenv virtualenv 3.12 aragog, then activate with pyenv local aragog.

Then run pip install -r requirements.txt to install all necessary dependencies.

Results examination

The res_analysis.ipynb notebook provides a detailed examination of the experimental results stored in final_results.xlsx.

Full replication

To set up vector databases for experiments, run the vector_db.py script. Subsequently, execute main.py to perform the experiments. Post-experimentation, use res_analysis.ipynb for analyzing the results. Helper functions in utils.py are employed across scripts to streamline processes.
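
The order of the replication steps can be sketched as a small driver script. This is an illustration only, not something shipped with the repository: build_commands and run_all are hypothetical names, and the sketch assumes that vector_db.py and main.py are run from the repository root and need no command-line arguments.

```python
import subprocess
import sys

# Order taken from the paragraph above: build the vector databases first,
# then run the experiments. The notebook is opened manually afterwards.
STEPS = ("vector_db.py", "main.py")


def build_commands(python=sys.executable, steps=STEPS):
    """Return one command (as an argument list) per script, in order."""
    return [[python, script] for script in steps]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit status,
        # so main.py never runs against a half-built vector database.
        subprocess.run(cmd, check=True)
```

Calling run_all(build_commands()) inside the activated environment would run both scripts in sequence; res_analysis.ipynb can then be opened to analyze the results.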

Contribution

Contributions are welcome. For any changes or enhancements, please open an issue first to discuss what you would like to change.

License

This project is open-source and available under the MIT License.

Contributors

matouseibich, shivaynagpal, oliviermills
