
Info-GPT

Information retrieval on your private data using Embeddings Vector Search and LLMs.

Architecture

(Architecture diagram)

QA ChatBot

  1. Data is first scraped from the internal sources.
  2. The text data is then converted into embedding vectors using pre-trained models.
  3. These embeddings are stored in a vector database; this project uses Chroma. Vector databases allow us to easily perform nearest-neighbor search on the embedding vectors.
  4. Once all the data is ingested into the DB, we take the user query, fetch the top k matching documents from the DB, and feed them into the LLM of our choice. The LLM then generates a summarized answer from the question and documents. This process is orchestrated by Langchain's RetrievalQA chain, as sketched below.
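
A minimal sketch of this retrieval step, assuming a LangChain RetrievalQA chain over a Chroma vector store. The embedding model, LLM, collection name, and query here are illustrative, not the project's actual configuration:

from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma

# Connect to the collection populated during ingestion (names are illustrative).
embeddings = HuggingFaceEmbeddings()
db = Chroma(collection_name="internal_docs", embedding_function=embeddings)

# Fetch the top-k matching documents and let the LLM summarize an answer from them.
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 4}),
)
print(qa_chain.run("How do I request access to the staging cluster?"))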

Slack Bot

  1. The package also has an api folder which can be used to integrate it as a Slack slash command.
  2. The API is built using FastAPI and provides a /slack POST endpoint that acts as the URL for the Slack command.
  3. Since the slash command has to respond within 3 seconds, we offload the querying work to Celery and immediately return a processing acknowledgement to the user.
  4. Celery then performs the retrieval and summarization for the query and sends the final answer to the response URL provided by Slack. A sketch of this flow follows the list.
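
A minimal sketch of the endpoint-plus-worker pattern, assuming a FastAPI app and a Celery worker with a Redis broker. The broker URL, task body, and reply text are illustrative; the project's actual implementation lives in the api folder:

from celery import Celery
from fastapi import FastAPI, Form

app = FastAPI()
celery_app = Celery("info_gpt", broker="redis://localhost:6379/0")

@celery_app.task
def answer_query(text: str, response_url: str) -> None:
    # Run retrieval + summarization here, then POST the answer back to response_url.
    ...

@app.post("/slack")
async def slack_command(text: str = Form(...), response_url: str = Form(...)):
    # Offload the slow work and acknowledge within Slack's 3-second limit.
    answer_query.delay(text, response_url)
    return {"response_type": "ephemeral", "text": "Looking that up, hang tight..."}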

Installation

Install using pip

The package can be installed with pip using the following command:

pip install info_gpt[api] git+https://github.com/techytushar/info_gpt

Development install

  1. This package uses Poetry for dependency management, so install Poetry first using the instructions here
  2. [Optional] Update the Poetry config to create the virtual environment inside the project directory using poetry config virtualenvs.in-project true
  3. Run poetry install --all-extras to install all dependencies.
  4. Install the pre-commit hooks for linting and formatting using pre-commit install

Usage

All configuration is driven through constants.py and api/constants.py. Most settings have default values, but some, such as secrets and tokens, must be provided explicitly.
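
For example, the Slack token passed to the Docker build step below can be exported as an environment variable; check constants.py and api/constants.py for the exact names each setting expects:

export SLACK_TOKEN=<your_slack_token_here>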

Data ingestion

from info_gpt.ingest import Ingest
import asyncio

ingester = Ingest()
# ingest data from GitHub
asyncio.run(ingester.load_github("<org_name_here>", ".md"))
# ingest data from Confluence pages
ingester.load_confluence()

Run API locally

  1. Build the Docker image locally using docker build --build-arg SLACK_TOKEN=$SLACK_TOKEN -t info-gpt .
  2. Run the API using Docker Compose docker compose up
  3. You can use ngrok to expose the localhost URL to the internet using ngrok http 8000 --host-header="localhost:8000". Once the API is up, you can also send it a quick test request, as in the sketch below.
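
A quick local smoke test of the /slack endpoint, assuming it accepts Slack's form-encoded slash-command fields (text, response_url); the values below are placeholders:

import requests

# Send a fake slash-command payload to the locally running API.
response = requests.post(
    "http://localhost:8000/slack",
    data={"text": "How do I set up the VPN?", "response_url": "https://example.com/hook"},
)
print(response.status_code, response.text)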

Deployment

Deploy on K8s using Helm chart

---- WIP ----
