
dss_llm_qa_retrieval_session's Introduction

DIY QA LLM BOT

This is a repo for building a QA retrieval bot in Databricks using either open LLMs or OpenAI.
To get started, edit the configuration notebook at util/notebook-config.py.

See the sections below for the configuration required by each LLM.

Change util/notebook-config.py to point to the location where your files are stored:

config['loc'] = '<location where the files are stored>'

LLMs Supported

Currently the code supports OpenAI models, LLAMA-2 chat models, and MosaicML MPT models, each configured as described in the sections below.

Runtime Tested

The code has been tested on Databricks Runtime 13.2 ML (GPU).

Cluster Configurations

The code for running open LLMs has been tested on the following single-node cluster configurations:

  • AWS : g5.12xlarge [4 A10G GPUs]
  • Azure : NC24ads_A100_v4 [1 A100 GPU]

The TGI pipeline can also run on older-generation GPUs such as the V100, but this has not been tested extensively.

Converting PDF to TXT

There are two ways to convert PDF files to TXT:

  • Using Azure Form Recognizer: set
    config['use_azure_formrecognizer'] = True
    and add the endpoint URL and key from the Azure portal:
    config['formendpoint'] = '<your Form Recognizer endpoint URL>'
    config['formkey'] = '<your Form Recognizer key>'
  • Using the LangChain PDF converter: set
    config['use_azure_formrecognizer'] = False
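
As a minimal sketch, the two options could be checked before running the conversion. The helper name `pdf_converter_settings` and its return values are hypothetical and not part of the repo; only the three config keys come from the section above.

```python
def pdf_converter_settings(config):
    """Return which PDF converter the config selects, checking required keys.

    Hypothetical helper: the repo may perform this check differently, or not at all.
    """
    if config.get('use_azure_formrecognizer', False):
        # Form Recognizer needs both the endpoint URL and the key from the Azure portal.
        missing = [k for k in ('formendpoint', 'formkey') if not config.get(k)]
        if missing:
            raise ValueError(f"Missing Form Recognizer settings: {', '.join(missing)}")
        return 'azure_formrecognizer'
    # Otherwise the LangChain PDF converter is used.
    return 'langchain'


config = {'use_azure_formrecognizer': False}
print(pdf_converter_settings(config))
```

Failing early on a missing endpoint or key avoids discovering the problem partway through a long conversion run.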

Running the code using OpenAI

You can use any cluster to run the OpenAI model. Set model_id to openai and provide your API key:

config['model_id'] = 'openai'
os.environ['OPENAI_API_KEY'] = '<your OpenAI API key>'

Note : when using OpenAI you do not need to run the 02_load_model notebook. Azure OpenAI keys require additional configuration.
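
The following is a small sketch of a sanity check that could be run after setting the values above, so that a forgotten or placeholder key is caught before any request is made. The function `openai_key_is_set` is a hypothetical name introduced here for illustration; it is not part of the repo or of the OpenAI library.

```python
import os

config = {}
config['model_id'] = 'openai'


def openai_key_is_set(environ=os.environ):
    """Return True when OPENAI_API_KEY holds a real-looking value.

    Hypothetical check: an empty value or a '<...>' placeholder left over
    from the template counts as not set.
    """
    key = environ.get('OPENAI_API_KEY', '')
    return bool(key) and not key.startswith('<')


if config['model_id'] == 'openai' and not openai_key_is_set():
    print("OPENAI_API_KEY is missing or still a placeholder")
```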

Running the code using LLAMA-2 models:

To use LLAMA-2 models, you need to agree to the terms and conditions on HuggingFace and provide an API token to download the models. Follow these steps to get the token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token. Then set the parameters below:

config['model_id'] = 'meta-llama/Llama-2-XXb-chat-hf'
config['HUGGING_FACE_HUB_TOKEN'] = '<your HuggingFace API token>'

Note : you need to keep the 02_load_model notebook running so that the API stays available

Running the code using MosaicML MPT models:

Set the config below:

config['model_id'] = 'mosaicml/mpt-30b-chat'

Note : you need to keep the 02_load_model notebook running so that the API stays available
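
The notes in the three sections above amount to one rule: only the open LLMs need the 02_load_model notebook to stay running. A minimal sketch of that rule follows; the function name `needs_load_model_notebook` is hypothetical and only the model_id values come from this README.

```python
def needs_load_model_notebook(model_id):
    """Return True if the 02_load_model notebook must be kept running.

    Hypothetical helper: OpenAI is called as a hosted service, while the
    open LLMs (LLAMA-2, MPT) are served by the notebook's API.
    """
    return model_id != 'openai'


for model_id in ('openai', 'mosaicml/mpt-30b-chat'):
    status = 'keep 02_load_model running' if needs_load_model_notebook(model_id) else 'skip 02_load_model'
    print(f"{model_id}: {status}")
```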

Embedding Model

The notebooks currently use the following embedding models:

  • For OpenAI : text-embedding-ada-002
  • For all open LLMs : intfloat/e5-large-v2

The open LLM embedding model can be changed by overriding the dictionary in util/notebook-config.py
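
A rough sketch of how the embedding model might be chosen from the config is shown below. The key name `embedding_model` and the function `embedding_model_for` are assumptions made for illustration; check util/notebook-config.py for the actual key used by the repo.

```python
OPENAI_EMBEDDING = 'text-embedding-ada-002'
DEFAULT_OPEN_EMBEDDING = 'intfloat/e5-large-v2'


def embedding_model_for(config):
    """Pick the embedding model name for the configured LLM.

    'embedding_model' is a hypothetical override key, not confirmed by the repo.
    """
    if config.get('model_id') == 'openai':
        return OPENAI_EMBEDDING
    # Open LLMs use the default unless the config dictionary overrides it.
    return config.get('embedding_model', DEFAULT_OPEN_EMBEDDING)


print(embedding_model_for({'model_id': 'mosaicml/mpt-30b-chat'}))
```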

dss_llm_qa_retrieval_session's People

Contributors

puneet-jain159
