docQA

Question answering over private documents using an open-source LLM running locally.

NOTE: This software is in very early stages of development!

Quick start

  • Install podman (for macOS see here) and initialize it, i.e. make sure the podman ps command works in your terminal
  • Start the container with your document directory and vector DB directory mounted
    • Ingest documents by running: podman run -it -v /mnt/c/Temp/books/:/docQA/docs -v /mnt/c/Temp/vecdb/:/docQA/vecdb docker.io/vsndev3/docqa_v1_mpt7b:latest /docQA/docQA.py import-pdfs --dir /docQA/docs
    • Run the same command again with different PDF directories to build a combined vector DB in the given directory
  • Start the container in interactive mode
    • Chat with your documents by running: podman run -it -v /mnt/c/Temp/vecdb/:/docQA/vecdb docker.io/vsndev3/docqa_v1_mpt7b:latest
  • Enjoy!
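
The quick start above can be scripted. The following is a minimal sketch, not part of docQA itself: it only assembles the podman command lines shown above, one ingest command per PDF directory followed by the chat command. The function names and the second host path (/mnt/c/Temp/papers/) are hypothetical; the image name, mount points, and subcommand are copied from the quick start.

```python
import shlex

# Image name taken from the quick start above.
IMAGE = "docker.io/vsndev3/docqa_v1_mpt7b:latest"


def ingest_command(pdf_dir, vecdb_dir, image=IMAGE):
    """Build the podman argv that imports one PDF directory into vecdb_dir."""
    return [
        "podman", "run", "-it",
        "-v", f"{pdf_dir}:/docQA/docs",
        "-v", f"{vecdb_dir}:/docQA/vecdb",
        image,
        "/docQA/docQA.py", "import-pdfs", "--dir", "/docQA/docs",
    ]


def chat_command(vecdb_dir, image=IMAGE):
    """Build the podman argv that starts the interactive chat."""
    return ["podman", "run", "-it", "-v", f"{vecdb_dir}:/docQA/vecdb", image]


def plan(pdf_dirs, vecdb_dir):
    """One ingest command per PDF directory, then the chat command."""
    return [ingest_command(d, vecdb_dir) for d in pdf_dirs] + [chat_command(vecdb_dir)]


if __name__ == "__main__":
    # Hypothetical host paths; replace them with your own directories.
    for argv in plan(["/mnt/c/Temp/books/", "/mnt/c/Temp/papers/"], "/mnt/c/Temp/vecdb/"):
        print(shlex.join(argv))
```

The sketch prints the commands instead of running them, so they can be reviewed first; each list can be passed to subprocess.run to execute it.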

Running locally

Clone this repo with the commands below, install the Python requirements, and run it!

Base tools installation

If you do not yet have Python and git but want to run from source, install them first, then run:

git clone https://github.com/vsndev3/docQA.git
cd docQA
pip3 install -r requirements.txt

cd src
mkdir docs vecdb
# copy PDFs to docs directory

python3 docQA.py import_pdfs --dir ./docs 
python3 docQA.py question

Press Ctrl+C to quit once done.

Preparing docker image

Creating Docker image

Creating a container image packs the models and Python dependencies into a single image, so that it is easier to get up and running even inside walled-garden environments. To build a Docker image, follow these steps:

  • Install podman (if you are not using docker) and optionally Podman Desktop from https://podman.io/
  • Customize the Dockerfile to your needs; for example, you can preload PDF files and model files. Preloading source documents or models is not required, as they can be mounted when the container is started
  • podman build -f Dockerfile -t my_docqa_container
  • Once the build succeeds, run it: podman run -it localhost/my_docqa_container

Use cases

Adding documents

Start the application with the import_pdfs option and the directory containing the PDF files. This is needed only once, to scan the files and create the vector database. Thereafter this directory need not be specified or mounted (in the case of container usage).

Assuming /dir/doc_folder_1 and /dir/doc_folder_2 contain PDF files, import them one at a time as below:

docQA.py import-pdfs --dir /dir/doc_folder_1
docQA.py import-pdfs --dir /dir/doc_folder_2 
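
When there are many folders, the one-by-one import can be generated. This is a rough sketch under a few assumptions: it only builds the command lines, it skips folders that are missing or hold no .pdf files directly inside them (whether docQA itself scans subfolders is not stated here), and the helper names are hypothetical. Because this document spells the subcommand both import-pdfs and import_pdfs, it is left as a parameter; docQA.py --help should show the spelling your version accepts.

```python
from pathlib import Path


def folders_with_pdfs(folders):
    """Keep only existing folders that directly contain at least one .pdf file.

    Note: the pattern is case-sensitive on most Linux file systems,
    so files ending in .PDF are not counted.
    """
    return [f for f in folders if Path(f).is_dir() and any(Path(f).glob("*.pdf"))]


def import_commands(folders, subcommand="import-pdfs"):
    """Build one import command line (as an argv list) per usable folder."""
    return [
        ["python3", "docQA.py", subcommand, "--dir", str(f)]
        for f in folders_with_pdfs(folders)
    ]


if __name__ == "__main__":
    # Folder names taken from the example above.
    for argv in import_commands(["/dir/doc_folder_1", "/dir/doc_folder_2"]):
        print(" ".join(argv))
```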

Searching for data

Start the application with the question option and, at the prompt, enter a question that can be answered from the documents you have already imported.

To search, start the application as below, after adding documents at least once.

docQA.py question

To search with a different number of threads:

docQA.py question --threads 12

To make the answers less rigid, start with a higher temperature:

docQA.py question --temperature 0.8

Check the available parameters for tuning using docQA.py question --help
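
For scripted runs, the options above can be combined. The sketch below is only an illustration: question_command is a hypothetical helper, and it uses just the two flags shown in this section (--threads and --temperature); any other option should be checked against the --help output first.

```python
def question_command(threads=None, temperature=None):
    """Build the argv for the question mode, adding only the flags that are set."""
    argv = ["python3", "docQA.py", "question"]
    if threads is not None:
        argv += ["--threads", str(threads)]
    if temperature is not None:
        argv += ["--temperature", str(temperature)]
    return argv


if __name__ == "__main__":
    # Values taken from the examples above.
    print(" ".join(question_command(threads=12, temperature=0.8)))
```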

Deleting the data

At the moment, partial delete is not implemented. To delete the vector database created from the documents, delete the contents of the vecdb directory.
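
Emptying the directory can be done by hand or with a few lines of Python. The following is a minimal sketch, assuming the vector database lives entirely inside the vecdb directory; clear_vecdb is a hypothetical helper, not part of docQA. It removes everything inside the directory but keeps the directory itself, so an existing container mount keeps working.

```python
import shutil
from pathlib import Path


def clear_vecdb(vecdb_dir):
    """Delete everything inside vecdb_dir, keep the directory, return the entry count."""
    removed = 0
    for entry in Path(vecdb_dir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a subdirectory and all of its contents
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed += 1
    return removed


if __name__ == "__main__":
    # Path relative to the src directory used when running locally.
    target = Path("./vecdb")
    if target.is_dir():
        print(f"removed {clear_vecdb(target)} entries from {target}")
```

This is irreversible, so stop any running docQA process first and double-check the path.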
