Question answering over private documents using an open source LLM running locally.
NOTE: This software is in very early stages of development!
- Install podman (for macOS see here) and initialize it, i.e. make sure the podman ps command works in your terminal
- Start the container with your document directory and vector DB directory mounted
- Ingest the documents by running:
podman run -it -v /mnt/c/Temp/books/:/docQA/docs -v /mnt/c/Temp/vecdb/:/docQA/vecdb docker.io/vsndev3/docqa_v1_mpt7b:latest /docQA/docQA.py import-pdfs --dir /docQA/docs
- Run the above command again with different PDF directories to build a combined vector DB in the given directory
- Start the container in interactive mode
- Chat with your documents by running:
podman run -it -v /mnt/c/Temp/vecdb/:/docQA/vecdb docker.io/vsndev3/docqa_v1_mpt7b:latest
- Enjoy!
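The ingest and chat steps above differ only in their mounts and arguments, so they can be generated from a small helper. The following is a minimal sketch, not part of docQA itself: the function names `ingest_command` and `chat_command` are hypothetical, and the second document directory is a made-up example. It only prints the commands and does not run podman.

```python
import shlex

# Image name taken from the steps above.
IMAGE = "docker.io/vsndev3/docqa_v1_mpt7b:latest"


def ingest_command(docs_dir, vecdb_dir):
    """Build the podman command that ingests the PDFs in docs_dir into vecdb_dir."""
    return [
        "podman", "run", "-it",
        "-v", f"{docs_dir}:/docQA/docs",
        "-v", f"{vecdb_dir}:/docQA/vecdb",
        IMAGE,
        "/docQA/docQA.py", "import-pdfs", "--dir", "/docQA/docs",
    ]


def chat_command(vecdb_dir):
    """Build the podman command that starts the chat against an existing vector DB."""
    return ["podman", "run", "-it", "-v", f"{vecdb_dir}:/docQA/vecdb", IMAGE]


vecdb = "/mnt/c/Temp/vecdb/"
# The second directory is a hypothetical example of an additional PDF folder.
for docs in ["/mnt/c/Temp/books/", "/mnt/c/Temp/papers/"]:
    print(shlex.join(ingest_command(docs, vecdb)))
print(shlex.join(chat_command(vecdb)))
```

Each printed line can be pasted into a terminal. Reusing the same vector DB directory for every ingest run is what produces the combined database.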
Clone this repo using the commands below, install the Python requirements, and run it!
Base tools installation
If you do not have Python and Git but want to run from source, install the following:
- Python for your operating system from https://www.python.org/downloads/
- Git for your operating system from https://git-scm.com/downloads
git clone https://github.com/vsndev3/docQA.git
cd docQA
pip3 install -r requirements.txt
cd src
mkdir docs vecdb
# copy PDFs to docs directory
python3 docQA.py import-pdfs --dir ./docs
python3 docQA.py question
Press Ctrl+C to quit once done.
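Before running from source, it can help to confirm that the base tools are on your PATH. This is a minimal sketch using only the Python standard library. The helper name `missing_tools` is hypothetical, and the tool names are the commands used in the steps above (on some systems they may be named `python` and `pip` instead).

```python
import shutil

# Commands used in the run-from-source steps above.
REQUIRED_TOOLS = ("python3", "pip3", "git")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All base tools found.")
```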
Preparing docker image
Creating a container image packs the models and Python dependencies into a single image, so that it is easier to spin up and run even inside walled-garden environments. To build a docker image, follow these steps:
- Install podman (if you are not using docker) and optionally podman desktop from https://podman.io/
- Customize the Dockerfile to your needs, for example you can preload PDF files and model files as needed. It is not required to load source documents or models, as they can be mounted when the container is started
- podman build -f Dockerfile -t my_docqa_container
- Once podman build is successful, run the image: podman run -it localhost/my_docqa_container (use podman.exe on Windows)
Use cases
Start the application with the import-pdfs option and the directory containing the PDF files. This is needed only once, to scan the files and create the vector database. Thereafter this directory need not be specified or mounted (in case of container usage).
Assuming /dir/doc_folder_1 and /dir/doc_folder_2 contain PDF files, import them one by one as below:
docQA.py import-pdfs --dir /dir/doc_folder_1
docQA.py import-pdfs --dir /dir/doc_folder_2
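Importing several folders one by one can be scripted. The sketch below is an illustration under assumptions: the helper `import_folders` is hypothetical, it assumes it is run from the directory containing docQA.py, and it defaults to a dry run that only returns the commands. Pass `dry_run=False` to actually execute them.

```python
import subprocess
import sys


def import_folders(folders, dry_run=True):
    """Build (and optionally run) one import-pdfs command per folder, in order."""
    commands = [
        [sys.executable, "docQA.py", "import-pdfs", "--dir", folder]
        for folder in folders
    ]
    if not dry_run:
        for command in commands:
            # check=True stops at the first failing import.
            subprocess.run(command, check=True)
    return commands


for command in import_folders(["/dir/doc_folder_1", "/dir/doc_folder_2"]):
    print(" ".join(command))
```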
Start the application with the question option and, at the prompt, enter a question that can be answered from the documents you have already provided.
To search, start the application as below, after adding documents at least once.
docQA.py question
To start the search with a different number of threads:
docQA.py question --threads 12
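If you are unsure which thread count to use, the number of CPUs reported by the operating system is one possible starting point. That choice is an assumption, not a recommendation from this project. The helper `question_args` below is hypothetical and only builds the argument list, using the `--threads` and `--temperature` flags mentioned in this section.

```python
import os


def question_args(threads=None, temperature=None):
    """Build the argument list for the question mode of docQA.py."""
    if threads is None:
        # os.cpu_count() can return None, so fall back to a single thread.
        threads = os.cpu_count() or 1
    args = ["docQA.py", "question", "--threads", str(threads)]
    if temperature is not None:
        args += ["--temperature", str(temperature)]
    return args


print(" ".join(question_args()))
```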
To make the answers less rigid, start with a higher temperature:
docQA.py question --temperature 0.8
Check the available parameters for tuning using docQA.py question --help
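To see why a higher temperature gives less rigid answers, here is a general illustration of temperature-scaled sampling in language models. It is not docQA's internal code, and the scores are made up. Dividing the model's raw scores by the temperature before the softmax flattens the distribution as the temperature rises, so less likely tokens are picked more often.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for temperature in (0.2, 0.8):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At the lower temperature nearly all probability goes to the top-scoring token. At the higher one the alternatives get a noticeable share.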
At this moment, partial delete is not implemented. To delete the vector database created from the documents, delete the contents of the vecdb directory.
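Clearing the vector database can be done by hand or with a short script. The following is a minimal sketch using only the standard library. The helper name `clear_vecdb` is hypothetical, and the demo runs against a temporary directory rather than a real vecdb directory. Deletion is permanent, so double-check the path before pointing it at real data.

```python
import shutil
import tempfile
from pathlib import Path


def clear_vecdb(vecdb_dir):
    """Delete everything inside vecdb_dir but keep the directory itself."""
    root = Path(vecdb_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a subdirectory and its contents
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed += 1
    return removed


# Demo on a throwaway directory standing in for vecdb.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "index.bin").write_text("placeholder")
    (Path(tmp) / "segments").mkdir()
    print("entries removed:", clear_vecdb(tmp))
```

After clearing, run the import step again to rebuild the database from your documents.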