Complex questions that require multihop reasoning pose distinct challenges for information retrieval. This capstone project explores whether Graph Neural Networks (GNNs) can improve retrieval for such queries. Central to our approach is a knowledge graph that links each passage to the entities extracted from it and to the title of its source article. This structure lets the GNN exploit the relations between entities during retrieval. The project develops and tests this framework to examine how GNNs can be combined with knowledge graphs to handle complex informational queries.
These instructions will get your copy of the project up and running on your local machine for development and testing purposes.
Before running the scripts, install the required Python packages with the following command:
pip install -r requirements.txt
The default embedding model is bge-small-en-v1.5.
Follow these steps to get a development environment running:
Run the following bash command to download the required HotpotQA dataset:
./dataset/download_datasets.sh
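For orientation, each record in the HotpotQA JSON files holds a question, an answer, a `context` list of [title, sentences] pairs, and `supporting_facts` given as [title, sentence index] pairs. The snippet below is a minimal sketch of flattening one record into title-tagged passages. The helper name `record_to_passages` and the inline sample record are illustrative assumptions, not code or data from this repository.

```python
from typing import Dict, List


def record_to_passages(record: Dict) -> List[Dict]:
    """Flatten one HotpotQA record into one passage per context article."""
    # Titles of the articles that contain at least one supporting sentence.
    supporting_titles = {title for title, _ in record.get("supporting_facts", [])}
    passages = []
    for title, sentences in record["context"]:
        passages.append({
            "title": title,
            "text": " ".join(s.strip() for s in sentences),
            "is_supporting": title in supporting_titles,
        })
    return passages


# Hand-written record in the HotpotQA layout (illustrative, not dataset content).
sample = {
    "_id": "example-0",
    "question": "Which machine did the collaborator of Article A's subject design?",
    "answer": "Machine B",
    "supporting_facts": [["Article A", 0]],
    "context": [
        ["Article A", ["First sentence of A.", " Second sentence of A."]],
        ["Article B", ["Only sentence of B."]],
    ],
}

passages = record_to_passages(sample)
for p in passages:
    print(p["title"], "|", p["is_supporting"], "|", p["text"])
```

Keeping the article title on every passage makes it straightforward to connect passages to their source articles later on.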
To execute the baseline model with default settings, use:
python hotpotqa_baseline.py
You can customize the script's execution by adjusting the command-line parameters:
python hotpotqa_baseline.py --model_name "YourModelName" --file_name "your_file.json" --top_k <top_k> --retriever_mode <retriever_mode>
To construct the knowledge graph from the HotpotQA training file, execute:
python GraphBuilder.py
This writes the extracted triplets to a JSON file named kgs.json.
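To illustrate the idea of linking passages to entities and article titles, here is a minimal sketch of triplet construction. It is not the code in GraphBuilder.py: the relation names `from_article` and `mentions`, the passage identifiers, and the regex-based `extract_entities` stand-in are all assumptions made for this example, and the actual schema of kgs.json may differ.

```python
import json
import re
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]


def extract_entities(text: str) -> List[str]:
    """Stand-in entity extractor: runs of capitalized words.

    Assumption: a real pipeline would use a proper NER model instead.
    """
    return sorted(set(re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", text)))


def build_triplets(passages: List[Dict[str, str]]) -> List[Triplet]:
    """Link each passage to its source title and to the entities it mentions."""
    triplets: List[Triplet] = []
    for i, passage in enumerate(passages):
        passage_id = f"passage_{i}"  # hypothetical identifier scheme
        triplets.append((passage_id, "from_article", passage["title"]))
        for entity in extract_entities(passage["text"]):
            triplets.append((passage_id, "mentions", entity))
    return triplets


# Illustrative passages, not taken from the dataset.
passages = [
    {"title": "Ada Lovelace", "text": "Ada Lovelace worked with Charles Babbage."},
    {"title": "Charles Babbage", "text": "Charles Babbage designed the Analytical Engine."},
]

triplets = build_triplets(passages)
print(json.dumps(triplets, indent=2))
```

In this toy graph both passages mention "Charles Babbage", so that shared entity node connects them. Paths of this kind are what a multihop question needs to traverse.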
To train the model, run:
python train.py
Hyperparameters are stored at the top of the file. The model achieving the highest hit rate is automatically saved to the output folder.
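For readers unfamiliar with the metric, the sketch below shows one common definition of hit rate at k: the fraction of queries for which at least one relevant passage appears among the top k retrieved results. This is an assumption about the metric, not a copy of the logic in train.py, which may use a stricter variant (for example, requiring all supporting passages). The function name and the toy rankings are illustrative.

```python
from typing import Sequence, Set


def hit_rate(
    retrieved: Sequence[Sequence[str]],
    relevant: Sequence[Set[str]],
    k: int,
) -> float:
    """Fraction of queries with at least one relevant passage in the top k."""
    if not retrieved:
        return 0.0
    hits = sum(
        1
        for ranked, gold in zip(retrieved, relevant)
        if gold.intersection(ranked[:k])
    )
    return hits / len(retrieved)


# Toy rankings for three queries (passage ids are made up).
retrieved = [
    ["p1", "p7", "p3"],
    ["p4", "p2", "p9"],
    ["p5", "p6", "p8"],
]
relevant = [{"p3"}, {"p2"}, {"p0"}]

for k in (1, 2, 3):
    print(f"hit rate at k={k}: {hit_rate(retrieved, relevant, k):.3f}")
```

A score like this can be computed on a validation set after each epoch and compared with the best value seen so far to decide whether to save the current model.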
PyTorch - An open source machine learning framework.
PyTorch Geometric - A library for deep learning on graphs and other irregular structures.
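LlamaIndex - A data framework for building retrieval-based LLM applications.
HuggingFace - A platform and set of libraries for sharing and using pretrained models.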