Machine Reading Comprehension (MRC) is a critical facet of Natural Language Processing (NLP), particularly for Question Answering (QA) tasks. Question Answering involves comprehending a given corpus and then selecting the span of text in that corpus that answers a given question. In this project, we explored several pre-trained transformer-based models, namely BERT, DistilBERT, RoBERTa, ALBERT and XLNet, to identify their respective strengths and weaknesses. To further improve QA performance, we fine-tuned each model's hyperparameters to obtain the best-performing base models. We then combined the fine-tuned transformer-based models in an ensemble to enhance MRC performance. Our experiments demonstrate that the ensemble model outperforms all of the selected baseline models.
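To make "selecting the span of text" concrete, the following is a minimal sketch of how an extractive QA model's output is commonly decoded. It assumes the model produces one start score and one end score per token; the function name `best_span`, the `max_answer_len` parameter, and the toy scores are illustrative and are not taken from this repository.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest combined score.

    Only spans with start <= end and at most max_answer_len tokens are considered.
    """
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # The end index may not precede the start or exceed the length limit.
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Toy example (hypothetical scores, not real model output).
tokens = ["The", "model", "was", "released", "in", "late", "2018", "."]
start_scores = [0.1, 0.0, 0.2, 0.3, 0.5, 3.0, 1.0, 0.0]
end_scores = [0.0, 0.1, 0.0, 0.2, 0.1, 0.8, 3.2, 0.3]

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)  # late 2018
```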
The project is built using the following technologies and frameworks:
- A0170723L
- A0241293J
- A0240932L
- A0238397M
- A0236491B
- jeffnjy/bert-base-test
- jeffnjy/albert-base-test
- jeffnjy/distilbert-base-test
- jeffnjy/roberta-base-test
- JiayanL/XLNET
To run the project locally, follow these steps:
- Clone the repository to your local machine:
git clone [email protected]:Jnjy/CS4248_G10.git
- Navigate to the project directory:
cd CS4248_G10
- Install the required dependencies:
pip install -r requirements.txt
- To run the ensemble method:
#1: Open src/ensemble.py
#2: Select the ensemble method (hardmax or softmax) by uncommenting the call you wish to run and commenting out the other:
if __name__ == '__main__':
    run_soft_ensemble()
    # run_hard_ensemble()
#3: Open script/run_ensemble.sh to change your conda configuration:
## edit this path to point to your own conda.sh file
source /home/n/njinyuan/miniconda3/etc/profile.d/conda.sh
## edit the environment name to your own conda environment
echo "activating environment"
conda activate cs4248
#4: In the root folder, run
sbatch script/run_ensemble.sh
- Open the Slurm log to check the evaluated results.
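Step #2 above switches between two ensemble modes. As a rough illustration of the hard variant, the sketch below assumes it means majority voting over the answer strings predicted by each model; the actual logic in src/ensemble.py may differ. The function name `hard_vote`, the normalization, and the tie-breaking rule are assumptions made for this example.

```python
from collections import Counter
from typing import List


def hard_vote(predictions: List[str]) -> str:
    """Return the answer predicted by the most models.

    Answers are compared case-insensitively after stripping whitespace.
    Ties are broken in favour of the model that appears first in the list.
    """
    if not predictions:
        return ""
    normalized = [p.strip().lower() for p in predictions]
    counts = Counter(normalized)
    top = max(counts.values())
    # Walk the predictions in model order so ties go to the earliest model.
    for norm, original in zip(normalized, predictions):
        if counts[norm] == top:
            return original
    return ""  # unreachable for non-empty input


# Hypothetical predictions from five models for one question.
predictions = ["late 2018", "Late 2018", "2018", "in late 2018", "late 2018"]
print(hard_vote(predictions))  # late 2018
```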
Now you're ready to explore and experiment with our MRC project. Feel free to customize configurations and parameters based on your requirements in src/ensemble.py.
# line 203
''' 2. create ensemble model '''
soft_weights = [(1, 1, 1, 1)]
# line 270
''' 2. create ensemble model '''
soft_weights = [(1, 1, 1, 1)]
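To show what changing `soft_weights` could do, here is a minimal sketch of weighted soft voting. It assumes that each tuple holds one weight per ensemble member and that each model yields a probability for every candidate answer; neither assumption is confirmed by the snippets above, and `soft_vote` and the toy probabilities are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def soft_vote(model_probs: List[Dict[str, float]],
              weights: Sequence[float]) -> Tuple[str, float]:
    """Combine per-model answer probabilities with a weighted average.

    Returns the answer with the highest combined probability and that probability.
    """
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined: Dict[str, float] = {}
    for probs, weight in zip(model_probs, weights):
        for answer, p in probs.items():
            # Normalizing by the total keeps the result a probability.
            combined[answer] = combined.get(answer, 0.0) + weight * p / total
    best = max(combined, key=lambda a: combined[a])
    return best, combined[best]


# Hypothetical candidate probabilities from four models for one question.
model_probs = [
    {"late 2018": 0.6, "2018": 0.4},
    {"late 2018": 0.3, "2018": 0.7},
    {"late 2018": 0.5, "2018": 0.5},
    {"late 2018": 0.7, "2018": 0.3},
]

# Equal weights, then a configuration that trusts the second model more.
soft_weights = [(1, 1, 1, 1), (1, 4, 1, 1)]
results = [soft_vote(model_probs, w) for w in soft_weights]
for w, (answer, prob) in zip(soft_weights, results):
    print(w, answer, round(prob, 3))
```

With equal weights the models' average favours "late 2018"; up-weighting the second model flips the result to "2018", which is the kind of effect to look for when tuning the weights.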
Data on our experiments can be found here: https://docs.google.com/document/d/1DR2qqxTZ056ni9E6luCJ8e4A7bjkjCG0uhZH-JnLzRc/edit?usp=sharing
Happy coding!