This repository is intended for NLP project development, specifically for building a fact-checking model.
We use the PUBHEALTH dataset (https://github.com/neemakot/Health-Fact-Checking).
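PUBHEALTH is distributed as tab-separated files. The sketch below is not the repository's pre-processing code. It shows one way to read such a file with the standard library, assuming the columns `claim`, `main_text`, `explanation`, and `label`, and assuming the four veracity labels `false`, `mixture`, `true`, and `unproven` in that order. Rows whose label is missing or not one of those four classes are dropped. The helper name `load_pubhealth` is hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO

# Assumed label order; the notebooks may use a different mapping.
LABELS = ["false", "mixture", "true", "unproven"]
LABEL2ID: Dict[str, int] = {name: i for i, name in enumerate(LABELS)}


def load_pubhealth(stream: TextIO) -> List[dict]:
    """Read a PUBHEALTH-style TSV (hypothetical helper) and keep valid rows."""
    reader = csv.DictReader(stream, delimiter="\t")
    rows = []
    for row in reader:
        label = (row.get("label") or "").strip().lower()
        if label not in LABEL2ID:
            continue  # skip rows with a missing or unexpected label
        rows.append({
            "claim": (row.get("claim") or "").strip(),
            "main_text": (row.get("main_text") or "").strip(),
            "explanation": (row.get("explanation") or "").strip(),
            "label_id": LABEL2ID[label],
        })
    return rows


# Usage on a made-up two-row sample; the second row has no label and is dropped.
sample = (
    "claim_id\tclaim\tmain_text\texplanation\tlabel\n"
    "1\tVitamin C cures colds.\tArticle text.\tNo strong evidence.\tfalse\n"
    "2\tBroken row\t\t\t\n"
)
data = load_pubhealth(io.StringIO(sample))
print(len(data), data[0]["label_id"])  # prints: 1 0
```

Real article bodies in `main_text` can be long, so the actual files may need a larger `csv.field_size_limit` than the default.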
Currently it only contains pre-processing code and a dummy model to verify dependencies.
Download the data from the Google Drive link if you have access, then start working with the notebook src/NLPNotebook. The training code is spread across several notebooks:
- NLPInitialNotebookPreprocessingT5SmallModel.ipynb - Pre-processing code, some initial experiments, and t5-small training for explanation generation
- ClaimClassification.ipynb - Claim classification using DistilBERT
- ExplanationGeneration.ipynb - Explanation text generation
- InferenceNotebook.ipynb - Example of using the saved model, available on Google Drive, to make predictions
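As a rough illustration of how the two tasks can be framed, the sketch below builds a text-to-text pair for explanation generation and turns classifier logits into a label. It is not taken from the notebooks. Assumptions: the t5-small input uses a `summarize:` prefix, truncation is done on whitespace-separated words as a stand-in for the real tokenizer, and the classifier returns one row of raw logits per claim in the hypothetical label order `false`, `mixture`, `true`, `unproven`. The helpers `make_t5_example` and `predict_label` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label order; check it against the trained model's config.
LABELS = ["false", "mixture", "true", "unproven"]


def make_t5_example(claim: str, main_text: str, explanation: str,
                    max_words: int = 256) -> Tuple[str, str]:
    """Frame explanation generation as a (source, target) text pair.

    Assumption: a "summarize:" prefix and word-level truncation stand in
    for whatever prompt format and tokenizer truncation the notebook uses.
    """
    words = f"claim: {claim} context: {main_text}".split()
    source = "summarize: " + " ".join(words[:max_words])
    return source, explanation.strip()


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits: Sequence[float]) -> Tuple[str, float]:
    """Map one row of classifier logits to (label, probability) via argmax."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Usage with made-up inputs.
src, tgt = make_t5_example("Vitamin C cures colds.", "Long article text",
                           "No strong evidence.", max_words=4)
print(src)  # prints: summarize: claim: Vitamin C cures
print(predict_label([0.1, 2.0, -1.0, 0.3])[0])  # prints: mixture
```

Subtracting the maximum logit before exponentiating keeps the softmax stable for large logits without changing the resulting probabilities.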