Explainable Fact-Checking for Public Health Claims

This repository contains data and code for the paper Explainable Automated Fact-Checking for Public Health Claims (Kotonya and Toni, 2020). This research will be presented at The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020).

Introduction

Fact-checking is the task of verifying claims (i.e., distinguishing between false stories and facts) by assessing the assertions made by claims against credible evidence. The vast majority of fact-checking studies focus exclusively on political claims. Very little research explores fact-checking for other topics, specifically subject matters for which expertise is required. We present the first study in explainable fact-checking for claims which require specific expertise.

For our case study we choose the setting of public health. To support this, we construct a new dataset, PUBHEALTH, of 11.8K claims accompanied by journalist-crafted, gold standard explanations (i.e., judgments) to support the fact-check labels for claims. We explore two tasks: veracity prediction and explanation generation. We also define and evaluate, with humans and computationally, three coherence properties of explanation quality. Our results indicate that, by training on in-domain data, gains can be made in explainable, automated fact-checking for claims which require specific expertise.

Data

PUBHEALTH fact-checking dataset

We present PUBHEALTH, a comprehensive dataset for explainable automated fact-checking of public health claims. Each instance in the PUBHEALTH dataset has an associated veracity label (true, false, unproven, mixture). Furthermore, each instance in the dataset has an explanation text field. The explanation is a justification for why the claim has been assigned a particular veracity label.
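As a quick sanity check on the four veracity labels, the following minimal sketch counts labels over a list of instances and rejects any value outside the documented set. It assumes each instance is available as a Python dict with a `label` key; the function name `label_distribution` and the toy records are hypothetical and not part of this repository.

```python
from collections import Counter

# The four veracity labels named in the dataset description.
VALID_LABELS = {"true", "false", "unproven", "mixture"}


def label_distribution(instances):
    """Count veracity labels, rejecting any label outside the four documented ones."""
    counts = Counter()
    for instance in instances:
        # Assumption: labels are stored as plain strings; normalise case and whitespace.
        label = instance["label"].strip().lower()
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts


# Hypothetical toy records; only the first claim is taken from this README's example instance.
toy_instances = [
    {"claim": "Expired boxes of cake and pancake mix are dangerously toxic.", "label": "mixture"},
    {"claim": "placeholder claim A", "label": "false"},
    {"claim": "placeholder claim B", "label": "false"},
]
print(label_distribution(toy_instances))
```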

The dataset can be downloaded here.

Alternatively, the dataset can be acquired using the following commands:

 cd src
 ./download_data.sh

The following is an example instance of the PUBHEALTH dataset:

| Field | Example |
| --- | --- |
| claim | Expired boxes of cake and pancake mix are dangerously toxic. |
| explanation | What's True: Pancake and cake mixes that contain mold can cause life-threatening allergic reactions. What's False: Pancake and cake mixes that have passed their expiration dates are not inherently dangerous to ordinarily healthy people, and the yeast in packaged baking products does not "over time develops spores." |
| label | mixture |
| claim URL | https://www.snopes.com/fact-check/expired-cake-mix/ |
| author(s) | David Mikkelson |
| date published | April 19, 2006 |
| tags | food, allergies, baking, cake |
| main_text | In April 2006, the experience of a 14-year-old who had eaten pancakes made from a mix that had gone moldy was described in the popular newspaper column Dear Abby. The account has since been circulated widely on the Internet as scores of concerned homemakers ponder the safety of the pancake and other baking mixes lurking in their larders [...] |
| evidence sources | [1] Bennett, Allan and Kim Collins. “An Unusual Case of Anaphylaxis: Mold in Pancake Mix.” American Journal of Forensic Medicine & Pathology. September 2001 (pp. 292-295). [2] Phillips, Jeanne. “Dear Abby.” 14 April 2006 [syndicated column]. |

More information about the PUBHEALTH dataset can be found in DATASHEET.md and README.md provided under data/, including test/train/dev splits, and data collection and processing information.
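Once a split has been downloaded, it can be read into memory with the standard library. The sketch below is illustrative only: it assumes each split is a tab-separated file with a header row, which this README does not confirm (check DATASHEET.md for the actual format), and the helper name `read_split` and the column names in the usage example are hypothetical.

```python
import csv
import io


def read_split(handle):
    """Read one dataset split from an open text handle into a list of dicts.

    Assumption: the file is tab-separated with a header row. Quoting is
    disabled so that quotation marks inside claims and explanations are
    kept as literal characters.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


# Usage with an in-memory stand-in for a file; the columns are hypothetical.
sample = io.StringIO('claim\tlabel\nHypothetical "quoted" claim\tfalse\n')
rows = read_split(sample)
print(rows[0]["label"])
```

For a real file, pass a handle opened with `open(path, encoding="utf-8", newline="")`. Very long text fields may require raising the limit with `csv.field_size_limit`.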

PUBHEALTH evidence documents

We are also collecting the original evidence documents cited in the fact-checking articles. This collection is still being updated, but the current version can be downloaded using the following commands:

 cd src
 ./download_evidence_docs.sh

Alternatively, you can download the evidence documents here.

The evidence documents are all text files with names formatted as doc_<CLAIM_ID>_<EVIDENCE_NUMBER>.txt.
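The naming scheme above makes it straightforward to map evidence files back to their claims. The following sketch parses file names and groups them by claim. It assumes that CLAIM_ID contains no underscores and that EVIDENCE_NUMBER is an integer; neither is stated in this README, and the helper names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: CLAIM_ID has no underscores and EVIDENCE_NUMBER is an integer.
_DOC_NAME = re.compile(r"^doc_(?P<claim_id>[^_]+)_(?P<evidence_number>\d+)\.txt$")


def parse_evidence_filename(path):
    """Return (claim_id, evidence_number), or None if the name does not match."""
    match = _DOC_NAME.match(Path(path).name)
    if match is None:
        return None
    return match.group("claim_id"), int(match.group("evidence_number"))


def group_by_claim(paths):
    """Map each claim ID to its evidence file paths, ordered by evidence number."""
    grouped = defaultdict(list)
    for path in paths:
        parsed = parse_evidence_filename(path)
        if parsed is not None:
            claim_id, evidence_number = parsed
            grouped[claim_id].append((evidence_number, str(path)))
    return {claim_id: [p for _, p in sorted(docs)] for claim_id, docs in grouped.items()}


# Hypothetical file names following the documented pattern.
print(group_by_claim(["doc_7_2.txt", "doc_7_1.txt", "doc_9_1.txt", "README.md"]))
```

Files that do not follow the pattern are skipped rather than raising an error, which keeps the helper usable on a directory that also contains other files.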

Requirements

This project is built using Python 3.6 and TensorFlow. To install the dependencies, use the following command:

pip install -r requirements.txt

Here is the full list of requirements, including versions:

Machine Learning, NLP, evaluation and visualization packages:

Reference

If you use the dataset, please cite the paper as formatted below.

@inproceedings{kotonya-toni-2020-explainable,
    title = "Explainable Automated Fact-Checking for Public Health Claims",
    author = "Kotonya, Neema  and
      Toni, Francesca",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.623",
    pages = "7740--7754",
}

Contact

Please feel free to contact Neema Kotonya if you have any queries.
