
minicheck's Introduction

MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

Authors: Liyan Tang, Philippe Laban, Greg Durrett

Please check out our work here 📃

LLM-AggreFact Benchmark

Description

LLM-AggreFact is a fact verification benchmark. It aggregates 10 of the most up-to-date publicly available datasets on factual consistency evaluation across both closed-book and grounded generation settings. In LLM-AggreFact:

  1. Documents come from diverse sources, including Wikipedia paragraphs, interviews, and web text, and cover domains such as news, dialogue, science, and healthcare.
  2. Claims to be verified are mostly generated by recent generative models (except for one dataset of human-written claims), with no human intervention of any kind, such as injecting particular error types into model-generated claims.

Benchmark Access

Our benchmark is available on HuggingFace 🤗. More benchmark details can be found here.

from datasets import load_dataset
dataset = load_dataset("lytang/LLM-AggreFact")

The benchmark contains the following fields:

Field     Description
dataset   One of the 10 datasets in the benchmark
doc       Document used to check the corresponding claim
claim     Claim to be checked against the corresponding document
label     1 if the claim is supported, 0 otherwise
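
As a quick sanity check, the snippet below loads the benchmark and prints the fields of a single record. This is a minimal sketch: the split name 'test' is an assumption about how the dataset is laid out on HuggingFace.

from datasets import load_dataset

dataset = load_dataset("lytang/LLM-AggreFact")

# Print the fields of one example (the 'test' split name is an assumption)
example = dataset['test'][0]
print(example['dataset'], example['label'])
print(example['claim'])
print(example['doc'][:200])  # first 200 characters of the document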

MiniCheck Model Evaluation Demo

Please first clone our GitHub repo and install the necessary packages from requirements.txt.

Our MiniCheck models are available on HuggingFace 🤗. More model details can be found in this collection. Below is a simple use case of MiniCheck. The models are automatically downloaded from HuggingFace the first time they are used and cached in the specified directory.

from minicheck.minicheck import MiniCheck

doc = "A group of students gather in the school library to study for their upcoming final exams."
claim_1 = "The students are preparing for an examination."
claim_2 = "The students are on vacation."

# model_name can be one of ['roberta-large', 'deberta-v3-large', 'flan-t5-large']
# lytang/MiniCheck-Flan-T5-Large is auto-downloaded from HuggingFace the first time this is run
scorer = MiniCheck(model_name='flan-t5-large', device='cuda:0', cache_dir='./ckpts')
pred_label, raw_prob, _, _ = scorer.score(docs=[doc, doc], claims=[claim_1, claim_2])

print(pred_label) # [1, 0]
print(raw_prob)   # [0.9805923700332642, 0.007121307775378227]

A detailed walkthrough of the evaluation process on LLM-AggreFact and replication of the results is available in this notebook: inference-example-demo.ipynb. A rough sketch of the scoring loop is shown below.
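
The following is a minimal sketch of how such a replication loop could look, not the notebook's exact code. It assumes the same MiniCheck setup as above, a 'test' split on the HuggingFace dataset, that scikit-learn is available, and that balanced accuracy is the per-dataset metric; refer to the notebook for the authoritative version.

from collections import defaultdict

from datasets import load_dataset
from sklearn.metrics import balanced_accuracy_score
from minicheck.minicheck import MiniCheck

# Load the benchmark (the 'test' split name is an assumption)
data = load_dataset("lytang/LLM-AggreFact")['test']

scorer = MiniCheck(model_name='flan-t5-large', device='cuda:0', cache_dir='./ckpts')
pred_labels, _, _, _ = scorer.score(docs=data['doc'], claims=data['claim'])

# Group gold labels and predictions by source dataset
grouped = defaultdict(lambda: ([], []))
for name, gold, pred in zip(data['dataset'], data['label'], pred_labels):
    grouped[name][0].append(gold)
    grouped[name][1].append(pred)

# Report balanced accuracy (BAcc) per dataset
for name, (golds, preds) in sorted(grouped.items()):
    print(f"{name}: BAcc = {balanced_accuracy_score(golds, preds):.3f}")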

Synthetic Data Generation

Code for generating synthetic data (both C2D and D2C methods) is available in the synthetic_data_gen directory. The 14K synthetic examples (7K C2D and 7K D2C) used for model training are available on HuggingFace 🤗 and can be found here.

Citation

If you find our work useful, please consider citing it.

@misc{tang2024minicheck,
      title={MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents}, 
      author={Liyan Tang and Philippe Laban and Greg Durrett},
      year={2024},
      eprint={2404.10774},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}


minicheck's Issues

A request for the MiniCheck-FT5 fine-tuning code

I recently tried to follow the MiniCheck-FT5 approach from the paper, but unfortunately could not achieve the same results 🤣. I suspect this is because my fine-tuning code is not well written, since this is not a simple text classification task. If it's convenient for you, could you please share the fine-tuning code for MiniCheck-FT5?

Generation of "claims"

Hi!

How did you generate the claims in the benchmark? Did you (1) directly prompt the model to generate a claim, (2) first generate model responses and then decompose them into claims, or (3) use some other method?

Thanks!
