Git Product home page Git Product logo

haystack-entailment-checker's Introduction

Haystack Entailment Checker

Custom node for the Haystack NLP framework. Using a Natural Language Inference model, it checks whether a lists of Documents/passages entails, contradicts or is neutral with respect to a given statement.

Live Demo: Fact Checking ๐ŸŽธ Rocks! ย  Generic badge

How it works

Entailment Checker Node

  • The node takes a list of Documents (commonly returned by a Retriever) and a statement as input.
  • Using a Natural Language Inference model, the text entailment between each text passage/Document (premise) and the statement (hypothesis) is computed. For every text passage, we get 3 scores (summing to 1): entailment, contradiction and neutral.
  • The text entailment scores are aggregated using a weighted average. The weight is the relevance score of each passage returned by the Retriever, if availaible. It expresses the similarity between the text passage and the statement. Now we have a summary score, so it is possible to tell if the passages confirm, are neutral or disprove the user statement.
  • empirical consideration: if in the first N passages (N<K), there is strong evidence of entailment/contradiction (partial aggregate scores > threshold), it is better not to consider (K-N) less relevant documents.

Installation

pip install haystack-entailment-checker

Usage

Basic example

from haystack import Document
from haystack_entailment_checker import EntailmentChecker

ec = EntailmentChecker(
        model_name_or_path = "microsoft/deberta-v2-xlarge-mnli",
        use_gpu = False,
        entailment_contradiction_threshold = 0.5)

doc = Document("My cat is lazy")

print(ec.run("My cat is very active", [doc]))
# ({'documents': [...],
# 'aggregate_entailment_info': {'contradiction': 1.0, 'neutral': 0.0, 'entailment': 0.0}}, ...)

Fact-checking pipeline (Retriever + EntailmentChecker)

from haystack import Document, Pipeline
from haystack.nodes import BM25Retriever
from haystack.document_stores import InMemoryDocumentStore
from haystack_entailment_checker import EntailmentChecker

# INDEXING
# the knowledge base can consist of many documents
docs = [...]
ds = InMemoryDocumentStore(use_bm25=True)
ds.write_documents(docs)

# QUERYING
retriever = BM25Retriever(document_store=ds)
ec = EntailmentChecker()

pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipe.add_node(component=ec, name="EntailmentChecker", inputs=["Retriever"])

pipe.run(query="YOUR STATEMENT TO CHECK")

Acknowledgements ๐Ÿ™

Special thanks goes to @davidberenstein1957, who contributed to the original implementation of this node, in the Fact Checking ๐ŸŽธ Rocks! project.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.