HAGRID: A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution

HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) is a dataset for generative information-seeking scenarios. It is constructed on top of MIRACL 🌍🙌🌏, an information retrieval dataset that consists of queries along with a set of manually labelled relevant passages (quotes).

We collect attributed explanations for each question by prompting GPT-3.5 with the given relevant passages. The explanations follow an in-context citation style, similar to scientific articles, and reference the supporting quotes. Human annotators then judge each explanation on two criteria:

  1. Informativeness: whether they provide a direct answer to the question.
  2. Attributability: whether they are attributable to the source passages.
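As a rough illustration of the citation style described above, the sketch below pulls citation markers out of an explanation and flags any that point to a quote that does not exist. It assumes citations appear as bracketed numbers such as [1]; the record layout, the field names, and the helper functions are hypothetical and are not taken from the dataset itself.

```python
import re

# Hypothetical record: field names and contents are illustrative assumptions,
# not the confirmed schema of the dataset.
record = {
    "quotes": ["text of the first passage", "text of the second passage"],
    "answer": "First claim [1]. Second claim [1][2]. Third claim [3].",
}

# Assumption: citations are written as bracketed 1-based numbers, e.g. [2].
CITATION = re.compile(r"\[(\d+)\]")


def cited_quote_ids(answer):
    """Return the sorted, de-duplicated citation numbers found in the answer."""
    return sorted({int(number) for number in CITATION.findall(answer)})


def dangling_citations(answer, num_quotes):
    """Return citation numbers that do not refer to any available quote."""
    return [i for i in cited_quote_ids(answer) if not 1 <= i <= num_quotes]


print(cited_quote_ids(record["answer"]))
print(dangling_citations(record["answer"], len(record["quotes"])))
```

A check like this only verifies that each citation resolves to a passage. Whether the cited passage actually supports the claim is what the human attributability judgment captures.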

HAGRID workflow

Data

HAGRID is hosted on Hugging Face 🤗 under the dataset ID miracl/hagrid.

import datasets

# Download the training split from the Hugging Face Hub and print the first record.
hagrid = datasets.load_dataset("miracl/hagrid", split="train")
print(hagrid[0])

Split   #Q      #A      #Informativeness   #Attributability
Train   1,922   3,214   3,214              754
Dev     716     1,318   1,157              826
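
To put the counts above in proportion, the following sketch computes the average number of answers per question and the share of answers carrying each kind of judgment. It assumes that #Informativeness and #Attributability count the answers that received the corresponding annotation; the dictionary keys and the summarize helper are names introduced here for illustration.

```python
# Counts copied from the statistics table above.
stats = {
    "train": {"questions": 1922, "answers": 3214,
              "informativeness": 3214, "attributability": 754},
    "dev": {"questions": 716, "answers": 1318,
            "informativeness": 1157, "attributability": 826},
}


def summarize(split):
    """Return per-split ratios derived from the raw counts."""
    s = stats[split]
    return {
        "answers_per_question": s["answers"] / s["questions"],
        # Assumption: each count is the number of answers with that annotation.
        "informativeness_coverage": s["informativeness"] / s["answers"],
        "attributability_coverage": s["attributability"] / s["answers"],
    }


for split in stats:
    summary = summarize(split)
    print(split, {name: round(value, 3) for name, value in summary.items()})
```

Under that reading, every training answer has an informativeness judgment, while attributability judgments cover a smaller share of the training answers than of the dev answers.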

Baselines (Coming soon!)

We are planning to release baseline models soon! Stay tuned!

Contact

If you have any questions, feel free to email us (project.miracl [at] gmail.com) or open a GitHub issue in this repository.

License

This work is licensed under the Apache 2.0 license. See LICENSE for details.

Citation

If you find this dataset and repository helpful, please cite HAGRID as follows:

@article{hagrid,
      title={{HAGRID}: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution}, 
      author={Ehsan Kamalloo and Aref Jafari and Xinyu Zhang and Nandan Thakur and Jimmy Lin},
      year={2023},
      journal={arXiv:2307.16883},
}
