FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions

Official repository for the paper FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions. For official evaluation, install the mteb library and evaluate your MTEB-compatible model with zero (or only a few) lines of code changes!

Links

Binary: Description
FollowIR-7B: A 7B-parameter model that reranks documents given a query and instructions. It is fine-tuned from Mistral-7B on the datasets below.
FollowIR-train: The dataset used to train FollowIR-7B. It consists of TREC instructions and queries, plus GPT-generated synthetic documents that have been filtered.
FollowIR-train-raw: The pre-filtered version of the train set above. It was not used in model training because some of the GPT-generated data is incorrect.

You can also find the individual annotated test sets (Robust04, Core17, and News21), although their format is best used with MTEB's evaluation code.

Installation

If you wish to reproduce the experiments in the paper, you can use the following commands:

git clone https://github.com/orionw/FollowIR.git
cd FollowIR/
conda create -n followir python=3.9 -y
conda activate followir
pip install -r requirements.txt
bash launch_all_jobs.sh

Usage

If your model is SentenceTransformer-compatible and requires no special tokens for concatenating the query and instructions, you can simply use the following one-line command:

mteb -m $MODEL_NAME -t $DATASET

for each of the datasets in {Robust04InstructionRetrieval, Core17InstructionRetrieval, News21InstructionRetrieval}.
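If you prefer to drive the three runs from a script, the loop over datasets can be sketched in Python as below. This is only a convenience wrapper around the `mteb -m ... -t ...` command shown above; the helper names `build_mteb_commands` and `run_all` are hypothetical and not part of this repository or of MTEB.

```python
import subprocess
from typing import List

# Task names exactly as listed in this README.
FOLLOWIR_TASKS = [
    "Robust04InstructionRetrieval",
    "Core17InstructionRetrieval",
    "News21InstructionRetrieval",
]


def build_mteb_commands(model_name: str) -> List[List[str]]:
    """Return one `mteb -m MODEL -t TASK` argument list per FollowIR task."""
    return [["mteb", "-m", model_name, "-t", task] for task in FOLLOWIR_TASKS]


def run_all(model_name: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_mteb_commands(model_name):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the mteb CLI exits non-zero.
            subprocess.run(cmd, check=True)
```

Calling `run_all("your-model-name")` only prints the commands; pass `dry_run=False` to actually launch them, which assumes the `mteb` executable is on your PATH.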

If you have a bi-encoder model but want to do something other than simply appending the instruction to the query with a space, you can extend DenseRetrievalExactSearch and check for instructions in kwargs. See models/base_sentence_transformers/ as a starting place for small modifications and models/e5/ for an example with larger modifications.
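As a rough illustration of that idea, the sketch below combines each query with its instruction through a custom template instead of a plain space. It is a standalone stand-in, not the repository's code: the class `InstructionAwareQueryEncoder`, the `encode_queries` signature, the `"instructions"` kwargs key, and the template string are all assumptions made for the example. In practice the same logic would live in your subclass of DenseRetrievalExactSearch.

```python
from typing import Callable, List, Optional, Sequence


def format_query(query: str, instruction: Optional[str],
                 template: str = "{query} {instruction}") -> str:
    """Combine a query with its instruction; return the bare query if there is none."""
    if not instruction:
        return query
    return template.format(query=query, instruction=instruction)


class InstructionAwareQueryEncoder:
    """Hypothetical stand-in for a DenseRetrievalExactSearch subclass."""

    def __init__(self, encode_fn: Callable[[List[str]], list],
                 template: str = "{query} {instruction}") -> None:
        self.encode_fn = encode_fn  # e.g. the encode method of your embedding model
        self.template = template

    def encode_queries(self, queries: Sequence[str], **kwargs) -> list:
        # Assumption: instructions arrive under the "instructions" key in kwargs.
        instructions = kwargs.get("instructions")
        if instructions is None:
            instructions = [None] * len(queries)
        texts = [format_query(q, i, self.template)
                 for q, i in zip(queries, instructions)]
        return self.encode_fn(texts)
```

For example, constructing it with `template="Instruct: {instruction}\nQuery: {query}"` would place the instruction before the query, which is the kind of change a plain space-join cannot express.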

Reranker Usage

Rerankers have now been added to MTEB! If you are using a reranker model, you will need to extend the DenseRetrievalExactSearch class and define __init__ and predict functions (see the models/rerankers section for a variety of reranker examples). Your predict function should take in input_to_rerank, a collection of (query, passage, instruction) tuples that can be unpacked as follows:

# If there are no instructions, instructions will contain only Nones.
# Instructions will be present for all of the FollowIR datasets.
queries, passages, instructions = list(zip(*input_to_rerank))

Your predict function should use these and return a list containing one score for each input tuple.
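To make the expected shape of predict concrete, here is a minimal toy sketch. It does not inherit from DenseRetrievalExactSearch and does not load a real model: `OverlapReranker` and its word-overlap score are hypothetical placeholders, used only to show the unpacking step and the one-score-per-tuple return value.

```python
from typing import List, Optional, Set, Tuple

# One item to rerank: (query, passage, instruction or None).
RerankInput = Tuple[str, str, Optional[str]]


class OverlapReranker:
    """Toy reranker that scores passages by word overlap (illustration only)."""

    def __init__(self, lowercase: bool = True) -> None:
        self.lowercase = lowercase

    def _tokens(self, text: str) -> Set[str]:
        text = text.lower() if self.lowercase else text
        return set(text.split())

    def predict(self, input_to_rerank: List[RerankInput]) -> List[float]:
        if not input_to_rerank:
            return []
        # Same unpacking as in the README snippet above.
        queries, passages, instructions = list(zip(*input_to_rerank))
        scores = []
        for query, passage, instruction in zip(queries, passages, instructions):
            # Append the instruction to the query when one is present.
            prompt = query if instruction is None else f"{query} {instruction}"
            prompt_tokens = self._tokens(prompt)
            overlap = prompt_tokens & self._tokens(passage)
            # Fraction of prompt words that also occur in the passage.
            scores.append(len(overlap) / max(len(prompt_tokens), 1))
        return scores
```

A real reranker would replace the overlap computation with a forward pass of the model, but the input unpacking and the returned list of floats would keep the same shape.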

Citing

If you found the code, data, or model useful, feel free to cite:

@misc{weller2024followir,
      title={FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions}, 
      author={Orion Weller and Benjamin Chang and Sean MacAvaney and Kyle Lo and Arman Cohan and Benjamin Van Durme and Dawn Lawrie and Luca Soldaini},
      year={2024},
      eprint={2403.15246},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}

followir's Issues

Computing p-MRR

Hi, thanks for the great work.

I'd like to compute the p-MRR metric from the paper, but I'm not sure whether it's implemented in this repo. Some parts are unclear to me, so it would be good to see the code.

FYI, I already have a ranked document list for each query under both the original (og) and the changed instructions.
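
For reference, one reading of the paper's pairwise definition can be sketched as follows. This is not the official implementation, and it rests on several assumptions: that p-MRR is computed per document whose relevance changed, as MRR_og / MRR_new - 1 when the original rank is greater than the new rank and 1 - MRR_new / MRR_og otherwise, with MRR taken as 1 / rank; that scores are averaged per query and then across queries; and that every such document appears in both ranked lists. The function names and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def pairwise_score(og_rank: int, new_rank: int) -> float:
    """Score one document from its 1-based ranks under og and changed instructions."""
    mrr_og = 1.0 / og_rank
    mrr_new = 1.0 / new_rank
    if og_rank > new_rank:
        # The document moved up after the instruction change: negative score.
        return mrr_og / mrr_new - 1.0
    # The document moved down (or stayed put): score in [0, 1).
    return 1.0 - mrr_new / mrr_og


def rank_of(ranked_doc_ids: List[str], doc_id: str) -> int:
    """1-based position of doc_id; raises ValueError if it is not in the list."""
    return ranked_doc_ids.index(doc_id) + 1


def p_mrr(og_rankings: Dict[str, List[str]],
          new_rankings: Dict[str, List[str]],
          changed_docs: Dict[str, List[str]]) -> float:
    """Average pairwise scores per query, then average across queries."""
    per_query = []
    for qid, doc_ids in changed_docs.items():
        scores = [
            pairwise_score(rank_of(og_rankings[qid], d), rank_of(new_rankings[qid], d))
            for d in doc_ids
        ]
        if scores:
            per_query.append(sum(scores) / len(scores))
    return sum(per_query) / len(per_query) if per_query else 0.0
```

Under this reading the value lies between -1 and 1, with positive values meaning the changed documents were pushed down after the instruction change; multiply by 100 if you want a percentage-style scale.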
