query-focused-sum's Introduction

Exploring Neural Models for Query-Focused Summarization

This is the official code repository for Exploring Neural Models for Query-Focused Summarization by Jesse Vig*, Alexander R. Fabbri*, Wojciech Kryściński*, Chien-Sheng Wu, and Wenhao Liu (*equal contribution).

We present code and instructions for reproducing the paper experiments and running the models against your own datasets.

Table of contents

Introduction
Two-stage models
Segment Encoder
Citation
License

Introduction

Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. In our paper, we conduct a systematic exploration of neural approaches to QFS, considering two general classes of methods: two-stage extractive-abstractive solutions and end-to-end models. Within those categories, we investigate existing methods and present two model extensions that achieve state-of-the-art performance on the QMSum dataset by a margin of up to 3.38 ROUGE-1, 3.72 ROUGE-2, and 3.28 ROUGE-L.

Two-stage models

Two-stage approaches consist of an extractor model, which extracts the parts of the source document relevant to the input query, and an abstractor model, which synthesizes the extracted segments into a final summary.

See the extractors directory for code and instructions for training and evaluating two-stage models.
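To make the extract-then-abstract split concrete, here is a minimal sketch of the extractor stage. It is not the repository's extractor. The trained extractor is replaced by a hypothetical word-overlap score (`score_segment`). The names `extract` and `build_abstractor_input`, the word budget, and the `" | "` separator are illustrative assumptions, and no abstractor model is run.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercased word tokens; a crude stand-in for a real subword tokenizer.
    return re.findall(r"\w+", text.lower())


def score_segment(query: str, segment: str) -> float:
    # Hypothetical relevance score: fraction of distinct query words found in the segment.
    query_words = set(tokenize(query))
    if not query_words:
        return 0.0
    return len(query_words & set(tokenize(segment))) / len(query_words)


def extract(query: str, segments: List[str], budget: int) -> List[str]:
    """Extractor stage: greedily keep the highest-scoring segments that fit
    within a word budget, then return them in their original order."""
    ranked = sorted(range(len(segments)),
                    key=lambda i: score_segment(query, segments[i]),
                    reverse=True)
    chosen, used = [], 0
    for i in ranked:
        length = len(tokenize(segments[i]))
        if used + length <= budget:
            chosen.append(i)
            used += length
    return [segments[i] for i in sorted(chosen)]


def build_abstractor_input(query: str, extracted: List[str], sep: str = " | ") -> str:
    # The abstractor (e.g. a seq2seq model) would read the query followed by
    # the extracted text; only the input string is built here.
    return query + sep + " ".join(extracted)


segments = [
    "The team discussed the budget for the remote control.",
    "Lunch options were briefly mentioned.",
    "They agreed the remote control budget should stay under 25 euros.",
]
query = "What did the team decide about the budget?"
picked = extract(query, segments, budget=20)
print(build_abstractor_input(query, picked))
```

Sorting the chosen indices before joining passes the extracted text to the abstractor in document order rather than relevance order.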

Segment Encoder

The Segment Encoder is an end-to-end model that uses sparse local attention to achieve state-of-the-art ROUGE scores on the QMSum dataset.

To replicate the QMSum experiments, or to train and evaluate the Segment Encoder on your own dataset, see the multiencoder directory.
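A minimal sketch of what "sparse local attention" over segments can look like follows. It assumes, and this is not taken from the repository code, that the source is split into fixed-length, possibly overlapping token windows. Each window is prefixed with the query and encoded independently. The names `make_segments`, `local_attention_mask`, `segment_len`, and `stride` are hypothetical. The mask shows the locality: each position sees only tokens from its own segment.

```python
from typing import List


def make_segments(query: List[str], doc: List[str],
                  segment_len: int, stride: int) -> List[List[str]]:
    """Split `doc` into windows of `segment_len` tokens starting every `stride`
    tokens (overlapping when stride < segment_len) and prepend the query to each."""
    if segment_len <= 0 or stride <= 0:
        raise ValueError("segment_len and stride must be positive")
    segments = []
    start = 0
    while True:
        segments.append(query + doc[start:start + segment_len])
        if start + segment_len >= len(doc):
            break  # this window already reaches the end of the document
        start += stride
    return segments


def local_attention_mask(segment_lengths: List[int]) -> List[List[bool]]:
    """Encoder self-attention mask over the concatenated segments: position i
    may attend to position j only if both belong to the same segment,
    which gives a block-diagonal (sparse, local) pattern."""
    owner = [seg for seg, n in enumerate(segment_lengths) for _ in range(n)]
    return [[owner[i] == owner[j] for j in range(len(owner))]
            for i in range(len(owner))]


query = ["q1", "q2"]
doc = [f"t{i}" for i in range(10)]
segs = make_segments(query, doc, segment_len=4, stride=3)
mask = local_attention_mask([len(s) for s in segs])
for s in segs:
    print(s)
```

With `segment_len=4` and `stride=3`, the ten document tokens become three overlapping windows. In a model built this way, a decoder would then attend over the concatenation of all segment encodings, so only the encoder's attention is restricted. Encoding segments separately keeps encoder cost linear in the number of segments rather than quadratic in the total input length.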

Citation

When referencing this repository, please cite this paper:

@misc{vig-etal-2021-exploring,
      title={Exploring Neural Models for Query-Focused Summarization}, 
      author={Jesse Vig and Alexander R. Fabbri and Wojciech Kry{\'s}ci{\'n}ski and Chien-Sheng Wu and Wenhao Liu},
      year={2021},
      eprint={2112.07637},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2112.07637}
}

License

This repository is released under the BSD-3 License.

query-focused-sum's Issues

failed to run preprocess.py, missing meeting_id and meeting_transcripts expects list of str

Hello there, thank you for a great paper and piece of work!

I tried to train the multiencoder, but the raw data from https://github.com/Yale-LILY/QMSum seems to have a slightly different format.

Running preprocess.py fails: meeting_id is missing, and meeting_transcripts is expected to be a list of str, but the original data has a list of dict.

I can hack around this by changing the format to introduce a dummy meeting_id and make it look as expected, but I wanted to first check whether I am missing something or whether there is a cleaner way to do so.

My question is: before running preprocess.py, should one just get the jsonl files from https://github.com/Yale-LILY/QMSum, or is additional or different data expected beyond a simple transformation of the original data?

Thank you in advance!
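The workaround described in this issue can be sketched as follows. The sketch assumes that each QMSum meeting record stores `meeting_transcripts` as a list of `{"speaker": ..., "content": ...}` dicts. It also assumes that preprocess.py needs only the two changes named in the issue. The `speaker: content` joining, the `<split>_<n>` dummy ids, and the function names are hypothetical choices that this repository does not confirm.

```python
import json
from typing import Dict, Iterable, Iterator


def convert_meeting(meeting: Dict, meeting_id: str) -> Dict:
    """Return a copy of one QMSum meeting record with a dummy `meeting_id`
    and `meeting_transcripts` flattened from dicts to strings."""
    converted = dict(meeting)
    converted["meeting_id"] = meeting_id
    # Assumed turn format: {"speaker": ..., "content": ...}; joined as "speaker: content".
    converted["meeting_transcripts"] = [
        f"{turn['speaker']}: {turn['content']}"
        for turn in meeting["meeting_transcripts"]
    ]
    return converted


def convert_jsonl_lines(lines: Iterable[str], split: str) -> Iterator[str]:
    """Convert JSONL lines one meeting at a time, numbering meetings
    "<split>_0", "<split>_1", ... and skipping blank lines."""
    count = 0
    for line in lines:
        if not line.strip():
            continue
        meeting = json.loads(line)
        yield json.dumps(convert_meeting(meeting, f"{split}_{count}"))
        count += 1


raw_line = json.dumps({
    "topic_list": [],
    "meeting_transcripts": [
        {"speaker": "Project Manager", "content": "Let's start."},
        {"speaker": "Industrial Designer", "content": "Okay."},
    ],
})
converted = [json.loads(s) for s in convert_jsonl_lines([raw_line, ""], "train")]
print(converted[0]["meeting_id"], converted[0]["meeting_transcripts"])
```

For a real file, `lines` could be an open file object for one of the QMSum jsonl files. Each yielded string would then be written out followed by a newline.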

Preprocessed AquaMuse dataset?

Hi, thanks for the great work!

Are the preprocessed AquaMuse documents (and summaries) also available for download, similar to the QMSum dataset? It would be really helpful for us to get identical datasets (and save the overhead of processing the data from Common Crawl).

Thanks!

how to get {split}.rouge.256.jsonl when using chunks?