

Few-Shot Table-to-Text Generation with Prototype Memory

Authors: Yixuan Su, Zaiqiao Meng, Simon Baker, and Nigel Collier

Code for the Findings of EMNLP 2021 paper Few-Shot Table-to-Text Generation with Prototype Memory

1. Download Data and Pre-trained Models:

(1) Download the data: link

Unzip data.zip and replace the empty data folder with its contents.

(2) Download the pre-trained checkpoints: link

Unzip checkpoints.zip and replace the empty checkpoints folder with its contents.

2. Prototype Selector

(1) Environment Installation:

pip install -r prototype_selector_requirements.txt

(2) Training under the few-shot-k setting on the Humans dataset:

cd ./prototype_selector/sh_folder/training/human/
chmod +x ./human-few-shot-k.sh
./human-few-shot-k.sh

(3) Inference under the few-shot-k setting on the Humans dataset:

cd ./prototype_selector/sh_folder/inference/human/
chmod +x ./inference_human-few-shot-k.sh
./inference_human-few-shot-k.sh

3. Generator

(1) Environment Installation:

pip install -r generator_requirements.txt

(2) Training under the few-shot-k setting on the Humans dataset:

cd ./generator/training/human/
chmod +x ./human-few-shot-k.sh
./human-few-shot-k.sh

(3) Inference under the few-shot-k setting on the Humans dataset:

cd ./generator/inference/human/
chmod +x ./human-few-shot-k-inference.sh
./human-few-shot-k-inference.sh

4. Citation

If you find our paper and code useful, please cite our paper:

@inproceedings{su-etal-2021-shot-table,
    title = "Few-Shot Table-to-Text Generation with Prototype Memory",
    author = "Su, Yixuan  and
      Meng, Zaiqiao  and
      Baker, Simon  and
      Collier, Nigel",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-emnlp.77",
    pages = "910--917",
    abstract = "Neural table-to-text generation models have achieved remarkable progress on an array of tasks. However, due to the data-hungry nature of neural models, their performances strongly rely on large-scale training examples, limiting their applicability in real-world applications. To address this, we propose a new framework: Prototype-to-Generate (P2G), for table-to-text generation under the few-shot scenario. The proposed framework utilizes the retrieved prototypes, which are jointly selected by an IR system and a novel prototype selector to help the model bridging the structural gap between tables and texts. Experimental results on three benchmark datasets with three state-of-the-art models demonstrate that the proposed framework significantly improves the model performance across various evaluation metrics.",
}


Issues

Where is the content-aware loss?

I can't find the content-aware loss in your code. Could you point me to the corresponding code? Thank you for your time, and really great work!

Technique used to extract the top 100 relevant documents?

You mentioned that you used a Lucene-based IR system, which would return the entire body of the wiki page matching the query (i.e., the contents of the table). I would like to know the technique you used to extract the relevant snippet instead of the entire wiki body.

Thanks in advance.
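For context on this retrieval step: the paper describes a Lucene-based IR system that takes the table contents as the query and returns the top 100 relevant candidates. As a self-contained illustration only (not the authors' actual pipeline, which relies on Lucene), a minimal BM25 ranker over pre-tokenized candidate sentences could look like this:

```python
# Minimal BM25 sketch of the retrieval step: the table contents serve as the
# query, and candidate sentences are ranked by BM25 score. This is an
# illustrative stand-in for the Lucene-based IR system used in the paper.
import math
from collections import Counter

def bm25_rank(query_tokens, docs, k1=1.5, b=0.75, top_n=100):
    """Return the indices of the top_n highest-scoring docs.

    docs: list of token lists (pre-tokenized candidate sentences).
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency of each term (number of docs containing it).
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return sorted(range(N), key=lambda i: -scores[i])[:top_n]
```

Note that this ranks whole candidate sentences; how the snippet (rather than the full wiki body) was obtained is exactly the open question above.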

Why concatenate slots in a table with <SEP> & <EOS>?

Thanks for releasing your code! After reading your paper, however, I was a little confused about the preprocessing of the input tables.

prototype_selector/dataclass.py line 74:

one_res_str = slot_key + ' ' + SEP + ' ' + slot_value + ' ' + EOS

Why do you use SEP and EOS to concatenate every key and value in the table? In my opinion, no more than two sentences, with only one EOS, should be fed into BERT.

I also wonder about the 100 sentences randomly retrieved with the IR system to build R. Does that mean you are able to access all the gold summaries in the dev/test sets, given that we only have 100 samples in the train set?

Thanks in advance.
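For reference, the quoted line linearizes each table slot into a "key SEP value EOS" segment. A minimal sketch of that preprocessing (the concrete `[SEP]`/`[EOS]` token strings here are assumptions for illustration, not necessarily the repo's actual special tokens):

```python
# Hypothetical special-token strings; the repo defines its own SEP/EOS tokens.
SEP, EOS = '[SEP]', '[EOS]'

def linearize_table(slots):
    """Flatten a list of (slot_key, slot_value) pairs into one string,
    mirroring the concatenation in prototype_selector/dataclass.py line 74."""
    segments = [f"{key} {SEP} {value} {EOS}" for key, value in slots]
    return ' '.join(segments)
```

For example, `linearize_table([('name', 'walter extra'), ('occupation', 'pilot')])` yields `'name [SEP] walter extra [EOS] occupation [SEP] pilot [EOS]'`, so every slot carries its own SEP/EOS pair rather than the usual single-EOS sentence-pair input.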

The number of prototypes (n)

In Figure 2, when n is small (i.e., n ≤ 3), the model performances are relatively similar.

Where is the prototypes result file?

Where is train_top_20_rerank_context.txt used?
