cyclenlg's Introduction

Faithful Low-resource Data-to-Text Generation through Cycle Training

Howdy!

(July 11th, 2023 Update)

I'm happy to share that our code is publicly available at: https://github.com/amzn/faithful-data2text-cycle-training

Our code includes a few extra functionalities beyond what we reported in the paper. Feel free to use it for further research : )

As we do not own the data, we advise you to check the official data releases of WebNLG and DART.

In the meantime, I will keep this repository open. If you have any questions about data preprocessing, running the model, data annotation, use of the slides, etc., you are very welcome to open an issue here or email me directly at [wang(at)tamu.edu]. I'm also open to potential collaboration opportunities.

Thanks again for your interest in our work, and we hope it helps you!

FAQ

  1. Preprocessing of WSQL and WTQ amzn/faithful-data2text-cycle-training#2 (comment)

  2. Preprocessing of camel-cased/snake-cased strings and accented characters #1 (comment)

  3. Sample inputs #1 (comment)

  4. Hyperparameters and backbone model #1 (comment)
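
For FAQ item 2, the linked comment holds the authors' actual answer. Purely as an illustration of what this kind of preprocessing can look like, here is a minimal sketch that splits camel-cased and snake-cased strings into words and strips accents. The function names (`split_identifier`, `strip_accents`, `normalize_field`) and the choice to keep the original letter case are assumptions made for this example, not taken from the released code.

```python
import re
import unicodedata


def split_identifier(text: str) -> str:
    """Split snake_case and camelCase strings into space-separated words.

    Letter case is left unchanged (an assumption of this sketch).
    """
    text = text.replace("_", " ")
    # Insert a space at each lowercase-or-digit to uppercase boundary.
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Collapse any repeated whitespace.
    return " ".join(text.split())


def strip_accents(text: str) -> str:
    """Replace accented characters with their unaccented base characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_field(text: str) -> str:
    """Hypothetical helper: strip accents, then split the identifier."""
    return split_identifier(strip_accents(text))
```

For example, `normalize_field("birthPlace")` and `normalize_field("birth_Place")` both yield `birth Place`. Whether to lowercase the result or keep accents depends on the tokenizer and backbone model in use, so treat these choices as adjustable.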

Our code release is under Amazon's internal approval process. The code will be released at https://github.com/amzn, and we will update the repository link once approved. In the meantime, you are very welcome to contact us if you need any implementation assistance to replicate the work prior to the official code release.

Cite us

Please use the following bibtex when referencing this work:

@inproceedings{wang-etal-2023-faithful,
    title = "Faithful Low-Resource Data-to-Text Generation through Cycle Training",
    author = "Wang, Zhuoer  and
      Collins, Marcus  and
      Vedula, Nikhita  and
      Filice, Simone  and
      Malmasi, Shervin  and
      Rokhlenko, Oleg",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.160",
    pages = "2847--2867",
    abstract = "Methods to generate text from structured data have advanced significantly in recent years, primarily due to fine-tuning of pre-trained language models on large datasets. However, such models can fail to produce output faithful to the input data, particularly on out-of-domain data. Sufficient annotated data is often not available for specific domains, leading us to seek an unsupervised approach to improve the faithfulness of output text. Since the problem is fundamentally one of consistency between the representations of the structured data and text, we evaluate the effectiveness of cycle training in this work. Cycle training uses two models which are inverses of each other: one that generates text from structured data, and one which generates the structured data from natural language text. We show that cycle training, when initialized with a small amount of supervised data (100 samples in our case), achieves nearly the same performance as fully supervised approaches for the data-to-text generation task on the WebNLG, E2E, WTQ, and WSQL datasets. We perform extensive empirical analysis with automated evaluation metrics and a newly designed human evaluation schema to reveal different cycle training strategies{'} effectiveness of reducing various types of generation errors. Our code is publicly available at https://github.com/Edillower/CycleNLG.",
}
