CSS

Dataset and code for the ACL 2023 paper "A New Dataset and Empirical Study for Sentence Simplification in Chinese" (https://aclanthology.org/2023.acl-long.462).

Introduction

CSS is the first dataset for assessing sentence simplification in Chinese.
It consists of 766 human simplifications of 383 original sentences from the PFR corpus (two simplifications per original sentence).
See our paper for more details.

Files

  • test.json: the CSS dataset.
  • additional_dataset_for_few-shot_setting.json: an additional set with only one reference per original sentence, for validation or few-shot settings.
  • LLMs_result: LLM inference code and simplification results.
  • NMT_to_SS: code for building pseudo-Chinese-SS data, as described in Section 4.1 of the paper.
  • baseline_code: training and prediction code.
  • dataset_statistics: feature analysis code.
  • count.out & stopwords.txt: supporting materials for dataset_statistics/feature_extraction.py.
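
The field layout of test.json is not documented in this README, so a reasonable first step is to load the file and look at its top-level structure before writing any evaluation code. The sketch below is one way to do that. The helper names `load_json` and `describe` are hypothetical and not part of this repository, and the code assumes the file is a single UTF-8 JSON document; if it turns out to be line-delimited JSON, each line would need to be parsed separately.

```python
import json
from pathlib import Path


def load_json(path):
    """Load a JSON file as UTF-8 (the sentences are Chinese text)."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def describe(data, max_keys=10):
    """Summarize the top-level structure without assuming any schema."""
    if isinstance(data, list):
        summary = {"type": "list", "length": len(data)}
        # If the list holds records, report the field names of the first one.
        if data and isinstance(data[0], dict):
            summary["first_item_keys"] = sorted(data[0])[:max_keys]
        return summary
    if isinstance(data, dict):
        return {"type": "dict", "length": len(data), "keys": sorted(data)[:max_keys]}
    return {"type": type(data).__name__}


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    path = Path("test.json")
    if path.exists():
        print(describe(load_json(path)))
    else:
        print(f"{path} not found; run this from the repository root.")
```

Once the actual field names are known, the same `load_json` helper can be reused for additional_dataset_for_few-shot_setting.json.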

Cite our Work

@inproceedings{yang-etal-2023-new,
    title = "A New Dataset and Empirical Study for Sentence Simplification in {C}hinese",
    author = "Yang, Shiping  and
      Sun, Renliang  and
      Wan, Xiaojun",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.462",
    pages = "8306--8321",
    abstract = "Sentence Simplification is a valuable technique that can benefit language learners and children a lot. However, current research focuses more on English sentence simplification. The development of Chinese sentence simplification is relatively slow due to the lack of data. To alleviate this limitation, this paper introduces CSS, a new dataset for assessing sentence simplification in Chinese. We collect manual simplifications from human annotators and perform data analysis to show the difference between English and Chinese sentence simplifications. Furthermore, we test several unsupervised and zero/few-shot learning methods on CSS and analyze the automatic evaluation and human evaluation results. In the end, we explore whether Large Language Models can serve as high-quality Chinese sentence simplification systems by evaluating them on CSS.",
}
