llm-data-annotator's Introduction

[ACL2023] Is GPT-3 a Good Data Annotator?

This repository contains the source code for the paper Is GPT-3 a Good Data Annotator?

Bosheng Ding, Chengwei Qin, Linlin Liu, Yew Ken Chia, Boyang Li, Shafiq Joty, Lidong Bing

Accepted at the 61st Annual Meeting of the Association for Computational Linguistics (ACL'23).

Setup

1. Download the code

git clone https://github.com/DAMO-NLP-SG/LLM-Data-Annotator
cd LLM-Data-Annotator
unzip 'Data&Prompts&Codes.zip'

2. Install dependencies

pip install -r requirements.txt

3. Run code

The files in the 'prompt' folder are the prompts used in our work for the different methods, and the files in the 'data' folder are the training data for the different methods.

3.1. FewRel

cd FewRel/code

For FewRel, we wrote a simple relation extraction script, main.py, to run our experiments. Follow the instructions in the FewRel folder to run it.

3.2. SST2

cd SST2

For SST2 we used the codebase from https://github.com/YJiangcm/SST-2-sentiment-analysis to run our experiments.

3.3. ASTE

cd ASTE

For ASTE we used the codebase from https://github.com/chiayewken/Span-ASTE to run our experiments.

3.4. CrossNER

cd CrossNER

For CrossNER we used the codebase from https://github.com/allanj/pytorch_neural_crf to run our experiments.

4. Usage of GPT-3 API

You may refer to the GPT-3 API reference (https://platform.openai.com/docs/introduction) for more details.

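Here is an example of calling the GPT-3 API in Python. It uses the legacy Completion interface of the openai package (versions before 1.0):

import openai

# Assumes the API key is set in the OPENAI_API_KEY environment variable,
# which the openai package reads by default.

def prompt_text(prompt_content):
    # Send the prompt to the completion endpoint and return the generated text.
    result = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt_content,
        max_tokens=2000,
        temperature=1.0,
    )
    result_text = result['choices'][0]['text'].strip()
    return result_text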
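
As a rough illustration of how such a completion function could be used for annotation, the sketch below builds a few-shot sentiment prompt and parses the returned label. The template, the label set, and the helper names (build_prompt, parse_label, annotate) are hypothetical and are not taken from this repository; the prompts actually used in the paper are in the 'prompt' folder. The completion function is passed in as an argument, so a function like prompt_text can be supplied, or a stub when testing offline.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical label set for a binary sentiment task (an assumption, not the paper's prompt).
LABELS = ("positive", "negative")


def build_prompt(examples: List[Tuple[str, str]], sentence: str) -> str:
    """Format labeled demonstrations followed by the sentence to annotate."""
    lines = ["Classify the sentiment of each sentence as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Sentence: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Sentence: {sentence}")
    lines.append("Sentiment:")
    return "\n".join(lines)


def parse_label(completion: str) -> Optional[str]:
    """Return the first word of the completion if it is a known label, else None."""
    words = completion.strip().lower().split()
    first = words[0].strip(".,!:;\"'") if words else ""
    return first if first in LABELS else None


def annotate(
    sentences: List[str],
    examples: List[Tuple[str, str]],
    complete: Callable[[str], str],
) -> List[Tuple[str, Optional[str]]]:
    """Label each sentence by querying `complete` with a few-shot prompt."""
    return [(s, parse_label(complete(build_prompt(examples, s)))) for s in sentences]
```

Sentences whose completion cannot be mapped to a label are returned with None, so they can be filtered out or re-queried rather than silently assigned a default class.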

Citation

If you find that our paper or this project helps your research, please consider citing our paper in your publications.

@inproceedings{ding-etal-2023-gpt,
    title = "Is {GPT}-3 a Good Data Annotator?",
    author = "Ding, Bosheng  and
      Qin, Chengwei  and
      Liu, Linlin  and
      Chia, Yew Ken  and
      Li, Boyang  and
      Joty, Shafiq  and
      Bing, Lidong",
    editor = "Rogers, Anna  and
      Boyd-Graber, Jordan  and
      Okazaki, Naoaki",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.626",
    doi = "10.18653/v1/2023.acl-long.626",
    pages = "11173--11195",
    abstract = "Data annotation is the process of labeling data that could be used to train machine learning models. Having high quality annotation is crucial, as it allows the model to learn the relationship between the input data and the desired output. GPT-3, a large-scale language model developed by OpenAI, has demonstrated impressive zero- and few-shot performance on a wide range of NLP tasks. It is therefore natural to wonder whether it can be used to effectively annotate data for NLP tasks. In this paper, we evaluate the performance of GPT-3 as a data annotator by comparing it with traditional data annotation methods and analyzing its output on a range of tasks. Through this analysis, we aim to provide insight into the potential of GPT-3 as a general-purpose data annotator in NLP.",
}

llm-data-annotator's Issues

Question about PGI

Hi, Bosheng. Nice work on LLMs.
I have a question about PGI.
PGI instructs GPT-3 to annotate the test data directly and does not annotate the training data.
So what does 'num. of samples' mean for PGI in Tables 1, 2, 3, and 4?
