yerevann / warp

Code for ACL'2021 paper WARP 🌀 Word-level Adversarial ReProgramming. Outperforming `GPT-3` on SuperGLUE Few-Shot text classification. https://aclanthology.org/2021.acl-long.381/

Home Page: https://mahnerak.com/WARP

License: MIT License

Languages: Python 88.38%, Jsonnet 11.62%
Topics: adversarial, natural-language-processing, pretrained-models, few-shot-learning

warp's Introduction

🌀 WARP: Word-level Adversarial ReProgramming

This repository contains code for the ACL 2021 paper WARP: Word-level Adversarial ReProgramming.

WARP adds a few trainable embeddings around the input, which causes the masked language model to predict the sentiment of the sentence in the SST-2 task.

Transfer learning from pretrained language models has recently become the dominant approach for solving many NLP tasks. A common approach to transfer learning for multiple tasks that maximizes parameter sharing trains one or more task-specific layers on top of the language model.

In this paper, we present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation. Adversarial reprogramming attempts to learn task-specific word embeddings that, when concatenated to the input text, instruct the language model to solve the specified task.

Using up to 25K trainable parameters per task, this approach outperforms all existing methods that use up to 25M trainable parameters on the public leaderboard of the GLUE benchmark. Our method, initialized with task-specific human-readable prompts, also works in a few-shot setting, outperforming GPT-3 on two SuperGLUE tasks after training on just 32 samples.
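
To make the mechanics concrete, here is a minimal PyTorch sketch of the idea, assuming the Hugging Face transformers API. It is illustrative only, not the repository's implementation: the simple [prompt][input][MASK] layout and the " yes"/" instead" verbalizers are assumptions for clarity (the real templates interleave prompt tokens with the input). A frozen masked language model receives a few trainable prompt embeddings concatenated to the input embeddings, and class scores are read off the MLM logits at the mask position. With roberta-large (1024-dimensional embeddings), about 25 such vectors already account for roughly 25K trainable parameters.

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Minimal sketch of the WARP idea -- illustrative only, not the repository's code.
class WarpLikeClassifier(nn.Module):
    def __init__(self, model_name="roberta-large", n_prompts=20,
                 verbalizers=(" yes", " instead")):
        super().__init__()
        self.tok = AutoTokenizer.from_pretrained(model_name)
        self.mlm = AutoModelForMaskedLM.from_pretrained(model_name)
        for p in self.mlm.parameters():
            p.requires_grad = False                      # the language model stays frozen
        d = self.mlm.config.hidden_size                  # 1024 for roberta-large
        # The only trainable parameters: n_prompts embedding vectors.
        self.prompt = nn.Parameter(torch.randn(n_prompts, d) * 0.02)
        # One vocabulary id per class ("verbalizer"), e.g. " yes" vs " instead".
        self.class_ids = [self.tok.encode(v, add_special_tokens=False)[0]
                          for v in verbalizers]

    def forward(self, texts):
        enc = self.tok(texts, return_tensors="pt", padding=True)
        embed = self.mlm.get_input_embeddings()
        word_emb = embed(enc["input_ids"])               # (batch, seq, dim)
        b = word_emb.size(0)
        prompt = self.prompt.unsqueeze(0).expand(b, -1, -1)
        mask_emb = embed(torch.full((b, 1), self.tok.mask_token_id))
        inputs = torch.cat([prompt, word_emb, mask_emb], dim=1)
        attn = torch.cat([torch.ones(b, self.prompt.size(0)),
                          enc["attention_mask"].float(),
                          torch.ones(b, 1)], dim=1)
        logits = self.mlm(inputs_embeds=inputs, attention_mask=attn).logits
        mask_logits = logits[:, -1, :]                   # MLM logits at the mask position
        return mask_logits[:, self.class_ids]            # one score per class

Training would then optimize only self.prompt (and, in the full method, the class/verbalizer vectors) with a cross-entropy loss over these scores; everything else stays frozen.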

Few-Shot Results

| Set  | Model              | CB F1 | CB Acc. | RTE Acc. |
|------|--------------------|-------|---------|----------|
| dev  | GPT-3 Small        | 26.1  | 42.9    | 52.3     |
| dev  | GPT-3 Med          | 40.4  | 58.9    | 48.4     |
| dev  | GPT-3              | 57.2  | 82.1    | 72.9     |
| dev  | PET (ALBERT)       | 59.4  | 85.1    | 69.8     |
| dev  | iPET (ALBERT)      | 92.4  | 92.9    | 74.0     |
| dev  | WARP_init (ALBERT) | 84.0  | 87.5    | 71.8     |
| test | GPT-3              | 52.0  | 75.6    | 69.0     |
| test | PET (ALBERT)       | 60.2  | 87.2    | 67.2     |
| test | iPET (ALBERT)      | 79.9  | 88.8    | 70.8     |
| test | WARP_init (ALBERT) | 70.2  | 82.4    | 69.1     |
Results on the SuperGLUE benchmark. The test-set results are obtained from the SuperGLUE evaluation server. We only show systems trained in a comparable few-shot setup using 32 examples.

Setup

The code requires YerevaNN's internal version of allennlp:

git clone https://github.com/YerevaNN/allennlp
cd allennlp
git checkout warp
pip install .

Training

Linear Probing

for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
    export HPARAMS='{
        "dataset": "'$DATASET'",
        "lr": 0.0001,
        "num_epochs": 20,
        "prompts": [],
        "reorder_optimized": false,
        "max_batch_size": 8,
        "max_tokens_sq": 262144, "on_logits":  false, "pooling_index":  null, "seed":  1}'
    python -m allennlp train \
    -s .aim/baseline-linear-${DATASET} configs/warp.jsonnet
done
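
For context, the linear-probing baseline presumably corresponds to the standard setup of freezing the pretrained language model and training only a linear classification head on top of it (the "task-specific layers" mentioned in the introduction). A minimal sketch, with the encoder name and first-token pooling assumed purely for illustration:

import torch.nn as nn
from transformers import AutoModel

# Sketch of a linear-probing baseline -- assumed behaviour, not the repository's exact code.
class LinearProbe(nn.Module):
    def __init__(self, model_name="roberta-large", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        for p in self.encoder.parameters():
            p.requires_grad = False                      # encoder stays frozen
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]                # first-token representation
        return self.head(cls)                            # only self.head receives gradients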

WARP_0

for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
    export HPARAMS='{
        "dataset": "'$DATASET'",
        "lr": 0.0001,
        "num_epochs": 20,
        "prompts": [null, "<mask>"],
        "reorder_optimized": true,
        "max_batch_size": 8,
        "max_tokens_sq": 262144,
        "on_logits": "pre_decoder_layer_norm",
        "pooling_index": 1,
        "seed": 1
    }'
    python -m allennlp train \
    -s .aim/baseline-warp_0-${DATASET} configs/warp.jsonnet
done

Training WARP

export DATASET="rte"
export HPARAMS='{
    "benchmark":"super_glue",
    "classifier_init":null,
    "dataset":"'$DATASET'",
    "ensure_whitespace_between":false,
    "lr":0.001,
    "max_batch_size":8,
    "max_tokens_sq":262144,
    "num_epochs":30,
    "prompt_better_init":"<mask>",
    "prompts":[-10,-11,-12,-13,-14,null,-15,-16,-17,-18,-19,"<mask>",-20,-21,-22,-23,-24,null,-25,-26,-27,-28,-29],
    "seed":1,
    "transformer_model":"roberta-large"
}'
python -m allennlp train \
-s .aim/t-${DATASET} configs/warp.jsonnet
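
The "prompts" list above is the template. My reading, based on the configs and the issue discussion further down, is that negative integers denote trainable prompt embeddings, null marks where an input segment (e.g. the premise or the hypothesis for RTE) is placed, and "<mask>" is the position whose MLM prediction is classified; the few-shot configs below additionally pair prompt slots with literal characters such as quotes. The following helper is hypothetical (it is not part of the codebase) and only illustrates that reading:

def render_prompt(spec, segments):
    # negative int -> trainable prompt slot, None -> next input segment,
    # string -> literal token such as "<mask>".
    seg = iter(segments)
    rendered = []
    for item in spec:
        if item is None:
            rendered.append(next(seg))        # place the next input text
        elif isinstance(item, int):
            rendered.append(f"[P{item}]")     # placeholder for a trainable embedding
        else:
            rendered.append(item)             # literal token
    return rendered

print(render_prompt([-10, -11, None, -15, "<mask>", -20, None, -25],
                    ["premise text", "hypothesis text"]))
# ['[P-10]', '[P-11]', 'premise text', '[P-15]', '<mask>', '[P-20]', 'hypothesis text', '[P-25]']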

WARP_init

Few-Shot Experiments

export HPARAMS='{
    "benchmark":"super_glue",
    "classifier_init": {
        "entailment": " yes",
        "not_entailment": " instead"
    },
    "dataset":"few_rte",
    "eval_mode":false,
    "lr":0.001,
    "max_batch_size":2,
    "max_tokens_sq":131072,
    "num_epochs":100,
    "num_gradient_accumulation_steps":2,
    "prompt_better_init": "[PAD]",
    "prompts":[-10,-11,[-14,"\""],null,[-15,"\""],  [-16, "?"], "<mask>", [-20, ","], null, [-29, "!"],-30,-31],
    "seed":3,
    "str_cut_frac":0,
    "transformer_model":"albert-xxlarge-v2",
    "validation_metric": null
}'
python -m allennlp train \
-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet

export HPARAMS='{
   "benchmark":"super_glue",
   "classifier_init":{
      "entailment":" yes",
      "not_entailment":" instead"
   },
   "dataset":"few_rte",
   "grad_norm":1,
   "lr":0.001,
   "max_batch_size":2,
   "max_tokens_sq":131072,
   "num_epochs":30,
   "num_gradient_accumulation_steps":2,
   "prompt_better_init":"[PAD]",
   "prompts":[-10,-11,[-14,"\""],null,[-15,"\""],[-16,"?"],"<mask>",[-20,","],null,[-29,"!"],-30,-31],
   "seed":1,
   "str_cut_frac":0.06,
   "transformer_model":"albert-xxlarge-v2",
   "validation_metric":"+training_val_metric"
}'
python -m allennlp train \
-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet

Evaluation

python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor glue --output-file v0.1/AX.tsv /data/arp/.aim/H-93ae5ae9 ax/test
python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor glue --output-file v0.1/MNLI-m.tsv /data/arp/.aim/H-93ae5ae9 test_matched

Citation

If you want to refer to our work, use this BibTeX entry:

@inproceedings{hambardzumyan-etal-2021-warp,
    title = "{WARP}: {W}ord-level {A}dversarial {R}e{P}rogramming",
    author = "Hambardzumyan, Karen  and
      Khachatrian, Hrant  and
      May, Jonathan",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.381",
    doi = "10.18653/v1/2021.acl-long.381",
    pages = "4921--4933"
}

warp's People

Contributors

bittlingmayer, mahnerak, tamohannes

warp's Issues

How to fine-tune the model

Thank you for your contributions. I wanted to turn off adversarial reprogramming and fine-tune the model, but I couldn't find where to do that. Dear author, can you help me?

Position Id

Hi, thanks for your nice work.
While reading the source code, I ran into a simple question about the position ids used in the code, shown below:

parameters['position_ids'][0]

tensor([ 2, 47,  3,  4,  5,  6,  7, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 89,
        90, 91, 92, 93,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
        22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
        40, 41, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
        69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
        87, 88, 94,  1,  1,  1], device='cuda:0')

I find that the position ids are not ordered. What is the benefit of such a position id layout?

couldn't read from warp.jsonnet

File "/home/hanlin/warp/allennlp/allennlp/common/params.py", line 488, in from_file
file_dict = json.loads(evaluate_file(params_file, ext_vars=ext_vars))
File "/home/hanlin/warp/allennlp/allennlp/commands/train.py", line 169, in train_model_from_file
params = Params.from_file(parameter_filename, overrides)
File "/home/hanlin/warp/allennlp/allennlp/commands/train.py", line 119, in train_model_from_args
file_friendly_logging=args.file_friendly_logging,
File "/home/hanlin/warp/allennlp/allennlp/commands/init.py", line 119, in main
args.func(args)
File "/home/hanlin/warp/allennlp/allennlp/main.py", line 34, in run
main(prog="allennlp")
File "/home/hanlin/warp/allennlp/allennlp/main.py", line 38, in
run()

No module named 'aim.sdk.session'

Hi, thanks for your nice work.
However, when I ran the following code you provided:
for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
    export HPARAMS='{
        "dataset": "'$DATASET'",
        "lr": 0.0001,
        "num_epochs": 20,
        "prompts": [],
        "reorder_optimized": false,
        "max_batch_size": 8,
        "max_tokens_sq": 262144,
        "on_logits": false,
        "pooling_index": null,
        "seed": 1
    }'
    python3 -m allennlp train \
    -s .aim/baseline-linear-${DATASET} configs/warp.jsonnet
done
I encountered the following problem:
ModuleNotFoundError: No module named 'aim.sdk.session'.
I checked the version of the "aim" package (3.0.2) and the functions in aim.sdk, and I did not find a "session" module there.
Did I get the wrong aim version?

WARP prompt/mask token position

Hi,

Thank you for your work and for releasing the code! I just have a few questions regarding the ordering of the tokens.

From the script that you shared for training WARP, the prompt should be in the format of [-10,-11,-12,-13,-14,null,-15,-16,-17,-18,-19,[MASK],-20,-21,-22,-23,-24,null,-25,-26,-27,-28,-29]. If I understood correctly, the hypothesis and the premise should be placed in the null position and the other tokens should be considered as the prompt tokens. However, as I looked through the code I found that the input is given as [[MASK], -10,-11,-12,-13,-14,-15,-16,-17,-18,-19,-20,-21,-22,-23,-24,-25,-26,-27,-28,-29, null, null] with the mask token at the first position and the hypothesis and the premise at the end of the prompt. Did I understand the code correctly? Could you clarify the positions of the prompt tokens?

Thanks again! 😄

bug

Hi, thanks for your nice work.
However, I ran into a problem when running the code following the README.

allennlp.common.checks.ConfigurationError: huggingface not in acceptable choices for dataset_reader.type: ['conll2003', 'interleaving', 'sequence_tagging', 'sharded', 'babi', 'text_classification_json']. You should either use the --include-package flag to make sure the correct module is loaded, or use a fully qualified class name in your config file like {"model": "my_module.models.MyModel"} to have it imported automatically.

I have not used allennlp before; could you help me?

datasets and debug

Thank you very much for your work. I would like to ask you some questions about the code. First, how can I replace the datasets with my own? Second, I want to debug starting from the .sh script, but I can't find the program entry point.

The detail about manual prompts

Hi,

Thank you for your work and for releasing the code!
After reading your paper, I am confused about the manual prompts.

"""
In addition to the regular models where we initialize with [MASK] tokens,
we performed a run on the GLUE datasets with the same prompt
[CLS] "S1"? [MASK]. "S2"! [SEP] for all the tasks
"""
For the manual prompts, I want to know where the prompts are inserted. Wouldn't the original WARP also have the [CLS], [SEP] and [MASK] special tokens? What is the difference between WARP_init and WARP_8 in terms of where the prompts are inserted?

I don't know much about this field; thank you very much for answering my questions.

no output during evaluation

Hi, thanks for your work on the paper and code.
I can train the model smoothly. However, when I run

python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor glue --output-file v0.1/MNLI-m.tsv /data/arp/.aim/H-93ae5ae9 test_matched

I'm confused about the parameters on the last line, as well as the data organization, because the datasets were downloaded automatically to /home/dqx/.cache/huggingface/datasets/glue/mnli/train.arrow, but I'm not clear about the data in TSV format. Is it downloaded manually and just stored under WARP/? And how can I evaluate the model on my own test data?
If I upload rte.dev as the test data to WARP/ and run the following command

python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor super_glue --output-file test_output.tsv .aim/t-rte  dev.tsv

the terminal output will be a list like "[...fixed_cross_validation_79', 'few_wsc.fixed_cross_validation_80', 'few_wsc.fixed_cross_validation_81', 'few_wsc.fixed_cross_validation_82', 'few_wsc.fixed_cross_validation_83', 'few_wsc.fixed_cross_validation_84', 'few_wsc.fixed_cross_validation_85', 'few_wsc.fixed_cross_validation_86', 'few_wsc.fixed_cross_validation_87', 'few_wsc.fixed_cross_validation_88', 'few_wsc.fixed_cross_validation_89', 'few_wsc.fixed_cross_validation_90', 'few_wsc.fixed_cross_validation_91', 'few_wsc.fixed_cross_validation_92', 'few_wsc.fixed_cross_validation_93', 'few_wsc.fixed_cross_validation_94', 'few_wsc.fixed_cross_validation_95', 'few_wsc.fixed_cross_validation_96', 'few_wsc.fixed_cross_validation_97', 'few_wsc.fixed_cross_validation_98', 'few_wsc.fixed_cross_validation_99', 'few_wsc.fixed_cross_validation_100']" but there's nothing in test_output.tsv.

I'm really confused about this and looking forward to your response. Thanks 😊✨
