yerevann / warp

Code for ACL'2021 paper WARP 🌀 Word-level Adversarial ReProgramming. Outperforming `GPT-3` on SuperGLUE Few-Shot text classification. https://aclanthology.org/2021.acl-long.381/

Home Page: https://mahnerak.com/WARP

License: MIT License

Languages: Python 88.38%, Jsonnet 11.62%
Topics: adversarial, natural-language-processing, pretrained-models, few-shot-learning

warp's Introduction

🌀 WARP: Word-level Adversarial ReProgramming

This repository contains code for the ACL 2021 paper WARP: Word-level Adversarial ReProgramming.

WARP adds a few trainable embeddings around the input, which causes the masked language model to predict the sentiment of the sentence in the SST-2 task.

Transfer learning from pretrained language models has recently become the dominant approach for solving many NLP tasks. A common approach to transfer learning for multiple tasks that maximizes parameter sharing trains one or more task-specific layers on top of the language model.

In this paper, we present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation. Adversarial reprogramming attempts to learn task-specific word embeddings that, when concatenated to the input text, instruct the language model to solve the specified task.

Using up to 25K trainable parameters per task, this approach outperforms all existing methods that use up to 25M trainable parameters on the public leaderboard of the GLUE benchmark. Our method, initialized with task-specific human-readable prompts, also works in a few-shot setting, outperforming GPT-3 on two SuperGLUE tasks after training on just 32 samples.
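
To make the mechanics concrete, here is a minimal PyTorch sketch of the idea, assuming the Hugging Face transformers API. It is illustrative only, not the repository's implementation: the simple [prompt][input][MASK] layout and the " yes"/" instead" verbalizers are assumptions for clarity (the real templates interleave prompt tokens with the input). A frozen masked language model receives a few trainable prompt embeddings concatenated to the input embeddings, and class scores are read off the MLM logits at the mask position. With roberta-large (1024-dimensional embeddings), about 25 such vectors already account for roughly 25K trainable parameters.

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Minimal sketch of the WARP idea -- illustrative only, not the repository's code.
class WarpLikeClassifier(nn.Module):
    def __init__(self, model_name="roberta-large", n_prompts=20,
                 verbalizers=(" yes", " instead")):
        super().__init__()
        self.tok = AutoTokenizer.from_pretrained(model_name)
        self.mlm = AutoModelForMaskedLM.from_pretrained(model_name)
        for p in self.mlm.parameters():
            p.requires_grad = False                      # the language model stays frozen
        d = self.mlm.config.hidden_size                  # 1024 for roberta-large
        # The only trainable parameters: n_prompts embedding vectors.
        self.prompt = nn.Parameter(torch.randn(n_prompts, d) * 0.02)
        # One vocabulary id per class ("verbalizer"), e.g. " yes" vs " instead".
        self.class_ids = [self.tok.encode(v, add_special_tokens=False)[0]
                          for v in verbalizers]

    def forward(self, texts):
        enc = self.tok(texts, return_tensors="pt", padding=True)
        embed = self.mlm.get_input_embeddings()
        word_emb = embed(enc["input_ids"])               # (batch, seq, dim)
        b = word_emb.size(0)
        prompt = self.prompt.unsqueeze(0).expand(b, -1, -1)
        mask_emb = embed(torch.full((b, 1), self.tok.mask_token_id))
        inputs = torch.cat([prompt, word_emb, mask_emb], dim=1)
        attn = torch.cat([torch.ones(b, self.prompt.size(0)),
                          enc["attention_mask"].float(),
                          torch.ones(b, 1)], dim=1)
        logits = self.mlm(inputs_embeds=inputs, attention_mask=attn).logits
        mask_logits = logits[:, -1, :]                   # MLM logits at the mask position
        return mask_logits[:, self.class_ids]            # one score per class

Training would then optimize only self.prompt (and, in the full method, the class/verbalizer vectors) with a cross-entropy loss over these scores; everything else stays frozen.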

Few-Shot Results

| Set  | Model              | CB F1 | CB Acc. | RTE Acc. |
|------|--------------------|-------|---------|----------|
| dev  | GPT-3 Small        | 26.1  | 42.9    | 52.3     |
| dev  | GPT-3 Med          | 40.4  | 58.9    | 48.4     |
| dev  | GPT-3              | 57.2  | 82.1    | 72.9     |
| dev  | PET (ALBERT)       | 59.4  | 85.1    | 69.8     |
| dev  | iPET (ALBERT)      | 92.4  | 92.9    | 74.0     |
| dev  | WARP_init (ALBERT) | 84.0  | 87.5    | 71.8     |
| test | GPT-3              | 52.0  | 75.6    | 69.0     |
| test | PET (ALBERT)       | 60.2  | 87.2    | 67.2     |
| test | iPET (ALBERT)      | 79.9  | 88.8    | 70.8     |
| test | WARP_init (ALBERT) | 70.2  | 82.4    | 69.1     |
Results on the SuperGLUE benchmark. The test-set results are obtained from the SuperGLUE evaluation server. We only show systems trained in a comparable few-shot setup using 32 examples.

Setup

The code requires YerevaNN's internal version of allennlp:

git clone https://github.com/YerevaNN/allennlp
cd allennlp
git checkout warp
pip install .

Training

Linear Probing

for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
    export HPARAMS='{
        "dataset": "'$DATASET'",
        "lr": 0.0001,
        "num_epochs": 20,
        "prompts": [],
        "reorder_optimized": false,
        "max_batch_size": 8,
        "max_tokens_sq": 262144, "on_logits":  false, "pooling_index":  null, "seed":  1}'
    python -m allennlp train \
    -s .aim/baseline-linear-${DATASET} configs/warp.jsonnet
done
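
For context, the linear-probing baseline presumably corresponds to the standard setup of freezing the pretrained language model and training only a linear classification head on top of it (the "task-specific layers" mentioned in the introduction). A minimal sketch, with the encoder name and first-token pooling assumed purely for illustration:

import torch.nn as nn
from transformers import AutoModel

# Sketch of a linear-probing baseline -- assumed behaviour, not the repository's exact code.
class LinearProbe(nn.Module):
    def __init__(self, model_name="roberta-large", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        for p in self.encoder.parameters():
            p.requires_grad = False                      # encoder stays frozen
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]                # first-token representation
        return self.head(cls)                            # only self.head receives gradients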

WARP_0

for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
    export HPARAMS='{
        "dataset": "'$DATASET'",
        "lr": 0.0001,
        "num_epochs": 20,
        "prompts": [null, "<mask>"],
        "reorder_optimized": true,
        "max_batch_size": 8,
        "max_tokens_sq": 262144,
        "on_logits": "pre_decoder_layer_norm",
        "pooling_index": 1,
        "seed": 1
    }'
    python -m allennlp train \
    -s .aim/baseline-warp_0-${DATASET} configs/warp.jsonnet
done

Training WARP

export DATASET="rte"
export HPARAMS='{
    "benchmark":"super_glue",
    "classifier_init":null,
    "dataset":"'$DATASET'",
    "ensure_whitespace_between":false,
    "lr":0.001,
    "max_batch_size":8,
    "max_tokens_sq":262144,
    "num_epochs":30,
    "prompt_better_init":"<mask>",
    "prompts":[-10,-11,-12,-13,-14,null,-15,-16,-17,-18,-19,"<mask>",-20,-21,-22,-23,-24,null,-25,-26,-27,-28,-29],
    "seed":1,
    "transformer_model":"roberta-large"
}'
python -m allennlp train \
-s .aim/t-${DATASET} configs/warp.jsonnet
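
The "prompts" list above is the template. My reading, based on the configs and the issue discussion further down, is that negative integers denote trainable prompt embeddings, null marks where an input segment (e.g. the premise or the hypothesis for RTE) is placed, and "<mask>" is the position whose MLM prediction is classified; the few-shot configs below additionally pair prompt slots with literal characters such as quotes. The following helper is hypothetical (it is not part of the codebase) and only illustrates that reading:

def render_prompt(spec, segments):
    # negative int -> trainable prompt slot, None -> next input segment,
    # string -> literal token such as "<mask>".
    seg = iter(segments)
    rendered = []
    for item in spec:
        if item is None:
            rendered.append(next(seg))        # place the next input text
        elif isinstance(item, int):
            rendered.append(f"[P{item}]")     # placeholder for a trainable embedding
        else:
            rendered.append(item)             # literal token
    return rendered

print(render_prompt([-10, -11, None, -15, "<mask>", -20, None, -25],
                    ["premise text", "hypothesis text"]))
# ['[P-10]', '[P-11]', 'premise text', '[P-15]', '<mask>', '[P-20]', 'hypothesis text', '[P-25]']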

WARP_init

Few-Shot Experiments

export HPARAMS='{
    "benchmark":"super_glue",
    "classifier_init": {
        "entailment": " yes",
        "not_entailment": " instead"
    },
    "dataset":"few_rte",
    "eval_mode":false,
    "lr":0.001,
    "max_batch_size":2,
    "max_tokens_sq":131072,
    "num_epochs":100,
    "num_gradient_accumulation_steps":2,
    "prompt_better_init": "[PAD]",
    "prompts":[-10,-11,[-14,"\""],null,[-15,"\""],  [-16, "?"], "<mask>", [-20, ","], null, [-29, "!"],-30,-31],
    "seed":3,
    "str_cut_frac":0,
    "transformer_model":"albert-xxlarge-v2",
    "validation_metric": null
}'
python -m allennlp train \
-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet

export HPARAMS='{
   "benchmark":"super_glue",
   "classifier_init":{
      "entailment":" yes",
      "not_entailment":" instead"
   },
   "dataset":"few_rte",
   "grad_norm":1,
   "lr":0.001,
   "max_batch_size":2,
   "max_tokens_sq":131072,
   "num_epochs":30,
   "num_gradient_accumulation_steps":2,
   "prompt_better_init":"[PAD]",
   "prompts":[-10,-11,[-14,"\""],null,[-15,"\""],[-16,"?"],"<mask>",[-20,","],null,[-29,"!"],-30,-31],
   "seed":1,
   "str_cut_frac":0.06,
   "transformer_model":"albert-xxlarge-v2",
   "validation_metric":"+training_val_metric"
}'
python -m allennlp train \
-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet

Evaluation

python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor glue --output-file v0.1/AX.tsv /data/arp/.aim/H-93ae5ae9 ax/test
python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor glue --output-file v0.1/MNLI-m.tsv /data/arp/.aim/H-93ae5ae9 test_matched

Citation

If you want to refer to our work, use this BibTeX entry:

@inproceedings{hambardzumyan-etal-2021-warp,
    title = "{WARP}: {W}ord-level {A}dversarial {R}e{P}rogramming",
    author = "Hambardzumyan, Karen  and
      Khachatrian, Hrant  and
      May, Jonathan",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.381",
    doi = "10.18653/v1/2021.acl-long.381",
    pages = "4921--4933"
}

warp's People

Contributors

bittlingmayer, mahnerak, tamohannes

warp's Issues

How to fine-tune the model

Thank you for your contributions. I wanted to turn off adversarial reprogramming and fine-tune the model, but I couldn't find where to do that. Dear author, can you help me?

Position Id

Hi, thanks for your nice work.
While reading the source code, I ran into a simple question about the position ids used in the code, shown below:

parameters['position_ids'][0]

tensor([ 2, 47,  3,  4,  5,  6,  7, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 89,
        90, 91, 92, 93,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
        22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
        40, 41, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
        69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
        87, 88, 94,  1,  1,  1], device='cuda:0')

I find that the position ids are not ordered. What is the benefit of such a position id layout?

couldn't read from warp.jsonnet

File "/home/hanlin/warp/allennlp/allennlp/common/params.py", line 488, in from_file
file_dict = json.loads(evaluate_file(params_file, ext_vars=ext_vars))
File "/home/hanlin/warp/allennlp/allennlp/commands/train.py", line 169, in train_model_from_file
params = Params.from_file(parameter_filename, overrides)
File "/home/hanlin/warp/allennlp/allennlp/commands/train.py", line 119, in train_model_from_args
file_friendly_logging=args.file_friendly_logging,
File "/home/hanlin/warp/allennlp/allennlp/commands/init.py", line 119, in main
args.func(args)
File "/home/hanlin/warp/allennlp/allennlp/main.py", line 34, in run
main(prog="allennlp")
File "/home/hanlin/warp/allennlp/allennlp/main.py", line 38, in
run()

No module named 'aim.sdk.session'

Hi, thanks for your nice work.
However, when I ran the following code you provided:
for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
    export HPARAMS='{
        "dataset": "'$DATASET'",
        "lr": 0.0001,
        "num_epochs": 20,
        "prompts": [],
        "reorder_optimized": false,
        "max_batch_size": 8,
        "max_tokens_sq": 262144,
        "on_logits": false,
        "pooling_index": null,
        "seed": 1
    }'
    python3 -m allennlp train \
    -s .aim/baseline-linear-${DATASET} configs/warp.jsonnet
done
I encountered the following problem:
ModuleNotFoundError: No module named 'aim.sdk.session'.
I checked the version of the "aim" package (3.0.2) and the functions in aim.sdk, and I did not find a "session" module there.
Did I get the wrong aim version?

WARP prompt/mask token position

Hi,

Thank you for your work and for releasing the code! I just have a few questions regarding the ordering of the tokens.

From the script that you shared for training WARP, the prompt should be in the format of [-10,-11,-12,-13,-14,null,-15,-16,-17,-18,-19,[MASK],-20,-21,-22,-23,-24,null,-25,-26,-27,-28,-29]. If I understood correctly, the hypothesis and the premise should be placed in the null position and the other tokens should be considered as the prompt tokens. However, as I looked through the code I found that the input is given as [[MASK], -10,-11,-12,-13,-14,-15,-16,-17,-18,-19,-20,-21,-22,-23,-24,-25,-26,-27,-28,-29, null, null] with the mask token at the first position and the hypothesis and the premise at the end of the prompt. Did I understand the code correctly? Could you clarify the positions of the prompt tokens?

Thanks again! 😄

bug

Hi, thanks for your nice work.
However, I ran into a problem when running the code following the README.

allennlp.common.checks.ConfigurationError: huggingface not in acceptable choices for dataset_reader.type: ['conll2003', 'interleaving', 'sequence_tagging', 'sharded', 'babi', 'text_classification_json']. You should either use the --include-package flag to make sure the correct module is loaded, or use a fully qualified class name in your config file like {"model": "my_module.models.MyModel"} to have it imported automatically.

I have not used allennlp before; could you help me?

datasets and debug

Thank you very much for your work. I would like to ask you some questions about the code. First, how can I replace the datasets with my own? Second, I want to debug starting from the .sh script, but I can't find the program entry point.

The detail about manual prompts

Hi,

Thank you for your work and for releasing the code!
After reading your paper, I am confused about the manual prompts.

"""
In addition to the regular models where we initialize with [MASK] tokens,
we performed a run on the GLUE datasets with the same prompt
[CLS] "S1"? [MASK]. "S2"! [SEP] for all the tasks
"""
For the manual prompts, I want to know where the prompts are inserted. Wouldn't the original WARP also have the [CLS], [SEP] and [MASK] special tokens? What is the difference between WARP_init and WARP_8 in terms of where the prompts are inserted?

I don't know much about this field; thank you very much for answering my questions.

no output during evaluation

Hi, thanks for your work on the paper and code.
I can train the model smoothly. However, when I run

python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor glue --output-file v0.1/MNLI-m.tsv /data/arp/.aim/H-93ae5ae9 test_matched

I'm confused about the parameters on the last line, as well as the data organization, because the datasets were downloaded automatically to /home/dqx/.cache/huggingface/datasets/glue/mnli/train.arrow, but I'm not clear about the data in TSV format. Is it downloaded manually and just stored under WARP/? And how can I evaluate the model on my own test data?
If I upload rte.dev as the test data to WARP/ and run the following command

python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor super_glue --output-file test_output.tsv .aim/t-rte  dev.tsv

the terminal output will be a list like "[...fixed_cross_validation_79', 'few_wsc.fixed_cross_validation_80', 'few_wsc.fixed_cross_validation_81', 'few_wsc.fixed_cross_validation_82', 'few_wsc.fixed_cross_validation_83', 'few_wsc.fixed_cross_validation_84', 'few_wsc.fixed_cross_validation_85', 'few_wsc.fixed_cross_validation_86', 'few_wsc.fixed_cross_validation_87', 'few_wsc.fixed_cross_validation_88', 'few_wsc.fixed_cross_validation_89', 'few_wsc.fixed_cross_validation_90', 'few_wsc.fixed_cross_validation_91', 'few_wsc.fixed_cross_validation_92', 'few_wsc.fixed_cross_validation_93', 'few_wsc.fixed_cross_validation_94', 'few_wsc.fixed_cross_validation_95', 'few_wsc.fixed_cross_validation_96', 'few_wsc.fixed_cross_validation_97', 'few_wsc.fixed_cross_validation_98', 'few_wsc.fixed_cross_validation_99', 'few_wsc.fixed_cross_validation_100']" but there's nothing in test_output.tsv.

I'm really confused about this and looking forward to your response. Thanks 😊✨
