
ask2transformers's Introduction

Ask2Transformers

A Framework for Textual Entailment based Zero Shot text classification


This repository contains out-of-the-box, ready-to-use zero-shot classifiers for different tasks, such as Topic Labelling or Relation Extraction. It is built on top of the 🤗 HuggingFace Transformers library, so you are free to choose among hundreds of models. You can either use a dataset-specific classifier or define one yourself with just label descriptions or templates! The repository contains the code for the publications listed in the Citation section below.

To get started with the repository, consider reading the new documentation!

Demo 🕹️

We have released a demo on Zero-Shot Information Extraction using Textual Entailment (ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations), accepted in the Demo Track of NAACL 2022. The code is publicly available on its own GitHub repository: ZS4IE.

Installation

By using pip (check the latest release)

pip install a2t

By cloning the repository

git clone https://github.com/osainz59/Ask2Transformers.git
cd Ask2Transformers
pip install .

Or directly from GitHub with

pip install git+https://github.com/osainz59/Ask2Transformers
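
Whichever option you choose, you can check that the package is installed with a standard pip query (a generic check, nothing specific to this repository):

pip show a2t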

Models

Available models

By default, the roberta-large-mnli checkpoint is used to perform the inference. You can try different models to perform the zero-shot classification, but they need to be fine-tuned on an NLI task and be compatible with the AutoModelForSequenceClassification class from Transformers. For example:

  • roberta-large-mnli
  • joeddav/xlm-roberta-large-xnli
  • facebook/bart-large-mnli
  • microsoft/deberta-v2-xlarge-mnli

Coming soon: support for generative models such as t5-large.
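
As a quick illustration of how such checkpoints perform entailment-based zero-shot classification, here is a minimal sketch using the generic 🤗 Transformers zero-shot pipeline (this is not the a2t API; the example sentence and labels are illustrative):

from transformers import pipeline

# Any of the NLI-finetuned checkpoints listed above can be plugged in here.
classifier = pipeline("zero-shot-classification", model="roberta-large-mnli")

result = classifier(
    "The team scored twice in the last ten minutes to win the match.",
    candidate_labels=["Sport and recreation", "Computing", "Biology"],
    hypothesis_template="The domain of the sentence is about {}.",
)
print(result["labels"][0])  # label with the highest entailment score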

Pre-trained models 🆕

We now provide (task-specific) pre-trained entailment models to: (1) reproduce the results of the papers and (2) reuse them for new schemas of the same tasks. The models are publicly available on the 🤗 HuggingFace Models Hub.

The model name describes the configuration used for training as follows:

HiTZ/A2T_[pretrained_model]_[NLI_datasets]_[finetune_datasets]

  • pretrained_model: The checkpoint used for initialization. For example: RoBERTa-large.
  • NLI_datasets: The NLI datasets used for pivot training.
    • S: Stanford Natural Language Inference (SNLI) dataset.
    • M: Multi Natural Language Inference (MNLI) dataset.
    • F: Fever-nli dataset.
    • A: Adversarial Natural Language Inference (ANLI) dataset.
  • finetune_datasets: The datasets used for fine-tuning the entailment model. Note that for more than one dataset the training was performed sequentially. For example: ACE-arg.
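
Putting the naming scheme together, HiTZ/A2T_RoBERTa_SMFA_ACE-arg denotes a RoBERTa-large checkpoint pivot-trained on SNLI, MNLI, Fever-nli and ANLI, and then fine-tuned on ACE-arg.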

Some models, like HiTZ/A2T_RoBERTa_SMFA_ACE-arg, have been trained with some information, such as the event trigger span, marked between square brackets ('[[' and ']]'). Make sure you follow the same preprocessing in order to obtain the best results.
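
As a minimal sketch of that preprocessing (the sentence, the trigger position and the exact spacing around the markers are assumptions; check each model card for the precise format it expects):

def mark_span(tokens, start, end):
    # Wrap the tokens in [start, end) with the '[[' and ']]' markers.
    return " ".join(tokens[:start] + ["[["] + tokens[start:end] + ["]]"] + tokens[end:])

tokens = "The company fired its CEO yesterday".split()
print(mark_span(tokens, 2, 3))
# -> The company [[ fired ]] its CEO yesterday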

Training your own models

There is no special script for fine-tuning your own entailment-based models. In our experiments, we have used the publicly available run_glue.py Python script (from HuggingFace Transformers). To train your own model, you will first need to convert your dataset into some sort of NLI data; we recommend having a look at the tacred2mnli.py script, which serves as an example.
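
The following sketch shows the idea behind that conversion (it is not the actual tacred2mnli.py script; the per:date_of_birth template appears in tacred.relation.config.json, while the per:employee_of template and the example sentence are made up for illustration):

# Verbalization templates, keyed by relation label.
TEMPLATES = {
    "per:date_of_birth": "{subj}'s birthday is on {obj}.",
    "per:employee_of": "{subj} works for {obj}.",
}

def to_nli_example(sentence, subj, obj, relation):
    # The original sentence becomes the premise and the verbalized relation the
    # hypothesis; negative pairs can be built from the templates of other labels.
    return {
        "premise": sentence,
        "hypothesis": TEMPLATES[relation].format(subj=subj, obj=obj),
        "label": "entailment",
    }

print(to_nli_example("John was born on May 4th, 1989.", "John", "May 4th, 1989", "per:date_of_birth"))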

Tutorials (Notebooks)

Coming soon!

Results and evaluation

To obtain the results reported in the papers, run the evaluation.py script with the corresponding configuration files. A configuration file containing the task and evaluation information should look like this:

{
    "name": "BabelDomains",
    "task_name": "topic-classification",
    "features_class": "a2t.tasks.text_classification.TopicClassificationFeatures",
    "hypothesis_template": "The domain of the sentence is about {label}.",
    "nli_models": [
        "roberta-large-mnli"
    ],
    "labels": [
        "Animals",
        "Art, architecture, and archaeology",
        "Biology",
        "Business, economics, and finance",
        "Chemistry and mineralogy",
        "Computing",
        "Culture and society",
        ...
        "Royalty and nobility",
        "Sport and recreation",
        "Textile and clothing",
        "Transport and travel",
        "Warfare and defense"
    ],
    "preprocess_labels": true,
    "dataset": "babeldomains",
    "test_path": "data/babeldomains.domain.gloss.tsv",
    "use_cuda": true,
    "half": true
}
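
With such a file in place, the evaluation is launched by pointing the script at it (the configuration path below is illustrative):

python evaluation.py --config path/to/babeldomains.topic.config.json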

Consider reading the papers to access the results.

About legacy code

The old code of this repository has been moved to the a2t.legacy module and is only intended to be used for experimental reproducibility. Please consider moving to the new code. If you need help, read the new documentation or post an issue on GitHub.

Citation

If you use this work, please consider citing at least one of the following papers. You can find the BibTeX entries on their corresponding ACL Anthology pages.

Oscar Sainz, Haoling Qiu, Oier Lopez de Lacalle, Eneko Agirre, and Bonan Min. 2022. ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, pages 27–38, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics.

Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, and Eneko Agirre. 2022. Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 2439–2455, Seattle, United States. Association for Computational Linguistics.

Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, and Eneko Agirre. 2021. Label Verbalization and Entailment for Effective Zero and Few-Shot Relation Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1199–1212, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Oscar Sainz and German Rigau. 2021. Ask2Transformers: Zero-Shot Domain labelling with Pretrained Language Models. In Proceedings of the 11th Global Wordnet Conference, pages 44–52, University of South Africa (UNISA). Global Wordnet Association.

ask2transformers's People

Contributors

osainz59


ask2transformers's Issues

Typo in apostrophes

In the templates in "tacred.relation.config.json", for the relation "per:date_of_birth", there is the following template.
"{subj}\u00e2\u0080\u0099s birthday is on {obj}."

Shouldn't it be as below?
"{subj}'s birthday is on {obj}."

It looks like the apostrophe character (') is replaced with \u00e2\u0080\u0099. Is that a typo or is it intentional?

There is also the same issue in the following template as well.
"{obj} is the cause of {subj}\u00e2\u0080\u0099s death."

Please update the README?

This seems like it would be really cool to use, but unfortunately the example code in the README is no longer valid, and I've been struggling for a long time to get this to work. It seems "NLITopicClassifier" now requires an additional argument to specify which pretrained model to use, but not all of them work.

Provide further explanation or documentation?

Tutorial or examples

Hi,
Thanks for the framework.
When will tutorials or examples be available so that we can make use of the framework?
If you have any related to Text Classification & Relation Extraction, please share.
Regards,
Gaurav

Incomplete documentation

I am still working on the documentation, so if you miss something, please consider posting it here!

Thank you!

Run GLUE for fine-tuning Few-Shot Relation Classification

Hi,
Thank you for this amazing repository, it is exactly what I was looking for! And congratulations for this work :)

I have two doubts about the fine-tuning process for Few-Shot RC:

1. I successfully used the provided script to convert TACRED data to MNLI, however, the run_glue.py script takes too long to train (I'm using Colab and it's taking like 23 hours per epoch, with a huge number of examples and optimization steps even with tiny splits of the dataset). Am I missing something? These are the parameters I used:

python Ask2Transformers/a2t/relation_classification/run_glue.py --task_name mnli --train_file train.mnli.json --validation_file dev.mnli.json --test_file test.mnli.json --model_name_or_path roberta-large-mnli --output_dir output --cache_dir cache --do_train --do_eval --do_predict --seed 6 --per_device_train_batch_size 16 --overwrite_output_dir

2. After running the run_glue.py script, is the fine-tuned model supposed to be found in the output (--output_dir) directory? (I'm using hugging face transformers). Sorry if this question seems dumb, I'm just not sure how to proceed and use the fine-tuned model after training.

I would appreciate if you could give me some guidance about this process.
Thank you!

Positive (isNext) output for Next Sentence Prediction might be 0

Thanks for sharing the code you used in your research, it's really useful!

Before coming across your research, I had seen some other papers using NSP for topic classification, and the accuracy of NSP models was almost on par with NLI models (Ma et al. 2021; Sun et al. 2021). So I was surprised to see NSP perform as badly as a random model.

At first I thought this might have happened because of the data you used. However, I saw that you defined the default positive output for NSP as 1 in this line. The HuggingFace documentation for NSP gives this example:

outputs = model(**encoding, labels=torch.LongTensor([1]))
logits = outputs.logits
assert logits[0, 0] < logits[0, 1]  # next sentence was random

I may be wrong, but I think the last line of this example says that the output with index 0 is the positive output (isNext). I am not certain if this is the problem, but I think we should look into it.

Thank you.

How to reproduce the EAE task result?

Excuse me, I want to reproduce the results in your team's paper: Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning. However, the README does not provide the full procedure to reproduce the Event Argument Extraction results, so I would like to know how to reproduce them.

verbalization

Hi Sainz,
I am trying to reproduce the results of your paper "Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning", but I cannot find the method of verbalization either in the paper or in the code. In the Label Verbalization section, you mention that "A verbalization is generated using templates that have been manually written based on the task guidelines of each dataset." May I know how a sentence is generated after giving the model the labels, the template, and the original sentence? For example, when filling the template of " bought something" in Figure 1, how does the model know to choose "John D. Idol" and not "hired"? Please kindly reply when you have time.

fine-tuning Few-Shot Relation Classification

Hi,
Thank you for sharing this wonderful work! Could you detail the hyperparameters used for fine-tuning on TACRED?
I ran the experiments with

python run_glue.py \
    --model_name_or_path roberta-large-mnli \
    --train_file /data/tacred/train.mnli.json \
    --validation_file /data/tacred/dev.mnli.json \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 \
    --learning_rate 4e-6 \
    --num_train_epochs 2 \
    --overwrite_output_dir \
    --fp16 True \
    --gradient_accumulation 1 \
    --output_dir ./results \
    --save_steps 5000 \
    --seed 0 \
    --warmup_steps 1000

and then evaluate the final results with
python evaluation.py --config ../resources/predefined_configs/tacred.relation.config.json (modify the "nli_models" with tuned model path)

the test results are
"test": {"optimal_threshold": 0.5, "positive_accuracy": 0.9004511278195488, "precision": 0.43177373251090717, "recall": 0.8631578947368421, "f1-score": 0.5756117127958283 }

Could you kindly advise me on how to tune and evaluate correctly? Thank you!

Few-Shot RE

Hi!
Do you have any plans for releasing the code for fine-tuning the relation extraction model?
