damo-nlp-sg / ie-e2h

Easy-to-Hard Learning for Information Extraction (ACL 2023 Findings)

Home Page: https://arxiv.org/abs/2305.09193

Shell 7.87% Python 92.13%

ie-e2h's Introduction

Easy-to-Hard Learning for Information Extraction

Code for the paper "Easy-to-Hard Learning for Information Extraction" (Findings of ACL 2023).

Information extraction (IE) systems aim to automatically extract structured information, such as named entities, relations between entities, and events, from unstructured texts. While most existing work addresses a particular IE task, universally modeling various IE tasks with one model has recently achieved great success. Despite this success, these unified models employ a one-stage learning strategy, i.e., they directly learn to extract the target structure given the input text, which contradicts the human learning process. In this paper, we propose a unified easy-to-hard learning framework for IE that mimics the human learning process and consists of three stages: the easy stage, the hard stage, and the main stage. By breaking down the learning process into multiple stages, our framework helps the model acquire general IE task knowledge and improves its generalization ability. Extensive experiments across four IE tasks demonstrate the effectiveness of our framework. We achieve new state-of-the-art results on 13 out of 17 datasets.
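The abstract describes training as a sequence of stages. The sketch below illustrates only that control flow, under the assumption that each stage warm-starts from the checkpoint produced by the previous one. run_curriculum and toy_train_stage are hypothetical names, the training function is a stand-in for real T5 fine-tuning, and what each stage actually learns is defined by the repository's scripts and configs, not by this code.

```python
from typing import Callable, Dict, List

# A "checkpoint" is modelled here as a plain dict of parameters/metadata.
Checkpoint = Dict[str, object]
# A stage trainer maps (stage name, starting checkpoint) -> trained checkpoint.
StageTrainer = Callable[[str, Checkpoint], Checkpoint]


def run_curriculum(stages: List[str], train_stage: StageTrainer,
                   init: Checkpoint) -> Checkpoint:
    """Train the stages in order, warm-starting each from the previous result."""
    checkpoint = init
    for stage in stages:
        checkpoint = train_stage(stage, checkpoint)
    return checkpoint


def toy_train_stage(stage: str, checkpoint: Checkpoint) -> Checkpoint:
    # Hypothetical stand-in for fine-tuning: only records the stages seen so far.
    # A new dict is returned, so earlier checkpoints are never mutated.
    trained_on = list(checkpoint.get("trained_on", []))
    trained_on.append(stage)
    return {**checkpoint, "trained_on": trained_on}


final = run_curriculum(["easy", "hard", "main"], toy_train_stage,
                       {"model": "t5-base"})
print(final["trained_on"])  # ['easy', 'hard', 'main']
```

Passing the trainer in as a callback keeps the stage ordering separate from how any single stage is trained.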

Prepare the environment

Please run the following commands:

conda create -n E2H python=3.8
conda activate E2H
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
pip install wandb
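After installation, a quick check such as the following can confirm which interpreter and PyTorch build are active. It is a sketch that is not part of the repository; environment_report is a hypothetical helper. With the commands above it should report Python 3.8 and a torch version ending in +cu111, and cuda_available should be True on a machine with a working GPU driver.

```python
import importlib.util
import sys
from typing import Dict, Optional


def environment_report() -> Dict[str, Optional[object]]:
    """Summarize the interpreter and, if installed, the PyTorch build."""
    report: Dict[str, Optional[object]] = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,           # stays None when torch is not installed
        "cuda_available": None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported only when present, so the check never crashes

        report["torch"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    print(environment_report())
```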

Experiments were conducted on an NVIDIA Tesla A100 80GB GPU.

Prepare datasets

For datasets that are publicly available, the processed versions are included in ./dataset_processing/converted_data. For datasets that are not publicly available, please follow the steps here to obtain them.

After that, create a symbolic link to the processed datasets from the repository root:

ln -s dataset_processing/converted_data/ data
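ln -s creates the link in the current directory, and its relative target is resolved from the link's own location, so the command only works when run from the repository root. A small check like the one below (a hypothetical helper, not part of the repository) confirms that data resolves to an existing directory.

```python
import os


def data_link_ok(repo_root: str = ".") -> bool:
    """Return True if <repo_root>/data is a symlink to an existing directory."""
    link = os.path.join(repo_root, "data")
    # realpath follows the symlink; a dangling link resolves to a missing path.
    return os.path.islink(link) and os.path.isdir(os.path.realpath(link))


if __name__ == "__main__":
    print(data_link_ok())
```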

To obtain datasets used in low-resource scenarios, please run the following commands:

cd dataset_processing
bash run_sample.bash
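The sampling logic of run_sample.bash is not reproduced here. Judging from the folder names in the training log quoted under Issues (ratio/seed1/0.01, 0.05 and 0.1), the low-resource data appear to be random subsets of the training set drawn at fixed ratios with a fixed seed. The following is a minimal sketch of that kind of seeded sampling under this assumption; sample_ratio and build_low_resource_splits are hypothetical, and the actual script may sample differently.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def sample_ratio(instances: Sequence[T], ratio: float, seed: int) -> List[T]:
    """Draw a reproducible random subset holding `ratio` of the instances."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    if not instances:
        return []
    k = max(1, round(len(instances) * ratio))  # keep at least one instance
    rng = random.Random(seed)  # local RNG: the global random state is untouched
    return rng.sample(list(instances), k)


def build_low_resource_splits(train: Sequence[T],
                              ratios: Sequence[float] = (0.01, 0.05, 0.1),
                              seed: int = 1) -> Dict[str, List[T]]:
    # Keys mimic the ratio/seed{seed}/{ratio} folders seen in the training log.
    return {f"ratio/seed{seed}/{r}": sample_ratio(train, r, seed) for r in ratios}


splits = build_low_resource_splits(list(range(1000)))
print({name: len(subset) for name, subset in splits.items()})
```

Using a dedicated random.Random(seed) keeps each split reproducible regardless of any other code that consumes the global random generator.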

Training and Evaluation

Set up your W&B account by following this tutorial. Afterwards, uncomment the wandb.init() and wandb.config.update() statements in skill_{entity/relation/event/aste}.py and fill in your own entity and project names.
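The exact statements in skill_*.py are not shown in this README. Once uncommented, they should look roughly like the sketch below, where init_tracking is a hypothetical wrapper and the entity and project strings are placeholders. Passing enabled=False skips W&B entirely, which is handy for a dry run; setting the environment variable WANDB_MODE=disabled serves a similar purpose without code changes.

```python
from typing import Any, Dict, Optional


def init_tracking(entity: str, project: str, config: Dict[str, Any],
                  enabled: bool = True) -> Optional[Any]:
    """Start a W&B run under your own entity/project and log the config."""
    if not enabled:
        return None  # dry run: no W&B account or network access needed
    import wandb  # imported lazily so a dry run works without wandb installed

    run = wandb.init(entity=entity, project=project)
    wandb.config.update(config)
    return run


# Replace the placeholders with your own W&B entity and project names.
run = init_tracking("your-entity", "your-project", {"model_size": "base"},
                    enabled=False)
```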

For training and evaluation, run the following command:

bash run_e2h.bash {model size} {dataset} {task}

Set {model size} to base or large, and choose a (task, dataset) pair from the following table:

Task	Dataset
entity	conll03
entity	ace04ent
entity	ace05ent
relation	conll04
relation	scierc
relation	ace05rel
event	ace05e
event	ace05e+
event	casie
aste	14lap
aste	14res
aste	15res
aste	16res

For training and evaluation in low-resource scenarios, run the following command:

bash run_e2h_ratio.bash {model size} {dataset} {task}
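Both scripts take their arguments in the order {model size} {dataset} {task}, which is easy to mix up. The hypothetical helper below validates a combination against the table above before building either command; it is a convenience sketch, not part of the repository.

```python
from typing import Dict, List

# (task -> datasets) pairs copied from the table above.
SUPPORTED: Dict[str, List[str]] = {
    "entity": ["conll03", "ace04ent", "ace05ent"],
    "relation": ["conll04", "scierc", "ace05rel"],
    "event": ["ace05e", "ace05e+", "casie"],
    "aste": ["14lap", "14res", "15res", "16res"],
}
MODEL_SIZES = ("base", "large")


def e2h_command(model_size: str, dataset: str, task: str,
                low_resource: bool = False) -> str:
    """Build the run command, rejecting combinations not in the table."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"model size must be one of {MODEL_SIZES}")
    if dataset not in SUPPORTED.get(task, []):
        raise ValueError(f"{dataset!r} is not listed for task {task!r}")
    script = "run_e2h_ratio.bash" if low_resource else "run_e2h.bash"
    return f"bash {script} {model_size} {dataset} {task}"


print(e2h_command("base", "conll03", "entity"))
# bash run_e2h.bash base conll03 entity
```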

After training, each (model size, task, dataset) triplet has its own output directory. This directory contains the prediction folders and a file named best.performance.now that summarizes the results. Please check ./config, run_exp_e2h.bash, run_exp_e2h_ratio.bash, e2h.bash, and e2h_ratio.bash for details.
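The format of best.performance.now is not described here, so the sketch below only gathers the files and their raw text instead of parsing scores. It assumes the run directories sit under a common output root; output-e2h-ratio is the root that appears in the low-resource log quoted under Issues, and the root used by run_e2h.bash may differ. collect_best_performance is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict


def collect_best_performance(output_root: str) -> Dict[str, str]:
    """Map each run directory under output_root to its best.performance.now text."""
    results = {}
    for path in sorted(Path(output_root).rglob("best.performance.now")):
        run_dir = str(path.parent.relative_to(output_root))
        results[run_dir] = path.read_text().strip()
    return results


if __name__ == "__main__":
    # Assumed root; adjust to the output directory your runs actually use.
    for run_dir, summary in collect_best_performance("output-e2h-ratio").items():
        print(run_dir)
        print(summary)
```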

Citation

If you find the code helpful, please cite the following paper:

@inproceedings{gao-etal-2023-easy,
    title = "Easy-to-Hard Learning for Information Extraction",
    author = "Gao, Chang  and
      Zhang, Wenxuan  and
      Lam, Wai  and
      Bing, Lidong",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.754",
    pages = "11913--11930",
}

Acknowledgement

This implementation is based on the code of UIE. Thanks for their contributions.


ie-e2h's Issues

ValueError: min() arg is an empty sequence

I processed the data according to your instructions and ran bash run_e2h_ratio.bash base ace04ent entity. After the model had been running for a while, it stopped with the error messages below. I don't know how to fix this; could you tell me how to solve it? Thank you very much.
run_command: python3
Map Config config/offset_map/closest_offset_en.yaml
exp_name: output-e2h-ratio/EXP-EnA04-TB
main_output_dir: output-e2h-ratio/EXP-EnA04-TB/main/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5
main_stdout_file: /dev/stdout
main_stderr_file: /dev/stderr
easy_output_dir: output-e2h-ratio/EXP-EnA04-TB/easy/first,second_entity_mrc_ace04_t5-base_spotasoc_ee100_384_linear_lr1e-4_ls0_b64_wu0_er0.1
easy_stdout_file: /dev/stdout
easy_stderr_file: /dev/stderr
hard_output_dir: output-e2h-ratio/EXP-EnA04-TB/hard/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5
hard_stdout_file: /dev/stdout
hard_stderr_file: /dev/stderr
ratio_data_folder: data/text2spotasoc/entity/mrc_ace04/ratio/seed1
run_data_folder: data/text2spotasoc/entity/mrc_ace04/ratio/seed1/0.01
main_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/main/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5/0.01_run1
easy_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/easy/first,second_entity_mrc_ace04_t5-base_spotasoc_ee100_384_linear_lr1e-4_ls0_b64_wu0_er0.1/0.01_run1
hard_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/hard/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5/0.01_run1
run_data_folder: data/text2spotasoc/entity/mrc_ace04/ratio/seed1/0.05
main_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/main/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5/0.05_run1
easy_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/easy/first,second_entity_mrc_ace04_t5-base_spotasoc_ee100_384_linear_lr1e-4_ls0_b64_wu0_er0.1/0.05_run1
hard_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/hard/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5/0.05_run1
run_data_folder: data/text2spotasoc/entity/mrc_ace04/ratio/seed1/0.1
main_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/main/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5/0.1_run1
easy_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/easy/first,second_entity_mrc_ace04_t5-base_spotasoc_ee100_384_linear_lr1e-4_ls0_b64_wu0_er0.1/0.1_run1
hard_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/hard/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5/0.1_run1

Traceback (most recent call last):
  File "scripts/summary_result.py", line 282, in <module>
    main()
  File "scripts/summary_result.py", line 264, in main
    result_summary.result_to_table_reduce(
  File "scripts/summary_result.py", line 202, in result_to_table_reduce
    print("min: ", min(result_dict[key]))
ValueError: min() arg is an empty sequence

Traceback (most recent call last):
  File "scripts/summary_result.py", line 282, in <module>
    main()
  File "scripts/summary_result.py", line 271, in main
    result_summary.result_to_table_reduce(
  File "scripts/summary_result.py", line 202, in result_to_table_reduce
    print("min: ", min(result_dict[key]))
ValueError: min() arg is an empty sequence
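min() raises ValueError when given an empty sequence, so the traceback indicates that summary_result.py found no scores for at least one entry of result_dict. A plausible but unconfirmed cause is that the runs for that setting did not finish or did not write their results, so the earlier training output is the first place to look for errors. The sketch below shows a guarded version of such a summary; summarize_scores is hypothetical, uses toy values, and is not the repository's code.

```python
from statistics import mean
from typing import Dict, List, Optional


def summarize_scores(
    result_dict: Dict[str, List[float]],
) -> Dict[str, Optional[Dict[str, float]]]:
    """Summarize each key's scores, tolerating keys with no collected results."""
    summary: Dict[str, Optional[Dict[str, float]]] = {}
    for key, scores in result_dict.items():
        if not scores:
            # min()/max() would raise "ValueError: min() arg is an empty sequence".
            print(f"warning: no results collected for {key!r}")
            summary[key] = None
            continue
        summary[key] = {"min": min(scores), "max": max(scores), "mean": mean(scores)}
    return summary


print(summarize_scores({"0.01": [1.0, 3.0], "0.05": []}))
```

Skipping the empty key with a warning keeps the other summaries usable, while still surfacing which runs produced nothing.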

Hello, could you provide a trained model?

Inference file

Can you provide the inference file? Is it the same as the inference file provided by UIE?
