ie-e2h's Introduction

Easy-to-Hard Learning for Information Extraction

Code for paper Easy-to-Hard Learning for Information Extraction (Findings of ACL 2023).

Information extraction (IE) systems aim to automatically extract structured information, such as named entities, relations between entities, and events, from unstructured texts. While most existing work addresses a particular IE task, universally modeling various IE tasks with one model has achieved great success recently. Despite this success, these universal models employ a one-stage learning strategy, i.e., directly learning to extract the target structure given the input text, which contradicts the human learning process. In this paper, we propose a unified easy-to-hard learning framework for IE that mimics the human learning process and consists of three stages: the easy stage, the hard stage, and the main stage. By breaking down the learning process into multiple stages, our framework helps the model acquire general IE task knowledge and improves its generalization ability. Extensive experiments across four IE tasks demonstrate the effectiveness of our framework. We achieve new state-of-the-art results on 13 out of 17 datasets.

Prepare the environment

Please run the following commands:

conda create -n E2H python=3.8
conda activate E2H
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
pip install wandb
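
To confirm that the pinned packages were installed into the active environment, a small check along the following lines may help. This is a minimal sketch, not part of the repository: the helper name check_versions is hypothetical, and it assumes the "+cu111" tag is reported as part of the installed version string.

```python
from importlib import metadata

# Versions pinned by the pip command above. Assumption: the "+cu111" local
# tag appears in the version string reported for the installed wheel.
EXPECTED = {
    "torch": "1.8.0+cu111",
    "torchvision": "0.9.0+cu111",
    "torchaudio": "0.8.0",
}


def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={EXPECTED[name]} [{status}]")
```

A mismatch here usually means pip ran outside the E2H environment or a different wheel was resolved.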

All experiments were run on NVIDIA Tesla A100 80G hardware.

Prepare datasets

For publicly available datasets, we put the processed data in ./dataset_processing/converted_data. For datasets that are not publicly available, please follow the steps here to obtain them.

After that, please link the preprocessed datasets as:

ln -s dataset_processing/converted_data/ data

To obtain datasets used in low-resource scenarios, please run the following commands:

cd dataset_processing
bash run_sample.bash

Training and Evaluation

Set up your W&B account following this tutorial. Afterwards, uncomment the wandb.init() and wandb.config.update() statements in skill_{entity/relation/event/aste}.py and use your own entity and project names.

For training and evaluation, please run the following command:

bash run_e2h.bash {model size} {dataset} {task}

Choose the model size from base and large, and (task, dataset) pairs from the following table.

Task        Dataset
entity      conll03
entity      ace04ent
entity      ace05ent
relation    conll04
relation    scierc
relation    ace05rel
event       ace05e
event       ace05e+
event       casie
aste        14lap
aste        14res
aste        15res
aste        16res

For training and evaluation in low-resource scenarios, please run the following command:

bash run_e2h_ratio.bash {model size} {dataset} {task}
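
Because an invalid (task, dataset) combination only fails once the script is already running, it can be useful to validate the arguments first. The following is a minimal sketch, not part of the repository: build_command is a hypothetical helper, and the pairs are copied from the table above.

```python
# (task, dataset) pairs copied from the table above.
VALID_PAIRS = {
    "entity": ["conll03", "ace04ent", "ace05ent"],
    "relation": ["conll04", "scierc", "ace05rel"],
    "event": ["ace05e", "ace05e+", "casie"],
    "aste": ["14lap", "14res", "15res", "16res"],
}
MODEL_SIZES = ("base", "large")


def build_command(model_size, dataset, task, low_resource=False):
    """Validate the arguments and return the command as an argument list."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"model size must be one of {MODEL_SIZES}, got {model_size!r}")
    if dataset not in VALID_PAIRS.get(task, ()):
        raise ValueError(f"({task!r}, {dataset!r}) is not a listed (task, dataset) pair")
    # The low-resource setting uses the *_ratio script named in this README.
    script = "run_e2h_ratio.bash" if low_resource else "run_e2h.bash"
    return ["bash", script, model_size, dataset, task]


if __name__ == "__main__":
    # Prints the command instead of launching it.
    print(" ".join(build_command("base", "ace04ent", "entity", low_resource=True)))
```

The returned list can be passed to subprocess.run from the repository root if launching from Python is preferred.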

After training, for each (model_size, task, dataset) triplet, there will be a directory corresponding to it. This directory will contain many prediction folders and a file named best.performance.now that summarizes the results. Please check ./config, run_exp_e2h.bash, run_exp_e2h_ratio.bash, e2h.bash, and e2h_ratio.bash for details.
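
To collect the summaries after several runs, the result directories can be scanned for best.performance.now files. This is a minimal sketch, not part of the repository: find_summaries is a hypothetical helper, the root directory name "output-e2h" is an assumption, and the file content is printed as-is because its format is not described here.

```python
from pathlib import Path


def find_summaries(output_root):
    """Return sorted paths of every best.performance.now file under output_root."""
    root = Path(output_root)
    if not root.is_dir():
        return []  # nothing has been written yet, or the path is wrong
    return sorted(root.rglob("best.performance.now"))


if __name__ == "__main__":
    # "output-e2h" is an assumed output root; adjust it to your setup.
    for path in find_summaries("output-e2h"):
        print(f"== {path.parent} ==")
        print(path.read_text())
```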

Citation

If you find the code helpful, please cite the following paper:

@inproceedings{gao-etal-2023-easy,
    title = "Easy-to-Hard Learning for Information Extraction",
    author = "Gao, Chang  and
      Zhang, Wenxuan  and
      Lam, Wai  and
      Bing, Lidong",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.754",
    pages = "11913--11930",
}

Acknowledgement

This implementation is based on the code of UIE. We thank the authors for their contributions.

ie-e2h's Issues

ValueError: min() arg is an empty sequence

I processed the data according to your instructions and ran bash run_e2h_ratio.bash base ace04ent entity. After the model had been running for a while, the error below appeared. I don't know how to fix it. Could you tell me how to solve it? Thank you very much.
run_command: python3 Map Config config/offset_map/closest_offset_en.yaml
exp_name: output-e2h-ratio/EXP-EnA04-TB
main_output_dir: output-e2h-ratio/EXP-EnA04-TB/main/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5
main_stdout_file: /dev/stdout
main_stderr_file: /dev/stderr
easy_output_dir: output-e2h-ratio/EXP-EnA04-TB/easy/first,second_entity_mrc_ace04_t5-base_spotasoc_ee100_384_linear_lr1e-4_ls0_b64_wu0_er0.1
easy_stdout_file: /dev/stdout
easy_stderr_file: /dev/stderr
hard_output_dir: output-e2h-ratio/EXP-EnA04-TB/hard/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5
hard_stdout_file: /dev/stdout
hard_stderr_file: /dev/stderr
ratio_data_folder: data/text2spotasoc/entity/mrc_ace04/ratio/seed1
run_data_folder: data/text2spotasoc/entity/mrc_ace04/ratio/seed1/0.01
main_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/main/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5/0.01_run1
easy_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/easy/first,second_entity_mrc_ace04_t5-base_spotasoc_ee100_384_linear_lr1e-4_ls0_b64_wu0_er0.1/0.01_run1
hard_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/hard/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5/0.01_run1
run_data_folder: data/text2spotasoc/entity/mrc_ace04/ratio/seed1/0.05
main_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/main/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5/0.05_run1
easy_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/easy/first,second_entity_mrc_ace04_t5-base_spotasoc_ee100_384_linear_lr1e-4_ls0_b64_wu0_er0.1/0.05_run1
hard_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/hard/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5/0.05_run1
run_data_folder: data/text2spotasoc/entity/mrc_ace04/ratio/seed1/0.1
main_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/main/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5/0.1_run1
easy_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/easy/first,second_entity_mrc_ace04_t5-base_spotasoc_ee100_384_linear_lr1e-4_ls0_b64_wu0_er0.1/0.1_run1
hard_run_output_folder: output-e2h-ratio/EXP-EnA04-TB/hard/first,second_entity_mrc_ace04_t5-base_spotasoc_e100_ee100_384_384_linear_lr1e-4_ls0_b64_wu0_er0.1_sn2_m5/0.1_run1
Traceback (most recent call last):
  File "scripts/summary_result.py", line 282, in <module>
    main()
  File "scripts/summary_result.py", line 264, in main
    result_summary.result_to_table_reduce(
  File "scripts/summary_result.py", line 202, in result_to_table_reduce
    print("min: ", min(result_dict[key]))
ValueError: min() arg is an empty sequence
Traceback (most recent call last):
  File "scripts/summary_result.py", line 282, in <module>
    main()
  File "scripts/summary_result.py", line 271, in main
    result_summary.result_to_table_reduce(
  File "scripts/summary_result.py", line 202, in result_to_table_reduce
    print("min: ", min(result_dict[key]))
ValueError: min() arg is an empty sequence
(These two tracebacks then repeat many more times in the log.)
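
The traceback shows min() being called on an empty result_dict[key], which suggests that no results were collected for that key when the summary ran. The following is a minimal sketch of a guard for that situation, not the repository's code: summarize is a hypothetical helper, and the assumption is that result_dict maps each key to a list of numeric scores.

```python
def summarize(result_dict):
    """Return ({key: (min, max, mean)}, keys_with_no_values).

    Assumption: result_dict maps each key to a list of numeric scores.
    Keys with an empty list are reported instead of raising ValueError.
    """
    summary = {}
    empty_keys = []
    for key, values in result_dict.items():
        if not values:
            empty_keys.append(key)  # min() would raise on this key
            continue
        summary[key] = (min(values), max(values), sum(values) / len(values))
    return summary, empty_keys


if __name__ == "__main__":
    summary, empty_keys = summarize({"run-a": [0.71, 0.69], "run-b": []})
    print("summary:", summary)
    print("no results for:", empty_keys)
```

If every key comes back empty, the likelier cause is upstream, for example training runs that stopped before writing any results, rather than the summary script itself.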

Inference file

Could you provide the inference file? Is it the same as the inference file provided by UIE?
