receiling / unire

Source code for "UniRE: A Unified Label Space for Entity Relation Extraction.", ACL2021. It is based on our NERE toolkit (https://github.com/Receiling/NERE).

License: MIT License

Shell 1.42% Python 98.58%

information-extraction natural-language-processing relation-extraction

unire's Issues

KeyError: 'articleId'

When I train on SciERC, the following error occurs. Has anyone else run into this?
[2021-12-30 12:30:40,742 - entity_relation_joint_decoder.py - line:274 - INFO]: Load bert tokenizer successfully.
Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\UniRE-master\entity_relation_joint_decoder.py", line 330, in <module>
    main()
  File "C:\Users\Administrator\Desktop\UniRE-master\entity_relation_joint_decoder.py", line 295, in main
    ace_dataset.build_dataset(vocab=vocab,
  File "C:\Users\Administrator\Desktop\UniRE-master\inputs\datasets\dataset.py", line 78, in build_dataset
    instance_settting['instance'].count_vocab_items(counter,
  File "C:\Users\Administrator\Desktop\UniRE-master\inputs\instance.py", line 61, in count_vocab_items
    field.count_vocab_items(counter, sentences)
  File "C:\Users\Administrator\Desktop\UniRE-master\inputs\fields\token_field.py", line 39, in count_vocab_items
    for sentence in sentences:
  File "C:\Users\Administrator\Desktop\UniRE-master\inputs\dataset_readers\ace_reader_for_joint_decoding.py", line 37, in __iter__
    state, results = self.get_tokens(line)
  File "C:\Users\Administrator\Desktop\UniRE-master\inputs\dataset_readers\ace_reader_for_joint_decoding.py", line 90, in get_tokens
    logger.error("article id: {} sentence id: {} doesn't contain 'sentText'.".format(line['articleId'], line['sentId']))
KeyError: 'articleId'
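
The last frame shows the `KeyError` being raised while the error message itself is formatted: the reader is trying to report a line that lacks `sentText`, and that line also lacks `articleId`. A minimal sketch of a more defensive message builder, assuming each input line is a dict parsed from JSON (the helper name `describe_line` is hypothetical and not part of the UniRE code):

```python
def describe_line(line):
    """Build a log message for a record that lacks 'sentText'.

    Uses dict.get() so that a missing 'articleId' or 'sentId' shows up
    as '<missing>' instead of raising a second KeyError.
    """
    article_id = line.get("articleId", "<missing>")
    sent_id = line.get("sentId", "<missing>")
    return "article id: {} sentence id: {} doesn't contain 'sentText'.".format(
        article_id, sent_id)


# Hypothetical record that lacks all three keys the reader expects.
record = {"tokens": ["A", "sentence"]}
if "sentText" not in record:
    print(describe_line(record))
    print("keys present:", sorted(record))
```

Printing the keys that are actually present may help show whether the preprocessed SciERC file uses the field names the reader expects.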

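Pre-trained Models: the Baidu Netdisk files are no longer available

Hello, the files behind the Baidu Netdisk link in the Pre-trained Models section are gone. Could you provide them again?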

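Why not use WebNLG and NYT?

Thank you for sharing the code. I am new to the joint entity and relation extraction task.

The paper mentions overlapping triples, so why does it use ACE rather than the NYT and WebNLG datasets?

I have also noticed that some papers on joint entity and relation extraction use ACE while others use NYT. What is the difference between them?

Thank you!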

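A question about the ACE2004 cross-validation

Dear author,

Your paper was very inspiring, but I have a question: how did you run the 5-fold cross-validation on the ACE2004 dataset? Did you run once on each of the five splits and then take the average?

Also, if possible, I would like to obtain the ACE dataset. ([email protected])

I wish you success in your work, and peace and happiness.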

A small question about evaluation

Hello, I would like to ask how entity pairs whose relation in the dataset is None are handled during evaluation.
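
One common convention in relation extraction evaluation, shown here as a sketch and not confirmed to be what UniRE's evaluation code does, is to leave `None` pairs out of both the gold and predicted sets, so that only real relations count toward precision and recall. The function name `relation_f1` and the triple layout `(head_span, tail_span, label)` are assumptions for illustration:

```python
def relation_f1(gold, pred, none_label="None"):
    """Micro precision/recall/F1 over relation triples, ignoring `none_label`.

    gold, pred: iterables of (head_span, tail_span, label) tuples.
    """
    gold_set = {t for t in gold if t[2] != none_label}
    pred_set = {t for t in pred if t[2] != none_label}
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example with made-up spans and labels.
gold = [((0, 1), (3, 4), "PART-WHOLE"), ((0, 1), (6, 7), "None")]
pred = [((0, 1), (3, 4), "PART-WHOLE"), ((3, 4), (6, 7), "PHYS")]
print(relation_f1(gold, pred))
```

Under this convention a correctly predicted `None` earns no credit, while predicting a relation for a pair whose gold label is `None` counts as a false positive.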

How to reproduce the result for SciERC in the paper?

Thanks for your amazing work!
I preprocessed the data and trained the model following the guide in Readme.md.
I see that the code sets the seed for random, torch, and np.
I trained the model on the SciERC dataset; it early-stopped at epoch 229, and I got the following result:

I trained on an RTX 3090 GPU without modifying the code (same as in Readme.md).
In the paper, the result is:

The entity F1 in the paper is 68.4, and I got 66.03.
How can I reproduce the result in the paper?
Sincerely looking forward to your reply!

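How are overlapping entities handled?

Hello, and thank you for open-sourcing the code; it is very clean and readable. I noticed that the paper also mentions that UniRE cannot handle overlapping entities. While reading the code, I found that sentences containing overlapping entities seem to be discarded outright. I am basing this on

UniRE/inputs/dataset_readers/ace_reader_for_joint_decoding.py

Line 174 in 4f59bea

return False, results

but I am not sure whether my understanding is correct, and I hope you can clarify. In addition, from

UniRE/entity_relation_joint_decoder.py

Line 281 in 4f59bea

ace_dev_reader = ACEReaderForJointDecoding(cfg.dev_file, False, max_len)

it looks as if every dataset is treated as training data, so all sentences exceeding the length limit are also discarded. So my questions are:
(1) Are test sentences that contain overlapping entities discarded?
(2) Are test samples whose length exceeds the threshold discarded?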
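
For reference, the kind of overlap check being asked about can be sketched as follows. This is an illustration only, not the logic in `ace_reader_for_joint_decoding.py`; the function name `has_overlap` and the half-open `(start, end)` token spans are assumptions:

```python
def has_overlap(spans):
    """Return True if any two half-open (start, end) spans share a token."""
    ordered = sorted(spans)
    # After sorting by start, a span overlaps some earlier span exactly when
    # it starts before the largest end seen so far.
    max_end = None
    for start, end in ordered:
        if max_end is not None and start < max_end:
            return True
        max_end = end if max_end is None else max(max_end, end)
    return False


print(has_overlap([(0, 2), (2, 4)]))  # adjacent spans
print(has_overlap([(0, 3), (1, 2)]))  # nested spans
```

Running a check like this over the test split would show how many sentences a filter of this kind would affect.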

Traceback (most recent call last):
  File "entity_relation_joint_decoder.py", line 330, in <module>
    main()
  File "entity_relation_joint_decoder.py", line 316, in main
    model.load_state_dict(state_dict,False)
  File "/home/jsj201-9/anaconda3/envs/unire/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for EntRelJointDecoder:
        size mismatch for embedding_model.bert_encoder.bert_model.embeddings.word_embeddings.weight: copying a param with shape torch.Size([31090, 768]) from checkpoint, the shape in current model is torch.Size([30522, 768]).
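
The two shapes in the message differ only in the first dimension, the vocabulary size: 31090 matches SciBERT's `scivocab` vocabulary and 30522 matches `bert-base-uncased`, which suggests the checkpoint was trained with a different BERT variant than the one the current configuration loads. A small sketch for listing such mismatches before calling `load_state_dict`, written against plain name-to-shape mappings so that it runs without PyTorch (with PyTorch, the shapes could be collected as `{k: tuple(v.shape) for k, v in state_dict.items()}`); the function name `shape_mismatches` is hypothetical:

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for differing shapes.

    Parameters present on only one side are ignored here.
    """
    return {
        name: (shape, model_shapes[name])
        for name, shape in checkpoint_shapes.items()
        if name in model_shapes and model_shapes[name] != shape
    }


# Shapes copied from the error message above.
key = "embedding_model.bert_encoder.bert_model.embeddings.word_embeddings.weight"
checkpoint = {key: (31090, 768)}
model = {key: (30522, 768)}
for name, (ckpt_shape, model_shape) in shape_mismatches(checkpoint, model).items():
    print(name, "checkpoint:", ckpt_shape, "model:", model_shape)
```

If the mismatch is confined to the word embeddings, as here, checking which pretrained BERT the configuration points to is a reasonable first step.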