
yuanxiaosc / entity-relation-extraction


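Entity and Relation Extraction Based on TensorFlow and BERT. Pipeline-style entity and relation extraction based on TensorFlow and BERT; a solution to the information extraction task of the 2019 Language and Intelligence Technology Competition (Schema based Knowledge Extraction, SKE 2019).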

Home Page: https://yuanxiaosc.github.io/2019/05/17/多关系抽取研究/

Python 100.00%
tensorflow entity-extraction relation-extraction pipeline-framework bert-model competition-code

Contributors

mymusise, yuanxiaosc

entity-relation-extraction's Issues

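Where do the files in the test folder come from?

Hello, do I need to prepare the files in the test folder myself? Following the steps did not generate them. If I have to prepare them myself, what format should they have?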

KeyError when running run_predicate_classification.py

Hello, I ran run_predicate_classification.py as described in ReadMe.md and got the following error:

Traceback (most recent call last):
File "run_predicate_classification.py", line 821, in <module>
tf.app.run()
File "/data/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "run_predicate_classification.py", line 698, in main
train_examples, label_list, FLAGS.max_seq_length, tokenizer, train_file)
File "run_predicate_classification.py", line 385, in file_based_convert_examples_to_features
max_seq_length, tokenizer)
File "run_predicate_classification.py", line 347, in convert_single_example
label_ids = _predicate_label_to_id(label_list, label_map)
File "run_predicate_classification.py", line 371, in _predicate_label_to_id
predicate_label_ids[predicate_label_map[label]] = 1
KeyError: ''

I don't know how to resolve this problem.

Possible bug in row_label_ids?

On line 579:

row_label_ids = tf.reduce_sum(tf.ones_like(elements_equal), -1)

should be:

row_label_ids = tf.reduce_sum(tf.ones_like(label_ids), -1)

The run_predicate_classification.py training script raises an error. Could you take a look?

Traceback (most recent call last):
File "run_predicate_classification.py", line 812, in <module>
tf.app.run()
File "/home/anaconda/anaconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "run_predicate_classification.py", line 690, in main
train_examples, label_list, FLAGS.max_seq_length, tokenizer, train_file)
File "run_predicate_classification.py", line 381, in file_based_convert_examples_to_features
max_seq_length, tokenizer)
File "run_predicate_classification.py", line 343, in convert_single_example
label_ids = _predicate_label_to_id(label_list, label_map)
File "run_predicate_classification.py", line 367, in _predicate_label_to_id
predicate_label_ids[predicate_label_map[label]] = 1
KeyError: ''

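About performance and joint training

How well does this pipeline-style training perform?
Also, given BERT's high accuracy, when building the loss function, could one directly combine the losses of the two tasks, i.e. the sum of the classification loss and the tagging loss, and then fine-tune? Would that be feasible with your approach?
I have seen people do joint training, but the results so far do not seem good enough.
Thanks!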

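Where is the GPU option?

Hello, when I run the program it automatically selects the CPU, although my other programs all select the GPU automatically. Where is the option for using the GPU?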

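KeyError at runtime

The function def _predicate_label_to_id(predicate_label, predicate_label_map): converts relation labels into a one-hot vector. All relations are defined in def get_labels(self):, and any relation listed there can be converted. So you could print the key, i.e. the label in predicate_label_map[label], and take a look.

Originally posted by @yuanxiaosc in #4 (comment)
Hello author, I also get a KeyError when training the relation classification model. After printing label I found that it is empty; that sentence in the original corpus indeed has no relation category. How should this be resolved?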
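
One possible workaround for the empty-label case described above is to skip empty strings before the dictionary lookup. This is a minimal sketch, not the repository's code: the function name predicate_labels_to_ids, the three-item label list, and the idea that the empty string comes from splitting an empty label field are all assumptions made for illustration.

```python
def predicate_labels_to_ids(labels, label_map):
    """Multi-hot encode relation labels, skipping empty strings.

    A sentence with no relation then yields an all-zero vector
    instead of raising KeyError: ''.
    """
    ids = [0] * len(label_map)
    for label in labels:
        if label == "":  # assumed source of the KeyError: an empty label field
            continue
        ids[label_map[label]] = 1
    return ids


# Hypothetical subset of relation labels, for illustration only.
label_list = ["国籍", "出生地", "作者"]
label_map = {label: i for i, label in enumerate(label_list)}

# Splitting an empty label field on spaces produces [''].
no_relation = predicate_labels_to_ids("".split(" "), label_map)
one_relation = predicate_labels_to_ids(["出生地"], label_map)
print(no_relation)   # [0, 0, 0]
print(one_relation)  # [0, 1, 0]
```

Whether such sentences should be kept as all-zero examples or dropped from the training file is a separate design decision that the thread does not settle.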

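The same relation appears several times in one sentence

For example, in the sentence "张三的国籍是**,李四的国籍是印度" (Zhang San's nationality is **, Li Si's nationality is India), the relation 国籍 (nationality) has to be predicted twice. In the relation classification model, should the corresponding label be 国籍, or 国籍,国籍?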
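
Judging only from the traceback line predicate_label_ids[predicate_label_map[label]] = 1 quoted in the KeyError issues, the classification target appears to be a multi-hot vector. If that reading is right, listing a relation once or twice produces the same target. A small sketch under that assumption (to_multi_hot and the label list are hypothetical, not taken from the repository):

```python
# Hypothetical subset of relation labels, for illustration only.
label_list = ["国籍", "出生地", "作者"]
label_map = {label: i for i, label in enumerate(label_list)}


def to_multi_hot(labels):
    """Set the slot of each listed relation to 1 (assignment, not a count)."""
    vec = [0] * len(label_list)
    for label in labels:
        vec[label_map[label]] = 1
    return vec


once = to_multi_hot(["国籍"])
twice = to_multi_hot(["国籍", "国籍"])
print(once == twice)  # True
```

Under this encoding the classifier can only say that the relation is present; telling the two triples apart would presumably be left to the sequence labeling step.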

Problem while generating the entity-relation results

When running python produce_submit_json_file.py:

python produce_submit_json_file.py

Traceback (most recent call last):
File "produce_submit_json_file.py", line 324, in <module>
spo_list_manager = Sorted_relation_and_entity_list_Management(TEST_DATA_DIR, MODEL_OUTPUT_DIR, Competition_Mode=Competition_Mode)
File "produce_submit_json_file.py", line 133, in __init__
File_Management.__init__(self, TEST_DATA_DIR=TEST_DATA_DIR, MODEL_OUTPUT_DIR=MODEL_OUTPUT_DIR, Competition_Mode=Competition_Mode)
File "produce_submit_json_file.py", line 82, in __init__
self.MODEL_OUTPUT_DIR = get_latest_model_predict_data_dir(MODEL_OUTPUT_DIR)
File "produce_submit_json_file.py", line 22, in get_latest_model_predict_data_dir
if not os.path.exists(new_ckpt_dir):
UnboundLocalError: local variable 'new_ckpt_dir' referenced before assignment

The error is: local variable 'new_ckpt_dir' referenced before assignment
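
An UnboundLocalError like this usually means the variable is assigned only inside a loop or branch that never ran, for example when the model output directory contains no prediction subdirectories yet. The body of get_latest_model_predict_data_dir is not shown in the thread, so the following is only a sketch of a more defensive lookup. The name get_latest_subdir and the assumption that subdirectory names sort in chronological order are hypothetical.

```python
import os
import tempfile


def get_latest_subdir(parent_dir):
    """Return the last subdirectory of parent_dir in sorted name order.

    Raises FileNotFoundError with a clear message instead of falling
    through to an unassigned variable.
    """
    if not os.path.isdir(parent_dir):
        raise FileNotFoundError(f"Output directory does not exist: {parent_dir}")
    subdirs = sorted(
        name for name in os.listdir(parent_dir)
        if os.path.isdir(os.path.join(parent_dir, name))
    )
    if not subdirs:
        raise FileNotFoundError(f"No prediction subdirectories in: {parent_dir}")
    return os.path.join(parent_dir, subdirs[-1])


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "ckpt10000"))
    os.makedirs(os.path.join(root, "ckpt22000"))
    latest_name = os.path.basename(get_latest_subdir(root))
    print(latest_name)  # ckpt22000
```

If this raises, the likely next step is to check that the prediction scripts were run and wrote their output where produce_submit_json_file.py expects it.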

run_sequence_labeling evaluation error

Hello, did you get the following error when running evaluation in run_sequence_labeling?
TypeError: Values of eval_metric_ops must be (metric_value, update_op) tuples, given: Tensor("ArgMax:0", shape=(?,), dtype=int32) for key: predicate_prediction

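Cannot find the code that generates token_in.txt and the other txt files

Where are txt files such as token_in.txt and predicate_out.txt generated? I see that the first issue asks the same question. Downloading the training data works fine, but the txt files are clearly missing. Could the author provide these files and the script that generates them? Thanks.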

Dataset

Could you provide the complete training dataset? I did not take part in the competition, so I cannot download it.

Hello author, running the relation classification model raises a KeyError

File "run_predicate_classification.py", line 812, in <module>
tf.app.run()
File "D:\Anaconda3\envs\comp\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "run_predicate_classification.py", line 690, in main
train_examples, label_list, FLAGS.max_seq_length, tokenizer, train_file)
File "run_predicate_classification.py", line 381, in file_based_convert_examples_to_features
max_seq_length, tokenizer)
File "run_predicate_classification.py", line 343, in convert_single_example
label_ids = _predicate_label_to_id(label_list, label_map)
File "run_predicate_classification.py", line 367, in _predicate_label_to_id
predicate_label_ids[predicate_label_map[label]] = 1
KeyError: ''

No such file or directory: 'bin/subject_object_labeling/sequence_labeling_data/test/token_in_and_one_predicate.txt'

An error occurs when running sequence labeling model prediction:

python run_sequnce_labeling.py \
  --task_name=SKE_2019 \
  --do_predict=true \
  --data_dir=bin/subject_object_labeling/sequence_labeling_data \
  --vocab_file=pretrained_model/chinese_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=pretrained_model/chinese_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=output/sequnce_labeling_model/epochs9/model.ckpt-22000 \
  --max_seq_length=128 \
  --output_dir=./output/sequnce_infer_out/epochs9/ckpt22000

Exception:

W0202 13:40:21.592350 139693254723392 tpu_context.py:222] eval_on_tpu ignored because use_tpu is False.
Traceback (most recent call last):
  File "run_sequnce_labeling.py", line 885, in <module>
    tf.app.run()
  File "/srv/jupyterhub/envs/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/srv/jupyterhub/envs/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/srv/jupyterhub/envs/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "run_sequnce_labeling.py", line 826, in main
    predict_examples = processor.get_test_examples(FLAGS.data_dir)
  File "run_sequnce_labeling.py", line 235, in get_test_examples
    with open(os.path.join(data_dir, os.path.join("test", "token_in_and_one_predicate.txt")), encoding='utf-8') as token_in_f:
FileNotFoundError: [Errno 2] No such file or directory: 'bin/subject_object_labeling/sequence_labeling_data/test/token_in_and_one_predicate.txt'

I checked, and the bin/subject_object_labeling/sequence_labeling_data/test/ directory is empty.

Also, the script bin/prepare_data_for_labeling_infer.py does not seem to exist.

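How to run the evaluation

After generating keep_empty_spo_list_subject_predicate_object_predict_output.json, how do I run the evaluation? Should I use calc_pr.py in bin/evaluation/ directly? I called it with keep_empty_spo_list_subject_predicate_object_predict_output.json as the predict_file argument, but it reports: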
predict file is error
{"errorCode": 1, "errorMsg": "file_reading_error"}

"The system cannot find the path specified" when saving the model

Hello,

I ran into the following error when running the code:

INFO:tensorflow:Saving checkpoints for 0 into ./output/predicate_classification_model/epochs6/model.ckpt.
2019-12-15 15:31:02.192730: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:109 : Not found: Failed to create a NewWriteableFile: ./output/predicate_classification_model/epochs6/model.ckpt-0_temp_58f0ef5ae6ef4fadb59db0652cc8e3ec/part-00000-of-00001.data-00000-of-00001.tempstate4867356647866357195 : The system cannot find the path specified.

System environment: Windows 10

How can I resolve this?
Thanks!

Runtime error

AttributeError: module 'tokenization' has no attribute 'FullTokenizer'

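Running the program on specific GPUs

Hello, when I run the commands from the ReadMe, the program by default detects the currently idle GPUs and occupies all of them. If I want to run on only one particular GPU, or on a few particular GPUs, is there a way to do that?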
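
A common way to restrict CUDA-based frameworks such as TensorFlow to particular devices is the CUDA_VISIBLE_DEVICES environment variable, set before the framework initializes the GPUs. The sketch below sets it from Python; the helper name restrict_visible_gpus is hypothetical, and nothing in the thread confirms that the repository exposes its own flag for this.

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Expose only the given GPU indices to CUDA-based libraries.

    Call this before importing TensorFlow, since the variable is read
    when CUDA is initialized.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Example: make only GPU 0 visible to this process.
visible = restrict_visible_gpus([0])
print(visible)  # 0
```

The same effect can be had from the shell by prefixing the command, e.g. CUDA_VISIBLE_DEVICES=0,1 python run_predicate_classification.py followed by the usual flags.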

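Hello author, the code fails to run

Does sequence_labeling_data_manager.py lack handling for the test data?
I cannot find a test dataset that run_sequence_labeling.py can run on, i.e. I cannot find the file test/token_in_and_one_predicate.txt. Did I miss a step? Thanks.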

How is the postag embedding done?

Does the postag embedding embed the postag corresponding to each character or to each word? For example, for {"word": "的", "pos": "u"}, is "u" what gets embedded?

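Question about prediction performance

Hello, I ran this code with 6 epochs of relation classification and 9 epochs of sequence labeling, and the resulting predictions reach an F1 of only 0.67. What could be the problem? This was trained on my own data, which is of similar size and has 60-odd relation types. I also trained on the official data, and F1 was still below 0.7. Is something wrong somewhere?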

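Original paper for **

Hello author! Is there a related paper for this ** of doing relation classification first and then recognizing the entities?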

Accuracy problem: with the same data as yours, accuracy is only a little over 50%

Details are in the screenshots below. Precision and recall:
precision:0.5743
recall:0.5948
F1-score:0.5844
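
As a quick consistency check, the reported F1 follows from the reported precision and recall through the usual harmonic-mean formula F1 = 2PR / (P + R). The function name f1_score below is just a local helper for this check.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_f1 = round(f1_score(0.5743, 0.5948), 4)
print(reported_f1)  # 0.5844
```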

Exact-match and partial-match accuracy of the entity-relation classification:
(The image failed to upload; its contents are as follows.)

correct_line: 509, line: 1000, percentage: 50.9000%

superset_line: 141, line: 1000, percentage: 14.1000%

subset_line: 254, line: 1000, percentage: 25.4000%

I cannot figure out the cause. Any advice would be appreciated!

Errors occur when I run your code

Some errors occur when I run eval. Can you help me?

Here are the details:
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
File "run_predicate_classification.py", line 812, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "run_predicate_classification.py", line 741, in main
result = estimator.evaluate(input_fn=eval_input_fn, steps=eval_steps)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2424, in evaluate
rendezvous.raise_errors()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/tpu/python/tpu/error_handling.py", line 128, in raise_errors
six.reraise(typ, value, traceback)
File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2418, in evaluate
name=name
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 478, in evaluate
return _evaluate()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 460, in _evaluate
self._evaluate_build_graph(input_fn, hooks, checkpoint_path))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1484, in _evaluate_build_graph
self._call_model_fn_eval(input_fn, self.config))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1520, in _call_model_fn_eval
features, labels, model_fn_lib.ModeKeys.EVAL, config)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2195, in _call_model_fn
features, labels, mode, config)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2479, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1259, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1538, in _call_model_fn
return estimator_spec.as_estimator_spec()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 330, in as_estimator_spec
prediction_hooks=self.prediction_hooks + hooks)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/model_fn.py", line 236, in __new__
'tuples, given: {} for key: {}'.format(value, key))
TypeError: Values of eval_metric_ops must be (metric_value, update_op) tuples, given: Tensor("Abs:0", shape=(), dtype=float32) for key: eval_accuracy

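Questions about preprocessing and determining entity types

Hello, and thank you for providing the code. While studying it I ran into two points I do not understand:

  1. During preprocessing, when the BERT tokenizer produces strings starting with ##, you replace them with [##WordPiece]. I do not quite understand the purpose of this step. I think it could be omitted, since that would reduce the number of label types for the second model and make the predictions easier to post-process.
  2. When one relation allows several entity types, is it then impossible to determine which entity type under that relation each of the two entities belongs to?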

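Questions about relation extraction

Hello, I have read your code and there are two main points I do not understand and would like to ask about:

  1. In the classification model, the final output of the model in modeling has shape batch_size × 128 × 768, where 128 is the sentence length. Why is only the first token used as the basis for classification? Are the later tokens not needed?
  2. In the relation extraction model, I see that the code trains the label prediction and the BIO tagging of the text separately, which means the predicted label does not depend on the BIO tags of the text. But isn't the usual approach to tag the text with BIO labels first, and then train on the text together with the BIO tags to obtain the predicted label?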
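
On the first question: in the standard BERT classification setup, the classifier reads the hidden state at position 0, the [CLS] token. Self-attention lets that position draw on every other token in the sentence, so the remaining positions are not discarded so much as summarized. The sketch below only illustrates the slicing with plain Python lists and tiny dimensions; the shapes in the issue are 128 and 768, the helper name first_token_vectors is hypothetical, and whether the repository adds anything on top of this is not confirmed in the thread.

```python
# Toy dimensions; the issue mentions seq_len = 128 and hidden = 768.
batch_size, seq_len, hidden = 2, 4, 3

# Fake encoder output of shape [batch_size, seq_len, hidden].
# Each value encodes its own (batch, token, hidden) position for readability.
sequence_output = [
    [[float(b * 100 + t * 10 + h) for h in range(hidden)] for t in range(seq_len)]
    for b in range(batch_size)
]


def first_token_vectors(seq_out):
    """Select the position-0 ([CLS]) vector of every example."""
    return [example[0] for example in seq_out]


cls_vectors = first_token_vectors(sequence_output)
print(len(cls_vectors), len(cls_vectors[0]))  # 2 3
print(cls_vectors[1])  # [100.0, 101.0, 102.0]
```

The result has shape [batch_size, hidden], which is what a classification layer over the relation labels would take as input.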
