macanv / bert-bilstm-crf-ner

TensorFlow solution for the NER task using a BiLSTM-CRF model with Google BERT fine-tuning, plus private server services

Home Page: https://github.com/macanv/BERT-BiLSMT-CRF-NER

Python 96.96% Shell 0.01% Perl 3.03%

bert bert-bilstm-crf blstm crf named-entity-recognition ner


bert-bilstm-crf-ner's Issues

do_train=True ?

python3 bert_lstm_ner.py \
    --task_name="NER" \
    --do_train=True \
    --do_eval=True \
    --do_predict=True \
    --data_dir=NERdata \
    --vocab_file=checkpoint/vocab.txt \
    --bert_config_file=checkpoint/bert_config.json \
    --init_checkpoint=checkpoint/bert_model.ckpt \
    --max_seq_length=128 \
    --train_batch_size=32 \
    --learning_rate=2e-5 \
    --num_train_epochs=3.0 \
    --output_dir=./output/result_dir/

Where is bert_lstm_ner.py?

Where is bert_lstm_ner.py? I want to train on my own data but could not find it. Is the file that gets run the bert_lstm_ner.py under the Python 3.6 site-packages directory?

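Model convergence

A question for everyone: after how many epochs does this model start to converge, and how large is the loss on the training set? I have recently been doing NER on more than ten million samples and the model never converges. Even after one full pass over all the data there is no sign of convergence. I would like to compare notes.

Model training problem

Hello, I ran into the following error when training a model with the new version and cannot find the cause. Please help: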
2019-02-11 02:01:02.791404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-02-11 02:01:03.806589: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-11 02:01:03.806652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-02-11 02:01:03.806674: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-02-11 02:01:03.806956: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2019-02-11 02:01:03.807062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10754 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
2019-02-11 02:02:25.470659: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
shape of input_ids (?, 128)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/metrics_impl.py:259: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/bert_base/train/tf_metrics.py:141: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Traceback (most recent call last):
File "/usr/local/bin/bert-base-ner-train", line 10, in <module>
sys.exit(train_ner())
File "/usr/local/lib/python3.6/dist-packages/bert_base/runs/__init__.py", line 37, in train_ner
train(args=args)
File "/usr/local/lib/python3.6/dist-packages/bert_base/train/bert_lstm_ner.py", line 616, in train
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 471, in train_and_evaluate
return executor.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 610, in run
return self.run_local()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 711, in run_local
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1183, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1217, in _train_model_default
saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1411, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 788, in __exit__
self._close_internal(exception_type)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 821, in _close_internal
h.end(self._coordinated_creator.tf_sess)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/basic_session_run_hooks.py", line 588, in end
self._save(session, last_step)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/basic_session_run_hooks.py", line 607, in _save
if l.after_save(session, step):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 517, in after_save
self._evaluate(global_step_value) # updates self.eval_result
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 537, in _evaluate
self._evaluator.evaluate_and_export())
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 912, in evaluate_and_export
hooks=self._eval_spec.hooks)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 474, in evaluate
return _evaluate()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 460, in _evaluate
self._evaluate_build_graph(input_fn, hooks, checkpoint_path))
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1424, in _evaluate_build_graph
self._call_model_fn_eval(input_fn, self.config))
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1460, in _call_model_fn_eval
features, labels, model_fn_lib.ModeKeys.EVAL, config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1171, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/bert_base/train/bert_lstm_ner.py", line 421, in model_fn
eval_metric_ops=eval_metrics
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/model_fn.py", line 194, in __new__
raise ValueError('Missing loss.')
ValueError: Missing loss.

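Handling of spaces when reading data seems to be buggy

If a character in the original text is itself a space, the read_data function that loads the data goes wrong.

This is the original data:

This is the result after prediction:

The cause is probably that both read_data and convert_single_example split the string on spaces. Spaces need to be replaced in the corpus beforehand.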
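One possible workaround can be sketched in plain Python. This is only an illustration under assumptions, not the repository's code: it assumes one "token label" pair per line separated by a single space, and both parse_line and SPACE_PLACEHOLDER are hypothetical names introduced here.

```python
# Hypothetical placeholder; any string that cannot occur in the corpus would do.
SPACE_PLACEHOLDER = "[SPACE]"


def parse_line(line):
    """Split one 'token label' line, taking the label from the right-hand side.

    Splitting from the right keeps the label intact even when the token
    itself is a space character.
    """
    line = line.rstrip("\r\n")
    token, separator, label = line.rpartition(" ")
    if not separator:
        raise ValueError("expected 'token label', got %r" % line)
    if token.strip() == "":
        # The token was a space (or empty), so substitute the placeholder.
        token = SPACE_PLACEHOLDER
    return token, label


print(parse_line("中 B-LOC"))
print(parse_line("  O"))
```

The placeholder would also have to be handled consistently at prediction time, for example by mapping it back to a space when results are written out.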

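Can this only run on a TPU?

When I ran it, I found that it only works on a TPU.

The feed-dict part of real-time prediction is quite slow. How can it be optimized?

It is running now!

After changing train_batch_size to 16, I got it running on a GTX 1080 Ti GPU with 11 GB of memory and reached 99% accuracy. Thanks!

Comparing results with and without BLSTM-CRF

Has anyone tried it without the BiLSTM-CRF? How does it perform?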

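Question about training my own model

Suppose I want to do NER on plant and animal names. Roughly how much annotated text is needed for the training set?

I have annotated plant names (B-PLANT, I-PLANT) and animal names (B-AN, I-AN) in a dozen or so articles, appended them to NERdata/train.txt, and trained a model with bert_lstm_ner.py.
When I then use the model to extract plant and animal names, it recognizes ORG, LOC, and PER but never PLANT or AN.
I am not sure what the problem is. Is the training set too small?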

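Is tf.sequence_mask used incorrectly?

Hello author, in the metric_fn function in bert_lstm_ner.py, is weight=tf.sequence_mask(FLAGS.max_seq_length) a mistake?
The official API is tf.sequence_mask(lengths, maxlen=None, dtype=tf.bool, name=None). It should receive a lengths argument, that is, a list of batch_size entries recording the true length of each sequence in the batch.
Looking forward to your reply!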
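To make the point about the lengths argument concrete, here is a plain-Python sketch of the masking behaviour described in the API signature above. It is not TensorFlow code; the sequence_mask function below is a stand-in written for illustration that returns nested lists of booleans.

```python
def sequence_mask(lengths, maxlen=None):
    """Plain-Python illustration of a per-sequence mask.

    Row i is True for the first lengths[i] positions and False afterwards.
    """
    if maxlen is None:
        maxlen = max(lengths)
    return [[position < length for position in range(maxlen)]
            for length in lengths]


# Two sequences with true lengths 1 and 3, padded to 4 positions.
for row in sequence_mask([1, 3], maxlen=4):
    print(row)
```

With per-example lengths the padded positions are masked out. If every length simply equals the maximum sequence length, every position is True and nothing is masked, which appears to be the concern raised in this issue.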

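Question about the BERT optimizer

@macanv Hello. When I call optimization.create_optimizer() from BERT, an error is raised from inside that function. Did you run into this during the project?

droupout typo

This is a very minor issue, just FYI: dropout is misspelled as droupout.

Is there a comparison without the LSTM layer?

As in the title.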

How can I train this model with GPU=8 to avoid OOM problem?

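Performance does not improve in use

I am experimenting on the CoNLL dataset. With the number of epochs set to 3, 10, and 20, performance stays stuck at around 73% and does not improve. The dataset has been converted to the required format and everything else follows the tutorial, but I cannot train it to a better result.

Hello, I configured it to run on my computer's GPU, so why does it keep running on the CPU?

About the ORI data

What were the evaluation metrics on this data originally, before BERT was added?

How much GPU memory is needed to run this?

My computer has 8 GB and cannot run the model. Would anyone be willing to give me some guidance? WeChat: ST1762393631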

Redundant work

BERT-BiLSTM-CRF-NER/bert_lstm_ner.py

Line 264 in 4c7c7f7

label_map[label] = i

A label_map could be built once outside the method and passed in as an argument, so there is no need to rebuild the label-to-id map from label_list for every example.
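The suggestion can be sketched as follows. This is a minimal illustration rather than the repository's code: build_label_map and labels_to_ids are hypothetical helper names, and starting the ids at 1 is an assumption made for the example.

```python
def build_label_map(label_list, start=1):
    """Build the label-to-id mapping once. The start index is an assumption."""
    return {label: index for index, label in enumerate(label_list, start)}


def labels_to_ids(labels, label_map):
    """Convert one example's labels using a mapping that was built elsewhere."""
    return [label_map[label] for label in labels]


label_list = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
label_map = build_label_map(label_list)  # built once, outside the per-example loop

examples = [["B-PER", "I-PER", "O"], ["O", "B-LOC"]]
for labels in examples:
    print(labels_to_ids(labels, label_map))
```

The saving per example is small, since the label list is short, but passing the map in also guarantees that every example uses the same mapping.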

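Training for only a few epochs

When training for only a few epochs, say 3, has anyone seen tokens go missing?
For example, test data:
word1，tag1
word2，tag2
word3，tag3

At prediction time it becomes:
word1，tag1'
word2，tag2'
..
word3 is gone

Fix for the 'idx out of range' error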

line_token = [x for x in str(predict_line.text)]

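What if max_seq_length is not enough?

Some sentences are long and a length of 128 is nowhere near enough. Setting it to 500 (the corpus has even longer ones) causes OOM because the machine runs out of memory. How can this be handled when max_seq_length is too small, and what are the downsides of not covering the whole sentence?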
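One common workaround, sketched here under assumptions rather than taken from the repository, is to split long inputs into shorter pieces before feeding them to the model, cutting after sentence-final punctuation where possible. The function name split_long_sequence and the punctuation set are illustrative choices. Note that BERT inputs also carry [CLS] and [SEP], so the usable budget would be max_seq_length minus 2.

```python
# Punctuation treated as a safe place to cut; this set is an assumption.
SENTENCE_END = set("。？！；.?!;")


def split_long_sequence(tokens, max_tokens):
    """Split tokens into chunks of at most max_tokens items.

    Each cut is placed after the last sentence-final punctuation mark inside
    the current window; if there is none, the window is cut at max_tokens.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    chunks = []
    start = 0
    while len(tokens) - start > max_tokens:
        window_end = start + max_tokens
        cut = window_end
        for position in range(window_end - 1, start, -1):
            if tokens[position] in SENTENCE_END:
                cut = position + 1
                break
        chunks.append(tokens[start:cut])
        start = cut
    if start < len(tokens):
        chunks.append(tokens[start:])
    return chunks


text = list("今天天气好。我们去公园！然后回家")
for chunk in split_long_sequence(text, max_tokens=8):
    print("".join(chunk))
```

For training data the label sequence would need to be cut at the same positions. An entity that straddles a hard cut can still be damaged, which is one downside of any truncation or splitting scheme.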

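Training NER from the command line fails with AttributeError: module 'tensorflow.data' has no attribute 'experimental'. TensorFlow version is 1.10.0; 1.11.0 and 1.12.0 were also tried and give the same error

Hello, I ran into the following problem and would appreciate your help. Thanks!
1. I tried the command-line approach and keep getting the error below. I am using TensorFlow 1.10.0, and I also tried 1.11.0 and 1.12.0, none of which work. I am using the original training corpus train.txt that you provided. The error output is: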
Traceback (most recent call last):
File "/home/software/anaconda3/bin/bert-base-ner-train", line 11, in <module>
sys.exit(train_ner())
File "/home/software/anaconda3/lib/python3.6/site-packages/bert_base/runs/__init__.py", line 37, in train_ner
train(args=args)
File "/home/software/anaconda3/lib/python3.6/site-packages/bert_base/train/bert_lstm_ner.py", line 616, in train
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
File "/home/software/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 451, in train_and_evaluate
return executor.run()
File "/home/software/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 590, in run
return self.run_local()
File "/home/software/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 691, in run_local
saving_listeners=saving_listeners)
File "/home/software/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/software/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/software/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1167, in _train_model_default
input_fn, model_fn_lib.ModeKeys.TRAIN))
File "/home/software/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1011, in _get_features_and_labels_from_input_fn
result = self._call_input_fn(input_fn, mode)
File "/home/software/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1100, in _call_input_fn
return input_fn(**kwargs)
File "/home/software/anaconda3/lib/python3.6/site-packages/bert_base/train/bert_lstm_ner.py", line 329, in input_fn
d = d.apply(tf.data.experimental.map_and_batch(lambda record: _decode_record(record, name_to_features),
AttributeError: module 'tensorflow.data' has no attribute 'experimental'

UnboundLocalError = RCV1 attempt

I'm looking at ways to fine tune with RCV1 as opposed to the supplied dev.txt, train.txt and test.txt.

I also use cased_L-24_H-1024_A-16 as opposed to the Chinese provided one.

Whether I rename the three new files (RCV1, as found and compiled here in the second reply) to dev.txt, train.txt and test.txt, or change the file names in bert_lstm_ner.py from dev.txt, train.txt and test.txt to eng.testa, eng.testb and eng.train, I get the following error:

UnboundLocalError: local variable 'word' referenced before assignment

Any ideas on how to resolve this?

If I simply follow the readme then everything works fine, only once I make a change as mentioned above do I face the problem.

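Evaluation metrics

@macanv Hello. I am a bit confused about the evaluation metrics in your project. From the code, F1 seems to be computed per tag (for example, B-LOC, I-LOC, B-ORG, and I-ORG are each scored separately), but NER is usually scored per entity type (such as LOC and ORG). The correct evaluation script bundled with your project, conlleval, does not seem to be used.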
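The difference between tag-level and entity-level scoring can be shown with a small plain-Python sketch. It is a simplified illustration, not a reimplementation of the conlleval script: an I- tag that does not continue an open entity of the same type is ignored here, and the helper names extract_entities and entity_f1 are hypothetical.

```python
def extract_entities(tags):
    """Return the set of (type, start, end) spans in a BIO tag sequence.

    The end index is exclusive.
    """
    entities = set()
    start = None
    current_type = None
    # A trailing "O" sentinel closes any entity still open at the end.
    for index, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and current_type == tag[2:]
        if current_type is not None and not continues:
            entities.add((current_type, start, index))
            current_type = None
        if tag.startswith("B-"):
            current_type, start = tag[2:], index
    return entities


def entity_f1(gold_tags, predicted_tags):
    """Entity-level precision, recall and F1: a span counts only if it matches exactly."""
    gold = extract_entities(gold_tags)
    predicted = extract_entities(predicted_tags)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["B-LOC", "I-LOC", "O", "B-ORG"]
predicted = ["B-LOC", "O", "O", "B-ORG"]
print(entity_f1(gold, predicted))
```

In this toy example three of the four tags are correct, yet only one of the two gold entities is recovered exactly, so entity-level scores come out lower than a per-tag view suggests.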

Test and validation results are nan

INFO:tensorflow:***** Eval results *****
INFO:tensorflow: eval_f = nan
INFO:tensorflow: eval_precision = nan
INFO:tensorflow: eval_recall = nan
INFO:tensorflow: global_step = 327
INFO:tensorflow: loss = 2.0464203

INFO:tensorflow:***** Predict results *****
INFO:tensorflow: eval_f = nan
INFO:tensorflow: eval_precision = nan
INFO:tensorflow: eval_recall = nan
INFO:tensorflow: global_step = 327
INFO:tensorflow: loss = 2.4771109

entity level
processed 214543 tokens with 7450 phrases; found: 7740 phrases; correct: 6676.
accuracy: 98.83%; precision: 86.25%; recall: 89.61%; FB1: 87.90
LOC: precision: 87.66%; recall: 89.81%; FB1: 88.72 3549
ORG: precision: 77.08%; recall: 85.69%; FB1: 81.15 2408
PER: precision: 95.85%; recall: 93.90%; FB1: 94.87 1783

Runtime error: no checkpoint/bert_config.json file

c_api.TF_GetCode(self.status.status))

tensorflow.python.framework.errors_impl.NotFoundError: NewRandomAccessFile failed to Create/Open: checkpoint/bert_config.json : The system cannot find the file specified.
; No such file or directory

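Problems when converting NER to classification mode

I trained a binary classification model with the original BERT code, intended for intent recognition. For online prediction on CPU, however, a single prediction takes over 4 s, so I wanted to try this project instead.
Starting the service succeeds.

But when predicting, it hangs here indefinitely.

The input gets printed, but it never proceeds to the next step.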

label_list.pkl is missing

[Errno 2] No such file or directory: 'F:\BERT-BiLSTM-CRF-NER-master\output\label_list.pkl'
This happened while running bert_lstm_ner.py.

Should the comparison between idx and len_seq be >=?

BERT-BiLSTM-CRF-NER/bert_lstm_ner.py

Line 786 in 38be088

if idx > len_seq:

Please help me. How can I solve this error?

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[8192,3072] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node bert/encoder/layer_11/intermediate/dense/mul_1}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bert/encoder/layer_11/intermediate/dense/BiasAdd, bert/encoder/layer_11/intermediate/dense/mul)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[{{node crf_loss/Mean/_4197}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2965_crf_loss/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1117      G   /usr/lib/xorg/Xorg                           263MiB |
|    0      2047      G   compiz                                       135MiB |
|    0     19793      G   ...-token=92AF324520652561B673766DB2684322    58MiB |
+-----------------------------------------------------------------------------+

I ran this program in the terminal.

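Two predictions from the same trained model give inconsistent results

As in the title. Has anyone else seen this?

A small suggestion about entity labels

First of all, thanks to the author for sharing. The entity labels are currently obtained by reading the train file. I suggest exposing a configuration option or an interface so that users can set them as needed, mainly because custom training often involves custom labels.

What hardware configuration was used?

I tried training on an 8 GB GPU and got OOM (batch_size: 32).

Still not finished after running for 24 hours

The training and test sets are both fairly large: over 80 MB and 70 MB.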

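The code cloned from https://github.com/macanv/BERT-BiLSTM-CRF-NER master is missing a lot

As in the title, the code cloned from https://github.com/macanv/BERT-BiLSTM-CRF-NER master is missing a lot: there is no bert, train, NERdata, and so on. Could you take a look? Did I download it incorrectly?

My local device is a Titan Xp and the code fails with an OOM error. Are there too many parameters? Has anyone run it successfully?

Is anyone else unable to load the dev and test data?

I tried running the main program, but every time the dev and test data fail to load, while train has no such problem. I checked the code and could not find anything wrong. Has anyone run into the same problem? Could someone help? I may have missed something. Many thanks.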

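Runtime error: no tf_record file

Why is this? I am a beginner and do not understand what this record is.

Results on my own dataset are not very good

Results from running on the provided default dataset:

Results from running on my own dataset are not very good:

Why? Has anyone else seen the same?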

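The true sequence length should be computed from input_mask, not input_ids

When calling BLSTM + CRF, a list of true sequence lengths must be passed in, but the code derives it from input_ids:
used = tf.sign(tf.abs(input_ids))
lengths = tf.reduce_sum(used, reduction_indices=1)
Shouldn't this be changed to:
used = tf.sign(tf.abs(input_mask))
lengths = tf.reduce_sum(used, reduction_indices=1)
In input_mask, real tokens are marked with 1 and padding with 0, so reduce_sum then gives the true sequence length.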
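The two ways of computing lengths can be compared on toy data with a plain-Python sketch. It is not TensorFlow code, the token ids are made up for illustration, and the helper names are hypothetical. Counting nonzero ids mirrors summing sign(abs(input_ids)); summing the mask mirrors the proposed change.

```python
def lengths_from_ids(batch_input_ids):
    """Count nonzero ids per row, like summing sign(abs(input_ids))."""
    return [sum(1 for token_id in row if token_id != 0)
            for row in batch_input_ids]


def lengths_from_mask(batch_input_mask):
    """Sum the 0/1 mask per row."""
    return [sum(row) for row in batch_input_mask]


# Case 1: the padding id is 0 and no real token has id 0.
input_ids = [[101, 2769, 102, 0, 0]]
input_mask = [[1, 1, 1, 0, 0]]
print(lengths_from_ids(input_ids), lengths_from_mask(input_mask))

# Case 2 (hypothetical vocabulary): a real token also has id 0.
input_ids = [[5, 0, 7, 0]]
input_mask = [[1, 1, 1, 0]]
print(lengths_from_ids(input_ids), lengths_from_mask(input_mask))
```

The two agree whenever 0 is used only for padding, which is believed to hold for the released BERT vocabularies where [PAD] sits at index 0. Using input_mask does not depend on that assumption.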

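Results on the CoNLL2003 dataset

@macanv Hello, have you tested on the CoNLL2003 dataset?

Where should the downloaded Chinese pre-trained BERT files go?

Should the chinese_L-12_H-768_A-12 folder be placed under the BERT-BiLSTM-CRF-NER-master folder?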

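You can try adding this condition to fix it

https://github.com/macanv/BERT-BiLSMT-CRF-NER/blob/d5054b26aac0a51adbf6f41bb5b0042af28e6b52/bert_lstm_ner.py#L784
For the index-out-of-range bug above, you can add the following condition at line 227 to do simple sentence splitting:
if len(contends) == 0 and words[-1] in ['。', '？', '！', '.']
You can give it a try.

Is there a limit on the input sentence length at test time?

A beginner's question.
For example:
xxxx<place name>xxxx
xxxxxxxxx，xxxx<place name>xxxx
In the second sentence, the place name is not recognized.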

Error while running

Hello, I get the error below. Did you forget to upload the data.conf file, or do I need to configure it in some way?
Traceback (most recent call last):
File "bert_lstm_ner.py", line 815, in <module>
tf.app.run()
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "bert_lstm_ner.py", line 724, in main
with codecs.open(FLAGS.data_config_path, 'a', encoding='utf-8') as fd:
File "/usr/local/anaconda3/lib/python3.6/codecs.py", line 897, in open
file = builtins.open(filename, mode, buffering)
FileNotFoundError: [Errno 2] No such file or directory: '/home/wangyingshuai/BERT-BiLSMT-CRF-NER/data.conf'