fuyanzhe2 / name-entity-recognition


LSTM-CRF, Lattice-CRF, and BERT-NER implementations, plus a follow-up of recent NER papers

Python 94.28% Shell 0.14% Perl 5.57%

name-entity-recognition's Introduction

Name-Entity-Recognition


ChineseNER: Chinese NER

TensorFlow 1.4.0

Usage:

python3 main.py

For detailed usage, the underlying principles, and experimental results, see the blog post: https://www.jianshu.com/p/aed50c1b2930

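fyz_lattice_NER: Chinese NER lattice model

PyTorch 0.4.0
Python 3.6
Usage:

python3 main.py

Alternatively, configure the script and run it directly: bash fyz_run_decode.sh

For detailed usage, the underlying principles, and experimental results, see the blog post.
Download link for the two word-embedding files the code requires:
Extraction code: vgwi

Unzip the files and place them in the data/ folder.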

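BERT-BiLSTM-CRF-NER

TensorFlow 1.11.0
Usage:

Download the Chinese BERT model.
Unzip it into the checkpoint directory.
Run:
python3 main.py (command-line arguments can also be set as defined in the code)
For detailed instructions on using the code, see the blog post: https://www.jianshu.com/p/b05e50f682dd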


name-entity-recognition's Issues

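Error after switching to my own dataset

Hello, when I replace the dataset with my own I get the error below. How can I fix it, or what is causing it?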

    train(data, save_model_dir,dset_dir, seg)
  File "C:/Users/11/Desktop/Name-Entity-Recognition-master/Name-Entity-Recognition-master/fyz_lattice_NER/main.py", line 310, in train
    print("     Instance: %s; Time: %.2fs; loss: %.4f; acc: %s/%s=%.4f"%(end, temp_cost, sample_loss, right_token, whole_token, (right_token+0.)/whole_token))
ZeroDivisionError: float division by zero
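The division fails because `whole_token` is 0 when the progress line is printed, which suggests that no tokens had been counted at that point. One possible cause is that the custom data was parsed into empty instances, but that is a guess and is not confirmed in this thread. A minimal sketch of a guarded accuracy computation follows; `safe_accuracy` is a hypothetical helper, not part of the repository.

```python
def safe_accuracy(right_token, whole_token):
    """Return right_token / whole_token, or 0.0 when nothing was counted."""
    if whole_token == 0:
        # No tokens were evaluated, so report 0.0 instead of raising.
        return 0.0
    return right_token / float(whole_token)


# The same format string as the failing print, now safe for empty counts.
print("acc: %s/%s=%.4f" % (0, 0, safe_accuracy(0, 0)))
print("acc: %s/%s=%.4f" % (3, 4, safe_accuracy(3, 4)))
```

A guard like this only hides the symptom. If `whole_token` stays at 0, the data file format (separator, column order, blank lines between sentences) is probably worth checking first.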

Why the +1 in num_labels=len(label_list) + 1?

Why does the model output one more class than label_list contains?
https://github.com/FuYanzhe2/Name-Entity-Recognition/blob/master/BERT-BiLSTM-CRF-NER/bert_lstm_ner.py#L652
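One common reason for the extra class in BERT NER scripts is that label ids start at 1 so that id 0 can stand for padding positions. Whether the linked line follows that convention is an assumption here, not something confirmed in this thread. The sketch below illustrates the idea; `build_label_map` and the label list are hypothetical.

```python
def build_label_map(label_list):
    """Map each label to an id starting at 1, keeping 0 free for padding."""
    return {label: i for i, label in enumerate(label_list, 1)}


# Hypothetical label set, for illustration only.
label_list = ["O", "B-PER", "I-PER", "[CLS]", "[SEP]"]
label_map = build_label_map(label_list)

# Ids run from 1 to len(label_list), so the output layer needs
# len(label_list) + 1 classes to cover id 0 as well.
num_labels = len(label_list) + 1
print(label_map, num_labels)
```

Under this convention the largest label id equals `num_labels - 1`, which is why the classifier needs one more output than there are labels.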

Dataset

Hello, where can I download the dataset for fyz_lattice_NER (the Chinese NER lattice model)?
Thanks

What is global_step/sec, and how do I stop it from being printed?

The output contains many of these log lines. What are they for? @FuYanzhe2 @fuyanzhe
I0823 07:37:37.097895 139764188952448 tpu_estimator.py:2159] global_step/sec: 0.522892
I0823 07:37:37.098161 139764188952448 tpu_estimator.py:2160] examples/sec: 16.7326
I0823 07:37:39.207139 139764188952448 tpu_estimator.py:2159] global_step/sec: 0.
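These lines are INFO-level throughput reports: `global_step/sec` is the number of training steps completed per second, and `examples/sec` is the number of examples processed per second. A minimal sketch for hiding them, assuming the TensorFlow 1.x build in use routes its messages through Python's standard `logging` module under the logger name `tensorflow`:

```python
import logging

# Assumption: TensorFlow logs through the standard logger named "tensorflow".
tf_logger = logging.getLogger("tensorflow")

# Keep warnings and errors, drop INFO lines such as global_step/sec.
tf_logger.setLevel(logging.WARNING)
```

In TensorFlow 1.x the same effect is usually achieved with `tf.logging.set_verbosity(tf.logging.WARN)` placed before training starts.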

My own dataset

When applying your model to my own dataset, do I no longer need to supply my own annotated entity tags?

Hello, I went through your named entity recognition code. During evaluation I get a utf8 error. If I change the evesteps value to 1 I get a result, but it is nan. There is also an out-of-index error. How can I fix these?

Is the interface in tf_metrics.py actually usable?

I called this interface and everything it returns is nan. What should I do?

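Why is eval_loss redefined during evaluation, and how does it differ from total_loss?

Why is eval_loss redefined during evaluation, and what does the original total_loss represent at that point?
The code in question:

Modified for NER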

        def metric_fn(label_ids, logits, trans):
            # First, run Viterbi decoding on the results
            # CRF decoding

            weight = tf.sequence_mask(FLAGS.max_seq_length)
            precision = tf_metrics.precision(label_ids, pred_ids, num_labels, [2, 3, 4, 5, 6, 7], weight)
            recall = tf_metrics.recall(label_ids, pred_ids, num_labels, [2, 3, 4, 5, 6, 7], weight)
            f = tf_metrics.f1(label_ids, pred_ids, num_labels, [2, 3, 4, 5, 6, 7], weight)

            return {
                "eval_precision": precision,
                "eval_recall": recall,
                "eval_f": f,
                # "eval_loss": loss,
            }

        eval_metrics = (metric_fn, [label_ids, logits, trans])
        # eval_metrics = (metric_fn, [label_ids, logits])
        output_spec = tf.contrib.tpu.TPUEstimatorSpec(
            mode=mode,
            loss=total_loss,
            eval_metrics=eval_metrics,
            scaffold_fn=scaffold_fn)  #

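fyz_lattice_NER fails whether I run main.py or fyz_run_decode.sh

I have already downloaded the lattice word embeddings into fyz_lattice_NER as the md file describes. The working directory tree is as follows:

But whether I run main.py or fyz_run_decode.sh, it reports a missing file. The output is as follows: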

result of lattice LSTM

Thank you for your work on modifying the lattice LSTM code! I have tried your lattice LSTM code, but my F1 score only reaches 0.82, much lower than other models on the same dataset. Could you share your own results? Thanks.

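Why is the CRF loss negative?

In bert_lstm_ner.py in the BERT-BiLSTM-CRF-NER folder, the CRF loss becomes negative after many iterations. I don't know why this happens and would appreciate an explanation. Also, why is num_labels=len(label_list)+1? I have a vague feeling that the negative loss is related to the number of labels.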
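For reference, a linear-chain CRF loss is the negative log-likelihood `log Z - score(gold)`. It cannot be negative as long as the gold path is one of the paths summed in `Z`. It can go negative when the two terms are computed over different lengths, for example when the sequence length passed to the CRF does not match the labelled tokens. Whether that is what happens in bert_lstm_ner.py is not confirmed here. The brute-force sketch below only illustrates the arithmetic; all function names and numbers are hypothetical.

```python
import itertools
import math


def path_score(emissions, transitions, path):
    """Sum of emission and transition scores along one label path."""
    score = sum(emissions[t][y] for t, y in enumerate(path))
    score += sum(transitions[a][b] for a, b in zip(path, path[1:]))
    return score


def log_partition(emissions, transitions):
    """log Z, computed by enumerating every label path (tiny inputs only)."""
    num_labels = len(transitions)
    scores = [
        path_score(emissions, transitions, path)
        for path in itertools.product(range(num_labels), repeat=len(emissions))
    ]
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))


def crf_nll(emissions, transitions, gold, norm_length=None):
    """Negative log-likelihood; norm_length simulates a wrong sequence length."""
    if norm_length is None:
        norm_length = len(emissions)
    log_z = log_partition(emissions[:norm_length], transitions)
    return log_z - path_score(emissions, transitions, gold)


emissions = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]  # 3 tokens, 2 labels
transitions = [[0.0, 0.0], [0.0, 0.0]]
gold = [0, 0, 0]

consistent = crf_nll(emissions, transitions, gold)
mismatched = crf_nll(emissions, transitions, gold, norm_length=2)
print(consistent, mismatched)
```

With matching lengths the loss is positive. With the partition function computed over only two of the three tokens, the gold score exceeds `log Z` and the loss drops below zero.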

Why do I keep getting name 'os' is not defined at runtime, even though I did import os? Any advice is appreciated, thanks.

Traceback (most recent call last):
File "main.py", line 48, in
flags.DEFINE_string("emb_file", "wiki_100.utf8", "Path for pre_trained embedding")
NameError: name 'os' is not defined

The CRF layer is not taking effect

In label_test.txt, an O tag is sometimes directly followed by an I tag.
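A CRF layer learns transition scores but does not by itself forbid an O followed by an I tag, so such sequences can still appear unless the transitions are explicitly constrained. To measure how often this happens, a small checker along these lines may help. It assumes BIO tags of the form `B-XXX`, `I-XXX`, and `O`, and `find_invalid_bio` is a hypothetical helper, not part of the repository.

```python
def find_invalid_bio(tags):
    """Return indices where an I- tag does not continue a matching B-/I- tag."""
    invalid = []
    previous = "O"  # treat the start of the sequence like an O tag
    for i, tag in enumerate(tags):
        if tag.startswith("I-"):
            entity = tag[2:]
            if previous not in ("B-" + entity, "I-" + entity):
                invalid.append(i)
        previous = tag
    return invalid


# Index 1 is an I-PER directly after O, which BIO does not allow.
print(find_invalid_bio(["O", "I-PER", "O", "B-LOC", "I-LOC"]))
```

Running this over the predicted tag column of each sentence gives a count of invalid transitions that can be compared before and after any change to the CRF layer.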

Why is there no main.py for the BERT model, and how should I run it?

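How do I extract the embedding values?

How can I read out the values of the character embeddings? I cannot call sess.run(tensor) in the original graph because I have no way to define that sess. When I create a new graph and call sess.run(tensor) there, I get: RuntimeError: The Session graph is empty. Add operations to the graph before calling run(). I want to fetch one batch of data from embeddings, but calling this function raises the error above. I have searched Baidu and Google extensively and none of the answers solved it; I cannot even tell where the problem lies. Any pointers would be appreciated. Thanks, everyone.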

Where are BERT's pre-training data and method?

Where can I find the dataset?

'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte

Traceback (most recent call last):
File "C:/Users/Chenxinliang/Desktop/Name-Entity-Recognition-master/Chinese_ner/main.py", line 231, in
if __name__ == "__main__":
File "C:\Users\Chenxinliang\Desktop\BERT-NER_3.0_test\venv\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "C:/Users/Chenxinliang/Desktop/Name-Entity-Recognition-master/Chinese_ner/main.py", line 225, in main
clean(FLAGS)
File "C:/Users/Chenxinliang/Desktop/Name-Entity-Recognition-master/Chinese_ner/main.py", line 191, in train

File "C:/Users/Chenxinliang/Desktop/Name-Entity-Recognition-master/Chinese_ner/main.py", line 88, in evaluate
# print(ner_results)
File "C:\Users\Chenxinliang\Desktop\Name-Entity-Recognition-master\Chinese_ner\utils.py", line 69, in test_ner
eval_lines = return_report(output_file)
File "C:\Users\Chenxinliang\Desktop\Name-Entity-Recognition-master\Chinese_ner\conlleval.py", line 282, in return_report
counts = evaluate(f)
File "C:\Users\Chenxinliang\Desktop\Name-Entity-Recognition-master\Chinese_ner\conlleval.py", line 74, in evaluate
for line in iterable:
File "C:\Users\Chenxinliang\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 711, in next
return next(self.reader)
File "C:\Users\Chenxinliang\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 642, in next
line = self.readline()
File "C:\Users\Chenxinliang\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 555, in readline
data = self.read(readsize, firstline=True)
File "C:\Users\Chenxinliang\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 501, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte

Hello, I get this error when running your chinese_ner. How can I fix it?
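The traceback shows conlleval.py reading the evaluation output as UTF-8 and failing on byte 0xa3. On Windows this often means the file was written in the system default encoding, such as GBK, but that is an assumption and not confirmed in this thread. A minimal sketch of reading a file with a fallback encoding follows; `read_text_with_fallback` is a hypothetical helper.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "gbk")):
    """Decode a file with the first encoding in `encodings` that succeeds."""
    with open(path, "rb") as handle:
        raw = handle.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode %s with any of %s" % (path, encodings))


# Demonstration: a GBK-encoded file whose first byte is 0xa3.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write("，你好".encode("gbk"))
sample_path = tmp.name

text = read_text_with_fallback(sample_path)
os.remove(sample_path)
print(text)
```

A more durable fix is usually to write the result file with an explicit `encoding="utf-8"` so that reader and writer agree regardless of platform.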

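num_labels in crf_layer?

Hello, I have a question about how crf_loss is computed.

In the Chinese_ner project, the crf_loss layer uses num_tags+1, adding a start- and end-style tag to the head and tail of the output in order to compute transition probabilities.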

Name-Entity-Recognition/Chinese_ner/model.py

Line 182 in 598b264

shape=[self.num_tags + 1, self.num_tags + 1],

In the BERT-BiLSTM-CRF-NER project, however, the crf_loss layer uses num_labels:
https://github.com/FuYanzhe2/Name-Entity-Recognition/blob/master/BERT-BiLSTM-CRF-NER/lstm_crf_layer.py#L138
Is this because the embeddings output by BERT already include [CLS] and [SEP]?

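A question about a missing file

Running fyz_lattice_NER fails with the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'test_data/fyz.train.embs'

Is this file missing from the repository?

I'm a complete beginner and need to refer to this part of your code for my current research, so I would appreciate any guidance. :D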