name-entity-recognition's Introduction

Name-Entity-Recognition

LSTM-CRF, Lattice-CRF, and BERT-NER implementations, plus a follow-up of recent NER papers.

  • ChineseNER: Chinese NER

tensorflow 1.4.0

Usage:

python3 main.py

For detailed usage, the underlying principles, and experimental results, see the blog post: https://www.jianshu.com/p/aed50c1b2930

  • fyz_lattice_NER: Chinese NER lattice model

pytorch 0.4.0
Python 3.6
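Usage:

python3 main.py

Alternatively, configure and run directly: bash fyz_run_decode.sh

For detailed usage, the underlying principles, and experimental results, see the blog post.
Download location of the two word-embedding files required by the code
Extraction code: vgwi

Unzip the files and place them in the data/ folder.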

  • BERT-BiLSTM-CRF-NER

tensorflow 1.11.0
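Usage:

Download the Chinese BERT model.
Unzip it into the checkpoint directory.
Run:
python3 main.py (command-line arguments can also be set according to the code)
For detailed usage instructions, see the blog post: https://www.jianshu.com/p/b05e50f682dd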

name-entity-recognition's People

Contributors

fuyanzhe, fuyanzhe2

name-entity-recognition's Issues

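Error after switching to my own dataset

Hello, when I replace the dataset with my own, I get the error below. How can I fix it, or what is causing it?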

    train(data, save_model_dir,dset_dir, seg)
  File "C:/Users/11/Desktop/Name-Entity-Recognition-master/Name-Entity-Recognition-master/fyz_lattice_NER/main.py", line 310, in train
    print("     Instance: %s; Time: %.2fs; loss: %.4f; acc: %s/%s=%.4f"%(end, temp_cost, sample_loss, right_token, whole_token,(right_token+0.)/whole_token))
ZeroDivisionError: float division by zero
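
The traceback shows that `whole_token` is zero when the accuracy is printed, which suggests no tokens were counted for that batch (for example, if the custom data did not parse into any labelled instances; this is an assumption, not something the report confirms). A minimal sketch of a guarded accuracy computation follows; `safe_accuracy` is a hypothetical helper, not a function in the repository:

```python
def safe_accuracy(right_token, whole_token):
    """Return right_token / whole_token, or 0.0 when no tokens were counted."""
    if whole_token == 0:
        # Nothing was counted, so report 0.0 instead of dividing by zero.
        return 0.0
    return right_token / float(whole_token)


if __name__ == "__main__":
    for right, whole in [(3, 4), (0, 0)]:
        print("acc: %s/%s=%.4f" % (right, whole, safe_accuracy(right, whole)))
```

A guard like this only hides the symptom; if `whole_token` stays at zero, the data loading step is the place to look.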

Dataset

Hello, where can I download the dataset for the fyz_lattice_NER Chinese NER lattice model?
Thanks.

main.py file

Is there no main.py file in the BERT-BiLSTM-CRF-NER folder?

Own dataset

When using your model on my own dataset, is it unnecessary to provide my own annotated entity tags?

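Why is eval_loss redefined during evaluation, and how does it differ from total_loss?

Why is eval_loss redefined during evaluation, and what does the original total_loss represent at that point?
The relevant code is as follows:

Modified for NER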

        def metric_fn(label_ids, logits, trans):
            # First decode the result with Viterbi
            # CRF decoding

            weight = tf.sequence_mask(FLAGS.max_seq_length)
            precision = tf_metrics.precision(label_ids, pred_ids, num_labels, [2, 3, 4, 5, 6, 7], weight)
            recall = tf_metrics.recall(label_ids, pred_ids, num_labels, [2, 3, 4, 5, 6, 7], weight)
            f = tf_metrics.f1(label_ids, pred_ids, num_labels, [2, 3, 4, 5, 6, 7], weight)

            return {
                "eval_precision": precision,
                "eval_recall": recall,
                "eval_f": f,
                # "eval_loss": loss,
            }

        eval_metrics = (metric_fn, [label_ids, logits, trans])
        # eval_metrics = (metric_fn, [label_ids, logits])
        output_spec = tf.contrib.tpu.TPUEstimatorSpec(
            mode=mode,
            loss=total_loss,
            eval_metrics=eval_metrics,
            scaffold_fn=scaffold_fn)  #

result of lattice LSTM

Thank you for your work modifying the lattice LSTM code! I tried your lattice LSTM code, but my F1 score only reaches 0.82, much lower than the other models on the same dataset. What results did you get? Thanks.

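Why is the CRF loss negative?

In bert_lstm_ner.py in the BERT-BiLSTM-CRF-NER folder, the CRF loss becomes negative after many iterations. I don't know why this happens and would appreciate an explanation. Also, why is num_labels=len(label_list)+1? I vaguely suspect that the negative loss is related to the number of labels.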
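
For a linear-chain CRF, the negative log-likelihood is log Z minus the score of the gold path. Because Z sums over every path, including the gold one, a correctly normalized loss cannot be negative. The sketch below checks this by brute-force enumeration on a toy example, then shows one way a negative value can appear: scoring the gold path over more time steps than the partition function covers. That second part is an assumption for illustration, not a diagnosis of this repository, and all function names and numbers are made up for the example.

```python
import itertools
import math


def path_score(emissions, transitions, tags):
    """Unnormalized score of one tag path: emissions plus transitions."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition(emissions, transitions):
    """log Z computed by enumerating every tag path (toy sizes only)."""
    num_tags = len(transitions)
    scores = [
        path_score(emissions, transitions, tags)
        for tags in itertools.product(range(num_tags), repeat=len(emissions))
    ]
    top = max(scores)  # subtract the max for a stable log-sum-exp
    return top + math.log(sum(math.exp(s - top) for s in scores))


def crf_nll(emissions, transitions, gold):
    """Negative log-likelihood of the gold path; non-negative by construction."""
    return log_partition(emissions, transitions) - path_score(emissions, transitions, gold)


if __name__ == "__main__":
    emissions = [[1.0, 0.2], [0.3, 1.5], [2.0, 0.1]]  # 3 steps, 2 tags
    transitions = [[0.5, -0.2], [0.1, 0.4]]
    gold = (0, 1, 0)

    print("consistent lengths:", crf_nll(emissions, transitions, gold))

    # Mismatch: gold path scored over 3 steps, partition over only 2 steps.
    mismatched = log_partition(emissions[:2], transitions) - path_score(
        emissions, transitions, gold
    )
    print("mismatched lengths:", mismatched)
```

If the loss in a real model goes negative, comparing the sequence lengths, masks, and tag count used for the gold score against those used for the normalizer is a reasonable first check.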

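How do I extract the vector values?

How can I extract the values of the character embeddings? I cannot run sess.run(tensor) in the original graph because I have no way to define the sess there. But when I create a new graph and run sess.run(tensor), I get the error: RuntimeError: The Session graph is empty. Add operations to the graph before calling run(). I want to extract one batch of data from embeddings, but calling this function produces the error above. I have searched Baidu and Google extensively without finding a fix, and I cannot even tell where the problem lies. Could anyone offer some pointers? Thanks, everyone.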

'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte

Traceback (most recent call last):
File "C:/Users/Chenxinliang/Desktop/Name-Entity-Recognition-master/Chinese_ner/main.py", line 231, in <module>
if __name__ == "__main__":
File "C:\Users\Chenxinliang\Desktop\BERT-NER_3.0_test\venv\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "C:/Users/Chenxinliang/Desktop/Name-Entity-Recognition-master/Chinese_ner/main.py", line 225, in main
clean(FLAGS)
File "C:/Users/Chenxinliang/Desktop/Name-Entity-Recognition-master/Chinese_ner/main.py", line 191, in train

File "C:/Users/Chenxinliang/Desktop/Name-Entity-Recognition-master/Chinese_ner/main.py", line 88, in evaluate
# print(ner_results)
File "C:\Users\Chenxinliang\Desktop\Name-Entity-Recognition-master\Chinese_ner\utils.py", line 69, in test_ner
eval_lines = return_report(output_file)
File "C:\Users\Chenxinliang\Desktop\Name-Entity-Recognition-master\Chinese_ner\conlleval.py", line 282, in return_report
counts = evaluate(f)
File "C:\Users\Chenxinliang\Desktop\Name-Entity-Recognition-master\Chinese_ner\conlleval.py", line 74, in evaluate
for line in iterable:
File "C:\Users\Chenxinliang\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 711, in next
return next(self.reader)
File "C:\Users\Chenxinliang\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 642, in next
line = self.readline()
File "C:\Users\Chenxinliang\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 555, in readline
data = self.read(readsize, firstline=True)
File "C:\Users\Chenxinliang\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 501, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte

Hello, I get this error when running your chinese_ner. How can I fix it?
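
Byte 0xa3 at position 0 is not a valid UTF-8 start byte, but it is a common lead byte in GBK (for example, full-width punctuation), so the file being read may have been written in the Windows default encoding. That is an assumption; the traceback does not say which encoding produced the file. A minimal sketch of reading with a fallback encoding follows; `read_text_with_fallback` is a hypothetical helper, not part of the repository:

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "gbk")):
    """Decode a file by trying each encoding in order."""
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes.
    return raw.decode(encodings[0], errors="replace")


if __name__ == "__main__":
    # Write a GBK-encoded sample that starts with byte 0xa3.
    sample = "\uff0c中文"  # full-width comma followed by two Chinese characters
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(sample.encode("gbk"))
        print(read_text_with_fallback(path) == sample)
    finally:
        os.remove(path)
```

Writing the output file with an explicit `encoding="utf-8"` in the first place avoids the mismatch altogether.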

num_labels in crf_layer ?

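Hello, I have a question about how crf_loss is computed.

In the Chinese_ner project, the crf_loss layer uses num_tags+1, adding an extra tag at the head and tail of the output, which is used to compute the transition probabilities.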

shape=[self.num_tags + 1, self.num_tags + 1],

But in the BERT-BiLSTM-CRF-NER project, the crf_loss layer uses num_labels:
https://github.com/FuYanzhe2/Name-Entity-Recognition/blob/master/BERT-BiLSTM-CRF-NER/lstm_crf_layer.py#L138
Is this because the embedding output by BERT already includes [CLS] and [SEP]?

Question about a missing file

Running fyz_lattice_NER fails with the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'test_data/fyz.train.embs'

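Is this file missing from the repository?

I am a complete beginner and need to reference this part of your code for my current research, so I would appreciate any guidance. :D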
