NER_pytorch

Named Entity Recognition on the CoNLL dataset using a BiLSTM+CRF model implemented in PyTorch

paper

  • Neural Architectures for Named Entity Recognition

  • End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

code

This code is adapted from https://github.com/ZhixiuYe/NER-pytorch so that it runs on a recent PyTorch version (1.1.0).

To visualize the results in Jupyter Notebook, I converted the .py files into .ipynb notebooks.

The F1 score on the CoNLL test data is 91.3%.

CoNLL performance

F1: 91.3%
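For readers unfamiliar with how F1 is usually computed for NER, here is a minimal sketch. It assumes entity-level exact matching, where a prediction counts only if both the span and the type agree, as in the standard CoNLL evaluation. The function name `f1_score` and the tuple layout `(sentence_id, start, end, type)` are illustrative choices, not taken from the notebooks.

```python
def f1_score(gold_spans, pred_spans):
    """Entity-level precision, recall and F1 with exact span+type matching."""
    gold, pred = set(gold_spans), set(pred_spans)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical spans: (sentence_id, start, end_exclusive, entity_type)
gold = [(0, 0, 1, "ORG"), (0, 2, 3, "PER"), (0, 5, 6, "LOC")]
pred = [(0, 0, 1, "ORG"), (0, 2, 3, "LOC"), (0, 5, 6, "LOC")]

p, r, f = f1_score(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case the second prediction has the right span but the wrong type, so it counts as both a false positive and a false negative.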

0. prepare data

To get the pre-trained GloVe word embedding vectors, run prepare_data.ipynb
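As a rough illustration of what the downloaded embeddings look like, the sketch below parses GloVe's plain-text format, where each line holds a token followed by its space-separated vector components. The helper `load_glove` is hypothetical and is not the code in prepare_data.ipynb; the three-dimensional sample only keeps the example short.

```python
import io


def load_glove(lines, dim=100):
    """Parse GloVe text lines: a token followed by `dim` space-separated floats."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny in-memory stand-in for a real embedding file.
sample = io.StringIO("the 0.1 -0.2 0.3\ncat 0.4 0.5 -0.6\n")
emb = load_glove(sample, dim=3)
print(sorted(emb))
```

With a real file you would pass an open file handle instead of the `StringIO` object and keep `dim=100` to match the 100d word embedding used by the model.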

1. train

150 epochs are enough (about 24 hours on one P100 GPU). Epoch 51 gave the best F1 score. I use visdom to monitor training.

model shape

  1. Word embedding with GloVe (100d) + character embedding with CNN (25d)

  2. BiLSTM 1 layer + Highway

  3. Linear 400d -> 19d with tanh

     BiLSTM_CRF(
               (char_embeds): Embedding(85, 25)
               (char_cnn3): Conv2d(1, 25, kernel_size=(3, 25), stride=(1, 1), padding=(2, 0))
               (word_embeds): Embedding(400176, 100)
               (dropout): Dropout(p=0.5)
               (lstm): LSTM(125, 200, bidirectional=True)
               (hw_trans): Linear(in_features=25, out_features=25, bias=True)
               (hw_gate): Linear(in_features=25, out_features=25, bias=True)
               (h2_h1): Linear(in_features=400, out_features=200, bias=True)
               (tanh): Tanh()
               (hidden2tag): Linear(in_features=400, out_features=19, bias=True)
     )
    

    run 1. train.ipynb
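The layer sizes in the printed model can be cross-checked with a little arithmetic. The sketch below takes its numbers from the model printout. The helper `conv_out_len` is an illustrative name implementing the usual convolution output-size formula (dilation 1), and the comment about pooling is an assumption about the forward pass, which is not shown here.

```python
word_dim = 100     # word_embeds: Embedding(400176, 100)
char_dim = 25      # char_cnn3: 25 output channels
lstm_hidden = 200  # lstm: LSTM(125, 200, bidirectional=True)
num_tags = 19      # hidden2tag: out_features=19

lstm_input = word_dim + char_dim  # word and character features concatenated per token
lstm_output = 2 * lstm_hidden     # forward and backward hidden states concatenated


def conv_out_len(length, kernel, padding, stride=1):
    """Output size of a convolution along one axis (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


# char_cnn3 uses kernel (3, 25) and padding (2, 0) over a (word_length, 25) input.
word_length = 5  # hypothetical word of five characters
height = conv_out_len(word_length, kernel=3, padding=2)
width = conv_out_len(25, kernel=25, padding=0)

print(lstm_input, lstm_output, height, width)
# Assumption: the (height, 1) map is then pooled over height to give one 25d vector per word.
```

The two sums explain why the LSTM takes 125 input features and why hidden2tag takes 400.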

2. evaluation

run 2. evaluation.ipynb

Result

[Image: ex_screenshot]

data

https://www.clips.uantwerpen.be/conll2003/ner/

The CoNLL-2003 shared task data files contain four columns separated by a single space. Each word has been put on a separate line and there is an empty line after each sentence. The first item on each line is a word, the second a part-of-speech (POS) tag, the third a syntactic chunk tag and the fourth the named entity tag. The chunk tags and the named entity tags have the format I-TYPE which means that the word is inside a phrase of type TYPE. Only if two phrases of the same type immediately follow each other, the first word of the second phrase will have tag B-TYPE to show that it starts a new phrase. A word with tag O is not part of a phrase. Here is an example:

    word     | POS | Syntactic chunk tag | named entity tag
    U.N.       NNP   I-NP                  I-ORG
    official   NN    I-NP                  O
    Ekeus      NNP   I-NP                  I-PER
    heads      VBZ   I-VP                  O
    for        IN    I-PP                  O
    Baghdad    NNP   I-NP                  I-LOC
    .          .     O                     O
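The format described above can be read with a few lines of Python. This is a minimal sketch, not the repository's loader: `read_conll` and `iob1_spans` are hypothetical helper names. The `-DOCSTART-` check is included because the released data files contain such document marker lines, even though the example above does not show one.

```python
def read_conll(lines):
    """Yield sentences as lists of (word, pos, chunk, ner) tuples."""
    sentence = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if sentence:  # an empty line closes the current sentence
                yield sentence
                sentence = []
            continue
        sentence.append(tuple(line.split()))
    if sentence:
        yield sentence


def iob1_spans(tags):
    """Return (start, end_exclusive, type) spans from I-TYPE / B-TYPE / O tags."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        prefix, _, etype = tag.partition("-")
        begins_new = tag == "O" or prefix == "B" or etype != current
        if current is not None and begins_new:
            spans.append((start, i, current))
            start, current = None, None
        if tag != "O" and current is None:
            start, current = i, etype
    return spans


sample = """U.N. NNP I-NP I-ORG
official NN I-NP O
Ekeus NNP I-NP I-PER
heads VBZ I-VP O
for IN I-PP O
Baghdad NNP I-NP I-LOC
. . O O
""".splitlines()

sentences = list(read_conll(sample))
entities = iob1_spans([token[3] for token in sentences[0]])
print(entities)
```

Because B-TYPE appears only when two phrases of the same type are adjacent, the span extractor treats a type change, a B- prefix, or an O tag as the end of the current entity.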

ner_pytorch's Issues

UnboundLocalError: local variable 'best_idx' referenced before assignment

During the training phase I get this error:

UnboundLocalError                         Traceback (most recent call last)

<ipython-input-30-730ad33b5889> in <module>()
    110     model.train(False)  # for evaluation, no training
    111     #global best_idx
--> 112     best_train_F, new_train_F, _,_ = evaluating(model, test_train_data, best_train_F, epoch)
    113     best_test_F,  new_test_F,  _,opts.best_idx = evaluating(model, test_data, best_test_F, epoch)
    114     # if the validation result is the best so far, set save to True so that the model is saved

<ipython-input-29-76c9c4a8fd6b> in evaluating(model, datas, best_F, epoch, display_confusion_matrix)
    118     # new_F : F score of the current sample
    119     # save  : returns save = True when the best F score is updated
--> 120     return best_F, new_F, save, best_idx
    121     print(best_idx)

UnboundLocalError: local variable 'best_idx' referenced before assignment

Any idea how to solve it?
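One plausible cause, judging only from the traceback: `best_idx` is assigned inside `evaluating` on some branch (for example only when the F score improves), so the `return` fails whenever that branch is skipped. The body of the real `evaluating` function is not shown here, so both functions below are hypothetical, simplified stand-ins that reproduce the error and show one possible fix.

```python
def evaluating_buggy(new_F, best_F, epoch):
    save = False
    if new_F > best_F:
        best_F = new_F
        best_idx = epoch  # only assigned on this branch
        save = True
    return best_F, new_F, save, best_idx  # fails when the branch was skipped


def evaluating_fixed(new_F, best_F, epoch, best_idx=-1):
    """Same logic, but best_idx always has a value (passed in or defaulted)."""
    save = False
    if new_F > best_F:
        best_F, best_idx, save = new_F, epoch, True
    return best_F, new_F, save, best_idx


try:
    evaluating_buggy(0.5, 0.9, epoch=3)
    raised = False
except UnboundLocalError:
    raised = True

print(raised)
print(evaluating_fixed(0.5, 0.9, epoch=3))
print(evaluating_fixed(0.95, 0.9, epoch=3, best_idx=1))
```

If this matches the real code, giving `best_idx` an initial value at the top of `evaluating`, or passing the previous value in as an argument, should remove the error. The commented-out `#global best_idx` in the calling cell suggests the variable may once have been a global.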
