ner_telugu's Introduction

Named Entity Recognition for Telugu using LSTM-CRF

This repository contains the code for the paper "Named Entity Recognition for Telugu using LSTM-CRF".

The dataset can be found in the data/Gold_Data_Telugu folder. The code for reproducing the results is in the lstmcrf folder.

Steps to reproduce LSTM-CRF results:

  1. Download the fastText pre-trained word vectors for Telugu from https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md and put them in a folder called vectors inside the data directory (i.e. data/vectors).

  2. Run build_data.py, which generates the vocabulary and the directory structure.

  3. Train the model by running train.py.

  4. Get the model's predictions on the test set by running predict_test.py.

  5. Run the evaluation script in the conll_evaluation folder by executing "perl conll < ../data/LSTM-CRF/predictions/predictions_9-no-dev.txt". The values of the various metrics will be displayed.
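
For readers who want to inspect the vectors downloaded in step 1 before training, the following is a minimal Python sketch of reading fastText's text (.vec) format. It is not part of this repository's code: the function name load_vec and the sample words are illustrative assumptions. The sketch assumes the usual .vec layout, a header line with the vocabulary size and the dimension, followed by one word and its vector per line.

```python
import io
from typing import Dict, List, Optional, TextIO, Tuple


def load_vec(handle: TextIO, limit: Optional[int] = None) -> Tuple[int, Dict[str, List[float]]]:
    """Read word vectors from an open fastText .vec text file.

    Hypothetical helper for illustration; not taken from the lstmcrf code.
    Returns the vector dimension and a word -> vector mapping.
    """
    header = handle.readline().split()
    dim = int(header[1])  # header is "<vocabulary size> <dimension>"
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        if limit is not None and len(vectors) >= limit:
            break
        # Split from the right so that only the last `dim` fields are numbers.
        parts = line.rstrip().rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip malformed or empty lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


# Tiny in-memory example; with a real file, pass open(path, encoding="utf-8").
sample = "2 3\nword_a 0.1 0.2 0.3 \nword_b -1.0 0.0 1.0\n"
dim, vectors = load_vec(io.StringIO(sample))
print(dim, sorted(vectors))
```

Passing a small limit is a cheap way to confirm that the file is where the scripts expect it and that the dimension matches the model configuration, without loading the whole vocabulary.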

Steps to reproduce YamCha results:

  1. Run "./configure"

  2. Run "make"

  3. Execute "sudo make install"

  4. In the yamcha folder, train the model by executing 'make CORPUS=../data/Gold_Data_Telugu/train_sentences_9_IOB.txt MODEL=mon_project SVM_PARAM="-t1 -d2 -c1" train'.

  5. Execute 'yamcha -m mon_project.model < ../data/Gold_Data_Telugu/test_sentences_9_IOB.txt > ../data/YamCha/results9_IOB.txt' to get the model's predictions on the test set.

  6. Run the evaluation script in the conll_evaluation folder by executing "perl conll < ../data/YamCha/results9_IOB.txt". The values of the various metrics will be displayed.
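
As a rough illustration of the kind of metrics a CoNLL-style evaluation reports, here is a minimal Python sketch that computes entity-level precision, recall and F1 from IOB tag sequences. It is not the perl script in conll_evaluation and may differ from it in edge cases. The function names and the tag labels (PER, LOC) are assumptions made for the example. An entity counts as correct only when both its span and its type match the gold annotation exactly.

```python
from typing import Iterable, List, Set, Tuple


def extract_chunks(tags: Iterable[str]) -> Set[Tuple[int, int, str]]:
    """Return (start, end, type) spans found in one IOB tag sequence.

    Simplification (assumption): an I- tag with no open chunk, or with a
    different type than the open chunk, starts a new chunk.
    """
    chunks = set()
    start, ctype = None, None
    # The trailing "O" sentinel closes a chunk that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        if start is not None and (prefix in ("O", "B") or label != ctype):
            chunks.add((start, i, ctype))
            start, ctype = None, None
        if prefix == "B" or (prefix == "I" and start is None):
            start, ctype = i, label
    return chunks


def precision_recall_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level scores over parallel lists of tag sequences."""
    correct = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_chunks(g_tags), extract_chunks(p_tags)
        correct += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(precision_recall_f1(gold, pred))
```

In this example the prediction finds one of the two gold entities and proposes nothing spurious, so precision is 1.0 and recall is 0.5.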

Steps to reproduce CRF++ results:

  1. Run "./configure"

  2. Run "make"

  3. Execute "sudo make install"

  4. To train the model, run "./crf_learn -f 3 -c 1.5 template ../data/Gold_Data_Telugu/train_sentences_9_IOB.txt model"

  5. To get the model's predictions on the test set, run "./crf_test -m model ../data/Gold_Data_Telugu/test_sentences_9_IOB.txt > ../data/CRF++/results9_IOB.data"

  6. Run the evaluation script in the conll_evaluation folder by executing "perl conll < ../data/CRF++/results9_IOB.data". The values of the various metrics will be displayed.
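
To look at the predictions directly rather than only at the aggregate metrics, the result files from the steps above can be read with a few lines of Python. The sketch below assumes the column layout that crf_test normally writes: one token per line, whitespace-separated columns, the predicted tag appended as the last column after the gold tag, and a blank line between sentences. The function name read_predictions and the sample tokens are hypothetical.

```python
import io
from typing import List, TextIO, Tuple

Row = Tuple[str, str, str]  # (token, gold tag, predicted tag)


def read_predictions(handle: TextIO) -> List[List[Row]]:
    """Group a column-formatted prediction file into sentences.

    Assumes the last two columns are the gold and predicted tags.
    """
    sentences: List[List[Row]] = []
    current: List[Row] = []
    for line in handle:
        fields = line.split()
        if not fields:  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if len(fields) < 3:
            raise ValueError("expected token, gold and predicted columns: %r" % line)
        current.append((fields[0], fields[-2], fields[-1]))
    if current:
        sentences.append(current)
    return sentences


# In-memory example; with a real file, pass open(path, encoding="utf-8").
sample = "tok1 B-PER B-PER\ntok2 O B-LOC\n\ntok3 O O\n"
for sentence in read_predictions(io.StringIO(sample)):
    errors = [(tok, gold, pred) for tok, gold, pred in sentence if gold != pred]
    print(len(sentence), errors)
```

Listing the tokens where the gold and predicted tags disagree is often a quicker way to spot systematic errors than comparing overall scores.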
