hmm-tagger

Overview

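1. This program is mainly used for sequence labeling, such as Chinese word segmentation (B/E/M/S) or part-of-speech tagging (n/v/a/m...).
2. HMM training here is supervised learning, i.e. the training file is already labeled.
3. Requires Python 3 or later with numpy installed; all files must be UTF-8 encoded.
4. First run wordseg2hmmtrain.py and pos2hmmtrain.py in the data folder to produce the training files, then run hmm_learn.py.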

Training the HMM model (hmm_learn.py train model)

Chinese word segmentation: hmm_learn.py data/pd98month1_wordseg_hmmtrain data/pd98month1_wordseg_model
Part-of-speech tagging: hmm_learn.py data/pd98month1_pos_hmmtrain data/pd98month1_pos_model
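Because training is supervised, maximum-likelihood HMM parameters reduce to normalized counts: initial-state frequencies, tag-to-tag transition frequencies, and per-tag emission frequencies. The sketch below illustrates that idea; it is not the code of hmm_learn.py. It assumes each labeled sentence is available as a list of (observation, tag) pairs, and the function name `train_hmm` and the returned layout are hypothetical. No smoothing is applied.

```python
from collections import Counter, defaultdict

import numpy as np


def train_hmm(sentences):
    """Hypothetical sketch: estimate HMM parameters by counting labeled data.

    sentences: list of sentences, each a list of (observation, tag) pairs.
    Returns (states, pi, A, B):
      pi[i]   = P(first tag is states[i])
      A[i, j] = P(next tag is states[j] | current tag is states[i])
      B[s]    = {observation: P(observation | tag s)}
    """
    init = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sent in sentences:
        if not sent:
            continue
        init[sent[0][1]] += 1
        for obs, tag in sent:
            emit[tag][obs] += 1
        for (_, prev), (_, cur) in zip(sent, sent[1:]):
            trans[prev][cur] += 1

    states = sorted(emit)  # every tag that was seen at least once
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)

    pi = np.array([init[s] for s in states], dtype=float)
    pi /= pi.sum()

    A = np.zeros((n, n))
    for prev, row in trans.items():
        for cur, count in row.items():
            A[idx[prev], idx[cur]] = count
    row_sums = A.sum(axis=1, keepdims=True)
    # Rows of tags that never precede another tag stay all-zero.
    A = np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)

    B = {}
    for s in states:
        total = sum(emit[s].values())
        B[s] = {o: c / total for o, c in emit[s].items()}
    return states, pi, A, B
```

For example, `train_hmm([[("我", "S"), ("爱", "S")], [("北", "B"), ("京", "E")]])` yields the states `["B", "E", "S"]` and an initial distribution of 0.5 each on B and S.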

Testing the HMM model (hmm_test.py model test result [options])

Chinese word segmentation: hmm_test.py data/pd98month1_wordseg_model data/test_wordseg data/test_wordseg_result
Part-of-speech tagging: hmm_test.py data/pd98month1_pos_model data/test_pos data/test_pos_result
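Tagging a test sequence with a trained HMM is usually done with the Viterbi algorithm, which finds the single most probable tag sequence. The README does not say how hmm_test.py decodes, so the following is only a standard log-space Viterbi sketch. The function name `viterbi`, the parameter layout (initial vector `pi`, transition matrix `A`, per-tag emission dicts `B`), and the floor probability `unk` for unseen observations are all hypothetical.

```python
import numpy as np


def viterbi(obs, states, pi, A, B, unk=1e-8):
    """Hypothetical sketch: return the most likely tag sequence for obs.

    pi: shape (n,); A: shape (n, n) with A[i, j] = P(states[j] | states[i]);
    B: dict tag -> {observation: P(observation | tag)}.
    unk: assumed floor probability for observations unseen in training.
    """
    if not obs:
        return []
    n = len(states)
    with np.errstate(divide="ignore"):  # log(0) -> -inf is intended here
        log_pi = np.log(pi)
        log_A = np.log(A)

    def log_emit(o):
        return np.log([B[s].get(o, unk) for s in states])

    T = len(obs)
    delta = np.empty((T, n))            # best log-prob of a path ending in each state
    back = np.zeros((T, n), dtype=int)  # argmax predecessor for backtracking
    delta[0] = log_pi + log_emit(obs[0])
    for t in range(1, T):
        # scores[i, j]: best path ending in state i at t-1, then moving i -> j
        scores = delta[t - 1][:, None] + log_A
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit(obs[t])

    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    path.reverse()
    return [states[i] for i in path]
```

Working in log space avoids the numeric underflow that multiplying many small probabilities would cause on long sentences.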

Helper scripts in the data folder

  • wordseg2hmmtrain.py: converts word-segmented text into the HMM training format

For example: wordseg2hmmtrain.py pd98month1_wordseg pd98month1_wordseg_hmmtrain
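Turning segmented text into segmentation training data means labeling every character with its position in its word: B (begin), M (middle), E (end), or S (single-character word). A minimal sketch, assuming words on each line are separated by whitespace; the function names are hypothetical, and the exact on-disk layout that wordseg2hmmtrain.py writes is not documented here, so this only produces (character, tag) pairs.

```python
def word_to_bems(word):
    """Hypothetical helper: tag each character of one word with B/M/E/S."""
    if len(word) == 1:
        return [(word, "S")]
    return ([(word[0], "B")]
            + [(ch, "M") for ch in word[1:-1]]
            + [(word[-1], "E")])


def line_to_bems(line):
    """Hypothetical helper: convert one whitespace-segmented line to (char, tag) pairs."""
    pairs = []
    for word in line.split():
        pairs.extend(word_to_bems(word))
    return pairs
```

For instance, `line_to_bems("我 爱 北京")` gives `[("我", "S"), ("爱", "S"), ("北", "B"), ("京", "E")]`.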

  • pos2hmmtrain.py: converts part-of-speech-tagged text into the HMM training format

For example: pos2hmmtrain.py pd98month1_pos pd98month1_pos_hmmtrain
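For part-of-speech tagging, each word is an observation and its tag is the hidden state, so conversion mostly means splitting tokens into (word, tag) pairs. The sketch below assumes whitespace-separated tokens of the form `word/tag`; that input layout, the function name `parse_pos_line`, and the error handling are assumptions, not a description of pos2hmmtrain.py.

```python
def parse_pos_line(line):
    """Hypothetical helper: split a line of 'word/tag' tokens into (word, tag) pairs.

    The tag is taken after the last '/', so a word that itself contains
    '/' (e.g. '1/2/m') still parses as ('1/2', 'm').
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep or not word:
            raise ValueError(f"malformed token: {token!r}")
        pairs.append((word, tag))
    return pairs
```

Splitting with `rpartition` rather than `split("/")` keeps the word intact even when it contains a slash.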

  • wordseg_result_postproc.py: converts tag-format segmentation results back into ordinary segmented text

For example: wordseg_result_postproc.py test_wordseg_result test_wordseg_normal
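Post-processing reverses the B/E/M/S labeling: characters are accumulated until an E or S tag closes the word. A minimal sketch with a hypothetical function name; the input and output file formats of wordseg_result_postproc.py are not documented here, so it works on in-memory (character, tag) pairs. Because an HMM can emit inconsistent tag sequences, this version also starts a new word whenever B or S appears while a word is still open, and flushes any unfinished word at the end.

```python
def bems_to_words(pairs):
    """Hypothetical helper: rebuild words from (char, tag) pairs tagged B/E/M/S."""
    words, buf = [], []
    for ch, tag in pairs:
        if tag in ("B", "S") and buf:  # previous word was never closed by E
            words.append("".join(buf))
            buf = []
        buf.append(ch)
        if tag in ("E", "S"):          # E or S ends the current word
            words.append("".join(buf))
            buf = []
    if buf:                            # flush a dangling B/M run
        words.append("".join(buf))
    return words
```

Joining the result with spaces, e.g. `" ".join(bems_to_words(pairs))`, yields ordinary whitespace-segmented text.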

Helper files in the data folder

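pd98month1_wordseg (word-segmented text of the January 1998 People's Daily)
pd98month1_wordseg_hmmtrain (the same segmentation data in HMM training format)
pd98month1_pos (part-of-speech-tagged text of the January 1998 People's Daily)
pd98month1_pos_hmmtrain (the same POS data in HMM training format)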

Contributors

gavinliu1990