xuwenzhi1104 / mynlp

This project is forked from mayabot/mynlp

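A production-grade, high-performance, modular, and extensible Chinese NLP toolkit. (Chinese word segmentation, averaged perceptron, fastText, pinyin, new-word discovery, segmentation correction, BM25, person-name recognition, named-entity recognition, custom dictionaries)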

Home Page: https://mynlp.mayabot.com/

License: Apache License 2.0

Java 63.21% Kotlin 36.76% Shell 0.03%

mynlp's Introduction

Mynlp: a high-performance, extensible Chinese NLP toolkit

Note
The full online documentation is available at mynlp.mayabot.com

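Installation

This section explains how to install mynlp and get started with its basic features.

mynlp is published to Maven Central, so you only need to add the mynlp.jar dependency to your Maven or Gradle build.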

Gradle
implementation 'com.mayabot.mynlp:mynlp:4.0.0'
Maven
<dependency>
  <groupId>com.mayabot.mynlp</groupId>
  <artifactId>mynlp</artifactId>
  <version>4.0.0</version>
</dependency>

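Because the resource files are large, mynlp.jar does not include them (dictionaries and model files) by default.

The easy option is to depend on mynlp-all, which bundles the default resource dictionaries and covers most needs.

Depending on mynlp-all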
implementation 'com.mayabot.mynlp:mynlp-all:4.0.0'
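
For Maven builds, the same mynlp-all dependency can presumably be declared as follows. This is a sketch that maps the Gradle coordinate above (group, artifact, version) onto Maven's dependency syntax; it is not copied from the mynlp documentation.

```xml
<!-- Assumed Maven equivalent of the Gradle line above: same group, artifact, and version -->
<dependency>
  <groupId>com.mayabot.mynlp</groupId>
  <artifactId>mynlp-all</artifactId>
  <version>4.0.0</version>
</dependency>
```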

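Dictionary and model resources

Table 1. Dictionary and model resources

| Gradle coordinate | Included in mynlp-all | File size | Description |
|---|---|---|---|
| com.mayabot.mynlp.resource:mynlp-resource-coredict:1.0.0 | Y | 18.2M | Core dictionary (200k+ words, 5M+ bigrams) |
| com.mayabot.mynlp.resource:mynlp-resource-pos:1.0.0 | Y | 17.5M | Part-of-speech tagging model (perceptron model) |
| com.mayabot.mynlp.resource:mynlp-resource-ner:1.0.0 | Y | 13.4M | Named-entity recognition (person names and other NER) |
| com.mayabot.mynlp.resource:mynlp-resource-pinyin:1.1.0 | Y | 272K | Pinyin dictionary and pinyin segmentation model |
| com.mayabot.mynlp.resource:mynlp-resource-transform:1.0.0 | Y | 478K | Traditional/Simplified Chinese conversion dictionary |
| com.mayabot.mynlp.resource:mynlp-resource-cws:1.0.0 | N | 62.4M | Perceptron word-segmentation model |
| com.mayabot.mynlp.resource:mynlp-resource-custom:1.0.0 | N | 2.19M | Custom extension dictionary |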

Add only the resource packages you actually need.

A Gradle example
implementation 'com.mayabot.mynlp:mynlp:4.0.0'

// Core dictionary
implementation 'com.mayabot.mynlp.resource:mynlp-resource-coredict:1.0.0'

// Part-of-speech tagging
implementation 'com.mayabot.mynlp.resource:mynlp-resource-pos:1.0.0'

// Named-entity recognition
implementation 'com.mayabot.mynlp.resource:mynlp-resource-ner:1.0.0'

// Pinyin
implementation 'com.mayabot.mynlp.resource:mynlp-resource-pinyin:1.1.0'

// Traditional/Simplified Chinese conversion
implementation 'com.mayabot.mynlp.resource:mynlp-resource-transform:1.0.0'

// Perceptron word-segmentation model
//   implementation 'com.mayabot.mynlp.resource:mynlp-resource-cws:1.0.0'

// Custom extension dictionary
//   implementation 'com.mayabot.mynlp.resource:mynlp-resource-custom:1.0.0'
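
The same selective setup can likely be expressed in Maven. The sketch below translates the coordinates from Table 1 into `<dependency>` entries and uses the 4.0.0 release named in the installation section for mynlp itself. It is an assumed mapping of the Gradle coordinates, not a snippet taken from the mynlp documentation.

```xml
<dependencies>
  <!-- mynlp itself (version taken from the installation section) -->
  <dependency>
    <groupId>com.mayabot.mynlp</groupId>
    <artifactId>mynlp</artifactId>
    <version>4.0.0</version>
  </dependency>
  <!-- Core dictionary -->
  <dependency>
    <groupId>com.mayabot.mynlp.resource</groupId>
    <artifactId>mynlp-resource-coredict</artifactId>
    <version>1.0.0</version>
  </dependency>
  <!-- Part-of-speech tagging -->
  <dependency>
    <groupId>com.mayabot.mynlp.resource</groupId>
    <artifactId>mynlp-resource-pos</artifactId>
    <version>1.0.0</version>
  </dependency>
  <!-- Named-entity recognition -->
  <dependency>
    <groupId>com.mayabot.mynlp.resource</groupId>
    <artifactId>mynlp-resource-ner</artifactId>
    <version>1.0.0</version>
  </dependency>
  <!-- Pinyin -->
  <dependency>
    <groupId>com.mayabot.mynlp.resource</groupId>
    <artifactId>mynlp-resource-pinyin</artifactId>
    <version>1.1.0</version>
  </dependency>
  <!-- Traditional/Simplified Chinese conversion -->
  <dependency>
    <groupId>com.mayabot.mynlp.resource</groupId>
    <artifactId>mynlp-resource-transform</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>
```

The mynlp-resource-cws and mynlp-resource-custom packages are left out here, mirroring the commented-out lines in the Gradle example; they can be added in the same way when needed.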

For everything else, see the full online documentation: https://mynlp.mayabot.com/

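Thanks to the following excellent open-source projects

  • HanLP

  • ansj_seg

The mynlp implementation draws on their algorithms and on parts of their code.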
