fnlp's Introduction

FudanNLP (FNLP)

2018.12.16 We are delighted to announce FastNLP, a brand-new natural language processing toolkit and the successor to FudanNLP. FudanNLP is no longer updated.

====

Introduction

FNLP is a toolkit developed mainly for Chinese natural language processing (NLP). It also includes the machine learning algorithms and data sets used to implement these tasks. The toolkit and the data sets it contains are distributed under the LGPL 3.0 license.

If you're new to FNLP, check out the Quick Start page.

Original FudanNLP project page: http://code.google.com/p/fudannlp

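Functions

	Information retrieval: text classification, news clustering
	Chinese processing: Chinese word segmentation, part-of-speech tagging, named entity recognition, keyword extraction, dependency parsing, temporal expression recognition
	Structured learning: online learning, hierarchical classification, clustering

ChangeLog | Benchmark | Development Plan | Developers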

Demos

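You can try some of the functionality on the following site: Demo Website.

If you encounter examples that FNLP cannot handle, please submit them here: Collaborative Data Collection.

If you have questions, see the FAQ or discuss them in the QQ group (253541693).

Usage

FNLP Getting Started Tutorial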

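In addition to the source files, you also need to download the FNLP model files. Because the model files are large and inconvenient to store in the source repository, please download them from the Release page and place them in the "models" directory.

  • seg.m: word segmentation model
  • pos.m: part-of-speech tagging model
  • dep.m: dependency parsing model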
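
Before loading anything, it can help to confirm that the three files listed above are actually present. The following is a minimal sketch in plain Java, not part of FNLP: the class name `ModelCheck` and the method `missingModels` are hypothetical, and only the file names come from this README.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of FNLP): reports which model files are absent.
final class ModelCheck {
    // The three model files named in this README.
    static final String[] MODEL_FILES = {"seg.m", "pos.m", "dep.m"};

    // Returns the names of the model files that are not regular files in modelDir.
    static List<String> missingModels(Path modelDir) {
        List<String> missing = new ArrayList<>();
        for (String name : MODEL_FILES) {
            if (!Files.isRegularFile(modelDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : "models");
        // Print the absolute path, because a relative "models" depends on the working directory.
        System.out.println("Checking " + dir.toAbsolutePath());
        System.out.println("Missing: " + missingModels(dir));
    }
}
```

Printing the absolute path is deliberate: a relative directory such as "models" is resolved against the directory the JVM was started from, which may differ between an IDE and the command line.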

Interfaces for languages other than Java are welcome.

Citation

If you would like to acknowledge our efforts, please cite the following paper.

	Xipeng Qiu, Qi Zhang and Xuanjing Huang, FudanNLP: A Toolkit for Chinese Natural Language Processing, In Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL), 2013.


	@INPROCEEDINGS{Qiu:2013,
	author = {Xipeng Qiu and Qi Zhang and Xuanjing Huang},
	title = {FudanNLP: A Toolkit for Chinese Natural Language Processing},
	booktitle = {Proceedings of Annual Meeting of the Association for Computational Linguistics},
	year = {2013},
	}

More related papers can be found here, on Google Scholar (http://scholar.google.com/citations?sortby=pubdate&hl=en&user=Pq4Yp_kAAAAJ&view_op=list_works), or on DBLP (http://www.informatik.uni-trier.de/~ley/pers/hd/q/Qiu:Xipeng.html).

We used JProfiler to help optimize the code.

The text of this site (or page) may be modified and reused under the CC-BY-SA 3.0 license and the GNU Free Documentation License.

fnlp's People

Contributors

felixonmars, gitsamshi, lsyin, qhhonx, sywu, xpqiu, zhudebin

fnlp's Issues

Inaccurate word segmentation

For example:

小明硕士毕业于**科学院计算所,后在日本京都大学深造
小明 硕士 毕业于 ** 科学院 计算 所 , 后 在 日本 京都 大学 深造

The error is in 毕业于.

请勿发布广告信息或其他无关评论,否则将会删除评论并扣分,严重者给予封号处理。
请 勿 发布 广告 信息 或 其他 无关 评论 , 否则 将 会 删除 评论 并扣分 , 严重者 给予 封号 处理 。

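The error is in 并扣分.

These two are very, very obvious errors. With segmentation this poor, the downstream keyword extraction and part-of-speech tagging are hard to do well.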

app/lucene/demo/BuildIndex.java fails with the Lucene 5.0.0 package

An exception is thrown whenever IndexWriter calls addDocument or updateDocument. Does FNLP currently support only up to Lucene 4.7?

Exception in thread "main" java.lang.AbstractMethodError: org.apache.lucene.analysis.Analyzer.createComponents(Ljava/lang/String;)Lorg/apache/lucene/analysis/Analyzer$TokenStreamComponents;
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:179)
    at org.apache.lucene.document.Field.tokenStream(Field.java:556)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:606)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:231)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:449)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1349)

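Exception when running the demo (based on the version after Qiu's POM change on 2015-12-11)

Exception when running the NLP test (the first message below reads "error reading model file"):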
模型文件读入错误: ../models/pos.m
java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2310)
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3063)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2864)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1638)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1341)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at gnu.trove.map.custom_hash.TObjectIntCustomHashMap.readExternal(TObjectIntCustomHashMap.java:1139)
at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at java.util.HashMap.readObject(HashMap.java:1180)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.fnlp.nlp.cn.tag.AbstractTagger.loadFrom(AbstractTagger.java:205)
at org.fnlp.nlp.cn.tag.AbstractTagger.<init>(AbstractTagger.java:73)
at org.fnlp.nlp.cn.tag.POSTagger.<init>(POSTagger.java:114)
at org.fnlp.demo.nlp.PartsOfSpeechTag.main(PartsOfSpeechTag.java:45)
at org.fnlp.demo.NLPTest.test(NLPTest.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

org.fnlp.util.exception.LoadModelException: java.io.EOFException: Unexpected end of ZLIB input stream
at org.fnlp.nlp.cn.tag.AbstractTagger.loadFrom(AbstractTagger.java:208)
at org.fnlp.nlp.cn.tag.AbstractTagger.<init>(AbstractTagger.java:73)
at org.fnlp.nlp.cn.tag.POSTagger.<init>(POSTagger.java:114)
at org.fnlp.demo.nlp.PartsOfSpeechTag.main(PartsOfSpeechTag.java:45)
at org.fnlp.demo.NLPTest.test(NLPTest.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2310)
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3063)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2864)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1638)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1341)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at gnu.trove.map.custom_hash.TObjectIntCustomHashMap.readExternal(TObjectIntCustomHashMap.java:1139)
at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at java.util.HashMap.readObject(HashMap.java:1180)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.fnlp.nlp.cn.tag.AbstractTagger.loadFrom(AbstractTagger.java:205)
... 32 more
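
The trace shows the model being read through a `GZIPInputStream`, and "Unexpected end of ZLIB input stream" usually means the compressed data stopped early, for example after an incomplete download. One way to test that hypothesis is to decompress the file to the end and see whether it finishes cleanly. The sketch below uses only the Java standard library; the class name `GzipCheck` and the method `isCompleteGzip` are hypothetical and are not part of FNLP.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipException;

// Hypothetical helper (not part of FNLP): checks whether a file is a complete gzip stream.
final class GzipCheck {
    // Returns true if the file decompresses to its end, false if it is truncated or not valid gzip.
    static boolean isCompleteGzip(Path file) throws IOException {
        byte[] buffer = new byte[8192];
        try (InputStream raw = Files.newInputStream(file);
             InputStream in = new GZIPInputStream(raw)) {
            while (in.read(buffer) != -1) {
                // Discard the data; only a clean end of stream matters here.
            }
            return true;
        } catch (EOFException | ZipException e) {
            // EOFException: stream ended early. ZipException: bad header, data, or trailer.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path model = Paths.get(args.length > 0 ? args[0] : "../models/pos.m");
        System.out.println(model + " complete gzip: " + isCompleteGzip(model));
    }
}
```

If the check returns false for a model file, downloading it again and comparing file sizes with the Release page is a reasonable next step.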

NullPointerException at this.labels = this.factory.DefaultLabelAlphabet();

Executing CNFactory factory = CNFactory.getInstance("models"); throws an exception, even though the directory path is correct.

Exception in thread "main" java.lang.NullPointerException
at org.fnlp.nlp.cn.tag.AbstractTagger.<init>(AbstractTagger.java:79)
at org.fnlp.nlp.cn.tag.CWSTagger.<init>(CWSTagger.java:75)
at org.fnlp.nlp.cn.CNFactory.loadSeg(CNFactory.java:219)
at org.fnlp.nlp.cn.CNFactory.getInstance(CNFactory.java:164)
at org.fnlp.nlp.cn.CNFactory.getInstance(CNFactory.java:144)

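Exception during testing (based on the version after Qiu's POM change on 2015-12-11)

I built the project following the steps in the QuickTutorial (https://github.com/xpqiu/fnlp/wiki/quicktutorial). When I test word segmentation with the command java -Xmx1024m -Dfile.encoding=UTF-8 -classpath "fnlp-core/target/fnlp-core-2.1-SNAPSHOT.jar:libs/trove4j-3.0.3.jar:libs/commons-cli-1.2.jar" org.fnlp.nlp.cn.tag.CWSTagger -s models/seg.m "自然语言是人类交流和思维的主要工具,是人类智慧的结晶。", I get the message "Could not find or load main class org.fnlp.nlp.cn.tag.CWSTagger".

Testing word segmentation in Eclipse throws an exception: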
java.io.FileNotFoundException: ..\tmp\ar-train.txt (系统找不到指定的路径。)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(Unknown Source)
at org.fnlp.data.reader.SimpleFileReader.init(SimpleFileReader.java:100)
at org.fnlp.data.reader.SimpleFileReader.<init>(SimpleFileReader.java:90)
at org.fnlp.nlp.cn.anaphora.train.ARClassifier.train(ARClassifier.java:144)
at org.fnlp.nlp.cn.anaphora.train.ARClassifier.main(ARClassifier.java:75)
Exception in thread "main" java.lang.NullPointerException
at org.fnlp.data.reader.SimpleFileReader.hasNext(SimpleFileReader.java:116)
at org.fnlp.ml.types.InstanceSet.loadThruStagePipes(InstanceSet.java:214)
at org.fnlp.nlp.cn.anaphora.train.ARClassifier.train(ARClassifier.java:144)
at org.fnlp.nlp.cn.anaphora.train.ARClassifier.main(ARClassifier.java:75)

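A question

Hello, how were the part-of-speech tagging and named entity models obtained? If I want to add a new entity type, such as school names, what should I do?
I would also like to add a word to the dictionary together with its entity type. How can I do that? Thanks.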

memory usage of the models

It seems that memory usage is quite a problem.
According to my test,
after loading the dep.m model, memory usage increased by over 450MB, and
after loading the pos.m model, memory usage increased by another 240MB.

I also noticed that dep.m and pos.m are both smaller than 50MB.

I'm not quite familiar with Java, but I think this kind of memory usage can be problematic.

I used runtime.totalMemory() and runtime.freeMemory(); to measure memory usage, I'm not sure whether this makes sense.
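
A measurement along those lines could be sketched as follows. This is only an illustration: `HeapUsage` and its methods are hypothetical names, and `System.gc()` is merely a hint to the JVM, so the result is approximate. In real use the task would load a model instead of allocating a byte array.

```java
// Hypothetical helper: approximates heap growth with Runtime.totalMemory() and freeMemory().
final class HeapUsage {
    // Used heap in bytes: what the JVM has reserved minus the part that is still free.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs the task and returns the approximate growth in used heap, in bytes.
    static long measure(Runnable task) {
        System.gc(); // only a hint; reduces noise from garbage created earlier
        long before = usedBytes();
        task.run();
        System.gc(); // drop temporary objects so that mostly retained data is counted
        return usedBytes() - before;
    }

    public static void main(String[] args) {
        // Keep a reference so the allocation stays reachable after the task returns.
        byte[][] holder = new byte[1][];
        long delta = measure(() -> holder[0] = new byte[16 * 1024 * 1024]);
        System.out.println("Approximate growth: " + delta / (1024 * 1024) + " MB");
    }
}
```

Keeping a reference to whatever the task creates matters: if the object becomes unreachable, the second collection may reclaim it and the growth appears close to zero.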

I'll keep investigating this memory usage issue, and looking forward to your help.

Thanks!

english speaker

hello,

i'm a software developer in the us. i am interested in multi-lingual computer communication. chinese language would be very useful for an exploratory project, however, i don't speak chinese. could anyone there help me? i would think someone would find this project interesting and i (potentially) have other talented collaborators (based on if they can be convinced of worthiness, etc).

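Add regex-based segmentation

Is regex-based segmentation already available? If so, please tell me how to call it. If not, could it be added?

Joining the group

Entering "Neuro Linguistic Programming" is rejected as a wrong answer? Please advise.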

Why using MurmurHash instead of String.hashCode()?

My profiling shows that String.getBytes() is quite slow, and MurmurHash calls getBytes() on every single hash because it is byte-oriented.

If the primary purpose is speeding up hashing, why not use String.hashCode() instead?
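
To make the trade-off concrete, here is a minimal sketch of a hash that reads a string's characters directly and therefore never calls getBytes(). It uses FNV-1a, not MurmurHash, purely because FNV-1a is short enough to show in full; the class name `CharHash` is hypothetical and nothing here is taken from FNLP's code.

```java
// Hypothetical illustration: a hash computed from chars, with no byte[] allocation.
final class CharHash {
    private static final int FNV_OFFSET_BASIS = 0x811C9DC5;
    private static final int FNV_PRIME = 16777619;

    // 32-bit FNV-1a over the UTF-16 code units of s, low byte first, then high byte.
    static int fnv1a(CharSequence s) {
        int hash = FNV_OFFSET_BASIS;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            hash = (hash ^ (c & 0xFF)) * FNV_PRIME; // low byte of the code unit
            hash = (hash ^ (c >>> 8)) * FNV_PRIME;  // high byte of the code unit
        }
        return hash;
    }

    public static void main(String[] args) {
        String word = "分词";
        System.out.println("fnv1a:    " + fnv1a(word));
        // String.hashCode() also works on chars and caches its result per String instance.
        System.out.println("hashCode: " + word.hashCode());
    }
}
```

Whether a char-based hash is an acceptable substitute depends on why MurmurHash was chosen, for example for its distribution quality or for a fixed, documented output; the sketch only shows that the byte conversion itself is avoidable.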

NullPointerException when running the demo

Following the QuickTutorial, I set everything up step by step in Eclipse. Running the demo throws java.lang.NullPointerException. Looking at the source, it appears to be thrown at labels = factory.DefaultLabelAlphabet();, which is in the constructor public AbstractTagger(String file) throws LoadModelException of class org.fnlp.nlp.cn.tag.AbstractTagger.

Please advise.

A small part-of-speech tagging problem in version 2.1 (punctuation misrecognized)

From the Quick Start:
java -Xmx1024m -Dfile.encoding=UTF-8 -classpath "fnlp-core/target/fnlp-core-2.1-SNAPSHOT.jar:libs/trove4j-3.0.3.jar:libs/commons-cli-1.2.jar" org.fnlp.nlp.cn.tag.POSTagger -s models/seg.m models/pos.m "周杰伦出生于台 湾,生日为79年1月18日,他曾经的绯闻女友是蔡依林。"

Output:
周杰伦/人名 出生/动词 于/介词 **/地名 ,/动词 生日/名词 为/介词 79年/时间短语 1月/时间短语 18日/时间短语 ,/标点 他/人称代词 曾经/形容词 的/结构助词 绯闻/名词 女友/名词 是/动词 蔡依林/人名 。/标点

The two commas are tagged inconsistently: the first as 动词 (verb) and the second as 标点 (punctuation).

Problem installing the package hosted on Maven

After configuring pom.xml, running
mvn clean install produces the following error:
[INFO] ------------------------------------------------------------------------
Downloading: https://oss.sonatype.org/content/repositories/snapshots/org/fnlp/fnlp-core/2.0/fnlp-core-2.0.pom
Downloading: https://repo.maven.apache.org/maven2/org/fnlp/fnlp-core/2.0/fnlp-core-2.0.pom
Downloaded: https://repo.maven.apache.org/maven2/org/fnlp/fnlp-core/2.0/fnlp-core-2.0.pom (5 KB at 4.4 KB/sec)
Downloading: https://oss.sonatype.org/content/repositories/snapshots/org/fnlp/fnlp-all/2.0-SNAPSHOT/maven-metadata.xml
Downloading: https://oss.sonatype.org/content/repositories/snapshots/org/fnlp/fnlp-all/2.0-SNAPSHOT/fnlp-all-2.0-SNAPSHOT.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.394 s
[INFO] Finished at: 2015-10-17T14:55:30-04:00
[INFO] Final Memory: 10M/305M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project cip-guuud-lib: Could not resolve dependencies for project batc-cip:cip-guuud-lib:pom:0.0.1-SNAPSHOT: Failed to collect dependencies at org.fnlp:fnlp-core:jar:2.0: Failed to read artifact descriptor for org.fnlp:fnlp-core:jar:2.0: Could not find artifact org.fnlp:fnlp-all:pom:2.0-SNAPSHOT in sonatype-snapshots (https://oss.sonatype.org/content/repositories/snapshots) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal on project cip-guuud-lib: Could not resolve dependencies for project batc-cip:cip-guuud-lib:pom:0.0.1-SNAPSHOT: Failed to collect dependencies at org.fnlp:fnlp-core:jar:2.0

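The rest is omitted. This project uses FNLP, Stanford nlp-core, and OpenNLP; FNLP is used mainly for analyzing Chinese. The problem is that the artifact cannot be downloaded correctly from the Maven-hosted repository. How can this be resolved? Many thanks.

Test program fails to run

Running the following code produces an error: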

	CNFactory factory = CNFactory.getInstance("models");
	HashMap<String, String> result = factory.ner("詹姆斯·默多克和丽贝卡·布鲁克斯 鲁珀特·默多克旗下的美国小报《纽约邮报》的职员被公司律师告知,保存任何也许与电话窃听及贿赂有关的文件。");

	// Print the tagging result
	System.out.println(result);

The following error is reported:

Exception in thread "main" java.lang.NoClassDefFoundError: gnu/trove/map/hash/TCharCharHashMap
	at org.fnlp.nlp.cn.ChineseTrans.ensureST(ChineseTrans.java:54)
	at org.fnlp.nlp.cn.ChineseTrans.<init>(ChineseTrans.java:48)
	at org.fnlp.nlp.cn.CNFactory.<clinit>(CNFactory.java:54)
	at Test.main(Test.java:9)
Caused by: java.lang.ClassNotFoundException: gnu.trove.map.hash.TCharCharHashMap
	at java.net.URLClassLoader.findClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	... 4 more

I downloaded Trove from the link you provided, but it does not contain the TCharCharHashMap class.
How can I solve this problem?

Build error in Eclipse
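
A quick way to see whether a particular jar on the classpath provides the class named in the trace is to ask the class loader for it directly. The sketch below is plain Java and makes no claim about which Trove release contains the class; `ClasspathCheck` and `isAvailable` are hypothetical names.

```java
// Hypothetical helper: reports whether a class can be found on the current classpath.
final class ClasspathCheck {
    // Returns true if the class can be loaded by this class's loader, without initializing it.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class name is copied from the stack trace above.
        String name = "gnu.trove.map.hash.TCharCharHashMap";
        System.out.println(name + " available: " + isAvailable(name));
    }
}
```

Running it with the same -classpath value as the failing program shows whether the jar that was downloaded is the one being picked up and whether it contains that package.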


Test set: org.fnlp.nlp.cn.tag.CWSTaggerTest

Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.209 sec <<< FAILURE!
testTagString2(org.fnlp.nlp.cn.tag.CWSTaggerTest) Time elapsed: 0.017 sec <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at org.fnlp.nlp.cn.tag.CWSTaggerTest.testTagString2(CWSTaggerTest.java:65)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

Running on Windows: the classpath must be changed to start with .; (with ; as the separator between entries)

java -Xmx1024m -Dfile.encoding=UTF-8 -classpath ".;fnlp-core/target/fnlp-core-2.1-SNAPSHOT.jar;libs/trove4j-3.0.3.jar;libs/commons-cli-1.2.jar" org.fnlp.nlp.cn.tag.POSTagger -s models/seg.m models/pos.m "周杰伦出生于**,生日为79年1月18日,他曾经的绯闻女友是蔡依林。"
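
The difference between this command and the Quick Start one is the classpath separator: ; on Windows and : on Unix-like systems. In Java code, `File.pathSeparator` holds the separator for the platform the JVM is running on, so a launcher script or tool can build the string portably. The sketch below is an illustration only; `ClasspathBuilder` is a hypothetical name and the jar paths are copied from the command above.

```java
import java.io.File;

// Hypothetical helper: builds a classpath string for a given separator.
final class ClasspathBuilder {
    // Joins the entries with the separator (";" on Windows, ":" on Unix-like systems).
    static String join(String separator, String... entries) {
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        // File.pathSeparator is the separator of the platform this JVM runs on.
        String classpath = join(File.pathSeparator,
                ".",
                "fnlp-core/target/fnlp-core-2.1-SNAPSHOT.jar",
                "libs/trove4j-3.0.3.jar",
                "libs/commons-cli-1.2.jar");
        System.out.println(classpath);
    }
}
```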

Is 之字结构 a typo?

In class org.fnlp.nlp.corpus.ctbconvert.DependentTreeProducter, is the makeFirstClass method a typo, or is it intentionally defined this way?
614: else if(is之字结构(root.label))
root.label.setDepClass("的字结构");

[Anaphora] java.lang.NullPointerException in the resolve method

import java.util.LinkedList;
// FNLP imports (Anaphora, EntityGroup) are omitted, as in the original report.

public class AnaphoraResolution {
    public static void main(String args[]) throws Exception {
        String str2 = "复旦大学创建于1905年,它位于上海市,这个大学培育了好多优秀的学生。";

        // Segmented words and their part-of-speech tags, aligned by index.
        String str3[] = {"复旦","大学","创建","于","1905年",",","它","位于","上海市",",","这个","大学","培育","了","好多","优秀","的","学生","。"};
        String str4[] = {"专有名","名词","动词","介词","时间短语","标点","代词","动词","专有名","标点","限定词","名词","动词","动态助词","数词","形容词","结构助词","名词","标点"};
        String str5[][][] = new String[1][2][str3.length];
        str5[0][0] = str3;
        str5[0][1] = str4;
        Anaphora aa2 = new Anaphora("../models/ar.m");
        LinkedList<EntityGroup> res3 = aa2.resolve(str5, str2);
        System.out.println(res3);
    }
}

When using the anaphora resolver, the line LinkedList res3 = aa2.resolve(str5,str2);
always throws java.lang.NullPointerException. I tried every method in the demo and they all have this problem.

Tests fail

Running the command mvn clean package:


Results :

Tests in error:
org.fnlp.nlp.cn.tag.POSTaggerTest: java.io.FileNotFoundException: ../models/pos.m (No such file or directory)
org.fnlp.nlp.cn.tag.CWSTaggerTest: java.io.FileNotFoundException: ../models/seg.m (No such file or directory)

Tests run: 33, Failures: 0, Errors: 2, Skipped: 0

Highlighter output is wrong when using FNLPAnalyzer

QueryStr: 太平洋
With SmartChineseAnalyzer,
the result is: 沃克环流 在赤道附近的<font color='red'>太平洋</font>海区,信风驱使着赤道暖流自东向西流。
With FNLPAnalyzer,
the result is: 沃克环流 在赤道附近<font color='red'>的太</font><font color='red'>在赤</font>平洋海<font color='red'>在赤</font><font color='red'> 在</font>区,信风驱使着赤道暖流自东向西流。
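
Highlighted fragments such as 的太 and 在赤 look like the result of token offsets that do not line up with the original text, since a highlighter marks the character range each matching token reports. One way to investigate is to collect each token's term and offsets from the analyzer and check them against the input. The sketch below is plain Java and does not call Lucene or FNLP: the `Token` class and the `misaligned` method are hypothetical, and the tokens would have to be filled in from the analyzer under test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic: finds tokens whose offsets do not point at their own term.
final class OffsetCheck {
    // A term with its start (inclusive) and end (exclusive) character offsets.
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Returns the tokens whose offsets are out of range or do not cover their term in the text.
    static List<Token> misaligned(String text, List<Token> tokens) {
        List<Token> bad = new ArrayList<>();
        for (Token t : tokens) {
            boolean inRange = t.start >= 0 && t.start <= t.end && t.end <= text.length();
            if (!inRange || !text.substring(t.start, t.end).equals(t.term)) {
                bad.add(t);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String text = "赤道附近的太平洋海区";
        List<Token> tokens = Arrays.asList(
                new Token("太平洋", 5, 8),  // offsets cover the term
                new Token("太平洋", 4, 6)); // offsets cover other characters instead
        for (Token t : misaligned(text, tokens)) {
            System.out.println(t.term + " reported at [" + t.start + ", " + t.end + ")");
        }
    }
}
```

An analyzer that normalizes terms (for example, by converting between character variants) can produce mismatches that are not bugs, so the output is a starting point for inspection rather than proof of an error.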

Downloaded code is missing classes

org.fnlp.wsytry.MultiCorpusClusterTagger
reports the following classes as missing:
org.fnlp.corpus.transform.tree.RelationalTree;
org.fnlp.ml.classifier.struct.inf.MultiCorpusViterbi;
org.fnlp.ml.classifier.struct.update.MultiCorpusViterbiPAUpdate;

NullPointerException when invoking CNFactory.parse2T

[java] Exception in thread "main" java.lang.NullPointerException
[java] at org.fnlp.nlp.parser.dep.JointParser._getBestParse(JointParser.java:128)
[java] at org.fnlp.nlp.parser.dep.JointParser.parse2T(JointParser.java:220)
[java] at org.fnlp.nlp.parser.dep.JointParser.parse2T(JointParser.java:230)
[java] at org.fnlp.nlp.cn.CNFactory.parse2T(CNFactory.java:306)

I believe the code that causes this error is in the function "private Predict estimateActions(JointParsingState state)" in file "JointParser.java", on line 167:
for (int i = 0; i < 2; i++) {
    Integer guess = ret.getLabel(i);
    if (guess == null) // bug: may be null, to be fixed. xpqiu
        break;
    String action = la.lookupString(guess);
    result.add(action, ret.getScore(i));
}

Could you please help solve or kindly give some tips on solving this problem?

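Problem with entity recognition

Running the org.fnlp.demo.nlp.NamedEntityRecognition class under fnlp-demo throws an error. The cause seems to be that part-of-speech tagging can produce the label 实体名, but the org.fnlp.nlp.cn.PartOfSpeech enum has no constant with that name, so that class's isEntiry function fails.

CNFactory's .ner method should not be static

The example in the Quick Tutorial on the Wiki page:

public static void main(String[] args) throws Exception {

    // Create the Chinese processing factory and initialize it with the model files in the "models" directory
    CNFactory factory = CNFactory.getInstance("models");

    // Tag a sentence that contains entity names and get the result
    HashMap result = factory.ner("詹姆斯·默多克和丽贝卡·布鲁克斯 鲁珀特·默多克旗下的美国小报《纽约邮报》的职员被公司律师告知,保存任何也许与电话窃听及贿赂有关的文件。");

    // Print the tagging result
    System.out.println(result);
}

Because .ner is a static method, the IDE's code hints give a warning/suggestion that .ner should be called through the class name CNFactory, but calling it that way returns null.

Trying to run FNLP in Hadoop, but models/pos.m cannot be found

I placed a copy in each of the following locations:
in the submitted program
in hadoop/lib, as loose files
in hadoop/lib, as a jar
in HDFS at /home//models
in HDFS at /models
Where should it actually go?
Or should I try putting it in /tmp?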
