fudannlp's Issues

Where can I download the documentation for POS tagging?

Original issue reported on code.google.com by [email protected] on 14 Mar 2012 at 2:41

php jdk

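class NplRequest{
    // Base URL of the FudanNLP web service.
    private $fudanUrl = "http://jkx.fudan.edu.cn/fudannlp/";
    private $connecttimeout = 20;   // seconds allowed to establish the connection
    private $timeout = 10;          // seconds allowed for the whole transfer
    private $ssl_verifypeer = FALSE;

    public $http_code;
    public $http_info = array();
    public $url;

    // Send $str to the service endpoint named by $key and return the raw response body.
    function npl($key,$str){
        // Percent-encode the text so that Chinese characters and spaces survive in the URL path.
        $response = $this->http($this->fudanUrl.$key."/".rawurlencode($str),"GET","");
        return $response;
    }

    // Perform the HTTP request with cURL and record the status code and transfer info.
    function http($url,$method,$param){
        $ci = curl_init();
        curl_setopt($ci, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
        curl_setopt($ci, CURLOPT_CONNECTTIMEOUT, $this->connecttimeout);
        curl_setopt($ci, CURLOPT_TIMEOUT, $this->timeout);
        curl_setopt($ci, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($ci, CURLOPT_ENCODING, "");
        curl_setopt($ci, CURLOPT_SSL_VERIFYPEER, $this->ssl_verifypeer);
        // Only 0 and 2 are meaningful here; 1 is no longer accepted by current cURL versions.
        curl_setopt($ci, CURLOPT_SSL_VERIFYHOST, $this->ssl_verifypeer ? 2 : 0);
        curl_setopt($ci, CURLOPT_HEADER, FALSE);

        if($method == "POST"){
            curl_setopt($ci, CURLOPT_POST, TRUE);
            curl_setopt($ci, CURLOPT_POSTFIELDS, $param);
        }elseif($param != ""){
            // Append a query string only when there is one.
            $url = "{$url}?{$param}";
        }
        curl_setopt($ci, CURLOPT_URL, $url );
        curl_setopt($ci, CURLINFO_HEADER_OUT, TRUE );

        $response = curl_exec($ci);
        $this->http_code = curl_getinfo($ci, CURLINFO_HTTP_CODE);
        $this->http_info = array_merge($this->http_info, curl_getinfo($ci));
        $this->url = $url;
        curl_close ($ci);
        return $response;
    }
}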

Original issue reported on code.google.com by [email protected] on 13 May 2013 at 9:24

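How do I use custom POS tags?

While testing POSTagger, I added a custom POS tag to dict_pos.txt, and running it produced this error:
自定义词性标签只能在下面列表中:... (Custom POS tags must be among those in the following list: ...)

How can I use custom POS tags?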

Original issue reported on code.google.com by [email protected] on 5 Jun 2013 at 8:02

A small bug

The POS tagging result for “今天好不热闹” is:
今天/时间短语 (time phrase)
好不热闹/标点 (punctuation)

Original issue reported on code.google.com by [email protected] on 18 May 2013 at 9:01

Some bad cases

浙江省了大批投资
浙江省了解这个情况的人不多
从北京经济南下徐州
发展**家服装需求大增
我们提供高档和服务必前来选购
我们提供高档设备和服务。
这台计算机系统盘出了故障
丹东西安全是我喜欢的地方
南京的市长江大桥说南京市长江大桥好长
这事儿的确定不下来

Original issue reported on code.google.com by [email protected] on 12 Jul 2013 at 10:41

The parser throws an out-of-memory exception. What should I do?

What steps will reproduce the problem?
1. Syntactic parsing throws an exception:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.lang.reflect.Array.newArray(Native Method)
    at java.lang.reflect.Array.newInstance(Unknown Source)

Original issue reported on code.google.com by [email protected] on 21 Jul 2013 at 8:42

FudanNLP-bin-0.95.zip: POS tagging fails

I downloaded the project, imported it into Eclipse, and ran the example code PartsOfSpeechTag.java, which produced the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.lang.reflect.Array.newArray(Native Method)
    at java.lang.reflect.Array.newInstance(Array.java:52)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1630)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1322)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
    at edu.fudan.ml.struct.classifier.Linear.readModel(Linear.java:116)
    at edu.fudan.ml.struct.classifier.Linear.readModel(Linear.java:111)
    at edu.fudan.nlp.tag.POSTagger.<init>(POSTagger.java:42)
    at PartsOfSpeechTag.main(PartsOfSpeechTag.java:20)

Original issue reported on code.google.com by [email protected] on 2 Jun 2011 at 10:12

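WebSphere returns a 500 error

I noticed today that WebSphere is returning errors. When will it be fixed?
I set up a Java service of my own, but it needs a great deal of memory, which is hard to sustain. Please restart the service.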

Original issue reported on code.google.com by [email protected] on 27 May 2013 at 8:54

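How many entries does the segmentation dictionary contain?

Is the dictionary used for word segmentation compiled by the project itself or taken from another source, and roughly how many entries does it have?
Also, can users add their own dictionary with POS tags?

Looking forward to a reply.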

Original issue reported on code.google.com by [email protected] on 31 May 2013 at 2:06

POS tagging

Every paragraph starts with an empty POS annotation (a tag is present, but there is no word).


Original issue reported on code.google.com by [email protected] on 27 Mar 2013 at 2:07

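Version 1.5 parser test: is there a memory leak?

When running the DepParser class, it reports
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
After raising the memory on my machine to 512, the same error is still reported. Is there a memory leak, or is there some other cause?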

Original issue reported on code.google.com by [email protected] on 15 Mar 2013 at 8:11
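
One thing worth ruling out for heap-space reports like the one above is whether the heap setting actually reached the JVM that runs the example. The sketch below is not part of FudanNLP: `HeapCheck` and `maxHeapMiB` are hypothetical names, and it relies only on the standard `Runtime.maxMemory()` call. If the printed value is far below the value given with `-Xmx`, the option probably did not reach the JVM as a VM argument (an assumption about the cause, not something the report confirms).

```java
// Hypothetical helper, not part of FudanNLP: reports the heap limit the running JVM uses.
final class HeapCheck {

    /** Returns the maximum heap size, in mebibytes, as seen by the running JVM. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with the same VM arguments as the failing program, e.g. -Xmx512m.
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

Running this with the same launch configuration as the parser example shows the limit the model loader has to fit into; the reported value is usually slightly lower than the `-Xmx` setting.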

Error when calling the dependency parser

Dependency parsing in version 1.5 fails. Calling JointParser parser = new JointParser("models/dep.m"); reports an out-of-memory error (-Xmx2048m is already set):

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.lang.reflect.Array.newArray(Native Method)
    at java.lang.reflect.Array.newInstance(Unknown Source)
    at java.io.ObjectInputStream.readArray(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readArray(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readObject(Unknown Source)
    at edu.fudan.nlp.parser.dep.YamadaParser.loadModel(Unknown Source)
    at edu.fudan.nlp.parser.dep.YamadaParser.<init>(Unknown Source)
    at edu.fudan.nlp.parser.dep.JointParser.<init>(Unknown Source)
    at com.netease.wordseg.SegDemo.testFudanNLP(SegDemo.java:48)
    at com.netease.wordseg.SegDemo.main(SegDemo.java:72)

Original issue reported on code.google.com by [email protected] on 22 Nov 2012 at 9:34

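Are custom POS tags supported?

I added a custom POS tag to dict_pos.txt, and the test threw an exception:
edu.fudan.util.exception.LoadModelException: 自定义词性标签只能在下面列表中:... (Custom POS tags must be among those in the following list: ...)

Are custom POS tags supported?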

Original issue reported on code.google.com by [email protected] on 3 Jun 2013 at 4:40

Congratulations on the 1.0 release

Could you describe the new features in 1.0?

Are custom dictionaries and stop words supported?

Original issue reported on code.google.com by [email protected] on 1 Aug 2011 at 7:22

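Running the test edu.fudan.example.nlp.ChineseWordSegmentation from the examples in the download package produces the following errors:

Segmentation without a dictionary:
媒体 计算 研究所 成立 了 , 高级 数据 挖掘 ( data mining ) 很 难 。
媒体 计算 研究所 成立 了 , 高级 数据 挖掘 ( data mining ) 很 难 。

Setting a temporary dictionary: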
java.lang.ArrayIndexOutOfBoundsException: 1

    at edu.fudan.nlp.cn.tag.format.FormatCWS.toString(FormatCWS.java:82)
    at edu.fudan.nlp.cn.tag.CWSTagger.tag(CWSTagger.java:146)
    at edu.fudan.example.nlp.ChineseWordSegmentation.main(ChineseWordSegmentation.java:39)

Segmentation with a dictionary:
媒体计算研究所 成立 了 , 高级 数据挖掘 很 难

Segmentation with a non-strict dictionary:
媒体计算研究所 成立 了 , 高级 数据挖掘 很 难
我 送给 力学系 的 同学 一 个 玩具 ( 送给 给力 力学 力学系 都 在 词典 中 )

Processing a file:
java.lang.ArrayIndexOutOfBoundsException: 1
    at edu.fudan.nlp.cn.tag.format.FormatCWS.toString(FormatCWS.java:82)
    at edu.fudan.nlp.cn.tag.CWSTagger.tag(CWSTagger.java:146)
    at edu.fudan.nlp.cn.tag.CWSTagger.tag(CWSTagger.java:1)
    at edu.fudan.nlp.cn.tag.AbstractTagger.tagFile(AbstractTagger.java:124)
    at edu.fudan.nlp.cn.tag.AbstractTagger.tagFile(AbstractTagger.java:109)
    at edu.fudan.example.nlp.ChineseWordSegmentation.main(ChineseWordSegmentation.java:61)


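However, once English preprocessing is enabled, the errors above no longer occur.
That is, after commenting out the statement on line 32: tag.setEnFilter(false);
What is the cause?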

Original issue reported on code.google.com by [email protected] on 3 May 2013 at 3:18

demo cli fails

java -classpath fudannlp.jar;lib/commons-cli-1.2.jar;lib/trove
jar; edu.fudan.nlp.cn.tag.CWSTagger -s models/seg.m 
"复旦自然语言处理是垃圾。"

https://code.google.com/p/fudannlp/wiki/fudannlp_cli

Original issue reported on code.google.com by [email protected] on 9 May 2013 at 7:56

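How do I use custom POS tags?

While testing POSTagger, I added a custom POS tag to dict_pos.txt, and running it produced this error:
自定义词性标签只能在下面列表中:... (Custom POS tags must be among those in the following list: ...)

How can I use custom POS tags?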

Original issue reported on code.google.com by [email protected] on 5 Jun 2013 at 8:05

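Can it process large amounts of text?

I would like to know whether it can handle large volumes of data.
I want to feed files in
and then use the results for humanities research.

Thanks.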

Original issue reported on code.google.com by [email protected] on 19 Mar 2013 at 2:21

PHP

Is there any documentation on calling the interface from PHP?

Original issue reported on code.google.com by [email protected] on 25 Jul 2013 at 3:27

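Segmentation errors when a sentence contains Latin letters

With FudanNLP1.05, or using the online demo (http://jkx.fudan.edu.cn/nlp/fudannlp.do), segment the following sentence:

 VB对动态网页支持不够好

Expected result: at the very least there should be a boundary after the word VB: VB 对 动态 网页 支持 不够 好
Actual result: VB对 动态 网页 支持 不 够 好

If the training corpus contains this kind of error, then pre-processing or post-processing should be applied, using rules to split sentences that mix different character sets.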

Original issue reported on code.google.com by [email protected] on 5 Jun 2013 at 1:28
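
The rule-based pre-processing suggested in the report above could look roughly like the following sketch, which inserts a space wherever an ASCII letter or digit sits directly next to a CJK ideograph before the text is handed to a segmenter. `ScriptBoundary` and `separateScripts` are hypothetical names, the character ranges are a simplifying assumption (ASCII alphanumerics and the basic CJK block U+4E00 to U+9FFF only), and nothing here is FudanNLP's own API.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing helper, not part of FudanNLP.
final class ScriptBoundary {

    // Zero-width positions where an ASCII letter/digit meets a CJK ideograph, in either order.
    private static final Pattern BOUNDARY = Pattern.compile(
            "(?<=[A-Za-z0-9])(?=[\\u4e00-\\u9fff])|(?<=[\\u4e00-\\u9fff])(?=[A-Za-z0-9])");

    /** Inserts a space at every boundary between the two scripts. */
    static String separateScripts(String sentence) {
        return BOUNDARY.matcher(sentence).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Prints: VB 对动态网页支持不够好
        System.out.println(separateScripts("VB对动态网页支持不够好"));
    }
}
```

Whether the segmenter then treats the inserted space as a hard boundary depends on the tagger and its settings, so the output of this step would still need to be checked against the actual model.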

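NERTagger throws an exception on text that starts with a space

Version: 1.0

Steps to reproduce:
1. Build a text that starts with a space (half-width or full-width).
2. Create a NERTagger object and load the model.
3. Process the text with this NERTagger's tag method.

Actual result:
The tag function returns an empty hash and does not throw an exception, but the following stack trace is printed to standard error: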
java.lang.ArrayIndexOutOfBoundsException: -1
    at edu.fudan.ml.inf.struct.LinearViterbi.getPath(LinearViterbi.java:100)
    at edu.fudan.ml.inf.struct.AbstractViterbi.getBest(AbstractViterbi.java:21)
    at edu.fudan.ml.classifier.Linear.predict(Linear.java:42)
    at edu.fudan.nlp.tag.NERTagger.tag(NERTagger.java:32)
    at com.github.wks.tdtutils.segment.FudanNER.tag(FudanNER.java:20)
    at nerdiagnose.NerDiagnose.main(NerDiagnose.java:22)

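Expected result:
1. Leading spaces should be ignored.
2. It is reasonable to require that NERTagger only handle well-formed sentences or passages, but if the input is invalid, then:
   1. If the error can be tolerated, the correct result should be returned and no exception should be displayed. If the error needs to be recorded, a logging framework (such as slf4j) should be used.
   2. If the error is fatal, the exception should be thrown immediately and the program should not continue.

In short, handling an exception with e.printStackTrace() in a catch block and then letting the program continue is unreliable.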

Original issue reported on code.google.com by [email protected] on 7 Sep 2011 at 2:51
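
As a caller-side workaround for the report above, leading spaces can be stripped before the text reaches the tagger. The sketch below is an illustration under stated assumptions, not a fix inside FudanNLP: `LeadingSpace` and `stripLeadingSpaces` are hypothetical names, and only the half-width space (U+0020) and the full-width space (U+3000) mentioned in the report are handled. `String.trim()` alone would not be enough, because it leaves U+3000 in place.

```java
// Hypothetical caller-side helper, not part of FudanNLP.
final class LeadingSpace {

    /** Removes leading half-width spaces (U+0020) and full-width spaces (U+3000). */
    static String stripLeadingSpaces(String text) {
        int i = 0;
        while (i < text.length() && (text.charAt(i) == ' ' || text.charAt(i) == '\u3000')) {
            i++;
        }
        return text.substring(i);
    }

    public static void main(String[] args) {
        // Prints: [复旦大学位于上海]
        System.out.println("[" + stripLeadingSpaces(" \u3000复旦大学位于上海") + "]");
    }
}
```

The cleaned string would then be passed to the tagger in place of the raw input; inner and trailing spaces are deliberately left untouched.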

Person-name recognition is inaccurate

What steps will reproduce the problem?
1.人是会死的,柏拉图是人

Person-name recognition is not very accurate.


Original issue reported on code.google.com by [email protected] on 25 Jan 2013 at 6:53

After SetTagType("en"), all tags become null

        POSTagger posTagger = new POSTagger("./models/seg.m", "./models/pos.m");
        posTagger.SetTagType("en");
        System.out.println(posTagger.tag("Paper is a thin material mainly used for writing upon, printing upon, drawing or for packaging."));

The POS output is as follows:
Paper/null is/null a/null thin/null material/null mainly/null used/null 
for/null writing/null upon/null ,/null printing/null upon/null ,/null 
drawing/null or/null for/null packaging/null ./null

Original issue reported on code.google.com by [email protected] on 29 Mar 2013 at 3:01

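Can it be called from Python?

I would like to call fudannlp from Python.
I know that Java can be called from Python,
but I do not know whether the fudannlp package can be imported directly in Python.
Thanks.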


Original issue reported on code.google.com by [email protected] on 12 Jun 2013 at 6:44

A segmentation case that needs improvement

Steps to reproduce
1. Segment the Chinese text “穿上日本和服装嫩”

Expected result
“穿上 日本 和服 装嫩”

Actual result
“穿 上 日本 和 服装嫩”


Version used
webservice: http://jkx.fudan.edu.cn/fudannlp/


Original issue reported on code.google.com by [email protected] on 6 Apr 2011 at 1:44

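Segmentation bug in the version 1.05 segmenter

The version 1.05 segmenter does not handle punctuation and English words very well.

        tag = new CWSTagger("./models/seg.c7.110918.gz", "./models/dict.txt");
        System.out.println("\n使用词典");
        str = "今天的#NEXT WAVE#新星是一位“天之骄子”";
        s = tag.tag(str);
        System.out.println(s);

Input: 今天的#NEXT WAVE#新星是一位“天之骄子”
Here #NEXT WAVE# is split into #NEXT/WAVE#

Input: 今天的NEXT WAVE新星是一位“天之骄子”
Here NEXT WAVE is merged into NEXTWAVE

None of these words are in the custom dictionary. Is there some special segmentation configuration I am missing?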

Original issue reported on code.google.com by [email protected] on 30 May 2012 at 6:33
