
Comments (6)

anderscui commented on August 16, 2024

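@nelson6699520 Ah, I hadn't noticed that jieba had been updated... This does need updating, and the changes will mainly follow whatever jieba itself changed.

Also, do you have any other word-segmentation requirements? If possible, I'd still like the .NET version to have some content of its own :)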

from jieba.net.

nelson6699520 commented on August 16, 2024

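I think jieba lacks some of the more practical segmentation features, for example: the tokenizer has no stop-word support; there is no conversion of Chinese numerals to Arabic numerals; and there is no dictionary-based synonym feature. (If any of these already exist, please let me know.)

I have also noticed that jieba.net uses a fairly large amount of memory. I don't know whether that can be optimized, but I hope the library keeps getting better.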
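For the Chinese-numeral request, here is a minimal Python sketch of what such a conversion could look like. `chinese_to_int` and its lookup tables are hypothetical names, not part of jieba or jieba.NET. It only handles plain integers and does not support stacked large units such as 万亿.

```python
# Hypothetical helper for illustration; not part of jieba or jieba.NET.
DIGITS = {'零': 0, '〇': 0, '一': 1, '二': 2, '两': 2, '兩': 2, '三': 3,
          '四': 4, '五': 5, '六': 6, '七': 7, '八': 8, '九': 9}
SMALL_UNITS = {'十': 10, '百': 100, '千': 1000}
BIG_UNITS = {'万': 10**4, '萬': 10**4, '亿': 10**8, '億': 10**8}


def chinese_to_int(text):
    """Convert a simple Chinese numeral such as '三百二十一' to an int."""
    total = 0    # sum of sections already closed by a big unit (万, 亿)
    section = 0  # value accumulated inside the current section (< 10000)
    digit = 0    # last digit seen, still waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A unit with no digit before it (as in '十五') counts as one unit.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in BIG_UNITS:
            total += (section + digit) * BIG_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError('unsupported character: %r' % ch)
    return total + section + digit


print(chinese_to_int('三百二十一'))
```

In a segmenter, a step like this would presumably run on tokens already tagged as numerals rather than on raw text.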


anderscui commented on August 16, 2024

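@nelson6699520 Thanks. The stop-word handling does need improving here, and I feel the same way. For synonyms I would need to find a good corpus, so give me some time to look into it :)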
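As a rough illustration of the stop-word step, the Python sketch below filters a token list after segmentation. The `STOP_WORDS` set, the `remove_stop_words` name, and the sample tokens are all assumptions made for the example; this is not the jieba.NET API.

```python
# Hypothetical stop-word set; a real list would normally be loaded from a file.
STOP_WORDS = {'的', '了', '很', '是'}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are neither blank nor stop words, in order."""
    return [t for t in tokens if t.strip() and t not in stop_words]


# Assumed segmenter output for a short sample sentence.
tokens = ['他', '看起来', '很', '高兴']
filtered = remove_stop_words(tokens)
print(filtered)
```

Passing the stop-word set as a parameter lets callers supply their own list without touching the segmenter itself.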


Linusp commented on August 16, 2024

@anderscui For synonyms, Open Multilingual Wordnet would do, though I'm not sure how good the quality of its content is.

from nltk.corpus import wordnet as wn

# Print the Mandarin ('cmn') lemma names of every synset that contains the word
for syn in wn.synsets('快乐', lang='cmn'):
    print('/ '.join(syn.lemma_names(lang='cmn')))

Output:

快乐/ 愉快
喜悦/ 快乐/ 快活/ 欢乐/ 欢喜/ 高兴
好运气/ 幸福/ 幸运/ 快乐

Note: the code above requires downloading the omw corpus first:

import nltk

nltk.download('omw')


anderscui commented on August 16, 2024

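@nelson6699520 @Linusp In what scenarios do you think synonyms need to be returned, and what would you expect the result to look like?

For example: 他看起来很高兴。 ("He looks very happy.")

After segmentation this sentence contains the token “高兴”. If I want to look up its synonyms, I could do it like this: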

var w = "高兴";
var syns = dict.synonym(w);
// syns => { "快乐", "欢喜", "欢乐" } 

Or should the segmentation result include the synonyms directly?


nelson6699520 commented on August 16, 2024

I would expect the segmentation result to include the synonyms.
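To make the two options discussed here concrete, the Python sketch below contrasts a separate lookup with segmentation output that already carries synonyms. The `SYNONYMS` table and both function names are hypothetical, the entries simply mirror the example earlier in the thread, and the token list is an assumed segmentation result.

```python
# Hypothetical synonym table; the entry mirrors the example in this thread.
SYNONYMS = {'高兴': ['快乐', '欢喜', '欢乐']}


def lookup_synonyms(word, table=SYNONYMS):
    """Option 1: look up synonyms for one word after segmentation."""
    return list(table.get(word, []))


def attach_synonyms(tokens, table=SYNONYMS):
    """Option 2: return (token, synonyms) pairs as the segmentation result."""
    return [(t, lookup_synonyms(t, table)) for t in tokens]


tokens = ['他', '看起来', '很', '高兴']  # assumed segmentation result
annotated = attach_synonyms(tokens)
for token, syns in annotated:
    print(token, syns)
```

Option 2 can be built on top of option 1, so offering the lookup first would not rule out adding synonym-annotated output later.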
