Comments (6)
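@yukaizhao, easy_install was split because I had not included '_' among the characters that can form part of a word. The half-width (ASCII) space was indeed being yielded; filtering it out is enough. "好用的" was cut into single characters because "好" and "用" each have a high single-character probability, unless you raise the probability of "好用" in a custom dictionary.
Run git pull and see whether the test_userdict.py example meets your needs.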
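To make the underscore point concrete, here is a minimal sketch of how the set of word-forming characters changes the result. The two patterns are hypothetical stand-ins written for this example, not jieba's actual internals.

```python
import re

# Hypothetical patterns for illustration; not jieba's real regular expressions.
WITHOUT_UNDERSCORE = re.compile(r"[A-Za-z0-9]+")
WITH_UNDERSCORE = re.compile(r"[A-Za-z0-9_]+")


def ascii_words(text, pattern):
    """Return the runs of word-forming characters that `pattern` finds in `text`."""
    return pattern.findall(text)


sample = "easy_install is great"
print(ascii_words(sample, WITHOUT_UNDERSCORE))  # ['easy', 'install', 'is', 'great']
print(ascii_words(sample, WITH_UNDERSCORE))     # ['easy_install', 'is', 'great']
```

Once '_' counts as a word-forming character, easy_install stays in one piece.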
from jieba.
Result:
easy_install
is
great
python
的
正则表达式
是
好用
的
·
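Thanks for the reply.
The underscore _ should not be treated as a word separator. I hope this gets fixed in a new stable release.
Also, "好" and "用" do have high probabilities as single characters, but when they appear together they should not be cut into two words. Could this be addressed in the algorithm? I hope this can be improved as well.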
Also, yielding half-width spaces serves no purpose. I suggest handling this inside jieba rather than making every jieba user filter out the spaces.
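For anyone on a version that still yields spaces, a caller-side filter could look roughly like the sketch below. The helper name `drop_blank_tokens` and the sample token list are made up for illustration; the list stands in for a segmenter's output.

```python
def drop_blank_tokens(tokens):
    """Yield only tokens that contain at least one non-whitespace character."""
    for token in tokens:
        if token.strip():
            yield token


# Stand-in for a segmenter's output; this token list is invented for the example.
raw_tokens = ["easy_install", " ", "is", " ", "great"]
print(list(drop_blank_tokens(raw_tokens)))  # ['easy_install', 'is', 'great']
```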
@yukaizhao, spaces were filtered out in yesterday's commit.
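@yukaizhao, this is not a problem with the algorithm; the main cause is that some of the word frequencies in the dictionary are inaccurate.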
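The effect of word frequency can be seen in a toy example. What follows is a simplified unigram segmenter written for this thread, not jieba's implementation, and every count in `toy_freq` is invented. It picks the split whose word probabilities multiply to the largest value, so the same algorithm gives different answers depending only on the frequencies it is fed.

```python
import math


def segment(text, freq, max_word_len=4):
    """Return the split of `text` with the highest product of unigram probabilities.

    Returns an empty list if `text` cannot be covered by words in `freq`.
    """
    total = sum(freq.values())
    # best[i] holds (log-probability, words) for the best split of text[:i].
    best = [(0.0, [])] + [(-math.inf, [])] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word not in freq:
                continue
            score = best[start][0] + math.log(freq[word] / total)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[-1][1]


# Toy counts, invented for illustration; not taken from jieba's dictionary.
toy_freq = {"好": 5000, "用": 5000, "的": 50000, "好用": 1}
print(segment("好用的", toy_freq))  # ['好', '用', '的']

toy_freq["好用"] = 2000  # raise the frequency of the two-character word
print(segment("好用的", toy_freq))  # ['好用', '的']
```

With a count of 1, "好用" loses to the two frequent single characters; with a higher count it wins, which is the same effect as raising its frequency in a custom dictionary.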