Comments (4)
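Which version of es are you using, and which version of word?
Please refer to item 18 on the project home page:
18. ElasticSearch plugin:
1. Open a command line and change to the elasticsearch bin directory: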
cd elasticsearch-2.0.0-beta2/bin
2. Run the plugin script to install the word segmentation plugin:
./plugin install http://apdplat.org/word/archive/v1.3.zip
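Note when installing:
If you see:
ERROR: failed to download
or
Failed to install word, reason: failed to download
or
ERROR: incorrect hash (SHA1)
then run the command again. If it still fails, retry a couple more times.
For the elasticsearch 1.x series, use the following command instead: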
./plugin -u http://apdplat.org/word/archive/v1.3.1.zip -i word
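The retry advice in step 2 can be sketched as a small wrapper. This is only an illustration: the helper names (`install_with_retry`, `run_command`), the limit of 3 attempts, and the assumption that the plugin script exits with a non-zero code on a failed download are not part of the project documentation.

```python
import subprocess
from typing import Callable, Sequence

# Command copied from step 2 (elasticsearch 2.x form).
INSTALL_CMD = ["./plugin", "install", "http://apdplat.org/word/archive/v1.3.zip"]


def run_command(cmd: Sequence[str]) -> int:
    """Run cmd in the current directory and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def install_with_retry(
    cmd: Sequence[str] = INSTALL_CMD,
    attempts: int = 3,  # assumption: "a couple more times" read as 3 tries in total
    run: Callable[[Sequence[str]], int] = run_command,
) -> bool:
    """Return True as soon as one attempt exits with code 0, otherwise False."""
    for attempt in range(1, attempts + 1):
        # Assumption: a failed download makes the script exit with a non-zero code.
        if run(cmd) == 0:
            return True
        print(f"attempt {attempt} of {attempts} failed")
    return False
```

Calling `install_with_retry()` from inside the elasticsearch bin directory would run the 2.x command up to three times. For 1.x, pass the `./plugin -u ... -i word` form as `cmd`.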
3. Edit elasticsearch-2.0.0-beta2/config/elasticsearch.yml and add the following settings:
index.analysis.analyzer.default.type : "word"
index.analysis.tokenizer.default.type : "word"
4. Start ElasticSearch and check the result by opening the following URL in Chrome:
http://localhost:9200/_analyze?analyzer=word&text=杨尚川是APDPlat应用级产品开发平台的作者
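Outside a browser, the Chinese text in the query string usually has to be percent-encoded by hand. A minimal sketch using only the Python standard library follows. The function name `analyze_url` is made up for illustration, and the host and port are simply the ones from the URL above.

```python
from urllib.parse import urlencode


def analyze_url(text: str, analyzer: str = "word",
                base: str = "http://localhost:9200") -> str:
    """Build the _analyze URL from step 4 with a percent-encoded query string."""
    query = urlencode({"analyzer": analyzer, "text": text})
    return f"{base}/_analyze?{query}"


# Same sample sentence as in the URL above.
print(analyze_url("杨尚川是APDPlat应用级产品开发平台的作者"))
```

The printed URL can be passed to any HTTP client. The browser performs the same encoding automatically.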
5. Custom configuration
Edit the configuration file elasticsearch-2.0.0-beta2/plugins/word/word.local.conf
6. Specify the segmentation algorithm
Edit elasticsearch-2.0.0-beta2/config/elasticsearch.yml and add the following settings:
index.analysis.analyzer.default.segAlgorithm : "ReverseMinimumMatching"
index.analysis.tokenizer.default.segAlgorithm : "ReverseMinimumMatching"
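The values accepted by segAlgorithm are:
Forward maximum matching: MaximumMatching
Reverse maximum matching: ReverseMaximumMatching
Forward minimum matching: MinimumMatching
Reverse minimum matching: ReverseMinimumMatching
Bidirectional maximum matching: BidirectionalMaximumMatching
Bidirectional minimum matching: BidirectionalMinimumMatching
Bidirectional maximum-minimum matching: BidirectionalMaximumMinimumMatching
Full segmentation: FullSegmentation
Minimal word count: MinimalWordCount
Maximum Ngram score: MaxNgramScore
If none is specified, bidirectional maximum matching (BidirectionalMaximumMatching) is used by default.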
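Since a typo in the segAlgorithm name is easy to make, the two settings from step 6 can be generated from the list of values above. This is a sketch only: the helper `seg_algorithm_settings` is hypothetical, and whether the plugin itself rejects unknown or differently cased names is not stated here, so the check is just a guard on the caller's side.

```python
from typing import List

# The ten segAlgorithm values listed in step 6.
SEG_ALGORITHMS = (
    "MaximumMatching",
    "ReverseMaximumMatching",
    "MinimumMatching",
    "ReverseMinimumMatching",
    "BidirectionalMaximumMatching",
    "BidirectionalMinimumMatching",
    "BidirectionalMaximumMinimumMatching",
    "FullSegmentation",
    "MinimalWordCount",
    "MaxNgramScore",
)
DEFAULT_SEG_ALGORITHM = "BidirectionalMaximumMatching"


def seg_algorithm_settings(name: str = DEFAULT_SEG_ALGORITHM) -> List[str]:
    """Return the two elasticsearch.yml lines for the given segAlgorithm name."""
    if name not in SEG_ALGORITHMS:
        raise ValueError(f"unknown segAlgorithm: {name!r}")
    return [
        f'index.analysis.analyzer.default.segAlgorithm : "{name}"',
        f'index.analysis.tokenizer.default.segAlgorithm : "{name}"',
    ]


for line in seg_algorithm_settings("ReverseMinimumMatching"):
    print(line)
```

The printed lines have the same form as the two settings shown in step 6 and can be appended to config/elasticsearch.yml.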
from word.
First of all, thank you for your contribution. I am using es v2.0.0-beta2, and the word version is http://apdplat.org/word/archive/v1.3.zip
The following runs successfully in the browser:
http://localhost:9200/_analyze?analyzer=word&text=杨尚川是APDPlat应用级产品开发平台的作者
However, I cannot create an index in es.
from word.
This has been fixed. Please use elasticsearch-2.0.0-rc1 with http://apdplat.org/word/archive/v1.3.zip
from word.
I can create an index now. Thank you.
from word.