skroutz / elasticsearch-analysis-turkishstemmer
ElasticSearch analysis plugin providing Turkish stemming functionality
Çini (ceramics) stems to Çin (China)
Salı (Tuesday) stems to sal
kuma (second wife!) stems to kum (sand)
vibe stems to vip
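One common way to shield words like these from over-stemming is to consult a protected-words set before stemming. The stack trace further down mentions a parseProtectedWords method in the plugin's factory, but the sketch below does not reproduce the plugin's code: the class ProtectedWordsSketch and the toy stemmer are hypothetical names used only for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical wrapper: returns protected words unchanged, stems everything else. */
final class ProtectedWordsSketch {
    private final Set<String> protectedWords;
    private final UnaryOperator<String> stemmer;

    ProtectedWordsSketch(Set<String> protectedWords, UnaryOperator<String> stemmer) {
        // Defensive copy so later changes to the caller's set have no effect here.
        this.protectedWords = new HashSet<>(protectedWords);
        this.stemmer = stemmer;
    }

    /** Skips the stemmer entirely when the word is in the protected set. */
    String stem(String word) {
        return protectedWords.contains(word) ? word : stemmer.apply(word);
    }

    public static void main(String[] args) {
        // Toy stemmer (assumption, not the real algorithm):
        // drops the last character of words longer than three characters.
        UnaryOperator<String> toy =
            w -> w.length() > 3 ? w.substring(0, w.length() - 1) : w;

        ProtectedWordsSketch sketch =
            new ProtectedWordsSketch(Collections.singleton("kuma"), toy);

        System.out.println(sketch.stem("kuma")); // protected, printed unchanged: kuma
        System.out.println(sketch.stem("kumu")); // not protected, toy stem: kum
    }
}
```

Whether a given word belongs in such a list is a per-index decision; the sketch only shows where the lookup would sit relative to the stemmer.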
It seems that newer versions of ES may return null from Analysis.getWordList():
public static List<String> getWordList(Environment env, Settings settings,
        String settingPath, String settingList, boolean removeComments) {
    String wordListPath = settings.get(settingPath, null);
    if (wordListPath == null) {
        List<String> explicitWordList = settings.getAsList(settingList, null);
        if (explicitWordList == null) {
            return null;
    //(...)
This causes an NPE in the .isEmpty() check:
private CharArraySet parseExceptions(Environment env, Settings settings, String settingPrefix) {
    List<String> exceptionsList = Analysis.getWordList(env, settings, settingPrefix);
    if (exceptionsList.isEmpty()) {
In ES 5.x this was not an issue, because getReaderFromFile() was used and its result was checked for null:
    exceptionsReader = Analysis.getReaderFromFile(env, settings, settingPrefix);
} catch (InvalidPathException e) {
    logger.info("failed to find the " + settingPrefix + ", using the default set");
}
if (exceptionsReader != null) {
    try {
        exceptionsList = Analysis.loadWordList(exceptionsReader, "#");
The whole exception:
java.lang.NullPointerException: Cannot invoke "java.util.List.isEmpty()" because "exceptionsList" is null
at org.elasticsearch.index.analysis.TurkishStemmerTokenFilterFactory.parseExceptions(TurkishStemmerTokenFilterFactory.java:98) ~[?:?]
at org.elasticsearch.index.analysis.TurkishStemmerTokenFilterFactory.parseProtectedWords(TurkishStemmerTokenFilterFactory.java:52) ~[?:?]
at org.elasticsearch.index.analysis.TurkishStemmerTokenFilterFactory.<init>(TurkishStemmerTokenFilterFactory.java:30) ~[?:?]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:444) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenFilterFactories(AnalysisRegistry.java:280) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:215) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:438) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:655) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:558) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.validateTemplate(MetadataIndexTemplateService.java:1196) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.validateTemplate(MetadataIndexTemplateService.java:1149) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.addComponentTemplate(MetadataIndexTemplateService.java:265) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService$2.execute(MetadataIndexTemplateService.java:188) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219) [elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73) [elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) [elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) [elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.10.2.jar:7.10.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
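A minimal sketch of one possible guard for the null return described above: treat a null word list the same as an empty one before calling isEmpty(). This is not the plugin's actual fix. WordListGuard, orEmpty and the simplified parseExceptions signature are hypothetical stand-ins that use only the JDK, so that the example runs without Elasticsearch on the classpath. The fallback to a default set is an assumption based on the "using the default set" log message in the 5.x code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper illustrating a null-safe word list check. */
final class WordListGuard {
    private WordListGuard() {}

    /** Returns the given list, or an immutable empty list when it is null. */
    static List<String> orEmpty(List<String> words) {
        return words == null ? Collections.<String>emptyList() : words;
    }

    /**
     * Simplified stand-in for parseExceptions(): uses the configured words,
     * falling back to the defaults when nothing (null or empty) was configured.
     */
    static Set<String> parseExceptions(List<String> configured, Set<String> defaults) {
        List<String> words = orEmpty(configured);
        if (words.isEmpty()) {   // safe: words is never null here
            return defaults;
        }
        return new HashSet<>(words);
    }

    public static void main(String[] args) {
        Set<String> defaults = new HashSet<>(Arrays.asList("kuma"));
        // Simulates getWordList() returning null: no NPE, defaults are used.
        System.out.println(parseExceptions(null, defaults));                  // [kuma]
        // An explicit list overrides the defaults.
        System.out.println(parseExceptions(Arrays.asList("vibe"), defaults)); // [vibe]
    }
}
```

In the real factory the same idea would amount to a null check on exceptionsList right after the Analysis.getWordList() call.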
Hi,
We are trying to use the plugin with the settings below:
.../_settings -d '{"analysis" : {"filter" : {"stem-turkish" : {"type" : "turkish_stemmer"}}}}'
but we could not open the indices after that. You can find the logs below:
org.elasticsearch.indices.IndexCreationException: [netmoda] failed to create index
at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:298)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:312)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:181)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:444)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: failed to find token filter type [turkish_stemmer] for [stem-turkish]
at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:249)
at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:204)
at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:85)
at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:296)
... 7 more
Caused by: org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class setting [type] with value [turkish_stemmer]
at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:471)
at org.elasticsearch.common.settings.ImmutableSettings.getAsClass(ImmutableSettings.java:459)
at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:239)
... 15 more
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.index.analysis.turkishstemmer.TurkishStemmerTokenFilterFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:469)
ERROR: This plugin was built with an older plugin structure. Remove the intermediate "elasticsearch" directory within the plugin zip
"yüzme" ("swimming") and "yüzücü" ("swimmer") should have a common stem. However, we cannot use "yüz" since it means something else entirely ("face"). This could be a bit hard to tackle correctly.
How can I install the stemmer plugin on Elasticsearch 5.x? Is it compatible?
Lucene expects that TurkishStemmerTokenFilter is final, or at least that TurkishStemmerTokenFilter.incrementToken() is final. The check is in boolean org.apache.lucene.analysis.TokenStream.assertFinal():
private boolean assertFinal() {
    try {
        final Class<?> clazz = getClass();
        if (!clazz.desiredAssertionStatus())
            return true;
        assert clazz.isAnonymousClass() ||
            (clazz.getModifiers() & (Modifier.FINAL | Modifier.PRIVATE)) != 0 ||
            Modifier.isFinal(clazz.getMethod("incrementToken").getModifiers()) :
            "TokenStream implementation classes or at least their incrementToken() implementation must be final";
        return true;
    } catch (NoSuchMethodException nsme) {
        return false;
    }
}
In production, asserts are not evaluated, but when I'm testing or debugging my configuration with assertions enabled, this check fails and breaks Elasticsearch.
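The condition checked by assertFinal() can be reproduced with plain reflection, which shows the two ways a filter class can satisfy it. This is a sketch under stated assumptions: SketchStream, OpenFilter, FinalFilter, FinalMethodFilter and FinalCheck are hypothetical stand-ins, not Lucene or plugin classes, and the check is rewritten as a boolean method instead of an assert so that it runs without the -ea flag.

```java
import java.lang.reflect.Modifier;

/** Hypothetical stand-in for Lucene's TokenStream; not the real class. */
abstract class SketchStream {
    public abstract boolean incrementToken();
}

/** Neither the class nor incrementToken() is final: fails the rule. */
class OpenFilter extends SketchStream {
    @Override public boolean incrementToken() { return false; }
}

/** Option 1: declare the whole class final. */
final class FinalFilter extends SketchStream {
    @Override public boolean incrementToken() { return false; }
}

/** Option 2: keep the class open but make incrementToken() final. */
class FinalMethodFilter extends SketchStream {
    @Override public final boolean incrementToken() { return false; }
}

final class FinalCheck {
    private FinalCheck() {}

    /** Same three-way condition as the quoted assert, returned as a boolean. */
    static boolean satisfiesFinalRule(Class<?> clazz) {
        try {
            return clazz.isAnonymousClass()
                || (clazz.getModifiers() & (Modifier.FINAL | Modifier.PRIVATE)) != 0
                || Modifier.isFinal(clazz.getMethod("incrementToken").getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(satisfiesFinalRule(OpenFilter.class));        // false
        System.out.println(satisfiesFinalRule(FinalFilter.class));       // true
        System.out.println(satisfiesFinalRule(FinalMethodFilter.class)); // true
    }
}
```

Under these assumptions, declaring TurkishStemmerTokenFilter final, or just its incrementToken() method, would be enough to satisfy the assertion.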