
elasticsearch-analysis-turkishstemmer's People

Contributors

astathopoulos, bill-kolokithas, chief, greenonion, lovemeblender, m-peter, ptanov

elasticsearch-analysis-turkishstemmer's Issues

Stemming Issues

  • Çini (ceramics) stems to Çin (China)
  • Salı (Tuesday) stems to sal
  • kuma (second wife!) stems to kum (sand)
  • vibe stems to vip
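A protected-words list is one possible guard against this kind of over-stemming, and the plugin's factory does read one (TurkishStemmerTokenFilterFactory.parseProtectedWords appears in a stack trace further down). The Java sketch below is only an illustration under stated assumptions: toyStem is a hypothetical vowel-stripping stand-in, not the plugin's Turkish algorithm, and the protected set simply reuses the words reported above.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Sketch only: a protected-words check placed in front of a stemmer.
// toyStem is a hypothetical stand-in, NOT the plugin's stemming algorithm.
class ProtectedStemSketch {

    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Hypothetical protected-words list built from the examples in this issue.
    private static final Set<String> PROTECTED = Set.of("çini", "salı", "kuma");

    // Toy stemmer: strips one final vowel, mimicking the over-stemming reported above.
    static String toyStem(String word) {
        if (word.isEmpty() || "aeıioöuü".indexOf(word.charAt(word.length() - 1)) < 0) {
            return word;
        }
        return word.substring(0, word.length() - 1);
    }

    // Lower-cases with Turkish rules, then skips the stemmer for protected words.
    static String stem(String token, UnaryOperator<String> stemmer) {
        String lower = token.toLowerCase(TURKISH);
        return PROTECTED.contains(lower) ? lower : stemmer.apply(lower);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Çini", "Salı", "kuma", "kapı"}) {
            System.out.println(w + " -> " + stem(w, ProtectedStemSketch::toyStem));
        }
    }
}
```

Running main prints çini, salı and kuma unchanged, while the unprotected kapı is reduced to kap by the toy stemmer.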

Can't load default exceptions list in TurkishStemmerTokenFilterFactory

It seems that newer versions of Elasticsearch can return null from Analysis.getWordList():

    public static List<String> getWordList(Environment env, Settings settings,
                                           String settingPath, String settingList, boolean removeComments) {
        String wordListPath = settings.get(settingPath, null);

        if (wordListPath == null) {
            List<String> explicitWordList = settings.getAsList(settingList, null);
            if (explicitWordList == null) {
                return null;
//(...)

This causes a NullPointerException in the exceptionsList.isEmpty() check:

  private CharArraySet parseExceptions(Environment env, Settings settings, String settingPrefix) {
    List<String> exceptionsList  = Analysis.getWordList(env, settings, settingPrefix);
    if (exceptionsList.isEmpty()) {

In Elasticsearch 5.x this was not an issue, because getReaderFromFile() was used and its result was checked for null:

    try {
      exceptionsReader = Analysis.getReaderFromFile(env, settings, settingPrefix);
    } catch (InvalidPathException e) {
      logger.info("failed to find the " + settingPrefix + ", using the default set");
    }

    if (exceptionsReader != null) {
      try {
        exceptionsList = Analysis.loadWordList(exceptionsReader, "#");
//(...)

The full stack trace:

java.lang.NullPointerException: Cannot invoke "java.util.List.isEmpty()" because "exceptionsList" is null
	at org.elasticsearch.index.analysis.TurkishStemmerTokenFilterFactory.parseExceptions(TurkishStemmerTokenFilterFactory.java:98) ~[?:?]
	at org.elasticsearch.index.analysis.TurkishStemmerTokenFilterFactory.parseProtectedWords(TurkishStemmerTokenFilterFactory.java:52) ~[?:?]
	at org.elasticsearch.index.analysis.TurkishStemmerTokenFilterFactory.<init>(TurkishStemmerTokenFilterFactory.java:30) ~[?:?]
	at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:444) ~[elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenFilterFactories(AnalysisRegistry.java:280) ~[elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:215) ~[elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:438) ~[elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:655) ~[elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:558) ~[elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.validateTemplate(MetadataIndexTemplateService.java:1196) ~[elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.validateTemplate(MetadataIndexTemplateService.java:1149) ~[elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.addComponentTemplate(MetadataIndexTemplateService.java:265) ~[elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService$2.execute(MetadataIndexTemplateService.java:188) ~[elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702) ~[elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324) ~[elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219) [elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73) [elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) [elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) [elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.10.2.jar:7.10.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]
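One way to restore the 5.x behavior is to treat a null result exactly like an empty one before calling isEmpty(), so both fall back to the default set. A minimal Java sketch, assuming the rest of parseExceptions() stays unchanged; resolveExceptions and its defaults parameter are hypothetical names, not part of the plugin or of Elasticsearch:

```java
import java.util.List;

// Sketch of a null-safe version of the exceptionsList.isEmpty() check.
// The Analysis.getWordList() result is passed in as a plain List so this runs
// without Elasticsearch; the class and method names are hypothetical.
class ExceptionsFallbackSketch {

    // Newer ES versions may return null when the setting is absent, so null
    // and "present but empty" are both treated as "use the default set".
    static List<String> resolveExceptions(List<String> configured, List<String> defaults) {
        if (configured == null || configured.isEmpty()) {
            return defaults;
        }
        return configured;
    }

    public static void main(String[] args) {
        List<String> defaults = List.of("default-placeholder"); // hypothetical default set
        System.out.println(resolveExceptions(null, defaults));            // [default-placeholder]
        System.out.println(resolveExceptions(List.of(), defaults));       // [default-placeholder]
        System.out.println(resolveExceptions(List.of("çini"), defaults)); // [çini]
    }
}
```

Inside the factory, the equivalent change would be guarding the existing line as `if (exceptionsList == null || exceptionsList.isEmpty())`; this is a suggested fix, not a confirmed patch.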

Couldn't install plugin

Hi,

We are trying to use the plugin with the following settings:

.../_settings -d '{"analysis" : {"filter" : {"stem-turkish" : {"type" : "turkish_stemmer"}}}}'

but after that we couldn't open the indices. The logs are below:

org.elasticsearch.indices.IndexCreationException: [netmoda] failed to create index
    at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:298)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:312)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:181)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:444)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: failed to find token filter type [turkish_stemmer] for [stem-turkish]
    at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:249)
    at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
    at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:204)
    at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:85)
    at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
    at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
    at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
    at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
    at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:296)
    ... 7 more
Caused by: org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class setting [type] with value [turkish_stemmer]
    at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:471)
    at org.elasticsearch.common.settings.ImmutableSettings.getAsClass(ImmutableSettings.java:459)
    at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:239)
    ... 15 more
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.index.analysis.turkishstemmer.TurkishStemmerTokenFilterFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:469)

Stem of yüzme, yüzücü etc

"yüzme" ("swimming") and "yüzücü" ("swimmer") should have a common stem. However, we cannot use "yüz" since it means something else entirely ("face"). This could be a bit hard to tackle correctly.

How to install

How can I install the stemmer plugin on Elasticsearch 5.x? Is it compatible?

elasticsearch-analysis-turkishstemmer can't be used when asserts are enabled in Lucene (using Elasticsearch)

Lucene expects TurkishStemmerTokenFilter, or at least its incrementToken() method, to be final. The check is in org.apache.lucene.analysis.TokenStream.assertFinal():

  private boolean assertFinal() {
    try {
      final Class<?> clazz = getClass();
      if (!clazz.desiredAssertionStatus())
        return true;
      assert clazz.isAnonymousClass() ||
        (clazz.getModifiers() & (Modifier.FINAL | Modifier.PRIVATE)) != 0 ||
        Modifier.isFinal(clazz.getMethod("incrementToken").getModifiers()) :
        "TokenStream implementation classes or at least their incrementToken() implementation must be final";
      return true;
    } catch (NoSuchMethodException nsme) {
      return false;
    }
  }

In production assertions are not evaluated, but when I test or debug my configuration with assertions enabled, this check breaks Elasticsearch.
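The assertion message itself points to the fix: declare TurkishStemmerTokenFilter final, or at least make its incrementToken() final. The Java sketch below copies the condition from assertFinal() and applies it to three hypothetical stand-in classes (they do not extend Lucene's TokenStream) to show that either change satisfies it:

```java
import java.lang.reflect.Modifier;

// Sketch: reproduces the condition from TokenStream.assertFinal() with plain
// reflection. The filter classes are hypothetical stand-ins, not Lucene types.
class AssertFinalSketch {

    // Same rule as Lucene's check: anonymous, final or private class,
    // or a final public incrementToken() method.
    static boolean passesFinalCheck(Class<?> clazz) {
        try {
            return clazz.isAnonymousClass()
                    || (clazz.getModifiers() & (Modifier.FINAL | Modifier.PRIVATE)) != 0
                    || Modifier.isFinal(clazz.getMethod("incrementToken").getModifiers());
        } catch (NoSuchMethodException e) {
            return false; // assertFinal() also returns false in this case
        }
    }

    // Non-final class with a non-final method: fails the check.
    public static class NonFinalFilter {
        public boolean incrementToken() { return false; }
    }

    // Option 1: make the whole filter class final.
    public static final class FinalFilter {
        public boolean incrementToken() { return false; }
    }

    // Option 2: keep the class open but make incrementToken() final.
    public static class FinalMethodFilter {
        public final boolean incrementToken() { return false; }
    }

    public static void main(String[] args) {
        System.out.println(passesFinalCheck(NonFinalFilter.class));    // false
        System.out.println(passesFinalCheck(FinalFilter.class));       // true
        System.out.println(passesFinalCheck(FinalMethodFilter.class)); // true
    }
}
```

The check is only evaluated when the JVM runs with assertions enabled (for example with -ea), which is why it stays silent in production.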