
Comments (4)

dsmiley commented on July 20, 2024

Done. (7.4.0)

Some important differences:

  • the htmlOffsetAdjust is not in Solr because I didn't want to add yet another dependency to Solr, even if the license was fine.
  • ConcatenateGraphFilterFactory (CGFF) is in Lucene and is better than the STT's ConcatenateFilterFactory (CFF). The Tagger (both in Solr and in this project) is hard-wired to use a particular separator char, and that char differs between the tagger in Solr and the STT here. Thus you can't use the STT here with CGFF, nor can you use Solr's new tagger with the CFF here.
  • The tagger in Solr cannot be used with a shingling strategy to find partial dictionary matches, because it is difficult to configure the shingle factory. We'd need to specify the particular character the CGFF uses, but that character is not valid in XML and Solr's schema is XML. Anyway, this feature seems dubious.
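
To illustrate the separator point above, here is a minimal Python sketch of why a dictionary indexed with one separator char cannot be matched by a tagger that joins tokens with another. The two separator values and the helper names are assumptions chosen for illustration; they are not the actual chars hard-wired into either tagger.

```python
# Illustrative only: both separator values are assumptions, not the chars
# actually hard-wired into Solr's tagger or the STT.
SEP_A = "\u001f"  # hypothetical separator used when the dictionary is indexed
SEP_B = " "       # hypothetical separator the other tagger joins tokens with


def concatenate(tokens, sep):
    """Join a phrase's tokens into the single term that gets indexed."""
    return sep.join(tokens)


def build_dictionary(phrases, sep):
    """Index every phrase as one concatenated term."""
    return {concatenate(phrase.split(), sep) for phrase in phrases}


def lookup(dictionary, tokens, sep):
    """Tagger-side lookup: join the candidate tokens, then test membership."""
    return concatenate(tokens, sep) in dictionary


dictionary = build_dictionary(["new york city"], SEP_A)
same_sep = lookup(dictionary, ["new", "york", "city"], SEP_A)
mixed_sep = lookup(dictionary, ["new", "york", "city"], SEP_B)
print(same_sep, mixed_sep)
```

With matching separators the lookup succeeds; with mismatched ones the same phrase is never found, which is the incompatibility described in the second bullet.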

from solrtexttagger.

akurniawan commented on July 20, 2024

Hi @dsmiley, sorry for bringing up this closed issue. Regarding the last point, where partial matches against the dictionary are no longer possible: is there a workaround? I found a Stack Overflow question asking exactly this: https://stackoverflow.com/questions/58413033/is-there-a-way-to-use-solr-text-tagger-along-with-n-edge-gram-filter. Thanks a lot!


dsmiley commented on July 20, 2024

My release note there pertained to shingling, which combines spans of tokens prior to CGFF. But I see you are using NGram (partial within-word matches) applied after CGFF, so my note doesn't apply. I think your configuration should work, but I don't know why it doesn't. You may have to experiment a bit. For example, look at the indexed terms using the "/terms" handler to see if they look as expected. Failing that, maybe use a debugger to see what's up. Maybe you need to add a filter that trims off a trailing null byte, which could happen... but I don't see how that inefficiency would cause your overall approach to not work. I'm too busy to troubleshoot this. I'm not sure how to subscribe to or watch particular Stack Overflow questions, but FWIW I up-voted it and marked it as a favorite.
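
For anyone following along, the "/terms" check suggested above could be sketched in Python roughly as follows. The base URL, collection name, and field name are hypothetical placeholders. The parameters `terms.fl`, `terms.limit`, and `terms.prefix` belong to Solr's Terms Component; the flat `[term, count, term, count, ...]` list that `pair_terms` expects is an assumption about the JSON response layout and may need adjusting for your setup.

```python
from urllib.parse import urlencode


def terms_url(base_url, collection, field, prefix="", limit=50):
    """Build a request URL for the /terms handler of one collection."""
    params = {"terms.fl": field, "terms.limit": limit, "wt": "json"}
    if prefix:
        # Restrict the listing to indexed terms that start with this prefix.
        params["terms.prefix"] = prefix
    return f"{base_url}/{collection}/terms?{urlencode(params)}"


def pair_terms(flat):
    """Turn [term, count, term, count, ...] into [(term, count), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical host, collection, and field names.
url = terms_url("http://localhost:8983/solr", "mycollection", "name_tag")
print(url)
```

Fetching that URL and passing the list found under the field name to `pair_terms` gives the indexed terms to inspect, for example to check whether any n-gram terms were produced at all.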


akurniawan commented on July 20, 2024

Thanks for the answer! Sure, I understand you can't help troubleshoot this, and I'm debugging it right now. So far I've found that there are no ngram terms in the /terms output, so maybe something weird happens after CGFF. I'm still figuring out how to debug Solr, since this is my first time using it. Thanks anyway for the help, @dsmiley!

