Comments (4)
Done. (7.4.0)
Some important differences:
- htmlOffsetAdjust is not in Solr because I didn't want to add yet another dependency to Solr, even if the license was fine.
- ConcatenateGraphFilterFactory (CGFF) is in Lucene and is better than the STT's ConcatenateFilterFactory (CFF). The Tagger (both in Solr and this project) is hard-wired to use a particular separator char, and that char differs between the tagger in Solr and the STT here. Thus you can't use the STT here with CGFF, nor can you use Solr's new tagger with the CFF here.
- The tagger in Solr cannot be used with a shingling strategy to find partial dictionary matches, because of difficulties in configuring the single factory. We'd need to specify the particular character the CGFF uses, but that character is not valid in XML, and Solr's schema is XML. Anyway, this feature seems dubious.
from solrtexttagger.
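To illustrate the XML problem mentioned in the last point, here is a minimal Python sketch. It assumes, purely for illustration, that the hard-wired separator is a control character such as U+001F; the thread does not name the actual character. The `separator` attribute is hypothetical and is not claimed to be a real factory parameter.

```python
import xml.etree.ElementTree as ET

# Assumption: U+001F stands in for the hard-wired separator char.
# The thread does not say which character the tagger actually uses.
ASSUMED_SEPARATOR = "\u001f"


def parses_as_xml(document: str) -> bool:
    """Return True if the stdlib parser accepts the string as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Hypothetical schema fragments; the "separator" attribute is illustrative only.
raw = f'<filter separator="{ASSUMED_SEPARATOR}"/>'   # control char written literally
escaped = '<filter separator="&#31;"/>'              # same char as a numeric reference
printable = '<filter separator=" "/>'                # an ordinary space, for contrast

if __name__ == "__main__":
    for label, doc in (("raw", raw), ("escaped", escaped), ("printable", printable)):
        print(label, parses_as_xml(doc))
```

Under that assumption, neither the literal character nor its numeric character reference is accepted by an XML 1.0 parser, which is consistent with the difficulty described above.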
Hi @dsmiley, sorry for reviving this closed issue. Regarding the last point, where we can't do any partial match against the document anymore: is there a workaround? I found a Stack Overflow question asking exactly this: https://stackoverflow.com/questions/58413033/is-there-a-way-to-use-solr-text-tagger-along-with-n-edge-gram-filter. Thanks a lot!
My release note there pertained to shingling, which combines spans of tokens prior to CGFF. But I see you are using NGram (partial within-word matches) applied after CGFF, so my note doesn't apply. I think your configuration should work, but I don't know why it doesn't. You may have to experiment a bit: look at the indexed terms using the "/terms" handler to see if they look as expected. Failing that, maybe use a debugger to see what's up. Maybe you need to add a filter that trims off a trailing null byte, which could happen... but I don't see how that inefficiency would cause your overall approach to not work. I'm too busy to troubleshoot this. I'm not sure how to subscribe to or watch particular Stack Overflow questions, but FWIW I up-voted it and marked it as a favorite.
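The "/terms" check suggested above could be scripted roughly as follows. This is a sketch, not a tested recipe for any particular setup: the base URL, the collection name `geonames`, and the field name `name_tag` are hypothetical placeholders, and the parser assumes the flat `[term, count, term, count, ...]` JSON layout requested via `json.nl=flat`. It also flags terms ending in a null byte, the case speculated about in the comment.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def terms_url(base_url: str, collection: str, field: str, limit: int = 50) -> str:
    """Build a request URL for Solr's /terms handler (names are placeholders)."""
    params = {
        "terms.fl": field,       # field whose indexed terms we want to list
        "terms.limit": limit,    # maximum number of terms to return
        "wt": "json",
        "json.nl": "flat",       # ask for [term, count, term, count, ...]
    }
    return f"{base_url.rstrip('/')}/{collection}/terms?{urlencode(params)}"


def parse_terms(response: dict, field: str) -> list:
    """Turn the flat [term, count, ...] list into (term, count) pairs."""
    flat = response["terms"][field]
    return list(zip(flat[0::2], flat[1::2]))


def terms_with_trailing_null(pairs: list) -> list:
    """Return terms ending in a null byte, which may hint at a separator issue."""
    return [term for term, _count in pairs if term.endswith("\x00")]


def fetch_terms(base_url: str, collection: str, field: str, limit: int = 50) -> list:
    """Query a running Solr instance; requires network access to that server."""
    with urlopen(terms_url(base_url, collection, field, limit)) as resp:
        return parse_terms(json.load(resp), field)


if __name__ == "__main__":
    # Hypothetical local instance, collection, and field.
    pairs = fetch_terms("http://localhost:8983/solr", "geonames", "name_tag")
    for term, count in pairs:
        print(repr(term), count)
    print("trailing null byte:", terms_with_trailing_null(pairs))
```

Printing each term with `repr` makes non-printable separator characters visible, which is the point of the inspection.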
Thanks for the answer! Sure, I understand you can't help troubleshoot this, and I'm debugging it right now. So far, what I can find is that there are no ngram terms on /terms, so maybe something weird happens after CGFF. I'm still trying to find out how to debug Solr, since this is my first time using it. Thanks anyway for the help, @dsmiley!