Comments (7)
Thanks for reporting... I didn't get a chance to look today but will hopefully look closer tomorrow (Friday).
from solrtexttagger.
Thanks. I guess ignoring a wrong offset correction might also be one solution:
    if (offsetPair[0] >= 0 && offsetPair[1] >= 0) {
      startOffset = offsetPair[0];
      endOffset = offsetPair[1];
    }
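For readers following along, the guard above can be exercised on its own. This is a minimal sketch, not the tagger's actual code: the class name OffsetGuard, the method chooseOffsets, and the sample values are all hypothetical, and it assumes a negative entry in offsetPair signals a failed correction.

```java
import java.util.Arrays;

class OffsetGuard {
    /**
     * Returns the corrected {start, end} pair when both values are
     * non-negative; otherwise keeps the original offsets.
     * (Hypothetical helper: assumes a negative value marks a failed correction.)
     */
    static int[] chooseOffsets(int[] offsetPair, int startOffset, int endOffset) {
        if (offsetPair != null && offsetPair.length == 2
                && offsetPair[0] >= 0 && offsetPair[1] >= 0) {
            return new int[] {offsetPair[0], offsetPair[1]};
        }
        return new int[] {startOffset, endOffset};
    }

    public static void main(String[] args) {
        // Valid correction: the corrected pair is used.
        System.out.println(Arrays.toString(chooseOffsets(new int[] {4, 12}, 5, 11)));
        // Invalid correction: the original offsets are kept.
        System.out.println(Arrays.toString(chooseOffsets(new int[] {-1, 12}, 5, 11)));
    }
}
```

Note that this only hides a bad correction on the consuming side; it does not address why the corrector produced a negative offset.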
I'd rather not put band-aids on the consumption end; I'd rather have reasonable offsets at the source end. Thus -- fix the bug. Thanks for the sample data... I ran it through a debugger and it appears we need to call startTag.isSyntacticalEmptyElementTag(). It's not completely clear if testing this actually requires more than one tag; if not, you can leave assertXmlTag alone. Can you work on a fix with a test?
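As a rough illustration of why a syntactically empty element tag such as <br/> can need its own branch: it opens and closes in a single tag, so code that pairs start tags with end tags will miscount unless it checks for that case. The sketch below is an assumption-laden stand-in, not the project's fix. The class EmptyTagSketch and both methods are hypothetical, and a plain string check replaces the parser's startTag.isSyntacticalEmptyElementTag() call mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class EmptyTagSketch {
    /** Hypothetical stand-in for the parser's check: a start tag ending in "/>". */
    static boolean isSyntacticallyEmpty(String tag) {
        return tag.startsWith("<") && !tag.startsWith("</") && tag.endsWith("/>");
    }

    /**
     * Counts start tags left unmatched after pairing each end tag with the
     * most recently opened start tag. When honorEmptyTags is false,
     * self-closing tags are wrongly treated as still open.
     */
    static int unmatchedOpenTags(String[] tags, boolean honorEmptyTags) {
        Deque<String> open = new ArrayDeque<>();
        for (String tag : tags) {
            if (tag.startsWith("</")) {
                if (!open.isEmpty()) {
                    open.pop();
                }
            } else if (honorEmptyTags && isSyntacticallyEmpty(tag)) {
                // Opens and closes itself, so there is nothing to track.
            } else {
                open.push(tag);
            }
        }
        return open.size();
    }

    public static void main(String[] args) {
        String[] tags = {"<p>", "<br/>", "</p>"};
        System.out.println(unmatchedOpenTags(tags, false)); // 1: <p> is left unmatched
        System.out.println(unmatchedOpenTags(tags, true));  // 0: all tags are paired
    }
}
```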
Thanks, I applied the fix. The change to assertXmlTag is not required; I was trying to add a test case for the exact scenario I had encountered, but this fix does not need it.
I updated the pull request.
Is there any reason the HTML offset corrector has not been migrated to Apache Solr? I am planning to move everything to Solr 7.4, so I am trying to find out whether it is missing any features.
I wasn't quite comfortable with adding yet another dependency, for everyone using Solr, just for an optional feature of a fringe feature. And I have doubts about the overall approach... it works and isn't bad, but it'd be really nice if (somehow) the HtmlStripCharFilterFactory (or a derivative) analysis component handled this sort of thing instead of the tagger itself needing to have such a feature, which feels a bit bolted on. I don't have the bandwidth to tackle redoing it though.
#28 (comment)
BTW a TODO in this project is to extend the handler shipping in Solr 7.4 to provide the htmlOffsetAdjust option. I had this in mind when I ported it to Solr so that this feature could be added separately. Perhaps that would be a separate branch which only provides that feature, tests it, and does nothing else.