Comments (7)

dsmiley commented on July 20, 2024

Thanks for reporting... I didn't get a chance to look today but will hopefully look closer tomorrow (Friday).

from solrtexttagger.

jigarparekh80 commented on July 20, 2024

Thanks. I guess ignoring the wrong offset correction might also be one solution:

    if (offsetPair[0] >= 0 && offsetPair[1] >= 0) {
        startOffset = offsetPair[0];
        endOffset = offsetPair[1];
    }

dsmiley commented on July 20, 2024

I'd rather not put band-aids on the consumption end; I'd rather have reasonable offsets at the source end. Thus -- fix the bug. Thanks for the sample data... I ran it through a debugger and it appears we need to call startTag.isSyntacticalEmptyElementTag(). It's not completely clear if testing this actually requires more than one tag; if not, you can leave assertXmlTag alone. Can you work on a fix with a test?
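
To illustrate the kind of check mentioned above, here is a minimal sketch. It assumes that a "syntactical empty-element tag" is a start tag whose source text ends with "/>" (for example <br/>), so no separate end tag follows it. The class and helper below are hypothetical stand-ins written for illustration; they are not the parser library's implementation and not the project's actual fix.

```java
public final class EmptyElementTagSketch {

    /**
     * Hypothetical helper (an assumption for illustration): returns true when
     * the given start-tag source text is written in self-closing form, i.e. it
     * opens with '<', is not an end tag, and ends with "/>".
     */
    static boolean isSyntacticalEmptyElementTag(String startTagSource) {
        String s = startTagSource.trim();
        return s.startsWith("<")
                && !s.startsWith("</")
                && s.endsWith("/>");
    }

    public static void main(String[] args) {
        // Self-closing tags have no matching end tag, so an offset corrector
        // should not go looking for one when it sees them.
        String[] samples = {"<br/>", "<img src=\"a.png\" />", "<p>", "</p>"};
        for (String tag : samples) {
            System.out.println(tag + " -> " + isSyntacticalEmptyElementTag(tag));
        }
    }
}
```

With a check like this, an offset corrector could treat the element as ending where its start tag ends, rather than producing an invalid offset pair while searching for an end tag that does not exist.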

jigarparekh80 commented on July 20, 2024

Thanks, I applied the fix. The change to assertXmlTag is not required; I was trying to add a test case for the exact scenario I had encountered, but this fix does not need it.
I updated the pull request.

jigarparekh80 commented on July 20, 2024

Is there any reason the HTML offset corrector has not been migrated to Apache Solr? I was planning to move everything to Solr 7.4, so I am trying to find out whether it is missing any features.

dsmiley commented on July 20, 2024

I wasn't quite comfortable with adding yet another dependency for an optional feature of a fringe feature of everyone using Solr. And I have doubts about the overall approach... it works, isn't bad but it'd be really nice if (somehow) the HtmlStripCharFilterFactory (or derivative) analysis component handled this sort of thing instead of the tagger itself needing to have such a feature, which feels a bit bolted on. I don't have the bandwidth to tackle redoing it though.
#28 (comment)

dsmiley commented on July 20, 2024

BTW a TODO in this project is to extend the handler shipping in Solr 7.4 to provide the htmlOffsetAdjust option. I had this in mind when I ported it to Solr so that this feature could be added separately. Perhaps that would be a separate branch which only provides that feature and its tests, and does nothing else.
