
Comments (6)

mikegerber commented on July 17, 2024

@JKamlah I am going to implement it according to the PAGE specs, i.e. "take the TextEquiv with the lowest index (if there are multiple)". This also seems to be correct for your example. (Your code at https://github.com/JKamlah/dinglehopper selects by a user-specified index.) Do you see a problem with that approach that I might be missing?

from dinglehopper.

mikegerber commented on July 17, 2024

In this example (thanks to @JKamlah!), OCR-D-GT_0008.xml contains corrections in the TextEquivs with the lowest index:
larex-indexed-textequiv-jkamlah.zip

<TextLine id="l2">
<Coords points="301,270 1389,270 1389,306 301,306"/>
<TextEquiv index="0">
<Unicode>
sondere Schrift daraus zu machen. Locke scheint fort-
</Unicode>
</TextEquiv>
<TextEquiv index="1">
<Unicode>
gondere Schrift daraus zu machen. LDocke scheint fort—-
</Unicode>
</TextEquiv>
</TextLine>
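The "lowest index wins" rule can be sketched in a few lines of Python. This is only an illustration, not dinglehopper's actual implementation: the function name `lowest_index_text` is made up, the XML is abridged from the example above and omits the PAGE namespace, and sorting a TextEquiv without an `index` attribute after all indexed ones is an assumption.

```python
import xml.etree.ElementTree as ET

# Abridged from the TextLine above; real PAGE files declare an XML namespace,
# which this sketch leaves out for brevity.
PAGE_LINE = """<TextLine id="l2">
  <TextEquiv index="0"><Unicode>sondere Schrift daraus zu machen. Locke scheint fort-</Unicode></TextEquiv>
  <TextEquiv index="1"><Unicode>gondere Schrift daraus zu machen. LDocke scheint fort-</Unicode></TextEquiv>
</TextLine>"""


def lowest_index_text(text_line):
    """Return the text of the TextEquiv with the lowest index, or None."""
    equivs = text_line.findall("TextEquiv")
    if not equivs:
        return None

    def sort_key(equiv):
        index = equiv.get("index")
        # Assumption: a TextEquiv without an index sorts after indexed ones.
        return (index is None, int(index) if index is not None else 0)

    best = min(equivs, key=sort_key)
    unicode_el = best.find("Unicode")
    if unicode_el is None:
        return None
    return (unicode_el.text or "").strip()


print(lowest_index_text(ET.fromstring(PAGE_LINE)))
```

For the line above this selects the corrected text under `index="0"`, regardless of the order in which the TextEquiv elements appear in the file.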


JKamlah commented on July 17, 2024

Thank you @mikegerber for the quick response.

Do you see a problem with that approach that I might be missing?

No, not at all. It would fit our needs perfectly. The only reason to keep the index selection option would be to compare the corrected output with the original one. A use case: if you use ABBYY for OLR and keep the OCR'd text, you can easily compare it with the newly recognized text.


mikegerber commented on July 17, 2024

The only reason to keep the index selection option would be to compare the corrected output with the original one. A use case: if you use ABBYY for OLR and keep the OCR'd text, you can easily compare it with the newly recognized text.

I'd suggest keeping the ABBYY results and the manually corrected files in separate file groups and comparing those, e.g.

ocrd-dinglehopper -I OCR-ABBYY,OCR-ABBYY-CORRECTED -O OCR-ABBYY-CORRECTED-DIFF -P metrics false

This seems to make it a lot more explicit.


JKamlah commented on July 17, 2024

You are absolutely right, it is much more explicit. I suppose this is more of a fundamental question, though. If I have multiple versions (indexes) in my file, I might need to compare them with each other or to compare a specific index with another file. But how often will that happen, and should dinglehopper offer an option for these few cases?


mikegerber commented on July 17, 2024

There is, to my knowledge, nothing in the PAGE specs that says the index is anything more than a preference order. It just happens that LAREX seems to produce files where we could select by index; another tool might only add indexes where something changed. So I'd recommend copying the files to a named file group.

As for getting the correct TextEquivs, I have fixed this today and will merge!

