
gt-guidelines's People

Contributors

cneud, eengl52, kba, lena-hinrichsen, stweil, tboenig

Forkers

stweil kba bertsky

gt-guidelines's Issues

Oxygen project files contain duplicate scenarios?

The <ditaScenario> element seems to exist twice. Do the two copies differ?

Can you describe how the documentation is built on an abstract level, independent of any IDE? Quite a few flags are set here whose meaning I don't understand without running Eclipse with that DITA setting.

Guidelines vs Guiedelines

There are two XPR files:

  • OCR-D_GT_Guidelines.xpr
  • OCR-D_GT_Guiedelines.xpr

git history:

* 1e42c0b (5 hours ago) Matthias Boenig Update |
|  OCR-D_GT_Guidelines.xpr  | 392 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
|  OCR-D_GT_Guiedelines.xpr | 101 ++----------------------------------
|  2 files changed, 372 insertions(+), 121 deletions(-)

* 2b43f78 (3 days ago) Matthias Boenig update
   OCR-D_GT_Guidelines.xpr  | 340 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   OCR-D_GT_Guiedelines.xpr | 316 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   2 files changed, 656 insertions(+)

@tboenig Which one is the up-to-date one?

To add to the confusion, the correctly spelled file refers to the misspelled one:

$ ag guiede
OCR-D_GT_Guidelines.xpr
4:        <filters directoryPatterns="" filePatterns="OCR-D_GT_Guiedelines.xpr" positiveFilePatterns="" showHiddenFiles="false"/>

clarify level 2 rules for Greek ligatures and abbreviation glyphs

The current formulation is quite vague on Greek, but the corresponding precise ruleset for Latin (vocalic yes, consonantal no) suggests that we should indeed try to "regularize" or "canonicalize" the ligatures and abbreviation glyphs used pervasively in Hellenistic times (at least those that survived until the 19th century and thus received a Unicode codepoint).

In particular,

  • ȣ → ου
  • ϛ → στ (i.e. not ς!)
  • Ϛ → ΣΤ
  • ...
  • ϗ → καί
  • Ϗ → Καί
  • ...

or is this only relevant on level 1?
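If level 2 were to regularize these glyphs, the rule could be written down as a simple lookup table. The following is only a sketch of that reading: the table contains just the pairs listed above, and the names LIGATURE_EXPANSIONS and expand_ligatures are hypothetical, not part of the guidelines.

```python
# Hypothetical level-2 normalization table, built only from the pairs
# named in this issue. Whether the guidelines want this is the open question.
LIGATURE_EXPANSIONS = {
    "\u0223": "ου",   # ȣ LATIN SMALL LETTER OU
    "\u03db": "στ",   # ϛ GREEK SMALL LETTER STIGMA
    "\u03da": "ΣΤ",   # Ϛ GREEK LETTER STIGMA
    "\u03d7": "καί",  # ϗ GREEK KAI SYMBOL
    "\u03cf": "Καί",  # Ϗ GREEK CAPITAL KAI SYMBOL
}

# str.maketrans accepts single-character keys mapped to longer strings.
_TABLE = str.maketrans(LIGATURE_EXPANSIONS)


def expand_ligatures(text: str) -> str:
    """Replace each listed ligature or abbreviation glyph by its expansion."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(expand_ligatures("\u03d7 τ\u0223"))
```

Final sigma (ς) is deliberately absent from the table, so it passes through unchanged and stays distinct from stigma (ϛ).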

2 trans pages missing in output

The following two topics are missing in the generated output (both en and de), because they are not mentioned in ocrd_ocrd.ditamap:

  • trans/lyBesonderheiten.dita
  • trans/trApostrophe.dita
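A check like this could be automated. The sketch below assumes that topics are referenced through plain href attributes in a single map file and that nested maps do not need to be followed; the function name unreferenced_topics is made up for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def unreferenced_topics(ditamap: Path, topic_root: Path) -> list[str]:
    """Return .dita files under topic_root that no href in the map points to."""
    referenced = set()
    for elem in ET.parse(ditamap).iter():
        href = elem.get("href")
        if href:
            # Drop a "#fragment" suffix and resolve relative to the map file.
            target = (ditamap.parent / href.split("#", 1)[0]).resolve()
            referenced.add(target)

    root = topic_root.resolve()
    on_disk = {p.resolve() for p in root.rglob("*.dita")}
    return sorted(p.relative_to(root).as_posix() for p in on_disk - referenced)


if __name__ == "__main__":
    # Assumed location of the map; adjust to the actual repository layout.
    ditamap = Path("ocrd_ocrd.ditamap")
    if ditamap.exists():
        for topic in unreferenced_topics(ditamap, ditamap.parent):
            print(topic)
```

Run from the directory that contains the map, it lists every topic file that would be silently left out of the generated output.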

img vs images

Is documentation/img for the PAGE docs and documentation/images for the gt guidelines?

clarify quotation rules for level 1

I can only find this short statement about quotation in level 1:

Quotation marks are transferred to today's use and are not differentiated

Considering the many differentiations offered by today's use in Unicode, also listed in full by the spec, I wonder what that means.

Today's use could mean only the ubiquitous ASCII reduction, i.e. only " and '. That would mean the differentiation is reserved for levels 2 and 3.

But it could also be a subset (e.g. no "low-9" or "high-reversed-9" or no angular quotes).

Could someone please clarify here and update the specs accordingly?
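To make the first reading concrete, here is a minimal sketch of what a pure ASCII reduction would do. It is an assumption about one possible interpretation, not what the guidelines currently say; the character sets and the name reduce_quotes are chosen for illustration only.

```python
# One possible reading of "today's use": collapse all quotation marks
# to the two ASCII characters. The selection of code points is an assumption.
DOUBLE_QUOTES = "\u201c\u201d\u201e\u201f\u00ab\u00bb"  # “ ” „ ‟ « »
SINGLE_QUOTES = "\u2018\u2019\u201a\u201b\u2039\u203a"  # ‘ ’ ‚ ‛ ‹ ›

_TABLE = str.maketrans(
    {**{c: '"' for c in DOUBLE_QUOTES}, **{c: "'" for c in SINGLE_QUOTES}}
)


def reduce_quotes(text: str) -> str:
    """Map typographic quotation marks to ASCII " and '."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(reduce_quotes("\u201eWort\u201c und \u00bbWort\u00ab"))
```

Note that U+2019 also serves as the typographic apostrophe, so this reduction would flatten apostrophes as well. A subset reading would simply drop some code points from the two sets.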

How to encode mathematical fractions?

While Unicode does have codepoints for the most common fractions (¼, ½, ¾ etc.), this does not scale, because of course not all possible numerator/denominator combinations are available. So it might be best to encode fractions as just "numerator fraction-slash denominator" (with regular numbers or super/subscript numbers?) or even produce LaTeX syntax.
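The "numerator fraction-slash denominator" option can be sketched as follows, using U+2044 FRACTION SLASH. The function names and the styled switch are hypothetical; the second function relies on the fact that Unicode compatibility normalization (NFKC) already decomposes precomposed fractions into exactly this form.

```python
import unicodedata

FRACTION_SLASH = "\u2044"
# Superscript and subscript digits 0-9.
_SUPER = str.maketrans(
    "0123456789", "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
)
_SUB = str.maketrans(
    "0123456789", "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"
)


def encode_fraction(numerator: int, denominator: int, styled: bool = False) -> str:
    """Encode a fraction with FRACTION SLASH, optionally with super/subscript digits."""
    num, den = str(numerator), str(denominator)
    if styled:
        num, den = num.translate(_SUPER), den.translate(_SUB)
    return f"{num}{FRACTION_SLASH}{den}"


def expand_vulgar_fractions(text: str) -> str:
    """Rewrite precomposed fractions such as ½ as digit, FRACTION SLASH, digit."""
    return "".join(
        unicodedata.normalize("NFKC", ch)
        if unicodedata.decomposition(ch).startswith("<fraction>")
        else ch
        for ch in text
    )


if __name__ == "__main__":
    print(encode_fraction(3, 8), encode_fraction(3, 8, styled=True))
    print(expand_vulgar_fractions("1\u00bd"))
```

Only characters tagged as fractions are normalized, so other compatibility characters in the text are left alone.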

versioning / releases

Idea from @cneud and @kba in a call: start using semantic versioning and GitHub releases here.

(Perhaps even a PDF export for each release?)

temp folder

Can we purge the documentation/temp folder from the Git history?

Same for the documentation/out folder.

Aren't those generated?

trans/trFremdsprache is broken in en

It seems the table in trans/trFremdsprache.dita was damaged during translation from de: only one column survived, which collapses the three GT levels and the comments.

documentation vs page_documentation

There's a lot of redundancy with respect to images generated from XML (how are they created?), for example:

097c6501fa303d27bdf2042ddea7a9f5  ./documentation/img/pagecontent_xsd_Complex_Type_pc_RegionRefType.jpeg
097c6501fa303d27bdf2042ddea7a9f5  ./pagexml_dokumentation/img/pagecontent_xsd_Complex_Type_pc_RegionRefType.jpeg
097c6501fa303d27bdf2042ddea7a9f5  ./pagexml_dokumentation/out/webhelp-responsive/img/pagecontent_xsd_Complex_Type_pc_RegionRefType.jpeg
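A listing like the one above can be produced for the whole tree by grouping files on their MD5 digest. This is a minimal sketch; find_duplicates is a made-up name, and reading each file fully into memory is assumed to be acceptable for image-sized files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by MD5 digest; keep only digests seen more than once."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}


if __name__ == "__main__":
    for digest, paths in find_duplicates(Path(".")).items():
        for path in paths:
            print(f"{digest}  ./{path.as_posix()}")
        print()
```

MD5 is used here only to match the digests shown above; any hash function would do for spotting identical files.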

structural concordance: collect more (possible) pairs

In en/trans/structurmets2page.dita, we could add the Page/@type types with their DFG Strukturdatenset counterparts (some of which are already covered in en/trans/structur_gtpageformat.dita, perhaps because they were also in the Zot format already):

  • cover_back: back-cover
  • cover_front: front-cover
  • binding / endsheet / spine / paste_down / colour_checker: empty
  • title_page: title
  • table_of_contents: table-of-contents
  • index: index

Also, why is mets:div/@type=table likened to pc:TextRegion/@type=heading and not @type=caption (or pc:TableRegion directly)? (Same probably goes for mets:div/@type=map vs. caption / pc:MapRegion, as well as mets:div/@type=musical_notation vs. caption or pc:MusicRegion.)

Also, where is illustration?

Next, I would have expected that pc:GraphicRegion/@type gets mapped, too:

  • annotation: handwritten-annotation
  • stamp: stamp
  • printers-mark: decoration
  • ...?

Furthermore, IIUC it seems plausible to also suggest mapping some of the mets:div types to pc:ReadingOrder types:

  • text: paragraph
  • illustration: figure
  • article: article
  • section / chapter / part: div

Generally, it would also help to strictly differentiate between structural types (what ENMAP calls contentUnit) and layout types (what ENMAP calls contentItem).

Lastly, how about also collecting concordance between mets:div types and alto:LayoutTags and alto:StructureTags? I can see many similar entities in the official documentation. Perhaps a full discussion of this would also need to include the various ENMAP profiles...
