eutyches's People

Contributors

malamatenia, ponteineptique

eutyches's Issues

Thoughts on lemma-gloss attribution for words that break across two lines

Since I have not tagged my lemmas locally, I depend on an external document to identify and tag them (probably not the best practice).

When a word is whole (not broken across lines), identifying the lemma should pose no problem, given a "list of lemmas" and a tokenized text.

BUT

What happens with words that break onto the following line?
If I join all the lines into a continuous text (by removing the ¬) and thus restore the broken words, I lose the line-break information that my ALTO files deliberately preserve.

I should consider a better practice for lemma-gloss attribution: inserting "lemma markers" locally while transcribing, as suggested by Prof. Clérice, and then writing a clean_text script to restore the ground truth.
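One possible way around the trade-off described above is to rejoin broken words at tokenization time while recording which lines each token came from, so the ALTO files themselves stay untouched. The sketch below is only an illustration under assumptions: the names `Token` and `tokenize_keeping_lines` are hypothetical, the input is a plain list of transcribed lines, and the ¬ is assumed to appear only at the very end of a line whose last word continues on the next line.

```python
from dataclasses import dataclass

BREAK = "¬"  # assumed marker: the last word of the line continues on the next line


@dataclass
class Token:
    text: str     # the word, rejoined if it was broken
    lines: tuple  # 0-based indices of the line(s) the word occupies


def tokenize_keeping_lines(lines):
    """Split lines on whitespace, rejoining words broken with BREAK.

    Each token remembers the line(s) it came from, so the line-break
    information is kept next to the restored word instead of being lost.
    """
    tokens = []
    carry = None  # (partial text, line indices) of a word still waiting for its end
    for i, line in enumerate(lines):
        words = line.split()
        for j, word in enumerate(words):
            if j == 0 and carry is not None:
                # First word of the line completes the word carried over.
                word = carry[0] + word
                span = carry[1] + (i,)
                carry = None
            else:
                span = (i,)
            if j == len(words) - 1 and word.endswith(BREAK):
                # Word is broken here: hold it back until the next line.
                carry = (word[: -len(BREAK)], span)
            else:
                tokens.append(Token(word, span))
    if carry is not None:
        # Text ended on a broken word; keep what we have.
        tokens.append(Token(*carry))
    return tokens
```

A lemma list can then be matched against `token.text`, while `token.lines` still says where the word sits (and whether it was split) in the manuscript.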

check_Ontology_strings

After today's attempt to check manually whether all of my messy lines are characterized, I need a check_string script that verifies that every line is characterized and prints the name of the .XML file in which an omission occurs.

Otherwise, text_extraction, which depends on SegmOnto, does not return the whole text(s).
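A minimal sketch of such a check could look like the following. It assumes that the ALTO files declare their line types as `OtherTag` elements (with `ID` and `LABEL` attributes) and that each `TextLine` points to one of them through `TAGREFS`; if the export encodes types differently, the lookup has to be adapted. The function names `untyped_lines` and `check_folder` are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def untyped_lines(alto_xml):
    """Return the IDs of TextLine elements that carry no labelled tag.

    Accepts the file content as str or bytes.
    """
    root = ET.fromstring(alto_xml)
    # Map tag IDs to their labels (assumed to hold the SegmOnto type).
    labels = {
        el.get("ID"): el.get("LABEL")
        for el in root.iter()
        if local(el.tag) == "OtherTag"
    }
    missing = []
    for el in root.iter():
        if local(el.tag) != "TextLine":
            continue
        refs = (el.get("TAGREFS") or "").split()
        if not any(labels.get(ref) for ref in refs):
            missing.append(el.get("ID"))
    return missing


def check_folder(folder):
    """Map each file name to its untyped line IDs (only files with omissions)."""
    report = {}
    for path in sorted(Path(folder).glob("*.xml")):
        ids = untyped_lines(path.read_bytes())
        if ids:
            report[path.name] = ids
    return report
```

Printing the result of `check_folder` for the data folder would list each file with an omission and the IDs of the lines concerned. Note that `glob("*.xml")` is case-sensitive on most systems, so files named `.XML` would need a second pattern.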

first_commit

Making my first commit with the non-finalized data.
To do (a lot):
-finalize transcription norms (desinences, punctus elevatus, space/dot ambiguity, maybe comma?)
-complete the Excel file with hand characterization and granularity
-pipeline modeling
-Python scripts for ALTO manipulation
-test for automatic gloss-lemma attribution
-LaTeX documentation.

Difficulties to consider in my MainZone(s) DefaultLine and InterlinearLine extraction

Need to define the best compromise between Line and Zone extraction in order to better annotate my manuscript(s).

Three of my pages pose a significant layout problem: the order of the text is disrupted, and blocks of text need to be read vertically instead of horizontally.

Currently, I am assigning a new MainZone#n to pieces of text, such as columns, whose reading order I want to establish or redirect. It is important to keep this information, as it can be useful for the transmission of the text.

Examples:
Leyden, VossLat041, folios 4v and 5r

4r

4v

5r

The problem occurs when I try to extract the main text that lies in the DefaultLines of the multiple ordered MainZones and the glosses that correspond to each line of each zone.
I need the text to appear in that specific order, given that in my fiche de récollement I give for each gloss the exact line (after ordering) of the lemma associated with it.

I need a text_extraction code that extracts the DefaultLines of every MainZone in an ordered fashion (MainZone, MainZone#1-6) and then does the same for the InterlinearLines, in order to (at some point, somehow) tag automatically and associate the lemmas and the glosses following the info in my fiche.
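The ordered extraction described above could be sketched as follows. This is a sketch under assumptions, not a finished text_extraction script: it assumes that zone and line types are stored as `OtherTag` labels referenced through `TAGREFS`, that zones are labelled exactly `MainZone` or `MainZone#n`, and that the transcription sits in the `CONTENT` attribute of `String` elements. The names `zone_rank` and `extract` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def zone_rank(label):
    """'MainZone' -> 0, 'MainZone#3' -> 3, anything else -> None."""
    m = re.fullmatch(r"MainZone(?:#(\d+))?", label or "")
    return None if m is None else int(m.group(1) or 0)


def extract(alto_xml, line_type):
    """Return (zone rank, line number within zone, text) for every line of
    `line_type`, with zones ordered MainZone, MainZone#1, MainZone#2, ..."""
    root = ET.fromstring(alto_xml)
    labels = {
        el.get("ID"): el.get("LABEL")
        for el in root.iter()
        if local(el.tag) == "OtherTag"
    }

    def tag_labels(el):
        refs = (el.get("TAGREFS") or "").split()
        return [labels[r] for r in refs if labels.get(r)]

    # Collect the main zones together with their intended reading rank.
    zones = []
    for block in root.iter():
        if local(block.tag) != "TextBlock":
            continue
        ranks = [r for r in map(zone_rank, tag_labels(block)) if r is not None]
        if ranks:
            zones.append((ranks[0], block))
    zones.sort(key=lambda z: z[0])  # stable: equal ranks keep document order

    out = []
    for rank, block in zones:
        n = 0  # running number of matching lines inside this zone
        for line in block.iter():
            if local(line.tag) != "TextLine" or line_type not in tag_labels(line):
                continue
            n += 1
            text = " ".join(
                s.get("CONTENT", "") for s in line.iter() if local(s.tag) == "String"
            )
            out.append((rank, n, text))
    return out
```

Calling `extract(xml, "DefaultLine")` and then `extract(xml, "InterlinearLine")` on the same file gives two lists in the same zone order, which could then be compared against the line numbers recorded in the fiche. Lines are numbered per zone here; if the fiche numbers lines continuously across zones, a running counter over the whole output would be needed instead.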
