
Comments (4)

annakrohn commented on August 30, 2024

What I'm going to do is this:

  • favor any languages that appear outside of the host tags
  • if there are two (or more) language tags with the "text" type, split into two (or more) records, removing the language(s) that are not for that record, e.g. the "...grc-1" record has the eng tag removed and vice versa.
  • if there is no language info outside of the host tags, throw an error and the record stays in catalog_pending for review
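The three rules above can be sketched roughly as follows. This is only an illustration, not the catalog's actual code: it assumes the record root is a `mods:mods` element, that "host tags" means `<mods:relatedItem type="host">`, and that only top-level `<mods:language objectPart="text">` elements count. The names `split_by_language`, `text_languages`, `lang_code`, and `MissingLanguageError` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"mods": MODS_NS}


class MissingLanguageError(Exception):
    """No usable language outside the host; record should stay in catalog_pending."""


def text_languages(mods_root):
    """Top-level <mods:language objectPart="text"> elements.

    findall() without '//' only looks at direct children, so languages
    nested inside a host relatedItem are ignored (assumption: that is
    what "outside of the host tags" means).
    """
    return [el for el in mods_root.findall("mods:language", NS)
            if el.get("objectPart") == "text"]


def lang_code(language_el):
    """Return the code inside <mods:languageTerm>, or None if absent."""
    term = language_el.find("mods:languageTerm", NS)
    if term is None or not term.text:
        return None
    return term.text.strip()


def split_by_language(mods_root):
    """Return {language code: copy of the record with the other text languages removed}."""
    codes = [c for c in (lang_code(el) for el in text_languages(mods_root)) if c]
    if not codes:
        raise MissingLanguageError("no text language found outside the host")
    records = {}
    for keep in codes:
        clone = copy.deepcopy(mods_root)
        for el in text_languages(clone):
            if lang_code(el) != keep:
                clone.remove(el)  # drop the language(s) not meant for this record
        records[keep] = clone
    return records
```

A record with a single text language comes back as a one-entry dictionary, so the caller can treat the split and non-split cases the same way.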

I am running into a bit of trouble with something that perhaps should be another issue, but I'll bring it up here and split it out if need be.

How do we identify the original language of a work? I used to rely on the ID type (tlg, phi, etc.), but now that I am going to expand the IDs to anything that takes the form [textgroup].[work], I won't necessarily know what language an ID maps to, or whether its language is even consistent. A record with an existing CTS URN is easy, but otherwise I need some indication of, or a way to infer, the original language so I can tell what is a translation and what is an edition. Any ideas?
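For illustration, pulling a [textgroup].[work] pair out of an ID or file name might look like the sketch below. The pattern is an assumption (each part is lowercase letters followed by digits, as in `tlg0027.tlg001`), and `parse_work_id` is a hypothetical helper, not something in the catalog code.

```python
import re

# Assumed shape: "<letters><digits>.<letters><digits>", optionally followed
# by more dot-separated parts (version, "mods1", "xml", ...).
WORK_ID = re.compile(r"^(?P<textgroup>[a-z]+\d+)\.(?P<work>[a-z]+\d+)(?:\.|$)")


def parse_work_id(identifier):
    """Return (textgroup, work) or None if the identifier does not fit the pattern."""
    match = WORK_ID.match(identifier.strip().lower())
    if match is None:
        return None
    return match.group("textgroup"), match.group("work")
```

Note that the pattern alone says nothing about language, which is exactly the problem: the prefix of an arbitrary textgroup no longer implies Greek or Latin.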

from catalog_data.

AlisonBabeu avatar AlisonBabeu commented on August 30, 2024

In terms of the original language of a work within the new expanded ID system, we may have to manually review them for a time. Hopefully we won't get a flood of works without standard IDs and language encoding information. I must admit I'm not sure there is anything within the records that can be consistently used to infer an original language. Sometimes a notes field will say something like "original in Greek, translation in Latin", but that is unusual.


annakrohn avatar annakrohn commented on August 30, 2024

I don't know that we really can manually review them, given how things are currently set up. I think the form might get a place to indicate the original language, but that is still some way off. So for now I'll continue checking against a list that I'll keep updated (maybe the form can update that list too). If the language can't be found, the record will throw an error and stay in catalog_pending until we update the list.
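A minimal sketch of that list check, assuming the list is a plain mapping from [textgroup].[work] to an original-language code. The mapping contents, `classify`, and `UnknownOriginalLanguageError` are all hypothetical names for illustration; the single entry shown is only an example.

```python
# Hypothetical hand-maintained list: "[textgroup].[work]" -> original language code.
ORIGINAL_LANGUAGE = {
    "tlg0027.tlg001": "grc",  # illustrative entry only
}


class UnknownOriginalLanguageError(LookupError):
    """Work is not on the list yet; record should stay in catalog_pending."""


def classify(work_id, record_language, known=ORIGINAL_LANGUAGE):
    """Return "edition" if the record is in the work's original language, else "translation"."""
    try:
        original = known[work_id]
    except KeyError:
        raise UnknownOriginalLanguageError(
            f"no original language listed for {work_id}"
        ) from None
    return "edition" if record_language == original else "translation"
```

Raising instead of guessing keeps unknown works visible: nothing is classified until someone adds the work to the list.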


AlisonBabeu avatar AlisonBabeu commented on August 30, 2024

I'm not sure if this needs a new issue or is related to some of the changes that were made due to the problem I noted here. It seems that in one of the last ingests a number of correct translations were missed entirely by the system. For example, five French translations of the works of Andocides by Georges Dalymeda: tlg0027.tlg001.opp-grc1.mods1.xml, tlg0027.tlg002.opp-grc1.mods1.xml, tlg0027.tlg003.opp-grc1.mods1.xml, tlg0027.tlg004.opp-grc1.mods1.xml, and tlg0027.tlg005.opp-grc1.mods1.xml. This happened even though all five of the MODS records in GitHub include the correct encoding:

  <mods:language objectPart="text">
    <mods:languageTerm authority="iso639-2b" type="code">grc</mods:languageTerm>
  </mods:language>
  <mods:language objectPart="text">
    <mods:languageTerm authority="iso639-2b" type="code">fre</mods:languageTerm>
  </mods:language>

I have looked at the records and have no idea what might have caused the system to not create French translations as well as the Greek editions.

