catalog_data's People

Contributors

alisonbabeu, annakrohn, balmas, cwulfman, gregorycrane, lcerrato, perseuscatalog

catalog_data's Issues

new MODS files needed for Aristophanes

In response to a request from Greg and Monica Berti, I'm putting EpiDoc versions of Aristophanes into Perseids, and we need new CTS URNs for the EpiDoc editions and translations.

The EpiDoc versions will be based upon the following:

Birds tlg0019.tlg006.perseus-grc1 & tlg0019.tlg006.perseus-eng1
Clouds tlg0019.tlg003.perseus-grc1 & tlg0019.tlg003.perseus-eng1
Ecclesiazusae tlg0019.tlg010.perseus-grc1 & tlg0019.tlg010.perseus-eng1
Lysistrata tlg0019.tlg007.perseus-grc1 & tlg0019.tlg007.perseus-eng1

"cts-urn" vs "ctsurn"

@balmas and @AlisonBabeu The identifier type appears both ways. Which is preferable? The new records have "cts-urn" and the older ones have "ctsurn".

Once we pick one, I can run a method to standardize all the records.
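A minimal sketch of what such a standardizing method could look like, assuming the records are namespaced MODS XML and using only the Python standard library. The function name and the direction of the rename ("cts-urn" to "ctsurn") are assumptions for illustration; swap the `old` and `new` arguments if the other spelling is chosen.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
# Keep the "mods:" prefix when the tree is serialized again.
ET.register_namespace("mods", MODS_NS)


def standardize_identifier_type(xml_text, old="cts-urn", new="ctsurn"):
    """Retype every mods:identifier whose type is `old`; return (xml, count)."""
    root = ET.fromstring(xml_text)
    changed = 0
    for ident in root.iter(f"{{{MODS_NS}}}identifier"):
        if ident.get("type") == old:
            ident.set("type", new)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Hypothetical record used only to exercise the function.
sample = (
    '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">'
    '<mods:identifier type="cts-urn">'
    "urn:cts:greekLit:tlg0019.tlg006.perseus-grc1"
    "</mods:identifier>"
    '<mods:identifier type="local">example</mods:identifier>'
    "</mods:mods>"
)
fixed, count = standardize_identifier_type(sample)
print(count)
```

Identifiers of any other type are left untouched, so the function can be run repeatedly over the same file without further changes.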

Issue with Naevius, Gnaeus

A search in the catalog for Naevius, Gnaeus returns two authors (http://catalog.perseus.org/?utf8=✓&utf8=✓&search_field=author&q=Naevius). I had thought this was due to a duplicate authority record (https://github.com/PerseusDL/catalog_data/blob/master/mads/PrimaryAuthors/G/Gnaeus%20Naevius/n85-356413.xml.mads.xml), but something stranger seems to be going on. Six individual works are attributed to this author, and for some reason they have been split between the two author records in the catalog: one record (http://catalog.perseus.org/catalog/Mlccnn85356413Naevi) lists four works, while the other (http://catalog.perseus.org/catalog/Mstoa0206Naevi) lists two.

Duplication of the Dialogi of Seneca

There is an issue with the way the individual Dialogi of Seneca are displaying in the catalog. The system has "correctly" aggregated these works by their individual STOA numbers; for example, see the work record for the De Brevitate Vitae (http://catalog.perseus.org/catalog/urn:cts:latinLit:stoa0255.stoa004, data at https://github.com/PerseusDL/catalog_data/tree/master/mods/latinLit/stoa0255/stoa004). But it has also created a top-level work record for the Dialogi under phi 1017.12 (catalog view: http://catalog.perseus.org/catalog/urn:cts:latinLit:phi1017.phi012, files at https://github.com/PerseusDL/catalog_data/tree/master/mods/latinLit/phi1017/phi012). This has led to the creation of duplicate edition entries for the individual Dialogi under the PHI identifier. Additionally, this means that the full list of Dialogi does not appear as separate works under the authority record for Seneca (http://catalog.perseus.org/catalog/urn:cts:latinLit:phi1017).

CTS URN for De Rebus Bellicis

Title of work: De Rebus Bellicis
Author: Anonymi Auctoris De Rebus Bellicis

Edition: unknown text edition

Electronic resource: University of Oxford Text Archive - http://ota.ahds.ac.uk/desc/0309

The OTA page has a download link to a ZIP file containing a TEI header document and the text body in plain-text format. The text is licensed under Creative Commons Attribution Non-commercial 3.0 and will be the version used for conversion to CTS TEI XML, keeping as much of the OTA header intact as possible. I will attach the OTA header in a separate comment on this issue, and the WorldCat entries in another.

new MODS files needed for Thuc translations

@srdee has made (or will make) EpiDoc versions of the Perseus Thucydides translations (tlg0003.tlg001.perseus-eng1, tlg0003.tlg001.perseus-eng2 and tlg0003.tlg001.perseus-eng3). We need new MODS files for these versions so they can be assigned new URNs.

None of the editions for an author display in the catalog

For some reason none of the three works for the author Censorinus are displaying in the catalog; see http://catalog.perseus.org/catalog/urn:cts:latinLit:stoa0084.stoa001, http://catalog.perseus.org/catalog/urn:cts:latinLit:stoa0084.stoa002, and http://catalog.perseus.org/catalog/urn:cts:latinLit:stoa0084.stoa003. All of the metadata seems correct and the information is in the CITE Tables, so I'm not sure what is going on. These records also do not contain an empty series element, which I know has caused this display issue in the past.

Justinian duplicate

Can we merge the records and correct the CITE tables for phi2806.phi002 and stoa0168.stoa001b? They are the same work but the records are slightly different.

I've already updated Justinian's author row in the CITE table to include the phi id as an alternate id (didn't have it before).

new MODS file for Plutarch Pericles

I've created an EpiDoc version of tlg0007.tlg0012.perseus-grc1, provisionally named tlg0007.tlg0012.perseus-grc1, for inclusion in Perseids. We need it officially added to the catalog.

Number of Aristotle works not displaying or supporting search by Latin translated titles

The list of works under Aristotle (http://catalog.perseus.org/catalog/urn:cite:perseus:author.204) includes many works that for some reason display only the Greek title and can be searched only by the Greek title. This seems strange because the Latin titles are in the MODS records and are present in the Atom feeds (http://catalog.perseus.org/catalog/urn:cts:greekLit:tlg0086.tlg002).

The CITE Table records for these works do not display the Latin titles.

Need to change how language information is pulled from host information, if possible

I'm wondering about possibly tweaking how the translations are created automatically, largely due to the discovery of a large number of incorrectly created translations.

For example, a recently ingested MODS record for Apuleius (https://github.com/PerseusDL/catalog_pending/blob/e59fdd0200b186c167fc3afd7117ff23f40184a4/mods/Apuleius/apuleius.%28LesBellesLettres-Beaujeu%281973%29.OpusculesPhilosophiques.mods.xml) has objectPart="text" language values of both "lat" and "grc" for the host text, since one work in the volume is in Greek. All of the other constituent records for this volume included only a Latin language element, yet Greek translations were also automatically assigned to these works, presumably because Greek appears in the host volume information (http://catalog.perseus.org/?f[exp_language][]=grc&q=%22Beaujeu%2C+Jean%22&search_field=editor&utf8=%E2%9C%93).

I'm wondering if we can have a default where, if no language is encoded within a constituent record, the system flags it as an error so that I have to look at the file, which would hopefully eliminate the creation of false translations.
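The proposed default can be sketched roughly as follows. This is not the catalog's actual ingest code; it assumes each constituent is its own MODS record whose host volume is described in a relatedItem of type "host", and the function names are hypothetical. The point is that only language elements declared directly on the record count, and a record with none is reported instead of inheriting the host's languages.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}


def own_languages(record):
    """Languages declared directly under the record, ignoring relatedItem."""
    terms = record.findall("mods:language/mods:languageTerm", NS)
    return [t.text.strip() for t in terms if t.text and t.text.strip()]


def check_record(xml_text):
    """Return ("ok", languages) or ("error", []) when no language is encoded."""
    record = ET.fromstring(xml_text)
    langs = own_languages(record)
    if not langs:
        # Deliberately no fallback to the host volume's languages.
        return ("error", [])
    return ("ok", langs)


# Hypothetical records: the host volume lists both Latin and Greek.
HOST = (
    '<mods:relatedItem type="host">'
    '<mods:language objectPart="text">'
    "<mods:languageTerm>lat</mods:languageTerm></mods:language>"
    '<mods:language objectPart="text">'
    "<mods:languageTerm>grc</mods:languageTerm></mods:language>"
    "</mods:relatedItem>"
)
WRAP = '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">{}</mods:mods>'
no_language = WRAP.format(HOST)
latin_only = WRAP.format(
    "<mods:language><mods:languageTerm>lat</mods:languageTerm></mods:language>"
    + HOST
)

print(check_record(no_language))
print(check_record(latin_only))
```

With this rule, the Latin constituent yields only "lat", and a constituent with no language of its own is surfaced for manual review.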

A display issue with abbreviated titles in terms of epigrams

While the abbreviated epigram titles are now showing up for catalog records, I'm wondering if the edition level rather than the work level is the appropriate place for this information to display, because not all of the information is displaying. In the catalog display, the abbreviated epigram titles from just the first edition record are used as those for the entire work record.

For example, for the epigrams of Simonides, http://catalog.perseus.org/catalog/urn:cts:greekLit:tlg0261.tlg003 shows the abbreviated titles from the first edition record but none of the others. Even when you click on an individual edition (http://catalog.perseus.org/catalog/urn:cts:greekLit:tlg0261.tlg003.opp-grc5) whose ATOM feed contains a number of abbreviated titles, they are not showing up in the edition record:

<mods:titleInfo type="abbreviated"><mods:title>AG 6.2, AG 6.50, AG 6.52, AG 6.197, AG 6.212-217</mods:title></mods:titleInfo>
<mods:titleInfo type="abbreviated"><mods:title>AP 6.2, AP 6.50, AP 6.52, AP 6.197, AP 6.212-217</mods:title></mods:titleInfo>
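If the work-level display is kept, one option is to merge the abbreviated titles from every edition record rather than taking only the first. The sketch below illustrates that idea under assumptions: the function names are hypothetical, each edition is available as a MODS XML string, and titles inside one element are comma-separated as in the feed above. The second edition's values are invented for the example.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"mods": MODS_NS}


def abbreviated_titles(xml_text):
    """Return the individual abbreviated titles found in one edition record."""
    root = ET.fromstring(xml_text)
    titles = []
    for info in root.iter(f"{{{MODS_NS}}}titleInfo"):
        if info.get("type") == "abbreviated":
            text = info.findtext("mods:title", default="", namespaces=NS)
            titles.extend(p.strip() for p in text.split(",") if p.strip())
    return titles


def work_level_titles(edition_xml_texts):
    """Merge titles from all editions, keeping first-seen order, no repeats."""
    seen = {}
    for xml_text in edition_xml_texts:
        for title in abbreviated_titles(xml_text):
            seen.setdefault(title, None)
    return list(seen)


WRAP = (
    '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">'
    '<mods:titleInfo type="abbreviated"><mods:title>{}</mods:title>'
    "</mods:titleInfo></mods:mods>"
)
first_edition = WRAP.format("AG 6.2, AG 6.50")
second_edition = WRAP.format("AG 6.50, AG 7.24")  # hypothetical values

merged = work_level_titles([first_edition, second_edition])
print(merged)
```

The same `abbreviated_titles` helper could serve the edition-level display, since it works on a single record.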

The authority record for Bacchius and the related work does not show up in the catalog.

https://github.com/PerseusDL/catalog_data/tree/master/mads/PrimaryAuthors/B/Bacchius%20Senex

The authority record listed above is not appearing in the catalog although the author does appear in the GoogleFusion authors table. In addition, for some reason the associated work record https://github.com/PerseusDL/catalog_data/blob/master/mods/greekLit/tlg2136/tlg001/opp-grc1/tlg2136.tlg001.opp-grc1.mods1.xml has been assigned to Simonides. (http://catalog.perseus.org/catalog/urn:cts:greekLit:tlg2136.tlg001)

The authority record for Simonides (http://catalog.perseus.org/catalog/Mtlg0261Simon) also has some very strange issues: three incorrect identifiers have been assigned to this record (stoa0027b, tlg2136b, tlg4150), although they don't show up in the GitHub authority record for Simonides (https://github.com/PerseusDL/catalog_data/blob/master/mads/PrimaryAuthors/S/Simonides/n85-298996.xml.mads.xml).

Ctesicles tlg id

The short question for this is: Would it be possible to go ahead and add the tlg id associated with the work attributed to this author as a cts urn in this MADS https://github.com/PerseusDL/catalog_data/blob/master/mads/PrimaryAuthors/C/Ctesicles%20Historicus/viaf34843526.mads.xml?

The longer explanation is: I read the note that there is a dispute about whether there are two authors or they are the same person, but as it stands, we already have Ctesicles labeled as the author in the MODS record. This creates a semi-issue where we have two textgroups for Ctesicles, tlg2171 and VIAF34843526, where the VIAF one is tied to a MADS and the tlg one is not. That is all perfectly kosher, but the MADS file won't appear in the atom feed for tlg2171 because of the different id, so things are slightly less nice and connected. I also have the atom builder look by name to double check these sorts of situations, but that doesn't work here since the authority names are slightly different: "Ctesicles Historicus 3./4. Jh" and "Ctesicles Historicus 3./4. Jh. n. Chr."

Catalog Display Wishlist

Per conversation with Anna, this issue is being created to list several areas where the data in the catalog records (ATOM) is not being displayed or indexed, when such data might be useful.

  1. Add support for the display of pages that were encoded as lists. For example, in record
    http://data.perseus.org/catalog/urn:cts:greekLit:tlg0126.tlg001.opp-grc2/atom, there were epigrams on multiple pages, so the extent was encoded as
    <mods:extent unit="pages">
    <mods:list>156-157, 174-175</mods:list>
    </mods:extent>

  2. Add the abbreviated titles into the expression level record and possibly make them searchable/indexable. This would be particularly useful for the epigrammatists and fragmentary authors. For example, in the above record, the following information was included:

<mods:titleInfo type="abbreviated">
<mods:title>AG 5.58, AG 5.59, AG 5.98</mods:title>
</mods:titleInfo>
<mods:titleInfo type="abbreviated">
<mods:title>AP 5.58, AP 5.59, AP 5.98</mods:title>
</mods:titleInfo>

For the fragmentary historians, further information was also included in the displayLabel attribute,
for example in record http://data.perseus.org/catalog/urn:cts:greekLit:tlg0116.tlg002/atom:

<mods:titleInfo type="abbreviated" displayLabel="FHG">
<mods:title>FHG 4: 279-284</mods:title>
</mods:titleInfo>
<mods:titleInfo type="abbreviated" displayLabel="FGrH">
<mods:title>FGrH 685</mods:title>
</mods:titleInfo>

As a side note, some fragmentary historians had these titles included in the work title; this data seems to have been drawn from the Authors-Abbreviations-Editions spreadsheet, but it is uneven.

  3. Add in the TOCs, which often have very granular information, although unfortunately only as textual strings. For example, the record for the Fragmenta of Aeschylus in Nauck's TGF, http://data.perseus.org/catalog/urn:cts:greekLit:tlg0085.tlg011.opp-grc1/atom, has a very detailed TOC that was not included and might be useful to display:
    <mods:tableOfContents>Athamas (pg. 1-4) -- Aiguptioi (pg. 4) -- Aitnaiai (pg. 4-6) -- Alkmene (pg. 6)
    -- Amumone (pg. 6-7) -- Apgeioi (pg. 7-8) -- Argo (pg. 8-9) -- Atalanth (pg. 9) -- Bakchai
    (pg. 9) -- Bassarai (pg. 9-10) -- Glaukoi (pg. 10) -- Glaukos Pontios (pg. 11-13) -- Glaukos
    Pontnieus (pg. 13-15) -- Danaides (pg. 15-17) -- Dionusou Trophoi (pg. 17-18) -- Hektoros
    Lutra v. Phruges Eleusinioi (pg. 18-19) -- Europe v. Kares Hedonoi (pg. 19-23) -- Heliades
    (pg. 23-25) -- Herakleidai (pg. 25-26) -- Thalamopoioi (pg. 26) -- Theoroi E Isthmiastai
    (pg. 26-27) -- Thrissai (pg. 27-28) -- Iereiai (pg. 28-29) -- Ixion (pg. 29-30) --
    Isthmiastai v. Theroi Iphigeneia (pg. 31) -- Kaberoi (pg. 31-32) -- Kallisto (pg. 32) --
    Kares E Europe (pg. 32-35) -- Kerkuoh Saturikos (pg. 35-36) -- Kerukes Saturoi (pg. 36-37)
    -- Kirke Saturike (pg. 37-38) -- Kressai (pg. 38-39) -- Leon Saturikos (pg. 40) -- Lemnioi
    (pg. 40) -- Lukourgos Saturikos (pg. 40-41) -- Lutra v Phruges Memnon (pg. 41-42) --
    Murmidones (pg. 42-47) -- Musoi (pg. 47-48) -- Neaniskoi (pg. 48-49) -- Nemea (pg. 49) --
    Nereides (pg. 49-50) -- Niobe (pg. 50-55) -- Xantriai (pg. 55-56) -- Oidipous (pg. 57) --
    Oplon Krisis (pg. 57-58) -- Ostologoi (pg. 58-59) -- Palamedes (pg. 59-60) -- Pentheus (pg.
    60-61) -- Perraibides (pg. 61) -- Penelope (pg. 61-62) -- Poludektes (pg. 62) -- Promethes
    (pg. 62-63) -- Prometheus Luomenos (pg. 63-68) -- Prometheus Purkaeus (pg. 68-69) --
    Prometheus Purphoros (pg. 69-70) -- Proteus Saturikos (pg. 70-72) -- Salaminiai (pg. 72-73)
    -- Semele E Hyrophoroi (pg. 73-74) -- Sisuphos Drapetes et Petrokuliestes (pg. 74-76) --
    Sphigx Saturike (pg. 76) -- Tedephos (pg. 76-77) -- Toxotides (pg. 77-79) -- Trophoi v.
    Dionusos Trophoi Upsipule (pg. 79) -- Philoktetes (pg. 79-83) -- Phorkides (pg. 83) --
    Phruges E Hektoros Lutra (pg. 84-87) -- Phrugioi (pg. 87) -- Psuchagogoi (pg. 87-88) --
    Psuchostasia (pg. 88-89) -- Oreithuia (pg. 89) -- Incertarum Fabularum Fragmenta (pg.
    90-124) -- Fragmenta Dubia et Spuria (pg. 125-128)</mods:tableOfContents>
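Because the TOC is only a textual string, a display layer would have to split it itself. A minimal sketch, assuming entries are separated by " -- " and end in "(pg. N)" or "(pg. N-M)" as in the record above; the function name is hypothetical, and entries that do not fit the pattern are kept with no page range rather than dropped.

```python
import re

# One TOC entry: a title followed by "(pg. 4)" or "(pg. 1-4)".
ENTRY = re.compile(r"^(?P<title>.+?)\s*\(pg\.\s*(?P<pages>\d+(?:-\d+)?)\)$")


def parse_toc(toc):
    """Split a MODS tableOfContents string into (title, pages) pairs."""
    # Collapse line breaks and runs of spaces before splitting on " -- ".
    normalized = " ".join(toc.split())
    entries = []
    for chunk in normalized.split(" -- "):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = ENTRY.match(chunk)
        if match:
            entries.append((match.group("title"), match.group("pages")))
        else:
            entries.append((chunk, None))
    return entries


sample = "Athamas (pg. 1-4) -- Aiguptioi (pg. 4) -- Pentheus (pg.\n60-61)"
parsed = parse_toc(sample)
print(parsed)
```

Normalizing whitespace first matters here because the page range can wrap onto the next line, as it does for Pentheus in the record.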

MODS Records that had CTS-URNs in them pre-ingest failing to be displayed in online catalog

There are a number of MODS records that, although they successfully made it from catalog_pending into catalog_data, are for some reason not being displayed in the online catalog. This seems to apply in particular to Perseus EpiDoc editions (see #24) and editions of the CSEL, such as the editions of Tertullian, where I had included a CTS-URN in the record in catalog_pending. The new versions nonetheless made it into the CITE Tables.

For example, the MODS records referenced above, found here (https://github.com/PerseusDL/catalog_data/blob/master/mods/latinLit/phi1035/phi001/perseus-lat2/phi1035.phi001.perseus-lat2.mods1.xml) and here (https://github.com/PerseusDL/catalog_data/blob/e0dc08189a36e493f23f0c9044c5bea864ec9aad/mods/latinLit/stoa0275/stoa030/opp-lat2/stoa0275.stoa030.opp-lat2.mods1.xml), have two CTS-URNs. In the first example:
<mods:identifier type="ctsurn">urn:cts:latinLit:phi1035.phi001.perseus-lat2</mods:identifier>
<mods:identifier type="ctsurn">urn:cts:latinLit:phi1035.phi001.perseus-lat2</mods:identifier>
and in the second:
<mods:identifier type="ctsurn">urn:cts:latinLit:stoa0275.stoa030</mods:identifier>
<mods:identifier type="ctsurn">urn:cts:latinLit:stoa0275.stoa030.opp-lat2</mods:identifier>
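To find other records with the same pattern, a small diagnostic could list each record's CTS-URN identifiers and report repeats or a mix of work-level and version-level URNs. This is only a sketch: whether these patterns actually cause the display failure is not established here, and the function names are hypothetical. It accepts both "ctsurn" and "cts-urn" as identifier types.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}


def cts_urns(xml_text):
    """Return the CTS URNs declared directly on a MODS record."""
    record = ET.fromstring(xml_text)
    return [
        ident.text.strip()
        for ident in record.findall("mods:identifier", NS)
        if ident.get("type") in ("ctsurn", "cts-urn") and ident.text
    ]


def diagnose(urns):
    """Report repeated URNs and mixed work-level/version-level URNs."""
    problems = []
    if len(urns) != len(set(urns)):
        problems.append("duplicate URN")
    # textgroup.work has 2 dot-separated parts; a version adds a third.
    depths = [len(urn.split(":")[-1].split(".")) for urn in urns]
    if any(d < 3 for d in depths) and any(d >= 3 for d in depths):
        problems.append("mixed work-level and version-level URNs")
    return problems


WRAP = '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">{}</mods:mods>'
IDENT = '<mods:identifier type="ctsurn">{}</mods:identifier>'
first = WRAP.format(
    IDENT.format("urn:cts:latinLit:phi1035.phi001.perseus-lat2") * 2
)
second = WRAP.format(
    IDENT.format("urn:cts:latinLit:stoa0275.stoa030")
    + IDENT.format("urn:cts:latinLit:stoa0275.stoa030.opp-lat2")
)

print(diagnose(cts_urns(first)))
print(diagnose(cts_urns(second)))
```

Run over the two records quoted above, the first is reported as a duplicate and the second as mixed levels.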

Libanius

It appears that tlg2200.tlg005 has had all of its constituent records broken out into individual records. Can we get rid of these three files if that is the case as they are redundant?

errors in XML markup returned by GetCapabilities (user report)

(please move if this is not a catalog data issue)

http://catalog.perseus.org/catalog/urn:cts:latinLit:phi1351
http://catalog.perseus.org/catalog/urn:cts:greekLit:tlg7000

-------- Original Message --------
Subject: GetCapabilities XML errors
Resent-From: [email protected]
Date: Sun, 30 Nov 2014 19:46:34 -0500
To: Perseus DL Webmaster [email protected]

I have found the following errors in XML markup returned by GetCapabilities:
Textgroup latinLit:phi1351 groupname is coded as “C. Suetonius Tranquillus” but is actually Tacitus;
Textgroup greekLit:tlg7000 groupname is coded as “greekLit:tlg7000” but is actually Greek Anthology.

System not using dates found in certain elements.

The facet search http://catalog.perseus.org/?f[year_facet][]=0 revealed an interesting issue with how the system seems to be pulling "Edition or Translation Year Published": 218 records have the year 0. In some cases, such as with the new Arabic records, I think this has been caused by the presence of this in the MODS record:
<mods:dateIssued>-</mods:dateIssued>

Nonetheless, for some classical editions, I am wondering if it is failing to list a date due to different date encoding structures in the XML.

For example, http://data.perseus.org/catalog/urn:cts:latinLit:phi0134.phi004.opp-eng1/atom has the following date encoding:
<mods:dateModified>1900</mods:dateModified>
<mods:dateCreated>1894</mods:dateCreated>

and http://data.perseus.org/catalog/urn:cts:latinLit:phi0914.phi001.opp-eng4/atom has the following date encoding:
<mods:copyrightDate>1959</mods:copyrightDate>
<mods:dateModified>1967</mods:dateModified>

There don't seem to be any issues with records that use dateIssued:
<mods:dateIssued>1878</mods:dateIssued>
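One way to avoid the year 0 would be to fall back to the other MODS date elements when dateIssued is missing or holds no usable year. The sketch below is an illustration, not the catalog's indexing code; the function name and the fallback order (dateIssued, then copyrightDate, dateCreated, dateModified) are assumptions that would need to be agreed on.

```python
import re
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
# Assumed priority order; dateIssued is preferred when it holds a year.
DATE_ELEMENTS = ["dateIssued", "copyrightDate", "dateCreated", "dateModified"]


def publication_year(xml_text):
    """Return the first four-digit year found, or None if there is none."""
    root = ET.fromstring(xml_text)
    for name in DATE_ELEMENTS:
        for element in root.iter(f"{{{MODS_NS}}}{name}"):
            match = re.search(r"\d{4}", element.text or "")
            if match:
                return int(match.group())
    return None


WRAP = (
    '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">'
    "<mods:originInfo>{}</mods:originInfo></mods:mods>"
)
placeholder = WRAP.format("<mods:dateIssued>-</mods:dateIssued>")
created = WRAP.format(
    "<mods:dateModified>1900</mods:dateModified>"
    "<mods:dateCreated>1894</mods:dateCreated>"
)
copyrighted = WRAP.format(
    "<mods:copyrightDate>1959</mods:copyrightDate>"
    "<mods:dateModified>1967</mods:dateModified>"
)

print(publication_year(placeholder))
print(publication_year(created))
print(publication_year(copyrighted))
```

Returning None instead of 0 lets the index treat "no date" as a missing value, so such records would not collect under a year-0 facet.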

catalog new versions of Aristotle Ars Poetica

In PerseusDL/canonical#112, Stella is adding Greek and Arabic versions of Aristotle Ars Poetica to the Perseus canonical repo in order to make them available for editing in Perseids.

In order to facilitate this, she assigned the following version identifiers to them:

urn:cts:tlg0086.tlg034.digicorpus-ara1
urn:cts:tlg0086.tlg034.digicorpus-grc1

Interestingly, these texts come to us by way of us, from the joint Harvard/Tufts GrecoArabic corpus, which is now apparently published at http://digicorpus.net4media-typo3.de/

The source files for these are also here https://bitbucket.org/grecoarabiccorpus/opensourcetexts

I think if we are adding them to Perseids and the Perseus canonical repo, we should also have a catalog entry for them. Not sure our process yet supports predefined version identifiers though.

An authority record and work record need new identifiers

The author Agatharchides (http://catalog.perseus.org/catalog/Atlg0667Agath) had the wrong identifier for two works in the A-A-E spreadsheet: 0667.001 instead of 0067.001, and 0667.004 instead of 0067.004. This will need to be fixed and the corresponding URNs redirected.

The author with TLG 0667 (Marcellinus) has no works and thus no authority record in the catalog as yet, but this will still affect one catalog record, http://catalog.perseus.org/catalog/urn:cts:greekLit:tlg0667.tlg004 (which will need a new CTS-URN of urn:cts:greekLit:tlg0067.tlg004).
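The URN side of this fix can be sketched as a small rewrite that swaps the textgroup component and yields an old-to-new redirect table. The function name is hypothetical, and how redirects are actually registered in the catalog is outside this sketch; the URNs are built from the two work numbers named above.

```python
def retarget_urn(urn, old_group, new_group):
    """Replace the textgroup of a CTS URN if it matches `old_group`."""
    parts = urn.split(":")
    if len(parts) < 4 or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn}")
    # parts[3] looks like "tlg0667.tlg004" or "tlg0667.tlg004.opp-grc1".
    work = parts[3].split(".")
    if work[0] == old_group:
        work[0] = new_group
    parts[3] = ".".join(work)
    return ":".join(parts)


affected = [
    "urn:cts:greekLit:tlg0667.tlg001",
    "urn:cts:greekLit:tlg0667.tlg004",
]
# Map each wrongly assigned URN to its corrected form.
redirects = {urn: retarget_urn(urn, "tlg0667", "tlg0067") for urn in affected}
for old, new in redirects.items():
    print(old, "->", new)
```

URNs whose textgroup does not match are returned unchanged, so the function can be applied to a whole list without pre-filtering.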

A TLG identifier was incorrectly assigned and will need to be reassigned

The author Priscian has incorrectly been assigned the TLG textgroup 0592 (http://catalog.perseus.org/catalog/urn:cts:greekLit:tlg0592). TLG 0592 is for the author Hermogenes; it was assigned to Priscian because one of our works for Priscian is a translation of a work by Hermogenes, and it had the following in the MODS record: 0592.001. This textgroup will need to be unassigned and the work directed to the correct authority record at http://catalog.perseus.org/catalog/Mstoa0234aPrisc

Issue with Aulus Licinius Archias

There appears to be something odd with Aulus Licinius Archias. He seems to have two MADS records: one under PrimaryAuthors/A/Archias/n84-234520.xml.mads.xml and another under PrimaryAuthors/A/Aulus Licinius Archias, 5th Cent B.C/AulusLiciniusArchias.mads.xml.

The alt names in the second record (AulusLiciniusArchias.mads.xml) are for Zeuxis, Heracleensis, 5. Jh. v. Chr., who seems to have been a painter?

Is this an odd duplicate case?
