
Comments (7)

wright13 commented on August 15, 2024

From Kristin:
"In our EML documents, the namespace is declared as “xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"”, so that is one place the version of EML should be checked.

The line “xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0/ https://eml.ecoinformatics.org/eml-2.2.0/eml.xsd"” tells the parser that elements within the eml namespace can be found in the eml.xsd file. I believe the latter reference to eml could actually point to any file that contains the schema. It would not necessarily need to include eml-2.2.0 in the path. I found instances of file path name variations when I surveyed some EML documents in the EDI repository.

Based on this understanding, it would seem that checking the first two references to eml-2.2.0 for consistency would be the way to go. What I don’t know is, if the namespace points to eml-2.0.0, could the schema actually point to another version? I suppose if it were hand-coded this might happen by mistake. I doubt the EML would validate in that case, though, so it would likely never reach the DPChecker without being corrected. In that case, you could just check the namespace.

So, basically I am not sure. I tend to go for the overkill on checking things, so I would probably test both. However, since DPChecker will be operating on a valid EML document (I think), checking one is probably sufficient."

from dpchecker.

wright13 commented on August 15, 2024

The plot thickens! It appears that EML::read_eml always changes the namespace version to 2.2.0 when it reads the metadata into R. Even if I manually change the version in the xml file to less than 2.2.0, it reads in as 2.2.0. Does that seem like odd behavior to either of you? I suppose I can just use something besides read_eml to read the file in for the purpose of checking the version.
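As a rough sketch of reading the version without EML::read_eml, the raw file text can be searched for the xmlns:eml declaration using base R only. The helper name get_namespace_version is hypothetical, and the sketch assumes the attribute value is wrapped in double quotes and that the namespace URI ends in "eml-" followed by a dotted version number.

```r
# Hypothetical helper: pull the EML version out of the xmlns:eml declaration
# with base R only, so EML::read_eml never rewrites the namespace.
get_namespace_version <- function(xml_text) {
  # Collapse to one string in case the root tag spans several lines.
  txt <- paste(xml_text, collapse = " ")
  # Match xmlns:eml="...eml-X.Y.Z" and capture X.Y.Z.
  pattern <- 'xmlns:eml="[^"]*eml-([0-9]+(\\.[0-9]+)*)"'
  m <- regmatches(txt, regexec(pattern, txt))[[1]]
  # No match means the declaration is missing or formatted differently.
  if (length(m) < 2) return(NA_character_)
  m[2]
}

# Reading from disk would look like this:
# get_namespace_version(readLines("metadata.xml", warn = FALSE))

get_namespace_version('<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">')
```

Because this works on the text as stored on disk, it should report whatever version the file declares rather than the version the EML package assigns on import.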


RobLBaker commented on August 15, 2024

Very odd. I'll look into it. It might be worth raising the issue with the folks developing the EML package; they seem responsive. The problem is that EMLeditor doesn't have its own EML import function and instead relies on EML::read_eml. So right now all EML we generate has to pass through that function and ends up reset to the 2.2.0 namespace regardless of the version it was actually made with.


klvanderbilt commented on August 15, 2024

I wonder if it has something to do with backward compatibility? "EML 2.2 is backward compatible, i.e., EML 2.0 and 2.1 documents could be relabeled as EML 2.2 without violating the schema."


RobLBaker commented on August 15, 2024

So I think this makes sense: EML::write_eml is generating an EML file, so it should be compliant with the most recent schema and indicate that it was generated under that schema. The question is, does EML::write_eml leave a history of the various versions under which the EML was saved? I think it might. I confess to not totally understanding all of the attribute tags (and they get rearranged somewhat). I tested this using knb-lter-and.4780 (https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-and.4780.4).

Upon download, the initial eml tag in knb-lter-and.4780.4.xml looks like so:
<eml:eml xmlns:ds="eml://ecoinformatics.org/dataset-2.1.1" xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" packageId="knb-lter-and.4780.4" system="https://pasta.edirepository.org" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 http://nis.lternet.edu/schemas/EML/eml-2.1.1/eml.xsd">

I imported it into R with EML::read_eml and then wrote it back to XML:

# Read the downloaded EML into R, inspect it, then write it back out
mymeta <- EML::read_eml("knb-lter-and.4780.4.xml", from = "xml")
View(mymeta)
EML::write_eml(mymeta, "exportedEML.xml")

And when I open the new "exportedEML.xml" file:
<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.2" xmlns:ds="eml://ecoinformatics.org/dataset-2.1.1" packageId="knb-lter-and.4780.4" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 http://nis.lternet.edu/schemas/EML/eml-2.1.1/eml.xsd" system="https://pasta.edirepository.org">

It appears that even though the xmlns:eml attribute is now eml-2.2.0, the schema location (xsi:schemaLocation=) and xmlns:ds both still indicate the original EML 2.1.1.
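To make the mismatch concrete, here is a minimal base R sketch that pulls both versions out of the exported root tag (abridged here to the two attributes of interest) and compares them. The helpers attr_value and version_of are hypothetical names, and the sketch assumes double-quoted attributes and that the first whitespace-separated token of xsi:schemaLocation is the namespace half of the pair.

```r
# Hypothetical helper: return the value of a double-quoted attribute in a tag.
attr_value <- function(tag, name) {
  m <- regmatches(tag, regexec(paste0(name, '="([^"]*)"'), tag))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Hypothetical helper: return the X.Y.Z that follows "eml-" in a URI.
version_of <- function(uri) {
  if (is.na(uri)) return(NA_character_)
  m <- regmatches(uri, regexec("eml-([0-9]+\\.[0-9]+\\.[0-9]+)", uri))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Root tag written by EML::write_eml in the test above, abridged.
exported_tag <- paste0(
  '<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" ',
  'xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 ',
  'http://nis.lternet.edu/schemas/EML/eml-2.1.1/eml.xsd">'
)

ns_version <- version_of(attr_value(exported_tag, "xmlns:eml"))

# schemaLocation holds "namespace location" pairs; only the namespace half
# is used, since the location could be any path that serves the schema.
schema_pair <- strsplit(
  trimws(attr_value(exported_tag, "xsi:schemaLocation")), "[[:space:]]+"
)[[1]]
schema_version <- version_of(schema_pair[1])

# compareVersion returns 1 when the first version is later than the second.
utils::compareVersion(ns_version, schema_version)
```

For this tag the namespace version comes out as 2.2.0 and the schema version as 2.1.1, so the comparison returns 1.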

Returning to the original question: I think the answer is that we should expect inconsistencies in the EML version, and we need to decide which one to use for checking data packages. Do we want the original version or the edited version? In most cases I bet these will be the same; they would differ only when editing old datasets that already have EML. In the bigger picture, as long as all the versions fall within a reasonable range, the check passes. If any are greater than 2.x.x, that's a hard fail/error (i.e. not possible), and if any are less than 2.x.x, maybe that's a warning (certainly possible, but it probably indicates something odd is going on).

Thoughts?


wright13 commented on August 15, 2024

I think that makes sense. How does this sound:

  • If namespace version == schema version:
    • If both > 2.2.0, error
    • Else if both < 2.2.0, warning
    • Else success
  • Else if namespace version OR schema version > 2.2.0, throw mismatch error
  • Else if namespace version OR schema version < 2.2.0, throw mismatch warning
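The rules above could be sketched in base R roughly as follows. The function name check_eml_versions and its target argument are hypothetical, and returning a status string instead of signalling an error or warning is an assumption made to keep the example easy to test; it is not how DPChecker necessarily reports results.

```r
# Hypothetical sketch of the proposed rules; returns a status string.
check_eml_versions <- function(ns_version, schema_version, target = "2.2.0") {
  # compareVersion returns -1, 0, or 1 (earlier, same, later than target).
  ns_cmp <- utils::compareVersion(ns_version, target)
  schema_cmp <- utils::compareVersion(schema_version, target)
  matched <- utils::compareVersion(ns_version, schema_version) == 0

  if (matched) {
    if (ns_cmp > 0) return("error")    # both newer than the target
    if (ns_cmp < 0) return("warning")  # both older than the target
    return("success")                  # both equal to the target
  }

  # Versions disagree: a version above the target takes priority.
  if (ns_cmp > 0 || schema_cmp > 0) return("mismatch error")

  # Versions differ and neither exceeds the target, so at least one is older.
  "mismatch warning"
}

# The exported file discussed earlier: namespace 2.2.0, schema 2.1.1.
check_eml_versions("2.2.0", "2.1.1")  # "mismatch warning"
```

One consequence of writing it out: once the versions differ and neither is above 2.2.0, at least one must be below it, so the final branch needs no explicit condition.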


RobLBaker commented on August 15, 2024

This sounds reasonable.

