Comments (7)
From Kristin:
"In our EML documents, the namespace is declared as “xmlns:eml=https://eml.ecoinformatics.org/eml-2.2.0”, so that is one place the version of EML should be checked.
The line “xsi:schemaLocation=https://eml.ecoinformatics.org/eml-2.2.0/ https://eml.ecoinformatics.org/eml-2.2.0/eml.xsd” tells the parser that elements within the eml namespace can be found in the eml.xsd file. I believe the latter reference to eml could actually point to any file that contains the schema. It would not necessarily need to include eml-2.2.0 in the path. I found instances of file path name variations when I surveyed some EML documents in the EDI repository.
Based on this understanding, it would seem that checking the first two references to eml-2.2.0 for consistency would be the way to go. What I don’t know is, if the namespace points to eml-2.0.0, could the schema actually point to another version? I suppose if it were hand-coded this might happen by mistake. I doubt the EML would validate in that case, though, so it would likely never reach the DPChecker without being corrected. In that case, you could just check the namespace.
So, basically I am not sure. I tend to go for the overkill on checking things, so I would probably test both. However, since DPChecker will be operating on a valid EML document (I think), checking one is probably sufficient."
from dpchecker.
The plot thickens! It appears that EML::read_eml always changes the namespace version to 2.2.0 when it reads the metadata into R. Even if I manually change the version in the xml file to less than 2.2.0, it reads in as 2.2.0. Does that seem like odd behavior to either of you? I suppose I can just use something besides read_eml to read the file in for the purpose of checking the version.
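One way to read the versions without going through EML::read_eml is to look at the raw text of the root tag. The following is only a minimal sketch in base R: `get_eml_versions` is a hypothetical helper name, not an existing DPchecker or EML function, and it assumes both the namespace URI and the first entry of xsi:schemaLocation contain a string of the form `eml-x.y.z`. A proper XML parser (for example the xml2 package) would likely be more robust than a regular expression.

```r
# Hypothetical helper (not part of EML or DPchecker): extract the EML version
# from the xmlns:eml attribute and from xsi:schemaLocation in raw XML text,
# so that EML::read_eml never gets a chance to rewrite the namespace.
get_eml_versions <- function(xml_text) {
  txt <- paste(xml_text, collapse = "\n")

  # Return the first capture group of `pattern`, or NA if there is no match.
  grab <- function(pattern) {
    m <- regmatches(txt, regexec(pattern, txt, perl = TRUE))[[1]]
    if (length(m) < 2) NA_character_ else m[2]
  }

  version <- "eml-([0-9]+\\.[0-9]+\\.[0-9]+)"
  list(
    # Version embedded in the eml namespace declaration.
    namespace = grab(paste0("xmlns:eml\\s*=\\s*[\"'][^\"']*?", version)),
    # First version found inside xsi:schemaLocation (the namespace half of
    # the pair; the .xsd path itself need not contain a version).
    schema = grab(paste0("xsi:schemaLocation\\s*=\\s*[\"'][^\"']*?", version))
  )
}

# Usage on a file on disk (file name is just an example):
# get_eml_versions(readLines("exportedEML.xml", warn = FALSE))
```

Because this never parses the document into an R object, whatever version strings are in the file are the ones that get reported.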
from dpchecker.
Very odd. I'll look into it. It might be worth raising the issue with the folks developing the EML package. They seem responsive. The problem is that EMLeditor doesn't have its own EML import function but instead also relies on EML::read_eml. So right now all EML we generate has to pass through that function and ends up with its namespace reset to 2.2.0, regardless of the version it was actually made with.
from dpchecker.
I wonder if it has something to do with backward compatibility? "EML 2.2 is backward compatible, i.e., EML 2.0 and 2.1 documents could be relabeled as EML 2.2 without violating the schema."
from dpchecker.
So I think this makes sense: EML::write_eml is generating an EML file, so it should be compliant with the most recent schema and indicate that it was generated under this schema. The question is whether EML::write_eml leaves a history of all the various versions under which the EML was saved. I think it might. I confess to not totally understanding all of the attribute tags (and they get rearranged some). I tested this using knb-lter-and.4780 (https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-and.4780.4).
Upon download, the initial eml tag in knb-lter-and.4780.4.xml looks like so:
<eml:eml xmlns:ds="eml://ecoinformatics.org/dataset-2.1.1" xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" packageId="knb-lter-and.4780.4" system="https://pasta.edirepository.org" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 http://nis.lternet.edu/schemas/EML/eml-2.1.1/eml.xsd">
I imported it into R with EML::read_eml and then wrote it back to XML:
mymeta<-EML::read_eml("knb-lter-and.4780.4.xml", from="xml")
View(mymeta)
EML::write_eml(mymeta, "exportedEML.xml")
And when I open the new "exportedEML.xml" file:
<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.2" xmlns:ds="eml://ecoinformatics.org/dataset-2.1.1" packageId="knb-lter-and.4780.4" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 http://nis.lternet.edu/schemas/EML/eml-2.1.1/eml.xsd" system="https://pasta.edirepository.org">
It appears that even though the xmlns:eml attribute is now eml-2.2.0, the schema location (xsi:schemaLocation=) and xmlns:ds both still indicate the original EML 2.1.1.
Perhaps it's time to return to the original question. I think the answer is that we might expect inconsistencies in the EML version, and we need to decide which we want to use for checking data packages. Do we want the original version or the edited version? In MOST cases I bet these will be the same. It's only when editing old datasets (that already have EML) that they would differ. And in the bigger picture, as long as they all fall within a reasonable range, it passes. If any are greater than 2.x.x, that's a hard fail/error (i.e. not possible), and if any are <2.x.x, maybe that's a warning (certainly possible, but probably indicates something odd is going on).
Thoughts?
from dpchecker.
I think that makes sense. How does this sound:
- If namespace version == schema version:
  - If both > 2.2.0, error
  - Else if both < 2.2.0, warning
  - Else success
- Else if namespace version OR schema version > 2.2.0, throw mismatch error
- Else if namespace version OR schema version < 2.2.0, throw mismatch warning
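The rules above could be sketched in R roughly as follows. This is an illustration only: `check_eml_versions` and its `current` argument are hypothetical names, and the function returns a label instead of calling stop() or warning(), so that the branching is easy to inspect. It relies on base R's numeric_version() to compare version strings component by component.

```r
# Hypothetical sketch of the proposed decision rules; not DPchecker code.
# Returns a label instead of signalling conditions so the logic is easy to test.
check_eml_versions <- function(ns_version, schema_version, current = "2.2.0") {
  ns  <- numeric_version(ns_version)      # version from xmlns:eml
  sc  <- numeric_version(schema_version)  # version from xsi:schemaLocation
  cur <- numeric_version(current)         # newest EML version we accept

  if (ns == sc) {
    if (ns > cur) return("error")      # both newer than any known EML
    if (ns < cur) return("warning")    # both older than current EML
    return("success")                  # both equal to current EML
  }

  # Versions disagree from here on.
  if (ns > cur || sc > cur) return("mismatch error")

  # They differ and neither exceeds `current`, so at least one is older.
  "mismatch warning"
}

# Example: the re-exported knb-lter-and.4780.4 file discussed earlier
# (namespace 2.2.0, schemaLocation still 2.1.1).
check_eml_versions("2.2.0", "2.1.1")
```

In a real test function the returned label would presumably be translated into an error, a warning, or a success message.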
from dpchecker.
This sounds reasonable.
from dpchecker.