Comments (10)
Yup, I think it's deliberate that some functions support a simplified subset of the schema (complex structures can always be built with the list-constructors and should still be covered by the validator). Even with good documentation, too much complexity can be off-putting to new users. Sometimes I think we can nudge people to better options too. For instance, very few programmatic parsers of text-delimited tabular data can handle multiple missing value codes, even though this case does occur in real-world data. It's important that EML can support such cases, but it might be useful to nudge folks away from some options.
For taxonomy, I think it would be useful to nudge people to taxonomic identifiers. As you know, these provide mappings to naming authorities, common names in different languages etc.
from eml.
I agree with that sentiment, @peterdesmet -- while the taxon databases are good for looking up official scientific names, vernacular names are often locally idiosyncratic, and people may want to include them in the metadata to establish that local usage.
Good catch, @peterdesmet -- I agree this is a serialization error. The common name should be in the `commonName` element, not in a child node as if it were another rank and value. Here's the XML generated from your example, which should instead look like the one generated by the IPT that you showed above.
<taxonomicCoverage>
  <taxonomicClassification>
    <taxonRankName>Class</taxonRankName>
    <taxonRankValue>Aves</taxonRankValue>
    <taxonomicClassification>
      <taxonRankName>Species</taxonRankName>
      <taxonRankValue>Anas platyrhynchos</taxonRankValue>
      <taxonomicClassification>
        <taxonRankName>Common</taxonRankName>
        <taxonRankValue>mallard</taxonRankValue>
      </taxonomicClassification>
    </taxonomicClassification>
  </taxonomicClassification>
</taxonomicCoverage>
I see where the problem lies -- the `set_coverage` convenience method is not designed to handle common names. Instead, it interprets each column in the data frame as one of a descending set of rank names. Here's the note in the package documentation about this:
#' @note If "sci_names" is a data frame, column names of the data frame are rank names.
#' For user-defined "sci_names", users must make sure that the order of rank names
#' they specify is from high to low.
#' Ex. "Kingdom","Phylum","Class","Order","Family","Genus","Species","Common"
#' EML permits any rank names provided they go in descending order.
So the code is behaving as planned, and there is no capability to add a commonName using the convenience method. That said, you could use the convenience method to create the structure, then add the commonName to the list manually like so:
library(EML)
df <- data.frame(Class = "Aves", Species = "Anas platyrhynchos")
coverage <- set_coverage(sci_names = df)
coverage$taxonomicCoverage$taxonomicClassification[[1]]$taxonomicClassification$commonName <- 'Mallard'
str(coverage$taxonomicCoverage)
#> List of 1
#> $ taxonomicClassification:List of 1
#> ..$ :List of 3
#> .. ..$ taxonRankName : chr "Class"
#> .. ..$ taxonRankValue : chr "Aves"
#> .. ..$ taxonomicClassification:List of 3
#> .. .. ..$ taxonRankName : chr "Species"
#> .. .. ..$ taxonRankValue: chr "Anas platyrhynchos"
#> .. .. ..$ commonName : chr "Mallard"
Created on 2022-06-03 by the reprex package (v2.0.1)
This produces the following XML output, which matches the IPT format.
<taxonomicCoverage>
  <taxonomicClassification>
    <taxonRankName>Class</taxonRankName>
    <taxonRankValue>Aves</taxonRankValue>
    <taxonomicClassification>
      <taxonRankName>Species</taxonRankName>
      <taxonRankValue>Anas platyrhynchos</taxonRankValue>
      <commonName>Mallard</commonName>
    </taxonomicClassification>
  </taxonomicClassification>
</taxonomicCoverage>
Thanks for answering! I guess there are two solutions then:
- Don't list the `Common` name in the documentation for the convenience method.
- Update the convenience method so that the `Common` column is treated differently? I.e. turn this issue into a feature request?
Yeah, I was contemplating both of those as well. (1) seems like a good idea, (2) seems like a bit of a hack, especially as `commonName` is a repeatable element in EML. So I was thinking a new feature to add common names would be good, but hadn't settled on how to do it. Maybe columns in the data frame with names `Common_1` to `Common_n`? In other places in the package with 1:n cardinality, we use another table, but it's a lot of overhead for this. Open to suggestions. @jeanetteclark any thoughts on this one from your perspective?
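To make the `Common_1` to `Common_n` idea concrete, here is a rough base-R sketch of how such columns might be gathered into a repeatable `commonName` entry. The helper `collect_common_names()` is hypothetical and not part of the EML package, and representing the repeated element as an unnamed list is an assumption about what the list-constructors expect.

```r
# Hypothetical helper (not part of the EML package): gather every column
# named Common_1 ... Common_n from the first row of a data frame.
collect_common_names <- function(row) {
  common_cols <- grep("^Common_[0-9]+$", names(row), value = TRUE)
  values <- vapply(
    common_cols,
    function(col) as.character(row[[col]][1]),
    character(1),
    USE.NAMES = FALSE
  )
  # Drop missing or empty entries so sparse rows still work.
  values <- values[!is.na(values) & nzchar(values)]
  as.list(values)
}

df <- data.frame(
  Species = "Vulpes vulpes",
  Common_1 = "fox",
  Common_2 = "vos",
  stringsAsFactors = FALSE
)

# Assumption: a repeated element is expressed as an unnamed list of values.
classification <- list(
  taxonRankName = "Species",
  taxonRankValue = df$Species[1],
  commonName = collect_common_names(df)
)
str(classification)
```

The trade-off is the one noted above: wide columns keep everything in one table, but the column-name pattern becomes part of the interface.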
There are a handful of functions in the `EML` package that only partially support the schema. For example, for a very long time `set_attributes` only supported having one missing value code. If it seems like a common enough use case, then we could support repeatable common names as you suggest @mbjones, though I agree it seems a little like a hack. I think in order of complexity (time permitting) we could plan to:
- remove common name from documentation
- add support for a single common name (and re-document as needed)
- add support for multiple common names
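As a rough illustration of the second step, the sketch below builds the nested classification list from one data frame row and attaches a single common name to the lowest rank only. `build_classification()` and its `common_col` argument are hypothetical names used for illustration; this is not how `set_coverage` is implemented.

```r
# Hypothetical sketch (not the package implementation): nest rank columns
# from high to low and attach one common name to the lowest rank.
build_classification <- function(row, common_col = "Common") {
  rank_cols <- setdiff(names(row), common_col)
  node <- NULL
  # Work upward from the lowest rank so each node nests inside its parent.
  for (i in rev(seq_along(rank_cols))) {
    rank <- rank_cols[i]
    current <- list(
      taxonRankName = rank,
      taxonRankValue = as.character(row[[rank]][1])
    )
    if (is.null(node)) {
      # Lowest rank: this is where the common name belongs.
      if (common_col %in% names(row)) {
        current$commonName <- as.character(row[[common_col]][1])
      }
    } else {
      current$taxonomicClassification <- node
    }
    node <- current
  }
  node
}

df <- data.frame(
  Class = "Aves",
  Species = "Anas platyrhynchos",
  Common = "mallard",
  stringsAsFactors = FALSE
)
result <- build_classification(df)
str(result)
```

This still relies on the documented requirement that rank columns are ordered from high to low.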
For what it's worth, the dataframe I wanted to provide was one like:
| taxonID | rank | scientificName | common.en | common.nl |
|---|---|---|---|---|
| 5BSG3 | species | Vulpes vulpes | fox | vos |
Which I reduced to the convenience format:
| Species | Common |
|---|---|
| Vulpes vulpes | fox |
But as originally described, the common name ended up as a child.
I would be happy with a convenience method that supported 1 common name (and 1 taxonID) for the lowest rank in the row. In fact - since I don't use the hierarchy because the IPT does not support it - I would be fine with ID, scientificName, rank, vernacularName.
@peterdesmet Thanks! Nice example.
I think it would be better though for the user to specify only the taxonID, from which we derive the rest. Note that I think we would need to use recognized prefixes on the taxonID or some other indication of which authority we're referring to, e.g. that looks like a stable (i.e. post 2019) Catalogue of Life ID. Note that according to COL, the English common name appears to be "red fox". When the user gives both a taxon ID and other data, such as one or more common names, that may not match what is associated with the taxon ID, it's not obvious which one the R package should add to the metadata.
e.g.:
taxalight::tl("COL:5BSG3", "col")
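On the prefix point, a minimal base-R sketch of splitting a prefixed identifier into an authority and a local ID might look like the following. The function `parse_taxon_id()` and the list of known prefixes are hypothetical and only for illustration; the real set of recognized authorities would need to be agreed on.

```r
# Hypothetical helper: split "PREFIX:ID" and check the prefix against a
# list of authorities. The default list here is illustrative only.
parse_taxon_id <- function(id, known = c("COL", "ITIS", "GBIF", "NCBI")) {
  parts <- strsplit(id, ":", fixed = TRUE)[[1]]
  if (length(parts) != 2 || !(parts[1] %in% known)) {
    stop("Unrecognized or missing taxon ID prefix in: ", id)
  }
  list(authority = parts[1], id = parts[2])
}

parsed <- parse_taxon_id("COL:5BSG3")
str(parsed)
```

An unprefixed value such as "5BSG3" is rejected rather than guessed at, since the authority cannot be inferred reliably from the ID alone.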
> I think it would be better though for the user to specify only the taxonID, from which we derive the rest.
It would be nice to provide that as an option, but I think users should still have the option to provide a list of taxa and be done with it.