This is the holy grail of metadata infrastructure and ostensibly the

Integrate two datatables based on EML spec and ontology

ropensci commented on July 18, 2024

Integrate two datatables based on EML spec and ontology

from eml.

Comments (7)

mbjones commented on July 18, 2024

See comments on this in issue #5 .

cboettig commented on July 18, 2024

The approaches to semantics you outline in #5 sound promising: I see how annotations (e.g. this one) map onto the eml (e.g. this one), though I don't spot the corresponding lines in the EML that map to the annotation?

More generally I am curious about how the user would specify semantics; it seems we might always have a variety of ways to actually implement them. My understanding of semantics is pretty limited, but I imagine the general idea would be to provide URIs in place of definitions. In an ideal linked data world, it wouldn't matter whether we all used the same URIs for species (e.g. ITIS TaxanomicTypeName), since any differences could be resolved... If the user could manage to specify the URIs, we can then figure out behind the scenes whether we write that to a definition node directly or do something more intelligent like the examples you point to.
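To make the "URIs in place of definitions" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Attribute` class, the `definition_node` and `same_concept` helpers, and the `example.org` URIs are hypothetical and are not part of reml or the EML schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribute:
    name: str
    definition: str            # free-text definition supplied by the user
    uri: Optional[str] = None  # optional term URI (hypothetical values below)


def definition_node(attr: Attribute) -> str:
    """Text to write into the definition node: the URI if given, else free text."""
    return attr.uri if attr.uri else attr.definition


def same_concept(a: Attribute, b: Attribute, equivalent=frozenset()) -> bool:
    """True if both attributes carry URIs that are identical or declared equivalent.

    `equivalent` is a set of frozenset pairs of URIs standing in for the
    resolution a linked-data service might provide.
    """
    if a.uri is None or b.uri is None:
        return False
    return a.uri == b.uri or frozenset((a.uri, b.uri)) in equivalent


genus_a = Attribute("genus", "Genus of the organism", "http://example.org/terms/genus")
genus_b = Attribute("Genus", "taxonomic genus", "http://example.org/other/Genus")
plain = Attribute("notes", "Free-text field notes")

# Hypothetical equivalence between two vocabularies' terms for "genus".
links = {frozenset((genus_a.uri, genus_b.uri))}
```

With these definitions, `same_concept(genus_a, genus_b, links)` is true even though the two URIs differ, while an attribute without a URI (such as `plain`) simply falls back to its text definition.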

Of course I imagine the first problem is both having URIs for terms users want and helping them discover them.

I suppose for the moment this might be out of the scope of reml...

cboettig commented on July 18, 2024

I keep thinking about this integration question and trying to wrap my head around the different kinds of semantics involved here. Let's see if I got this right:

We have a vocabulary defined by the EML Schema, which can give semantic meaning to things (e.g. we have a precise notion of the term "genus", the units "gram" and "kilogram", etc), but it is not an ontology (e.g. OWL), so we don't have access to the richer reasoning tools and infrastructure thereof.
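As a rough illustration of what the schema-level vocabulary already buys us, the sketch below combines two tables whose mass columns are declared in "gram" and "kilogram". The table layout, the `TO_GRAM` lookup, and the `harmonize` and `integrate` helpers are assumptions made for this example; they are not reml functions.

```python
# Conversion factors to a common base unit; the unit names follow the
# "gram" / "kilogram" examples mentioned above.
TO_GRAM = {"gram": 1.0, "kilogram": 1000.0}


def harmonize(rows, column, unit, target="gram"):
    """Return copies of the rows with `column` converted from `unit` to `target`."""
    factor = TO_GRAM[unit] / TO_GRAM[target]
    return [{**row, column: row[column] * factor} for row in rows]


def integrate(table_a, table_b, column, target="gram"):
    """Stack two tables after converting `column` to a shared unit.

    Each table is a dict of the form {"unit": <unit name>, "rows": [<dict>, ...]},
    a stand-in for a data table plus its attribute metadata.
    """
    return (harmonize(table_a["rows"], column, table_a["unit"], target)
            + harmonize(table_b["rows"], column, table_b["unit"], target))


table_a = {"unit": "gram", "rows": [{"site": "A", "mass": 500.0}]}
table_b = {"unit": "kilogram", "rows": [{"site": "B", "mass": 1.5}]}
combined = integrate(table_a, table_b, "mass")
```

This only works because both tables name their units from a shared, well-defined list; deciding that two differently named columns mean the same thing is the part that needs the richer semantics discussed in this thread.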

We can use the schema definitions (e.g. 'coverage' nodes) to annotate attributes using id and <reference> as described in #9, but this is not commonly done. This would also be weaker than providing ontological definitions of terms, ultimately needed to do the synthesis described at the top of this issue.

So instead, we can annotate EML with the approach @mbjones describes in #5, in which an external XML file provides ontological descriptions of the nodes, as illustrated in some examples. It seems like this is the way forward, given the current EML schema.

@mbjones Are you familiar with the NeXML standard, e.g. as described by Vos et al. 2012?

All elements in a NeXML document—branches and nodes in trees, cells in a matrix, OTUs, and so on—can be identified and given annotations using a generalized system that allows for simple values as well as complex, structured information such as geo-references, taxon concepts, or character-state descriptions. Moreover, data elements can be declared as instances of a class defined in an ontology, making the semantics of the data themselves computable.

@mbjones It seems like they have a more direct way of accomplishing this goal; e.g. a tighter correspondence between the schema's vocabulary and available ontologies? Is this at all instructive for us? There are a lot of shared objectives here -- e.g. attaching geo-references and taxon concepts to nodes -- it seems like a common approach here would be good. Perhaps your working groups are already talking to each other?

mbjones commented on July 18, 2024

I have a passing familiarity with NeXML, having had to deal with it in Kepler, but what you describe sounds useful. The key in all of these is to have a solid, well-defined identifier for anything that you want to apply an annotation to. The EML id attribute is one of these, and is how we implemented the semtools annotations that I described in #5. Although I said that people don't often apply geospatial and taxon constraints to particular attributes, that is how we intended for the system to work. The additionalMetadata <describes> element provides a general purpose way of annotating any subtree in an EML document. I think this is parallel to what NeXML provides through their <Annotated> complex type, with its <about> attribute pointing at a URI. So, the reason for us to do annotations separately in Semtools is exactly this -- there are many metadata standards, and each has its own way of describing entities and attributes. The external annotation schema that I cited in #5 provides a mechanism to link annotations to ontologies that is flexible enough to apply to multiple different metadata schemas. It could be inlined inside of an EML additionalMetadata element for ease of use, or it could stand alone as its own independent document. My impression is that NeXML assumes these will always be external, as the <Annotated> element uses the about URI attribute for the pointer.
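A small Python sketch of the id-based linkage described above: an additionalMetadata block names its target through <describes>, and a reader resolves that id back to the annotated element. The document below is a stripped-down, namespace-free fragment rather than schema-valid EML, and the `<annotation term="...">` payload and the `example.org` URI are hypothetical stand-ins, not the Semtools annotation format.

```python
import xml.etree.ElementTree as ET

# Simplified fragment: no namespaces, most required EML elements omitted.
EML = """
<eml>
  <dataset>
    <dataTable>
      <attributeList>
        <attribute id="att.1"><attributeName>mass</attributeName></attribute>
        <attribute id="att.2"><attributeName>genus</attributeName></attribute>
      </attributeList>
    </dataTable>
  </dataset>
  <additionalMetadata>
    <describes>att.1</describes>
    <metadata>
      <annotation term="http://example.org/ontology#Mass"/>
    </metadata>
  </additionalMetadata>
</eml>
"""


def annotations_by_id(eml_text):
    """Map each described id to the annotation terms attached to it."""
    root = ET.fromstring(eml_text)
    # Index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    result = {}
    for block in root.iter("additionalMetadata"):
        terms = [a.get("term") for a in block.iter("annotation")]
        for describes in block.findall("describes"):
            target = (describes.text or "").strip()
            if target in by_id:  # skip pointers that do not resolve
                result.setdefault(target, []).extend(terms)
    return result


found = annotations_by_id(EML)
```

Because the lookup depends only on the id, the same function would work whether the annotation is inlined in additionalMetadata, as here, or read from a separate document and matched against the same ids.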

We have not talked with the NeXML folks, but it sounds like we should. Do you know that there will be a concentrated emphasis on this via a biodiversity semantics workshop at TDWG this year that Mark Schildhauer is organizing? We also have some work on this coming up via our Semtools project, so hopefully we can all come to an acceptable shared approach.

cboettig commented on July 18, 2024

@mbjones From @rvosa I understand that NeXML provides this kind of annotation in <meta> nodes as child nodes to the attributes, e.g. here's an excerpt from a list of <node> elements where one has such an annotation:

            <node id="tree2n2" label="n2" otu="t1"/>
            <node id="tree2n3" label="n3"/>
            <node id="tree2n4" about="#tree2n4" label="n4">
                <meta
                    id="tree2dict1"
                    property="cdao:has_tag"
                    content="true"
                    xsi:type="nex:LiteralMeta"
                    datatype="xsd:boolean"/>
            </node>
            <node id="tree2n5" label="n5" otu="t3"/>
            <node id="tree2n6" label="n6" otu="t2"/>

(Or see richer examples here)

One clever thing about this is that the <meta> nodes use RDFa syntax, so that the data can be extracted by any generic RDFa tool, and can leverage ontologies directly. The external Semtools annotation examples you linked (like this one) look like a powerful way to go about this. Curious if they could exploit the same RDFa trick?
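To show roughly what a generic tool gets out of such <meta> nodes, here is a Python sketch that reads the about / property / content attributes from a cut-down version of the excerpt above. It is not a conforming RDFa processor: it uses only the standard library's ElementTree, does not expand prefixes such as `cdao:`, and ignores resource-valued annotations. The wrapping `<tree>` element and the omission of the NeXML default namespace are simplifications assumed for this example.

```python
import xml.etree.ElementTree as ET

# Cut-down excerpt; only the xsi prefix is declared because it is the only
# prefix used in an attribute *name*.
SNIPPET = """
<tree xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <node id="tree2n3" label="n3"/>
  <node id="tree2n4" about="#tree2n4" label="n4">
    <meta id="tree2dict1" property="cdao:has_tag" content="true"
          xsi:type="nex:LiteralMeta" datatype="xsd:boolean"/>
  </node>
</tree>
"""


def literal_triples(xml_text):
    """Collect (subject, property, literal) tuples from <meta> child nodes."""
    root = ET.fromstring(xml_text)
    triples = []
    for subject in root.iter():
        about = subject.get("about")
        if about is None:
            continue  # only elements with an about attribute act as subjects
        for meta in subject.findall("meta"):
            prop = meta.get("property")
            if prop is not None:
                triples.append((about, prop, meta.get("content")))
    return triples


triples = literal_triples(SNIPPET)
```

A real RDFa extractor would additionally resolve `#tree2n4` against the document URI and expand `cdao:has_tag` to a full IRI, which is what lets the output be queried alongside an ontology.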

I see there's already a beta schema for the Semtools annotation (sms-semannot.xsd); perhaps you could point me to the documentation for this? I guess we can already generate the annotations for some attributes programmatically (e.g. the standardUnits).

rvosa commented on July 18, 2024

Hi Matt, Carl,

nice to be in touch about this, and interesting to see how you guys are dealing with the same challenges. To give a more complete, applied example of the RDFa annotations, have a look at this TreeBASE study dump: https://github.com/rvosa/supertreebase/blob/master/data/treebase/S100.xml

(Carl, this directory holds all of TreeBASE, as you asked.)

What we're trying to do is embed the metadata about the study (publication data, GUIDs for taxa) inside the data file so that it can be extracted with generic tools. To wit, here are the triples that are thus generated:

http://www.w3.org/2012/pyRdfa/extract?uri=https%3A%2F%2Fraw.github.com%2Frvosa%2Fsupertreebase%2Fmaster%2Fdata%2Ftreebase%2FS100.xml&format=turtle&rdfagraph=output&vocab_expansion=false&rdfa_lite=false&embedded_rdf=true&space_preserve=true&vocab_cache=true&vocab_cache_report=false&vocab_cache_refresh=false

Cheers,

Rutger

cboettig commented on July 18, 2024

@mbjones Guess I should learn to use a computer. I see there's already a lot of information about the semtools approach here: https://code.ecoinformatics.org/code/semtools/trunk/dev/sms/README.txt

I suppose we can follow a similar approach to morpho of providing a semtools R package that could be used as a 'plugin' with reml.
