
Comments (7)

mbjones commented on July 18, 2024

See comments on this in issue #5.

from eml.

cboettig commented on July 18, 2024

The approaches to semantics you outline in #5 sound promising: I see how annotations (e.g. this one) map onto the eml (e.g. this one), though I don't spot the corresponding lines in the EML that map to the annotation?

More generally I am curious about how the user would specify semantics; it seems we might always have a variety of ways to actually implement them. My understanding of semantics is pretty limited, but I imagine the general idea would be to provide URIs in place of definitions. In an ideal linked data world, it wouldn't matter whether we all used the same URIs for species (e.g. ITIS TaxanomicTypeName) since that could be resolved... If the user could manage to specify the URIs, we can then figure out behind the scenes whether we write that to a definition node directly or do something more intelligent like the examples you point to.

Of course I imagine the first problem is both having URIs for the terms users want and helping them discover them.

I suppose for the moment this might be out of the scope of reml...


cboettig commented on July 18, 2024

I keep thinking about this integration question and trying to wrap my head around the different kinds of semantics involved here. Let's see if I got this right:

We have a vocabulary defined by the EML Schema, which can give semantic meaning to things (e.g. we have a precise notion of the term "genus", the units "gram" and "kilogram", etc), but it is not an ontology (e.g. OWL), so we don't have access to the richer reasoning tools and infrastructure thereof.

We can use the schema definitions (e.g. 'coverage' nodes) to annotate attributes using id and <reference> as described in #9, but this is not commonly done. This would also be weaker than providing ontological definitions of terms, ultimately needed to do the synthesis described at the top of this issue.

So instead, we can annotate EML with the approach @mbjones describes in #5, in which an external XML file provides ontological descriptions of the nodes, as illustrated in some examples. It seems like this is the way forward, given the current EML schema.

@mbjones Are you familiar with the NeXML standard, e.g. as described by Vos et al. 2012

All elements in a NeXML document—branches and nodes in trees, cells in a matrix, OTUs, and so on—can be identified and given annotations using a generalized system that allows for simple values as well as complex, structured information such as geo-references, taxon concepts, or character-state descriptions. Moreover, data elements can be declared as instances of a class defined in an ontology, making the semantics of the data themselves computable.

@mbjones It seems like they have a more direct way of accomplishing this goal; e.g. a tighter correspondence between the schema's vocabulary and available ontologies? Is this at all instructive for us? There are a lot of shared objectives here -- e.g. attaching geo-references and taxon concepts to nodes -- so it seems like a common approach here would be good. Perhaps your working groups are already talking to each other?


mbjones commented on July 18, 2024

I have a passing familiarity with NexML, having had to deal with it in Kepler, but what you describe sounds useful. The key in all of these is to have a solid, well-defined identifier for anything that you want to apply an annotation to. The EML id attribute is one of these, and is how we implemented the semtools annotations that I described in #5. Although I said that people don't often apply geospatial and taxon constraints to particular attributes, that is how we intended for the system to work. The additionalMetadata <describes> element provides a general purpose way of annotating any subtree in an EML document. I think this is parallel to what NexML provides through their <Annotated> complex type, with its <about> attribute pointing at a URI. So, the reason for us to do annotations separately in Semtools is exactly this -- there are many metadata standards, and each has its own way of describing entities and attributes. The external annotation schema that I cited in #5 provides a mechanism to link annotations to ontologies that is flexible enough to apply to multiple different metadata schemas. It could be inlined inside of an EML additionalMetadata element for ease of use, or it could stand alone as its own independent document. My impression is that NexML assumes these will always be external, as the <Annotated> element uses the about URI attribute for the pointer.
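As a rough illustration of the id / <describes> linkage described above, here is a minimal Python sketch. Everything in the sample document is an assumption made for the example: it is a heavily simplified, namespace-free EML-like fragment, and the <annotation term="..."> element and the example.org URI are hypothetical placeholders, not the Semtools annotation schema. The only point is that a <describes> value can be resolved back to the element carrying the matching id.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EML-like document (no namespaces, most
# required EML elements omitted). The <annotation> element and its URI are
# placeholders for illustration, not the real Semtools schema.
EML_LIKE = """
<eml>
  <dataset>
    <dataTable>
      <attributeList>
        <attribute id="att.1"><attributeName>mass</attributeName></attribute>
        <attribute id="att.2"><attributeName>genus</attributeName></attribute>
      </attributeList>
    </dataTable>
  </dataset>
  <additionalMetadata>
    <describes>att.1</describes>
    <metadata>
      <annotation term="http://example.org/ontology#Mass"/>
    </metadata>
  </additionalMetadata>
</eml>
"""


def resolve_annotations(xml_text):
    """Pair each annotated attribute name with the term URI attached to it."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    resolved = []
    for block in root.findall("additionalMetadata"):
        metadata = block.find("metadata")
        if metadata is None:
            continue
        for describes in block.findall("describes"):
            # <describes> holds the id of the subtree being annotated.
            target = by_id.get((describes.text or "").strip())
            if target is None:
                continue
            for ann in metadata.findall("annotation"):
                resolved.append((target.findtext("attributeName"), ann.get("term")))
    return resolved


annotations = resolve_annotations(EML_LIKE)
print(annotations)
```

The same lookup would work whether the annotation block is inlined in the document or kept in a separate file, as long as both sides agree on the identifier.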

We have not talked with the NexML folks, but sounds like we should. Do you know that there will be a concentrated emphasis on this via a biodiversity semantics workshop at TDWG this year that Mark Schildhauer is organizing? We also have some work on this coming up via our Semtools project, so hopefully we can all come to an acceptable shared approach.


cboettig commented on July 18, 2024

@mbjones From @rvosa I understand that NeXML provides this kind of annotation in <meta> nodes as child nodes of the annotated elements, e.g. here's an excerpt from a list of <node> elements where one has such an annotation:

            <node id="tree2n2" label="n2" otu="t1"/>
            <node id="tree2n3" label="n3"/>
            <node id="tree2n4" about="#tree2n4" label="n4">
                <meta 
                    id="tree2dict1" 
                    property="cdao:has_tag" 
                    content="true" 
                    xsi:type="nex:LiteralMeta"
                    datatype="xsd:boolean"/>
            </node>
            <node id="tree2n5" label="n5" otu="t3"/>
            <node id="tree2n6" label="n6" otu="t2"/>

(Or see richer examples here)

One clever thing about this is that the <meta> nodes use RDFa syntax, so that the data can be extracted by any generic RDFa tool, and can leverage ontologies directly. The external Semtools annotation examples you linked (like this one) look like a powerful way to go about this. Curious if they could exploit the same RDFa trick?
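To make the extraction idea concrete, here is a minimal Python sketch that pulls (subject, predicate, object) tuples out of the excerpt above using only the standard library. It is a hand-rolled illustration under assumptions, not a real RDFa processor: the <tree> wrapper is a hypothetical root added so the fragment parses on its own, the prefixes (cdao:, nex:, xsd:) are left unexpanded, and only literal <meta> elements with property and content attributes are handled. A real NeXML file also declares a default namespace, so the tag lookups would need to be namespace-qualified there.

```python
import xml.etree.ElementTree as ET

# Excerpt from the thread, wrapped in a hypothetical <tree> root that declares
# the xsi namespace so the fragment is well-formed by itself. The cdao:, nex:
# and xsd: prefixes occur only inside attribute values, so the XML parser does
# not need them declared.
NEXML_FRAGMENT = """
<tree xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <node id="tree2n2" label="n2" otu="t1"/>
  <node id="tree2n3" label="n3"/>
  <node id="tree2n4" about="#tree2n4" label="n4">
    <meta
        id="tree2dict1"
        property="cdao:has_tag"
        content="true"
        xsi:type="nex:LiteralMeta"
        datatype="xsd:boolean"/>
  </node>
  <node id="tree2n5" label="n5" otu="t3"/>
  <node id="tree2n6" label="n6" otu="t2"/>
</tree>
"""


def extract_literal_triples(xml_text):
    """Collect (subject, predicate, object) tuples from <meta> children.

    The subject is the parent's `about` attribute; the predicate and object
    come from the <meta> element's `property` and `content` attributes.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for parent in root.iter():
        subject = parent.get("about")
        if subject is None:
            continue  # elements without `about` carry no annotation subject
        for meta in parent.findall("meta"):
            prop, content = meta.get("property"), meta.get("content")
            if prop is not None and content is not None:
                triples.append((subject, prop, content))
    return triples


triples = extract_literal_triples(NEXML_FRAGMENT)
print(triples)
```

Only the n4 node contributes a tuple here, since it is the only element with both an about attribute and a <meta> child.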

I see there's already a beta schema for the Semtools annotation (sms-semannot.xsd); perhaps you could point me to the documentation for this? I guess we can already generate the annotations for some attributes programmatically (e.g. the standardUnits).


rvosa commented on July 18, 2024

Hi Matt, Carl,

Nice to be in touch about this, and interesting to see how you guys are dealing with the same challenges. To give a more complete, applied example of the RDFa annotations, have a look at this TreeBASE study dump: https://github.com/rvosa/supertreebase/blob/master/data/treebase/S100.xml

(Carl, this directory holds all of TreeBASE, as you asked.)

What we're trying to do is embed the metadata about the study (publication data, GUIDs for taxa) inside the data file so that it can be extracted with generic tools. To wit, here are the triples that are thus generated:

http://www.w3.org/2012/pyRdfa/extract?uri=https%3A%2F%2Fraw.github.com%2Frvosa%2Fsupertreebase%2Fmaster%2Fdata%2Ftreebase%2FS100.xml&format=turtle&rdfagraph=output&vocab_expansion=false&rdfa_lite=false&embedded_rdf=true&space_preserve=true&vocab_cache=true&vocab_cache_report=false&vocab_cache_refresh=false

Cheers,

Rutger


cboettig commented on July 18, 2024

@mbjones Guess I should learn to use a computer. I see there's already a lot of information about the semtools approach here: https://code.ecoinformatics.org/code/semtools/trunk/dev/sms/README.txt

I suppose we can follow an approach similar to morpho's and provide a semtools R package that could be used as a 'plugin' with reml.

