
Introduction

ROBOT is an OBO Tool


ROBOT is a command-line tool and library for automating ontology development tasks, with a focus on Open Biological and Biomedical Ontologies (OBO).

Cite ROBOT

R.C. Jackson, J.P. Balhoff, E. Douglass, N.L. Harris, C.J. Mungall, and J.A. Overton. ROBOT: A tool for automating ontology workflows. BMC Bioinformatics, vol. 20, July 2019.

Installation and Usage

Please see http://robot.obolibrary.org.

Build

We use Maven as our build tool. Make sure it's installed, then run:

mvn clean package

This will create a self-contained Jar file in bin/robot.jar.

Other build options:

  • mvn clean test runs JUnit tests with reports in [module]/target/surefire-reports
  • mvn clean verify rebuilds the package and runs integration tests against it, with reports in [module]/target/failsafe-reports
  • mvn site generates Javadoc in target/site and [module]/target/site

Alternatively, you can use Docker with the provided Dockerfile to build and run ROBOT from within a container. First build an image with docker build --tag robot . then run ROBOT from the container with the usual command-line arguments: docker run --rm robot --help.

Code Style

We use Google Java Style, automatically enforced with google-java-format and fmt-maven-plugin. You may want to use the styleguide configuration file for Eclipse or IntelliJ.

Design

The library provides a set of Operations and a set of Commands. Commands handle the command-line interface and IO tasks, while Operations focus on manipulating ontologies. Sometimes you will have the pair of an Operation and a Command, but there's no necessity for a one-to-one correspondence between them.

Commands implement the Command interface, which requires a main(String[] args) method. Each command can be called via main, but the CommandLineInterface class provides a single entry point for selecting between all the available commands. While each Command can run independently, there are shared conventions for command-line options such as --input, --prefix, --output, etc. These shared conventions are implemented in the CommandLineHelper utility class. There is also an IOHelper class providing convenient methods for loading and saving ontologies and lists of terms. A simple Command will consist of a few CommandLineHelper calls to determine arguments, a few IOHelper calls to load or save files, and one call to the appropriate Operation.
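As a rough illustration of this layering (with purely hypothetical names, not the actual ROBOT classes), a single entry point selecting between registered commands might look like:

```java
// Hypothetical sketch of the Command/CommandLineInterface pattern.
// The names CommandSketch, HelpCommand, and select() are illustrative only.
import java.util.HashMap;
import java.util.Map;

public class CommandSketch {
    interface Command {
        String getName();
        void main(String[] args);
    }

    static class HelpCommand implements Command {
        public String getName() { return "help"; }
        public void main(String[] args) { System.out.println("usage: robot <command>"); }
    }

    // Analogue of the hard-coded command list in CommandLineInterface.
    static final Map<String, Command> COMMANDS = new HashMap<>();
    static {
        register(new HelpCommand());
    }

    static void register(Command c) { COMMANDS.put(c.getName(), c); }

    // Select a command by its first argument, defaulting to "help".
    static String select(String[] args) {
        if (args.length > 0 && COMMANDS.containsKey(args[0])) return args[0];
        return "help";
    }

    public static void main(String[] args) {
        COMMANDS.get(select(args)).main(args);
    }
}
```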

Operations are currently implemented with static methods and no shared interface. They should not contain IO or CLI code.

The current implementation is modular but not pluggable. In particular, the CommandLineInterface class depends on a hard-coded list of Commands.

Term Lists

Many Operations require lists of terms. The IOHelper class defines methods for collecting lists of terms from strings and files, returning a Set<IRI>. Our convention is that a term list is a space-separated list of IRIs or CURIEs with optional comments. The "#" character and everything after it to the end of the line are ignored. Note that a "#" must start the line or be preceded by whitespace -- a "#" inside an IRI does not start a comment.
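The comment convention can be illustrated with a small sketch (hypothetical code, not the actual IOHelper implementation; terms are kept as plain strings rather than IRIs):

```java
// Hypothetical sketch of term-list parsing per the convention above.
import java.util.ArrayList;
import java.util.List;

public class TermListSketch {
    // A "#" starts a comment only at the start of a line or when preceded
    // by whitespace, so a "#" inside an IRI is preserved. The remaining
    // whitespace-separated tokens are the terms.
    public static List<String> parseTerms(String input) {
        List<String> terms = new ArrayList<>();
        for (String line : input.split("\n")) {
            // Remove "#" comments that begin the line or follow whitespace.
            String code = line.replaceAll("(^|\\s)#.*$", "$1");
            for (String token : code.trim().split("\\s+")) {
                if (!token.isEmpty()) terms.add(token);
            }
        }
        return terms;
    }
}
```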

Acknowledgments

The initial version of ROBOT was developed by James A. Overton, based on requirements and designs given by Chris Mungall, Heiko Dietze and David Osumi-Sutherland. This initial version was funded by P41 grant 5P41HG002273-09 to the Gene Ontology Consortium. Current support is from NIH grant 1 R24 HG010032-01, “Services to support the OBO foundry standards” to C. Mungall and B. Peters.

Copyright

The copyright for ROBOT code and documentation belongs to the respective authors. ROBOT code is distributed under a BSD3 license. Our pom.xml files list a number of software dependencies, each with its own license.

People

Contributors

agbeltran, allenbaron, balhoff, beckyjackson, bjonnh, cmungall, dependabot[bot], dlutz2, dougli1sqrd, dustine32, gouttegd, grosscol, hdietze, hkir-dev, ignazio1977, jamesaoverton, jclerman, jmkeil, joshmoore, jseager7, matentzn, mikel-egana-aranguren, pk-mitre, psiotwo, reality, shalsh23, simonjupp, yy20716


Issues

Implement SPARQL server

This builds on issue #24. Once we have an ontology loaded into an RDF graph, we could activate the Jena Fuseki web interface to allow ad hoc queries and updates. When the user stops Fuseki with Ctrl-C (or whatever) the ROBOT command should pick up from there, accept any modifications, and finish processing the chain.

I think this would be a useful feature. The only tricky part I foresee will be starting and stopping Fuseki without stopping ROBOT. But I'll label this "enhancement" and leave it for later.

Support OWLNamedIndividuals in ROBOT templates

This feature was requested by Sridevi. I have some working code, but I'd like to discuss it here first before committing.

I propose to add a new TYPE column for ROBOT templates. This column will be optional, defaulting to owl:Class, but its value can be any CURIE or IRI that you want to be assigned to the rdf:type predicate. These types will be special cases, handled using OWLAPI:

  • owl:Class: dataFactory.getOWLClass(iri)
  • owl:AnnotationProperty: dataFactory.getOWLAnnotationProperty(iri)
  • owl:ObjectProperty: dataFactory.getOWLObjectProperty(iri)
  • owl:DatatypeProperty: dataFactory.getOWLDataProperty(iri)
  • owl:Datatype: dataFactory.getOWLDatatype(iri)

(Let me know if I'm missing any...)

If the TYPE is not one of these special values, then we will use dataFactory.getOWLNamedIndividual(iri) and assert the rdf:type like any other annotation with a URI value.

This should allow for template tables that create a bunch of individuals, using ID, TYPE, and various annotation columns. It should also work for other OWL entities, but without special logical axioms such as domain and range (to be added later).

In my current code, if you use C and CI templates for a row, you will end up with the same IRI used for both an OWLNamedIndividual and an OWLClass. I'm not sure how to handle this interaction.
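The proposed TYPE dispatch could be sketched like this (a hypothetical helper that returns the name of the OWLDataFactory method to call, so the example is self-contained rather than working OWLAPI code):

```java
// Hypothetical sketch of the proposed TYPE column handling.
public class TemplateTypeSketch {
    // Map a TYPE cell to the kind of OWLAPI entity to create. The real
    // implementation would call the corresponding dataFactory method;
    // here we just return its name for illustration.
    public static String entityConstructor(String type) {
        if (type == null || type.isEmpty()) type = "owl:Class"; // default
        switch (type) {
            case "owl:Class": return "getOWLClass";
            case "owl:AnnotationProperty": return "getOWLAnnotationProperty";
            case "owl:ObjectProperty": return "getOWLObjectProperty";
            case "owl:DatatypeProperty": return "getOWLDataProperty";
            case "owl:Datatype": return "getOWLDatatype";
            default:
                // Any other CURIE/IRI: create a named individual and assert
                // its rdf:type like any other annotation with an IRI value.
                return "getOWLNamedIndividual";
        }
    }
}
```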

Decide how chaining and the input option should interact

Every command accepts an input ontology (or null) and returns an output ontology (or null). This is the simplest way I could think of to transfer state through a chain of commands. All commands also accept the --input FILE option. But when both an input from a previous command and local --input option are present, which should win?

In the current implementation, the input from the previous command in the chain wins. Now I think that the principle of least surprise says that an explicit user-entered --input option should win over the implicit chaining behaviour. But I'm not sure.
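The proposed least-surprise rule is easy to state in code (a hypothetical sketch; resolveInput is not a real ROBOT method, and strings stand in for ontologies):

```java
// Hypothetical sketch of the --input vs. chained-input precedence rule.
public class InputPrecedenceSketch {
    // An explicit --input FILE beats the ontology passed along the chain;
    // otherwise the chained state wins (and may be null for the first command).
    public static String resolveInput(String chainedOntology, String inputOption) {
        if (inputOption != null) return inputOption; // user was explicit
        return chainedOntology;
    }
}
```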

Prefix Management

The IOHelper class should create a PrefixManager to help parse and render ontologies. It should include a good default set of prefixes so that further configuration is rarely required. We'll need at least these:

  • rdf
  • rdfs
  • xsd
  • owl
  • obo

RDFa and JSON-LD define a larger set of default prefixes, and we could follow their lead (plus obo): http://www.w3.org/2011/rdfa-context/rdfa-1.1

I tend to use this style in my work: obo:OBI_0000070. So including the obo prefix covers all OBO ontologies.

In the Gene Ontology world there seems to be wide use of project-specific prefixes: GO:0008150, UBERON:0001062. Do we want to support this style? If so, how should we build the list of prefixes?
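A minimal sketch of CURIE expansion with a default prefix map (PrefixSketch is hypothetical; the namespace IRIs are the standard ones, and project-style prefixes like GO: could be supported by adding entries such as "GO" mapping to the obo namespace plus "GO_"):

```java
// Hypothetical sketch of default-prefix CURIE expansion.
import java.util.HashMap;
import java.util.Map;

public class PrefixSketch {
    static final Map<String, String> PREFIXES = new HashMap<>();
    static {
        // A minimal default set; a fuller set could follow the RDFa 1.1
        // initial context, plus obo.
        PREFIXES.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        PREFIXES.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        PREFIXES.put("xsd", "http://www.w3.org/2001/XMLSchema#");
        PREFIXES.put("owl", "http://www.w3.org/2002/07/owl#");
        PREFIXES.put("obo", "http://purl.obolibrary.org/obo/");
    }

    // Expand a CURIE such as obo:OBI_0000070 to a full IRI.
    public static String expand(String curie) {
        int colon = curie.indexOf(':');
        if (colon > 0) {
            String prefix = curie.substring(0, colon);
            if (PREFIXES.containsKey(prefix)) {
                return PREFIXES.get(prefix) + curie.substring(colon + 1);
            }
        }
        return curie; // unknown prefix: assume it is already an IRI
    }
}
```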

Implement convert operation

OBO format is the most important target format, but we should also support the other OWL flavours implemented by OWLAPI OWLOntologyFormat. We can have a --format FORMAT option, and if it is not specified we can try to use the file extension of the output file. These are the formats I think we should support, with their file extensions:

  • OBO .obo
  • RDFXML .owl
  • Turtle .ttl
  • OWLXML .owx
  • Manchester .omn
  • OWL Functional .ofn

Command-line examples:

robot convert --input release.owl --format OBO --output release.obo

robot convert --input release.owl --output release.obo
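The format-selection rule could be sketched like this (hypothetical; the real command would work with OWLAPI OWLOntologyFormat objects rather than strings):

```java
// Hypothetical sketch: choose the output format from --format or extension.
import java.util.Map;

public class FormatSketch {
    static final Map<String, String> EXTENSIONS = Map.of(
        "obo", "OBO", "owl", "RDFXML", "ttl", "Turtle",
        "owx", "OWLXML", "omn", "Manchester", "ofn", "OWL Functional");

    // An explicit --format wins; otherwise fall back to the output file's
    // extension, defaulting to RDFXML when the extension is unknown.
    public static String chooseFormat(String formatOption, String outputFile) {
        if (formatOption != null) return formatOption;
        int dot = outputFile.lastIndexOf('.');
        String ext = dot >= 0 ? outputFile.substring(dot + 1) : "";
        return EXTENSIONS.getOrDefault(ext, "RDFXML");
    }
}
```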

Add CONTRIBUTING.md to repo

Thinking ahead to more widespread adoption and multiple stakeholders, what do we think of a contributors' guide?

Once we're past the rapid development phase we should encourage something like: all new operations must be done on a feature branch, perhaps with the same name as the command/operation, e.g. convert-to-negative-normal-form-and-wibble-all-existentials. Ideally each would have a pull request.

This is a bigger project, and their protocol may be OTT for us, at least at first, but we can still crib from their docs:
https://github.com/ga4gh/schemas/blob/master/CONTRIBUTING.md

Add op: remove-redundant-axioms

For discussion, cc @dosumis

Motivation

For various reasons, the release version of an ontology can end up with redundant axioms; e.g.

  1. A SubClassOf B
  2. B SubClassOf C
  3. A SubClassOf C

Here 3 is redundant. Generally this should be pruned from the public release. There may be situations where this is not desired: for example, 3 may be adorned with a useful axiom annotation.

How it is used

As part of a release process, after reasoning and after #7.

Note on a GO-centric corner case: this should never be applied to the editors ontology, as it may cause "ontology autophagy" when used as part of a cached-link approach.

Extensions

The notion of redundancy can be extended to the existential graph too, but this is more subtle and may be best considered out of scope for this op, with a new op developed instead. The name perform-transitive-reduction or similar should be reserved for that task and @fbastian's code adopted.

Spec

For simplicity we will only consider the graph formed by SubClassOf axioms between named classes. Equivalence axioms are ignored for the purposes of determining redundancy. Thus, in the following, axiom 1 will be retained:

1. A SubClassOf B
2. A EquivalentTo B and R some Z

The user should perform the #7 relax step first to ensure a complete SubClassOf axiom graph is present. Motivation: if we use reasoning to determine redundancy, it complicates the structural and logical requirements (TBD).

If the ontology contains an axiom A SubClassOf Z, and there exists a chain of two or more axioms A SubClassOf B, B SubClassOf ... ... ... SubClassOf Z, then A SubClassOf Z is considered redundant.

An alternate, reasoner-based way of finding redundancy that does not require pre-reasoning: if the ontology contains an asserted axiom A SubClassOf Z and two inferences, A SubClassOf B and B SubClassOf Z (as determined by the OWLAPI reasoner API, i.e. proper SubClassOf), then A-Z is redundant.

All redundant axioms are found, and then removed.

TODO: spec equivalence between named classes corner case
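The structural (non-reasoner) definition above amounts to finding SubClassOf edges whose target is also reachable by a longer path. A sketch over a plain string graph (hypothetical; the real operation would work on OWLAPI axioms):

```java
// Hypothetical sketch of structural redundancy detection.
import java.util.*;

public class RedundancySketch {
    // An edge A -> Z is redundant if Z is also reachable from A by a
    // chain of two or more SubClassOf edges. Equivalence axioms are
    // ignored, per the spec.
    public static Set<String[]> redundantEdges(Map<String, Set<String>> subClassOf) {
        Set<String[]> redundant = new HashSet<>();
        for (Map.Entry<String, Set<String>> e : subClassOf.entrySet()) {
            String a = e.getKey();
            for (String z : e.getValue()) {
                // Depth-first search for an alternate path a -> ... -> z
                // whose first hop is not z itself (so length >= 2).
                Deque<String> stack = new ArrayDeque<>();
                Set<String> seen = new HashSet<>();
                for (String b : e.getValue()) {
                    if (!b.equals(z)) stack.push(b);
                }
                boolean found = false;
                while (!stack.isEmpty() && !found) {
                    String n = stack.pop();
                    if (!seen.add(n)) continue;
                    for (String next : subClassOf.getOrDefault(n, Set.of())) {
                        if (next.equals(z)) { found = true; break; }
                        stack.push(next);
                    }
                }
                if (found) redundant.add(new String[]{a, z});
            }
        }
        return redundant;
    }
}
```

With A SubClassOf B, B SubClassOf C, and A SubClassOf C, only the A-C edge is reported.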

Implement state object for commands

In the current design, state is passed between chained commands as an OWLOntology. This is the simplest thing that could possibly work, and it has been working so far. But we foresee the need to pass more state between commands as ROBOT is extended to handle more tasks. So we will:

  • create a CommandState class with just two methods for now:
    • OWLOntology getOntology()
    • void setOntology(OWLOntology ontology)
  • update the signature of execute to public CommandState Command.execute(CommandState, String) throws Exception
  • update all commands for the new signature

In the future we can extend CommandState as needed without changing this signature again.
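A sketch of the proposed class, with Object standing in for OWLOntology so the example is self-contained (the roundTrip helper is purely illustrative):

```java
// Hypothetical sketch of the proposed CommandState wrapper.
public class CommandState {
    private Object ontology; // would be OWLOntology in the real code

    public Object getOntology() { return ontology; }
    public void setOntology(Object ontology) { this.ontology = ontology; }

    // Tiny demonstration of the chaining contract: a command receives the
    // state, possibly replaces the ontology, and passes the state along.
    public static Object roundTrip(Object ontology) {
        CommandState state = new CommandState();
        state.setOntology(ontology);
        return state.getOntology();
    }
}
```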

Create PROV metadata graph describing operation flow for robot commands

Example of properties to go in header
https://docs.google.com/spreadsheets/d/1pTdRsCM9terVYS7biAw5mvdNwKBKONa4dro2ry_wCqI/edit#gid=0

It's hard for some ontologies to provide this in the header (e.g. the source is .obo). It would be useful to have an easy feature to bring this in from another file. This may be as trivial as a simple merge operation.

The release process could also provide friendly warnings if some fields (e.g. 'tracker') are missing from the release version.

Cross-platform build scripts

ROBOT supports command chaining. When combined with Make, this allows us to script complex workflows. We have also discussed making a ROBOT plugin for Gradle. But...

  • Make is not easy to run on Windows
  • we haven't made any progress on the Gradle plugin
  • a Gradle file would not look like a series of ROBOT command chains; users would have to learn Gradle and our plugin

In short, we don't have a good solution for cross-platform ROBOT build scripting.

Here's an idea. I can't decide whether it's a good idea, or stupid, or dangerous, but I'm going to propose it anyway.

Define a robotfile like this:

release.owl:
    merge --input edit.owl
    reason --remove-redundant-subclasses
    annotate -A annotations.ttl -o release.owl
    convert -o release.obo

all: [release.owl]
default: [all]

Run named tasks like this:

robot run release.owl task2 task3

Run the default tasks like this:

robot

The robotfile would be a YAML map from task name to either a command string or a list of task names. You could easily migrate from the command-line to a robotfile. Our integration tests can already run a command chain from a string and we already include YAML, so this wouldn't require much work.

The robotfile would NOT do the following

  • NO dependency specification or resolution, NO timestamp/hash checking, NONE of the other stuff that Make and Gradle are good at
  • NO shell commands
  • NO fancier YAML structures

My main concern is that feature creep would turn this into a re-implementation of Make, which would be a bad thing. The next obvious feature is some sort of templating, e.g. the current date as a variable ... and so the creep begins.
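The proposed semantics (a task name maps to either a command string or a list of task names) can be sketched as a small recursive resolver (hypothetical; robotfile support is only a proposal, and strings stand in for parsed YAML):

```java
// Hypothetical sketch of robotfile task resolution.
import java.util.*;

public class RobotfileSketch {
    // A task value is either a command string (String) or a list of task
    // names (List<String>). Resolve a task to its flat, ordered list of
    // command strings.
    public static List<String> resolve(Map<String, Object> tasks, String name) {
        List<String> commands = new ArrayList<>();
        Object value = tasks.get(name);
        if (value instanceof String) {
            commands.add((String) value);
        } else if (value instanceof List) {
            for (Object sub : (List<?>) value) {
                commands.addAll(resolve(tasks, (String) sub));
            }
        }
        return commands;
    }
}
```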

Add operation to warn of dangling logical axioms or references to owl:deprecated classes

When imports are used it's common for the upstream to change. This may result in referenced classes being deprecated/obsoleted, or left dangling (i.e. pointing to a class that has no axioms about it). Note that while classes should never disappear from the source ontology, they may potentially disappear from the import module, depending on how that module was generated.

Some discussion here:

http://wiki.geneontology.org/index.php/Ontology_meeting_2015-02-05#Obsoleted.2Fmerged_terms_in_external_ontologies_we_cross-ref_to

Currently in GO we do this via SPARQL checks: geneontology/go-ontology#14605

but this is a bit awkward because the TBox axioms are represented as triples. It is also harder to check whether something is dangling.

It may be better to have a utility in robot for detecting references to dangling and/or deprecated classes. That utility could be used in various places:

  • in extract, it's generally not intended for one of the seed classes to be deprecated; this should produce a warning, and optionally a failure if the client asks for it
  • optionally, when operations such as reason are performed, give a warning that results may be incomplete; optionally fail here
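A sketch of the dangling check (hypothetical; a real utility would inspect OWLAPI axioms and owl:deprecated annotations rather than a precomputed count):

```java
// Hypothetical sketch of dangling-class detection.
import java.util.*;

public class DanglingSketch {
    // A referenced class is "dangling" if the ontology states no axioms
    // about it. axiomCounts maps each term to the number of axioms about it.
    public static Set<String> findDangling(Set<String> referenced,
                                           Map<String, Integer> axiomCounts) {
        Set<String> dangling = new TreeSet<>();
        for (String term : referenced) {
            if (axiomCounts.getOrDefault(term, 0) == 0) dangling.add(term);
        }
        return dangling;
    }
}
```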

New Operation: Make a property subset

Motivation

Users commonly want to work with a graph involving a minimal set of core relations. Sometimes this involves simply removing an OP from the ontology, but this may weaken the graph too much.

For example, many users of GO do not want to see lumen_part_of in the OBO-basic graph. We would instead rewrite this as a part_of relation. Note it's only valid to rewrite an OP to its superproperty if the axiom is SubClassOf, not Equivalence. Hence this operation is related to #7. We would typically run the property slimming after the equivalence weakening.

In addition to removing all such OPs from the ontology, we'd want to remove any axioms that use these OPs.

Basic Operation

owltools1 code:

https://github.com/owlcollab/owltools/blob/master/OWLTools-Core/src/main/java/owltools/mooncat/Mooncat.java#L1144

Given a whitelist of properties P, create a blacklist NP that is all OPs in the signature not in P.

For each axiom A, get the OPs in the signature of A. If this set intersects NP, then remove the axiom.

Before removal, check whether it is possible to rewrite to a weaker axiom. The general case is awkward to specify, but the main case we care about is C SubClassOf R some D axioms. In this case, replace R with the most specific property in the whitelist that is an (inferred) superproperty of R. If there are multiple MSPs (which can happen if the RBox is a DAG), then write multiple SubClassOf axioms.
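The rewriting rule (replace R with its most specific whitelisted superproperties, possibly several when the RBox is a DAG) might be sketched as follows (hypothetical; properties are plain strings and the superproperty map stands in for inferred RBox entailments):

```java
// Hypothetical sketch of most-specific whitelisted superproperty lookup.
import java.util.*;

public class PropertySubsetSketch {
    // Whitelisted ancestors of r that have no other whitelisted ancestor
    // of r below them. Multiple results are possible in a DAG-shaped RBox.
    public static Set<String> mostSpecificSuper(String r, Set<String> whitelist,
                                                Map<String, Set<String>> superProps) {
        Set<String> ancestors = new HashSet<>();
        collect(r, superProps, ancestors);
        ancestors.retainAll(whitelist);
        Set<String> result = new TreeSet<>(ancestors);
        for (String p : ancestors) {
            for (String q : ancestors) {
                if (!p.equals(q)) {
                    // Drop p if another whitelisted ancestor q sits below it.
                    Set<String> qAnc = new HashSet<>();
                    collect(q, superProps, qAnc);
                    if (qAnc.contains(p)) result.remove(p);
                }
            }
        }
        return result;
    }

    // Transitively collect all superproperties of p.
    static void collect(String p, Map<String, Set<String>> superProps, Set<String> out) {
        for (String s : superProps.getOrDefault(p, Set.of())) {
            if (out.add(s)) collect(s, superProps, out);
        }
    }
}
```

For example, with lumen_part_of under part_of under overlaps, and both part_of and overlaps whitelisted, only part_of is returned.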

Extensions

It may be simpler to break this into two operations. (1) A SubClassOf weakening or rewriting step, that maps up to whitelist properties as specified above (2) A simple property filter that removes all occurrences of the blacklist properties.

The user could choose to execute (2) without (1), but they would do this knowing that the graph may be significantly weakened.


Allow use of .yaml for templates

Nice to have, but obviously easy to have an out-of-band yaml->tsv converter.

The yaml could trivially follow the tsv format: e.g. just an array of tag:value pairs. Or it could be more nested, with a header block and a separate body block.

And although it sounds hokey, the ability to embed the yaml inside a md file that is self-documenting like the obo metadata files would be nice.

cc @dosumis

Implement reasoner options

The command architecture we're using is based on the assumption that commands are (mostly) independent, so we might want to run them in any order. For instance, we might want to merge some ontologies and then reason, or reason and then merge in some other ontologies.

I think that issues #7, #8, #16 break this assumption: if we're going to run some combination of these operations, then we always want to run them in a previously specified order.

In this latter case, I think it makes more sense to use option flags on the command, instead of separate commands. For instance #16 could be triggered with this command-line option:

robot reason --input edit.owl --remove-redundant-subclass-axioms

We can have short-forms for these options, good default options, and provide convenient options to set groups of common flags.

I'll try implementing #16 using this technique.

Proposed strategy for autophagic ontology dependencies

TBD:

cc @hdietze @dosumis

Cyclic Dependencies

Within the OBO Library, the logical dependency graph may not always be acyclic. For example, GO makes use of CL to define "neuron development". Conversely, CL will use GO to define "GABAergic neuron" (in turn GO will define GABAergic secretion using CHEBI).

This is illustrated here:

[Diagram: cyclic dependencies]

The arrows depict a logical dependency. These are typically implemented using owl:imports.

Import Modules

One common practice is to make sub-modules using the OWLAPI Syntactic Locality Module Extractor (SLME) or OntoFox. For example, CL imports cl/imports/uberon_import.owl (note we assume the standard OBO library URL prefix here). This is generated from Uberon. When generating modules, the following tensions are balanced:

  • The module should be in RDF/XML to maximize reuse in different tool chains
  • The serialized form should be compact, and ideally not generate spurious diffs. Typically axiom annotations are removed.
  • The module should be logically complete as reasoning use cases for the upstream dictates
  • The module should be annotation-assertion complete as curator requirements for source dictates
  • The module should be complete enough as not to confuse general users of the source ontology

These tensions are handled in different ways by different ontologies. Often the modules are very minimal: labels and logical axioms, with disjointness and 'infectious' axioms removed in advance.

Note that in this case there are no cycles in the import chain (all imports live within the purl space of the source). However, the combined derivation plus import graph can still have cycles:

[Diagram: cycles in the combined derivation plus import graph]

This leads to autophagy. If uberon/imports/cl_import is derived from cl.owl, and cl.owl imports cl/imports/uberon_import.owl, which is derived from uberon, then we have uberon eating a part of itself. Furthermore, it is eating a stale copy of part of itself. This can cause horrendous problems, especially where we have rigorous constraints. For example, Uberon may have a disjointness axiom that gets violated by an earlier, less perfect version of itself.

Solution: Redirect to Null

The solution here is to redirect the import modules from external ontologies to a null ontology.

This is illustrated in the following scenario, where we have a "mega-ontology" that wants to bring in various other ontologies, and does not care to bring in the import modules, since these may be autophagic dupes of the complete ontologies that are brought in:

[Diagram: mega-ontology with import modules redirected to null ontologies]

The cost here is maintenance on the part of the mega-ontology developers. For every import module in the chain, they need:

  • one line in a catalog-xml
  • one null ontology to redirect to (note separate nulls are required as each ontology must be named according to the ontology being faked)

Issues

This is dependent on having a catalog, which typically depends on having an ontology 'checked out' of a repo, or on some kind of distro being made, e.g. owl/zip.

If the catalog is not present, the behavior defaults to autophagy.

An alternate solution would be to copy-and-rewire. Documentation TBD.

Operation to auto-add curation status

Raised at the NCEAS meeting: it seems it would be very useful to have better population of the IAO curation status field. This could be partly automated (e.g. no definition means not approved). If the TBN tool could generate these (via a per-ontology configuration) that would be useful. It may be as simple as a SPARQL UPDATE.

Too many prefixes in converted output

Fixing #26 raised some new problems. We set a lot of prefixes by default in obo_context.jsonld. With commit 493a8a1 we use the current prefixes when converting to OWL formats that support prefixes. But the resulting files include all the prefixes, even the ones that aren't used. While it doesn't break anything, it looks bad.

Possibilities:

  1. Specify fewer prefixes, at least for output
  2. Make OWLAPI serialization smarter

Convert tables to OWL

This is a feature I want for OBI, and that we already use at IEDB. We start with a spreadsheet where the first row has headers and the second row has templates, such as "'has specified input' some X". Then for each row we generate a class by substituting the text of the cell value as "X" and then parsing the result to an OWL class expression that we use in a subclass axiom (or something). Sometimes the value for "X" is just the rdfs:label of a named class, and sometimes it's a more complex class expression. Things can get more complicated from there.

The specific use case is that OBI assays have a lot of common structure that's getting lost when people define them one-at-a-time in Protege. We've done the work to standardize them in a few spreadsheets with required and optional columns, and now we want to have a standard tool for converting the spreadsheets to OWL.

This feature will require the ManchesterOWLSyntaxClassExpressionParser and the OWLEntityChecker. I have an implementation in Clojure that I plan to adapt.
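The row-expansion step described above (substitute the cell value for "X" in the column's template) could be sketched as follows (hypothetical; the real feature would then parse each resulting string with ManchesterOWLSyntaxClassExpressionParser):

```java
// Hypothetical sketch of template-row expansion for table-to-OWL conversion.
import java.util.ArrayList;
import java.util.List;

public class TableTemplateSketch {
    // Substitute each row's cell value for "X" in the column template,
    // producing class-expression strings to be parsed later.
    public static List<String> expand(String template, List<String> cellValues) {
        List<String> expressions = new ArrayList<>();
        for (String value : cellValues) {
            expressions.add(template.replace("X", value));
        }
        return expressions;
    }
}
```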

Signaling of fatal/blocking errors

While experimenting with the reasoner command/operation, I noticed that the method just silently returns (and maybe even writes the result to a file) in the case of an inconsistent ontology. There is no change in exit code (at the command-line level) or Java return value to indicate that there was a fatal problem.

In general we need an agreed-upon mechanism to signal an error for a command (should we really use exceptions for this?). This should also result in a non-normal exit code for the command-line.
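One possible convention, sketched below (hypothetical, not the adopted design): commands throw on fatal problems, and the top level converts the exception into a non-zero exit code instead of silently writing output.

```java
// Hypothetical sketch of exit-code signaling for fatal command errors.
public class ExitCodeSketch {
    // Run a command body; map success to 0 and any exception to 1.
    // The caller would pass the result to System.exit.
    public static int run(Runnable command) {
        try {
            command.run();
            return 0;
        } catch (Exception e) {
            System.err.println("ERROR: " + e.getMessage());
            return 1;
        }
    }
}
```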

Rename this repo (and java package space)

List your suggestions here.

I'll veto owltools2.

  • ootk - Ontology Operations Tool Kit (pronounced ooh-tik)
  • goof - General Ontology Operations Framework
  • gorf - General Ontology Release Framework (also a classic video game)
  • spork - Sensible Processing of Ontologies and Reporting Kit (or sprok or sprokit)

RobotMaker: an idea for providing a standard setup for running robot in travis and other CIs

Does the following sound like a good idea? This would be a file that would be checked into some standard repository, possibly the robot one itself.

##   This is a generic Makefile that can be incorporated into any OBO project repository and included in
##   a project-specific Makefile by adding the following to the top of the makefile
##
## include Makefile-OBO
##
##   Note that Make does not support includes over URLs, so it is necessary to first copy this into your project directory.
##   (and then `git add` etc). After doing this it can be synchronized like so:
##
## make regenerate
##
##   Currently the main purpose is to define some standard build targets, variables, and to provide a standard way
##   of obtaining the robot executable, e.g. for running in travis
##
## For more details see: https://github.com/ontodev/robot

## OBO Library prefix
OBO=http://purl.obolibrary.org/obo

## This file
THIS=Makefile-OBO

## Robot executable
ROBOT=./robot

## Current location where executable and jars can be found
ROBOJENKINS=http://build.berkeleybop.org/job/robot/lastSuccessfulBuild/artifact/bin

## 'install' robot
robot.jar:
    wget $(ROBOJENKINS)/$@ -O $@
robot: robot.jar
    wget $(ROBOJENKINS)/$@ -O $@ && chmod +x $@

## RobotMakers must sometimes remake themselves
regenerate:
    wget https://raw.githubusercontent.com/ontodev/robot/master/examples/$(THIS) -O $(THIS)

Upgrade to OWLAPI 4

Better to do this sooner than later. Anyone have any experience with updating from 3 to 4?

We need to make sure that ELK and HermiT are happy with OWLAPI 4.

Packaging ROBOT as a CLI utility

How should we package ROBOT as a CLI utility?

There's a spectrum of comfort with the command line. Some people won't touch anything without a GUI, while others live on the command line, but there's a wide range in between that's often ignored. I want to make ROBOT so easy to install that it covers most of that middle ground.

Since we're using Java, we can package all the dependencies into an uberjar with a Main class. That's what I've been doing for development.

The tricky part that varies across platforms is executing the jar. I think that the usual, minimal technique is to provide three files for download: the jar, a Unix shell script, and a Windows batch script. But there are friendlier ways to deliver these to users:

  1. Mac OS X: I use Homebrew a lot. It should be easy to make a Homebrew repository, then run:

    brew tap ontodev
    brew install robot
    
  2. Linux: We could package a DEB and RPM, maybe with FPM.

  3. Windows: I haven't used Windows in a decade. Can someone else speak to this?

Another idea: Clojure's build tool Leiningen is installed as a single shell or batch script. Just put it on your PATH and it will download/update/run the lein.jar file. This is slick, but the cross-platform shell script is scary.

I'd also like to use GitHub's release system, if possible.

Save ontologies to the format specified by their extension

Most commands support an --output option. When the output is an ontology, it should be saved in the OWL format specified by the file extension.

The exception is the convert command, which is used to override this behaviour. When the --format option is used, the ontology will be saved in the specified format no matter what the --output file extension is.

Only one --output is allowed for the convert command, but most other commands should allow multiple --outputs.

The supported formats and file extensions are:

  • RDFXML .owl
  • OBO .obo
  • Turtle .ttl
  • OWLXML .owx
  • Manchester .omn
  • OWL Functional .ofn

Operations and commands should mutate input ontologies unless that doesn't make sense

Almost all of the commands and operations we're defining take an input ontology and change it in some way. I've been resisting this at the implementation level because I prefer immutable data, but immutability goes against the grain of Java and OWLAPI. I'm going to stop resisting and adopt these conventions:

  • mutate input ontologies unless it doesn't make sense for that operation
  • when the input to a method is mutated, the return type must be void
  • Javadocs must be clear about whether mutation is happening or not

These existing operations should mutate the input ontologies:

  • merge: start with the first input ontology and add axioms to it
  • filter: remove axioms from the input ontology
  • reason: add inferred axioms to the input ontology

These existing operations do not mutate the input ontologies:

  • diff
  • extract

Better to make these changes now, while we don't have many operations.
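The convention can be illustrated with plain sets standing in for ontologies (hypothetical sketch; the real operations work on OWLOntology):

```java
// Hypothetical sketch of the mutation conventions.
import java.util.*;

public class MutationSketch {
    // Mutating operations return void: merge adds axioms to the first input.
    public static void merge(Set<String> target, Set<String> other) {
        target.addAll(other);
    }

    // Non-mutating operations return a new result and leave inputs untouched.
    public static Set<String> diff(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.removeAll(right);
        return result;
    }
}
```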

Support for catalog xml

Does robot support import redirects via Protege-style catalog xml

e.g.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<catalog prefer="public" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <uri id="User Entered Import Resolution" name="http://purl.obolibrary.org/obo/ro/core.owl" uri="imports/ro_import.owl"/>
...

If not, can this support be added? Many use cases for OBOish OWL ontologies, including GO, require this.

Many thanks,
David

New operation: relax axioms / weaken-equivalence

Not sure what the formal name for this is, @dosumis calls this axiom relaxation

Motivation

It is frequently convenient to view an ontology without equivalence axioms. This is often for structural reasons. Certain editions of ontologies may come with a guarantee that the existential graph formed by all SubClassOf axioms (between named classes and existential restrictions) is both complete (w.r.t. graph operations) and non-redundant. Including EquivalentClasses axioms can introduce redundancy at the graph view level. For example, the genus is frequently more general than the inferred superclasses.

To ensure that the existential graph is graph-complete, it is necessary to write new SubClassOf axioms that are entailed by (but weaker than) the Equivalence axioms.

Basic Operation

For any equivalence axiom between a named class C and either a single existential X_1 or the class expression IntersectionOf(X_1 ... X_n), generate the axioms:

  • C SubClassOf X_1
  • ...
  • C SubClassOf X_n

This could possibly be conceived of as the chaining of two operations: (1) weakening of an equivalence to a SubClassOf (2) rewriting a C SubClassOf IntersectionOf(...) axiom to multiple subclass axioms. However, there should be a way to present this to the user as a single operation.
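The basic operation can be sketched with class expressions as plain strings (hypothetical; real code would manipulate OWLAPI axioms rather than text):

```java
// Hypothetical sketch of equivalence-axiom relaxation.
import java.util.ArrayList;
import java.util.List;

public class RelaxSketch {
    // Given EquivalentClasses(C, IntersectionOf(X_1 ... X_n)), emit the
    // entailed-but-weaker axioms C SubClassOf X_1 ... C SubClassOf X_n.
    public static List<String> relax(String c, List<String> conjuncts) {
        List<String> axioms = new ArrayList<>();
        for (String x : conjuncts) {
            axioms.add(c + " SubClassOf " + x);
        }
        return axioms;
    }
}
```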

Extensions

The basic operation could be extended to axioms involving arbitrary numbers of named classes and at least one existential. It would also be valid to rewrite equivalences between two named classes as reciprocal SubClassOf axioms, but there is no requirement to do so at present, so I recommend sticking with the basic operation.

Generate and validate a basic version of an ontology

Many communities find it useful to have a "basic" version of an ontology. For many bioinformatics users, the following characteristics are often assumed (sometimes for good reason, sometimes out of historical arbitrariness):

http://oboformat.googlecode.com/svn/trunk/doc/obo-syntax.html#6.1

Some, like 6.1.11, are too OBO-specific, but a modified version such as "no axiom annotations" might be more generally useful.

Some of these rules can only be used to validate rather than generate. For example, for the existential graph to be a DAG, the ontology release manager should choose a strategy for achieving this (in GO, it is done by restricting to a set of object properties that are irreflexive or otherwise guaranteed not to cycle at the class existential-graph level).
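The validation side of the DAG rule can be sketched on a toy edge list (hypothetical code, not ROBOT's; a real check would walk OWLAPI axioms):

```python
# Toy sketch: check that an existential graph, given as
# (child, property, parent) edges, is acyclic, using a
# depth-first search with white/gray/black coloring.
from collections import defaultdict

def is_dag(edges):
    children = defaultdict(set)
    for child, _prop, parent in edges:
        children[parent].add(child)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)  # defaults to WHITE

    def visit(node):
        color[node] = GRAY
        for nxt in children[node]:
            if color[nxt] == GRAY:   # back edge: cycle found
                return False
            if color[nxt] == WHITE and not visit(nxt):
                return False
        color[node] = BLACK
        return True

    return all(visit(n) for n in list(children) if color[n] == WHITE)

print(is_dag([("axon", "part_of", "neuron"), ("neuron", "is_a", "cell")]))
# True
```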

Of course, other communities may have different notions of "basic". For example, I have seen cases where all existential axioms were removed, but this would render many bio-ontologies unfit for their primary purpose.

TBD: should robot-core have a single "generate-basic" command that hardwires a certain community's assumptions, or should it instead facilitate obo-basic by means of a combination of atomic simple commands and/or the ability to plug in SPARQL queries or similar?

cc @dosumis @hdietze

Add a dump-terms command

Primary use case: replace the SLME method in owltools, which currently takes a seed ontology and external ontologies as inputs.

Better to split this into two steps:

  • robot dump -i seed.owl > terms.txt
  • robot extract -i ext1.owl --term-file terms.txt -o imports/import_ext1.owl
  • ...

Here I think 'terms' should be treated liberally as any named OWL object (object property, class, or individual).
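The liberal reading of 'terms' can be sketched on a toy signature (the tuple representation is invented for illustration, not the OWLAPI):

```python
# Toy sketch: dump the IRIs of all named OWL objects (classes,
# object properties, individuals) from a signature, skipping
# other entity kinds such as annotation properties.
signature = [
    ("Class", "http://purl.obolibrary.org/obo/GO_0005634"),
    ("ObjectProperty", "http://purl.obolibrary.org/obo/BFO_0000050"),
    ("NamedIndividual", "http://example.org/i1"),
    ("AnnotationProperty", "http://www.w3.org/2000/01/rdf-schema#label"),
]
NAMED = {"Class", "ObjectProperty", "NamedIndividual"}
terms = [iri for kind, iri in signature if kind in NAMED]
for t in terms:
    print(t)
```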

Fixing import module woes

Note: this ticket is not yet well-specified; it is intended to serve as an area to get the ball rolling and collect requirements. Some familiarity with existing makefile-based owltools pipelines would help.

Minimal annotations

When creating import modules, I generally keep the annotations minimal: labels only (plus the necessary logical axioms for reasoning, of course). Sometimes this is not ideal; editors of the main ontology like to be able to search by synonyms and at least see definitions. E.g. https://code.google.com/p/envo/issues/detail?id=131

Currently the strategy is to tweak the makefile according to requests from the main ontology editors; e.g. sometimes we change the command to include synonyms and definitions. The main reason not to include these is simply to avoid VCS churn (e.g. for GO the import modules are regenerated daily, and people frequently want the set of referenced external terms close at hand).

Extending the import module

If editors need an external term not currently in an import module, we often put them through a hellish procedure. Some README-editors files have a baroque procedure involving switching to a URI view, pasting in the desired URI, then regenerating the imports. With GO, editors add to an imports-requests OBO file and Jenkins includes the term in the import module overnight.

Short of having this fully integrated into Protege, it would be nice if there were an easy-to-run command-line way of extending the import module.

Another thing that would be useful is a way of seamlessly swapping out the import module for the full ontology (with some catalog trickery, plus handling of syncing a local copy of the full ontology). This would allow editors unhindered search of the full external ontology. The full ontology can sometimes overwhelm Protege (e.g. CHEBI or Uberon as externals), so an intermediate strategy may be useful (e.g. a module that includes all classes and logical axioms but excludes axiom annotations).

New option proposal: split

I would like to have an operation for auto-modularizing an ontology. Note this is distinct from module extraction, but not unrelated. The goal is to split a monolithic ontology into two or more modules based on some kind of organization criteria.

One of the drivers for this is slightly artificial: file size limits on github for large OWL releases.

The input would be an ontology, which may have an import chain or may be standalone. The output would be a set of ontologies o1...on, all forming a connected import graph with a single root. Ideally this would be, or be easily turned into, an OWL/ZIP file; see owlcs/owlapi#375

We can imagine a few strategies. The two obvious ones to me are

  1. axiom-type based
  2. content-based

For the axiom-type-based strategy, we could simply have a file for every axiom type (e.g. SubClassOf). Some could be further partitioned: e.g. axiom types such as AnnotationAssertion could be further split based on the AnnotationProperty.
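The axiom-type-based partition is mechanical enough to sketch on toy data (an illustrative sketch with an invented tuple representation, not ROBOT code):

```python
# Toy sketch of axiom-type-based splitting: one module per axiom
# type, with AnnotationAssertion further partitioned by its
# annotation property.
from collections import defaultdict

def split_by_axiom_type(axioms):
    modules = defaultdict(list)
    for ax in axioms:
        kind = ax[0]
        if kind == "AnnotationAssertion":
            # further split by the annotation property (ax[1])
            modules[f"{kind}-{ax[1]}"].append(ax)
        else:
            modules[kind].append(ax)
    return dict(modules)

axioms = [
    ("SubClassOf", "A", "B"),
    ("AnnotationAssertion", "rdfs:label", "A", "alpha"),
    ("AnnotationAssertion", "IAO:0000115", "A", "first letter"),
]
print(sorted(split_by_axiom_type(axioms)))
# ['AnnotationAssertion-IAO:0000115',
#  'AnnotationAssertion-rdfs:label', 'SubClassOf']
```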

Content-based splitting would ideally magically produce biologically meaningful modules; e.g. an anatomy ontology would split by organ system. However, it's far from being this simple: many partitions overlap, and there may be axiom cross-talk between partitions, defeating the point of modularization. It seems this is best done manually or at least semi-automatically, and is a good long-term goal for ontologies but not doable in this ticket. @sesuncedu has previously alluded to some kind of automatic technique for splitting an ontology.

I assume there has been a lot of theoretical work on this, but it has not made its way back into robust tooling.

We can make a distinction here between source and release files. There may be different use cases for source and release modules, and the modules used by source developers may differ from the modules required by release consumers.

For the immediate use case I have in mind, the source of the ontology is .obo (compact, monolithic, easily diffed) and the derived .owl is much larger, hence the need to split somehow. I acknowledge this applies to a decreasing number of projects. Longer term, for ontologies maintained as .owl, we will need to switch to modularized source, but one difficulty here is ensuring axioms stay in the right module (it's not quite as easy as editing in Eclipse). Strategies include modifying Protege, post-save hooks to reorganize axioms into modules, or simply discipline. This is out of scope for this ticket but worth mentioning as context. In particular, option (2) makes more sense than (1) for source modules. However, there may be projects where (1) makes sense for source: e.g. non-logic-oriented editors edit textual descriptive properties in the AnnotationAssertion module, core developers edit the SubClassOf and EquivalentClasses modules, and advanced developers edit GCIs, property chains, etc.

Currently this ticket is more for discussion than implementation.

cc @dosumis @mcourtot

Implement annotate operation

We need an operation to change ontology annotations. We've talked about getting annotations from a YAML file ... I'd like to see an example. Here are some ideas for the command-line interface:

  • --ontology-iri IRI (-O) set the ontology IRI
  • --version-iri IRI (-V) set the ontology version IRI
  • --remove-all-annotations remove all annotations on this ontology
  • --annotation IRI VALUE (-a) add an annotation for property with value (and maybe a type)
  • --annotations-file FILE (-A) add annotations from a file

Example:

OBO="http://purl.obolibrary.org/obo"
robot annotate --input reasoned.owl \
  --ontology-iri "$OBO/obi.owl" \
  --version-iri "$OBO/obi/2015-05-04/obi.owl" \
  --remove-all-annotations \
  --annotation rdfs:label "Ontology for Biomedical Investigations" \
  --annotation rdfs:comment "123"^^xsd:integer \
  --output obi.owl

Create an example repository that demonstrates robot

The repo would have

  • a test ontology with some imports, in src/ontology
  • a Makefile or similar
  • a "make test" target or similar, that triggers robot
  • a travis.yml file, for robotravis
  • a "make release" target or similar, uses robot

Could be adapted into an OBO tutorial; e.g. people could fork the repo for testing.

Perhaps more than one repo; e.g. one repo could be for the basic kind of setup above, another for the application-ontology-builder style.

See also: http://douroucouli.wordpress.com/2014/01/08/creating-an-ontology-project/

Packaging ROBOT as a library

How should we package ROBOT as a library?

The most convenient option for users would be to publish on Maven Central. That might be a bit of a hassle to set up; I've never done it before.

The other obvious choice is to publish to the BBOP repository. This requires users to configure both the repository and the package.

We'll also provide versioned jars for download.

Implement MIREOT

(Issue #3 is related.)

We have a basic extract command using the OWLAPI SyntacticLocalityModuleExtractor. We also need to support MIREOT. The Courtot et al. 2011 paper was about single classes, but people have extended the idea in various ways.

  • OntoFox allows you to specify "low level" and "top level" terms, then several options:
    • intermediates
      • includeNoIntermediates: just the requested terms
      • includeAllIntermediates: requested low level terms and all their ancestors up to the top level terms
      • includeComputedIntermediates: requested low level and top level terms, plus some common ancestor terms in between
    • other settings
      • includeAllChildren: adds the children of a low level term
      • subClassOf: overrides parent
  • James Malone and Simon Jupp have another set of uses:
    • Mireot Basic
    • Mireot Full
    • Partial Closure
    • Full Closure

Some of these are not very well specified, but they cover a range of use cases in the wild.
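To make one of these options concrete, the includeAllIntermediates behavior can be sketched over a toy parent map (a hypothetical sketch assuming single parents, not OntoFox or ROBOT code):

```python
# Toy sketch of includeAllIntermediates: keep the requested
# low-level terms and every ancestor up to (and including) the
# requested top-level terms, walking a single-parent map.
def all_intermediates(parents, low, top):
    keep = set()
    for term in low:
        current = term
        while current is not None:
            keep.add(current)
            if current in top:
                break
            current = parents.get(current)
    return keep

parents = {"D": "C", "C": "B", "B": "A", "A": None}
print(sorted(all_intermediates(parents, {"D"}, {"A"})))
# ['A', 'B', 'C', 'D']
```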

annotation axiom filtering

The current filter command is named generically, but it's fairly specific: it filters axioms that reference object properties not in a whitelist.

Should we have a new command for annotation properties, or combine the two?

Also, another useful owltools option was removing axiom annotations. Should that be added as a sub-option?

Implement query operation

The query operation should allow arbitrary SPARQL select and update queries to be run against the ontology, saving the results to files. I'm most familiar with Apache Jena, so I plan to use that. Things will be simpler if we load the ontology into the default RDF graph, and don't use named graphs.

We may want to run multiple queries, but we only want to load the ontology into an RDF graph once. The current chaining implementation passes state as an OWLOntology, and I'd like to stick with that simple solution as long as possible. So I propose this command-line interface:

  • --select INPUT OUTPUT (-s) take an input SPARQL file, run the select query, and save to a file; the output format will be determined by the file extension
  • --update INPUT (-u) take an input SPARQL file, run the update query

You can specify these options multiple times, and Apache Commons CLI should keep them in the right order. When all queries have been run, we'll load the default RDF graph into an ontology for further processing. Suggestions for the best way to do this are appreciated!

Jena supports these output file formats, and we'll use these file extensions:

  • text .txt
  • CSV .csv
  • TSV .tsv
  • XML .xml
  • JSON .js or .json

Note that the text format is close to some of the table formats accepted by various Markdown parsers: http://pandoc.org/README.html#tables
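The extension-to-format dispatch described above is simple to sketch (the mapping and helper below are assumptions for illustration, not ROBOT's actual code):

```python
# Sketch: choose the result format from the output file extension,
# per the table above; unknown extensions are an error.
import os

FORMATS = {
    ".txt": "text",
    ".csv": "csv",
    ".tsv": "tsv",
    ".xml": "xml",
    ".js": "json",
    ".json": "json",
}

def format_for(path):
    ext = os.path.splitext(path)[1].lower()
    try:
        return FORMATS[ext]
    except KeyError:
        raise ValueError(f"Unknown output format for {path}")

print(format_for("result1.csv"))
# csv
```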

Example:

robot query --input example.owl \
  --select query1.rq result1.csv \
  --select query2.rq result2.csv \
  --update update1.rq \
  --update update2.rq \
  --output updated.owl
