
edamxpathvalidator's Introduction

DOI representing all stable versions, resolving to the latest: 10.5281/zenodo.822690

DOI of the latest stable EDAM version 1.25: 10.5281/zenodo.3899895

Current status of the 'main' development file: Build status

Latest documentation: Documentation Status

Twitter: @edamontology (follow).

What is EDAM?

EDAM is a comprehensive ontology of well-established, familiar concepts that are prevalent within computational biology, bioinformatics, and bioimage informatics. EDAM includes types of data and data identifiers, data formats, operations, and topics related to data analysis in life sciences. EDAM provides a set of concepts with preferred terms and synonyms, related terms, definitions, and other information - organised into a simple and intuitive hierarchy for convenient use (see figure).

EDAM is particularly suitable for semantic annotations and categorisation of diverse resources related to bioscientific data analysis: e.g. tools, workflows, or training materials. EDAM is also useful in data management, for recording provenance metadata of processed bioscientific data.

Viewing and downloading

EDAM can be browsed online at the NCBO BioPortal, at OLS, and in the EDAM Browser.

The newest unstable version can be browsed and commented on at the NCBO BioPortal and WebProtégé (free registration required).

The latest stable version is always downloadable from http://edamontology.org/EDAM.owl | tsv | csv. For older versions, see http://edamontology.org/page#Download or /releases.

[Figure: EDAM relations]

Documentation

Comprehensive documentation and guidelines are available via readthedocs (maintained here).

A quick overview is at the http://edamontology.org home page.

Citing EDAM

If you refer to EDAM or a part of it in a scholarly publication, please cite:

Melissa Black, Lucie Lamothe, Hager Eldakroury, Mads Kierkegaard, Ankita Priya, Anne Machinda, Uttam Singh Khanduja, Drashti Patoliya, Rashika Rathi, Tawah Peggy Che Nico, Gloria Umutesi, Claudia Blankenburg, Anita Op, Precious Chieke, Omodolapo Babatunde, Steve Laurie, Steffen Neumann, Veit Schwämmle, Ivan Kuzmin, Chris Hunter, Jonathan Karr, Jon Ison, Alban Gaignard, Bryan Brancotte, Hervé Ménager, Matúš Kalaš (2022). EDAM: the bioscientific data analysis ontology (update 2021) [version 1; not peer reviewed]. F1000Research, 11(ISCB Comm J): 1. Poster. 10.7490/f1000research.1118900.1 Open access

EDAM releases are citable with DOIs too, for cases when that is needed. 10.5281/zenodo.822690 represents all releases and resolves to the DOI of the last stable release.

Research notice

Please note that this repository is participating in a study into sustainability of open source projects. Data will be gathered about this repository for approximately the next 12 months, starting from June 2021.

Data collected will include number of contributors, number of PRs, time taken to close/merge these PRs, and issues closed.

For more information, please visit our informational page or download our participant information sheet.

License

FOSSA Status

edamxpathvalidator's People

Contributors

hmenager, matuskalas

edamxpathvalidator's Issues

Don't understand this error message

multiple concepts with the same namespace  have the same synonym 'Small molecule sketch' - 'Chemical structure image' (http://edamontology.org/data_1712) -> 'Chemical structure image' (http://edamontology.org/data_1712)

There's only one concept listed, so this doesn't compute :).

"Small molecule sketch" was a synonym in the (now-deprecated) Chemical structure sketch, which is replaced by Chemical structure image, if that helps.

I could be missing something ...

Check for maximum depth

In the soon-to-be-released Editors Guide:

Each subontology must not descend beyond a certain depth (see below). Specifically, this means that each concept **MUST** have at least one path to root (*i.e.* `Topic <>`_, `Operation <>`_, `Data <>`_ or `Format <>`_) no deeper than indicated. It's OK for a concept to have other paths to root that are deeper than this.
   2.1 **Topics** 3 levels deep max. *i.e.* *Topic* (root) -> Topic -> Subtopic -> Subsubtopic (leaves)
   2.2 **Operations** 6 levels deep max.
   2.3 **Data** 4 levels deep max.
   2.4 **Format** 3 levels deep max.

Not sure (I doubt) xpath can do this, but we need a check (raising an ERROR) in such cases.
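As a rough illustration of the rule quoted above, here is a minimal Python sketch, not the validator's actual code. It assumes the ontology has already been loaded into a plain dict mapping each concept ID to its direct super-concepts; the names `MAX_DEPTH`, `min_depths` and `too_deep` are hypothetical. Because only the shortest path to root matters, a breadth-first search from the root is enough.

```python
from collections import deque

# Depth limits taken from the Editors Guide excerpt above (root is depth 0).
MAX_DEPTH = {"topic": 3, "operation": 6, "data": 4, "format": 3}


def min_depths(parents, root):
    """Return the shortest distance to `root` for every concept that reaches it.

    `parents` (an assumed representation) maps a concept ID to the list of
    its direct super-concepts.
    """
    # Invert the child -> parents map so we can walk down from the root.
    children = {}
    for child, supers in parents.items():
        for parent in supers:
            children.setdefault(parent, []).append(child)

    depths = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search visits each concept at its minimum depth
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depths:
                depths[child] = depths[node] + 1
                queue.append(child)
    return depths


def too_deep(parents, root, limit):
    """List concepts whose shortest path to `root` is longer than `limit`."""
    return sorted(c for c, d in min_depths(parents, root).items() if d > limit)
```

A concept with one deep path and one short path is accepted, which matches the "at least one path" wording; each entry returned by `too_deep` would be reported as an ERROR.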

EDAM validation using ontodev

From talking to Simon Jupp in Austria last week, we should explore https://github.com/ontodev/robot for EDAM housekeeping including validation, diff, slim generation etc.

Validation is provided by the report function - we'd need to provide a list of SPARQL queries for the specific checks we'd want (@simonjupp - could you pls. send a link to some sample queries - which you showed me last week)

There's also a file structure-independent ontology diff and lots of other stuff

It could augment or perhaps eventually replace edamxpathvalidator

cc @hmenager

Check for singletons

i.e. concepts with a single kid only, suggest to raise WARN

And for chains of singletons, suggest to raise ERROR - we don't want these and should fix, prob. by adding appropriate sibling concepts.
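A possible shape for this check, sketched in Python under the assumption that the hierarchy is available as a dict from concept ID to its direct super-concepts (the function name `singleton_report` is hypothetical). A parent with exactly one kid is a WARN, and it is promoted to ERROR when that kid itself has exactly one kid, i.e. a chain.

```python
def singleton_report(parents):
    """Return (warnings, errors) for singleton parents.

    `parents` (an assumed representation) maps a concept ID to the list of
    its direct super-concepts.
    """
    # Collect the set of direct kids for every parent.
    children = {}
    for child, supers in parents.items():
        for parent in supers:
            children.setdefault(parent, set()).add(child)

    # Parents that have exactly one kid, mapped to that kid.
    single = {p: next(iter(kids)) for p, kids in children.items() if len(kids) == 1}

    warnings, errors = [], []
    for parent, kid in sorted(single.items()):
        if kid in single:
            errors.append(parent)    # chain of singletons
        else:
            warnings.append(parent)  # isolated singleton
    return warnings, errors
```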

Check for multiple parents

Raise WARN, but we need to define exceptions according to the patterns in EDAM where we know this is OK, e.g.

  • Formats link to both Format (by type of data)(http://edamontology.org/format_2350) (or kids) and one of XML, Textual format etc.
  • Operations can link to a top-level Operation in addition to others
  • etc. (we need to enumerate exact rules)

Maybe raise ERROR when something has more than two parents - this points to a conceptual mess that needs cleaning up.

I'll firm this up once the Editors Guide (http://edamontologydocs.readthedocs.io/en/latest/editors_guide.html) is done.
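Until the exact rules are enumerated, the check could take the exceptions as a pluggable predicate. The sketch below is only an assumption about how that might look: `multi_parent_report` and `is_allowed` are hypothetical names, and the input is assumed to be a dict from concept ID to its direct super-concepts.

```python
def multi_parent_report(parents, is_allowed=lambda concept, supers: False):
    """Return (warnings, errors) for concepts with several parents.

    `is_allowed(concept, supers)` is a hypothetical hook that returns True
    for the known-OK patterns (e.g. the Format case listed above).
    """
    warnings, errors = [], []
    for concept, supers in sorted(parents.items()):
        count = len(set(supers))  # ignore accidental duplicates
        if count > 2:
            errors.append(concept)    # more than two parents: conceptual mess
        elif count == 2 and not is_allowed(concept, supers):
            warnings.append(concept)  # two parents outside the known patterns
    return warnings, errors
```

Keeping the exceptions in one predicate means the rules can be tightened later without touching the traversal.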

Add checks for subsets

This has to be first cleaned-up in EDAM itself, see edamontology/edamontology#418

  • Every data_|format_|operation_|topic_ concept has data|format|operation|topic subset
  • Everything in the mainline EDAM needs subset bio
  • Subsets are encoded in a desired way, ideally (Semantic) Web-friendly. Needs some more experiments first...
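The first two bullets could be checked along these lines. This is a sketch only: since the encoding of subsets is still undecided, it assumes each concept's subsets have already been extracted as a set of plain strings, and `subset_problems` is a hypothetical name.

```python
import re

# Matches the sub-ontology prefix at the end of a concept URI,
# e.g. http://edamontology.org/data_1712 -> "data".
ID_PATTERN = re.compile(r"/(data|format|operation|topic)_\d+$")


def subset_problems(concept_uri, subsets):
    """Return a list of human-readable subset problems for one concept.

    `subsets` is assumed to be a set of subset names as plain strings.
    """
    problems = []
    match = ID_PATTERN.search(concept_uri)
    if match and match.group(1) not in subsets:
        problems.append(f"missing subset '{match.group(1)}'")
    if "bio" not in subsets:
        problems.append("missing subset 'bio'")
    return problems
```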

Weird fail of "two labels" assertion when a concept is its own super-concept 👽

(edited, sorry, I didn't read it properly at first)

@hmenager
There was a correct "...is superclass of itself" error message and the correct "the class has two labels, which is forbidden in EDAM" message, but:

Not urgent, but would be nice to fix for perfection.

Update term uniqueness checker to check universal uniqueness

At the moment, the check does not cover uniqueness across labels and synonyms together. See bug edamontology/edamontology#741.

And there is no check of uniqueness between "sub-ontologies". Let's add a warning for now, to see what the situation is.

Plus please add related_term into the checks, just like any other synonym. (An even better solution would be inferring that it is a special case of a synonym, but I assume that might be possible only with an OWL/RDF reasoner(?))

Related note, not necessarily a part of this issue: I'm considering making file extensions searchable in BioPortal etc., by also defining them as a special case of a synonym. Not strictly related to that, but it'd be great for us to see whether the file extensions are unique or overlap somewhere (possibly also with a main label of a different format). But of course unlike other terms, these can overlap with the main label of the given format (e.g. label PDF and fileext pdf).

Another note: I assume the check is case-insensitive, i.e. the same term with just different capitalisation will fail the uniqueness check, right?

  • Fail if the same term is used as the main label and as a synonym anywhere within the same "sub-ontology".
  • Output a warning if the same term is used in multiple sub-ontologies, irrespective of whether it is the main label or a synonym.
  • Include related_term into the check, fail if not unique.
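The three bullets above could be combined into one pass, roughly as follows. This is a sketch under assumptions, not the existing checker: concepts are assumed to be dicts with hypothetical keys `id`, `namespace`, `label`, `synonyms` and `related_terms`, and comparison is case-insensitive via `str.casefold`.

```python
from collections import defaultdict


def uniqueness_report(concepts):
    """Return (errors, warnings) for terms shared between concepts.

    Labels, synonyms and related terms are all treated as terms.
    """
    # Case-folded term -> set of (namespace, concept id) pairs using it.
    uses = defaultdict(set)
    for concept in concepts:
        terms = [
            concept["label"],
            *concept.get("synonyms", []),
            *concept.get("related_terms", []),
        ]
        for term in terms:
            uses[term.casefold()].add((concept["namespace"], concept["id"]))

    errors, warnings = [], []
    for term, users in sorted(uses.items()):
        by_namespace = defaultdict(set)
        for namespace, concept_id in users:
            by_namespace[namespace].add(concept_id)
        # Same term on several concepts within one sub-ontology: fail.
        for namespace, ids in sorted(by_namespace.items()):
            if len(ids) > 1:
                errors.append((term, namespace, sorted(ids)))
        # Same term in several sub-ontologies: warn only.
        if len(by_namespace) > 1:
            warnings.append((term, sorted(by_namespace)))
    return errors, warnings
```

Because users are collected in a set keyed by concept ID, a concept is never reported as clashing with itself in this sketch.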

Spell-checking definitions and comments

Not a high priority, but very nice to convey professionalism, and easy via Unix?

We'd need exceptions (to handle informatics jargon) including all known terms and synonyms already defined in EDAM.

Should raise WARN with action to update the exceptions dictionary
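If a Unix spell-checker is not at hand, the same idea can be sketched in plain Python. Everything here is an assumption for illustration: the caller supplies the dictionary as a collection of known words, the exceptions are single words (multi-word EDAM terms would need to be split first), and `unknown_words` is a hypothetical name.

```python
import re

# A word is a run of letters, optionally with one internal apostrophe.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def unknown_words(text, dictionary, exceptions=frozenset()):
    """Return the sorted words of `text` found in neither word list.

    Matching is case-insensitive; `exceptions` would hold jargon plus the
    words of all terms and synonyms already defined in EDAM.
    """
    known = {w.casefold() for w in dictionary} | {w.casefold() for w in exceptions}
    return sorted({w for w in WORD.findall(text) if w.casefold() not in known})
```

Each returned word would become a WARN, prompting either a fix in the definition or an addition to the exceptions dictionary.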
