fairplus / data-maturity

FAIR Dataset Maturity model

Home Page: https://fairplus.github.io/Data-Maturity/

License: MIT License

HTML 20.66% Dockerfile 0.22% Ruby 6.38% Liquid 0.26% SCSS 54.33% JavaScript 17.07% Shell 0.17% Python 0.91%

biomedical-data-science lifescience fair fair-data maturity-model

data-maturity's Introduction

FAIRplus Dataset Maturity (DSM) Model

The FAIRplus-DSM model is intended as a comprehensive reference model for improving the state of FAIRness of research datasets. Based on the FAIR guiding principles, the DSM model defines and classifies requirements that constitute an incremental path towards improving the FAIRness level of a given research dataset.

Contributing

You are welcome to contribute to the content. The material is written in Markdown, and a Jekyll template (Just the Docs) is used to format the Markdown pages and generate the website (https://fairplus.github.io/Data-Maturity/).

If you want to add content, please create a new branch from this one. When you are ready to merge your changes, open a pull request against this branch.
The website content is in Markdown files in the /docs directory; the images included in those files are in assets/images.
Refer to the Just the Docs documentation for usage and customisation information.

License

The FAIRplus DSM Model content is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
The Jekyll theme is available as open source under the terms of the MIT License.

data-maturity's People

Contributors

Stargazers

Watchers

Forkers

iemam tabbassidaloii lauportell mkonopkoelixir denoa1 rpatil524 haipinglu yojanagadiya fabiolib millybilli

data-maturity's Issues

Comments on Usage Areas and Indicators

Hi,
After going through Data Usage Areas and FAIR indicators I have some doubts and comments.

Data Usage Areas:

How do these relate to maturity levels? Will they evolve into levels or become something completely different?
I think the area "Reproducibility" should have more associated indicators, like F+S04 and F+S06. I see repurposing and reproducibility as two ways of reusing data, I would expect them to share most of their requisites.
Regarding the future data usage areas, I see data analytics as a type of repurposing, too specific to be an area on its own.

FAIR+ indicators:

If a research project does not follow the study-assay model in one way or another, it would not be possible to apply this framework to assess its maturity. If following the study-assay model is a prerequisite, I think it should be stated and the model explained a bit. It would be especially useful for those still planning their data management.
From the indicators defined at the study and assay levels, it is assumed that assays do not have metadata, only studies. I see that this metadata also describes the assays, but I would still expect each experiment to have its associated metadata.

TASK: Add and update glossary

Issues should resolve the following:

Completion of the definition of terms within the glossary
Update of terms with respect to DSM v3.0
Deployment of the glossary on the GitHub pages
Linking of the DSM tool with the glossary terms

Test Issue

Lorem Ipsum

F+A06: Needs rephrasing

The description for this indicator seems to be entirely based on a certain organisational view of data that is quite obscure and restrictive. It would be helpful if this could be reworded into a more generic form that is broadly applicable to all research data. The indicator talks about various aspects like data models, which are covered elsewhere, as well as unique identifiers, which are not. This makes it seem like the remit for this indicator is too broad, and it should be more focussed.

F+S08 not mapped to data usage areas

Hi, in v0.1 only F+S08a, b and c were mapped to data usage areas. F+S08 itself is not present in the indicator-DU mapping. Could you please have a look? Thanks!
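
A gap like this one can be caught with a small consistency check over the indicator list and the indicator-DU mapping. The sketch below is only an illustration: the mapping values are placeholders, not the real v0.1 assignments, and the function name `unmapped_indicators` is hypothetical.

```python
# Placeholder data: the identifiers come from this issue, but the usage-area
# assignments are invented for illustration and are NOT the real v0.1 mapping.
indicators = ["F+S08", "F+S08a", "F+S08b", "F+S08c"]

indicator_to_usage_areas = {
    "F+S08a": ["Reproducibility"],
    "F+S08b": ["Repurposing"],
    "F+S08c": ["Repurposing"],
}


def unmapped_indicators(indicators, mapping):
    """Return the indicators that have no data usage area, in input order.

    An indicator counts as unmapped when it is absent from the mapping
    or is mapped to an empty list.
    """
    return [ind for ind in indicators if not mapping.get(ind)]


print(unmapped_indicators(indicators, indicator_to_usage_areas))
```

With the placeholder data above, the check reports only the parent indicator F+S08, which matches the situation described in this issue.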

F+S01: Title in summary table does not match title in the list

This may just be a copy/paste error as F+S01 and F+S02 seem to have the same title. If it isn't an error, consider renaming one of the two as this is otherwise confusing.
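
Duplicate titles of this kind could be flagged automatically before publishing. The following is a minimal sketch, assuming the indicator titles are available as an identifier-to-title dictionary; the titles shown are placeholders and `duplicate_titles` is a hypothetical helper, not part of the repository.

```python
from collections import defaultdict


def duplicate_titles(titles_by_id):
    """Group indicator IDs by normalised title and keep groups with more than one ID."""
    groups = defaultdict(list)
    for indicator_id, title in titles_by_id.items():
        # Normalise whitespace and case so trivially different copies still match.
        groups[" ".join(title.split()).lower()].append(indicator_id)
    return {title: ids for title, ids in groups.items() if len(ids) > 1}


# Placeholder titles, invented for illustration only.
summary_table = {
    "F+S01": "Placeholder title A",
    "F+S02": "placeholder  title A",
    "F+S03": "Placeholder title B",
}

print(duplicate_titles(summary_table))
```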

F+S06: Consider rewording indicator from an instruction to a neutral statement

All other indicators are phrased as a neutral statement while this indicator is worded as an instruction. Reword for consistency.

F+S03: Title doesn't make any sense + definition vague

It's possible that the title is just a typo but it doesn't make any sense in its current format. The description is quite vague and the examples focus only on sequencing and variation data, specifically on file formats. This is a common bias in biomedical data standards.

This entire indicator needs to be revised and widened. It mentions biological data types in half a sentence but doesn't define what it means by this and only refers to file types afterwards.

F+S08(a-d): need clearer definitions

The description of the overall indicator (F+S08) seems to make a distinction between structured metadata and machine-readable metadata without clarifying what is meant by this (the whole point of structuring metadata is to make it machine readable). It also talks about common vocabularies. Neither concept (structured metadata or vocabularies) is revisited in the sub-indicators. This should be clarified.

There is a lot of overlap between indicators a, b and c. In particular, data processing and analysis methods are generally considered to be experimental protocols and should therefore be covered under indicator a. If indicator a is intended to focus only on the sample collection and sample processing (in vivo/in vitro) steps of a study while c focuses on the in silico aspects, this needs to be clarified.

F+A05

This indicator suddenly mentions competency questions, which haven't come up before. This is confusing.

Also, as much as I love the OBO foundry, they are not the be-all and end-all of biomedical ontology standards and there are some excellent ontologies and vocabularies that are not on their list. Please represent them as a good example rather than the definitive list of available options.

Missing indicator: unique identifiers

Most areas of FAIR, including data models, versioning, licensing and vocabularies, are covered in one or more of these indicators. However, no indicator talks about (meta)data having unique and persistent identifiers. There should be an indicator for this, probably under F+A.

Task: Linking FairPlus Indicators with Cookbook Recipes

F+S04: Indicator is too example/implementation focussed

There is very little generic description of what this indicator is supposed to capture. The description immediately focuses on specific examples, which only cover a very narrow range of biological experiments, namely classic sample/tissue collection experiments.

F. F+MM-2.3H name

The definition given here is not consistent with what is provided in the DMM Excel sheet (FAIRplus Dataset Maturity Model_v0.2).
Based on the sheet, the definition for F. F+MM-2.3H is "Hosting environment offers the capability to browse related Datasets."

TASK: Curato.owl alignment

Checking the alignment of CMM processes with Curato.owl curation ontology

Refine one example indicator for assessment result

Task: partial compliance is very generic. We need to work on refining it.
Show the progress
Sample protocols

Oya will share the first try with Nick

Get a stable link for FAIR assessment doc v0.1

Hi, @oyadenizbeyan @madhavij

I am working on providing links to the FAIRplus indicators v0.1. I want to give people links to the FAIR indicator v0.1 and FAIR assessment documents.

However, currently, even if there is a formal release, I don't know where to find the version 0.1 specific links.

Also, the current spreadsheet is provided in xlsx format, which makes it difficult to share with external reviewers. Would something that can be displayed directly, without downloading, be better?

Could you please help me out?

Thanks a lot!

Broken link on evaluation method page

On the "Evaluation method" page, the link pointing to the detailed descriptions of the FAIRplus indicators (bullet point 2) is broken.

TASK: Data Model

Define a machine-actionable data model to express the CMM assessment.
Action: organize a hands-on meeting to discuss this model: a 2-hour hackathon to work on it.
Consider the relation with the curation ontology.
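
As a starting point for that discussion, one possible shape for such a model is sketched below. It is only an illustration under stated assumptions: every class name, field name and compliance value here is hypothetical and not taken from the FAIRplus specification or the curation ontology.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List


class Compliance(str, Enum):
    """Hypothetical compliance levels; subclassing str keeps them JSON-serialisable."""
    COMPLIANT = "compliant"
    PARTIAL = "partial"
    NOT_COMPLIANT = "not_compliant"


@dataclass
class IndicatorResult:
    """Outcome of assessing one indicator (all field names are assumptions)."""
    indicator_id: str
    compliance: Compliance
    evidence: str = ""


@dataclass
class Assessment:
    """A set of indicator results for one dataset and one model version."""
    dataset_id: str
    model_version: str
    results: List[IndicatorResult] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses into plain dicts.
        return json.dumps(asdict(self), indent=2)


assessment = Assessment(
    dataset_id="example-dataset",  # placeholder identifier
    model_version="v0.1",
    results=[IndicatorResult("F+S08", Compliance.PARTIAL, "placeholder evidence")],
)
print(assessment.to_json())
```

Serialising to JSON keeps the sketch tool-agnostic; terms from the curation ontology could later be attached by replacing the plain string identifiers with IRIs.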

F+A03/04: strong overlap

It is unclear to me what the difference between F+A03 and F+A04 is. If an appropriate community standard is used, it should already define the data structure, including types, constraints, etc. In light of this, F+A04 seems to be superfluous, as long as F+A03 is updated to clarify that the use of an accepted community standard, if available, is encouraged over the definition of a new in-house standard.