data-maturity's Introduction

FAIRplus Dataset Maturity (DSM) Model

The FAIRplus-DSM model is intended as a comprehensive reference model for improving the state of FAIRness of research datasets. Based on the FAIR guiding principles, the DSM model defines and classifies requirements that constitute an incremental path towards improving the FAIRness level of a given research dataset.

Contributing

You are welcome to contribute to the content. The material is written in Markdown, and a Jekyll template (Just the Docs) is used to format the Markdown pages and generate the website (https://fairplus.github.io/Data-Maturity/).

  • If you want to add content, please create a new branch from this one. When you are ready to merge your changes, open a pull request against this branch.
  • The content of the website is in Markdown files in the /docs directory, whereas the images included in the Markdown files are in assets/images.
  • Refer to the Just the Docs documentation for usage and customisation information.

data-maturity's People

Contributors

actions-user, daniwelter, iemam, lauportell, lrodrin, yojanagadiya

data-maturity's Issues

Comments on Usage Areas and Indicators

Hi,
After going through the Data Usage Areas and FAIR indicators, I have some questions and comments.

Data Usage Areas:

  • How do these relate to maturity levels? Will they evolve into levels, or are they something completely different?
  • I think the area "Reproducibility" should have more associated indicators, like F+S04 and F+S06. I see repurposing and reproducibility as two ways of reusing data, so I would expect them to share most of their requirements.
  • Regarding the future data usage areas, I see data analytics as a type of repurposing, too specific to be an area on its own.

FAIR+ indicators:

  • If a research project does not follow the study-assay model in one way or another, it would not be possible to apply this framework to assess its maturity. If following the study-assay model is a prerequisite, I think this should be stated and the model explained a bit. It would be especially useful for those still planning their data management.
  • From the indicators defined at the study and assay levels, it appears that only studies, not assays, are assumed to have metadata. I see that this metadata also describes the assays, but I would still expect each experiment to have its own associated metadata.

TASK: Add and update glossary

Issues should resolve the following:

  • Completion of the definition of terms within the glossary
  • Update of terms with respect to DSM v3.0
  • Deployment of the glossary on the GitHub pages
  • Linking of the DSM tool with the glossary terms

F+A06: Needs rephrasing

The description for this indicator seems to be entirely based on a certain organisational view of data that is quite obscure and restrictive. It would be helpful if this could be reworded into a more generic form that is broadly applicable to all research data. The indicator talks about various aspects like data models, which are covered elsewhere, as well as unique identifiers, which are not. This makes it seem like the remit for this indicator is too broad; it should be more focussed.

F+S08 not mapped to data usage areas

Hi, in v0.1 only F+S08a, b and c were mapped to data usage areas. F+S08 itself is not present in the indicator-DU mapping. Could you please have a look? Thanks!
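
A gap like this could be caught automatically with a small consistency check. The snippet below is only a sketch: the indicator list, the mapping, and the usage-area placeholders are hypothetical stand-ins, not the actual contents of the v0.1 spreadsheet.

```python
# Hypothetical stand-ins for the indicator list and the indicator-DU mapping;
# the real values would come from the v0.1 spreadsheet.
indicators = ["F+S08", "F+S08a", "F+S08b", "F+S08c"]
indicator_to_usage_areas = {
    "F+S08a": ["usage-area-1"],
    "F+S08b": ["usage-area-1", "usage-area-2"],
    "F+S08c": ["usage-area-2"],
}


def unmapped_indicators(indicator_ids, mapping):
    """Return the indicators that have no data usage area assigned."""
    return [i for i in indicator_ids if not mapping.get(i)]


print(unmapped_indicators(indicators, indicator_to_usage_areas))
```

With the placeholder data above, the check reports only the parent indicator F+S08 as unmapped.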

F+S03: Title doesn't make any sense + definition vague

It's possible that the title is just a typo but it doesn't make any sense in its current format. The description is quite vague and the examples focus only on sequencing and variation data, specifically on file formats. This is a common bias in biomedical data standards.

This entire indicator needs to be revised and widened. It mentions biological data types in half a sentence but doesn't define what it means by this and only refers to file types afterwards.

F+S08(a-d): need clearer definitions

The description of the overall indicator (F+S08) seems to make a distinction between structured metadata and machine-readable metadata without clarifying what is meant by this (the whole point of structuring metadata is to make it machine-readable). It also talks about common vocabularies. Neither concept (structured metadata or vocabularies) is revisited in the sub-indicators. This should be clarified.

There is a lot of overlap between indicators a, b and c. In particular, data processing and analysis methods are generally considered to be experimental protocols and should therefore be covered under indicator a. If indicator a is intended to focus only on the sample collection and sample processing (in vivo/in vitro) steps of a study while c focuses on the in silico aspects, this needs to be clarified.

F+A05

This indicator suddenly mentions competency questions, which haven't come up before. This is confusing.

Also, as much as I love the OBO foundry, they are not the be-all and end-all of biomedical ontology standards and there are some excellent ontologies and vocabularies that are not on their list. Please represent them as a good example rather than the definitive list of available options.

Missing indicator: unique identifiers

Most areas of FAIR, including data models, versioning, licensing and vocabularies, are covered in one or more of these indicators. However, no indicator addresses (meta)data having unique and persistent identifiers. There should be an indicator for this, probably under F+A.

F+S04: Indicator is too example/implementation focussed

There is very little generic description of what this indicator is supposed to capture. The description immediately focusses on specific examples, which only cover a very narrow range of biological experiments, namely classic sample/tissue collection experiments.

F. F+MM-2.3H name

The definition given here is not consistent with what is provided in the DMM Excel sheet (FAIRplus Dataset Maturity Model_v0.2).
Based on the sheet, the definition for F. F+MM-2.3H is "Hosting environment offers the capability to browse related Datasets".

Get a stable link for FAIR assessment doc v0.1

Hi, @oyadenizbeyan @madhavij

I am working on providing links to the FAIRplus indicators v0.1. I want to give people links to the FAIR indicator v0.1 and FAIR assessment documents.

However, even though there is a formal release, I currently don't know where to find the links specific to version 0.1.

Also, the current spreadsheet is provided in xlsx format, which makes it difficult to share with external reviewers. Would something that can be displayed directly, without downloading, be better?

Could you please help me out?

Thanks a lot!

TASK: Data Model

Define a machine-actionable data model to express the CMM assessment.
Action: organise a hands-on meeting to discuss this model: a 2-hour hackathon to work on it.
Consider the relation with the curation ontology.
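
As a starting point for that discussion, a machine-actionable assessment record might look roughly like the sketch below. Every class and field name here (`IndicatorResult`, `Assessment`, `dataset_id`, `model_version`, `evidence`) is an assumption made for illustration; none of it is an agreed FAIRplus schema, and the relation to the curation ontology is not modelled.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class IndicatorResult:
    """Outcome of checking one indicator (hypothetical structure)."""
    indicator_id: str   # for example "F+A03"
    satisfied: bool
    evidence: str = ""  # free-text note or link supporting the outcome


@dataclass
class Assessment:
    """One assessment of one dataset (hypothetical structure)."""
    dataset_id: str
    model_version: str
    results: List[IndicatorResult] = field(default_factory=list)

    def satisfied_ids(self) -> List[str]:
        """Return the IDs of all indicators marked as satisfied."""
        return [r.indicator_id for r in self.results if r.satisfied]

    def to_json(self) -> str:
        """Serialise the assessment so that other tools can consume it."""
        return json.dumps(asdict(self), indent=2)


assessment = Assessment(
    dataset_id="example-dataset",
    model_version="0.1",
    results=[
        IndicatorResult("F+A03", True, "uses a community standard"),
        IndicatorResult("F+A04", False),
    ],
)
print(assessment.to_json())
```

Plain JSON is used here only because it is easy to exchange and inspect; a linked-data serialisation may be a better fit once the ontology question is settled.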

F+A03/04: strong overlap

It is unclear to me what the difference between F+A03 and F+A04 is. If an appropriate community standard is used, this should already define the data structure, including types, constraints, etc. In light of this, F+A04 seems to be superfluous, as long as F+A03 is updated to clarify that the use of an accepted community standard, if available, is encouraged over the definition of a new in-house standard.