rda-dmp-common / hackathon-2020
RDA hackathon on maDMPs
License: The Unlicense
It would be great if we could write a publication together presenting our results, for sharing and referencing.
Goal: Be able to import maDMPs to Data Stewardship Wizard as questionnaires
We will work on this when we are ready and happy with #7. This task requires more planning and will probably result in a PoC rather than an in-DSW implementation.
maDMPs can be used to exchange information between two systems, but they can also be published.
We need a repository for maDMPs that replaces existing PDF-based repositories.
The repository should allow restricting the visibility of specific parts of an maDMP. For example, information on costs is not publicly available; nobody should be allowed to see it.
Users should be able to search for relevant maDMPs and the system should display a list of relevant maDMPs, e.g.
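A minimal sketch of the visibility restriction and the search, assuming maDMPs are stored as JSON documents following the RDA DMP Common Standard (a top-level `dmp` object with an optional `cost` list). The restricted-key policy and the helper names `redact` and `search` are hypothetical; a real repository would need per-field access rules and a proper search index.

```python
import copy

# Hypothetical policy: top-level maDMP keys that must never be shown publicly.
RESTRICTED_KEYS = {"cost"}


def redact(madmp: dict, restricted=RESTRICTED_KEYS) -> dict:
    """Return a copy of the maDMP with restricted top-level fields removed."""
    public = copy.deepcopy(madmp)  # never modify the stored original
    for key in restricted:
        public["dmp"].pop(key, None)
    return public


def search(madmps: list, term: str) -> list:
    """Return titles of maDMPs whose title or any dataset title contains the term."""
    term = term.lower()
    hits = []
    for doc in madmps:
        dmp = doc["dmp"]
        texts = [dmp.get("title", "")] + [d.get("title", "") for d in dmp.get("dataset", [])]
        if any(term in t.lower() for t in texts):
            hits.append(dmp.get("title", ""))
    return hits


if __name__ == "__main__":
    plan = {"dmp": {"title": "Soil moisture study",
                    "dataset": [{"title": "Sensor readings"}],
                    "cost": [{"title": "Storage", "value": 1000, "currency_code": "EUR"}]}}
    print("cost" in redact(plan)["dmp"])
    print(search([plan], "sensor"))
```

Redacting a copy at publication time keeps the full plan available to authorised users while the public view never contains the cost section.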
EasyDMP is based around a question/answer format, and each question has a type, for instance single-choice, multiple-choice, yes/no, date range, URL, or a typed value fetched from an external API via the EEStore. Questions are grouped into "sections", which are in turn grouped into a "template". The answers for each plan are stored as JSON and are perfectly machine-readable, but since there is no ontology or vocabulary they are not machine-actionable. In addition, a free-text version is generated for use as an attachment to funding applications.
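A rough sketch of that structure: typed questions grouped into sections, sections grouped into a template, and per-plan answers serialised as JSON. The class names, field names and type labels are hypothetical and are not EasyDMP's actual models; the point is that answers keyed by local question ids are machine-readable but, without a shared vocabulary, not actionable by another tool.

```python
import json
from dataclasses import dataclass, field

# Question types mentioned in the text; the labels are illustrative.
QUESTION_TYPES = {"single-choice", "multiple-choice", "yes/no", "date-range", "url", "external"}


@dataclass
class Question:
    qid: int
    text: str
    qtype: str

    def __post_init__(self):
        # Reject types the template engine does not know about.
        if self.qtype not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {self.qtype}")


@dataclass
class Section:
    title: str
    questions: list = field(default_factory=list)


@dataclass
class Template:
    title: str
    sections: list = field(default_factory=list)


def answers_to_json(answers: dict) -> str:
    """Serialise per-plan answers keyed by question id.

    The keys are local ids, so another tool cannot interpret them without
    knowing this template: readable, but not machine-actionable.
    """
    return json.dumps({str(qid): value for qid, value in answers.items()}, sort_keys=True)


if __name__ == "__main__":
    q = Question(1, "Will you share your data?", "yes/no")
    template = Template("Demo", [Section("Sharing", [q])])
    print(template.title, answers_to_json({q.qid: True}))
```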
It'd be handy to have types for: quota (number plus type, say), email address, cost (currency, value), keywords, language, locally stored controlled vocabularies, etc. There are many possibilities.
This should be relatively easy for a newcomer to EasyDMP.
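A sketch of how a few of the proposed types could be validated, assuming answers arrive as plain Python values. The type names, the deliberately loose email pattern and the three-letter currency check (ISO 4217 style) are assumptions for illustration, not EasyDMP code.

```python
import re

# Deliberately loose pattern: something@something.tld (not a full RFC 5322 check).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_email(value: str) -> bool:
    return bool(EMAIL_RE.match(value))


def validate_cost(value: dict) -> bool:
    """Cost = three-letter uppercase currency code plus a non-negative amount."""
    currency = value.get("currency", "")
    amount = value.get("value")
    return (len(currency) == 3 and currency.isalpha() and currency.isupper()
            and isinstance(amount, (int, float)) and amount >= 0)


def validate_quota(value: dict) -> bool:
    """Quota = a number plus a type, e.g. {"number": 500, "type": "GB"}."""
    return isinstance(value.get("number"), (int, float)) and bool(value.get("type"))


# Dispatch table: adding a new question type means adding one validator here.
VALIDATORS = {"email": validate_email, "cost": validate_cost, "quota": validate_quota}


def validate(qtype: str, value) -> bool:
    return VALIDATORS[qtype](value)
```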
Since the main funding agencies now require Data Management Plans (DMPs) as part of grant applications, the Institute for Systems and Computer Engineering, Technology and Science (INESC TEC)* started helping researchers create DMPs. The plans are created using a collaborative method between the data steward and the researchers that includes several kinds of activities, such as interviews and the analysis of publications, of data related to the project, and of existing DMP examples from their scientific domain. After this preliminary work by the data steward, the first version of the plan is created and presented to the researchers for refinement and corrections, and ultimately for completion of a final version. This method simplifies DMP creation, can be applied in different domains and produces more detailed DMPs, but it still needs improvement.
Thus, during the Hackathon, our group will focus on analysing the existing DMPs created at INESC TEC against the maDMP schema. We will compare our method with the maDMP concept to identify what we need to change (add, delete, edit) in our DMPs to make them machine-actionable. In other words, we will identify requirements for improving the INESC TEC RDM workflow and for making our DMP method conform to the maDMP concept.
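A sketch of the gap analysis, assuming an existing DMP has first been transcribed into a dict using maDMP key names. The required-field lists are our reading of the DMP Common Standard JSON schema and should be checked against the current schema before use.

```python
# Assumed required keys (verify against the official JSON schema).
REQUIRED_DMP = ["title", "created", "modified", "dmp_id", "contact",
                "dataset", "ethical_issues_exist", "language"]
REQUIRED_DATASET = ["title", "dataset_id", "personal_data", "sensitive_data"]


def gap_report(dmp: dict) -> dict:
    """List required maDMP fields that are missing, at plan and dataset level."""
    report = {"dmp": [k for k in REQUIRED_DMP if k not in dmp], "dataset": {}}
    for i, ds in enumerate(dmp.get("dataset", [])):
        missing = [k for k in REQUIRED_DATASET if k not in ds]
        if missing:
            report["dataset"][i] = missing  # keyed by dataset position
    return report
```

Missing fields point to what has to be added; fields in the INESC TEC DMPs that have no maDMP counterpart (not detected by this sketch) point to candidate extensions or deletions.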
The RDA DMP Common Standard has a notion of datasets: at least one is required, but multiple are supported. EasyDMP is oriented around a "plan" made from answers to typed questions, which is converted to a free-text form, with no notion of datasets.
This is a complex redesign/refactor job, which I will use as a background/fallback task, and I don't expect any help :)
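One possible transitional step, sketched under assumptions: until EasyDMP has its own notion of datasets, a plan's answers could be wrapped into a single default dataset so that an export satisfies the "at least one dataset" rule. The answer keys (`plan_title`, `dataset_title`, `personal_data`, `sensitive_data`) are hypothetical.

```python
def plan_to_madmp(answers: dict) -> dict:
    """Wrap flat plan answers into an maDMP with exactly one default dataset."""
    dataset = {
        # Fall back to the plan title when no dataset title was answered.
        "title": answers.get("dataset_title", answers.get("plan_title", "Untitled dataset")),
        # Assumed allowed values for these flags: "yes", "no", "unknown".
        "personal_data": answers.get("personal_data", "unknown"),
        "sensitive_data": answers.get("sensitive_data", "unknown"),
    }
    return {"dmp": {"title": answers.get("plan_title", "Untitled plan"),
                    "dataset": [dataset]}}
```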
Many projects also have data management workflows that are defined either in some 'standard' (like the Common Workflow Language) or in a less standard but perhaps community-wide manner. It would be good to understand how we can incorporate this element of data management into the plan.
An example of a data management workflow: read in a data file, transform the data objects into physical quantities, perform some analysis of those quantities, and output the quantities to a new file.
One thing we could do is reference the workflow in the plan if it has a DOI, but it's not clear to me where, or how. The workflow itself may take some time to complete.
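The example workflow (read a file, transform data objects into physical quantities, analyse them, write a new file) can be sketched as a small pipeline. The CSV layout with a `counts` column, the calibration factor and the output format are hypothetical; the point is that each step is explicit, so the whole workflow could be published and cited from the plan.

```python
import csv
import io
import statistics

CALIBRATION = 0.5  # hypothetical factor converting raw counts to a physical unit


def read_data(text: str) -> list:
    """Step 1: read raw records from CSV text with a 'counts' column."""
    return [float(row["counts"]) for row in csv.DictReader(io.StringIO(text))]


def to_quantities(raw: list) -> list:
    """Step 2: transform raw data objects into physical quantities."""
    return [c * CALIBRATION for c in raw]


def analyse(quantities: list) -> dict:
    """Step 3: compute summary statistics of the quantities."""
    return {"n": len(quantities), "mean": statistics.mean(quantities)}


def write_output(quantities: list, summary: dict) -> str:
    """Step 4: write quantities and summary as CSV (returned as a string here)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["quantity"])
    writer.writerows([q] for q in quantities)
    writer.writerow([f"mean={summary['mean']}"])
    return out.getvalue()


def run(text: str) -> str:
    quantities = to_quantities(read_data(text))
    return write_output(quantities, analyse(quantities))
```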
Here is a collection of DMPs and maDMPs:
https://zenodo.org/communities/tuw-dmps-ds-2020/
Let's review them and create some examples that all of us can use for testing our implementations of the standard.
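As a starting point for shared test data, here is a minimal maDMP document built in Python and serialised to JSON. It reflects our reading of the DMP Common Standard JSON (identifiers as `identifier`/`type` pairs, ISO 639-3 language codes, yes/no/unknown flags); all values are placeholders and are not taken from the Zenodo collection.

```python
import json


def minimal_madmp() -> dict:
    """A small placeholder maDMP usable as a shared test fixture."""
    return {
        "dmp": {
            "title": "Example DMP",
            "created": "2020-03-23T10:00:00Z",
            "modified": "2020-03-23T10:00:00Z",
            "language": "eng",
            "ethical_issues_exist": "unknown",
            "dmp_id": {"identifier": "https://example.org/dmp/1", "type": "url"},
            "contact": {
                "name": "Jane Doe",
                "mbox": "jane.doe@example.org",
                "contact_id": {"identifier": "https://example.org/people/jd", "type": "other"},
            },
            "dataset": [{
                "title": "Example dataset",
                "dataset_id": {"identifier": "https://example.org/ds/1", "type": "url"},
                "personal_data": "no",
                "sensitive_data": "no",
            }],
        }
    }


if __name__ == "__main__":
    print(json.dumps(minimal_madmp(), indent=2))
```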
It would be great to investigate the exchange of information between the OpenAIRE Research Graph and maDMPs.
For example, the research graph has information on projects and on the data produced in them. We could use this information to generate an maDMP that is later submitted to the funder. This can also work in the opposite direction: an maDMP created using a DMP tool already contains information on data reused in a research project and on data that was generated. This information can be uploaded to the knowledge graph.
Example of a result from the KG: http://api.openaire.eu/search/datasets?projectID=777541
KG: https://www.openaire.eu/blogs/the-openaire-research-graph
https://zenodo.org/communities/openaire-research-graph?page=1&size=20
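A sketch of the first direction (research graph to maDMP). It builds a query URL like the example above and maps already-parsed dataset records into maDMP `dataset` entries. The `format=json` parameter and the simplified record shape (`title`, `doi`) are assumptions; the real API response is more deeply nested and must be parsed first.

```python
from urllib.parse import urlencode

API = "http://api.openaire.eu/search/datasets"


def query_url(project_id: str, fmt: str = "json") -> str:
    """Build a search URL for datasets linked to a project (format parameter assumed)."""
    return f"{API}?{urlencode({'projectID': project_id, 'format': fmt})}"


def records_to_madmp(project_title: str, records: list) -> dict:
    """Map simplified records {'title': ..., 'doi': ...} into an maDMP skeleton."""
    datasets = [{
        "title": r["title"],
        "dataset_id": {"identifier": f"https://doi.org/{r['doi']}", "type": "doi"},
    } for r in records]
    return {"dmp": {"title": f"DMP for {project_title}", "dataset": datasets}}
```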
Information to follow...
The current specification of the standard defines many fields as Strings, because there was no standard vocabulary to be used by the whole community. For example, Dataset\type can be set to any String value, because there are different vocabularies within the RDM community: DataCite and COAR, for instance, each define vocabularies of dataset types.
We need a group that would analyse the maDMP specification to identify the fields for which a common vocabulary is both needed and feasible. This could lead to developments after the hackathon; maybe an RDA WG should be established for that purpose? Dataset\type is just one example of the alignment needed.
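To illustrate the alignment problem, a sketch that normalises free-text `Dataset\type` values against a controlled list. The vocabulary below is a small subset of DataCite's `resourceTypeGeneral` values chosen for illustration; a community-agreed vocabulary would replace it.

```python
from typing import Optional

# Subset of DataCite resourceTypeGeneral values (not the full list).
DATACITE_TYPES = ["Dataset", "Software", "Text", "Image", "Workflow", "Other"]
_LOOKUP = {t.lower(): t for t in DATACITE_TYPES}


def normalize_type(value: str) -> Optional[str]:
    """Return the canonical vocabulary term, or None if the value is not in the vocabulary."""
    return _LOOKUP.get(value.strip().lower())
```

Values that map to `None` are exactly the free-text strings a shared vocabulary would have to account for.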
Integrating our CRIS/RIMS (Converis) with our institutional DMP platform, to make things more streamlined for researchers and admins alike.
Implement publishing of (selected) UCT DMPs through DataCite, which already mints UCT-branded DOIs for the scholarly outputs published on UCT's data repository ZivaHub, running on Figshare for Institutions.
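A sketch of the DataCite side, assuming DMPs are registered through the DataCite REST API (a JSON:API payload POSTed to the `/dois` endpoint). The prefix, landing-page URL, publisher string and the choice of `Text` as `resourceTypeGeneral` are placeholders or assumptions; the sketch only builds the payload and does not send it.

```python
import json


def datacite_payload(madmp: dict, prefix: str, landing_url: str,
                     year: int, publisher: str) -> str:
    """Build a DataCite JSON:API payload that registers and publishes a DMP."""
    dmp = madmp["dmp"]
    attributes = {
        "prefix": prefix,          # only a prefix: DataCite is expected to generate the suffix
        "event": "publish",        # make the DOI findable immediately
        "titles": [{"title": dmp["title"]}],
        "creators": [{"name": dmp["contact"]["name"]}],
        "publisher": publisher,
        "publicationYear": year,
        "types": {"resourceTypeGeneral": "Text"},  # assumed type for a DMP
        "url": landing_url,
    }
    return json.dumps({"data": {"type": "dois", "attributes": attributes}})
```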
Goal: Be able to export maDMPs in RDF from Data Stewardship Wizard
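A dependency-free sketch of an RDF export that emits N-Triples for a plan and its datasets. It reuses `dct:title` and `dcat:Dataset` from Dublin Core and DCAT; the `DMP` class and `hasDataset` property under the DMP Common Standard Ontology (DCSO) namespace are assumptions to check against the published ontology, and the subject IRIs built from `base` are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCSO = "https://w3id.org/dcso/ns/core#"  # assumed namespace and term names; verify


def _lit(text: str) -> str:
    """Quote a plain N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(madmp: dict, base: str) -> str:
    dmp = madmp["dmp"]
    plan = f"<{base}dmp>"
    lines = [f"{plan} <{RDF_TYPE}> <{DCSO}DMP> .",
             f"{plan} <{DCT_TITLE}> {_lit(dmp['title'])} ."]
    for i, ds in enumerate(dmp.get("dataset", [])):
        node = f"<{base}dataset/{i}>"
        lines += [f"{plan} <{DCSO}hasDataset> {node} .",
                  f"{node} <{RDF_TYPE}> <{DCAT_DATASET}> .",
                  f"{node} <{DCT_TITLE}> {_lit(ds['title'])} ."]
    return "\n".join(lines) + "\n"
```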
Work on extending and refining the baseline mechanism of OpenDMP software for importing and exporting maDMPs.
Validate the alignment with the models of other tools, as represented in the current maDMP specification.
I think in the domain ABC we need to include more information on Security and Privacy. I have a collection of DMPs and would like to define an extension to the standard by defining additional fields to reflect the needs of domain ABC.
[this is an example]
As part of the ongoing effort to have different formats to represent the DMP Common Standard, I'm looking for help in creating a new version of the DCSO.
There are four main points of action:
1. Use some means (SHACL, ShEx or some other option) to represent the constraints in the DMP Common Standard.
2. Integrate the DCAT and DublinCore ontologies into the existing DCSO, thus reusing classes (and properties) as opposed to the current practice of redefining classes.
3. Determine how to represent the custom controlled vocabularies required for some of the existing fields in a way that allows validation (e.g., the use of iso-3166-1-alpha2 in the geo_location property of the Host class).
4. Provide the DCSO with a PURL, solving the current namespace issue: the current namespace is unsuitable for long-term preservation and reuse.
Edit:
I'm currently solving issue 4, following the advice of robertgiessmann. Thanks!
Edit2:
Issue 4 solved. https://w3id.org/dcso Thanks.
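Before choosing between SHACL and ShEx, it may help to state the constraint from point 3 procedurally. This sketch works on the JSON serialisation, where host objects sit under dataset/distribution/host (our reading of the standard). The code set is a tiny excerpt of ISO 3166-1 alpha-2, for illustration only.

```python
# Tiny excerpt of ISO 3166-1 alpha-2 codes; a real validator would load the full list.
ISO_3166_ALPHA2 = {"AT", "DE", "NL", "PT", "ZA"}


def check_hosts(madmp: dict) -> list:
    """Return (dataset index, distribution index, value) for invalid host geo_location values."""
    errors = []
    for i, ds in enumerate(madmp["dmp"].get("dataset", [])):
        for j, dist in enumerate(ds.get("distribution", [])):
            geo = dist.get("host", {}).get("geo_location")
            # A missing geo_location is not an error here; only invalid values are.
            if geo is not None and geo not in ISO_3166_ALPHA2:
                errors.append((i, j, geo))
    return errors
```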
First step: extract an maDMP from a figshare repository using the figshare API https://docs.figshare.com/
Second step: test importing an maDMP into figshare
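A sketch of the first step, assuming the maDMP is attached to a public figshare item as a `.json` file. It uses the public article endpoint of the figshare API v2; the file-naming convention is an assumption, and the network call is kept separate so the selection logic can be tested offline.

```python
import json
from urllib.request import urlopen

API = "https://api.figshare.com/v2"


def article_url(article_id: int) -> str:
    """Public endpoint for a single article's metadata (including its files)."""
    return f"{API}/articles/{article_id}"


def madmp_file_urls(article: dict) -> list:
    """Pick download URLs of JSON files; assumes the maDMP is stored as a .json file."""
    return [f["download_url"] for f in article.get("files", [])
            if f.get("name", "").lower().endswith(".json")]


def fetch_article(article_id: int) -> dict:
    """Fetch article metadata over the network."""
    with urlopen(article_url(article_id)) as resp:
        return json.load(resp)
```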
We already have some examples [1] [2] [3] of how an maDMP can be mapped to the Science Europe or Horizon 2020 DMP templates. We need a group to review the existing mappings and come up with a single mapping. Small extensions to the maDMP standard may also be needed. A perfect outcome for the hackathon would be a common mapping for one of the popular research funders. New mappings, e.g. to NSF, are also welcome!
[1] https://doi.org/10.5281/zenodo.3727720
[2] https://doi.org/10.5281/zenodo.3727714
[3] https://doi.org/10.5281/zenodo.3727724
A collection of DMPs and maDMPs can also be found here:
https://zenodo.org/communities/tuw-dmps-ds-2020/
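Reviewing several mappings can be partly automated. This sketch merges mappings from maDMP field paths to funder-template sections and reports fields that different mappings place in different sections. The field paths and section labels used in the test are hypothetical and are not taken from [1], [2] or [3].

```python
from collections import defaultdict


def merge_mappings(mappings: dict) -> tuple:
    """Merge {mapping_name: {madmp_field: section}} into one mapping plus conflicts.

    A field is agreed if every mapping that covers it uses the same section;
    otherwise it is reported with the section each mapping proposed.
    """
    proposals = defaultdict(dict)
    for name, mapping in mappings.items():
        for key, section in mapping.items():
            proposals[key][name] = section
    merged, conflicts = {}, {}
    for key, by_mapping in proposals.items():
        sections = set(by_mapping.values())
        if len(sections) == 1:
            merged[key] = sections.pop()
        else:
            conflicts[key] = by_mapping
    return merged, conflicts
```

The conflict list is the natural agenda for the review: agreed fields can be adopted as-is.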
I would like to export information from tool X, that I maintain, into the tool Y. I think that this could help us in ABC. Let's see how much information we can exchange.
(This is just an example)
The FIP (FAIR Implementation Profile, which is the output of a filled-in FAIR Matrix questionnaire, found at https://fair-matrix.ds-wizard.org) can be seen as the DNA of a DMP. I would like to investigate how to logically and technically link the two.
The Research Data Management Organiser (RDMO) is a tool for organizing information about data management and creating DMPs. Our GitHub organization is https://github.com/rdmorganizer.
Internally, RDMO already uses a vocabulary to abstract our questionnaire(s) from the user input. At the hackday, we want to map this internal vocabulary (we call it the domain) to maDMP and create an export functionality. The core team of RDMO (@jochenklar, @triole, @leucoryx) will participate, and we would be happy if other people joined us.
An import of maDMP data into RDMO will be a follow-up project.
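A sketch of the export direction, assuming RDMO values can be read as a flat dict keyed by domain attribute paths. The attribute paths and the mapping table are hypothetical examples, not RDMO's actual domain; the helper writes each value at a nested maDMP path, creating dicts and lists on the way.

```python
# Hypothetical mapping from RDMO-style attribute paths to maDMP key paths.
MAPPING = {
    "project/title": ("dmp", "title"),
    "project/dataset/title": ("dmp", "dataset", 0, "title"),
    "project/dataset/sensitive_data": ("dmp", "dataset", 0, "sensitive_data"),
}


def _container_for(next_key):
    """An int key needs a list to index into; a str key needs a dict."""
    return [] if isinstance(next_key, int) else {}


def _set(doc, path, value):
    """Set value at a nested path; the last path element must be a str key."""
    for key, next_key in zip(path, path[1:]):
        if isinstance(key, int):
            while len(doc) <= key:
                doc.append(_container_for(next_key))
            doc = doc[key]
        else:
            doc = doc.setdefault(key, _container_for(next_key))
    doc[path[-1]] = value


def rdmo_to_madmp(values: dict) -> dict:
    """Build an maDMP dict from RDMO-style values; unmapped attributes are ignored."""
    doc = {}
    for attr, value in values.items():
        if attr in MAPPING:
            _set(doc, MAPPING[attr], value)
    return doc
```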
From @iliremavriqi @mb-wali @freelion93
I would like to resolve all the issues that exist in the DMP Common Standard repository :)