materials-data-science-and-informatics / mdmc-nep-top-level-ontology

This repository collects the ongoing work towards the development of the ontology on common terms defined for the MDMC Joint Lab and NEP.

License: MIT License

Python 0.14% HTML 9.32% CSS 3.78% JavaScript 86.76%
materials-informatics materials-science ontology provenance-tracking

mdmc-nep-top-level-ontology's People

Contributors

az-ihsan, eosmenaj, mehrdadjalali-kit, rossellaaversa

mdmc-nep-top-level-ontology's Issues

Requirement Analysis: Sample

Competency Questions on Sample

From #19

  1. Which Project has the Sample been attributed to?
  2. In which Studies has the Sample been prepared?
  3. In which Studies has the Sample been used?
  4. In which Experiment has the Sample been prepared?
  5. In which Experiments has the Sample been used?
  6. Which Measurements has the Sample been prepared for?
  7. In which Measurements has the Sample been used?
  8. How has the Sample been prepared?
  9. Has the Sample been prepared by Research Users?
  10. Which Research Users have prepared the Sample?
  11. Which Sample Components is the Sample made of?
  12. Which Sample Components have been used in the Sample Preparation?
  13. Which Sample Components have been used in the Sample Component Synthesis?
  14. Which Equipment has been used to prepare the Sample?
  15. Where has the Sample been prepared?
  16. When has the Sample been prepared?
  17. Which Research Data have been related to the Sample?
  18. Which Publication Data have been related to the Sample?

Requirement Analysis: Data

Competency Questions on Data

From #19

  1. Which Research Data have been used in Data Processing?
  2. Which Research Data have been produced in Data Processing?
  3. Which Software has been used in Data Processing?
  4. Which Research Users have performed the Data Processing?
  5. Has Data Analysis been a member of the Data Analysis Lifecycle?
  6. Which Project have the Research Data been attributed to?
  7. Which Research Users have produced the Research Data?
  8. At which Institutions have the Raw Data been produced?
  9. In which Study have the Research Data been produced?
  10. In which Experiments have the Raw Data been produced?
  11. In which Measurements have the Raw Data been produced?
  12. Which Instruments have been used to produce the Raw Data?
  13. Which Samples have been used in the Measurement to produce the Raw Data?
  14. Which Data Analyses have been performed to produce the Analysed Data?
  15. Which Data Analysis software has been used to produce the Analysed Data?
  16. Which Research Data does the Dataset collect?
  17. In which Data Collaboration Platform is Research Data stored?
  18. In which Data Repository is Research Data stored?
  19. Which Dataset has the Publication Data been derived from?
  20. Which Metadata has described Research Data?

Data analysis lifecycle role

We need to define the roles used in the Data Analysis Lifecycle module, e.g., who is performing data analysis, data processing, and data interpretation.

Experiment role

We need to define a role for a Research User performing an Experiment, Measurement, Sample Preparation, etc.

Reference Data

Suggested definition: Reference Data is Research Data that is not produced during the current Study and is used as a reference to compare and/or validate the output of the Study, typically during the Data Analysis Lifecycle.

  • is subclass of: Research Data
  • is referenced by: Research Data, Conclusions

Data Analysis

According to the glossary, Data Analysis includes Data Processing, Correlative Characterization, and Data Interpretation.
In the STM use case, "Image Selection and Retrievement", "Image labelling process" and "Metadata Selection" are included in Data Processing.

Can Data Analysis be a superclass of Data Processing and Correlative Characterization?
BUT Data Interpretation must be outside the superclass, as it does NOT generate analysed data (only interpretation).

To check: does Correlative Characterization use ONLY raw data, or can it also use analysed data?
If it also uses analysed data, it cannot be under the Data Analysis superclass.

Requirement Analysis: Research User

Competency Questions of Research User

From #19

  1. Which Projects is the Research User a member of?
  2. Which Studies has the Research User performed?
  3. Which Experiments has the Research User performed?
  4. Which Measurements has the Research User performed?
  5. Which Instruments has the Research User used?
  6. Which Samples has the Research User used?
  7. Which Data Analysis Lifecycles has the Research User performed?
  8. Which Data has the Research User produced?
  9. Which Data has the Research User published?

Dataset

Dataset (from glossary): collection of scientifically related Research Data which can be Raw Data, Analysed Data, or other Datasets, each described by their related Metadata. The components of a Dataset remain individually identifiable within the Dataset.

Instrument and Equipment

According to the current definition, Instrument is part of Equipment.
But Instrument is a "special" part of Equipment: the part that makes it possible to perform the actual Measurement.
What is missing is a formal distinction between what is Instrument and what is the rest of the (mounted) Equipment.
Suggestion: we can use and discuss Mehrdad's ontology.

Modularize the ontology

At the moment, the ontology is one big chunk of interconnected terms. It would be nice to split it into modules. I would propose to modularize the complete ontology into:

  1. Core
  2. Dataset
  3. Data Analysis Lifecycle
  4. Experiment

And later on, the complete ontology is the one importing all 4 modules.

The idea behind the modularization is to have a list of modules that can each serve as a standalone ontology, e.g., the Experiment ontology is specialised to describe/annotate the provenance information of an experiment.
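The proposed layout can be sketched in plain Python, with each module as a set of terms and the complete ontology as the union of all four. The assignment of terms to modules below is an illustrative assumption, not the agreed module content.

```python
# Hypothetical assignment of terms to the four proposed modules.
MODULES = {
    "Core": {"Project", "Study", "ResearchUser", "Institution", "Laboratory"},
    "Dataset": {"Dataset", "ResearchData", "RawData", "AnalysedData", "Metadata"},
    "DataAnalysisLifecycle": {"DataProcessing", "DataAnalysis", "DataInterpretation"},
    "Experiment": {"Experiment", "Measurement", "Sample", "SamplePreparation",
                   "Instrument", "Equipment"},
}


def complete_ontology(modules):
    """Return the union of all module terms, mimicking an ontology that imports every module."""
    terms = set()
    for module_terms in modules.values():
        terms |= module_terms
    return terms


def modules_defining(term, modules):
    """List the modules that define a term; more than one entry signals an overlap."""
    return sorted(name for name, terms in modules.items() if term in terms)


full = complete_ontology(MODULES)
print(len(full))
print(modules_defining("Measurement", MODULES))
```

A term that shows up in more than one module would be a hint that it belongs in Core.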

Another Use-case example

Researcher Z records dataset A using sample B at research facility C / instrument D under set of method/conditions/parameters E (sometimes gathered in a paper lab book). Researcher Z copies the data to his/her external hard-drive and takes it back to his/her home institute F. Facility C will take care of the long term storage of the raw dataset, archived in location Y. Researcher Z explores the data in a phase of exploratory analyses, writing a set of macros/scripts G, using software H in different versions (exploratory!) with data outputs J (in different versions, accordingly), generating a set of figures K (in different versions, accordingly). Subset L(K) (often also later versions of it) will be copied into a manuscript document that is produced in various versions with input from authors. A late version of the manuscript is then published as paper M.

PS: In some cases (e.g., due to a doubt in the workflow or a question from a colleague), researcher Z or (rarely, e.g., if scientific misconduct is suspected) another person needs to trace back the whole provenance chain from a plot shown in figure L of M, which can be extremely time-consuming.
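As a rough sketch of what such a trace-back could look like once the links are recorded, the snippet below walks a small dependency map from the figure subset back to the sample. The entity names reuse the letters of the use case; the edges are one assumed reading of the story, not recorded provenance.

```python
# Each key depends on the entities listed as its value.
# The edges are an assumed reading of the use case above.
DEPENDS_ON = {
    "paper_M": ["figure_subset_L"],
    "figure_subset_L": ["figures_K"],
    "figures_K": ["data_outputs_J"],
    "data_outputs_J": ["dataset_A", "scripts_G"],
    "dataset_A": ["sample_B"],
}


def trace_back(entity, depends_on):
    """Return all upstream entities of `entity`, nearest first (breadth-first)."""
    seen = []
    queue = list(depends_on.get(entity, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        queue.extend(depends_on.get(current, []))
    return seen


print(trace_back("figure_subset_L", DEPENDS_ON))
```

With explicit links like these, the trace becomes a graph traversal instead of a manual search through lab books and script versions.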

Use case STM workflow

Hi,

I have several questions regarding the STM workflows that I'm currently mapping to the MDMC-NEP ontology.

  1. Just to make sure: Image Selection & Retrievement and Image Labelling Process are the way you get the data ready to be analyzed, which here is the Structured & FAIR Dataset, right?
  2. About the Filtered Image: do you apply image filtering, machine learning, or deep learning to obtain this image?
  3. I still don't quite understand what you mean by the Metadata Selection activity, and why it uses the Structured & FAIR Dataset and generates the Filtered Image. Could you explain it to me again, maybe following the workflow in the paper?

Hope @mpanighel and @EOsmenaj can answer this.

Introduction texts

It would be great if someone could first write the introduction text for the front page of the repository, as it currently looks empty.
Otherwise, we can put it on our meeting agenda to write a README.md.

[Use-case] STM typical Experiment example

Instrument Scientist mounts the Sample Component Ni substrate on a Sample Holder.

The Sample Holder is inserted into the preparation chamber to perform the Sample Preparation in different steps:

  • Sample Component Ni substrate undergoes sputtering step with the Equipment Sputter gun
  • Sample Component Ni substrate undergoes annealing step with the Equipment Heating stage
  • In the dosing step, Sample Component ethylene is dosed on the Ni substrate with the Equipment Gas line to produce graphene.

The Instrument LEED is used to perform a Measurement to check the Sample quality.
The obtained Sample Gr_Ni is placed in the Instrument VT-STM to perform the Measurement.

Study and Project

I would suggest not modelling Study as a subclass of Project because:

  1. Based on the definition, a Project is planned to perform a Study. In the active voice, Study performs Project.
  2. One instance of Project could have, e.g., 10 Research Users as members. If Study were a subclass of Project, all 10 members would also have performed every instance of Study. This contradicts the idea that a Project consists of one or more Studies among which the Research Users are divided, e.g., 5 persons in study_1 and another 5 persons in study_2.
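The second point can be illustrated with a small Python sketch in which a Project contains Studies instead of being their superclass. The class layout, the `add_study` helper, and the rule that performers must be Project members are assumptions made for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    name: str
    performers: set = field(default_factory=set)


@dataclass
class Project:
    name: str
    members: set = field(default_factory=set)
    studies: list = field(default_factory=list)

    def add_study(self, study):
        """Attach a Study; its performers must already be Project members (assumed rule)."""
        outsiders = study.performers - self.members
        if outsiders:
            raise ValueError(f"not project members: {sorted(outsiders)}")
        self.studies.append(study)


members = {f"user_{i}" for i in range(1, 11)}  # 10 hypothetical Research Users
project = Project("project_1", members)
project.add_study(Study("study_1", {f"user_{i}" for i in range(1, 6)}))
project.add_study(Study("study_2", {f"user_{i}" for i in range(6, 11)}))

for study in project.studies:
    print(study.name, len(study.performers))
```

Each Study keeps its own five performers, while the Project still has all ten members; with a subclass relation this split could not be expressed.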

Requirement Analysis: Project

Competency Questions of Project

From #19

  1. Which Research Users are members of the Project?
  2. Which Studies have been attributed to the Project?
  3. Which Experiments have been attributed to the Project?
  4. Which Measurements have been attributed to the Project?
  5. Which Instruments have been attributed to the Project?
  6. Which Samples have been attributed to the Project?
  7. Which Data Analysis Lifecycles have been attributed to the Project?
  8. Which Research Data have been attributed to the Project?
  9. Which Publication Data have been attributed to the Project?

Sub-activity of DataAnalysisLifecycle

On 13.09.2022, we discussed whether to change the property that relates DataAnalysisLifeCycle to its sub-activities DataProcessing, DataAnalysis, and DataInterpretation. At the moment, the use of isMemberOf is ambiguous, because it can be confused with prov:hadMember, which relates a prov:Collection entity to its member entities.

After reading the PROV documents again, one FAQ suggests that we can use dcterms:hasPart to relate an activity, e.g., DataAnalysisLifeCycle, to its sub-activities.

Thus,

`DataAnalysisLifeCycle dcterms:hasPart some DataProcessing`
`DataAnalysisLifeCycle dcterms:hasPart some DataAnalysis`
`DataAnalysisLifeCycle dcterms:hasPart some DataInterpretation`
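A minimal sketch of how these statements keep sub-activities apart from collection membership, using plain Python tuples as triples. The `some` restriction is flattened to simple triples here, and the `Dataset_1` / `RawData_1` instances are made up for the illustration.

```python
# Triples as (subject, predicate, object) tuples.
HAS_PART = "dcterms:hasPart"

triples = [
    ("DataAnalysisLifeCycle", HAS_PART, "DataProcessing"),
    ("DataAnalysisLifeCycle", HAS_PART, "DataAnalysis"),
    ("DataAnalysisLifeCycle", HAS_PART, "DataInterpretation"),
    # prov:hadMember stays reserved for collections of entities (hypothetical instances).
    ("Dataset_1", "prov:hadMember", "RawData_1"),
]


def objects(subject, predicate, graph):
    """Return all objects linked to `subject` through `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


sub_activities = objects("DataAnalysisLifeCycle", HAS_PART, triples)
print(sub_activities)
print(objects("DataAnalysisLifeCycle", "prov:hadMember", triples))
```

Asking for the parts of the lifecycle now returns only activities, and asking for its members returns nothing, which removes the ambiguity described above.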

[Term Discussion] Publication Data

According to the definition of Publication Data

Publication Data is a Research Data intended to appear in a scientific publication.

It is true that Publication Data is a subclass of Research Data.

But the comment that Publication Data is part of Research Data and Analysed Data is somewhat confusing: if Publication Data is a subclass of Analysed Data, then it is inherently also Research Data.

I would suggest a connection between Publication Data and Dataset (disjoint union of Analysed Data, Metadata, and Raw Data). The connection would be:
PublicationData hasDataset Dataset

Requirements Analysis

This issue is reserved for collecting Competence Questions (CQs) as part of the requirements analysis process.

Competence Questions

  1. Which experiments took place at the Lab_1 Laboratory?
  2. Who generated the raw data?
  3. What was the sample component synthesis used during the sample preparation?
  4. During which experiment was the measured sample prepared?
  5. Which equipment was used for sample preparation?
  6. Who generated the metadata?
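To show how such questions could be checked against data, here is a small sketch that answers the questions about where experiments took place and who generated the raw data over a toy list of triples. The PROV-O property names (prov:wasGeneratedBy, prov:wasAssociatedWith, prov:used, prov:atLocation) are real; the instances and the way they are linked are assumptions for the example.

```python
# Toy graph: PROV-O property names are real, the instance names are made up.
graph = [
    ("raw_data_1", "rdf:type", "RawData"),
    ("raw_data_1", "prov:wasGeneratedBy", "measurement_1"),
    ("measurement_1", "prov:wasAssociatedWith", "research_user_1"),
    ("sample_preparation_1", "prov:used", "sputter_gun_1"),
    ("sputter_gun_1", "rdf:type", "Equipment"),
    ("experiment_1", "rdf:type", "Experiment"),
    ("experiment_1", "prov:atLocation", "Lab_1"),
]


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def who_generated(entity):
    """Follow entity -> generating activity -> associated agent."""
    agents = []
    for _, _, activity in match(graph, s=entity, p="prov:wasGeneratedBy"):
        agents += [a for _, _, a in match(graph, s=activity, p="prov:wasAssociatedWith")]
    return agents


def experiments_at(lab):
    """Return the experiments located at the given laboratory."""
    return [s for s, _, _ in match(graph, p="prov:atLocation", o=lab)
            if match(graph, s=s, p="rdf:type", o="Experiment")]


print(who_generated("raw_data_1"))
print(experiments_at("Lab_1"))
```

In practice the same patterns would be written as SPARQL queries against the ontology; the tuple matching here only mirrors their shape.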

Requirement Analysis: Data Analysis Lifecycle

Competence Questions on DAL

From #19

  1. Which Project has the Data Analysis Lifecycle been attributed to?
  2. Which Study is the Data Analysis Lifecycle part of?
  3. Which Research Users have performed the Data Analysis Lifecycle?
  4. Which Research Data have been used for the Data Analysis Lifecycle?
  5. Which Results have been obtained from the Data Analysis Lifecycle?
  6. Which processes have been members of the Data Analysis Lifecycle?
  7. Has Data Processing been a member of the Data Analysis Lifecycle?

2022 TODOs

Todo list for 2022

  1. Survey of available metadata schema, glossary and ontology
  2. We will try to align with the MSLE Ontology; this will fix/expand the Instrument/Equipment part.
  3. We will create Sample Preparation
  4. We will create Measurement
  5. We will create Data Analysis
  6. Check what is available for Sample Description
  7. Check Materials Description -> Crystal Structure into atomic Level (Crystal Structure Ontology)
  8. We will decribe the real Data

TODOs for June 2022.

Requirement Analysis: Institution

Competency Questions

From #19

  1. Which Laboratories are hosted by the Institution?
  2. Which Equipment is available at the Institution?
  3. Which Instruments are available at the Institution?
  4. Which Measurement Techniques are available at the Institution?
  5. Which Experiments have been performed at the Institution?
  6. Which Measurements have been performed at the Institution?
  7. Which Raw Data have been produced at the Institution?

Requirement Analysis: Experiment

Competency Questions of Experiment

From #19

  1. Which Project has the Experiment been attributed to?
  2. Which Study is the Experiment part of?
  3. Which Measurements have been performed in the Experiment?
  4. Which Equipment has been used in the Experiment?
  5. Which Instruments have been used in the Experiment?
  6. Which Measurement Techniques have been used in the Experiment?
  7. Has the Experiment included any Sample Preparations?
  8. Which Samples have been used in the Experiment?
  9. Which Samples have been prepared in the Experiment?
  10. Where has the Experiment been performed?
  11. When has the Experiment been performed?
  12. Which Research Users have performed the Experiment?
  13. Which Raw Data have been produced in the Experiment?

Requirement Analysis: Study

Competence Questions

From #19

  1. Which Project has the Study been attributed to?
  2. Which Research Users have performed the Study?
  3. Which Experiments have been performed in the Study?
  4. Which Measurements have been performed in the Study?
  5. Which Instruments have been used in the Study?
  6. Which Samples have been used in the Study?
  7. Which Data Analysis Lifecycles have been performed in the Study?
  8. Which Research Data have been produced in the Study?
  9. Which Publication Data have been produced in the Study?
  10. At which Institutions has the Study been conducted?

Distribute CQs over PRIMA modules

Distribute the CQs over the PRIMA modules to get an overview of each module. By doing so, we can show users which specific requirements each module is able to answer.
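One simple way to get a first distribution is to match the capitalised terms in each CQ against the terms of each module. The sketch below does that with plain substring matching; the module names and their term lists are assumptions for the illustration, not the actual PRIMA module contents.

```python
# Hypothetical module-to-term mapping; the real PRIMA modules may differ.
MODULE_TERMS = {
    "Core": ["Project", "Study", "Research User"],
    "Dataset": ["Dataset", "Research Data", "Raw Data"],
    "Data Analysis Lifecycle": ["Data Processing", "Data Analysis", "Data Interpretation"],
    "Experiment": ["Experiment", "Measurement", "Sample"],
}

CQS = [
    "Which Studies has the Research User performed?",
    "Which Measurements have been performed in the Experiment?",
    "Which Research Data have been produced in Data Processing?",
]


def modules_for(cq):
    """Return the modules whose terms are mentioned in the competency question."""
    return sorted(module for module, terms in MODULE_TERMS.items()
                  if any(term in cq for term in terms))


for cq in CQS:
    print(modules_for(cq), cq)
```

A CQ that maps to several modules, like the third one, marks a place where modules have to be used together to answer it.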

Semantic Alignment with PMDco

The effort of a semantic alignment between PRIMA and PMDco is worthwhile.

The idea would be to subsume several PRIMA classes under classes in PMDco. Reusing PMDco relationships in PRIMA would also be beneficial for defining semantically sound processes.

Requirement Analysis: Data Analysis

Competency Questions on Data Analysis

From #19

  1. Which Research Data have been used in Data Analysis?
  2. Which Research Data have been produced in Data Analysis?
  3. Which Data Analysis Software has been used in Data Analysis?
  4. Which Research Users have performed the Data Analysis?

[Term Discussion] Study and Experiment

The definitions of Experiment and Study are, respectively:

Experiment is identifiable and reproducible activity with a clear start time and clear finish time conducted by Research User who uses one or more Instruments to investigate or produce Sample and collects Raw Data about it. Experiment consists of (or includes – in case of Sample Preparation) one or a series of Measurements. Experiment can be a computer simulation (computational experiment), or a combination of it with physical Measurements.

Study is a set of one or more experiments performed by one or more Research Users in one or more Laboratories using one or more Instruments for taking one or more Measurements of one or more Samples and corresponding Data analyses, which are part of the same Project.

According to the discussion, Experiment is a subclass of Study, but if we read the definitions above, isn't Study a subclass of Experiment?

If we change Study to a subclass of Experiment, how do we then also account for the Sample Preparation term being a subclass of Experiment?

New terms and definition

  • Fabrication (previously SampleComponentFabrication):
    The production of a Precursor in controlled conditions performed by a commercial enterprise, one or more Research Users or a third party. Fabrication may require the use of Equipment, Consumable(s) and Instrument(s). A Measurement may also be performed during the Fabrication, e.g., to characterize the intermediate and/or final resulting Precursor(s).

  • Precursor (previously SampleComponent):
    Identifiable entity (typically a piece of material) with distinctive properties (structural, chemical, dimensional, functional and others), which is fabricated during the Fabrication and is used during the Sample Preparation to produce a Sample. It may include one or more substrates, layers, masks, evaporation materials, coatings, and molecules. A single Precursor might itself become the only SampleComponent of a Sample in case it undergoes Measurement. E.g. Precursor1 is ethylene and Precursor2 is a Ni substrate. Ethylene reacts with Ni substrate converting into graphene (SampleComponent1). The result of Sample Preparation is a Sample composed of a graphene layer (SampleComponent1) and the same Ni substrate (SampleComponent2).

  • SampleComponent (redefinition):
    Identifiable entity (typically a piece of material) which constitutes a part of a Sample, usually with distinctive properties (structural, chemical, dimensional, functional and others).

  • Sample (redefinition):
    Identifiable entity (typically a piece of material) with distinctive properties (structural, chemical, dimensional, functional and others), composed of one or more Sample Components, exposed to the Instrument during a Measurement, typically after a Sample Preparation. Sample may be held by a Sample Holder and/or carried by a Sample Carrier during the Measurement. Sample may also stand for a model, configuration or input (or any combination of them) of a Computation.

  • Sample Preparation (redefinition):
    Identifiable and reproducible set of actions (physical changes or chemical reactions) typically carried out by one or more Research Users to produce one or more Samples and/or to make the Sample(s) fit to perform a Measurement. The actions may be performed on (or between) one or more Precursors or Sample(s). Sample Preparation may require the use of Equipment, Consumable(s) and Instrument(s). A Measurement may also be performed during the Sample Preparation, e.g., to characterize the intermediate stages and/or the final resulting Sample(s).

  • Consumables (new term):
    Complementary entity used in Fabrication, Sample Preparation or Measurement which has a limited time capacity or is limited in its number of uses before it is disposed of, necessary to the process itself and normally bought from third-party manufacturers. E.g., gloves, syringes, wipes, etching solutions, glass slides, spatulas, weighing paper, two-sided tape, etc.

  • Equipment (redefinition):
    Any kind of item, device, machine or other tool (also virtual) used by one or more Research Users to perform one or more Fabrications, Sample Preparations and/or Measurements. Usually, the Equipment is located in a Laboratory hosted by an Institution and is usually an investment. According to this definition, an Instrument is a particular type of Equipment.

Requirement Analysis: Laboratory

Competency Questions of Laboratory

From #19

  1. By which Institution is the Laboratory hosted?
  2. Which Equipment is available in the Laboratory?
  3. Which Instruments are available in the Laboratory?
  4. Which Measurement Techniques are available in the Laboratory?
  5. Which Experiments have been performed in the Laboratory?
  6. Which Measurements have been performed in the Laboratory?
  7. Which Raw Data have been produced in the Laboratory?

Research User

Research User should have a role, e.g., ResearchUser hasRole DataScientist

Please take a look at the DataCite role list.

[Term Discussion] Measurement and Experiment

I would suggest that Measurement is not a subclass of Experiment because

Experiment is identifiable and reproducible activity with a clear start time and clear finish time conducted by Research User who uses one or more Instruments to investigate or produce Sample and collects Raw Data about it. Experiment consists of (or includes – in case of Sample Preparation) one or a series of Measurements. Experiment can be a computer simulation (computational experiment), or a combination of it with physical Measurements.

  1. Based on the definition, an Experiment consists of one or a series of Measurements, and "consists of" is not a subclass relation. As a triple: Experiment hasMeasurement Measurement
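The difference between "consists of" and "is a subclass of" can be sketched in Python, where the first is a has-a link between two independent classes. The class layout is an assumption for illustration, and the instance names are loosely borrowed from the STM use case.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    name: str


@dataclass
class Experiment:
    name: str
    # "consists of" becomes a has-a link, mirroring Experiment hasMeasurement Measurement.
    has_measurement: list = field(default_factory=list)


# Illustrative instances only.
leed_check = Measurement("LEED quality check")
stm_scan = Measurement("VT-STM scan")
experiment = Experiment("STM experiment", [leed_check, stm_scan])

# A Measurement is contained in an Experiment, but it is not a kind of Experiment.
print(isinstance(leed_check, Experiment))
print([m.name for m in experiment.has_measurement])
```

If Measurement were declared as a subclass instead, every Measurement would itself count as an Experiment, which the definition does not say.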

[Discussion] Ontology name

Please think about a fancy name for the ontology,
with these points in mind:

  • We would like to use this as a core ontology metadata that could be extended to some particular experiment
  • It consists of organization, experiment activity, data
  • We implemented the provenance data according to PROV-O

Requirement Analysis: Data Interpretation

Competency Questions on Data Interpretation

From #19

  1. Has Data Interpretation been a member of the Data Analysis Lifecycle?
  2. Which Research Data have been used in Data Interpretation?
  3. Which Reference Data have been used in Data Interpretation?
  4. Which Software has been used in Data Interpretation?
  5. Which Research Users have performed the Data Interpretation?
  6. Which Results have been obtained from the Data Interpretation?

Sample Carrier

Hi, I came across this while editing the ontology.

Is the Sample Carrier equipment? Can we say that Sample Carrier is a subclass of Equipment rather than of Entity?

MatVoc

MatVoc core ontology (part of the STREAM project) has some terms which can be somehow mapped to our ontology.
@MehrdadJalali-KIT has the intention to align with the MatVoc in the MSLE ontology.

However, MatVoc does not follow PROV, as we do.
So we are aware of MatVoc, but we don't directly align with it because we use PROV.

As suggested by @az-ihsan, we will decide how to justify the alignment depending on when the publications are ready.

Requirement Analysis: Measurement

Competency Questions of Measurement

From #19

  1. Which Project has the Measurement been attributed to?
  2. Which Study is the Measurement part of?
  3. Which Experiment is the Measurement part of?
  4. Which Equipment has been used in the Measurement?
  5. Which Instrument has been used in the Measurement?
  6. Which Measurement Technique has been used in the Measurement?
  7. Which Samples have been used in the Measurement?
  8. Which Samples have been prepared for the Measurement?
  9. Which Sample Carrier has been used in the Measurement?
  10. Which Raw Data have been produced in the Measurement?

[Use case] ELN Herbie collaboration

We will map the data that comes from the ELN Herbie.
The data consists of material casting and extrusion.

PRIMA will be used to annotate and harmonize ontologies developed in ELN Herbie.

Data Analysis Lifecycle

In general, the steps of the Data Analysis Lifecycle can be chained in different orders, depending on the use case.
It's difficult for the ontology to keep track of all the possible cases.
Either we try to keep it more general (having "Data Analysis Lifecycle steps" and "intermediate results") or we try to cover the most common cases, just to ensure that something is not wrong.
For instance:

data processing:

  • input: raw data or processed data or analysed data
  • output: processed data
  • using: data processing software

data analysis:

  • input: raw data or processed data or analysed data
  • output: analysed data
  • using: data analysis software

data interpretation:

  • input: raw data or processed data or analysed data
  • output: results, conclusions, inference, ... and possibly analysed data

This can be simplified (checking the appropriate definitions):

data processing:

  • input: research data
  • output: research data
  • using: data analysis software

data analysis:

  • input: research data
  • output: research data
  • using: data analysis software

data interpretation:

  • input: research data
  • output: results, conclusions, inference, ... and possibly analysed data

With this second case, what we have is that the Data Analysis Lifecycle has:

  • input: research data
  • output: results, conclusions, inference, ... and possibly analysed data
  • using: data analysis software

independently of which steps are performed and in which order.
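The order-independence of the simplified variant can be checked with a short Python sketch. It uses the step signatures from the second listing, treats the lifecycle output as the union of everything the steps produce (an assumption), and runs every possible order of the three steps.

```python
from itertools import permutations

# Simplified step signatures: (required inputs, produced outputs).
STEPS = {
    "data processing": ({"research data"}, {"research data"}),
    "data analysis": ({"research data"}, {"research data"}),
    "data interpretation": ({"research data"},
                            {"results", "conclusions", "inference", "analysed data"}),
}


def run_chain(order, available=frozenset({"research data"})):
    """Walk the steps in `order`; return everything produced, or raise if a step lacks input."""
    available = set(available)
    produced = set()
    for name in order:
        needed, outputs = STEPS[name]
        if not needed <= available:
            raise ValueError(f"{name} is missing {needed - available}")
        available |= outputs
        produced |= outputs
    return produced


# Collect the distinct overall outputs across all six possible orders.
signatures = {frozenset(run_chain(order)) for order in permutations(STEPS)}
print(len(signatures))
```

Every order is a valid chain and yields the same overall output set, so a single lifecycle-level signature is enough in this simplified model.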
