Suggested definition: Reference Data is Data which is not produced during the current Study and is used as a reference to compare with and/or validate the output of the Study, typically during the Data Analysis Lifecycle.
According to the glossary, Data Analysis includes Data Processing, Correlative Characterization, and Data Interpretation.
In the STM use case, "Image Selection and Retrievement", "Image labelling process" and "Metadata Selection" are included in Data Processing.
Could Data Analysis be a superclass of Data Processing and Correlative Characterization?
BUT Data Interpretation must stay outside the superclass, as it does NOT generate analysed data (only interpretation).
To check: does Correlative Characterization use ONLY raw data, or can it also use analysed data?
If it also uses analysed data, it cannot be placed under the Data Analysis superclass.
Dataset (from glossary): collection of scientifically related Research Data which can be Raw Data, Analysed Data, or other Datasets, each described by their related Metadata. The components of a Dataset remain individually identifiable within the Dataset.
According to the current definition, Instrument is part of Equipment.
But an Instrument is a "special" part of the Equipment: the one that makes it possible to perform the actual Measurement.
What is missing is a formal distinction between the Instrument and all the rest of the (mounted) Equipment.
Suggestion: we could use and discuss Mehrdad's ontology.
At the moment the ontology is one big chunk of interconnected terms. It would be nice to modularize this big ontology into several modules. I would propose to split the complete ontology into:
Core
Dataset
Data Analysis Lifecycle
Experiment
Later on, the complete ontology would be the one that imports all four modules.
The idea behind the modularization is to have a list of modules, each of which can also serve as a stand-alone ontology; e.g., the Experiment ontology is specialised to describe/annotate the provenance information of an experiment.
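The proposed modularization could be sketched with owl:imports. The module IRIs below are placeholders, not the actual project namespaces:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Hypothetical module IRIs; the real namespaces are up to the project.
<http://example.org/mdmc/complete> a owl:Ontology ;
    owl:imports <http://example.org/mdmc/core> ,
                <http://example.org/mdmc/dataset> ,
                <http://example.org/mdmc/data-analysis-lifecycle> ,
                <http://example.org/mdmc/experiment> .

# Each module can also stand alone; e.g., the Experiment module
# might only need the Core module:
<http://example.org/mdmc/experiment> a owl:Ontology ;
    owl:imports <http://example.org/mdmc/core> .
```

With this layout, a user who only needs to annotate experiment provenance can load the Experiment module alone instead of the complete ontology.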
Researcher Z records dataset A using sample B at research facility C / instrument D under set of method/conditions/parameters E (sometimes gathered in a paper lab book). Researcher Z copies the data to his/her external hard-drive and takes it back to his/her home institute F. Facility C will take care of the long term storage of the raw dataset, archived in location Y. Researcher Z explores the data in a phase of exploratory analyses, writing a set of macros/scripts G, using software H in different versions (exploratory!) with data outputs J (in different versions, accordingly), generating a set of figures K (in different versions, accordingly). Subset L(K) (often also later versions of it) will be copied into a manuscript document that is produced in various versions with input from authors. A late version of the manuscript is then published as paper M.
PS: In some cases (e.g. due to a doubt about the workflow or a question from a colleague) researcher Z, or (rarely, e.g. in case scientific misconduct is suspected) another person, needs to trace back the whole provenance chain from a plot shown in figure L of M, which can be extremely time-consuming.
I have several questions regarding the STM workflows that I'm currently mapping to the MDMC-NEP ontology.
Just to make sure: are Image Selection & Retrievement and Image Labelling Process the way you get the data ready to be analyzed, i.e., what is called the Structured & FAIR Dataset here?
About the Filtered Image: do you do some image filtering, machine learning, or deep learning to produce this image?
I still don't quite understand what you mean by the Metadata Selection activity, and why it uses the Structured & FAIR Dataset and generates the Filtered Image. Could you explain it to me again, maybe following the workflow in the paper?
It would be great if someone could first write the introduction text for the front page of the repository, as it currently looks empty.
Otherwise, we can put generating a README.md on our meeting agenda.
The Instrument Scientist mounts the Sample Component Ni substrate on a Sample Holder.
The Sample Holder is inserted into the preparation chamber to perform the Sample Preparation in different steps:
The Sample Component Ni substrate undergoes a sputtering step with the Equipment Sputter gun.
The Sample Component Ni substrate undergoes an annealing step with the Equipment Heating stage.
In the dosing step, Sample Component ethylene is dosed on the Ni substrate with the Equipment Gas line to produce graphene.
The Instrument LEED is used to perform a Measurement to check the Sample quality.
The obtained Sample Gr_Ni is placed in the Instrument VT-STM to perform the Measurement.
I would suggest not making the Study term a subclass of Project, because:
Based on the definition, a Project is planned in order to perform a Study; this is a performs/part-of relation, not a subclass relation.
One instance of Project could have, e.g., 10 Research User members. If Study were a subclass of Project, then every instance of Study would also carry all 10 members. This would contradict a Project consisting of one or more Studies with the Research Users divided among them, e.g., 5 Persons in study_1 and another 5 Persons in study_2.
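The alternative modelling could be a part-of style property instead of a subclass axiom. A minimal sketch, with illustrative property names and a placeholder namespace:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/mdmc#> .

ex:Project a owl:Class .
ex:Study   a owl:Class .   # deliberately NOT rdfs:subClassOf ex:Project

# "consists of at least one or more Studies" as an object property:
ex:hasStudy a owl:ObjectProperty ;
    rdfs:domain ex:Project ;
    rdfs:range  ex:Study .

# One Project with its Research Users split over two Studies:
ex:project1 a ex:Project ;
    ex:hasStudy ex:study_1 , ex:study_2 .
```

This way, 5 Persons can take part in ex:study_1 and another 5 in ex:study_2 without every Study instance inheriting all Project members.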
On 13.09.2022, we had a discussion on whether to change the property relating DataAnalysisLifeCycle to its sub-activities DataProcessing, DataAnalysis, and DataInterpretation. At the moment the usage of isMemberOf is ambiguous with 'prov:hadMember', which relates an entity Collection to its entity members.
After reading the PROV documents again, one FAQ suggests that we can use dcterms:hasPart to relate an activity, e.g., DataAnalysisLifeCycle, to its sub-activities.
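The suggested switch could look like this; class and instance names follow the discussion, while the ex: namespace is a placeholder:

```turtle
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/mdmc#> .

# DataAnalysisLifeCycle is an activity composed of sub-activities,
# so dcterms:hasPart fits; prov:hadMember stays reserved for relating
# a prov:Collection to its member entities.
ex:lifecycle1 a ex:DataAnalysisLifeCycle , prov:Activity ;
    dcterms:hasPart ex:processing1 , ex:analysis1 , ex:interpretation1 .

ex:processing1     a ex:DataProcessing , prov:Activity .
ex:analysis1       a ex:DataAnalysis , prov:Activity .
ex:interpretation1 a ex:DataInterpretation , prov:Activity .
```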
Publication Data is a Research Data intended to appear in a scientific publication.
It is true that Publication Data is a subclass of Research Data.
But the comment that Publication Data is part of both Research Data and Analysed Data is somewhat confusing: if Publication Data is a subclass of Analysed Data, then it is inherently also Research Data.
I would suggest a connection between Publication Data and Dataset (the disjoint union of Analysed Data, Metadata, and Raw Data). The connection would be: PublicationData hasDataset Dataset.
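A sketch of this suggestion, assuming the class names from the glossary and a placeholder namespace:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/mdmc#> .

# Keep only the uncontroversial subclass axiom:
ex:PublicationData rdfs:subClassOf ex:ResearchData .

# Replace the confusing second subclass axiom with a property:
ex:hasDataset a owl:ObjectProperty ;
    rdfs:domain ex:PublicationData ;
    rdfs:range  ex:Dataset .

# Dataset as the disjoint union of its three kinds (OWL 2):
ex:Dataset owl:disjointUnionOf ( ex:AnalysedData ex:Metadata ex:RawData ) .
```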
Distribute the CQs over the PRIMA modules in order to get an overview of each module. By doing so, we can show users the ability of each module to answer specific requirements.
The effort of doing a semantic alignment between PRIMA and PMDco is worth taking.
The idea would be to subsume several PMDco classes under PRIMA classes. Also, reusing PMDco relationships in PRIMA would be beneficial for defining semantically sound processes.
Per the definitions of Experiment and Study, respectively:
Experiment is an identifiable and reproducible activity with a clear start time and a clear finish time, conducted by a Research User who uses one or more Instruments to investigate or produce a Sample and collects Raw Data about it. An Experiment consists of (or includes, in the case of Sample Preparation) one or a series of Measurements. An Experiment can be a computer simulation (computational experiment), or a combination of a simulation with physical Measurements.
Study is a set of one or more experiments performed by one or more Research Users in one or more Laboratories using one or more Instruments for taking one or more Measurements of one or more Samples and corresponding Data analyses, which are part of the same Project.
According to the discussion, Experiment is a subclass of Study; but if we read the definitions above, isn't Study rather a subclass of Experiment?
If we change Study to a subclass of Experiment, how do we then also justify Sample Preparation being a subclass of Experiment?
Fabrication (previously SampleComponentFabrication):
The production of a Precursor in controlled conditions performed by a commercial enterprise, one or more Research Users or a third party. Fabrication may require the use of Equipment, Consumable(s) and Instrument(s). A Measurement may also be performed during the Fabrication, e.g., to characterize the intermediate and/or final resulting Precursor(s).
Precursor (previously SampleComponent):
Identifiable entity (typically a piece of material) with distinctive properties (structural, chemical, dimensional, functional and others), which is fabricated during the Fabrication and is used during the Sample Preparation to produce a Sample. It may include one or more substrates, layers, masks, evaporation materials, coatings, and molecules. A single Precursor might itself become the only SampleComponent of a Sample in case it undergoes Measurement. E.g. Precursor1 is ethylene and Precursor2 is a Ni substrate. Ethylene reacts with Ni substrate converting into graphene (SampleComponent1). The result of Sample Preparation is a Sample composed of a graphene layer (SampleComponent1) and the same Ni substrate (SampleComponent2).
SampleComponent (redefinition):
Identifiable entity (typically a piece of material) which constitutes a part of a Sample, usually with distinctive properties (structural, chemical, dimensional, functional and others).
Sample (redefinition):
Identifiable entity (typically a piece of material) with distinctive properties (structural, chemical, dimensional, functional and others), composed of one or more Sample Components, exposed to the Instrument during a Measurement, typically after a Sample Preparation. A Sample may be held by a Sample Holder and/or carried by a Sample Carrier during the Measurement. A Sample may also stand for a model, configuration or input (or any combination of them) of a Computation.
Sample Preparation (redefinition):
Identifiable and reproducible set of actions (physical changes or chemical reactions) typically carried out by one or more Research Users to produce one or more Samples and/or to make the Sample(s) fit to perform a Measurement. The actions may be performed on (or between) one or more Precursors or Sample(s). Sample Preparation may require the use of Equipment, Consumable(s) and Instrument(s). A Measurement may also be performed during the Sample Preparation, e.g., to characterize the intermediate stages and/or the final resulting Sample(s).
Consumables (new term):
Complementary entity used in Fabrication, Sample Preparation or Measurement which has a limited time capacity or is limited in its number of uses before it is disposed of, is necessary to the process itself, and is normally bought from third-party manufacturers. E.g. gloves, syringes, wipes, etching solutions, glass slides, spatulas, weighing paper, two-sided tape, etc.
Equipment (redefinition):
Any kind of item, device, machine or other tool (also virtual) used by one or more Research Users to perform one or more Fabrication, Sample Preparations and/or Measurements. Usually, the Equipment is located in a Laboratory hosted by an Institution and is usually an investment. According to this definition, an Instrument is a particular type of Equipment.
I would suggest that Measurement is not a subclass of Experiment, because:
Experiment is an identifiable and reproducible activity with a clear start time and a clear finish time, conducted by a Research User who uses one or more Instruments to investigate or produce a Sample and collects Raw Data about it. An Experiment consists of (or includes, in the case of Sample Preparation) one or a series of Measurements. An Experiment can be a computer simulation (computational experiment), or a combination of a simulation with physical Measurements.
Based on the definition, an Experiment consists of one or a series of Measurements, and "consists of" is not a subclass relation. As a triple: Experiment hasMeasurement Measurement.
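The suggested triple could be axiomatized as follows; the hasMeasurement property name comes from the discussion above, the ex: namespace is a placeholder:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/mdmc#> .

ex:Experiment  a owl:Class .
ex:Measurement a owl:Class .   # deliberately NOT rdfs:subClassOf ex:Experiment

# "consists of" modelled as an object property, not a subclass axiom:
ex:hasMeasurement a owl:ObjectProperty ;
    rdfs:domain ex:Experiment ;
    rdfs:range  ex:Measurement .
```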
The MatVoc core ontology (part of the STREAM project) has some terms which can somehow be mapped to our ontology. @MehrdadJalali-KIT has the intention to align the MSLE ontology with MatVoc.
However, MatVoc does not follow PROV, as we do.
So we are aware of MatVoc, but we do not align with it directly because we use PROV.
As suggested by @az-ihsan, we will decide how to argue for the alignment depending on when the publications are ready.
In the general sense, the steps of the Data Analysis Lifecycle can be chained in different orders, depending on the use case.
It is difficult for the ontology to keep track of all the possible cases.
Either we keep it more general (having "Data Analysis Lifecycle steps" and "intermediate results"), or we try to cover the most common cases, just to ensure that nothing is modelled incorrectly.
For instance:
data processing:
input: raw data or processed data or analysed data
output: processed data
using: data processing software
data analysis:
input: raw data or processed data or analysed data
output: analysed data
using: data analysis software
data interpretation:
input: raw data or processed data or analysed data
output: results, conclusions, inference, ... and possibly analysed data
This can be simplified (checking the appropriate definitions):
data processing:
input: research data
output: research data
using: data processing software
data analysis:
input: research data
output: research data
using: data analysis software
data interpretation:
input: research data
output: results, conclusions, inference, ... and possibly analysed data
With this second case, what we have is that the Data Analysis Lifecycle has:
input: research data
output: results, conclusions, inference, ... and possibly analysed data
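The simplified pattern above could be sketched at the instance level with the standard PROV usage/generation relations; all instance names below are hypothetical:

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/mdmc#> .

# Input of the lifecycle: research data of any kind.
ex:researchData1 a ex:ResearchData , prov:Entity .

# Each step uses Research Data and generates Research Data ...
ex:processing1 a ex:DataProcessing , prov:Activity ;
    prov:used ex:researchData1 .
ex:processedData1 a ex:ResearchData , prov:Entity ;
    prov:wasGeneratedBy ex:processing1 .

# ... except Data Interpretation, which generates the results:
ex:interpretation1 a ex:DataInterpretation , prov:Activity ;
    prov:used ex:processedData1 .
ex:results1 a ex:Results , prov:Entity ;
    prov:wasGeneratedBy ex:interpretation1 .
```

Because every step reads and writes Research Data, the steps can be chained in any order the use case requires, and only the interpretation step closes the chain with results.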