Dear @open-sci/playarists,
Please find below my comments on all your material. You have to address all of them and, once finalised, close this issue with a comment containing your reply to each of the points I have highlighted. There is no specific deadline to complete this task; thus, please take your time.
Please be aware that modifications to some documents may also require modifications to other documents. As a final note, please remember to keep your notebooks up-to-date.
After closing this issue, please remember to update your material.md file by specifying the references to the new version of all your documents.
As usual, for further doubts, do not hesitate to contact me in the Slack channel or just comment on this issue here.
General comments from the presentation
The following comments and questions should be addressed and may also result in the modification of some of the material prepared for the project:
- Did you have the chance to compare the US vs Europe, also in terms of the kinds of articles?
- You said that, considering the countries of journals, the UK wins. How might mega-journals (i.e. journals that handle a lot of different disciplines) affect this result? Is that the case in your data?
- It would be interesting to compare at least with Colavizza et al. (2022) to look at the coverage of digital humanities in OpenCitations Meta, mixing the data from that article with those you obtained.
- How would you solve the lack of information in ERIH-PLUS? Can you suggest a possible strategy to address it?
DMP
The document (PDF) of the DMP says "Version 0" but in the metadata, it is "Version 3". Please correct.
Why did you use the Horizon 2020 template instead of the Horizon Europe one?
Title: SSH_OA_Publications_in_OCMeta
- 1.1.3 What are the formats of the described generated/collected data? Did you use any plain text (TXT), HTML, XML, PDF/A in your data? Did you also use JSON?
- 2.1.1 Are you re-using the described data and how? I did not understand what you said here. Can you please rephrase?
- 3.1.1.3 Will your metadata use standardised vocabularies? You answered yes, but earlier you replied that you did not use metadata. Which one is correct?
- 3.1.1.5 Will you make the metadata available free-of-charge? Where? I did not see any RDF serialisation format anywhere, honestly.
- 3.1.1.9 Will you provide clear version numbers for your data? You said so, but then you are not doing it (e.g. you used Version 1 etc. instead of Version 1.0.0).
- 3.1.1.15 Will you use standardised formats for the described data? Did you also use JSON?
- 3.1.2.3 How will the data be made available? What is the project website? Please specify it.
- 3.1.4.4 What internationally recognised licence(s) will you use for the described data? But it seems that the data have been published with CC-BY...
- 3.1.4.7 Describe the data quality assurance processes. Which formats? Which data models and standards?
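To illustrate point 3.1.1.9 only: a minimal sketch of how you could check that a version label carries a full three-part number (e.g. Version 1.0.0 rather than Version 1). The function name, the accepted prefixes and the MAJOR.MINOR.PATCH pattern are assumptions of mine for illustration; they are not part of your software or of the DMP template.

```python
import re

# Assumption: a label may start with "Version " or "v" (case-insensitive).
_PREFIX = re.compile(r"^(version\s+|v)", re.IGNORECASE)
# Assumption: a "clear" version number is three dot-separated integers
# without leading zeros, i.e. MAJOR.MINOR.PATCH.
_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def has_full_version_number(label: str) -> bool:
    """Return True if the label, without its optional prefix, is MAJOR.MINOR.PATCH."""
    number = _PREFIX.sub("", label.strip(), count=1)
    return _CORE.fullmatch(number) is not None


if __name__ == "__main__":
    # Hypothetical labels, mirroring the ones mentioned in the comment above.
    for label in ["Version 1", "Version 1.0.0", "v2.1.0", "Version 3"]:
        print(label, "->", has_full_version_number(label))
```

A check of this kind could be run over the version labels used in the DMP, on Zenodo and in the README, so that all of them follow the same scheme.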
Title: Data management software
Some of the points raised above also apply here. In addition:
1.1.3 What are the formats of the described generated/collected data? "Software" is not a format.
3.1.1.22 Will you provide metadata describing the quality of the data? It is not clear how you have used it. Do you have a report about it somewhere? The link to the report should be added to the DMP.
3.1.2.3 How will the data be made available? What is "Repository of Archive"?
3.1.4.7 Describe the data quality assurance processes. You should be a bit more precise here. It is not clear how you will do it.
3.1.4.8 Will you provide any support for data reuse? Where is the link to the notebook?
Protocol
There are a lot of references to classes and methods in Python that, while fine to use, do not help the understanding of the protocol. While excerpts of code can be used in the definition of the protocol, it is important to explain them and, in particular, every step they perform. Please update the protocol so that a reader can understand how to run the experiment without necessarily consulting external material. Explain the procedures to follow, and limit links to the code to where they are necessary.
Software
The section "Naming convention of Datasets" in the README of the software seems to be missing some information. Can you please update it?
Data
The description of the data on Zenodo should also detail the format used for data (column names, their meaning, etc.).
Article
There are no authors specified! Please add all of them, with appropriate affiliations.
Abstract: please use the same structured abstract already developed in the paper. The Abstract section is usually not numbered.
Introduction: it should include at least the research questions, and it should also contain, at the very end, how the rest of the paper is actually structured (e.g. "The rest of the paper is structured as follows. In Section 2, ...").
Materials and Methods: Remove my name as an author of your software - you wrote it, not me. Properly cite the data you have reused (Meta, COCI, etc.) by creating appropriate bibliographic references for them (pointing to the right version, e.g., on Figshare). Saying "we downloaded Meta" and then adding the link to opencitations.net/meta is not correct.
Results: The figures should have a proper caption describing the graph. This should be true for all figures in the whole text.
Discussion: Please add a subsection here that properly lists all the limitations.
Conclusions: this section is missing; it should also sketch some future work.
References: if you use references (which I agree with), please cite them properly in the text using APA style rather than footnotes.