Git Product home page Git Product logo

gmp-featurizer's People

Contributors

bdice avatar josephmontoya-tri avatar raylei-tri avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Forkers

bdice

gmp-featurizer's Issues

[Paper] Review comments

Hi @RayLei-TRI! I am reviewing the JOSS paper submitted here: openjournals/joss-reviews#5476

I am opening this issue to track my comments on the paper, as I complete the reviewer checklist. I am writing this as a checklist if you'd like to check things off as they are addressed.

License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

  • Your repository is Apache 2.0 licensed, but it says it is based on code from https://github.com/ulissigroup/amptorch which is GPL-3.0 licensed. Have you checked whether your code's licensing is correct? I am not a lawyer (nor am I able to audit your software) but in my understanding, GPL software cannot be relicensed under a "permissive" license like Apache 2.0 without the permission of all contributors.

Contribution and authorship: Has the submitting author (@RayLei-TRI) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

  • I don't see contributions from Joseph Montoya listed in the software or cited theory papers. Has Joseph agreed to be a coauthor? Maybe a small acknowledgement of the coauthors' contributions could be helpful. Sometimes folks use "CRediT" for a short summary of contributions. See also: https://joss.readthedocs.io/en/latest/submitting.html#authorship I don't have strong feelings, as long as this paper complies with JOSS expectations.

Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?

  • I see a sentence in the Overview section: "Please refer to section ?? for more details." The reference seems to be broken here.
  • "Although this package does not explicitly depend on ASE [...]". I would try to write this more like "Two of the most widely used Python libraries in chemistry and materials science are ASE (cite) and pymatgen (cite). The gmp-featurizer library provides APIs that can read atomic systems as input from these libraries, though they are not required as dependencies."
    • Bonus: You might also mention what that means for users who need to read files with their system geometry. I have used a similar design in another analysis package, where file I/O was not explicitly part of the package but accepting inputs from other libraries (like ASE or pymatgen) allowed the package to leverage other libraries' I/O functionality to get data into a format accepted by the package we wrote. It's a nice benefit of the interoperability!
  • Capitalize the language name "Python" unless referring to the executable command python.

References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

  • Please check the citations are rendering properly. There are at least two I saw, marked "Ray?" and "collins_accelerated_2017", that appear to be incorrect.

Installation: Does installation proceed as outlined in the documentation?

  • The command python install -e . in the README is incorrect. I think it's meant to say pip install -e ..

Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).

  • I think the README and/or Jupyter Notebooks need to go farther to be useful. I was able to run the notebooks but didn't know what to do with the results. Where do the inputs come from? Is the pseudopotential file a standardized input format? Can you give more information about how that can be generated? Once features are computed, can a simple machine learning model be trained, or can the features be visualized? How are users supposed to make use of the HDF5 files that are produced in the "cache" directory? Can examples of these use cases be presented?

Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?

Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

  • Deferring to issue #1 for this, since it has been requested already.

[Paper] State of the field

Quoting the summary section of the paper: " Starting from the GMP feature computation module from AmpTorch [@amptorch], the capability of GMP-Featurizer has since been greatly improved, including its accuracy and efficiency, as well as the ability to parallelize on different cores, even machines. Moreover, this python package only has very few dependencies that are all standard python libraries, plus cffi for C++ code interfacing and Ray [@ray] for parallelization, making it lightweight and robust. "

Would you mind explaining a bit more regarding the accuracy and efficiency of GMP-featurizer compared to the GMP feature computation module from AmpTorch? For example, computing speedup by *.*x times.

This is to meet the following review criteria:

  • State of the field: Do the authors describe how this software compares to other commonly-used packages?

Documentation for .gpsp files

I would add some documentation for creating new .gpsp files. It seems to be sensitive to the format in a non obvious way.

[REVIEW comments] for the paper and code submitted to JOSS

Hello, @RayLei-TRI and @JosephMontoya-TRI sorry for my late review report of the manuscript submitted to Journal of Open Source Software openjournals/joss-reviews#5476

Here are my minor comments which you may consider during revision

  • I personally think the requirements.txt is not complete. To test the code, I built a new virtual environment and installed the code following the instructions. When testing the examples, the code reminded me that I needed to install ase, numpy, and so on. I realized some of these libraries were included in the requirements.txt of earlier versions, could authors explain why did they remove it? To make the code ready completely, I would suggest including all the necessary and major libraries in the requirements.txt

  • In the manuscript, when referring to the "cffi" library, it may be more appropriate to use "Cffi" or "CFFI" to distinguish it as a specific terminology. Upon reading the summary at the beginning, I mistakenly assumed it was a typographical error.

  • The reference of ray in the manuscript was not formatted properly (e.g. "Ray?" appears in the current summary).

  • I noticed that CIF files taken from ICSD couldn't been read by ase_read. However, if I convert the CIF from ICSD to a CIF from VESTA software. It works. I know this issue doesn't from GMP-featurizer itself, but you may consider raising a warning if read of CIF fails. Note that I only test one CIF from ICSD, and you may cross-test it for confirmation.

  • In the section Overview of manuscript, "Please refer to section ??" should be fixed.

  • Authors are invited to create a webpage to document the tutorials and manuals.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.