
amptorch's People

Contributors

bencomer, ericmusa, mattaadams, ml-evs, mshuaibii, nicoleyghu, ray38, raylei-tri, ruiqic


amptorch's Issues

Force Training

I'm trying to train a model for an alloy system, and I'm mainly interested in getting the forces accurate. While training with amptorch I am facing the following issue: as the energy MAE decreases, the force MAE starts increasing, even though the total loss is decreasing. I'm attaching the output convergence file.

out.log

I tried increasing the force_coefficient, changing the learning rate, and also increasing the number of BP descriptors - none of these helped. Any advice would be great.

Thanks
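For reference, a minimal sketch of where force_coefficient and the learning rate sit in the trainer configuration; the "optim" section layout is assumed from the amptorch usage documentation, and the values are placeholders rather than recommendations:

# Not amptorch's full config; only the loss-weighting knobs mentioned above are shown,
# and the "optim" section layout is an assumption based on the usage docs.
config_optim = {
    "optim": {
        "force_coefficient": 0.2,  # relative weight of the force term in the loss
        "lr": 1e-3,
        "epochs": 500,
    },
}
print(config_optim)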

fingerprint parallelization

The current FP parallelization works properly within any given script. However, if multiple scripts that require the construction of FPs run in parallel, an error arises because simple_nn currently stores its pickles under generic names ('data1', etc.). The simple_nn code needs to be modified to accommodate this.
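A hypothetical illustration of that fix, giving each run its own scratch directory so the generically named pickles cannot collide (none of these names are existing simple_nn or amptorch options):

import os
import tempfile
import uuid

# Hypothetical sketch: a per-run scratch directory/prefix keeps concurrent scripts from
# overwriting each other's "data1", "data2", ... pickles.
scratch_dir = tempfile.mkdtemp(prefix=f"simplenn_fps_{uuid.uuid4().hex[:8]}_")
pickle_path = os.path.join(scratch_dir, "data1")
print(pickle_path)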

[JOSS] Issue with examples - inhomogeneous shape

I tried to set up and run the examples as part of my JOSS review. Most of them fail with the same, almost identical, error.

I'm running on cpu with the conda env provided on Ubuntu 22.04.
Python 3.9 + skorch 0.10.0

1_GMP

  • 1 (inhomogeneous shape error)
  • 2 (inhomogeneous shape error)
  • 3 (inhomogeneous shape error)

2_SF

  • 1 (inhomogeneous shape error)
  • 2 (inhomogeneous shape error)

3_lmdb

  • 1 🆗
  • 2 (inhomogeneous shape error)
  • 3 (inhomogeneous shape error)
  • 4 (inhomogeneous shape error)

4_misc

  • custom_descriptor_example (inhomogeneous shape error)
  • get_fp_example 🆗


[JOSS] Community Guidelines

Feedback from Joss Review

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

The community guidelines on this repository are minimal and could be improved.

I recommend you create a CONTRIBUTING file (GitHub Help Page). There are many template CONTRIBUTING files out there that you could lean on, and since the file is a GitHub convention it may get picked up by 3rd-party tools; potential contributors may also expect to see it there.

Contribution to the software
Currently the README.md only describes the technical mechanism of "fork and pull". There is no direction on what sort of contributions are welcome, how PR descriptions should be written, whether tests/examples/documentation should be included with contributions, or any other steps you would expect. These things may seem obvious, but if unstated, may not be observed.

Report issues or problems with the software / Seek support
Currently the README.md suggests only to file issues in the Issues tab of the repository.
There is no information on how a bug should be reported, if feature requests are accepted, etc.
You may find setting up some issue templates useful to ensure that some basic information is provided (e.g. you probably want users to supply a crash log if they're filing a bug)

Torch version issues

This was a known PyTorch version issue last year, but it was never written down permanently (only in Slack messages that have since been automatically deleted).

`(amptorch) [jparas7@login-ice-2 amptorch]$ python -m amptorch.tests.training_test
Traceback (most recent call last):
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/home/hice1/jparas7/amptorch/amptorch/__init__.py", line 2, in <module>
    from .trainer import AtomsTrainer
  File "/home/hice1/jparas7/amptorch/amptorch/trainer.py", line 17, in <module>
    from amptorch.dataset import AtomsDataset, DataCollater, construct_descriptor
  File "/home/hice1/jparas7/amptorch/amptorch/dataset.py", line 2, in <module>
    from torch_geometric.data import Batch
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_geometric/__init__.py", line 1, in <module>
    import torch_geometric.utils
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_geometric/utils/__init__.py", line 3, in <module>
    from .scatter import scatter
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_geometric/utils/scatter.py", line 7, in <module>
    import torch_geometric.typing
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_geometric/typing.py", line 37, in <module>
    import torch_sparse  # noqa
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_sparse/__init__.py", line 40, in <module>
    from .tensor import SparseTensor  # noqa
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_sparse/tensor.py", line 13, in <module>
    class SparseTensor(object):
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch/jit/_script.py", line 1294, in script
    _compile_and_register_class(obj, _rcb, qualified_name)
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch/jit/_recursive.py", line 44, in _compile_and_register_class
    script_class = torch._C._jit_script_class_compile(qualified_name, ast, defaults, rcb)
RuntimeError:
object has no attribute sparse_csc_tensor:
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_sparse/tensor.py", line 585
            value = torch.ones(self.nnz(), dtype=dtype, device=self.device())

        return torch.sparse_csc_tensor(colptr, row, value, self.sizes())
               ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE`
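A small diagnostic sketch for the missing attribute in the traceback above; it only confirms the torch/torch_sparse mismatch and is not a fix:

import torch

# The installed torch_sparse build expects torch.sparse_csc_tensor, which older
# PyTorch releases do not provide.
if not hasattr(torch, "sparse_csc_tensor"):
    print(f"torch {torch.__version__} has no sparse_csc_tensor; "
          "install a torch_sparse build compiled against this torch version.")
else:
    print("torch_sparse's sparse_csc_tensor requirement is satisfied.")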

Sparse tensor dtype

The dtype of the sparse tensors for fp_primes seems to always be float32; it doesn't change when setting cmd:dtype to other types. There doesn't seem to be a very quick fix, so I want to create an issue here as a reference.
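A toy sketch of the behaviour being reported, using a hand-built sparse tensor; this does not touch amptorch's internals, and fp_primes is just an illustrative name here:

import torch

# A sparse tensor keeps the dtype it was built with, so honouring cmd:dtype would
# require an explicit cast wherever the sparse primes are constructed.
indices = torch.tensor([[0, 1], [1, 0]])
values = torch.tensor([1.0, 2.0], dtype=torch.float32)
fp_primes = torch.sparse_coo_tensor(indices, values, (2, 2))
print(fp_primes.dtype)                       # torch.float32
fp_primes_64 = fp_primes.to(torch.float64)   # explicit cast
print(fp_primes_64.dtype)                    # torch.float64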

[JOSS review] Code feedback

Hi @ajmedford and other authors! Apologies for the slightly slow uptake on my review (openjournals/joss-reviews#5035) -- I have now been able to install the package and try the examples and have enough to give a first round of feedback.

  • I think it would be very helpful if the config passed to AtomsTrainer used in many of the examples could be wrapped into a class (perhaps a dataclass). The structure of the class is basically already provided in https://amptorch.readthedocs.io/en/latest/usage.html but it is currently very easy to make a mistake or get lost amongst default values. This should also aid development going forwards.
  • The usage documentation should explain the relevant classes involved and requirements for e2e training and evaluating a potential. Currently much of this data is stuffed into comments in the large config code snippet.
  • Overall the examples could be better motivated (why you might want to do such a thing, step by step), both in the online documentation and in the code itself. Currently they showcase the novel features of Amptorch (GMP, lmdb, etc.), but I felt a basic tutorial for just training the "default" network was missing. This is kind of provided on the "usage" page, but it takes some work to piece it together in its current state. It would also be nice if the examples generated some figures so you can actually see what is happening; executing the code snippets currently provides little feedback.
  • If this package was used in the several linked preprints about GMP, singleNN etc. then perhaps code examples to generate some of the (simpler?) results from these papers would be appropriate, if at all possible.
  • The code itself has no docstrings describing the functionality or parameters of any of the classes or functions (that I could find). API documentation is somewhat of a hard requirement for JOSS.
  • The GMP example should be made simpler by specifying the path to the already-provided pseudodensity files:
    path_to_psp = "<path>/pseudodensity_psp/"
    # path to the GMP pseudopotential (.g) files
    # please copy the "pseudodensity_psp" folder to somewhere and edit the path to it here
    as e.g., Path(__file__).parent / "pseudodensity_psp" (see the sketch after this list).
  • With a default installation I was not able to get all of the tests to pass (consistency_test.py fails, as well as several in test_script.py, in both CPU and GPU mode). Could you perhaps provide some documentation for executing the tests, in case it is my set up that is wrong? (test failures hidden below)
  • The last code release was in July 2021 with the name "initial release". The code should be released in its complete state (which it seems to be!) before publication in JOSS. Any of the suggestions above could then be implemented in a future released version.
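As referenced in the GMP item above, a minimal sketch of resolving the pseudodensity path relative to the example script itself (assuming the "pseudodensity_psp" folder ships next to the example, as it appears to):

from pathlib import Path

# Resolve the pseudodensity files relative to the example script instead of asking the
# user to copy and edit a hard-coded path.
path_to_psp = Path(__file__).parent / "pseudodensity_psp"
print(path_to_psp)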
Test failures
=============================================================== test session starts ================================================================
platform linux -- Python 3.9.16, pytest-7.2.1, pluggy-1.0.0 -- /home/mevans/.local/conda/envs/amptorch/bin/python
cachedir: .pytest_cache
rootdir: /home/mevans/src/amptorch
collected 23 items

amptorch/tests/consistency_test.py::test_energy_force_consistency FAILED                    [  4%]
amptorch/tests/cp_uncertainty_calibration_test.py::test_cp_uncertainty_calibration PASSED   [  8%]
amptorch/tests/cutoff_funcs_test.py::test_cutoff_funcs PASSED                               [ 13%]
amptorch/tests/gaussian_descriptor_set_test.py::test_gaussian_descriptor_set PASSED         [ 17%]
amptorch/tests/pretrained_test.py::test_pretrained PASSED                                   [ 21%]
amptorch/tests/pretrained_test.py::test_pretrained_no_config PASSED                         [ 26%]
amptorch/tests/test_script.py::test_cutoff_funcs PASSED                                     [ 30%]
amptorch/tests/test_script.py::test_gaussian_descriptor_set PASSED                          [ 34%]
amptorch/tests/test_script.py::test_pretrained PASSED                                       [ 39%]
amptorch/tests/test_script.py::test_pretrained_no_config PASSED                             [ 43%]
amptorch/tests/test_script.py::test_lmdb_pretrained PASSED                                  [ 47%]
amptorch/tests/test_script.py::test_lmdb_pretrained_no_config PASSED                        [ 52%]
amptorch/tests/test_script.py::test_training PASSED                                         [ 56%]
amptorch/tests/test_script.py::test_training_gmp PASSED                                     [ 60%]
amptorch/tests/test_script.py::test_cp_uncertainty_calibration FAILED                       [ 65%]
amptorch/tests/test_script.py::TestMethods::test_cosine_and_polynomial_cutoff_funcs PASSED  [ 69%]
amptorch/tests/test_script.py::TestMethods::test_gds PASSED                                 [ 73%]
amptorch/tests/test_script.py::TestMethods::test_load_retrain PASSED                        [ 78%]
amptorch/tests/test_script.py::TestMethods::test_load_retrain_lmdb PASSED                   [ 82%]
amptorch/tests/test_script.py::TestMethods::test_training_scenarios PASSED                  [ 86%]
amptorch/tests/test_script.py::TestMethods::test_training_scenarios_gmp PASSED              [ 91%]
amptorch/tests/test_script.py::TestMethods::test_uncertainty_cp FAILED                      [ 95%]
amptorch/tests/training_test.py::test_training PASSED                                       [100%]

===================================================================== FAILURES =====================================================================

Error in prediction when setting debug to true

Dear Developers and Users,

When I set the debug to true in the config, the training worked well but an error occurred during the prediction.

    100              0.0202              0.0297        0.0007            0.0119            0.0148        0.0001     +  0.0163
Training completed in 1.969996690750122s
Traceback (most recent call last):
  File "./train_example.py", line 87, in <module>
    predictions = trainer.predict(images)
  File "/users/amptorch/trainer.py", line 245, in predict
    self.descriptor = construct_descriptor(self.config["dataset"]["descriptor"])
KeyError: 'descriptor'

After checking the source code, I found the following lines gave rise to this issue.

self.config["dataset"]["descriptor"] = descriptor_setup

I don't understand why self.config["dataset"]["descriptor"] is only set when not self.debug. Shouldn't this be set regardless of whether debug is on?

Many thanks,
Jiayan
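A self-contained toy sketch of the fix being asked about, i.e. always setting the descriptor key; the class and values below are hypothetical and do not reproduce amptorch's trainer.py:

# Toy reproduction of the behaviour described above (hypothetical names):
# the "descriptor" key is only written when debug is off, so predict() later KeyErrors.
class ToyTrainer:
    def __init__(self, config, debug):
        self.config = config
        self.debug = debug
        self.config["dataset"]["descriptor"] = ("gaussian", {}, {})  # suggested: always set
        if not self.debug:
            pass  # keep only genuinely debug-gated work (e.g., saving run artifacts) here

    def predict(self):
        return self.config["dataset"]["descriptor"]  # no KeyError even with debug=True

print(ToyTrainer({"dataset": {}}, debug=True).predict())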

Parallelization in fingerprint generation does not seem to work

When running the calculation as in the examples, fingerprints are generated sequentially on a single processing core. Attempts to preprocess with AtomsData(..., cores=n) do not change the situation. The only option found so far is to use multiprocessing.Pool with structures submitted to AtomsData one by one (lists of length 1). This seems to generate the FP files but then errors out with

MaybeEncodingError: Error sending result: '[<amptorch.dataset.AtomsDataset object at 0x7f63509476a0>]'. Reason: 'TypeError("can't pickle _cffi_backend.__CDataOwn objects",)'

I am sorry for the unorganized report, but this is my first time. I will be glad to submit additional information as needed.

Questions about the NLL calculation

I'm confused by the code implementing the NLL method. The NLL method assumes that the errors follow a Gaussian distribution and that the variance is linearly correlated with d. However, in the code, the NLL is computed as "nll = -np.sum(stats.norm.logpdf(calib_y, loc=0, scale=s1 + calib_dist * s2))", with the scale being s1 + calib_dist * s2.

If I understand it correctly, this means the standard deviation, not the variance, is linear in d, since the scale corresponds to the standard deviation, which is the square root of the variance. Does this conflict with the assumption that the variance is linearly correlated with d?

Thank you and look forward to your answer!
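A toy numeric illustration of the quoted objective; all values below are made up and this is not the calibration code itself:

import numpy as np
from scipy import stats

# scale is the standard deviation, so modelling it as s1 + s2 * d makes the *variance*
# (s1 + s2 * d)**2, which is the point of the question above.
rng = np.random.default_rng(0)
calib_dist = rng.uniform(0.0, 1.0, size=100)     # stand-in for the distance d
s1, s2 = 0.05, 0.30                              # stand-in calibration parameters
calib_y = rng.normal(0.0, s1 + s2 * calib_dist)  # residuals with sd linear in d

nll = -np.sum(stats.norm.logpdf(calib_y, loc=0, scale=s1 + calib_dist * s2))
print(nll)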

[JOSS REVIEW] Support for Python>3.6

Hi there, I'll be reviewing this repo for JOSS over at openjournals/joss-reviews#5035.

I'll raise a few issues in this repo as I go -- I'll try to keep issues in this repo focused on actionable code changes, and will instead provide paper feedback etc. in the main review thread, so please keep an eye on both.

First up, I am trying to install the package with conda (22.11) following the instructions in the README. For the env_cpu.yml, conda hangs deep in the dependency solver. Switching to mamba seems to work fine (it uses a different dependency solver), and the following environment is created:

env_cpu_mamba.yml:
name: amptorch_mamba
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - alsa-lib=1.2.7.2=h166bdaf_0
  - appdirs=1.4.4=pyh9f0ad1d_0
  - ase=3.21.1=pyhd8ed1ab_0
  - blas=1.0=mkl
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.18.1=h7f98852_0
  - ca-certificates=2022.12.7=ha878542_0
  - cached-property=1.5.2=hd8ed1ab_1
  - cached_property=1.5.2=pyha770c72_1
  - certifi=2021.5.30=py36h5fab9bb_0
  - cfgv=3.3.1=pyhd8ed1ab_0
  - click=8.0.1=py36h5fab9bb_0
  - cpuonly=2.0=0
  - cudatoolkit=11.1.1=ha002fc5_11
  - cycler=0.11.0=pyhd8ed1ab_0
  - dataclasses=0.8=pyh787bdff_2
  - dbus=1.13.6=h5008d03_3
  - distlib=0.3.6=pyhd8ed1ab_0
  - editdistance-s=1.0.0=py36h605e78d_1
  - expat=2.5.0=h27087fc_0
  - filelock=3.4.1=pyhd8ed1ab_0
  - flask=2.0.3=pyhd8ed1ab_0
  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  - font-ttf-inconsolata=3.000=h77eed37_0
  - font-ttf-source-code-pro=2.038=h77eed37_0
  - font-ttf-ubuntu=0.83=hab24e00_0
  - fontconfig=2.14.1=hc2a2eb6_0
  - fonts-conda-ecosystem=1=0
  - fonts-conda-forge=1=0
  - freetype=2.12.1=hca18f0e_1
  - gettext=0.21.1=h27087fc_0
  - glib=2.74.1=h6239696_1
  - glib-tools=2.74.1=h6239696_1
  - gst-plugins-base=1.20.3=h57caac4_2
  - gstreamer=1.20.3=hd4edc92_2
  - h5py=3.1.0=nompi_py36hc1bc4f5_100
  - hdf5=1.10.6=nompi_h6a2412b_1114
  - icu=69.1=h9c3ff4c_0
  - identify=2.3.7=pyhd8ed1ab_0
  - importlib-metadata=4.8.1=py36h5fab9bb_0
  - importlib_metadata=4.8.1=hd8ed1ab_1
  - importlib_resources=5.4.0=pyhd8ed1ab_0
  - intel-openmp=2022.1.0=h9e868ea_3769
  - itsdangerous=2.0.1=pyhd8ed1ab_0
  - jinja2=3.0.3=pyhd8ed1ab_0
  - jpeg=9e=h166bdaf_2
  - keyutils=1.6.1=h166bdaf_0
  - kiwisolver=1.3.1=py36h605e78d_1
  - krb5=1.20.1=hf9c8cef_0
  - lcms2=2.12=hddcbb42_0
  - ld_impl_linux-64=2.39=hcc3a1bd_1
  - lerc=3.0=h9c3ff4c_0
  - libblas=3.9.0=16_linux64_mkl
  - libcblas=3.9.0=16_linux64_mkl
  - libclang=13.0.1=default_hc23dcda_0
  - libcurl=7.87.0=h6312ad2_0
  - libdeflate=1.10=h7f98852_0
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libevent=2.1.10=h9b69904_4
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=12.2.0=h65d4601_19
  - libgfortran-ng=12.2.0=h69a702a_19
  - libgfortran5=12.2.0=h337968e_19
  - libglib=2.74.1=h606061b_1
  - libgomp=12.2.0=h65d4601_19
  - libiconv=1.17=h166bdaf_0
  - liblapack=3.9.0=16_linux64_mkl
  - libllvm13=13.0.1=hf817b99_2
  - libnghttp2=1.47.0=hdcd2b5c_1
  - libnsl=2.0.0=h7f98852_0
  - libogg=1.3.4=h7f98852_1
  - libopus=1.3.1=h7f98852_1
  - libpng=1.6.39=h753d276_0
  - libpq=14.5=h2baec63_4
  - libsqlite=3.40.0=h753d276_0
  - libssh2=1.10.0=haa6b8db_3
  - libstdcxx-ng=12.2.0=h46fd767_19
  - libtiff=4.3.0=h0fcbabc_4
  - libuuid=2.32.1=h7f98852_1000
  - libuv=1.44.2=h166bdaf_0
  - libvorbis=1.3.7=h9c3ff4c_0
  - libwebp-base=1.2.4=h166bdaf_0
  - libxcb=1.13=h7f98852_1004
  - libxkbcommon=1.0.3=he3ba5ed_0
  - libxml2=2.9.12=h885dcf4_1
  - libzlib=1.2.13=h166bdaf_4
  - markupsafe=2.0.1=py36h8f6f2f9_0
  - matplotlib=3.3.4=py36h5fab9bb_0
  - matplotlib-base=3.3.4=py36hd391965_0
  - mkl=2022.1.0=hc2b9512_224
  - mysql-common=8.0.31=haf5c9bc_0
  - mysql-libs=8.0.31=h28c427c_0
  - ncurses=6.3=h27087fc_1
  - ninja=1.11.0=h924138e_0
  - nodeenv=1.6.0=pyhd8ed1ab_0
  - nspr=4.35=h27087fc_0
  - nss=3.82=he02c5a1_0
  - numpy=1.19.5=py36hfc0c790_2
  - olefile=0.46=pyh9f0ad1d_1
  - openjpeg=2.5.0=h7d73246_0
  - openssl=1.1.1s=h0b41bf4_1
  - pcre2=10.40=hc3806b6_0
  - pillow=8.3.2=py36h676a545_0
  - pip=21.3.1=pyhd8ed1ab_0
  - pre-commit=2.2.0=py36h9f0ad1d_1
  - pthread-stubs=0.4=h36c2ea0_1001
  - pycparser=2.21=pyhd8ed1ab_0
  - pykdtree=1.3.4=py36ha112f06_0
  - pyparsing=3.0.9=pyhd8ed1ab_0
  - pyqt=5.12.3=py36h5fab9bb_7
  - pyqt-impl=5.12.3=py36h7ec31b9_7
  - pyqt5-sip=4.19.18=py36hc4f0c31_7
  - pyqtchart=5.12=py36h7ec31b9_7
  - pyqtwebengine=5.12.1=py36h7ec31b9_7
  - python=3.6.15=hb7a2778_0_cpython
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python_abi=3.6=2_cp36m
  - pytorch=1.9.0=py3.6_cuda11.1_cudnn8.0.5_0
  - pytorch-mutex=1.0=cpu
  - pyyaml=5.4.1=py36h8f6f2f9_1
  - qt=5.12.9=h1304e3e_6
  - readline=8.1.2=h0f457ee_0
  - scipy=1.5.3=py36h81d768a_1
  - setuptools=58.0.4=py36h5fab9bb_2
  - six=1.16.0=pyh6c4a22f_0
  - sqlite=3.40.0=h4ff8645_0
  - tk=8.6.12=h27826a3_0
  - toml=0.10.2=pyhd8ed1ab_0
  - tornado=6.1=py36h8f6f2f9_1
  - tqdm=4.45.0=pyh9f0ad1d_1
  - typing_extensions=4.1.1=pyha770c72_0
  - virtualenv=20.4.7=py36h5fab9bb_0
  - werkzeug=2.0.2=pyhd8ed1ab_0
  - wheel=0.37.1=pyhd8ed1ab_0
  - xorg-libxau=1.0.9=h7f98852_0
  - xorg-libxdmcp=1.1.3=h7f98852_0
  - xz=5.2.6=h166bdaf_0
  - yaml=0.2.5=h7f98852_2
  - zipp=3.6.0=pyhd8ed1ab_0
  - zlib=1.2.13=h166bdaf_4
  - zstd=1.5.2=h6239696_4
  - pip:
      - cffi==1.15.1
      - charset-normalizer==2.0.12
      - decorator==4.4.2
      - docker-pycreds==0.4.0
      - gitdb==4.0.9
      - gitpython==3.1.18
      - googledrivedownloader==0.4
      - idna==3.4
      - isodate==0.6.1
      - joblib==1.1.1
      - lmdb==1.0.0
      - networkx==2.5.1
      - pandas==1.1.5
      - pathtools==0.1.2
      - promise==2.3
      - protobuf==3.19.6
      - psutil==5.9.4
      - pytz==2022.7
      - rdflib==5.0.0
      - requests==2.27.1
      - scikit-learn==0.24.2
      - sentry-sdk==1.12.1
      - setproctitle==1.2.3
      - shortuuid==1.0.11
      - skorch==0.10.0
      - smmap==5.0.0
      - tabulate==0.8.10
      - threadpoolctl==3.1.0
      - torch-cluster==1.5.9
      - torch-geometric==2.0.3
      - torch-scatter==2.0.9
      - torch-sparse==0.6.12
      - torch-spline-conv==1.2.1
      - urllib3==1.26.13
      - wandb==0.13.7
      - yacs==0.1.8

I wonder if perhaps more tightly-pinned environment files could be provided (or mamba suggested) so that this problem can be circumvented?

Hashing of variable symmetry functions

Original AMP hashes the fingerprints of structures with no information about the symmetry functions, so changing the number of symmetry functions does not trigger a recalculation of fingerprints; instead the saved fingerprints, which correspond to a different set of symmetry functions, are reused.

Deleting the previous fingerprints/primes solves this but is not a realistic solution. We need to incorporate the symmetry-function information into the hash as well.
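A hypothetical sketch of that fix, making the symmetry-function definition part of the cache key (function and argument names here are illustrative, not AMP/amptorch API):

import hashlib
import json

def fingerprint_cache_key(structure_hash: str, symmetry_params: dict) -> str:
    # Including the Gs in the key means changing them forces a recalculation.
    payload = json.dumps({"structure": structure_hash, "Gs": symmetry_params}, sort_keys=True)
    return hashlib.sha1(payload.encode()).hexdigest()

print(fingerprint_cache_key("abc123", {"G2": {"etas": [0.05, 4.0]}}))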

[JOSS] Confirmation of authorship

Hi, I'm starting to go through the JOSS review checklist (openjournals/joss-reviews#5035).

First thing I notice is that I don't see the submitter's username (@ajmedford) in any commits to this repository.
There are 8 contributors, but 12 authors credited on the paper.

Also, could you confirm for me that @mshuaibii and @nicoleyghu are intended to be equal contributors? (@mshuaibii appears to have an order of magnitude more contributions than @nicoleyghu - 480 commits vs 16 - and those commits look a bit more substantial too.)

Non-code contributions are fine as per the JOSS authorship guidelines:

Purely financial (such as being named on an award) and organizational (such as general supervision of a research group) contributions are not considered sufficient for co-authorship of JOSS submissions, but active project direction and other forms of non-code contributions are. The authors themselves assume responsibility for deciding who should be credited with co-authorship, and co-authors must always agree to be listed. In addition, co-authors agree to be accountable for all aspects of the work, and to notify JOSS if any retraction or correction of mistakes are needed after publication.

but, I feel it would be remiss of me to not confirm that it is as intended.

Duplicate

Sorry, not sure what happened here, GitHub duplicated my other issue when I edited the title...

ASE 3.20

The ASE 3.19 requirement is now out of date and will become a limitation with calculators that change frequently (e.g., some of the tight-binding codes require 3.20).

Can we move to 3.20 or will that break something?

Memory issues while loading large dataset

I am trying to load a large dataset for training and running out of memory while converting the ASE atoms collection to Data objects. Is there a way to generate the descriptors, create the Data objects, and compute the feature scaling separately, and then load them for training?

Mutable default arguments and potential bugs in the future

There are instances where mutable default arguments (empty lists/dicts) are used in function/method definitions, such as in amptorch.trainer.AtomsTrainer.__init__, which has a config default value of an empty dict. You should never use mutable default arguments in function definitions, since every call to that function that uses the default argument will share that same list/dict, which can cause some weird bugs.

The safe alternative is to set the default value to None, check if the argument's value is None, and then assign it accordingly.

For the specific __init__ example, I think a better alternative would just be to get rid of the default value entirely in order to enforce config as a required argument. Not sure where else this issue pops up, but it should be easy to find all instances of '=[]' and '={}' using grep. This is potentially an issue in other medford-group repos as well.
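A self-contained illustration of the shared-default pitfall and the safer pattern described above (not amptorch code):

class Example:
    # Unsafe: the same dict object is shared by every call that relies on the default.
    def __init__(self, config={}):
        self.config = config

class SaferExample:
    # Safe alternative: default to None and assign inside, or require config outright.
    def __init__(self, config=None):
        self.config = {} if config is None else config

a, b = Example(), Example()
a.config["key"] = "surprise"
print(b.config)  # {'key': 'surprise'} -- b sees a's mutation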

[JOSS] Paper feedback

Feedback from JOSS review

Overall, the paper is well written, I appreciate being able to understand it without too much prior knowledge. I have a few comments though to be addressed:

  • line 27: "~10^6+" - The claim is unclear. This should be either "~10^6" OR "10^6+". The training routine can scale to approximately that many points, or it can scale to more than that amount?

  • line 50: You should probably cite lmdb

  • line 54: the acronym UQ is not expanded.

  • You make claims that AMPTorch can support larger numbers of data points than the base AMP and other existing codes (lines 37-38). It would be helpful to name some examples other than AMP and state their limits for comparison, to highlight the impact of this work versus the state of the art.

examples do not work

I installed the latest commit of amptorch with all the dependencies in env_gpu.yml. I can import amptorch but I cannot run the example scripts. I wonder if the updated dependency versions are causing these examples to break.

custom_descriptor_example.py:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-3ad6331c597d> in <module>()
     38     descriptor_setup=("gaussian", Gs, elements),
     39     forcetraining=False,
---> 40     save_fps=True,
     41 )
     42 

/amptorch/lib/python3.6/site-packages/amptorch/dataset.py in __init__(self, images, descriptor_setup, forcetraining, save_fps, scaling, cores, process)
     26         self.forcetraining = forcetraining
     27         self.scaling = scaling
---> 28         self.descriptor = construct_descriptor(descriptor_setup)
     29 
     30         self.a2d = AtomsToData(

/amptorch/lib/python3.6/site-packages/amptorch/dataset.py in construct_descriptor(descriptor_setup)
     91 
     92 def construct_descriptor(descriptor_setup):
---> 93     fp_scheme, fp_params, cutoff_params, elements = descriptor_setup
     94     if fp_scheme == "gaussian":
     95         descriptor = Gaussian(Gs=fp_params, elements=elements, **cutoff_params)

ValueError: not enough values to unpack (expected 4, got 3)
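Reading the unpacking line in the traceback above, construct_descriptor expects a 4-tuple while the example passes three values; a guessed sketch of the intended shape with placeholder contents (the empty cutoff_params dict is an assumption, not a verified fix):

Gs = {}                      # stand-in for the symmetry-function definition
elements = ["Cu", "C", "O"]  # stand-in element list
descriptor_setup = ("gaussian", Gs, {}, elements)
fp_scheme, fp_params, cutoff_params, elements = descriptor_setup  # now unpacks cleanly
print(fp_scheme, elements)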

train_example.py:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-7ab00cf4fa51> in <module>()
     73 torch.set_num_threads(1)
     74 trainer = AtomsTrainer(config)
---> 75 trainer.train()
     76 
     77 predictions = trainer.predict(images)

/amptorch/lib/python3.6/site-packages/amptorch/trainer.py in train(self, raw_data)
    233 
    234         stime = time.time()
--> 235         self.net.fit(self.train_dataset, None)
    236         elapsed_time = time.time() - stime
    237         print(f"Training completed in {elapsed_time}s")

/amptorch/lib/python3.6/site-packages/skorch/regressor.py in fit(self, X, y, **fit_params)
     89         # this is actually a pylint bug:
     90         # https://github.com/PyCQA/pylint/issues/1085
---> 91         return super(NeuralNetRegressor, self).fit(X, y, **fit_params)

/amptorch/lib/python3.6/site-packages/skorch/net.py in fit(self, X, y, **fit_params)
    901             self.initialize()
    902 
--> 903         self.partial_fit(X, y, **fit_params)
    904         return self
    905 

/amptorch/lib/python3.6/site-packages/skorch/net.py in partial_fit(self, X, y, classes, **fit_params)
    860         self.notify('on_train_begin', X=X, y=y)
    861         try:
--> 862             self.fit_loop(X, y, **fit_params)
    863         except KeyboardInterrupt:
    864             pass

/amptorch/lib/python3.6/site-packages/skorch/net.py in fit_loop(self, X, y, epochs, **fit_params)
    774 
    775             self.run_single_epoch(dataset_train, training=True, prefix="train",
--> 776                                   step_fn=self.train_step, **fit_params)
    777 
    778             if dataset_valid is not None:

/amptorch/lib/python3.6/site-packages/skorch/net.py in run_single_epoch(self, dataset, training, prefix, step_fn, **fit_params)
    806 
    807         batch_count = 0
--> 808         for data in self.get_iterator(dataset, training=training):
    809             Xi, yi = unpack_data(data)
    810             yi_res = yi if not is_placeholder_y else None

/amptorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    361 
    362     def __next__(self):
--> 363         data = self._next_data()
    364         self._num_yielded += 1
    365         if self._dataset_kind == _DatasetKind.Iterable and \

/amptorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    403         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    404         if self._pin_memory:
--> 405             data = _utils.pin_memory.pin_memory(data)
    406         return data
    407 

/amptorch/lib/python3.6/site-packages/torch/utils/data/_utils/pin_memory.py in pin_memory(data)
     53         return type(data)(*(pin_memory(sample) for sample in data))
     54     elif isinstance(data, container_abcs.Sequence):
---> 55         return [pin_memory(sample) for sample in data]
     56     elif hasattr(data, "pin_memory"):
     57         return data.pin_memory()

/amptorch/lib/python3.6/site-packages/torch/utils/data/_utils/pin_memory.py in <listcomp>(.0)
     53         return type(data)(*(pin_memory(sample) for sample in data))
     54     elif isinstance(data, container_abcs.Sequence):
---> 55         return [pin_memory(sample) for sample in data]
     56     elif hasattr(data, "pin_memory"):
     57         return data.pin_memory()

/amptorch/lib/python3.6/site-packages/torch/utils/data/_utils/pin_memory.py in pin_memory(data)
     53         return type(data)(*(pin_memory(sample) for sample in data))
     54     elif isinstance(data, container_abcs.Sequence):
---> 55         return [pin_memory(sample) for sample in data]
     56     elif hasattr(data, "pin_memory"):
     57         return data.pin_memory()

/amptorch/lib/python3.6/site-packages/torch/utils/data/_utils/pin_memory.py in <listcomp>(.0)
     53         return type(data)(*(pin_memory(sample) for sample in data))
     54     elif isinstance(data, container_abcs.Sequence):
---> 55         return [pin_memory(sample) for sample in data]
     56     elif hasattr(data, "pin_memory"):
     57         return data.pin_memory()

/amptorch/lib/python3.6/site-packages/torch/utils/data/_utils/pin_memory.py in pin_memory(data)
     55         return [pin_memory(sample) for sample in data]
     56     elif hasattr(data, "pin_memory"):
---> 57         return data.pin_memory()
     58     else:
     59         return data

/amptorch/lib/python3.6/site-packages/torch_geometric/data/data.py in pin_memory(self, *keys)
    363         If :obj:`*keys` is not given, the conversion is applied to all present
    364         attributes."""
--> 365         return self.apply(lambda x: x.pin_memory(), *keys)
    366 
    367     def debug(self):

/amptorch/lib/python3.6/site-packages/torch_geometric/data/data.py in apply(self, func, *keys)
    324         """
    325         for key, item in self(*keys):
--> 326             self[key] = self.__apply__(item, func)
    327         return self
    328 

/amptorch/lib/python3.6/site-packages/torch_geometric/data/data.py in __apply__(self, item, func)
    303     def __apply__(self, item, func):
    304         if torch.is_tensor(item):
--> 305             return func(item)
    306         elif isinstance(item, SparseTensor):
    307             # Not all apply methods are supported for `SparseTensor`, e.g.,

/amptorch/lib/python3.6/site-packages/torch_geometric/data/data.py in <lambda>(x)
    363         If :obj:`*keys` is not given, the conversion is applied to all present
    364         attributes."""
--> 365         return self.apply(lambda x: x.pin_memory(), *keys)
    366 
    367     def debug(self):

RuntimeError: cannot pin 'torch.sparse.FloatTensor' only dense CPU tensors can be pinned

The torch-related versions are listed below:

amptorch                      0.1
torch                         1.6.0
torch-cluster                 1.5.9
torch-geometric               1.7.0
torch-scatter                 2.0.6
torch-sparse                  0.6.9
torch-spline-conv             1.2.1

How do you correctly restart training?

I am trying to restart training using pre-trained weights from a checkpoint. I load them using the load_pretrained function, but the training seems to start from the beginning. What am I doing wrong?


Fingerprinting

The current fingerprinting scheme assumes pbc = True; there are crashes when using amptorch in cases where periodic boundary conditions do not apply.
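A sketch of the failure mode and a hypothetical workaround, placing a non-periodic structure in a large vacuum cell and marking it periodic until non-PBC systems are supported (whether this is physically acceptable depends on the system):

from ase.build import molecule

atoms = molecule("H2O")
print(atoms.pbc)           # [False False False] -- the case that currently crashes
atoms.center(vacuum=10.0)  # embed in a large box
atoms.pbc = True           # treat it as periodic so the fingerprinting assumption holds
print(atoms.pbc, atoms.cell.lengths())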

torch.cat() bottleneck

An alternative to the torch.cat call in factorize_data is needed. Performance suffers for large systems as a result.
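A generic illustration of why repeated torch.cat calls in a loop are slow; this is not factorize_data itself, just the pattern:

import torch

# Calling torch.cat inside a loop re-copies the accumulated tensor every iteration,
# which is quadratic in the total size; collecting chunks and concatenating once avoids it.
chunks = [torch.randn(1000, 3) for _ in range(500)]

out = torch.empty(0, 3)
for chunk in chunks:          # slow: repeated reallocation and copy
    out = torch.cat([out, chunk])

out_fast = torch.cat(chunks)  # fast: a single allocation and copy
assert torch.equal(out, out_fast)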

Tasks

  • Energy/Force Training
  • AMP-like Logging
  • Parallel customization
  • Data preprocessing optimization (PyTorch doesn't support sparse data loading)
  • GPU optimization
  • Speed up reorganization

Force prediction

Force training MAE is 0.0016 from the neural net output, but the force error of the prediction for a trained image is far from that. Please see the attached figures: the first shows the true forces and the second the predicted forces. I have also copied part of my code here.
from amptorch.trainer import AtomsTrainer
from ase.io import read
import numpy as np

trainer = AtomsTrainer()
trainer.load_pretrained(checkpoint_path='./checkpoints/2021-01-28-16-50-03-test')
image_test = read(image_train[5])
predictions = trainer.predict([image_test])
pred_force = np.array(predictions["forces"])
true_force = image_test.get_forces()
print('true forces: ', true_force)
print('pred forces: ', pred_force)

Attached figures: true_force, pred_force

Update Required for PyTorch and CUDA Versions to Support NVIDIA H100 GPUs and Resolve Dependency Issues

Are there plans to update the PyTorch and CUDA versions for this installation to the latest releases, such as PyTorch 2.3.1 and CUDA 11.8? My research would greatly benefit from using the NVIDIA H100 GPUs provided by my institution, which are not supported by the current repository versions.

Additionally, compiling dependencies like torch-scatter on Windows requires outdated compilers, such as VS 2015 SDK, leading to issues on newer systems. Using the latest versions of these dependencies resolves compatibility and compilation issues but causes errors in amptorch due to moved/renamed functions and incorrect data types.

I have attempted to fix some of these errors, but doing so is challenging and requires deep insight into the amptorch code.

Any feedback would be much appreciated!

[Edit]: Since the recent release of Skorch version 1.0.0, I highly recommend upgrading to this stable version instead of using the beta 0.10.0

Incompatible with the old version of simple-nn

Thank you for your great code. However, do you still maintain it?

I am trying to install Simple-NN; however, I get:

INFO: pip is looking at multiple versions of simple-nn to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement tensorflow<2.0,>=1.6 (from simple-nn) (from versions: 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0rc0, 2.6.0rc1, 2.6.0rc2, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.7.1, 2.7.2, 2.7.3, 2.7.4, 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.8.4, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1, 2.9.2, 2.9.3, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3, 2.10.0, 2.10.1, 2.11.0rc0, 2.11.0rc1, 2.11.0rc2, 2.11.0, 2.11.1, 2.12.0rc0, 2.12.0rc1, 2.12.0, 2.12.1, 2.13.0rc0, 2.13.0rc1, 2.13.0rc2, 2.13.0, 2.14.0rc0, 2.14.0rc1)
ERROR: No matching distribution found for tensorflow<2.0,>=1.6

I guess tensorflow versions satisfying tensorflow<2.0,>=1.6 are no longer available.

Could you please take a look at it?

Thank you

Unable to use training dataset with images of varying numbers of atoms

I am attempting to train a model using images collected from an MD run of a bare slab and an MD run of a supported cluster on the same slab. When the fingerprints are scaled by the FeatureScaler, a RuntimeError is thrown saying that the sizes of the fingerprint tensors must be the same; it indicates that atoms in the bare-slab images have fewer fingerprints than those in the supported-cluster images (see error message below). ACSF descriptors for elements not present in an image are not calculated, leading to this mismatch in fingerprint sizes between images.

My current workaround is swapping the elements of some of the slab atoms in the fixed bottom layer with those that appear in the supported cluster. This is a messy fix, but it allows training a model with images of varying compositions. I think padding the fingerprints to fill in the uncalculated descriptors would be the best solution (a toy sketch of the padding idea follows below the attached logs).

fingerprint_mismatch_error.log
fingerprint_mismatch_script.log
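As mentioned above, a toy illustration of the padding idea; the shapes and descriptor counts are made up, and this is not how FeatureScaler works internally:

import torch
import torch.nn.functional as F

# Pad per-atom fingerprint vectors to a common length so images with different element
# sets can share one tensor shape.
fps_bare_slab = torch.randn(48, 20)   # 48 atoms, 20 descriptors (one element present)
fps_cluster = torch.randn(53, 36)     # 53 atoms, 36 descriptors (two elements present)

n_max = max(fps_bare_slab.shape[1], fps_cluster.shape[1])
padded = [F.pad(fp, (0, n_max - fp.shape[1])) for fp in (fps_bare_slab, fps_cluster)]
print([p.shape for p in padded])      # both now have 36 feature columns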

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.

Detected dependencies

circleci
.circleci/config.yml
  • circleci/python 3.7
pip_setup
setup.py
  • cffi >=1.0.0

  • Check this box to trigger a request for Renovate to run again on this repository

