
dpgen2's People

Contributors

amcadmus, angel-jia, angusezhang, huangjiameng, imgbot[bot], njzjz, pre-commit-ci[bot], wanghan-iapcm, wangzyphysics, zjgemi

dpgen2's Issues

Implement Gaussian

This issue asks for support of the first-principles (FP) software Gaussian.
The implementation is supposed to take advantage of the new interface for adding FP methods introduced by PR #98.

dpgen2 showkey does not seem to work in debug mode

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='127.0.0.1', port=2746): Max retries exceeded with url: /api/v1/workflows/argo/dpgen-f05hl/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x2b372af03cd0>: Failed to establish a new connection: [Errno 111] Connection refused'))

automatic docs

including:

  • automatic Python API docs
  • automatic dargs docs
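A minimal Sphinx configuration sketch for this; autodoc/napoleon cover the Python API docs, while the dargs extension name below is an assumption and should be checked against the dargs documentation:

# conf.py -- hedged sketch, not the actual dpgen2 docs configuration
extensions = [
    "sphinx.ext.autodoc",   # automatic Python API docs
    "sphinx.ext.napoleon",  # numpy/google style docstrings
    "dargs.sphinx",         # assumed extension for rendering dargs argument docs
]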

Bug: workflow template size limit exceeded

The scheduler is a class with a large amount of data. Passing the scheduler as a dflow.Parameter causes an error like

workflow templates are limited to 128KB, this workflow is 172157 bytes

Quick fix: encode the class in a file and pass the scheduler as a dflow.Artifact.
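A minimal sketch of this quick fix, assuming the scheduler object is picklable; the helper names are illustrative and the actual artifact upload/download is left to dflow:

import pickle
from pathlib import Path

def dump_scheduler(scheduler, path="scheduler.pkl"):
    # serialize the scheduler to a file so it can be passed as a
    # dflow.Artifact instead of a size-limited workflow parameter
    with Path(path).open("wb") as f:
        pickle.dump(scheduler, f)
    return Path(path)

def load_scheduler(path="scheduler.pkl"):
    # restore the scheduler inside the OP that consumes the artifact
    with Path(path).open("rb") as f:
        return pickle.load(f)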

User interface for monitoring the dpgen2 workflow

Via the interface, the user should be able to check (a rough sketch is given after the list):

  • the stage of the exploration
  • the iteration of the exploration
  • the accurate, candidate, and failed ratios of the exploration
  • the number of data generated in each iteration, and the accumulated amount of data
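A rough sketch of the kind of summary such an interface could print; all names are hypothetical and not part of dpgen2's actual CLI:

def print_exploration_status(stage, iteration, n_accurate, n_candidate, n_failed,
                             n_new_data, n_total_data):
    # report the quantities listed above for one iteration
    n_total = n_accurate + n_candidate + n_failed
    print(f"stage {stage}, iteration {iteration}")
    print(f"  accurate  : {n_accurate / n_total:.2%}")
    print(f"  candidate : {n_candidate / n_total:.2%}")
    print(f"  failed    : {n_failed / n_total:.2%}")
    print(f"  data this iteration: {n_new_data}, accumulated: {n_total_data}")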

System.from does not support the mixed_type data

The lmp exploration does not support "fmt": "deepmd/npy/mixed", as used in the following configuration:

"convergence": {
         "type":                    "adaptive-lower",
        "conv_tolerance":            0.00,
        "numb_candi_f":             10,
        "_rate_candi_f":              0,
        "level_f_hi":                0.5,
        "n_checked_steps":           8,
        "_command":      "all"
     },
     "max_numb_iter" :       1,
     "fatal_at_max" :        false,
     "output_nopbc":         true,
     "configuration_prefix": null,
     "configurations":       [
         {
             "type" : "file",
             "files" : ["md.data/35"],
             "fmt" : "deepmd/npy/mixed",
             "remove_pbc" : true
         }

Implement the op RunVasp

Implement the op RunVasp. This OP runs a VASP DFT task prepared by PrepVasp, and outputs the labeled configuration (coordinates, simulation cell, DFT energy, forces, and virial) in the deepmd/npy data format provided by dpdata.

We have to implement the execute method of the class. What the op does is explained in the docstring of the class, and the interface of the execute method is provided in the docstring of the method.

One can also take the implementation of RunDPTrain as an example for RunVasp.
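A minimal sketch of what the execute method has to accomplish, assuming the task directory was prepared by PrepVasp; the function name and VASP command are illustrative, not dpgen2's actual API:

import subprocess
from pathlib import Path
import dpdata

def run_vasp_task(task_dir: Path, command: str = "mpirun vasp_std") -> Path:
    # run VASP in the prepared task directory
    subprocess.run(command, shell=True, cwd=task_dir, check=True)
    # parse coordinates, cell, energy, forces and virial from OUTCAR
    labeled = dpdata.LabeledSystem(str(task_dir / "OUTCAR"), fmt="vasp/outcar")
    # store the labeled configuration in the deepmd/npy format
    out_dir = task_dir / "data"
    labeled.to("deepmd/npy", str(out_dir))
    return out_dir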

Implement adaptive trust level stage scheduler

Implement the adaptive trust level scheduler. In each iteration this scheduler (see the sketch below):

  1. sorts the model deviations of all explored configurations,
  2. selects a certain number of configurations with the highest model deviations as candidates,
  3. sets the new trust level to the highest model deviation of the remaining configurations,
  4. considers the exploration stage converged when the trust level does not decrease.
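A sketch of the update rule described above; the function name and signature are illustrative, not dpgen2's actual API:

import numpy as np

def adaptive_trust_update(model_devis, numb_candi, prev_level):
    # model_devis: model deviations of all explored configurations
    model_devis = np.asarray(model_devis)
    order = np.argsort(model_devis)[::-1]      # highest deviation first
    candidates = order[:numb_candi]            # top-N deviations become candidates
    rest = order[numb_candi:]
    # new trust level: highest model deviation among the remaining configurations
    new_level = float(model_devis[rest].max()) if rest.size else prev_level
    # the stage converges when the trust level stops decreasing
    converged = new_level >= prev_level
    return candidates, new_level, converged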

User interface for downloading resultant files.

With the interface, a user should be able to download the resultant files of the dpgen iterations:

  • the result files of training, including the models, the learning curves, and the logs;
  • the result files of lmp exploration, including the input and output files of the lmp exploration tasks;
  • the result files of labeling, including the input and output files of the VASP tasks.

how to set group_size in run_train_config?

As described in the doc, step_configs/run_train_config/template_slice_config can be set, but doing so raises the following error:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/dp/bin/dpgen2", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/miniconda3/envs/dp/lib/python3.11/site-packages/dpgen2/entrypoint/main.py", line 329, in main
    submit_concurrent_learning(
  File "/home/user/miniconda3/envs/dp/lib/python3.11/site-packages/dpgen2/entrypoint/submit.py", line 621, in submit_concurrent_learning
    dpgen_step, finetune_step = workflow_concurrent_learning(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/dp/lib/python3.11/site-packages/dpgen2/entrypoint/submit.py", line 399, in workflow_concurrent_learning
    concurrent_learning_op = make_concurrent_learning_op(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/dp/lib/python3.11/site-packages/dpgen2/entrypoint/submit.py", line 142, in make_concurrent_learning_op
    prep_run_train_op = PrepRunDPTrain(
                        ^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/dp/lib/python3.11/site-packages/dpgen2/superop/prep_run_dp_train.py", line 187, in __init__
    self = _prep_run_dp_train(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/dp/lib/python3.11/site-packages/dpgen2/superop/prep_run_dp_train.py", line 240, in _prep_run_dp_train
    prep_train = Step(
                 ^^^^^
TypeError: Step.__init__() got an unexpected keyword argument 'template_slice_config'

implement more OPs and workflows

DPGEN2 refactors DP-GEN based on dflow. The technical details can be found here. Currently, only the DP+LAMMPS+VASP workflow has been implemented.

The following features, which are part of DP-GEN, have not been implemented in DPGEN2 yet:

  • Workflows
    • simplify
    • init data
  • MD OPs
    • Gromacs
    • AMBER
    • calypso
    • extra features like enhanced sampling
  • FP OPs
    • cp2k
    • gaussian
    • siesta
    • abacus
  • Taking clusters for sampling

DPGEN2 also accepts features that were not implemented in DP-GEN.


Notes:

  1. Unit tests and examples should be added.
  2. PEP-8 should be obeyed.
  3. DP-GEN codes can be reused under the LGPL-3.0 license.
  4. DPGEN2 offers some useful utils, including dpgen2.utils.chdir.set_directory and dpgen2.utils.run_command.run_command (usage sketched below).
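A hedged usage sketch of these utilities; the exact signatures and return values should be checked against the dpgen2 source:

from pathlib import Path
from dpgen2.utils.chdir import set_directory
from dpgen2.utils.run_command import run_command

# run a command inside a task directory, then return to the original cwd
with set_directory(Path("task.000000")):
    ret, out, err = run_command(["dp", "train", "input.json"])
    if ret != 0:
        raise RuntimeError(f"command failed with return code {ret}:\n{err}")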

dpgen2 does not support CP2K

I attempted to use the CP2K package with DPGEN2, but encountered the following error:

raise ArgumentValueError(
dargs.dargs.ArgumentValueError: [at location `fp`] get invalid choice `cp2k` for flag key `type`.

It would be greatly beneficial if the developers could provide support for CP2K in DPGEN2, as CP2K is a valuable tool.

The CICD of the project

Set up continuous integration and continuous deployment for dpgen2:

  • unit tests on pull requests
  • distribution via pip and/or conda
  • building docker images

Implement the op PrepVasp

This op prepares the VASP DFT calculation tasks. It takes a VASP input template and a list of configurations, and outputs a list of paths, each containing all the files necessary to run a VASP DFT task.

We have to implement the execute method of the class. What the op does is explained in the docstring of the class, and the interface of the execute method is provided in the docstring of the method.

One can also take the implementation of PrepDPTrain as an example for PrepVasp.
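A minimal sketch of the preparation step, assuming POSCAR-format input configurations; the function name and file handling (e.g. KPOINTS and POTCAR are omitted) are simplified and illustrative:

import shutil
from pathlib import Path
import dpdata

def prep_vasp_tasks(incar_template, confs, task_prefix="task"):
    task_dirs = []
    for ii, conf in enumerate(confs):
        task_dir = Path(f"{task_prefix}.{ii:06d}")
        task_dir.mkdir(parents=True, exist_ok=True)
        # copy the VASP input template into the task directory
        shutil.copy(incar_template, task_dir / "INCAR")
        # write the configuration as POSCAR via dpdata
        sys = dpdata.System(str(conf), fmt="vasp/poscar")
        sys.to("vasp/poscar", str(task_dir / "POSCAR"))
        task_dirs.append(task_dir)
    return task_dirs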

CH4 example needs update

Some of the keys in the input file example/ch4/param_CH4_deepmd-kit-2.0.1.json are outdated. For example, run_fp_image is no longer supported. The input file needs to be updated.

Implement the op CollectData

This op:

  1. collects the labeled data stored in each FP task directory,
  2. stores the data in one directory,
  3. adds the directory to the training data.

The source code of the op is here. One may consult the docstring for the interface of the op.
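A sketch under the assumption that each FP task wrote its labels as deepmd/npy data in a data/ subdirectory; the function name and layout are illustrative:

from pathlib import Path
import dpdata

def collect_data(fp_task_dirs, out_dir="iter_data", train_dirs=None):
    ms = dpdata.MultiSystems()
    for task_dir in fp_task_dirs:
        # 1. collect the labeled data from each fp task directory
        ms.append(dpdata.LabeledSystem(str(Path(task_dir) / "data"), fmt="deepmd/npy"))
    # 2. store the merged data in one directory
    ms.to_deepmd_npy(out_dir)
    # 3. add the directory to the training data
    train_dirs = list(train_dirs or [])
    train_dirs.append(Path(out_dir))
    return train_dirs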

RFC: Refactor DPGEN2 with a new design

Hi community,

This RFC proposes to refactor the DPGEN workflow with a new design based on dflow.

A typical DPGEN2 configuration looks like the following:
https://github.com/deepmodeling/dpgen2/blob/master/examples/chno/input.json

IMHO there are some issues with this configuration:

  1. The context configuration (executor, container, etc.) is mixed with the configuration of the algorithm.
  2. It is hard to validate such a configuration with a tool like pydantic, which makes it error-prone.
  3. Data files are not allowed to carry their own configuration, which makes it hard to train different systems at the same time.

A suggested pseudo-configuration design is shown below; it borrows some ideas from the ai2-kit project.
This configuration is intended to be more formal and cleaner to maintain.

# executor configuration
executor:
  bohrium: ...

# dflow configuration for each software
dflow:
  python:
    container: ai2-kit/0.12.10
    python_cmd: python3
  deepmd:
    container: deepmd/2.7.1
    dp_cmd: dp
  lammps:
    container: deepmd/2.7.1
    lammps_cmd: lmp
  cp2k:
    container: cp2k/2023.1
    cp2k_cmd: mpirun cp2k.psmp

# declare file resources as datasets before use them
# so that we can assign extra attributes to them
datasets:
  dpdata-Ni13Pd12:
    url: /path/to/data
    format:  deepmd/npy

  sys-Ni13Pd12:
    url: /path/to/data
    includes: POSCAR*
    format: vasp
    attrs:
    # allow users to define system-wise configuration
    # so that we can explore multiple types of systems in an iteration
      lammps:
        plumed_config: !load_text plumed.inp # use custom yaml tags to embed data from another file
      cp2k:
        input_template: !load_text cp2k.inp

workflow:
  general:
    type_map: [C, O, H]
    mass_map: [12, 16, 1]
    max_iters: 5

  train:
    deepmd:
      init_dataset: [dpdata-Ni13Pd12]
      input_template: !load_yaml deepmd.json  # use custom yaml tags to embed data from another file

  explore:
    # instead of using `type: lammps` to specify different software,
    # use a dedicated entry for each software of the same stage,
    # so that we can use pydantic to validate the configuration items
    # and arrive at a better code structure:
    # https://github.com/chenggroup/ai2-kit/blob/main/ai2_kit/workflow/cll_mlp.py#L163-L293
    lammps:
      nsteps: 10
      systems: [ sys-Ni13Pd12 ]  # reference dataset via key
      # support different variable combination strategies to avoid combinatorial explosion:
      # vars defined in `explore_vars` combine with system_files via Cartesian product,
      # vars defined in `broadcast_vars` just broadcast to system_files;
      # this design is useful when there are a lot of files
      explore_vars:
        TEMP: [330, 430, 530]
      broadcast_vars:
        LAMBDA_f: [0.0, 0.25, 0.5, 0.75, 1.0]
      template_vars:
        POST_INIT:  |
          neighbor bin 2.0
      plumed_config: !load_text plumed.inp

  # the select stage is isolated from explore so that more complex structure selection algorithms can be implemented
  select:
    model_devi:
      decent_f: [0.12, 0.18]
    limit: 50

  label:
    cp2k:
      input_template: !load_text cp2k.inp

next:
  # specify the configuration for the next iteration;
  # it will be merged with the current configuration to form the configuration file for the next round
  config: !load_yml iter-001.yml

The above configuration is easy to validate with pydantic, for example:
https://github.com/chenggroup/ai2-kit/blob/main/ai2_kit/workflow/cll_mlp.py#L32-L111
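For illustration, a minimal pydantic sketch of two of the sections above; the field names mirror the YAML and are not an actual ai2-kit or dpgen2 schema:

from typing import Dict, List, Optional
from pydantic import BaseModel

class GeneralConfig(BaseModel):
    type_map: List[str]
    mass_map: List[float]
    max_iters: int

class LammpsConfig(BaseModel):
    nsteps: int
    systems: List[str]
    explore_vars: Dict[str, List[float]] = {}
    broadcast_vars: Dict[str, List[float]] = {}
    plumed_config: Optional[str] = None

# validation fails loudly on typos or wrong types
general = GeneralConfig(type_map=["C", "O", "H"], mass_map=[12, 16, 1], max_iters=5)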

I believe a better configuration design will lead to a better software design.
I am posting my thoughts for the community to review, and I would appreciate any feedback.
