ulissigroup / finetuna

Active Learning for Machine Learning Potentials

License: MIT License

Python 100.00%

finetuna's People

mattaadams ahtloor shriya-sharma cheng7258002 12chao icamps flash-jaehyun tsuyama1990 wankiwi jiaozihao18

finetuna's Issues

_message.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZN6google8protobuf2io17SafeDoubleToFloatEd

While finishing the installation of Finetuna, I ran the test from [N2H_Ag111_dissociation]. The following error appeared in my terminal:
/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/internal/api_implementation.py:110: UserWarning: Selected implementation cpp is not available.
warnings.warn(
Traceback (most recent call last):
File "/hpcfs/users/a1732812/Test/11/N2H_Ag111.py", line 2, in
from finetuna.online_learner.online_learner import OnlineLearner
File "/hpcfs/users/a1732812/xxx/finetuna/finetuna/online_learner/online_learner.py", line 5, in
from finetuna.logger import Logger
File "/hpcfs/users/a1732812/xxx/finetuna/finetuna/logger.py", line 10, in
import wandb
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/init.py", line 26, in
from wandb import sdk as wandb_sdk
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/sdk/init.py", line 5, in
from .wandb_artifacts import Artifact # noqa: F401
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/sdk/wandb_artifacts.py", line 33, in
from wandb.apis import InternalApi, PublicApi
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/apis/init.py", line 42, in
from .internal import Api as InternalApi # noqa
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/apis/internal.py", line 3, in
from wandb.sdk.internal.internal_api import Api as InternalApi
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/sdk/internal/internal_api.py", line 45, in
from ..lib import retry
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/sdk/lib/retry.py", line 17, in
from .mailbox import ContextCancelledError
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/sdk/lib/mailbox.py", line 10, in
from wandb.proto import wandb_internal_pb2 as pb
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/proto/wandb_internal_pb2.py", line 8, in
from wandb.proto.v4.wandb_internal_pb2 import *
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/proto/v4/wandb_internal_pb2.py", line 5, in
from google.protobuf.internal import builder as _builder
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/internal/builder.py", line 42, in
from google.protobuf import reflection as _reflection
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/reflection.py", line 51, in
from google.protobuf import message_factory
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/message_factory.py", line 45, in
from google.protobuf import descriptor_pool
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/descriptor_pool.py", line 63, in
from google.protobuf import descriptor
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 51, in
from google.protobuf.pyext import _message
ImportError: /hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/pyext/_message.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZN6google8protobuf2io17SafeDoubleToFloatEd

Could anyone tell me how to resolve this problem? Thank you so much!

Need to merge optimal vaspinteractive settings to main

Mark as TODO for branch tt-vaspinteractive-test

Recalculating training data with precalculated singlepoint values

The current learner recalculates the forces and energies of all training data using the parent calculator, even when single-point values have already been computed.
We need to change 'DeltaCalc' to use the precalculated values instead.
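
A minimal sketch of the idea in plain Python. The names `StoredResult` and `delta_from_stored` are hypothetical and this is not the al_mlp/finetuna 'DeltaCalc'. It assumes the delta target is the parent result minus the base result, and that both results were stored with the training image, so the parent calculator is never called again.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = Tuple[float, float, float]


@dataclass
class StoredResult:
    """Energy and per-atom forces that were already computed and saved."""
    energy: float
    forces: List[Vector]


def delta_from_stored(parent: StoredResult, base: StoredResult) -> StoredResult:
    """Build the delta target (parent - base) from stored values only."""
    forces = [
        tuple(p - b for p, b in zip(f_parent, f_base))
        for f_parent, f_base in zip(parent.forces, base.forces)
    ]
    return StoredResult(parent.energy - base.energy, forces)


# Illustrative numbers only, not real DFT data.
parent = StoredResult(energy=-3.5, forces=[(0.5, 0.0, -0.25)])
base = StoredResult(energy=-1.0, forces=[(0.25, 0.0, 0.25)])
delta = delta_from_stored(parent, base)
```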

Need tests for offline active learning and NEBs

There is now a test case for a Cu13 nanocluster relaxation with OAL.

We also need test cases for offline active learning and NEBs. The example scripts could be adapted pretty easily. Try to keep it as clean / simple / fast as possible so these tests can be run/fixed frequently.

Ensemble training in parallel needs copying of trainer

An initialized trainer should be copied for each ensemble member, so that members trained in parallel do not share state.
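
A minimal sketch of that copying step, assuming the trainer can be deep-copied. The `Trainer` class and `make_ensemble_trainers` helper here are hypothetical stand-ins, not the project's classes.

```python
import copy


class Trainer:
    """Hypothetical stand-in for an initialized trainer (not the real class)."""

    def __init__(self, config):
        self.config = config
        self.weights = [0.0, 0.0]

    def train(self, seed):
        # Pretend training: mutate only this trainer's own weights.
        self.weights = [w + seed for w in self.weights]
        return self.weights


def make_ensemble_trainers(trainer, n_members):
    """Give every ensemble member its own independent copy of the trainer."""
    return [copy.deepcopy(trainer) for _ in range(n_members)]


base = Trainer({"lr": 0.001})
members = make_ensemble_trainers(base, 3)
results = [member.train(seed) for seed, member in enumerate(members, start=1)]
```

Because each member is a deep copy, training one member leaves the original trainer and the other members unchanged.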

Question VASP_interactive example

Dear Researchers,

I tried to follow the VASP_interactive example on online learning. However, I got stuck early, because the line:
from al_mlp.atomistic_methods import replay_trajectory

raises
ImportError: cannot import name 'replay_trajectory' from 'al_mlp.atomistic_methods' (.../.local/lib/python3.8/site-packages/al_mlp-0.1-py3.8.egg/al_mlp/atomistic_methods.py)

I installed al_mlp using setup.py, but I could not find where the replay_trajectory function is defined, even in the GitHub repository.

Any help would be highly appreciated!

Alexander

There is a training process and GPU memory usage, but the GPU is not working.

Hello, here is my code:

from finetuna.ml_potentials.finetuner_calc import FinetunerCalc

ml_potential = FinetunerCalc(
    checkpoint_path="gemnet_t_direct_h512_all.pt",
    mlp_params={
        "tuner": {
            "unfreeze_blocks": [
                "out_blocks.3.seq_forces",
                "out_blocks.3.scale_rbf_F",
                "out_blocks.3.dense_rbf_F",
                "out_blocks.3.out_forces",
                "out_blocks.2.seq_forces",
                "out_blocks.2.scale_rbf_F",
                "out_blocks.2.dense_rbf_F",
                "out_blocks.2.out_forces",
                "out_blocks.1.seq_forces",
                "out_blocks.1.scale_rbf_F",
                "out_blocks.1.dense_rbf_F",
                "out_blocks.1.out_forces",
            ],
            "num_threads": 32
        },
        "optim": {
            "batch_size": 1,
            "num_workers": 0,
            "max_epochs": 400,
            "lr_initial": 0.0003,
            "factor": 0.9,
            "eval_every": 1,
            "patience": 3,
            "checkpoint_every": 100000,
            "scheduler_loss": "train",
            "weight_decay": 0,
            "eps": 1e-8,
            "optimizer_params": {
                "weight_decay": 0,
                "eps": 1e-8,
            },
        },
        "task": {
            "primary_metric": "loss",
        },
        "local_rank": 0
    }, 
)
ml_potential.train(parent_dataset=train_dataset[:2])

My CUDA version is 11.3. nvidia-smi shows the training process and its GPU memory usage, but Volatile GPU-Util stays at 0 and the power consumption has not increased. Is there a problem with my parameter settings?
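
One way to turn the nvidia-smi observation above into numbers that can be logged during training is to poll nvidia-smi's CSV query output. This is a rough sketch only: the helper names are hypothetical, and the sample line is made up for illustration rather than taken from the reporter's machine.

```python
import csv
import io
import shutil
import subprocess

QUERY = "utilization.gpu,memory.used,power.draw"


def _to_float(field):
    """Convert one CSV field to float; return None for values like '[N/A]'."""
    try:
        return float(field.strip())
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output, one dict per GPU."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        util, mem, power = (_to_float(f) for f in fields)
        rows.append({"util_percent": util, "memory_mib": mem, "power_w": power})
    return rows


def sample_gpus():
    """Query nvidia-smi once; return [] when the tool is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Made-up sample line: memory is in use, utilization is zero.
sample = parse_gpu_csv("0, 2345, 61.20\n")
```

Calling `sample_gpus()` every few seconds while `train` runs would show whether utilization ever rises above zero.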

VaspInteractive cannot find the MPI process

The finetuna job is running on a Cray cluster. The module I use is "module swap PrgEnv-cray PrgEnv-intel".
The default MPI is Cray MPI. The warning is:
" UserWarning: Cannot find the mpi process or you're using different ompi wrapper. Will not send stop signal to mpi."
Does the warning affect the calculation speed?

Best,
Li Yuke

energy calculated by mlp vs quantum espresso has a huge difference

Hello,

I ran the Quantum ESPRESSO example from 'examples/quantum_espresso/qe_gpu_online_al_example.py' (I have not changed anything in the example except the paths to QE and the pseudopotentials). After the calculation completed, I plotted the energies from the predicted trajectory ('online_learner_trajectory.traj'). The energies predicted by the MLP are around -1 eV, while the energies predicted by QE are around -140000 eV. Is there something I am missing? How do I get the correct energy values?

I have attached the predicted energy plot.

VaspInteractive vs Vasp calculator

Hello!

Based on some of the previous issues, I think a problem I'm running into is related to the VaspInteractive calculator not being compatible with my custom build of VASP 5.4.4. So I simply replaced 'VaspInteractive' with 'Vasp' from ase (around line 63 of the example wrapper script), and I get the following traceback:

Traceback (most recent call last):
File ".../finetuna_wrapper.py", line 87, in
with vasp_calc as parent_calc:
AttributeError: __enter__

Is there something else I need to do that's being missed here?
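
For context, a `with` statement requires the object to implement the context-manager protocol (`__enter__` and `__exit__`). The AttributeError above suggests the `Vasp` object used here does not. One possible workaround, sketched with hypothetical stand-in classes rather than the real calculators, is to wrap such objects in `contextlib.nullcontext` so the same `with ... as parent_calc:` line works for both kinds of object:

```python
import contextlib


def as_context(calc):
    """Return `calc` if it supports `with`, otherwise wrap it in a no-op context."""
    if hasattr(type(calc), "__enter__") and hasattr(type(calc), "__exit__"):
        return calc
    return contextlib.nullcontext(calc)


class PlainCalc:
    """Hypothetical calculator without __enter__/__exit__."""
    name = "plain"


class ManagedCalc:
    """Hypothetical calculator that manages a resource."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.closed = True
        return False  # do not swallow exceptions


with as_context(PlainCalc()) as parent_calc:
    plain_name = parent_calc.name

managed = ManagedCalc()
with as_context(managed) as parent_calc:
    same_object = parent_calc is managed
```

Whether the rest of the wrapper script then works with a non-interactive calculator is a separate question that this sketch does not address.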

Finetuna crashes after the first DFT calculation

Issue

I tried to run FINETUNA with VASP 6.3.0 to relax H*CO on Pt(111) using the provided ASE example template (no 1). 10 steps are performed with the MLP and then a DFT calculation is triggered. However, after the DFT calculation converges, the software crashes and reports the following error message:

Trying to close the VASP stream but encountered error:
process PID not found (pid=181196)
Will now force closing the VASP process. The OUTCAR and vasprun.xml outputs may be incomplete
Force below threshold: check with parent
OnlineLearner: Parent calculation required
Traceback (most recent call last):
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_common.py", line 442, in wrapper
ret = self._cache[fun]
AttributeError: _cache

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_pslinux.py", line 1642, in wrapper
return fun(self, *args, **kwargs)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_common.py", line 445, in wrapper
return fun(self)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_pslinux.py", line 1684, in _parse_stat_file
data = bcat("%s/%s/stat" % (self._procfs_path, self.pid))
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_common.py", line 775, in bcat
return cat(fname, fallback=fallback, _open=open_binary)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_common.py", line 763, in cat
with _open(fname) as f:
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_common.py", line 727, in open_binary
return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/181196/stat'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/init.py", line 361, in _init
self.create_time()
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/init.py", line 714, in create_time
self._create_time = self._proc.create_time()
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_pslinux.py", line 1642, in wrapper
return fun(self, *args, **kwargs)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_pslinux.py", line 1852, in create_time
ctime = float(self._parse_stat_file()['create_time'])
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_pslinux.py", line 1649, in wrapper
raise NoSuchProcess(self.pid, self._name)
psutil.NoSuchProcess: process no longer exists (pid=181196)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/gpfs/data/cfgoldsm/bkreitz1/VASP/methane-oxidation/neb/h--co-diss/IS/finetuna/example.py", line 106, in
relaxer.run(
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/finetuna/atomistic_methods.py", line 198, in run
dyn.run(fmax=self.fmax, steps=self.steps)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/optimize/optimize.py", line 294, in run
return Dynamics.run(self)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/optimize/optimize.py", line 181, in run
for converged in Dynamics.irun(self):
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/optimize/optimize.py", line 168, in irun
self.log()
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/optimize/optimize.py", line 308, in log
forces = self.atoms.get_forces()
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/atoms.py", line 790, in get_forces
forces = self._calc.get_forces(self)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/calculators/abc.py", line 23, in get_forces
return self.get_property('forces', atoms)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/calculators/calculator.py", line 736, in get_property
self.calculate(atoms, [name], system_changes)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/finetuna/online_learner/online_learner.py", line 189, in calculate
energy, forces, fmax = self.get_energy_and_forces(atoms)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/finetuna/online_learner/online_learner.py", line 259, in get_energy_and_forces
energy, forces, constrained_forces = self.add_data_and_retrain(
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/finetuna/online_learner/online_learner.py", line 491, in add_data_and_retrain
self.parent_calc._pause_calc()
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/vasp_interactive/vasp_interactive.py", line 471, in _pause_calc
mpi_process = _find_mpi_process(pid)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/vasp_interactive/vasp_interactive.py", line 65, in _find_mpi_process
process_list = [psutil.Process(pid)]
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/init.py", line 332, in init
self._init(pid)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/init.py", line 373, in _init
raise NoSuchProcess(pid, msg='process PID not found')
psutil.NoSuchProcess: process PID not found (pid=181196)
Trying to close the VASP stream but encountered error:
'psutil'

Software

OpenMPI 4.0.5
Intel 2020.2
Python 3.9.0

Executed on 2 nodes, with 2 tasks per node and 8 cpus per task (not sure if that's relevant)

I adjusted the vasp calculator as follows:

vasp_calc = VaspInteractive(
    ibrion=-1,
    nsw=0,
    ispin=1,
    ediff=1e-6,
    ediffg=-0.03,
    encut=450.0,
    laechg=False,
    lcharg=False,
    lwave=False,
    #ncore=4,
    xc="beef-vdw",
    kpts=(3,3,1),
)
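
The final exception in the traceback above is psutil.NoSuchProcess, raised while looking up a PID that no longer exists. As an illustration only (this is not how vasp_interactive handles it, and the helper names are hypothetical), a caller could check that the process is still alive before trying to pause it. The sketch uses the POSIX-only `os.kill(pid, 0)` existence check from the standard library.

```python
import os
import subprocess
import sys


def pid_alive(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence/permission, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def pause_if_running(pid, pause):
    """Call `pause(pid)` only when the process still exists."""
    if not pid_alive(pid):
        return False
    pause(pid)
    return True


# Start a short-lived child, wait for it to exit, then try to "pause" it.
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()
paused = []
was_paused = pause_if_running(child.pid, paused.append)
```

The check narrows the race but does not remove it, since the process can still exit between the check and the pause call.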

A case where finetuna fails to reach a reasonable minimum

Hello again! I've found a couple of cases where finetuna fails to reach a reasonable minimum. From my very brief use of it, the common theme seems to be weak-binding metals with adsorbates that don't quite bind. It seems that finetuna can fail to locate a minimum with the molecule just above the surface; instead, the molecule can float up through the periodic image, which can lead to surfaces blowing up.

I don't know how useful this is, but I've also included a zip file with one such case, *N2H on Ag (111). I'm running a regular VASP optimization to see how the cg optimizer handles this and will update when it's finished.
to_zulissi.zip

QE colab calculator does not work as intended

Traceback for error as follows:

/usr/local/lib/python3.6/dist-packages/al_mlp/offline_learner.py in learn(self)
     88 
     89         while not self.terminate:
---> 90             self.do_before_train()
     91             self.do_train()
     92             self.do_after_train()
/usr/local/lib/python3.6/dist-packages/al_mlp/offline_learner.py in do_before_train(self)
     98         """
     99         if self.iterations > 0:
--> 100             self.query_data()
    101         self.fn_label = f"{self.file_dir}{self.filename}_iter_{self.iterations}"
    102 
/usr/local/lib/python3.6/dist-packages/al_mlp/offline_learner.py in query_data(self)
    140         """
    141         queried_images = self.query_func()
--> 142         self.training_data += compute_with_calc(queried_images, self.delta_sub_calc)
    143 
    144     def check_terminate(self):
/usr/local/lib/python3.6/dist-packages/al_mlp/utils.py in compute_with_calc(images, calculator)
     53     for image in images:
     54         image.set_calculator(copy.deepcopy(calculator))
---> 55     return convert_to_singlepoint(images)
     56 
     57 
/usr/local/lib/python3.6/dist-packages/al_mlp/utils.py in convert_to_singlepoint(images)
     24         os.makedirs("./temp", exist_ok=True)
     25         os.chdir("./temp")
---> 26         sample_energy = image.get_potential_energy(apply_constraint=False)
     27         sample_forces = image.get_forces(apply_constraint=False)
     28         image.set_calculator(
/usr/local/lib/python3.6/dist-packages/ase/atoms.py in get_potential_energy(self, force_consistent, apply_constraint)
    731                 self, force_consistent=force_consistent)
    732         else:
--> 733             energy = self._calc.get_potential_energy(self)
    734         if apply_constraint:
    735             for constraint in self.constraints:
/usr/local/lib/python3.6/dist-packages/ase/calculators/calculator.py in get_potential_energy(self, atoms, force_consistent)
    706 
    707     def get_potential_energy(self, atoms=None, force_consistent=False):
--> 708         energy = self.get_property('energy', atoms)
    709         if force_consistent:
    710             if 'free_energy' not in self.results:
/usr/local/lib/python3.6/dist-packages/ase/calculators/calculator.py in get_property(self, name, atoms, allow_calculation)
    734             if not allow_calculation:
    735                 return None
--> 736             self.calculate(atoms, [name], system_changes)
    737 
    738         if name not in self.results:
/usr/local/lib/python3.6/dist-packages/al_mlp/calcs.py in calculate(self, atoms, properties, system_changes)
     52         self.calcs[0].results = self.parent_results
     53         self.calcs[1].results = self.base_results
---> 54         super().calculate(atoms, properties, system_changes)
     55 
     56         if "energy" in self.results:
/usr/local/lib/python3.6/dist-packages/ase/calculators/mixing.py in calculate(self, atoms, properties, system_changes)
     50 
     51         for w, calc in zip(self.weights, self.calcs):
---> 52             calc.calculate(atoms, properties, system_changes)
     53 
     54             for k in properties:
TypeError: calculate() takes from 2 to 3 positional arguments but 4 were given
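
The TypeError at the end of this traceback is what Python raises when a method defined with one required and one optional argument is called with three. The source does not say which calculator in the stack has the narrower signature, so the classes below are hypothetical stand-ins. They reproduce the mismatch and show one possible adapter that accepts the extra `system_changes` argument.

```python
class NarrowCalc:
    """Hypothetical calculator whose calculate() lacks `system_changes`."""

    def __init__(self):
        self.results = {}

    def calculate(self, atoms, properties=None):
        self.results = {"energy": 0.0}


class CompatCalc(NarrowCalc):
    """Adapter that accepts the three-argument call and drops the extra one."""

    def calculate(self, atoms=None, properties=None, system_changes=None):
        super().calculate(atoms, properties)


def mixing_style_call(calc):
    # Same call shape as the last frame of the traceback above.
    calc.calculate("atoms", ["energy"], ["positions"])
    return calc.results


try:
    mixing_style_call(NarrowCalc())
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

compat_results = mixing_style_call(CompatCalc())
```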

Dask ensemble doesn't work with localcluster

Changing the oal_CuNP test to use dask causes the tests to fail hard. I think it's a problem with how local files are stored and used for checkpoints.

Importing FinetunerCalc leads to Segmentation Fault

I tried testing the QE example code (qe_gpu_online_al_example.py). However, Python quickly exits with only the message "Segmentation Fault".

When I ran the code line by line in the interactive Python interpreter, I found that the following line causes the segmentation fault:
from finetuna.ml_potentials.finetuner_calc import FinetunerCalc

What could be causing this?

Some additional details:

I am on an AWS EC2 instance with a GPU.
I created my Python environment by calling conda env create -f env.gpu.yml. However, the code failed to run, and I found that import torch; torch.cuda.is_available() returned False. So I then called pip3 install torch torchvision torchaudio to get a GPU-enabled PyTorch installed in my environment.
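
A general way to get more than the bare "Segmentation Fault" message is Python's standard-library `faulthandler`, which dumps the Python traceback when the interpreter crashes. The sketch below uses a hypothetical helper, `import_with_faulthandler`, and runs it on a standard-library module. For the case above, the module name would be "finetuna.ml_potentials.finetuner_calc".

```python
import faulthandler
import importlib
import sys


def import_with_faulthandler(module_name):
    """Import a module with faulthandler enabled so a crash dumps a traceback."""
    if not faulthandler.is_enabled():
        try:
            faulthandler.enable(file=sys.stderr, all_threads=True)
        except (AttributeError, ValueError, OSError, RuntimeError):
            # stderr has no usable file descriptor (for example, it was redirected).
            pass
    return importlib.import_module(module_name)


# Demonstrated on a standard-library module so the sketch runs anywhere.
module = import_with_faulthandler("json")
```

The same effect is available without code changes by running the script with `python -X faulthandler`.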

Ensemble doesn't need to be made in OAL on instantiation

We don't need to actually make an ensemble in the online active learning until we're certain we're going to need it. For example, if the first point is the same as the training data, we should just use the data and make/update the ensemble as points are added.

https://github.com/ulissigroup/al_mlp/blob/buildfix/al_mlp/online_learner.py#L79
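
A minimal sketch of deferring ensemble construction until it is first needed. `LazyEnsemble` and `make_member` are hypothetical names, and the real online learner is structured differently.

```python
class LazyEnsemble:
    """Hypothetical wrapper that builds ensemble members on first use."""

    def __init__(self, make_member, n_members):
        self._make_member = make_member
        self._n_members = n_members
        self._members = None  # nothing is built at instantiation
        self.training_data = []

    @property
    def built(self):
        return self._members is not None

    def add_data(self, point):
        # Collecting data does not require the ensemble to exist yet.
        self.training_data.append(point)

    def members(self):
        # Build the members only when a prediction or training step needs them.
        if self._members is None:
            self._members = [self._make_member(i) for i in range(self._n_members)]
        return self._members


created = []


def make_member(index):
    created.append(index)
    return {"id": index}


ensemble = LazyEnsemble(make_member, n_members=3)
ensemble.add_data("first parent point")
built_before = ensemble.built
n_members = len(ensemble.members())
```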

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

circleci

.circleci/config.yml

circleci/python 3.7

github-actions

.github/workflows/black.yml

actions/checkout v2

.github/workflows/unittests.yml

actions/checkout v2

Check this box to trigger a request for Renovate to run again on this repository

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Location: renovate.json
Error type: The renovate configuration file contains some invalid settings
Message: Invalid configuration option: pipfile

Multiple problems when running CPU version

Hello,

I recently installed the CPU version of Finetuna according to the README on our local servers and ran into a series of issues.

The first error was in pymatgen. It said that yaml.safe_load() had been removed. I edited pymatgen's local_env.py to fit the new format.

Next I got the error OSError: libc10_cuda.so, no such file or directory.
Apparently, this is a shared library from PyTorch that had not been installed. I edited _ops.py in the ctypes library to manually load these .so files, which I took from the GPU version of Finetuna. Are these .so files even necessary for the CPU version? Since I am not running a CUDA device, they should be redundant, right?

After that, a Python script in the ocpmodels directory was missing. I took a look at the OCP20 repository: the trainer files had been merged into one, which led to the error. I edited the imports to point to the merged file.

Now I am stuck on an error that I have no idea how to solve.

File "/.../mambaforge_install/envs/finetuna/lib/python3.9/site-packages/llvmlite/binding/targets.py", line 201, in from_triple
raise RuntimeError(str(outerr))
RuntimeError: No available targets are compatible with triple "x86_64-unknown-linux-gnu"

My research on the error message was not too helpful so far. I hope that maybe another user has run into this problem or someone with more experience with llvmlite can help me to understand the problem.

I also made an environment for the GPU version and, except for the YAML issue, it started without issues (and then crashed as expected, since I don't have a CUDA device to run the program on).

Is there a more up-to-date version of the CPU environment? The YAML and OCP20 issues in particular seem to come from changes in other libraries that negatively affect Finetuna's functionality.

Thank you in advance!

online_learner.py needs a cleanup

Logic is hard to follow, lots of extraneous code.
