ulissigroup / finetuna
Active Learning for Machine Learning Potentials
License: MIT License
While finishing the installation of Finetuna, I ran the test from [N2H_Ag111_dissociation]. This is the error that appeared in my terminal:
/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/internal/api_implementation.py:110: UserWarning: Selected implementation cpp is not available.
warnings.warn(
Traceback (most recent call last):
File "/hpcfs/users/a1732812/Test/11/N2H_Ag111.py", line 2, in
from finetuna.online_learner.online_learner import OnlineLearner
File "/hpcfs/users/a1732812/xxx/finetuna/finetuna/online_learner/online_learner.py", line 5, in
from finetuna.logger import Logger
File "/hpcfs/users/a1732812/xxx/finetuna/finetuna/logger.py", line 10, in
import wandb
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/init.py", line 26, in
from wandb import sdk as wandb_sdk
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/sdk/init.py", line 5, in
from .wandb_artifacts import Artifact # noqa: F401
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/sdk/wandb_artifacts.py", line 33, in
from wandb.apis import InternalApi, PublicApi
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/apis/init.py", line 42, in
from .internal import Api as InternalApi # noqa
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/apis/internal.py", line 3, in
from wandb.sdk.internal.internal_api import Api as InternalApi
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/sdk/internal/internal_api.py", line 45, in
from ..lib import retry
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/sdk/lib/retry.py", line 17, in
from .mailbox import ContextCancelledError
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/sdk/lib/mailbox.py", line 10, in
from wandb.proto import wandb_internal_pb2 as pb
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/proto/wandb_internal_pb2.py", line 8, in
from wandb.proto.v4.wandb_internal_pb2 import *
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/wandb/proto/v4/wandb_internal_pb2.py", line 5, in
from google.protobuf.internal import builder as _builder
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/internal/builder.py", line 42, in
from google.protobuf import reflection as _reflection
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/reflection.py", line 51, in
from google.protobuf import message_factory
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/message_factory.py", line 45, in
from google.protobuf import descriptor_pool
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/descriptor_pool.py", line 63, in
from google.protobuf import descriptor
File "/hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 51, in
from google.protobuf.pyext import _message
ImportError: /hpcfs/users/a1732812/miniconda3/envs/finetuna/lib/python3.9/site-packages/google/protobuf/pyext/_message.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZN6google8protobuf2io17SafeDoubleToFloatEd
Could anyone tell me how to resolve this problem? Thank you so much!
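One thing that may be worth trying, offered as a sketch rather than a confirmed fix: the final ImportError comes from protobuf's compiled backend (`pyext/_message...so`), and protobuf reads the `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION` environment variable to choose its backend. Forcing the pure-Python backend before anything imports `wandb` might sidestep the broken shared library. The helper name below is hypothetical, and whether this resolves the mismatch in this particular environment is an assumption.

```python
import os


def force_pure_python_protobuf() -> str:
    """Ask protobuf to use its pure-Python backend (hypothetical helper).

    This must run before protobuf is imported for the first time,
    because the backend is chosen at import time.
    """
    os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
    return os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"]


force_pure_python_protobuf()

# Only after the variable is set would the failing import be attempted, e.g.:
# from finetuna.online_learner.online_learner import OnlineLearner
```

The pure-Python backend is slower than the compiled one, so if this works it is probably best treated as a stopgap until the protobuf installation itself is repaired.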
Mark as TODO for branch tt-vaspinteractive-test
The current learner will recalculate the forces and energies of all training data using the parent calculator.
Need to change the 'DeltaCalc' to use precalculated values instead.
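A minimal sketch of the idea, assuming that "delta" means the parent result minus the base result and that both were already stored when each point was first computed. The function and dictionary layout are hypothetical and are not the actual `DeltaCalc` code.

```python
from typing import Dict, List, Union

Results = Dict[str, Union[float, List[List[float]]]]


def delta_from_precalculated(parent: Results, base: Results) -> Results:
    """Combine stored parent and base results without calling any calculator."""
    energy = parent["energy"] - base["energy"]
    forces = [
        [p - b for p, b in zip(parent_row, base_row)]
        for parent_row, base_row in zip(parent["forces"], base["forces"])
    ]
    return {"energy": energy, "forces": forces}


# Toy values standing in for results saved alongside the training data.
parent_results = {"energy": -10.5, "forces": [[0.1, 0.0, -0.2]]}
base_results = {"energy": -10.0, "forces": [[0.05, 0.0, -0.1]]}

delta = delta_from_precalculated(parent_results, base_results)
```

Because nothing here invokes the parent calculator, adding a new point to the training set would cost one parent call for that point only, rather than one per stored image.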
There is now a test case for a Cu13 nanocluster relaxation with OAL.
We also need test cases for offline active learning and NEBs. The example scripts could be adapted pretty easily. Try to keep it as clean / simple / fast as possible so these tests can be run/fixed frequently.
An initialized trainer should be copied for each ensemble.
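As a rough illustration, assuming the trainer can be deep-copied safely: `ToyTrainer` and `make_ensemble` are hypothetical stand-ins, not classes from this repository.

```python
import copy


class ToyTrainer:
    """Hypothetical stand-in for an already-initialized trainer."""

    def __init__(self, config):
        self.config = config
        self.weights = [0.0, 0.0]


def make_ensemble(trainer, n_members):
    """Give every ensemble member its own independent copy of the trainer."""
    return [copy.deepcopy(trainer) for _ in range(n_members)]


base = ToyTrainer({"lr": 0.001})
ensemble = make_ensemble(base, 3)

# Changing one member must not leak into the others or into the original.
ensemble[0].weights[0] = 1.0
```

A deep copy is used so that mutable state such as weights is not shared between members; a real trainer holding open files or GPU handles might need a custom copy routine instead.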
Dear Researchers,
I tried to follow the VASP_interactive online learning example. However, I got stuck early, as the line:
from al_mlp.atomistic_methods import replay_trajectory
leads to
ImportError: cannot import name 'replay_trajectory' from 'al_mlp.atomistic_methods' (.../.local/lib/python3.8/site-packages/al_mlp-0.1-py3.8.egg/al_mlp/atomistic_methods.py)
I installed al_mlp using the setup.py routine. However, I couldn't find where the replay_trajectory function is defined, even on GitHub.
Any help would be highly appreciated!
Alexander
Hello, here is my code:
ml_potential = FinetunerCalc(
    checkpoint_path="gemnet_t_direct_h512_all.pt",
    mlp_params={
        "tuner": {
            "unfreeze_blocks": [
                "out_blocks.3.seq_forces",
                "out_blocks.3.scale_rbf_F",
                "out_blocks.3.dense_rbf_F",
                "out_blocks.3.out_forces",
                "out_blocks.2.seq_forces",
                "out_blocks.2.scale_rbf_F",
                "out_blocks.2.dense_rbf_F",
                "out_blocks.2.out_forces",
                "out_blocks.1.seq_forces",
                "out_blocks.1.scale_rbf_F",
                "out_blocks.1.dense_rbf_F",
                "out_blocks.1.out_forces",
            ],
            "num_threads": 32
        },
        "optim": {
            "batch_size": 1,
            "num_workers": 0,
            "max_epochs": 400,
            "lr_initial": 0.0003,
            "factor": 0.9,
            "eval_every": 1,
            "patience": 3,
            "checkpoint_every": 100000,
            "scheduler_loss": "train",
            "weight_decay": 0,
            "eps": 1e-8,
            "optimizer_params": {
                "weight_decay": 0,
                "eps": 1e-8,
            },
        },
        "task": {
            "primary_metric": "loss",
        },
        "local_rank": 0
    },
)
ml_potential.train(parent_dataset=train_dataset[:2])
My CUDA version is 11.3. nvidia-smi shows the training process and its GPU memory usage, but the volatile GPU-util is 0 and the power consumption has not increased. Is there a problem with my parameter settings?
The finetuna job is running on a Cray cluster. The programming environment I use is set with "module swap PrgEnv-cray PrgEnv-intel".
The default MPI is Cray MPI. The warning is:
"UserWarning: Cannot find the mpi process or you're using different ompi wrapper. Will not send stop signal to mpi."
Does the warning impact the calculation speed?
Best,
Li Yuke
Hello,
I ran the Quantum ESPRESSO example from 'examples/quantum_espresso/qe_gpu_online_al_example.py' (I changed nothing in the example except the paths to QE and the pseudopotential). After the calculation completed, I plotted the energies from the predicted trajectory ('online_learner_trajectory.traj') and saw that the energies predicted by the MLP are around -1 eV, while the energies predicted by QE are around -140000 eV. Is there something I am missing? How do I get the correct energy values?
I have attached the predicted energy plot.
Hello!
Based on some of the previous issues, I think the problem I'm running into is that the VaspInteractive calculator is not compatible with my custom build of VASP 5.4.4. So I simply replaced 'VaspInteractive' with 'Vasp' from ASE (~line 63 of the example wrapper script), and I get the following traceback:
Traceback (most recent call last):
File ".../finetuna_wrapper.py", line 87, in
with vasp_calc as parent_calc:
AttributeError: __enter__
Is there something else I need to do that's being missed here?
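For context, the `with` statement requires an object that defines `__enter__` and `__exit__`, and the traceback suggests the plain ASE `Vasp` calculator does not. A minimal sketch of one possible workaround, using two hypothetical stand-in classes rather than the real calculators, is to wrap objects that lack the protocol in `contextlib.nullcontext`:

```python
from contextlib import nullcontext


class PlainCalc:
    """Hypothetical stand-in for a calculator without __enter__/__exit__."""

    name = "plain"


class InteractiveCalc:
    """Hypothetical stand-in for a calculator usable as a context manager."""

    name = "interactive"

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not suppress exceptions


def as_context(calc):
    """Return calc itself if it supports 'with', else wrap it in nullcontext."""
    if hasattr(calc, "__enter__") and hasattr(calc, "__exit__"):
        return calc
    return nullcontext(calc)


names = []
for calc in (PlainCalc(), InteractiveCalc()):
    with as_context(calc) as parent_calc:
        names.append(parent_calc.name)
```

This only addresses the `with` line. The rest of the wrapper script may still rely on behaviour specific to `VaspInteractive`, so further changes could be needed.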
Issue
I tried to run FINETUNA with VASP 6.3.0 to relax H*CO on Pt(111) using the provided ASE example template (no 1). 10 steps are performed with the MLP and then a DFT calculation is triggered. However, after the DFT calculation converges, the software crashes and reports the following error message:
Trying to close the VASP stream but encountered error:
process PID not found (pid=181196)
Will now force closing the VASP process. The OUTCAR and vasprun.xml outputs may be incomplete
Force below threshold: check with parent
OnlineLearner: Parent calculation required
Traceback (most recent call last):
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_common.py", line 442, in wrapper
ret = self._cache[fun]
AttributeError: _cache
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_pslinux.py", line 1642, in wrapper
return fun(self, *args, **kwargs)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_common.py", line 445, in wrapper
return fun(self)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_pslinux.py", line 1684, in _parse_stat_file
data = bcat("%s/%s/stat" % (self._procfs_path, self.pid))
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_common.py", line 775, in bcat
return cat(fname, fallback=fallback, _open=open_binary)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_common.py", line 763, in cat
with _open(fname) as f:
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_common.py", line 727, in open_binary
return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/181196/stat'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/init.py", line 361, in _init
self.create_time()
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/init.py", line 714, in create_time
self._create_time = self._proc.create_time()
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_pslinux.py", line 1642, in wrapper
return fun(self, *args, **kwargs)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_pslinux.py", line 1852, in create_time
ctime = float(self._parse_stat_file()['create_time'])
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/_pslinux.py", line 1649, in wrapper
raise NoSuchProcess(self.pid, self._name)
psutil.NoSuchProcess: process no longer exists (pid=181196)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/gpfs/data/cfgoldsm/bkreitz1/VASP/methane-oxidation/neb/h--co-diss/IS/finetuna/example.py", line 106, in
relaxer.run(
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/finetuna/atomistic_methods.py", line 198, in run
dyn.run(fmax=self.fmax, steps=self.steps)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/optimize/optimize.py", line 294, in run
return Dynamics.run(self)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/optimize/optimize.py", line 181, in run
for converged in Dynamics.irun(self):
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/optimize/optimize.py", line 168, in irun
self.log()
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/optimize/optimize.py", line 308, in log
forces = self.atoms.get_forces()
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/atoms.py", line 790, in get_forces
forces = self._calc.get_forces(self)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/calculators/abc.py", line 23, in get_forces
return self.get_property('forces', atoms)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/ase/calculators/calculator.py", line 736, in get_property
self.calculate(atoms, [name], system_changes)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/finetuna/online_learner/online_learner.py", line 189, in calculate
energy, forces, fmax = self.get_energy_and_forces(atoms)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/finetuna/online_learner/online_learner.py", line 259, in get_energy_and_forces
energy, forces, constrained_forces = self.add_data_and_retrain(
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/finetuna/online_learner/online_learner.py", line 491, in add_data_and_retrain
self.parent_calc._pause_calc()
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/vasp_interactive/vasp_interactive.py", line 471, in _pause_calc
mpi_process = _find_mpi_process(pid)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/vasp_interactive/vasp_interactive.py", line 65, in _find_mpi_process
process_list = [psutil.Process(pid)]
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/init.py", line 332, in init
self._init(pid)
File "/users/bkreitz1/anaconda/finetuna/lib/python3.9/site-packages/psutil/init.py", line 373, in _init
raise NoSuchProcess(pid, msg='process PID not found')
psutil.NoSuchProcess: process PID not found (pid=181196)
Trying to close the VASP stream but encountered error:
'psutil'
Software
OpenMPI 4.0.5
Intel 2020.2
Python 3.9.0
Executed on 2 nodes, with 2 tasks per node and 8 cpus per task (not sure if that's relevant)
I adjusted the vasp calculator as follows:
vasp_calc = VaspInteractive(
    ibrion=-1,
    nsw=0,
    ispin=1,
    ediff=1e-6,
    ediffg=-0.03,
    encut=450.0,
    laechg=False,
    lcharg=False,
    lwave=False,
    #ncore=4,
    xc="beef-vdw",
    kpts=(3,3,1),
)
Hello again! I've found a couple of cases where finetuna fails to reach a reasonable minimum. From my very brief use of it, the common theme seems to be weak binding metals with adsorbates that don't quite bind. It seems like finetuna can fail to localize a minimum with the molecule just above the surface and instead the molecule can float up through the periodic image leading to surfaces potentially blowing up.
I don't know how useful this is, but I've also included a zip file with one such case, *N2H on Ag (111). I'm running a regular VASP optimization to see how the cg optimizer handles this and will update when it's finished.
to_zulissi.zip
Traceback for error as follows:
/usr/local/lib/python3.6/dist-packages/al_mlp/offline_learner.py in learn(self)
88
89 while not self.terminate:
---> 90 self.do_before_train()
91 self.do_train()
92 self.do_after_train()
/usr/local/lib/python3.6/dist-packages/al_mlp/offline_learner.py in do_before_train(self)
98 """
99 if self.iterations > 0:
--> 100 self.query_data()
101 self.fn_label = f"{self.file_dir}{self.filename}_iter_{self.iterations}"
102
/usr/local/lib/python3.6/dist-packages/al_mlp/offline_learner.py in query_data(self)
140 """
141 queried_images = self.query_func()
--> 142 self.training_data += compute_with_calc(queried_images, self.delta_sub_calc)
143
144 def check_terminate(self):
/usr/local/lib/python3.6/dist-packages/al_mlp/utils.py in compute_with_calc(images, calculator)
53 for image in images:
54 image.set_calculator(copy.deepcopy(calculator))
---> 55 return convert_to_singlepoint(images)
56
57
/usr/local/lib/python3.6/dist-packages/al_mlp/utils.py in convert_to_singlepoint(images)
24 os.makedirs("./temp", exist_ok=True)
25 os.chdir("./temp")
---> 26 sample_energy = image.get_potential_energy(apply_constraint=False)
27 sample_forces = image.get_forces(apply_constraint=False)
28 image.set_calculator(
/usr/local/lib/python3.6/dist-packages/ase/atoms.py in get_potential_energy(self, force_consistent, apply_constraint)
731 self, force_consistent=force_consistent)
732 else:
--> 733 energy = self._calc.get_potential_energy(self)
734 if apply_constraint:
735 for constraint in self.constraints:
/usr/local/lib/python3.6/dist-packages/ase/calculators/calculator.py in get_potential_energy(self, atoms, force_consistent)
706
707 def get_potential_energy(self, atoms=None, force_consistent=False):
--> 708 energy = self.get_property('energy', atoms)
709 if force_consistent:
710 if 'free_energy' not in self.results:
/usr/local/lib/python3.6/dist-packages/ase/calculators/calculator.py in get_property(self, name, atoms, allow_calculation)
734 if not allow_calculation:
735 return None
--> 736 self.calculate(atoms, [name], system_changes)
737
738 if name not in self.results:
/usr/local/lib/python3.6/dist-packages/al_mlp/calcs.py in calculate(self, atoms, properties, system_changes)
52 self.calcs[0].results = self.parent_results
53 self.calcs[1].results = self.base_results
---> 54 super().calculate(atoms, properties, system_changes)
55
56 if "energy" in self.results:
/usr/local/lib/python3.6/dist-packages/ase/calculators/mixing.py in calculate(self, atoms, properties, system_changes)
50
51 for w, calc in zip(self.weights, self.calcs):
---> 52 calc.calculate(atoms, properties, system_changes)
53
54 for k in properties:
TypeError: calculate() takes from 2 to 3 positional arguments but 4 were given
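The last frame shows the mixing calculator calling `calc.calculate(atoms, properties, system_changes)`, so the message suggests that one of the wrapped calculators defines `calculate` with fewer parameters. The sketch below reproduces that kind of mismatch with two hypothetical classes. It does not identify which calculator in the report is at fault.

```python
class NarrowCalc:
    """Hypothetical calculator whose calculate() lacks system_changes."""

    def calculate(self, atoms, properties=None):
        return "narrow"


class WideCalc:
    """Hypothetical calculator accepting all three arguments."""

    def calculate(self, atoms=None, properties=("energy",), system_changes=()):
        return "wide"


def call_like_mixer(calc):
    """Call calculate() with three positional arguments, as in the traceback."""
    return calc.calculate("atoms", ["energy"], [])


try:
    call_like_mixer(NarrowCalc())
    error_message = None
except TypeError as exc:
    error_message = str(exc)

wide_result = call_like_mixer(WideCalc())
```

If the mismatch is the cause, giving the narrower `calculate` a `system_changes` parameter (or updating the package that defines it) would be the natural thing to check first.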
Changing the oal_CuNP test to use dask causes the tests to fail hard. I think it's a problem with how local files are getting stored/used for checkpoints.
I tried testing the QE example code (qe_gpu_online_al_example.py). However, Python quickly exits with only the message "Segmentation Fault".
When I try to run the code line by line in the interactive Python interpreter, I found that the following line causes the segmentation fault:
from finetuna.ml_potentials.finetuner_calc import FinetunerCalc
What could be causing this?
Some additional details:
I created the environment with conda env create -f env.gpu.yml. However, the code failed to run, and I found that calling import torch; torch.cuda.is_available() returned False. So I then ran pip3 install torch torchvision torchaudio to get a GPU-enabled PyTorch installed in my environment.
We don't need to actually make an ensemble in the online active learning until we're certain we're going to need it. For example, if the first point is the same as the training data, we should just use the data and make/update the ensemble as points are added.
https://github.com/ulissigroup/al_mlp/blob/buildfix/al_mlp/online_learner.py#L79
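The lazy-ensemble suggestion could look roughly like the sketch below. Everything here is hypothetical (class name, the dictionary of known points, the toy averaging model) and is not taken from `online_learner.py`.

```python
from typing import Callable, Dict, Hashable, Optional

Model = Callable[[Hashable], float]


class LazyEnsembleLearner:
    """Build the ensemble only when a point outside the known data is requested."""

    def __init__(self, known: Dict[Hashable, float],
                 build_ensemble: Callable[[Dict[Hashable, float]], Model]):
        self.known = dict(known)
        self.build_ensemble = build_ensemble
        self.ensemble: Optional[Model] = None
        self.n_builds = 0

    def evaluate(self, key: Hashable) -> float:
        if key in self.known:
            return self.known[key]  # already labelled: no ensemble needed
        if self.ensemble is None:
            self.ensemble = self.build_ensemble(self.known)
            self.n_builds += 1
        return self.ensemble(key)

    def add_point(self, key: Hashable, value: float) -> None:
        self.known[key] = value
        self.ensemble = None  # stale now; rebuilt on the next unseen point


def mean_model(known: Dict[Hashable, float]) -> Model:
    """Toy 'ensemble' that predicts the mean of the known values."""
    mean = sum(known.values()) / len(known)
    return lambda key: mean


learner = LazyEnsembleLearner({"a": 1.0, "b": 3.0}, mean_model)
first = learner.evaluate("a")       # present in the training data
builds_after_first = learner.n_builds
second = learner.evaluate("c")      # unseen point triggers the first build
```

Invalidating the ensemble in `add_point` and rebuilding on demand is one design choice; updating it incrementally as points arrive would be another.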
This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.
These updates have all been created already. Click a checkbox below to force a retry/rebase of any.
.circleci/config.yml
circleci/python 3.7
.github/workflows/black.yml
actions/checkout v2
.github/workflows/unittests.yml
actions/checkout v2
There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.
Location: renovate.json
Error type: The renovate configuration file contains some invalid settings
Message: Invalid configuration option: pipfile
Hello,
I recently installed the CPU version of Finetuna according to the README on our local servers and ran into a series of issues.
The first error was in pymatgen: it said that yaml.safe_load() had been removed. I edited pymatgen's local_env.py to fit the new format.
Next I got the OSError: libc10_cuda.so, no such file or directory.
Apparently, this is a shared library from PyTorch that hadn't been installed. I edited the _ops.py in the ctypes library to manually load these .so files, which I took from the GPU version of Finetuna. Are these .so files even necessary for the CPU version? Since I am not running a CUDA device, they should be redundant, right?
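One way to narrow this down, offered as a diagnostic sketch only, is to check whether the installed PyTorch is itself a CUDA build. `torch.version.cuda` is `None` for CPU-only builds. The helper name is hypothetical, and the check says nothing about other packages in the environment that may have been compiled against CUDA.

```python
def describe_torch_build() -> dict:
    """Report whether PyTorch imports and whether it is a CUDA build."""
    try:
        import torch
    except Exception as exc:  # ImportError, or OSError from a missing .so
        return {"installed": False, "error": repr(exc)}
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_torch_build()
print(info)
```

If `cuda_build` is `None` and something still asks for `libc10_cuda.so`, the request is presumably coming from a different package rather than from PyTorch itself.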
After that, a Python script in the ocpmodels directory was missing. I looked at the OCP20 repository and found that the trainer files had been merged into one, which led to the error. I edited the imports to point to the merged file.
Now I am stuck on an error that I have no idea how to solve.
File "/.../mambaforge_install/envs/finetuna/lib/python3.9/site-packages/llvmlite/binding/targets.py", line 201, in from_triple
raise RuntimeError(str(outerr))
RuntimeError: No available targets are compatible with triple "x86_64-unknown-linux-gnu"
My research on the error message was not too helpful so far. I hope that maybe another user has run into this problem or someone with more experience with llvmlite can help me to understand the problem.
I also made an environment for the GPU version, and except for the YAML issue it started without problems (and then crashed as expected, since I don't have a CUDA device to run the program on).
Is there maybe a more up-to-date version of the CPU environment? The YAML and OCP20 issues in particular seem to come from changes in other libraries that negatively affect Finetuna's functionality.
Thank you in advance!
Logic is hard to follow, lots of extraneous code.