Comments (8)
Hi @kkly1995 ,
This error usually comes up when a key that you try to compute statistics over (usually the energy or force, when computing normalization constants) isn't in your dataset.
Can you run
python -m pdb nequip/nequip/scripts/train.py path/to/minimal.yaml
and then run p field at the pdb prompt when it stops on this error?
Thanks.
from nequip.
Thank you for explaining the error. It looks like the .xyz provided by MD17 was not exactly in a format that ASE could fully parse, i.e. ASE did not read the energies and forces. After fixing the format and verifying that ASE could correctly read the energies and forces, nequip ran successfully and in fact produced a result identical to the one from configs/minimal.yaml (so the datasets are indeed the same).
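For anyone hitting the same problem, a check along these lines can confirm whether ASE actually picked up the labels. This is only a sketch: frames_missing_labels is a hypothetical helper name (not part of nequip or ASE), the file name in the usage comment is a placeholder, and it assumes that ASE exposes energies and forces read from an extxyz file through get_potential_energy() and get_forces(), raising an error when they are absent.

```python
def frames_missing_labels(frames):
    """Return indices of frames whose energy or forces cannot be retrieved.

    `frames` is any iterable of objects with the ase.Atoms interface.
    Hypothetical helper for illustration; not part of nequip or ASE.
    """
    missing = []
    for i, atoms in enumerate(frames):
        try:
            atoms.get_potential_energy()
            atoms.get_forces()
        except RuntimeError:
            # Assumption: ASE raises RuntimeError (or a subclass such as
            # NotImplementedError) when no calculator or property is attached.
            missing.append(i)
    return missing


# Usage with ASE (requires the `ase` package; "data.xyz" is a placeholder):
#   from ase.io import read
#   frames = read("data.xyz", format="extxyz", index=":")
#   print(frames_missing_labels(frames))  # an empty list means no frame lacks labels
```

If the returned list is not empty, those frames are the ones whose energy or force fields ASE could not parse.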
I actually have another ASE-related issue, if that's alright. Once again I use configs/minimal.yaml but change the data to my own (attached below), as well as the numbers n_train and n_val. My data contains 500 structures:
>>> from ase.io import read
>>> samples = read('subset.xyz', format='extxyz', index=':')
>>> len(samples)
500
and I can verify that ASE can parse the energies and forces of every structure here. However, with nequip I get the following error:
Successfully loaded the data set of type ASEDataset(100)...
Traceback (most recent call last):
  File "/home/kkly2/anaconda3/envs/nequip/bin/nequip-train", line 8, in <module>
    sys.exit(main())
  File "/home/kkly2/anaconda3/envs/nequip/lib/python3.8/site-packages/nequip/scripts/train.py", line 40, in main
    fresh_start(parse_command_line(args))
  File "/home/kkly2/anaconda3/envs/nequip/lib/python3.8/site-packages/nequip/scripts/train.py", line 125, in fresh_start
    trainer.set_dataset(dataset)
  File "/home/kkly2/anaconda3/envs/nequip/lib/python3.8/site-packages/nequip/train/trainer.py", line 1046, in set_dataset
    raise ValueError(
ValueError: too little data for training and validation. please reduce n_train and n_val
Am I correct in thinking that it is only reading 100 structures?
Glad that helped!
TODO (note to self): add this to the FAQ.
What did you set n_train and n_val to?
Here are the numbers compared to the original:
$ diff minimal.yaml ~/nequip/configs/minimal.yaml
2c2
< root: results/LaH
---
> root: results/aspirin
15,16c15,16
< dataset: ase
< dataset_file_name: subset.xyz
---
> dataset: aspirin
> dataset_file_name: benchmark_data/aspirin_ccsd-train.npz
23,25c23,25
< n_train: 400
< n_val: 100
< batch_size: 5
---
> n_train: 5
> n_val: 5
> batch_size: 1
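With these settings the run needs n_train + n_val = 400 + 100 = 500 frames, which fits a 500-frame file but not a dataset of 100 frames. A minimal sketch of that arithmetic, assuming the check simply compares the sum against the number of loaded frames (split_fits is a hypothetical name, not nequip's actual code):

```python
def split_fits(n_frames, n_train, n_val):
    """Return True if n_train + n_val frames can be drawn from n_frames."""
    return n_train + n_val <= n_frames


print(split_fits(500, 400, 100))  # True: the full file is large enough
print(split_fits(100, 400, 100))  # False: matches the ASEDataset(100) case
```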
Did you accidentally set include_frames or something? This is strange, since ASEDataset(100) does indicate that only 100 frames were loaded. Or maybe you are loading a different subset.xyz than you think you are?
I'm not entirely sure what else this could be...
It is true that I previously had a smaller dataset with the same name, subset.xyz, which had only 100 frames and which I have since removed from this directory. The error reported above occurred after I replaced it with the larger dataset. I deleted results/LaH/minimal before rerunning, but I also noticed that there is a results/LaH/processed directory. Just now I removed all of these directories, and nequip was able to read all 500 frames successfully. The error must have been an artifact left over from when I ran a similar input, using the same directory but a different subset.xyz.
Ah, that explains it. If you keep the run_name and the cutoff radius the same between two runs, then the code will read the previously processed dataset from file (in your case the one with 100 frames) instead of recomputing it. So it loaded the cached 100-frame version rather than processing the new 500-frame file.
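One way to force reprocessing is to delete the cached directory before rerunning. A minimal sketch, assuming the cache lives in a processed subdirectory under the root set in the config (as with results/LaH/processed above); clear_processed_cache is a hypothetical helper, not a nequip function:

```python
import shutil
from pathlib import Path


def clear_processed_cache(root):
    """Delete <root>/processed if it exists; return True if it was removed."""
    processed = Path(root) / "processed"
    if processed.is_dir():
        shutil.rmtree(processed)
        return True
    return False


# Example: clear_processed_cache("results/LaH")
```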
We will make that clearer in the docs. Thanks for the notice.
Closing this.
Worth noting that this issue of cached processed versions of datasets getting out of sync with your dataset settings is resolved in the current beta version. However, that does not cover changes to the actual data file that is read from; we don't want to waste time reading it just to check whether something has changed, so you are still responsible for making sure that you reprocess if you change the contents of a data file.
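If you want to automate that check yourself, one option is to keep a fingerprint of the data file and compare it before each run. A rough sketch using only the Python standard library; the function names and the stamp file are illustrative assumptions and have nothing to do with how nequip caches data:

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash the file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def data_file_changed(data_file, stamp_file):
    """Return True if data_file differs from the hash saved in stamp_file.

    The stamp file is rewritten on every call, so a second call with an
    unchanged data file returns False. A missing stamp counts as changed.
    """
    current = file_sha256(data_file)
    stamp = Path(stamp_file)
    previous = stamp.read_text().strip() if stamp.exists() else None
    stamp.write_text(current)
    return previous != current
```

When data_file_changed returns True, delete the processed data before training. Note that this does read the whole file on every run, which is exactly the cost described above, so it is a trade-off rather than a free check.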