
alignn's Introduction



ALIGNN & ALIGNN-FF (Introduction)

The Atomistic Line Graph Neural Network (https://www.nature.com/articles/s41524-021-00650-1) introduces a new graph convolution layer that explicitly models both two-body and three-body interactions in atomistic systems. This is achieved by composing two edge-gated graph convolution layers: the first applied to the atomistic line graph L(g) (representing triplet interactions) and the second applied to the atomistic bond graph g (representing pair interactions).

A unified force-field model, ALIGNN-FF (https://pubs.rsc.org/en/content/articlehtml/2023/dd/d2dd00096b), was developed that can model both structurally and chemically diverse solids with any combination of 89 elements from the periodic table.

ALIGNN layer schematic

Installation

First, create a conda environment. Install Miniconda from https://conda.io/miniconda.html; based on your system requirements, you will get a file named something like 'Miniconda3-latest-XYZ'.

Now run the installer:

bash Miniconda3-latest-Linux-x86_64.sh (for Linux)
bash Miniconda3-latest-MacOSX-x86_64.sh (for Mac)

On Windows, download the 32/64-bit Python 3.10 Miniconda installer (.exe) and run it.

Method 1 (conda based installation)

Now, let's make a conda environment, say "my_alignn"; choose another name if you like:

conda create --name my_alignn python=3.10
conda activate my_alignn
conda install alignn -y

optional GPU dependencies notes

If you need CUDA support, it's best to install PyTorch and DGL before installing alignn to ensure that you get a CUDA-enabled version of DGL.

conda install dgl=2.1.0 pytorch torchvision torchaudio pytorch-cuda -c pytorch -c nvidia

Method 2 (edit/debug in-place install)

You can also install a development version of alignn by cloning the repository and installing it in place with pip:

git clone https://github.com/usnistgov/alignn
cd alignn
python -m pip install -e .

Method 3 (using PyPI)

Alternatively, ALIGNN can also be installed from PyPI with pip:

pip install alignn
pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/cu121/repo.html

Examples

The following examples are available as Google Colab notebooks:

  • Regression model: developing a single-output regression model for exfoliation energies of 2D materials.
  • MLFF: training a machine-learning force field for silicon.
  • ALIGNN-FF (Relaxer + EV_curve + Phonons + Interface gamma_surface + Interface separation): using the pre-trained ALIGNN-FF force-field model.
  • Miscellaneous tasks: developing single-output (e.g., formation energy, bandgap) or multi-output (e.g., phonon DOS, electron DOS) regression or classification models (e.g., metal vs. non-metal), and using several pretrained models.

Here, we provide examples for property prediction tasks, development of machine-learning force fields (MLFF), and usage of pre-trained property predictors, MLFFs, web apps, etc.

Dataset preparation for property prediction tasks

The main script to train a model is train_alignn.py. A user needs at least the following to train a model: 1) an id_prop.csv file listing each structure file name and its corresponding target value, and 2) a config file (config_example.json) with the training settings and hyperparameters.

Users can keep their structure files in POSCAR, .cif, .xyz, or .pdb format in a directory. In the examples below we use POSCAR files. The same directory should contain an id_prop.csv file.

In this id_prop.csv file, the structure file names and corresponding target values are stored in comma-separated values (CSV) format.
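
For illustration, an id_prop.csv can be assembled with a short script such as the sketch below (the file names and target values are placeholders; substitute your own structure files and properties):

# Minimal sketch: write an id_prop.csv mapping structure file names to target values.
# The entries below are placeholders, not real data.
import csv

entries = [
    ("POSCAR-1.vasp", 1.23),  # (structure file name, target value)
    ("POSCAR-2.vasp", 0.45),
]

with open("id_prop.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for filename, value in entries:
        writer.writerow([filename, value])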

Here is an example of training OptB88vdw bandgaps for 50 materials from the JARVIS-DFT database. The example is created using the generate_sample_data_reg.py script. Users can modify the script for more than 50 materials, or make their own dataset in this format. For a list of available datasets, see Databases.

The dataset is split 80:10:10 into training, validation, and test sets (controlled by train_ratio, val_ratio, and test_ratio). To change the split proportions and other parameters, edit the config_example.json file. If users want to train on one set and validate/test on another, they can set n_train, n_val, and n_test manually in config_example.json and also set keep_data_order to True so that random shuffling is disabled.
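
For example, a copy of the config with explicit split sizes and a fixed data order can be written with a small script like the sketch below (the split sizes are placeholders; the keys are the ones named above):

# Sketch: set explicit split sizes and disable shuffling in a copy of the config.
import json

with open("alignn/examples/sample_data/config_example.json") as f:
    config = json.load(f)

config["n_train"] = 40            # explicit counts instead of train_ratio/val_ratio/test_ratio
config["n_val"] = 5
config["n_test"] = 5
config["keep_data_order"] = True  # keep the id_prop.csv order; no random shuffle

with open("config_custom.json", "w") as f:
    json.dump(config, f, indent=2)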

A brief help guide (-h) can be obtained as follows.

train_alignn.py -h

Regression example

Now, the model is trained as follows. Please increase the batch_size parameter to something like 32 or 64 in config_example.json for general training.

train_alignn.py --root_dir "alignn/examples/sample_data" --config "alignn/examples/sample_data/config_example.json" --output_dir=temp

Classification example

While the above example is for regression, the following example shows a classification task for metal/non-metal based on the above bandgap values. We transform the dataset into 1 or 0 based on a threshold of 0.01 eV (controlled by the classification_threshold parameter) and train a similar classification model. Currently, the script supports binary classification tasks only.

train_alignn.py --root_dir "alignn/examples/sample_data" --classification_threshold 0.01 --config "alignn/examples/sample_data/config_example.json" --output_dir=temp
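
Conceptually, the classification_threshold option binarizes the continuous target; a standalone sketch of that transformation (not ALIGNN code, and the 0/1 label convention here is only illustrative) looks like:

# Sketch: map bandgap values to binary labels at a 0.01 eV threshold.
# The label convention here (1 = above threshold) is illustrative;
# ALIGNN's internal convention may differ.
threshold = 0.01  # eV, the value passed as --classification_threshold
bandgaps = [0.0, 0.005, 0.8, 2.3]  # example target values in eV

labels = [1 if value > threshold else 0 for value in bandgaps]
print(labels)  # [0, 0, 1, 1]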

Multi-output model example

While the above regression example was for single-output values, we can train multi-output regression models as well. An example is given below for training formation energy per atom, bandgap, and total energy per atom simultaneously. The script to generate the example data is provided in the script folder of sample_data_multi_prop. Another example, training electron and phonon densities of states, is also provided.

train_alignn.py --root_dir "alignn/examples/sample_data_multi_prop" --config "alignn/examples/sample_data/config_example.json" --output_dir=temp

Automated model training

Users can train on multiple datasets (such as JARVIS-DFT, Materials Project, QM9_JCTC, etc.) using the example scripts in alignn/scripts/train_*.py. This is done primarily to make the trainings more automated rather than creating folders and CSV files by hand. These scripts automatically download datasets from the Databases in jarvis-tools and train several models. Make sure you specify your particular queuing-system details in the scripts.

Other examples

Additional example trainings are available for 2D-exfoliation energy and superconductor transition temperature.

Using pre-trained models

All the trained models are distributed on Figshare (https://figshare.com/projects/ALIGNN_models/126478).

The pretrained.py script can be used to load these models and make predictions directly.

A brief help section (-h) is shown using:

pretrained.py -h

An example of predicting the formation energy per atom using the model trained on the JARVIS-DFT dataset is shown below:

pretrained.py --model_name jv_formation_energy_peratom_alignn --file_format poscar --file_path alignn/examples/sample_data/POSCAR-JVASP-10.vasp

Web-app

A basic web app for direct prediction is available at the JARVIS-ALIGNN app. Given an atomistic structure in POSCAR format, it predicts the formation energy, total energy per atom, and bandgap using models trained on the JARVIS-DFT dataset.

JARVIS-ALIGNN

ALIGNN-FF

The atomistic line graph neural network-based force field (ALIGNN-FF) can be used to model both structurally and chemically diverse systems with any combination of 89 elements from the periodic table. To train the ALIGNN-FF model, we used the JARVIS-DFT dataset, which contains around 75,000 materials and 4 million energy-force entries, of which 307,113 are used in the training. These models can be further fine-tuned, or new models can be developed from scratch on a new dataset.

The ASE calculator provides an interface to various codes. An example for ALIGNN-FF is given below. Note that there are multiple pretrained ALIGNN-FF models available; here we use the default_path model. As more accurate models are developed, they will be made available as well:

from alignn.ff.ff import (
    AlignnAtomwiseCalculator,
    default_path,
    mptraj_path,  # alternative pretrained model paths
    wt01_path,
)
import time

import numpy as np
import matplotlib.pyplot as plt
from ase.build import bulk

%matplotlib inline

model_path = default_path()
calc = AlignnAtomwiseCalculator(path=model_path)

t1 = time.time()
# a = 5.43
lattice_params = np.linspace(5.2, 5.6)
fcc_energies = []
for a in lattice_params:
    atoms = bulk("Si", "diamond", a=a)
    atoms.set_tags(np.ones(len(atoms)))
    atoms.calc = calc
    e = atoms.get_potential_energy()
    fcc_energies.append(e)
t2 = time.time()
print("Time", t2 - t1)
plt.plot(lattice_params, fcc_energies, "-o")
plt.title("Si")
plt.xlabel("Lattice constant ($\AA$)")
plt.ylabel("Total energy (eV)")
plt.show()

To train ALIGNN-FF, use the train_alignn.py script, which uses the atomwise_alignn model:

The atomwise setup is similar to the property-prediction setup above, but instead of an id_prop.csv it requires an id_prop.json file (see the example in the sample_data_ff directory). An example script to compile vasprun.xml files into an id_prop.json is kept here. Note that ALIGNN-FF requires the energy to be stored as energy per atom:

train_alignn.py --root_dir "alignn/examples/sample_data_ff" --config "alignn/examples/sample_data_ff/config_example_atomwise.json" --output_dir=temp

To fine-tune a model, additionally pass the --restart_model_path flag in the command above with the path to a pretrained ALIGNN-FF model that has the same model configuration.

An example of training an MLFF for silicon is provided here. It is highly recommended to become familiar with this example before developing a new model. Note: newer model config options such as lg_on_fly and add_reverse_forces should default to True for newer versions. For MD runs, use_cutoff_function is recommended.
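
As a sketch of how those options might be switched on, the snippet below edits a copy of the atomwise config; the exact location of these keys (top level vs. the model block) is an assumption, so check your config_example_atomwise.json:

# Sketch: enable the newer ALIGNN-FF options mentioned above in a copy of the config.
# Where these keys live in the JSON is an assumption; verify against your config file.
import json

path = "alignn/examples/sample_data_ff/config_example_atomwise.json"
with open(path) as f:
    config = json.load(f)

model_cfg = config.get("model", config)  # fall back to the top level if there is no "model" block
model_cfg["lg_on_fly"] = True
model_cfg["add_reverse_forces"] = True
model_cfg["use_cutoff_function"] = True  # recommended for MD runs (see note above)

with open("config_atomwise_custom.json", "w") as f:
    json.dump(config, f, indent=2)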

A pretrained ALIGNN-FF model (currently under active development) can be used to predict several properties, for example:

run_alignn_ff.py --file_path alignn/examples/sample_data/POSCAR-JVASP-10.vasp --task="unrelaxed_energy"
run_alignn_ff.py --file_path alignn/examples/sample_data/POSCAR-JVASP-10.vasp --task="optimize"
run_alignn_ff.py --file_path alignn/examples/sample_data/POSCAR-JVASP-10.vasp --task="ev_curve"

To see the other available tasks, type:

run_alignn_ff.py -h

Several supporting scripts for structure optimization, equation of state, phonon, and related calculations are provided in the repo as well. If you need further assistance with a particular task, feel free to raise a GitHub issue.

Performances

Please refer to JARVIS-Leaderboard to check the performance of ALIGNN models on several databases.

1) On JARVIS-DFT 2021 dataset (classification)

| Model | Threshold | ALIGNN |
|---|---|---|
| Metal/non-metal classifier (OPT) | 0.01 eV | 0.92 |
| Metal/non-metal classifier (MBJ) | 0.01 eV | 0.92 |
| Magnetic/non-Magnetic classifier | 0.05 µB | 0.91 |
| High/low SLME | 10 % | 0.83 |
| High/low spillage | 0.1 | 0.80 |
| Stable/unstable (ehull) | 0.1 eV | 0.94 |
| High/low-n-Seebeck | -100 µVK-1 | 0.88 |
| High/low-p-Seebeck | 100 µVK-1 | 0.92 |
| High/low-n-powerfactor | 1000 µW(mK2)-1 | 0.74 |
| High/low-p-powerfactor | 1000 µW(mK2)-1 | 0.74 |

2) On JARVIS-DFT 2021 dataset (regression)

| Property | Units | MAD | CFID | CGCNN | ALIGNN | MAD:MAE |
|---|---|---|---|---|---|---|
| Formation energy | eV(atom)-1 | 0.86 | 0.14 | 0.063 | 0.033 | 26.06 |
| Bandgap (OPT) | eV | 0.99 | 0.30 | 0.20 | 0.14 | 7.07 |
| Total energy | eV(atom)-1 | 1.78 | 0.24 | 0.078 | 0.037 | 48.11 |
| Ehull | eV | 1.14 | 0.22 | 0.17 | 0.076 | 15.00 |
| Bandgap (MBJ) | eV | 1.79 | 0.53 | 0.41 | 0.31 | 5.77 |
| Kv | GPa | 52.80 | 14.12 | 14.47 | 10.40 | 5.08 |
| Gv | GPa | 27.16 | 11.98 | 11.75 | 9.48 | 2.86 |
| Mag. mom | µB | 1.27 | 0.45 | 0.37 | 0.26 | 4.88 |
| SLME (%) | No unit | 10.93 | 6.22 | 5.66 | 4.52 | 2.42 |
| Spillage | No unit | 0.52 | 0.39 | 0.40 | 0.35 | 1.49 |
| Kpoint-length | Å | 17.88 | 9.68 | 10.60 | 9.51 | 1.88 |
| Plane-wave cutoff | eV | 260.4 | 139.4 | 151.0 | 133.8 | 1.95 |
| εx (OPT) | No unit | 57.40 | 24.83 | 27.17 | 20.40 | 2.81 |
| εy (OPT) | No unit | 57.54 | 25.03 | 26.62 | 19.99 | 2.88 |
| εz (OPT) | No unit | 56.03 | 24.77 | 25.69 | 19.57 | 2.86 |
| εx (MBJ) | No unit | 64.43 | 30.96 | 29.82 | 24.05 | 2.68 |
| εy (MBJ) | No unit | 64.55 | 29.89 | 30.11 | 23.65 | 2.73 |
| εz (MBJ) | No unit | 60.88 | 29.18 | 30.53 | 23.73 | 2.57 |
| ε (DFPT:elec+ionic) | No unit | 45.81 | 43.71 | 38.78 | 28.15 | 1.63 |
| Max. piezoelectric strain coeff (dij) | CN-1 | 24.57 | 36.41 | 34.71 | 20.57 | 1.19 |
| Max. piezo. stress coeff (eij) | Cm-2 | 0.26 | 0.23 | 0.19 | 0.147 | 1.77 |
| Exfoliation energy | meV(atom)-1 | 62.63 | 63.31 | 50.0 | 51.42 | 1.22 |
| Max. EFG | 10^21 Vm-2 | 43.90 | 24.54 | 24.7 | 19.12 | 2.30 |
| avg. me | electron mass unit | 0.22 | 0.14 | 0.12 | 0.085 | 2.59 |
| avg. mh | electron mass unit | 0.41 | 0.20 | 0.17 | 0.124 | 3.31 |
| n-Seebeck | µVK-1 | 113.0 | 56.38 | 49.32 | 40.92 | 2.76 |
| n-PF | µW(mK2)-1 | 697.80 | 521.54 | 552.6 | 442.30 | 1.58 |
| p-Seebeck | µVK-1 | 166.33 | 62.74 | 52.68 | 42.42 | 3.92 |
| p-PF | µW(mK2)-1 | 691.67 | 505.45 | 560.8 | 440.26 | 1.57 |

3) On Materials project 2018 dataset

The results from models other than ALIGNN are reported as given in corresponding papers, not necessarily reproduced by us.

| Prop | Unit | MAD | CFID | CGCNN | MEGNet | SchNet | ALIGNN | MAD:MAE |
|---|---|---|---|---|---|---|---|---|
| Ef | eV(atom)-1 | 0.93 | 0.104 | 0.039 | 0.028 | 0.035 | 0.022 | 42.27 |
| Eg | eV | 1.35 | 0.434 | 0.388 | 0.33 | - | 0.218 | 6.19 |

4) On QM9 dataset

Note the known issue related to the QM9 dataset. The results from models other than ALIGNN are reported as given in the corresponding papers, not necessarily reproduced by us. These models were trained with the same parameters as for the solid-state databases, but for 1000 epochs.

| Target | Units | SchNet | MEGNet | DimeNet++ | ALIGNN |
|---|---|---|---|---|---|
| HOMO | eV | 0.041 | 0.043 | 0.0246 | 0.0214 |
| LUMO | eV | 0.034 | 0.044 | 0.0195 | 0.0195 |
| Gap | eV | 0.063 | 0.066 | 0.0326 | 0.0381 |
| ZPVE | eV | 0.0017 | 0.00143 | 0.00121 | 0.0031 |
| µ | Debye | 0.033 | 0.05 | 0.0297 | 0.0146 |
| α | Bohr3 | 0.235 | 0.081 | 0.0435 | 0.0561 |
| R2 | Bohr2 | 0.073 | 0.302 | 0.331 | 0.5432 |
| U0 | eV | 0.014 | 0.012 | 0.00632 | 0.0153 |
| U | eV | 0.019 | 0.013 | 0.00628 | 0.0144 |
| H | eV | 0.014 | 0.012 | 0.00653 | 0.0147 |
| G | eV | 0.014 | 0.012 | 0.00756 | 0.0144 |

5) On hMOF dataset

| Property | Unit | MAD | MAE | MAD:MAE | R2 | RMSE |
|---|---|---|---|---|---|---|
| Grav. surface area | m2 g-1 | 1430.82 | 91.15 | 15.70 | 0.99 | 180.89 |
| Vol. surface area | m2 cm-3 | 561.44 | 107.81 | 5.21 | 0.91 | 229.24 |
| Void fraction | No unit | 0.16 | 0.017 | 9.41 | 0.98 | 0.03 |
| LCD | Å | 3.44 | 0.75 | 4.56 | 0.83 | 1.83 |
| PLD | Å | 3.55 | 0.92 | 3.86 | 0.78 | 2.12 |
| All adsp | mol kg-1 | 1.70 | 0.18 | 9.44 | 0.95 | 0.49 |
| Adsp at 0.01bar | mol kg-1 | 0.12 | 0.04 | 3.00 | 0.77 | 0.11 |
| Adsp at 2.5bar | mol kg-1 | 2.16 | 0.48 | 4.50 | 0.90 | 0.97 |

6) On qMOF dataset

MAE on electronic bandgap: 0.20 eV

7) On OMDB dataset

coming soon!

8) On HOPV dataset

coming soon!

9) On QETB dataset

coming soon!

10) On OpenCatalyst dataset

On 10k dataset:

| DataSplit | CGCNN | DimeNet | SchNet | DimeNet++ | ALIGNN | MAD:MAE |
|---|---|---|---|---|---|---|
| 10k | 0.988 | 1.0117 | 1.059 | 0.8837 | 0.61 | - |

Useful notes (based on some of the queries we received)

  1. If you are using GPUs, make sure you have a compatible dgl-cuda version installed, for example dgl-cu101 or dgl-cu111 (e.g., pip install dgl-cu111).
  2. While conventional '.cif' and '.pdb' files can be read using jarvis-tools, for complex files you might have to install cif2cell and pytraj, respectively, i.e., pip install cif2cell==2.0.0a3 and conda install -c ambermd pytraj.
  3. Make sure you use a batch_size of 32 or 64 for large datasets, and not 2 as given in the example config file, else training will take much longer and performance might drop a lot.
  4. Note that train_alignn.py and pretrained.py in the alignn folder are Python executable scripts, so they should work even if you don't provide their absolute paths.
  5. Learn about the issue with QM9 results here: #54
  6. Make sure you have pandas version >1.2.3.
  7. Starting March 2024, the pytorch-ignite dependency will be removed to enable a conda-forge build.

References

  1. Atomistic Line Graph Neural Network for improved materials property predictions
  2. Prediction of the Electron Density of States for Crystalline Compounds with Atomistic Line Graph Neural Networks (ALIGNN)
  3. Recent advances and applications of deep learning methods in materials science
  4. Designing High-Tc Superconductors with BCS-inspired Screening, Density Functional Theory and Deep-learning
  5. A Deep-learning Model for Fast Prediction of Vacancy Formation in Diverse Materials
  6. Graph neural network predictions of metal organic framework CO2 adsorption properties
  7. Rapid Prediction of Phonon Structure and Properties using an Atomistic Line Graph Neural Network (ALIGNN)
  8. Unified graph neural network force-field for the periodic table
  9. Large Scale Benchmark of Materials Design Methods

Please see detailed publications list here.

How to contribute

For detailed instructions, please see Contribution instructions

Correspondence

Please report bugs as Github issues (https://github.com/usnistgov/alignn/issues) or email to [email protected].

Funding support

NIST-MGI (https://www.nist.gov/mgi).

Code of conduct

Please see Code of conduct


alignn's Issues

Cannot reproduce results for the Magnetic/non-Magnetic classifier on JARVIS-3D dataset

I am trying to train the ALIGNN model from scratch using the JARVIS-3D dataset to reproduce the results of the magnetic/non-magnetic classification task. I used the same model parameters as in the paper and have attached the parameters I used for the config file below.

The model trains but seems to overfit the training data as shown by the results on the validation dataset. Here are the results from the last few epochs I get after ~30 epochs:

Train ROC AUC: 0.9761
Val ROC AUC: 0.4948
Train ROC AUC: 0.9816
Val ROC AUC: 0.4894
Train ROC AUC: 0.9860
Val ROC AUC: 0.4954

How to Reproduce this issue:

  1. Create a Python script called pretraining.py in the current working directory and dump this code into it:
from jarvis.db.figshare import data
from jarvis.core.atoms import Atoms
import json
import os
import csv

# Define data sources and target features
d = data('dft_3d')
target_feature_name = 'magmom_outcar'

# Create a list of target feature values, removing any 'na' entries
target = [material[target_feature_name] for material in d if material[target_feature_name] != 'na']

# Get the id and structure of each material
material_id = [material['jid'] for material in d]
material_structure = [Atoms.from_dict(material['atoms']) for material in d]

# Name of the folder to store the training data
train_folder_name = 'train-folder'

# Create the training folder if it doesn't exist
if not os.path.exists(train_folder_name):
    os.makedirs(train_folder_name)
    print(f"Folder '{train_folder_name}' created successfully.")
else:
    print(f"Folder '{train_folder_name}' already exists.")

# Write each structure to a .vasp file in the training folder
for struct, id in zip(material_structure,material_id):
    struct.write_poscar(os.path.join(train_folder_name, f"{id}.vasp"))

# Get a list of all the .vasp files in the training folder
all_files = os.listdir(train_folder_name)
file_names = [x for x in all_files if x.endswith('vasp')]

# Write the filenames and target values to a .csv file in the training folder
with open(os.path.join(train_folder_name, 'id_prop.csv'), 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for value1, value2 in zip(file_names, target):
        writer.writerow([value1, value2])

# Define a configuration dictionary for the ALIGNN model
data = {
    "version": "112bbedebdaecf59fb18e11c929080fb2f358246",
    "dataset": "user_data",
    "target": "target",
    "atom_features": "cgcnn",
    "neighbor_strategy": "k-nearest",
    "id_tag": "jid",
    "random_seed": 123,
    "classification_threshold": 0.05,
    "n_val": None,
    "n_test": None,
    "n_train": None,
    "train_ratio": 0.8,
    "val_ratio": 0.1,
    "test_ratio": 0.1,
    "target_multiplication_factor": None,
    "epochs": 40,
    "batch_size": 64,
    "weight_decay": 1e-05,
    "learning_rate": 0.001,
    "filename": "sample",
    "warmup_steps": 2000,
    "criterion": "mse",
    "optimizer": "adamw",
    "scheduler": "onecycle",
    "pin_memory": False,
    "save_dataloader": False,
    "write_checkpoint": True,
    "write_predictions": True,
    "store_outputs": True,
    "progress": True,
    "log_tensorboard": False,
    "standard_scalar_and_pca": False,
    "use_canonize": True,
    "num_workers": 0,
    "cutoff": 8.0,
    "max_neighbors": 12,
    "keep_data_order": False,
    "model": {
        "name": "alignn",
        "alignn_layers": 4,
        "gcn_layers": 4,
        "atom_input_features": 92,
        "edge_input_features": 80,
        "triplet_input_features": 40,
        "embedding_features": 64,
        "hidden_features": 256,
        "output_features": 1,
        "link": "identity",
        "zero_inflated": False,
        "classification": True
    }
}

with open('{}/config.json'.format(train_folder_name), 'w') as f:
    json.dump(data, f)

Run the script using python pretraining.py. This should create a training folder and all the necessary files for training.

  2. Run the training command in the same directory:
    train_folder.py --root_dir "train-folder" --config "train-folder/config.json" --output_dir='magnetic-class-output' > train.log

The model will train and log training results similar to what I have shown above.

QM9 test results

Dear maintainers,

I just stumbled across your paper and found it very interesting.

I was wondering whether test results on the QM9 database were compared against the other methodologies presented in Table 5, and if these could be shared. I think these could be useful for the community.

Many thanks!

Out of Memory Bug

I trained the model with 28000 CIF files, but every time I ran it, I got this error:
"slurmstepd: error: Detected 1 oom-kill event(s) in StepId=59934605.batch. Some of your processes may have been killed by the cgroup out-of-memory handler."
I already used 500 GB of CPU memory; why is it still out of memory?

Estimate DimeNet++ workload on MP dataset

In addition to timing/performance comparison on QM9:

  • time a few iterations or 1 epoch to estimate workload of full training
  • if feasible, train on JV (or a subset?) to facilitate comparison
  • also if feasible, train on MP

load graph data directly from disk to reduce memory requirements for large datasets

e.g. the MEGNet dataset is rather large; so are OQMD and AFLOW

proposal: store graph data in hdf5 using dataset index as main dataset keys
Alternate key: use the structure identifier, e.g. "JVASP-1234" or "MP-5678"

import h5py
import pandas as pd
from jarvis.db.figshare import data as jdata

id_key = "jid"
df = pd.DataFrame(jdata("dft_3d"))

with h5py.File("dft_3d.hdf5", "w") as f:
    for idx, row in df.iterrows():
        # store graph data in an HDF5 group keyed with the structure id
        # e.g. "JVASP-1234"
        identifier = row[id_key]
        group = f.create_group(identifier)
        ndata = group.create_group("ndata")
        edata = group.create_group("edata")

        # build_dgl_graph is a placeholder for the graph construction routine
        graph = build_dgl_graph(row["structure"])

        # store edge list representation
        u, v = graph.edges()
        group["u"] = u
        group["v"] = v

        # store node data in a subgroup "ndata"
        # e.g. f["JVASP-1234/ndata/atomic_number"]
        # ndata["atomic_number"] = graph.ndata["atomic_number"]
        for key, node_feature in graph.ndata.items():
            ndata[key] = node_feature

        # store edge data in a subgroup "edata"
        # e.g. f["JVASP-1234/edata/r"]
        # edata["r"] = graph.edata["r"]
        for key, edge_feature in graph.edata.items():
            edata[key] = edge_feature

Then dataloading can look like

class StructureDataset():
    def __getitem__(self, idx):

        # https://discuss.pytorch.org/t/dataloader-when-num-worker-0-there-is-bug/25643/16
        if self.dataset is None:
            self.dataset = h5py.File("dft_3d.hdf5", "r")

        # look up structure id, e.g. "JVASP-1234"
        key = self.identifiers[idx]
        group = self.dataset[key]
        ndata = group["ndata"]
        edata = group["edata"]

        # load graph from edge list
        g = dgl.graph((group["u"][...], group["v"][...]))

        for key in ndata:
            g.ndata[key] = ndata[key][...]

        for key in edata:
            g.edata[key] = edata[key][...]

        return g

Running ALIGNN on Multi-GPUs

Dear All,

I would like to run ALIGNN on multi GPUs. When I checked the code I could not find any option.

Is there any method to run ALIGNN on multi GPUs such as using PyTorch Lightning or DDP function from PyTorch (Distributed Data Parallel)?

Best regards,
Mirac

Are the forces from alignn conservative?

if self.config.include_pos_deriv:
    # Not tested yet
    g.ndata["coords"].requires_grad_(True)
    dx = [g.ndata["coords"], r]
else:
    dx = r
if self.config.energy_mult_natoms:
    en_out = out * g.num_nodes()
else:
    en_out = out
# force calculation based on bond displacement vectors
# autograd gives dE / d{r_{i->j}}
pair_forces = (
    self.config.grad_multiplier
    * grad(
        en_out,
        dx,
        grad_outputs=torch.ones_like(en_out),
        create_graph=True,
        retain_graph=True,
    )[0]
)
if self.config.force_mult_natoms:
    pair_forces *= g.num_nodes()

This snippet of code shows that the automatic differentiation is done with respect to the pairwise distances if using defaults. This means that the forces can only capture 2-body effects but the architecture models 3-body terms.
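
To make the distinction concrete, here is a small standalone PyTorch sketch (toy energy, not ALIGNN code) comparing the gradient of a pairwise energy with respect to the bond displacement vectors r_ij versus the atomic positions x_i:

# Toy sketch: gradients w.r.t. bond vectors vs. atomic positions.
import torch

x = torch.randn(3, 3, requires_grad=True)  # positions of three atoms

# bond displacement vectors r_ij = x_j - x_i for a few pairs
pairs = [(0, 1), (1, 2), (0, 2)]
r = torch.stack([x[j] - x[i] for i, j in pairs])
r.retain_grad()  # keep gradients on the non-leaf bond vectors

energy = (r ** 2).sum()  # toy pairwise energy: sum of squared bond lengths
energy.backward()

print("dE/dr_ij (per-bond gradients):", r.grad)
print("-dE/dx_i (forces on atoms):   ", -x.grad)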

Using a Trained Model

Dear All,

I successfully trained a model with ALIGNN. There is "pretrained.py" script that import some models according to datasets.

My question is how I can use my trained model to predict a structure? Do I have to modify the "pretrained.py" script to import my model or can I do something like this:

python alignn/pretrained.py --model_name <my_output_pt_file> --file_format poscar --file_path /path/to/sample_file

Thanks for the help.

CIF File non-iterable NoneType object Error

Dear All,

I train ALIGNN with cif files. To improve the performance, I tried to augment my cif files with AugLiChem library. Here is the snippet from the original file and the augmented cif file:

Original file:

data_image0
_chemical_formula_structural       H10C14S2N2O2
_chemical_formula_sum              "H10 C14 S2 N2 O2"
_cell_length_a       4.3258
_cell_length_b       8.982
_cell_length_c       8.4721
_cell_angle_alpha    90
_cell_angle_beta     90.594
_cell_angle_gamma    90

_space_group_name_H-M_alt    "P 1"
_space_group_IT_number       1

loop_
  _space_group_symop_operation_xyz
  'x, y, z'

loop_
  _atom_site_type_symbol
  _atom_site_label
  _atom_site_symmetry_multiplicity
  _atom_site_fract_x
  _atom_site_fract_y
  _atom_site_fract_z
  _atom_site_occupancy
  H   H1        1.0  0.83700  0.83300  0.55800  1.0000
  H   H2        1.0  0.16300  0.33300  0.44200  1.0000
.
.
.
(continues)

Augmented file:

# generated using pymatgen
data_H5C7SNO
_symmetry_space_group_name_H-M   'P 1'
_cell_length_a   4.32580000
_cell_length_b   8.98200000
_cell_length_c   8.47210000
_cell_angle_alpha   90.00000000
_cell_angle_beta   90.59400000
_cell_angle_gamma   90.00000000
_symmetry_Int_Tables_number   1
_chemical_formula_structural   H5C7SNO
_chemical_formula_sum   'H10 C14 S2 N2 O2'
_cell_volume   329.16012678
_cell_formula_units_Z   2
loop_
 _symmetry_equiv_pos_site_id
 _symmetry_equiv_pos_as_xyz
  1  'x, y, z'
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  H  H0  1  0.83420779  0.82961480  0.54722879  1.0
  H  H1  1  0.15729856  0.32521300  0.43626670  1.0
.
.
.
(continues)

ALIGNN works fine with the original CIF files, but whenever I try to train it with an augmented file, I encounter the following error:

Using backend: pytorch
Traceback (most recent call last):
  File "/raid/apps/alignn/2021/bin/train_folder.py", line 195, in <module>
    train_for_folder(
  File "/raid/apps/alignn/2021/bin/train_folder.py", line 103, in train_for_folder
    atoms = Atoms.from_cif(file_path)
  File "/raid/apps/alignn/2021/lib/python3.8/site-packages/jarvis/core/atoms.py", line 537, in from_cif
    cif_atoms = cif_atoms.get_primitive_atoms
  File "/raid/apps/alignn/2021/lib/python3.8/site-packages/jarvis/core/atoms.py", line 710, in get_primitive_atoms
    return Spacegroup3D(self).primitive_atoms
  File "/raid/apps/alignn/2021/lib/python3.8/site-packages/jarvis/analysis/structure/spacegroup.py", line 240, in primitive_atoms
    lattice, scaled_positions, numbers = spglib.find_primitive(
TypeError: cannot unpack non-iterable NoneType object

I cannot see a problem in the augmented files. Do you have any suggestions?

Best regards,

compute training size learning curves

let's do this in cross validation to address the stability question

strategy: 5x shuffle-split validation scheme to keep things simple. schedule runs using ray tune grid_search

report results for jarvis-55k formation energy and band gap targets.

  • jarvis-55k formation energy
  • jarvis-55k band gap
  • publication-quality plots
  • integrate into manuscript

Python API for web form?

Is the JARVIS-ALIGNN web interface accessible via Python API? I'd like to get predictions for a few dozen POSCAR files without pasting them all in manually.

prediction_results_train_set.csv serialization bug

for single-output regression tasks, the predictions serialized to the datafile prediction_results_train_set.csv in this block of code are not as expected

nominally this code should read predicted and target values from the EpochOutputStore and just write them to csv

  • by default this seems to serialize the validation set predictions, not training set predictions as implied by the file name
  • there are two f.write statements, and the target and prediction values are swapped...

phonon spectra from ALIGNN force fields

Hello!
It is probably not a bug report but a question about using ALIGNN force fields to calculate phonon spectra of solids. In the entropy_from_FF Jupyter notebook there is an example showing a nice phonon band structure calculated with ALIGNN force fields for Cu, but when I try to use it for other systems, e.g. "feg" from the same notebook or some compounds known to be stable, I obtain band structures with many negative branches. This is independent of the supercell size and geometry optimization. So my question: is this inevitable, or can I somehow get rid of these negative branches?

Best regards,
Anton.

OMDB Dataset Import Error

Dear All,

First of all, thanks for creating ALIGNN tool.

I am trying to train a model with the OMDB dataset to obtain bandgap predictions. The dataset contains xyz files of molecules and their bandgap values. It is also included in the JARVIS documentation:
https://jarvis-tools.readthedocs.io/en/master/databases.html

I am following the README file on the ALIGNN page. I generated my xyz samples from the dataset as follows without any problem:

from jarvis.db.figshare import data as jdata
from jarvis.core.atoms import Atoms

omdbset = jdata("omdb")
prop = "bandgap"

max_samples = 12500
f = open("id_prop.csv", "w")
count = 0
for i in omdbset:
    atoms = Atoms.from_dict(i["atoms"])
    cod_id = i["cod_id"]
    xyz_name = "OMDB-" + cod_id + ".xyz"
    target = i[prop]
    if target != "na":
        atoms.write_xyz(xyz_name)
        f.write("%s,%6f\n" % (xyz_name, target))
        count += 1
    if count == max_samples:
        break
f.close()

I used the config.json file like you did for the QM9 training. Just the following 2 lines are different:

"dataset": "omdb", "target": "bandgap",

When I tried to run the code like this, it gave the following errors:

python /home/fsysadmin/alignn/alignn/train_folder.py --root_dir "/home/fsysadmin/alignn/omdb_tests/prep_data" --config "/home/fsysadmin/alignn/omdb_tests/prep_data/config.json" --file_format xyz --output_dir=/home/fsysadmin/alignn/omdb_tests/results

Using backend: pytorch
Check
1 validation error for TrainingConfig
dataset
  unexpected value; permitted: 'dft_3d', 'jdft_3d-8-18-2021', 'dft_2d', 'megnet', 'megnet2', 'mp_3d_2020', 'qm9', 'qm9_dgl', 'qm9_std_jctc', 'user_data', 'oqmd_3d_no_cfid', 'edos_up', 'edos_pdos', 'qmof', 'hmof', 'hpov', 'pdbbind', 'pdbbind_core' (type=value_error.const; given=omdb; permitted=('dft_3d', 'jdft_3d-8-18-2021', 'dft_2d', 'megnet', 'megnet2', 'mp_3d_2020', 'qm9', 'qm9_dgl', 'qm9_std_jctc', 'user_data', 'oqmd_3d_no_cfid', 'edos_up', 'edos_pdos', 'qmof', 'hmof', 'hpov', 'pdbbind', 'pdbbind_core'))
Traceback (most recent call last):
  File "/home/fsysadmin/alignn/alignn/train_folder.py", line 194, in <module>
    train_for_folder(
  File "/home/fsysadmin/alignn/alignn/train_folder.py", line 80, in train_for_folder
    config.keep_data_order = keep_data_order
AttributeError: 'dict' object has no attribute 'keep_data_order'

As I understand from the error output, the OMDB dataset is not included in the ALIGNN package. Giving the full path of the OMDB tar file did not solve the problem either.

How can I include the OMDB dataset? There are some scripts in the alignn repo that import datasets, such as train_all_qm9_jctc.py. Maybe these scripts can be modified to include OMDB.

I appreciate your help.

Best regards,

cross-validation comparison with ReLU and Swish

Benchmark ReLU vs Swish performance and workload. I ran some informal comparisons between ReLU and Swish networks this summer, but didn't record the computational workload or run the tests in cross-validation. In the one-off comparison I did, ReLU and Swish networks gave roughly equivalent performance on the JARVIS e_form and e_gap tasks, and we just switched to Swish networks from that point on.

Allowing training without test set

The following patch allows training without having a test set, i.e. for cases where the test set is separate:

diff --git a/alignn/data.py b/alignn/data.py
index 175b915..e70bebc 100644
--- a/alignn/data.py
+++ b/alignn/data.py
@@ -171,8 +171,9 @@ def get_id_train_val_test(
     # full train/val test split
     # ids = ids[::-1]
     id_train = ids[:n_train]
-    id_val = ids[-(n_val + n_test) : -n_test]  # noqa:E203
-    id_test = ids[-n_test:]
+    id_val = ids[-(n_val + n_test) : -n_test]  if n_test > 0 else ids[-(n_val + n_test) :] # noqa:E203
+    id_test = ids[n_test:] if n_test > 0 else []
+
     return id_train, id_val, id_test
 
 
@@ -508,7 +509,7 @@ def get_train_val_loaders(
             classification=classification_threshold is not None,
             output_dir=output_dir,
             tmp_name="test_data",
-        )
+        ) if len(dataset_test) > 0 else None
 
         collate_fn = train_data.collate
         # print("line_graph,line_dih_graph", line_graph, line_dih_graph)
@@ -528,7 +529,7 @@ def get_train_val_loaders(
 
         val_loader = DataLoader(
             val_data,
-            batch_size=batch_size,
+            batch_size=1,
             shuffle=False,
             collate_fn=collate_fn,
             drop_last=True,

AttributeError: module 'dgl' has no attribute 'DGLGraph'

Describe the bug
AttributeError: module 'dgl' has no attribute 'DGLGraph' when running train_folder.py --root_dir "alignn/examples/sample_data" --config "alignn/examples/sample_data/config_example.json" --output_dir=temp (Windows system).
My environment:
torch 2.0.0
torchaudio 2.1.1+cu118
torchvision 0.16.1+cu118
dgl 1.0.2+cu118
python 3.10.13

Order of edge-gated graph convolutions?

Hi,

Thanks for this great library.

I think I noticed a slight inconsistency in the code from the paper. The paper states that the ALIGNN layer first performs the edge-gated graph convolution on the line graph to update the pair and triplet features, and then the pair features are passed as edges to the atomistic/direct graph.

However, when I look at alignn.models.alignn.ALIGNNConv.forward, I see that the edge-gated graph convolution is actually applied on the atomistic/direct graph first, and then the updated pair features are passed as nodes to the line graph. Am I understanding this correctly?

Small mistake in train script

Hello everyone :)

Thanks for the great tool! I have found a small error in the "alignn/train.py" script while I was trying to use it for regression.

At line 866 we have:

if config.n_early_stopping is not None:
    if classification:
        my_metrics = "accuracy"
    else:
        my_metrics = "mae"

    def default_score_fn(engine):
        score = engine.state.metrics[my_metrics]
        return score

    es_handler = EarlyStopping(
        patience=config.n_early_stopping,
        score_function=default_score_fn,
        trainer=trainer,
    )

The problem is that, as stated in the documentation of ignite, "An improvement is considered if the score is higher." In other words, EarlyStopping only checks for increases of a benefit/reward function to be maximized, like "accuracy". A decreasing cost function would trigger earlyStopping after "patience" number of epochs, even if the cost is still decreasing.

To monitor the decrease of a cost function, e.g. "mae", the default_score_fn() could return the negative value of the cost metric, as shown in the ignite documentation example here:
https://pytorch.org/ignite/generated/ignite.handlers.early_stopping.EarlyStopping.html

All the best

FF training does not work with best_model.pt

I am trying to retrain the force-field model on a dataset I created. I have successfully created an "id_prop.json" file and a config.json file as instructed in the README file.

Training from scratch seems to be fine. The following command works as expected and the model starts training:

train_folder_ff.py --root_dir "aliggn-ff-train-data" --config "aliggn-ff-train-data/config.json" --output_dir=temp

However, if I try to use the best_model.pt to restart training using the restart_model_path argument, the model does not train.

Here is an example of how to reproduce the issue:

!train_folder_ff.py --root_dir "aliggn-ff-train-data" --config "aliggn-ff-train-data/config.json" --restart_model_path "/usr/local/lib/python3.10/dist-packages/alignn/ff/best_model.pt" --output_dir=temp

Which gives the following error:

fatal: not a git repository (or any of the parent directories): .git
len dataset 12254
Restarting the model training: /usr/local/lib/python3.10/dist-packages/alignn/ff/best_model.pt
Rest config name='alignn_atomwise' alignn_layers=4 gcn_layers=4 atom_input_features=92 edge_input_features=80 triplet_input_features=40 embedding_features=64 hidden_features=256 output_features=1 grad_multiplier=-1 calculate_gradient=True atomwise_output_features=3 graphwise_weight=0.85 gradwise_weight=0.05 stresswise_weight=0.05 atomwise_weight=0.05 link='identity' zero_inflated=False classification=False force_mult_natoms=False include_pos_deriv=False
model ALIGNNAtomWise(
(atom_embedding): MLPLayer(
(layer): Sequential(
(0): Linear(in_features=92, out_features=256, bias=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): SiLU()
)
)
(edge_embedding): Sequential(
(0): RBFExpansion()
(1): MLPLayer(
(layer): Sequential(
(0): Linear(in_features=80, out_features=64, bias=True)
(1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
(2): SiLU()
)
)
(2): MLPLayer(
(layer): Sequential(
(0): Linear(in_features=64, out_features=256, bias=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): SiLU()
)
)
)
(angle_embedding): Sequential(
(0): RBFExpansion()
(1): MLPLayer(
(layer): Sequential(
(0): Linear(in_features=40, out_features=64, bias=True)
(1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
(2): SiLU()
)
)
(2): MLPLayer(
(layer): Sequential(
(0): Linear(in_features=64, out_features=256, bias=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): SiLU()
)
)
)
(alignn_layers): ModuleList(
(0-3): 4 x ALIGNNConv(
(node_update): EdgeGatedGraphConv(
(src_gate): Linear(in_features=256, out_features=256, bias=True)
(dst_gate): Linear(in_features=256, out_features=256, bias=True)
(edge_gate): Linear(in_features=256, out_features=256, bias=True)
(bn_edges): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(src_update): Linear(in_features=256, out_features=256, bias=True)
(dst_update): Linear(in_features=256, out_features=256, bias=True)
(bn_nodes): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(edge_update): EdgeGatedGraphConv(
(src_gate): Linear(in_features=256, out_features=256, bias=True)
(dst_gate): Linear(in_features=256, out_features=256, bias=True)
(edge_gate): Linear(in_features=256, out_features=256, bias=True)
(bn_edges): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(src_update): Linear(in_features=256, out_features=256, bias=True)
(dst_update): Linear(in_features=256, out_features=256, bias=True)
(bn_nodes): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
)
(gcn_layers): ModuleList(
(0-3): 4 x EdgeGatedGraphConv(
(src_gate): Linear(in_features=256, out_features=256, bias=True)
(dst_gate): Linear(in_features=256, out_features=256, bias=True)
(edge_gate): Linear(in_features=256, out_features=256, bias=True)
(bn_edges): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(src_update): Linear(in_features=256, out_features=256, bias=True)
(dst_update): Linear(in_features=256, out_features=256, bias=True)
(bn_nodes): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(readout): AvgPooling()
(fc_atomwise): Linear(in_features=256, out_features=3, bias=True)
(fc): Linear(in_features=256, out_features=1, bias=True)
)
Traceback (most recent call last):
  File "/usr/local/bin/train_folder_ff.py", line 309, in <module>
    train_for_folder(
  File "/usr/local/bin/train_folder_ff.py", line 239, in train_for_folder
    model.load_state_dict(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ALIGNNAtomWise:
        Missing key(s) in state_dict: "fc_atomwise.weight", "fc_atomwise.bias".
It seems that the saved model does not include weights or biases for fc_atomwise.

A quick dirty fix to the issue is to set these weights and biases to zero and have the model learn these parameters from the dataset instead of training all parameters from scratch.

Here is a simple Python code to establish that:

import torch
from alignn.ff.ff import AlignnAtomwiseCalculator, default_path
from ase.io import read, write

model_path = default_path()
model_checkpoint = torch.load(model_path+'/best_model.pt', map_location=torch.device('cpu'))
duplicated_weight = model_checkpoint['fc.weight'].repeat(3, 1)*0
duplicated_bias = model_checkpoint['fc.bias'].repeat(3)*0
model_checkpoint['fc_atomwise.weight'] = duplicated_weight
model_checkpoint['fc_atomwise.bias'] = duplicated_bias
torch.save(model_checkpoint, 'updated_best_model.pt')
model_checkpoint['fc_atomwise.weight'] = model_checkpoint['fc.weight']
model_checkpoint['fc_atomwise.bias'] = model_checkpoint['fc.weight']

After running the code above. The model can be trained by using the following command:
!train_folder_ff.py --root_dir "aliggn-ff-train-data" --config "aliggn-ff-train-data/config.json" --restart_model_path "updated_best_model.pt" --output_dir=temp

I may be missing something here as well since I am not very well-versed in using pytorch. Please let me know.

Angle Information

Hi @bdecost,

When I wrote the following code based on your code to print the line graph, an incomprehensible part appeared.

In the code below, among (node_i, node_j) from ij_pair and (nodej_, nodek_) from jk_pair, I think that node_j and nodej_ should have the same node number, but there are cases where they do not match when I run the code below.

When I referred to your paper, I understood that the angle in the line graph is built from node_i, node_j, and node_k. Why is the shared node_j not the same?

############################################ CIF file (mp-2500.cif)
# generated using pymatgen
data_AlCu
_symmetry_space_group_name_H-M   'P 1'
_cell_length_a   6.37716407
_cell_length_b   6.37716407
_cell_length_c   6.92031335
_cell_angle_alpha   57.14549155
_cell_angle_beta   57.14549155
_cell_angle_gamma   37.46229262
_symmetry_Int_Tables_number   1
_chemical_formula_structural   AlCu
_chemical_formula_sum   'Al5 Cu5'
_cell_volume   140.31041575
_cell_formula_units_Z   5
loop_
 _symmetry_equiv_pos_site_id
 _symmetry_equiv_pos_as_xyz
  1  'x, y, z'
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  Al  Al0  1  0.50000000  0.50000000  0.50000000  1
  Al  Al1  1  0.15622000  0.15622000  0.53856900  1
  Al  Al2  1  0.84378000  0.84378000  0.46143100  1
  Al  Al3  1  0.37823100  0.37823100  0.00427500  1
  Al  Al4  1  0.62176900  0.62176900  0.99572500  1
  Cu  Cu5  1  0.00000000  0.00000000  0.00000000  1
  Cu  Cu6  1  0.25794700  0.25794700  0.75941600  1
  Cu  Cu7  1  0.74205300  0.74205300  0.24058400  1
  Cu  Cu8  1  0.10895200  0.10895200  0.22813800  1
  Cu  Cu9  1  0.89104800  0.89104800  0.77186200  1

############################################# Code
import os
import dgl
import numpy as np

import torch

from jarvis.core.atoms import Atoms
from jarvis.core.graphs import Graph

from torch_geometric.data import InMemoryDataset, Data, Batch
from torch_geometric.utils.convert import from_networkx

raw_path = './mp-2500.cif'
crystal = Atoms.from_cif(raw_path, use_cif2cell=False)
coords = crystal.cart_coords
graph = Graph.atom_dgl_multigraph(crystal, cutoff=8.0, atom_features='cgcnn', max_neighbors=12, compute_line_graph=True, use_canonize=False)
for i in [0, 1]:
    '''Atom-Bond Graph'''
    if i == 0:
        g = from_networkx(dgl.DGLGraph.to_networkx(graph[i], node_attrs=['atom_features'], edge_attrs=['r']))
        x = torch.tensor([x.detach().numpy() for x in g.atom_features])
        z = torch.tensor(crystal.atomic_numbers)
        pos = torch.tensor(coords, dtype=torch.float)
        edge_id = g.id
        edge_pos = torch.tensor([x.detach().numpy() for x in g.r])
        edge_index = g.edge_index
        edge_distance = torch.tensor(np.linalg.norm(graph[i].edata['r'], axis=1))
        ab_g = Data(x=x, z=z, pos=pos, edge_id=edge_id, edge_index=edge_index, edge_distance=edge_distance, edge_pos=edge_pos, idx=n)
    '''Line Graph'''
    if i == 1:
        g = from_networkx(dgl.DGLGraph.to_networkx(graph[i], node_attrs=['r'], edge_attrs=['h']))
        x = torch.tensor(np.linalg.norm(graph[i].ndata['r'], axis=1))
        pos = torch.tensor([x.detach().numpy() for x in g.r])
        edge_id = g.id
        edge_index = g.edge_index
        edge_angle = g.h
        ba_g = Data(x=x, pos=pos, edge_id=edge_id, edge_index=edge_index, edge_angle=edge_angle, idx=n)
dataset = [ab_g, ba_g]

'''dataset[1] = Line Graph'''
'''dataset[0] = Atom-Bond Graph'''
ij_pair = dataset[1].edge_index[0]
jk_pair = dataset[1].edge_index[1]
node_i = dataset[0].edge_index[0][ij_pair]
node_j = dataset[0].edge_index[1][ij_pair]
nodej_ = dataset[0].edge_index[0][jk_pair]
nodek_ = dataset[0].edge_index[1][jk_pair]

################################################################# Result
node_i[0:10], node_j[0:10], nodej_[0:10], nodek_[0:10]

Python version>=3.9?

When I ran the command python setup.py develop, it turned out that:
Processing scipy-1.11.0rc2.tar.gz
Writing /tmp/easy_install-7tv5fusw/scipy-1.11.0rc2/setup.cfg
Running scipy-1.11.0rc2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-7tv5fusw/scipy-1.11.0rc2/egg-dist-tmp-8nqf7m51
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/setuptools/sandbox.py", line 152, in save_modules
    yield saved
  File "/root/miniconda3/lib/python3.8/site-packages/setuptools/sandbox.py", line 193, in setup_context
    yield
  File "/root/miniconda3/lib/python3.8/site-packages/setuptools/sandbox.py", line 254, in run_setup
    _execfile(setup_script, ns)
  File "/root/miniconda3/lib/python3.8/site-packages/setuptools/sandbox.py", line 43, in _execfile
    exec(code, globals, locals)
  File "/tmp/easy_install-7tv5fusw/scipy-1.11.0rc2/setup.py", line 33, in <module>
    "ase",
RuntimeError: Python version >= 3.9 required.

Running the regression example

  1. There is no train_folder.py inside the alignn/scripts folder.
  2. Even if you copy it there, the problem remains with "import alignn.data" because it is not inside that folder.
  3. It works if you copy train_folder.py into the root directory, but there are still more errors/issues.

User config selection should have priority

Hello,
There are currently two ways to set certain configuration variables such as keep_data_order: 1) as a command-line argument, or 2) in the config.json file. However, the way that the train_folder script is set up, changing these variables in the config.json file has no effect, because they will be overridden by the default value of the command-line argument.

When that happens, the program actually over-writes the original config.json to make it look like the variable was never changed there in the first place. Forgetful users (me) will think that was an oversight on their part, edit the config.json file again as desired, and re-run with the same result several times before figuring out what's going on.

There are a couple of ways to avoid this problem. I assume the simplest would be to remove these variables from config.json, leaving the command-line argument as the only way to change them. Or if it's desirable to have these variables be changed from either direction, I think the config.json should have priority over the default value for the command line argument, and if the user explicitly sets a command line argument which conflicts with config.json they should get an error instead of a silent over-write.

Classification model prediction

Add an example to make predictions with trained model. Something like the following:

from alignn.models.alignn import ALIGNN, ALIGNNConfig
import torch
import pprint
from alignn.config import TrainingConfig
from jarvis.core.atoms import Atoms
from jarvis.core.graphs import Graph
from jarvis.db.jsonutils import dumpjson, loadjson

device = "cpu"
if torch.cuda.is_available():
    device = torch.device("cuda")

filename = "checkpoint_100.pt"
cutoff = 8
max_neighbors = 12
config = loadjson("config.json")
pprint.pprint(config)
config = TrainingConfig(**config)
model = ALIGNN(config.model)
model.load_state_dict(torch.load(filename, map_location=device)["model"])
model.to(device)
model.eval()

atoms = Atoms.from_poscar("POSCAR")
g, lg = Graph.atom_dgl_multigraph(
    atoms,
    cutoff=float(cutoff),
    max_neighbors=max_neighbors,
)
out_data = (
    torch.argmax(model([g.to(device), lg.to(device)]))
    .detach()
    .cpu()
    .numpy()
    .flatten()
    .tolist()
)[0]
print("out_data class ", out_data)

Jarvis data

Thank you for your work on this efficient ML method for predicting properties of molecular systems.

But I couldn't reproduce the paper.

I found that JARVIS provides the QM9 dataset with normalization applied, and I opened an issue in jarvis.

I tested ALIGNN but cannot reproduce the paper's values with the unnormalized QM9 dataset. Only the normalized QM9 dataset provided by JARVIS reproduces the prediction values in the paper.

clean up train.py for clarity and flexibility

we've ended up with a lot of complex logic in the main training script -- I think we should consider moving the ignite event handler definitions out of the main train_dgl body and registering handlers with add_event_handler instead of the @on decorator:

trainer.add_event_handler(Events.STARTED, lambda _: print("Start training"))

Pytorch version of the code

Is it possible to have a plain PyTorch version of the code in place of torch-ignite? I want to dive deep into the actual code and make some changes to the loss function. I have tried to replace trainer.run(train_loader, max_epochs=config.epochs) with a simple iterative PyTorch version as follows:

for epoch in range(config.epochs):
    for batch in train_loader:
        net = net.train()
        optimizer.zero_grad()
        graph, line_graph, target = batch
        if torch.cuda.is_available():
            graph = graph.to(device, non_blocking=True)
            line_graph = line_graph.to(device, non_blocking=True)
            target = target.to(device, non_blocking=True)
        g = (graph, line_graph)
        output = net(g)
        loss = criterion(output, target)
        mae_error = mae(output.data.cpu(), target.cpu())
        loss.backward()
        optimizer.step()

But I am not able to reproduce the result. Could you please help me to resolve the issue?

use_canonize for inference through pretrained.py

pretrained.py calls the following:

g, lg = Graph.atom_dgl_multigraph(
    atoms,
    cutoff=float(cutoff),
    max_neighbors=max_neighbors,
)

This uses the default value of use_canonize = False, which is not necessarily the value that was used for training. From my testing, changing this value for inference greatly influences results (tested on jv_formation_energy_peratom_alignn with POSCAR files from the sample_data folder).

Typically, running the following: python pretrained.py --model_name jv_formation_energy_peratom_alignn --file_format poscar --file_path .\examples\sample_data\POSCAR-JVASP-107772.vasp
gives 0.003147430717945099 if use_canonize = False (the default behaviour), and
gives -5.2578747272491455e-05 if use_canonize = True.

This difference is not huge but seems to be greatly amplified when batching multiple files for inference (my own implementation).

From these results, my guess is that one would want this variable to be stored in the model parameters rather than chosen at training/inference time.

Are these normal results? What is the physical/mathematical meaning of this variable?

OMDB Dataset Performance

Dear All,

I am using the ALIGNN model to train on the OMDB dataset and trying to improve the results by adjusting hyperparameters, but I have not achieved good results yet (one of the reasons may be that my test molecules are a little bigger than those in the dataset).

In the README.md file, the OMDB results say "coming soon". Do you have any training results for this? I would like to compare against your result and test the trained model on my own molecules.

Best regards,

train_folder_ff does not utilize GPU

I am trying to train a force-field model using a variation of the following command from the README, adapted to my directories:

train_folder_ff.py --root_dir "alignn/examples/sample_data_ff" --config "alignn/examples/sample_data_ff/config_example_atomwise.json" --output_dir=temp

However, training is very slow and does not seem to utilize the GPU at all. This can be further confirmed by running nvidia-smi and viewing the output during training:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.43.02 Driver Version: 535.43.02 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:07:00.0 Off | N/A |
| 0% 42C P8 13W / 170W | 71MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1405 G /usr/lib/xorg/Xorg 56MiB |
| 0 N/A N/A 1571 G /usr/bin/gnome-shell 5MiB |
+---------------------------------------------------------------------------------------+





If I am training a model that does not utilize force fields, the GPU is used.
For example, running train_folder.py --root_dir "alignn/examples/sample_data" --config "alignn/examples/sample_data/config_example.json" --output_dir=temp and simultaneously running nvidia-smi gives the following output:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.43.02 Driver Version: 535.43.02 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:07:00.0 Off | N/A |
| 0% 46C P2 62W / 170W | 921MiB / 12288MiB | 39% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1405 G /usr/lib/xorg/Xorg 56MiB |
| 0 N/A N/A 1571 G /usr/bin/gnome-shell 5MiB |
| 0 N/A N/A 29095 C .../miniconda3/envs/version/bin/python 848MiB |
+---------------------------------------------------------------------------------------+

I have done my best to check that all the dependencies are compatible and I can confirm that the device is switched to cuda in the train_folder_ff.py script.

ValueError: Input data has zero size. Please provide non-empty data

Engine run is terminating due to exception: Input data has zero size. Please provide non-empty data
Traceback (most recent call last):
  File "/work/somee/MD_HIT_structure/alignn/alignn/train_folder.py", line 214, in <module>
    train_for_folder(
  File "/work/somee/MD_HIT_structure/alignn/alignn/train_folder.py", line 197, in train_for_folder
    train_dgl(
  File "/work/somee/alignn/alignn/train.py", line 961, in train_dgl
    trainer.run(train_loader, max_epochs=config.epochs)
  File "/work/somee/alignn_env/lib/python3.10/site-packages/ignite/engine/engine.py", line 900, in run
    return self._internal_run()
  File "/work/somee/alignn_env/lib/python3.10/site-packages/ignite/engine/engine.py", line 943, in _internal_run
    return next(self._internal_run_generator)
  File "/work/somee/alignn_env/lib/python3.10/site-packages/ignite/engine/engine.py", line 1001, in _internal_run_as_gen
    self._handle_exception(e)
  File "/work/somee/alignn_env/lib/python3.10/site-packages/ignite/engine/engine.py", line 639, in _handle_exception
    raise e
  File "/work/somee/alignn_env/lib/python3.10/site-packages/ignite/engine/engine.py", line 973, in _internal_run_as_gen
    self._fire_event(Events.EPOCH_COMPLETED)
  File "/work/somee/alignn_env/lib/python3.10/site-packages/ignite/engine/engine.py", line 426, in _fire_event
    func(*first, *(event_args + others), **kwargs)
  File "/work/somee/alignn/alignn/train.py", line 881, in log_results
    evaluator.run(val_loader)
  File "/work/somee/alignn_env/lib/python3.10/site-packages/ignite/engine/engine.py", line 861, in run
    raise ValueError("Input data has zero size. Please provide non-empty data")
ValueError: Input data has zero size. Please provide non-empty data

Fix version logging for non-local installations

current version logging assumes that the code is running from a working directory under the alignn repository, and logs the current commit hash to the full configuration datafile.

to support use of ALIGNN in separate repositories, or installation from PyPI, we should check for alignn.__version__ and maybe log that instead. Is there a clean way to determine whether code is running from a local clone of the alignn repo vs some other project?

Graph construction for large-unit cell crystals (e.g., MOFs)

Hello,

I'm currently utilizing ALIGNN for property prediction of MOFs, but I'm encountering significant delays in the process due to the time-consuming steps of reading CIF files, processing them, and constructing graphs. Is there any approach to expedite this procedure? I'm aware that ALIGNN has been trained with approximately 137,000 MOFs for CO2 capture performance. I'm curious to know how you managed such a substantial quantity of MOFs. Additionally, is it possible to save the constructed graphs for future use, avoiding the need to recreate them during each training session?

Thank you!
