cgcnn's Introduction

Crystal Graph Convolutional Neural Networks

This software package implements the Crystal Graph Convolutional Neural Network (CGCNN), which takes an arbitrary crystal structure as input and predicts material properties.

The package provides two major functions:

  • Train a CGCNN model with a customized dataset.
  • Predict material properties of new crystals with a pre-trained CGCNN model.

The following paper describes the details of the CGCNN framework:

Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties

How to cite

Please cite the following work if you want to use CGCNN.

@article{PhysRevLett.120.145301,
  title = {Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties},
  author = {Xie, Tian and Grossman, Jeffrey C.},
  journal = {Phys. Rev. Lett.},
  volume = {120},
  issue = {14},
  pages = {145301},
  numpages = {6},
  year = {2018},
  month = {Apr},
  publisher = {American Physical Society},
  doi = {10.1103/PhysRevLett.120.145301},
  url = {https://link.aps.org/doi/10.1103/PhysRevLett.120.145301}
}

Prerequisites

This package requires:

  • PyTorch
  • scikit-learn
  • pymatgen

If you are new to Python, the easiest way of installing the prerequisites is via conda. After installing conda, run the following command to create a new environment named cgcnn and install all prerequisites:

conda upgrade conda
conda create -n cgcnn python=3 scikit-learn pytorch torchvision pymatgen -c pytorch -c conda-forge

Note: this code is tested with PyTorch v1.0.0+ and is not compatible with versions below v0.4.0 because of breaking changes.

This creates a conda environment for running CGCNN. Before using CGCNN, activate the environment by:

source activate cgcnn

Then, in directory cgcnn, you can test if all the prerequisites are installed properly by running:

python main.py -h
python predict.py -h

This should display the help messages for main.py and predict.py. If no error messages appear, the prerequisites are installed properly.

After you have finished using CGCNN, exit the environment by:
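As a complementary check, the short sketch below looks for the required packages on the import path without running main.py. The import names torch, sklearn, and pymatgen are assumed from the conda command above, and the helper name missing_packages is hypothetical.

```python
import importlib.util

# Import names assumed from the conda command above (scikit-learn imports as "sklearn").
REQUIRED = ("torch", "sklearn", "pymatgen")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
print("All prerequisites found." if not missing else f"Missing: {missing}")
```

An empty list means that all three packages can be located in the active environment; it does not verify their versions.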

source deactivate

Usage

Define a customized dataset

To input crystal structures to CGCNN, you will need to define a customized dataset. Note that this is required for both training and predicting.

Before defining a customized dataset, you will need:

  • CIF files recording the structure of the crystals that you are interested in
  • The target properties for each crystal (not needed for predicting, but you need to put some random numbers in id_prop.csv)

You can create a customized dataset by creating a directory root_dir with the following files:

  1. id_prop.csv: a CSV file with two columns. The first column records a unique ID for each crystal, and the second column records the value of the target property. If you want to predict material properties with predict.py, you can put any number in the second column. (The second column is still needed.)

  2. atom_init.json: a JSON file that stores the initialization vector for each element. An example of atom_init.json is data/sample-regression/atom_init.json, which should be good for most applications.

  3. ID.cif: a CIF file that records the crystal structure, where ID is the unique ID for the crystal.

The structure of the root_dir should be:

root_dir
├── id_prop.csv
├── atom_init.json
├── id0.cif
├── id1.cif
├── ...

There are two examples of customized datasets in the repository: data/sample-regression for regression and data/sample-classification for classification.
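The layout above can be assembled with a few lines of standard-library Python. This is a minimal sketch, not part of the package: the function name build_root_dir and its arguments are hypothetical, and writing id_prop.csv without a header row is an assumption based on the sample datasets.

```python
import csv
import shutil
from pathlib import Path


def build_root_dir(cif_dir, targets, atom_init, root_dir):
    """Copy CIF files and atom_init.json into root_dir and write id_prop.csv.

    targets maps each crystal ID (the CIF file name without ".cif") to its
    target value. IDs missing from targets get 0.0, which is enough when the
    dataset is only used for prediction.
    """
    root = Path(root_dir)
    root.mkdir(parents=True, exist_ok=True)
    shutil.copy(atom_init, root / "atom_init.json")
    with open(root / "id_prop.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for cif in sorted(Path(cif_dir).glob("*.cif")):
            shutil.copy(cif, root / cif.name)
            # Two columns, assumed to have no header row: crystal ID, target value.
            writer.writerow([cif.stem, targets.get(cif.stem, 0.0)])
    return root
```

The source directory cif_dir must differ from root_dir, since the files are copied rather than moved.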

For advanced PyTorch users

The above method of creating a customized dataset uses the CIFData class in cgcnn.data. If you want a more flexible way to input crystal structures, PyTorch has a great Tutorial for writing your own dataset class.

Train a CGCNN model

Before training a new CGCNN model, you will need to:

  • Define a customized dataset at root_dir, as described above.

Then, in directory cgcnn, you can train a CGCNN model for your customized dataset by:

python main.py root_dir

You can set the number of training, validation, and test data points with the flags --train-size, --val-size, and --test-size. Alternatively, you may use the flags --train-ratio, --val-ratio, and --test-ratio. Note that the ratio flags cannot be used together with the size flags. For instance, data/sample-regression has 10 data points in total. You can train a model by:

python main.py --train-size 6 --val-size 2 --test-size 2 data/sample-regression

or alternatively

python main.py --train-ratio 0.6 --val-ratio 0.2 --test-ratio 0.2 data/sample-regression
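To see why the two commands above describe the same split, the sketch below converts ratios into sizes. It is an illustration only: the function name split_sizes is hypothetical, and the rounding rule (truncate validation and test, give the remainder to training) is an assumption, not necessarily what main.py does.

```python
def split_sizes(total, train_ratio, val_ratio, test_ratio):
    """Turn split ratios into integer sizes that add up to total."""
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-8:
        raise ValueError("ratios must sum to 1")
    val_size = int(val_ratio * total)
    test_size = int(test_ratio * total)
    # Assumed rule: whatever is left after truncation goes to the training set.
    train_size = total - val_size - test_size
    return train_size, val_size, test_size


print(split_sizes(10, 0.6, 0.2, 0.2))  # (6, 2, 2)
```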

You can also train a classification model with the flag --task classification. For instance, you can use data/sample-classification by:

python main.py --task classification --train-size 5 --val-size 2 --test-size 3 data/sample-classification

After training, you will get three files in the cgcnn directory:

  • model_best.pth.tar: stores the CGCNN model with the best validation accuracy.
  • checkpoint.pth.tar: stores the CGCNN model at the last epoch.
  • test_results.csv: stores the ID, target value, and predicted value for each crystal in the test set.

Predict material properties with a pre-trained CGCNN model

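Before predicting the material properties, you will need to:

  • Define a customized dataset at root_dir for all the crystal structures that you want to predict.
  • Obtain a pre-trained CGCNN model, named pre-trained.pth.tar in the command below.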

Then, in directory cgcnn, you can predict the properties of the crystals in root_dir:

python predict.py pre-trained.pth.tar root_dir

For instance, you can predict the formation energies of the crystals in data/sample-regression:

python predict.py pre-trained/formation-energy-per-atom.pth.tar data/sample-regression

You can also predict whether the crystals in data/sample-classification are metals (1) or semiconductors (0):

python predict.py pre-trained/semi-metal-classification.pth.tar data/sample-classification

Note that for classification, each predicted value in test_results.csv is a probability between 0 and 1 that the crystal belongs to class 1 (metal in the above example).

After predicting, you will get one file in the cgcnn directory:

  • test_results.csv: stores the ID, target value, and predicted value for each crystal in the test set. Here the target value is whatever number you set in id_prop.csv while defining the dataset, so it carries no meaning.
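The results file can be post-processed with the standard library. The sketch below assumes three columns per row (ID, target, prediction) and no header row; the function names are hypothetical.

```python
import csv


def read_results(path="test_results.csv"):
    """Return (id, target, prediction) tuples from a three-column CSV file."""
    with open(path, newline="") as f:
        return [(row[0], float(row[1]), float(row[2])) for row in csv.reader(f) if row]


def mean_absolute_error(results):
    """Average absolute difference between target and prediction."""
    return sum(abs(target - pred) for _, target, pred in results) / len(results)


def predicted_labels(results, threshold=0.5):
    """For classification: map the probability of class 1 to a 0/1 label."""
    return {cid: int(pred >= threshold) for cid, _, pred in results}
```

For a prediction-only dataset the targets are placeholders, so an error computed from them is not meaningful; only the predicted column should be read in that case.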

Data

To reproduce our paper, you can download the corresponding datasets by following the instructions.

Authors

This software was primarily written by Tian Xie who was advised by Prof. Jeffrey Grossman.

License

CGCNN is released under the MIT License.

cgcnn's People

Contributors

tanyaadams1, txie-93

cgcnn's Issues

What does 'target' mean in id_prop.csv?

Hi. I'm currently studying your CGCNN paper and its implementation. I have reviewed the entire cgcnn code, and now I'm trying to run my own code with a custom dataset.
But I don't understand the meaning of 'target' in data/sample-regression/id_prop.csv.
Ten target values (1.0, 2.0, ..., 10.0) are attached to the material IDs (unique crystal numbers). What exactly are they?
I would appreciate an explanation.

Thank you!

Atom features specifics

Your work is great! I am trying to reproduce it, and I would like to know what the nine attributes used in atom_init.json are.
Thank you!

tensor size error

python main.py --train-size 30000 --val-size 10000 --test-size 10000 my_test/mp_test/
Epoch: [0][0/118]	Time 22.182 (22.182)	Data 10.518 (10.518)	Loss 1.0705 (1.0705)	MAE 1.340 (1.340)
Epoch: [0][10/118]	Time 17.675 (19.742)	Data 10.263 (11.007)	Loss 1.0962 (0.9969)	MAE 1.322 (1.234)
Epoch: [0][20/118]	Time 17.326 (19.752)	Data 9.870 (11.245)	Loss 0.7512 (0.9568)	MAE 1.034 (1.198)
Traceback (most recent call last):
  File "main.py", line 488, in <module>
    main()
  File "main.py", line 157, in main
    train(train_loader, model, criterion, optimizer, epoch, normalizer)
  File "main.py", line 208, in train
    for i, (input, target, _) in enumerate(train_loader):
  File "/home/jjs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 259, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/jjs/gitdir/cgcnn/cgcnn/data.py", line 130, in collate_pool
    torch.cat(batch_nbr_fea, dim=0),
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 12 and 7 in dimension 1 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897

I get the MP structures with something like this:

from pymatgen.ext.matproj import MPRester
with MPRester(api_key) as m:
    d = m.get_data(mp_id)
    with open(cif_file,'w') as f:
        f.write(d[0]['cif'])
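The message reports sizes 12 and 7 in dimension 1 of the tensors concatenated in collate_pool. One plausible reading, assuming that dimension counts the neighbors of each atom, is that some atoms have only 7 neighbors within the cutoff radius while others have the full 12. The sketch below pads a neighbor list to a fixed length; the name pad_neighbors and the default values for max_num_nbr and radius are assumptions for illustration, not the package's code.

```python
def pad_neighbors(neighbors, max_num_nbr=12, radius=8.0):
    """Return exactly max_num_nbr (index, distance) pairs for one atom.

    neighbors is a list of (neighbor_index, distance) pairs found within
    radius. Missing entries are filled with index 0 and a distance just
    beyond the cutoff, which marks them as placeholders.
    """
    nearest = sorted(neighbors, key=lambda pair: pair[1])[:max_num_nbr]
    padding = [(0, radius + 1.0)] * (max_num_nbr - len(nearest))
    return nearest + padding


print(len(pad_neighbors([(3, 2.5), (1, 1.0)])))  # 12
```

If many atoms need padding, a larger cutoff radius may be the better remedy, since padded entries carry no structural information.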

How to understand output to generate crystal graphs without training model

Good morning!

I was wondering if there is an easy way to generate the crystal graphs for visualization with this code. While I would like to train a model later, I currently just want to visualize my dataset, and I'm finding it particularly hard with this code.

Thank you, and wonderful project! Everything runs smoothly with essentially no error.

Vanessa

About checkpoint.pth.tar

Could I delete checkpoint.pth.tar once training is complete?
And could I roll back my trained model by specifying checkpoint.pth.tar?

atom_init.json

I was inspecting atom_init.json file and I found something strange in comparison to the documentation of atom features in the table S2 (Supplemental Material).

  • Since every feature is one hot encoded, length of the vector should be 93 according to the table, but it is actually 92.

  • Also Group number feature has 18 categories in the table, but in the file it has 19. Actinide and lanthanide elements are presented by having 1 at element with index 0, except Lutetium which is labeled as group number 3. Other elements have an element equal to 1 on the same index as their group number, e.g. for hydrogen, element at index number 1 is 1.

  • Regarding the Period number feature, the table says it has 9 categories, but in the file it has 7, since lanthanide and actinide elements are described with periods 6 and 7 respectively, not 8 and 9 as stated in table S2.

  • Could you also explain why hydrogen's electronegativity is in the bin [1.9, 2.25) instead of [2.25, 2.6), given that its Sanderson electronegativity is 2.59?
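Checks like the ones listed above can be reproduced with a small script. This is a sketch under the assumption that atom_init.json maps atomic numbers (stored as string keys) to feature vectors; the function names are hypothetical.

```python
import json


def load_atom_vectors(path):
    """Load atom_init.json as {atomic_number: feature_vector}."""
    with open(path) as f:
        return {int(key): vec for key, vec in json.load(f).items()}


def vector_lengths(vectors):
    """Return the set of distinct vector lengths (a single value if consistent)."""
    return {len(vec) for vec in vectors.values()}


def hot_indices(vector):
    """Indices of the non-zero entries of a one-hot-encoded feature vector."""
    return [i for i, value in enumerate(vector) if value != 0]
```

For example, hot_indices(load_atom_vectors("data/sample-regression/atom_init.json")[1]) lists the active positions for hydrogen, which can then be compared against the category boundaries in table S2.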

CGCNN can't run on certain CIF files

Hello,

Great work with the project. I have been using CGCNN for perovskites, with the CIF files taken from ICSD database. However, I found out that it only works with certain CIF files, but not the others. Does it only take a certain "type" of CIF files? If so, how do I change the CIF files which don't work to the format that can be inputted into CGCNN?

As a reference, here are some of the ICSD collection codes:

  • Working: 110630, 243861, 252316, 255885, 255886, 432089
  • Not working: 252317, 252318, 252319, 252320, 254340, 254341, 434118, 432090, 254288

The error that comes out:

Traceback (most recent call last):
  File "main.py", line 513, in <module>
    main()
  File "main.py", line 119, in main
    sample_data_list = [dataset[i] for i in range(len(dataset))]
  File "main.py", line 119, in <listcomp>
    sample_data_list = [dataset[i] for i in range(len(dataset))]
  File "/run/user/1000/gvfs/user/cgcnn-ubuntu/cgcnn-master/cgcnn/data.py", line 326, in __getitem__
    print("Specie number: ",crystal[0].specie.number)
  File "/home/user/miniconda3/envs/cgcnn/lib/python3.7/site-packages/pymatgen/core/sites.py", line 79, in __getattr__
    raise AttributeError(a)
AttributeError: specie

Thank you!

Size mismatch when loading pre-trained models

Hi,

When I try to load pre-trained models to test predict.py, I get the following error:

python predict.py pre-trained/final-energy-per-atom.pth.tar mp/
=> loading model params 'pre-trained/final-energy-per-atom.pth.tar'
=> loaded model params 'pre-trained/final-energy-per-atom.pth.tar'
=> loading model 'pre-trained/final-energy-per-atom.pth.tar'
Traceback (most recent call last):
  File "E:\cgcnn-master\predict.py", line 298, in <module>
    main()
  File "E:\cgcnn-master\predict.py", line 94, in main
    model.load_state_dict(checkpoint['state_dict'])
  File "C:\ProgramData\Anaconda3\envs\cgcnn1\lib\site-packages\torch\nn\modules\module.py", line 1497, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CrystalGraphConvNet:
    size mismatch for convs.0.fc_full.weight: copying a param with shape torch.Size([128, 169]) from checkpoint, the shape in current model is torch.Size([128, 179]).
    size mismatch for convs.1.fc_full.weight: copying a param with shape torch.Size([128, 169]) from checkpoint, the shape in current model is torch.Size([128, 179]).
    size mismatch for convs.2.fc_full.weight: copying a param with shape torch.Size([128, 169]) from checkpoint, the shape in current model is torch.Size([128, 179]).
    size mismatch for convs.3.fc_full.weight: copying a param with shape torch.Size([128, 169]) from checkpoint, the shape in current model is torch.Size([128, 179]).

By the way, I then trained my own model and used it for prediction. The errors above did not show up, but the MAE was far too large.

(cgcnn) E:\cgcnn-master>python predict.py E:\cgcnn-master\trained_files\from_cmd\mp-2\mp_model_best.pth.tar mp/
=> loading model params 'E:\cgcnn-master\trained_files\from_cmd\mp-2\mp_model_best.pth.tar'
=> loaded model params 'E:\cgcnn-master\trained_files\from_cmd\mp-2\mp_model_best.pth.tar'
=> loading model 'E:\cgcnn-master\trained_files\from_cmd\mp-2\mp_model_best.pth.tar'
=> loaded model 'E:\cgcnn-master\trained_files\from_cmd\mp-2\mp_model_best.pth.tar' (epoch 484, validation 0.05862389877438545)
C:\ProgramData\Anaconda3\envs\cgcnn\lib\site-packages\pymatgen\io\cif.py:1155: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
Test: [0/74] Time 26.633 (26.633) Loss inf (inf) MAE 5.977 (5.977)
Test: [10/74] Time 24.787 (27.052) Loss inf (inf) MAE 6.005 (6.013)
Test: [20/74] Time 28.383 (28.096) Loss inf (inf) MAE 5.941 (6.010)
Test: [30/74] Time 31.305 (28.518) Loss inf (inf) MAE 6.081 (6.008)
Test: [40/74] Time 30.491 (29.037) Loss inf (inf) MAE 5.860 (6.010)
Test: [50/74] Time 35.822 (29.651) Loss inf (inf) MAE 6.035 (6.008)
Test: [60/74] Time 33.488 (30.191) Loss inf (inf) MAE 6.033 (6.012)
Test: [70/74] Time 34.823 (30.565) Loss inf (inf) MAE 5.955 (6.008)
** MAE 6.009

Thanks for your attention!
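The shapes in the size-mismatch message can be unpacked with simple arithmetic. The sketch below assumes that the fc_full layer maps 2 * atom_fea_len + nbr_fea_len inputs to 2 * atom_fea_len outputs, and that bond features come from a distance expansion on an evenly spaced grid. Both points are assumptions about the model, and the function names are hypothetical.

```python
def nbr_fea_len_from_weight(in_features, out_features):
    """Infer the bond-feature length from the shape of fc_full.weight.

    Assumes out_features == 2 * atom_fea_len and
    in_features == 2 * atom_fea_len + nbr_fea_len.
    """
    atom_fea_len = out_features // 2
    return in_features - 2 * atom_fea_len


def grid_size(dmin, dmax, step):
    """Number of grid points from dmin to dmax inclusive with spacing step."""
    return round((dmax - dmin) / step) + 1


print(nbr_fea_len_from_weight(169, 128))  # checkpoint: 41
print(nbr_fea_len_from_weight(179, 128))  # current model: 51
print(grid_size(0, 8, 0.2), grid_size(0, 10, 0.2))  # 41 51
```

Under these assumptions the checkpoint expects 41 bond features while the current model was built with 51, which would be consistent with, for example, a cutoff of 8 versus 10 at a spacing of 0.2. Other combinations of cutoff and spacing give the same counts, so this only narrows down where to look.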

Issues with installation

Hey, I am having issues installing cgcnn. Following the steps in the README, I am unable to run python main.py.

One problem is regarding pytorch:
File "main.py", line 213
input_var = (Variable(input[0].cuda(async=True)),
^
SyntaxError: invalid syntax

When I redo this in a different Python environment, I cannot install the cgcnn module from anywhere. The error I get is:
ModuleNotFoundError: No module named 'cgcnn'

How exactly should I go about this? Thanks!
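The SyntaxError above is consistent with async being a reserved keyword since Python 3.7, which makes cuda(async=True) unparsable on newer interpreters. PyTorch 0.4 and later name this argument non_blocking. The snippet below only checks the keyword status on the running interpreter; the function name is hypothetical.

```python
import keyword
import sys


def async_is_reserved():
    """True on interpreters where `async` cannot be used as an argument name."""
    return keyword.iskeyword("async")


print(sys.version_info[:2], "async reserved:", async_is_reserved())
```

If this prints True, a copy of main.py that still contains cuda(async=True) predates the PyTorch 0.4 update and needs a newer version of the code.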

MAE for predicting and training shear modulus and bulk modulus is very large

I found that the MAE I obtained when training the model on these two properties reached more than 100! I don't know if it's a problem with the materials I picked. I used k_vrh and g_vrh from the Materials Project as the bulk modulus and shear modulus. I think there is no problem with my dataset. Here is my training output:

[screenshot of the training output]

Why can't the CGCNN model predict graphene and diamond?

I have trained a CGCNN model using the Materials Project IDs in "cgcnn/data/material-data/mp-ids-46744.csv", and I have tested the model by predicting "energy_per_atom" (the target value of materials in the Materials Project) for graphene and diamond.

Both materials are made of a single element, C (carbon), but they have different structures, so the model should be able to distinguish between them. However, the result is not good: the model fails to predict values close to the targets (energy_per_atom).

The following is a screenshot of the result:

[screenshot of the prediction results]

In the above image, the target values of graphene and diamond are -9.0904 and -9.2203 respectively, but the predicted values are -1.7248 and -1.7681 respectively. (The normalized target values are both 0.7071.)

So I would like to know why the trained CGCNN model cannot predict the target values of graphene and diamond.

I look forward to your explanation. Thank you.
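The reported normalized value 0.7071 equals 1/sqrt(2) to four decimals. That is the magnitude obtained when any two distinct numbers are standardized with their own mean and sample standard deviation, as the sketch below shows for the two targets quoted above. This may hint that the normalization was computed from just these two crystals rather than from the training set, although the issue does not say how it was done.

```python
import statistics


def zscores(values):
    """Standardize values with their own mean and sample standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample (n - 1) standard deviation
    return [(v - mean) / std for v in values]


print([round(z, 4) for z in zscores([-9.0904, -9.2203])])
```

The two results have equal magnitude and opposite sign, whatever the two input values are.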

Question about fetching Materials Project Data

Hi,

Thank you for your great work! I am trying to fetch the data from the Materials Project database based on the mp-ids that you provided. I am wondering if the mp-ids that you provided are materials id or task id?

I tried

MPRester().get_structure_by_material_id(id)

but a lot of the ids in your csv files return void responses. Then I tried

mid = get_materials_id_from_task_id(id)
structure = get_structure_by_material_id(mid)

and it worked. I want to ask if this is the correct way of fetching the dataset.

Thank you

Question: How to make dataset?

Hi

I have questions about datasets.

The Materials Project database and API have changed a lot since this repository started, and I can't retrieve the whole dataset listed in the "cgcnn/data/material/mp-ids-○○.csv" files.

In the CGCNN paper (https://arxiv.org/abs/1710.10324), it is written that

After removing ill-converged crystals, the full database has 46744 materials covering 87 elements, ...

My questions are as follows.

  • How can I collect datasets like "mp-ids-27430", "mp-ids-46744", and "mp-ids-3402"?
  • What does "ill-converged crystal" mean?

Thanks,

Description of crys_fea in model.py

Hi,

What is crys_fea? I can't seem to find a description of it in the comments of model.py. Based on your suggestion in #21, it seems like this is the intermediate layer we're interested in. I assume it's the crystal feature vector. If that's the case, any help on what that actually means? Is there a place in the CGCNN paper that I should refer to?

Thank you,

Sterling

@mliu7051

What to do with negative values of Bulk Modulus and Shear Modulus?

Hi Tian
When we set up the model for properties like bulk modulus and shear modulus, we take into account the log10(GPa) values. For some crystals, the values of bulk and shear moduli come out to be negative. The question is how do we take the logarithm? Do we take the absolute values before calculating the logarithm? Or do we take only those crystals which have positive values for these properties?
I will be grateful to you if you can answer.
Thanks
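The second option raised in the question, keeping only crystals with positive moduli, can be sketched as follows. This only illustrates that option and is not a statement of how the published models were prepared; the function name log10_positive is hypothetical.

```python
import math


def log10_positive(moduli):
    """Map {crystal_id: modulus in GPa} to {crystal_id: log10(modulus)}.

    Entries with zero or negative moduli are returned separately instead of
    being transformed, since log10 is undefined for them.
    """
    kept, dropped = {}, []
    for cid, value in moduli.items():
        if value > 0:
            kept[cid] = math.log10(value)
        else:
            dropped.append(cid)
    return kept, dropped


kept, dropped = log10_positive({"a": 100.0, "b": -5.0, "c": 0.0})
print(kept, dropped)  # {'a': 2.0} ['b', 'c']
```

Taking absolute values instead would map a negative modulus onto the same target as a physically different positive one, which is why the sketch sets such entries aside.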

Build the element vector

[image]
Sorry to disturb you. While building the atom_init.json file, I noticed that every element's key-value pair has a great many values.
Do I need to fill in all of them, or do I only need to change the first few values?

Issue with path assertion for root_dir

Hi!
I've been trying to get this running, but it does not seem to be able to find my root_dir. I've just been trying to run a regression, using exactly this line:

(cgcnn) blake@blake-desktop:~/cgcnn$ python main.py --disable-cuda --train-size 250 -- val-size 25 --test-size 20 root_dir
Traceback (most recent call last):
  File "main.py", line 488, in <module>
    main()
  File "main.py", line 85, in main
    dataset = CIFData(*args.data_options)
  File "/home/blake/cgcnn/cgcnn/data.py", line 288, in __init__
    assert os.path.exists(root_dir), 'root_dir does not exist!'
AssertionError: root_dir does not exist!
I've tried having the root_dir directory inside the cgcnn folder and outside of it in my home directory, but both have resulted in the same error. If it matters or helps, I'm running Ubuntu 18.10 Cosmic Cuttlefish.
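If the space in "-- val-size" is present in the actual command, it may explain the error: argparse treats a bare "--" as the end of options, so everything after it becomes a positional argument. The traceback shows CIFData(*args.data_options), so the first positional value is used as root_dir. The parser below is a hypothetical stand-in that mirrors only the flags in the command, with a variadic positional assumed; it is not the parser from main.py.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the options used in the command above.
    parser = argparse.ArgumentParser()
    parser.add_argument("data_options", nargs="+")
    parser.add_argument("--disable-cuda", action="store_true")
    for flag in ("--train-size", "--val-size", "--test-size"):
        parser.add_argument(flag, type=int, default=None)
    return parser


argv = "--disable-cuda --train-size 250 -- val-size 25 --test-size 20 root_dir".split()
args = build_parser().parse_args(argv)
print(args.data_options[0])  # val-size
print(args.val_size, args.test_size)  # None None
```

With the space removed (--val-size 25), the same parser returns root_dir as the only positional value.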

Problem with importing MPDataRetrieval

ImportError Traceback (most recent call last)

in ()
1 get_ipython().system('sudo pip3 install scipy')
----> 2 from matminer.data_retrieval.retrieve_MP import MPDataRetrieval
3
4 def data_query(mp_api_key, max_elms=3, min_elms=3, max_sites=20, include_te=False):
5 """

4 frames

/usr/local/lib/python3.7/dist-packages/scipy/cluster/vq.py in ()
68 import numpy as np
69 from collections import deque
---> 70 from scipy._lib._util import _asarray_validated, check_random_state,
71 rng_integers
72 from scipy.spatial.distance import cdist

ImportError: cannot import name 'rng_integers' from 'scipy._lib._util' (/usr/local/lib/python3.7/dist-packages/scipy/_lib/_util.py)

The setting of hyperparameters to get the best performance

Hi Tian:
I have tried to reproduce your result like "MAE_model = 0.039" using your dataset.

The best MAE’s we achieved with Eq. 4 and Eq. 5 are 0.136 eV/atom and 0.039 eV/atom

But it is hard to set the hyperparameters correctly. Could you please share your hyperparameter settings?

Thank you and looking forward to your reply.

Pytorch version

Are there any plans to make CGCNN compatible with Pytorch 0.4+ (or even v1.0+)?

I ask because it seems like on some linux distributions (e.g. computing clusters) the required version is not available:

ardunn@n0001:~$ pip install torch==0.3.1
Collecting torch==0.3.1
  Could not find a version that satisfies the requirement torch== (from versions: 0.1.2, 0.1.2.post1, 0.4.1, 0.4.1.post2, 1.0.0)
No matching distribution found for torch==0.3.1

Whereas on other versions (e.g., Mac), the version is available.

installation problem

Hi,
I was trying to create the env as per your README:
conda create -n cgcnn python=3.7 scikit-learn pytorch=1.0.0 torchvision pymatgen -c pytorch -c matsci

I got this following error, I am using Fedora30 and Anaconda is updated to the newest version.

LinkError: pre-link script failed for package matsci::pydispatcher-2.0.5-py_0
location of failed script: /home/shazia/anaconda3/pkgs/pydispatcher-2.0.5-py_0/bin/.pydispatcher-pre-link.sh
==> script messages <==

==> script output <==
stdout:
stderr: + unset _mlshdbg

  • '[' 0 = 1 ']'
  • export MODULES_RUN_QUARANTINE=LD_LIBRARY_PATH
  • MODULES_RUN_QUARANTINE=LD_LIBRARY_PATH
  • unset _mlre _mlIFS
  • '[' -n x ']'
  • _mlIFS='
    '
  • IFS=' '
  • for _mlv in ${MODULES_RUN_QUARANTINE:-}
  • '[' LD_LIBRARY_PATH = LD_LIBRARY_PATH -a LD_LIBRARY_PATH = LD_LIBRARY_PATH ']'
    ++ eval 'echo ${LD_LIBRARY_PATH+x}'
    +++ echo x
  • '[' -n x ']'
    ++ eval 'echo ${LD_LIBRARY_PATH}'
    +++ echo /usr/lib64/mpich/lib
  • _mlre='LD_LIBRARY_PATH_modquar='''/usr/lib64/mpich/lib''' '
  • _mlrv=MODULES_RUNENV_LD_LIBRARY_PATH
    ++ eval 'echo ${MODULES_RUNENV_LD_LIBRARY_PATH:-}'
    +++ echo
  • _mlre='LD_LIBRARY_PATH_modquar='''/usr/lib64/mpich/lib''' LD_LIBRARY_PATH='''''' '
  • '[' -n 'LD_LIBRARY_PATH_modquar='''/usr/lib64/mpich/lib''' LD_LIBRARY_PATH='''''' ' ']'
  • _mlre='eval LD_LIBRARY_PATH_modquar='''/usr/lib64/mpich/lib''' LD_LIBRARY_PATH='''''' '
    ++ eval 'LD_LIBRARY_PATH_modquar='''/usr/lib64/mpich/lib'''' 'LD_LIBRARY_PATH=''''''' /usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash autoinit
    +++ LD_LIBRARY_PATH_modquar=/usr/lib64/mpich/lib
    +++ LD_LIBRARY_PATH=
    +++ /usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash autoinit
  • _mlcode='module() {
    unset _mlshdbg;
    if [ "${MODULES_SILENT_SHELL_DEBUG:-0}" = '''1''' ]; then
    case "$-" in
    vx*) set +vx; _mlshdbg='''vx''' ;;
    v) set +v; _mlshdbg='''v''' ;;
    x) set +x; mlshdbg='''x''' ;;
    ) _mlshdbg='''''' ;;
    esac;
    fi;
    unset _mlre _mlIFS;
    if [ -n "${IFS+x}" ]; then
    _mlIFS=$IFS;
    fi;
    IFS=''' ''';
    for _mlv in ${MODULES_RUN_QUARANTINE:-}; do
    if [ "${_mlv}" = "${_mlv##
    [!A-Za-z0-9
    ]}" -a "${_mlv}" = "${_mlv#[0-9]}" ]; then
    if [ -n "eval '\''echo ${'\''$_mlv'\''+x}'\''" ]; then
    _mlre="${_mlre:-}${_mlv}_modquar='''eval '\''echo ${'\''$_mlv'\''}'\''''' ";
    fi;
    mlrv="MODULES_RUNENV${_mlv}";
    _mlre="${_mlre:-}${_mlv}='''eval '\''echo ${'\''$_mlrv'\'':-}'\''''' ";
    fi;
    done;
    if [ -n "${_mlre:-}" ]; then
    eval eval ${_mlre}/usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash '\''"$@"'\'';
    else
    eval /usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash "$@";
    fi;
    _mlstatus=$?;
    if [ -n "${_mlIFS+x}" ]; then
    IFS=$_mlIFS;
    else
    unset IFS;
    fi;
    unset _mlre _mlv _mlrv _mlIFS;
    if [ -n "${_mlshdbg:-}" ]; then
    set -$_mlshdbg;
    fi;
    unset _mlshdbg;
    return $_mlstatus;
    };
    MODULES_CMD=/usr/share/Modules/libexec/modulecmd.tcl; export MODULES_CMD;
    MODULESHOME=/usr/share/Modules; export MODULESHOME;
    test 0;'
  • _mlret=0
  • '[' -n x ']'
  • IFS='
    '
  • unset _mlIFS
  • unset _mlre _mlv _mlrv
  • '[' 0 -eq 0 ']'
  • eval 'module() {
    unset _mlshdbg;
    if [ "${MODULES_SILENT_SHELL_DEBUG:-0}" = '''1''' ]; then
    case "$-" in
    vx*) set +vx; _mlshdbg='''vx''' ;;
    v) set +v; _mlshdbg='''v''' ;;
    x) set +x; mlshdbg='''x''' ;;
    ) _mlshdbg='''''' ;;
    esac;
    fi;
    unset _mlre _mlIFS;
    if [ -n "${IFS+x}" ]; then
    _mlIFS=$IFS;
    fi;
    IFS=''' ''';
    for _mlv in ${MODULES_RUN_QUARANTINE:-}; do
    if [ "${_mlv}" = "${_mlv##
    [!A-Za-z0-9
    ]}" -a "${_mlv}" = "${_mlv#[0-9]}" ]; then
    if [ -n "eval '\''echo ${'\''$_mlv'\''+x}'\''" ]; then
    _mlre="${_mlre:-}${_mlv}_modquar='''eval '\''echo ${'\''$_mlv'\''}'\''''' ";
    fi;
    mlrv="MODULES_RUNENV${_mlv}";
    _mlre="${_mlre:-}${_mlv}='''eval '\''echo ${'\''$_mlrv'\'':-}'\''''' ";
    fi;
    done;
    if [ -n "${_mlre:-}" ]; then
    eval eval ${_mlre}/usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash '\''"$@"'\'';
    else
    eval /usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash "$@";
    fi;
    _mlstatus=$?;
    if [ -n "${_mlIFS+x}" ]; then
    IFS=$_mlIFS;
    else
    unset IFS;
    fi;
    unset _mlre _mlv _mlrv _mlIFS;
    if [ -n "${_mlshdbg:-}" ]; then
    set -$_mlshdbg;
    fi;
    unset _mlshdbg;
    return $_mlstatus;
    };
    MODULES_CMD=/usr/share/Modules/libexec/modulecmd.tcl; export MODULES_CMD;
    MODULESHOME=/usr/share/Modules; export MODULESHOME;
    test 0;'
    ++ MODULES_CMD=/usr/share/Modules/libexec/modulecmd.tcl
    ++ export MODULES_CMD
    ++ MODULESHOME=/usr/share/Modules
    ++ export MODULESHOME
    ++ test 0
  • '[' 0 = 1 ']'
  • '[' -t 2 ']'
  • export -f module
  • export -f switchml
  • ENV=/usr/share/Modules/init/profile.sh
  • export ENV
  • BASH_ENV=/usr/share/Modules/init/bash
  • export BASH_ENV
  • '[' 5 -ge 3 ']'
  • [[ hxB =~ i ]]
  • [[ ! :/home/shazia/anaconda3/pkgs/pydispatcher-2.0.5-py_0/bin:/home/shazia/anaconda3/bin:/home/shazia/anaconda3/bin:/home/shazia/anaconda3/condabin:/home/shazia/.local/bin:/home/shazia/bin:/usr/lib64/mpich/bin:/usr/share/Modules/bin:/usr/lib64ccache:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin: =~ :/usr/share/Modules/bin: ]]
  • '[' '!' -n x ']'
    ++ manpath
  • [[ ! :/usr/share/man/mpich-x86_64::/home/shazia/anaconda3/man:/usr/local/share/man:/usr/share/man: =~ :/usr/share/man: ]]
  • unset _mlcode _mlret
  • '[' -n '' ']'
  • /home/shazia/anaconda3/envs/cgcnn/bin/python /home/shazia/anaconda3/pkgs/pydispatcher-2.0.5-py_0/link.py
    /home/shazia/anaconda3/pkgs/pydispatcher-2.0.5-py_0/bin/.pydispatcher-pre-link.sh: line 2: /home/shazia/anaconda3/envs/cgcnn/bin/python: No such file or directory

return code: 127

Predict at atom level

Hello,

Thank you so much for this project, very interesting!

I am trying to use this package to do a regression prediction to the atom level instead of predicting a property for the whole crystal. I am trying to input the atom labels but struggling to change the data.py script accordingly... what would be the easiest way to input the atom labels?

And I guess I need to output the atom prediction before the pooling? Does this correspond to return out of def forward(): in model.py?

Thanks!
Marta

Issue with prediction on the example data

Hey, I have installed cgcnn and run predict.py successfully.
Following the README, I tried "python predict.py pre-trained/formation-energy-per-atom.pth.tar data/sample-regression". However, the reported MAE was huge, which indicates poor prediction accuracy.
As I understand it, I can simply use a pre-trained model to predict the examples in /data to test cgcnn, but the results confused me. Am I doing something wrong at some step?

Loading data issue with certain CIF files

Describe the issue

CGCNN cannot accept as input CIFs in which a PeriodicSite contains multiple elements.

To Reproduce

from pymatgen.core.structure import Structure

crystal = Structure.from_file("1005004.cif")
crystal.replace_species({"Fe2+": {"Fe2+": 0.5, "Mg2+": 0.5}})
crystal.to(fmt='cif', filename="./data/sample-classification/1005004_multiple.cif")

"1005004.cif" can be downloaded at http://www.crystallography.net/cod/result.php?journal=Inorganic%20Chemistry

Expected behavior
CIFs in which a PeriodicSite contains multiple elements should also be accepted as input.
(A number of such CIFs are included in the ICSD database.)

Error message

$ python predict.py pre-trained/semi-metal-classification.pth.tar data/sample-classification

=> loading model params 'pre-trained/semi-metal-classification.pth.tar'
=> loaded model params 'pre-trained/semi-metal-classification.pth.tar'
=> loading model 'pre-trained/semi-metal-classification.pth.tar'
=> loaded model 'pre-trained/semi-metal-classification.pth.tar' (epoch 92, validation 0.958599858597622)
/python3.8/site-packages/pymatgen/io/cif.py:1155: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
  warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
Traceback (most recent call last):
  File "predict.py", line 299, in <module>
    main()
  File "predict.py", line 102, in main
    validate(test_loader, model, criterion, normalizer, test=True)
  File "predict.py", line 125, in validate
    for i, (input, target, batch_cif_ids) in enumerate(val_loader):
  File "/python3.8/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/python3.8/site-packages/torch/utils/data/dataloader.py", line 692, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/cgcnn/cgcnn/data.py", line 325, in __getitem__
    atom_fea = np.vstack([self.ari.get_atom_fea(crystal[i].specie.number)
  File "/cgcnn/cgcnn/data.py", line 325, in <listcomp>
    atom_fea = np.vstack([self.ari.get_atom_fea(crystal[i].specie.number)
  File "python3.8/site-packages/pymatgen/core/sites.py", line 79, in __getattr__
    raise AttributeError(a)
AttributeError: specie

Questions about recreating a data set

Hi Tian,
I have the following questions:
(1) Why are the datasets used to predict band gaps and formation energies so different from each other?
(2) If I want to make a new dataset, what criteria should I use to select materials from MP?
(3) How do I remove the ill-converged crystals mentioned in your article, and by what standard?
I hope you can help me answer these. Thank you!

Error raised while testing the code in README

I configured the code in both Docker and a real environment, and there were no errors while executing the training part. But the predicting part raised the following error:

root@ef1b695a4d41:/workspace/cgcnn# python predict.py pre-trained/formation-energy-per-atom.pth.tar data/sample-regression
=> loading model params 'pre-trained/formation-energy-per-atom.pth.tar'
=> loaded model params 'pre-trained/formation-energy-per-atom.pth.tar' 
=> loading model 'pre-trained/formation-energy-per-atom.pth.tar'
=> loaded model 'pre-trained/formation-energy-per-atom.pth.tar' (epoch 968, validation 0.03972001800748568)
/opt/conda/lib/python3.6/site-packages/pymatgen/io/cif.py:1107: UserWarning: Issues encountered while parsing CIF:
  warnings.warn("Issues encountered while parsing CIF:")
/opt/conda/lib/python3.6/site-packages/pymatgen/io/cif.py:1109: UserWarning: Some fractional co-ordinates rounded to ideal values to avoid finite precision errors.
  warnings.warn(error)
predict.py:131: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  input_var = (Variable(input[0].cuda(non_blocking=True), volatile=True),
predict.py:132: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  Variable(input[1].cuda(non_blocking=True), volatile=True),
predict.py:146: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  volatile=True)
Traceback (most recent call last):
  File "predict.py", line 302, in <module>
    main()
  File "predict.py", line 106, in main
    validate(test_loader, model, criterion, normalizer, test=True)
  File "predict.py", line 157, in validate
    losses.update(loss.data.cpu()[0], target.size(0))
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

I used PyTorch 1.1 with CUDA 10 in Docker, and PyTorch 1.0 with CUDA 9.0 in the real environment. The error was raised in both environments. How can I deal with it?

Thanks!
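
The error message itself points at the likely fix: a 0-dim tensor can no longer be indexed with `[0]` and should be converted with `.item()`. In the failing line this would mean passing `loss.item()` instead of `loss.data.cpu()[0]`. The sketch below illustrates the pattern without requiring PyTorch. `to_python_number`, this simplified `AverageMeter`, and `FakeScalarTensor` are hypothetical names written for illustration, not the repository's code.

```python
def to_python_number(value):
    """Return a plain Python number from a scalar-like object."""
    # 0-dim PyTorch tensors expose .item(); plain numbers pass through.
    if hasattr(value, "item"):
        return value.item()
    return value


class AverageMeter:
    """Running-average tracker (simplified illustration)."""

    def __init__(self):
        self.sum = 0.0
        self.count = 0

    def update(self, val, n=1):
        # Convert before accumulating so no tensor indexing is needed.
        self.sum += to_python_number(val) * n
        self.count += n

    @property
    def avg(self):
        return self.sum / self.count if self.count else 0.0


class FakeScalarTensor:
    """Hypothetical stand-in for a 0-dim tensor so the sketch runs anywhere."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value

    def __getitem__(self, index):
        # Mirrors the failure mode reported in the traceback.
        raise IndexError("invalid index of a 0-dim tensor")


losses = AverageMeter()
losses.update(FakeScalarTensor(0.5), 4)
losses.update(FakeScalarTensor(0.25), 4)
print(losses.avg)
```

The `volatile` warnings in the same log are a separate matter. As the warning text says, `with torch.no_grad():` is the replacement.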

Description for Materials Project .csv files

Hi Tian,

Trying to figure out what each of the mp-ids-*.csv files corresponds to.

After removing ill-converged crystals, the full database has 46744 materials covering 87 elements, 7 lattice systems, and 216 space groups.
The database [34] we use includes the energy above hull of 18928 perovskite crystals
Figures 2(b) and 2(c) show the performance of the two models on 9350 test crystals

Table 1 has # train data with values of 28046, 16458, and 2041.

So "46744" seems pretty straightforward: mp-ids-46744.csv

But what do mp-ids-3402.csv and mp-ids-27430.csv correspond to in the paper?

Thanks!

Sterling

On understanding the AtomInitializer class

The documentation says that the AtomInitializer class is the "Base class for intializing the vector representation for atoms." I'm trying to understand the reasoning behind the word "initializing", as this representation doesn't appear to be changing during the training (starting from an initial state). So, is the representation really a static one?
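
One way to read "initializing", sketched under assumptions: the lookup returns a fixed vector per element, and that vector serves only as the initial node feature of each atom in the crystal graph, which later network layers transform. The lookup table itself would then be static, as the question suggests. The class name `StaticAtomLookup` and the feature values below are made up for illustration. Only the method name `get_atom_fea` and the lookup by atomic number are taken from the traceback earlier on this page.

```python
class StaticAtomLookup:
    """Hypothetical fixed atom-feature table; not the repository's class."""

    def __init__(self, table):
        # JSON keys are strings, so normalize them to integer atomic numbers.
        self._table = {int(key): list(vec) for key, vec in table.items()}

    def get_atom_fea(self, atomic_number):
        # Pure lookup: nothing here is updated during training.
        return self._table[atomic_number]


# Made-up three-dimensional features for hydrogen (1) and oxygen (8).
table = {"1": [1.0, 0.0, 0.0], "8": [0.0, 1.0, 0.0]}
lookup = StaticAtomLookup(table)

# Initial node features for a toy H, H, O list of atoms.
atomic_numbers = [1, 1, 8]
initial_features = [lookup.get_atom_fea(z) for z in atomic_numbers]
print(initial_features)
```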

How to bind cgcnn for matminer?

Hello

I'm looking at the matminer documentation, which states that its cgcnn featurizer requires cgcnn with Python bindings. How would I install cgcnn as a package rather than launch it from main.py?

Multiclass classification

Thank you for making this tool! I am running into an issue when I run python main.py --task classification:

/Users/gianmarcoterrones/opt/anaconda3/envs/cgcnn/lib/python3.11/site-packages/pymatgen/io/cif.py:1134: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
  warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
Traceback (most recent call last):
  File "/Users/gianmarcoterrones/Research/cgcnn/main.py", line 513, in <module>
    main()
  File "/Users/gianmarcoterrones/Research/cgcnn/main.py", line 175, in main
    train(train_loader, model, criterion, optimizer, epoch, normalizer)
  File "/Users/gianmarcoterrones/Research/cgcnn/main.py", line 252, in train
    loss = criterion(output, target_var)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gianmarcoterrones/opt/anaconda3/envs/cgcnn/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gianmarcoterrones/opt/anaconda3/envs/cgcnn/lib/python3.11/site-packages/torch/nn/modules/loss.py", line 216, in forward
    return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gianmarcoterrones/opt/anaconda3/envs/cgcnn/lib/python3.11/site-packages/torch/nn/functional.py", line 2704, in nll_loss
    return torch._C._nn.nll_loss_nd(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: Target 2 is out of bounds.

Have I made an error in setting up the customized dataset? Or does the code not currently support multiclass classification? The entries of my id_prop.csv look like this:

ACOFUU 1
ACOGAB 1
ACOGEF 1
ADABAK 2
AFEJUQ 1
AGUBUA 1
AKOXIJ 1
ALAMUW 0
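
`nll_loss` raises "Target 2 is out of bounds" when a label is greater than or equal to the number of classes the model outputs. Assuming the classification head outputs two classes (an assumption worth verifying in the model definition), labels must be 0 or 1, and the `ADABAK 2` row would trigger the error. The sketch below scans labels for such values. The function name and the `n_classes` parameter are hypothetical, and the rows are parsed as whitespace-separated only because that is how they appear above.

```python
def find_out_of_range_labels(rows, n_classes=2):
    """Return (cif_id, label) pairs whose label is outside [0, n_classes)."""
    bad = []
    for cif_id, label in rows:
        label = int(label)
        if not 0 <= label < n_classes:
            bad.append((cif_id, label))
    return bad


sample = """ACOFUU 1
ACOGAB 1
ACOGEF 1
ADABAK 2
AFEJUQ 1
AGUBUA 1
AKOXIJ 1
ALAMUW 0"""

# Each row becomes [cif_id, label].
rows = [line.split() for line in sample.splitlines()]
print(find_out_of_range_labels(rows))
```

Supporting three or more classes would presumably require widening the output layer to match the number of labels, not only changing the dataset.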

Data construction specifics

Hello,
Thanks for your awesome work! I was wondering if you could provide more details regarding the data construction with the materials project API. Specifically,

  1. Which keys were queried to generate the features (Appendix Table 2)?
    a. Are group & period number included in the API or did you use other methods?
    b. Which keys in materialsproject held electronegativity, valence electrons (nsites?), etc.?

  2. What IDs are saved in atom_init? Do these IDs have to match with information in the CIF files?

Cheers!

The problem of using the Materials Project database and the Perovskite database

Hello~ Thanks for your great work!
I am a computer science student, so I am not familiar with the use of material databases.
After reading your data.py code, I see that you read the CIF file and extract everything you need from it.
But as a computer science student, I don't understand how you get information from the CIF file. I would be very grateful if you could give me a brief explanation.
Besides, your CIF files are from the COD database. How could I get CIF files from the Materials Project database and the Perovskite database in the same format as the COD database?
