txie-93 / cgcnn Goto Github PK

Crystal graph convolutional neural networks for predicting material properties.

License: MIT License

Python 100.00%

cgcnn's Issues

Description of crys_fea in model.py

Hi,

What is crys_fea? I can't seem to find a description of it in the comments of model.py. Based on your suggestion in #21, it seems like this is the intermediate layer we're interested in. I assume it's the crystal feature vector. If that's the case, any help on what that actually means? Is there a place in the CGCNN paper that I should refer to?

Thank you,

Sterling

@mliu7051

What does 'target' mean in id_prop.csv?

Hi. I am currently studying your CGCNN paper and its implementation. I have reviewed the entire cgcnn code and am now trying to run it on my own custom dataset.
However, I don't understand the meaning of 'target' in data/sample-regression/id_prop.csv.
Ten target values (1.0, 2.0, ..., 10.0) are attached to the material IDs (unique crystal numbers). What exactly do they represent?
I would appreciate an explanation.

Thank you!
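
For readers with the same question: the repository README describes id_prop.csv as a two-column CSV in which the first column is a unique crystal ID and the second is the value of the property to be predicted. A minimal sketch of reading such a file follows. The IDs and values are made up for illustration, and `load_id_prop` is a hypothetical helper, not part of cgcnn.

```python
import csv
import io


def load_id_prop(text):
    """Parse id_prop.csv-style text into (crystal_id, target) pairs."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        crystal_id, target = row[0], float(row[1])
        rows.append((crystal_id, target))
    return rows


# Made-up example: each ID is expected to match a CIF file name (ID.cif).
sample = "1000041,1.0\n1000050,2.0\n"
pairs = load_id_prop(sample)
print(pairs)
```

If the sample targets really are just 1.0 through 10.0, they are most likely placeholders rather than physical values, but that is an assumption the maintainers would need to confirm.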

How to understand output to generate crystal graphs without training model

Good morning!

I was wondering if there is an easy way to generate the crystal graphs for visualization with this code. While I would like to train a model later, I currently just want to visualize my dataset, and I'm finding it particularly hard with this code.

Thank you, and wonderful project! Everything runs smoothly with essentially no error.

Vanessa

What to do with negative values of Bulk Modulus and Shear Modulus?

Hi Tian
When we set up the model for properties like bulk modulus and shear modulus, we take into account the log10(GPa) values. For some crystals, the values of bulk and shear moduli come out to be negative. The question is how do we take the logarithm? Do we take the absolute values before calculating the logarithm? Or do we take only those crystals which have positive values for these properties?
I will be grateful to you if you can answer.
Thanks
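
The second option raised in the question (keeping only crystals with positive moduli) can be sketched as below. This only illustrates that option; whether it matches the preprocessing used in the paper is not stated in this thread. The values and the helper name `log10_positive` are made up.

```python
import math


def log10_positive(values):
    """Return (kept, dropped): log10 of the positive values, and the rest unchanged."""
    kept, dropped = [], []
    for v in values:
        if v > 0:
            kept.append(math.log10(v))
        else:
            dropped.append(v)  # log10 is undefined for zero or negative input
    return kept, dropped


moduli_gpa = [100.0, -5.0, 10.0, 0.0]  # made-up values in GPa
kept, dropped = log10_positive(moduli_gpa)
print(kept, dropped)
```

Taking absolute values instead would silently turn physically suspicious entries into ordinary training targets, which is why filtering is shown here.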

struct = Structure.from_dict(entry["structure"]) # Pymatgen structure KeyError: 'structure'

When I run make_cifs.py, I get the error in the title. How can I solve it?

import os
import json
from pymatgen.core import Structure

#------Settings------#
struct_json_path = "qmof.json"  # path to structure json
cif_folder_path = "qmof_cif"  # path to folder where CIFs will be stored
write_site_props = True  # if site properties should be written to CIF
only_ddec_charge = False  # set to True if you only want _atom_site_charge flags
#------Settings------#

# Make new folder to store CIFs
if not os.path.exists(cif_folder_path):
    os.mkdir(cif_folder_path)

# Read in structure data (raw string so the backslashes are kept literally)
with open(r"E:\GANN-main\qmof_database\qmof.json") as f:
    qmof_struct_data = json.load(f)

# Loop over structures and write each one out to a CIF
qmof_structs = {}
for entry in qmof_struct_data:

    qmof_id = entry["qmof_id"]  # name for CIF
    print(f"Writing {qmof_id}")

    struct = Structure.from_dict(entry["structure"])  # Pymatgen structure
    cif_path = os.path.join(cif_folder_path, f"{qmof_id}.cif")  # path to write CIF
    struct.to(filename=cif_path)  # write CIF
    properties = dict(sorted(struct.site_properties.items()))  # fetch site properties
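
A KeyError on "structure" means at least one loaded entry has no such key. One way to narrow this down is to report which entries lack the keys the script expects. The sketch below assumes the JSON is a list of dictionaries, as the loop above implies. The sample data and the helper `missing_key_report` are made up for illustration.

```python
import json


def missing_key_report(entries, required=("qmof_id", "structure")):
    """Map each entry index to the list of required keys it lacks."""
    report = {}
    for i, entry in enumerate(entries):
        missing = [key for key in required if key not in entry]
        if missing:
            report[i] = missing
    return report


# Made-up stand-in for json.load(f): the second entry lacks "structure".
entries = json.loads('[{"qmof_id": "a", "structure": {}}, {"qmof_id": "b"}]')
print(missing_key_report(entries))
```

If every entry is reported, the file opened is probably not the structure file the script was written for (for example, a properties-only JSON).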

tensor size error

python main.py --train-size 30000 --val-size 10000 --test-size 10000 my_test/mp_test/
Epoch: [0][0/118]	Time 22.182 (22.182)	Data 10.518 (10.518)	Loss 1.0705 (1.0705)	MAE 1.340 (1.340)
Epoch: [0][10/118]	Time 17.675 (19.742)	Data 10.263 (11.007)	Loss 1.0962 (0.9969)	MAE 1.322 (1.234)
Epoch: [0][20/118]	Time 17.326 (19.752)	Data 9.870 (11.245)	Loss 0.7512 (0.9568)	MAE 1.034 (1.198)
Traceback (most recent call last):
  File "main.py", line 488, in <module>
    main()
  File "main.py", line 157, in main
    train(train_loader, model, criterion, optimizer, epoch, normalizer)
  File "main.py", line 208, in train
    for i, (input, target, _) in enumerate(train_loader):
  File "/home/jjs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 259, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/jjs/gitdir/cgcnn/cgcnn/data.py", line 130, in collate_pool
    torch.cat(batch_nbr_fea, dim=0),
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 12 and 7 in dimension 1 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897

I get mp structure by something like this:

from pymatgen.ext.matproj import MPRester
with MPRester(api_key) as m:
    d = m.get_data(mp_id)
    with open(cif_file,'w') as f:
        f.write(d[0]['cif'])
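
The error says that neighbor-feature arrays of width 12 and width 7 were concatenated. One plausible cause, stated here as an assumption, is that some atoms have fewer neighbors inside the cutoff radius than the requested maximum. The sketch below shows the general idea of forcing every atom's neighbor list to a fixed length. `pad_neighbors` is a hypothetical helper, and the padding convention (index 0, distance just beyond the cutoff) is one possible choice.

```python
def pad_neighbors(neighbors, max_num_nbr, radius):
    """Force each atom's neighbor list to exactly max_num_nbr entries.

    neighbors: one list per atom of (neighbor_index, distance) tuples.
    Long lists are truncated to the nearest max_num_nbr neighbors; short
    lists are padded with index 0 and a distance of radius + 1.0.
    """
    padded = []
    for nbrs in neighbors:
        nbrs = sorted(nbrs, key=lambda pair: pair[1])[:max_num_nbr]
        nbrs = nbrs + [(0, radius + 1.0)] * (max_num_nbr - len(nbrs))
        padded.append(nbrs)
    return padded


# Made-up input: atom 0 has one neighbor, atom 1 has three.
raw = [[(1, 2.0)], [(0, 2.0), (2, 1.0), (3, 3.0)]]
fixed = pad_neighbors(raw, max_num_nbr=2, radius=8.0)
print(fixed)
```

Increasing the cutoff radius so that every atom finds enough real neighbors is the other obvious remedy, and avoids feeding padded entries to the model.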

Issue with prediction on the example data

Hi, I have installed cgcnn and run predict.py successfully.
Following the README, I tried "python predict.py pre-trained/formation-energy-per-atom.pth.tar data/sample-regression". However, the reported MAE was very large, which indicates poor prediction accuracy.
As I understand it, I can simply use a pre-trained model to predict the examples in /data to test cgcnn, but the results confused me. Am I doing something wrong at some step?

installation problem

Hi,
I was trying to create the environment as described in your README:
conda create -n cgcnn python=3.7 scikit-learn pytorch=1.0.0 torchvision pymatgen -c pytorch -c matsci

I got the following error. I am using Fedora 30, and Anaconda is updated to the newest version.

LinkError: pre-link script failed for package matsci::pydispatcher-2.0.5-py_0
location of failed script: /home/shazia/anaconda3/pkgs/pydispatcher-2.0.5-py_0/bin/.pydispatcher-pre-link.sh
==> script messages <==

==> script output <==
stdout:
stderr: + unset _mlshdbg

'[' 0 = 1 ']'
export MODULES_RUN_QUARANTINE=LD_LIBRARY_PATH
MODULES_RUN_QUARANTINE=LD_LIBRARY_PATH
unset _mlre _mlIFS
'[' -n x ']'
_mlIFS='
'
IFS=' '
for _mlv in ${MODULES_RUN_QUARANTINE:-}
'[' LD_LIBRARY_PATH = LD_LIBRARY_PATH -a LD_LIBRARY_PATH = LD_LIBRARY_PATH ']'
++ eval 'echo ${LD_LIBRARY_PATH+x}'
+++ echo x
'[' -n x ']'
++ eval 'echo ${LD_LIBRARY_PATH}'
+++ echo /usr/lib64/mpich/lib
_mlre='LD_LIBRARY_PATH_modquar='''/usr/lib64/mpich/lib''' '
_mlrv=MODULES_RUNENV_LD_LIBRARY_PATH
++ eval 'echo ${MODULES_RUNENV_LD_LIBRARY_PATH:-}'
+++ echo
_mlre='LD_LIBRARY_PATH_modquar='''/usr/lib64/mpich/lib''' LD_LIBRARY_PATH='''''' '
'[' -n 'LD_LIBRARY_PATH_modquar='''/usr/lib64/mpich/lib''' LD_LIBRARY_PATH='''''' ' ']'
_mlre='eval LD_LIBRARY_PATH_modquar='''/usr/lib64/mpich/lib''' LD_LIBRARY_PATH='''''' '
++ eval 'LD_LIBRARY_PATH_modquar='''/usr/lib64/mpich/lib'''' 'LD_LIBRARY_PATH=''''''' /usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash autoinit
+++ LD_LIBRARY_PATH_modquar=/usr/lib64/mpich/lib
+++ LD_LIBRARY_PATH=
+++ /usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash autoinit
_mlcode='module() {
unset _mlshdbg;
if [ "${MODULES_SILENT_SHELL_DEBUG:-0}" = '''1''' ]; then
case "$-" in
vx*) set +vx; _mlshdbg='''vx''' ;;
v) set +v; _mlshdbg='''v''' ;;
x) set +x; mlshdbg='''x''' ;;
) _mlshdbg='''''' ;;
esac;
fi;
unset _mlre _mlIFS;
if [ -n "${IFS+x}" ]; then
_mlIFS=$IFS;
fi;
IFS=''' ''';
for _mlv in ${MODULES_RUN_QUARANTINE:-}; do
if [ "${_mlv}" = "${_mlv##[!A-Za-z0-9]}" -a "${_mlv}" = "${_mlv#[0-9]}" ]; then
if [ -n "eval '\''echo ${'\''$_mlv'\''+x}'\''" ]; then
_mlre="${_mlre:-}${_mlv}_modquar='''eval '\''echo ${'\''$_mlv'\''}'\''''' ";
fi;
mlrv="MODULES_RUNENV${_mlv}";
_mlre="${_mlre:-}${_mlv}='''eval '\''echo ${'\''$_mlrv'\'':-}'\''''' ";
fi;
done;
if [ -n "${_mlre:-}" ]; then
eval eval ${_mlre}/usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash '\''"$@"'\'';
else
eval /usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash "$@";
fi;
_mlstatus=$?;
if [ -n "${_mlIFS+x}" ]; then
IFS=$_mlIFS;
else
unset IFS;
fi;
unset _mlre _mlv _mlrv _mlIFS;
if [ -n "${_mlshdbg:-}" ]; then
set -$_mlshdbg;
fi;
unset _mlshdbg;
return $_mlstatus;
};
MODULES_CMD=/usr/share/Modules/libexec/modulecmd.tcl; export MODULES_CMD;
MODULESHOME=/usr/share/Modules; export MODULESHOME;
test 0;'
_mlret=0
'[' -n x ']'
IFS='
'
unset _mlIFS
unset _mlre _mlv _mlrv
'[' 0 -eq 0 ']'
eval 'module() {
unset _mlshdbg;
if [ "${MODULES_SILENT_SHELL_DEBUG:-0}" = '''1''' ]; then
case "$-" in
vx*) set +vx; _mlshdbg='''vx''' ;;
v) set +v; _mlshdbg='''v''' ;;
x) set +x; mlshdbg='''x''' ;;
) _mlshdbg='''''' ;;
esac;
fi;
unset _mlre _mlIFS;
if [ -n "${IFS+x}" ]; then
_mlIFS=$IFS;
fi;
IFS=''' ''';
for _mlv in ${MODULES_RUN_QUARANTINE:-}; do
if [ "${_mlv}" = "${_mlv##[!A-Za-z0-9]}" -a "${_mlv}" = "${_mlv#[0-9]}" ]; then
if [ -n "eval '\''echo ${'\''$_mlv'\''+x}'\''" ]; then
_mlre="${_mlre:-}${_mlv}_modquar='''eval '\''echo ${'\''$_mlv'\''}'\''''' ";
fi;
mlrv="MODULES_RUNENV${_mlv}";
_mlre="${_mlre:-}${_mlv}='''eval '\''echo ${'\''$_mlrv'\'':-}'\''''' ";
fi;
done;
if [ -n "${_mlre:-}" ]; then
eval eval ${_mlre}/usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash '\''"$@"'\'';
else
eval /usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash "$@";
fi;
_mlstatus=$?;
if [ -n "${_mlIFS+x}" ]; then
IFS=$_mlIFS;
else
unset IFS;
fi;
unset _mlre _mlv _mlrv _mlIFS;
if [ -n "${_mlshdbg:-}" ]; then
set -$_mlshdbg;
fi;
unset _mlshdbg;
return $_mlstatus;
};
MODULES_CMD=/usr/share/Modules/libexec/modulecmd.tcl; export MODULES_CMD;
MODULESHOME=/usr/share/Modules; export MODULESHOME;
test 0;'
++ MODULES_CMD=/usr/share/Modules/libexec/modulecmd.tcl
++ export MODULES_CMD
++ MODULESHOME=/usr/share/Modules
++ export MODULESHOME
++ test 0
'[' 0 = 1 ']'
'[' -t 2 ']'
export -f module
export -f switchml
ENV=/usr/share/Modules/init/profile.sh
export ENV
BASH_ENV=/usr/share/Modules/init/bash
export BASH_ENV
'[' 5 -ge 3 ']'
[[ hxB =~ i ]]
[[ ! :/home/shazia/anaconda3/pkgs/pydispatcher-2.0.5-py_0/bin:/home/shazia/anaconda3/bin:/home/shazia/anaconda3/bin:/home/shazia/anaconda3/condabin:/home/shazia/.local/bin:/home/shazia/bin:/usr/lib64/mpich/bin:/usr/share/Modules/bin:/usr/lib64ccache:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin: =~ :/usr/share/Modules/bin: ]]
'[' '!' -n x ']'
++ manpath
[[ ! :/usr/share/man/mpich-x86_64::/home/shazia/anaconda3/man:/usr/local/share/man:/usr/share/man: =~ :/usr/share/man: ]]
unset _mlcode _mlret
'[' -n '' ']'
/home/shazia/anaconda3/envs/cgcnn/bin/python /home/shazia/anaconda3/pkgs/pydispatcher-2.0.5-py_0/link.py
/home/shazia/anaconda3/pkgs/pydispatcher-2.0.5-py_0/bin/.pydispatcher-pre-link.sh: line 2: /home/shazia/anaconda3/envs/cgcnn/bin/python: No such file or directory

return code: 127

The problem of using the Materials Project database and the Perovskite database

Hello, and thanks for your great work!
I am a computer science student, so I am not familiar with materials databases.
After reading your data.py code, I see that you read each CIF file and extract everything you need from it.
However, I can't understand how the information is extracted from the CIF file. I would be very grateful if you could give a brief explanation.
Also, your CIF files are from the COD database. How can I get CIF files from the Materials Project database and the Perovskite database in the same format as those from COD?

Issue with path assertion for root_dir

Hi!
I've been trying to get this running, but it does not seem to be able to find my root_dir. I've just been trying to run a regression, using exactly this line:

(cgcnn) blake@blake-desktop:~/cgcnn$ python main.py --disable-cuda --train-size 250 -- val-size 25 --test-size 20 root_dir
Traceback (most recent call last):
  File "main.py", line 488, in <module>
    main()
  File "main.py", line 85, in main
    dataset = CIFData(*args.data_options)
  File "/home/blake/cgcnn/cgcnn/data.py", line 288, in __init__
    assert os.path.exists(root_dir), 'root_dir does not exist!'
AssertionError: root_dir does not exist!

I've tried placing the root_dir directory both inside the cgcnn folder and outside it in my home directory, but both resulted in the same error. In case it matters, I'm running Ubuntu 18.10 (Cosmic Cuttlefish).
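
One thing worth checking: the command as posted contains "-- val-size" with a space. If that space was really typed (and is not a copy-and-paste artifact), argparse treats the bare "--" as the end of options, so everything after it becomes a positional argument and "val-size" would be taken as the dataset directory. The sketch below uses a simplified stand-in parser, not the actual one from main.py, to show the effect.

```python
import argparse


def build_parser():
    # Simplified stand-in for the parser in main.py; only the flags used above.
    parser = argparse.ArgumentParser()
    parser.add_argument("data_options", nargs="+")
    parser.add_argument("--disable-cuda", action="store_true")
    parser.add_argument("--train-size", type=int)
    parser.add_argument("--val-size", type=int)
    parser.add_argument("--test-size", type=int)
    return parser


argv = "--disable-cuda --train-size 250 -- val-size 25 --test-size 20 root_dir".split()
args = build_parser().parse_args(argv)

# The first positional argument is what would be passed on as root_dir.
print(args.data_options[0])
print(args.val_size)
```

With the space removed ("--val-size 25"), the only positional argument left is root_dir itself.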

Size mismatch when loading pre-trained models

Hi,

When I try to load the pre-trained models to test predict.py, I get the following error:

python predict.py pre-trained/final-energy-per-atom.pth.tar mp/
=> loading model params 'pre-trained/final-energy-per-atom.pth.tar'
=> loaded model params 'pre-trained/final-energy-per-atom.pth.tar'
=> loading model 'pre-trained/final-energy-per-atom.pth.tar'
Traceback (most recent call last):
  File "E:\cgcnn-master\predict.py", line 298, in <module>
    main()
  File "E:\cgcnn-master\predict.py", line 94, in main
    model.load_state_dict(checkpoint['state_dict'])
  File "C:\ProgramData\Anaconda3\envs\cgcnn1\lib\site-packages\torch\nn\modules\module.py", line 1497, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CrystalGraphConvNet:
size mismatch for convs.0.fc_full.weight: copying a param with shape torch.Size([128, 169]) from checkpoint, the shape in current model is torch.Size([128, 179]).
size mismatch for convs.1.fc_full.weight: copying a param with shape torch.Size([128, 169]) from checkpoint, the shape in current model is torch.Size([128, 179]).
size mismatch for convs.2.fc_full.weight: copying a param with shape torch.Size([128, 169]) from checkpoint, the shape in current model is torch.Size([128, 179]).
size mismatch for convs.3.fc_full.weight: copying a param with shape torch.Size([128, 169]) from checkpoint, the shape in current model is torch.Size([128, 179]).

By the way, I then trained my own model and used it for prediction. The errors above did not appear, but I got a very large MAE.

(cgcnn) E:\cgcnn-master>python predict.py E:\cgcnn-master\trained_files\from_cmd\mp-2\mp_model_best.pth.tar mp/
=> loading model params 'E:\cgcnn-master\trained_files\from_cmd\mp-2\mp_model_best.pth.tar'
=> loaded model params 'E:\cgcnn-master\trained_files\from_cmd\mp-2\mp_model_best.pth.tar'
=> loading model 'E:\cgcnn-master\trained_files\from_cmd\mp-2\mp_model_best.pth.tar'
=> loaded model 'E:\cgcnn-master\trained_files\from_cmd\mp-2\mp_model_best.pth.tar' (epoch 484, validation 0.05862389877438545)
C:\ProgramData\Anaconda3\envs\cgcnn\lib\site-packages\pymatgen\io\cif.py:1155: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
Test: [0/74] Time 26.633 (26.633) Loss inf (inf) MAE 5.977 (5.977)
Test: [10/74] Time 24.787 (27.052) Loss inf (inf) MAE 6.005 (6.013)
Test: [20/74] Time 28.383 (28.096) Loss inf (inf) MAE 5.941 (6.010)
Test: [30/74] Time 31.305 (28.518) Loss inf (inf) MAE 6.081 (6.008)
Test: [40/74] Time 30.491 (29.037) Loss inf (inf) MAE 5.860 (6.010)
Test: [50/74] Time 35.822 (29.651) Loss inf (inf) MAE 6.035 (6.008)
Test: [60/74] Time 33.488 (30.191) Loss inf (inf) MAE 6.033 (6.012)
Test: [70/74] Time 34.823 (30.565) Loss inf (inf) MAE 5.955 (6.008)
** MAE 6.009

Thanks for your attention!
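
The 169-versus-179 mismatch reported above is consistent with a difference of 10 in the bond (neighbor) feature length. The arithmetic can be sketched as follows, assuming that fc_full takes two atom vectors plus one bond vector as input and that the bond vector has one entry per Gaussian center between a minimum and maximum distance. The specific radii (8 and 10) are just one combination that reproduces both numbers, not a confirmed diagnosis.

```python
def num_gaussian_filters(dmin, dmax, step):
    """Number of evenly spaced centers from dmin to dmax inclusive."""
    return int(round((dmax - dmin) / step)) + 1


def fc_full_in_features(atom_fea_len, nbr_fea_len):
    """Input width of a layer fed two atom vectors plus one bond vector."""
    return 2 * atom_fea_len + nbr_fea_len


atom_fea_len = 64  # assumed; consistent with the 128 output rows in the error
checkpoint_width = fc_full_in_features(atom_fea_len, num_gaussian_filters(0.0, 8.0, 0.2))
current_width = fc_full_in_features(atom_fea_len, num_gaussian_filters(0.0, 10.0, 0.2))
print(checkpoint_width, current_width)
```

If this is the cause, making the data-loading settings (radius, step) match those used when the checkpoint was trained should remove the size mismatch.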

Add CIF module to be installed in the environment in README file

Traceback (most recent call last):
  File "./main.py", line 22, in <module>
    from cgcnn.data import CIFData
  File "cgcnn/data.py", line 19, in <module>
    from CifFile import ReadCif
ModuleNotFoundError: No module named 'CifFile'

https://github.com/txie-93/cgcnn/blob/master/README.md

On understanding the AtomInitializer class

The documentation says that the AtomInitializer class is the "Base class for intializing the vector representation for atoms." I'm trying to understand the reasoning behind the word "initializing", as this representation doesn't appear to be changing during the training (starting from an initial state). So, is the representation really a static one?

How to understand the shape of "nbr_fea_idx" in the code

I don't understand the description "the indices of M neighbors of each atom".
Could you give a simple example of the "nbr_fea_idx" matrix?
Thank you!
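
A toy illustration of what such a matrix looks like: with N atoms and M neighbors per atom, row i lists the indices of the M atoms nearest to atom i, so the shape is (N, M). The sketch below uses four made-up atoms on a line and ignores periodic images, which the real code has to handle. `nearest_neighbor_indices` is a hypothetical helper.

```python
def nearest_neighbor_indices(positions, num_nbr):
    """For each atom, return the indices of its num_nbr nearest other atoms."""
    table = []
    for i, xi in enumerate(positions):
        others = [(abs(xi - xj), j) for j, xj in enumerate(positions) if j != i]
        others.sort()  # nearest first
        table.append([j for _, j in others[:num_nbr]])
    return table


positions = [0.0, 1.0, 3.0, 7.0]  # four atoms on a line (made-up coordinates)
nbr_fea_idx = nearest_neighbor_indices(positions, num_nbr=2)
for row in nbr_fea_idx:
    print(row)
```

Here N = 4 and M = 2: row 0 is [1, 2] because atoms 1 and 2 are the two closest to atom 0.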

Question about fetching Materials Project Data

Hi,

Thank you for your great work! I am trying to fetch the data from the Materials Project database based on the mp-ids that you provided. I am wondering if the mp-ids that you provided are materials id or task id?

I tried

MPRester().get_structure_by_material_id(id)

but a lot of the ids in your csv files return void responses. Then I tried

mid = get_materials_id_from_task_id(id)
structure = get_structure_by_material_id(mid)

and it worked. I want to ask if this is the correct way of fetching the dataset.

Thank you

Why can't the cgcnn model predict graphene and diamond?

I trained a cgcnn model using the Materials Project IDs in "cgcnn/data/material-data/mp-ids-46744.csv" and tested it by predicting "energy_per_atom" (the target value from the Materials Project) for graphene and diamond.

Both materials consist of a single element, carbon (C), but have different structures, so the model should be able to distinguish between them. However, the results are poor: the model fails to predict values close to the target (energy_per_atom).

[Screenshot of the prediction output; image not preserved]

In the screenshot, the target values of graphene and diamond are -9.0904 and -9.2203 respectively, but the predicted values are -1.7248 and -1.7681 respectively. (The normalized target values are both 0.7071.)

So I would like to know why the trained cgcnn model cannot predict the target values of graphene and diamond.

I look forward to your explanation. Thank you.
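
One observation that may help: 0.7071 is 1/sqrt(2), which is exactly the magnitude that any two distinct values take when standardized by their own mean and sample standard deviation. This suggests, without proving it, that the normalizer was fitted on just these two samples rather than on the training set. A short check with the two target values quoted above:

```python
import statistics


def normalize(values):
    """Standardize values using their own mean and sample standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample std (n - 1 in the denominator)
    return [(v - mean) / std for v in values]


z = normalize([-9.0904, -9.2203])
print([round(abs(v), 4) for v in z])
```

If that is what happened, the predictions would be de-normalized with the wrong mean and spread, which alone could explain a large offset from the targets.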

The setting of hyperparameters to get the best performance

Hi Tian,
I have tried to reproduce your result of "MAE_model = 0.039" using your dataset.

The best MAE’s we achieved with Eq. 4 and Eq. 5 are 0.136 eV/atom and 0.039 eV/atom

However, it is hard to set the hyperparameters correctly. Could you please share your hyperparameter settings?

Thank you and looking forward to your reply.

Atom features specifics

Your work is great! I am trying to reproduce it, and I would like to know what the nine attributes used in atom.init are.
Thank you!

Could you please tell me why you use one-hot encoding?

Hello. Covalent radius, electron affinity, atomic volume, and electronegativity are among the properties you used. Why not use the raw values directly?

Question: How to make dataset?

I have questions about the datasets.

The Materials Project database and API have changed a lot since this repository was started, and I can't retrieve the whole dataset listed in the “cgcnn/data/material/mp-ids-○○.csv” files.

The CGCNN paper (https://arxiv.org/abs/1710.10324) states:

After removing ill-converged crystals, the full database has 46744 materials covering 87 elements, ...

My questions are as follows.

How can I collect datasets like “mp-ids-27430”, “mp-ids-46744”, and “mp-ids-3402”?
What does “ill-converged crystal” mean?

Thanks,

Build the element vector

Sorry to disturb you. When building the atom_init.json file, I noticed that every element's key-value pair has a large number of values.
Do I need to fill in all of them, or only change the first few values?
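
Since the model's input layer has a fixed width, it seems safest to assume that every element needs a complete vector of the same length. That is an assumption, not something confirmed in this thread. A quick consistency check on a custom file could look like the sketch below. The miniature data and the helper `vector_lengths` are made up.

```python
import json


def vector_lengths(atom_init):
    """Return the set of feature-vector lengths found in an atom_init mapping."""
    return {len(vec) for vec in atom_init.values()}


# Made-up miniature file: keys are atomic numbers, values are feature vectors.
atom_init = json.loads('{"1": [1, 0, 0], "8": [0, 1, 0], "26": [0, 0, 1]}')
lengths = vector_lengths(atom_init)
print(lengths)
```

A result containing more than one length would indicate that some element vectors are only partially filled in.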

CGCNN can't run on certain CIF files

Hello,

Great work with the project. I have been using CGCNN for perovskites, with the CIF files taken from ICSD database. However, I found out that it only works with certain CIF files, but not the others. Does it only take a certain "type" of CIF files? If so, how do I change the CIF files which don't work to the format that can be inputted into CGCNN?

As a reference, here are some of the ICSD collection codes:

Working: 110630, 243861, 252316, 255885, 255886, 432089
Not working: 252317, 252318, 252319, 252320, 254340, 254341, 434118, 432090, 254288

The error that comes out:

Traceback (most recent call last):
  File "main.py", line 513, in <module>
    main()
  File "main.py", line 119, in main
    sample_data_list = [dataset[i] for i in range(len(dataset))]
  File "main.py", line 119, in <listcomp>
    sample_data_list = [dataset[i] for i in range(len(dataset))]
  File "/run/user/1000/gvfs/user/cgcnn-ubuntu/cgcnn-master/cgcnn/data.py", line 326, in __getitem__
    print("Specie number: ",crystal[0].specie.number)
  File "/home/user/miniconda3/envs/cgcnn/lib/python3.7/site-packages/pymatgen/core/sites.py", line 79, in __getattr__
    raise AttributeError(a)
AttributeError: specie

Thank you!

How to bind cgcnn for matminer?

Hello

I'm looking at the matminer documentation, which states that its CGCNN featurizer requires cgcnn with Python bindings. How would I install cgcnn as a package rather than launching it from main.py?

Issues with installation

Hey, I am having issues installing cgcnn. After following the steps in the README, I am unable to run main.py.

One problem involves PyTorch:
File "main.py", line 213
input_var = (Variable(input[0].cuda(async=True)),
^
SyntaxError: invalid syntax

When redoing this in a different Python environment, I cannot install the cgcnn module from anywhere. The error I get is:
ModuleNotFoundError: No module named 'cgcnn'

How exactly should I go about this? Thanks!
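
On the SyntaxError: `async` became a reserved keyword in Python 3.7, so it can no longer be used as an argument name. The `non_blocking=True` spelling is the one that appears in a later traceback on this page. The sketch below only checks syntax with the built-in `compile` and does not need PyTorch. The variable `tensor` inside the strings is a placeholder.

```python
def compiles(source):
    """Return True if the source text is valid syntax for this interpreter."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


old_call = "x = tensor.cuda(async=True)"
new_call = "x = tensor.cuda(non_blocking=True)"
print(compiles(old_call), compiles(new_call))
```

Seeing this error therefore suggests that the copy of main.py being run predates the rename, or that the Python version is newer than the one the code was written for.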

Multiclass classification

Thank you for making this tool! I am running into an issue when I run python main.py --task classification:

/Users/gianmarcoterrones/opt/anaconda3/envs/cgcnn/lib/python3.11/site-packages/pymatgen/io/cif.py:1134: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
  warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
Traceback (most recent call last):
  File "/Users/gianmarcoterrones/Research/cgcnn/main.py", line 513, in <module>
    main()
  File "/Users/gianmarcoterrones/Research/cgcnn/main.py", line 175, in main
    train(train_loader, model, criterion, optimizer, epoch, normalizer)
  File "/Users/gianmarcoterrones/Research/cgcnn/main.py", line 252, in train
    loss = criterion(output, target_var)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gianmarcoterrones/opt/anaconda3/envs/cgcnn/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gianmarcoterrones/opt/anaconda3/envs/cgcnn/lib/python3.11/site-packages/torch/nn/modules/loss.py", line 216, in forward
    return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gianmarcoterrones/opt/anaconda3/envs/cgcnn/lib/python3.11/site-packages/torch/nn/functional.py", line 2704, in nll_loss
    return torch._C._nn.nll_loss_nd(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: Target 2 is out of bounds.

Have I made an error in setting up the customized dataset? Or does the code not currently support multiclass classification? The entries of my id_prop.csv look like this:

ACOFUU	1
ACOGAB	1
ACOGEF	1
ADABAK	2
AFEJUQ	1
AGUBUA	1
AKOXIJ	1
ALAMUW	0
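
The message "Target 2 is out of bounds" is what a classification loss reports when a label is not smaller than the number of output classes. That would fit a model with two outputs (labels 0 and 1), though this is an inference from the error rather than something stated here. A small check over the rows listed above, using a hypothetical helper `out_of_range_labels`:

```python
def out_of_range_labels(rows, num_classes):
    """Return (id, label) pairs whose label is not in 0 .. num_classes - 1."""
    return [(cid, y) for cid, y in rows if not 0 <= y < num_classes]


rows = [("ACOFUU", 1), ("ACOGAB", 1), ("ACOGEF", 1), ("ADABAK", 2),
        ("AFEJUQ", 1), ("AGUBUA", 1), ("AKOXIJ", 1), ("ALAMUW", 0)]
print(out_of_range_labels(rows, num_classes=2))
```

Supporting three classes would then require widening the model's output layer, not only changing the dataset.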

Description for Materials Project .csv files

Hi Tian,

Trying to figure out what each of the mp-ids-*.csv files correspond to.

After removing ill-converged crystals, the full database has 46744 materials covering 87 elements, 7 lattice systems, and 216 space groups.
The database [34] we use includes the energy above hull of 18928 perovskite crystals
Figures 2(b) and 2(c) show the performance of the two models on 9350 test crystals

Table 1 has # train data with values of 28046, 16458, and 2041.

So "46744" seems pretty straightforward mp-ids-46744.csv

But what do mp-ids-3402.csv and mp-ids-27430.csv correspond to in the paper?

Thanks!

Sterling

How to run CGCNN through Jupyter or Python interface

I'm able to run CGCNN on a command line interface (PowerShell) through Anaconda, but I'm curious if you have suggestions for using it with an IDE.
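
One low-effort approach, offered as a sketch and not as an officially supported interface, is to launch main.py as a subprocess from a notebook or IDE. The flags below are the ones that appear in commands elsewhere on this page. The helper names and the sizes are made up.

```python
import subprocess
import sys


def build_command(root_dir, train_size, val_size, test_size, disable_cuda=False):
    """Assemble the argument list for running main.py with the current interpreter."""
    cmd = [sys.executable, "main.py",
           "--train-size", str(train_size),
           "--val-size", str(val_size),
           "--test-size", str(test_size)]
    if disable_cuda:
        cmd.append("--disable-cuda")
    cmd.append(root_dir)
    return cmd


def run_training(cmd, repo_dir):
    """Run the command from the repository folder and return its exit code."""
    return subprocess.run(cmd, cwd=repo_dir).returncode


cmd = build_command("data/sample-regression", 6, 2, 2, disable_cuda=True)
print(cmd[1:])
```

Calling `run_training(cmd, repo_dir)` with the path to a local clone would then start training with the same interpreter (and environment) as the notebook kernel.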

About checkpoint.pth.tar

Can I delete checkpoint.pth.tar once pre-training is complete?
Also, can I roll back my pre-trained model by specifying checkpoint.pth.tar?

MAE for predicting and training shear modulus and bulk modulus is very large

I found that the MAE obtained when training the model on these two properties reached more than 100! I don't know whether the problem lies in the materials I picked. I used k_vrh and g_vrh from the Materials Project as the bulk modulus and shear modulus, and I believe my dataset is fine. Here is my training output:

Data construction specifics

Hello,
Thanks for your awesome work! I was wondering if you could provide more details regarding the data construction with the Materials Project API. Specifically,

1. Which keys were queried to generate the features (Appendix Table 2)?
   a. Are group & period number included in the API or did you use other methods?
   b. Which keys in materialsproject held electronegativity, valence electrons (nsites?), etc.?
2. What IDs are saved in atom_init? Do these IDs have to match with information in the CIF files?

Cheers!

Pytorch version

Are there any plans to make CGCNN compatible with Pytorch 0.4+ (or even v1.0+)?

I ask because it seems like on some linux distributions (e.g. computing clusters) the required version is not available:

ardunn@n0001:~$ pip install torch==0.3.1
Collecting torch==0.3.1
  Could not find a version that satisfies the requirement torch== (from versions: 0.1.2, 0.1.2.post1, 0.4.1, 0.4.1.post2, 1.0.0)
No matching distribution found for torch==0.3.1

Whereas on other platforms (e.g., Mac), the version is available.

Loading data issue with certain CIF files

Describe the issue

CGCNN cannot accept as input CIFs in which a PeriodicSite contains multiple elements.

To Reproduce

from pymatgen.core.structure import Structure

crystal = Structure.from_file("1005004.cif")
crystal.replace_species({"Fe2+": {"Fe2+": 0.5, "Mg2+": 0.5}})
crystal.to(fmt='cif', filename="./data/sample-classification/1005004_multiple.cif")

"1005004.cif" can be downloaded at http://www.crystallography.net/cod/result.php?journal=Inorganic%20Chemistry

Expected behavior
CIFs in which a PeriodicSite contains multiple elements should also be accepted as input.
(A number of such CIFs are included in the ICSD database)

Error message

$ python predict.py pre-trained/semi-metal-classification.pth.tar data/sample-classification

=> loading model params 'pre-trained/semi-metal-classification.pth.tar'
=> loaded model params 'pre-trained/semi-metal-classification.pth.tar'
=> loading model 'pre-trained/semi-metal-classification.pth.tar'
=> loaded model 'pre-trained/semi-metal-classification.pth.tar' (epoch 92, validation 0.958599858597622)
/python3.8/site-packages/pymatgen/io/cif.py:1155: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
  warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
Traceback (most recent call last):
  File "predict.py", line 299, in <module>
    main()
  File "predict.py", line 102, in main
    validate(test_loader, model, criterion, normalizer, test=True)
  File "predict.py", line 125, in validate
    for i, (input, target, batch_cif_ids) in enumerate(val_loader):
  File "/python3.8/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/python3.8/site-packages/torch/utils/data/dataloader.py", line 692, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/cgcnn/cgcnn/data.py", line 325, in __getitem__
    atom_fea = np.vstack([self.ari.get_atom_fea(crystal[i].specie.number)
  File "/cgcnn/cgcnn/data.py", line 325, in <listcomp>
    atom_fea = np.vstack([self.ari.get_atom_fea(crystal[i].specie.number)
  File "python3.8/site-packages/pymatgen/core/sites.py", line 79, in __getattr__
    raise AttributeError(a)
AttributeError: specie
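
The traceback shows the failure arising where a single atomic number is read from each site, which has no unambiguous answer for a mixed-occupancy site. One hypothetical workaround, not something cgcnn supports as far as this thread shows, is to build the site's feature vector as an occupancy-weighted average of the element vectors. Whether that is physically appropriate for a given property is an open question. The feature vectors below are made up.

```python
def site_feature(species_occupancy, atom_features):
    """Occupancy-weighted average of per-element feature vectors for one site."""
    length = len(next(iter(atom_features.values())))
    mixed = [0.0] * length
    for atomic_number, occupancy in species_occupancy.items():
        for k, value in enumerate(atom_features[atomic_number]):
            mixed[k] += occupancy * value
    return mixed


# Made-up 3-component feature vectors keyed by atomic number (Mg = 12, Fe = 26).
atom_features = {12: [1.0, 0.0, 0.0], 26: [0.0, 1.0, 0.0]}
mixed = site_feature({26: 0.5, 12: 0.5}, atom_features)
print(mixed)
```

For a fully ordered site (occupancy 1.0 for one element) this reduces to the ordinary element vector.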

Error raised while testing the code in README

I set up the code both in Docker and in a real environment, and there were no errors while executing the training part. But the prediction part raised the following error:

root@ef1b695a4d41:/workspace/cgcnn# python predict.py pre-trained/formation-energy-per-atom.pth.tar data/sample-regression
=> loading model params 'pre-trained/formation-energy-per-atom.pth.tar'
=> loaded model params 'pre-trained/formation-energy-per-atom.pth.tar' 
=> loading model 'pre-trained/formation-energy-per-atom.pth.tar'
=> loaded model 'pre-trained/formation-energy-per-atom.pth.tar' (epoch 968, validation 0.03972001800748568)
/opt/conda/lib/python3.6/site-packages/pymatgen/io/cif.py:1107: UserWarning: Issues encountered while parsing CIF:
  warnings.warn("Issues encountered while parsing CIF:")
/opt/conda/lib/python3.6/site-packages/pymatgen/io/cif.py:1109: UserWarning: Some fractional co-ordinates rounded to ideal values to avoid finite precision errors.
  warnings.warn(error)
predict.py:131: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  input_var = (Variable(input[0].cuda(non_blocking=True), volatile=True),
predict.py:132: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  Variable(input[1].cuda(non_blocking=True), volatile=True),
predict.py:146: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  volatile=True)
Traceback (most recent call last):
  File "predict.py", line 302, in <module>
    main()
  File "predict.py", line 106, in main
    validate(test_loader, model, criterion, normalizer, test=True)
  File "predict.py", line 157, in validate
    losses.update(loss.data.cpu()[0], target.size(0))
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

I used PyTorch 1.1 with CUDA 10 in Docker, and PyTorch 1.0 with CUDA 9.0 in the real environment. The error was raised in both environments. How can I deal with it?

Thanks!
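
The error message itself points to the fix: a 0-dim tensor can no longer be indexed with [0], and `.item()` should be used to get a Python number. The sketch below shows a small compatibility helper. To keep it runnable without PyTorch it uses `FakeScalarTensor`, a hypothetical stand-in for a 0-dim tensor. `scalar_value` is also a made-up name.

```python
def scalar_value(tensor_like):
    """Extract a Python number from a 0-dim tensor-like object."""
    if hasattr(tensor_like, "item"):
        return tensor_like.item()  # 0-dim tensors in newer PyTorch
    return tensor_like[0]  # fallback for 1-element containers


class FakeScalarTensor:
    """Hypothetical stand-in for a 0-dim tensor, so the sketch runs without PyTorch."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


print(scalar_value(FakeScalarTensor(0.25)), scalar_value([0.5]))
```

In predict.py this would presumably mean replacing the `loss.data.cpu()[0]` expression shown in the traceback with a call that uses `.item()`.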

atom_init.json

I was inspecting the atom_init.json file and found some discrepancies with the documentation of atom features in Table S2 (Supplemental Material).

Since every feature is one-hot encoded, the length of the vector should be 93 according to the table, but it is actually 92.
Also, the Group number feature has 18 categories in the table, but in the file it has 19. Actinide and lanthanide elements are represented by a 1 at index 0, except lutetium, which is labeled as group number 3. Other elements have a 1 at the index equal to their group number; e.g., for hydrogen, the element at index 1 is 1.
Regarding the Period number feature, the table says it has 9 categories, but in the file it has 7, since lanthanide and actinide elements are described with periods 6 and 7 respectively, not 8 and 9 as stated in Table S2.
Could you also explain why hydrogen's electronegativity is in the bin [1.9, 2.25) instead of [2.25, 2.6), given its Sanderson electronegativity of 2.59?
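
The two bins quoted in the question are each 0.35 wide. Assuming 10 equal-width bins starting at 0.5 (an assumption that reproduces those edges, not something confirmed here), the binning can be sketched as below. Under that assumption a value of 2.59 lands in [2.25, 2.6), as the poster expects. `one_hot_bin` is a hypothetical helper.

```python
def one_hot_bin(value, low, high, num_bins):
    """One-hot vector marking which equal-width bin between low and high holds value."""
    width = (high - low) / num_bins
    index = int((value - low) / width)
    index = max(0, min(num_bins - 1, index))  # clamp values at or beyond the edges
    return [1 if i == index else 0 for i in range(num_bins)]


# Assumed range 0.5 to 4.0 in 10 bins, giving the 0.35-wide bins quoted above.
vec = one_hot_bin(2.59, low=0.5, high=4.0, num_bins=10)
print(vec.index(1))
```

Bin 5 covers [2.25, 2.6) and bin 4 covers [1.9, 2.25), so a file entry in bin 4 would correspond to a different electronegativity value or scale than the 2.59 cited.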

Problem with importing MPDataRetrieval

ImportError Traceback (most recent call last)

in ()
1 get_ipython().system('sudo pip3 install scipy')
----> 2 from matminer.data_retrieval.retrieve_MP import MPDataRetrieval
3
4 def data_query(mp_api_key, max_elms=3, min_elms=3, max_sites=20, include_te=False):
5 """

4 frames

/usr/local/lib/python3.7/dist-packages/scipy/cluster/vq.py in ()
68 import numpy as np
69 from collections import deque
---> 70 from scipy._lib._util import _asarray_validated, check_random_state,
71 rng_integers
72 from scipy.spatial.distance import cdist

ImportError: cannot import name 'rng_integers' from 'scipy._lib._util' (/usr/local/lib/python3.7/dist-packages/scipy/_lib/_util.py)

Predict at atom level

Hello,

Thank you so much for this project, very interesting!

I am trying to use this package to do a regression prediction to the atom level instead of predicting a property for the whole crystal. I am trying to input the atom labels but struggling to change the data.py script accordingly... what would be the easiest way to input the atom labels?

And I guess I need to output the atom prediction before the pooling? Does this correspond to return out of def forward(): in model.py?

Thanks!
Marta

Questions about recreating a data set

Hi Tian,
I have the following questions:
(1) Why are the datasets used to predict band gaps and formation energies different from each other, and by so much?
(2) If I want to build a new dataset, what criteria should I use to select materials from MP?
(3) How do I remove the ill-converged crystals mentioned in your article, and by what criteria?
I hope you can help me answer these. Thank you!
