diningphil / gnn-comparison
Official Repository of "A Fair Comparison of Graph Neural Networks for Graph Classification", ICLR 2020
License: GNU General Public License v3.0
Hi! Thanks for the awesome code. I have a question about DGCNN when using my own data. What is the k value in the configuration? I notice you have 0.6 and 0.9, and then a specific number for each dataset. What do these values correspond to?
Thank you so much
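One plausible reading of the question above, offered as an assumption rather than a confirmed answer from the maintainers: the fractional values are percentiles of the sorted graph sizes, and the dataset-specific integer is the node count found at that percentile, which is then used as k in SortPooling. The sketch below shows how such a k could be derived for a custom dataset. The helper name `sortpool_k` and the sample sizes are hypothetical.

```python
import math


def sortpool_k(graph_sizes, fraction, minimum=1):
    """Return the node count at the given percentile of sorted graph sizes.

    Hypothetical helper: assumes `fraction` (e.g. 0.6 or 0.9) is a percentile
    over the dataset's graph sizes, as described in the lead-in.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    sizes = sorted(graph_sizes)
    index = int(math.ceil(fraction * len(sizes))) - 1
    return max(minimum, sizes[index])


# Invented graph sizes (number of nodes per graph) for illustration only.
sizes = [10, 12, 15, 20, 22, 25, 30, 35, 40, 60]
k60 = sortpool_k(sizes, 0.6)
k90 = sortpool_k(sizes, 0.9)
print(k60, k90)
```

With a larger fraction, more graphs fit entirely within k nodes and fewer are truncated, at the cost of more zero padding for small graphs.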
Hi @diningphil ,
Thanks again for sharing this project.
I'm getting an error while training DiffPool on ENZYMES.
It seems that it does not happen on NCI1, and that other GNN types do work on ENZYMES.
So there seems to be nothing wrong with either DiffPool or ENZYMES on their own, but they don't work together.
I haven't changed anything in the code, python version is 3.6.7, pytorch version 1.4.0, torch-geometric version 1.4.2, running on CPU.
I'm running
python -u Launch_Experiments.py --config-file config_DiffPool.yml --dataset-name ENZYMES --result-folder mydir --inner-processes 1 --outer-processes 10 --outer-folds 10 --debug
And getting the following error:
  File "/scratch/urialon/gnn-comparison/models/graph_classifiers/DiffPool.py", line 136, in forward
    x, adj, l, e = self.diffpool_layers[i](x, adj, mask) # x has shape (batch, MAX_no_nodes, feature_size)
  File "/scratch/urialon/gnn-comparison/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/urialon/gnn-comparison/models/graph_classifiers/DiffPool.py", line 75, in forward
    s = self.gnn_pool(x, adj, mask)
  File "/scratch/urialon/gnn-comparison/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/urialon/gnn-comparison/models/graph_classifiers/DiffPool.py", line 46, in forward
    x1 = self.bn(1, F.relu(self.conv1(x0, adj, mask, add_loop=False)))
  File "/scratch/urialon/gnn-comparison/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/urialon/gnn-comparison/venv/lib/python3.6/site-packages/torch_geometric/nn/dense/dense_sage_conv.py", line 58, in forward
    out = torch.matmul(out, self.weight)
RuntimeError: size mismatch, m1: [384 x 3], m2: [21 x 64] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:136
Any idea?
Thanks!
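The error message above says that the layer's weight expects 21 input features while the batch carries only 3. ENZYMES has 3 node-label dimensions and 18 continuous node attributes, 21 in total, so one possible explanation (an assumption, not confirmed in this thread) is that the model was sized for labels plus attributes while the dense batch contained only the labels. A minimal sketch of a shape check that makes such a mismatch explicit before the matrix product follows. The helper name is hypothetical.

```python
def check_matmul_shapes(left, right):
    """Return the result shape of left @ right for 2-D shapes, or raise.

    Hypothetical helper for debugging: `left` is (rows, features) of the
    input, `right` is (in_features, out_features) of the layer weight.
    """
    rows, inner_left = left
    inner_right, cols = right
    if inner_left != inner_right:
        raise ValueError(
            f"input has {inner_left} features but the weight expects {inner_right}"
        )
    return (rows, cols)


# Shapes taken from the error message: this combination fails.
try:
    check_matmul_shapes((384, 3), (21, 64))
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)

# With all 21 features present the product is well defined.
ok_shape = check_matmul_shapes((384, 21), (21, 64))
print(mismatch_message, ok_shape)
```

Comparing the dataset's reported feature dimension with the last dimension of an actual batch is usually the quickest way to locate where the two diverge.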
Hi,
Many thanks for this framework.
I would like to ask why you recommend editing the .bashrc in order to not allow PyTorch to spawn multiple threads. Without having done so, it doesn't seem like I encounter any issues. Could you please explain?
Thanks in advance for the response.
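A likely motivation for the recommendation asked about above, stated here as an assumption: when several experiments run as parallel worker processes, each process letting its numerical backend start one thread per core can oversubscribe the CPU and slow everything down. The sketch below applies the same limit from inside Python instead of .bashrc. `OMP_NUM_THREADS` and `MKL_NUM_THREADS` are standard OpenMP and MKL variables; whether they match the exact lines the README suggests is not confirmed here.

```python
import os


def limit_native_threads(num_threads=1):
    """Cap OpenMP/MKL thread pools for this process and its children.

    Must run before importing torch or numpy, because the thread pools
    read these variables when the libraries are first loaded.
    """
    variables = ("OMP_NUM_THREADS", "MKL_NUM_THREADS")
    for var in variables:
        os.environ[var] = str(num_threads)
    return {var: os.environ[var] for var in variables}


limits = limit_native_threads(1)
print(limits)
# import torch  # import numerical libraries only after the limits are set
```

Running a single experiment at a time, as with --debug, would not show the problem, which may be why no issue was observed without the setting.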
I ran the experiments; however, the result is 20.0 for GraphSAGE on REDDIT-MULTI-5K (RED-M5K):
{"best_config": {"config": {"model": "GraphSAGE", "device": "cuda", "batch_size": 32, "learning_rate": 0.001, "l2": 0.0, "classifier_epochs": 1000, "optimizer": "Adam", "scheduler": null, "loss": "MulticlassClassificationLoss", "gradient_clipping": null, "early_stopper": {"class": "Patience", "args": {"patience": 500, "use_loss": true}}, "shuffle": true, "dim_embedding": 64, "num_layers": 5, "aggregation": "mean", "dataset": "REDDIT-MULTI-5K"}, "TR_score": 19.115831066573453, "VL_score": 20.0}, "OUTER_TR": 19.23931834911077, "OUTER_TS": 20.0}
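REDDIT-MULTI-5K has five classes, so a validation and test score of exactly 20.0 matches what a model predicting a single class would obtain on balanced data. The sketch below computes that majority-class baseline so it can be compared with the reported score. The labels are invented and assumed to be balanced.

```python
from collections import Counter


def majority_class_accuracy(labels):
    """Accuracy (in percent) of always predicting the most frequent class."""
    counts = Counter(labels)
    return 100.0 * max(counts.values()) / len(labels)


# Invented, perfectly balanced labels for a five-class problem.
labels = [cls for cls in range(5) for _ in range(100)]
baseline = majority_class_accuracy(labels)
print(baseline)
```

A score equal to this baseline on both training and test data suggests the model is not learning at all for this configuration, rather than overfitting.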
gnn-comparison/datasets/data.py
Line 18 in 5722c3c
What is the purpose of these additional features? Where are they used in this project? Thanks.
Thanks, very good work.
I found the datasets for RED-x; could you please share a link for COLLAB?
Regards
In Launch_Experiments.py, train() and _train() appear to call each other. Doesn't this result in an endless loop?
Would it be better to put "config_id += 1" at line 72 instead of line 84?
If the process is killed after only some of the folds have finished, and we rerun it without overwriting the previous results, the "continue" at line 82 makes the loop skip "config_id += 1", resulting in the error below:
File cuda_results/diffpool/ENZYMES/baseline/DiffPoolssl_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_1/outer_results.json already present! Shutting down to prevent loss of previous experiments
Config cuda_results/diffpool/ENZYMES/baseline/DiffPoolssl_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_2/HOLDOUT_MS/config_1/config_results.json already present! Shutting down to prevent loss of previous experiments
Traceback (most recent call last):
  File "Launch_Experiments.py", line 41, in
    raise e # print(e)
  File "Launch_Experiments.py", line 38, in
    result_folder=args.result_folder,label_ratio=args.label_ratio,ssl_option=args.ssl_option, debug=args.debug)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/EndToEnd_Evaluation.py", line 30, in main
    risk_assesser.risk_assessment(experiment_class, debug=debug)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/evaluation/risk_assessment/K_Fold_Assessment.py", line 92, in risk_assessment
    self._risk_assessment_helper(outer_k, experiment_class, kfold_folder, debug, other)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/evaluation/risk_assessment/K_Fold_Assessment.py", line 119, in _risk_assessment_helper
    self.model_configs, debug, other)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/evaluation/model_selection/HoldOutSelector.py", line 102, in model_selection
    best_config = self.process_results(HOLDOUT_MS_FOLDER, config_id)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/evaluation/model_selection/HoldOutSelector.py", line 46, in process_results
    print('Model selection winner for experiment', HOLDOUT_MS_FOLDER, 'is config ', best_i, ':')
UnboundLocalError: local variable 'best_i' referenced before assignment
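The pattern described in this report can be sketched in isolation. In the first function a manual counter is incremented at the end of the loop body, so an early "continue" leaves it unchanged and every later iteration checks the same config name again. The second derives the id from the loop itself. The function names and the `finished` set are hypothetical and only mimic the behaviour described above; this is not the repository's actual code.

```python
def plan_runs_manual_counter(num_configs, finished):
    """Buggy pattern: `continue` skips the increment at the end of the loop."""
    config_id = 0
    to_run = []
    for _ in range(num_configs):
        name = f"config_{config_id + 1}"
        if name in finished:
            continue  # config_id is never advanced on this path
        to_run.append(name)
        config_id += 1
    return to_run, config_id


def plan_runs_enumerated(num_configs, finished):
    """Safer pattern: the id comes from the loop, so skipping cannot desync it."""
    to_run = []
    for config_id in range(1, num_configs + 1):
        name = f"config_{config_id}"
        if name not in finished:
            to_run.append(name)
    return to_run, num_configs


finished = {"config_1"}  # results left over from an interrupted run
buggy = plan_runs_manual_counter(3, finished)
fixed = plan_runs_enumerated(3, finished)
print(buggy, fixed)
```

In the buggy variant the final counter stays at 0, which would be consistent with a later loop over the results never executing and "best_i" never being assigned.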
Thank you for sharing the code. I found an error when I tried to preprocess the data.
Traceback (most recent call last):
  File "PrepareDatasets.py", line 73, in
    preprocess_dataset(dataset_name, args_dict)
  File "PrepareDatasets.py", line 60, in preprocess_dataset
    dataset_class(**args_dict)
  File "/home/xxx/code/gnn-comparison/datasets/manager.py", line 75, in __init__
    self.processed_dir / f"{self.name}.pt"))
  File "/home/xxx/miniconda3/envs/torch1.4.0/lib/python3.7/site-packages/torch/serialization.py", line 527, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/home/xxx/miniconda3/envs/torch1.4.0/lib/python3.7/site-packages/torch/serialization.py", line 224, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
AttributeError: 'PosixPath' object has no attribute 'tell'
I installed torch 1.4.0 etc. following the instructions. Can you please take a look at this issue? Thanks.
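The traceback above suggests that the loader received a pathlib path object and treated it as an open file. A common workaround in such cases is to convert the path to a plain string before loading. The sketch below shows the idea with a stand-in loader so that it runs without PyTorch. `fake_loader` and `load_with_str_path` are hypothetical names, and whether the string conversion is the fix adopted in the repository is an assumption.

```python
from pathlib import Path


def load_with_str_path(loader, path):
    """Call `loader` with a plain string path instead of a Path object."""
    return loader(str(path))


def fake_loader(name):
    # Stand-in for a loader that only accepts str paths or file objects.
    if not isinstance(name, str):
        raise AttributeError("path object has no attribute 'tell'")
    return f"loaded:{name}"


processed_file = Path("processed") / "ENZYMES.pt"
result = load_with_str_path(fake_loader, processed_file)
print(result)
```

In real code the first argument would be the actual loading function, for example `torch.load`.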
Hi team :)
Thanks for this great work!
I was wondering why the configuration files don't list cuda as a possible device option, even though the paper mentions it as a possible device.
For example, here are two images, one from the GraphSAGE config file and one from the paper: [images not preserved]
Thank you in advance!
@Zmeos @diningphil @marcopodda @dbacciu
Hi Federico,
Is it normal that DiffPool trains very slowly in your framework? It takes me almost 4 hours to run 200 epochs of DiffPool on the DD dataset with "--debug" on CUDA (K80).
BTW, the training speed for other models in your framework seems fair and acceptable when they run on my machine.
Is the "--debug" flag required when I use CUDA? I am trying to launch the experiments with both multiple processes and CUDA. Why does the program crash when I omit "--debug", while it runs as expected when I set this flag? Does this imply that I can't use parallel execution together with CUDA?
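The crash itself is not shown in this report, so the following is only one common cause and not a diagnosis: CUDA cannot be re-initialized inside a worker that was created with the "fork" start method, and the usual remedy is to create workers with "spawn" instead. A minimal sketch of obtaining such a context with the standard library follows. How it would be wired into the launcher is an assumption.

```python
import multiprocessing


def make_spawn_context():
    """Return a multiprocessing context whose workers start via 'spawn'."""
    return multiprocessing.get_context("spawn")


ctx = make_spawn_context()
start_method = ctx.get_start_method()
print(start_method)

# On Python 3.7+, the context can be passed to a process pool, for example:
#   concurrent.futures.ProcessPoolExecutor(max_workers=4, mp_context=ctx)
```

Spawned workers start from a fresh interpreter, so anything they need must be importable and picklable.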
Thank you for the very useful repo. I noticed that you use dropout at the output layer in https://github.com/diningphil/gnn-comparison/blob/master/models/graph_classifiers/GIN.py. Is this for ensuring that all GIN layers contribute to the output?
Hey!
I'm trying to reproduce the ENZYMES results with GIN.
Running on CPU.
Following the instructions, I'm running on the corresponding virtual environment, prepared the dataset, copied the data_splits into the dataset's folder and ran:
python Launch_Experiments.py --config-file config_GIN.yml --dataset-name ENZYMES --result-folder results --debug
I get results for 10 folds and overall (assessment_results.json):
avg_TR_score 91.61179697196341
std_TR_score 7.0978242989793205
avg_TS_score 69.22222256130644
std_TS_score 4.850862399140947
which is higher than what is documented in the paper (~59.6 on the test).
Could you help me understand whether I'm doing something wrong?
Thx!
At line 42 of the file https://github.com/diningphil/gnn-comparison/blob/master/datasets/tu_utils.py:
    with open(edges_path, "r") as f:
        for i, line in enumerate(f.readlines(), 1):
            line = line.rstrip("\n")
            edge = [int(e) for e in line.split(',')]
            edge_indicator.append(edge)
            graph_id = indicator[edge[0]]
            graph_edges[graph_id].append(edge)
Shouldn't graph_id = indicator[edge[0]] be graph_id = indicator[i] in order to get the right graph_id?
As written, edge[0] is the index of the starting node of an edge.
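In the TU dataset format, the graph indicator file maps each node id to the graph it belongs to, and each line of the adjacency file holds one "source, target" pair. Under that reading, the lookup has to be keyed by a node id such as edge[0], whereas i counts edge lines. The sketch below groups a few invented edges with the node-based lookup. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def group_edges_by_graph(edge_lines, indicator):
    """Assign each edge to the graph that contains its source node."""
    graph_edges = defaultdict(list)
    for line in edge_lines:
        edge = [int(e) for e in line.strip().split(",")]
        graph_id = indicator[edge[0]]  # node id -> graph id
        graph_edges[graph_id].append(edge)
    return dict(graph_edges)


# Invented data: nodes 1-3 belong to graph 1, nodes 4-5 to graph 2.
indicator = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}
edge_lines = ["1, 2", "2, 1", "2, 3", "3, 2", "4, 5", "5, 4"]
grouped = group_edges_by_graph(edge_lines, indicator)
print(grouped)
```

In this example there are six edge lines but only five nodes, so indexing the indicator by the line number would assign the fourth edge to the wrong graph and fail on the sixth.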
Hi, I followed all the instructions on GitHub, but when I train GraphSAGE on Social-1/IMDB-BINARY and Social-DEGREE/IMDB-BINARY, the accuracy (including the training accuracy) never goes above 0.5, which suggests that the predictions are random. The same thing does not happen for GIN, where the accuracy increases.
Pytorch 2.0.1
torch-geometric 2.3.1
python 3.10.10
Hi,
Thank you for sharing this great repository and thorough experimental results.
I wish to play with the existing results and possible improvements to the GNN architectures.
Currently, to reproduce results even for a specific architecture and a specific dataset, I need to run (10 folds) * (72 configurations + 3 re-trainings).
Can you somehow share the best-found configurations for each (model) x (task) x (fold)? Or best-hyperparams that were common across all outer-folds, such that my grid search will be smaller?
Thanks!
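The cost mentioned in this request can be made explicit with a small calculation. The sketch simply multiplies out the figures quoted above (10 outer folds, 72 configurations, 3 final re-trainings). The function name is hypothetical.

```python
def total_training_runs(outer_folds, configurations, final_retrainings):
    """Number of training runs for one model on one dataset."""
    return outer_folds * (configurations + final_retrainings)


full_grid = total_training_runs(10, 72, 3)
# If a single best configuration per fold were known, only the re-trainings
# plus that one configuration would remain.
reduced = total_training_runs(10, 1, 3)
print(full_grid, reduced)
```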
Hi! I get some errors when I try to run the code in multi-process mode (i.e. without --debug).
The errors are shown below:
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_1/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_1/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_2/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_2/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_3/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_3/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_4/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_4/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_5/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_5/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_6/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_6/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_7/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_7/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_8/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_8/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_9/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_9/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_10/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_10/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_1/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_2/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_3/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_4/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_5/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_6/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_7/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_8/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_9/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_10/outer_results.json'
/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/gnn-comparison/evaluation/risk_assessment/K_Fold_Assessment.py:53: RuntimeWarning: Mean of empty slice.
assessment_results['avg_TR_score'] = outer_TR_scores.mean()
/mnt/home/wangy206/anaconda3/envs/gnn-comparison/lib/python3.7/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/mnt/home/wangy206/anaconda3/envs/gnn-comparison/lib/python3.7/site-packages/numpy/core/_methods.py:217: RuntimeWarning: Degrees of freedom <= 0 for slice
keepdims=keepdims)
/mnt/home/wangy206/anaconda3/envs/gnn-comparison/lib/python3.7/site-packages/numpy/core/_methods.py:186: RuntimeWarning: invalid value encountered in true_divide
arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
/mnt/home/wangy206/anaconda3/envs/gnn-comparison/lib/python3.7/site-packages/numpy/core/_methods.py:209: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/gnn-comparison/evaluation/risk_assessment/K_Fold_Assessment.py:55: RuntimeWarning: Mean of empty slice.
assessment_results['avg_TS_score'] = outer_TS_scores.mean()
It seems that the errors come from HoldOutSelector.process_results(): either the child processes are killed before they finish writing "config_results.json", or
"best_config = self.process_results(HOLDOUT_MS_FOLDER, config_id)" starts running before the processes in the pool have finished.
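A third possibility, stated as an assumption about how the jobs are launched: with a concurrent.futures-style pool, an exception raised inside a submitted job is stored on its Future and only re-raised when result() is called. If the launcher never calls result(), a failing job produces no visible error and its result file is simply missing. The sketch below waits for every job and collects failures explicitly. It uses ThreadPoolExecutor to stay self-contained, and the same Future API applies to ProcessPoolExecutor. `run_config` is a hypothetical stand-in for one training job.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_config(config_id):
    """Hypothetical job: config 2 fails, the others return a score."""
    if config_id == 2:
        raise RuntimeError(f"config_{config_id} failed")
    return {"config_id": config_id, "VL_score": 50.0 + config_id}


def run_all(config_ids, max_workers=2):
    """Run every job, wait for completion, and separate results from errors."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_config, cid): cid for cid in config_ids}
        for future in as_completed(futures):
            cid = futures[future]
            try:
                results[cid] = future.result()  # re-raises the job's exception
            except Exception as exc:
                errors[cid] = str(exc)
    return results, errors


results, errors = run_all([1, 2, 3])
print(sorted(results), errors)
```

Printing the collected errors before aggregating the scores makes it clear whether files are missing because a job crashed or because it never ran.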