gnn-comparison's Issues

Is there a mistake in Data Preprocessing?

At line 42 of the file https://github.com/diningphil/gnn-comparison/blob/master/datasets/tu_utils.py:

with open(edges_path, "r") as f:
    for i, line in enumerate(f.readlines(), 1):
        line = line.rstrip("\n")
        edge = [int(e) for e in line.split(',')]
        edge_indicator.append(edge)
        graph_id = indicator[edge[0]]
        graph_edges[graph_id].append(edge)

Shouldn't graph_id = indicator[edge[0]] be graph_id = indicator[i] in order to get the right graph_id?

Here edge[0] is the index of the starting node of the edge.
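
For reference, here is a toy sketch (my own illustration, not data or code from the repository) of the two indexing conventions, assuming indicator maps a node id to its graph id while i enumerates the lines of the edge file:

# Hypothetical TU-style toy example with two graphs (1-based node ids).
indicator = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}   # node id -> graph id
edges = [(1, 2), (2, 3), (4, 5)]             # i-th line of DS_A.txt -> (source, target)

for i, edge in enumerate(edges, 1):
    by_source_node = indicator[edge[0]]  # graph of the edge's source node (the current code)
    by_line_number = indicator[i]        # indexing by the edge's line number instead
    print(i, edge, by_source_node, by_line_number)  # the two differ for edge (4, 5)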

OUTER_TS is 20.0 for REDDIT-MULTI-5K (GraphSAGE)

I ran the experiments; however, the result is 20.0 for GraphSAGE on REDDIT-MULTI-5K, which is chance level for a 5-class dataset:

{"best_config": {"config": {"model": "GraphSAGE", "device": "cuda", "batch_size": 32, "learning_rate": 0.001, "l2": 0.0, "classifier_epochs": 1000, "optimizer": "Adam", "scheduler": null, "loss": "MulticlassClassificationLoss", "gradient_clipping": null, "early_stopper": {"class": "Patience", "args": {"patience": 500, "use_loss": true}}, "shuffle": true, "dim_embedding": 64, "num_layers": 5, "aggregation": "mean", "dataset": "REDDIT-MULTI-5K"}, "TR_score": 19.115831066573453, "VL_score": 20.0}, "OUTER_TR": 19.23931834911077, "OUTER_TS": 20.0}

GraphSAGE doesn't learn.

Hi, I followed all the instructions in the README, but when I train GraphSAGE on Social-1/IMDB-BINARY and Social-DEGREE/IMDB-BINARY, the accuracies (including training accuracy) don't go above 0.5, which suggests the predictions are random. The same does not happen for GIN, where the accuracies increase.

PyTorch 2.0.1
torch-geometric 2.3.1
Python 3.10.10

PyTorch -- multiple threads for each process

Hi,

Many thanks for this framework.

I would like to ask why you recommend editing .bashrc so that PyTorch does not spawn multiple threads. Without doing so, I don't seem to run into any issues. Could you please explain?

Thanks in advance for the response.
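
For context, the thread cap the README asks for via .bashrc can also be expressed in Python; a minimal sketch (my own illustration), assuming the goal is to keep each worker process single-threaded so that several parallel experiments do not oversubscribe the CPU:

import os
import torch

# These environment variables must be set before the numerical libraries create
# their thread pools, which is why exporting them from .bashrc is convenient.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
torch.set_num_threads(1)  # cap PyTorch's intra-op CPU parallelism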

train() and _train() call each other

In Launch_Experiments.py, the functions train() and _train() call each other. Doesn't this result in an endless loop?

Best configs / splits

Hi,
Thank you for sharing this great repository and thorough experimental results.

I wish to play with the existing results and possible improvements to the GNN architectures.
Currently, to reproduce results even for a specific architecture and a specific dataset, I need to run (10 folds) * (72 configurations + 3 re-trainings).

Could you share the best-found configuration for each (model) x (task) x (fold)? Or the best hyperparameters that were common across all outer folds, so that my grid search would be smaller?

Thanks!

Results don't match - am I running something in the wrong way?

Hey!
I'm trying to reproduce the ENZYMES results with GIN.
Running on CPU.
Following the instructions, I'm running in the corresponding virtual environment; I prepared the dataset, copied the data splits into the dataset's folder, and ran:
python Launch_Experiments.py --config-file config_GIN.yml --dataset-name ENZYMES --result-folder results --debug

I get results for 10 folds and overall (assessment_results.json):
avg_TR_score 91.61179697196341
std_TR_score 7.0978242989793205
avg_TS_score 69.22222256130644
std_TS_score 4.850862399140947

which is higher than what is documented in the paper (~59.6 on the test).

Can you help me understand whether I'm doing something wrong, please?

Thanks!

Multi-process running errors

Hi! There are some errors when I try to run the code in multi-process mode (i.e., without --debug).

The errors are shown below:

[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_1/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_1/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_2/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_2/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_3/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_3/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_4/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_4/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_5/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_5/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_6/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_6/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_7/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_7/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_8/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_8/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_9/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_9/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_10/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_10/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_1/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_2/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_3/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_4/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_5/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_6/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_7/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_8/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_9/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_10/outer_results.json'
/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/gnn-comparison/evaluation/risk_assessment/K_Fold_Assessment.py:53: RuntimeWarning: Mean of empty slice.
assessment_results['avg_TR_score'] = outer_TR_scores.mean()
/mnt/home/wangy206/anaconda3/envs/gnn-comparison/lib/python3.7/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/mnt/home/wangy206/anaconda3/envs/gnn-comparison/lib/python3.7/site-packages/numpy/core/_methods.py:217: RuntimeWarning: Degrees of freedom <= 0 for slice
keepdims=keepdims)
/mnt/home/wangy206/anaconda3/envs/gnn-comparison/lib/python3.7/site-packages/numpy/core/_methods.py:186: RuntimeWarning: invalid value encountered in true_divide
arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
/mnt/home/wangy206/anaconda3/envs/gnn-comparison/lib/python3.7/site-packages/numpy/core/_methods.py:209: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/gnn-comparison/evaluation/risk_assessment/K_Fold_Assessment.py:55: RuntimeWarning: Mean of empty slice.
assessment_results['avg_TS_score'] = outer_TS_scores.mean()

It seems that the errors come from HoldOutSelector.process_results(): either the child processes are killed before they finish writing "config_results.json", or "best_config = self.process_results(HOLDOUT_MS_FOLDER, config_id)" begins to run before the processes in the pool have finished.

DiffPool breaks on ENZYMES?

Hi @diningphil ,
Thanks again for sharing this project.

I'm getting an error while training DiffPool on ENZYMES.
It does not seem to happen on NCI1, and other GNN types do work on ENZYMES, so nothing seems wrong with DiffPool or ENZYMES on their own; they just don't work together.

I haven't changed anything in the code; the Python version is 3.6.7, PyTorch 1.4.0, torch-geometric 1.4.2, running on CPU.

I'm running

python -u Launch_Experiments.py --config-file config_DiffPool.yml --dataset-name ENZYMES --result-folder mydir --inner-processes 1 --outer-processes 10 --outer-folds 10 --debug

And getting the following error:

File "/scratch/urialon/gnn-comparison/models/graph_classifiers/DiffPool.py", line 136, in forward
    x, adj, l, e = self.diffpool_layers[i](x, adj, mask)  # x has shape (batch, MAX_no_nodes, feature_size)
  File "/scratch/urialon/gnn-comparison/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/urialon/gnn-comparison/models/graph_classifiers/DiffPool.py", line 75, in forward
    s = self.gnn_pool(x, adj, mask)
  File "/scratch/urialon/gnn-comparison/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/urialon/gnn-comparison/models/graph_classifiers/DiffPool.py", line 46, in forward
    x1 = self.bn(1, F.relu(self.conv1(x0, adj, mask, add_loop=False)))
  File "/scratch/urialon/gnn-comparison/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/urialon/gnn-comparison/venv/lib/python3.6/site-packages/torch_geometric/nn/dense/dense_sage_conv.py", line 58, in forward
    out = torch.matmul(out, self.weight)
RuntimeError: size mismatch, m1: [384 x 3], m2: [21 x 64] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:136

Any idea?
Thanks!

The program crashes when the --debug flag is not set while using CUDA.

Is the "--debug" flag a must when I use Cuda? I try to launch the experiments using both multi-thread and Cuda. But, why the program crashes when I leave " --debug" being empty, and it runs as expected when I set this flag? Does it imply that I can't use multi-thread when I use Cuda?

It takes a long time to train DiffPool

Hi Federico,

I was wondering whether it is normal for DiffPool to train this slowly in your framework. It takes me almost 4 hours to run 200 epochs of DiffPool on the DD dataset with --debug on CUDA (a K80).

By the way, the training speed of the other models in your framework seems fair and acceptable on my machine.

Error when preprocessing the data

Thank you for sharing the code. I found an error when I tried to preprocess the data.

Traceback (most recent call last):
  File "PrepareDatasets.py", line 73, in <module>
    preprocess_dataset(dataset_name, args_dict)
  File "PrepareDatasets.py", line 60, in preprocess_dataset
    dataset_class(**args_dict)
  File "/home/xxx/code/gnn-comparison/datasets/manager.py", line 75, in __init__
    self.processed_dir / f"{self.name}.pt"))
  File "/home/xxx/miniconda3/envs/torch1.4.0/lib/python3.7/site-packages/torch/serialization.py", line 527, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/home/xxx/miniconda3/envs/torch1.4.0/lib/python3.7/site-packages/torch/serialization.py", line 224, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
AttributeError: 'PosixPath' object has no attribute 'tell'

I installed torch 1.4.0 etc. following the instructions. Can you please take a look at this issue? Thanks.
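
A possible workaround, assuming the problem is that torch 1.4's torch.load does not accept pathlib.Path objects directly (my reading of the traceback, not a confirmed fix): pass the path as a string, or hand torch.load an already opened file.

import torch
from pathlib import Path

path = Path("DATA/ENZYMES/processed") / "ENZYMES.pt"  # hypothetical path for illustration
data = torch.load(str(path))  # convert the PosixPath to str before loading
# alternatively:
# with open(path, "rb") as f:
#     data = torch.load(f)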

The position of "config_id += 1" at line 84 of evaluation/model_selection/HoldOutSelector.py

Would it be better to put "config_id += 1" at line 72 instead of line 84?
If the process is killed after only some of the fold results have been written and we rerun it without overwriting the previous results, the "continue" at line 82 makes the process skip "config_id += 1", resulting in the error below:

File cuda_results/diffpool/ENZYMES/baseline/DiffPoolssl_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_1/outer_results.json already present! Shutting down to prevent loss of previous experiments
Config cuda_results/diffpool/ENZYMES/baseline/DiffPoolssl_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_2/HOLDOUT_MS/config_1/config_results.json already present! Shutting down to prevent loss of previous experiments
Traceback (most recent call last):
  File "Launch_Experiments.py", line 41, in <module>
    raise e  # print(e)
  File "Launch_Experiments.py", line 38, in <module>
    result_folder=args.result_folder,label_ratio=args.label_ratio,ssl_option=args.ssl_option, debug=args.debug)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/EndToEnd_Evaluation.py", line 30, in main
    risk_assesser.risk_assessment(experiment_class, debug=debug)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/evaluation/risk_assessment/K_Fold_Assessment.py", line 92, in risk_assessment
    self._risk_assessment_helper(outer_k, experiment_class, kfold_folder, debug, other)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/evaluation/risk_assessment/K_Fold_Assessment.py", line 119, in _risk_assessment_helper
    self.model_configs, debug, other)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/evaluation/model_selection/HoldOutSelector.py", line 102, in model_selection
    best_config = self.process_results(HOLDOUT_MS_FOLDER, config_id)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/evaluation/model_selection/HoldOutSelector.py", line 46, in process_results
    print('Model selection winner for experiment', HOLDOUT_MS_FOLDER, 'is config ', best_i, ':')
UnboundLocalError: local variable 'best_i' referenced before assignment
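
To illustrate the suggestion, here is a hypothetical sketch of the loop structure being discussed (not the repository's exact code): if the counter is incremented only at the bottom of the loop, the continue taken for already-completed configurations skips the increment, so a later configuration is compared against a stale config_id and the "already present" guard fires.

from pathlib import Path

def launch_experiment(config, config_id):
    print("running config", config_id, config)  # placeholder for the real training run

configs = [{"lr": 0.01}, {"lr": 0.001}]  # hypothetical hyperparameter grid
exp_folder = Path("HOLDOUT_MS")

config_id = 0
for config in configs:
    config_id += 1  # incrementing here (top of the loop) survives the `continue` below
    results_path = exp_folder / f"config_{config_id}" / "config_results.json"
    if results_path.exists():
        continue  # results from a previous run: skip them instead of overwriting
    launch_experiment(config, config_id)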

Using DGCNN with new data

Hi! Thanks for the awesome code. I have a question about DGCNN when using my own data: what is the k value in the configuration? I notice you have 0.6 and 0.9 and then a specific number for a specific dataset; what do these correspond to?
Thank you so much
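
In case it helps frame the question: in the original DGCNN paper, k is the SortPooling cut-off (how many nodes are kept per graph after sorting), and fractional values such as 0.6 or 0.9 are usually read as "choose k so that roughly this fraction of graphs has at least k nodes". A minimal sketch of that convention, under the assumption (not confirmed here) that the configuration values follow it:

import numpy as np

def sortpool_k(graph_sizes, fraction=0.6):
    # k is the (1 - fraction) quantile of the node-count distribution, so that
    # roughly `fraction` of the graphs have at least k nodes.
    return int(np.quantile(np.asarray(graph_sizes), 1.0 - fraction))

print(sortpool_k([10, 20, 25, 30, 50], fraction=0.9))  # hypothetical graph sizes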
