gnn-comparison's Issues

Is there a mistake in Data Preprocessing?

At line 42 of the file https://github.com/diningphil/gnn-comparison/blob/master/datasets/tu_utils.py:

with open(edges_path, "r") as f:
    for i, line in enumerate(f.readlines(), 1):
        line = line.rstrip("\n")
        edge = [int(e) for e in line.split(',')]
        edge_indicator.append(edge)
        graph_id = indicator[edge[0]]
        graph_edges[graph_id].append(edge)

Shouldn't graph_id = indicator[edge[0]] be graph_id = indicator[i] in order to get the right graph_id?

Here edge[0] is the index of the starting node of the edge.
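
For reference, here is a toy sketch (my own illustration, not data or code from the repository) of the two indexing conventions, assuming indicator maps a node id to its graph id while i enumerates the lines of the edge file:

# Hypothetical TU-style toy example with two graphs (1-based node ids).
indicator = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}   # node id -> graph id
edges = [(1, 2), (2, 3), (4, 5)]             # i-th line of DS_A.txt -> (source, target)

for i, edge in enumerate(edges, 1):
    by_source_node = indicator[edge[0]]  # graph of the edge's source node (the current code)
    by_line_number = indicator[i]        # indexing by the edge's line number instead
    print(i, edge, by_source_node, by_line_number)  # the two differ for edge (4, 5)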

OUTER_TS is 20.0 for REDDIT-MULTI-5K (GraphSAGE)

I ran the experiments; however, the result is 20.0 for GraphSAGE on REDDIT-MULTI-5K, which is chance level for a 5-class dataset:

{"best_config": {"config": {"model": "GraphSAGE", "device": "cuda", "batch_size": 32, "learning_rate": 0.001, "l2": 0.0, "classifier_epochs": 1000, "optimizer": "Adam", "scheduler": null, "loss": "MulticlassClassificationLoss", "gradient_clipping": null, "early_stopper": {"class": "Patience", "args": {"patience": 500, "use_loss": true}}, "shuffle": true, "dim_embedding": 64, "num_layers": 5, "aggregation": "mean", "dataset": "REDDIT-MULTI-5K"}, "TR_score": 19.115831066573453, "VL_score": 20.0}, "OUTER_TR": 19.23931834911077, "OUTER_TS": 20.0}

GraphSAGE doesn't learn.

Hi, I followed all the instructions in the README, but when I train GraphSAGE on Social-1/IMDB-BINARY and Social-DEGREE/IMDB-BINARY, the accuracies (including training accuracy) don't go above 0.5, which suggests the predictions are random. The same does not happen for GIN, where the accuracies increase.

PyTorch 2.0.1
torch-geometric 2.3.1
Python 3.10.10

PyTorch -- multiple threads for each process

Hi,

Many thanks for this framework.

I would like to ask why you recommend editing .bashrc so that PyTorch does not spawn multiple threads. Without doing so, I don't seem to run into any issues. Could you please explain?

Thanks in advance for the response.
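
For context, the thread cap the README asks for via .bashrc can also be expressed in Python; a minimal sketch (my own illustration), assuming the goal is to keep each worker process single-threaded so that several parallel experiments do not oversubscribe the CPU:

import os
import torch

# These environment variables must be set before the numerical libraries create
# their thread pools, which is why exporting them from .bashrc is convenient.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
torch.set_num_threads(1)  # cap PyTorch's intra-op CPU parallelism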

train() and _train() call each other

In Launch_Experiments.py, the functions train() and _train() call each other. Doesn't this result in an endless loop?

Best configs / splits

Hi,
Thank you for sharing this great repository and thorough experimental results.

I wish to play with the existing results and possible improvements to the GNN architectures.
Currently, to reproduce results even for a specific architecture and a specific dataset, I need to run (10 folds) * (72 configurations + 3 re-trainings).

Could you share the best-found configuration for each (model) x (task) x (fold)? Or the best hyperparameters that were common across all outer folds, so that my grid search would be smaller?

Thanks!

Results don't match - am I running something in the wrong way?

Hey!
I'm trying to reproduce the ENZYMES results with GIN.
Running on CPU.
Following the instructions, I'm running in the corresponding virtual environment; I prepared the dataset, copied the data splits into the dataset's folder, and ran:
python Launch_Experiments.py --config-file config_GIN.yml --dataset-name ENZYMES --result-folder results --debug

I get results for 10 folds and overall (assessment_results.json):
avg_TR_score 91.61179697196341
std_TR_score 7.0978242989793205
avg_TS_score 69.22222256130644
std_TS_score 4.850862399140947

which is higher than what is documented in the paper (~59.6 on the test).

Can you help me understand whether I'm doing something wrong, please?

Thanks!

Multi-process running errors

Hi! There are some errors when I try to run the code in multi-process mode (i.e., without --debug).

The errors are shown below:

[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_1/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_1/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_2/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_2/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_3/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_3/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_4/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_4/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_5/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_5/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_6/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_6/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_7/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_7/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_8/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_8/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_9/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_9/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_10/HOLDOUT_MS/config_1/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_10/HOLDOUT_MS/config_2/config_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_1/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_2/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_3/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_4/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_5/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_6/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_7/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_8/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_9/outer_results.json'
[Errno 2] No such file or directory: 'test-results/GraphSAGE_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_10/outer_results.json'
/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/gnn-comparison/evaluation/risk_assessment/K_Fold_Assessment.py:53: RuntimeWarning: Mean of empty slice.
assessment_results['avg_TR_score'] = outer_TR_scores.mean()
/mnt/home/wangy206/anaconda3/envs/gnn-comparison/lib/python3.7/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/mnt/home/wangy206/anaconda3/envs/gnn-comparison/lib/python3.7/site-packages/numpy/core/_methods.py:217: RuntimeWarning: Degrees of freedom <= 0 for slice
keepdims=keepdims)
/mnt/home/wangy206/anaconda3/envs/gnn-comparison/lib/python3.7/site-packages/numpy/core/_methods.py:186: RuntimeWarning: invalid value encountered in true_divide
arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
/mnt/home/wangy206/anaconda3/envs/gnn-comparison/lib/python3.7/site-packages/numpy/core/_methods.py:209: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/gnn-comparison/evaluation/risk_assessment/K_Fold_Assessment.py:55: RuntimeWarning: Mean of empty slice.
assessment_results['avg_TS_score'] = outer_TS_scores.mean()

It seems that the errors come from HoldOutSelector.process_results(): either the child processes are killed before they finish writing "config_results.json", or "best_config = self.process_results(HOLDOUT_MS_FOLDER, config_id)" begins to run before the processes in the pool have finished.

DiffPool breaks on ENZYMES?

Hi @diningphil ,
Thanks again for sharing this project.

I'm getting an error while training DiffPool on ENZYMES.
It does not seem to happen on NCI1, and other GNN types do work on ENZYMES, so nothing seems wrong with DiffPool or ENZYMES on their own; they just don't work together.

I haven't changed anything in the code; the Python version is 3.6.7, PyTorch 1.4.0, torch-geometric 1.4.2, running on CPU.

I'm running

python -u Launch_Experiments.py --config-file config_DiffPool.yml --dataset-name ENZYMES --result-folder mydir --inner-processes 1 --outer-processes 10 --outer-folds 10 --debug

And getting the following error:

File "/scratch/urialon/gnn-comparison/models/graph_classifiers/DiffPool.py", line 136, in forward
    x, adj, l, e = self.diffpool_layers[i](x, adj, mask)  # x has shape (batch, MAX_no_nodes, feature_size)
  File "/scratch/urialon/gnn-comparison/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/urialon/gnn-comparison/models/graph_classifiers/DiffPool.py", line 75, in forward
    s = self.gnn_pool(x, adj, mask)
  File "/scratch/urialon/gnn-comparison/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/urialon/gnn-comparison/models/graph_classifiers/DiffPool.py", line 46, in forward
    x1 = self.bn(1, F.relu(self.conv1(x0, adj, mask, add_loop=False)))
  File "/scratch/urialon/gnn-comparison/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/urialon/gnn-comparison/venv/lib/python3.6/site-packages/torch_geometric/nn/dense/dense_sage_conv.py", line 58, in forward
    out = torch.matmul(out, self.weight)
RuntimeError: size mismatch, m1: [384 x 3], m2: [21 x 64] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:136

Any idea?
Thanks!

The program crashes when the --debug flag is not set while using CUDA.

Is the "--debug" flag a must when I use Cuda? I try to launch the experiments using both multi-thread and Cuda. But, why the program crashes when I leave " --debug" being empty, and it runs as expected when I set this flag? Does it imply that I can't use multi-thread when I use Cuda?

It takes a long time to train DiffPool

Hi Federico,

I was wondering whether it is normal for DiffPool to train this slowly in your framework. It takes me almost 4 hours to run 200 epochs of DiffPool on the DD dataset with --debug on CUDA (a K80).

By the way, the training speed of the other models in your framework seems fair and acceptable on my machine.

Error when preprocessing the data

Thank you for sharing the code. I found an error when I tried to preprocess the data.

Traceback (most recent call last):
  File "PrepareDatasets.py", line 73, in <module>
    preprocess_dataset(dataset_name, args_dict)
  File "PrepareDatasets.py", line 60, in preprocess_dataset
    dataset_class(**args_dict)
  File "/home/xxx/code/gnn-comparison/datasets/manager.py", line 75, in __init__
    self.processed_dir / f"{self.name}.pt"))
  File "/home/xxx/miniconda3/envs/torch1.4.0/lib/python3.7/site-packages/torch/serialization.py", line 527, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/home/xxx/miniconda3/envs/torch1.4.0/lib/python3.7/site-packages/torch/serialization.py", line 224, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
AttributeError: 'PosixPath' object has no attribute 'tell'

I installed torch 1.4.0 etc. following the instructions. Can you please take a look at this issue? Thanks.
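
A possible workaround, assuming the problem is that torch 1.4's torch.load does not accept pathlib.Path objects directly (my reading of the traceback, not a confirmed fix): pass the path as a string, or hand torch.load an already opened file.

import torch
from pathlib import Path

path = Path("DATA/ENZYMES/processed") / "ENZYMES.pt"  # hypothetical path for illustration
data = torch.load(str(path))  # convert the PosixPath to str before loading
# alternatively:
# with open(path, "rb") as f:
#     data = torch.load(f)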

The position of "config_id += 1" at line 84 of evaluation/model_selection/HoldOutSelector.py

Would it be better to put "config_id += 1" at line 72 instead of line 84?
If the process is killed after only some of the fold results have been written and we rerun it without overwriting the previous results, the "continue" at line 82 makes the process skip "config_id += 1", resulting in the error below:

File cuda_results/diffpool/ENZYMES/baseline/DiffPoolssl_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_1/outer_results.json already present! Shutting down to prevent loss of previous experiments
Config cuda_results/diffpool/ENZYMES/baseline/DiffPoolssl_ENZYMES_assessment/10_NESTED_CV/OUTER_FOLD_2/HOLDOUT_MS/config_1/config_results.json already present! Shutting down to prevent loss of previous experiments
Traceback (most recent call last):
  File "Launch_Experiments.py", line 41, in <module>
    raise e  # print(e)
  File "Launch_Experiments.py", line 38, in <module>
    result_folder=args.result_folder,label_ratio=args.label_ratio,ssl_option=args.ssl_option, debug=args.debug)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/EndToEnd_Evaluation.py", line 30, in main
    risk_assesser.risk_assessment(experiment_class, debug=debug)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/evaluation/risk_assessment/K_Fold_Assessment.py", line 92, in risk_assessment
    self._risk_assessment_helper(outer_k, experiment_class, kfold_folder, debug, other)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/evaluation/risk_assessment/K_Fold_Assessment.py", line 119, in _risk_assessment_helper
    self.model_configs, debug, other)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/evaluation/model_selection/HoldOutSelector.py", line 102, in model_selection
    best_config = self.process_results(HOLDOUT_MS_FOLDER, config_id)
  File "/mnt/ufs18/home-118/wangy206/torch-projects/self-surpervised-GNN/ssl-verification-1/evaluation/model_selection/HoldOutSelector.py", line 46, in process_results
    print('Model selection winner for experiment', HOLDOUT_MS_FOLDER, 'is config ', best_i, ':')
UnboundLocalError: local variable 'best_i' referenced before assignment
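
To illustrate the suggestion, here is a hypothetical sketch of the loop structure being discussed (not the repository's exact code): if the counter is incremented only at the bottom of the loop, the continue taken for already-completed configurations skips the increment, so a later configuration is compared against a stale config_id and the "already present" guard fires.

from pathlib import Path

def launch_experiment(config, config_id):
    print("running config", config_id, config)  # placeholder for the real training run

configs = [{"lr": 0.01}, {"lr": 0.001}]  # hypothetical hyperparameter grid
exp_folder = Path("HOLDOUT_MS")

config_id = 0
for config in configs:
    config_id += 1  # incrementing here (top of the loop) survives the `continue` below
    results_path = exp_folder / f"config_{config_id}" / "config_results.json"
    if results_path.exists():
        continue  # results from a previous run: skip them instead of overwriting
    launch_experiment(config, config_id)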

Using DGCNN with new data

Hi! Thanks for the awesome code. I have a question about DGCNN when using my own data: what is the k value in the configuration? I notice you have 0.6 and 0.9 and then a specific number for a specific dataset; what do these correspond to?
Thank you so much
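
In case it helps frame the question: in the original DGCNN paper, k is the SortPooling cut-off (how many nodes are kept per graph after sorting), and fractional values such as 0.6 or 0.9 are usually read as "choose k so that roughly this fraction of graphs has at least k nodes". A minimal sketch of that convention, under the assumption (not confirmed here) that the configuration values follow it:

import numpy as np

def sortpool_k(graph_sizes, fraction=0.6):
    # k is the (1 - fraction) quantile of the node-count distribution, so that
    # roughly `fraction` of the graphs have at least k nodes.
    return int(np.quantile(np.asarray(graph_sizes), 1.0 - fraction))

print(sortpool_k([10, 20, 25, 30, 50], fraction=0.9))  # hypothetical graph sizes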
