benedekrozemberczki / clustergcn

A PyTorch implementation of "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" (KDD 2019).

License: GNU General Public License v3.0

Python 100.00%
gcn graph-convolution graph-neural-networks graph-convolutional-networks deepwalk node2vec pytorch graphsage graph2vec musae

clustergcn's Introduction

Benedek A. Rozemberczki/ Homepage / Twitter / GitHub / Google Scholar

Welcome stranger

  • ⏰ Currently working on machine learning for drug discovery.
  • 🤖 I would love to collaborate on the machine learning libraries ChemicalX and RexMex.

clustergcn's People

Contributors

benedekrozemberczki

clustergcn's Issues

ppi

```python
import torch
import time
import torch.nn as nn
import torch.nn.functional as F
import os.path as osp
from torch_geometric.datasets import PPI
# ppi_cluster is the poster's own module; its ClusterData takes a split name
# and a graph index in addition to the usual arguments.
from ppi_cluster import ClusterData, ClusterLoader
from torch_geometric.nn import SAGEConv, ChebConv
from sklearn.metrics import f1_score

# __file__ (not `file`) is the path of the running script.
path = osp.join(osp.dirname(osp.realpath(__file__)), '..', 'data', 'PPI')
dataset = PPI(path)
train_dataset = PPI(path, split='train')  # 20 graphs
val_dataset = PPI(path, split='val')      # 2 graphs
test_dataset = PPI(path, split='test')    # 2 graphs

print('Partitioning the graph... (this may take a while)')
train_dataset_list = []
val_dataset_list = []
test_dataset_list = []
dataset_list = []
train_dataset_index = test_dataset_index = val_dataset_index = 0

# Partition every training graph into 2 clusters and wrap it in a loader.
for data in train_dataset:
    cluster_data = ClusterData(data, 'train', train_dataset_index, num_parts=2,
                               recursive=False, save_dir=dataset.processed_dir)
    loader = ClusterLoader(cluster_data, batch_size=20, shuffle=True,
                           num_workers=0)
    train_dataset_list.append(loader)
    dataset_list.append(loader)
    train_dataset_index += 1

# Same for the test graphs.
for data in test_dataset:
    cluster_data = ClusterData(data, 'test', test_dataset_index, num_parts=2,
                               recursive=False, save_dir=dataset.processed_dir)
    loader = ClusterLoader(cluster_data, batch_size=20, shuffle=True,
                           num_workers=0)
    test_dataset_list.append(loader)
    dataset_list.append(loader)
    test_dataset_index += 1

# Same for the validation graphs.
for data in val_dataset:
    cluster_data = ClusterData(data, 'val', val_dataset_index, num_parts=2,
                               recursive=False, save_dir=dataset.processed_dir)
    loader = ClusterLoader(cluster_data, batch_size=20, shuffle=True,
                           num_workers=0)
    val_dataset_list.append(loader)
    dataset_list.append(loader)
    val_dataset_index += 1

print('Done!')
```

Cannot run main.py

src/main.py --epochs 100
+-------------------+----------------------+
| Parameter         | Value                |
+===================+======================+
| Cluster number    | 10                   |
+-------------------+----------------------+
| Clustering method | metis                |
+-------------------+----------------------+
| Dropout           | 0.500                |
+-------------------+----------------------+
| Edge path         | ./input/edges.csv    |
+-------------------+----------------------+
| Epochs            | 100                  |
+-------------------+----------------------+
| Features path     | ./input/features.csv |
+-------------------+----------------------+
| Layers            | [16, 16, 16]         |
+-------------------+----------------------+
| Learning rate     | 0.010                |
+-------------------+----------------------+
| Seed              | 42                   |
+-------------------+----------------------+
| Target path       | ./input/target.csv   |
+-------------------+----------------------+
| Test ratio        | 0.900                |
+-------------------+----------------------+

Metis graph clustering started.

Traceback (most recent call last):
  File "src/main.py", line 24, in <module>
    main()
  File "src/main.py", line 18, in main
    clustering_machine.decompose()
  File "/Users/linmiao/gits/ClusterGCN/src/clustering.py", line 38, in decompose
    self.metis_clustering()
  File "/Users/linmiao/gits/ClusterGCN/src/clustering.py", line 56, in metis_clustering
    (st, parts) = metis.part_graph(self.graph, self.args.cluster_number)
  File "/usr/local/lib/python3.7/site-packages/metis.py", line 765, in part_graph
    graph = networkx_to_metis(graph)
  File "/usr/local/lib/python3.7/site-packages/metis.py", line 574, in networkx_to_metis
    for i in H.node:
AttributeError: 'Graph' object has no attribute 'node'
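
The `AttributeError` above is raised inside `networkx_to_metis`, which reads `H.node`; NetworkX removed that attribute in version 2.4 in favour of `H.nodes`. Pinning `networkx` below 2.4 is one way around it. Another possible workaround, sketched here on the assumption that the wrapper's `part_graph` also accepts a plain adjacency list (its documentation lists this input form), is to build the adjacency list yourself so the NetworkX conversion is never called. `edges_to_adjacency` and `partition_with_metis` are illustrative names, not part of the repository.

```python
def edges_to_adjacency(edges):
    """Build a 0-indexed adjacency list from an undirected edge list.

    Node ids are relabelled to 0..n-1 in sorted order. The sorted id list is
    returned too, so partition labels can be mapped back to the original ids.
    """
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    adjacency = [set() for _ in nodes]
    for source, target in edges:
        if source == target:
            continue  # assumption: self-loops are dropped before partitioning
        adjacency[index[source]].add(index[target])
        adjacency[index[target]].add(index[source])
    return [sorted(neighbors) for neighbors in adjacency], nodes


def partition_with_metis(edges, nparts):
    """Hypothetical helper: partition through the adjacency-list input path."""
    import metis  # imported lazily; needs libmetis to be installed

    adjacency, nodes = edges_to_adjacency(edges)
    _, parts = metis.part_graph(adjacency, nparts)
    return dict(zip(nodes, parts))
```

Duplicate edges collapse because neighbours are collected in sets, so `edges_to_adjacency([(10, 20), (20, 30), (10, 20)])` yields three nodes with a path-shaped adjacency list.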

The error of metis, Segmentation fault (core dumped)

I found that I can partition the graph with the random method, but with Metis the program terminates abnormally. What causes this? I changed "IDXTYPEWIDTH = os.getenv('METIS_IDXTYPEWIDTH', '32')" in metis.py (line 31) to "IDXTYPEWIDTH = os.getenv('METIS_IDXTYPEWIDTH', '64')", but that did not help.

python src/main.py
+-------------------+----------------------+
| Parameter         | Value                |
+===================+======================+
| Cluster number    | 10                   |
+-------------------+----------------------+
| Clustering method | metis                |
+-------------------+----------------------+
| Dropout           | 0.500                |
+-------------------+----------------------+
| Edge path         | ./input/edges.csv    |
+-------------------+----------------------+
| Epochs            | 200                  |
+-------------------+----------------------+
| Features path     | ./input/features.csv |
+-------------------+----------------------+
| Layers            | [16, 16, 16]         |
+-------------------+----------------------+
| Learning rate     | 0.010                |
+-------------------+----------------------+
| Seed              | 42                   |
+-------------------+----------------------+
| Target path       | ./input/target.csv   |
+-------------------+----------------------+
| Test ratio        | 0.900                |
+-------------------+----------------------+

Metis graph clustering started.

Segmentation fault (core dumped)
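
One possible cause of a crash at this point (a guess, not a confirmed diagnosis for this report) is that the integer width the Python wrapper assumes differs from the width `libmetis` was compiled with. Editing the default in `metis.py` only changes what the wrapper assumes, not how the library was built. The sketch below reads the `IDXTYPEWIDTH` define from the text of `metis.h` so that the matching value can be exported as `METIS_IDXTYPEWIDTH` before `import metis`. The function names and the sample header fragment are illustrative.

```python
import os
import re


def read_idxtypewidth(header_text):
    """Return the IDXTYPEWIDTH value defined in metis.h text, or None."""
    match = re.search(r"^\s*#define\s+IDXTYPEWIDTH\s+(\d+)",
                      header_text, re.MULTILINE)
    return match.group(1) if match else None


def export_idxtypewidth(header_text, environ=None):
    """Set METIS_IDXTYPEWIDTH to match the header; call before `import metis`."""
    environ = os.environ if environ is None else environ
    width = read_idxtypewidth(header_text)
    if width is not None:
        environ["METIS_IDXTYPEWIDTH"] = width
    return width


# Illustrative header fragment; in practice read include/metis.h from the
# METIS source tree that libmetis was actually built from.
sample_header = "#define IDXTYPEWIDTH 64\n#define REALTYPEWIDTH 32\n"
env = {}
export_idxtypewidth(sample_header, env)
```

If the header says 32 and the wrapper is told 64 (or the reverse), the two sides disagree about array element sizes, which is consistent with a crash inside the partitioning call.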

About installation

Hi there,
Thank you for your great work. I've finally got the code running.
To make the installation section of README.md more precise and complete, you may want to add the following dependencies:

  • torch_spline_conv == 1.0.4
  • torch_sparse == 0.2.2
  • torch_scatter == 1.0.4
  • torch_cluster == 1.1.5 (strict)

Failed to locate Metis

Hi.
I installed metis with 'pip install metis' and got this error: 'RuntimeError: Could not locate METIS dll. Please set the METIS_DLL environment variable to its full path.' From https://metis.readthedocs.io/en/latest/_modules/metis.html I learned that I need to download METIS itself from http://glaros.dtc.umn.edu/gkhome/views/metis,
but I still failed to build the METIS source code.

Could you please give me some advice on solving this problem? Thank you in advance!

some code is missing

There is no code for Stochastic Multiple Partitions or for the issues of training deeper GCNs...
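
For readers wondering what the missing piece would look like: as described in the paper, stochastic multiple partitions samples several clusters per batch and keeps the edges that run between the chosen clusters. The following is a minimal sketch of that batching step only, not the author's code; the function name, the `parts` mapping (node id to cluster id) and the use of `random.Random` are assumptions made for illustration.

```python
import random


def sample_multi_cluster_batch(edges, parts, clusters_per_batch, rng):
    """Pick several clusters at random and return their induced subgraph.

    Because the subgraph is induced on the union of the chosen clusters,
    edges between two chosen clusters are kept, not only within-cluster edges.
    """
    cluster_ids = sorted(set(parts.values()))
    chosen = set(rng.sample(cluster_ids, clusters_per_batch))
    nodes = sorted(node for node, cluster in parts.items() if cluster in chosen)
    node_set = set(nodes)
    batch_edges = [(u, v) for u, v in edges if u in node_set and v in node_set]
    return nodes, batch_edges


edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
parts = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}
nodes, batch_edges = sample_multi_cluster_batch(edges, parts, 2, random.Random(0))
```

A fresh batch would be drawn like this at every training step, so different between-cluster edges are seen over the course of an epoch.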

RuntimeError: Could not locate METIS dll.

Hello, when I run main.py, this error message appears:

raise RuntimeError('Could not locate METIS dll. Please set the METIS_DLL environment variable to its full path.')
RuntimeError: Could not locate METIS dll. Please set the METIS_DLL environment variable to its full path.

Do you know how to solve it?
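
The message asks for `METIS_DLL` to hold the full path of the compiled METIS shared library, which `pip install metis` does not provide by itself. A small sketch of setting it from Python follows; it has to run before `import metis`. The candidate paths are examples only (the first one matches a `prefix=~/.local/` build), and `set_metis_dll` is a hypothetical helper.

```python
import os


def set_metis_dll(candidates, environ=None):
    """Point METIS_DLL at the first candidate path that exists.

    Returns the chosen absolute path, or None if no candidate was found.
    """
    environ = os.environ if environ is None else environ
    for candidate in candidates:
        path = os.path.abspath(os.path.expanduser(candidate))
        if os.path.isfile(path):
            environ["METIS_DLL"] = path
            return path
    return None


# Example locations only; adjust to wherever libmetis was installed.
chosen = set_metis_dll([
    "~/.local/lib/libmetis.so",
    "/usr/local/lib/libmetis.so",
])
if chosen is None:
    print("libmetis not found; build METIS as a shared library first")
```

On Windows the file would be a `.dll` and on macOS a `.dylib`, so the candidate list needs to be adapted accordingly.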

For ppi

Hello. Thanks for your work and code. It's great that Cluster-GCN achieves such strong performance on the PPI dataset. But it seems that you have not open-sourced the code for PPI node classification.

Do you first select the best model on the validation set and then evaluate it on the unseen test set?
I notice that GraphStar is now the SOTA. However, they don't use the validation set and select the best model directly on the test set.

Can you share the PPI code with us and describe how to split the dataset in the README file? It's important for others who want to build on your great work.

ImportError: No module named 'torch_spline_conv'

I followed the installation instructions, but the error above occurred.

After checking the site-packages folder, I could not find torch_spline_conv.
I will search around to find out why this is happening, but thought you might have some insights.

Any help is appreciated.

The complete trace is as follows:

  File "src/main.py", line 4, in <module>
    from clustergcn import ClusterGCNTrainer
  File "/media/anuj/Softwares & Study Material/Study Material/MS Stuff/RA/ClusterGCN/src/clustergcn.py", line 5, in <module>
    from layers import StackedGCN
  File "/media/anuj/Softwares & Study Material/Study Material/MS Stuff/RA/ClusterGCN/src/layers.py", line 2, in <module>
    from torch_geometric.nn import GCNConv
  File "/home/anuj/virtualenv-forest/gcn/lib/python3.5/site-packages/torch_geometric/nn/__init__.py", line 1, in <module>
    from .conv import *  # noqa
  File "/home/anuj/virtualenv-forest/gcn/lib/python3.5/site-packages/torch_geometric/nn/conv/__init__.py", line 1, in <module>
    from .spline_conv import SplineConv
  File "/home/anuj/virtualenv-forest/gcn/lib/python3.5/site-packages/torch_geometric/nn/conv/spline_conv.py", line 3, in <module>
    from torch_spline_conv import SplineConv as Conv
ImportError: No module named 'torch_spline_conv'

TypeError: object of type 'int' has no len()

Hello, when I run main.py, I get this error message:
  File "D:\anaconda3.4\lib\site-packages\pymetis\__init__.py", line 44, in _prepare_graph
    for i in range(len(adjacency)):
TypeError: object of type 'int' has no len()

I installed the pymetis package to work around the missing metis.dll; this error occurs in pymetis\__init__.py.
Do you know how to solve it?
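
A likely explanation, offered as an assumption since the modified call is not shown: `pymetis.part_graph` takes the number of parts first and the adjacency structure second, the reverse of `metis.part_graph(graph, nparts)`, and it does not accept a NetworkX graph. Keeping the old argument order would hand an `int` to the adjacency slot, which matches the `len()` failure in the trace. The sketch below converts an edge list to the CSR arrays (`xadj`, `adjncy`) that pymetis accepts as keyword arguments; the helper names are illustrative.

```python
def edges_to_csr(num_nodes, edges):
    """Convert an undirected edge list (0-indexed ids) to CSR arrays.

    The neighbours of node i are adjncy[xadj[i]:xadj[i + 1]].
    """
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    xadj, adjncy = [0], []
    for node_neighbors in neighbors:
        adjncy.extend(sorted(node_neighbors))
        xadj.append(len(adjncy))
    return xadj, adjncy


def partition_with_pymetis(num_nodes, edges, nparts):
    """Hypothetical helper: note that nparts comes first for pymetis."""
    import pymetis  # imported lazily; optional dependency

    xadj, adjncy = edges_to_csr(num_nodes, edges)
    result = pymetis.part_graph(nparts, xadj=xadj, adjncy=adjncy)
    return result[1]  # second element is the per-node membership list
```

In the repository's call site this would mean replacing `metis.part_graph(self.graph, self.args.cluster_number)` with a call in the pymetis argument order, after relabelling nodes to 0..n-1.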

Metis Segmentation Fault (Cored Dumped)

Hi @benedekrozemberczki, thanks so much for sharing the code. After following the installation instructions, installing metis for Python seems to fail. I can run main.py with the random partition method. However, I always get "aborted (core dumped)" when using the metis partition method. Could you help me pin down the issue? Any suggestions would be appreciated. The installation steps I followed:

  1. Download metis-5.1.0.tar.gz from http://glaros.dtc.umn.edu/gkhome/metis/metis/download and unpack it
  2. cd metis-5.1.0
  3. make config shared=1 prefix=~/.local/
  4. make install
  5. export METIS_DLL=~/.local/lib/libmetis.so

Here is an official example to check metis. However, I always get the "aborted (core dumped)" error.

```python
import networkx as nx
import metis

# Partition the wrapper's built-in example graph into 3 parts.
G = metis.example_networkx()
(edgecuts, parts) = metis.part_graph(G, 3)
```

===== My System Information ====
Ubuntu 18, python 3.7

ppi

import torch import time

issues about the

(st, parts) = metis.part_graph(self.graph, self.args.cluster_number)

Amazon2M Dataset

Hi,

Are you planning on releasing the Amazon2M dataset used in the paper?

Thanks,
Emanuele

Runtime error about metis

At the beginning of training, when the full graph is partitioned, the call "metis.part_graph(self.graph, self.args.cluster_number)" throws an error:
Traceback (most recent call last):
  File "C:/Users/xieRu/Desktop/ML/ClusterGCN/src/main.py", line 30, in <module>
    main()
  File "C:/Users/xieRu/Desktop/ML/ClusterGCN/src/main.py", line 19, in main
    clustering_machine.decompose()
  File "C:\Users\xieRu\Desktop\ML\ClusterGCN\src\clustering.py", line 38, in decompose
    self.metis_clustering()
  File "C:\Users\xieRu\Desktop\ML\ClusterGCN\src\clustering.py", line 56, in metis_clustering
    (st, parts) = metis.part_graph(self.graph, self.args.cluster_number)
  File "D:\Program\Anaconda\lib\site-packages\metis.py", line 800, in part_graph
    _METIS_PartGraphKway(*args)
  File "D:\Program\Anaconda\lib\site-packages\metis.py", line 677, in _METIS_PartGraphKway
    adjwgt, nparts, tpwgts, ubvec, options, objval, part)
OSError: exception: access violation writing 0x000001B0B9C0E000

But when I tested the metis package as follows, it worked:

```python
import metis
from networkx import karate_club_graph

zkc = karate_club_graph()
graph_clustering = metis.part_graph(zkc)
```

So, what happened?

Different partition size

Metis may produce subgraphs with unequal numbers of nodes, so the adjacency matrices will differ in size. How do you handle this during training?
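
One way unequal cluster sizes can be handled, sketched here as an assumption rather than a description of the repository's code, is to train on one cluster at a time and give each cluster its own re-indexed edge list. No fixed adjacency shape is needed then, because every batch carries its own node count. The function name and the dictionary layout are illustrative.

```python
def build_cluster_subgraphs(edges, parts):
    """Split a graph into per-cluster subgraphs with local 0-based node ids.

    `parts` maps each node id to its cluster id. Edges whose endpoints fall
    in different clusters are dropped.
    """
    clusters = {}
    for node, cluster in parts.items():
        clusters.setdefault(cluster, []).append(node)

    subgraphs = {}
    for cluster, nodes in clusters.items():
        nodes = sorted(nodes)
        local = {node: i for i, node in enumerate(nodes)}
        local_edges = [(local[u], local[v]) for u, v in edges
                       if u in local and v in local]
        subgraphs[cluster] = {"nodes": nodes, "edges": local_edges}
    return subgraphs


edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
parts = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
subgraphs = build_cluster_subgraphs(edges, parts)
```

Here cluster 0 has three nodes and cluster 1 has two; each is a self-contained graph that a GCN layer using an edge-index representation can consume without padding.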

some question about code

It seems that your code does not consider the connections between clusters or the normalization mentioned in the paper. Will you add these two options?
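
For reference, the normalization in question, as I read the paper, first forms the row-normalized matrix Ã = (D + I)^(-1) (A + I) and then amplifies its diagonal, propagating with Ã + λ·diag(Ã). The dense pure-Python sketch below illustrates that formula on a small adjacency matrix; it is not code from the repository, and `diag_enhanced_adjacency` and `lam` are names chosen here.

```python
def diag_enhanced_adjacency(adjacency, lam):
    """Return A_tilde + lam * diag(A_tilde), A_tilde = (D + I)^-1 (A + I).

    `adjacency` is a dense 0/1 matrix (list of lists) with a zero diagonal.
    """
    n = len(adjacency)
    result = []
    for i in range(n):
        # Row i of A + I.
        row = [float(adjacency[i][j]) + (1.0 if i == j else 0.0)
               for j in range(n)]
        # Row sum of A + I equals degree + 1, i.e. the i-th entry of D + I.
        degree_plus_one = sum(row)
        row = [value / degree_plus_one for value in row]
        # Diagonal enhancement: add lam times the diagonal entry.
        row[i] += lam * row[i]
        result.append(row)
    return result


enhanced = diag_enhanced_adjacency([[0, 1], [1, 0]], lam=1.0)
```

For the two-node example each row of Ã is `[0.5, 0.5]`, and with `lam=1.0` the diagonal entries are doubled.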

Segmentation Fault when running code

Hi, I was running your code and encountered a segmentation fault.
The error happened at
clustering.py line 58
(st, parts) = metis.part_graph(self.graph, self.args.cluster_number)

Has anyone else encountered this issue?

I've changed the line to
IDXTYPEWIDTH = os.getenv('METIS_IDXTYPEWIDTH', '64')
and my Python version is 3.7.

I know the version in the repo is 3.5, but I ran into trouble installing torch-scatter and torch-sparse with Python 3.7,
so I switched to Python 3.5.

Frame problem

Hello, all of my data is labeled. Can I use this framework for training? I think many frameworks of this kind are semi-supervised.

Segmentation fault While running main.py on Ubuntu

While running main.py on Ubuntu, I get a segmentation fault.

python3 main.py --epochs 100

+-------------------+----------------------------------------------------------+
| Parameter         | Value                                                    |
+===================+==========================================================+
| Cluster number    | 10                                                       |
+-------------------+----------------------------------------------------------+
| Clustering method | metis                                                    |
+-------------------+----------------------------------------------------------+
| Dropout           | 0.500                                                    |
+-------------------+----------------------------------------------------------+
| Edge path         | /home/User/Desktop/ClusterGCN-master/input/edges.csv     |
+-------------------+----------------------------------------------------------+
| Epochs            | 100                                                      |
+-------------------+----------------------------------------------------------+
| Features path     | /home/User/Desktop/ClusterGCN-                           |
|                   | master/input/features.csv                                |
+-------------------+----------------------------------------------------------+
| Layers            | [16, 16, 16]                                             |
+-------------------+----------------------------------------------------------+
| Learning rate     | 0.010                                                    |
+-------------------+----------------------------------------------------------+
| Seed              | 42                                                       |
+-------------------+----------------------------------------------------------+
| Target path       | /home/User/Desktop/ClusterGCN-                           |
|                   | master/input//target.csv                                 |
+-------------------+----------------------------------------------------------+
| Test ratio        | 0.900                                                    |
+-------------------+----------------------------------------------------------+

Metis graph clustering started.

Segmentation fault

About the feature

Hi,

Thanks for your inspiring work!
I wonder what the values represent in the features.csv.

Metis hits a Segmentation fault when running _METIS_PartGraphKway

  • I'm using the default test input files.

  • I've attached pdb screenshot during the run.

  • Environment:
    Ubuntu 18.04
    Anaconda (Python 3.7.3),
    torch-geometric==1.3.0
    torch-scatter==1.3.0
    torch-sparse==0.4.0
    torch-spline-conv==1.1.0
    metis==0.2a.4

PDB Error
Screenshot from 2019-07-04 13-56-16

Requirements.txt
Screenshot from 2019-07-04 14-02-14

issues about the metis algorithm

(st, parts) = metis.part_graph(self.graph, self.args.cluster_number)
Thanks for your awesome code. Could you please tell me how metis conducts the graph partition?
The self.graph here doesn't include any information about edge weights or feature attributes.
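
In short, METIS partitions on topology alone: it looks for parts of roughly equal size while keeping the number (or total weight) of edges that cross between parts small, and node features play no role. When no edge weights are supplied, each edge counts as 1. The objective value is what `part_graph` returns as its first element. The sketch below only computes that objective for a given assignment, to make the quantity concrete; it is not METIS's algorithm, and `edge_cut` is a name chosen here.

```python
def edge_cut(edges, parts, weights=None):
    """Total weight of edges whose endpoints lie in different parts.

    `parts[i]` is the part id of node i. With weights=None every edge
    counts as 1, which is the unweighted case described above.
    """
    total = 0
    for index, (u, v) in enumerate(edges):
        if parts[u] != parts[v]:
            total += 1 if weights is None else weights[index]
    return total


edges = [(0, 1), (1, 2), (2, 3)]
balanced = edge_cut(edges, [0, 0, 1, 1])     # only edge (1, 2) is cut
alternating = edge_cut(edges, [0, 1, 0, 1])  # every edge is cut
```

A partitioner with this objective would prefer the first assignment, since both are balanced but the first cuts fewer edges.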
