tkipf / pygcn

5.1K 5.1K 1.2K 225 KB

Graph Convolutional Networks in PyTorch

License: MIT License

Python 100.00%

pygcn's People

kastnerkyle codeaudit ml-lab fireae ericmjl dl-yc ajaytalati pbaljeka amoliu guanqj932 watkyns apilastri vikingmew xintianhan sagarsamtani07 shiruipan huangzhii wouterkool qinkevin ciela zhixinshu gumpfly chaihua483 mingltu morganjk senthilps8 chengtaoli wpfhtl fredlang kyocen leezqcst shubhampachori12110095 haofusheng lcfractal diego999 kingofspace0wzz johny-c 9310gaurav rishabgoel ayg-dl glnmario harshnarang8 aparna-b sshan-zhao cs-minglong uditsaxena statml anandbhattad ai3dvision b2220333 reeshark maitykalyan slooowtyk zxplsec sjtusuperxu llwc ahhaa songfgh zcrwind hyzcn wxy920801 codes-kzhan zfchen95 daijucug aimeng100 daniilidis-group jhy1993 david4096 christinaliang lukaszozimek emigmo huyhoang17 zhaoyu775885 zetayue skkuggi prokia amar-iastate weigq hongxuchenuq qiuhual yingweiy auserj hulalazz mrsun15 zju-plp arbi11 fongyk shenzhenzhaoxin zhf459 lidongyv wh-forker yowhatever roszcz snuffle-px new2scala kevin-ssy yayachenyi fly6464 pencoa lidaiqing

pygcn's Issues

Order of nodes

Hi,
Thank you for your implementation, both in TensorFlow and PyTorch.
I would like to ask if you can provide the order of nodes used for the Cora, Citeseer and Pubmed datasets in your TensorFlow implementation.

I looked into the code in this repository, and I can see that the order comes from cora.content.

However, it must differ from the order in the TensorFlow repo (where binary files are provided): when I use the adjacency matrix built by this (PyTorch) repo in the TensorFlow code, the accuracy of the TensorFlow code is very low, so the two adjacency matrices must be different.
Many thanks.

GPU usage is 0 when training

When training with the default setting, GPU usage is 0% while CPU is 100%.

The training code seems to use CUDA; however, it doesn't seem to speed up training.

Why?

Where is the implementation of the convolved signal matrix

Hi, I cannot find the implementation of the convolved signal matrix shown in the following picture:

Can you tell me where it is?
@tkipf
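As far as I can tell, pygcn does not build the convolved signal matrix Z = A_hat X Theta in one place: `utils.py` precomputes the normalized adjacency A_hat (pygcn row-normalizes A + I, whereas the paper uses the symmetric D^-1/2 (A + I) D^-1/2), and `GraphConvolution.forward` in `layers.py` computes `support = torch.mm(input, self.weight)` followed by `torch.spmm(adj, support)`. A dense NumPy sketch of the same two steps on a toy graph, using the paper's symmetric form; `convolve` is a hypothetical helper, not part of pygcn:

```python
import numpy as np

def convolve(adj_norm, x, theta):
    """Compute Z = A_hat @ X @ Theta, in the same order as GraphConvolution.forward."""
    support = x @ theta          # (N, F) @ (F, F') -> (N, F')
    return adj_norm @ support    # (N, N) @ (N, F') -> (N, F')

# Toy 3-node path graph 0-1-2, with self-loops, symmetrically normalized.
a = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
a_tilde = a + np.eye(3)
d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
a_hat = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))       # N=3 nodes, F=4 input features
theta = rng.normal(size=(4, 2))   # F'=2 output channels (filter parameters)
z = convolve(a_hat, x, theta)
print(z.shape)  # (3, 2)
```

Computing X @ Theta first keeps the sparse product small when F' is much smaller than F, which is presumably why the layer is written in that order.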

Could I get the node embedding?

Hi,

Could I get node embeddings of a chosen length from this code? Which step's output should I extract?

Thanks,
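One common choice (not something pygcn exposes directly) is the hidden activation after the first layer: in `models.py` the forward pass applies `F.relu(self.gc1(x, adj))`, then dropout, `gc2` and `log_softmax`, so the first-layer output has length `nhid` (the `--hidden` argument). In PyTorch you could return that tensor from `forward` as well, or capture it with a forward hook on `model.gc1` (note that a hook sees the output before the ReLU). A NumPy sketch of a two-layer forward pass that returns both; `gcn_forward` and the weight names are hypothetical, and dropout and bias are omitted as if in eval mode:

```python
import numpy as np

def gcn_forward(a_hat, x, w0, w1):
    """Two-layer GCN forward pass (no dropout, no bias).

    Returns (embedding, log_probs): embedding is the ReLU output of the first
    layer (length = hidden size), log_probs the log-softmax over classes.
    """
    h = np.maximum(a_hat @ (x @ w0), 0.0)   # first layer: N x nhid
    logits = a_hat @ (h @ w1)               # second layer: N x nclass
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return h, log_probs

rng = np.random.default_rng(1)
n, nfeat, nhid, nclass = 5, 8, 16, 3
a_hat = np.eye(n)                 # placeholder normalized adjacency
x = rng.normal(size=(n, nfeat))
w0 = rng.normal(size=(nfeat, nhid))
w1 = rng.normal(size=(nhid, nclass))
emb, log_probs = gcn_forward(a_hat, x, w0, w1)
print(emb.shape)  # (5, 16): one 16-dimensional embedding per node
```

To get embeddings of a different length, change the hidden size (`--hidden`) and retrain.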

How are convolutional filters initialized in the spectral domain?

determinism problem

Hi Kipf,

Nice work. I am trying to run the model, but I fail to get the same result every time, even with the lines below:

np.random.seed(args.seed)
torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)

Chebyshev convolution in PyTorch and its kernel size

Hello,

In your PyTorch version, where is the Chebyshev convolution implemented?

I can't find it in https://github.com/tkipf/pygcn/tree/master/pygcn

I would like to see the kernel size of the convolutional filters.
Thank you.

Xavier initialization use output size instead of input size

I noticed that the standard deviation used for initialization is computed from the output size instead of the input size. Is this intended?

self.weight = Parameter(torch.FloatTensor(in_features, out_features))
stdv = 1. / math.sqrt(self.weight.size(1))
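Strictly, `stdv` here is the bound of a uniform distribution (the layer then calls `uniform_(-stdv, stdv)`), and neither variant is exactly Xavier/Glorot. A plausible origin, though this is only a guess: older PyTorch `nn.Linear.reset_parameters` used the same line, but `nn.Linear` stores its weight as (out_features, in_features), so there `size(1)` is the fan-in. A plain-Python comparison of the resulting bounds for Cora's first layer (1433 input features, pygcn's default 16 hidden units); the function names are illustrative:

```python
import math

def pygcn_bound(in_features, out_features):
    # pygcn: weight has shape (in_features, out_features), so size(1) is the fan-out.
    return 1.0 / math.sqrt(out_features)

def fan_in_bound(in_features, out_features):
    # The same formula applied to the fan-in instead.
    return 1.0 / math.sqrt(in_features)

def glorot_uniform_bound(in_features, out_features):
    # Xavier/Glorot uniform: U(-b, b) with b = sqrt(6 / (fan_in + fan_out)).
    return math.sqrt(6.0 / (in_features + out_features))

for name, fn in [("pygcn", pygcn_bound), ("fan-in", fan_in_bound),
                 ("glorot", glorot_uniform_bound)]:
    print(f"{name:7s} bound = {fn(1433, 16):.4f}")
```

For this layer the pygcn bound (0.25) is roughly 9.5 times the fan-in bound (about 0.026) and about 4 times the Glorot bound (about 0.064).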

Batch operations

Hi @tkipf, excellent work on pygcn! Really nice engineering setting up the sparse adjacency multiplications, and super clean code. I'm curious how you suggest dealing with batch operations. Unless I am misunderstanding, train.py operates on a single large graph in each epoch, and the labels are per-node labels. If this interpretation is correct, do you have any suggestions for datasets consisting of many graphs (a series of sparse matrices), each mapped to a graph-level output/label? This would be solved if PyTorch could accept a list of tensors as input, but that does not seem to be (easily) supported right now. Thanks for any advice!

Cheers,
Evan

PS Great to meet you at Stanford a few weeks ago!
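A common workaround for many small graphs is to merge them into one graph with a block-diagonal adjacency matrix (no edges between blocks), run the usual GCN layers, and then pool the node outputs per graph into one vector for a graph-level loss. A dense NumPy sketch assuming mean pooling; in practice you would use sparse matrices (e.g. `scipy.sparse.block_diag`), and `batch_graphs` / `mean_pool` are hypothetical helpers:

```python
import numpy as np

def batch_graphs(adjs, feats):
    """Merge several graphs into one block-diagonal adjacency matrix.

    Returns the big adjacency, the stacked feature matrix, and a vector
    mapping each node to the index of the graph it came from.
    """
    sizes = [a.shape[0] for a in adjs]
    total = sum(sizes)
    big_adj = np.zeros((total, total))
    offset = 0
    for a in adjs:
        n = a.shape[0]
        big_adj[offset:offset + n, offset:offset + n] = a
        offset += n
    x = np.vstack(feats)
    graph_index = np.repeat(np.arange(len(adjs)), sizes)
    return big_adj, x, graph_index

def mean_pool(h, graph_index, num_graphs):
    """Average node representations per graph: one row per graph."""
    out = np.zeros((num_graphs, h.shape[1]))
    np.add.at(out, graph_index, h)
    counts = np.bincount(graph_index, minlength=num_graphs)[:, None]
    return out / counts

# Two toy graphs: a single edge and a 3-node path, 4 features each.
adjs = [np.array([[0, 1], [1, 0]], float),
        np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)]
feats = [np.ones((2, 4)), 2 * np.ones((3, 4))]
big_adj, x, graph_index = batch_graphs(adjs, feats)
h = big_adj @ x                    # one propagation step; nothing crosses graphs
pooled = mean_pool(h, graph_index, len(adjs))
print(pooled.shape)  # (2, 4): one vector per graph
```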

About symmetrically normalization of adjacency matrix

Hi, I noticed that in this PyTorch version of the code, the adjacency matrix is row-normalized instead of symmetrically normalized. However, the accuracy (82.5%) is higher than that of the TensorFlow version (81.6%). I also tried symmetrically normalizing the adjacency matrix in this PyTorch version, but the result dropped (to 79.9%). The result of the TensorFlow version, on the other hand, does not change when the normalization is modified. To summarize, these are the experiments I ran:

Cora accuracy (%)          TensorFlow   PyTorch
Symmetric normalization    81.6         79.9
Row normalization          81.6         82.5

Does anyone have an idea why this happens?
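For comparison, the two normalizations side by side. `row_normalize` follows the logic of `normalize` in pygcn's `utils.py` (D^-1 M, with zero rows left at zero), while `sym_normalize` is the D^-1/2 M D^-1/2 form from the paper; the function names and the star graph are illustrative:

```python
import numpy as np

def row_normalize(mx):
    """D^-1 M: each nonzero row sums to 1."""
    rowsum = mx.sum(axis=1)
    r_inv = np.zeros_like(rowsum)
    nz = rowsum > 0
    r_inv[nz] = 1.0 / rowsum[nz]
    return r_inv[:, None] * mx

def sym_normalize(mx):
    """D^-1/2 M D^-1/2: symmetric if M is symmetric."""
    rowsum = mx.sum(axis=1)
    d_inv_sqrt = np.zeros_like(rowsum)
    nz = rowsum > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(rowsum[nz])
    return d_inv_sqrt[:, None] * mx * d_inv_sqrt[None, :]

# Star graph: node 0 linked to nodes 1, 2, 3; add self-loops first.
a = np.zeros((4, 4))
a[0, 1:] = a[1:, 0] = 1
a_tilde = a + np.eye(4)
print(row_normalize(a_tilde).sum(axis=1))   # [1. 1. 1. 1.]
print(np.allclose(sym_normalize(a_tilde), sym_normalize(a_tilde).T))  # True
```

Row normalization averages over neighbours but gives an asymmetric matrix; symmetric normalization keeps symmetry, but its rows no longer sum to 1 when neighbouring degrees differ.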

The SpecialSpmmFunction Class question

I noticed that SpecialSpmmFunction is a subclass of torch.autograd.Function, and only one instance of SpecialSpmm (self.special_spmm) is created in SpGraphAttentionLayer:

class SpGraphAttentionLayer(nn.Module):
    def __init__(self, in_features, out_features, dropout, alpha, concat=True):
        super(SpGraphAttentionLayer, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.alpha = alpha
        self.concat = concat

        self.W = nn.Parameter(torch.zeros(size=(in_features, out_features)))
        nn.init.xavier_normal_(self.W.data, gain=1.414)

        self.a = nn.Parameter(torch.zeros(size=(1, 2*out_features)))
        nn.init.xavier_normal_(self.a.data, gain=1.414)

        self.dropout = nn.Dropout(dropout)
        self.leakyrelu = nn.LeakyReLU(self.alpha)
        self.special_spmm = SpecialSpmm()

But the official documentation says: "Each function object is meant to be used only once (in the forward pass)." However, self.special_spmm is called twice in forward:

        e_rowsum = self.special_spmm(edge, edge_e, torch.Size([N, N]), torch.ones(size=(N, 1), device=dv))
        # e_rowsum: N x 1

        edge_e = self.dropout(edge_e)
        # edge_e: E

        # Each function object is meant to be used only once (in the forward pass).
        h_prime = self.special_spmm(edge, edge_e, torch.Size([N, N]), h)

Have I misunderstood something?

How can I train on multiple graphs (each graph/mesh from a 3D medical image)?

Hi,

Thanks for your work. I'm a complete beginner in computer vision.

Recently, I have been trying to do medical image segmentation using GCNs.

I read your code and found that the input is a single graph that contains the training, validation, and test data. In my framework, however, each training image is a separate graph, and there are no connections between images.

So I'm confused about how to implement this.

Also, I want to know whether the input features can be patches extracted around the target node.

I look forward to your reply.

Thanks,
lei

torch.spmm

Can I adapt the code to multi-label classification?

Hi,

I have a multi-label classification problem, where one node can have multiple labels. Do I need to change the code, which is written for multi-class classification? Thanks.
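The loss does need to change: `log_softmax` followed by `F.nll_loss` assumes exactly one class per node. A common replacement is one sigmoid per label with binary cross-entropy, e.g. `F.binary_cross_entropy_with_logits` on the raw output of the last layer with a float multi-hot target matrix. A NumPy sketch of that loss and of per-label thresholding; the logits and targets are made up, and the 0.5 threshold is an assumption you may want to tune:

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed stably from raw scores."""
    # max(z, 0) - z * y + log(1 + exp(-|z|)) is the numerically stable form.
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

# 3 nodes, 4 possible labels; a node may carry several labels at once.
logits = np.array([[ 2.0, -1.0,  0.5, -3.0],
                   [-2.0,  3.0,  1.0,  0.0],
                   [ 0.0,  0.0, -1.0,  4.0]])
targets = np.array([[1, 0, 1, 0],
                    [0, 1, 1, 0],
                    [0, 0, 0, 1]], dtype=float)

loss = bce_with_logits(logits, targets)
probs = 1.0 / (1.0 + np.exp(-logits))   # independent probability per label
predicted = probs > 0.5                 # threshold each label separately
```

Accuracy as computed in `train.py` (argmax over classes) would also need replacing, e.g. with per-label accuracy or F1.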

Why is the test set much larger than the training/validation sets?

I saw that you split train/val/test this way:
idx_train = range(140)
idx_val = range(200, 500)
idx_test = range(500, 1500)

I don't understand why the test set is so much larger than the training and validation sets.

Vocabulary items

Hi! I'd be very curious to know what the words in the vocabulary are. Could I find that somewhere?

Question about the results

Hello, thank you for your amazing work! I am a beginner in this field, and I'm confused about the results. I ran the code in PyCharm and got a different output every time I ran it. As far as I can see, the random seeds are fixed at the beginning. Am I missing something?

build symmetric adjacency matrix

Hi tkipf, thank you for your amazing work!
In utils.py, starting from line 36:

    # build symmetric adjacency matrix
    adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)

As far as I understand, these lines are turning a directed adjacency matrix into an undirected adjacency matrix?
Since adj is a 0-1 matrix, then for the positions adj[i,j] where adj.T > adj, we should have adj[i,j] = 0, so the - adj.multiply(adj.T > adj) part is always zero.

Then what's the purpose of that part, or am I understanding it incorrectly?

load own adjacency matrix and features

I have my own arrays for the adjacency matrix and features, so I did not use the load function in utils.py.
Is the following the right way to prepare adjacency and feature data for training the graph CNN? features and adj are dense numpy arrays:

features = sp.csr_matrix(features, dtype=np.float32)
features = normalize(features)
features = np.array(features.todense())
adj = sp.csr_matrix(adj)
adj = normalize(adj + sp.eye(adj.shape[0]))
adj = sparse_mx_to_torch_sparse_tensor(adj)

Another question: why is "adj + sp.eye(adj.shape[0])" needed if adj is already an adjacency matrix? Thanks.
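The `+ sp.eye(...)` adds self-loops (the A + I of the paper's renormalization trick), so each node's own features are included when its neighbourhood is averaged. Without them a node's output ignores its own input. A small NumPy illustration on a made-up 3-node path graph with scalar features:

```python
import numpy as np

# Path graph 0-1-2; scalar node features make the effect easy to read.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.array([[10.0], [20.0], [30.0]])

def row_normalize(m):
    return m / m.sum(axis=1, keepdims=True)

# Mean over neighbours only: every node ends up with 20, its own value is lost.
without_loops = row_normalize(adj) @ x
# Mean over neighbours and self: nodes keep part of their own signal (15, 20, 25).
with_loops = row_normalize(adj + np.eye(3)) @ x
```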

What is the difference between two adjacency matrix normalization?

Hello, thanks for the amazing work. In your implementation you use D^-1A, but I noticed that some other works use D^-1/2AD^-1/2. I suppose these two calculations won't give the same normalized adjacency matrix. Which one should I choose? Or will they have the same performance?
I think that in a large graph (where A is very big), D^-1/2AD^-1/2 may roughly equal D^-1A; is that correct?

Citeseer data set accuracy

Dear professor,
Hello!
I am very interested in your recent GCN work.
Thanks for sharing the code. I used the GCN network on the Citeseer dataset, but the accuracy could not reach 70.3%. How did you set the parameters to get it that high? Thanks a lot for sharing the code, anyway.

Many thanks for your help.

Values of some parameters

In the module named "MODELS":

class GCN(nn.Module):
    def __init__(self, nfeat, nhid, nclass, dropout):

but the values of these parameters (nfeat, nhid, nclass) are not specified in the code. I am confused about how the code gets these values!

It would be great if you could help me out.
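They are supplied when the model is constructed in `train.py`: as far as I recall, `nfeat` comes from `features.shape[1]`, `nhid` from the `--hidden` argument (default 16), and `nclass` from `labels.max().item() + 1`. The same derivation on toy NumPy arrays (the array contents are made up):

```python
import numpy as np

# Toy stand-ins for what utils.load_data returns: 4 nodes, 6 features, 3 classes.
features = np.zeros((4, 6), dtype=np.float32)
labels = np.array([0, 2, 1, 2])
hidden = 16                        # pygcn's --hidden default

nfeat = features.shape[1]          # input width of gc1
nhid = hidden                      # output width of gc1 / input width of gc2
nclass = int(labels.max()) + 1     # classes are numbered 0..nclass-1
print(nfeat, nhid, nclass)         # 6 16 3
```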

Why is early stopping not used in this code?

An early stopping scheme could improve performance to some extent. Why isn't it used in this code?
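Adding it is straightforward. A patience-based sketch in plain Python: keep the epoch with the lowest validation loss and stop after `patience` epochs without improvement. The loss values are made up, `patience` is a hyperparameter you would choose, and in a real loop you would save `model.state_dict()` at each new best epoch:

```python
def early_stopping_epoch(val_losses, patience):
    """Return the epoch whose weights to keep, stopping after `patience`
    consecutive epochs without improvement in validation loss."""
    best_loss, best_epoch, bad_epochs = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
            # In a real training loop: save the model weights here.
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch

losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6]
print(early_stopping_epoch(losses, patience=3))  # 2
```

With `patience=3` the loop stops before ever seeing the late improvement at epoch 6, which shows why the patience value matters.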

Symmetrically normalize adjacency matrix

Why does this PyTorch implementation row-normalize both the features and adj, which differs from the TensorFlow implementation?

F.log_softmax(x, dim=1) output is not probability?

Hi,

Calling output = model(features, adj) does not give probabilities as output? If I want the model to return probabilities, what should I change?
If I change log_softmax to softmax, should the loss function F.nll_loss be changed?
Thanks.
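The output is log-probabilities, so `output.exp()` already gives probabilities without changing the model. If `log_softmax` were replaced by `softmax`, `F.nll_loss` would no longer compute cross-entropy, because it expects log-probabilities; you would have to take the log yourself, and keeping `log_softmax` is numerically safer. A NumPy check of that relationship (the logits are made up):

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # subtract row max for stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def nll_loss(log_probs, target):
    """Mean negative log-likelihood of the target class (like F.nll_loss)."""
    return -log_probs[np.arange(len(target)), target].mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
target = np.array([0, 2])

log_probs = log_softmax(logits)   # what model(features, adj) returns
probs = np.exp(log_probs)         # probabilities: each row sums to 1
loss = nll_loss(log_probs, target)

# Cross-entropy computed directly from softmax probabilities gives the same value.
softmax = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
assert np.isclose(loss, -np.log(softmax[np.arange(2), target]).mean())
```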

What is the difference between the TensorFlow version and the PyTorch version?

What is the difference between the TensorFlow version and the PyTorch version?
And why did you change it (the split of the training dataset and the added dropout layer)?

the normalize function in utils.py

Hi,
The normalize function in utils.py only normalizes the rows of the adjacency matrix, whereas the TensorFlow version normalizes both rows and columns. I am wondering whether this leads to a difference in the accuracy of GCN.

Best,
Xiaoyun

output = model(features, adj): are test features involved in the training process?

Hello, Kipf!
Thanks for sharing! In the training process, the feature matrix has 2708 rows; does it include the test samples?

Thank you very much!
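Yes for features, no for labels: this is the transductive setting. `model(features, adj)` propagates over all 2708 nodes, including test nodes, but `train.py` computes the training loss only on `output[idx_train]` and `labels[idx_train]`, so test labels never enter training. A NumPy sketch of that masking; the 10-node split and the stand-in model output are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_classes = 10, 3
log_probs = np.log(rng.dirichlet(np.ones(n_classes), size=n_nodes))  # stand-in model output
labels = rng.integers(0, n_classes, size=n_nodes)

idx_train = np.arange(0, 4)     # hypothetical split for the toy graph
idx_test = np.arange(6, 10)

def nll(rows):
    return -log_probs[rows, labels[rows]].mean()

loss_train = nll(idx_train)     # the only quantity that is back-propagated

# Changing the test labels leaves the training loss untouched.
labels[idx_test] = (labels[idx_test] + 1) % n_classes
assert np.isclose(loss_train, nll(idx_train))
```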

Still curious about the code of building symmetric adjacency matrix

    adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])),
                        shape=(labels.shape[0], labels.shape[0]),
                        dtype=np.float32)

    # build symmetric adjacency matrix
    adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)

I don't understand how the last line produces a symmetric matrix.
It seems more intuitive to build the symmetric matrix like this:
adj = adj + adj.T

Can anyone help answer my questions? Thanks a lot.
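Where adj.T > adj the expression gives adj + adj.T - adj = adj.T; everywhere else it leaves adj unchanged. So it is the element-wise maximum of adj and adj.T. For a 0/1 matrix the subtracted term is indeed zero, as noted in an earlier issue, but the max form also stays correct if entries exceed 1. `adj + adj.T` is symmetric too, but it sets reciprocal citations to 2. A dense NumPy check with made-up edges:

```python
import numpy as np

# Directed toy graph: 0->1 and 1->0 (reciprocal), plus one-way 1->2 and 3->2.
adj = np.zeros((4, 4))
for i, j in [(0, 1), (1, 0), (1, 2), (3, 2)]:
    adj[i, j] = 1

mask = adj.T > adj
pygcn_style = adj + adj.T * mask - adj * mask   # dense version of the utils.py line
naive = adj + adj.T

assert np.array_equal(pygcn_style, np.maximum(adj, adj.T))  # symmetric, entries stay 0/1
print(naive[0, 1], pygcn_style[0, 1])  # 2.0 1.0: adj + adj.T double-counts reciprocal edges
```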

How to implement residual connections in gcn?

Hi @tkipf. I'm confused about residual connections in GCN.

def forward(self, x, adj):
    # x size: [2708, 1433]
    # adj size: [2708, 2708]
    x = F.relu(self.gc1(x, adj))   # [2708, 16]
    ......

The input size is [2708, 1433], but the first layer's output size is [2708, 16].

If I implement the residual connection like the equation above:

x = F.relu(self.gc1(x, adj))  + x

I get an error: size mismatch.
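An identity shortcut only works when input and output widths match. Two common options: start the residual connections from the second layer, where widths are equal, or put a learned linear projection on the skip path (in PyTorch, for example, an `nn.Linear(1433, 16, bias=False)`). A NumPy sketch of both; `w_proj` is a hypothetical extra weight, not part of pygcn, and the adjacency is a placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)
n, f_in, f_hid = 6, 1433, 16
a_hat = np.eye(n)                                # placeholder normalized adjacency
x = rng.normal(size=(n, f_in))
w1 = rng.normal(size=(f_in, f_hid)) * 0.01
w_proj = rng.normal(size=(f_in, f_hid)) * 0.01   # hypothetical learnable projection

def relu(z):
    return np.maximum(z, 0.0)

# gc1(x) + x fails: (6, 16) vs (6, 1433). Project the skip path to 16 dims first.
h1 = relu(a_hat @ (x @ w1)) + x @ w_proj         # (6, 16)

# Between layers of equal width, the identity shortcut works directly.
w2 = rng.normal(size=(f_hid, f_hid)) * 0.1
h2 = relu(a_hat @ (h1 @ w2)) + h1                # (6, 16)
print(h1.shape, h2.shape)
```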

Multiple instance learning and GCN

Hi,
Thank you for providing such a great model. I would like to ask: can we apply multiple instance learning to text-GCN? And at which level would that be, graph classification or node classification?

Thank you in advance

datasets partition

Hi tkipf, thanks for sharing.

There are 2708 lines in total in cora.content. However, in utils.py the data is split as follows:

idx_train = range(140)
idx_val = range(200, 500)
idx_test = range(500, 1500)

May I ask why the data is split this way? Thank you.
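A quick count of what these ranges cover. This is a contiguous split over the node order in cora.content; the GCN paper instead used the Planetoid split (20 labeled nodes per class, 500 validation and 1000 test nodes), so the sets are not identical. A small labeled training set is the premise of semi-supervised node classification, and a large test set gives a less noisy accuracy estimate:

```python
n_nodes = 2708                  # rows in cora.content
idx_train = range(140)
idx_val = range(200, 500)
idx_test = range(500, 1500)

sizes = {name: len(r) for name, r in
         [("train", idx_train), ("val", idx_val), ("test", idx_test)]}
used = set(idx_train) | set(idx_val) | set(idx_test)
print(sizes)                    # {'train': 140, 'val': 300, 'test': 1000}
print(n_nodes - len(used))      # 1268 nodes are in no split
```

The unused nodes still take part in propagation; they only lack a role in the loss or the evaluation.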

No features + sparse labels training

Hello and thanks for your work.

I would like to apply the GCN architecture on a graph whose nodes have no features, and also very few nodes have labels. More specifically, this is going to be a graph of words, where related words are connected with an edge, and also I have some document nodes that are connected to the words they contain. Some document nodes have labels, and the rest are left to be predicted. Word nodes are just there to help associate document nodes with one another, and, hopefully, propagate the labels from the training document nodes to the testing document nodes. The first dataset I tried is OHSUMED, if that makes a difference.

I started transforming the code to fit my needs, but I have a couple of issues:

What do I replace the feature matrix with? F is a (number of nodes) x (feature size) matrix that I have no way to populate. I tried setting it to the identity matrix, but that seems arbitrary. I also tried making this node-to-feature matrix another trainable parameter.
In the original problem, every node has a label associated with it. In my case, however, less than 0.1% of the nodes have a label. I decided to provide only the indices of the adjacency matrix that correspond to the training/validation/testing document nodes. Is there an optimal way to represent unlabeled nodes?

So far, I haven't been able to get the model to work in this problem. With several permutations of the modifications above, I get an accuracy of about 20%, far below my other baselines. Am I missing something obvious in the model definition or the optimization process?

Any help is welcome.
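The identity matrix is a standard "featureless" choice rather than an arbitrary one: with X = I, the first layer reduces to A_hat @ W0, so W0 itself acts as a trainable embedding table with one row per node. Making X trainable as well only multiplies two trainable matrices together (X @ W0), which adds parameters without adding expressiveness. Unlabeled nodes need no special representation; they are simply left out of the loss indices, as `idx_train` does for Cora. A NumPy check on a made-up 5-node path graph:

```python
import numpy as np

rng = np.random.default_rng(0)
n, hidden = 5, 4
a = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    a[i, j] = a[j, i] = 1
a_tilde = a + np.eye(n)
a_hat = a_tilde / a_tilde.sum(axis=1, keepdims=True)   # row-normalized, as in pygcn

x = np.eye(n)                       # featureless: one-hot "feature" per node
w0 = rng.normal(size=(n, hidden))   # first-layer weight: one learned row per node

# With X = I the first layer is just A_hat @ W0.
assert np.allclose(a_hat @ (x @ w0), a_hat @ w0)
```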

Using for Regression

Thank you for sharing your implementation in Pytorch.
I am using a similar GCN structure for regression analysis. Therefore, the last layer is the same as the others. My proposed GCN has the following structure:
model GCN(
  (gc1): GraphConvolution (2 -> 2)
  (gc2): GraphConvolution (2 -> 20)
  (gc3): GraphConvolution (20 -> 20)
  (gc4): GraphConvolution (20 -> 20)
  (gc5): GraphConvolution (20 -> 2)
  (gc6): GraphConvolution (2 -> 2)
)
The inputs are locations of 2D vertices and adjacency matrix of synthetic data (for simplicity a circular shape graphs).
The activation functions are tanh and the loss function is the L2 norm (because the problem is regression).
I've also initialized the weight and bias parameters as follows:
def reset_parameters(self):
    stdv = 1. / math.sqrt(10 / self.nhid)
    self.weight.data.uniform_(-stdv, stdv)
    if self.bias is not None:
        self.bias.data.fill_(0)
I feed the network noisy data (as input graphs), and the target is a circle. I expected the network to regress a circular shape, but the outputs are elliptical. I also found that the network is highly sensitive to weight initialization.
Why can't this GCN solve the regression problem? Could you please give me some advice and feedback on this?

Question about adj

In your code, the following lines create the adj matrix:

    adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])),
                        shape=(labels.shape[0], labels.shape[0]),
                        dtype=np.float32)

That means adj[edges[:, 0][k], edges[:, 1][k]] = np.ones(edges.shape[0])[k]. But the file data/cora/README says that the direction of the link is from right to left. Details are as follows:

The .cites file contains the citation graph of the corpus. Each line describes a link in the following format:

		<ID of cited paper> <ID of citing paper>

Each line contains two paper IDs. The first entry is the ID of the paper being cited and the second ID stands for the paper which contains the citation. The direction of the link is from right to left. If a line is represented by "paper1 paper2" then the link is "paper2->paper1".

I would like to ask why the code that produces the adjacency matrix is not like this:

    adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 1], edges[:, 0])),
                        shape=(labels.shape[0], labels.shape[0]),
                        dtype=np.float32)

That means adj[edges[:, 1][k], edges[:, 0][k]] = np.ones(edges.shape[0])[k].

Thanks a lot ~
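For the model in this repository the orientation makes no difference, because `utils.py` symmetrizes the matrix right after building it (the line discussed in the symmetric-adjacency issues); the convention would only matter for a directed variant. A small check with made-up edges, using `np.maximum(adj, adj.T)` as a dense equivalent of that symmetrization:

```python
import numpy as np

edges = np.array([[0, 1], [2, 1]])   # toy lines of cora.cites: <cited> <citing>
n = 3

def build(rows, cols):
    adj = np.zeros((n, n))
    adj[rows, cols] = 1
    return np.maximum(adj, adj.T)    # same effect as the symmetrization in utils.py

as_in_code = build(edges[:, 0], edges[:, 1])
reversed_dir = build(edges[:, 1], edges[:, 0])
assert np.array_equal(as_in_code, reversed_dir)   # direction is lost either way
```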

The intended output

Thank you for sharing the work. Is the model meant for detecting communities (classifying nodes with community labels)? Please include the intended output in the README; that would make it easier for beginners like me.

TypeError: expected torch.FloatTensor (got torch.LongTensor)

Hi tkipf, thank you for sharing the source code.

I ran it with PyTorch 0.4.0 and Python 2.7 and got this type error. With Python 3.5, however, it runs.
Loading cora dataset...

Traceback (most recent call last):
  File "train.py", line 49, in <module>
    dropout=args.dropout)
  File "build/bdist.linux-x86_64/egg/pygcn/models.py", line 11, in __init__
    self.gc2 = GraphConvolution(nhid, nclass)
  File "build/bdist.linux-x86_64/egg/pygcn/layers.py", line 43, in __init__
    self.bias = Parameter(torch.Tensor(out_features))
TypeError: expected torch.FloatTensor (got torch.LongTensor)

How can I solve this issue? Thank you.

Is the Cora cites graph not connected?

@tkipf

When I plotted the graph, I got many components. Is this correct? (PDF attached.) Maybe you included extra information in the edge list?

cora.pdf
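As far as I know, the Cora citation graph is indeed not connected: it has one large component plus many small ones, and GCN does not need connectivity since propagation simply stays within each component. To check the count yourself, a plain-Python union-find sketch; apply it to the edge list built from cora.cites after mapping paper IDs to row indices (`scipy.sparse.csgraph.connected_components` does the same on a sparse matrix):

```python
def count_components(num_nodes, edges):
    """Count connected components of an undirected graph with union-find."""
    parent = list(range(num_nodes))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return sum(1 for u in range(num_nodes) if find(u) == u)

# Two triangles and an isolated node -> 3 components.
print(count_components(7, [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]))  # 3
```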

Scale up to million of nodes

Hi @tkipf, thank you so much for providing the code.

I'm wondering if it's possible to scale this implementation up to millions of nodes (obviously the number of edges must scale linearly), for example a grid. I'm not familiar with PyTorch's sparse matrix implementation, so I'm not sure if representing the adjacency matrix as a sparse matrix is enough to deal with large graphs?
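Storing the adjacency is the easy part: a sparse COO matrix grows with the number of nonzeros, whereas a dense one grows with N². PyTorch sparse COO tensors store two int64 indices per nonzero plus the value. Back-of-the-envelope numbers for a million-node grid, assuming float32 values and about five nonzeros per row (four neighbours plus the self-loop):

```python
def dense_bytes(n, bytes_per_value=4):
    """Memory for a dense n x n float32 adjacency matrix."""
    return n * n * bytes_per_value

def coo_bytes(nnz, bytes_per_index=8, bytes_per_value=4):
    """COO sparse matrix: two int64 indices plus one float32 per nonzero."""
    return nnz * (2 * bytes_per_index + bytes_per_value)

n = 1_000_000
nnz = 5 * n
print(f"dense : {dense_bytes(n) / 1e12:.1f} TB")   # 4.0 TB
print(f"sparse: {coo_bytes(nnz) / 1e6:.0f} MB")    # 100 MB
```

With a sparse adjacency, the feature matrix and the per-layer activations (N x hidden floats each), which full-batch training keeps in memory for every node, become the main cost.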

Node Classification for Multiple Graphs

I have multiple graphs for a node classification task. All the examples I've seen so far were for graph classification (or used just one graph for node classification). Although I've seen people build a block-diagonal adjacency matrix, I'm not sure whether that is for graph classification or node classification. I also don't understand whether I should build the feature matrix and labels the same way.

Let's suppose I have 20 different graphs (with different numbers of nodes, edges, and features), and every node of every graph is labeled.
All the nodes in the first 10 graphs are for training, all the nodes of the next 5 graphs are for validation, and all the nodes of the last 5 graphs are for testing; I want to predict the labels of the nodes in the test graphs. How can I input multiple graphs into a GCN under these conditions for a node classification task (not graph classification)? If the solution is a block-diagonal adjacency matrix, should I do the same for the labels and the feature matrix?

What is different from the original code?

When I trained it, the result of this repository was more accurate than the one reported in the original paper (SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS) on the Cora dataset.

What is different from the original code?

non-square adj matrix

Hello, thank you for your great work.

I want to extend GCN with a form of message passing,
but I'm new to GCN, so I have a minor question.

I have two types of nodes, A and B.
Basically, I want to train different weights jointly (Weight_AA, Weight_AB, Weight_BA, Weight_BB).
During the node representation update:
A(t+1) = Weight_AA * A(t) * adj(AA) + Weight_AB * B(t) * adj(BA)
B(t+1) = Weight_BB * B(t) * adj(BB) + Weight_BA * A(t) * adj(AB)
The first terms are simple graph convolution layers, since adj(AA) and adj(BB) are both square,
but adj(BA) and adj(AB) may not be square (the numbers of nodes of the two types differ).

Can I use non-square adj matrix during the whole process? (normalize, forward, ...)
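Non-square blocks are fine as long as the shapes line up and each degree is counted on the correct side: for messages from B to A, normalize the rows of the N_A x N_B block by each A node's degree into B. `torch.spmm` itself does not require a square sparse matrix. A NumPy sketch in pygcn's convention (nodes as rows, so an update reads adj @ H @ W rather than W * H * adj); all sizes and weights are made up. A symmetric variant would be D_A^-1/2 adj(AB) D_B^-1/2, with degrees taken from each side:

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_b, d = 4, 3, 5
h_a = rng.normal(size=(n_a, d))            # type-A node features (rows = nodes)
h_b = rng.normal(size=(n_b, d))            # type-B node features

# adj_ab[i, j] = 1 if A-node i is linked to B-node j; shape (n_a, n_b), not square.
adj_ab = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]], dtype=float)

def row_normalize(m):
    deg = m.sum(axis=1, keepdims=True)
    return np.divide(m, deg, out=np.zeros_like(m), where=deg > 0)

w_ab = rng.normal(size=(d, d))             # weight for messages B -> A
w_ba = rng.normal(size=(d, d))             # weight for messages A -> B

msg_to_a = row_normalize(adj_ab) @ h_b @ w_ab     # (n_a, n_b) @ (n_b, d) -> (n_a, d)
msg_to_b = row_normalize(adj_ab.T) @ h_a @ w_ba   # (n_b, n_a) @ (n_a, d) -> (n_b, d)
```

Each message is then added to the output of the square same-type convolution for that node type, as in the update equations above.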

Question about the input matrix X, the node feature matrix

I want to know how to get the node feature matrix for my own training data. Suppose the number of nodes is N and the dimension of every feature vector is d: how can I get the input X of shape N x d?
Thanks a lot!
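X is just the node vectors stacked as rows in a fixed node order, and the same order must be used for the adjacency matrix and the labels. pygcn's `utils.py` does this by building an ID-to-row dictionary (`idx_map`) from cora.content before reading cora.cites. What the d features are depends on your data (in Cora, a bag-of-words vector per paper). A NumPy sketch with made-up IDs and vectors:

```python
import numpy as np

# Hypothetical raw data: node IDs mapped to d-dimensional vectors, plus edges by ID.
raw_features = {
    "paperA": [0.0, 1.0, 1.0],
    "paperB": [1.0, 0.0, 0.0],
    "paperC": [0.5, 0.5, 0.0],
}
raw_edges = [("paperA", "paperB"), ("paperC", "paperA")]

node_ids = list(raw_features)                        # fixes the row order
idx_map = {node: i for i, node in enumerate(node_ids)}

X = np.array([raw_features[node] for node in node_ids], dtype=np.float32)  # N x d
edges = np.array([[idx_map[u], idx_map[v]] for u, v in raw_edges])        # row indices
print(X.shape)   # (3, 3)
```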

Processing a single graph in batch mode

Hi @tkipf, thanks for your awesome work and providing this code!

I have a bit of a novice question: I'm trying to process my graph's features through a GCN in mini-batches. I.e. let's say I have a 1000-node graph and I want to process it through the GCN in mini-batches of size 50.

It doesn't seem like the code currently supports this because we have to multiply by the full adjacency matrix in the GCN layer's forward pass - do you have any sense of how I can support these "batch" operations? Better yet, do you have example code that does this?

My code looks something like:

z0 = F.tanh(self.gcn1(x, self.fancy_adj))

where x is a sampled batch of size 50 x F (not the full batch of 1000 x F) and self.fancy_adj is the adjacency matrix transformed as suggested in your paper ( adjacency + identity + row normalized). The problem, of course, is that self.fancy_adj is a 1000 x 1000 matrix. Even if I take just the rows of self.fancy_adj corresponding to the 50 points in the batch, then the adjacency matrix becomes a 50 x 1000 matrix which can't be multiplied by the 50 x F sampled batch.
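The 50 x 1000 slice is the right object; the mistake is what it multiplies. A batch node's output depends on its neighbours' features, which are usually not in the batch, so the sliced rows must be multiplied by the transformed features of all 1000 nodes, not by the 50 x F batch. For one layer this reproduces the full-graph rows exactly. With two layers, the second layer needs first-layer outputs for all neighbours of the batch, which is the problem neighbourhood-sampling methods address. A NumPy check on a random toy graph:

```python
import numpy as np

rng = np.random.default_rng(0)
n, f, f_out, batch_size = 1000, 8, 4, 50
adj = (rng.random((n, n)) < 0.01).astype(float)
adj = np.maximum(adj, adj.T) + np.eye(n)
a_hat = adj / adj.sum(axis=1, keepdims=True)       # "fancy_adj": row-normalized A + I
x = rng.normal(size=(n, f))
w = rng.normal(size=(f, f_out))

batch = rng.choice(n, size=batch_size, replace=False)

full = a_hat @ (x @ w)                 # full-graph output: 1000 x f_out
batch_out = a_hat[batch] @ (x @ w)     # (50 x 1000) @ (1000 x f_out) -> 50 x f_out
assert np.allclose(batch_out, full[batch])
```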

I am confused about the dataset

In the cora.content file, I don't know what the features mean.

Question about the normalization trick

Dear authors,

I was wondering why, in the function normalize, the power is -1 and not -1/2 as in the TensorFlow code. Is there any reason for this?

Thanks for the answer!

Data format

Hi! How do you process the data and save it in the .content format? What are the contents of 'cora.cites' and 'cora.content'? Thanks!
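According to the README shipped in data/cora, each line of cora.content holds a paper ID, then 1433 binary values saying whether each vocabulary word occurs in the paper, then the class label; cora.cites lists one citation per line as "<ID of cited paper> <ID of citing paper>". pygcn's `load_data` slices the columns the same way (`[:, 1:-1]` for features, `[:, -1]` for labels). A NumPy sketch on two made-up lines with only 4 word columns:

```python
import numpy as np

# Made-up lines in the cora.content layout: <paper_id> <word_attributes>+ <class_label>
content_lines = [
    "101 0 1 0 0 Neural_Networks",
    "202 1 0 0 1 Rule_Learning",
]
rows = np.array([line.split() for line in content_lines])
paper_ids = rows[:, 0].astype(np.int64)
features = rows[:, 1:-1].astype(np.float32)   # word present (1) / absent (0)
labels = rows[:, -1]

# cora.cites layout: <ID of cited paper> <ID of citing paper>
cites_lines = ["101 202"]
edges = np.array([line.split() for line in cites_lines]).astype(np.int64)
print(features.shape, labels.tolist(), edges.tolist())
```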

inconsistent tensor size

Hi,
When I run train.py, I get this error message:
Loading cora dataset...
Traceback (most recent call last):
  File "train.py", line 104, in <module>
    train(epoch)
  File "train.py", line 69, in train
    output = model(features, adj)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "build/bdist.linux-x86_64/egg/pygcn/models.py", line 15, in forward
    x = F.relu(self.gc1(x, adj))
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "build/bdist.linux-x86_64/egg/pygcn/layers.py", line 61, in forward
    return output + self.bias
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 745, in __add__
    return self.add(other)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 283, in add
    return self._add(other, False)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 277, in _add
    return Add(inplace)(self, other)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/basic_ops.py", line 20, in forward
    return a.add(b)
RuntimeError: inconsistent tensor size at /b/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:831

Using GCN as encoder alone for multiple small graph data.

Thanks for providing this clean example of GCN in PyTorch. I am new to graph machine learning, so I would like to ask whether this work fits the following scenario (see the figure and text below).

Given many small graphs (G1, G2, ..., Gn), each with a node feature matrix X (#nodes x #features) and an adjacency matrix A (#nodes x #nodes), I just want to use GCN as an encoder to get a representation of each graph for a downstream task (the labels are a kind of sequential tag).

(There is only one GCN layer; all blue blocks are the same.)

In addition, after checking the batch operations issue (#1): in my scenario, is it possible to simply stack the node feature matrices X and adjacency matrices A as below and apply the usual batch operation, instead of creating one big block-diagonal matrix?
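Stacking works if every graph is zero-padded to the same node count: the adjacency becomes a (B, N_max, N_max) array and the features (B, N_max, F), and a batched matrix product (`torch.bmm` or `torch.matmul` in PyTorch) replaces `torch.spmm`. The costs are dense N_max² storage per graph and a mask to ignore padded nodes, for example when pooling; the block-diagonal approach avoids padding but needs sparse operations. A NumPy sketch; `pad_batch` is a hypothetical helper, and the identity adjacencies are placeholders:

```python
import numpy as np

def pad_batch(adjs, feats):
    """Zero-pad graphs to the largest size and stack them into 3D arrays.

    Returns A (B, N_max, N_max), X (B, N_max, F) and a boolean mask
    (B, N_max) marking real, non-padding nodes.
    """
    n_max = max(a.shape[0] for a in adjs)
    b, f = len(adjs), feats[0].shape[1]
    A = np.zeros((b, n_max, n_max))
    X = np.zeros((b, n_max, f))
    mask = np.zeros((b, n_max), dtype=bool)
    for i, (a, x) in enumerate(zip(adjs, feats)):
        n = a.shape[0]
        A[i, :n, :n] = a
        X[i, :n] = x
        mask[i, :n] = True
    return A, X, mask

rng = np.random.default_rng(0)
adjs = [np.eye(3), np.eye(5)]              # placeholder normalized adjacencies
feats = [rng.normal(size=(3, 4)), rng.normal(size=(5, 4))]
W = rng.normal(size=(4, 2))

A, X, mask = pad_batch(adjs, feats)
H = A @ X @ W                              # batched: (2, 5, 5) @ (2, 5, 4) @ (4, 2)
assert np.allclose(H[0, :3], adjs[0] @ feats[0] @ W)   # matches the unpadded graph
assert np.allclose(H[0, 3:], 0.0)                       # padded rows stay zero (no bias)
```

With a bias term the padded rows are no longer zero, so the mask is then required before pooling.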