
pygcn's People

Contributors

amar-iastate, nkolot, queuecumber, rusty1s, tkipf, wbadart

pygcn's Issues

Order of nodes

Hi,
Thank you for your implementation, both in TensorFlow and PyTorch.
Could you provide the order of nodes used for the Cora, Citeseer and Pubmed datasets in your TensorFlow implementation?

I looked into the code in this repository and can see that the order comes from cora.content.

However, it must differ from the order in the TensorFlow repo (where binary files are provided): when I use the adjacency matrix created by this (PyTorch) repo with the TensorFlow code, the result produced by the TensorFlow code is very low, so the two adjacency matrices must be different.
Many thanks.

GPU usage is 0 when training

When training with the default setting, GPU usage is 0% while CPU is 100%.

The training code seems to use CUDA, but it does not seem to make training any faster.

Why?

Could I get the node embedding?

Hi,

Could I get node embeddings of a certain length based on this code? Which step's output should I extract?

Thanks,

determinism problem

Hi Kipf,

Nice work. I am trying to run the model but fail to get the same result every time, even with the lines below.

    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    if args.cuda:
        torch.cuda.manual_seed(args.seed)
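
For reference, a minimal sketch of a fuller seeding routine, assuming a recent PyTorch version. The helper name `seed_everything` is hypothetical and not part of this repository. Even with all of these set, some CUDA kernels may remain non-deterministic, so run-to-run differences on GPU do not necessarily mean the seeds are being ignored.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed the Python, NumPy and PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    # Ask cuDNN for deterministic algorithms and disable autotuning.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(42)
a = torch.rand(3)
seed_everything(42)
b = torch.rand(3)
print(torch.equal(a, b))  # same seed, same random draw
```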

Xavier initialization use output size instead of input size

I noticed that the standard deviation used for initialization is computed from the output size instead of the input size. Is this intended?

    self.weight = Parameter(torch.FloatTensor(in_features, out_features))
    stdv = 1. / math.sqrt(self.weight.size(1))
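
To make the question concrete, here is a small sketch comparing the bound the quoted line produces with two alternatives. The layer sizes (1433 inputs, 16 hidden units) are the Cora sizes mentioned elsewhere on this page and are used purely as an illustration. Because the weight has shape (in_features, out_features), `size(1)` is the output size. The Glorot/Xavier uniform bound shown for comparison uses both sizes.

```python
import math

in_features, out_features = 1433, 16  # illustrative sizes

# The weight has shape (in_features, out_features), so size(1) == out_features.
stdv_from_output = 1.0 / math.sqrt(out_features)
stdv_from_input = 1.0 / math.sqrt(in_features)
# Glorot/Xavier uniform bound, which depends on both sizes.
glorot_bound = math.sqrt(6.0 / (in_features + out_features))

print(stdv_from_output, stdv_from_input, glorot_bound)
```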

Batch operations

Hi @tkipf, excellent work on pygcn! Really nice engineering setting up sparse adjacency multiplications, and super clean code. I'm curious to hear how you suggest dealing with batch operations. Unless I am misunderstanding, it looks from train.py as if each epoch operates on a single large graph, and the labels are per-node labels. If this interpretation is correct, do you have any suggestions for datasets consisting of many graphs (a series of sparse matrices), each mapped to a graph-level output/label? This would be solved if PyTorch could accept a list of tensors as an input, but that does not seem (easily) supported right now. Thanks for any advice!

Cheers,
Evan

PS Great to meet you at Stanford a few weeks ago!
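
One common way to handle many small graphs, sketched here under the assumption that the graphs are merged into a single block-diagonal adjacency matrix with their node features stacked row-wise: run the usual node-level layers once, then pool the node outputs per graph. The names `graph_id` and `pool` are illustrative and not part of pygcn.

```python
import numpy as np

# Node outputs for a batch of 2 graphs (3 nodes in total), e.g. the
# output of the last GCN layer on the block-diagonal batch.
node_out = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [10.0, 20.0]])
# graph_id[i] is the index of the graph that node i belongs to.
graph_id = np.array([0, 0, 1])

num_graphs = graph_id.max() + 1
num_nodes = node_out.shape[0]

# pool[g, i] = 1 / |graph g| if node i belongs to graph g, else 0.
pool = np.zeros((num_graphs, num_nodes))
pool[graph_id, np.arange(num_nodes)] = 1.0
pool /= pool.sum(axis=1, keepdims=True)

graph_out = pool @ node_out  # one row per graph: mean over its nodes
print(graph_out)
```

The pooled rows can then be fed to a graph-level classifier and loss.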

About symmetrically normalization of adjacency matrix

Hi, I noticed that in this PyTorch version the adjacency matrix is row-normalized instead of symmetrically normalized. However, the accuracy (82.5%) is higher than that of the TensorFlow version (81.6%). I also tried symmetrically normalizing the adjacency matrix in this PyTorch version, but the accuracy dropped to 79.9%. The result of the TensorFlow version, in contrast, does not change when I switch the normalization. To summarize, these are the experiments I ran:

Cora dataset                TensorFlow   PyTorch
Symmetric normalization     81.6         79.9
Row normalization           81.6         82.5

Any idea why this happens?

The SpecialSpmmFunction Class question

I noticed that SpecialSpmmFunction is a subclass of torch.autograd.Function and that only one instance is created in the class SpGraphAttentionLayer.

    class SpGraphAttentionLayer(nn.Module):
        def __init__(self, in_features, out_features, dropout, alpha, concat=True):
            super(SpGraphAttentionLayer, self).__init__()
            self.in_features = in_features
            self.out_features = out_features
            self.alpha = alpha
            self.concat = concat

            self.W = nn.Parameter(torch.zeros(size=(in_features, out_features)))
            nn.init.xavier_normal_(self.W.data, gain=1.414)

            self.a = nn.Parameter(torch.zeros(size=(1, 2*out_features)))
            nn.init.xavier_normal_(self.a.data, gain=1.414)

            self.dropout = nn.Dropout(dropout)
            self.leakyrelu = nn.LeakyReLU(self.alpha)
            self.special_spmm = SpecialSpmm()

But the official documentation says that each function object is meant to be used only once (in the forward pass). I found that self.special_spmm is called twice in the forward pass:

    e_rowsum = self.special_spmm(edge, edge_e, torch.Size([N, N]), torch.ones(size=(N,1), device=dv))
    # e_rowsum: N x 1

    edge_e = self.dropout(edge_e)
    # edge_e: E

    # Each function object is meant to be used only once (in the forward pass).
    h_prime = self.special_spmm(edge, edge_e, torch.Size([N, N]), h)

Have I misunderstood something?
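
A small sketch that may help here, assuming a PyTorch version with the static-method style of `torch.autograd.Function`. In that style, an `nn.Module` wrapper calls `Function.apply`, and each `apply` call receives its own context object, so calling the same module twice in one forward pass does not reuse a function object. `Scale` and `ScaleModule` are made-up names for illustration; whether `SpecialSpmm` is written this way should be checked against its definition.

```python
import torch
import torch.nn as nn


class Scale(torch.autograd.Function):
    """Toy custom op: multiplies the input by a constant factor."""

    @staticmethod
    def forward(ctx, x, factor):
        ctx.factor = factor  # stored on a per-call context object
        return x * factor

    @staticmethod
    def backward(ctx, grad_output):
        # One gradient per forward input; the float factor needs none.
        return grad_output * ctx.factor, None


class ScaleModule(nn.Module):
    def forward(self, x, factor):
        return Scale.apply(x, factor)


op = ScaleModule()                  # a single module instance
x = torch.ones(3, requires_grad=True)
y = op(x, 2.0) + op(x, 5.0)         # used twice in the same forward pass
y.sum().backward()
print(x.grad)                       # gradients from both calls accumulate
```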

How to achieve the training using multi-graph (each graph/mesh from an 3D medical image)?

Hi,

Thanks for your work. I'm a complete beginner in computer vision.

Recently, I have been trying to do medical image segmentation using GCNs.

However, reading your code, I find that the input is one whole graph that includes the training, validation, and test data. In my framework, each training image is treated as its own graph, and there are no connections between images.

Thus, I'm confused about how to achieve this.

Besides, I would like to know whether the input feature of a node can be a patch extracted around the target node.

Look forward to your reply.

Thanks,
lei

Vocabulary items

Hi! I'd be very curious to know what the words in the vocabulary are. Could I find that somewhere?

Question about the results

Hello, thank you for your amazing work! I am a beginner in this field, and I'm confused about the results. I ran the code in PyCharm and wondered why I get a different output every time I run it. As far as I can see, the random seeds are fixed at the beginning. Am I missing something?

build symmetric adjacency matrix

Hi tkipf, thank you for your amazing work!
In utils.py, starting from line 36:

    # build symmetric adjacency matrix
    adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)

As far as I understand, these lines turn a directed adjacency matrix into an undirected one?
Since adj is a 0-1 matrix, at every position where adj.T > adj we must have adj[i,j] = 0, so the - adj.multiply(adj.T > adj) part is always zero.

What is the purpose of that part, then, or am I understanding it incorrectly?
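
A quick way to check this reasoning is to run the same expression on small dense arrays (a NumPy analogue of the sparse code, written for illustration). For a 0-1 matrix the subtracted term is indeed zero; it only matters when entries can differ from 0 and 1, for example weighted edges. In both cases the result equals the elementwise maximum of adj and adj.T, whereas a plain adj + adj.T would double-count mutual edges.

```python
import numpy as np


def symmetrize(adj):
    """Dense analogue of the expression used in utils.py."""
    mask = adj.T > adj
    return adj + adj.T * mask - adj * mask


# Directed 0-1 graph: 0->1, 1->0 (mutual) and 0->2 (one-way).
binary = np.array([[0., 1., 1.],
                   [1., 0., 0.],
                   [0., 0., 0.]])
mask = binary.T > binary
print(np.count_nonzero(binary * mask))  # size of the subtracted term
print(symmetrize(binary))               # symmetric, still 0-1
print(binary + binary.T)                # naive sum counts 0<->1 twice

# Weighted example where the subtracted term is needed.
weighted = np.array([[0., 2.],
                     [5., 0.]])
print(symmetrize(weighted))
```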

load own adjacency matrix and features

I have my own arrays for the adjacency matrix and the features, so I did not use the load function in utils.py.
Is the following the right way to prepare the adjacency and feature data for training the graph CNN? Here features and adj are full (dense) NumPy arrays:

    features = sp.csr_matrix(features, dtype=np.float32)
    features = normalize(features)
    features = np.array(features.todense())
    adj = sp.csr_matrix(adj)
    adj = normalize(adj + sp.eye(adj.shape[0]))
    adj = sparse_mx_to_torch_sparse_tensor(adj)

Another question: why is "adj + sp.eye(adj.shape[0])" needed if adj is already an adjacency matrix? Thanks.
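
On the second question: adding the identity gives every node a self-loop, so a node's own features are included when its neighbors are aggregated. A tiny dense sketch with illustrative values only:

```python
import numpy as np

adj = np.array([[0., 1.],
                [1., 0.]])          # two nodes joined by one edge
features = np.array([[1.],
                     [10.]])

# Without self-loops each node only sees its neighbor's feature.
without_loops = adj @ features

# With self-loops, followed by row normalization (D^-1 A).
adj_loops = adj + np.eye(adj.shape[0])
adj_norm = adj_loops / adj_loops.sum(axis=1, keepdims=True)
with_loops = adj_norm @ features

print(without_loops.ravel(), with_loops.ravel())
```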

What is the difference between two adjacency matrix normalization?

Hello, thanks for the amazing work. Your implementation uses D^-1 A, but I noticed that some other work uses D^-1/2 A D^-1/2. I suppose these two calculations do not give the same normalized adjacency matrix. Which one should I choose? Or do they give the same performance?
I think that in a large graph (where A is very big), D^-1/2 A D^-1/2 may be roughly equal to D^-1 A. Is that correct?
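
The two normalizations can be compared directly on a small graph. This is a dense NumPy sketch for illustration, not the repository's sparse code. Row normalization makes every row sum to 1 but is generally not symmetric; symmetric normalization keeps the matrix symmetric. The two coincide exactly when all nodes have the same degree, which depends on the degree distribution rather than on graph size.

```python
import numpy as np

# Node 0 is connected to nodes 1 and 2; self-loops are added.
adj = np.array([[0., 1., 1.],
                [1., 0., 0.],
                [1., 0., 0.]]) + np.eye(3)
deg = adj.sum(axis=1)

row_norm = adj / deg[:, None]                                    # D^-1 A
sym_norm = adj / np.sqrt(deg)[:, None] / np.sqrt(deg)[None, :]   # D^-1/2 A D^-1/2

print(row_norm.sum(axis=1))                # row sums of D^-1 A
print(np.allclose(row_norm, row_norm.T))   # is D^-1 A symmetric?
print(np.allclose(sym_norm, sym_norm.T))   # is D^-1/2 A D^-1/2 symmetric?

# Regular graph (triangle with self-loops, degree 3 everywhere).
regular = np.ones((3, 3))
d = regular.sum(axis=1)
regular_row = regular / d[:, None]
regular_sym = regular / np.sqrt(d)[:, None] / np.sqrt(d)[None, :]
print(np.allclose(regular_row, regular_sym))
```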

Citeseer data set accuracy

Dear professor,
Hello!
I am very interested in your recent GCN work.
Thanks for sharing the code. I used the GCN network on the Citeseer dataset, but the accuracy did not reach 70.3. How did you set the parameters to get such a high accuracy? Thanks a lot for sharing the code, anyway.

Many thanks for your help.

Values of some parameters

In the module named "MODELS":

    class GCN(nn.Module):
        def __init__(self, nfeat, nhid, nclass, dropout):

the values of the parameters nfeat, nhid and nclass are not specified in the code, and I am confused about how the code gets them.

It would be great if you could help me out.
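
These are constructor arguments, so they are supplied wherever the model is instantiated (train.py in this repository). A sketch of how such values are usually derived from the data, using toy arrays; the hidden size of 16 and the dropout value in the commented call are placeholders, not confirmed defaults:

```python
import numpy as np

features = np.zeros((5, 7))          # 5 nodes, 7 features per node
labels = np.array([0, 2, 1, 2, 0])   # integer class label per node

nfeat = features.shape[1]            # input feature dimension
nclass = int(labels.max()) + 1       # number of classes
nhid = 16                            # hidden units: a hyperparameter you choose

print(nfeat, nhid, nclass)
# The model would then be built along these lines:
# model = GCN(nfeat=nfeat, nhid=nhid, nclass=nclass, dropout=0.5)
```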

F.log_softmax(x, dim=1) output is not probability?

Hi,

Does calling output = model(features, adj) not give probabilities as output? If I want the model to return probabilities, what should I change?
If I change log_softmax to softmax, does the loss function F.nll_loss have to change as well?
thanks.
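
A short sketch of the relationship, assuming standard `torch.nn.functional` behavior: `log_softmax` returns log-probabilities, so exponentiating them gives probabilities, and `F.nll_loss` expects log-probabilities as input. If the model returned raw scores instead, `F.cross_entropy` would give the same loss. The tensors below are made up.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # raw scores: 2 nodes x 3 classes
target = torch.tensor([0, 2])

log_probs = F.log_softmax(logits, dim=1)   # log-probabilities
probs = log_probs.exp()                    # probabilities, rows sum to 1

loss_nll = F.nll_loss(log_probs, target)   # takes log-probabilities
loss_ce = F.cross_entropy(logits, target)  # same value, from raw scores

print(probs.sum(dim=1), loss_nll.item(), loss_ce.item())
```

So the model can keep `log_softmax` for training, and `.exp()` can be applied to its output whenever probabilities are needed. Passing plain `softmax` output to `F.nll_loss` would not compute the intended loss.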

the normalize function in utils.py

Hi,
The normalize function in utils.py only normalizes the rows of the adjacency matrix, while the TensorFlow version normalizes both rows and columns. Will this lead to a difference in the accuracy of the GCN?

Best,
Xiaoyun

Still curious about the code of building symmetric adjacency matrix

    adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])),
                        shape=(labels.shape[0], labels.shape[0]),
                        dtype=np.float32)

    # build symmetric adjacency matrix
    adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)

I don't understand how the last line of code produces the symmetric matrix.
I think it would be more intuitive to build the symmetric matrix like this:

    adj = adj + adj.T

Can anyone help answer my questions? Thanks a lot.

How to implement residual connections in gcn?

Hi @tkipf . I have some confusion about residual connection in GCN.

def forward(self, x, adj):
        # x size: [2708, 1433]
        # adj size: [2708, 2708]
        x = F.relu(self.gc1(x, adj))   # [2708, 16]
        ......

The input size is [2708, 1433], but the first layer's output size is [2708, 16].

[image: residual connection equation]
If I implement the residual connection as in the equation above:

    x = F.relu(self.gc1(x, adj)) + x

I get a size mismatch error.
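
One common workaround, sketched here with NumPy and random weights purely to show the shapes: add the skip connection only where input and output sizes match, or first project the input to the output size with an extra weight matrix. `w_skip` is a hypothetical extra parameter, not something that exists in pygcn.

```python
import numpy as np

rng = np.random.default_rng(0)
n, nfeat, nhid = 6, 10, 4

adj = np.eye(n)                          # placeholder normalized adjacency
x = rng.normal(size=(n, nfeat))
w1 = rng.normal(size=(nfeat, nhid))      # first layer weight
w2 = rng.normal(size=(nhid, nhid))       # second layer keeps the size
w_skip = rng.normal(size=(nfeat, nhid))  # hypothetical projection for the skip


def relu(a):
    return np.maximum(a, 0.0)


# Sizes differ (nfeat != nhid): project x before adding it.
h1 = relu(adj @ x @ w1) + x @ w_skip
# Sizes match (nhid == nhid): a plain identity skip works.
h2 = relu(adj @ h1 @ w2) + h1

print(h1.shape, h2.shape)
```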

Multiple instance learning and GCN..

Hi,
Thank you for providing such a great model. I would like to ask: can we apply multiple instance learning to text-GCN? And at which level would that be, graph classification or node classification?

Thank you in advance

datasets partition

Hi tkipf, thanks for sharing.

There are a total of 2708 lines in cora.content. However, in utils.py the data is split as follows:

    idx_train = range(140)
    idx_val = range(200, 500)
    idx_test = range(500, 1500)

May I ask what is the reason for splitting in this way? Thank you

No features + sparse labels training

Hello and thanks for your work.

I would like to apply the GCN architecture on a graph whose nodes have no features, and also very few nodes have labels. More specifically, this is going to be a graph of words, where related words are connected with an edge, and also I have some document nodes that are connected to the words they contain. Some document nodes have labels, and the rest are left to be predicted. Word nodes are just there to help associate document nodes with one another, and, hopefully, propagate the labels from the training document nodes to the testing document nodes. The first dataset I tried is OHSUMED, if that makes a difference.

I started transforming the code to fit my needs, but I have a couple of issues:

  1. What do I replace the feature matrix with? F is a (number of nodes) x (number of features) matrix that I have no way to populate. I tried setting it to an identity matrix, but that seems arbitrary. I also tried making this node-feature matrix a trainable parameter.

  2. In the original problem, every node has a label associated with it. In my case, however, fewer than 0.1% of the nodes have a label. I decided to just provide the indices of the adjacency matrix that correspond to the training/validation/testing document nodes. Is there an optimal way to represent unlabeled nodes?

So far, I haven't been able to get the model to work in this problem. With several permutations of the modifications above, I get an accuracy of about 20%, far below my other baselines. Am I missing something obvious in the model definition or the optimization process?
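
On the first point, an identity feature matrix is a common choice for featureless graphs rather than an arbitrary one: with X = I, the product X @ W is just W, so the first layer's weight acts as one learnable embedding row per node (which is close to the trainable-feature variant described above). On the second point, unlabeled nodes can stay in the graph while the loss is computed only on the labeled indices. A NumPy sketch with made-up sizes; `idx_labeled` is an illustrative name:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_hidden, n_classes = 5, 3, 2

x = np.eye(n_nodes)                        # featureless: one-hot node ids
w = rng.normal(size=(n_nodes, n_hidden))   # first-layer weight
assert np.allclose(x @ w, w)               # X @ W reduces to W itself

# Log-probabilities for every node (random stand-in for model output).
scores = rng.normal(size=(n_nodes, n_classes))
log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))

labels = np.array([1, 0, 0, 0, 0])  # only entries at idx_labeled are real
idx_labeled = np.array([0, 3])      # the few nodes that have labels

# Negative log-likelihood averaged over the labeled nodes only.
loss = -log_probs[idx_labeled, labels[idx_labeled]].mean()
print(loss)
```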

Any help is welcome.

Using for Regression

Hi

Thank you for sharing your implementation in PyTorch.
I am using a similar GCN structure for regression analysis, so the last layer is the same as the others. My proposed GCN has the following structure:

    model GCN(
      (gc1): GraphConvolution (2 -> 2)
      (gc2): GraphConvolution (2 -> 20)
      (gc3): GraphConvolution (20 -> 20)
      (gc4): GraphConvolution (20 -> 20)
      (gc5): GraphConvolution (20 -> 2)
      (gc6): GraphConvolution (2 -> 2)
    )

The inputs are the 2D vertex locations and the adjacency matrix of synthetic data (for simplicity, circular graphs).
The activation functions are tanh and the loss function is the L2 norm (because the problem is regression).
I have also initialized the weight and bias parameters as follows:

    def reset_parameters(self):
        stdv = 1. / math.sqrt(10/self.nhid)
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.fill_(0)

I feed the network noisy data (as input graphs), and the target is a circle. I expect the network to regress a circular shape, but the outputs are elliptical. I found that this network is highly sensitive to weight initialization.
Why can't this GCN solve a regression problem? Could you please give me some advice and feedback on this?

Question about adj

Your code creates the adj matrix as follows:

    adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])),
                        shape=(labels.shape[0], labels.shape[0]),
                        dtype=np.float32)

That means adj[edges[:, 0][k], edges[:, 1][k]] = np.ones(edges.shape[0])[k]. But the file data/cora/README says that the direction of the link is from right to left. Details are as follows:

The .cites file contains the citation graph of the corpus. Each line describes a link in the following format:

		<ID of cited paper> <ID of citing paper>

Each line contains two paper IDs. The first entry is the ID of the paper being cited and the second ID stands for the paper which contains the citation. The direction of the link is from right to left. If a line is represented by "paper1 paper2" then the link is "paper2->paper1". 

I would like to ask why the code that produces the adjacency matrix is not like this:

    adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 1], edges[:, 0])),
                        shape=(labels.shape[0], labels.shape[0]),
                        dtype=np.float32)

That means adj[edges[:, 1][k], edges[:, 0][k]] = np.ones(edges.shape[0])[k].
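
One thing worth checking, sketched here with dense NumPy arrays: if the matrix is symmetrized afterwards (as utils.py does in its "build symmetric adjacency matrix" step), the two constructions produce the same undirected adjacency, so the column order would only matter for a model that uses edge direction. The edge list below is made up.

```python
import numpy as np


def symmetrize(m):
    """Dense analogue of the symmetrization expression in utils.py."""
    mask = m.T > m
    return m + m.T * mask - m * mask


edges = np.array([[0, 1],
                  [2, 1],
                  [3, 0]])
n = 4

a = np.zeros((n, n))
a[edges[:, 0], edges[:, 1]] = 1.0   # rows taken from column 0, as in the code
b = np.zeros((n, n))
b[edges[:, 1], edges[:, 0]] = 1.0   # rows taken from column 1, as proposed

print(np.array_equal(b, a.T))                         # differ by a transpose
print(np.array_equal(symmetrize(a), symmetrize(b)))   # same after symmetrizing
```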

Thanks a lot ~

The intended output

Thank you for sharing the work. Is the model for detecting communities? (classifying nodes with community labels?) Please include the intended output in the Readme. Then it will be easier for beginners like me.

TypeError: expected torch.FloatTensor (got torch.LongTensor)

Hi tkipf, thank you for sharing the source code.

I ran it on PyTorch 0.4.0 with Python 2.7 and got the type error below. With Python 3.5, however, it runs.
Loading cora dataset...

Traceback (most recent call last):
  File "train.py", line 49, in <module>
    dropout=args.dropout)
  File "build/bdist.linux-x86_64/egg/pygcn/models.py", line 11, in __init__
    self.gc2 = GraphConvolution(nhid, nclass)
  File "build/bdist.linux-x86_64/egg/pygcn/layers.py", line 43, in __init__
    self.bias = Parameter(torch.Tensor(out_features))
TypeError: expected torch.FloatTensor (got torch.LongTensor)

May I ask how to solve this issue? Thank you

Scale up to million of nodes

Hi @tkipf, thank you so much for providing the code.

I'm wondering if it's possible to scale this implementation up to millions of nodes (obviously the number of edges must scale linearly), for example a grid. I'm not familiar with PyTorch's sparse matrix implementation, so I'm not sure if representing the adjacency matrix as a sparse matrix is enough to deal with large graphs?

Node Classification for Multiple Graphs

I have multiple graphs for a node classification task. All the examples I have seen so far were for graph classification (or had just one graph for node classification). Although I have seen the approach of building a block-diagonal adjacency matrix, I'm not sure whether it is meant for graph classification or node classification. I also don't understand whether I should build block-diagonal matrices for the feature matrix and labels as well.

Suppose I have 20 different graphs (with different numbers of nodes, edges, and features), and every node of every graph is labeled.
All the nodes in the first 10 graphs are for training, all the nodes in the next 5 graphs are for validation, and all the nodes in the last 5 graphs are for testing. What I am trying to do is predict the labels of the nodes in the test-set graphs. How can I feed multiple graphs into a GCN under these conditions for node classification (not graph classification)? If the solution is a block-diagonal adjacency matrix, should I do the same for the labels and the feature matrix?
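
A sketch of the block-diagonal approach for node classification, assuming all graphs share the same feature dimension (which shared layer weights require). Only the adjacency matrices are combined block-diagonally; features and labels are stacked row-wise in the same graph order. Train/validation/test splits then become index ranges over the stacked nodes. All arrays here are toy data.

```python
import numpy as np
import scipy.sparse as sp

# Two toy graphs with 2 and 3 nodes, 4 features per node.
adjs = [sp.csr_matrix(np.array([[0., 1.],
                                [1., 0.]])),
        sp.csr_matrix(np.array([[0., 1., 0.],
                                [1., 0., 1.],
                                [0., 1., 0.]]))]
feats = [np.ones((2, 4)), np.zeros((3, 4))]
labels = [np.array([0, 1]), np.array([1, 1, 0])]

adj = sp.block_diag(adjs, format="csr")   # (5, 5), no edges across graphs
x = np.vstack(feats)                      # (5, 4), stacked row-wise
y = np.concatenate(labels)                # (5,)

# Node index ranges per graph, usable as train/val/test indices.
sizes = [a.shape[0] for a in adjs]
offsets = np.concatenate([[0], np.cumsum(sizes)])
idx_graph0 = np.arange(offsets[0], offsets[1])
idx_graph1 = np.arange(offsets[1], offsets[2])

print(adj.shape, x.shape, y.shape, idx_graph0, idx_graph1)
```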

What is different from the original code?

When I trained it, this repository's result on the Cora dataset was more accurate than the one reported in the original paper (SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS).

What is different from the original code?

non-square adj matrix

Hello, thank you for your great work.

I want to extend GCN, which involves message passing, but I'm new to GCN, so I have a minor question.

I have two types of nodes, A and B.
Basically, I want to train different weights jointly (Weight_AA, Weight_AB, Weight_BA, Weight_BB).
During the node representation update:

    A(t+1) = Weight_AA * A(t) * adj(AA) + Weight_AB * B(t) * adj(BA)
    B(t+1) = Weight_BB * B(t) * adj(BB) + Weight_BA * A(t) * adj(AB)

The first terms are simple graph convolution layers, where adj(AA) and adj(BB) are both square, but adj(BA) and adj(AB) might not be square (the numbers of the two node types can differ).

Can I use non-square adj matrix during the whole process? (normalize, forward, ...)
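
Rectangular adjacency blocks are fine as long as the shapes line up. Below is a NumPy shape sketch under one possible convention (adjacency on the left, weight on the right, as in `adj @ x @ w`), where `adj_ab` has one row per A node and one column per B node. Names and sizes are illustrative, and row normalization is applied to each block separately.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_b, d_in, d_out = 4, 6, 3, 2


def row_normalize(m):
    """Divide each row of a 0-1 matrix by its sum; empty rows stay zero."""
    return m / np.maximum(m.sum(axis=1, keepdims=True), 1.0)


def random_block(rows, cols):
    return row_normalize(rng.integers(0, 2, size=(rows, cols)).astype(float))


adj_aa = random_block(n_a, n_a)   # A nodes receive from A nodes (square)
adj_bb = random_block(n_b, n_b)   # B nodes receive from B nodes (square)
adj_ab = random_block(n_a, n_b)   # A nodes receive from B nodes (4 x 6)
adj_ba = random_block(n_b, n_a)   # B nodes receive from A nodes (6 x 4)

h_a = rng.normal(size=(n_a, d_in))
h_b = rng.normal(size=(n_b, d_in))
w_aa, w_ab, w_ba, w_bb = (rng.normal(size=(d_in, d_out)) for _ in range(4))

# Each node type aggregates from both types with separate weights.
h_a_next = adj_aa @ h_a @ w_aa + adj_ab @ h_b @ w_ab
h_b_next = adj_bb @ h_b @ w_bb + adj_ba @ h_a @ w_ba

print(h_a_next.shape, h_b_next.shape)
```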

question about inputs matrix X, the node feature matirx

I want to know how to build the node feature matrix from my own training data. Suppose the number of nodes is N and the dimension of every feature vector is d; how can I get the input X with shape N*d?
Thanks a lot!

Processing a single graph in batch mode

Hi @tkipf, thanks for your awesome work and providing this code!

I have a bit of a novice question: I'm trying to process my graph's features through a GCN in mini-batches. I.e. let's say I have a 1000-node graph and I want to process it through the GCN in mini-batches of size 50.

It doesn't seem like the code currently supports this because we have to multiply by the full adjacency matrix in the GCN layer's forward pass - do you have any sense of how I can support these "batch" operations? Better yet, do you have example code that does this?

My code looks something like:

z0 = F.tanh(self.gcn1(x, self.fancy_adj))

where x is a sampled batch of size 50 x F (not the full batch of 1000 x F) and self.fancy_adj is the adjacency matrix transformed as suggested in your paper (adjacency + identity, row-normalized). The problem, of course, is that self.fancy_adj is a 1000 x 1000 matrix. Even if I take just the rows of self.fancy_adj corresponding to the 50 points in the batch, the adjacency matrix becomes a 50 x 1000 matrix, which can't be multiplied by the 50 x F sampled batch.
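
For a single layer the shapes work out if the sliced adjacency rows are multiplied by the full feature matrix rather than by the sampled batch: a 50 x 1000 slice times the 1000 x F features gives 50 x F. A NumPy sketch with smaller made-up sizes follows. For deeper models, each layer needs the previous layer's output for all neighbors of the batch nodes, which is the problem neighbor-sampling schemes address.

```python
import numpy as np

rng = np.random.default_rng(0)
n, f, h, batch_size = 20, 5, 3, 4

adj = rng.random((n, n))
adj /= adj.sum(axis=1, keepdims=True)   # stand-in for the normalized adjacency
x = rng.normal(size=(n, f))             # full feature matrix
w = rng.normal(size=(f, h))             # layer weight

batch = rng.choice(n, size=batch_size, replace=False)

# Rows of adj for the batch nodes, times ALL node features.
out_batch = adj[batch] @ x @ w          # (batch_size, h)

# Same values as the full-graph layer restricted to the batch rows.
out_full = adj @ x @ w
print(out_batch.shape, np.allclose(out_batch, out_full[batch]))
```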

Question normalization trick

Dear authors,

I was wondering why the power in the normalize function is -1 and not -1/2 as in the TensorFlow code. Is there a reason for this?

Thanks for the answer !

Data format

Hi! How do you process the data and save it in the .content format? What are the contents of 'cora.cites' and 'cora.content'? Thanks!

inconsistent tensor size

Hi,
When I run train.py I get the following error message:

    Loading cora dataset...
    Traceback (most recent call last):
      File "train.py", line 104, in <module>
        train(epoch)
      File "train.py", line 69, in train
        output = model(features, adj)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
        result = self.forward(*input, **kwargs)
      File "build/bdist.linux-x86_64/egg/pygcn/models.py", line 15, in forward
        x = F.relu(self.gc1(x, adj))
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
        result = self.forward(*input, **kwargs)
      File "build/bdist.linux-x86_64/egg/pygcn/layers.py", line 61, in forward
        return output + self.bias
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 745, in __add__
        return self.add(other)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 283, in add
        return self._add(other, False)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 277, in _add
        return Add(inplace)(self, other)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/basic_ops.py", line 20, in forward
        return a.add(b)
    RuntimeError: inconsistent tensor size at /b/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:831

Using GCN as encoder alone for multiple small graph data.

Hi

Thanks for providing this clean example of GCN in PyTorch. I am new to graph machine learning, so I would like to ask whether this work fits the following scenario (see the figure and text below).

Given many small graphs (G1, G2, ..., Gn), each with a node feature matrix X (#nodes * #features) and an adjacency matrix A (#nodes * #nodes), I just want to use GCN as an encoder to get a representation of each graph for the downstream task (the labels are a kind of sequential tags).

[figure: tmp1]
(There is only one GCN layer; all blue blocks are the same.)

In addition, after checking the batch operation issue (#1): in my scenario, is it possible to just stack the node feature matrices X and their adjacency matrices A as shown below, and then apply the same batch operation as usual, instead of creating one big block-diagonal matrix?

[figure: tmp2]
