seongjunyun / graph_transformer_networks
Graph Transformer Networks (authors' PyTorch implementation for the NeurIPS 2019 paper)
I found the raw data from the authors of 'Heterogeneous Graph Attention Networks'.
However, I cannot figure out how to build the input files.
Could you share the preprocessing code for the IMDB dataset?
Hi,
I am a GPU architecture and systems researcher looking to use your work as part of a characterization study. Is main_sparse.py the GPU script, or is it main.py? Also, how do I vary the batch size in your code?
Thank you.
Hi, thanks for your great work!
I notice that your paper shows the metapath APCPA is important in the DBLP dataset. I am confused about how to get this importance score, since you stack only three GTN layers.
Regards!
It seems that a newer version of torch-sparse (I used version 0.6.7) has the proper backward function, so there is no need to install torch-sparse-old (actually, I failed to install it). After replacing torch-sparse-old with torch-sparse, I successfully ran the code.
These are the package versions I used:
torch 1.10.2
torch-cluster 1.5.7
torch-geometric 2.0.3
torch-scatter 2.0.5
torch-sparse 0.6.7
torch-spline-conv 1.2.0
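For anyone verifying that their environment has a working backward pass through sparse matrix multiplication (the reason torch-sparse-old is no longer needed), here is a minimal self-contained sketch. It uses the built-in torch.sparse API as a stand-in illustration, not the torch_sparse package itself:

```python
import torch

# Check that sparse @ dense matmul supports autograd in recent PyTorch.
# This uses torch.sparse.mm as an illustration; the torch_sparse package
# exposes the same capability through its own spmm routines.
indices = torch.tensor([[0, 1, 1], [2, 0, 2]])
values = torch.tensor([3.0, 4.0, 5.0])
A = torch.sparse_coo_tensor(indices, values, (2, 3))

x = torch.randn(3, 4, requires_grad=True)
out = torch.sparse.mm(A, x)   # sparse @ dense
out.sum().backward()          # gradient flows back through the sparse matmul
print(x.grad.shape)           # torch.Size([3, 4])
```

If this runs without error and `x.grad` is populated, the installed torch-sparse stack is recent enough for the dense-input gradient path.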
In the DBLP dataset, I guess there are 4057, 14328, and 20 nodes of type paper, author, and conference, respectively, and they are aligned in this order along each dimension.
In your paper, you point out that the top-3 metapaths between target nodes in DBLP are APA, APCPA, etc. But I find that train_idx is within the range [0, 4056], which means P-type nodes are classified rather than A-type?
Is there a mistake in the paper's description, or am I misunderstanding something?
Are self-connected edges removed from fastGTN?
When I print weight in GTConv (actually, I print Ws in GTN), I find it is always [[0.2, 0.2, 0.2, 0.2, 0.2], [0.2, 0.2, 0.2, 0.2, 0.2]]; it seems it never updates. Why would this happen?
Hi
The variable "edges" for the DBLP dataset is a list of four CSR matrices. What is the meaning of each one?
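Not the author, but a sketch of what such a list typically looks like. The mapping of each slot to a relation (e.g. paper-author, author-paper, paper-conference, conference-paper) is an assumption that should be checked against the node index ranges:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-in for the contents of edges.pkl: a list of four
# N x N CSR matrices, one per edge type. For DBLP the four slots are
# presumably paper-author, author-paper, paper-conference, and
# conference-paper (an assumption, not confirmed by the repo).
rng = np.random.default_rng(0)
N = 6
edges = [csr_matrix(rng.integers(0, 2, (N, N))) for _ in range(4)]

for i, A in enumerate(edges):
    print(i, A.shape, A.nnz)  # each entry: shape (N, N), its own sparsity
```

Inspecting which row/column index ranges carry nonzeros in each matrix is a practical way to recover the edge-type meaning of each slot.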
In the class GTConv, I saw that you initialize self.weight with nn.init.constant_(self.weight, 0.1).
I don't really understand why the model weights are initialized to a constant.
Is there a specific reason for this?
Could you please tell us the edge types in edges.pkl of the IMDB dataset? Or upload the preprocessing code as for the other datasets in this repo, i.e. ACM and DBLP?
To find meta-paths with high attention scores learned by GTNs, I print the attention scores in main.py (denoted as Ws at line 100) and main_sparse.py (denoted as _ at line 127).
I run your code with: "python main_sparse.py --dataset IMDB --num_layers 3 --adaptive_lr true".
Surprisingly, it seems that the model did not train the weight of each GTConv at all; the weights after softmax are always [0.2, 0.2, 0.2, 0.2, 0.2].
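For what it's worth, uniform scores at initialization are expected: the weights are initialized to a constant 0.1, and a softmax over five equal values gives exactly 0.2 each. A standalone sketch (not the repo's GTConv) that reproduces this and checks whether gradients reach the parameter:

```python
import torch
import torch.nn.functional as F

# A constant-initialized attention weight yields uniform softmax scores,
# so seeing [0.2]*5 at the start of training is expected. If it never
# changes, the thing to check is whether .grad stays None or zero, i.e.
# whether the optimizer is actually updating this parameter.
num_edge_types = 5
weight = torch.full((1, num_edge_types), 0.1, requires_grad=True)
scores = F.softmax(weight, dim=1)  # uniform 0.2 per edge type

# A toy loss that depends unevenly on the scores, so gradients are nonzero.
loss = (scores * torch.arange(num_edge_types, dtype=torch.float)).sum()
loss.backward()
print(weight.grad)  # non-None and nonzero: gradients reach the weights
```

If the real run shows a populated, nonzero `weight.grad` yet the values still never move, the likely suspects are the learning rate or the parameter not being registered with the optimizer.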
In the paper, I have read "each node in the two datasets is represented as bag-of-words of keywords" for DBLP and ACM. I know paper nodes have keywords, but what about other types of nodes like authors? conferences? subjects? What are their node features?
For IMDB, I have read "node features are given as bag-of-words representations of plots". What is meant by "representations of plots"? What are the node features of movies, authors, and directors, respectively?
I think you should provide a sparse implementation, because the adjacency matrix is typically sparse.
Hi,
I've noticed that you use main.py for the DBLP and ACM datasets, but main_sparse.py for the IMDB dataset.
For main.py, the weight W (the list Ws) is updated every epoch.
But for main_sparse.py, the weight is unchanged every epoch; it is always 0.2.
So how did you derive the meta-paths learned by GTN on IMDB?
Hi!
When I run the example with the IMDB dataset
!python main_sparse.py --dataset IMDB --num_layers 2 --adaptive_lr true --epoch 3
I receive this error:
RuntimeError: scatter_add() expected at most 5 argument(s) but received 6 argument(s). Declaration: scatter_add(Tensor src, Tensor index, int dim=-1, Tensor? out=None, int? dim_size=None) -> (Tensor)
How do I resolve this?
At first I used the latter to train my model and got about an 88 F1 score on the test set (ACM); then I changed it to the former and got about 92. What's the difference?
Hi, I noticed in your paper that you point out the top metapaths. My question is: how do you extract these specific metapaths? I don't see a function for this. Also, I am assuming that your attention score is your Ws parameter. Is this correct?
Is there an update for the renamed f1_score in torch_geometric?
Hi authors,
I tried running the code to reproduce the results and I have quite a few questions. I am assuming the number of epochs you trained for is the same as the default (40) in the implementation. The Macro F1 score fluctuated a bit. Can you tell me how many times you repeated the experiments? Did you take an average over all repetitions, or the maximum? I could only reach the reported value a couple of times out of about 10 runs; e.g., for ACM the values ranged from 91.4 to 93.1, but mostly stayed around 92.3. Also, you print the test score for each epoch. Did you choose the maximum test score, or did you test with the model that gave the best validation score?
Thanks!
Hi,
I found that the paper "Graph Transformer Networks: Learning Meta-path Graphs to Improve GNNs" recently proposed a much more efficient version of GTN, called FastGTN. The paper guided me here to find the code, but it doesn't seem to be updated yet. When are you planning to release the code? Or if you have released it already, could you point me to that repository?
Thanks,
Eunjeong
Hello,
I have tried running the code with the DBLP dataset, and my 32G RAM machine kills the process due to excessive memory usage before it can run even a single epoch.
How much memory does GTN use on the DBLP dataset?
Thank you.
How do you generate node_features? Thanks.
I need to run the code on CUDA, but even Colab doesn't have enough VRAM for it. So I am trying to decrease the batch_size, but I don't know where to modify it. Can you tell me where it is defined?
Hi,
I found it interesting that, in the paper, it is mentioned that "It is used for node classification on top and two dense layers followed by a softmax layer are used" at the bottom of page 5.
However, the code implementation indicates that only two linear layers with a ReLU nonlinearity were used, and the output of the second linear layer is compared directly with the label using cross-entropy; no softmax layer follows:
X_ = self.linear1(X_)           # first dense layer
X_ = F.relu(X_)                 # ReLU nonlinearity
y = self.linear2(X_[target_x])  # second dense layer, producing logits for target nodes
loss = self.loss(y, target)     # cross-entropy computed on raw logits
return loss, y, Ws
I wonder which one I should rely on, the paper description, or the provided code implementation?
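For context, the two descriptions agree in effect: PyTorch's cross-entropy loss applies log-softmax internally, so raw logits fed to it match the paper's "softmax layer" wording. A small sketch (assuming `self.loss` is `nn.CrossEntropyLoss`, which is how such code is usually wired):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# nn.CrossEntropyLoss applies log-softmax internally, so feeding raw
# logits (as the code does) is equivalent to an explicit softmax layer
# followed by a negative log-likelihood loss (as the paper describes).
torch.manual_seed(0)
logits = torch.randn(4, 3)            # hypothetical outputs of linear2
target = torch.tensor([0, 2, 1, 0])

loss_raw = nn.CrossEntropyLoss()(logits, target)
loss_explicit = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(loss_raw, loss_explicit))  # True
```

So both the paper and the code compute the same objective; the softmax is simply folded into the loss function rather than appearing as a separate layer.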
Is there any GPU version?
The distributed features of the PyTorch version seem unsuitable for this algorithm, since the mul operation (bmm) is between matrices of the whole graph and cannot be split into slices.
Is it possible to know the edge type of each matrix? Also, could you please provide the label information for all the nodes? Thanks a lot!
Hi seongjunyun, I want to change the pre-defined meta-paths. Would it be convenient for you to provide the preprocessing code for the ACM and DBLP datasets? Thank you very much!
Hi, I am going through your code and paper. I want to apply your code to Cora/Citeseer-style graphs and compare the results with GCN and GAT.
So, in Cora and Citeseer:
The feature matrix is N x F
Adj matrix is N x N
and Labels are one hot encoded
The results you show in the paper are only on heterogeneous graphs. How can I apply the model to the Cora and Citeseer datasets using the feature and adjacency matrix information?
Can you please share the code?
Thank you for this awesome work :)
Thanks for the updates. Could you also post the mini-batch version of fastGTN and GTN as mentioned in the paper? They would be very helpful.
May I know how you generate the candidate adjacency matrices?
Suppose the graph contains N different edge types. Do you generate N candidate adjacency matrices, where each one holds the adjacency information of one edge type?
Besides, if the graph is large, the adjacency matrix will be large; it may be impossible to store the whole matrix in memory. When you train the model, do you use message passing instead of matrix multiplication?
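Not the authors, but a sketch of the mechanism as the paper describes it: one adjacency matrix per edge type, combined as a softmax-weighted (convex) sum, which is what GTN's 1x1 convolution over the edge-type axis amounts to. Names and sizes here are illustrative only:

```python
import torch

# One N x N candidate adjacency matrix per edge type, stacked along a
# leading edge-type axis, then combined by a convex (softmax-weighted)
# sum. This is a dense toy; a memory-friendly variant would keep each
# adjacency sparse and use message passing instead.
torch.manual_seed(0)
N, num_edge_types = 4, 3
A = torch.randint(0, 2, (num_edge_types, N, N)).float()       # (E, N, N)
w = torch.softmax(torch.full((num_edge_types,), 0.1), dim=0)  # uniform at init
A_combined = torch.einsum('e,enm->nm', w, A)                  # (N, N)
print(A_combined.shape)  # torch.Size([4, 4])
```

Because the combination weights sum to 1, `A_combined` is a convex mixture of the per-edge-type adjacencies rather than an unbounded sum.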
Is anyone working on this?
Hi, I notice your processed data are all adjacency matrices.
So what is the exact node type of each node along the row dimension?
Could you tell me about this, or provide the preprocessing code?
Thanks a lot.
Hi, can this model be used on this type of heterogeneous network: more than one edge type and more than one node type?
Hello, Thanks for the release of your great work!
I have run into a 'nan' loss problem when applying your GTN model to my own dataset. I preprocessed the data following your ACM preprocessing example. However, the loss became 'nan' on the first validation pass.
I printed the tensors and found that after the first training backpropagation, self.weight became all 'nan'; thus, all tensors computed afterwards are 'nan' as well.
I have tried a smaller lr, norm=false, and several modifications of my dataset, but nothing changed.
I'd like to know if you have any idea about this problem. Thank you a lot.
Dear authors,
This paper makes a great contribution to HIN network embedding. However, one significant drawback of your code hinders its potential [1]. Your implementation directly computes the product of matrices A_1 A_2 ... A_n (though exactly two matrices are used in the code) and then applies it to the vector x. This straightforward implementation costs extremely high resources even though the sparse format is used. The suggested implementation is to recursively multiply the matrices into x one by one. To implement this, multiple torch_geometric GCN models with different edge weights could be instantiated to process x recursively, with some of them disabling the linear transform mapping.
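The suggested change can be illustrated with a small sketch (dense tensors for brevity; the same ordering argument applies in the sparse case). Applying the chain right-to-left only ever forms matrix-vector products and never materializes the full product A_1 A_2 ... A_n:

```python
import torch

# Associativity lets us avoid materializing the matrix product:
#   (A1 @ A2 @ A3) @ x  ==  A1 @ (A2 @ (A3 @ x))
# The right-hand form only ever builds n x 1 intermediates, which is
# the point of the suggestion above.
torch.manual_seed(0)
n = 5
As = [torch.randn(n, n) for _ in range(3)]
x = torch.randn(n, 1)

y_expensive = (As[0] @ As[1] @ As[2]) @ x  # forms an n x n product first

y = x
for A in reversed(As):                     # right-to-left mat-vec chain
    y = A @ y

print(torch.allclose(y, y_expensive, atol=1e-5))  # True
```

With sparse A_i of m nonzeros each, the right-to-left form costs O(m) per step instead of the potentially dense O(n^2) (or worse) cost of the explicit matrix product.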
I came across your paper "Graph Transformer Networks" (arXiv:1911.06455v1), and I am very interested in your algorithm. I am sure this algorithm is promising in the field of AI.
Taking the DBLP dataset as an example, I got Ws with dimension "Te x 4" after 4 GT operations; each element in Ws indicates the contribution of an edge type Te to the obtained meta-paths. Then I can calculate the probabilities of meta-paths with 2, 3, and 4 elements.
Example of Ws:
The probability of meta-path ABCD would be calculated as W(a,1)*W(b,2)*W(c,3)*W(d,4).
The meta-paths with the highest scores would be selected for prediction.
Is my understanding correct?
If it is, how can I compare the attention scores of meta-paths with different lengths? After all, each attention score is a value in [0,1], so longer meta-paths tend to have smaller scores.
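One common workaround for the length bias (my own suggestion, not something from the paper) is to compare the geometric mean of the per-step attention scores instead of the raw product. A sketch with made-up Ws values:

```python
import torch

# Raw products of per-layer attention scores shrink with path length;
# the geometric mean (the product raised to the 1/length power) puts
# paths of different lengths on a comparable scale. Ws here is random
# stand-in data: edge types x GT layers, softmax-normalized per layer.
torch.manual_seed(0)
Ws = torch.softmax(torch.rand(5, 4), dim=0)

path = [0, 2, 1]  # hypothetical edge-type ids, one per GT layer
raw = torch.prod(torch.stack([Ws[t, layer] for layer, t in enumerate(path)]))
geo = raw ** (1.0 / len(path))  # length-normalized score
print(raw <= geo)  # tensor(True): scores lie in (0,1], so the mean is larger
```

The geometric mean leaves the ranking among equal-length paths unchanged while making scores of 2-, 3-, and 4-step paths directly comparable.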
Hi, I was trying to reproduce the DBLP dataset results, but it seems the link is not working.
Could you please provide another link to download the dataset?
Thanks in advance.
My GPU has 10.92 GiB of total memory. I ran main_sparse.py on the ACM data, and it raised RuntimeError: CUDA out of memory.
Thanks for sharing! This is really good work! I want to know how much memory is required for the sparse and dense versions. Looking forward to your reply! Thank you again!
Hi
What is the shape of the graph A before the graph transformer processes it?