seongjunyun / graph_transformer_networks Goto Github PK

View Code? Open in Web Editor NEW

936.0 936.0 176.0 18.84 MB

Graph Transformer Networks (Authors' PyTorch implementation for the NeurIPS 19 paper)

Python 34.77% Jupyter Notebook 65.23%

graph_transformer_networks's People

Contributors

Stargazers

Watchers

Forkers

jhy1993 jlqzzz shiqing1234 aspirincode sean0719 deep-mining-swang yichengdwu francis-zheng codewoohyun codeinging tanxiaoqin888 jankintian wuqianliang shengyupei xuchensjtu seongjinahn haritzpuerto syyunn mcdragon liangxun ammieqi liyuwe thanhducpham manangoel99 aboltachev songfgh guyulongcs isfengg chunyuany mike575 icyxiaowenyi nyxflower liu-guo-jing zscdumin kounianhuadu mooncakex milkigit demirugur roholazandie freekang fusion-research sutanay czheng17 dasolhwang why986 tianqi-wang wutaiqiang lovelynns lizhaofu viet98lx sorrowyn yummpy123 fortuneseeker srijithbalachander yhjflower henry-ding mlvlab dannywu1996 suchun-sv alnaeini lhx3762 fendouwangshan mymuli lliai shadyalkhouri drownfish19 linhht12 rzj-1997 zhaosiheng jmhicoding earlbabson chrisbyd xuaikun l-hoang polarbe sxxtyz hyusheng kw-lee zhh0998 youjibiying laplacekorea yt8523 yuchenbian sts-sadr truongquocchien m8nre sangyoon-bae tlntin dreammy99 wwy2020211 khushsi wonlee2019 zentim fanmengdan hell-to-heaven sophiezang wanghao-nlp aayush97 xiexiaqing lotbear

graph_transformer_networks's Issues

can you share imdb preprocessig code?

I found the raw data from authors in 'Heterogeneous Graph Attention Networks'.
However, I cannot find out how to make input files.
Could you share the preprocessing code for IMDB datasets?

GPU and changing batch size

Hi,

I am a GPU architecture and systems researcher looking to use your work as part of a characterization study. Is main_sparse.py the GPU script or is it main.py?. Also how to vary the batch size in your code?.

Thank you.

How to get the importance of the metapath with a length greater than 3?

Hi, thanks for your great works!
I notice that your paper has shown the metapath APCPA is important in DBLP dataset. I am confused how to get this importance score, since you stack only three GTN layers.

Regards!

No need for torch-sparse-old now

It seems that newer version of torch-sparse (I used the version 0.6.7) has the proper backward function and there is no need to install torch-sparse-old. (Actually I failed to install it.) After replacing torch-sparse-old with torch-sparse I successfully run the code.

There are the package versions I used:

torch                     1.10.2
torch-cluster             1.5.7
torch-geometric           2.0.3
torch-scatter             2.0.5
torch-sparse              0.6.7
torch-spline-conv         1.2.0

Is there a mistake about the target node in DBLP dataset?

In dblp dataset, I guess there are 4057, 14328 and 20 nodes for type paper, author and conference, respectively. And they are aligned in this order along each dimension.

In your paper, you point out that the top-3 metapaths between target nodes in DBLP are APA, APCPA, e.g. But I find that the train_idx is within the range of [0, 4056], which means P-type nodes are to be classified rather than A-type?

Is there a mistake about the introduction in the paper, or I misunderstand something.

Are self-connected edges removed from fastGTN?

weight in GTConv is alwasy the same

When I print weight in GTConv (Actually I print Ws in GTN), I find it is always [[0.2, 0.2, 0.2, 0.2, 0.2], [0.2, 0.2, 0.2, 0.2, 0.2]], seems like it never updates. Why would this happen?

Dataset

Hi
The variable "edges" for the DBLP dataset is a list of four CSR matrix, what the meaning of each one?

GTConv Weight Initialization

In the class "GTConv", I saw that you initialize "self.weight" by nn.init.constant_(self.weight, 0.1).
I don't really understand why the model weights are initialized constantly.
Is there any specific reason for you?

IMDB Dataset

could please tell the edge type in edges.pkl of IMDB dataset? or upload the preprocess code like other dataset in this repo, i.e. ACM and DBLP?

The attention score in model_sparse

To find meta-paths with high attention scores learnt by GTNs, I print the attention scores in main.py (denoted as Ws in line 100) and main_sparse.py (denoted as _ in line 127).
I run your code with: "python main_sparse.py --dataset IMDB --num_layers 3 --adaptive_lr true".
Surprisingly, it seems that the model did not train the weight of each GTConv at all, the weights after softmax are always [0.2, 0.2, 0.2, 0.2, 0.2].

How did the node features are constructed concretely?

In the paper, I have read "each node in the two datasets is represented as bag-of-words of keywords" for DBLP and ACM. I know paper nodes have keywords, but what about other types of nodes like authors? conferences? subjects? What are their node features?

For IMDB, I have read "node features are given as bag-of-words representations of plots". What is the mean by "representations of plots"? What are the node features of movies, authors, directors respectively?

Can you share the link to raw files and preprocessing scripts used for creating the datasets?

sparse version GTN?

I think you should provide sparse version implement, because the adjacency matrix is always sparse.

How to get meta-path for IMDB dataset?

Hi,
I've noticed that you use main.py for DBLP and ACM dataset, but main_sparse.py for IMDB dataset.
For main.py, the weight W or the list Ws is updated every epoch.
But for main_sparse.py, the weight is unchanged for every epoch. It is always 0.2.
So how you derived that the IMDB meta-path learned by GTN?

Error when run example with IMDB dataset

Hi!
When I run example with IMDB dataset
!python main_sparse.py --dataset IMDB --num_layers 2 --adaptive_lr true --epoch 3

I received error:
RuntimeError: scatter_add() expected at most 5 argument(s) but received 6 argument(s). Declaration: scatter_add(Tensor src, Tensor index, int dim=-1, Tensor? out=None, int? dim_size=None) -> (Tensor)

How do I resolve this ?

What's the difference between "X_ = torch.cat((X_,X_tmp), dim=1)" and "X_ = torch.cat([X_,X_tmp], dim=1)"

At first I use the latter to train my model and get about 88 f-score on test dataset(ACM), then I change it to the former and get about 92 f-score. What's the difference?

Obtaining Metapaths

Hi I noticed in your paper that you point out the top metapaths. My question is, how do you extract these specific metapaths? I don't see a function for this. Also, I am assuming that your attention score is your Ws parameter. Is this correct?

ImportError: cannot import name 'f1_score' from 'torch_geometric.utils'

Is there an update with modified name of f1_score from the torch_geometric?

how much gpu memory is required?

Reproducing the results

Hi authors,

I tried running the code to reproduce the results and I have quite a few questions. I am assuming the number of epochs you trained on is the same as the default (40) in the implementation. The Macro F1 score was fluctuating a bit. Can you tell me the number of times you repeated the experiments? Did you take an average over all the repetitions or considered the maximum? I could only reach the reported value a couple of times out of like 10 repetitions of the code. Eg., for ACM the values ranged from 91.4 to 93.1. But mostly stays around 92.3. Also, you are printing the test score for each epoch. Did you choose the maximum of that test score or did you test with the model that gave max validation score?

Thanks!

code for FastGTN?

Hi,

I found the paper, "Graph Transformer Networks: Learning Meta-path Graphs to Improve GNNs", recently proposed a much more efficient version of GTN, called FastGTN. The paper guided me here to find the code, but it doesn't seem to be updated yet. When are you planning to release the code? Or if you have released it already, could you guide me to that repository?

Thanks,
Eunjeong

Memory requirements while embedding DBLP using CPU

Hello,

I have tried running the code with the DBLP dataset, and my 32G RAM machine kills the process due to excessive memory usage before it can run even a single epoch.
How much memory is used by GTN on DBLP dataset?

Thank you.

node_features

how to generate node_features? thanks.

Where batch_size has been defined?

I need to run code on on CUDA, but even colab doesn't have enough VRAM to run it. So i am trying to decrease the batch_size, but dont know where to modify it. Can you tell me where is it defined?

Node number of 'IMDB' dataset

In fact, I find the node number of this dataset is shown as

It is 12772, not 12624 reported in paper. Is there anything wrong?

Difference between code implementation and paper description

Hi,

I found it interesting that, in the paper, it is mentioned that "It is used for node classification on top and two dense layers followed by a softmax layer are used" at the bottom of page 5.

However, the code implementation indicates that only two linear layers with relu nonlinearity instead of two dense layers were used, and the output of the second linear layer is directly compared with the label using cross-entropy. No softmax layer was followed.

X_ = self.linear1(X_)
X_ = F.relu(X_)
y = self.linear2(X_[target_x])
loss = self.loss(y, target)
return loss, y, Ws

I wonder which one I should rely on, the paper description, or the provided code implementation?

Is there any GPU version? Meet some distributed problems

Is there any GPU version?
The distributed problems in pytorch vesion seems not suitble for this algorithm, for the mul operation(bmm) is between the matrix of whole graph, can not split into slices.

About IMDB Dataset

Is that possible to know the edge type of each matrix? Besides, could you please provide the label information of all the nodes? Thanks a lot!

preprossed code

Hi seongjunyun, I want to change the pre-defined meta_path. It is convenient for you to provide the preprocess code of dataset ACM and DBLP? Thank you very much!

How to deal with normal graphs?

Hi, I am going through your code and paper. I want to apply your code on Cora, Citeseer types of graph and compare the result with GCN and GAT.
So in Cora, Citeseer :

The feature matrix is N x F
Adj matrix is N x N
and Labels are one hot encoded

The result you shown in paper is only on the heterogeneous graph. How can I apply on Cora, Citeseer dataset with feature and adj matrix information?

Can you please share the code?

Thank you for this awesome work :)

mini-batch version of fastGTN and GTN

Thanks for the updates. Could you also post the mini-batch version of fastGTN and GTN as mentioned in the paper? They would be very helpful.

how to generate candidate adjacency matrices

May I know how you generate candidate adjacency matrices?
Suppose the graph contains N different edge types. Will you generate N different candidate adjacency matrices while each candidate adjacency matrix denotes the adjacency information of that edge type?

Besides, if the graph is large, the adjacency matrix should be large. It is impossible to store the whole matrix in memory. When you train the model, do you use message passing instead of matrix multiplication?

How to apply Graph Transformer to graph classification?

Is there any one do this work?

What is the node type of nodes in A?

hi, I notice your processed data are all in an adjacent matrix.
So what is the exact node type of each node in the row dimension?
Could tell me about this? Or Could you provide the preprocess code?

Thanks a lot.

Applicable network type

Hi， if this model can be used in this type of heterogeneous network? type of edge > 1 and type of nodes > 1

Back propogation 'nan' for self.weight and thus loss

Hello, Thanks for the release of your great work!
I have met with a loss 'nan' problem when I applied your GTN model on my own dataset. I have preprocessed the data following your ACM preprocessing example. However, the loss became 'nan' for the first valid process.
I have printed the tensors and I found that after the first train backpropagation, the 'self.weight' became all 'nan'. Thus, all the tensors calculated after are all 'nan'.

I have tried smaller lr, norm=false, and modified my dataset for several different types but nothing changed.
I'd like to know if you have any idea about this problem. Thank you a lot.

The extra complexity cost by implementation

Dear authors,

This paper has great contribution for HIN network embedding. However, one significant drawback of your code hinders its potential [1]. Your implementation directly compute the product of matrices A_1A_2...A_n (though exactly just two matrices are used in code) and then apply to vector x. This straightforward implementation cost extremely high resources despite the sparse format is used. The suggested implementation could be recursively multiplying the matrices to x one by one. To implement this, multiple torch_geometric GCN models with different edge weights could be instantiated and process the x recursively while some of them disable the ability of linear transform mapping.

Qingsong Lv, et, al. 2021. Are we really making much progress? Revisiting, benchmarking and refining heterogeneous graph neural networks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD '21). Association for Computing Machinery, New York, NY, USA, 1150–1160.

How to get meta-path? I guess they are obtained from the trained matrix Ws ? How can I obtain meta-path based on Ws ?

I met with your paper “Graph Transformer Networks”(arXiv:1911.06455v1). And I am very interested in your algorithm. I am sure that this algorithm is promising in the field of AI.

Take the DBLP dataset as example, I got the Ws with dimension of “Te X 4” ? after 4 times of GT operation, each element in Ws indicate the contribution of Te for the obtained meta-paths. Then I can calculate the probabilities of each meta-paths with 2,3 and 4 elements.
Example of Ws:

https://lh3.googleusercontent.com/YuXRKo2fhYpNu8hLI9gFMQYMRzeU91OXVJZdXpseCoLVPfcD0CEGY8sZDTt53rLQJVxoVig=s170

The probility of meta-path ABCD will be calculated as W(a,1)*W(b,2)*W(c,3)*W(d,4)

The meta-paths with the highest score will be selected for prediction.

Is my understanding true?

If it is correct, how can I compare the attention score for these meta-paths with different lengths? After all, the attention score is a value of [0,1], therefore the longer meta-paths tend to be with a smaller score