graph_transformer_networks's People

Contributors

seongjunyun

graph_transformer_networks's Issues

Can you share the IMDB preprocessing code?

I found the raw data from the authors of 'Heterogeneous Graph Attention Networks'.
However, I cannot figure out how to build the input files.
Could you share the preprocessing code for the IMDB dataset?

GPU and changing batch size

Hi,

I am a GPU architecture and systems researcher looking to use your work as part of a characterization study. Is main_sparse.py the GPU script, or is it main.py? Also, how do I vary the batch size in your code?

Thank you.

No need for torch-sparse-old now

It seems that newer versions of torch-sparse (I used version 0.6.7) have the proper backward function, so there is no need to install torch-sparse-old. (I actually failed to install it.) After replacing torch-sparse-old with torch-sparse, I ran the code successfully.

These are the package versions I used:

torch                     1.10.2
torch-cluster             1.5.7
torch-geometric           2.0.3
torch-scatter             2.0.5
torch-sparse              0.6.7
torch-spline-conv         1.2.0

Is there a mistake about the target node in the DBLP dataset?

In the DBLP dataset, I guess there are 4057, 14328, and 20 nodes of type paper, author, and conference, respectively, and they are aligned in this order along each dimension.

In your paper, you point out that the top-3 metapaths between target nodes in DBLP are, e.g., APA and APCPA. But I find that train_idx is within the range [0, 4056], which means that P-type nodes are being classified rather than A-type nodes?

Is there a mistake in the introduction of the paper, or did I misunderstand something?

The weight in GTConv is always the same

When I print the weight in GTConv (actually I print Ws in GTN), I find it is always [[0.2, 0.2, 0.2, 0.2, 0.2], [0.2, 0.2, 0.2, 0.2, 0.2]]; it seems like it never updates. Why would this happen?
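
For reference, a uniform 0.2 is exactly what a constant initialization looks like after the softmax over the 5 candidate edge types, so the printed value by itself does not tell you whether the underlying parameter moves. A small worked example, under the assumption that the printed Ws are the post-softmax combination weights:

import torch
import torch.nn.functional as F

# The GTConv weight is initialized to the same constant everywhere (0.1 in this repo),
# so the softmax over the 5 candidate edge types starts out uniform: 1/5 = 0.2.
raw = torch.full((2, 5), 0.1)   # 2 channels, 5 edge types
print(F.softmax(raw, dim=1))    # every entry is 0.2000

Inspecting the raw self.weight (before the softmax) across epochs would show whether it is actually being updated.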

Dataset

Hi
The variable "edges" for the DBLP dataset is a list of four CSR matrix, what the meaning of each one?

GTConv Weight Initialization

In the class "GTConv", I saw that you initialize "self.weight" by nn.init.constant_(self.weight, 0.1).
I don't really understand why the model weights are initialized constantly.
Is there any specific reason for you?
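
For readers unfamiliar with that initializer, this is all the quoted line does (the shape below is illustrative only, not taken from the repository):

import torch
import torch.nn as nn

# Create a learnable weight and fill every entry with the same constant, 0.1.
weight = nn.Parameter(torch.empty(2, 5, 1, 1))   # hypothetical (channels, edge_types, 1, 1) shape
nn.init.constant_(weight, 0.1)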

IMDB Dataset

Could you please tell us the edge types in edges.pkl of the IMDB dataset, or upload the preprocessing code as for the other datasets in this repo, i.e. ACM and DBLP?

The attention score in model_sparse

To find meta-paths with high attention scores learnt by GTNs, I print the attention scores in main.py (denoted as Ws in line 100) and main_sparse.py (denoted as _ in line 127).
I ran your code with: "python main_sparse.py --dataset IMDB --num_layers 3 --adaptive_lr true".
Surprisingly, it seems that the model did not train the weight of each GTConv at all; the weights after softmax are always [0.2, 0.2, 0.2, 0.2, 0.2].

How are the node features constructed, concretely?

In the paper, I read that "each node in the two datasets is represented as bag-of-words of keywords" for DBLP and ACM. I know paper nodes have keywords, but what about the other node types, such as authors, conferences, and subjects? What are their node features?

For IMDB, I read that "node features are given as bag-of-words representations of plots". What is meant by "representations of plots"? What are the node features of movies, actors, and directors, respectively?
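
For concreteness, "bag-of-words" usually means one column per vocabulary term with a count or 0/1 entry per node; a generic illustration with scikit-learn (purely hypothetical, not the authors' actual preprocessing):

from sklearn.feature_extraction.text import CountVectorizer

# Toy example: one text per node (e.g. keywords of a paper, or the plot of a movie).
docs = ["graph neural network", "attention network transformer", "graph attention"]
vectorizer = CountVectorizer(binary=True)     # 0/1 bag-of-words
X = vectorizer.fit_transform(docs)            # sparse matrix of shape (num_nodes, vocab_size)
print(vectorizer.get_feature_names_out())
print(X.toarray())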

Sparse version of GTN?

I think you should provide a sparse implementation, because the adjacency matrix is always sparse.

How to get meta-paths for the IMDB dataset?

Hi,
I've noticed that you use main.py for the DBLP and ACM datasets, but main_sparse.py for the IMDB dataset.
For main.py, the weight W (i.e. the list Ws) is updated every epoch.
But for main_sparse.py, the weight is unchanged every epoch; it is always 0.2.
So how did you derive the IMDB meta-paths learned by GTN?

Error when running the example with the IMDB dataset

Hi!
When I run the example with the IMDB dataset:
!python main_sparse.py --dataset IMDB --num_layers 2 --adaptive_lr true --epoch 3

I received this error:
RuntimeError: scatter_add() expected at most 5 argument(s) but received 6 argument(s). Declaration: scatter_add(Tensor src, Tensor index, int dim=-1, Tensor? out=None, int? dim_size=None) -> (Tensor)

How do I resolve this?

Obtaining Metapaths

Hi, I noticed that in your paper you point out the top metapaths. My question is: how do you extract these specific metapaths? I don't see a function for this. Also, I am assuming that your attention score is your Ws parameter. Is this correct?

Reproducing the results

Hi authors,

I tried running the code to reproduce the results, and I have quite a few questions. I am assuming the number of epochs you trained for is the same as the default (40) in the implementation. The Macro F1 score was fluctuating a bit. Can you tell me how many times you repeated the experiments? Did you take an average over all the repetitions or report the maximum? I could only reach the reported value a couple of times out of about 10 repetitions; e.g., for ACM the values ranged from 91.4 to 93.1, but mostly stayed around 92.3. Also, you print the test score at each epoch. Did you report the maximum of that test score, or did you test with the model that gave the best validation score?

Thanks!

code for FastGTN?

Hi,

I found that the paper "Graph Transformer Networks: Learning Meta-path Graphs to Improve GNNs" recently proposed a much more efficient version of GTN, called FastGTN. The paper pointed me here to find the code, but it doesn't seem to have been updated yet. When are you planning to release the code? Or, if you have released it already, could you point me to that repository?

Thanks,
Eunjeong

Memory requirements while embedding DBLP using CPU

Hello,

I have tried running the code with the DBLP dataset, and my 32 GB RAM machine kills the process due to excessive memory usage before it can finish even a single epoch.
How much memory does GTN use on the DBLP dataset?

Thank you.

Where is batch_size defined?

I need to run the code on CUDA, but even Colab doesn't have enough VRAM to run it. So I am trying to decrease the batch_size, but I don't know where to modify it. Can you tell me where it is defined?

Node number of 'IMDB' dataset

In fact, I find that the node count of this dataset is 12772, not the 12624 reported in the paper. Is there anything wrong?
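
One way to double-check the count from the released files themselves: the adjacency matrices in edges.pkl carry the total node count in their shape (a sketch; the exact path is an assumption on my side):

import pickle

# Assumed location of the IMDB edge file; edges.pkl holds a list of sparse adjacency matrices.
with open("data/IMDB/edges.pkl", "rb") as f:
    edges = pickle.load(f)

print([A.shape for A in edges])   # each matrix should be (num_nodes, num_nodes)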

Difference between code implementation and paper description

Hi,

I found it interesting that the paper mentions, at the bottom of page 5, that "It is used for node classification on top and two dense layers followed by a softmax layer are used".

However, the code implementation uses two linear layers with a ReLU nonlinearity in between, and the output of the second linear layer is compared directly with the labels using cross-entropy; no softmax layer follows.

X_ = self.linear1(X_)            # first linear (dense) layer
X_ = F.relu(X_)                  # ReLU nonlinearity
y = self.linear2(X_[target_x])   # second linear layer on the target nodes -> raw logits
loss = self.loss(y, target)      # loss computed directly on the logits
return loss, y, Ws

I wonder which one I should rely on: the paper description or the provided code implementation?
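
One detail that may reconcile the two descriptions, assuming self.loss is torch.nn.CrossEntropyLoss (an assumption on my part): PyTorch's cross-entropy already applies log-softmax internally, so the softmax can be implicit in the loss rather than a separate layer:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)            # raw outputs of the second linear layer
target = torch.tensor([0, 2, 1, 2])

a = F.cross_entropy(logits, target)
b = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(a, b))           # True: cross-entropy == log-softmax + NLL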

About IMDB Dataset

Is it possible to know the edge type of each matrix? Also, could you please provide the label information for all the nodes? Thanks a lot!

Preprocessing code

Hi seongjunyun, I want to change the pre-defined meta-paths. Would it be convenient for you to provide the preprocessing code for the ACM and DBLP datasets? Thank you very much!

How to deal with normal graphs?

Hi, I am going through your code and paper. I want to apply your code to Cora/Citeseer-style graphs and compare the results with GCN and GAT.
So in Cora and Citeseer:

The feature matrix is N x F
The adjacency matrix is N x N
Labels are one-hot encoded

The results you show in the paper are only on heterogeneous graphs. How can I apply the model to the Cora and Citeseer datasets given the feature and adjacency matrix information?

Can you please share the code?

Thank you for this awesome work :)
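
In case it is useful to anyone trying the same thing, here is a rough sketch of packing a homogeneous Cora/Citeseer-style graph into the kind of input described elsewhere in these issues, i.e. a list of CSR adjacency matrices with one entry per edge type (the file name and exact format are assumptions, not confirmed by the authors):

import pickle
import numpy as np
import scipy.sparse as sp

# Toy homogeneous graph: N nodes, a single edge type, N x F features, integer labels.
N, F = 5, 8
src = np.array([0, 1, 2, 3])
dst = np.array([1, 2, 3, 4])
A = sp.csr_matrix((np.ones(len(src)), (src, dst)), shape=(N, N))

edges = [A]                                            # one CSR matrix per edge type
features = np.random.rand(N, F).astype(np.float32)
labels = np.array([0, 1, 0, 1, 1])

with open("edges.pkl", "wb") as f:                     # assumed file name
    pickle.dump(edges, f)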

how to generate candidate adjacency matrices

May I know how you generate the candidate adjacency matrices?
Suppose the graph contains N different edge types. Do you generate N different candidate adjacency matrices, where each candidate adjacency matrix encodes the adjacency information of that edge type?

Besides, if the graph is large, the adjacency matrix will be large, and it may be impossible to store the whole matrix in memory. When you train the model, do you use message passing instead of explicit matrix multiplication?
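
For what it's worth, my own reading of "candidate adjacency matrices" is exactly one matrix per edge type; a minimal sketch of building such a stack from typed edge lists (an illustration, not the authors' code):

import numpy as np
import scipy.sparse as sp

N = 6
# Typed edge lists: edge type -> (source node indices, target node indices).
typed_edges = {
    "paper-author": ([0, 1, 2], [3, 4, 5]),
    "author-paper": ([3, 4, 5], [0, 1, 2]),
}

# One sparse candidate adjacency matrix per edge type, all with the same N x N shape.
candidates = [
    sp.csr_matrix((np.ones(len(src)), (src, dst)), shape=(N, N))
    for src, dst in typed_edges.values()
]
print([A.nnz for A in candidates])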

What is the node type of nodes in A?

Hi, I notice your processed data are all given as adjacency matrices.
So what is the exact node type of each node along the row dimension?
Could you tell me about this, or could you provide the preprocessing code?

Thanks a lot.

Applicable network type

Hi, can this model be used on this type of heterogeneous network: more than one edge type and more than one node type?

Backpropagation 'nan' for self.weight and thus loss

Hello, thanks for releasing your great work!
I ran into a 'nan' loss problem when I applied your GTN model to my own dataset. I preprocessed the data following your ACM preprocessing example, but the loss became 'nan' in the first validation pass.
I printed the tensors and found that after the first training backpropagation, self.weight became all 'nan', so all tensors computed afterwards are 'nan' as well.
I have tried a smaller learning rate, norm=false, and several modified variants of my dataset, but nothing changed.
I'd like to know if you have any idea about this problem. Thank you.
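
Not specific to this repository, but a generic way to localize where the first non-finite gradient appears, using standard PyTorch tooling (a hedged debugging sketch, not a fix):

import torch

# Make autograd raise an error at the exact backward op that produces NaN/Inf gradients.
torch.autograd.set_detect_anomaly(True)

# Optionally watch a suspect parameter, e.g. the GTConv weight:
def report_grad(grad):
    if not torch.isfinite(grad).all():
        print("non-finite gradient detected:", grad)
    return grad

# model.layers[0].conv1.weight.register_hook(report_grad)   # hypothetical attribute path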

The extra complexity caused by the implementation

Dear authors,

This paper makes a great contribution to HIN embedding. However, one significant drawback of your code hinders its potential [1]. Your implementation directly computes the product of matrices A_1 A_2 ... A_n (though exactly two matrices are used in the code) and then applies it to the vector x. This straightforward implementation costs extremely high resources even though the sparse format is used. The suggested implementation would be to recursively multiply the matrices into x one by one. To implement this, multiple torch_geometric GCN models with different edge weights could be instantiated to process x recursively, with some of them disabling the linear transformation.

  1. Qingsong Lv, et al. 2021. Are we really making much progress? Revisiting, benchmarking and refining heterogeneous graph neural networks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD '21). Association for Computing Machinery, New York, NY, USA, 1150–1160.
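
To make the suggested change concrete: both evaluation orders give the same result, but multiplying into x from the right keeps every intermediate at the size of x instead of materializing the (typically much denser) product of adjacency matrices. A small illustrative sketch:

import numpy as np
import scipy.sparse as sp

N, F = 1000, 16
A1 = sp.random(N, N, density=0.01, format="csr")
A2 = sp.random(N, N, density=0.01, format="csr")
x = np.random.rand(N, F)

# Straightforward order: build A1 @ A2 first, then apply it to x.
y_direct = (A1 @ A2) @ x

# Suggested order: multiply the matrices into x one by one; intermediates stay N x F.
y_recursive = A1 @ (A2 @ x)

print(np.allclose(y_direct, y_recursive))   # True up to floating-point error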

How to get meta-paths? I guess they are obtained from the trained matrix Ws? How can I obtain meta-paths based on Ws?

I came across your paper “Graph Transformer Networks” (arXiv:1911.06455v1), and I am very interested in your algorithm. I am sure this algorithm is promising in the field of AI.

Taking the DBLP dataset as an example, I got Ws with dimension “Te X 4” (is that right?) after 4 GT operations; each element in Ws indicates the contribution of an edge type to the obtained meta-paths. Then I can calculate the probabilities of meta-paths with 2, 3, and 4 elements.
Example of Ws:

https://lh3.googleusercontent.com/YuXRKo2fhYpNu8hLI9gFMQYMRzeU91OXVJZdXpseCoLVPfcD0CEGY8sZDTt53rLQJVxoVig=s170

The probability of meta-path ABCD would be calculated as W(a,1)*W(b,2)*W(c,3)*W(d,4).

The meta-paths with the highest scores would then be selected for prediction.

Is my understanding correct?

If it is correct, how can I compare the attention scores of meta-paths with different lengths? After all, each attention score is a value in [0, 1], so longer meta-paths tend to have smaller scores.
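
For the discussion, this is the scoring scheme I understand from the description above, written out explicitly (my own sketch of that interpretation, not the authors' code):

import itertools
import numpy as np

# Hypothetical per-layer attention weights over Te edge types,
# e.g. one softmax-normalized row of Ws per GT layer.
Te, num_layers = 5, 4
W = np.random.dirichlet(np.ones(Te), size=num_layers)   # W[l, e] sums to 1 over e for each layer l

# Score a length-L meta-path (e_1, ..., e_L) as the product of its per-layer weights.
scores = {
    path: np.prod([W[l, e] for l, e in enumerate(path)])
    for path in itertools.product(range(Te), repeat=num_layers)
}
best = max(scores, key=scores.get)
print(best, scores[best])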

Datasets link is not working

Hi, I was trying to reproduce the DBLP results, but it seems the dataset link is not working.
Could you please provide another link to download the dataset?
Thanks in advance.

Memory issue

Thanks for sharing! This is really good work! I want to know how much memory is required, for both the sparse and dense versions. Looking forward to your reply! Thank you again!

About the graph A

Hi,
What is the shape of the graph A before the Graph Transformer processes it?
