qitianwu / nodeformer

The official implementation of NeurIPS22 spotlight paper "NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification"

Python 25.80% Shell 1.41% Jupyter Notebook 72.79%

graph-neural-networks graph-structure-learning graph-transformer node-classification pytorch image-classification large-graph neurips-2022 pytorch-geometric relational-learning

nodeformer's Issues

Clarification on the Role of 'K' in kernelized_gumbel_softmax Function

I have a question regarding the 'kernelized_gumbel_softmax' function. Specifically, I am curious about the role of 'K' in this context. Does 'K' serve a purpose similar to the number of heads in multihead attention?

Is the relational bias still calculated in w/o graph circumstances?

Hi Qitian,

Thanks for this fantastic work!

In nodeformer.py line 284:

        # compute update by relational bias of input adjacency, requires O(E)
        for i in range(self.rb_order):
            z_next += add_conv_relational_bias(value, adjs[i], self.b[i], self.rb_trans)

I am a little confused: in w/o graph circumstances (e.g. #2), is the relational bias still calculated?
Thanks.

Same results for the last four runs

Hello, I've encountered an issue while running the model on the official splits of Cora, CiteSeer, and Pubmed. Despite conducting five separate runs, the results for the last four runs are identical. I have meticulously ensured that the parameters are reset at the beginning of each run. Could this be a mere coincidence, or might there be an underlying error that I'm missing?

Request for the slides (PPT)

Hello, the link to the slides (PPT) for this paper appears to be broken. Could the author upload a new copy?

Errors occur while calling NodePropPredDataset in dataset.py

I can run your code correctly on small datasets using the scripts in run.sh and get results similar to those in the paper, but when I try to reproduce NodeFormer on large graph datasets, an error occurs on both the amazon2m and ogbn-proteins datasets.

Traceback (most recent call last):
  File "main-batch.py", line 43, in <module>
    dataset = load_dataset(args.data_dir, args.dataset, args.sub_dataset)
  File "/home/workspace/NF/NodeFormer/dataset.py", line 102, in load_dataset
    dataset = load_amazon2m_dataset(data_dir)
  File "/home/workspace/NF/NodeFormer/dataset.py", line 308, in load_amazon2m_dataset
    ogb_dataset = NodePropPredDataset(name='ogbn-products', root=f'{data_dir}/ogb')
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 63, in __init__
    self.pre_process()
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 111, in pre_process
    additional_node_files = self.meta_info['additional node files'].split(',')
AttributeError: 'float' object has no attribute 'split'

This error occurs in dataset.py when calling NodePropPredDataset:

NodeFormer/dataset.py

Line 290 in 64d2658

ogb_dataset = NodePropPredDataset(name=name, root=f'{data_dir}/ogb')

NodeFormer/dataset.py

Line 306 in 64d2658

ogb_dataset = NodePropPredDataset(name='ogbn-products', root=f'{data_dir}/ogb')

I tried to fix this error by editing the implementation of NodePropPredDataset to convert the 'float' object to 'str':

additional_node_files = str(self.meta_info['additional node files']).split(',')

That line then passed, but another error came up:

Loading necessary files...
This might take a while.
Traceback (most recent call last):
  File "main-batch.py", line 43, in <module>
    dataset = load_dataset(args.data_dir, args.dataset, args.sub_dataset)
  File "/home/workspace/NF/NodeFormer/dataset.py", line 98, in load_dataset
    dataset = load_proteins_dataset(data_dir)
  File "/home/workspace/NF/NodeFormer/dataset.py", line 268, in load_proteins_dataset
    ogb_dataset = NodePropPredDataset(name='ogbn-proteins', root=f'{data_dir}/ogb')
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 63, in __init__
    self.pre_process()
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 137, in pre_process
    self.graph = read_csv_graph_raw(raw_dir, add_inverse_edge = add_inverse_edge, additional_node_files = additional_node_files, additional_edge_files = additional_edge_files)[0] # only a single graph
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/io/read_graph_raw.py", line 83, in read_csv_graph_raw
    temp = pd.read_csv(osp.join(raw_dir, additional_file + '.csv.gz'), compression='gzip', header = None).values
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 577, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1661, in _make_engine
    self.handles = get_handle(
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/common.py", line 753, in get_handle
    handle = gzip.GzipFile(  # type: ignore[assignment]
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/gzip.py", line 173, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '../data//ogb/ogbn_proteins/raw/nan.csv.gz'

I don't know what's happening. If you need more information, please let me know.
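The two tracebacks above look connected. Converting a NaN float with str() yields the text 'nan', which the loader then treats as a file name ('nan.csv.gz'). The sketch below reproduces that step in plain Python and shows one possible guard. `parse_file_list` is a hypothetical helper written for illustration and is not part of ogb. The file names are made up, and why the metadata field is NaN in this environment is not established here.

```python
import math


def parse_file_list(value):
    """Split a comma-separated metadata field, treating missing values as empty.

    Hypothetical helper for illustration; not an ogb function.
    """
    if value is None:
        return []
    if isinstance(value, float) and math.isnan(value):
        return []
    text = str(value).strip()
    if text in ("", "None", "nan"):
        return []
    return [name.strip() for name in text.split(",")]


# The workaround from the issue: a NaN field becomes the bogus file name 'nan'.
workaround = str(float("nan")).split(",")
print(workaround)  # ['nan']

# With the guard, a missing field yields no additional files at all.
print(parse_file_list(float("nan")))  # []

# A real comma-separated value is still split as before (names are made up).
print(parse_file_list("file_a,file_b"))  # ['file_a', 'file_b']
```

A guard of this kind only hides the symptom. If the field is expected to be a string, comparing the installed pandas and ogb versions against the ones the repository was tested with may be the more direct route.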

System Info

WSL2 Ubuntu 20.04LTS
anaconda3
python 3.8.16
torch 1.9.0+cu111
torch-cluster 1.5.9
torch-geometric 1.7.2
torch-sparse 0.6.12
torch-scatter 2.0.9
torch-spline-conv 1.2.1
ogb 1.3.1
numpy 1.22.4
networkx 2.6.1
scipy 1.6.2
scikit-learn 1.1.3

ValueError: Both 'src' and 'index' must be on the same device (got 'cpu' and 'cuda:1')

This bug occurs when I do node classification experiments on the dataset ogbn-proteins.

Traceback (most recent call last):
  File "/NodeFormer/main-batch.py", line 138, in <module>
    edge_index_i, _ = subgraph(idx_i, adjs[0], num_nodes=n, relabel_nodes=True)
  File "/python3.10/site-packages/torch_geometric/utils/subgraph.py", line 104, in subgraph
    edge_index, _ = map_index(
  File "/python3.10/site-packages/torch_geometric/utils/map.py", line 63, in map_index
    raise ValueError(f"Both 'src' and 'index' must be on the same device "

Regarding DGL Implementation

Has the author implemented this in the DGL framework, or are there any examples of implementing a graph transformer using DGL?

the training code for the GCN, SGC models on the ogbn-proteins and amazon2m datasets

Hello! The work you have done is very interesting and has inspired me a lot. I am trying to reproduce the results of your paper for the GCN, SGC models on the ogbn-proteins and amazon2m datasets. Can you provide me with the training code in your experiments (similar to main-batch.py, but main-batch.py only contains the training code for nodeformer, not for GCN, SGC) and the associated hyperparameters? This would allow us to better follow your work. Thanks!

What is the principle of exchanging the first two dimensions when calculating QKV attention?

While reading the NodeFormer source code, I found that the first and second dimensions of query/key/value are swapped before the QKV attention is computed (lines 169-171 of nodeformer.py). After the attention is computed, the first two dimensions are swapped back for normalization.
At first I thought this step was unnecessary, but when I commented out the code the program ran out of memory. I am therefore very curious about the principle behind it.
Does placing node_number in the second dimension affect the complexity of the matrix multiplication when computing the product of key and value, so that node_number was moved to the first dimension in advance?
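As background for the complexity part of this question, the sketch below counts scalar multiplications for the two ways of associating a kernelized attention product. It does not confirm what the transposition in nodeformer.py does. It only illustrates that the axis contracted first decides whether an N x N intermediate is created. The sizes and function names are illustrative assumptions.

```python
def matmul_cost(n_rows, n_inner, n_cols):
    """Scalar multiplications for an (n_rows x n_inner) @ (n_inner x n_cols) product."""
    return n_rows * n_inner * n_cols


def quadratic_order_cost(n, m, d):
    """(Q' K'^T) V: builds an n x n intermediate before touching V."""
    return matmul_cost(n, m, n) + matmul_cost(n, n, d)


def linear_order_cost(n, m, d):
    """Q' (K'^T V): contracts over the n nodes first, leaving an m x d intermediate."""
    return matmul_cost(m, n, d) + matmul_cost(n, m, d)


# Illustrative sizes: n nodes, m random features, d hidden channels.
n, m, d = 10_000, 30, 64
print(quadratic_order_cost(n, m, d))  # 9400000000
print(linear_order_cost(n, m, d))     # 38400000
print(n * n, m * d)                   # intermediate entries: 100000000 vs 1920
```

If a contraction ends up running over the wrong axis, the intermediate can grow with the square of the node count, which would be consistent with the out-of-memory failure described above.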

How to reproduce the result on Mini-ImageNet (w/o graph)?

Thank you for your great work.
I've noticed that, according to the paper, NodeFormer can yield superior results on Mini-ImageNet when the input k-NN graph is not used. This suggests that the k-NN graphs are not necessarily informative and that NodeFormer learns useful latent graph structures from the data.
I tried to train NodeFormer on Mini-ImageNet without the input k-NN graph, removing both the edge regularization and the relational bias, but I only achieved an accuracy of 83.76 ± 0.78%. When training NodeFormer on Mini-ImageNet with a k-NN graph (k=5), I achieved an accuracy of 87.01 ± 0.45%. I wonder whether I missed some important details of the training process.

Code without inductive settings????

Hello, you explained in the experiment of your paper that this is an inductive setting, but I did not see the code for the inference stage of unknown nodes in the inductive setting. Do you have this code?

Deezer Europe Dataset on SGFormer/NodeFormer

Hi,

The SGFormer paper reports that NodeFormer achieves 66% test accuracy on the DeezerEurope dataset, whereas the NodeFormer paper reports 71% test accuracy on the same dataset.

Which one is accurate, and what is the reason for the discrepancy? I ran the NodeFormer code on the DeezerEurope dataset and got around 71% test accuracy.

Thanks

BUG: x_i doesn't match edge_index_i in main-batch.py

Description

Maybe there is a bug in main-batch.py because of random permutation of index:

NodeFormer/main-batch.py

Lines 133 to 144 in 64d2658

idx = torch.randperm(train_idx.size(0))
for i in range(num_batch):
    idx_i = train_idx[idx[i*args.batch_size:(i+1)*args.batch_size]]
    x_i = x[idx_i].to(device)
    adjs_i = []
    edge_index_i, _ = subgraph(idx_i, adjs[0], num_nodes=n, relabel_nodes=True)
    adjs_i.append(edge_index_i.to(device))
    for k in range(args.rb_order - 1):
        edge_index_i, _ = subgraph(idx_i, adjs[k+1], num_nodes=n, relabel_nodes=True)
        adjs_i.append(edge_index_i.to(device))
    optimizer.zero_grad()
    out_i, link_loss_ = model(x_i, adjs_i, args.tau)

In line 135, train_idx is sorted but idx_i is shuffled, so in line 136 x_i is also randomly permuted, which means the original node order has changed.
However, in line 138 the node indices in adjs[0] are still in order, and subgraph() preserves that order.
As a result, in line 144 x_i does not align with adjs_i.

I then changed the code as follows so that the edge indices match the node order of x_i:

idx = torch.randperm(train_idx.size(0))
for i in range(num_batch):
    idx_i = train_idx[idx[i * args.batch_size:(i + 1) * args.batch_size]]
    x_i = x[idx_i].to(device)
    adjs_i = []
    edge_index_i, _ = subgraph(idx_i, adjs[0], num_nodes=n, relabel_nodes=True)
   
    # Modify
    idx_perm = torch.argsort(idx_i)
    edge_index_i = idx_perm[edge_index_i]

    adjs_i.append(edge_index_i.to(device))
    for k in range(args.rb_order - 1):
        edge_index_i, _ = subgraph(idx_i, adjs[k + 1], num_nodes=n, relabel_nodes=True)
        adjs_i.append(edge_index_i.to(device))
    optimizer.zero_grad()
    out_i, link_loss_ = model(x_i, adjs_i, args.tau)

Experiments

python main-batch.py --dataset ogbn-arxiv --metric acc --method nodeformer --lr 1e-2 --weight_decay 0. --num_layers 3 --hidden_channels 64 --num_heads 1 --rb_order 1 --rb_trans identity --lamda 0.1 --M 50 --K 5 --use_bn --use_residual --use_gumbel --use_act --use_jk --batch_size 20000 --runs 1 --epochs 1000 --eval_step 9 --device 0

Before the modification, test accuracy on ogbn-arxiv is only about 55%. After the modification, it can exceed 65%.
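The misalignment described in this issue can be reproduced without PyTorch. The sketch below assumes, as the issue does, that subgraph() relabels the selected nodes by ascending node id. That behavior may depend on the installed torch_geometric version and is not verified here. `relabel_sorted` and `argsort` are hypothetical stand-ins for illustration, not library functions.

```python
def relabel_sorted(subset, edges):
    """Keep edges inside `subset` and relabel endpoints by ascending node id.

    Stand-in for the relabeling behavior assumed in the issue.
    """
    rank = {node: r for r, node in enumerate(sorted(subset))}
    keep = set(subset)
    return [(rank[s], rank[t]) for s, t in edges if s in keep and t in keep]


def argsort(values):
    """Positions that would sort `values` in ascending order."""
    return sorted(range(len(values)), key=values.__getitem__)


idx_i = [7, 2, 5]          # shuffled batch: row p of x_i holds node idx_i[p]
edges = [(2, 7), (5, 2)]   # edges expressed in original node ids

# Relabeled by sorted order: 2 -> 0, 5 -> 1, 7 -> 2.
sorted_edges = relabel_sorted(idx_i, edges)

# Reading those labels as rows of x_i points at the wrong nodes.
misread = [(idx_i[s], idx_i[t]) for s, t in sorted_edges]

# The fix from the issue: map sorted ranks to positions in the shuffled batch.
perm = argsort(idx_i)
fixed_edges = [(perm[s], perm[t]) for s, t in sorted_edges]
recovered = [(idx_i[s], idx_i[t]) for s, t in fixed_edges]

print(misread)    # [(7, 5), (2, 7)]
print(recovered)  # [(2, 7), (5, 2)]
```

Under the same assumption, sorting idx_i before indexing x would also keep features and edges aligned, since the order of nodes inside a batch carries no meaning.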

I have a list of graphs on which I want to perform node classification task. Each graph has nodes, node attributes, edges and associated labels for each node. Any guidelines on how to train this dataset?

Reproducing results on Deezer dataset

Hi there,

Thanks for the fantastic work!

I'm running run.sh and find out that the metric for Deezer is set to be "rocauc", while the paper uses accuracy as the metric on Deezer (shown in Figure 2). When I change the metric from "rocauc" to "acc" in run.sh, the averaged accuracy is 65.12%, which is much lower than the accuracy reported in the paper (~71%). Could you kindly let me know the proper hyperparameter setting for reproducing the results on Deezer? Thanks in advance!

Accurate scores of Figure 2 in the paper

Hi, your paper is interesting and I would like to follow up on your work. However, I noticed that for the experiments on cora, citeseer, deezer-europe and actor (Figure 2 in the paper) there are only plots and no exact scores. Would it be possible to share the exact scores used for plotting? Thanks in advance.

'NoneType' object has no attribute 'origin'

I have the 20news dataset, but when I run main.py the code fails.
The command and traceback are:
python main.py --dataset 20news --metric acc --rand_split --method nodeformer --lr 0.001 --weight_decay 5e-3 --num_layers 2 --hidden_channels 64 --num_heads 4 --rb_order 2 --rb_trans sigmoid --lamda 1.0 --M 30 --K 10 --use_bn --use_residual --use_gumbel --run 5 --epochs 200 --device 1
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    from torch_geometric.utils import to_undirected, remove_self_loops, add_self_loops
  File "/home/zhengyt/anaconda3/envs/Node/lib/python3.8/site-packages/torch_geometric/__init__.py", line 5, in <module>
    import torch_geometric.data
  File "/home/zhengyt/anaconda3/envs/Node/lib/python3.8/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
    from .data import Data
  File "/home/zhengyt/anaconda3/envs/Node/lib/python3.8/site-packages/torch_geometric/data/data.py", line 8, in <module>
    from torch_sparse import coalesce, SparseTensor
  File "/home/zhengyt/anaconda3/envs/Node/lib/python3.8/site-packages/torch_sparse/__init__.py", line 14, in <module>
    torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
AttributeError: 'NoneType' object has no attribute 'origin'
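The last frame of this traceback calls importlib.machinery.PathFinder().find_spec(...), which returns None when no matching module file exists in the searched directories. Accessing .origin on that None then produces exactly this message. The sketch below reproduces the mechanism with the standard library only. The module name `_version_cpu` and the empty temporary directory are illustrative assumptions, and the reading that a compiled torch_sparse extension is missing or mismatched in this environment is a guess, not a confirmed diagnosis.

```python
import importlib.machinery
import tempfile

# Search an empty directory for a module that cannot be there.
with tempfile.TemporaryDirectory() as empty_dir:
    spec = importlib.machinery.PathFinder().find_spec("_version_cpu", [empty_dir])

print(spec)  # None

# Using the result without checking it reproduces the error from the issue.
try:
    spec.origin
except AttributeError as err:
    message = str(err)
    print(message)  # 'NoneType' object has no attribute 'origin'
```

If that reading is right, the error comes from the installation rather than from main.py or the 20news dataset, so checking that torch-sparse was built for the installed torch and CUDA versions would be a reasonable first step.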

Clarification on Visualization Techniques for Graphs 4 and 7 under Linear Complexity Constraints

I am currently examining the methodologies presented in your paper and am unclear about the techniques used to visualize Graphs 4 and 7 under linear complexity constraints. According to Equation 7, it seems infeasible to compute the attention scores between any two nodes using a linear approximation.
Could you please clarify the specific approach utilized for these visualizations under the mentioned constraints? Additionally, if there are any potential modifications or alternative methods that could be recommended to handle these calculations more feasibly, I would appreciate your insights.

$$z_{(l+1)u} \approx \frac{\phi(q_{u}/\sqrt{\tau})^T\sum_{\nu=1}^N e^{g_{\nu}/\tau} \phi(k_{\nu}/\sqrt{\tau}) v_{\nu}^T}{\phi(q_{u}/\sqrt{\tau})^T\sum_{\omega=1}^N e^{g_{\omega}/\tau} \phi(k_{\omega}/\sqrt{\tau}) } $$
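One observation that follows from the formula itself, and is not confirmed here as the method the authors used for the visualizations: the numerator is a sum over $\nu$, so the weight that node $u$ places on a single node $v$ is the $\nu = v$ summand divided by the same denominator. The symbol $\tilde{a}_{uv}$ is introduced here for illustration only.

$$\tilde{a}_{uv} \approx \frac{e^{g_{v}/\tau}\, \phi(q_{u}/\sqrt{\tau})^T \phi(k_{v}/\sqrt{\tau})}{\phi(q_{u}/\sqrt{\tau})^T\sum_{\omega=1}^N e^{g_{\omega}/\tau} \phi(k_{\omega}/\sqrt{\tau})}, \qquad z_{(l+1)u} \approx \sum_{v=1}^N \tilde{a}_{uv}\, v_{v}^T$$

The sum in the denominator does not depend on $u$ or $v$, so it can be computed once in time linear in $N$. Each individual $\tilde{a}_{uv}$ then costs one inner product of feature vectors, which means the scores for a chosen subset of node pairs can be evaluated without forming the full $N \times N$ matrix.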

Regarding the edge-level regularization w/o input graph

Hello,
thank you for your very awesome work!
I have one question: if there is no input graph, does that mean we cannot construct any edge-level regularization loss? If so, is the remaining supervision enough to train a model with such a high degree of freedom? I would also like to know whether you have suggestions for the case where there is no input graph and not enough supervised information. Thank you!
