qitianwu / nodeformer
The official implementation of NeurIPS22 spotlight paper "NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification"
Hi Qitian,
Thanks for this fantastic work!
In nodeformer.py, line 284:
# compute update by relational bias of input adjacency, requires O(E)
for i in range(self.rb_order):
    z_next += add_conv_relational_bias(value, adjs[i], self.b[i], self.rb_trans)
I am a little confused: in the setting without an input graph (e.g. #2), is the relational bias still calculated?
Thanks.
Hello, I've encountered an issue while running the model on the official splits of Cora, CiteSeer, and Pubmed. Despite conducting five separate runs, the results for the last four runs are identical. I have meticulously ensured that the parameters are reset at the beginning of each run. Could this be a mere coincidence, or might there be an underlying error that I'm missing?
Hello, the slides (PPT) for this paper no longer seem to be available. Could the author upload a new copy?
I can run your code correctly on the small datasets using the scripts in run.sh and get results similar to those in the paper. But when I try to reproduce NodeFormer on the large graph datasets, I get an error on both the amazon2m and ogbn-proteins datasets.
Traceback (most recent call last):
File "main-batch.py", line 43, in <module>
dataset = load_dataset(args.data_dir, args.dataset, args.sub_dataset)
File "/home/workspace/NF/NodeFormer/dataset.py", line 102, in load_dataset
dataset = load_amazon2m_dataset(data_dir)
File "/home/workspace/NF/NodeFormer/dataset.py", line 308, in load_amazon2m_dataset
ogb_dataset = NodePropPredDataset(name='ogbn-products', root=f'{data_dir}/ogb')
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 63, in __init__
self.pre_process()
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 111, in pre_process
additional_node_files = self.meta_info['additional node files'].split(',')
AttributeError: 'float' object has no attribute 'split'
This error occurs in dataset.py when calling NodePropPredDataset:
Line 290 in 64d2658
Line 306 in 64d2658
I tried to fix this error by editing the implementation of NodePropPredDataset, converting the 'float' object to 'str':
additional_node_files = str(self.meta_info['additional node files']).split(',')
That line then passed, but another error came up:
Loading necessary files...
This might take a while.
Traceback (most recent call last):
File "main-batch.py", line 43, in <module>
dataset = load_dataset(args.data_dir, args.dataset, args.sub_dataset)
File "/home/workspace/NF/NodeFormer/dataset.py", line 98, in load_dataset
dataset = load_proteins_dataset(data_dir)
File "/home/workspace/NF/NodeFormer/dataset.py", line 268, in load_proteins_dataset
ogb_dataset = NodePropPredDataset(name='ogbn-proteins', root=f'{data_dir}/ogb')
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 63, in __init__
self.pre_process()
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 137, in pre_process
self.graph = read_csv_graph_raw(raw_dir, add_inverse_edge = add_inverse_edge, additional_node_files = additional_node_files, additional_edge_files = additional_edge_files)[0] # only a single graph
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/io/read_graph_raw.py", line 83, in read_csv_graph_raw
temp = pd.read_csv(osp.join(raw_dir, additional_file + '.csv.gz'), compression='gzip', header = None).values
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
return _read(filepath_or_buffer, kwds)
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 577, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
self._engine = self._make_engine(f, self.engine)
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1661, in _make_engine
self.handles = get_handle(
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/common.py", line 753, in get_handle
handle = gzip.GzipFile( # type: ignore[assignment]
File "/root/anaconda3/envs/nodeformer/lib/python3.8/gzip.py", line 173, in __init__
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '../data//ogb/ogbn_proteins/raw/nan.csv.gz'
I don't know what's happening. If you need more information, please let me know.
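Both tracebacks above are consistent with the 'additional node files' metadata entry arriving as a float NaN instead of the string 'None'. That is an assumption about the cause, not something confirmed in this thread. The sketch below uses a hypothetical helper, parse_additional_files, to show why wrapping the value in str() only moves the failure (it produces the file name 'nan', hence nan.csv.gz) and how a missing-value check avoids it.

```python
import math


def parse_additional_files(value):
    """Hypothetical helper: turn a metadata entry into a list of file names.

    None, the string 'None', an empty string, and float NaN all mean
    "no additional files".
    """
    if value is None:
        return []
    if isinstance(value, float) and math.isnan(value):
        return []
    text = str(value).strip()
    if text in ("", "None"):
        return []
    return [name.strip() for name in text.split(",")]


# What the str() workaround does with a NaN entry: one bogus file called 'nan'.
naive = str(float("nan")).split(",")

# What the guarded helper returns for the same entry, and for a real list.
guarded = parse_additional_files(float("nan"))
listed = parse_additional_files("node_species, node_year")
```

If the cause is as assumed, handling the missing value at the point where the metadata is parsed is safer than casting it to str, because the cast silently turns "no files" into a request for a file that does not exist.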
This bug occurs when I do node classification experiments on the dataset ogbn-proteins.
Traceback (most recent call last):
File "/NodeFormer/main-batch.py", line 138, in
edge_index_i, _ = subgraph(idx_i, adjs[0], num_nodes=n, relabel_nodes=True)
File "/python3.10/site-packages/torch_geometric/utils/subgraph.py", line 104, in subgraph
edge_index, _ = map_index(
File "/python3.10/site-packages/torch_geometric/utils/map.py", line 63, in map_index
raise ValueError(f"Both 'src' and 'index' must be on the same device "
Has the author implemented this in the DGL framework, or are there any examples of implementing a graph transformer using DGL?
Hello! The work you have done is very interesting and has inspired me a lot. I am trying to reproduce the results of your paper for the GCN, SGC models on the ogbn-proteins and amazon2m datasets. Can you provide me with the training code in your experiments (similar to main-batch.py, but main-batch.py only contains the training code for nodeformer, not for GCN, SGC) and the associated hyperparameters? This would allow us to better follow your work. Thanks!
When reading the source code of NodeFormer, I found that when calculating QKV attention, the first and second dimensions of query/key/value were exchanged, such as lines 169-171 of nodeformer.py. After calculating attention, the first two dimensions were exchanged again when performing normalization.
At first I thought this step was unnecessary, but when I commented it out the program ran out of memory. I am therefore very curious about the reasoning behind it.
Does placing node_number in the second dimension affect the complexity of the matrix multiplication when computing the product of key and value? Is that why node_number is moved to the first dimension beforehand?
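One possible explanation, offered as a minimal NumPy sketch rather than a statement about nodeformer.py: if the key-value product is written as an einsum whose subscripts expect the node axis first, then feeding it batch-first tensors makes it contract over the batch axis instead, and the intermediate grows with the number of nodes. The subscript strings, shapes, and variable names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, B, H, M, D = 6, 1, 2, 4, 3  # nodes, batch, heads, random features, value dim

# Batch-first layout, as the tensors look before the first two dims are swapped.
q_bf = rng.random((B, N, H, M))
k_bf = rng.random((B, N, H, M))
v_bf = rng.random((B, N, H, D))

# Node-first layout, after exchanging the first two dimensions.
q, k, v = (t.transpose(1, 0, 2, 3) for t in (q_bf, k_bf, v_bf))

# Linear-cost path: sum over the node axis first, then apply the queries.
kv = np.einsum("nbhm,nbhd->bhmd", k, v)           # shape (B, H, M, D)
out_linear = np.einsum("nbhm,bhmd->nbhd", q, kv)  # shape (N, B, H, D)

# Quadratic reference: materialize all N x N scores, then multiply by values.
scores = np.einsum("nbhm,lbhm->bhnl", q, k)       # shape (B, H, N, N)
out_quadratic = np.einsum("bhnl,lbhd->nbhd", scores, v)

# Same einsum applied to the un-swapped tensors: the sum now runs over the
# batch axis (size 1), so the result keeps a node-sized leading dimension.
kv_unswapped = np.einsum("nbhm,nbhd->bhmd", k_bf, v_bf)  # shape (N, H, M, D)
```

In this sketch the swap does not change the arithmetic cost of a correctly specified contraction; it decides which axis the fixed subscripts sum over. With the swap the intermediate has B*H*M*D entries, without it N*H*M*D, which on a graph with millions of nodes could plausibly exhaust memory.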
Thank you for your great work.
I've noticed that when the input k-NN graph is not used, Nodeformer can yield superior results on Mini-ImageNet. This suggests that the k-NN graphs are not necessarily informative and besides, Nodeformer learns useful latent graph structures from data.
I tried to train Nodeformer on Mini-ImageNet without the input k-NN graph, removing both the edge regularization and the relational bias, but I only achieved an accuracy of 83.76 ± 0.78%. When training Nodeformer on Mini-ImageNet with a k-NN graph (k=5), I was able to achieve an accuracy of 87.01 ± 0.45%. I wonder if I missed some important details in the training process.
Hello, you explained in the experiment of your paper that this is an inductive setting, but I did not see the code for the inference stage of unknown nodes in the inductive setting. Do you have this code?
Hi,
The SGFormer paper reports that NodeFormer achieves 66% test accuracy on the DeezerEurope dataset, whereas the NodeFormer paper reports 71% test accuracy on the same dataset.
Which one is accurate, and what is the reason for the discrepancy? I ran the NodeFormer code on the DeezerEurope dataset and got around 71% test accuracy.
Thanks
There may be a bug in main-batch.py caused by the random permutation of indices:
Lines 133 to 144 in 64d2658
In line 135, train_idx is sorted but idx_i is shuffled, so in line 136 x_i is also randomly permuted, which means the original node order has changed.
However, in line 138 the node indices in adjs[0] are still in order, and subgraph() also preserves that order.
As a result, in line 144 x_i does not align with adjs_i.
I then changed the code as follows so that the node indices keep the original order:
idx = torch.randperm(train_idx.size(0))
for i in range(num_batch):
    idx_i = train_idx[idx[i * args.batch_size:(i + 1) * args.batch_size]]
    x_i = x[idx_i].to(device)
    adjs_i = []
    edge_index_i, _ = subgraph(idx_i, adjs[0], num_nodes=n, relabel_nodes=True)
    # Modify: map the sorted relabeling back to the shuffled order of idx_i
    idx_perm = torch.argsort(idx_i)
    edge_index_i = idx_perm[edge_index_i]
    adjs_i.append(edge_index_i.to(device))
    for k in range(args.rb_order - 1):
        edge_index_i, _ = subgraph(idx_i, adjs[k + 1], num_nodes=n, relabel_nodes=True)
        # Same remapping for higher-order adjacencies (only reached when rb_order > 1)
        edge_index_i = idx_perm[edge_index_i]
        adjs_i.append(edge_index_i.to(device))
    optimizer.zero_grad()
    out_i, link_loss_ = model(x_i, adjs_i, args.tau)
python main-batch.py --dataset ogbn-arxiv --metric acc --method nodeformer --lr 1e-2 --weight_decay 0. --num_layers 3 --hidden_channels 64 --num_heads 1 --rb_order 1 --rb_trans identity --lamda 0.1 --M 50 --K 5 --use_bn --use_residual --use_gumbel --use_act --use_jk --batch_size 20000 --runs 1 --epochs 1000 --eval_step 9 --device 0
Before the modification, test accuracy on ogbn-arxiv is only about 55%. After the modification, it can exceed 65%.
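The argsort remapping can be checked on a toy graph. The NumPy sketch below assumes, as the report does, that relabeling assigns new ids by rank in the sorted subset; relabel_sorted is a hypothetical stand-in written for this illustration, not the torch_geometric function, whose behaviour may differ between versions.

```python
import numpy as np


def relabel_sorted(edge_index, subset):
    """Hypothetical stand-in: keep edges inside `subset` and relabel each
    endpoint by its rank in the sorted subset."""
    sorted_nodes = np.sort(subset)
    keep = np.isin(edge_index[0], subset) & np.isin(edge_index[1], subset)
    return np.searchsorted(sorted_nodes, edge_index[:, keep])


# Path graph 0-1-2-3-4, with features chosen so node v has feature 10 * v.
edge_index = np.array([[0, 1, 2, 3],
                       [1, 2, 3, 4]])
x = np.array([0, 10, 20, 30, 40])

idx_i = np.array([3, 1, 2])  # shuffled mini-batch of node ids
x_i = x[idx_i]               # rows follow the shuffled order: [30, 10, 20]

# Edges kept: 1-2 and 2-3, relabeled by sorted rank to 0-1 and 1-2.
edges_sorted = relabel_sorted(edge_index, idx_i)

# Rank r in the sorted subset sits at position argsort(idx_i)[r] in idx_i.
idx_perm = np.argsort(idx_i)
edges_fixed = idx_perm[edges_sorted]

# Translate row positions back to original node ids to see what each
# version actually connects.
nodes_before_fix = idx_i[edges_sorted]  # [[3, 1], [1, 2]]: not real edges
nodes_after_fix = idx_i[edges_fixed]    # [[1, 2], [2, 3]]: the real edges
```

Under that assumption, the unfixed edge list connects rows of x_i that belong to the wrong nodes, while the remapped one recovers the original edges 1-2 and 2-3.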
Hi there,
Thanks for the fantastic work!
I'm running run.sh and found that the metric for Deezer is set to "rocauc", while the paper uses accuracy as the metric on Deezer (shown in Figure 2). When I change the metric from "rocauc" to "acc" in run.sh, the averaged accuracy is 65.12%, which is much lower than the accuracy reported in the paper (~71%). Could you kindly let me know the proper hyperparameter setting for reproducing the results on Deezer? Thanks in advance!
Hi, your paper is interesting and I would like to follow up on your work. But I noticed that for the experiments on cora, citeseer, deezer-europe and actor (Figure 2 in the paper) there are only plots and no exact scores. Would it be possible to share the exact scores used for plotting? Thanks in advance.
When I run main.py with the 20news dataset, the code fails.
The command and traceback are:
python main.py --dataset 20news --metric acc --rand_split --method nodeformer --lr 0.001 --weight_decay 5e-3 --num_layers 2 --hidden_channels 64 --num_heads 4 --rb_order 2 --rb_trans sigmoid --lamda 1.0 --M 30 --K 10 --use_bn --use_residual --use_gumbel --run 5 --epochs 200 --device 1
Traceback (most recent call last):
File "main.py", line 8, in <module>
from torch_geometric.utils import to_undirected, remove_self_loops, add_self_loops
File "/home/zhengyt/anaconda3/envs/Node/lib/python3.8/site-packages/torch_geometric/__init__.py", line 5, in <module>
import torch_geometric.data
File "/home/zhengyt/anaconda3/envs/Node/lib/python3.8/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
from .data import Data
File "/home/zhengyt/anaconda3/envs/Node/lib/python3.8/site-packages/torch_geometric/data/data.py", line 8, in <module>
from torch_sparse import coalesce, SparseTensor
File "/home/zhengyt/anaconda3/envs/Node/lib/python3.8/site-packages/torch_sparse/__init__.py", line 14, in <module>
torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
AttributeError: 'NoneType' object has no attribute 'origin'
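The last frame reads an attribute from the result of PathFinder().find_spec(...), and find_spec returns None when it finds nothing under the given name. So the error suggests torch_sparse could not locate one of its compiled libraries. A common reason is an install that does not match the local torch or CUDA build, though that is an assumption here. The standard-library sketch below reproduces the same message with a deliberately missing module; the module name and directory are hypothetical.

```python
import importlib.machinery


def find_extension_origin(name, search_dirs):
    """Return the file path of module `name` under `search_dirs`,
    or None if no such module can be found there."""
    spec = importlib.machinery.PathFinder().find_spec(name, search_dirs)
    if spec is None:
        return None
    return spec.origin


# Guarded lookup: a missing extension is reported as None.
missing = find_extension_origin("_hypothetical_missing_ext", ["/nonexistent/dir"])

# Unguarded lookup, as in the traceback: reading .origin from None fails.
try:
    importlib.machinery.PathFinder().find_spec(
        "_hypothetical_missing_ext", ["/nonexistent/dir"]
    ).origin
    message = ""
except AttributeError as err:
    message = str(err)
```

Listing the torch_sparse install directory to check whether the compiled library files are present would confirm or rule out this explanation.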
I am currently examining the methodologies presented in your paper and am unclear about the techniques used to visualize Graphs 4 and 7 under linear complexity constraints. According to Equation 7, it seems infeasible to compute the attention scores between any two nodes using a linear approximation.
Could you please clarify the specific approach utilized for these visualizations under the mentioned constraints? Additionally, if there are any potential modifications or alternative methods that could be recommended to handle these calculations more feasibly, I would appreciate your insights.
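One way such scores could be obtained without forming the full N x N matrix, offered as a sketch and not as the procedure used for the paper's figures: with a positive random feature map phi, the approximate score for a pair (u, v) is phi(q_u)·phi(k_v) divided by phi(q_u)·sum_w phi(k_w). The sum over keys is computed once in O(N m), after which each requested pair costs O(m). The feature map, function names, and shapes below are assumptions for illustration.

```python
import numpy as np


def positive_random_features(x, proj):
    """Map rows of x (n, d) to strictly positive features (n, m),
    using a projection matrix proj of shape (m, d)."""
    m = proj.shape[0]
    sq_norm = 0.5 * np.sum(x ** 2, axis=1, keepdims=True)
    return np.exp(x @ proj.T - sq_norm) / np.sqrt(m)


def approx_attention_pairs(q, k, proj, pairs):
    """Approximate attention scores for selected (query, key) index pairs."""
    u, v = pairs[:, 0], pairs[:, 1]
    phi_k = positive_random_features(k, proj)     # (N, m)
    key_sum = phi_k.sum(axis=0)                   # (m,), computed once
    phi_q = positive_random_features(q[u], proj)  # only the requested queries
    numer = np.sum(phi_q * phi_k[v], axis=1)
    denom = phi_q @ key_sum
    return numer / denom


rng = np.random.default_rng(0)
N, d, m = 8, 4, 16
q = 0.5 * rng.standard_normal((N, d))
k = 0.5 * rng.standard_normal((N, d))
proj = rng.standard_normal((m, d))

# Scores from node 0 to every node: one row of the implicit attention matrix.
row_pairs = np.stack([np.zeros(N, dtype=int), np.arange(N)], axis=1)
row_scores = approx_attention_pairs(q, k, proj, row_pairs)
```

For a visualization, the pairs could be restricted to a sampled subset of nodes, so that the cost stays linear in N plus a term proportional to the number of plotted pairs.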
Hello,
Thank you for your awesome work!
I have one question: if there is no input graph, does that mean we cannot construct any edge-level regularization loss? If so, is the remaining supervision enough to train a model with such a high degree of freedom? Are there any suggestions for handling the case where there is no input graph and only limited supervised information? Thank you!