snap-stanford / deepsnap
Python library assists deep learning on graphs
Home Page: https://snap.stanford.edu/deepsnap/
License: MIT License
It may happen that the absence of an edge in a graph is due to the absence of knowledge.
It may be useful to add the possibility to pass True Negative edges to the graph constructor.
I couldn't find the detailed tutorial of EnsembleGenerator in your document.
Could you give an example of how EnsembleGenerator works?
What is the meaning of each parameter?
Hi,
I'm getting an error in the function num_nodes because sometimes self.G is a list (line 220 in 6197dce).
Would it be possible to update this function so that it just returns the length of the attribute G.nodes? Something like this:
@property
def num_nodes(self) -> int:
    r"""
    Return number of nodes in the graph.

    Returns:
        int: Number of nodes in the graph.
    """
    if isinstance(self.G, list):
        G = self.G[0]
    else:
        G = self.G
    return len(G.nodes)
Thanks in advance
Hi,
I am trying to reproduce the examples for heterogeneous node classification but with multiple graphs, i.e. with an inductive split.
Is there any resource for implementing a model with this type of dataset? Or can you provide any guidance on implementing such a model?
Thanks in advance.
Armand
Hi there!
I would like to do negative sampling in training with a HeteroGraph for link prediction, and I am unsure how to retrieve the negative edges and their scores. I've been following the link_prediction.py script mostly. What I have done so far is the following:
# Create a dataset with negative sampling enabled
dataset = GraphDataset(
    [hetero],
    task='link_pred',
    edge_train_mode=edge_train_mode,
    edge_negative_sampling_ratio=1,
)
...
# Training
pred = model(batch)
The output of pred and batch.edge_index looks like this:
batch.edge_index
{('n4', '0', 'n1'): tensor([[1, 3, 4, 7,...4, 7, 8]]), ('n1', '1', 'n0'): tensor([[12, 13, 17,... 8, 9]]), ('n4', '2', 'n4'): tensor([[10, 14, 18,... 23, 25]]), ('n3', '3', 'n3'): tensor([[ 0, 6, 10,... 15, 17]]), ('n0', '4', 'n5'): tensor([[13, 15, 17,... 8, 9]])}
pred
{('n4', '0', 'n1'): tensor([-1.4144e-06,...ackward1>), ('n1', '1', 'n0'): tensor([-1.6777e-06,...ackward1>), ('n4', '2', 'n4'): tensor([6.2179, 6.21...ackward1>), ('n3', '3', 'n3'): tensor([ 10.5874, 1...ackward1>), ('n0', '4', 'n5'): tensor([-1.5251, -1....ackward1>)}
How do I know which of these edges are the positive and negative ones? Or are these only positive ones and I have to generate the negative ones?
I'd be thankful for any help !!
Best,
Sophia
Hello, I ran a lot of the examples in this repo, but they do not mention how my data should be modeled as a graph.
For example, how is a MultiDiGraph such as WN18.gpickle built step by step? Could you provide sample code?
To fit in GPU memory, DGL and PyG use a neighbor sampler. I searched and found no information about neighbor samplers in DeepSNAP. So is there none? If so, will you add one in the near future?
Example code:
import networkx as nx
import torch
import torch_geometric.nn as pyg_nn
from deepsnap.batch import Batch
from deepsnap.hetero_graph import HeteroGraph

G = nx.DiGraph()
G.add_node(0, node_type='n1', node_label=1, node_feature=torch.Tensor([0.1, 0.2, 0.3, 0.7]))
G.add_node(1, node_type='n1', node_label=1, node_feature=torch.Tensor([0.2, 0.3, 0.4, 0.4]))
G.add_node(2, node_type='n2', node_label=0, node_feature=torch.Tensor([0.3, 0.4, 0.5]))
G.add_edge(0, 1, edge_type='e1', edge_feature=torch.Tensor([0.1,0.1]))
G.add_edge(0, 2, edge_type='e1', edge_feature=torch.Tensor([0.1,0.1]))
G.add_edge(1, 2, edge_type='e2', edge_feature=torch.Tensor([0.1,0.1, 0.1]))
H = HeteroGraph(G)
H = Batch.from_data_list([H]).to("cuda")
nf, batch = H.node_feature, H.batch
H = pyg_nn.global_add_pool(nf, batch)
Hi,
I encountered a runtime error while splitting a dataset for link prediction tasks (see code below). This seems to be a bug for python 3.8 only, the code runs without error on python 3.7.4.
Thanks!
Environment:
Python 3.8.5
Deepsnap 0.1.1
networkx 2.5
Code run:
import networkx as nx
from deepsnap.graph import Graph
from deepsnap.dataset import GraphDataset
G = nx.complete_graph(100)
H1 = Graph(G)
H2 = H1.clone()
dataset = GraphDataset(graphs=[H1, H2], task='link_pred') # modified to link_prediction task.
train, val, test = dataset.split(transductive=True, split_ratio=[0.8, 0.1, 0.1])
print(train, val, test)
Error message:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-70292612295a> in <module>
8 dataset = GraphDataset(graphs=[H1, H2], task='link_pred')
9
---> 10 train, val, test = dataset.split(transductive=True, split_ratio=[0.8, 0.1, 0.1])
11 print(train, val, test)
~/opt/anaconda3/envs/ml/lib/python3.8/site-packages/deepsnap/dataset.py in split(self, transductive, split_ratio, split_types)
591 if transductive and self.task != 'graph':
592 dataset_return = \
--> 593 self._split_transductive(split_ratio, split_types)
594 else:
595 dataset_return = \
~/opt/anaconda3/envs/ml/lib/python3.8/site-packages/deepsnap/dataset.py in _split_transductive(self, split_ratio, split_types)
450 for graph_temp in dataset_current.graphs:
451 if type(graph_temp) == Graph:
--> 452 graph_temp._create_neg_sampling(
453 self.edge_negative_sampling_ratio)
454 elif type(graph_temp) == HeteroGraph:
~/opt/anaconda3/envs/ml/lib/python3.8/site-packages/deepsnap/graph.py in _create_neg_sampling(self, negative_sampling_ratio, resample)
911 if (len(edge_index_all) > 0):
912 negative_edges = \
--> 913 self.negative_sampling(edge_index_all, self.num_nodes, num_neg_edges)
914 else:
915 return torch.LongTensor([])
~/opt/anaconda3/envs/ml/lib/python3.8/site-packages/deepsnap/graph.py in negative_sampling(edge_index, num_nodes, num_neg_samples)
1083 rest = rest[mask.nonzero().view(-1)]
1084
-> 1085 row = perm / num_nodes
1086 col = perm % num_nodes
1087 neg_edge_index = torch.stack([row, col], dim=0).long()
RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
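The failing line decodes a flat index into a (row, col) pair, which needs floor division rather than true division. Below is a minimal sketch of that arithmetic in plain Python, with no PyTorch dependency. The helper name `decode_flat_indices` is hypothetical and not part of DeepSNAP; for tensors the same step would be written `perm // num_nodes`, as the error message suggests.

```python
def decode_flat_indices(perm, num_nodes):
    """Split flat indices in [0, num_nodes**2) into row and column lists."""
    rows, cols = [], []
    for flat in perm:
        # divmod gives the floor quotient (row) and the remainder (col)
        row, col = divmod(flat, num_nodes)
        rows.append(row)
        cols.append(col)
    return rows, cols


rows, cols = decode_flat_indices([0, 5, 101, 9999], num_nodes=100)
print(rows, cols)
```

Using floor division keeps the row index an integer, which is what the later `torch.stack([row, col], dim=0).long()` call in the traceback expects.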
hetero_gnn.py imports container_abcs from torch._six, which has been removed in later versions of pytorch. Should be updated to use collections.abc directly, since container_abcs was just a wrapper for this module.
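The issue proposes importing `collections.abc` directly. If older PyTorch versions still need to be supported, one possible variant is a guarded import that falls back to the standard library; this is only a sketch of that pattern, not the change that was made in the repository.

```python
try:
    # Older PyTorch releases exposed the alias here.
    from torch._six import container_abcs
except ImportError:
    # Newer releases (or no PyTorch at all): use the standard library module.
    import collections.abc as container_abcs

# The alias behaves the same either way.
print(isinstance({"a": 1}, container_abcs.Mapping))
```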
Hello,
I am trying to reproduce the examples for heterogeneous link prediction but using multiple graphs, i.e. with an inductive split.
Are there any resources or guidance for implementing a model with this type of dataset?
Thank you!
Laura
Hello,
I am working on setting up link prediction on a heterogeneous graph with different node types and used the examples in deepsnap/examples/link_prediction_hetero as a starting point. In these examples, the HeteroSAGEConv layer is used, with its default of setting 'remove_self_loops' to true.
Could this be problematic when I have two different node types, as the edge index for the links between these could have links between, for example, node 0 of one node type and node 0 of the other node type? In fact, this is what happened for me.
As far as I understand, the negative sampling algorithm in the hetero_graph/negative_sampling() method expects the indices for both node types to start from zero, rather than be unique across types, so the solution can't be to create unique node indices across the different node types?
If I am correct, it may be worth adding a comment on the need to remove self-loops in careful pre-processing instead of using the default functionality in the HeteroSAGEConv layer?
Another option could be to add a check if the source and destination node type are the same in the HeteroSAGEConv implementation, before running pyg_utils.remove_self_loops(edge_index)? I guess it doesn't really make sense to remove self loops in the case where the source and destination node type is different anyway?
If I have misunderstood the situation, I would very much appreciate being corrected!
Thank you!
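The check suggested above can be sketched in plain Python. This is an illustration under stated assumptions, not the HeteroSAGEConv implementation: the function name is hypothetical, the message type is assumed to be a `(source_type, relation, target_type)` tuple, and the edge index is represented as two parallel lists instead of a tensor.

```python
def remove_self_loops_if_same_type(message_type, edge_index):
    """Drop i -> i edges only when source and target node types match."""
    src_type, _, dst_type = message_type
    if src_type != dst_type:
        # Index 0 of one type and index 0 of another are different nodes.
        return edge_index
    sources, targets = edge_index
    kept = [(s, t) for s, t in zip(sources, targets) if s != t]
    return [s for s, _ in kept], [t for _, t in kept]


edges = ([0, 1, 2], [0, 2, 2])
user_item = remove_self_loops_if_same_type(("user", "buys", "item"), edges)
user_user = remove_self_loops_if_same_type(("user", "follows", "user"), edges)
print(user_item, user_user)
```

With this guard, the edge from user 0 to item 0 survives, while true self-loops within a single node type are still removed.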
In the constructor of the HeteroSAGEConv class, even if self.in_channels_self has been set to in_channels_neigh when the input is None, self.lin_self = nn.Linear(in_channels_self, out_channels) uses the constructor argument rather than the object attribute.
Therefore, the code raises the following error: TypeError: new(): argument 'size' must be tuple of ints, but found element of type NoneType at pos 2
class HeteroSAGEConv(pyg_nn.MessagePassing):
    r"""The heterogeneous compatible GraphSAGE operator is derived from the `"Inductive Representation
    Learning on Large Graphs" <https://arxiv.org/abs/1706.02216>`_, `"Modeling polypharmacy side
    effects with graph convolutional networks" <https://arxiv.org/abs/1802.00543>`_ and `"Modeling
    Relational Data with Graph Convolutional Networks" <https://arxiv.org/abs/1703.06103>`_ papers.

    Args:
        in_channels_neigh (int): The input dimension of the end node type.
        out_channels (int): The dimension of the output.
        in_channels_self (int): The input dimension of the start node type.
            Default is `None` where the `in_channels_self` is equal to `in_channels_neigh`.
    """
    def __init__(self, in_channels_neigh, out_channels, in_channels_self=None):
        super(HeteroSAGEConv, self).__init__(aggr='add')
        self.in_channels_neigh = in_channels_neigh
        if in_channels_self is None:
            self.in_channels_self = in_channels_neigh
        else:
            self.in_channels_self = in_channels_self
        self.out_channels = out_channels
        self.lin_neigh = nn.Linear(in_channels_neigh, out_channels)
        # The line below uses the raw argument (possibly None), not self.in_channels_self
        self.lin_self = nn.Linear(in_channels_self, out_channels)
        self.lin_update = nn.Linear(out_channels * 2, out_channels)
I created a KG and intended to split it into train/val/test without negative triples. I saved my KG as a NetworkX MultiDiGraph and set resample_negatives=False. However, the split graphs still contain many non-existing edges, whereas the "WN18.gpickle" data you provided does not have this issue. Could you please tell me how you generated WN18.gpickle? Thank you very much for your help.
I made a PyG heterograph as below.
cipdata
Out[145]:
HeteroData(
  card={ x=[13752, 302] },
  user={ x=[10996, 304] },
  (user, solved, card)={
    edge_index=[2, 28992],
    edge_attr=[28992]
  },
  (card, rev_solved, user)={
    edge_index=[2, 28992],
    edge_attr=[28992]
  }
)
Then I tried to convert this cipdata PyG heterograph into DeepSnap's heterograph
from deepsnap.hetero_graph import HeteroGraph
graph = HeteroGraph(cipdata)
Then I get this error
graph = HeteroGraph(cipdata)
Traceback (most recent call last):
File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/torch_geometric/data/storage.py", line 48, in __getattr__
return self[key]
File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/torch_geometric/data/storage.py", line 68, in __getitem__
return self._mapping[key]
KeyError: 'number_of_nodes'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<ipython-input-147-5bdfc5ffc776>", line 1, in <module>
graph = HeteroGraph(cipdata)
File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/deepsnap/hetero_graph.py", line 94, in __init__
self._update_tensors(init=True)
File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/deepsnap/graph.py", line 515, in _update_tensors
self._update_attributes()
File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/deepsnap/graph.py", line 531, in _update_attributes
if self.G.number_of_nodes() == 0:
File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/torch_geometric/data/hetero_data.py", line 118, in __getattr__
return getattr(self._global_store, key)
File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/torch_geometric/data/storage.py", line 51, in __getattr__
f"'{self.__class__.__name__}' object has no attribute '{key}'")
AttributeError: 'BaseStorage' object has no attribute 'number_of_nodes'
How do I easily convert a PyG heterograph to a DeepSNAP one?
It would help if some detailed example code could be provided.
Hi, in the Inductive Link Prediction Split section of this colab notebook, I see we're making copies of the input graph to help set up the train / val / test splits.
This technique is echoed in this example as well but it seems to be more explicitly implemented to support learning on multiple graphs.
I'd like to use DeepSNAP's functionalities to manage train/val/test splits for edge prediction on a large graph (100MM+ nodes) and it seems strange/undesirable to have to create copies in this way. Is this truly the case or am I missing something? Does DeepSNAP allow inductive learning on single graphs?
import networkx as nx
from copy import deepcopy
from deepsnap.graph import Graph
from deepsnap.dataset import GraphDataset

run_transductive = False
n_copies_for_inductive = 10
myG = nx.complete_graph(50)
snap_graph = Graph(myG)
if not run_transductive:
    # deepcopy is imported directly, so it is called without the module prefix
    graphs = [deepcopy(snap_graph) for _ in range(n_copies_for_inductive)]
else:
    graphs = [snap_graph]
dataset = GraphDataset(
    graphs,
    task='link_pred',
    edge_negative_sampling_ratio=1,
    edge_train_mode='disjoint'
)
train, val, test = dataset.split(transductive=run_transductive, split_ratio=[0.85, 0.05, 0.1])
The order of the source node and target node gets exchanged.
if self.isdirect():
    if isinstance(edge_index, dict):
        for key in edge_index:
            edge_index[key] = torch.cat(
                [edge_index[key], torch.flip(edge_index[key], [1])], dim=0)
if isinstance(edge_index, dict):
    for key in edge_index:
        permute_tensor = edge_index[key].permute(1, 0)
        source_node_index = _convert_to_tensor_index(permute_tensor[0])
        target_node_index = _convert_to_tensor_index(permute_tensor[1])
        edge_index[key] = torch.stack([source_node_index, target_node_index])
I have a graph with two different types of nodes (user and doc). The edge_index contains edges between users and also between users and docs. In message_passing, node_feature_neigh is assigned to data (data = kwargs.get(arg[:-2], inspect.Parameter.empty)) when arg is 'node_feature_neigh_j'. However, later, index_select is used to select rows from data using the node_id values in edge_index, which can be either a user or a doc. So the issue is that data at this point contains only the features for doc nodes, but edge_index contains node_ids from users as well, and therefore I get an index out of range exception.
File "...torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "...deepsnap/deepsnap/hetero_gnn.py", line 135, in forward
self.convs[message_key](
File "...torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "...deepsnap/deepsnap/hetero_gnn.py", line 44, in forward
return self.propagate(
File "...torch_geometric/nn/conv/message_passing.py", line 250, in propagate
kwargs = self.__collect__(edge_index, size, mp_type, kwargs)
File "...torch_geometric/nn/conv/message_passing.py", line 147, in __collect__
out[arg] = data.index_select(self.node_dim,
IndexError: index out of range in self
In the provided example for heterogeneous graphs, Cora and CiteSeer are combined in one graph where the two graphs are isolated and each has its own node and edge types, and I guess that's why this code works.
I would really appreciate any help with this. Happy to elaborate more if necessary.
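One way to picture the mismatch described above: if each node type has its own feature matrix, then the edges for a given (source type, target type) pair have to index into those matrices with per-type local indices rather than global node ids. The sketch below shows only that bookkeeping in plain Python. It does not diagnose the issue, the helper names are hypothetical, and it makes no claim about how DeepSNAP builds its indices internally.

```python
def build_local_index(node_types):
    """Map each global node id to (node_type, local index within that type)."""
    counters = {}
    local = {}
    for node_id, node_type in node_types.items():
        idx = counters.get(node_type, 0)
        local[node_id] = (node_type, idx)
        counters[node_type] = idx + 1
    return local


def split_edges_by_type(edges, local):
    """Group (src, dst) edges by (src_type, dst_type) using local indices."""
    grouped = {}
    for src, dst in edges:
        src_type, src_idx = local[src]
        dst_type, dst_idx = local[dst]
        grouped.setdefault((src_type, dst_type), []).append((src_idx, dst_idx))
    return grouped


node_types = {0: "user", 1: "user", 2: "doc", 3: "doc"}
edges = [(0, 1), (0, 2), (1, 3)]
local = build_local_index(node_types)
grouped = split_edges_by_type(edges, local)
print(grouped)
```

In this toy graph, the user-to-doc edges only ever index rows 0 and 1 of the doc feature matrix, so an `index_select` on doc features stays in range.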
Hello. I know what edge_index is but I am still struggling with edge_label_index.
Could you please make it clear with some examples and detailed explanation?
Thank you in advance.
Built from source using
$ git clone https://github.com/snap-stanford/deepsnap
$ cd deepsnap
$ pip install .
but running
python3 node_classification_acm.py
throws the following error
Traceback (most recent call last):
  File "node_classification_acm.py", line 278, in <module>
    loss = train(model, optimizer, hetero_graph, train_idx)
  File "node_classification_acm.py", line 182, in train
    preds = model(hetero_graph.node_feature, hetero_graph.edge_index)
  File "/home/youngwook/anaconda3/envs/torch_graphs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "node_classification_acm.py", line 163, in forward
    x = forward_op(x, self.bns1)
  File "/home/youngwook/anaconda3/envs/torch_graphs/lib/python3.8/site-packages/deepsnap/hetero_gnn.py", line 245, in forward_op
    res[key] = module_dict(x[key], **kwargs)
  File "/home/youngwook/anaconda3/envs/torch_graphs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
TypeError: forward() takes 1 positional argument but 2 were given
Not sure if this info is relevant, but
deepsnap.__version__
'0.2.0'
torch_geometric.__version__
'1.7.2'
Hello, when I call:
hete = HeteroGraph(G_orig)
I get the error:
AttributeError: 'HeteroGraph' object has no attribute 'node_to_tensor_mapping'
The graph is very simple, with just two types of vertices.
When importing Graph from deepsnap.graph, I get the following error:
ImportError Traceback (most recent call last)
/tmp/ipykernel_119/3679505279.py in <module>
1 import torch.optim as optim
2 from torch_geometric.data import DataLoader
----> 3 from deepsnap.graph import Graph
4 from deepsnap.batch import Batch
5 from deepsnap.dataset import GraphDataset
/opt/conda/lib/python3.9/site-packages/deepsnap/__init__.py in <module>
7 import deepsnap.batch
8 import deepsnap.hetero_graph
----> 9 import deepsnap.hetero_gnn
10
11 import networkx as _netlib
/opt/conda/lib/python3.9/site-packages/deepsnap/hetero_gnn.py in <module>
5
6 from torch import Tensor
----> 7 from torch._six import container_abcs
8 from torch_geometric.nn.inits import reset
9 from torch_sparse import matmul
ImportError: cannot import name 'container_abcs' from 'torch._six' (/opt/conda/lib/python3.9/site-packages/torch/_six.py)
After some looking, I saw that this was fixed in this commit on June 19th, but the latest update to PyPI was on April 7, so it still persists when pulling using pip. Could a new version be pushed to PyPI?
Thanks!
Could you add a bibentry so we can cite the framework?
I am converting a custom PyG dataset in the following manner
pyg_dataset = GraphCountDataset(
    dataset_path, name="subgraph_counting", task_label_idx=self.task_label_index[task_key]
)
mode_idxs = pyg_dataset.separate_data(seed=None, fold_idx=None)
deepsnap_graphs = GraphDataset.pyg_to_graphs(pyg_dataset)
dataset = GraphDataset(deepsnap_graphs, task="graph")
and then using a DataLoader with the collate_fn provided by DeepSNAP. I found the overall pipeline to be much slower after the conversion, and traced this to the NetworkX graph being collated along with the rest of the data.
I have currently resolved my issue by wrapping the collate_fn in the function below, which removes unwanted keys, such as G or any other data that I won't be using in the pipeline.
from typing import Callable, List

from deepsnap.batch import Batch
from deepsnap.graph import Graph

def from_data_list_ignore_keys(
    data_list: List[Graph],
    keys_to_ignore: List[str] = None,
    follow_batch: List = None,
    transform: Callable = None,
    **kwargs
):
    # Blank out the unwanted attributes (in place) before collating.
    if keys_to_ignore is not None:
        for key in keys_to_ignore:
            for data in data_list:
                data[key] = None
    return Batch.from_data_list(data_list=data_list, follow_batch=follow_batch, transform=transform, **kwargs)
Am I doing something wrong or would it be reasonable to integrate the functionality to choose the keys to collate directly into the library?
I am taking CS224W, just saw this project : )
Just a suggestion: switch from Travis to GitHub Actions while this project is still young (later it will take more effort), because
Here is a speed comparison for one of my repos between Travis and GitHub Actions:
Does DeepSNAP offer any methods for temporal link prediction?
When the CORA node classification example is run on old CPUs (e.g. madmax), the line that loads the dataset, pyg_dataset = Planetoid('./cora', name, transform=T.TargetIndegree()), throws an Illegal instruction (core dumped) error.
Hi there!
I was trying to use the ogbl-biokg graph with DeepSNAP, more precisely as input for the link_prediction.py example for heterogeneous graphs. Since DeepSNAP requires a NetworkX or PyTorch Geometric object, I tried to convert the ogbl-biokg graph into a PyTorch Geometric object and then transform it to a HeteroGraph, as you point out in the tutorial here.
Yet, when I did that, it threw an error saying the graph has no 'edge_index':
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/storage.py in __getattr__(self, key)
47 try:
---> 48 return self[key]
49 except KeyError:
~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/storage.py in __getitem__(self, key)
67 def __getitem__(self, key: str) -> Any:
---> 68 return self._mapping[key]
69
KeyError: 'edge_index'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
/tmp/ipykernel_42299/3068006995.py in <module>
----> 1 graph = Graph.pyg_to_graph(ogbl_biokg_dataset[0])
~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/deepsnap/graph.py in pyg_to_graph(data, verbose, fixed_split, tensor_backend, netlib)
1991 if netlib is not None:
1992 deepsnap._netlib = netlib
-> 1993 if data.is_directed():
1994 G = deepsnap._netlib.DiGraph()
1995 else:
~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/data.py in is_directed(self)
184 def is_directed(self) -> bool:
185 r"""Returns :obj:`True` if graph edges are directed."""
--> 186 return not self.is_undirected()
187
188 def clone(self):
~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/data.py in is_undirected(self)
180 def is_undirected(self) -> bool:
181 r"""Returns :obj:`True` if graph edges are undirected."""
--> 182 return all([store.is_undirected() for store in self.edge_stores])
183
184 def is_directed(self) -> bool:
~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/data.py in <listcomp>(.0)
180 def is_undirected(self) -> bool:
181 r"""Returns :obj:`True` if graph edges are undirected."""
--> 182 return all([store.is_undirected() for store in self.edge_stores])
183
184 def is_directed(self) -> bool:
~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/storage.py in is_undirected(self)
395 return value.is_symmetric()
396
--> 397 edge_index = self.edge_index
398 edge_attr = self.edge_attr if 'edge_attr' in self else None
399 return is_undirected(edge_index, edge_attr, num_nodes=self.size(0))
~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/storage.py in __getattr__(self, key)
48 return self[key]
49 except KeyError:
---> 50 raise AttributeError(
51 f"'{self.__class__.__name__}' object has no attribute '{key}'")
52
AttributeError: 'GlobalStorage' object has no attribute 'edge_index'
How can I convert the ogbl-biokg graph into an object that can be used with deepSNAP?
I would very much appreciate any help!
Someone may want to extend the Graph classes with new features; however, the strict type checks force the user to modify many classes.
An alternative may be to use the function issubclass.
if type(graph) == Graph:
    graphs_split = graph.split(self.task, split_ratio)
elif type(graph) == HeteroGraph:
    graphs_split = graph.split(
        task=self.task,
        split_types=split_types,
        split_ratio=split_ratio,
        edge_split_mode=self.edge_split_mode)
else:
    raise TypeError('element in self.graphs of unexpected type')
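To illustrate the difference, here is a small sketch with stand-in classes. It uses `isinstance`, the instance-level counterpart of the `issubclass` check suggested above, and it assumes that HeteroGraph subclasses Graph; the classes below are placeholders for illustration, not the DeepSNAP ones.

```python
class Graph:
    """Stand-in for deepsnap.graph.Graph (illustration only)."""


class HeteroGraph(Graph):
    """Stand-in for the heterogeneous class, assumed here to subclass Graph."""


class MyGraph(Graph):
    """A user-defined extension of Graph."""


def dispatch(graph):
    # Check the more specific class first: a HeteroGraph is also a Graph here.
    if isinstance(graph, HeteroGraph):
        return "hetero"
    if isinstance(graph, Graph):
        return "homogeneous"
    raise TypeError("element in self.graphs of unexpected type")


print(type(MyGraph()) == Graph)   # the strict check rejects the subclass
print(dispatch(MyGraph()))        # the isinstance check accepts it
```

The order of the two checks matters under the subclassing assumption; swapping them would route every HeteroGraph into the homogeneous branch.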
I'm sorry I didn't find a good example code or documentation online that I can follow.
In example code below, how come the GNN didn't require a dimension 200/350/800 input?
import networkx as nx
import torch
from deepsnap.hetero_gnn import HeteroConv, HeteroSAGEConv, forward_op, loss_op
from deepsnap.hetero_graph import HeteroGraph

conv1 = {}
conv1[("n1","e0","n0")] = HeteroSAGEConv(800,600,200)
conv1[("n1","e1","n1")] = HeteroSAGEConv(350,600,350)
conv1 = HeteroConv(conv1)
G = nx.DiGraph()
G.add_node("n0", node_type="n0", node_feature=torch.zeros((1)).float())
G.add_node("n1", node_type="n1", node_feature=torch.zeros((1)).float())
G.add_edge("n0", "n1", edge_type="e1")
G = HeteroGraph(G)
G = conv1(G, G.edge_index)
Hello,
I am trying to use a custom dataset for link prediction.
What I tried was
pyg_dataset = My_Own_Dataset()
graphs1 = GraphDataset.pyg_to_graphs(pyg_dataset)  # error here
but I am getting an error: TypeError: zip argument #2 must support iteration
The full error is
/usr/local/lib/python3.7/dist-packages/deepsnap/dataset.py in pyg_to_graphs(dataset, verbose, fixed_split, tensor_backend, netlib)
1280 netlib=netlib
1281 )
-> 1282 for data in dataset
1283 ]
1284
/usr/local/lib/python3.7/dist-packages/deepsnap/dataset.py in <listcomp>(.0)
1280 netlib=netlib
1281 )
-> 1282 for data in dataset
1283 ]
1284
/usr/local/lib/python3.7/dist-packages/deepsnap/graph.py in pyg_to_graph(data, verbose, fixed_split, tensor_backend, netlib)
2025 if Graph._is_node_attribute(key):
2026 if not tensor_backend:
-> 2027 Graph.add_node_attr(G, key, value)
2028 else:
2029 attributes[key] = value
/usr/local/lib/python3.7/dist-packages/deepsnap/graph.py in add_node_attr(G, attr_name, node_attr)
1909 # TODO: Better method here?
1910 node_list = list(G.nodes)
-> 1911 attr_dict = dict(zip(node_list, node_attr))
1912 deepsnap._netlib.set_node_attributes(G, attr_dict, name=attr_name)
1913
TypeError: zip argument #2 must support iteration
I have also tried to do this by building the graph in NetworkX, converting it to a PyG graph, and converting from there; in that case I get a different error.
This error doesn't happen when I am using a graph in Planetoid as in the Link prediction with DeepSnap example colab notebook.
What could be causing this problem? Is there a guide on how I can use custom data on DeepSnap?
Thank you for your help!
What does gengraph.py do in examples/syn/? Is this a graph generator? What paper/method is this based on?
Hello,
In PyG 2.4.0, the keys property of torch_geometric.data.data.BaseData was refactored into a method. After upgrading from PyG 2.3.x, this leads to the following error when calling GraphDataset.pyg_to_graphs:
File .../lib/python3.10/site-packages/deepsnap/dataset.py:1277, in <listcomp>(.0)
1274 return graphs_split
1275 else:
1276 return [
...
1975 data.edge_attr if "edge_attr" in data.keys else None
1976 )
1977 kwargs["node_label"], kwargs["edge_label"] = None, None
TypeError: argument of type 'method' is not iterable
You can use this sample to replicate the issue:
from deepsnap.dataset import GraphDataset
from torch_geometric.datasets import Planetoid
root = './tmp/cora'
name = 'Cora'
pyg_dataset= Planetoid(root, name)
graphs = GraphDataset.pyg_to_graphs(pyg_dataset)
Could someone please have a look?
Thank you in advance!
Sebastian
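A version-tolerant way to read the keys, sketched with stand-in classes so that it runs without PyG installed. The helper name `data_keys` is hypothetical, and this is only one possible workaround, not the fix adopted by the library.

```python
def data_keys(data):
    """Return the attribute names of a PyG-style data object as a list."""
    keys = data.keys
    # Property form (older PyG) gives a list; method form (PyG 2.4) must be called.
    return list(keys() if callable(keys) else keys)


class OldStyleData:
    """Stand-in mimicking the property form (illustration only)."""

    @property
    def keys(self):
        return ["x", "edge_index"]


class NewStyleData:
    """Stand-in mimicking the method form (illustration only)."""

    def keys(self):
        return ["x", "edge_index", "edge_attr"]


print("edge_attr" in data_keys(OldStyleData()))
print("edge_attr" in data_keys(NewStyleData()))
```

A membership test such as `"edge_attr" in data_keys(data)` then works with either form, which is the check that fails at line 1975 in the traceback.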
A person may be interested in using a fixed split for link prediction tasks (e.g. given with the dataset).
In these cases, it may be useful to allow the user to pass a NumPy array containing the list of the edges to use in the training/validation/test set.
I installed deepsnap on a remote machine using pip, with the below dependencies first:
PyTorch --> https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
NetworkX --> https://github.com/networkx/networkx
But when I import:
from deepsnap.graph import Graph
I get:
Illegal instruction (core dumped)
Thank you!
Hi,
I installed DeepSnap using:
$ git clone https://github.com/snap-stanford/deepsnap
$ cd deepsnap
$ pip install .
When I try to run :
root = './tmp/cora'
name = 'Cora'
# The Cora dataset
pyg_dataset= Planetoid(root, name)
# PyG dataset to a list of deepsnap graphs
graphs = GraphDataset.pyg_to_graphs(pyg_dataset)
# Get the first deepsnap graph (CORA only has one graph)
graph = graphs[0]
print(graph)
It throws the error: AttributeError: module 'deepsnap' has no attribute '_netlib'
The full error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_4896/229941490.py in <module>
6
7 # Convert to a list of deepsnap graphs
----> 8 graphs = GraphDataset.pyg_to_graphs(pyg_dataset)
9
10 # Convert list of deepsnap graphs to deepsnap dataset with specified task=graph
~\anaconda3\envs\dev\lib\site-packages\deepsnap\dataset.py in pyg_to_graphs(dataset, verbose, fixed_split, tensor_backend, netlib)
1274 return graphs_split
1275 else:
-> 1276 return [
1277 Graph.pyg_to_graph(
1278 data, verbose=verbose,
~\anaconda3\envs\dev\lib\site-packages\deepsnap\dataset.py in <listcomp>(.0)
1275 else:
1276 return [
-> 1277 Graph.pyg_to_graph(
1278 data, verbose=verbose,
1279 tensor_backend=tensor_backend,
~\anaconda3\envs\dev\lib\site-packages\deepsnap\graph.py in pyg_to_graph(data, verbose, fixed_split, tensor_backend, netlib)
1994 G = deepsnap._netlib.DiGraph()
1995 else:
-> 1996 G = deepsnap._netlib.Graph()
1997 G.add_nodes_from(range(data.num_nodes))
1998 G.add_edges_from(data.edge_index.T.tolist())
AttributeError: module 'deepsnap' has no attribute '_netlib'
Is there any workaround or a solution to solve this error?
Note: The code that I am trying to run can be found in the 62nd executable cell of this notebook
Library Versions:
PyTorch: 1.9.0
PyTorch Geometric: 1.7.2
NetworkX: 2.6.1
Thank you :)
Hello,
I am trying to use the ogbl-biokg dataset (docs | github) with the DeepSNAP package. The graph has 5,088,434 edges and 93,773 nodes. I created a custom dataset (link to the code), but I have massive performance issues.
The problem is that it takes more than 30 min for the graph to process and generate the HeteroGraph object:
hetero = HeteroGraph(G)
And that the memory consumption is too much, even for a node with 256GB when I start the training, so it always crashes. I am using it in the link prediction with the heterogeneous GraphSAGE model (tutorial colab from DeepSNAP).
I think the problem might be the use of NetworkX in the backend. I tried loading the graph with the StellarGraph package via NumPy arrays, which are much more efficient. The whole graph loads within a minute, even on a CPU.
Is there any suggestion you have as to how to better load the data into DeepSNAP? Or could you possibly integrate the ogbl-biokg graph as a dataset into your library, considering the ogb package is also part of snap-stanford ? This would be very helpful!
Hi
I want to split a heterogeneous graph into train, dev, and test set for link_pred, but I need to have the positive and negative instances of the edges/links from a specific edge_type. Predicting that kind of edge is the problem description, other edges are just informative. Is there a way to do that with split?
Thanks,
Soha
It is very interesting that the work of deepsnap is to repeatedly convert other datasets into deepsnap datasets, but it does not provide the reading and saving interface of deepsnap itself. I hope you can continue to improve it.
Good work. Some links don't work. Missing files?
Also, it would be good to add documentation about different classes and link them to PyG help. Thanks.
Hello everyone:
I'm trying to learn from the examples. All the examples run smoothly except subgraph_matching.
When I run examples\subgraph_matching\train.py
I get an error: TypeError: Transform function returns a value of unknown type (<class 'networkx.classes.graph.Graph'>)
the full error is:
Traceback (most recent call last):
File "D:\coding\Demo_learn\deepsnap\examples\subgraph_matching\train_single_proc.py", line 241, in
main()
File "D:\coding\Demo_learn\deepsnap\examples\subgraph_matching\train_single_proc.py", line 225, in main
data_source = data.DataSource(args.dataset)
File "D:\coding\Demo_learn\deepsnap\examples\subgraph_matching\data.py", line 57, in init
self.train, self.test, _ = load_dataset(dataset_name)
File "D:\coding\Demo_learn\deepsnap\examples\subgraph_matching\data.py", line 46, in load_dataset
dataset = dataset.apply_transform(lambda g: g.G.subgraph(max(nx.connected_components(g.G), key=len)))
File "D:\coding\Demo_learn\deepsnap\deepsnap\dataset.py", line 1176, in apply_transform
for graph in self.graphs
File "D:\coding\Demo_learn\deepsnap\deepsnap\dataset.py", line 1176, in <listcomp>
for graph in self.graphs
File "D:\coding\Demo_learn\deepsnap\deepsnap\graph.py", line 1008, in apply_transform
"Transform function returns a value of unknown type "
TypeError: Transform function returns a value of unknown type (<class 'networkx.classes.graph.Graph'>)
I have tried, but I could not find out why this error happens.
With the example datasets 'enzymes' and 'cox2', the error above occurs.
When I use the example dataset 'imdb-binary', the error is the same as in #44: TypeError: zip argument #2 must support iteration
I think the networkx graph and the PyG graph may need to be converted into each other.
I hope to get an answer soon.
Hi, I am trying to convert my deepsnap HeteroGraph object into a pytorch_geometric HeteroData object. Is there an easy way to do this?
Hi!
The explanation of forward_op is a bit confusing and I think that it has some typos.
It is not clear what this means: Given a dictionary input x, it will return a dictionary with the same keys and the values applied by the corresponding values of the module_dict with specified parameters.
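One possible reading of that sentence, written as a minimal sketch: each value in x is passed through the module stored under the same key in module_dict, and the results are returned under the original keys. The name forward_op_sketch, the example keys, and the plain callables standing in for modules are all assumptions made for illustration; this is not taken from the deepsnap source.

```python
def forward_op_sketch(x, module_dict, **kwargs):
    """Hypothetical reading of the docstring, not deepsnap's actual code."""
    # Each value in x is passed through the module stored under the same key;
    # this assumes every key of x is also a key of module_dict.
    return {key: module_dict[key](value, **kwargs) for key, value in x.items()}


# Plain callables stand in for torch modules (illustrative only).
module_dict = {
    "paper": lambda v, scale=1: [scale * e for e in v],
    "author": lambda v, scale=1: [scale * e + 1 for e in v],
}
x = {"paper": [1, 2], "author": [3]}

out = forward_op_sketch(x, module_dict, scale=2)
print(out)
```

If this reading is right, the docstring could say that the function applies module_dict[key] to x[key] for every key and forwards the extra parameters to each call.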
"The keys in x are same with the keys in the module_dict." I think that this may mean the keys in x and the module_dict are the same. Or, the keys in x are equal to the keys in the module_dict.
Hello!
Not really an issue but I have a question about the implementation of the update step in hetero_gnn.py. What is the benefit of calculating the output via these lines:
aggr_out = self.lin_neigh(aggr_out)
node_feature_self = self.lin_self(node_feature_self)
aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
aggr_out = self.lin_update(aggr_out)
so applying a linear layer to the aggregated neighbour features and another linear layer to features of the node itself, and afterwards applying another layer to the concatenation of the results? In terms of the weights matrix multiplications this represents:
I thought it would be simpler to use just
aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
aggr_out = self.lin_update(aggr_out)
where self.lin_update is now initialised as self.lin_update = nn.Linear(self.in_channels_self + self.in_channels_neigh, self.out_channels) and we don't need the linear layers self.lin_neigh and self.lin_self anymore?
This represents something like
where CONCAT is the vector concatenation operator and the prime indicates that we now have a different dimension for W_y and b_y.
In terms of the number of parameters in the model it doesn't make a huge difference, but by including these additional layers you get a more complex optimisation surface that involves a product of weight matrices. Would this not make it a bit harder for gradient descent to reach a good solution?
Thank you for any explanation you can provide for the benefits of the slightly more complex architecture implemented in deepsnap!
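The algebra behind this question can be checked numerically. The sketch below assumes, as in the quoted lines, that there is no nonlinearity between the three linear layers, and that lin_neigh and lin_self both map to out_channels while lin_update maps 2 * out_channels to out_channels. The layer sizes and the pure-Python helpers are illustrative choices, not deepsnap code. Under those assumptions the stacked layers collapse into a single affine map on CONCAT(aggr_out, node_feature_self).

```python
import random


def linear(W, b, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-1, 1) for _ in range(n)]


def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]


d_neigh, d_self, d_out = 3, 2, 4  # hypothetical layer sizes

# Stand-ins for lin_neigh, lin_self and lin_update.
W_n, b_n = rand_mat(d_out, d_neigh), rand_vec(d_out)
W_s, b_s = rand_mat(d_out, d_self), rand_vec(d_out)
W_y, b_y = rand_mat(d_out, 2 * d_out), rand_vec(d_out)

aggr, x_self = rand_vec(d_neigh), rand_vec(d_self)

# Three-layer path: lin_update(CONCAT(lin_neigh(aggr), lin_self(x_self))).
stacked = linear(W_y, b_y, linear(W_n, b_n, aggr) + linear(W_s, b_s, x_self))

# Equivalent single layer on CONCAT(aggr, x_self):
#   W' = [W_y_left @ W_n | W_y_right @ W_s],  b' = W_y @ CONCAT(b_n, b_s) + b_y
W_left = matmul([row[:d_out] for row in W_y], W_n)
W_right = matmul([row[d_out:] for row in W_y], W_s)
W_prime = [left + right for left, right in zip(W_left, W_right)]
b_prime = linear(W_y, b_y, b_n + b_s)
single = linear(W_prime, b_prime, aggr + x_self)

max_diff = max(abs(a - b) for a, b in zip(stacked, single))
print(max_diff)
```

The difference is at the level of floating-point rounding, so the two forms compute the same function for these weights; what differs is the parametrisation that the optimiser sees.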
I am using the cora link prediction as shown in the example colab: https://colab.research.google.com/drive/1ycdlJuse7l2De7wi51lFd_nCuaWgVABc?usp=sharing
Instead of the cora dataset, I am using a subset of the pokec dataset with 1 million nodes and 10 million relationships. My nodes have two properties, so all in all it should work. My code is basically identical to the example; I only change the input graph, which is created from a PyG graph:
import torch
from torch.utils.data import DataLoader
from deepsnap.batch import Batch
from deepsnap.dataset import GraphDataset
from deepsnap.graph import Graph

# LinkPredModel, train and test are defined as in the linked colab;
# pyg_graph is the PyG graph built from the pokec subset.
args = {
    "device": 'cuda' if torch.cuda.is_available() else 'cpu',
    "hidden_dim": 128,
    "epochs": 50,
}
# pyg_dataset = Planetoid('./tmp/cora', 'Cora')
graph = Graph.pyg_to_graph(pyg_graph)
dataset = GraphDataset(
    graph,
    task='link_pred',
    edge_train_mode="disjoint"
)
datasets = {}
datasets['train'], datasets['val'], datasets['test'] = dataset.split(
    transductive=True, split_ratio=[0.85, 0.05, 0.1])
input_dim = datasets['train'].num_node_features
num_classes = datasets['train'].num_edge_labels
model = LinkPredModel(input_dim, args["hidden_dim"]).to(args["device"])
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
# One batch per split, shuffled only for training
dataloaders = {split: DataLoader(
    ds, collate_fn=Batch.collate([]),
    batch_size=1, shuffle=(split == 'train'))
    for split, ds in datasets.items()}
best_model = train(model, dataloaders, optimizer, args)
log = "Train: {:.4f}, Val: {:.4f}, Test: {:.4f}"
best_train_roc = test(best_model, dataloaders['train'], args)
best_val_roc = test(best_model, dataloaders['val'], args)
best_test_roc = test(best_model, dataloaders['test'], args)
print(log.format(best_train_roc, best_val_roc, best_test_roc))
However I get the following error:
RuntimeError Traceback (most recent call last)
in
27 batch_size=1, shuffle=(split=='train'))
28 for split, ds in datasets.items()}
---> 29 best_model = train(model, dataloaders, optimizer, args)
30 log = "Train: {:.4f}, Val: {:.4f}, Test: {:.4f}"
31 best_train_roc = test(best_model, dataloaders['train'], args)
in train(model, dataloaders, optimizer, args)
9 model.train()
10 optimizer.zero_grad()
---> 11 pred = model(batch)
12 loss = model.loss(pred, batch.edge_label.type(pred.dtype))
13
~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
in forward(self, batch)
19
20 nodes_first = torch.index_select(x, 0, edge_label_index[0,:].long())
---> 21 nodes_second = torch.index_select(x, 0, edge_label_index[1,:].long())
22 pred = torch.sum(nodes_first * nodes_second, dim=-1)
23 return pred
RuntimeError: CUDA out of memory. Tried to allocate 1.75 GiB (GPU 0; 8.00 GiB total capacity; 5.14 GiB already allocated; 281.56 MiB free; 5.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
When I had a similar issue in pytorch geometric, I just added the non_blocking parameter
graph.to(device, non_blocking=True)
but here it doesn't seem to help at all?
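For context, non_blocking=True only makes the host-to-device copy asynchronous; it does not reduce how much GPU memory a tensor needs. The back-of-the-envelope sketch below estimates what the failed 1.75 GiB allocation implies. It assumes, without confirmation from the post, that the tensor produced by torch.index_select holds float32 embeddings of width hidden_dim = 128. The helper chunk_bounds and the chunk size are hypothetical and only show how the edge indices could be split into pieces.

```python
# Assumptions (not confirmed by the post): float32 embeddings of width
# hidden_dim, and the reported 1.75 GiB figure taken at face value.
GIB = 1024 ** 3
failed_alloc_bytes = int(1.75 * GIB)
hidden_dim = 128
bytes_per_float32 = 4

row_bytes = hidden_dim * bytes_per_float32   # bytes per gathered embedding
rows = failed_alloc_bytes // row_bytes       # rows index_select tried to gather


def chunk_bounds(total, chunk_size):
    """Yield (start, end) pairs that cover range(total) in pieces."""
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


chunk_size = 500_000  # hypothetical chunk size
chunks = list(chunk_bounds(rows, chunk_size))
peak_chunk_mib = chunk_size * row_bytes / 1024 ** 2

print(rows, len(chunks), peak_chunk_mib)
```

Under these assumptions the gather covers roughly 3.67 million edge endpoints at once. Scoring edge_label_index in slices like the ones above would cap each gather at about 244 MiB, though whether that is enough depends on how much memory the rest of the model and the autograd graph already hold.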