snap-stanford / deepsnap

Python library assists deep learning on graphs

Home Page: https://snap.stanford.edu/deepsnap/

License: MIT License

Makefile 0.13% Python 99.87%

pytorch deep-learning graph-neural-networks

deepsnap's Introduction

DeepSNAP

Documentation | Examples | Colab Notebooks

DeepSNAP is a Python library to assist efficient deep learning on graphs. It features flexible graph manipulation, a standard pipeline, support for heterogeneous graphs, and a simple API.

DeepSNAP bridges powerful graph libraries such as NetworkX and the deep learning framework PyTorch Geometric. With an intuitive, easier-than-ever API, DeepSNAP provides the following:

DeepSNAP currently supports a NetworkX-based backend (and a SnapX-based backend for homogeneous undirected graphs), allowing users to seamlessly call hundreds of graph algorithms to manipulate or transform graphs, even at every training iteration.
DeepSNAP provides a standard pipeline for dataset splitting, negative sampling, and defining node-, edge-, and graph-level objectives, all of which are transparent to users.
DeepSNAP provides efficient support for flexible and general heterogeneous GNNs, supporting both node and edge heterogeneity and allowing users to control how messages are parameterized and passed.
DeepSNAP has an easy-to-use API that works seamlessly with existing GNN models and datasets implemented in PyTorch Geometric. There is close to zero learning curve for users familiar with PyTorch Geometric.

Installation

To install DeepSNAP, first ensure that PyTorch Geometric and NetworkX are installed. Then:

$ pip install deepsnap

Or build from source:

$ git clone https://github.com/snap-stanford/deepsnap
$ cd deepsnap
$ pip install .

Example

Examples using DeepSNAP are provided within the code repository.

$ git clone https://github.com/snap-stanford/deepsnap

Node classification:

$ cd deepsnap/examples/node_classification # node classification
$ python node_classification_planetoid.py

Link prediction:

$ cd deepsnap/examples/link_prediction # link prediction
$ python link_prediction_cora.py

Graph classification:

$ cd deepsnap/examples/graph_classification # graph classification
$ python graph_classification_TU.py

Documentation

For a comprehensive overview, introduction, tutorials and examples, please refer to the Full Documentation.

deepsnap's Issues

Dimension (Mis)matching - What's the correct usage of DeepSnap Heterogeneous Graph Convolution?

Sorry, I couldn't find good example code or documentation online to follow.

In the example code below, why doesn't the GNN require an input of dimension 200/350/800?

import networkx as nx
import torch
from deepsnap.hetero_gnn import HeteroConv, HeteroSAGEConv, forward_op, loss_op
from deepsnap.hetero_graph import HeteroGraph

# One convolution per message type (source type, edge type, target type)
conv1 = {}
conv1[("n1", "e0", "n0")] = HeteroSAGEConv(800, 600, 200)
conv1[("n1", "e1", "n1")] = HeteroSAGEConv(350, 600, 350)
conv1 = HeteroConv(conv1)

# Two nodes of different types, each with a 1-dimensional feature
G = nx.DiGraph()
G.add_node("n0", node_type="n0", node_feature=torch.zeros(1).float())
G.add_node("n1", node_type="n1", node_feature=torch.zeros(1).float())
G.add_edge("n0", "n1", edge_type="e1")
G = HeteroGraph(G)
G = conv1(G, G.edge_index)

Unable to use torch geometric with deepsnap [Unable to cast Python instance to C++ type]

Example code:

import networkx as nx
import torch
from deepsnap.batch import Batch
from deepsnap.hetero_graph import HeteroGraph

G = nx.DiGraph()
G.add_node(0, node_type='n1', node_label=1, node_feature=torch.Tensor([0.1, 0.2, 0.3, 0.7]))
G.add_node(1, node_type='n1', node_label=1, node_feature=torch.Tensor([0.2, 0.3, 0.4, 0.4]))
G.add_node(2, node_type='n2', node_label=0, node_feature=torch.Tensor([0.3, 0.4, 0.5]))
G.add_edge(0, 1, edge_type='e1', edge_feature=torch.Tensor([0.1,0.1]))
G.add_edge(0, 2, edge_type='e1', edge_feature=torch.Tensor([0.1,0.1]))
G.add_edge(1, 2, edge_type='e2', edge_feature=torch.Tensor([0.1,0.1, 0.1]))
H = HeteroGraph(G)
H = Batch.from_data_list([H]).to("cuda")

import torch_geometric.nn as pyg_nn
nf, batch = H.node_feature, H.batch
H = pyg_nn.global_add_pool(nf, batch)

AttributeError: module 'deepsnap' has no attribute '_netlib'

Hi,

I installed DeepSnap using:

$ git clone https://github.com/snap-stanford/deepsnap
$ cd deepsnap
$ pip install .

When I try to run:

from deepsnap.dataset import GraphDataset
from torch_geometric.datasets import Planetoid

root = './tmp/cora'
name = 'Cora'

# The Cora dataset
pyg_dataset = Planetoid(root, name)

# PyG dataset to a list of deepsnap graphs
graphs = GraphDataset.pyg_to_graphs(pyg_dataset)

# Get the first deepsnap graph (CORA only has one graph)
graph = graphs[0]
print(graph)

It throws the error: AttributeError: module 'deepsnap' has no attribute '_netlib'

The full error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_4896/229941490.py in <module>
      6 
      7 # Convert to a list of deepsnap graphs
----> 8 graphs = GraphDataset.pyg_to_graphs(pyg_dataset)
      9 
     10 # Convert list of deepsnap graphs to deepsnap dataset with specified task=graph

~\anaconda3\envs\dev\lib\site-packages\deepsnap\dataset.py in pyg_to_graphs(dataset, verbose, fixed_split, tensor_backend, netlib)
   1274             return graphs_split
   1275         else:
-> 1276             return [
   1277                 Graph.pyg_to_graph(
   1278                     data, verbose=verbose,

~\anaconda3\envs\dev\lib\site-packages\deepsnap\dataset.py in <listcomp>(.0)
   1275         else:
   1276             return [
-> 1277                 Graph.pyg_to_graph(
   1278                     data, verbose=verbose,
   1279                     tensor_backend=tensor_backend,

~\anaconda3\envs\dev\lib\site-packages\deepsnap\graph.py in pyg_to_graph(data, verbose, fixed_split, tensor_backend, netlib)
   1994                 G = deepsnap._netlib.DiGraph()
   1995             else:
-> 1996                 G = deepsnap._netlib.Graph()
   1997             G.add_nodes_from(range(data.num_nodes))
   1998             G.add_edges_from(data.edge_index.T.tolist())

AttributeError: module 'deepsnap' has no attribute '_netlib'

Is there any workaround or a solution to solve this error?

Note: The code that I am trying to run can be found in the 62nd executable cell of this notebook

Library Versions:
PyTorch: 1.9.0
PyTorch Geometric: 1.7.2
NetworkX: 2.6.1

Thank you :)
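
One possible workaround, judging only from the tracebacks on this page: the `pyg_to_graphs` signature shown above accepts a `netlib` argument, and another traceback further down shows `pyg_to_graph` running `deepsnap._netlib = netlib` when that argument is not `None`. The sketch below assumes the argument is forwarded from `pyg_to_graphs` to `pyg_to_graph` and that NetworkX is the intended backend. The helper name `load_cora_graphs` is hypothetical, and this has not been checked against the version in the report.

```python
def load_cora_graphs(root="./tmp/cora", name="Cora"):
    """Convert a Planetoid dataset to DeepSNAP graphs with an explicit backend.

    The imports are local so that the helper can be defined even when the
    heavy dependencies are not importable yet.
    """
    import networkx as nx
    from torch_geometric.datasets import Planetoid
    from deepsnap.dataset import GraphDataset

    pyg_dataset = Planetoid(root, name)
    # Assumption: netlib=nx makes DeepSNAP set deepsnap._netlib before using it.
    return GraphDataset.pyg_to_graphs(pyg_dataset, netlib=nx)

# Usage (requires torch_geometric, networkx and deepsnap to be installed):
# graphs = load_cora_graphs()
# print(graphs[0])
```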

heterogeneous graphs for node classification with inductive split

Hi,

I am trying to reproduce the examples for heterogeneous node classification but with multiple graphs, i.e. with an inductive split.
Is there any resource for implementing a model with this type of dataset? Or can you provide any guidance on implementing such a model?

Thanks in advance.
Armand

Default 'remove_self_loops' set to True in heterogeneous link prediction example problematic for use case with multiple node types?

Hello,
I am working on setting up link prediction on a heterogeneous graph with different node types and used the examples in deepsnap/examples/link_prediction_hetero as a starting point. In these examples, the HeteroSAGEConv layer is used, with its default of setting 'remove_self_loops' to true.

Could this be problematic when I have two different node types, as the edge index for the links between these could have links between, for example, node 0 of one node type and node 0 of the other node type? In fact, this is what happened for me.

As far as I understood the negative sampling algorithm in the hetero_graph/negative_sampling() method expects the indices for both node types to start from zero, rather than have unique indices, so the solution can't be to create unique node indices across the different node types?

If I am correct, it may be worth adding a comment on the need to remove self-loops in careful pre-processing instead of using the default functionality in the HeteroSAGEConv layer?
Another option could be to add a check if the source and destination node type are the same in the HeteroSAGEConv implementation, before running pyg_utils.remove_self_loops(edge_index)? I guess it doesn't really make sense to remove self loops in the case where the source and destination node type is different anyway?

If I have misunderstood the situation, I would very much appreciate being corrected!

Thank you!
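
The check proposed in this issue can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: edges are `(source_index, target_index)` pairs with indices that restart from zero for each node type, and the function name and the node and edge type names are hypothetical. In the real layer the equivalent guard would sit in front of the `pyg_utils.remove_self_loops(edge_index)` call mentioned above.

```python
def remove_self_loops_if_same_type(edges, message_type):
    """Drop (i, i) edges only when source and target node types match.

    edges: list of (src_index, dst_index) pairs, indexed per node type.
    message_type: (src_node_type, edge_type, dst_node_type) tuple.
    """
    src_type, _, dst_type = message_type
    if src_type != dst_type:
        # Index 0 of one node type and index 0 of another are different nodes,
        # so an edge (0, 0) is not a self-loop here and must be kept.
        return list(edges)
    return [(s, d) for s, d in edges if s != d]


edges = [(0, 0), (0, 1), (2, 2)]
across_types = remove_self_loops_if_same_type(edges, ("user", "rates", "doc"))
within_type = remove_self_loops_if_same_type(edges, ("user", "follows", "user"))
print(across_types)  # all three edges are kept
print(within_type)   # only (0, 1) remains
```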

Some questions about examples/subgraph_matching

Hello everyone,

I'm trying to learn from the examples. All of the examples run smoothly except subgraph_matching.
When I run examples\subgraph_matching\train.py, I get an error:

TypeError: Transform function returns a value of unknown type (<class 'networkx.classes.graph.Graph'>)
The full error is:
Traceback (most recent call last):
  File "D:\coding\Demo_learn\deepsnap\examples\subgraph_matching\train_single_proc.py", line 241, in <module>
    main()
  File "D:\coding\Demo_learn\deepsnap\examples\subgraph_matching\train_single_proc.py", line 225, in main
    data_source = data.DataSource(args.dataset)
  File "D:\coding\Demo_learn\deepsnap\examples\subgraph_matching\data.py", line 57, in __init__
    self.train, self.test, _ = load_dataset(dataset_name)
  File "D:\coding\Demo_learn\deepsnap\examples\subgraph_matching\data.py", line 46, in load_dataset
    dataset = dataset.apply_transform(lambda g: g.G.subgraph(max(nx.connected_components(g.G), key=len)))
  File "D:\coding\Demo_learn\deepsnap\deepsnap\dataset.py", line 1176, in apply_transform
    for graph in self.graphs
  File "D:\coding\Demo_learn\deepsnap\deepsnap\dataset.py", line 1176, in <listcomp>
    for graph in self.graphs
  File "D:\coding\Demo_learn\deepsnap\deepsnap\graph.py", line 1008, in apply_transform
    "Transform function returns a value of unknown type "
TypeError: Transform function returns a value of unknown type (<class 'networkx.classes.graph.Graph'>)

I have tried, but I could not find out why this error happens.
The error above occurs with the example datasets 'enzymes' and 'cox2'.
When I use the example dataset 'imdb-binary', the error is the same as in #44: TypeError: zip argument #2 must support iteration

I think the NetworkX and PyG graphs may need to be converted.
I hope to get an answer soon.

RuntimeError in dataset.split

Hi,

I encountered a runtime error while splitting a dataset for a link prediction task (see code below). This seems to be a bug on Python 3.8 only; the code runs without error on Python 3.7.4.

Thanks!

Environment:

Python 3.8.5
Deepsnap 0.1.1
networkx 2.5

Code run:

import networkx as nx
from deepsnap.graph import Graph
from deepsnap.dataset import GraphDataset

G = nx.complete_graph(100)
H1 = Graph(G)
H2 = H1.clone()
dataset = GraphDataset(graphs=[H1, H2], task='link_pred') # modified to link_prediction task.

train, val, test = dataset.split(transductive=True, split_ratio=[0.8, 0.1, 0.1])
print(train, val, test)

Error message:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-70292612295a> in <module>
      8 dataset = GraphDataset(graphs=[H1, H2], task='link_pred')
      9
---> 10 train, val, test = dataset.split(transductive=True, split_ratio=[0.8, 0.1, 0.1])
     11 print(train, val, test)

~/opt/anaconda3/envs/ml/lib/python3.8/site-packages/deepsnap/dataset.py in split(self, transductive, split_ratio, split_types)
    591         if transductive and self.task != 'graph':
    592             dataset_return = \
--> 593                 self._split_transductive(split_ratio, split_types)
    594         else:
    595             dataset_return = \

~/opt/anaconda3/envs/ml/lib/python3.8/site-packages/deepsnap/dataset.py in _split_transductive(self, split_ratio, split_types)
    450                 for graph_temp in dataset_current.graphs:
    451                     if type(graph_temp) == Graph:
--> 452                         graph_temp._create_neg_sampling(
    453                             self.edge_negative_sampling_ratio)
    454                     elif type(graph_temp) == HeteroGraph:

~/opt/anaconda3/envs/ml/lib/python3.8/site-packages/deepsnap/graph.py in _create_neg_sampling(self, negative_sampling_ratio, resample)
    911         if (len(edge_index_all) > 0):
    912             negative_edges = \
--> 913                 self.negative_sampling(edge_index_all, self.num_nodes, num_neg_edges)
    914         else:
    915             return torch.LongTensor([])

~/opt/anaconda3/envs/ml/lib/python3.8/site-packages/deepsnap/graph.py in negative_sampling(edge_index, num_nodes, num_neg_samples)
   1083             rest = rest[mask.nonzero().view(-1)]
   1084
-> 1085         row = perm / num_nodes
   1086         col = perm % num_nodes
   1087         neg_edge_index = torch.stack([row, col], dim=0).long()

RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
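
The error message itself names the fix: use floor division (`//`) where the flat index `perm` is decoded into a row and a column. The sketch below shows that decoding with plain Python integers standing in for the tensors (an assumption made to keep it dependency-free, and `decode_pairs` is a hypothetical name). On tensors, the corresponding change at line 1085 of the traceback would be `perm // num_nodes`.

```python
def decode_pairs(perm, num_nodes):
    """Decode flat indices in [0, num_nodes**2) into (row, col) index lists."""
    rows = [p // num_nodes for p in perm]  # floor division, never true division
    cols = [p % num_nodes for p in perm]
    return rows, cols


rows, cols = decode_pairs([0, 5, 7, 11], num_nodes=4)
print(rows)  # [0, 1, 1, 2]
print(cols)  # [0, 1, 3, 3]
```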

How to split a heterogeneous graph for link_pred targeting specific edge_type?

Hi
I want to split a heterogeneous graph into train, dev, and test set for link_pred, but I need to have the positive and negative instances of the edges/links from a specific edge_type. Predicting that kind of edge is the problem description, other edges are just informative. Is there a way to do that with split?

Thanks,
Soha
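
The `split` signature visible in a traceback elsewhere on this page includes a `split_types` parameter, which may be the built-in route for this. Its exact semantics are not documented here, so the following is only a plain-Python sketch of the idea: edges of the target type are divided into train, validation and test sets, and every other edge is kept as context. The function name, the `(src, dst, edge_type)` tuple layout and the type names are hypothetical, and negative sampling is not covered.

```python
import random


def split_edges_of_type(edges, target_type, split_ratio=(0.8, 0.1, 0.1), seed=0):
    """Split only the edges of `target_type`; return the rest as context edges.

    edges: list of (src, dst, edge_type) tuples.
    Returns (train, val, test, context).
    """
    target = [e for e in edges if e[2] == target_type]
    context = [e for e in edges if e[2] != target_type]
    random.Random(seed).shuffle(target)  # reproducible shuffle
    n_train = int(len(target) * split_ratio[0])
    n_val = int(len(target) * split_ratio[1])
    train = target[:n_train]
    val = target[n_train:n_train + n_val]
    test = target[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test, context


edges = [(i, i + 1, "e1") for i in range(10)] + [(i, i + 2, "e0") for i in range(3)]
train, val, test, context = split_edges_of_type(edges, "e1")
print(len(train), len(val), len(test), len(context))
```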

what is deepsnap.dataset.EnsembleGenerator?

I couldn't find a detailed tutorial for EnsembleGenerator in your documentation.
Could you give an example of how EnsembleGenerator works?
What is the meaning of each parameter?

Explanation of forward_op in deepsnap.hetero_gnn

Hi!
The explanation of forward_op is a bit confusing and I think that it has some typos.

It is not clear what this means: Given a dictionary input x, it will return a dictionary with the same keys and the values applied by the corresponding values of the module_dict with specified parameters.
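
One reading of that sentence, sketched in plain Python: for every key, the module stored under that key is applied to the input stored under the same key, and the results are collected in a dictionary with the same keys. This is an interpretation of the quoted documentation, not the library's code. The name `forward_op_sketch` is hypothetical, and ordinary callables stand in for the modules.

```python
def forward_op_sketch(x, module_dict, **kwargs):
    """Apply module_dict[key] to x[key] for every key in x."""
    return {key: module_dict[key](value, **kwargs) for key, value in x.items()}


# Hypothetical node types, with plain functions standing in for modules.
x = {"paper": [1, 2], "author": [3]}
module_dict = {
    "paper": lambda v, scale=1: [scale * i for i in v],
    "author": lambda v, scale=1: [scale * i + 1 for i in v],
}
result = forward_op_sketch(x, module_dict, scale=10)
print(result)  # {'paper': [10, 20], 'author': [31]}
```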

"The keys in x are same with the keys in the module_dict." I think that this may mean the keys in x and the module_dict are the same. Or, the keys in x are equal to the keys in the module_dict.

Inductive Link Prediction graph copying

Hi, in the Inductive Link Prediction Split section of this colab notebook, I see we're making copies of the input graph to help set up the train / val / test splits.

This technique is echoed in this example as well but it seems to be more explicitly implemented to support learning on multiple graphs.

I'd like to use DeepSNAP's functionalities to manage train/val/test splits for edge prediction on a large graph (100MM+ nodes) and it seems strange/undesirable to have to create copies in this way. Is this truly the case or am I missing something? Does DeepSNAP allow inductive learning on single graphs?

import networkx as nx
from copy import deepcopy
from deepsnap.graph import Graph
from deepsnap.dataset import GraphDataset

run_transductive = False
n_copies_for_inductive = 10

myG = nx.complete_graph(50)
snap_graph = Graph(myG)

# For the inductive case, make several copies of the same graph
if not run_transductive:
    graphs = [deepcopy(snap_graph) for _ in range(n_copies_for_inductive)]
else:
    graphs = [snap_graph]

dataset = GraphDataset(
    graphs,
    task='link_pred',
    edge_negative_sampling_ratio=1,
    edge_train_mode='disjoint'
)

train, val, test = dataset.split(transductive=run_transductive, split_ratio=[0.85, 0.05, 0.1])

Converting PyG 2.4.0 dataset to a list of deepsnap graphs no longer works!

Hello,

After upgrading from PyG 2.3.x to PyG 2.4.0, the keys property of torch_geometric.data.data.BaseData was refactored into a method, leading to the following error when calling GraphDataset.pyg_to_graphs

File .../lib/python3.10/site-packages/deepsnap/dataset.py:1277, in <listcomp>(.0)
   1274     return graphs_split
   1275 else:
   1276     return [
...
   1975     data.edge_attr if "edge_attr" in data.keys else None
   1976 )
   1977 kwargs["node_label"], kwargs["edge_label"] = None, None

TypeError: argument of type 'method' is not iterable

You can use this sample to replicate the issue:

from deepsnap.dataset import GraphDataset
from torch_geometric.datasets import Planetoid

root = './tmp/cora'
name = 'Cora'
pyg_dataset= Planetoid(root, name)
graphs = GraphDataset.pyg_to_graphs(pyg_dataset)

Could someone please have a look?

Thank you in advance!

Sebastian
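
A version-tolerant way to read `keys` is to call it only when it is callable. The sketch below demonstrates the pattern on two stand-in classes that mimic the behaviour described in this report (property before the upgrade, method after it). The helper name `data_keys` and both classes are hypothetical, and nothing here is taken from the DeepSNAP or PyG code bases.

```python
def data_keys(data):
    """Return the attribute names of a data object as a list.

    Works whether `data.keys` is a property (a list-like) or a method.
    """
    keys = data.keys
    return list(keys() if callable(keys) else keys)


class OldStyleData:
    """Stand-in where `keys` is a property."""
    @property
    def keys(self):
        return ["x", "edge_index"]


class NewStyleData:
    """Stand-in where `keys` is a method."""
    def keys(self):
        return ["x", "edge_index", "edge_attr"]


has_attr_old = "edge_attr" in data_keys(OldStyleData())
has_attr_new = "edge_attr" in data_keys(NewStyleData())
print(has_attr_old, has_attr_new)  # False True
```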

None type error

In the constructor of the HeteroSAGE class, even if self.in_channels_self has been initialized to in_channels_neigh in case of None input, self.lin_self = nn.Linear(in_channels_self, out_channels) does not use the object parameter.

Therefore, the code returns the following error: TypeError new(): argument 'size' must be tuple of ints, but found element of type NoneType at pos 2

class HeteroSAGEConv(pyg_nn.MessagePassing):
    r"""The heterogeneous compitable GraphSAGE operator is derived from the `"Inductive Representation
    Learning on Large Graphs" <https://arxiv.org/abs/1706.02216>`_, `"Modeling polypharmacy side
    effects with graph convolutional networks" <https://arxiv.org/abs/1802.00543>`_ and `"Modeling
    Relational Data with Graph Convolutional Networks" <https://arxiv.org/abs/1703.06103>`_ papers.
    Args:
        in_channels_neigh (int): The input dimension of the end node type.
        out_channels (int): The dimension of the output.
        in_channels_self (int): The input dimension of the start node type.
            Default is `None` where the `in_channels_self` is equal to `in_channels_neigh`.
    """
    def __init__(self, in_channels_neigh, out_channels, in_channels_self=None):
        super(HeteroSAGEConv, self).__init__(aggr='add')
        self.in_channels_neigh = in_channels_neigh
        if in_channels_self is None:
            self.in_channels_self = in_channels_neigh
        else:
            self.in_channels_self = in_channels_self
        self.out_channels = out_channels
        self.lin_neigh = nn.Linear(in_channels_neigh, out_channels)
        self.lin_self = nn.Linear(in_channels_self, out_channels)
        self.lin_update = nn.Linear(out_channels * 2, out_channels)
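
The fix implied by this report is to build `lin_self` from the resolved value (`self.in_channels_self`) rather than from the raw argument. The sketch below shows only that default-resolution step in plain Python, returning the `(in_features, out_features)` pair each linear layer would receive. The function name `linear_shapes` is hypothetical; in the constructor above, the corresponding change would be `nn.Linear(self.in_channels_self, out_channels)`.

```python
def linear_shapes(in_channels_neigh, out_channels, in_channels_self=None):
    """Return (in_features, out_features) for each linear layer of the operator."""
    # Resolve the default first, then use the resolved value everywhere.
    if in_channels_self is None:
        in_channels_self = in_channels_neigh
    return {
        "lin_neigh": (in_channels_neigh, out_channels),
        "lin_self": (in_channels_self, out_channels),
        "lin_update": (out_channels * 2, out_channels),
    }


default_shapes = linear_shapes(64, 32)
explicit_shapes = linear_shapes(64, 32, in_channels_self=16)
print(default_shapes["lin_self"])   # (64, 32)
print(explicit_shapes["lin_self"])  # (16, 32)
```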

temporal link prediction

Does DeepSNAP offer any methods for temporal link prediction?

How to pre-split negative edges in Link Prediction tasks?

It may happen that the absence of an edge in a graph is due to the absence of knowledge.

It may be useful to add the possibility to pass True Negative edges to the graph constructor.

Links not working

Good work. Some links don't work. Missing files?
Also, it would be good to add documentation about different classes and link them to PyG help. Thanks.

Using the ogbl-biokg with DeepSNAP

Hi there!

I was trying to use the ogbl-biokg graph with DeepSNAP, more precisely as input to link_prediction.py for heterogeneous graphs. Since DeepSNAP requires a NetworkX or PyTorch Geometric object, I tried to convert the ogbl-biokg graph into a PyTorch Geometric object and then transform it into a HeteroGraph, as you point out in the tutorial here.

Yet when I did that, it threw an error because the graph does not have an 'edge_index':

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/storage.py in __getattr__(self, key)
     47         try:
---> 48             return self[key]
     49         except KeyError:

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/storage.py in __getitem__(self, key)
     67     def __getitem__(self, key: str) -> Any:
---> 68         return self._mapping[key]
     69 

KeyError: 'edge_index'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_42299/3068006995.py in <module>
----> 1 graph = Graph.pyg_to_graph(ogbl_biokg_dataset[0])

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/deepsnap/graph.py in pyg_to_graph(data, verbose, fixed_split, tensor_backend, netlib)
   1991             if netlib is not None:
   1992                 deepsnap._netlib = netlib
-> 1993             if data.is_directed():
   1994                 G = deepsnap._netlib.DiGraph()
   1995             else:

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/data.py in is_directed(self)
    184     def is_directed(self) -> bool:
    185         r"""Returns :obj:`True` if graph edges are directed."""
--> 186         return not self.is_undirected()
    187 
    188     def clone(self):

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/data.py in is_undirected(self)
    180     def is_undirected(self) -> bool:
    181         r"""Returns :obj:`True` if graph edges are undirected."""
--> 182         return all([store.is_undirected() for store in self.edge_stores])
    183 
    184     def is_directed(self) -> bool:

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/data.py in <listcomp>(.0)
    180     def is_undirected(self) -> bool:
    181         r"""Returns :obj:`True` if graph edges are undirected."""
--> 182         return all([store.is_undirected() for store in self.edge_stores])
    183 
    184     def is_directed(self) -> bool:

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/storage.py in is_undirected(self)
    395             return value.is_symmetric()
    396 
--> 397         edge_index = self.edge_index
    398         edge_attr = self.edge_attr if 'edge_attr' in self else None
    399         return is_undirected(edge_index, edge_attr, num_nodes=self.size(0))

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/storage.py in __getattr__(self, key)
     48             return self[key]
     49         except KeyError:
---> 50             raise AttributeError(
     51                 f"'{self.__class__.__name__}' object has no attribute '{key}'")
     52 

AttributeError: 'GlobalStorage' object has no attribute 'edge_index'

How can I convert the ogbl-biokg graph into an object that can be used with deepSNAP?

I would very much appreciate any help!

How to pre-split the edges in Link Prediction tasks?

A person may be interested in using a fixed split for link prediction tasks (e.g. given with the dataset).

In these cases, it may be useful to allow the user to pass a NumPy array containing the list of the edges to use in the training/validation/test set.

How to customize own data

Hello, I ran a lot of the examples in this repo, but they do not explain how my own data should be modeled as a graph.
For example, how is a MultiDiGraph such as WN18.gpickle built step by step? Could you provide sample code?

Switch to GitHub Actions

I am taking CS224W, just saw this project : )

I suggest switching from Travis to GitHub Actions while this project is still young (switching later will take more effort), because:

Due to a recent update, Travis added many limitations, and it is also slow.
GitHub Actions is free, faster, and supports "apps" through the Marketplace, making it much more powerful.

Here is a speed comparison between Travis and GitHub Actions for one of my repos:

1. Travis

2. GitHub Actions

AttributeError: 'HeteroGraph' object has no attribute 'node_to_tensor_mapping'

Hello, when I call:
hete = HeteroGraph(G_orig)
I get the error:
AttributeError: 'HeteroGraph' object has no attribute 'node_to_tensor_mapping'

The graph is very simple, with two types of vertices.

HeteroSAGEConv - Issue with edge_index in message_passing when selecting indices from data

I have a graph with two different types of nodes (user and doc). The edge_index contains edges between users and also users and docs. In message_passing, node_feature_neigh is getting assigned to data (data = kwargs.get(arg[:-2], inspect.Parameter.empty)) when arg is 'node_feature_neigh_j'. However, later, index_select is used to select rows from data using node_id in edge_index which can be either a user or a doc. So the issue is that data at this point contains only the features for doc nodes but edge_index contains node_ids from users as well and therefore, I get index out of range exception.

  File "...torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "...deepsnap/deepsnap/hetero_gnn.py", line 135, in forward
    self.convs[message_key](
  File "...torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "...deepsnap/deepsnap/hetero_gnn.py", line 44, in forward
    return self.propagate(
  File "...torch_geometric/nn/conv/message_passing.py", line 250, in propagate
    kwargs = self.__collect__(edge_index, size, mp_type, kwargs)
  File "...torch_geometric/nn/conv/message_passing.py", line 147, in __collect__
    out[arg] = data.index_select(self.node_dim,
IndexError: index out of range in self

In the provided example for heterogeneous graphs, Cora and CiteSeer are combined into one graph in which the two graphs are isolated and each has its own node and edge types. I guess that's why that code works.

I would really appreciate any help with this. Happy to elaborate more if necessary.
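
An `index out of range` at `index_select` means that some index in `edge_index` exceeds the number of rows in the feature matrix being indexed. Assuming each message type has its own `edge_index`, with source indices counted within the source node type and target indices counted within the target node type (an assumption about the intended layout, not something confirmed here), a quick check like the following can locate the offending edges before the forward pass. The function name and the example message type are hypothetical.

```python
def out_of_range_edges(edge_index, num_src_nodes, num_dst_nodes):
    """Return the positions of edges whose indices exceed the per-type node counts.

    edge_index: (src_indices, dst_indices), each a list of ints.
    """
    src, dst = edge_index
    return [
        pos for pos, (s, d) in enumerate(zip(src, dst))
        if not (0 <= s < num_src_nodes and 0 <= d < num_dst_nodes)
    ]


# Hypothetical ("user", "reads", "doc") message type with 3 users and 2 docs.
edge_index = ([0, 1, 2, 2], [0, 1, 1, 4])
bad_positions = out_of_range_edges(edge_index, num_src_nodes=3, num_dst_nodes=2)
print(bad_positions)  # [3], because doc index 4 does not exist
```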

Converting deepsnap HeteroGraph object into a pytorch_geometric HeteroData Object

Hi, I was trying to convert my DeepSNAP HeteroGraph object into a pytorch_geometric HeteroData object. Is there an easy way to do it?

Which paper does SkipLastGNN come from?

Citing

Could you add a bibentry so we can cite the framework?

The edge_index in HeteroGraph with nx.Graph is not correct when running in message_passing

The order of the source node and target node gets swapped.

if self.isdirect():
    if isinstance(edge_index, dict):
        for key in edge_index:
            edge_index[key] = torch.cat(
                [edge_index[key], torch.flip(edge_index[key], [1])], dim=0)

if isinstance(edge_index, dict):
    for key in edge_index:
        permute_tensor = edge_index[key].permute(1, 0)
        source_node_index = _convert_to_tensor_index(permute_tensor[0])
        target_node_index = _convert_to_tensor_index(permute_tensor[1])
        edge_index[key] = torch.stack([source_node_index, target_node_index])

'HeteroGraph' object has no attribute 'custom'

While running the Heterograph link prediction code, I am getting this issue.

What does gengraph do?

What does gengraph.py do in examples/syn/? Is this a graph generator? What paper/method is this based on?

Architecture of the aggregation in HeteroSAGEConv?

Hello!
Not really an issue but I have a question about the implementation of the update step in hetero_gnn.py. What is the benefit of calculating the output via these lines:

aggr_out = self.lin_neigh(aggr_out)
node_feature_self = self.lin_self(node_feature_self)
aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
aggr_out = self.lin_update(aggr_out)

so applying a linear layer to the aggregated neighbour features and another linear layer to features of the node itself, and afterwards applying another layer to the concatenation of the results? In terms of the weights matrix multiplications this represents:

$W_{y} \begin{bmatrix} W_{u}x_{u}+b_{u} \\W_{\nu}x_{\nu}+b_{\nu} \end{bmatrix} + b_{y}$

I thought it would be simpler to use just

aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
aggr_out = self.lin_update(aggr_out)

where self.lin_update is now initialised as self.lin_update = nn.Linear(self.in_channels_self + self.in_channels_neigh, self.out_channels) and we don't need the linear layers self.lin_neigh and self.lin_self anymore?

This represents something like

$W_{y}' \, \mathrm{CONCAT}(x_{u}, x_{\nu}) + b_{y}',$

where CONCAT is the vector concatenation operator and the prime indicates that we now have a different dimension for W_y and b_y.

In terms of the number of parameters in the model it doesn't make a huge difference but by including these additional layers, you have a more complex optimisation surface that involves a product of weights matrices. Would this not make it a bit harder for the gradient descent algorithm to get to a good solution?

Thank you for any explanation you can provide for the benefits of the slightly more complex architecture implemented in deepsnap!
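
The relationship between the two formulations can be written out explicitly. The derivation below assumes, as in the four lines quoted in this issue, that no nonlinearity is applied between `lin_neigh`/`lin_self` and `lin_update`. The blocks $W_{y,1}$ and $W_{y,2}$ are notation introduced here for the left and right halves of $W_{y}$.

```latex
% Split W_y column-wise as W_y = [ W_{y,1}  W_{y,2} ] to match the two stacked blocks.
W_{y}\begin{bmatrix} W_{u}x_{u}+b_{u} \\ W_{\nu}x_{\nu}+b_{\nu} \end{bmatrix} + b_{y}
  = W_{y,1}\left(W_{u}x_{u}+b_{u}\right) + W_{y,2}\left(W_{\nu}x_{\nu}+b_{\nu}\right) + b_{y}
  = \begin{bmatrix} W_{y,1}W_{u} & W_{y,2}W_{\nu} \end{bmatrix}
    \begin{bmatrix} x_{u} \\ x_{\nu} \end{bmatrix}
    + \left( W_{y,1}b_{u} + W_{y,2}b_{\nu} + b_{y} \right)
% Hence W_y' = [ W_{y,1}W_u  W_{y,2}W_nu ] and b_y' = W_{y,1}b_u + W_{y,2}b_nu + b_y.
```

Under that assumption the stacked layers compute an affine function of $\mathrm{CONCAT}(x_{u}, x_{\nu})$, and choosing $W_{y,1}$ and $W_{y,2}$ as identity matrices recovers any $W_{y}'$, so both forms describe the same family of functions. They differ only in how that family is parameterized, which is the optimization question raised above.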

Allow the class inheritance

A person may be interested in extending the Graph classes with new features, however, the strict check over the types forces the final user to modify a lot of classes.

An alternative may be to use the function issubclass.

if type(graph) == Graph:
    graphs_split = graph.split(self.task, split_ratio)
elif type(graph) == HeteroGraph:
    graphs_split = graph.split(
                    task=self.task,
                    split_types=split_types,
                    split_ratio=split_ratio,
                    edge_split_mode=self.edge_split_mode)
else:
    raise TypeError('element in self.graphs of unexpected type')
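
A sketch of the dispatch with `isinstance`, which accepts subclasses. Stand-in classes are used so the example runs on its own. It assumes that `HeteroGraph` is itself a subclass of `Graph`, in which case the more specific check has to come first. `MyGraph` and `split_kind` are hypothetical names.

```python
class Graph:
    """Stand-in for deepsnap.graph.Graph."""


class HeteroGraph(Graph):
    """Stand-in for the heterogeneous graph class (assumed to subclass Graph)."""


class MyGraph(Graph):
    """A user-defined extension that an exact type() check would reject."""


def split_kind(graph):
    """Pick the split routine by class, accepting subclasses."""
    # Check the more specific class first, since a HeteroGraph is also a Graph.
    if isinstance(graph, HeteroGraph):
        return "hetero"
    if isinstance(graph, Graph):
        return "homogeneous"
    raise TypeError("element in self.graphs of unexpected type")


kinds = [split_kind(g) for g in (Graph(), MyGraph(), HeteroGraph())]
print(kinds)  # ['homogeneous', 'homogeneous', 'hetero']
```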

Running node_classification_acm.py throws an error

Built from source using

$ git clone https://github.com/snap-stanford/deepsnap
$ cd deepsnap
$ pip install .

but running

python3 node_classification_acm.py

throws the following error

Traceback (most recent call last):
  File "node_classification_acm.py", line 278, in <module>
    loss = train(model, optimizer, hetero_graph, train_idx)
  File "node_classification_acm.py", line 182, in train
    preds = model(hetero_graph.node_feature, hetero_graph.edge_index)
  File "/home/youngwook/anaconda3/envs/torch_graphs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "node_classification_acm.py", line 163, in forward
    x = forward_op(x, self.bns1)
  File "/home/youngwook/anaconda3/envs/torch_graphs/lib/python3.8/site-packages/deepsnap/hetero_gnn.py", line 245, in forward_op
    res[key] = module_dict(x[key], **kwargs)
  File "/home/youngwook/anaconda3/envs/torch_graphs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
TypeError: forward() takes 1 positional argument but 2 were given

Not sure if this info is relevant, but:

deepsnap.__version__
'0.2.0'
torch_geometric.__version__
'1.7.2'

Performance issues with ogbl-biokg graph in DeepSNAP

Hello,

I am trying to use ogbl-biokg (docs | github) with the DeepSNAP package. The graph has 5,088,434 edges and 93,773 nodes. I created a custom dataset (link to the code), but I have massive performance issues.

The problem is that it takes more than 30 min for the graph to process and generate the HeteroGraph object:

hetero = HeteroGraph(G)

Memory consumption is also too high once training starts, even on a node with 256 GB, so it always crashes. I am using it for link prediction with the heterogeneous GraphSAGE model (tutorial colab from DeepSNAP).
I think the problem might be the use of NetworkX in the backend. I tried loading the graph with the StellarGraph package via NumPy arrays, which are much more efficient. The whole graph loads within a minute, even on a CPU.

Is there any suggestion you have as to how to better load the data into DeepSNAP? Or could you possibly integrate the ogbl-biokg graph as a dataset into your library, considering the ogb package is also part of snap-stanford ? This would be very helpful!

CUDA out of memory

I am using the cora link prediction as shown in the example colab: https://colab.research.google.com/drive/1ycdlJuse7l2De7wi51lFd_nCuaWgVABc?usp=sharing

Instead of using the Cora dataset, I am using a subset of the Pokec dataset with 1 million nodes and 10 million relationships. My nodes have two properties, so all in all it should work. My code is basically identical to the example; I only change the input graph, which is created from a PyG graph:

import torch
from torch.utils.data import DataLoader

from deepsnap.batch import Batch
from deepsnap.dataset import GraphDataset
from deepsnap.graph import Graph

# pyg_graph, LinkPredModel, train and test are defined as in the example colab.
args = {
    "device": 'cuda' if torch.cuda.is_available() else 'cpu',
    "hidden_dim": 128,
    "epochs": 50,
}

#pyg_dataset = Planetoid('./tmp/cora', 'Cora')
graph = Graph.pyg_to_graph(pyg_graph)

dataset = GraphDataset(
    graph,
    task='link_pred',
    edge_train_mode="disjoint"
)
datasets = {}
datasets['train'], datasets['val'], datasets['test'] = dataset.split(
    transductive=True, split_ratio=[0.85, 0.05, 0.1])
input_dim = datasets['train'].num_node_features
num_classes = datasets['train'].num_edge_labels

model = LinkPredModel(input_dim, args["hidden_dim"]).to(args["device"])

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

# One DataLoader per split; only the training split is shuffled.
dataloaders = {split: DataLoader(
    ds, collate_fn=Batch.collate([]),
    batch_size=1, shuffle=(split=='train'))
    for split, ds in datasets.items()}
best_model = train(model, dataloaders, optimizer, args)
log = "Train: {:.4f}, Val: {:.4f}, Test: {:.4f}"
best_train_roc = test(best_model, dataloaders['train'], args)
best_val_roc = test(best_model, dataloaders['val'], args)
best_test_roc = test(best_model, dataloaders['test'], args)
print(log.format(best_train_roc, best_val_roc, best_test_roc))

However I get the following error:

RuntimeError Traceback (most recent call last)
in
27 batch_size=1, shuffle=(split=='train'))
28 for split, ds in datasets.items()}
---> 29 best_model = train(model, dataloaders, optimizer, args)
30 log = "Train: {:.4f}, Val: {:.4f}, Test: {:.4f}"
31 best_train_roc = test(best_model, dataloaders['train'], args)

in train(model, dataloaders, optimizer, args)
9 model.train()
10 optimizer.zero_grad()
---> 11 pred = model(batch)
12 loss = model.loss(pred, batch.edge_label.type(pred.dtype))
13

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []

in forward(self, batch)
19
20 nodes_first = torch.index_select(x, 0, edge_label_index[0,:].long())
---> 21 nodes_second = torch.index_select(x, 0, edge_label_index[1,:].long())
22 pred = torch.sum(nodes_first * nodes_second, dim=-1)
23 return pred

RuntimeError: CUDA out of memory. Tried to allocate 1.75 GiB (GPU 0; 8.00 GiB total capacity; 5.14 GiB already allocated; 281.56 MiB free; 5.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

When I had a similar issue in PyTorch Geometric, I just added the non_blocking parameter:

graph.to(device, non_blocking=True)

but here it doesn't seem to help at all?
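
A rough back-of-the-envelope check may help here. The failing call is an index_select that gathers one embedding row per column of edge_label_index, so its output grows with the number of edges scored at once. The sketch below assumes float32 embeddings of width 128 (the hidden_dim above; the actual output width of LinkPredModel is not shown in the post, so this is an assumption). Note also that non_blocking only makes the host-to-device copy asynchronous; it does not change how much memory is required.

```python
GIB = 2 ** 30  # bytes in one GiB


def rows_in_allocation(alloc_gib, feature_dim, bytes_per_value=4):
    """Number of embedding rows that fit in an allocation of alloc_gib GiB."""
    return int(alloc_gib * GIB // (feature_dim * bytes_per_value))


def allocation_gib(num_rows, feature_dim, bytes_per_value=4):
    """Size in GiB of a (num_rows x feature_dim) tensor."""
    return num_rows * feature_dim * bytes_per_value / GIB


# Assumption: float32 values (4 bytes) and embedding width 128.
rows = rows_in_allocation(1.75, feature_dim=128)
print(rows)                           # 3670016 rows in the 1.75 GiB request
print(allocation_gib(rows, 128))      # 1.75
print(allocation_gib(100_000, 128))   # roughly 0.048 GiB for a 100k-edge chunk
```

Under these assumptions the 1.75 GiB request corresponds to about 3.67 million edges scored in a single call. Scoring edge_label_index in smaller slices, or moving only part of the data to the GPU at a time, would shrink that peak; whether that is enough for an 8 GiB card depends on the rest of the model.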

cannot import name 'container_abcs' from 'torch._six'

hetero_gnn.py imports container_abcs from torch._six, which has been removed in later versions of PyTorch. It should be updated to use collections.abc directly, since container_abcs was just a wrapper for this module.
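
A minimal compatibility sketch of the change described above: try the old location first and fall back to the standard library module. This only illustrates the idea and is not necessarily the patch applied in the repository.

```python
# Use the old torch helper when it still exists; otherwise fall back to the
# standard library module it wrapped. ImportError also covers the case where
# torch (or torch._six) is not installed at all.
try:
    from torch._six import container_abcs
except ImportError:
    import collections.abc as container_abcs

# The abstract base classes are then used exactly as before.
print(isinstance({"a": 1}, container_abcs.Mapping))
print(isinstance([1, 2, 3], container_abcs.Sequence))
```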

How to generate WN18.gpickle or How to format own KGs

I created a KG and intended to split it into train/val/test sets without negative triples. I saved my KG as a NetworkX MultiDiGraph and set resample_negatives=False. However, the split graphs still contain many non-existent edges. The "WN18.gpickle" file you provided does not have this issue. Could you please tell me how you generated WN18.gpickle? Thank you very much for your help.

Error in num_nodes

Hi,

I'm getting an error in the function num_nodes because sometimes self.G is a list.

deepsnap/deepsnap/graph.py

Line 220 in 6197dce

def num_nodes(self) -> int:

Would it be possible to update this function to just return the length of the attribute G.nodes? Something like this:

  @property
  def num_nodes(self) -> int:
      r"""
      Return number of nodes in the graph.

      Returns:
          int: Number of nodes in the graph.
      """
      G = None
      if isinstance(self.G, list):
          G = self.G[0]
      else:
          G = self.G

      return len(G.nodes)

Thanks in advance

How to convert PyG heterograph to DeepSnap heterograph?

I made a PyG heterograph as below.

cipdata
Out[145]: 
HeteroData(
  card={ x=[13752, 302] },
  user={ x=[10996, 304] },
  (user, solved, card)={
    edge_index=[2, 28992],
    edge_attr=[28992]
  },
  (card, rev_solved, user)={
    edge_index=[2, 28992],
    edge_attr=[28992]
  }
)

Then I tried to convert this cipdata PyG heterograph into DeepSnap's heterograph

from deepsnap.hetero_graph import HeteroGraph
graph = HeteroGraph(cipdata)

Then I get this error

graph = HeteroGraph(cipdata)
Traceback (most recent call last):

  File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/torch_geometric/data/storage.py", line 48, in __getattr__
    return self[key]

  File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/torch_geometric/data/storage.py", line 68, in __getitem__
    return self._mapping[key]

KeyError: 'number_of_nodes'


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "<ipython-input-147-5bdfc5ffc776>", line 1, in <module>
    graph = HeteroGraph(cipdata)

  File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/deepsnap/hetero_graph.py", line 94, in __init__
    self._update_tensors(init=True)

  File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/deepsnap/graph.py", line 515, in _update_tensors
    self._update_attributes()

  File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/deepsnap/graph.py", line 531, in _update_attributes
    if self.G.number_of_nodes() == 0:

  File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/torch_geometric/data/hetero_data.py", line 118, in __getattr__
    return getattr(self._global_store, key)

  File "/home/hojun/anaconda3/envs/geometric/lib/python3.6/site-packages/torch_geometric/data/storage.py", line 51, in __getattr__
    f"'{self.__class__.__name__}' object has no attribute '{key}'")

AttributeError: 'BaseStorage' object has no attribute 'number_of_nodes'

How can I easily convert a PyG heterogeneous graph into a DeepSNAP one?

Some detailed example code would be much appreciated.
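
The traceback shows HeteroGraph calling self.G.number_of_nodes(), a NetworkX graph method, so the constructor appears to expect a NetworkX graph rather than a HeteroData object. One possible route is to flatten the PyG data into node and edge lists with type attributes first. The sketch below uses plain Python only; the function name is hypothetical, and the attribute names node_type and edge_type follow my understanding of DeepSNAP's heterogeneous graph conventions, so treat them as assumptions. PyG indexes nodes per type, so each type gets an offset to make node ids globally unique.

```python
def hetero_to_node_edge_lists(num_nodes_by_type, edges_by_type):
    """Flatten per-type node counts and edge lists into global node/edge lists.

    num_nodes_by_type: {"user": 2, "card": 3}
    edges_by_type: {("user", "solved", "card"): ([src ids], [dst ids])}
    """
    # Give every node type its own contiguous block of global ids.
    offsets, start = {}, 0
    for node_type, count in num_nodes_by_type.items():
        offsets[node_type] = start
        start += count

    nodes = [
        (offsets[node_type] + i, {"node_type": node_type})
        for node_type, count in num_nodes_by_type.items()
        for i in range(count)
    ]

    edges = []
    for (src_type, relation, dst_type), (src_ids, dst_ids) in edges_by_type.items():
        for s, d in zip(src_ids, dst_ids):
            edges.append(
                (offsets[src_type] + s, offsets[dst_type] + d, {"edge_type": relation})
            )
    return nodes, edges


# Tiny stand-in for the user/card graph above (sizes are made up).
nodes, edges = hetero_to_node_edge_lists(
    {"user": 2, "card": 3},
    {("user", "solved", "card"): ([0, 1], [2, 0])},
)
print(nodes)
print(edges)
```

These tuple shapes can be passed to add_nodes_from and add_edges_from on a networkx.MultiDiGraph, and the resulting graph handed to HeteroGraph. Per-node features (for example rows of cipdata["user"].x) would need to be attached as additional node attributes; that step is omitted here.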

Negative sampling in HeteroGraph

Hi there!

I would like to do negative sampling in training with a HeteroGraph for link prediction, and I am unsure how to retrieve the negative edges and their scores. I've been following the link_prediction.py script mostly. What I have done so far is the following:

# Create a dataset with negative sampling enabled
dataset = GraphDataset(
    [hetero],
    task='link_pred',
    edge_train_mode=edge_train_mode,
    edge_negative_sampling_ratio=1,
)


...
# Training
pred = model(batch)

The output of pred and batch.edge_index looks like this:

batch.edge_index
{('n4', '0', 'n1'): tensor([[1, 3, 4, 7,...4, 7, 8]]), ('n1', '1', 'n0'): tensor([[12, 13, 17,...  8,  9]]), ('n4', '2', 'n4'): tensor([[10, 14, 18,... 23, 25]]), ('n3', '3', 'n3'): tensor([[ 0,  6, 10,... 15, 17]]), ('n0', '4', 'n5'): tensor([[13, 15, 17,...  8,  9]])}

pred
{('n4', '0', 'n1'): tensor([-1.4144e-06,...ackward1>), ('n1', '1', 'n0'): tensor([-1.6777e-06,...ackward1>), ('n4', '2', 'n4'): tensor([6.2179, 6.21...ackward1>), ('n3', '3', 'n3'): tensor([ 10.5874,  1...ackward1>), ('n0', '4', 'n5'): tensor([-1.5251, -1....ackward1>)}

How do I know which of these edges are the positive and negative ones? Or are these only positive ones and I have to generate the negative ones?

I'd be thankful for any help!

Best,

Sophia

Difference between edge_index and edge_label_index

Hello. I know what edge_index is, but I am still struggling with edge_label_index.

Could you please clarify the difference with some examples and a detailed explanation?

Thank you in advance.
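
As a rough illustration of the distinction: in a link prediction setup, edge_index holds the edges the GNN passes messages over, while edge_label_index holds the edges the loss is computed on (positive supervision edges plus sampled negatives), with edge_label giving their labels. The sketch below is plain Python on a toy directed graph and loosely mirrors a "disjoint" split; the function name and the 50/50 ratio are illustrative assumptions, not DeepSNAP's implementation.

```python
import random


def split_for_link_prediction(edges, num_nodes, supervision_ratio=0.5, seed=0):
    """Split directed edges into message-passing and supervision sets."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)

    num_supervision = int(len(shuffled) * supervision_ratio)
    positives = shuffled[:num_supervision]        # edges to predict (label 1)
    message_passing = shuffled[num_supervision:]  # edges the GNN aggregates over

    # Sample one non-existent edge per positive supervision edge (label 0).
    # Assumes the graph is sparse enough for such pairs to exist.
    taken = set(edges)
    negatives = []
    while len(negatives) < len(positives):
        candidate = (rng.randrange(num_nodes), rng.randrange(num_nodes))
        if candidate[0] != candidate[1] and candidate not in taken:
            taken.add(candidate)  # avoid sampling the same negative twice
            negatives.append(candidate)

    def to_index(pairs):
        # 2 x E layout: first row holds sources, second row holds targets.
        return [[u for u, _ in pairs], [v for _, v in pairs]]

    edge_index = to_index(message_passing)
    edge_label_index = to_index(positives + negatives)
    edge_label = [1] * len(positives) + [0] * len(negatives)
    return edge_index, edge_label_index, edge_label


ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
edge_index, edge_label_index, edge_label = split_for_link_prediction(ring, num_nodes=6)
print("edge_index      ", edge_index)
print("edge_label_index", edge_label_index)
print("edge_label      ", edge_label)
```

A model would compute node embeddings using edge_index only, then score each column of edge_label_index (for instance with a dot product of the two endpoint embeddings) and compare the scores against edge_label.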

DeepSNAP doesn't have a neighbor sampler?

To fit graphs into GPU memory, DGL and PyG use a neighbor sampler. I searched and found no information about neighbor samplers in DeepSNAP. So there is none? If not, do you plan to add one in the near future?

proposal

It is interesting that working with DeepSNAP means repeatedly converting other datasets into DeepSNAP datasets, yet DeepSNAP provides no interface for saving and loading its own datasets. I hope you can continue to improve it.

CORA Node Classification example fails on old cpus

When the CORA Node Classification example is run on old CPUs (e.g. madmax), the line that loads the dataset, pyg_dataset = Planetoid('./cora', name, transform=T.TargetIndegree()), throws an Illegal instruction (core dumped) error.

Outdated PyPI version causing container_abcs ImportError

When importing Graph from deepsnap.graph, I get the following error:

ImportError                               Traceback (most recent call last)
/tmp/ipykernel_119/3679505279.py in <module>
      1 import torch.optim as optim
      2 from torch_geometric.data import DataLoader
----> 3 from deepsnap.graph import Graph
      4 from deepsnap.batch import Batch
      5 from deepsnap.dataset import GraphDataset

/opt/conda/lib/python3.9/site-packages/deepsnap/__init__.py in <module>
      7 import deepsnap.batch
      8 import deepsnap.hetero_graph
----> 9 import deepsnap.hetero_gnn
     10 
     11 import networkx as _netlib

/opt/conda/lib/python3.9/site-packages/deepsnap/hetero_gnn.py in <module>
      5 
      6 from torch import Tensor
----> 7 from torch._six import container_abcs
      8 from torch_geometric.nn.inits import reset
      9 from torch_sparse import matmul

ImportError: cannot import name 'container_abcs' from 'torch._six' (/opt/conda/lib/python3.9/site-packages/torch/_six.py)

After some digging, I saw that this was fixed in this commit on June 19th, but the latest update to PyPI was on April 7, so the error still occurs when installing with pip. Could a new version be pushed to PyPI?

Thanks!

Heterogeneous link prediction with inductive split

Hello,

I am trying to reproduce the examples for heterogeneous link prediction but using multiple graphs, i.e. with an inductive split.
Are there any resources or guidance for implementing a model with this type of dataset?

Thank you!
Laura

Illegal instruction (core dumped)

I installed deepsnap on a remote machine using pip, after first installing the dependencies below:

PyTorch --> https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
NetworkX --> https://github.com/networkx/networkx

But when I import:

from deepsnap.graph import Graph

I get:
Illegal instruction (core dumped)

Thank you!

Slow dataloading due to NetworkX graph being collated

I am converting a custom PyG dataset in the following manner:

pyg_dataset = GraphCountDataset(
    dataset_path, name="subgraph_counting", task_label_idx=self.task_label_index[task_key]
)

mode_idxs = pyg_dataset.separate_data(seed=None, fold_idx=None)

deepsnap_graphs = GraphDataset.pyg_to_graphs(pyg_dataset)
dataset = GraphDataset(deepsnap_graphs, task="graph")

and then using a DataLoader with the collate_fn provided by DeepSNAP. The overall pipeline became much slower after the conversion, and I found that this was because the NetworkX graph is collated along with the rest of the data.
For now I have resolved the issue by wrapping the collate_fn in the function below, which removes unwanted keys such as G or any other data that I won't be using in the pipeline.

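from typing import Callable, List, Optional

from deepsnap.batch import Batch
from deepsnap.graph import Graph


def from_data_list_ignore_keys(
    data_list: List[Graph],
    keys_to_ignore: Optional[List[str]] = None,
    follow_batch: Optional[List] = None,
    transform: Optional[Callable] = None,
    **kwargs
):
    # Blank out attributes (e.g. the NetworkX graph "G") that should not be collated.
    # Note: this overwrites the attribute on the original Graph objects.
    if keys_to_ignore is not None:
        for key in keys_to_ignore:
            for data in data_list:
                data[key] = None

    return Batch.from_data_list(
        data_list=data_list, follow_batch=follow_batch, transform=transform, **kwargs
    )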

Am I doing something wrong, or would it be reasonable to integrate an option for choosing which keys to collate directly into the library?

How to convert pyTorch-Geometric custom datasets into DeepSnap datasets?

Hello,

I am trying to use a custom dataset for link prediction.
What I tried was:

pyg_dataset = My_Own_Dataset()
graphs1 = GraphDataset.pyg_to_graphs(pyg_dataset) #error here

but I am getting an error: TypeError: zip argument #2 must support iteration

The full error is

/usr/local/lib/python3.7/dist-packages/deepsnap/dataset.py in pyg_to_graphs(dataset, verbose, fixed_split, tensor_backend, netlib)
   1280                     netlib=netlib
   1281                 )
-> 1282                 for data in dataset
   1283             ]
   1284 

/usr/local/lib/python3.7/dist-packages/deepsnap/dataset.py in <listcomp>(.0)
   1280                     netlib=netlib
   1281                 )
-> 1282                 for data in dataset
   1283             ]
   1284 

/usr/local/lib/python3.7/dist-packages/deepsnap/graph.py in pyg_to_graph(data, verbose, fixed_split, tensor_backend, netlib)
   2025             if Graph._is_node_attribute(key):
   2026                 if not tensor_backend:
-> 2027                     Graph.add_node_attr(G, key, value)
   2028                 else:
   2029                     attributes[key] = value

/usr/local/lib/python3.7/dist-packages/deepsnap/graph.py in add_node_attr(G, attr_name, node_attr)
   1909         # TODO: Better method here?
   1910         node_list = list(G.nodes)
-> 1911         attr_dict = dict(zip(node_list, node_attr))
   1912         deepsnap._netlib.set_node_attributes(G, attr_dict, name=attr_name)
   1913 

TypeError: zip argument #2 must support iteration

I have also tried to do this by building the graph in NetworkX, converting it to a PyG graph, and converting from there; in that case I get a different error.

This error doesn't happen when I am using a graph in Planetoid as in the Link prediction with DeepSnap example colab notebook.

What could be causing this problem? Is there a guide on how I can use custom data on DeepSnap?

Thank you for your help!
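
The last frame of the traceback pairs every node with one entry of the attribute via dict(zip(node_list, node_attr)), which fails when node_attr is not iterable. One plausible cause, stated as a hypothesis since My_Own_Dataset is not shown, is a scalar attribute on the PyG data object (such as a node count) that gets treated as a per-node attribute. The sketch below reproduces the failure in plain Python and adds a hypothetical helper for spotting such attributes; the key names are made up for illustration.

```python
def add_node_attr(node_list, node_attr):
    # Same pairing step as the line the traceback points at.
    return dict(zip(node_list, node_attr))


def non_iterable_attrs(attrs):
    """Return the keys whose values cannot be iterated over."""
    bad = []
    for key, value in attrs.items():
        try:
            iter(value)
        except TypeError:
            bad.append(key)
    return bad


nodes = [0, 1, 2]

# One value per node: works.
per_node = add_node_attr(nodes, [0.1, 0.2, 0.3])
print(per_node)

# A scalar instead of per-node values: zip raises TypeError.
try:
    add_node_attr(nodes, 3)
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)

# Hypothetical attribute dictionary of a custom data object.
suspects = non_iterable_attrs({"node_feature": [[1.0], [2.0], [3.0]], "num_nodes": 3})
print(suspects)
```

If a check like this flags an attribute on the custom data objects, removing it or storing it with one entry per node before calling pyg_to_graphs may avoid the error.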
