Using the ogbl-biokg with DeepSNAP (deepsnap issue, open)

snap-stanford commented on August 15, 2024



Comments (4)

anniekmyatt commented on August 15, 2024

Hello! I'm not a maintainer of the deepsnap package, but I would be happy to try and help. Would you mind posting the code that produced the error message, so it is easier to spot what went wrong?

I wonder whether you created a heterogeneous graph object with v2 of pytorch-geometric, which deepsnap might not support yet? In a pytorch-geometric HeteroData graph object, the edges of the different types are already stored in a dictionary, so it has no edge_index entry (it has an edge_index_dict instead), and that might be what the error message is complaining about.
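To make the distinction concrete, here is a minimal sketch of a hypothetical helper (`edge_storage` is not part of deepsnap or pytorch-geometric) that reports which of the two attributes a graph object carries. It uses `getattr` with a default, so a missing attribute yields `None` instead of raising. The `SimpleNamespace` objects and the edge-type key are illustrative stand-ins for real graph objects.

```python
from types import SimpleNamespace


def edge_storage(data):
    """Report how a graph-like object stores its edges (hypothetical helper)."""
    if getattr(data, "edge_index_dict", None) is not None:
        return "edge_index_dict"  # heterogeneous: one edge index per edge type
    if getattr(data, "edge_index", None) is not None:
        return "edge_index"       # homogeneous: a single edge index
    return "unknown"


# Stand-ins for a homogeneous and a heterogeneous graph object.
homo = SimpleNamespace(edge_index=[[0, 1], [1, 2]])
hetero = SimpleNamespace(
    # illustrative key of the form (head_type, relation, tail_type)
    edge_index_dict={("drug", "example_relation", "protein"): [[0], [3]]}
)

print(edge_storage(homo))    # edge_index
print(edge_storage(hetero))  # edge_index_dict
```

An object that only has `edge_index_dict` would trip up any code that reads `edge_index` directly, which matches the error described above.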

A question for the developers of deepsnap: are you planning to update deepsnap to be compatible with HeteroData objects from pytorch-geometric >=v2? Can you share something about the roadmap for deepsnap in general?


sophiakrix commented on August 15, 2024

Hi @anniekmyatt !
Thanks for chipping in on this. I only used a few lines of code:

```python
from ogb.linkproppred import PygLinkPropPredDataset

ogbl_biokg_dataset = PygLinkPropPredDataset(name = "ogbl-biokg")
graph = Graph.pyg_to_graph(ogbl_biokg_dataset[0])
```


anniekmyatt commented on August 15, 2024

I just ran this and I'm getting the same error (I added the line from deepsnap.graph import Graph though).

This error occurs because the graph object ogbl_biokg_dataset[0] has an edge_index_dict rather than a single edge_index attribute; OGB uses the edge_index_dict dictionary to specify the edges for each edge type. If you are keen to use deepsnap rather than pytorch-geometric directly, it seems you need to create the hetero graph object manually, as shown here. However, ogbl_biokg_dataset consists only of triplets and has no node features, so you'll have to create appropriate (or placeholder) features for the node_feature input. To create the deepsnap heterograph object, your code would look something like this:

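```python
from ogb.linkproppred import PygLinkPropPredDataset
from deepsnap.hetero_graph import HeteroGraph

dataset = PygLinkPropPredDataset(name="ogbl-biokg")
graph = dataset[0]

# Placeholder: dict mapping each node type to a tensor of node features
node_feature = ...  # <insert your node features here>

hetero_graph = HeteroGraph(
    node_feature=node_feature,
    edge_index=graph.edge_index_dict,  # Note that this is a dictionary with an edge index for each edge type
    directed=True,
)
```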

About the node features: this should be a dictionary with one key per node type (e.g. disease, drug, ...) and, as values, torch tensors of shape (number_of_nodes, number_of_features_per_node).
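A minimal sketch of such placeholder features, assuming constant all-ones features are acceptable as a starting point. `placeholder_node_features` is a hypothetical helper and the node counts below are made up; for ogbl-biokg you would take the per-type counts from the dataset itself (assumed here to be available as `graph.num_nodes_dict`).

```python
import torch


def placeholder_node_features(num_nodes_dict, dim=1):
    """Return one constant (num_nodes, dim) float tensor per node type."""
    return {
        node_type: torch.ones(num_nodes, dim)
        for node_type, num_nodes in num_nodes_dict.items()
    }


# Hypothetical per-type node counts, for illustration only.
features = placeholder_node_features({"disease": 3, "drug": 5}, dim=4)
print({k: tuple(v.shape) for k, v in features.items()})
# {'disease': (3, 4), 'drug': (5, 4)}
```

Constant features give every node of a type the same starting vector, so any distinction between nodes has to come from the graph structure; random or learnable embeddings are common alternatives.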

I am curious which specific deepsnap functionality you would like to use. For an RDF graph like this (without node features), wouldn't a package like DGL-KE be more helpful, as it has lots of embedding functionality that doesn't rely on message passing of node features?


sophiakrix commented on August 15, 2024

Hi @anniekmyatt !

Thanks for your reply. I tried to create a deepsnap HeteroGraph object from scratch here for the ogbl-biokg graph, following the deepsnap tutorial for heterogeneous graphs.

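One important step here is to relabel the nodes of ogbl-biokg: its node labels start from 0 within every node type, so the same integer refers to different nodes of different types. Before building the networkx graph, every node needs a globally unique, consecutive integer label.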
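The relabeling idea can also be sketched on its own. This is an alternative to the first-seen numbering used in the code that follows, not what that code does: instead of numbering nodes in the order they appear in edges, it shifts each type's local ids by the total number of nodes of the preceding types (in sorted type order). `offset_relabel` is a hypothetical helper, and the per-type counts are assumed to come from the dataset.

```python
def offset_relabel(num_nodes_dict):
    """Map (node_type, local_id) to a globally unique, consecutive integer id.

    Node types are processed in sorted order; each type's local ids 0..n-1
    are shifted by the number of nodes of all types that come before it.
    """
    offsets, total = {}, 0
    for node_type in sorted(num_nodes_dict):
        offsets[node_type] = total
        total += num_nodes_dict[node_type]

    def to_global(node_type, local_id):
        return offsets[node_type] + local_id

    return to_global, total


# Hypothetical counts: 3 drug nodes and 2 disease nodes.
to_global, n_total = offset_relabel({"drug": 3, "disease": 2})
print(to_global("disease", 0), to_global("drug", 0), to_global("drug", 2), n_total)
# 0 2 4 5
```

One difference from first-seen numbering: every node of every type gets an id, including nodes that never appear in an edge.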

```python
import numpy as np
import networkx as nx
from collections import defaultdict
from tqdm import tqdm  # import the tqdm function, not the tqdm module
from ogb.linkproppred import PygLinkPropPredDataset


ogbl_biokg_dataset = PygLinkPropPredDataset(name="ogbl-biokg")
# train/valid/test triplets: dicts with 'head', 'relation', 'tail', 'head_type', 'tail_type'
edge_split = ogbl_biokg_dataset.get_edge_split()

# =====================
# Relabel nodes
# =====================

## convert to arrays for speed (only the keys used below; any other keys,
## e.g. negative samples in valid/test, are skipped)
edge_split_array = dict()
for dataset in ['train', 'valid', 'test']:
    edge_split_array[dataset] = dict()
    for key in ['head', 'relation', 'tail', 'head_type', 'tail_type']:
        value = edge_split[dataset][key]
        if isinstance(value, list):
            # node-type columns are plain Python lists of strings
            edge_split_array[dataset][key] = np.array(value)
        else:
            # id columns are torch tensors
            edge_split_array[dataset][key] = value.numpy()

current_node_label = 0                 # next unused global node label
new_label_mapping = defaultdict(dict)  # new label -> original label and node type
new_label_mapping_inv = dict()         # (original label, node type) -> new label

for dataset in ['train', 'valid', 'test']:
    split = edge_split_array[dataset]
    for i in tqdm(range(len(split['head']))):
        for side in ['head', 'tail']:
            node = (int(split[side][i]), str(split[side + '_type'][i]))
            if node not in new_label_mapping_inv:
                new_label_mapping[current_node_label]['original_node_label'] = node[0]
                new_label_mapping[current_node_label]['node_type'] = node[1]
                new_label_mapping_inv[node] = current_node_label
                current_node_label += 1


# =====================
# Create HeteroGraph
# =====================
# Note: nx.DiGraph keeps at most one edge per (head, tail) pair, so if two
# relations connect the same pair of nodes, the later one overwrites the earlier.
G = nx.DiGraph()

for dataset in ['train', 'valid', 'test']:
    split = edge_split_array[dataset]
    for i in tqdm(range(len(split['head']))):
        # head and tail nodes, looked up by (original label, node type)
        head_node_type = str(split['head_type'][i])
        tail_node_type = str(split['tail_type'][i])
        new_head_node_id = new_label_mapping_inv[(int(split['head'][i]), head_node_type)]
        new_tail_node_id = new_label_mapping_inv[(int(split['tail'][i]), tail_node_type)]

        # edge type (relation id)
        edge_type_id = int(split['relation'][i])

        G.add_node(new_head_node_id, node_type=head_node_type, node_label=head_node_type)
        G.add_node(new_tail_node_id, node_type=tail_node_type, node_label=tail_node_type)
        G.add_edge(new_head_node_id, new_tail_node_id, edge_type=str(edge_type_id))
```

When I run this code, it creates a networkx graph, as shown in the tutorial. I can also convert it into a HeteroGraph object from deepsnap with this:

```python
from deepsnap.hetero_graph import HeteroGraph

# Transform to a heterograph object that is recognised by deepSNAP
hetero = HeteroGraph(G)
```

But the object does not have an `edges` attribute:

```
>>> hetero
HeteroGraph(G=[], edge_feature=[], edge_index=[], edge_label_index=[], edge_to_graph_mapping=[], edge_to_tensor_mapping=[3540567], edge_type=[], node_feature=[], node_label_index=[], node_to_graph_mapping=[], node_to_tensor_mapping=[93773], node_type=[])

>>> hetero.edges()
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'HeteroGraph' object has no attribute 'edges'
```

I am wondering why this is, since I followed the authors' tutorial. Do you have any idea? It would be great if any of the authors could comment on this, @farzaank @JiaxuanYou @RexYing @jmilldotdev.

P.S. The reason I would like to use deepsnap is precisely that it supports node features, which I plan to add for another graph later on.

