
Comments (4)

anniekmyatt avatar anniekmyatt commented on August 15, 2024

Hello! I'm not a maintainer of the deepsnap package but I would be happy to try and help. Would you mind posting the code that you used, that resulted in the error message, so it is easier to spot what went wrong?

I wonder whether you created a heterogeneous graph object with v2 of pytorch-geometric, which deepsnap might not support yet? In a pytorch-geometric HeteroData graph object, the edges of the different types are already stored in a dictionary, so the object has no edge_index entry (only an edge_index_dict), and that might be what the error message is complaining about.
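
To make the distinction concrete, here is a minimal sketch of how one could check which kind of edge storage an object carries before handing it to a converter. The helper name edge_storage_kind and the toy objects are hypothetical stand-ins, not part of deepsnap, pytorch-geometric or OGB; the check uses getattr with a None default because an attribute may exist but be empty.

```python
from types import SimpleNamespace


def edge_storage_kind(data):
    """Report whether `data` stores edges per type or as one edge index."""
    # Check the per-type dictionary first: heterogeneous objects expose it.
    if getattr(data, "edge_index_dict", None) is not None:
        return "per-type"
    # Fall back to the single edge index used by homogeneous graphs.
    if getattr(data, "edge_index", None) is not None:
        return "single"
    return "none"


# Toy stand-ins (hypothetical); real graph objects would hold tensors.
homogeneous = SimpleNamespace(edge_index=[[0, 1], [1, 2]])
heterogeneous = SimpleNamespace(
    edge_index=None,
    edge_index_dict={("drug", "drug-protein", "protein"): [[0], [1]]},
)

print(edge_storage_kind(homogeneous))
print(edge_storage_kind(heterogeneous))
```

An object that reports "per-type" is the case discussed here: a converter that only looks for a single edge_index would not find one.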

A question for the developers of deepsnap: are you planning to update deepsnap to be compatible with HeteroData objects from pytorch-geometric >=v2? Can you share something about the roadmap for deepsnap in general?

from deepsnap.

sophiakrix avatar sophiakrix commented on August 15, 2024

Hi @anniekmyatt !
Thanks for chipping in on this. There are only a few lines of code I used for this:

from ogb.linkproppred import PygLinkPropPredDataset

ogbl_biokg_dataset = PygLinkPropPredDataset(name = "ogbl-biokg")
graph = Graph.pyg_to_graph(ogbl_biokg_dataset[0])


anniekmyatt avatar anniekmyatt commented on August 15, 2024

I just ran this and I'm getting the same error (I added the line from deepsnap.graph import Graph though).

This error occurs because the ogbl_biokg_dataset has an edge_index_dict rather than a single edge_index attribute. OGB uses this edge_index_dict dictionary to specify the edges for the different edge types. If you are keen to use deepsnap, rather than pytorch-geometric directly, it seems like you need to manually create the hetero graph object like here. However, the ogbl_biokg_dataset consists only of triplets and has no node features, so you'll have to create some appropriate (or placeholder) features for the node_feature input. To create the deepsnap heterograph object, your code would look something like this:

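from deepsnap.hetero_graph import HeteroGraph
from ogb.linkproppred import PygLinkPropPredDataset

dataset = PygLinkPropPredDataset(name="ogbl-biokg")
graph = dataset[0]
hetero_graph = HeteroGraph(
    node_feature=node_feature,  # placeholder: insert your own dictionary of node features here
    edge_index=graph.edge_index_dict,  # note that this is a dictionary with an edge index for each edge type
    directed=True,
)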

About the node features: this should be a dictionary with keys of each node type (e.g. disease, drug...) and as values a torch tensor of dimension (number_of_nodes, number_of_features_per_node).
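
As a rough illustration of that dictionary layout, the sketch below builds constant placeholder features. The function name make_placeholder_features and the node counts are made up for the example (they are not the real ogbl-biokg sizes), and whether constant features are useful for a given model is a separate question.

```python
import torch


def make_placeholder_features(num_nodes_per_type, num_features=1):
    """Return {node_type: tensor of shape (num_nodes, num_features)}."""
    return {
        node_type: torch.ones(num_nodes, num_features)
        for node_type, num_nodes in num_nodes_per_type.items()
    }


# Hypothetical node counts, for illustration only.
num_nodes_per_type = {"disease": 3, "drug": 2}
node_feature = make_placeholder_features(num_nodes_per_type, num_features=4)

for node_type, features in node_feature.items():
    print(node_type, tuple(features.shape))
```

Each value has one row per node of that type and one column per feature, which matches the (number_of_nodes, number_of_features_per_node) layout described above.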

I am curious which deepsnap functionality specifically you would like to use? For an RDF graph like this (without node features), wouldn't a package like DGL-KE be more helpful as it has lots of embedding functionality that doesn't rely on message passing of node features?


sophiakrix avatar sophiakrix commented on August 15, 2024

Hi @anniekmyatt !

Thanks for your reply. I tried to create a deepsnap HeteroGraph object from scratch here for the ogbl-biokg graph. I followed the tutorial from deepsnap for heterogeneous graphs to create the object.

One important step here is to relabel the nodes from ogbl-biokg: its node IDs start at 0 for every node type, so the same ID appears under several types, whereas a single networkx graph needs one unique label per node.

import torch
from tqdm import tqdm
import numpy as np
from collections import defaultdict
import networkx as nx
from ogb.linkproppred import PygLinkPropPredDataset


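ogbl_biokg_dataset = PygLinkPropPredDataset(name = "ogbl-biokg")
# edge_split is used below but was not defined in the original snippet; OGB exposes the splits via get_edge_split()
edge_split = ogbl_biokg_dataset.get_edge_split()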

# =====================
# Relabel nodes
# =====================

## convert to array for speed
edge_split_array = dict()
for dataset in ['train', 'valid', 'test']:
    edge_split_array[dataset] = dict()
    for key in edge_split[dataset]:
        if type(edge_split[dataset][key]) != list:
            edge_split_array[dataset][key] = edge_split[dataset][key].numpy()
        else: 
            edge_split_array[dataset][key] = np.array(edge_split[dataset][key])

# new node label
current_node_label = 0
# track nodes that have been seen
seen = set()

new_label_mapping = defaultdict(dict)
new_label_mapping_inv = defaultdict(dict)

for dataset in ['train', 'valid', 'test']:
    for i in tqdm(range(len(edge_split_array[dataset]['head']))):

        tmp_head_node = (edge_split_array[dataset]['head'][i], edge_split_array[dataset]['head_type'][i])
        tmp_tail_node = (edge_split_array[dataset]['tail'][i], edge_split_array[dataset]['tail_type'][i])

        if tmp_head_node not in seen:

            seen.add(tmp_head_node)
            new_label_mapping[current_node_label]['original_node_label'] = int(edge_split_array[dataset]['head'][i])
            new_label_mapping[current_node_label]['node_type'] = edge_split_array[dataset]['head_type'][i]
            new_label_mapping_inv[tmp_head_node] = current_node_label
            current_node_label += 1

        if tmp_tail_node not in seen:

            seen.add(tmp_tail_node)
            new_label_mapping[current_node_label]['original_node_label'] = int(edge_split_array[dataset]['tail'][i])
            new_label_mapping[current_node_label]['node_type'] = edge_split_array[dataset]['tail_type'][i]
            new_label_mapping_inv[tmp_tail_node] = current_node_label
            current_node_label += 1


# =====================
# Create HeteroGraph
# =====================
G = nx.DiGraph()

for dataset in ['train', 'valid', 'test']:
    for i in tqdm(range(len(edge_split_array[dataset]['head']))):
        
        # head node
        head_node_id = edge_split_array[dataset]['head'][i].item()
        head_node_type = edge_split_array[dataset]['head_type'][i]
        new_head_node_id = new_label_mapping_inv[(head_node_id, head_node_type)]

        # tail node
        tail_node_id = edge_split_array[dataset]['tail'][i].item()
        tail_node_type = edge_split_array[dataset]['tail_type'][i]
        new_tail_node_id = new_label_mapping_inv[(tail_node_id, tail_node_type)]

        # edge type
        edge_type_id = edge_split_array[dataset]['relation'][i].item()
        # NOTE: edge_index_to_type_mapping is not defined in this snippet; it is assumed to map a relation id to its name
        edge_type_label = edge_index_to_type_mapping[edge_type_id]

        G.add_node(new_head_node_id, node_type=head_node_type, node_label=head_node_type)
        G.add_node(new_tail_node_id, node_type=tail_node_type, node_label=tail_node_type)
        G.add_edge(new_head_node_id, new_tail_node_id, edge_type=str(edge_type_id))
        

When I run this code, it creates a networkx graph, as shown in the tutorial. I can also convert it into a HeteroGraph object from deepsnap with this:

# Transform to a HeteroGraph object that is recognised by deepsnap
from deepsnap.hetero_graph import HeteroGraph

hetero = HeteroGraph(G)

But the object does not have the attribute edges:

>>> hetero
HeteroGraph(G=[], edge_feature=[], edge_index=[], edge_label_index=[], edge_to_graph_mapping=[], edge_to_tensor_mapping=[3540567], edge_type=[], node_feature=[], node_label_index=[], node_to_graph_mapping=[], node_to_tensor_mapping=[93773], node_type=[])

>>> hetero.edges()
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'HeteroGraph' object has no attribute 'edges'

I am wondering why this is, since I followed the tutorial of the authors. Do you have any idea? Would be great if any of the authors could comment on this @farzaank @JiaxuanYou @RexYing @jmilldotdev ?

P.S. The reason why I would like to use deepsnap is exactly that it can use node features, which I would add for another graph later on.
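
For reference, the relabeling step described in this comment can also be sketched with per-type offsets instead of a loop over every edge. This is an alternative under stated assumptions, not the code used above: it assumes the number of nodes per type is known up front, it assigns a label to every node (including ones that never appear in an edge), and the names build_offsets and to_global_id as well as the counts are hypothetical.

```python
from itertools import accumulate


def build_offsets(num_nodes_per_type):
    """Map each node type to the first global label reserved for it."""
    node_types = sorted(num_nodes_per_type)
    # Running totals of the counts; drop the last one to get start positions.
    totals = list(accumulate(num_nodes_per_type[t] for t in node_types))
    starts = [0] + totals[:-1]
    return dict(zip(node_types, starts))


def to_global_id(offsets, node_type, local_id):
    """Turn a (node type, per-type ID) pair into one unique global label."""
    return offsets[node_type] + local_id


# Toy counts for illustration (not the real ogbl-biokg sizes).
num_nodes_per_type = {"disease": 3, "drug": 2, "protein": 4}
offsets = build_offsets(num_nodes_per_type)

print(offsets)
print(to_global_id(offsets, "drug", 0))
```

Because each type gets its own contiguous block of labels, the mapping is easy to invert: the original type and ID can be recovered from the global label and the offsets.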
