stellargraph / stellargraph

StellarGraph - Machine Learning on Graphs

Home Page: https://stellargraph.readthedocs.io/

License: Apache License 2.0

Python 98.78% Shell 1.04% Dockerfile 0.19%
graphs machine-learning machine-learning-algorithms graph-convolutional-networks networkx geometric-deep-learning saliency-map interpretability heterogeneous-networks graph-neural-networks

stellargraph's Introduction

StellarGraph Machine Learning library logo

StellarGraph Machine Learning Library

StellarGraph is a Python library for machine learning on graphs and networks.

Introduction

The StellarGraph library offers state-of-the-art algorithms for graph machine learning, making it easy to discover patterns and answer questions about graph-structured data. It can solve many machine learning tasks, including node classification and regression, link prediction, graph classification, and representation learning (embeddings) for nodes and edges.

Graph-structured data represent entities as nodes (or vertices) and relationships between them as edges (or links), and can include data associated with either as attributes. For example, a graph can contain people as nodes and friendships between them as links, with data like a person's age and the date a friendship was established. StellarGraph supports analysis of many kinds of graphs:

  • homogeneous (with nodes and links of one type)
  • heterogeneous (with more than one type of node and/or link)
  • knowledge graphs (extreme heterogeneous graphs with thousands of types of edges)
  • graphs with or without data associated with nodes
  • graphs with edge weights (a short construction sketch follows below)
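
As an example, here is a minimal sketch of constructing both a homogeneous and a heterogeneous StellarGraph from Pandas DataFrames. The data is a toy example invented for illustration, and the sketch assumes the StellarGraph 1.x constructor conventions: nodes indexed by ID, edges with "source" and "target" columns, an optional "weight" column for edge weights, and dictionaries keyed by type for heterogeneous graphs.

import pandas as pd
import stellargraph as sg

# homogeneous graph: people connected by weighted friendships
people = pd.DataFrame({"age": [25, 31]}, index=["alice", "bob"])
friendships = pd.DataFrame({"source": ["alice"], "target": ["bob"], "weight": [3.0]})
homogeneous = sg.StellarGraph(people, friendships)

# heterogeneous graph: dictionaries keyed by node type and edge type
movies = pd.DataFrame({"year": [1999]}, index=["the-matrix"])
watched = pd.DataFrame({"source": ["alice"], "target": ["the-matrix"]})
heterogeneous = sg.StellarGraph(
    nodes={"person": people, "movie": movies},
    edges={"friendship": friendships, "watched": watched},
)
print(heterogeneous.info())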

StellarGraph is built on TensorFlow 2 and its Keras high-level API, as well as Pandas and NumPy. It is thus user-friendly, modular and extensible. It interoperates smoothly with code that builds on these, such as the standard Keras layers and scikit-learn, so it is easy to augment the core graph machine learning algorithms provided by StellarGraph. Being a standard Python package, it is also easy to install with pip or Anaconda.

Getting Started

The numerous detailed and narrated examples are a good way to get started with StellarGraph. There is likely to be one that is similar to your data or your problem (if not, let us know).

You can start working with the examples immediately in Google Colab or Binder by clicking the Colab and Binder badges within each Jupyter notebook.

Alternatively, you can download a local copy of the demos and run them using Jupyter. The demos can be downloaded by cloning the master branch of this repository, or by using the curl command below:

curl -L https://github.com/stellargraph/stellargraph/archive/master.tar.gz | tar -xz --strip=1 stellargraph-master/demos

The dependencies required to run most of our demo notebooks locally can be installed using one of the following:

  • Using pip: pip install stellargraph[demos]
  • Using conda: conda install -c stellargraph stellargraph

(See Installation section for more details and more options.)

Getting Help

If you get stuck or have a problem, there are several ways to make progress and get help or support: consult the documentation at https://stellargraph.readthedocs.io/, or open an issue on this repository.

Example: GCN

One of the earliest deep learning algorithms for graphs is the Graph Convolutional Network (GCN) [6]. The following example uses it for node classification: predicting the class to which a node belongs. It shows how easy the algorithm is to apply using StellarGraph, and how StellarGraph integrates smoothly with Pandas and TensorFlow and libraries built on them.

Data preparation

Data for StellarGraph can be prepared using common libraries like Pandas and scikit-learn.

import pandas as pd
from sklearn import model_selection

def load_my_data():
    # your own code to load data into Pandas DataFrames, e.g. from CSV files or a database
    ...

nodes, edges, targets = load_my_data()

# Use scikit-learn to compute training and test sets
train_targets, test_targets = model_selection.train_test_split(targets, train_size=0.5)
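
For illustration, a hypothetical load_my_data might return data shaped like the following toy example (the names and values here are invented): node features indexed by node ID, edges with "source" and "target" columns, and one-hot class labels indexed by node ID.

import pandas as pd

nodes = pd.DataFrame(
    {"feature_1": [0.1, 0.7, 0.4], "feature_2": [1.0, 0.0, 0.5]},
    index=["a", "b", "c"],
)
edges = pd.DataFrame({"source": ["a", "b"], "target": ["b", "c"]})
# one-hot targets, suitable for the categorical cross-entropy loss used below
targets = pd.get_dummies(pd.Series(["spam", "ham", "spam"], index=["a", "b", "c"]))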

Graph machine learning model

This is the only part that is specific to StellarGraph. The machine learning model consists of some graph convolution layers followed by a layer to compute the actual predictions as a TensorFlow tensor. StellarGraph makes it easy to construct all of these layers via the GCN model class. It also makes it easy to get input data in the right format via the StellarGraph graph data type and a data generator.

import stellargraph as sg
import tensorflow as tf

# convert the raw data into StellarGraph's graph format for faster operations
graph = sg.StellarGraph(nodes, edges)

generator = sg.mapper.FullBatchNodeGenerator(graph, method="gcn")

# two layers of GCN, each with hidden dimension 16
gcn = sg.layer.GCN(layer_sizes=[16, 16], generator=generator)
x_inp, x_out = gcn.in_out_tensors() # create the input and output TensorFlow tensors

# use TensorFlow Keras to add a layer to compute the (one-hot) predictions
predictions = tf.keras.layers.Dense(units=len(targets.columns), activation="softmax")(x_out)

# use the input and output tensors to create a TensorFlow Keras model
model = tf.keras.Model(inputs=x_inp, outputs=predictions)

Training and evaluation

The model is a conventional TensorFlow Keras model, and so tasks such as training and evaluation can use the functions offered by Keras. StellarGraph's data generators make it simple to construct the required Keras Sequences for input data.

# prepare the model for training with the Adam optimiser and an appropriate loss function
model.compile("adam", loss="categorical_crossentropy", metrics=["accuracy"])

# train the model on the train set
model.fit(generator.flow(train_targets.index, train_targets), epochs=5)

# check model generalisation on the test set
(loss, accuracy) = model.evaluate(generator.flow(test_targets.index, test_targets))
print(f"Test set: loss = {loss}, accuracy = {accuracy}")
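
Once trained, the same generator can also produce predictions for every node in the graph. A minimal sketch, assuming targets holds the one-hot class columns as in the data preparation step above:

# predict for all nodes; full-batch predictions typically come back with a leading batch dimension of one
all_predictions = model.predict(generator.flow(graph.nodes()))
predicted_classes = targets.columns[all_predictions.squeeze().argmax(axis=-1)]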

This algorithm is spelled out in more detail in its extended narrated notebook. We provide many more algorithms, each with a detailed example.

Algorithms

The StellarGraph library currently includes the following algorithms for graph machine learning:

GraphSAGE [1]: Supports supervised as well as unsupervised representation learning, node classification/regression, and link prediction for homogeneous networks. The current implementation supports multiple aggregation methods, including mean, maxpool, meanpool, and attentional aggregators.
HinSAGE: Extension of the GraphSAGE algorithm to heterogeneous networks. Supports representation learning, node classification/regression, and link prediction/regression for heterogeneous graphs. The current implementation supports mean aggregation of neighbour nodes, taking into account their types and the types of links between them.
attri2vec [4]: Supports node representation learning, node classification, and out-of-sample node link prediction for homogeneous graphs with node attributes.
Graph ATtention Network (GAT) [5]: The GAT algorithm supports representation learning and node classification for homogeneous graphs. There are versions of the graph attention layer that support both sparse and dense adjacency matrices.
Graph Convolutional Network (GCN) [6]: The GCN algorithm supports representation learning and node classification for homogeneous graphs. There are versions of the graph convolutional layer that support both sparse and dense adjacency matrices.
Cluster Graph Convolutional Network (Cluster-GCN) [10]: An extension of the GCN algorithm supporting representation learning and node classification for homogeneous graphs. Cluster-GCN scales to larger graphs and can be used to train deeper GCN models using Stochastic Gradient Descent.
Simplified Graph Convolutional network (SGC) [7]: The SGC algorithm supports representation learning and node classification for homogeneous graphs. It is an extension of the GCN algorithm that smooths the graph to bring in more distant neighbours of nodes without using multiple layers.
(Approximate) Personalized Propagation of Neural Predictions (PPNP/APPNP) [9]: The (A)PPNP algorithm supports fast and scalable representation learning and node classification for attributed homogeneous graphs. In a semi-supervised setting, a multilayer neural network is first trained using the node attributes as input; its predictions are then diffused across the graph using a method based on Personalized PageRank.
Node2Vec [2]: The Node2Vec and DeepWalk algorithms perform unsupervised representation learning for homogeneous networks, taking into account network structure while ignoring node attributes. The node2vec algorithm is implemented by combining StellarGraph's random walk generator with the word2vec algorithm from Gensim. Learned node representations can be used in downstream machine learning models implemented using scikit-learn, Keras, TensorFlow or any other Python machine learning library.
Metapath2Vec [3]: The metapath2vec algorithm performs unsupervised, metapath-guided representation learning for heterogeneous networks, taking into account network structure while ignoring node attributes. The implementation combines StellarGraph's metapath-guided random walk generator and Gensim's word2vec algorithm. As with node2vec, the learned node representations (node embeddings) can be used in downstream machine learning models to solve tasks such as node classification and link prediction for heterogeneous networks.
Relational Graph Convolutional Network (RGCN) [11]: The RGCN algorithm performs semi-supervised learning for node representation and node classification on knowledge graphs. RGCN extends GCN to directed graphs with multiple edge types and works with both sparse and dense adjacency matrices.
ComplEx [12]: The ComplEx algorithm computes embeddings for nodes (entities) and edge types (relations) in knowledge graphs, and can use these for link prediction.
GraphWave [13]: GraphWave calculates unsupervised structural embeddings via wavelet diffusion through the graph.
Supervised Graph Classification: A model for supervised graph classification based on GCN [6] layers and mean pooling readout.
Watch Your Step [14]: The Watch Your Step algorithm computes node embeddings by using adjacency powers to simulate expected random walks.
Deep Graph Infomax [15]: Deep Graph Infomax trains unsupervised GNNs to maximize the shared information between node-level and graph-level features.
Continuous-Time Dynamic Network Embeddings (CTDNE) [16]: Supports time-respecting random walks which can be used, in a similar way to Node2Vec, for unsupervised representation learning.
DistMult [17]: The DistMult algorithm computes embeddings for nodes (entities) and edge types (relations) in knowledge graphs, and can use these for link prediction.
DGCNN [18]: The Deep Graph Convolutional Neural Network (DGCNN) algorithm for supervised graph classification.
TGCN [19]: The GCN_LSTM model in StellarGraph follows the Temporal Graph Convolutional Network architecture proposed in the TGCN paper, with a few enhancements to the layer architecture.
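
Many of these models follow the same pattern of a data generator plus in_out_tensors() used in the GCN example above, so switching algorithms is often a small change. As a rough sketch, assuming the GAT class and its argument names match the current API (see the GAT demo notebook for the exact parameters), the earlier node classification example could use graph attention layers instead:

import stellargraph as sg
import tensorflow as tf

# GAT uses its own adjacency preprocessing in the generator
generator = sg.mapper.FullBatchNodeGenerator(graph, method="gat")
gat = sg.layer.GAT(layer_sizes=[8, 8], activations=["elu", "elu"], attn_heads=8, generator=generator)
x_inp, x_out = gat.in_out_tensors()
predictions = tf.keras.layers.Dense(units=len(targets.columns), activation="softmax")(x_out)
model = tf.keras.Model(inputs=x_inp, outputs=predictions)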

Installation

StellarGraph is a Python 3 library and we recommend using Python version 3.6. The required Python version can be downloaded and installed from python.org. Alternatively, use the Anaconda Python environment, available from anaconda.com.

The StellarGraph library can be installed from PyPI, from Anaconda Cloud, or directly from GitHub, as described below.

Install StellarGraph using PyPI:

To install the StellarGraph library from PyPI using pip, execute the following command:

pip install stellargraph

Some of the examples require installing additional dependencies as well as stellargraph. To install these dependencies as well as StellarGraph using pip, execute the following command:

pip install stellargraph[demos]

The community detection demos require python-igraph which is only available on some platforms. To install this in addition to the other demo requirements:

pip install stellargraph[demos,igraph]

Install StellarGraph in Anaconda Python:

The StellarGraph library is available on Anaconda Cloud and can be installed into an Anaconda Python environment using the command line conda tool by executing the following command:

conda install -c stellargraph stellargraph

Install StellarGraph from GitHub source:

First, clone the StellarGraph repository using git:

git clone https://github.com/stellargraph/stellargraph.git

Then, cd to the StellarGraph folder, and install the library by executing the following commands:

cd stellargraph
pip install .

Some of the examples in the demos directory require installing additional dependencies as well as stellargraph. To install these dependencies as well as StellarGraph using pip, execute the following command:

pip install .[demos]
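
Whichever installation method is used, a quick way to check that it succeeded is to import the library and print its version (assuming the package exposes __version__, as recent releases do):

python -c "import stellargraph; print(stellargraph.__version__)"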

Citing

StellarGraph is designed, developed and supported by CSIRO's Data61. If you use any part of this library in your research, please cite it using the following BibTeX entry:

@misc{StellarGraph,
  author = {CSIRO's Data61},
  title = {StellarGraph Machine Learning Library},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/stellargraph/stellargraph}},
}

References

  1. Inductive Representation Learning on Large Graphs. W.L. Hamilton, R. Ying, and J. Leskovec. Neural Information Processing Systems (NIPS), 2017, (link)

  2. Node2Vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016, (link)

  3. Metapath2Vec: Scalable Representation Learning for Heterogeneous Networks. Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 135–144, 2017, (link)

  4. Attributed Network Embedding via Subspace Discovery. D. Zhang, Y. Jie, X. Zhu and C. Zhang, Data Mining and Knowledge Discovery, 2019, (link)

  5. Graph Attention Networks. P. Veličković et al. International Conference on Learning Representations (ICLR), 2018, (link)

  6. Graph Convolutional Networks (GCN): Semi-Supervised Classification with Graph Convolutional Networks. Thomas N. Kipf, Max Welling. International Conference on Learning Representations (ICLR), 2017, (link)

  7. Simplifying Graph Convolutional Networks. F. Wu, T. Zhang, A. H. de Souza, C. Fifty, T. Yu, and K. Q. Weinberger. International Conference on Machine Learning (ICML), 2019, (link)

  8. Adversarial Examples on Graph Data: Deep Insights into Attack and Defense. H. Wu, C. Wang, Y. Tyshetskiy, A. Docherty, K. Lu, and L. Zhu. IJCAI 2019, (link)

  9. Predict then propagate: Graph neural networks meet personalized PageRank. J. Klicpera, A. Bojchevski, and S. Günnemann, ICLR, 2019, arXiv:1810.05997. (link)

  10. Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks. W. Chiang, X. Liu, S. Si, Y. Li, S. Bengio, and C. Hsieh, KDD, 2019, arXiv:1905.07953. (link)

  11. Modeling relational data with graph convolutional networks. M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling, European Semantic Web Conference, 2018, arXiv:1703.06103 (link).

  12. Complex Embeddings for Simple Link Prediction. T. Trouillon, J. Welbl, S. Riedel, É. Gaussier and G. Bouchard, ICML, 2016. (link)

  13. Learning Structural Node Embeddings via Diffusion Wavelets. C. Donnat, M. Zitnik, D. Hallac, and J. Leskovec, SIGKDD, 2018, arXiv:1710.10321 (link)

  14. Watch Your Step: Learning Node Embeddings via Graph Attention. S. Abu-El-Haija, B. Perozzi, R. Al-Rfou and A. Alemi, NIPS, 2018, arXiv:1710.09599 (link)

  15. Deep Graph Infomax. P. Veličković, W. Fedus, W. L. Hamilton, P. Lio, Y. Bengio, R. D. Hjelm. International Conference on Learning Representations (ICLR), 2019, arXiv:1809.10341, (link).

  16. Continuous-Time Dynamic Network Embeddings. Giang Hoang Nguyen, John Boaz Lee, Ryan A. Rossi, Nesreen K. Ahmed, Eunyee Koh, and Sungchul Kim. Proceedings of the 3rd International Workshop on Learning Representations for Big Networks (WWW BigNet) 2018. (link)

  17. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng, ICLR, 2015. arXiv:1412.6575 (link)

  18. An End-to-End Deep Learning Architecture for Graph Classification. Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen, AAAI, 2018. (link)

  19. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. Ling Zhao, Yujiao Song, Chao Zhang, Yu Liu, Pu Wang, Tao Lin, Min Deng, and Haifeng Li. IEEE Transactions on Intelligent Transportation Systems, 2019. (link)

stellargraph's People

Contributors

adocherty, akinparkan, andife, annitrolla, anonymnous-gituser, cdawei, daokunzhang, geoffj-d61, habiba-h, hd-chuong, huonw, jessclmoore, kalinin-sanja, kieranricardo, kjun9, larsner, m0baxter, nalinbhardwaj, panteliselinas, pspeter, sbrugman, sgrigory, sktzwhj, sleepy-owl, thanh-nguyenmueller, thatlittleboy, theden, timpitman, wangzhen263, youph


stellargraph's Issues

Test link prediction demo on HIN

Description

The link prediction demo works on homogeneous datasets. We want to test whether it also works on heterogeneous datasets under the assumption that the latter will be treated as homogeneous.

User Story

As a: Research Engineer
I want: to make sure that the link prediction demo works for both homogeneous and heterogeneous networks
so that: I can tackle more general analytics problems.

Done Checklist (Development)

  • Determine heterogeneous dataset for testing
  • Link prediction demo works with input the selected heterogeneous network.
  • Pull request

Graph splitting based on edge type to predict

Description

The data splitter for link prediction should be able to split the graph based on the type of the edge to predict. It should also be able to split based on an edge property; for example, we should be able to split based on timestamps if edges have them as a property.
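
As a rough illustration of the kind of splitting meant here (hypothetical helper and column names, using plain Pandas rather than the library's eventual splitter), edges of a chosen type or edges at or after a timestamp cut-off can be selected as the held-out prediction targets:

import pandas as pd

def split_edges_for_link_prediction(edges, edge_type=None, timestamp_column=None, cutoff=None):
    # Hypothetical helper: returns (held-out edges, remaining edges).
    # Held-out edges are those matching the requested type and/or occurring
    # at or after the cut-off value of the given edge property.
    mask = pd.Series(True, index=edges.index)
    if edge_type is not None:
        mask &= edges["type"] == edge_type
    if timestamp_column is not None and cutoff is not None:
        mask &= edges[timestamp_column] >= cutoff
    return edges[mask], edges[~mask]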

User Story

As a: Research Engineer
I want: to prepare my data
so that: I can perform link prediction on HINs based on edge types and properties

Done Checklist (Research)

  • Implement data splitting based on edge type and/or edge property, e.g., timestamp
  • Pull request
  • Unit tests

Extend HinSAGE demo code for unsupervised learning

Description

Implement the wrappers / additional layers for unsupervised learning around the HinSAGE demo code. Create a working example using the risk net dataset.

User Story

As a: data scientist
I want: to run unsupervised learning using GraphSAGE
so that: I can transform my large dataset into node embeddings

Done Checklist

  • Peer Code Review Performed
  • Code well commented
  • Documentation in repo

Investigate Metapath2Vec paper for link prediction on heterogeneous graphs

Description

I want to understand the Metapath2Vec algorithm for representation learning in heterogeneous graphs.

User Story

As a: Research Engineer
I want: to understand the MetaPath2Vec algorithm for representation learning on heterogeneous graphs
so that: I can use it for node attribute inference and link prediction.

Done Checklist (Development)

  • Document differences between Metapath2Vec and Node2Vec algorithms
  • Determine scalability issues that are unique to Metapath2Vec
  • Search for reference implementation and, if found, run some experiments on test graphs to better understand its performance.

PoC for Unsupervised GraphSAGE

Description

Currently the GraphSAGE unsupervised method is not in the library. This task is to add a simple unsupervised GraphSAGE module to the stellar-ml library.

User Story

As a: Data Scientist
I want: everything that GraphSAGE offers
so that: I have freedom for my unsupervised method experiments

Done Checklist (Development)

  • Branch and Pull Request build on CI
  • Code well commented

Graphsage demo in StellarML Library

Description

Currently StellarML Library has some base classes. We have working Graphsage code from Kevin. We should implement a demo using the classes from StellarML, moving code to this style as required.

Done Checklist (Development)

  • Assumptions of the user story met
  • Produced code for required functionality
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Branch and Pull Request pass integration tests on CI
  • Version number reflects new status
  • Peer Code Review Performed
  • Code well commented
  • Documentation on Google Docs
  • Documentation in repo
  • Team demo
  • Mini-meetup talk
  • Stakeholder sign-off

Done Checklist (Research)

  • Code Review
  • Documentation on Google Docs
  • Documentation in repo
  • Team talk
  • Mini-meetup talk

Done Checklist (Bug)

  • Bug fixed
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Branch and Pull Request pass integration tests on CI
  • Version number reflects new status
  • Peer Code Review Performed
  • Code well commented

Move link prediction demo from stellar-ml-sandbox to stellar-ml repo

Description

We need to move the demo for link prediction from the stellar-ml-sandbox repo to here.

User Story

As a: Research Engineer
I want: to have all my code relating to graph ML in one place
so that: I can more effectively develop the graph-ml library and share code with my team

Done Checklist (Development)

  • Moved code from stellar-ml-sandbox to stellar-ml repo
  • Pull request

Setup Travis for stellar-ml

Description

Setup continuous integration for automated tests in the library. Create a buildkite.yml file

Done Checklist

  • Triggered commits for BuildKite
  • Writing to build-bots

Collaborate with Platform Team on caching architecture

Description

The platform team is building an experimental stack. We need to ensure that this meets the needs of the ML and Data teams.

User Story

As an: IA ML dev
I want: to ensure that I'm building in sync with the platform team
so that: there is no wasted effort

Done Checklist (Development)

  • Documentation on Google Docs

Write simple data feed to Graphsage for StellarML library

Description

Currently the GraphSAGE code from Kevin runs well, but requires a Redis database and is slow for simple single-computer testing.

As part of the StellarML library we want to pass data into TensorFlow quickly. Having a simple in-memory GraphSAGE sampler would be a good start.

Done Checklist (Development)

  • Assumptions of the user story met
  • Produced code for required functionality
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Branch and Pull Request pass integration tests on CI
  • Version number reflects new status
  • Peer Code Review Performed
  • Code well commented
  • Documentation on Google Docs
  • Documentation in repo
  • Team demo
  • Mini-meetup talk
  • Stakeholder sign-off

Done Checklist (Research)

  • Code Review
  • Documentation on Google Docs
  • Documentation in repo
  • Team talk
  • Mini-meetup talk

Done Checklist (Bug)

  • Bug fixed
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Branch and Pull Request pass integration tests on CI
  • Version number reflects new status
  • Peer Code Review Performed
  • Code well commented

Explore Aboleth for library design choices

Description

Can we borrow some design choices, e.g. base classes & inheritance, layer compositions, and pipelining?
Link: https://github.com/data61/aboleth

Checklist

  • List of Aboleth base classes + description, perhaps indicating which base classes can be borrowed into our library
  • pseudo code for node2vec+logistic workflow with the graph ML library (as we imagine it)
  • pseudo code for GraphSAGE

Prepare YOW Data Experiment

Description

We need to get started with an interesting demonstration of machine learning on graphs for the YowData! conference in mid-May.
First, we need to decide on a dataset and problem.

Checklist

  • Write-up of dataset and problem options on google docs
  • Initial implementation of solving the selected problem on the dataset

Investigate Apache Tinkerpop and write GraphSAGE input preparation in gremlin-python

Description

Investigate Gremlin's viability to efficiently prepare inputs for GraphSAGE from a graph database as well as from local memory.

User Story

As a: data scientist
I want: to use gremlin to prepare inputs for my graph ML tasks.
so that: I can efficiently prepare batch inputs from various graph data sources.

Done Checklist (Development)

  • Code well commented
  • Documentation
  • Peer Code Review Performed
  • Mini-meetup talk

Run HIN Graphsage for Movielens 1M dataset with node attributes

Description

Kevin has written code for HIN GraphSage, I'd like to use this to make predictions on the Movielens 1M dataset with the same train/test split as other examples and using intrinsic user/movie features.

Done Checklist

  • Obtain performance numbers for node2vec features
  • Obtain performance numbers for intrinsic features
  • Documentation on Google Docs
  • Code Review

Build inductive NAI with GCN

Steps:
Given a full graph G:

  1. Randomly select a test set of nodes {V}_test, remove them from G, resulting in G_train = G - {V}_test
  2. Evaluate \hat{A}=\hat{A_train}, X_train from G_train
  3. Train GCN on G_train (feeding \hat{A_train}, X_train), save the trained model
  4. Evaluate \hat{A}, X for the full graph G, ensuring that the order of nodes in the intersection of G and G_train is preserved. I.e., update \hat{A_train}, X_train with test nodes to obtain the full graph's \hat{A}, X.
  5. Do a forward pass of the updated \hat{A}, X through the trained GCN model, predicting attributes for test nodes.
  6. Evaluate predictions by comparing them with true test node attributes

Repeat steps 1-6 to obtain average prediction metrics.

Scalable Node Attribute Inference for Graphs

Description

Build a scalable implementation of node attribute inference (NAI) for graphs that works on graphs of at least 10M nodes.

Value

Besides satisfying stakeholders' requirements for scalable attribute inference tasks on large graph datasets (thus expanding the NAI capability of Release 1), this should allow us to find an optimal scalable architecture for other ML tasks on graphs, such as link prediction and classification, recommendations, etc.

Extend HinSAGE for Link Prediction

Description

Use the documentation for HinSAGE link prediction to create a working link prediction example using the Paradise Papers dataset from the Data team.

User Story

As a: data scientist
I want: to use GraphSAGE layers for link prediction
so that: I can run scalable link prediction

Done Checklist (Development)

  • Working example with Alzheimer data
  • Code well commented
  • Documentation in repo
  • Peer Code Review Performed

Improve speed of 'local' sampling method for link prediction

Description

Sampling negative edges for link prediction using the nodes' local neighbourhood structure currently uses BFS, which runs very slowly if target nodes more than 5 edges away need to be sampled. This issue is about replacing BFS with DFS to speed up the sampling algorithm.

User Story

As a: Research Engineer
I want: to run link experiments as fast as possible
so that: I maximise my efficiency.

Done Checklist (Development)

  • Updated source code to replace BFS for target nodes with DFS.
  • Pull request

[Dynamic Node2Vec] Investigate temporal updates of skipgram model

Description

Hooman is performing thorough experiments on how his dynamic random walk methods perform as part of an end-to-end dynamic node2vec algorithm. There are difficulties with how the skip-gram model interacts with random walk updates. To get a good publication we need a description of the skip-gram model and some explanation of how different training update schemes will affect the model.
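
For reference, the objective being updated is the standard skip-gram loss with negative sampling used by word2vec and node2vec: for a target node $t$ with embedding $v_t$, an observed context node $c$ with embedding $u_c$, and $K$ negative samples drawn from a noise distribution $P_n$, each (target, context) pair produced by the random walks contributes

$$\log \sigma(u_c^\top v_t) + \sum_{k=1}^{K} \mathbb{E}_{c_k \sim P_n}\big[\log \sigma(-u_{c_k}^\top v_t)\big],$$

and the different update schemes amount to different ways of re-optimising this sum as walks are added, removed or re-weighted over time.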

Done Checklist (Research)

  • Skip-gram model understanding
  • Documentation of skip-gram model
  • Documentation of skip-gram model update techniques

Write YOWData! presentation

Description

I'll be presenting at YOWData! on the 15th (at 5pm) so I need to prepare some slides!

Done Checklist

  • Slides on Google Docs
  • Give YOWData! presentation

Start moving code from link-prediction/utils to stellar ML library

Description

Some of the code in link-prediction/utils is mature enough to be integrated into the stellar ML library.

User Story

As a: Research Engineer
I want: to transfer mature code from demo code into the stellar graph ML library
so that: it can be re-used by other IA members and properly unit tested with CI.

Done Checklist (Development)

  • Produced code for required functionality
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Peer Code Review Performed
  • Code well commented

Investigate Turi for graph processing in the ML library and platform

Description

Apple has open-sourced Turi, which is a powerful graph processing framework. We should evaluate this technology against the following criteria:

  • functionality
  • ease of use
  • ease of scalability
  • performance

This task would be to ingest a > 1M edge dataset and perform a set of graph tasks e.g. BFS/DFS, graph traversal, grabbing neighbours, random sampling.

User Story

As a: data scientist
I want: the graph processing part of the library to be fast and have lots of functionality
so that: I can move on to my tensorflow part to build my model

Done Checklist (Development)

  • Small experiment setup to run the evaluation, e.g. python script file
  • Documentation on Google Docs
  • Team demo

Git workflow demo

Description

Use example git repositories to understand the git and github workflow.

User Story

As a: Research engineer
I want: to understand git and github workflows
so that: I can work with the rest of the team to develop the ML library.

Done Checklist (Development)

  • Create test repos
  • Document the workflow for forking a repo, developing new code, and putting the code back into the original repo via a pull request

Graph Machine Learning library that is easy to use and contribute to

Description

Create a machine learning library in Python that is simple to use and simple to contribute to. The library should focus on deep learning algorithms for graphs, and not attempt to duplicate existing algorithms, e.g. community detection, random forests, etc.

Value

This library will allow Data Scientists and Researchers to create models over network datasets with minimal overhead. The goal is to allow a fast experiment cycle time, with minimal assumed knowledge. For Researchers, it should be a place to add new algorithms, get their algorithms seen, and supply functions for building new deep learning models on graphs.

Unit tests for link prediction demo

Description

We need unit tests for the link prediction utility classes.

User Story

As a: Research Engineer
I want: to make sure that changes to the link prediction code are not breaking existing functionality
so that: I can be certain that my code works correctly as it is expanded and improved.

Done Checklist (Development)

  • Create test directory for link prediction demo
  • Add test for link prediction code
  • Pull request

Investigate feature alignment for link prediction

Description

Investigate whether link features obtained from G_train and G_test are aligned, and whether/how this affects performance of the link prediction classifier.

We need to remove the Confluence docs; it would be good to get the link prediction code from there.

Done Checklist (Research)

  • Experimental code/visualisations in 'alignment' branch in stellar-ml-sandbox/link-prediction
  • Documentation on Google Docs

YowData: Prepare spammers example

Description

I want to present an example at the YowData conference. The spammers dataset is an interesting case for applying graph ML. I want to prepare the spammers dataset and run node attribute inference on it.

Note:
Anna has done some investigation into using GraphSage and node2vec, so I will find out what has been done so far.

Done Checklist (Research)

  • Gave presentation at YowData

Tune node2vec parameters for link prediction demo

Description

Currently, the link prediction demo uses fixed values for the node2vec parameters, e.g., p=q=1, along with several others. We need to allow these parameters to be tuned for improved link prediction performance.
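
A rough sketch of exposing these hyper-parameters, combining StellarGraph's biased random walker with Gensim's word2vec (the vector_size argument assumes Gensim 4.x; in 3.x it is called size):

from gensim.models import Word2Vec
from stellargraph.data import BiasedRandomWalk

def node2vec_embeddings(graph, p=1.0, q=1.0, walk_length=80, walks_per_node=10, dimensions=128):
    # p and q control the return and in-out bias of the walks and are the main
    # hyper-parameters to tune for link prediction performance
    walks = BiasedRandomWalk(graph).run(
        nodes=list(graph.nodes()), length=walk_length, n=walks_per_node, p=p, q=q
    )
    str_walks = [[str(node) for node in walk] for walk in walks]
    model = Word2Vec(str_walks, vector_size=dimensions, window=5, min_count=0, sg=1)
    return model.wv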

User Story

As a: Research Engineer
I want: to tune the hyper-parameters of the node2vec algorithm
so that: I can achieve the highest performance in link prediction

Done Checklist (Development)

  • Code to tune node2vec hyper-parameters
  • Pull request

Organize external engagements with Jia and Jesse

Description

To engage successfully with the research groups led by Jia and Jesse, we need to map out the research interests of both groups and match them with research questions of relevance to us.

User Story

As a: researcher collaborating with the Stellar project
I want: to research graph technologies that are of interest to Stellar
so that: we can get publications for our research and support from Stellar.

Done Checklist (Research)

  • Documentation of ongoing engagements on Google Docs
  • Schedule of meetings with Jesse and Jia
  • Outline of scope of research.
  • AC review

Improve link prediction demo to handle non-integer node IDs

Description

The current implementation of the link prediction demo assumes that node IDs are integers. This is a restrictive assumption because for some datasets the node IDs are not integers. This causes the link prediction demo to fail with an Exception. We need to generalise the code so that it handles non-integer node IDs.

User Story

As a: Research Engineer
I want: to perform link prediction on a variety of network datasets stored in valid EPGM format
so that: I can be certain of the link prediction algorithm's generalisation

Done Checklist (Development)

  • Update implementation to handle non-integer node IDs
  • Add unit tests
  • Pull request

Organise reference datasets

Description

Organise the reference datasets with a readme.

User Story

As an: IA team member
I want: to have easy access to well defined data sets
So that: I can test my code and minimise dataset confusion

Done Checklist (Bug)

  • Documented dataset procedure
  • Document

Create stellar-ml library structure, and populate with base classes

Description

  • Define the library's structure, base classes, methods, some helper functions, etc.
  • Create unit tests for all the library's base classes and helper functions

User Story

As a: developer of the library
I want: to see a clear structure of base classes to inherit from, their methods, and examples of composing workflows from them.

Done Checklist (Development)

  • Assumptions of the user story met
  • Produced code for required functionality
  • Branch and Pull Request pass unit tests on CI
  • Peer Code Review Performed
  • Code well commented
  • Documentation in repo

YowData: Investigate NetFlix prediction using N2V

Description

For the YowData conference, I'd like to present a recommender example. The Netflix prize dataset is well known, and a large amount of effort has been spent on getting results on this dataset. Good performance on this dataset would be impressive.

Recommender systems are often not thought about in terms of graphs. Therefore, posing this in a graph framework and solving it would be interesting. We can start by using node2vec to extract node embeddings and trying to predict the scores from this.

Done Checklist (Research)

  • Notes or slides on recommendations for movielens with node2vec
  • Code for recommendations for movielens with node2vec

Clean up Movielens using HIN Graphsage and move to demos

Description

The movielens recommender demo developed for YOWData could be useful for other problems (Anna would like to try it out to see if it will work for the medicare dataset).

Currently the code is rough and ready, so I'd like to tidy it up, add documentation and have a quick-to-run test case (say on movielens 100k).

Done Checklist (Research)

  • Code Review
  • Documentation in repo
  • Code well commented

Create baseline skeleton library

Description

Create initial dummy library using the documentation and pseudo code already accumulated

Done Checklist

  • Code
  • Pull Request
  • Unit Tests

Investigate message passing for node2vec

Description

Node2vec can be implemented in a message-passing framework. However, this is strictly only true for prediction. Can we also place training in a message passing framework?

User Story

As a: developer of the graphml library
I want: to train and predict using node2vec in a message-passing framework
so that: i can train node2vec in a one-step scalable fashion.

Done Checklist (Research)

  • Code Review
  • Documentation on Google Docs

Update EPGM class to use networkx v2.*

Description

Currently, our graph processing module requires an earlier version of networkx, i.e., 1.x. Newer versions of networkx, namely 2.x, have changed how nodes and edges are returned to the user. We need to update our code to work with the newer version of networkx because it is becoming more common and often causes problems.
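
The incompatibility is mostly about return types; a small sketch of the difference:

import networkx as nx

G = nx.Graph()
G.add_edge("a", "b")

# networkx 1.x: G.nodes() and G.edges() return plain lists.
# networkx 2.x: they return view objects, so code that indexes the result
# directly (e.g. G.nodes()[0]) breaks. Wrapping in list() works on both versions.
nodes = list(G.nodes())
edges = list(G.edges())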

User Story

As a: Research Engineer
I want: my network analytics library to work with the latest version of python modules
so that: I can make use of the latest developments and improvements in these modules

Done Checklist (Development)

  • Produced code for required functionality
  • Unit tests updated and new ones added as necessary
  • Pull request

Understand HIN GraphSage algorithm

Description

Kevin and Yuriy have implemented a HIN GraphSAGE algorithm. I'd like to understand the implementation.

There are other heterogeneous GCN-like algorithms in the literature; read and understand them. How do they compare? Which algorithms could we implement for the ML library? Can we obtain code and test them on different problems? What input sampling strategies are required for each algorithm? How do training and prediction differ?

Done Checklist (Research)

  • Add different algorithms to documentation on Google Docs
  • Add sampling strategies to documentation on Google Docs

Improve data splitting code for link prediction

Description

The node splitter developed for the link prediction demo of issue #8 needs to be improved such that negative samples are more challenging, i.e., they should not be randomly selected out of all pairs of disconnected nodes, but rather from disconnected nodes that are nearby in the graph.
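
A minimal sketch of the harder sampling strategy in plain NetworkX (hypothetical function name; the distance threshold and counts are illustrative):

import random
import networkx as nx

def sample_nearby_negative_edges(G, n_samples, max_distance=3, seed=None):
    # Sample node pairs that are not connected by an edge but lie within
    # max_distance hops of each other (harder negatives than arbitrary pairs).
    rng = random.Random(seed)
    nodes = list(G.nodes())
    negatives = []
    while len(negatives) < n_samples:
        u = rng.choice(nodes)
        near = nx.single_source_shortest_path_length(G, u, cutoff=max_distance)
        candidates = [v for v, d in near.items() if d >= 2 and not G.has_edge(u, v)]
        if candidates:
            negatives.append((u, rng.choice(candidates)))
    return negatives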

User Story

As a: Research Engineer
I want: to use my data to correctly evaluate my link prediction algorithm
so that: I am confident about its performance on unseen data.

Done Checklist (Development)

  • Edge splitter class with improved sampling algorithm
  • Integration of new edge splitter class with baseline link prediction demo
  • Pull Request

Prepare for GraphSAGE/HinSAGE usage during Hackathon

Description

Prepare for the Spotify hackathon to allow everyone to use GraphSAGE/HinSAGE with ease on the day. Investigate the dataset and prepare notes on any requirements such as AWS setup, input batch preparation code, etc.

User Story

As a: Hackathoner
I want: to run Stellar's graph ML algorithms during the Hackathon
so that: we can win the Spotify competition

Done Checklist (Development)

  • Documentation on Google Docs
  • Documentation in repo
