stellargraph / stellargraph

StellarGraph - Machine Learning on Graphs

Home Page: https://stellargraph.readthedocs.io/

License: Apache License 2.0

Python 98.78% Shell 1.04% Dockerfile 0.19%
graphs machine-learning machine-learning-algorithms graph-convolutional-networks networkx geometric-deep-learning saliency-map interpretability heterogeneous-networks graph-neural-networks

stellargraph's Introduction

StellarGraph Machine Learning library logo

StellarGraph Machine Learning Library

StellarGraph is a Python library for machine learning on graphs and networks.

Introduction

The StellarGraph library offers state-of-the-art algorithms for graph machine learning, making it easy to discover patterns and answer questions about graph-structured data. It can solve many machine learning tasks, including node classification and regression, link prediction, graph classification, and representation learning (embeddings) for nodes and edges.

Graph-structured data represent entities as nodes (or vertices) and relationships between them as edges (or links), and can include data associated with either as attributes. For example, a graph can contain people as nodes and friendships between them as links, with data like a person's age and the date a friendship was established. StellarGraph supports analysis of many kinds of graphs:

  • homogeneous (with nodes and links of one type)
  • heterogeneous (with more than one type of node and/or link)
  • knowledge graphs (extreme heterogeneous graphs with thousands of types of edges)
  • graphs with or without data associated with nodes
  • graphs with edge weights (a short construction sketch follows below)
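
As an example, here is a minimal sketch of constructing both a homogeneous and a heterogeneous StellarGraph from Pandas DataFrames. The data is a toy example invented for illustration, and the sketch assumes the StellarGraph 1.x constructor conventions: nodes indexed by ID, edges with "source" and "target" columns, an optional "weight" column for edge weights, and dictionaries keyed by type for heterogeneous graphs.

import pandas as pd
import stellargraph as sg

# homogeneous graph: people connected by weighted friendships
people = pd.DataFrame({"age": [25, 31]}, index=["alice", "bob"])
friendships = pd.DataFrame({"source": ["alice"], "target": ["bob"], "weight": [3.0]})
homogeneous = sg.StellarGraph(people, friendships)

# heterogeneous graph: dictionaries keyed by node type and edge type
movies = pd.DataFrame({"year": [1999]}, index=["the-matrix"])
watched = pd.DataFrame({"source": ["alice"], "target": ["the-matrix"]})
heterogeneous = sg.StellarGraph(
    nodes={"person": people, "movie": movies},
    edges={"friendship": friendships, "watched": watched},
)
print(heterogeneous.info())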

StellarGraph is built on TensorFlow 2 and its Keras high-level API, as well as Pandas and NumPy. It is thus user-friendly, modular and extensible. It interoperates smoothly with code that builds on these, such as the standard Keras layers and scikit-learn, so it is easy to augment the core graph machine learning algorithms provided by StellarGraph. Being a standard Python package, it is also easy to install with pip or Anaconda.

Getting Started

The numerous detailed and narrated examples are a good way to get started with StellarGraph. There is likely to be one that is similar to your data or your problem (if not, let us know).

You can start working with the examples immediately in Google Colab or Binder by clicking the Colab and Binder badges within each Jupyter notebook.

Alternatively, you can download a local copy of the demos and run them using Jupyter. The demos can be downloaded by cloning the master branch of this repository, or by using the curl command below:

curl -L https://github.com/stellargraph/stellargraph/archive/master.tar.gz | tar -xz --strip=1 stellargraph-master/demos

The dependencies required to run most of our demo notebooks locally can be installed using one of the following:

  • Using pip: pip install stellargraph[demos]
  • Using conda: conda install -c stellargraph stellargraph

(See Installation section for more details and more options.)

Getting Help

If you get stuck or have a problem, there are several ways to make progress and get help or support: consult the documentation at https://stellargraph.readthedocs.io/, or open an issue on this repository.

Example: GCN

One of the earliest deep learning algorithms for graphs is the Graph Convolutional Network (GCN) [6]. The following example uses it for node classification: predicting the class to which a node belongs. It shows how easy the algorithm is to apply using StellarGraph, and how StellarGraph integrates smoothly with Pandas and TensorFlow and libraries built on them.

Data preparation

Data for StellarGraph can be prepared using common libraries like Pandas and scikit-learn.

import pandas as pd
from sklearn import model_selection

def load_my_data():
    # your own code to load data into Pandas DataFrames, e.g. from CSV files or a database
    ...

nodes, edges, targets = load_my_data()

# Use scikit-learn to compute training and test sets
train_targets, test_targets = model_selection.train_test_split(targets, train_size=0.5)
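
For illustration, a hypothetical load_my_data might return data shaped like the following toy example (the names and values here are invented): node features indexed by node ID, edges with "source" and "target" columns, and one-hot class labels indexed by node ID.

import pandas as pd

nodes = pd.DataFrame(
    {"feature_1": [0.1, 0.7, 0.4], "feature_2": [1.0, 0.0, 0.5]},
    index=["a", "b", "c"],
)
edges = pd.DataFrame({"source": ["a", "b"], "target": ["b", "c"]})
# one-hot targets, suitable for the categorical cross-entropy loss used below
targets = pd.get_dummies(pd.Series(["spam", "ham", "spam"], index=["a", "b", "c"]))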

Graph machine learning model

This is the only part that is specific to StellarGraph. The machine learning model consists of some graph convolution layers followed by a layer to compute the actual predictions as a TensorFlow tensor. StellarGraph makes it easy to construct all of these layers via the GCN model class. It also makes it easy to get input data in the right format via the StellarGraph graph data type and a data generator.

import stellargraph as sg
import tensorflow as tf

# convert the raw data into StellarGraph's graph format for faster operations
graph = sg.StellarGraph(nodes, edges)

generator = sg.mapper.FullBatchNodeGenerator(graph, method="gcn")

# two layers of GCN, each with hidden dimension 16
gcn = sg.layer.GCN(layer_sizes=[16, 16], generator=generator)
x_inp, x_out = gcn.in_out_tensors() # create the input and output TensorFlow tensors

# use TensorFlow Keras to add a layer to compute the (one-hot) predictions
predictions = tf.keras.layers.Dense(units=len(targets.columns), activation="softmax")(x_out)

# use the input and output tensors to create a TensorFlow Keras model
model = tf.keras.Model(inputs=x_inp, outputs=predictions)

Training and evaluation

The model is a conventional TensorFlow Keras model, and so tasks such as training and evaluation can use the functions offered by Keras. StellarGraph's data generators make it simple to construct the required Keras Sequences for input data.

# prepare the model for training with the Adam optimiser and an appropriate loss function
model.compile("adam", loss="categorical_crossentropy", metrics=["accuracy"])

# train the model on the train set
model.fit(generator.flow(train_targets.index, train_targets), epochs=5)

# check model generalisation on the test set
(loss, accuracy) = model.evaluate(generator.flow(test_targets.index, test_targets))
print(f"Test set: loss = {loss}, accuracy = {accuracy}")
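
Once trained, the same generator can also produce predictions for every node in the graph. A minimal sketch, assuming targets holds the one-hot class columns as in the data preparation step above:

# predict for all nodes; full-batch predictions typically come back with a leading batch dimension of one
all_predictions = model.predict(generator.flow(graph.nodes()))
predicted_classes = targets.columns[all_predictions.squeeze().argmax(axis=-1)]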

This algorithm is spelled out in more detail in its extended narrated notebook. We provide many more algorithms, each with a detailed example.

Algorithms

The StellarGraph library currently includes the following algorithms for graph machine learning:

GraphSAGE [1]: Supports supervised as well as unsupervised representation learning, node classification/regression, and link prediction for homogeneous networks. The current implementation supports multiple aggregation methods, including mean, maxpool, meanpool, and attentional aggregators.
HinSAGE: Extension of the GraphSAGE algorithm to heterogeneous networks. Supports representation learning, node classification/regression, and link prediction/regression for heterogeneous graphs. The current implementation supports mean aggregation of neighbour nodes, taking into account their types and the types of links between them.
attri2vec [4]: Supports node representation learning, node classification, and out-of-sample node link prediction for homogeneous graphs with node attributes.
Graph ATtention Network (GAT) [5]: The GAT algorithm supports representation learning and node classification for homogeneous graphs. There are versions of the graph attention layer that support both sparse and dense adjacency matrices.
Graph Convolutional Network (GCN) [6]: The GCN algorithm supports representation learning and node classification for homogeneous graphs. There are versions of the graph convolutional layer that support both sparse and dense adjacency matrices.
Cluster Graph Convolutional Network (Cluster-GCN) [10]: An extension of the GCN algorithm supporting representation learning and node classification for homogeneous graphs. Cluster-GCN scales to larger graphs and can be used to train deeper GCN models using Stochastic Gradient Descent.
Simplified Graph Convolutional network (SGC) [7]: The SGC algorithm supports representation learning and node classification for homogeneous graphs. It is an extension of the GCN algorithm that smooths the graph to bring in more distant neighbours of nodes without using multiple layers.
(Approximate) Personalized Propagation of Neural Predictions (PPNP/APPNP) [9]: The (A)PPNP algorithm supports fast and scalable representation learning and node classification for attributed homogeneous graphs. In a semi-supervised setting, a multilayer neural network is first trained using the node attributes as input; its predictions are then diffused across the graph using a method based on Personalized PageRank.
Node2Vec [2]: The Node2Vec and DeepWalk algorithms perform unsupervised representation learning for homogeneous networks, taking into account network structure while ignoring node attributes. The node2vec algorithm is implemented by combining StellarGraph's random walk generator with the word2vec algorithm from Gensim. Learned node representations can be used in downstream machine learning models implemented using scikit-learn, Keras, TensorFlow or any other Python machine learning library.
Metapath2Vec [3]: The metapath2vec algorithm performs unsupervised, metapath-guided representation learning for heterogeneous networks, taking into account network structure while ignoring node attributes. The implementation combines StellarGraph's metapath-guided random walk generator and Gensim's word2vec algorithm. As with node2vec, the learned node representations (node embeddings) can be used in downstream machine learning models to solve tasks such as node classification and link prediction for heterogeneous networks.
Relational Graph Convolutional Network (RGCN) [11]: The RGCN algorithm performs semi-supervised learning for node representation and node classification on knowledge graphs. RGCN extends GCN to directed graphs with multiple edge types and works with both sparse and dense adjacency matrices.
ComplEx [12]: The ComplEx algorithm computes embeddings for nodes (entities) and edge types (relations) in knowledge graphs, and can use these for link prediction.
GraphWave [13]: GraphWave calculates unsupervised structural embeddings via wavelet diffusion through the graph.
Supervised Graph Classification: A model for supervised graph classification based on GCN [6] layers and mean pooling readout.
Watch Your Step [14]: The Watch Your Step algorithm computes node embeddings by using adjacency powers to simulate expected random walks.
Deep Graph Infomax [15]: Deep Graph Infomax trains unsupervised GNNs to maximize the shared information between node-level and graph-level features.
Continuous-Time Dynamic Network Embeddings (CTDNE) [16]: Supports time-respecting random walks which can be used, in a similar way to Node2Vec, for unsupervised representation learning.
DistMult [17]: The DistMult algorithm computes embeddings for nodes (entities) and edge types (relations) in knowledge graphs, and can use these for link prediction.
DGCNN [18]: The Deep Graph Convolutional Neural Network (DGCNN) algorithm for supervised graph classification.
TGCN [19]: The GCN_LSTM model in StellarGraph follows the Temporal Graph Convolutional Network architecture proposed in the TGCN paper, with a few enhancements to the layer architecture.
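
Many of these models follow the same pattern of a data generator plus in_out_tensors() used in the GCN example above, so switching algorithms is often a small change. As a rough sketch, assuming the GAT class and its argument names match the current API (see the GAT demo notebook for the exact parameters), the earlier node classification example could use graph attention layers instead:

import stellargraph as sg
import tensorflow as tf

# GAT uses its own adjacency preprocessing in the generator
generator = sg.mapper.FullBatchNodeGenerator(graph, method="gat")
gat = sg.layer.GAT(layer_sizes=[8, 8], activations=["elu", "elu"], attn_heads=8, generator=generator)
x_inp, x_out = gat.in_out_tensors()
predictions = tf.keras.layers.Dense(units=len(targets.columns), activation="softmax")(x_out)
model = tf.keras.Model(inputs=x_inp, outputs=predictions)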

Installation

StellarGraph is a Python 3 library and we recommend using Python version 3.6. The required Python version can be downloaded and installed from python.org. Alternatively, use the Anaconda Python environment, available from anaconda.com.

The StellarGraph library can be installed from PyPI, from Anaconda Cloud, or directly from GitHub, as described below.

Install StellarGraph using PyPI:

To install the StellarGraph library from PyPI using pip, execute the following command:

pip install stellargraph

Some of the examples require installing additional dependencies as well as stellargraph. To install these dependencies as well as StellarGraph using pip, execute the following command:

pip install stellargraph[demos]

The community detection demos require python-igraph which is only available on some platforms. To install this in addition to the other demo requirements:

pip install stellargraph[demos,igraph]

Install StellarGraph in Anaconda Python:

The StellarGraph library is available on Anaconda Cloud and can be installed into an Anaconda Python environment using the command line conda tool by executing the following command:

conda install -c stellargraph stellargraph

Install StellarGraph from GitHub source:

First, clone the StellarGraph repository using git:

git clone https://github.com/stellargraph/stellargraph.git

Then, cd to the StellarGraph folder, and install the library by executing the following commands:

cd stellargraph
pip install .

Some of the examples in the demos directory require installing additional dependencies as well as stellargraph. To install these dependencies as well as StellarGraph using pip, execute the following command:

pip install .[demos]
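
Whichever installation method is used, a quick way to check that it succeeded is to import the library and print its version (assuming the package exposes __version__, as recent releases do):

python -c "import stellargraph; print(stellargraph.__version__)"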

Citing

StellarGraph is designed, developed and supported by CSIRO's Data61. If you use any part of this library in your research, please cite it using the following BibTeX entry:

@misc{StellarGraph,
  author = {CSIRO's Data61},
  title = {StellarGraph Machine Learning Library},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/stellargraph/stellargraph}},
}

References

  1. Inductive Representation Learning on Large Graphs. W.L. Hamilton, R. Ying, and J. Leskovec. Neural Information Processing Systems (NIPS), 2017, (link)

  2. Node2Vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016, (link)

  3. Metapath2Vec: Scalable Representation Learning for Heterogeneous Networks. Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 135–144, 2017, (link)

  4. Attributed Network Embedding via Subspace Discovery. D. Zhang, Y. Jie, X. Zhu and C. Zhang, Data Mining and Knowledge Discovery, 2019, (link)

  5. Graph Attention Networks. P. Veličković et al. International Conference on Learning Representations (ICLR), 2018, (link)

  6. Graph Convolutional Networks (GCN): Semi-Supervised Classification with Graph Convolutional Networks. Thomas N. Kipf, Max Welling. International Conference on Learning Representations (ICLR), 2017, (link)

  7. Simplifying Graph Convolutional Networks. F. Wu, T. Zhang, A. H. de Souza, C. Fifty, T. Yu, and K. Q. Weinberger. International Conference on Machine Learning (ICML), 2019, (link)

  8. Adversarial Examples on Graph Data: Deep Insights into Attack and Defense. H. Wu, C. Wang, Y. Tyshetskiy, A. Docherty, K. Lu, and L. Zhu. IJCAI 2019, (link)

  9. Predict then propagate: Graph neural networks meet personalized PageRank. J. Klicpera, A. Bojchevski, and S. Günnemann, ICLR, 2019, arXiv:1810.05997. (link)

  10. Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks. W. Chiang, X. Liu, S. Si, Y. Li, S. Bengio, and C. Hsieh, KDD, 2019, arXiv:1905.07953. (link)

  11. Modeling relational data with graph convolutional networks. M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling, European Semantic Web Conference, 2018, arXiv:1703.06103 (link).

  12. Complex Embeddings for Simple Link Prediction. T. Trouillon, J. Welbl, S. Riedel, É. Gaussier and G. Bouchard, ICML, 2016. (link)

  13. Learning Structural Node Embeddings via Diffusion Wavelets. C. Donnat, M. Zitnik, D. Hallac, and J. Leskovec, SIGKDD, 2018, arXiv:1710.10321 (link)

  14. Watch Your Step: Learning Node Embeddings via Graph Attention. S. Abu-El-Haija, B. Perozzi, R. Al-Rfou and A. Alemi, NIPS, 2018, arXiv:1710.09599 (link)

  15. Deep Graph Infomax. P. Veličković, W. Fedus, W. L. Hamilton, P. Lio, Y. Bengio, R. D. Hjelm. International Conference on Learning Representations (ICLR), 2019, arXiv:1809.10341, (link).

  16. Continuous-Time Dynamic Network Embeddings. Giang Hoang Nguyen, John Boaz Lee, Ryan A. Rossi, Nesreen K. Ahmed, Eunyee Koh, and Sungchul Kim. Proceedings of the 3rd International Workshop on Learning Representations for Big Networks (WWW BigNet) 2018. (link)

  17. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng, ICLR, 2015. arXiv:1412.6575 (link)

  18. An End-to-End Deep Learning Architecture for Graph Classification. Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen, AAAI, 2018. (link)

  19. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. Ling Zhao, Yujiao Song, Chao Zhang, Yu Liu, Pu Wang, Tao Lin, Min Deng, and Haifeng Li. IEEE Transactions on Intelligent Transportation Systems, 2019. (link)

stellargraph's People

Contributors

adocherty, akinparkan, andife, annitrolla, anonymnous-gituser, cdawei, daokunzhang, geoffj-d61, habiba-h, hd-chuong, huonw, jessclmoore, kalinin-sanja, kieranricardo, kjun9, larsner, m0baxter, nalinbhardwaj, panteliselinas, pspeter, sbrugman, sgrigory, sktzwhj, sleepy-owl, thanh-nguyenmueller, thatlittleboy, theden, timpitman, wangzhen263, youph


stellargraph's Issues

Test link prediction demo on HIN

Description

The link prediction demo works on homogeneous datasets. We want to test whether it also works on heterogeneous datasets under the assumption that the latter will be treated as homogeneous.

User Story

As a: Research Engineer
I want: to make sure that the link prediction demo works for both homogeneous and heterogeneous networks
so that: I can tackle more general analytics problems.

Done Checklist (Development)

  • Determine heterogeneous dataset for testing
  • Link prediction demo works with input the selected heterogeneous network.
  • Pull request

Graph splitting based on edge type to predict

Description

The data splitter for link prediction should be able to split the graph based on the type of the edge to predict. It should also be able to split based on an edge property; for example, we should be able to split based on timestamps if edges have them as a property.
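
As a rough illustration of the kind of splitting meant here (hypothetical helper and column names, using plain Pandas rather than the library's eventual splitter), edges of a chosen type or edges at or after a timestamp cut-off can be selected as the held-out prediction targets:

import pandas as pd

def split_edges_for_link_prediction(edges, edge_type=None, timestamp_column=None, cutoff=None):
    # Hypothetical helper: returns (held-out edges, remaining edges).
    # Held-out edges are those matching the requested type and/or occurring
    # at or after the cut-off value of the given edge property.
    mask = pd.Series(True, index=edges.index)
    if edge_type is not None:
        mask &= edges["type"] == edge_type
    if timestamp_column is not None and cutoff is not None:
        mask &= edges[timestamp_column] >= cutoff
    return edges[mask], edges[~mask]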

User Story

As a: Research Engineer
I want: to prepare my data
so that: I can perform link prediction on HINs based on edge types and properties

Done Checklist (Research)

  • Implement data splitting based on edge type and/or edge property, e.g., timestamp
  • Pull request
  • Unit tests

Extend HinSAGE demo code for unsupervised learning

Description

Implement the wrappers / additional layers for unsupervised learning around the HinSAGE demo code. Create a working example using the risk net dataset.

User Story

As a: data scientist
I want: to run unsupervised learning using GraphSAGE
so that: I can transform my large dataset into node embeddings

Done Checklist

  • Peer Code Review Performed
  • Code well commented
  • Documentation in repo

Investigate Metapath2Vec paper for link prediction on heterogeneous graphs

Description

I want to understand the Metapath2Vec algorithm for representation learning in heterogeneous graphs.

User Story

As a: Research Engineer
I want: to understand the MetaPath2Vec algorithm for representation learning on heterogeneous graphs
so that: I can use it for node attribute inference and link prediction.

Done Checklist (Development)

  • Document differences between Metapath2Vec and Node2Vec algorithms
  • Determine scalability issues that are unique to Metapath2Vec
  • Search for reference implementation and, if found, run some experiments on test graphs to better understand its performance.

PoC for Unsupervised GraphSAGE

Description

Currently the GraphSAGE unsupervised method is not in the library. This task is to add a simple unsupervised GraphSAGE module to the stellar-ml library.

User Story

As a: Data Scientist
I want: everything that GraphSAGE offers
so that: I have freedom for my unsupervised method experiments

Done Checklist (Development)

  • Branch and Pull Request build on CI
  • Code well commented

Graphsage demo in StellarML Library

Description

Currently StellarML Library has some base classes. We have working Graphsage code from Kevin. We should implement a demo using the classes from StellarML, moving code to this style as required.

Done Checklist (Development)

  • Assumptions of the user story met
  • Produced code for required functionality
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Branch and Pull Request pass integration tests on CI
  • Version number reflects new status
  • Peer Code Review Performed
  • Code well commented
  • Documentation on Google Docs
  • Documentation in repo
  • Team demo
  • Mini-meetup talk
  • Stakeholder sign-off

Done Checklist (Research)

  • Code Review
  • Documentation on Google Docs
  • Documentation in repo
  • Team talk
  • Mini-meetup talk

Done Checklist (Bug)

  • Bug fixed
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Branch and Pull Request pass integration tests on CI
  • Version number reflects new status
  • Peer Code Review Performed
  • Code well commented

Move link prediction demo from stellar-ml-sandbox to stellar-ml repo

Description

We need to move the demo for link prediction from the stellar-ml-sandbox repo to here.

User Story

As a: Research Engineer
I want: to have all my code relating to graph ML in one place
so that: I can more effectively develop the graph-ml library and share code with my team

Done Checklist (Development)

  • Moved code from stellar-ml-sandbox to stellar-ml repo
  • Pull request

Setup Travis for stellar-ml

Description

Setup continuous integration for automated tests in the library. Create a buildkite.yml file

Done Checklist

  • Triggered commits for BuildKite
  • Writing to build-bots

Collaborate with Platform Team on caching architecture

Description

The platform team is building an experimental stack. We need to ensure that this meets the needs of the ML and Data teams.

User Story

As an: IA ML dev
I want: to ensure that I'm building in sync with the platform team
so that: there is no wasted effort

Done Checklist (Development)

  • Documentation on Google Docs

Write simple data feed to Graphsage for StellarML library

Description

Currently the GraphSAGE code from Kevin runs well, but requires a Redis database and is slow for simple single-computer testing.

As part of the StellarML library we want to pass data into TensorFlow quickly. Having a simple in-memory GraphSAGE sampler would be a good start.

Done Checklist (Development)

  • Assumptions of the user story met
  • Produced code for required functionality
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Branch and Pull Request pass integration tests on CI
  • Version number reflects new status
  • Peer Code Review Performed
  • Code well commented
  • Documentation on Google Docs
  • Documentation in repo
  • Team demo
  • Mini-meetup talk
  • Stakeholder sign-off

Done Checklist (Research)

  • Code Review
  • Documentation on Google Docs
  • Documentation in repo
  • Team talk
  • Mini-meetup talk

Done Checklist (Bug)

  • Bug fixed
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Branch and Pull Request pass integration tests on CI
  • Version number reflects new status
  • Peer Code Review Performed
  • Code well commented

Explore Aboleth for library design choices

Description

Can we borrow some design choices, e.g. base classes & inheritance, layer compositions, and pipelining?
Link: https://github.com/data61/aboleth

Checklist

  • List of Aboleth base classes + description, perhaps indicating which base classes can be borrowed into our library
  • pseudo code for node2vec+logistic workflow with the graph ML library (as we imagine it)
  • pseudo code for GraphSAGE

Prepare YOW Data Experiment

Description

We need to get started with an interesting demonstration of machine learning on graphs for the YowData! conference in mid-May.
First, we need to decide on a dataset and problem.

Checklist

  • Write-up of dataset and problem options on google docs
  • Initial implementation of solving the selected problem on the dataset

Investigate Apache Tinkerpop and write GraphSAGE input preparation in gremlin-python

Description

Investigate Gremlin's viability to efficiently prepare inputs for GraphSAGE from a graph database as well as from local memory.

User Story

As a: data scientist
I want: to use gremlin to prepare inputs for my graph ML tasks.
so that: I can efficiently prepare batch inputs from various graph data sources.

Done Checklist (Development)

  • Code well commented
  • Documentation
  • Peer Code Review Performed
  • Mini-meetup talk

Run HIN Graphsage for Movielens 1M dataset with node attributes

Description

Kevin has written code for HIN GraphSage, I'd like to use this to make predictions on the Movielens 1M dataset with the same train/test split as other examples and using intrinsic user/movie features.

Done Checklist

  • Obtain performance numbers for node2vec features
  • Obtain performance numbers for intrinsic features
  • Documentation on Google Docs
  • Code Review

Build inductive NAI with GCN

Steps:
Given a full graph G:

  1. Randomly select a test set of nodes {V}_test, remove them from G, resulting in G_train = G - {V}_test
  2. Evaluate \hat{A}=\hat{A_train}, X_train from G_train
  3. Train GCN on G_train (feeding \hat{A_train}, X_train), save the trained model
  4. Evaluate \hat{A}, X for the full graph G, ensuring that the order of nodes in the intersection of G and G_train is preserved. I.e., update \hat{A_train}, X_train with test nodes to obtain the full graph's \hat{A}, X.
  5. Do a forward pass of the updated \hat{A}, X through the trained GCN model, predicting attributes for test nodes.
  6. Evaluate predictions by comparing them with true test node attributes

Repeat steps 1-6 to obtain average prediction metrics.

Scalable Node Attribute Inference for Graphs

Description

Build a scalable implementation of node attribute inference (NAI) for graphs that works on graphs of at least 10M nodes.

Value

Besides satisfying stakeholders' requirements for scalable attribute inference tasks on large graph datasets (thus expanding the NAI capability of Release 1), this should allow us to find an optimal scalable architecture for other ML tasks on graphs, such as link prediction and classification, recommendations, etc.

Extend HinSAGE for Link Prediction

Description

Use the documentation for HinSAGE link prediction to create a working link prediction example using the Paradise Papers dataset from the Data team.

User Story

As a: data scientist
I want: to use GraphSAGE layers for link prediction
so that: I can run scalable link prediction

Done Checklist (Development)

  • Working example with Alzheimer data
  • Code well commented
  • Documentation in repo
  • Peer Code Review Performed

Improve speed of 'local' sampling method for link prediction

Description

Sampling negative edges for link prediction using the nodes' local neighbourhood structure currently uses BFS, which runs very slowly if target nodes more than 5 edges away need to be sampled. This issue is about replacing BFS with DFS to speed up the sampling algorithm.

User Story

As a: Research Engineer
I want: to run link experiments as fast as possible
so that: I maximise my efficiency.

Done Checklist (Development)

  • Updated source code to replace BFS for target nodes with DFS.
  • Pull request

[Dynamic Node2Vec] Investigate temporal updates of skipgram model

Description

Hooman is performing thorough experiments on how his dynamic random walk methods perform as part of an end-to-end dynamic node2vec algorithm. There are difficulties with how the skip-gram model interacts with random walk updates. To get a good publication we need a description of the skip-gram model and some explanation of how different training update schemes will affect the model.
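
For reference, the objective being updated is the standard skip-gram loss with negative sampling used by word2vec and node2vec: for a target node $t$ with embedding $v_t$, an observed context node $c$ with embedding $u_c$, and $K$ negative samples drawn from a noise distribution $P_n$, each (target, context) pair produced by the random walks contributes

$$\log \sigma(u_c^\top v_t) + \sum_{k=1}^{K} \mathbb{E}_{c_k \sim P_n}\big[\log \sigma(-u_{c_k}^\top v_t)\big],$$

and the different update schemes amount to different ways of re-optimising this sum as walks are added, removed or re-weighted over time.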

Done Checklist (Research)

  • Skip-gram model understanding
  • Documentation of skip-gram model
  • Documentation of skip-gram model update techniques

Write YOWData! presentation

Description

I'll be presenting at YOWData! on the 15th (at 5pm) so I need to prepare some slides!

Done Checklist

  • Slides on Google Docs
  • Give YOWData! presentation

Start moving code from link-prediction/utils to stellar ML library

Description

Some of the code in link-prediction/utils is mature enough to be integrated into the stellar ML library.

User Story

As a: Research Engineer
I want: to transfer mature code from demo code into the stellar graph ML library
so that: it can be re-used by other IA members and properly unit tested with CI.

Done Checklist (Development)

  • Produced code for required functionality
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Peer Code Review Performed
  • Code well commented

Investigate Turi for graph processing in the ML library and platform

Description

Apple has open-sourced Turi, which is a powerful graph processing framework. We should evaluate this technology against the following criteria:

  • functionality
  • ease of use
  • ease of scalability
  • performance

This task would be to ingest a > 1M edge dataset and perform a set of graph tasks e.g. BFS/DFS, graph traversal, grabbing neighbours, random sampling.

User Story

As a: data scientist
I want: the graph processing part of the library to be fast and have lots of functionality
so that: I can move on to my tensorflow part to build my model

Done Checklist (Development)

  • Small experiment setup to run the evaluation, e.g. python script file
  • Documentation on Google Docs
  • Team demo

Git workflow demo

Description

Use example git repositories to understand the git and github workflow.

User Story

As a: Research engineer
I want: to understand git and github workflows
so that: I can work with the rest of the team to develop the ML library.

Done Checklist (Development)

  • Create test repos
  • Document the workflow for forking a repo, developing new code, and putting the code back into the original repo via a pull request

Graph Machine Learning library that is easy to use and contribute to

Description

Create a machine learning library in Python that is simple to use and simple to contribute to. The library should focus on deep learning algorithms for graphs, and not attempt to duplicate existing algorithms, e.g. community detection, random forests, etc.

Value

This library will allow Data Scientists and Researchers to create models over network datasets with minimal overhead. The goal is to allow a fast experiment cycle time, with minimal assumed knowledge. For Researchers, it should be a place to add new algorithms, get their algorithms seen, and supply functions for building new deep learning models on graphs.

Unit tests for link prediction demo

Description

We need unit tests for the link prediction utility classes.

User Story

As a: Research Engineer
I want: to make sure that changes to the link prediction code are not breaking existing functionality
so that: I can be certain that my code works correctly as it is expanded and improved.

Done Checklist (Development)

  • Create test directory for link prediction demo
  • Add test for link prediction code
  • Pull request

Investigate feature alignment for link prediction

Description

Investigate whether link features obtained from G_train and G_test are aligned, and whether/how this affects performance of the link prediction classifier.

We need to remove the Confluence docs; it would be good to get the link prediction code from there.

Done Checklist (Research)

  • Experimental code/visualisations in 'alignment' branch in stellar-ml-sandbox/link-prediction
  • Documentation on Google Docs

YowData: Prepare spammers example

Description

I want to present an example at the YowData conference. The spammers dataset is an interesting case for applying graph ML. I want to prepare the spammers dataset and run node attribute inference on it.

Note:
Anna has done some investigation into using GraphSage and node2vec, so I will find out what has been done so far.

Done Checklist (Research)

  • Gave presentation at YowData

Tune node2vec parameters for link prediction demo

Description

Currently, the link prediction demo uses fixed values for the node2vec parameters, e.g., p=q=1, along with several others. We need to allow these parameters to be tuned for improved link prediction performance.
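
A rough sketch of exposing these hyper-parameters, combining StellarGraph's biased random walker with Gensim's word2vec (the vector_size argument assumes Gensim 4.x; in 3.x it is called size):

from gensim.models import Word2Vec
from stellargraph.data import BiasedRandomWalk

def node2vec_embeddings(graph, p=1.0, q=1.0, walk_length=80, walks_per_node=10, dimensions=128):
    # p and q control the return and in-out bias of the walks and are the main
    # hyper-parameters to tune for link prediction performance
    walks = BiasedRandomWalk(graph).run(
        nodes=list(graph.nodes()), length=walk_length, n=walks_per_node, p=p, q=q
    )
    str_walks = [[str(node) for node in walk] for walk in walks]
    model = Word2Vec(str_walks, vector_size=dimensions, window=5, min_count=0, sg=1)
    return model.wv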

User Story

As a: Research Engineer
I want: to tune the hyper-parameters of the node2vec algorithm
so that: I can achieve the highest performance in link prediction

Done Checklist (Development)

  • Code to tune node2vec hyper-parameters
  • Pull request

Organize external engagements with Jia and Jesse

Description

To engage successfully with the research groups led by Jia and Jesse, we need to map out the research interests of both groups and match them with research questions of relevance to us.

User Story

As a: researcher collaborating with the Stellar project
I want: to research graph technologies that are of interest to Stellar
so that: we can get publications for our research and support from Stellar.

Done Checklist (Research)

  • Documentation of ongoing engagements on Google Docs
  • Schedule of meetings with Jesse and Jia
  • Outline of scope of research.
  • AC review

Improve link prediction demo to handle non-integer node IDs

Description

The current implementation of the link prediction demo assumes that node IDs are integers. This is a restrictive assumption because for some datasets the node IDs are not integers. This causes the link prediction demo to fail with an Exception. We need to generalise the code so that it handles non-integer node IDs.

User Story

As a: Research Engineer
I want: to perform link prediction on a variety of network datasets stored in valid EPGM format
so that: I can be certain of the link prediction algorithm's generalisation

Done Checklist (Development)

  • Update implementation to handle non-integer node IDs
  • Add unit tests
  • Pull request

Organise reference datasets

Description

Organise the reference datasets with a readme.

User Story

As an: IA team member
I want: to have easy access to well defined data sets
So that: I can test my code and minimise dataset confusion

Done Checklist (Bug)

  • Documented dataset procedure
  • Document

Create stellar-ml library structure, and populate with base classes

Description

  • Define the library's structure, base classes, methods, some helper functions, etc.
  • Create unit tests for all the library's base classes and helper functions

User Story

As a: developer of the library
I want: to see a clear structure of base classes to inherit from, their methods, and examples of composing workflows from them.

Done Checklist (Development)

  • Assumptions of the user story met
  • Produced code for required functionality
  • Branch and Pull Request pass unit tests on CI
  • Peer Code Review Performed
  • Code well commented
  • Documentation in repo

YowData: Investigate NetFlix prediction using N2V

Description

For the YowData conference, I'd like to present a recommender example. The Netflix prize dataset is well known, and a large amount of effort has been spent on getting results on this dataset. Good performance on this dataset would be impressive.

Recommender systems are often not thought about in terms of graphs. Therefore, posing this in a graph framework and solving it would be interesting. We can start by using node2vec to extract node embeddings and trying to predict the scores from this.

Done Checklist (Research)

  • Notes or slides on recommendations for movielens with node2vec
  • Code for recommendations for movielens with node2vec

Clean up Movielens using HIN Graphsage and move to demos

Description

The movielens recommender demo developed for YOWData could be useful for other problems (Anna would like to try it out to see if it will work for the medicare dataset).

Currently the code is rough and ready, so I'd like to tidy it up, add documentation and have a quick-to-run test case (say on movielens 100k).

Done Checklist (Research)

  • Code Review
  • Documentation in repo
  • Code well commented

Create baseline skeleton library

Description

Create initial dummy library using the documentation and pseudo code already accumulated

Done Checklist

  • Code
  • Pull Request
  • Unit Tests

Investigate message passing for node2vec

Description

Node2vec can be implemented in a message-passing framework. However, this is strictly only true for prediction. Can we also place training in a message passing framework?

User Story

As a: developer of the graphml library
I want: to train and predict using node2vec in a message-passing framework
so that: i can train node2vec in a one-step scalable fashion.

Done Checklist (Research)

  • Code Review
  • Documentation on Google Docs

Update EPGM class to use networkx v2.*

Description

Currently, our graph processing module requires an earlier version of networkx, i.e., 1.x. Newer versions of networkx, namely 2.x, have changed how nodes and edges are returned to the user. We need to update our code to work with the newer version of networkx because it is becoming more common and often causes problems.
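
The incompatibility is mostly about return types; a small sketch of the difference:

import networkx as nx

G = nx.Graph()
G.add_edge("a", "b")

# networkx 1.x: G.nodes() and G.edges() return plain lists.
# networkx 2.x: they return view objects, so code that indexes the result
# directly (e.g. G.nodes()[0]) breaks. Wrapping in list() works on both versions.
nodes = list(G.nodes())
edges = list(G.edges())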

User Story

As a: Research Engineer
I want: my network analytics library to work with the latest version of python modules
so that: I can make use of the latest developments and improvements in these modules

Done Checklist (Development)

  • Produced code for required functionality
  • Unit tests updated and new ones added as necessary
  • Pull request

Understand HIN GraphSage algorithm

Description

Kevin and Yuriy have implemented a HIN GraphSAGE algorithm. I'd like to understand the implementation.

There are other heterogeneous GCN-like algorithms in the literature; read and understand them. How do they compare? Which algorithms could we implement for the ML library? Can we obtain code and test them on different problems? What input sampling strategies are required for each algorithm? How do training and prediction differ?

Done Checklist (Research)

  • Add different algorithms to documentation on Google Docs
  • Add sampling strategies to documentation on Google Docs

Improve data splitting code for link prediction

Description

The node splitter developed for the link prediction demo of issue #8 needs to be improved such that negative samples are more challenging, i.e., they should not be randomly selected out of all pairs of disconnected nodes, but rather from disconnected nodes that are nearby in the graph.
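
A minimal sketch of the harder sampling strategy in plain NetworkX (hypothetical function name; the distance threshold and counts are illustrative):

import random
import networkx as nx

def sample_nearby_negative_edges(G, n_samples, max_distance=3, seed=None):
    # Sample node pairs that are not connected by an edge but lie within
    # max_distance hops of each other (harder negatives than arbitrary pairs).
    rng = random.Random(seed)
    nodes = list(G.nodes())
    negatives = []
    while len(negatives) < n_samples:
        u = rng.choice(nodes)
        near = nx.single_source_shortest_path_length(G, u, cutoff=max_distance)
        candidates = [v for v, d in near.items() if d >= 2 and not G.has_edge(u, v)]
        if candidates:
            negatives.append((u, rng.choice(candidates)))
    return negatives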

User Story

As a: Research Engineer
I want: to use my data to correctly evaluate my link prediction algorithm
so that: I am confident about its performance on unseen data.

Done Checklist (Development)

  • Edge splitter class with improved sampling algorithm
  • Integration of new edge splitter class with baseline link prediction demo
  • Pull Request

Prepare for GraphSAGE/HinSAGE usage during Hackathon

Description

Prepare for the Spotify hackathon to allow everyone to use GraphSAGE/HinSAGE with ease on the day. Investigate the dataset and prepare notes on any requirements such as AWS setup, input batch preparation code, etc.

User Story

As a: Hackathoner
I want: to run Stellar's graph ML algorithms during the Hackathon
so that: we can win the Spotify competition

Done Checklist (Development)

  • Documentation on Google Docs
  • Documentation in repo
