age's Introduction

This program (AGE) implements a framework for active learning for graph embedding, as proposed in the following paper.
If you use it for scientific experiments, please cite this paper:
@article{DBLP:journals/corr/CaiZC17,
  author    = {HongYun Cai and
               Vincent Wenchen Zheng and
               Kevin Chen{-}Chuan Chang},
  title     = {Active Learning for Graph Embedding},
  journal   = {CoRR},
  volume    = {abs/1705.05085},
  year      = {2017},
  url       = {https://arxiv.org/abs/1705.05085},
  timestamp = {Mon, 15 May 2017 06:49:04 GMT}
}

The code has been tested under Ubuntu 16.04 LTS with an Intel Xeon(R) CPU E5-1620 @ 3.50GHz x 8 and 16 GB of memory.


============== *** Installation *** ============== 
python setup.py install

============== *** Requirements *** ============== 
tensorflow (>0.12)
networkx
Graph convolutional network (Kipf and Welling, ICLR 2017): https://github.com/tkipf/gcn

============== *** Data *** ==============
In order to use your own data, you have to provide

an N by N adjacency matrix (N is the number of nodes),
an N by D feature matrix (D is the number of features per node), and
an N by E binary label matrix (E is the number of classes).
Have a look at the load_data() function in utils.py for an example.
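
As an illustration of the three inputs listed above, here is a minimal sketch that builds them for a toy undirected graph. It uses plain nested lists to stay dependency-free; the function name build_inputs and the toy data are hypothetical and are not part of this repository. The actual load_data() in utils.py may use a different representation.

```python
def build_inputs(num_nodes, edges, features, labels, num_classes):
    """Build adjacency (N x N), feature (N x D) and label (N x E) matrices."""
    # N x N symmetric adjacency matrix of an undirected graph.
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        adj[u][v] = 1
        adj[v][u] = 1

    # The N x D feature matrix is passed through; only its shape is checked.
    if len(features) != num_nodes:
        raise ValueError("feature matrix must have one row per node")
    if any(len(row) != len(features[0]) for row in features):
        raise ValueError("all feature rows must have the same length D")

    # N x E binary label matrix: a single 1 in the column of the node's class.
    label_matrix = [[0] * num_classes for _ in range(num_nodes)]
    for node, cls in enumerate(labels):
        label_matrix[node][cls] = 1
    return adj, features, label_matrix


# Toy graph: 4 nodes on a path, 2 features per node, 3 classes.
edges = [(0, 1), (1, 2), (2, 3)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
labels = [0, 1, 1, 2]
adj, feats, y = build_inputs(4, edges, features, labels, 3)
```

Here adj is 4 by 4, feats is 4 by 2, and y is 4 by 3 with exactly one 1 per row.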

In this example, we load citation network data (Cora, Citeseer or Pubmed). The original datasets can be found here: http://linqs.cs.umd.edu/projects/projects/lbc/. Our version (see the data folder) uses the dataset splits provided by https://github.com/kimiyoung/planetoid (Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov, Revisiting Semi-Supervised Learning with Graph Embeddings, ICML 2016) to load the whole dataset, and uses the same test data as they do.

The validation nodes are randomly sampled from the set of non-test nodes. We randomly generate 10 validation sets for each dataset; the node indices are stored in "source/datasetname/val_idxa.txt" (where a is the validation set id, in the range [0,10]).

The initially labeled nodes are randomly sampled from the set of non-test and non-train nodes. Given C (the number of classes in the dataset) and a predefined L, AGE randomly samples L nodes from each class as the initially labeled nodes (so there are C*L initially labeled nodes in total).
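
The per-class sampling described above can be sketched as follows. This is only an illustration under stated assumptions: the function name sample_initial_labels, the seed argument, and the toy label list are hypothetical, and the repository's own sampling code may differ.

```python
import random


def sample_initial_labels(labels, candidates, per_class, seed=0):
    """Pick `per_class` nodes uniformly at random from each class."""
    rng = random.Random(seed)

    # Group the candidate nodes by their class.
    by_class = {}
    for node in candidates:
        by_class.setdefault(labels[node], []).append(node)

    chosen = []
    for cls in sorted(by_class):
        pool = by_class[cls]
        if len(pool) < per_class:
            raise ValueError("class %r has fewer than %d candidates" % (cls, per_class))
        # Sample without replacement inside the class.
        chosen.extend(rng.sample(pool, per_class))
    return chosen


# Toy setup: 12 nodes, C = 3 classes, nodes 9-11 are excluded from sampling.
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 1, 2]
excluded = {9, 10, 11}
candidates = [n for n in range(len(labels)) if n not in excluded]
initial = sample_initial_labels(labels, candidates, per_class=2)
```

With C = 3 and L = 2 the result contains C*L = 6 nodes, two from each class, and none of the excluded nodes.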

============== *** Run the Program *** ==============
1. First generate the graph centrality score for each node as follows.
Command: 
python get_graph_centrality.py datasetname 
e.g., python get_graph_centrality.py citeseer
Parameters:
datasetname: the dataset to process
Output:
The centrality scores for each node (in the same order as in the graph) are stored in "res/datasetname/graphcentrality/normcen"
Note:
We adopt PageRank centrality in this work. You can try other centrality measures by modifying the function "centralissimo()" in the file "get_graph_centrality.py".
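
For readers unfamiliar with PageRank centrality, the following is a minimal, dependency-free sketch of the power-iteration idea. It is not the code in get_graph_centrality.py: the function name pagerank, its parameters, and the toy graph are assumptions made for illustration, and any normalization applied before writing "normcen" is not covered here.

```python
def pagerank(adjacency, damping=0.85, max_iter=100, tol=1e-10):
    """Approximate PageRank scores for a graph given as {node: [neighbours]}."""
    nodes = list(adjacency)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # Mass held by nodes without outgoing edges is spread uniformly.
        dangling = sum(score[v] for v in nodes if not adjacency[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}

        # Every other node shares its damped score equally among its neighbours.
        for v in nodes:
            if adjacency[v]:
                share = damping * score[v] / len(adjacency[v])
                for w in adjacency[v]:
                    new[w] += share

        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


# Toy undirected star graph: node 0 is connected to nodes 1, 2 and 3.
graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
scores = pagerank(graph)
```

The scores sum to 1, and the hub (node 0) receives a higher score than any of the leaves.
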
2. Run the AGE algorithm to actively select nodes to label during the graph embedding process, and record the MacroF1 and MicroF1 for node classification.
Command:
python train_entropy_density_graphcentral_ts.py validation_id nb_initial_labelled_nodes_per_class class_nb datasetname
e.g., python train_entropy_density_graphcentral_ts.py 0 4 6 citeseer
Parameters:
validation_id: the validation set id, referring to the id a in "source/datasetname/val_idxa.txt"
nb_initial_labelled_nodes_per_class: the number of initially labelled nodes per class; we use four in this work
class_nb: the number of classes in the dataset
datasetname: the name of the dataset to process
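
As a reminder of what the two recorded metrics measure, here is a small sketch of MacroF1 and MicroF1 for single-label node classification. The function name f1_scores and the toy predictions are hypothetical, and the training script may compute these metrics differently (for example through a library). Treating an empty class as F1 = 0 is an assumption of this sketch.

```python
def f1_scores(y_true, y_pred, num_classes):
    """Return (macro_f1, micro_f1) for integer class labels."""
    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    fn = [0] * num_classes  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    def f1(tp_c, fp_c, fn_c):
        denom = 2 * tp_c + fp_c + fn_c
        return 2.0 * tp_c / denom if denom else 0.0

    # MacroF1: unweighted mean of the per-class F1 scores.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in range(num_classes)) / num_classes
    # MicroF1: F1 computed from counts pooled over all classes.
    micro = f1(sum(tp), sum(fp), sum(fn))
    return macro, micro


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro_f1, micro_f1 = f1_scores(y_true, y_pred, num_classes=3)
```

In the single-label setting MicroF1 equals plain accuracy (4 of 6 correct here), while MacroF1 gives every class the same weight regardless of its size.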
