LINE: Large-scale information network embedding

##Introduction

This is the LINE toolkit, developed for embedding very large-scale information networks. It is suitable for a variety of networks, including directed or undirected networks with binary or weighted edges. The LINE model is efficient: it can embed a network with millions of vertices and billions of edges on a single machine within a few hours.

Contact: Jian Tang, Microsoft Research Asia, [email protected], [email protected]
Project page: https://sites.google.com/site/pkujiantang/line

##Usage

We provide both Windows and Linux versions. To compile the source code, some external packages are required; they are used to generate random numbers for the edge-sampling algorithm in the LINE model. For the Windows version, the BOOST package is used and can be downloaded at http://www.boost.org/; for Linux, the GSL package is used and can be downloaded at http://www.gnu.org/software/gsl/

Network Input

The input of a network consists of the edges in the network. Each line of the input file represents a DIRECTED edge, specified in the format "source_node target_node weight" (the fields can be separated by either a space or a tab). Each undirected edge must be represented by TWO DIRECTED edges, one in each direction. Here is an example input for a word co-occurrence network:

good the 3
the good 3
good bad 1
bad good 1
bad of 4
of bad 4
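
Because every undirected edge has to appear twice, it can be convenient to generate the input lines programmatically. The following is a minimal Python sketch, not part of the toolkit; the helper name `to_directed_lines` is hypothetical, and it assumes each undirected edge is given as a `(node_a, node_b, weight)` tuple.

```python
def to_directed_lines(undirected_edges):
    """Expand undirected (a, b, weight) edges into 'source target weight' lines.

    Hypothetical helper for illustration; not part of the LINE toolkit.
    """
    lines = []
    for a, b, weight in undirected_edges:
        lines.append(f"{a} {b} {weight}")  # edge in the forward direction
        lines.append(f"{b} {a} {weight}")  # same edge in the reverse direction
    return lines


edges = [("good", "the", 3), ("good", "bad", 1), ("bad", "of", 4)]
directed = to_directed_lines(edges)
print("\n".join(directed))
```

Writing the joined lines to a text file should give an input in the format shown above.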

Run

./line -train network_file -output embedding_file -binary 1 -size 200 -order 2 -negative 5 -samples 100 -rho 0.025 -threads 20
  • -train, the input file of a network;
  • -output, the output file of the embedding;
  • -binary, whether to save the output file in binary mode; the default is 0 (off);
  • -size, the dimension of the embedding; the default is 100;
  • -order, the order of the proximity used; 1 for first order, 2 for second order; the default is 2;
  • -negative, the number of negative samples used in negative sampling; the default is 5;
  • -samples, the total number of training samples, in millions;
  • -rho, the starting value of the learning rate; the default is 0.025;
  • -threads, the total number of threads used; the default is 1.
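
When the tool is driven from a script, the argument list can be assembled from the options above. The sketch below is a hedged illustration in Python: the function `build_line_command` is hypothetical, it uses only the flags and defaults listed above, and it treats `-samples` as required because no default is stated for it. It only builds the command and does not run the binary.

```python
def build_line_command(train, output, samples, *, binary=0, size=100,
                       order=2, negative=5, rho=0.025, threads=1,
                       executable="./line"):
    """Build the argument list for the line executable.

    Hypothetical helper; defaults mirror those documented in the option list.
    """
    if order not in (1, 2):
        raise ValueError("order must be 1 (first order) or 2 (second order)")
    options = [
        ("-train", train),
        ("-output", output),
        ("-binary", binary),
        ("-size", size),
        ("-order", order),
        ("-negative", negative),
        ("-samples", samples),  # number of training samples, in millions
        ("-rho", rho),
        ("-threads", threads),
    ]
    command = [executable]
    for flag, value in options:
        command += [flag, str(value)]
    return command


cmd = build_line_command("network_file", "embedding_file", samples=100,
                         binary=1, size=200, threads=20)
print(" ".join(cmd))
```

Once the executable has been compiled, a list like `cmd` could be passed to `subprocess.run(cmd, check=True)` to start training.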

Files in the folder

  • line.cpp, the source code of LINE;
  • reconstruct.cpp, the code for reconstructing sparse networks into dense ones, as described in Section 4.3 of the LINE paper (see Citation);
  • normalize.cpp, the code for normalizing the embeddings (L2 normalization);
  • concatenate.cpp, the code for concatenating the 1st-order and 2nd-order embeddings;
  • train_youtube.bat/train_youtube.sh, the demo for training embeddings on the reconstructed YouTube network. The original YouTube network is available at http://socialnetworks.mpi-sws.mpg.de/data/youtube-links.txt.gz
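
To illustrate what L2 normalization and concatenation of embeddings mean, here is a small Python sketch. It is not the code in normalize.cpp or concatenate.cpp; the function names, the in-memory dictionaries, and the choice to normalize before concatenating are assumptions made for the example.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def concatenate_embeddings(first_order, second_order):
    """Join the two embeddings of each vertex that appears in both inputs.

    Both arguments map a vertex name to a list of floats (hypothetical layout).
    """
    return {
        vertex: first_order[vertex] + second_order[vertex]
        for vertex in first_order
        if vertex in second_order
    }


# Toy 2-dimensional embeddings, invented for the example.
first = {"good": l2_normalize([3.0, 4.0])}
second = {"good": l2_normalize([0.0, 2.0])}
combined = concatenate_embeddings(first, second)
print(combined["good"])
```

Each half of the combined vector has unit length, so neither the 1st-order nor the 2nd-order part dominates purely because of its scale.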

##Citation

@inproceedings{tang2015line,
  title={LINE: Large-scale Information Network Embedding.},
  author={Tang, Jian and Qu, Meng and Wang, Mingzhe and Zhang, Ming and Yan, Jun and Mei, Qiaozhu},
  booktitle={WWW},
  year={2015},
  organization={ACM}
}
