# LargeVis

This is the official implementation of the LargeVis model by the original authors (Tang, Liu, Zhang and Mei), which is used to visualize large-scale and high-dimensional data. It supports visualizing both high-dimensional feature vectors and networks. The package also contains a very efficient algorithm for constructing K-nearest neighbor graphs (K-NNG).

Contact person: Jian Tang, [email protected]. This work was done while the author was at Microsoft Research Asia.

## Install

Both the C++ source code and a Python wrapper are provided for Linux, OS X and Windows. Installation requires an external library for generating random numbers: GSL (GNU Scientific Library) on Linux and OS X, or BOOST on Windows.

#### Linux

Compile the source files via:

g++ LargeVis.cpp main.cpp -o LargeVis -lm -pthread -lgsl -lgslcblas -Ofast -march=native -ffast-math

To install the Python wrapper, modify setup.py to make sure that the GSL path is correctly set and then run sudo python setup.py install.

#### OS X

Install gsl using Homebrew:

brew install gsl

Modify line 347 of annoylib.h to change lseek64 to lseek. Then compile the source files (in the Linux folder) via:

g++ LargeVis.cpp main.cpp -o LargeVis -lm -pthread -lgsl -lgslcblas -Ofast -march=native -ffast-math -L/usr/local/lib -I/usr/local/include

To install the Python wrapper, run sudo python setup.py install.

#### Windows

To compile the source files, use Microsoft Visual Studio and set the BOOST path in the project.

To install the Python wrapper, modify setup.py to make sure that the BOOST path is correctly set and then run python setup.py install.

## Usage

LargeVis is suitable for visualizing both high-dimensional feature vectors and networks. For high-dimensional feature vectors, the input file should be formatted as follows: the first line specifies the number of feature vectors and their dimensionality (500 vectors with 10 dimensions in the following example), and each of the next 500 lines describes one feature vector with 10 floating-point numbers.

500 10
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
...
...
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
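As a minimal sketch of producing a file in this layout from Python, the snippet below writes a header line followed by one vector per line. The function name `write_feature_file` and the output file name are hypothetical, chosen only for illustration; they are not part of the LargeVis package.

```python
import random


def write_feature_file(path, vectors):
    """Write a 'count dim' header line, then one space-separated vector per line."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimensionality")
    with open(path, "w") as f:
        f.write(f"{len(vectors)} {dim}\n")
        for v in vectors:
            f.write(" ".join(repr(float(x)) for x in v) + "\n")


if __name__ == "__main__":
    # 500 random vectors with 10 dimensions, matching the shape of the example above.
    rng = random.Random(0)
    data = [[rng.random() for _ in range(10)] for _ in range(500)]
    write_feature_file("example_vec10D.txt", data)  # hypothetical file name
```

The dimensionality check is there because the header declares a single dimension for the whole file, so a ragged row would contradict it.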

For networks, each line of the input file is a DIRECTED edge. Each undirected edge must be represented by TWO DIRECTED edges, one in each direction. For example:

0 1 2.5
1 0 2.5
2 5 4.5
5 2 4.5
3 10 3.0
...
...
495 498 1.5
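If your edge list is undirected, a short script can emit both directions for you. This is a sketch only: it assumes the three columns are source node, target node and edge weight, which the example suggests but the text does not state, and the helper names are hypothetical.

```python
def to_directed_lines(undirected_edges):
    """Yield 'source target weight' lines, one per direction of each undirected edge."""
    for u, v, w in undirected_edges:
        yield f"{u} {v} {w}"
        yield f"{v} {u} {w}"


def write_network_file(path, undirected_edges):
    """Write the directed edge lines to a text file, one edge per line."""
    with open(path, "w") as f:
        for line in to_directed_lines(undirected_edges):
            f.write(line + "\n")


# Assumed tuple layout: (source, target, weight).
edges = [(0, 1, 2.5), (2, 5, 4.5)]
print("\n".join(to_directed_lines(edges)))
```

For the two edges above this prints the same four lines that open the example.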

For the C++ executable, run:

./LargeVis -input -output

or, for the Python wrapper:

python LargeVis_run.py -input -output

  • -input: Input file of feature vectors or networks (see the Examples/ folder for input formats).
  • -output: Output file of low-dimensional representations.

Besides these two required parameters, the optional parameters are:

  • -fea: Whether the input file contains high-dimensional feature vectors (1) or a network (0). Default is 1.
  • -threads: Number of threads. Default is 8.
  • -outdim: The lower dimensionality LargeVis learns for visualization (usually 2 or 3). Default is 2.
  • -samples: Number of edge samples for graph layout (in millions). Default is set to data size / 100 (million).
  • -prop: Number of neighbor-propagation iterations in the K-NNG construction stage, usually less than 3. Default is 3.
  • -alpha: Initial learning rate. Default is 1.0.
  • -trees: Number of random-projection trees used for constructing the K-NNG. 50 is sufficient for most cases unless you are dealing with very large datasets (e.g. data size over 5 million), and fewer trees are suitable for smaller datasets. Default is set according to the data size.
  • -neg: Number of negative samples used for negative sampling. Default is 5.
  • -neigh: Number of neighbors (K) in the K-NNG, which is usually set to three times the perplexity. Default is 150.
  • -gamma: The weight assigned to negative edges. Default is 7.
  • -perp: The perplexity used for deciding edge weights in K-NNG. Default is 50.
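To illustrate how the options combine, here is a small sketch that assembles a command line from Python. The helper `build_largevis_command` is hypothetical and not part of the package. It assumes every option is passed as a flag followed by its value, as in the `-threads 16` example in this README, and that the C++ executable accepts the same syntax as the Python wrapper.

```python
def build_largevis_command(input_path, output_path, executable="./LargeVis", **options):
    """Return an argument list: executable, -input, -output, then each option as '-name value'."""
    cmd = [executable, "-input", str(input_path), "-output", str(output_path)]
    for name, value in options.items():
        cmd += [f"-{name}", str(value)]
    return cmd


perplexity = 30
cmd = build_largevis_command(
    "mnist_vec784D.txt",
    "mnist_vec2D.txt",
    threads=16,
    perp=perplexity,
    neigh=3 * perplexity,  # K set to three times the perplexity, following the -neigh note
)
print(" ".join(cmd))

# To run it (requires the compiled binary in the working directory):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Deriving `-neigh` from `-perp` in one place keeps the two values consistent when you change the perplexity.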

## Examples

We provide examples, including MNIST (high-dimensional feature vectors) and CondMat (networks), in the Examples/ folder.

For example, to visualize the MNIST dataset,

python LargeVis_run.py -input mnist_vec784D.txt -output mnist_vec2D.txt -threads 16
python plot.py -input mnist_vec2D.txt -label mnist_label.txt -output mnist_vec2D_plot

Figure: plot of MNIST.
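If you want to post-process the low-dimensional output yourself rather than use plot.py, a loader along these lines may help. This is a sketch under an assumption this README does not confirm: that the output file mirrors the input layout, with a "count dim" header line followed by one row of coordinates per point. Check your own output file before relying on it. The function name `parse_embedding` is hypothetical.

```python
def parse_embedding(text):
    """Parse an assumed layout: 'count dim' header, then one row of floats per point."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    count, dim = (int(tok) for tok in lines[0].split())
    rows = [[float(tok) for tok in ln.split()] for ln in lines[1:]]
    if len(rows) != count or any(len(r) != dim for r in rows):
        raise ValueError("file contents do not match the header")
    return rows


# A tiny in-memory stand-in for an output file (made-up numbers, not real results).
sample = "3 2\n0.5 1.0\n-2.0 0.25\n1.5 -1.5\n"
points = parse_embedding(sample)
print(len(points), points[0])
```

The header check fails fast if the file turns out to use a different layout, which is the main risk of the assumption above.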

## Citation

Please cite the following paper if you use LargeVis to visualize your data.

@inproceedings{tang2016visualizing,
  title={Visualizing Large-scale and High-dimensional Data},
  author={Tang, Jian and Liu, Jingzhou and Zhang, Ming and Mei, Qiaozhu},
  booktitle={Proceedings of the 25th International Conference on World Wide Web},
  pages={287--297},
  year={2016},
  organization={International World Wide Web Conferences Steering Committee}
}

## Acknowledgement

Some methods in this package come from LINE (Large-scale Information Network Embedding), a previous work by the LargeVis authors.
