Git Product home page Git Product logo

amusaafir / graph-scaling Goto Github PK

View Code? Open in Web Editor NEW
5.0 2.0 1.0 861 KB

A sampling based method for scaling graph (network) data sets.

License: GNU General Public License v3.0

CMake 3.80% C++ 90.18% C 1.94% Cuda 3.58% Shell 0.50%
graphs graph-scaling graph-datasets graph-sampling sampling benchmark totally-induced-edge-sampling forest-fire-sampling random-edge-sampling random-node-sampling

graph-scaling's Introduction

Graph Scaling Tool

A Sampling-based Method for Scaling Graph Datasets

This tool allows users to scale a graph up or down using sampling as a basis for both operations.

Paper: https://dl.acm.org/doi/10.1145/3358960.3379144

Scaling Guidelines and Results

  • Refer to the scaling up guidelines to target certain property changes whenever scaling up a graph.

  • The scaling-results document shows the results of several scaling operations conducted on different graph datasets.

Usage

Compilation

Dependencies (install them by running install_dependencies.sh):

  1. googletest - Optional for compiling the test project.
  2. snap-stanford - Required for the auto-tuner.
mkdir build
cd build
cmake ../
make -j$NPROC

Single thread and machine

Running graph_scaling_tool without any parameter should prompt the user several input requests accordingly, including whether a scaling up or scaling down operation should be performed.

To perform one of the two operations directly, the user must specify at least:

  • The input graph path: -i (e.g., -i /home/user/graph.csv). Note that this can be any format, as long as the input graph is an edge list (where each vertex is represented numerically)
  • The output (folder) path: -o (e.g., -o /home/user/)

To directly perform a scaling down operation, the -s parameter is required to specify the preferred sample size (based on the number of vertices). This parameter should be between 0 and 1. By default, TIES is used as sampling algorithm. This can be changed by adding the -a option with randomedge, randomnode, ties or forestfire -v {sourcevertex}. If forest fire is used, replace {sourcevertex} with the source vertex.

Example scaling down: ./graph_scaling_tool -i /home/user/graph.txt -o /home/user/graph_output -s 0.4 -a ties

To directly perform a scaling up operation, the following parameters are required:

  • Scaling factor -u (based on the number of vertices)
  • Sample size -s for each sampled graph. This value should be between 0 and 1.
  • Sampling algorithm -a either: ties, randomedge, randomnode or forestfire -v {sourcevertex}.
  • Topology -t either: star, chain, ring or fullyconnected
  • Bridging type -b, either: random or high (degrees)
  • Number of interconnections between each sample: -n
  • Force directed bridges -d. Setting this to true would add directed bridges.

Example scaling up: ./graph_scaling_tool -i /home/user/graph.txt -o /home/user/graph_output -u 3.0 -s 0.5 -a ties -d false -n 10 -b random -t star

Parallel & distributed

Switch to the 'distributed' branch; compile the project using:

make release

Run on a cluster with:

prun -np <number of compute nodes> -v -1 -reserve <reservation id> -sge-script $PRUN_ETC/prun-openmpi sample <input file> <output file> <sampling fraction> <CPU mem limit (bytes, per node)> <GPU mem limit (bytes, per node)>

Example with 2 nodes, "wiki" input file, 0.5 sampling fraction, 32 GB CPU memory and 12 GB GPU memory limits:

prun -np 2 -v -1 -reserve <reservation id> -sge-script $PRUN_ETC/prun-openmpi sample OUTPUT_PATH 0.5 34359738368 12884901888

Note that the input file must be an edge list, ordered by source vertex, and then by destination vertex. The number of vertices and edges in the graph must be denoted at the beginning of the file, on a separate rule, like: "! nVertices nEdges" (without quotes) The input graph should not have 'missing' vertices (vertices without incoming/outgoing edges and are therefore not present in the input file).

graph-scaling's People

Contributors

amusaafir avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

yuanhaoliu

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.