Git Product home page Git Product logo

scalemine's Introduction

ScaleMine: Scalable Parallel Frequent Subgraph Mining in a Single Large Graph

##Overview:

ScaleMine is a novel parallel frequent subgraph mining system for a single large graph. ScaleMine introduces a novel two-phase approach. The first phase is approximate; it quickly identifies subgraphs that are frequent with high probability, while collecting various statistics. The second phase computes the exact solution by employing the results of the approximation to achieve good load balance; prune the search space; generate efficient execution plans; and guide intra-task parallelism.

For more details, visit http://cloud.kaust.edu.sa/Pages/scalemine.aspx

If you use ScaleMine in your research, please cite our paper:

@inproceedings{hamid2016scalemine,
 title={ScaleMine: Scalable Parallel Frequent Subgraph Mining in a Single Large Graph},
 author={Ehab Abdelhamid, Ibrahim Abdelaziz, Panos Kalnis, Zuhair Khayyat and Fuad Jamour},
 booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
 pages={11},
 year={2016},
 organization={ACM}
}

##Contents:

README ...................  This file
LICENSE.txt ..............  License file (Open Source)
makefile .................  build ScaleMine binary files
Datasets/ ................  Example graphs
SRC_*/ ...................  Directory containing ScaleMine source files

Dependencies

There are a few dependencies which must be satisfied in order to compile and run ScaleMine.

  • build-essential and g++ (>= 4.4.7) [Required]

    • Needed for compiling ScaleMine.
  • openssh-server [Required]

    • Required to initialize MPI and establish connections among compute nodes.
  • MPICH2 [Required]

    • AdHash uses MPI for inter-node communication. Open MPI is not tested with ScaleMine.

##Installation:

  • install MPI and Boost libraries on the target machine
  • Uncompress ScaleMine using any compression tool
  • Build ScaleMine using the "compile.sh" script file

##Running: ###Single Machine Mode: Run ScaleMine using the following command:

mpirun -n N pfsm -file GRAPH_FILE -freq F -threads T

N: the number of MPI computation nodes, make sure that there is at lease one for th emaster and one for a worker. Best practice is to have one computation node at each separate machine, then for each machine set a number of parallel threads. GRAPH_FILE: the input graph file name, the supported graph format is .lg F: the user-give support threshold T: the number of threads per compute node

Example:

Use the following command:

mpirun -n 2 pfsm -file ./Datasets/patent_citations.lg -freq 28000 -threads 4

to mine teh patent_citation graph for subgraphs having support larger than or equal to 28000 using 2 compute nodes; 1 master and 1 worker, the worker has 4 threads.

###Distributed Mode For running on a supercomputer using SLURM job scheduler:

#!/bin/bash
#SBATCH --account=user-account
#SBATCH --job-name=pfsm
#SBATCH --output=/output/mining.out
#SBATCH --error=/output/mining.err
#SBATCH --time=01:00:00
#SBATCH --threads-per-core=1
#SBATCH --nodes=256
#SBATCH --ntasks-per-node=1
#SBATCH --exclusive
#SBATCH --cpus-per-task=32
export OMP_NUM_THREADS=32
export MKL_NUM_THREADS=32
export MPICH_MAX_THREAD_SAFETY=multiple
srun --ntasks=256 pfsm -file /Datasets/patent_citations.lg -freq 28000 -threads 32

##Output:

ScaleMine outputs the list of frequents subgraphs on the standard output. Also, the elapsed time is returned at the end.

##License: ScaleMine is a free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.

ScaleMine is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with ScaleMine. If not, see http://www.gnu.org/licenses/.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.