graphit-dsl / graphit

GraphIt - A High-Performance Domain Specific Language for Graph Analytics

Home Page: http://graphit-lang.org/

License: Other


graphit's Introduction

GraphIt Domain Specific Language and Compiler

GraphIt is a high-performance graph DSL. Our website has more detailed tutorials and documentation for the language.

Dependencies

To build GraphIt you need to install CMake 3.5.0 or greater. This dependency alone will allow you to build GraphIt and generate high-performance C++ implementations. Currently, we support both Python 2.7 and Python 3 for the end-to-end tests.

To compile the generated C++ implementations with support for parallelism, you need CILK and OPENMP. One easy way to set up both is to use the Intel parallel compiler (icpc), which is free for students. There are also open source implementations of CILK (g++ >= 5.3.0 with support for Cilk Plus) and OPENMP.

(Optional) To use NUMA optimizations on multi-socket machines, libnuma needs to be installed (on Ubuntu, sudo apt-get install libnuma-dev). Note that a good number of optimized implementations do not require NUMA optimizations, so you can give GraphIt a try even if you do not have libnuma installed.

(Optional) To use the Python bindings for GraphIt, you need to install the following packages:

  • python3 (version >= 3.5)
  • scipy (can be installed using pip3)
  • pybind11 (can be installed using pip3)

If you are a Mac user who recently upgraded to macOS Mojave and are having trouble with missing header files "string.h" or "wchar.h" when using cmake, the C++ compiler, or the Python scripts that use the C++ compilers, maybe this post will help. As always, let us know if you have any issues with building and using GraphIt.

Build GraphIt

To perform an out-of-tree build of GraphIt, run the following after you have cloned the repository:

    cd graphit
    mkdir build
    cd build
    cmake ..
    make

Currently, we require the build directory to be in the root project directory for some unit tests to work. To run the C++ test suite (all tests should pass):

    cd build/bin
    ./graphit_test

To run the Python end-to-end test suite, start from the root directory and change to the build directory. All tests should pass, but some will generate error messages from the g++ compiler; this is expected. The project tests should support both Python 2.x and Python 3.x.

    cd build
    python python_tests/test.py
    python python_tests/test_with_schedules.py

(Optional) To test the Python bindings, run the following extra commands from the GraphIt root directory. You do NOT need this for compiling and running regular standalone GraphIt programs.

    cd build
    export PYTHONPATH=.
    python3 python_tests/pybind_test.py

When running test_with_schedules.py, commands used for compiling GraphIt files, compiling the generated C++ file, and running the compiled binary file are printed. You can reproduce each test and examine the generated C++ files by typing the printed commands in the shell (make sure you are in the build/bin directory). You can also selectively enable a specific test using the TestSuite commands. We provide examples of enabling a subset of Python tests in the comments of the main function in test_with_schedules.py.
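Selecting a subset of tests this way relies on Python's standard unittest.TestSuite. The following is a minimal sketch of the idea, not the contents of test_with_schedules.py: the class ExampleGraphItTests and its method names are hypothetical stand-ins for the real test class and tests.

```python
import unittest


class ExampleGraphItTests(unittest.TestCase):
    """Hypothetical stand-in for the test class in test_with_schedules.py."""

    def test_bfs_push_parallel(self):
        self.assertTrue(True)

    def test_pagerank_pull_parallel(self):
        self.assertTrue(True)


def build_suite(test_names):
    """Return a TestSuite containing only the named test methods."""
    suite = unittest.TestSuite()
    for name in test_names:
        suite.addTest(ExampleGraphItTests(name))
    return suite


def run_selected(test_names):
    """Run only the selected tests and return the unittest result object."""
    runner = unittest.TextTestRunner(verbosity=2)
    return runner.run(build_suite(test_names))


if __name__ == "__main__":
    run_selected(["test_pagerank_pull_parallel"])
```

Adding individual test methods by name keeps the run short when you only want to inspect the printed commands for one schedule.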

Note that when running test.py, some error messages may be printed during the run. These are expected, because some tests are expected to fail and print certain error messages. Please check the final output. test_with_schedules.py might take a few minutes to run.

Compile GraphIt Programs

The GraphIt compiler currently generates a C++ output file from the .gt input GraphIt programs. To compile an input GraphIt file with schedules in the same file (assuming the build directory is in the root project directory):

    cd build/bin
    python graphitc.py -f ../../test/input_with_schedules/pagerank_benchmark.gt -o test.cpp

To compile an input algorithm file with a separate schedule file, pass the algorithm file with -a and the schedule file with -f. Some of the test files have hardcoded paths to test inputs, so be sure to modify them or change the directory from which you run the compiled files.

The example below compiles the algorithm file (../../test/input/pagerank_with_filename_arg.gt) with a separate schedule file (../../test/input_with_schedules/pagerank_pull_parallel.gt):

    cd build/bin
    python graphitc.py -a ../../test/input/pagerank_with_filename_arg.gt -f ../../test/input_with_schedules/pagerank_pull_parallel.gt -o test.cpp
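If you drive the compiler from a script, the two invocations above can be assembled with a small helper. This is only a sketch: the function names graphitc_command and compile_graphit are hypothetical, and the only flags used are the -a, -f, and -o options shown in the commands above.

```python
import subprocess


def graphitc_command(output_cpp, schedule_file, algorithm_file=None,
                     compiler_script="graphitc.py"):
    """Build the argument list for graphitc.py.

    -f names the file with schedules (or algorithm plus schedules),
    -a names a separate algorithm file, -o names the generated C++ file.
    """
    cmd = ["python", compiler_script]
    if algorithm_file is not None:
        cmd += ["-a", algorithm_file]
    cmd += ["-f", schedule_file, "-o", output_cpp]
    return cmd


def compile_graphit(*args, **kwargs):
    """Run graphitc.py (from build/bin); raises CalledProcessError on failure."""
    subprocess.run(graphitc_command(*args, **kwargs), check=True)
```

Keeping the command construction separate from the subprocess call makes it easy to print the command first and rerun it by hand.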

Compile and Run Generated C++ Programs

To compile a serial version, you can use regular g++ with support for the C++14 standard to compile the generated C++ file (assuming it is named test.cpp).

    # assuming you are still in the bin directory under build/bin. If not, just do cd build/bin from the root of the directory
    g++ -std=c++14 -I ../../src/runtime_lib/ -O3 test.cpp  -o test
    ./test ../../test/graphs/4.el

To compile a parallel version of the C++ program, you will need both CILK and OPENMP. OPENMP is required for programs using NUMA-optimized schedules (configApplyNUMA enabled) and static parallel optimizations (the static-vertex-parallel option in configApplyParallelization). All other programs can be compiled with CILK. For analyzing large graphs (e.g., twitter, friendster, webgraph) on NUMA machines, numactl -i all improves the parallel performance. For smaller graphs, such as LiveJournal and Road graphs, not using numactl can be faster.

    # assuming you are still in the bin directory under build/bin. If not, just do cd build/bin from the root of the directory

    # compile and run with CILK
      # icpc
      icpc -std=c++14 -I ../../src/runtime_lib/ -DCILK -O3 test.cpp -o test

      # g++ (gcc) with cilk support
      g++ -std=c++14 -I ../../src/runtime_lib/ -DCILK -fcilkplus -lcilkrts -O3 test.cpp -o test

      # to run the compiled binary on a small test graph, 4.el
      numactl -i all ./test ../../test/graphs/4.el

    # compile and run with OPENMP
      # icpc
      icpc -std=c++14 -I ../../src/runtime_lib/ -DOPENMP -qopenmp -O3 test.cpp -o test

      # g++ (gcc) with openmp support
      g++ -std=c++14 -I ../../src/runtime_lib/ -DOPENMP -fopenmp -O3 test.cpp -o test

      # to run the compiled binary on a small test graph, 4.el
      numactl -i all ./test ../../test/graphs/4.el

    # compile and run with NUMA optimizations (only works with OPENMP and needs libnuma).
      # Sometimes -lnuma will have to come after the test.cpp file
      # icpc
      icpc -std=c++14 -I ../../src/runtime_lib/ -DOPENMP -DNUMA -qopenmp  -O3 test.cpp -lnuma -o test

      # g++ (gcc)
      g++ -std=c++14 -I ../../src/runtime_lib/ -DOPENMP -DNUMA -fopenmp -O3 test.cpp -lnuma -o test

      # to run with NUMA enabled on a small test graph, 4.el
      OMP_PLACES=sockets ./test ../../test/graphs/4.el

You should see some running times printed. The pagerank example files require a command-line argument for the input graph file. If you see a segfault, then it probably means you did not specify an input graph.
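The compile variants above differ only in a few flags, so they can be summarized in one helper. The sketch below is an illustration under assumptions: the function name cxx_command and its parameters are hypothetical, and it only reproduces the g++ and icpc command lines listed above.

```python
def cxx_command(source="test.cpp", output="test", compiler="g++",
                parallel=None, numa=False,
                runtime_include="../../src/runtime_lib/"):
    """Return the compiler argument list for a generated GraphIt C++ file.

    parallel is None (serial), "cilk", or "openmp"; numa requires "openmp".
    """
    if compiler not in ("g++", "icpc"):
        raise ValueError("only g++ and icpc are covered by this sketch")
    if numa and parallel != "openmp":
        raise ValueError("NUMA optimizations only work with OpenMP")

    cmd = [compiler, "-std=c++14", "-I", runtime_include]
    if parallel == "cilk":
        cmd.append("-DCILK")
        if compiler == "g++":
            cmd += ["-fcilkplus", "-lcilkrts"]
    elif parallel == "openmp":
        cmd.append("-DOPENMP")
        if numa:
            cmd.append("-DNUMA")
        cmd.append("-qopenmp" if compiler == "icpc" else "-fopenmp")
    elif parallel is not None:
        raise ValueError("parallel must be None, 'cilk', or 'openmp'")

    cmd += ["-O3", source]
    if numa:
        cmd.append("-lnuma")  # placed after the source file, as in the commands above
    cmd += ["-o", output]
    return cmd
```

For example, cxx_command(parallel="openmp", numa=True) yields the g++ NUMA command line, which can then be passed to subprocess.run.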

Evaluate GraphIt's Performance

The algorithms we used for benchmarking, such as PageRank, PageRankDelta, BFS, Connected Components, Single Source Shortest Paths and Collaborative Filtering are in the apps directory. These files include ONLY the algorithm and NO schedules. You need to use the appropriate schedules for the specific algorithm and input graph to get the best performance.

Detailed instructions for replicating the performance results of the OOPSLA 2018 GraphIt paper are here. In the OOPSLA paper (Table 8), we described the schedules used for each algorithm on each graph on a dual socket system with Intel Xeon E5-2695 v3 CPUs with 12 cores each for a total of 24 cores and 48 hyper-threads. The system has 128GB of DDR3-1600 memory and 30 MB last level cache on each socket, and runs with Transparent Huge Pages (THP) enabled. The best schedule can differ from machine to machine, so you might need to try a few different sets of schedules for the best performance.

Detailed instructions for replicating the results in our CGO 2020 paper, Optimizing Ordered Graph Algorithms with GraphIt, are here. We collected the results on the same hardware platform as our OOPSLA 2018 paper.

If you just want to replicate the performance of our bucket fusion optimization for SSSP with delta stepping proposed in the CGO 2020 paper, you can also just use the GAP benchmark suite. We have since integrated our optimization into the GAP benchmark suite. You can find a bucket fusion optimized SSSP in the GAP repo here.

A Note on the Performance of SSSP

We used weight 1 for all the weighted graphs in the original OOPSLA 2018 paper for convenience. We later realized that this decision impacted the performance of SSSP, especially on the road networks. Since then, we have used random weights for the social networks and original weights for the road networks in the CGO 2020 paper. Please refer to our CGO 2020 paper for the performance of SSSP with BellmanFord and DeltaStepping. We have also improved the performance of SSSP on road networks significantly with our new bucket fusion optimization. Please see this closed issue for more details.

In the schedules shown in Table 8 of the OOPSLA paper, the keyword ’Program’ and the continuation symbol ’->’ are omitted. ’ca’ is the abbreviation for ’configApply’. Note that configApplyNumSSG takes an integer parameter (X) that depends on the graph size and the cache size of the system. For example, the complete schedule used for CC on the Twitter graph is the following (X is tuned to the cache size):

schedule:
    program->configApplyDirection("s1", "SparsePush-DensePull")->configApplyParallelization("s1", "dynamic-vertex-parallel")->configApplyDenseVertexSet("s1","bitvector", "src-vertexset", "DensePull");
    program->configApplyNumSSG("s1", "fixed-vertex-count",  X, "DensePull");

The test/input and test/input_with_schedules directories contain many examples of the algorithm and schedule files. Use them as references when writing your own schedule.
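Because X in the CC schedule above has to be tuned per machine, it can help to generate several schedule variants and time each one. The sketch below simply substitutes candidate values of X into that schedule text. The helper names and output file names are hypothetical, and the candidate values in any real sweep are yours to choose; nothing here is part of GraphIt itself.

```python
from pathlib import Path

# Schedule text copied from the CC-on-Twitter example, with X as a placeholder.
SCHEDULE_TEMPLATE = '''schedule:
    program->configApplyDirection("s1", "SparsePush-DensePull")->configApplyParallelization("s1", "dynamic-vertex-parallel")->configApplyDenseVertexSet("s1","bitvector", "src-vertexset", "DensePull");
    program->configApplyNumSSG("s1", "fixed-vertex-count",  {num_ssg}, "DensePull");
'''


def render_cc_schedule(num_ssg):
    """Return the schedule text with X replaced by num_ssg."""
    if not isinstance(num_ssg, int) or num_ssg < 1:
        raise ValueError("num_ssg must be a positive integer")
    return SCHEDULE_TEMPLATE.format(num_ssg=num_ssg)


def write_schedule_variants(directory, candidates):
    """Write one schedule file per candidate X and return the paths."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for x in candidates:
        path = out_dir / f"cc_schedule_ssg_{x}.gt"  # hypothetical file name
        path.write_text(render_cc_schedule(x))
        paths.append(path)
    return paths
```

Each generated file can then be passed to graphitc.py with -f alongside the algorithm file given with -a.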

We provide more detailed instructions on evaluating the code generation and performance capability of GraphIt in graphit/graphit_eval/GraphIt_Evaluation_Guide.md. In the guide, we provide instructions for using a series of scripts that make it easier for people to evaluate GraphIt.

Input Graph Formats

GraphIt reuses GAPBS input formats. Specifically, we have tested with edge list file (.el), weighted edge list file (.wel), binary edge list (.sg), and weighted binary edge list (.wsg) formats. Users can use the converters in GAPBS (GAPBS/src/converter.cc) to convert other graph formats into the supported formats, or convert weighted and unweighted edge list files into their respective binary formats. We have provided sample input graph files in the graphit/test/graphs/ directory. The python tests use the sample input files.
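For small experiments, the text formats are easy to produce directly. The sketch below assumes the usual GAPBS text layout of one edge per line, with whitespace-separated "src dst" fields for .el and "src dst weight" for .wel, and it assumes integer weights. The function names are hypothetical.

```python
def write_edge_list(path, edges):
    """Write (src, dst) or (src, dst, weight) tuples, one edge per line."""
    with open(path, "w") as f:
        for edge in edges:
            f.write(" ".join(str(field) for field in edge) + "\n")


def read_edge_list(path):
    """Read a .el or .wel file back into a list of integer tuples."""
    edges = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                edges.append(tuple(int(field) for field in fields))
    return edges
```

The binary .sg and .wsg formats are best produced with the GAPBS converter rather than written by hand.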

Autotuning GraphIt Schedules

Please refer to README.md in graphit/autotune for more details. The autotuner is still somewhat experimental. Please read the instructions carefully before trying it out.

Publications

  • GraphIt-A High-Performance DSL for Graph Analytics OOPSLA 2018
  • Optimizing ordered graph algorithms with GraphIt CGO 2020

graphit's People

Contributors

ajaybrahmakshatriya, eafurst, ldhulipala, rbaghdadi, respitesage, siegfriedgreg, tugsbayasgalan, ykenny1, yunmingzhang17


graphit's Issues

is there a way to create edges as part of the algorithm?

I am trying to implement this algorithm: https://arxiv.org/pdf/1209.1688.pdf . In section 2.2 it suggests "Finally, to ensure that each row-sum is exactly one, we add a self-loop to each node".

Essentially I need to add an edge. I can't find a function to do this in examples/source/documentation.

My current ideas are: 1) include |V| self edges from a Python wrapper script and modify them in GraphIt; 2) make a custom C++ function that adds a self edge somehow (would like guidance on this); 3) run getOutDegrees() in one GraphIt script, then use those values from Python to construct self edges and feed them to another GraphIt script; 4) calculate getOutDegrees() in Python, then put the self edges into the first script.

Would like some guidance on this.. tysm for very cool project.
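As an illustration of option 1 only (not an answer from the maintainers), the self-loops could be added to the edge list in a Python wrapper before the graph is handed to GraphIt. The sketch assumes vertices are numbered 0 to |V|-1 and that edges are (src, dst) pairs; the function name is hypothetical.

```python
def add_self_loops(edges, num_vertices=None):
    """Return the edges plus one (v, v) self-loop for every vertex lacking one.

    Assumes vertex ids are 0..num_vertices-1; if num_vertices is omitted,
    it is inferred from the largest id that appears in an edge.
    """
    edges = list(edges)
    if num_vertices is None:
        num_vertices = (1 + max(max(u, v) for u, v in edges)) if edges else 0
    existing = {u for u, v in edges if u == v}
    loops = [(v, v) for v in range(num_vertices) if v not in existing]
    return edges + loops
```

The resulting list can be written out as a .el file and loaded like any other input graph.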

issue with using scheduling language in a global scope

Not sure if it is allowed, but when I write #s1# some statement in a global scope, the Scanner thinks there is a comma before and after s1, so the parser errors out. This seems to work fine if the scheduling symbol is inside a function.

Benchmark comparison against GPUs, FPGAs

First of all, thank you for this impressive software!
But I won't use it until I've found a benchmark comparison against https://developer.nvidia.com/nvgraph on a modern GPU.
A comparison against AMD gpu would be Nice too https://www.google.com/search?q=AMD+graph+algorithms+opencl&client=firefox-b&oq=AMD+graph+algorithms+opencl&gs_l=mobile-heirloom-serp.3..30i10.17126.20484.0.20921.13.12.1.0.0.1.480.2284.4j5j0j2j1.12.0....0...1c.1j4.34.mobile-heirloom-serp..7.6.593.WaazgBmCg_E
I had read that FPGAs are the fastest at graph analytics until graph processors become a reality (aren't they already? https://www.google.com/search?q=graph+processor&client=firefox-b&oq=graph+processor&gs_l=mobile-heirloom-serp.3...5156.5156.0.5686.1.1.0.0.0.0.0.0..0.0....0...1c.1j4.34.mobile-heirloom-serp..1.0.0.QKVdUYR0fFo but I understand that it is difficult to find such accelerators for testing).

And if GPUs or HSA are faster than CPUs for graph analytics, you could think about porting GraphIt to GPU backends. I've heard this has become "relatively" easy now that GPU LLVM backends are mature and SPIR-V is a thing.

Betweenness Centrality

From your paper six of the seven algorithms are included in the apps directory but Betweenness Centrality looks to be missing. Is this available for sharing? Thanks!

Support for 2D vectors

I'm trying to implement all pairs shortest path using GraphIt and I'm having trouble.

I'd like to have a 2D vector distance such that, for any two vertices u and v, distance[u][v] holds an integer. I'd like this vector to be accessible from within an apply and filter function.

GraphIt doesn't seem to have support for this feature. Can this be supported?
Thank you.

Crash on running bfs_verifier

Hello @AjayBrahmakshatriya

I am trying to run bfs_verifier on included sample graph 4_sym.el. This gives a crash error.

    $ cd build/bin
    $ ./bfs_verifier -f ../../test/graphs/4_sym.el
    # running BFS verifier
    # Segmentation fault (core dumped)

I was trying it on web-Stanford graph, both in edge list (.el) and matrix market (.mtx) formats. graphit_test passes all tests.

test_with_schedules.py test failed

Hello,
I am trying to install GraphIt, but I got several errors during the 'test_with_schedules.py' tests.
Starting from a CalledProcessError at "graphitc.py", line 76, a ton of errors are printed out.
I guess it is caused by how the paths are specified.
Could you help me solve the issue?

Thank you,

Global Const Vertex

Need to make sure that a global vertex can be specified for vertex queries.

More algorithms

Very nice project as well as your paper.

I noticed the proposed algorithms work with numeric variables (int, float, double, vector<..>).
I was wondering if other variable types like string could be added.

For example, to implement label propagation algorithm,
we may want:
const label: vector{Vertex}(string)

Is that possible, and how would the label variable be initialized?
Any comments would be appreciated. Thanks.

getVertexSetSize()

I noticed within tc.gt, if you call
src_nghs.getVertexSetSize()
this will always be 0, even though the true size is edges.getOutDegree(src). I noticed similar behavior in a file I'm working on right now. Is the size of src_nghs actually 0, or is it an issue with getVertexSetSize()?

Issue with initializing a constant size vector.

Ideally,
const vertexArray: vector{Vertex}(int) = 0; should initialize a vector with size of number of vertices. But it initializes to a vector of size 0. We had to add this to make it work:
const vertices : vertexset{Vertex} = edges.getVertices();
const vertexArray: vector{Vertex}(int) = 0;

The ICPC link seems to be broken

Hi!
I am a student who is learning to use graphit. I noticed that the link to ICPC, the compiler that supports Cilk and OpenMP, had been changed to a download link to OneAPI. And the download link of ICPC seems to be hard to find. Could you send me a copy if it is convenient for you? Also, does Graphit support OneAPI? and is it possible to install Cilk and OpenMP separately?

Incorrect results on varying threadcount for pagerank algorithm

Hi,
I generated a test.cpp using the command: python graphitc.py -a ../../test/input/pagerank_with_filename_arg.gt -f ../../test/input_with_schedules/pagerank_pull_parallel.gt -o test.cpp
I was able to compile and run the same using OpenMP flags of varying thread count from 1 to 24. The error value(part of pagerank algorithm) it prints is the same for varying numbers of threads for a given input graph.

Similarly I generated another test.cpp using the command: python graphitc.py -a ../../test/input/pagerank_with_filename_arg.gt -f ../../test/input_with_schedules/pagerank_push_parallel.gt -o test.cpp
The schedule file is push instead of pull here. I was able to compile and run using the OpenMP flags of varying thread count 1 to 24. The error value it prints was the same for the given input graph with number of threads = 1. But with an increase in the thread count the error value is changing for push-based scheduling which should not be the case, right?

Kindly help in resolving the issue. Thank you

New operator for vector[size]

Currently the new operator for vectors works fine with Vertex type but not for a constant size vector.

For example the following works,

var vec: vector{Vertex}(int) = new vector{Vertex}(int)();

But this doesn't work -

var vec: vector[10](int) = new vector[10](int)();

and returns the error -

Error: expected '{' but got '[', at 1

SSSP implementation in Ligra and Galois

Hi Yunming,
I've been testing the SSSP implementations of Ligra and Galois on a 10-thread machine and a 24-thread machine. The Bellman-Ford algorithm in Ligra takes more than 200 seconds on both machines to solve a single SSSP query on the USA road network.
Delta-stepping on Galois takes roughly 2 to 3 seconds.
As you mentioned in the paper, "We run Galois with the Bellman-Ford algorithm so that the algorithms are the same across systems." Would you be willing to share the implementation you used for testing GraphIt against Ligra and Galois? I suppose it would be similar to Ligra, which takes hundreds of seconds?


C++ Test Suite Infinite Runtime on M1 Mac

Hi there,

When I tried to run the C++ test suite, SetCover_test in RuntimeLibTest never terminated. The last while loop in this test, around line 538, is causing the issue. Since the rest of the tests pass, I guess something unique to this test is the reason. Perhaps this test uses some technique that is not supported by the M1 chip?

Thank you for your feedback.
