alibaba / libgrape-lite Goto Github PK

View Code? Open in Web Editor NEW

365.0 25.0 89.0 9.76 MB

🍇 A C++ library for parallel graph processing (GRAPE) 🍇

Home Page: https://alibaba.github.io/libgrape-lite/

License: Apache License 2.0

CMake 2.55% C++ 93.29% Shell 2.07% Java 2.09%

graph cxx library parallel mpi gnn graph-theory graph-algorithms header-only grape

libgrape-lite's Introduction

libgrape-lite

A C++ library for parallel graph processing

libgrape-lite is a C++ library from Alibaba for parallel graph processing. It differs from prior systems in its ability to parallelize sequential graph algorithms as a whole by following the PIE programming model from GRAPE. Sequential algorithms can be easily "plugged into" libgrape-lite with only minor changes and get parallelized to handle large graphs efficiently. In addition to the ease of programming, libgrape-lite is designed to be highly efficient and flexible, to cope the scale, variety and complexity from real-life graph applications.

Building libgrape-lite

Dependencies

libgrape-lite is developed and tested on CentOS 7. It should also work on other unix-like distributions. Building libgrape-lite requires the following softwares installed as dependencies.

CMake (>=2.8)
A modern C++ compiler compliant with C++-11 standard. (g++ >= 4.8.1 or clang++ >= 3.3)
MPICH (>= 2.1.4) or OpenMPI (>= 3.0.0)
glog (>= 0.3.4)

Here are the dependencies for optional features:

jemalloc (>= 5.0.0) for better memory allocation;
Doxygen (>= 1.8) for generating documentation;
Linux HUGE_PAGES support, for better performance.
CUDA (>= 10.0) for GPU-based graph processing.
NCCL (>= 2.7) for multi-GPU communication.

Extra dependencies are required by examples:

gflags (>= 2.2.0);
Apache Kafka (>= 2.3.0);
librdkafka(>= 0.11.3);

Building libgrape-lite and examples

Once the required dependencies have been installed, go to the root directory of libgrape-lite and do a out-of-source build using CMake.

mkdir build && cd build
cmake ..
make -j

The building targets include a shared/static library, and two sets of examples: analytical_apps and a gnn_sampler.

Alternatively, you can build a particular target with command:

make libgrape-lite # or
make analytical_apps # or
make gnn_sampler

Building libgrape-lite with GPU support

libgrape-lite supports deploying graph algorithms to GPUs. When CUDA is detected on the machine and NCCL >= 2.7, GPU support will be enabled automatically.

Running libgrape-lite applications

Graph format

The input of libgrape-lite is formatted following the LDBC Graph Analytics benchmark, with two files for each graph, a .v file for vertices with 1 or 2 columns, which are a vertex_id and optionally followed by the data assigned to the vertex; and a .e file for edges with 2 or 3 columns, representing source, destination and optionally the data on the edge, correspondingly. See sample files p2p-31.v and p2p-31.e under the dataset directory.

Example applications

libgrape-lite provides six algorithms from the LDBC benchmark as examples. The deterministic algorithms are, single-source shortest path(SSSP), connected component(WCC), PageRank, local clustering coefficient(LCC), community detection of label propagation(CDLP), and breadth first search(BFS).

To run a specific analytical application, users may use command like this:

# run single-source shortest path with 4 workers in local.
mpirun -n 4 ./run_app --vfile ../dataset/p2p-31.v --efile ../dataset/p2p-31.e --application sssp --sssp_source 6 --out_prefix ./output_sssp --directed

# or run connected component with 4 workers on a cluster.
# HOSTFILE provides a list of hosts where MPI processes are launched. 
mpirun -n 4 -hostfile HOSTFILE ./run_app --application=wcc --vfile ../dataset/p2p-31.v --efile ../dataset/p2p-31.e --out_prefix ./output_wcc

# or run breadth-first search with 8 workers in a multi-GPU server.
mpirun -n 8 ./run_cuda_app --application=bfs --lb=cm --bfs_source 6 --vfile ../dataset/p2p-31.v --efile ../dataset/p2p-31.e --out_prefix ./output_wcc

# see more flags info.
./run_app --help

LDBC benchmarking

The analytical applications support the LDBC Analytical Benchmark suite with the provided ldbc_driver. Please refer to ldbc_driver for more details. The benchmark results for libgrape-lite and other state-of-the-art systems could be found here.

GNN sampler

In addition to offline graph analytics, libgrape-lite could also be utilized to handle more complex graph tasks. A sampler for GNN training/inference on dynamic graphs (taking graph changes and queries, and producing results via Kafka) is included as an example. Please refer to examples/gnn_sampler for more details.

GPU-based graph analytics

libgrape-lite also supports graph analytics on multi-GPU servers. Unlike CPUs, GPUs have more-but-weaker cores, making load balancing the key to high-performance sparse graph processing on GPUs. libgrape-lite provides multiple load balancing strategies on GPUs (wm, cm, cta, and strict). libgrape-lite adopts NCCL to handle communication between multiple GPUs. With GPU acceleration, libgrape-lite can obtain similar performance for a 4-node CPU cluster with a single GPU. The detailed benchmark results of libgrape-lite on GPUs could also be found here.

Documentation

Documentation is generated using Doxygen. Users can build doxygen documentation in the build directory using:

cd build
make doc
# open docs/index.html

The latest version of online documentation can be found at https://alibaba.github.io/libgrape-lite

License

libgrape-lite is distributed under Apache License 2.0. Please note that third-party libraries may not have the same license as libgrape-lite.

Acknowledgements

flat_hash_map, an efficient hashmap implementation;
granula, a tool for gathering performance information for LDBC Benchmark;
xoroshiro, a pseudo-random number generator;
threadpool, a concise C++11 Thread Pool implementation.

Publications

Wenfei Fan, Jingbo Xu, Wenyuan Yu, Jingren Zhou, Xiaojian Luo, Ping Lu, Qiang Yin, Yang Cao, and Ruiqi Xu. Parallelizing Sequential Graph Computations. ACM Transactions on Database Systems (TODS) 43(4): 18:1-18:39.
Wenfei Fan, Jingbo Xu, Yinghui Wu, Wenyuan Yu, Jiaxin Jiang. GRAPE: Parallelizing Sequential Graph Computations. The 43rd International Conference on Very Large Data Bases (VLDB), demo, 2017 (the Best Demo Award).
Wenfei Fan, Jingbo Xu, Yinghui Wu, Wenyuan Yu, Jiaxin Jiang, Zeyu Zheng, Bohan Zhang, Yang Cao, and Chao Tian. Parallelizing Sequential Graph Computations, ACM SIG Conference on Management of Data (SIGMOD), 2017 (the Best Paper Award).

Please cite the following paper in your publications if GRAPE or this repo helps your research.

@inproceedings{10.1145/3035918.3035942,
    author = {Fan, Wenfei and Xu, Jingbo and Wu, Yinghui and Yu, Wenyuan and Jiang, Jiaxin and Zheng, Zeyu and Zhang, Bohan and Cao, Yang and Tian, Chao},
    title = {Parallelizing Sequential Graph Computations},
    year = {2017},
    isbn = {9781450341974},
    publisher = {Association for Computing Machinery},
    url = {https://doi.org/10.1145/3035918.3035942},
    doi = {10.1145/3035918.3035942},
    booktitle = {Proceedings of the 2017 ACM International Conference on Management of Data},
    pages = {495–510},
    numpages = {16},
    location = {Chicago, Illinois, USA},
    series = {SIGMOD '17}
}

Getting involved

Read contribution guide.
Join in the Slack channel
Please report bugs by submitting a GitHub issue.
Submit contributions using pull requests.

Thank you in advance for your contributions!

libgrape-lite's People

Contributors

Stargazers

Watchers

libgrape-lite's Issues

Update README

with emphasis on GPU acceleration.
with a badge link to GraphScope

[BUG] cmake failed to generate Makefile without librdkafka

Describe the bug
Building examples should be optional when dependencies are not met.

👍 Without gflags, the build can still proceed without producing examples as targets.
👎 But when librdkafka is not installed, cmake fails without producing a Makefile.

To Reproduce
try cmake .. without librdkafka installed.

Expected behavior

an option should be added to CMakeLists for users to configure if examples targets shall be generated
turn off the option if any of examples' dependencies were not met, and still produce the Makefile without examples

THe performance with GPU backend

Do you have questions or need support? Please describe.
I run the libgrapelite on A100 with datasets graph500-26 using BFS, the results are as follow:

load graph: 1080.76 sec
load application: 0.341124 sec
run algorithm: 0.23278 sec
print output: 57.175 sec
It's not faster than on the CPU ，what's wrong with it ? Is it OK ? According to the graph500-benchmark list 'https://graph500.org/?page_id=12' , the GPU performance with BFS -graph500-26 can be as high as 319.061 GTEPS , while in libgrape-lite , this results can be computed as 22616/0.23=4.66GTEPS, is it too small ?

Additional context
Add any other context about the question here.

Add CI of benchmark.

Use self-hosted runner, used to report performance number after each PR get merged.

[FEATURE] Add SendMsgThroughEdges overload to send gid of vertices

Is your feature request related to a problem? Please describe.
Currently, only SyncStateOnOuterVertex can sync gid of vertices. I would like to add this feature in SendMsgThroughEdges since several applications need this feature.

Describe the solution you'd like
Add SendMsgThroughEdges overload to send gid of vertices.

[BUG] Segmentation fault when VDATA_T == std::string and the length of input vertex label is larger than 22

Describe the bug
Segmentation fault when VDATA_T == std::string and the length of input vertex label is larger than 22.

To Reproduce
To reproduce the bug, you may

set VDATA_T to std::string at run.h:173:
using GraphType = ImmutableEdgecutFragment<OID_T, VID_T, std::string, double>;
and change the first line of ./dataset/p2p-31.v from 1 7 to 1 7777777777777777777777

Expected behavior
The graph should be loaded successfully.

Additional context

This problem is caused at ev_fragment_loader.h:181:
basic_fragment_loader_.ConstructFragment(fragment);

Can the distributed framework support encryption？

Which framework does Grape use for distributed computing?
Does it support encryption, if not, hwo can we implement encryption or use other rpc framework like grpc?

[BUG] Edge doesn't have member set_src()/set_dst() when EDATA_T is EmptyType

Describe the bug
Edge doesn't have member set_src()/set_dst() in the following template

template <typename VID_T>
class Edge<VID_T, EmptyType>

The benchmark data set link is not valid!

I want to reproduce the result in Performance.md, but the datasets in website link https://graphalytics.org/datasets offered in Performance.md can't be download . I want to get some datasets like graph500-22 or others . Where can I get them?

Where is the implementation of Ingress dynamic graph calculation?

[FEATURE] Providing getter methods for `host_id` in comm_spec

Adding methods in comm_spec to return the host_id for this process(worker), or return the worker_list on a single host.

C++线程池推荐

HI，考虑尝试使用这款图框架中的线程池替换现有版本么

https://github.com/ChunelFeng/CGraph/tree/main/src/UtilsCtrl/ThreadPool

实现机制层面，决定了这款ThreadPool在包含现有线程池功能的基础上，拥有更好的调度性能。

[BUG] Failed to instantiate fragments with VDATA_T!=EmptyType.

Describe the bug

When I instantiate a ImmutableEdgecutFragment<int64_t, uint32_t, double, double>, the compiler reports a compilation error:

No matching function for call to 'grape::ImmutableEdgecutFragment<long int, unsigned, double, double, (grape::LoadStrategy)2>::Serialize(const string&)'
fragment->Serialize(serialization_prefix);

Expected behavior

Fix the compilation error.

[BUG]Compilation error

Describe the bug
I run “make” then got error !

To Reproduce
To help us reproducing this bug, please provide information below:

Your Operation System version (uname -a): Linux eff814a706ff 5.4.0-109-generic #123~18.04.1-Ubuntu SMP Fri Apr 8 09:48:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
The version of libgrape-lite you use: latest
Versions of crucial packages, such as mpi: mpirun (Open MPI) 4.1.2
Full stack of the error.
Minimized code to reproduce the error.

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

[BUG] Infinite iterations in pagerank algo when max_round <= 0

Describe the bug
when set pr_mr with 0 or negtive integer in pagerank algo, it will results to infinite iterations .

To Reproduce
Any OS with:
mpirun -n 4 ./run_app --vfile ../dataset/p2p-31.v --efile ../dataset/p2p-31.e --application pagerank -pr_mr 0 --out_prefix ./output_pr --directed

[BUG] Wrong number of vertices when the vertex does not appear in the edge file

Describe the bug

When the VDATA_T is grape::EmptyType, the vertex will not be added to GlobalVertexMap.

To Reproduce
example.e

0 1
3 1

example.v

auto inner_vertices = frag.InnerVertices();

LOG(INFO) << "|V|: " << inner_vertices.size();
// THE OUTPUT IS
// I1225 19:43:47.534154 288427520 pagerank.h:59] |V|: 3

Expected behavior
The output should be 4 instead of 3.

Additional context
Note: Fix the bug may increase the memory usage since all the vertices have to be added for every worker. Before the fix, we only need to add vertices that cross the fragment.

Share vertex map with the workers on the same host using shared memory

Is your feature request related to a problem? Please describe.
When multiple worker processes are started on the same host (e.g., for auto workers), global vertex maps of fragments will be created multiple times

Describe the solution you'd like
Global vertex maps can be placed in shared memory, and getting share across workers on the same host to save the unnecessary memory overhead

Describe alternatives you've considered
To support multiple concurrent workers for each process?

[BUG] The Initialization of worker_num in AllToAll function is wrong.

Describe the bug
inline void AllToAll(std::vector& objects, MPI_Comm comm) {
int worker_id, worker_num;
MPI_Comm_rank(comm, &worker_id);
MPI_Comm_size(comm, &worker_id);
..
}

Should be changed to

inline void AllToAll(std::vector& objects, MPI_Comm comm) {
int worker_id, worker_num;
MPI_Comm_rank(comm, &worker_id);
MPI_Comm_size(comm, &worker_num);
..
}

Incorrect results of GPU-based BFS algorithm on `graph500-26` dataset

Do you have questions or need support? Please describe.

I use the GPU program run_cuda_app of libgrape-lite to run the data of graph500--26, the command is ：
” ./run_cuda_app --application=bfs --lb=cm --bfs_source 62455266 --vfile ../dataset/graph500zst/graph500-26/graph500-26.v --efile ../dataset/graph500zst/graph500-26/graph500-26.e --out_prefix ./output_bfs-26“

and the results （bfs-output）are as follows. The results are very strange. Is this normal?

Additional context
Add any other context about the question here.

MPI相关问题

"to_others_.emplace_back(std::move(item.second));"
你好，我想请教一下，为什么ParallelMessageManager中的消息发送线程中需要将非阻塞发送的消息缓冲区保存至to_others_容器中呢？貌似后续对to_others_容器也没有做什么操作，只是在通信完成后将其清空了。我测试了下，如果不加这句的话，大数据量消息收发情况下会错，小数据量时则不会报错，所以想请教下这步操作的作用是什么？

compilation error !

Do you have questions or need support? Please describe.
I downloaded the source code from github, after running the make -j command, the program has the following error, I want to know what's going on?

Additional context
Add any other context about the question here.

OutArchive获取字符串消息的问题

各位有遇到过当MESSAGE_T为std::string时，使用OutArchive的>>获取字符串内容时报错的问题吗，应该是下面的代码：

inline OutArchive& operator>>(OutArchive& out_archive, std::string& str) {
size_t size;
out_archive >> size;
str.resize(size);
// 这一句会报错
memcpy(&str[0], out_archive.GetBytes(size), size);
return out_archive;
}

按照下面的方式使用：
int id;
std::string msg;
out_arc >> id >> msg;

这个问题解决了，在添加字符串至InArchive时需要用const

[BUG]vertex file and the edge file which vid is a string, the final output result of the PR algorithm is wrong

When I provide the vertex file and the edge file which vid is a string, the final output result of the PR algorithm is only one vertex, and its PR value is 1, and the vid is not the original id.

Refactor `AppendOnlyEdgecutFragment` with mutable csr

Since mutable csr is ready, we can use mutable csr to implement AppendOnlyEdgecutFragment.

Adding AddressSanitizer to CI

to prevent issues like #8 from happening in the future

[FEATURE] Add EFragmentLoader to load graph without vertex file

To load a graph with edge file only, we will make the following changes:

Move GraphSpec in ev_fragment_loader.h to basic_fragment_loader.h, fix corresponding include.
Add EFragmentLoader which can load graph with only edge files, use EFragmentLoader in LoadGraph when vertex file is empty.
Update run_app.h. Allow user to use an empty FLAGS_vfile when FLAGS_segmented_partition is true.

[BUG] Analytical apps won't compile when VDATA_T is not grape::EmptyType

Describe the bug
The analytical apps will not compile when the VDATA_T is not grape::EmptyType.

To Reproduce
To reproduce the bug, you may change the VDATA_T from grape::EmptyType to int at run.cc:44:

  if (name.find("sssp") != std::string::npos) {
    grape::Run<int64_t, uint32_t, int, double>();
  } else {
    grape::Run<int64_t, uint32_t, int, grape::EmptyType>();
  }

Expected behavior
It should be compile.

Additional context

This problem caused by the lack of grape::EmptyType specialization for LoadGraph function. To fix the bug, just add two specialized LoadGraph functions:

template <typename FRAG_T,
          typename PARTITIONER_T = SegmentedPartitioner<typename FRAG_T::oid_t>,
          typename IOADAPTOR_T = LocalIOAdaptor,
          typename LINE_PARSER_T =
              TSVLineParser<typename FRAG_T::oid_t, typename FRAG_T::vdata_t,
                            typename FRAG_T::edata_t>>
static typename std::enable_if<
    std::is_same<typename FRAG_T::vdata_t, grape::EmptyType>::value,
    std::shared_ptr<FRAG_T>>::type
LoadGraph(const std::string& efile, const std::string& vfile,
          const CommSpec& comm_spec,
          const LoadGraphSpec& spec = DefaultLoadGraphSpec()) {
  if (vfile.empty()) {
    std::unique_ptr<
        EFragmentLoader<FRAG_T, PARTITIONER_T, IOADAPTOR_T, LINE_PARSER_T>>
        loader(new EFragmentLoader<FRAG_T, PARTITIONER_T, IOADAPTOR_T,
                                   LINE_PARSER_T>(comm_spec));
    return loader->LoadFragment(efile, vfile, spec);
  } else {
    std::unique_ptr<
        EVFragmentLoader<FRAG_T, PARTITIONER_T, IOADAPTOR_T, LINE_PARSER_T>>
        loader(new EVFragmentLoader<FRAG_T, PARTITIONER_T, IOADAPTOR_T,
                                    LINE_PARSER_T>(comm_spec));
    return loader->LoadFragment(efile, vfile, spec);
  }
}

template <typename FRAG_T,
          typename PARTITIONER_T = SegmentedPartitioner<typename FRAG_T::oid_t>,
          typename IOADAPTOR_T = LocalIOAdaptor,
          typename LINE_PARSER_T =
              TSVLineParser<typename FRAG_T::oid_t, typename FRAG_T::vdata_t,
                            typename FRAG_T::edata_t>>
static typename std::enable_if<
    !std::is_same<typename FRAG_T::vdata_t, grape::EmptyType>::value,
    std::shared_ptr<FRAG_T>>::type
LoadGraph(const std::string& efile, const std::string& vfile,
          const CommSpec& comm_spec,
          const LoadGraphSpec& spec = DefaultLoadGraphSpec()) {
  std::unique_ptr<
      EVFragmentLoader<FRAG_T, PARTITIONER_T, IOADAPTOR_T, LINE_PARSER_T>>
      loader(new EVFragmentLoader<FRAG_T, PARTITIONER_T, IOADAPTOR_T,
                                  LINE_PARSER_T>(comm_spec));
  return loader->LoadFragment(efile, vfile, spec);
}

Replace cppkafka with librdkafka in gnn_sampler

Is your feature request related to a problem? Please describe.

Currently the cppkafka is a submodule of libgrape-lite, the dependency is too heavy for such an example application.

Describe the solution you'd like

Replace the cpp-kafka functionalities with librdkafka.

Describe alternatives you've considered

Using other lightweight C++ kafka libraries, e.g., libkafka-asio, but looks not every promising.

Add Dockerfile or docker image

It is difficult to compile the codes successfully without Dockerfile or docker image, such as incompatible version of gflags, glog and nccl, so, can you provide it?

[BUG] Buffer overflow

Describe the bug
In comm_spec.h, the hostname is represented with 32 chars array. The buffer overflow will occur when the hostname more than 32 chars.

void initLocalInfo() {
    char hn[32];
    gethostname(hn, 32);
    char* recv_buf = reinterpret_cast<char*>(malloc(32 * worker_num_));
    MPI_Allgather(hn, 32, MPI_CHAR, recv_buf, 32, MPI_CHAR, comm_);

    std::vector<std::string> worker_host_names(worker_num_);
    for (int i = 0; i < worker_num_; ++i) {
      worker_host_names[i].assign(&recv_buf[i * 32], 32);
    }
    free(recv_buf);
    ...
}

[FEATURE] Making changes to adapt GraphScope

To adapt GraphScope, we have to make the following changes:

Introduce VertexDataContext to hold the result after computation.
Introduce VertexVector to represent a series of discontinuous vertices.
The termination can be made to stop the running app when an error occurred.
CommSpec should provide a method to get a MPI communicator on the same machine.
For the purpose of ease of use, the project should be installed under the system's include directory. Then, the other projects can find the include directory of libgrape-lite by a simple find_package statement.

alibaba / libgrape-lite Goto Github PK

libgrape-lite's Introduction

libgrape-lite

Building libgrape-lite

Dependencies

Building libgrape-lite and examples

Building libgrape-lite with GPU support

Running libgrape-lite applications

Graph format

Example applications

LDBC benchmarking

GNN sampler

GPU-based graph analytics

Documentation

License

Acknowledgements

Publications

Getting involved

libgrape-lite's People

Contributors

Stargazers

Watchers

Forkers

libgrape-lite's Issues

Recommend Projects

Recommend Topics

Recommend Org