ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines

NOTE: This is a modified version for the xilinx_u280_gen3x16_xdma_1_202211_1 platform.


What's new?

TBD

Overview

[Figure: overview of the ReGraph workflow]

The figure above shows an overview of the ReGraph workflow.

  • Step 1: To obtain a customized accelerator design for a graph application, developers only need to write user-defined functions (UDFs) for the three stages of the Gather-Apply-Scatter (GAS) model using the provided programming interface.
  • Step 2: ReGraph then combines the UDFs, accelerator templates, and platform-specific optimizations to generate synthesizable code for accelerators with all possible pipeline combinations.
  • Step 3: The synthesizable code is compiled into bitstreams with the Xilinx Vitis toolchain. Users then specify the graph to be accelerated.
  • Step 4: ReGraph reorders vertices by in-degree and partitions the graph. The task scheduler then uses its built-in graph-aware scheduling method to select the accelerator with the most suitable numbers of Big and Little pipelines, and generates a scheduling plan.
  • Step 5: ReGraph deploys the selected accelerator and runs it on the target FPGA.
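Step 4 can be illustrated with a minimal Python sketch. It assumes (this is our assumption, not ReGraph's documented implementation) that vertices are relabeled in descending in-degree order and that edges are then grouped by destination-ID range, with each range sized to fit one pipeline's destination buffer. The function names are hypothetical, and the way ReGraph maps partitions onto Big and Little pipelines is not modeled.

```python
from collections import Counter


def reorder_by_in_degree(num_vertices, edges):
    """Relabel vertices so that higher in-degree vertices get smaller IDs.

    Hypothetical helper. Returns (relabeled_edges, old_to_new), where
    old_to_new[v] is the new ID of original vertex v.
    """
    in_deg = Counter(dst for _, dst in edges)  # missing vertices count as 0
    # Descending in-degree; ties broken by original ID so the result is deterministic.
    order = sorted(range(num_vertices), key=lambda v: (-in_deg[v], v))
    old_to_new = [0] * num_vertices
    for new_id, old_id in enumerate(order):
        old_to_new[old_id] = new_id
    relabeled = [(old_to_new[s], old_to_new[d]) for s, d in edges]
    return relabeled, old_to_new


def partition_by_destination(edges, dst_buffer_size):
    """Group edges so that each partition's destination IDs fit in one buffer."""
    parts = {}
    for s, d in edges:
        parts.setdefault(d // dst_buffer_size, []).append((s, d))
    return [parts[k] for k in sorted(parts)]


if __name__ == "__main__":
    edges = [(0, 2), (1, 2), (3, 2), (2, 1), (0, 1), (1, 0)]
    relabeled, old_to_new = reorder_by_in_degree(4, edges)
    print(old_to_new)  # [2, 1, 0, 3]: vertex 2 (in-degree 3) becomes ID 0
    for i, part in enumerate(partition_by_destination(relabeled, 2)):
        print(i, part)
```

After relabeling, the most frequently written destinations share low IDs and therefore land in the same partition, which is the usual motivation for degree-based reordering before buffering destination vertices on chip.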

Programming Interface

With ReGraph, users can implement different graph accelerators by writing only three high-level functions: accScatter(), accGather(), and accApply(). Three built-in graph algorithms are provided as examples: PageRank (PR), Breadth-First Search (BFS), and Closeness Centrality (CC). The desired application is selected by passing APP=[the algorithm] to the make command.
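To make the three UDF stages concrete, here is a host-side Python sketch of PageRank written in GAS form. The functions acc_scatter, acc_gather, and acc_apply are hypothetical stand-ins that mirror the roles of accScatter(), accGather(), and accApply(); they are not ReGraph's actual signatures (the real UDFs are compiled into the FPGA accelerator), and the damping factor of 0.85 is an assumed conventional value.

```python
def acc_scatter(src_rank, src_out_degree):
    """Scatter: the value an edge carries is the source rank split over its out-edges."""
    return src_rank / src_out_degree


def acc_gather(acc, update):
    """Gather: sum all updates arriving at the same destination vertex."""
    return acc + update


def acc_apply(acc, num_vertices, damping=0.85):
    """Apply: turn the accumulated contribution into the vertex's new rank."""
    return (1.0 - damping) / num_vertices + damping * acc


def pagerank_gas(num_vertices, edges, iterations=20):
    out_degree = [0] * num_vertices
    for s, _ in edges:
        out_degree[s] += 1
    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        acc = [0.0] * num_vertices
        # Scatter along every edge and gather at its destination.
        # out_degree[s] >= 1 here because s is the source of this edge.
        for s, d in edges:
            acc[d] = acc_gather(acc[d], acc_scatter(rank[s], out_degree[s]))
        # Apply once per vertex after all edges have been processed.
        rank = [acc_apply(a, num_vertices) for a in acc]
    return rank


if __name__ == "__main__":
    print(pagerank_gas(3, [(0, 1), (0, 2), (1, 2), (2, 0)]))
```

For simplicity, rank held by vertices with no out-edges is not redistributed, so the ranks sum to 1 only when every vertex has at least one out-edge.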

Accelerator generation

The numbers of little and big pipelines are configurable: set LITTLE_KERNEL_NUM and BIG_KERNEL_NUM in ./global_para.mk. Because the number of memory ports is limited, the total number of pipelines (LITTLE_KERNEL_NUM + BIG_KERNEL_NUM) must not exceed 14 on the U280 or 13 on the U50.
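The pipeline limit above can be checked before running make autogen. This is a minimal sketch: check_pipeline_count is a hypothetical helper, and the per-card limits are the ones stated in the paragraph above.

```python
# Limits on LITTLE_KERNEL_NUM + BIG_KERNEL_NUM stated above (bounded by memory ports).
MAX_PIPELINES = {"U280": 14, "U50": 13}


def check_pipeline_count(little_num, big_num, card="U280"):
    """Return the total pipeline count, or raise ValueError if it exceeds the card's limit."""
    if card not in MAX_PIPELINES:
        raise ValueError(f"unknown card {card!r}")
    total = little_num + big_num
    limit = MAX_PIPELINES[card]
    if total > limit:
        raise ValueError(
            f"{card}: {little_num} little + {big_num} big = {total} pipelines "
            f"exceeds the limit of {limit}"
        )
    return total


if __name__ == "__main__":
    print(check_pipeline_count(11, 3, "U280"))  # 14, allowed on the U280
    try:
        check_pipeline_count(11, 3, "U50")
    except ValueError as err:
        print(err)  # the same configuration is one pipeline too many for the U50
```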

You can also specify which SLR each kernel is placed in, and which HBM banks each kernel accesses, by editing ./autogen/autogen.py, which has three configurable variables: apply_kernel_hbm_id, all_kernels_slr_id, and all_kernels_hbm_id. Because URAM is limited on the U50, LITTLE_KERNEL_DST_BUFFER_SIZE and BIG_KERNEL_DST_BUFFER_SIZE should be halved there, i.e., set to 32768 and 262144, respectively.
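The buffer-size rule can be expressed as a small generator for the pipeline lines of ./global_para.mk. This is a sketch under the assumption that the U280 defaults are the values used in the Step 1 example; render_global_para is a hypothetical helper, not part of ReGraph, and it only halves both sizes when the card is the U50.

```python
# Default destination buffer sizes from the U280 example configuration.
U280_LITTLE_DST_BUFFER_SIZE = 65536
U280_BIG_DST_BUFFER_SIZE = 524288


def render_global_para(little_num, big_num, card="U280"):
    """Return the pipeline settings of ./global_para.mk for the given card.

    Hypothetical helper: on the U50 both buffer sizes are halved because of
    its smaller URAM budget.
    """
    if card not in ("U280", "U50"):
        raise ValueError(f"unknown card {card!r}")
    divisor = 2 if card == "U50" else 1
    return "\n".join([
        f"LITTLE_KERNEL_NUM={little_num}",
        f"LITTLE_KERNEL_DST_BUFFER_SIZE={U280_LITTLE_DST_BUFFER_SIZE // divisor}",
        f"BIG_KERNEL_NUM={big_num}",
        f"BIG_KERNEL_DST_BUFFER_SIZE={U280_BIG_DST_BUFFER_SIZE // divisor}",
    ])


if __name__ == "__main__":
    print(render_global_para(10, 3, "U50"))
```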

After configuring, run make autogen to generate the synthesizable accelerators and the connectivity files. The following example configures 11 little pipelines and 3 big pipelines.

Step 1 (MUST DO): modify ./global_para.mk to specify the number of pipelines of each kind.

#Little kernel setup 
LITTLE_KERNEL_NUM=11
LITTLE_KERNEL_DST_BUFFER_SIZE=65536
#################################################################################################################
#Big kernel setup 
BIG_KERNEL_NUM=3
BIG_KERNEL_DST_BUFFER_SIZE=524288 
#################################################################################################################

Step 2 (OPTIONAL): modify ./autogen/autogen.py to configure the SLR IDs and HBM bank IDs. Note that HBM bank 30 on the U280 and HBM bank 27 on the U50 are reserved for out-degree data, so do not assign kernels to these banks. We recommend using HBM banks 0 to 29 on the U280 and banks 0 to 26 on the U50. For better timing and less routing congestion, distribute the kernels evenly among the SLRs.

# configurable hbm wrapper bank id (for vertex properties)
#                         little pipeline vertex properties    |   big pipeline vertex properties
wrapper_kernel_hbm_id  = [1,3,5,7,9,11,13,15,17,19,21,             23,25,27]

# configurable little and big wrapper bank id (for edges)
#                                little pipeline edges         |   big pipeline edges
little_and_big_kernels_hbm_id = [0,2,4,6,8,10,12,14,16,18,20,      22,24,26]

# configurable little and big kernels slr id
#                                each little pipeline's SLR    |   each big pipeline's SLR
little_and_big_kernels_slr_id = [0,1,2,0,1,2,0,1,2,0,1,            2,1,2]
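The bank layout in the example above follows a simple pattern: each kernel reads its edges from an even bank and its vertex properties from the next odd bank. The hypothetical helper below generates lists in that pattern for any pipeline count, drawing only from the recommended bank range so the reserved out-degree bank is never used, and spreads kernels round-robin across the three SLRs. It reproduces the two HBM lists above exactly; its SLR list is one balanced choice and differs slightly from the hand-written one.

```python
def assign_banks_and_slrs(little_num, big_num, usable_banks=range(30), num_slrs=3):
    """Hypothetical helper producing the three lists used in ./autogen/autogen.py.

    usable_banks should exclude the reserved out-degree bank: range(30) gives
    banks 0-29 for the U280 (bank 30 reserved); pass range(27) for the U50.
    """
    total = little_num + big_num
    banks = list(usable_banks)
    if 2 * total > len(banks):
        raise ValueError(f"{total} kernels need {2 * total} banks, only {len(banks)} usable")
    edge_banks = banks[0:2 * total:2]    # little_and_big_kernels_hbm_id
    vertex_banks = banks[1:2 * total:2]  # wrapper_kernel_hbm_id
    slr_ids = [k % num_slrs for k in range(total)]  # little_and_big_kernels_slr_id
    return vertex_banks, edge_banks, slr_ids


if __name__ == "__main__":
    vertex_ids, edge_ids, slr_ids = assign_banks_and_slrs(11, 3)
    print("wrapper_kernel_hbm_id  =", vertex_ids)
    print("little_and_big_kernels_hbm_id =", edge_ids)
    print("little_and_big_kernels_slr_id =", slr_ids)
```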

Devices

The target device is selected by passing DEVICES=[the device] to the make command. If it is not specified, the default platform is the U280. The table below lists the supported value.

Argument                                      Device
DEVICES=xilinx_u280_gen3x16_xdma_1_202211_1   Alveo U280

Datasets

The table below lists the graph datasets used, including synthetic graphs and large real-world graphs. Because the datasets are too large to distribute, only a dataset generator is provided: run ./dataset/rmat.m to generate rmat-19-32.txt.

Graph                      Vertices  Edges    Avg. degree  Type        Category
rmat-19-32 (R19)           524.3K    16.8M    32           Directed    Synthetic
rmat-21-32 (R21)           2.1M      67.1M    32           Directed    Synthetic
rmat-24-16 (R24)           16.8M     268.4M   16           Directed    Synthetic
graph500-scale23 (G23)     4.6M      258.5M   56           Directed    Synthetic
web-google (GG)            916.4K    5.1M     6            Directed    Web
amazon-2008 (AM)           735.3K    5.2M     7            Directed    Social
web-hudong (HD)            2.0M      14.9M    7            Directed    Web
web-baidu-baike (BB)       2.1M      17.8M    8            Directed    Web
wiki-topcats (TC)          1.8M      28.5M    16           Directed    Web
pokec-relationships (PK)   1.6M      30.6M    19           Directed    Social
soc-flickr-und (FU)        1.7M      31.2M    9            Undirected  Social
wikipedia-20070206 (WP)    3.6M      45.0M    13           Directed    Web
liveJournal (LJ)           4.8M      68.9M    14           Undirected  Social
ca-hollywood-2009 (HW)     1.1M      112.6M   53           Undirected  Collaboration
dbpedia-link (DB)          18.3M     172.2M   9            Directed    Social
orkut (OR)                 3.1M      234.4M   38           Undirected  Social
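As an alternative to ./dataset/rmat.m, here is a Python sketch of the standard R-MAT recursion. The quadrant probabilities a=0.57, b=0.19, c=0.19 (so d = 1 - a - b - c = 0.05) are the Graph500 defaults and are an assumption; the parameters used by rmat.m may differ, as may its output format (one "src dst" pair per line is assumed here). With scale=19 and edge_factor=32 it produces 2^19 = 524,288 vertices and 32 × 2^19 = 16,777,216 edges, matching the R19 row of the table, although generating that many edges in pure Python is slow.

```python
import random


def rmat_edges(scale, edge_factor, a=0.57, b=0.19, c=0.19, seed=0):
    """Generate edge_factor * 2**scale directed edges over 2**scale vertices with R-MAT.

    At each of the `scale` recursion levels, an edge falls into one of the four
    quadrants of the adjacency matrix with probability a, b, c or d = 1 - a - b - c.
    Duplicate edges and self-loops are kept.
    """
    rng = random.Random(seed)
    num_vertices = 1 << scale
    edges = []
    for _ in range(edge_factor * num_vertices):
        src = dst = 0
        for level in range(scale):
            bit = 1 << (scale - 1 - level)
            r = rng.random()
            if r < a:
                continue          # top-left quadrant: neither bit set
            if r < a + b:
                dst |= bit        # top-right quadrant
            elif r < a + b + c:
                src |= bit        # bottom-left quadrant
            else:
                src |= bit        # bottom-right quadrant
                dst |= bit
        edges.append((src, dst))
    return edges


def write_edge_list(path, edges):
    """Write one 'src dst' pair per line (assumed format)."""
    with open(path, "w") as f:
        for src, dst in edges:
            f.write(f"{src} {dst}\n")


if __name__ == "__main__":
    # Small demo; rmat_edges(19, 32) would match R19 but takes a long time.
    edges = rmat_edges(10, 8, seed=42)
    print(len(edges))  # 8192 edges over 1024 vertices
```

Passing the result to write_edge_list("rmat-19-32.txt", rmat_edges(19, 32)) would produce a file of the assumed format; check it against what the host program expects before use.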

Run the code

Prerequisites

  • gcc 9.4
  • Tools:
    • Vitis 2022.2
  • Evaluated platforms from Xilinx:
    • Alveo U280 Data Center Accelerator Card

Here is an example of building the PageRank accelerator for the Alveo U280 platform with Vitis 2022.2.

$ git clone https://path/to/github/repo.git
$ cd ./ReGraph
# config the number of little pipelines (LITTLE_KERNEL_NUM) and the number of big pipelines (BIG_KERNEL_NUM)
$ vim global_para.mk
# assign slrs and hbm banks for each kernel 
$ vim ./autogen/autogen.py
# config target (TARGET), device (DEVICES), and algorithm (APP)
$ vim Makefile
# generate accelerator and connectivity files according to the specified configurations
$ make autogen
# make the host execution program and the FPGA bitstream. It takes time :)
$ make APP=pr all 
# run on real hardware; the path to the graph dataset must be provided by the user
$ ./host_graph_fpga_pr xclbin_hw_pr/*.xclbin ./dataset/rmat-19-32.txt 3
