LockstepDualisation Artefact Description

This repository contains the code for the paper "Lockstep-Parallel Dualization of Surface Triangulations" by Jonas Dornonville de la Cour, Carl-Johannes Johnsen, and James Emil Avery. The paper was submitted to the 2023 International Conference for High Performance Computing, Networking, Storage, and Analysis.

Instructions

Software Prerequisites

  • Linux or macOS (Tested on Ubuntu 18.04, Ubuntu 22.04, Arch Linux 6, macOS 13.3.1). The CPU version works under Windows WSL with Ubuntu 22.04; the GPU version does not, because Windows does not support NVIDIA Unified Memory.
  • CMake 3.18 or higher (Tested on CMake 3.23 and 3.26)
  • C++ compiler with C++17 support (Tested and verified with gcc 7.5, 11.3, and 12.2. Does not work with clang)
  • NVIDIA CUDA Toolkit 11.8 or higher
  • NVIDIA GPU with compute capability 5.0 or higher
  • Git
  • Fortran compiler (Tested with gfortran 7.5, 11.3, and 12.2)

Build

Quickstart

Download the code using the --recursive flag: the code benchmarks against the dualization implementation from http://github.com/jamesavery/fullerenes/ and includes it as a submodule.

git clone --recursive https://github.com/jonasdelacour/LockstepDualisation.git
cd LockstepDualisation

To automatically build, run automatic validation, and run benchmarks, simply type

make all

Each of the benchmarks produces a CSV file containing the results, and generates the benchmark plots. The benchmark and validation output will be placed in a directory named output/<hostname>.

To only build, or run benchmarks or validation separately, run

make build

or

make validation

or

make benchmarks

The validation checks the results from all the parallel implementations against the reference sequential dualization implementation in the Fullerene software package. For every $n$ in $[20,24,26,\ldots,200]$, the check is performed against a random sample of 10,000 dual $C_n$ fullerene isomer graphs (or the full isomer space if smaller than 10,000). We verify that the results are identical.
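The size range and sampling rule described above can be sketched in shell as follows. This is only an illustration: the function names fullerene_sizes and sample_size are hypothetical and not part of the repository, and the isomer count is supplied by the caller rather than computed.

```shell
# Print the sizes n = 20, 24, 26, ..., 200 used by the validation.
# n = 22 is skipped, matching the list in the text.
fullerene_sizes() {
  echo 20
  seq 24 2 200
}

# Sample size for one value of n: 10,000 graphs, or the full isomer
# space if it holds fewer than 10,000 isomers.
# $1 = number of isomers for this n (provided by the caller).
sample_size() {
  n_isomers=$1
  cap=10000
  if [ "$n_isomers" -lt "$cap" ]; then
    echo "$n_isomers"
  else
    echo "$cap"
  fi
}
```

For instance, sample_size 1812 selects the whole isomer space, while any count of 10,000 or more is capped at 10,000.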

The benchmarks can also be performed interactively with the Jupyter notebook, reproduce.ipynb.

Manual build

If the automatic build fails for some reason, the individual steps to build and run the software are as follows:

  1. Fetch the Fullerene software package as a submodule (for reference comparisons)
git submodule update --init
  2. After this, the benchmarks can be built using CMake and make:
mkdir build
cd build/
cmake ..
make -j

Manual Run

After building, return to the repository root directory before running the benchmarks. The executables are located in the build/benchmarks and build/validation directories. The executables are:

build/benchmarks/baseline
build/benchmarks/omp_multicore
build/benchmarks/single_gpu
build/benchmarks/multi_gpu

The GPU benchmarks will only be built if the CUDA toolkit is available.

All executables take the same command line parameters. For example:

./build/benchmarks/multi_gpu <Ntriangles> <Ngraphs> <Nruns> <Nwarmup> <variant:0|1>
  1. Ntriangles: one of [20, 24, 26, 28, ... , 200] (to match the fullerene test-data). Default: 200
  2. Ngraphs: batch size, i.e. the number of graphs to dualise in parallel.
  3. Nruns: number of repeated runs. Default: 10. To reproduce the results from the paper, set it to 100 (this takes longer).
  4. Nwarmup: number of warmup runs. Default: 1
  5. variant: Kernel variant.
  • For GPU, kernel 0 uses one thread per triangle (Ntriangles threads), and kernel 1 uses one thread per vertex.
  • For CPU, kernel 0 is the shared-memory parallel version, and kernel 1 is the task-parallel version.
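To make the difference between the two GPU variants concrete, the following sketch computes the number of threads per graph for each. It assumes that "vertex" refers to the vertices of the input triangulation and that the triangulation is a closed sphere, as is the case for fullerene duals, so that Euler's formula gives Ntriangles/2 + 2 vertices. The function name threads_per_graph is hypothetical and does not exist in the repository.

```shell
# Threads per graph for the two GPU kernel variants (illustrative only).
# $1 = Ntriangles, $2 = variant (0 or 1)
threads_per_graph() {
  ntriangles=$1
  variant=$2
  if [ "$variant" -eq 0 ]; then
    # Variant 0: one thread per triangle.
    echo "$ntriangles"
  else
    # Variant 1: one thread per vertex. For a sphere triangulation with
    # F triangles, V - E + F = 2 and E = 3F/2 give V = F/2 + 2.
    echo $(( ntriangles / 2 + 2 ))
  fi
}
```

Under these assumptions, Ntriangles = 200 gives 200 threads per graph for variant 0 and 102 for variant 1.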

For example,

./build/benchmarks/single_gpu 100 1000000 100 1 1

runs the single-GPU benchmark for a million C100 fullerene isomers, repeated 100 times for statistics, with a single warmup-run, using GPU kernel 1.
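To repeat such a run over several sizes, a small wrapper along the following lines may be convenient. It is a sketch, not part of the repository: the function name sweep and the DRY_RUN switch are hypothetical, and the fixed parameter values are simply taken from the example above.

```shell
# Benchmark executable and fixed parameters; adjust as needed.
BENCH=./build/benchmarks/single_gpu
NGRAPHS=1000000
NRUNS=100
NWARMUP=1
VARIANT=1

# Run the benchmark once for every Ntriangles value given as an argument.
# With DRY_RUN=1 the command lines are printed instead of executed.
sweep() {
  for n in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "$BENCH $n $NGRAPHS $NRUNS $NWARMUP $VARIANT"
    else
      "$BENCH" "$n" "$NGRAPHS" "$NRUNS" "$NWARMUP" "$VARIANT"
    fi
  done
}
```

Calling DRY_RUN=1 sweep 100 200 from the repository root prints the two command lines without running them, which is a quick way to check the parameters before starting a long run.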

lockstepdualisation's Issues

sycl_benchmark crashes for N>128 on LUMI-G

Running for N>128 passes validation, but crashes in benchmark.

averyjam@nid005021:~/dualize/LockstepDualisation/build> ./validation/sycl/sycl_validation gpu 200 200 
Validating SYCL implementation for gpu device: gfx90a:sramecc+:xnack-.
N = 200
Success!

averyjam@nid005021:~/dualize/LockstepDualisation/build> ./benchmarks/sycl/sycl_benchmark gpu 200
Dualising 1000000 triangulation graphs, each with 200 triangles, repeated 10 times and with 1 warmup runs.
Platform: Intel(R) FPGA Emulation Platform for OpenCL(TM)
        NOT USING: Intel(R) FPGA Emulation Device has 4 compute-units.
Platform: Intel(R) OpenCL
        NOT USING: AMD EPYC 7A53 64-Core Processor                 has 4 compute-units.
Platform: AMD HIP BACKEND
        USING    : gfx90a:sramecc+:xnack- has 110 compute-units.
Using 1 gpu-devices
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377767,0,0], local id: [167,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377768,0,0], local id: [168,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377769,0,0], local id: [169,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377770,0,0], local id: [170,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377771,0,0], local id: [171,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377772,0,0], local id: [172,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377773,0,0], local id: [173,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377774,0,0], local id: [174,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377776,0,0], local id: [176,0,0] Assertion `false` failed.
/users/averyjam/dualize/LockstepDualisation/src/sycl/dual.cc:31: K DeviceDualGraph<6, unsigned short>::dedge_ix(const K, const K) const [MaxDegree = 6, K = unsigned short]: global id: [1377778,0,0], local id: [178,0,0] Assertion `false` failed.
:0:rocdevice.cpp            :2652: 1910724722915 us: 1686 : [tid:0x14a7b1aef700] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016
Aborted

Running for N<=128 works for both. Why?

averyjam@nid005021:~/dualize/LockstepDualisation/build> ./validation/sycl/sycl_validation gpu 128 128
Validating SYCL implementation for gpu device: gfx90a:sramecc+:xnack-.
N = 128
Success!

averyjam@nid005021:~/dualize/LockstepDualisation/build> ./benchmarks/sycl/sycl_benchmark gpu 128
Dualising 1000000 triangulation graphs, each with 128 triangles, repeated 10 times and with 1 warmup runs.
Platform: Intel(R) FPGA Emulation Platform for OpenCL(TM)
        NOT USING: Intel(R) FPGA Emulation Device has 4 compute-units.
Platform: Intel(R) OpenCL
        NOT USING: AMD EPYC 7A53 64-Core Processor                 has 4 compute-units.
Platform: AMD HIP BACKEND
        USING    : gfx90a:sramecc+:xnack- has 110 compute-units.
Using 1 gpu-devices
Mean Time per Graph: 26.4305 +/- 7.02391 ns
