Git Product home page Git Product logo

gups's Introduction

TL;DR

The single-core kernel of GUPS (aka RandomAccess) from the HPC Challenge benchmark suite.

How to build and run

Simply invoke make to build the gups executable, which runs the multi-threaded version (implemented with OpenMP).

To run GUPS on a 1GB array (== 2^27 cells of unsigned 64-bit integers):

$ OMP_NUM_THREADS=1 ./gups --log2_length 27

To run GUPS with four threads on a 2GB array:

$ OMP_NUM_THREADS=3 ./gups --log2_length 28

Motivation

According to its website, the HPC Challenge (HPCC) suite measures a range memory access patterns and consists of seven benchmarks: HPL, DGEMM, STREAM, PTRANS, RandomAccess, FFT, and Communication bandwidth and latency. Strangely, the suite links all these benchmarks into a single, monolithic executable that runs them one by one. This repo re-implements one of the seven benchmarks -- RandomAccess -- so it can be run alone. Additionally, this repo cleans the original code and remove MPI dependencies. The original code can be cloned from the official github repo: https://github.com/icl-utk-edu/hpcc . My code mostly borrows from the file RandomAccess/core_single_cpu.c in this repo. I therefore included the original copyright notice.

Summary of changes

The changes with respect to the original code are:

  1. Specify the array size through a command-line argument.
  2. Specify the number of iterations through a command-line argument. A single iteration goes over the array 4x (like in the original HPCC implementation) without running the initialization phase (array[i] = i). The default number of iterations is set to one, compatible to the HPCC implementation.
  3. Scrape off the MPI related functions.
  4. Migrate the code to C++ and build it with g++.
  5. Introduce a new-command line flag, "--verify", that enables the final verification step. Verification is disabled by default because the comments in core_single_cpu.c say it is optional.

Validation against the original HPCC benchmark

make compare validates the performance of this new implementation against the reference HPCC implementation. My tests on my Intel i7-6600U CPU (Skylake) produced the following performance numbers (giga updates per second):

ours reference
repeat1 0.0295 0.0297
repeat2 0.0305 0.0321
repeat3 0.0304 0.0299
repeat4 0.0311 0.0301
repeat5 0.0308 0.0302
----------- ----------- -----------
average 0.0305 0.0304
std_dev 0.0006 0.0010

In conclusion: the new implementation is similar to the original HPCC implementation because the difference between the average numbers is well below the standard deviation. (This statistical procedure is similar to the Z-test or the t-test).

Scalability tests

GUPS seems like an embarrassingly parallel workload because it can issue many threads that randomly accesses the big array. However, it turned out that the parellel implementation provided in the HPCC suite does not scale well with the number of threads; The table below shows that GUPS throughput is ~50% higher with 4 threads. Additionally, Intel VTune reported that the application achieves 33% of the available memory bandwidth.

My first guess was that the problem is false sharing of the random number array, which is accessed in the main loop like this:

#pragma omp parallel for
    for (j=0; j<128; j++) {
      ran[j] = (ran[j] << 1) ^ ((s64Int) ran[j] < 0 ? POLY : 0);
      Table[ran[j] & (TableSize-1)] ^= ran[j];
    }
  }
}

To fix the problem, I padded the ran array such that each cell lies in a separate 64-byte cache line. The performance results on Intel Xeon E5-2660 v4 CPU (Broadwell) were similar, as shown below. This trend was true for other machines and array sizes. It seems like the problem lies somewhere else...

threads original no false sharing
1 0.0283 0.0287
2 0.0393 0.0382
3 0.0406 0.0402
4 0.0419 0.0425

gups's People

Contributors

idanyani avatar

Stargazers

Yiqi Chen avatar Zhe Zhou avatar

Watchers

Dan Tsafrir avatar

Forkers

anadav oribenzur

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.