
rsbench's People

Contributors

heshpdx, jtramm, stephan-rohr


rsbench's Issues

SYCL "simulation only" runtime statistics misleading

For the SYCL version, we are currently reporting runtime statistics for both the kernel initialization / JIT compilation and the actual execution. This can produce misleading output on some systems, e.g.:

Total Time Statistics (SYCL+OpenCL Init / JIT Compilation + Simulation Kernel)
Runtime:                XXXXXXX seconds
Lookups:               XXXXXXXXXX
Lookups/s:            XXXXXXXXXX
Simulation Kernel Only Statistics
Runtime:               0.00001 seconds
Lookups/s:             1,000,000,000,000,000
Verification checksum: (Valid)

Timing these phases separately, as we do now, bakes in assumptions about the asynchronous behavior of SYCL that do not appear to hold in all cases with all compilers on all machines. Instead, we should report only the total runtime.

Possible buffer overrun of poles buffer

This line

Pole * contiguous = (Pole *) malloc( input.n_nuclides * input.avg_n_poles * sizeof(Pole));

... assumes that the total number of poles equals avg_n_poles * n_nuclides.
But that is only true if we never hit this line:

R[i] = 1;

If we do, then the total is larger and RSBench overruns the buffer.
This is a marginal case and highly unlikely with a large avg_n_poles, but it is still a problem for lower values.

This can be fixed locally by summing up n_poles inside generate_poles instead of multiplying by the average.
I am not sure whether that has other side effects, though.
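
For illustration, a minimal sketch of that local fix, assuming the per-nuclide pole counts are gathered before the allocation (the helper name and the Pole stand-in below are hypothetical, not RSBench's actual generate_poles signature):

#include <stdlib.h>

typedef struct { double dummy; } Pole;   /* stand-in for RSBench's Pole struct */

/* Size the contiguous buffer from the real per-nuclide pole counts
 * instead of assuming n_nuclides * avg_n_poles poles in total. */
Pole * alloc_contiguous_poles( const int * n_poles_per_nuclide, int n_nuclides )
{
    size_t total_poles = 0;
    for( int i = 0; i < n_nuclides; i++ )
        total_poles += (size_t) n_poles_per_nuclide[i];  /* individual counts may exceed the average */

    return (Pole *) malloc( total_poles * sizeof(Pole) );
}

The same total can then be reused when carving the contiguous buffer into per-nuclide slices, so the pointer arithmetic stays consistent with the actual counts.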

CUDA error and Libomptarget error: openmp-offload method

Simulation crashes at runtime for openmp-offload mode:

Compiler: LLVM 14.0.0 (nightly build: February 2nd 2022) + cudatoolkit/21.9_11.4
Machine: Perlmutter (NVIDIA A100 GPU + AMD Milan CPU)

Reproducer

  1. cd openmp-offload
  2. export CC=clang
  3. make
  4. ./rsbench -m event

No changes were made to the Makefile

Beginning baseline event based simulation on device...
CUDA error: an illegal memory access was encountered
Libomptarget error: Copying data from device failed.
Libomptarget error: Call to targetDataEnd failed, abort target.
Libomptarget error: Failed to process data after launching the kernel.
Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
simulation.c:24:2: Libomptarget fatal error 1: failure of target construct while offloading is mandatory
Aborted

a typo in readme?

"RSBench represents the multipole method of perfoming continuous energy macroscopic neutron cross section lookups."

perfoming->performing

Implement XL and XXL sizes

The README documents XL and XXL sizes, but only small and large appear to be implemented. These would indeed be helpful for benchmarking large nodes. In the meantime, can you suggest command line options that can scale the problem in ways that make sense, e.g., to an order of magnitude larger than the current large size? Thank you.

Result of computation is never checked -> optimising compilers skew results

Again, similar to XSbench issue:
in main.c:
calculate_macro_xs( macro_xs, mat, E, input, data, sigTfactors, &abrarov, &alls );
The results in macro_xs are never checked. Simply adding asm volatile (""::"m"(macro_xs[0]),...) brings performance back in line for aggressively optimising compilers (e.g. when adding LTO to the GCC optimisation flags).
The resulting performance difference is significant again:
$ while true; do res1=$(./rsbench -s small | awk '/Lookups.s:/ {print $2}'); res2=$(./rsbench.force_use -s small | awk '/Lookups.s:/ {print $2}'); echo $res1 $res2; done
808,197 401,135
792,076 367,372
765,152 366,358
Showing a performance difference of 2x.

So please make sure that the benchmark results get used, either by employing similar asm volatile barriers, or by adding a running sum over the results (and printing or asm-volatile-consuming it).
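
For illustration, a minimal sketch of such a barrier (the force_use helper is hypothetical, not RSBench code):

/* Tell the compiler that every entry of macro_xs is read as a memory
 * operand; this prevents the lookup results from being optimised away
 * under aggressive optimisation / LTO, while emitting no instructions. */
static inline void force_use( const double * macro_xs, int n )
{
    for( int i = 0; i < n; i++ )
        __asm__ volatile ( "" : : "m" (macro_xs[i]) );
}

/* Usage sketch, right after each lookup:
 *   calculate_macro_xs( macro_xs, mat, E, input, data, sigTfactors, &abrarov, &alls );
 *   force_use( macro_xs, 4 );   // 4 assumed to be the number of entries in macro_xs
 */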

Unchecked mallocs

Similar to my recent issue in XSbench, RSbench also has unchecked mallocs that can cause segfaults.

$ grep -n malloc *.c
init.c:56: Pole ** R = (Pole **) malloc( input.n_nuclides * sizeof( Pole *));
init.c:57: Pole * contiguous = (Pole *) malloc( input.n_nuclides * input.avg_n_poles * sizeof(Pole));
init.c:89: Window ** R = (Window **) malloc( input.n_nuclides * sizeof( Window *));
init.c:90: Window * contiguous = (Window *) malloc( input.n_nuclides * input.avg_n_windows * sizeof(Window));
init.c:128: double ** R = (double **) malloc( input.n_nuclides * sizeof( double * ));
init.c:129: double * contiguous = (double *) malloc( input.n_nuclides * input.numL * sizeof(double));
main.c:120: (complex double *) malloc( input.numL * sizeof(complex double) );
material.c:17: int * num_nucs = (int*)malloc(12*sizeof(int));
material.c:44: int ** mats = (int **) malloc( 12 * sizeof(int *) );
material.c:46: mats[i] = (int *) malloc(num_nucs[i] * sizeof(int) );
material.c:112: double ** concs = (double **)malloc( 12 * sizeof( double *) );
material.c:115: concs[i] = (double *)malloc( num_nucs[i] * sizeof(double) );
papi.c:252: int * events = malloc(num_papi_events * sizeof(int));
papi.c:257: long_long * values = malloc( num_papi_events * sizeof(long_long));

Not quite as bad as in XSbench, because the default allocation is smaller (~250MB), but it would still be good to have checked mallocs.
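
For reference, a minimal sketch of a checked allocation wrapper (hypothetical helper, not part of RSBench) that the calls above could be routed through:

#include <stdio.h>
#include <stdlib.h>

/* Abort with a clear message on allocation failure instead of
 * segfaulting later on a NULL pointer dereference. */
static void * checked_malloc( size_t bytes )
{
    void * p = malloc( bytes );
    if( p == NULL )
    {
        fprintf( stderr, "Error: malloc of %zu bytes failed\n", bytes );
        exit( EXIT_FAILURE );
    }
    return p;
}

/* Usage sketch:
 *   Pole * contiguous = (Pole *) checked_malloc( input.n_nuclides * input.avg_n_poles * sizeof(Pole) );
 */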

Request for Adding New Programming Models - Kokkos and RAJA

Our research group is currently doing research on parallel programming models and is interested in contributing to the RSBench project by adding new models such as Kokkos and RAJA.

As part of the plan, we propose restructuring the project to follow a structure similar to BabelStream's, which would allow for better organization and maintainability.

We would like to know whether there are any current plans within the RSBench project regarding the addition of these models. We are eager to contribute to the project by implementing them.

Any feedback, suggestions, or guidance on this proposal would be highly appreciated. We are looking forward to collaborating with the RSBench community to further improve this valuable benchmarking tool.

Thank you for considering our request.
