anl-cesar / rsbench
A mini-app to represent the multipole resonance representation lookup cross section algorithm.
License: MIT License
For the SYCL version, we are currently reporting runtime statistics for both the kernel initialization / JIT compiling as well as the actual execution. This may result in some issues on certain systems, e.g.:
Total Time Statistics (SYCL+OpenCL Init / JIT Compilation + Simulation Kernel)
Runtime: XXXXXXX seconds
Lookups: XXXXXXXXXX
Lookups/s: XXXXXXXXXX
Simulation Kernel Only Statistics
Runtime: 0.00001 seconds
Lookups/s: 1,000,000,000,000,000
Verification checksum: (Valid)
Timing these phases separately, as we do now, relies on assumptions about the asynchronous behavior of SYCL that do not appear to hold in all cases with all compilers on all machines. Instead, we should time only the total runtime.
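A minimal sketch of what "time only the total runtime" could look like, written in plain C rather than SYCL. The names `wall_seconds`, `time_total`, and `busy_work` are hypothetical and not taken from RSBench; `busy_work` stands in for the whole "initialize, JIT compile, run the kernel, wait for completion" sequence.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std modes */
#include <time.h>

/* Wall-clock time in seconds, read from a monotonic clock. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Hypothetical callback type: everything that should be timed as one unit. */
typedef void (*run_fn)(void *ctx);

/* Time the whole run with a single start/stop pair. The callback must
 * return only after all (possibly asynchronous) work has completed. */
static double time_total(run_fn run, void *ctx)
{
    double start = wall_seconds();
    run(ctx);
    return wall_seconds() - start;
}

/* Hypothetical stand-in workload: counts to one million. */
static void busy_work(void *ctx)
{
    long *counter = (long *)ctx;
    for (long i = 0; i < 1000000; i++)
        (*counter)++;
}
```

With one start/stop pair around a call that blocks until completion, the reported time does not depend on when the runtime actually chooses to compile or launch the kernel.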
This line (line 57 in cf42d27) assumes that the total number of poles equals avg*nuclides.
But that is only true if we never hit this line (line 16 in cf42d27).
If we do, then the total is larger, and RSBench hits a buffer overrun.
This is a marginal case and highly unlikely with large avg_n_poles, but still a problem with lower values.
This can be fixed locally by summing up n_poles inside generate_poles instead of multiplying by the average.
I am not sure whether this has other side effects, though.
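The proposed fix can be sketched as follows. This is an illustration under assumptions, not the RSBench code: the `Pole` layout is a placeholder, and `total_poles` and `alloc_poles` are hypothetical helper names. The point is only that the contiguous buffer is sized from the sum of the per-nuclide counts.

```c
#include <stdlib.h>

/* Placeholder layout; the real Pole struct in RSBench has different fields. */
typedef struct { double re, im; } Pole;

/* Sum the actual per-nuclide pole counts instead of using avg * nuclides. */
static size_t total_poles(const int *n_poles, int n_nuclides)
{
    size_t total = 0;
    for (int i = 0; i < n_nuclides; i++)
        total += (size_t)n_poles[i];
    return total;
}

/* Allocate one contiguous buffer sized by the true total and point each
 * row at its slice. Returns NULL on allocation failure. */
static Pole **alloc_poles(const int *n_poles, int n_nuclides)
{
    size_t total = total_poles(n_poles, n_nuclides);
    size_t n_rows = n_nuclides > 0 ? (size_t)n_nuclides : 1;
    Pole **rows = malloc(n_rows * sizeof *rows);
    Pole *contiguous = malloc((total ? total : 1) * sizeof *contiguous);
    if (rows == NULL || contiguous == NULL) {
        free(rows);
        free(contiguous);
        return NULL;
    }
    rows[0] = contiguous;  /* keeps the buffer reachable even with no nuclides */
    size_t offset = 0;
    for (int i = 0; i < n_nuclides; i++) {
        rows[i] = contiguous + offset;
        offset += (size_t)n_poles[i];
    }
    return rows;
}
```

For example, with counts 4, 7, and 5 and an integer average of 5, avg*nuclides gives 15 while the sum is 16, so a buffer sized from the average would be one element short.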
Simulation crashes at runtime for openmp-offload mode:
Compiler: LLVM 14.0.0 (nightly build: February 2nd 2022) + cudatoolkit/21.9_11.4
Machine: Perlmutter (NVIDIA A100 GPU + AMD Milan CPU)
Reproducer
No changes were made to the Makefile
Beginning baseline event based simulation on device...
CUDA error: an illegal memory access was encountered
Libomptarget error: Copying data from device failed.
Libomptarget error: Call to targetDataEnd failed, abort target.
Libomptarget error: Failed to process data after launching the kernel.
Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
simulation.c:24:2: Libomptarget fatal error 1: failure of target construct while offloading is mandatory
Aborted
When the openmp-offload version of the benchmark is compiled and executed, the application terminates normally, but the verification checksum differs.
"RSBench represents the multipole method of perfoming continuous energy macroscopic neutron cross section lookups."
perfoming->performing
The README documents XL and XXL sizes, but only small and large appear to be implemented. These would indeed be helpful for benchmarking large nodes. In the meantime, can you suggest command line options that scale the problem in ways that make sense, e.g., to an order of magnitude larger than the current large size? Thank you.
Again, similar to the XSBench issue:
in main.c:
calculate_macro_xs( macro_xs, mat, E, input, data, sigTfactors, &abrarov, &alls );
The results in macro_xs are never checked. Simply adding asm volatile (""::"m"(macro_xs[0]),...) brings performance back in line for aggressive optimising compilers (adding LTO to GCC optimisation flags).
Three core difference is significant again:
$ while true; do res1=$(./rsbench -s small | awk '/Lookups.s:/ {print
808,197 401,135
792,076 367,372
765,152 366,358
Showing a performance difference of 2x.
So please make sure that the benchmark results get used, either by employing similar asm volatile barriers, or adding a running sum over the results (and printing / asm-volatile-consuming that).
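Both suggestions can be combined in a small sketch. Everything here is illustrative: `fake_lookup` is a hypothetical stand-in for calculate_macro_xs, the result width `N_XS` of 4 is an assumption, and the empty asm statement is a GCC/Clang extension, so it is guarded by `__GNUC__`.

```c
#include <stddef.h>

#define N_XS 4  /* assumed number of cross sections per lookup */

/* Hypothetical stand-in for calculate_macro_xs: fills out[0..N_XS-1]. */
static void fake_lookup(double out[N_XS], double energy)
{
    for (int k = 0; k < N_XS; k++)
        out[k] = energy * (double)(k + 1);
}

/* Run the lookups and make sure every result is consumed. */
static double run_lookups(long n_lookups)
{
    double checksum = 0.0;
    for (long i = 0; i < n_lookups; i++) {
        double macro_xs[N_XS];
        fake_lookup(macro_xs, (double)i);
#if defined(__GNUC__)
        /* Empty asm that claims to read the results from memory, so the
         * compiler cannot discard the stores that produced them. */
        __asm__ volatile("" : : "m"(macro_xs[0]), "m"(macro_xs[1]),
                                "m"(macro_xs[2]), "m"(macro_xs[3]));
#endif
        /* Portable alternative: fold the results into a running sum that
         * the caller prints or otherwise uses. */
        for (int k = 0; k < N_XS; k++)
            checksum += macro_xs[k];
    }
    return checksum;
}
```

The running sum works with any compiler as long as the returned value is actually used (for example, printed); the asm barrier avoids the extra additions but is compiler specific.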
Similar to my recent issue in XSBench, RSBench also has unchecked mallocs that can cause segfaults.
$ grep -n malloc c
init.c:56: Pole * R = (Pole *) malloc( input.n_nuclides * sizeof( Pole *));
init.c:57: Pole * contiguous = (Pole *) malloc( input.n_nuclides * input.avg_n_poles * sizeof(Pole));
init.c:89: Window * R = (Window *) malloc( input.n_nuclides * sizeof( Window *));
init.c:90: Window * contiguous = (Window *) malloc( input.n_nuclides * input.avg_n_windows * sizeof(Window));
init.c:128: double * R = (double *) malloc( input.n_nuclides * sizeof( double * ));
init.c:129: double * contiguous = (double *) malloc( input.n_nuclides * input.numL * sizeof(double));
main.c:120: (complex double *) malloc( input.numL * sizeof(complex double) );
material.c:17: int * num_nucs = (int*)malloc(12*sizeof(int));
material.c:44: int ** mats = (int **) malloc( 12 * sizeof(int *) );
material.c:46: mats[i] = (int *) malloc(num_nucs[i] * sizeof(int) );
material.c:112: double * concs = (double **)malloc( 12 * sizeof( double *) );
material.c:115: concs[i] = (double *)malloc( num_nucs[i] * sizeof(double) );
papi.c:252: int * events = malloc(num_papi_events * sizeof(int));
papi.c:257: long_long * values = malloc( num_papi_events * sizeof(long_long));
Not quite as bad as in XSbench, because the default allocation is smaller (~250MB), but it would still be good to have checked mallocs.
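One common way to add such checks is a small wrapper that every allocation site calls. The sketch below is an assumption about how this might be done, not code from RSBench; `checked_malloc` and its `what` label are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate count * size bytes, or print a message and exit on failure.
 * `what` is a human-readable label used only in the error message. */
static void *checked_malloc(size_t count, size_t size, const char *what)
{
    /* Guard against overflow in count * size before multiplying. */
    if (size != 0 && count > SIZE_MAX / size) {
        fprintf(stderr, "allocation size overflow for %s\n", what);
        exit(EXIT_FAILURE);
    }
    size_t bytes = count * size;
    void *p = malloc(bytes ? bytes : 1);  /* avoid the malloc(0) ambiguity */
    if (p == NULL) {
        fprintf(stderr, "failed to allocate %zu bytes for %s\n", bytes, what);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

A call site such as the one at material.c:115 could then read `concs[i] = checked_malloc(num_nucs[i], sizeof(double), "concs");`, which turns a later segfault into an immediate, labeled error.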
Our research group is currently studying parallel programming models and is interested in contributing to the RSBench project by adding new models such as Kokkos and RAJA.
As part of the plan, we propose restructuring the project to follow a structure similar to that of BabelStream, which would allow for better organization and maintainability.
We would like to know if there are any current plans within the RSBench project regarding the addition of these models. We are eager to contribute to the project by implementing them.
Any feedback, suggestions, or guidance on this proposal would be highly appreciated. We are looking forward to collaborating with the RSBench community to further improve this valuable benchmarking tool.
Thank you for considering our request.