Comments (7)

vitduck commented on May 24, 2024

Hi Duncan,

Thanks for your reply.

I agree that a second target without a graphical component is better than removing OpenGL altogether.
Looking at the code, it seems that the rendering is strongly coupled with the simulation part.
So I am not sure it is worth the effort on your end to isolate it.

For now, I will set up a Linux box to test the code.

from cuda-to-sycl-nbody.

DuncanMcBain commented on May 24, 2024

Hi @vitduck,

As a temporary solution, it might be possible to use the solution described here instead of messing around with the X virtual framebuffer, though I haven't tried it personally. It should be possible to compile Mesa and LLVMpipe without installing them system-wide.

We don't have any quick fixes for removing the graphical dependency, but it's something we're considering doing in some fashion. It might be possible to simply remove the OpenGL code from the main file, though if we pick this task up I'd like to make a second target that builds from a separate main file with no graphical component.

Duncan.


DuncanMcBain commented on May 24, 2024

Hi @vitduck,

We have a PR open that should fix this issue (#30).

I hope this helps!


vitduck commented on May 24, 2024

Hi @DuncanMcBain,
Thanks very much for the notice.

I am testing the latest commit as follows:

$ module purge 
$ module load cuda/10.1 
$ sh scripts/build_cuda.sh no_render 
-- The CXX compiler identification is GNU 4.8.5
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The CUDA compiler identification is NVIDIA 10.1.243
-- Check for working CUDA compiler: /apps/cuda/10.1/bin/nvcc
-- Check for working CUDA compiler: /apps/cuda/10.1/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.27.1") 
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /apps/cuda/10.1 (found version "10.1") 
-- Configuring done
-- Generating done
CMake Warning:
  Manually-specified variables were not used by the project:

    GLEW_LIBRARY


-- Build files have been written to: /scratch/optpar01/work/2024/cuda-to-sycl-nbody/build_cuda
Scanning dependencies of target nbody_cuda
[ 25%] Building CXX object src/CMakeFiles/nbody_cuda.dir/nbody.cpp.o
[ 50%] Building CXX object src/CMakeFiles/nbody_cuda.dir/sim_param.cpp.o
[ 75%] Building CUDA object src/CMakeFiles/nbody_cuda.dir/simulator.cu.o
[100%] Linking CXX executable ../../nbody_cuda
[100%] Built target nbody_cuda
Scanning dependencies of target release
[100%] Built target release

So OpenGL libs are no longer required!

However, I encounter the following error when running the compiled binary:

$ ./scripts/run_nbody.sh -b cuda 100 10  
GPUassert: initialization error /scratch/optpar01/work/2024/cuda-to-sycl-nbody/src/simulator.cuh 94

Looking at the relevant line of simulator.cuh, it is just a standard cudaMalloc:

 92   ParticleData_d(size_t n) {
 93     // Allocate device memory for particle coords & velocity...
 94     gpuErrchk(cudaMalloc((void **)&x, sizeof(coords_t) * n));
 95     gpuErrchk(cudaMalloc((void **)&y, sizeof(coords_t) * n));
 96     gpuErrchk(cudaMalloc((void **)&z, sizeof(coords_t) * n));
 97   };

I tried a smaller system size as well, but the error persists (we have 40 GB of memory).
Do you have any insight into this issue?
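
The message format above ("GPUassert: <error> <file> <line>") suggests that gpuErrchk follows the common check-and-report pattern, in which a macro captures the call site and a helper prints the runtime's error string. The repository's actual definition is not shown in this thread, so the following is only a minimal sketch of that pattern. To build without the CUDA toolkit it uses a hypothetical `Status` enum and `statusString` helper in place of `cudaError_t` and `cudaGetErrorString`; `formatAssert`, `checkStatus` and `CHECK_STATUS` are likewise illustrative names.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for cudaError_t so the sketch builds without CUDA.
enum class Status { Success, InitializationError, OutOfMemory };

// Hypothetical stand-in for cudaGetErrorString().
inline const char* statusString(Status s) {
  switch (s) {
    case Status::Success:             return "no error";
    case Status::InitializationError: return "initialization error";
    case Status::OutOfMemory:         return "out of memory";
  }
  return "unknown error";
}

// Build a diagnostic in the same shape as the message quoted above:
// "GPUassert: <error string> <file> <line>".
inline std::string formatAssert(Status s, const char* file, int line) {
  assert(file != nullptr);
  return std::string("GPUassert: ") + statusString(s) + " " + file + " " +
         std::to_string(line);
}

// Print the diagnostic and stop the program if the call did not succeed.
inline void checkStatus(Status s, const char* file, int line) {
  if (s != Status::Success) {
    std::fprintf(stderr, "%s\n", formatAssert(s, file, line).c_str());
    std::exit(EXIT_FAILURE);
  }
}

// The macro records the file and line of the call site being checked.
#define CHECK_STATUS(expr) checkStatus((expr), __FILE__, __LINE__)
```

Under this pattern the reported line is simply the first checked call. Because the CUDA runtime initializes lazily on the first API call, an "initialization error" reported at a cudaMalloc points at device or context setup rather than at the requested allocation size, which would be consistent with the error persisting for smaller systems.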


DuncanMcBain commented on May 24, 2024

Hi @vitduck,

We won't really be able to help with the pure CUDA version of the code (we didn't write it), but if you're able to try the SYCL version we'd be happy to help with that!


vitduck commented on May 24, 2024

Duncan,
Sorry for the oversight on my part. The aforementioned CUDA error was due to the MIG partition.
Both the CUDA and the SYCL-migrated codes can now be built and run without rendering.

Could you kindly confirm whether the following output is expected?
(If I understand correctly, the kernel time is measured in ms.)

  • Backend enumeration
$ sycl-ls 
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.6.0.22_223734]
[opencl:cpu:1] Intel(R) OpenCL, AMD EPYC 7543 32-Core Processor                 3.0 [2023.16.6.0.22_223734]
[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.8 [CUDA 11.6]
  • CUDA performance
$ ./nbody_cuda 50 10 0.999998 0.005 1.0e-7 2 10000
... 
At step 10000 kernel time is 15.4361 and mean is 15.435 and stddev is: 0.0853953
  • SYCL/CUDA performance
$ SYCL_DEVICE_FILTER=cuda ./nbody_dpcpp 50 10 0.999998 0.005 1.0e-7 2 10000
...
At step 10000 kernel time is 8.60655 and mean is 8.60897 and stddev is: 0.0694211
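
The "mean" and "stddev" fields in output lines like those above can be maintained incrementally, without storing every per-step kernel time. How the repository actually accumulates them is not shown in this thread; the sketch below uses Welford's online algorithm as one plausible approach, and `RunningStats` is a hypothetical name. It reports the sample standard deviation (dividing by n - 1), which is also an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Running mean and standard deviation via Welford's online algorithm.
struct RunningStats {
  std::size_t count = 0;  // number of samples seen so far
  double mean = 0.0;      // running mean of the samples
  double m2 = 0.0;        // running sum of squared deviations from the mean

  // Fold one new sample (e.g. one step's kernel time) into the statistics.
  void add(double x) {
    ++count;
    const double delta = x - mean;
    mean += delta / static_cast<double>(count);
    m2 += delta * (x - mean);  // uses the already-updated mean
  }

  // Sample standard deviation; needs at least two samples.
  double stddev() const {
    assert(count >= 2);
    return std::sqrt(m2 / static_cast<double>(count - 1));
  }
};
```

Calling `add` once per simulation step and printing `mean` and `stddev()` would yield a line of the same shape as the output shown here.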

I would have expected some level of parity between native CUDA and SYCL with a slight edge for the former.
Here, the result unexpectedly shows that SYCL/CUDA is nearly two times faster (a mean kernel time of about 8.6 versus 15.4).
I am not sure how to interpret this outcome.
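
One way to quantify the comparison is to take the ratio of the two reported means and to express their gap in units of the reported per-step spread. The sketch below does both; `speedup` and `gapInStddevs` are hypothetical helper names, and treating the two stddev values as independent and combining them in quadrature is an assumption, not something the program's output establishes.

```cpp
#include <cassert>
#include <cmath>

// Ratio of mean kernel times: a value above 1 means the candidate is faster.
inline double speedup(double baselineMean, double candidateMean) {
  assert(baselineMean > 0.0 && candidateMean > 0.0);
  return baselineMean / candidateMean;
}

// Gap between the two means, measured in units of the combined spread
// sqrt(sdA^2 + sdB^2). Large values mean the gap is far beyond step noise.
inline double gapInStddevs(double meanA, double sdA, double meanB, double sdB) {
  const double combined = std::sqrt(sdA * sdA + sdB * sdB);
  assert(combined > 0.0);
  return std::fabs(meanA - meanB) / combined;
}
```

With the means reported above, `speedup(15.435, 8.60897)` is about 1.79, and `gapInStddevs(15.435, 0.0853953, 8.60897, 0.0694211)` is far larger than 1. So the difference is not step-to-step noise, although this says nothing about its cause.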


DuncanMcBain commented on May 24, 2024

Hi @vitduck,
The last section of the README covers performance. Back when we were working on this, we managed to get roughly the same results from CUDA and SYCL on a 3060 GPU. The software stack has changed since then, so it's hard to say exactly what might now be similar or different.

I'll check with a colleague, as we might be able to send you some of our updated numbers. You could also run the NVIDIA Nsight Compute profiling tool to see if anything obvious stands out.
