
Comments (6)

pkestene avatar pkestene commented on June 27, 2024

It looks like cudaGetDeviceProperties_v2 is actually missing from the CUDA toolkit 11.8:

nm /usr/local/cuda-11.8/lib64/libcudart_static.a |grep cudaGetDeviceProperties
000000000003ec90 T cudaGetDeviceProperties

while it is present in nvhpc (here with CUDA 12.0):

nm /data/pkestene/local/hpcsdk-23.3/Linux_x86_64/23.3/cuda/12.0/lib64/libcudart_static.a| grep cudaGetDeviceProperties
0000000000064010 T cudaGetDeviceProperties
000000000003e860 T cudaGetDeviceProperties_v2

It is also OK when moving to CUDA toolkit >= 12.
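
The nm comparison above can be turned into a small reusable check. This is only a sketch: `defines_symbol` is a hypothetical helper name, and it assumes the three-column `address type name` layout that nm prints in the listings above.

```shell
#!/bin/sh
# Succeeds when the nm output read from stdin contains a defined text
# symbol (type "T") whose name is exactly the one given as $1.
# defines_symbol is a hypothetical helper, not part of CUDA or owl.
defines_symbol() {
    awk -v sym="$1" '$2 == "T" && $3 == sym { found = 1 } END { exit !found }'
}

# Usage (the library path is only an example):
#   nm /usr/local/cuda-11.8/lib64/libcudart_static.a \
#       | defines_symbol cudaGetDeviceProperties_v2 \
#       || echo "cudaGetDeviceProperties_v2 is not defined in this library"
```

Matching the whole name rather than grepping for a prefix avoids counting cudaGetDeviceProperties as a hit when looking for the _v2 variant.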

from owl.

ingowald avatar ingowald commented on June 27, 2024

So, just double-checking on this: it looks like you built with one CUDA 11.8 install and then tried to run with another 11.8 install, and that was causing the issue? I.e., each of them would have worked by itself, and only mixing them didn't?

(Agreed that having two different distributions with the same version number is "funny", though :-) Just trying to make sure that it's not related to owl.)


pkestene avatar pkestene commented on June 27, 2024

Just to be clear:

  • building with CUDA 11.8 (from the toolkit) fails, with the error at link time
  • building with CUDA from nvhpc (here 12.0) is fine, and I run using the same env
  • building with CUDA 12.0 (from the toolkit) is fine, and it runs fine

The problem was that libcudart_static.a shipped with cuda 11.8 doesn't provide cudaGetDeviceProperties_v2.

At least, it's working now. My next step is playing with owlExaBrick.
Thanks for making this available.


ingowald avatar ingowald commented on June 27, 2024

Huh; that is "slightly" concerning. Basically what you're saying is that CUDA 11.8 is broken :-/. Huh. Now we have three options: a) try and fix the code even for cuda 11.8; b) go into cmakefile, detect cuda version, and at least throw an error; or c) ignore, and hope that people will use the newer cuda 12, anyway...
Anyway - thanks for reporting this - OS/toolchain related stuff is always nasty, in particular when version dependent...


pkestene avatar pkestene commented on June 27, 2024

I tried another machine running RedHat 8 with CUDA 11.8 (both toolkit and nvhpc); no problem there.

When I look for the symbols cudaGetDeviceProperties/cudaGetDeviceProperties_v2 on that machine, I get the same results as on Ubuntu, that is:

> nm /ccc/products/cuda-11.8/system/toolkit/lib64/libcudart_static.a | grep cudaGetDeviceProper
000000000003ec90 T cudaGetDeviceProperties
> nm /ccc/products/cuda-12.0/system/toolkit/lib64/libcudart_static.a | grep cudaGetDeviceProper
0000000000064010 T cudaGetDeviceProperties
000000000003e860 T cudaGetDeviceProperties_v2

cuda-11.8 only contains one of them; cuda-12.0 contains them both.

But on RedHat, the owl sample apps link fine with cuda-11.8, even though cudaGetDeviceProperties_v2 is not present.


pkestene avatar pkestene commented on June 27, 2024

I finally found the problem on my Ubuntu machine. Even though both CUDA toolkit 11.8 and 12.0 were installed in completely separate directories, installing a newer toolkit makes the Ubuntu package create, by default, a symlink /usr/local/cuda -> /etc/alternatives/cuda -> /etc/alternatives/cuda-12.0,
so /usr/local/cuda always points to the latest CUDA toolkit installed.

So what happened is that I was compiling with nvcc 11.8, but the CUDA headers were actually taken from 12.0. There was therefore a mismatch between the header version and the runtime library version, which is why the link step looked for cudaGetDeviceProperties_v2 in the 11.8 library.
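
A mismatch like this one can be spotted before building. The sketch below uses hypothetical helper names and assumes GNU `readlink -f` is available and that `include/cuda.h` defines `CUDA_VERSION` as 1000 * major + 10 * minor (11080 for 11.8, 12000 for 12.0).

```shell
#!/bin/sh
# Print the real directory behind a CUDA root such as /usr/local/cuda,
# following every symlink in the chain (readlink -f is GNU coreutils).
resolve_cuda_root() {
    readlink -f "${1:-/usr/local/cuda}"
}

# Print the CUDA_VERSION value defined by include/cuda.h under a CUDA root.
header_cuda_version() {
    awk '$1 == "#define" && $2 == "CUDA_VERSION" { print $3; exit }' "$1/include/cuda.h"
}

# Convert a "major.minor" release such as 11.8 to the same encoding (11080).
encode_cuda_version() {
    echo "$1" | awk -F. '{ print $1 * 1000 + $2 * 10 }'
}

# Usage: compare the headers behind /usr/local/cuda with the nvcc in use.
#   release=$(nvcc --version | sed -n 's/.*release \([0-9]*\.[0-9]*\).*/\1/p')
#   [ "$(header_cuda_version /usr/local/cuda)" = "$(encode_cuda_version "$release")" ] \
#       || echo "headers under $(resolve_cuda_root) do not match nvcc $release"
```

If the two numbers differ, the compiler and the headers on the include path come from different toolkits, which is the situation described above.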

So the question is: why is /usr/local/cuda/include included in CUDA_INCLUDES?

Finally, I think the problem is here:
https://github.com/owl-project/owl/blob/master/owl/CMakeLists.txt#L165

The path /usr/local/cuda/include is unconditionally included.

But this path is really not needed when using an alias library like CUDA::cudart_static.

A possible fix is to replace:

target_include_directories(owl
  PUBLIC
    /usr/local/cuda/include/
    ${PROJECT_SOURCE_DIR}
    ${CMAKE_CURRENT_LIST_DIR}/include
)

with

target_include_directories(owl
  PUBLIC
    ${PROJECT_SOURCE_DIR}
    ${CMAKE_CURRENT_LIST_DIR}/include
)

So that the path /usr/local/cuda/include/ is not added by default.
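
To check whether other CMake files in a source tree carry the same kind of hard-coded path, something like the following could be used. It is a sketch only: `find_hardcoded_cuda_paths` is a hypothetical helper, and it assumes a grep that supports `-r` and `--include` (GNU grep does).

```shell
#!/bin/sh
# List CMake files under a source tree that mention a hard-coded CUDA
# install path. Prints file:line:text for each hit and returns non-zero
# when nothing is found. Hypothetical helper, not part of owl.
find_hardcoded_cuda_paths() {
    grep -rn --include=CMakeLists.txt --include='*.cmake' '/usr/local/cuda' "${1:-.}"
}

# Usage, from the top of a checkout:
#   find_hardcoded_cuda_paths . || echo "no hard-coded CUDA paths found"
```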

I think that definitely closes the issue. I can provide a small patch if needed.
