Gauss kernel initialization: unknown error about popsift HOT 11 OPEN

fabiencastan commented on September 17, 2024

Gauss kernel initialization: unknown error

from popsift.

Comments (11)

stale commented on September 17, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


simogasp commented on September 17, 2024

I sometimes get a similar error on an MBP when other programs that use the GPU are running (typically Chrome). Closing those programs makes the system switch to the integrated graphics card, and then I am able to use popsift. Probably a memory size limit?


taketwo commented on September 17, 2024

I get the same error:

$ ./popsift-demo -i image.png --print-dev-info
PopSift version: 1.0.0
image.png
Choosing device 0: GeForce RTX 2070
Device information:
    Name: GeForce RTX 2070
    Compute Capability:    7.5
    Total device mem:      8366915584 B 8170816 kB 7979 MB
    Per-block shared mem:  49152
    Warp size:             32
    Max threads per block: 1024
    Max threads per SM(X): 1024
    Max block sizes:       {1024,1024,64}
    Max grid sizes:        {2147483647,65535,65535}
    Number of SM(x)s:      36
    Concurrent kernels:    yes
    Mapping host memory:   yes
    Unified addressing:    yes

/code/my_projects/popsift/src/popsift/gauss_filter.cu:245
    cudaMemcpyToSymbol failed for Gauss kernel initialization: invalid device symbol

According to nvidia-smi, only 600 MiB out of 8000 MiB on the GPU are occupied by other processes.


simogasp commented on September 17, 2024

thanks for the report.
I guess we need to call in the big guns :-) @griwodz


taketwo commented on September 17, 2024

Just a few more data points. Out of curiosity, I commented out that memory copy. This led to a similar error when uploading the SIFT constants:

/code/my_projects/popsift/src/popsift/sift_constants.cu:52
    Failed to upload h_consts to device: invalid device symbol

Commenting this one out, I hit the next one at:

/code/my_projects/popsift/src/popsift/common/debug_macros.cu:24
    called from /code/my_projects/popsift/src/popsift/s_pyramid_build.cu:125
    cudaGetLastError failed: invalid device function


griwodz commented on September 17, 2024

Huh. You are getting different error messages on the K4000 and the RTX 2070. That's weird.

Could you try to move the __device__ __constant__ GaussInfo d_gauss; in gauss_filter.cu out of the namespace popsift? Since the binding is symbolic, it is possible that something has changed and the namespace is now a problem.

Another possibility, but I wouldn't know why that should happen if your system has only 1 CUDA card, is that cudaMemcpyToSymbol cannot figure out which card you are trying to use. The constant memory should exist on all CUDA cards anyway. That could be tested by adding a call cudaSetDevice(0); at the top of the init_filter function (that would be just for testing, not a solution in the long term).


griwodz commented on September 17, 2024

The amount of constant memory on a CUDA card is quite limited, but all documentation insists that this is because the constant cache size is limited.

Do you have any hints on how I can recreate the error (on Linux)?


taketwo commented on September 17, 2024

Hi, thanks for your answer. I've tried both moving d_gauss out of the popsift namespace and setting the device explicitly, all to no avail.

I did not mention this before: I am running PopSift in a Docker container. This morning I tried to build and run it on the host system directly, and there were no issues.

> Do you have any hints on how I can recreate the error (on Linux)?

Unfortunately, the Docker image I use is proprietary, so I cannot share it. Instead, I've tried to create a minimal image based on nvidia/cuda with the same Ubuntu/CUDA version. To my surprise, when I compile and run PopSift there, it also has no issues. So, apparently, there is something very special about my proprietary image. I'm investigating further, but if you have any ideas or hints about what to try, please let me know.


griwodz commented on September 17, 2024

Is it possible that your main Docker container uses a different CUDA SDK than the host machine, but your test container uses the same SDK as the host?

Since late CUDA 10, NVIDIA has been trying to do something about the compatibility hassle (as they describe here: https://docs.nvidia.com/deploy/cuda-compatibility/index.html), but I have not looked at those compatibility libraries at all.


taketwo commented on September 17, 2024

My "main" container is based on the nvidia/cudagl:10.0-devel-ubuntu18.04 image; I used the same one for my "test" container. So both of them have CUDA 10.0. On my host system I used to have 10.2, however this morning I downgraded it to 10.0 just in case. nvidia-smi still reports:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+

but as far as I understand, this is because the driver is tied to a certain CUDA version regardless of which CUDA SDK is actually installed. Also, according to the info on the page you posted, this driver is compatible with all 10.x versions.
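To make the driver-versus-SDK distinction concrete, here is a minimal host-side sketch. It relies on the fact that the CUDA runtime API reports versions as a single integer, 1000 * major + 10 * minor (so 10020 means 10.2); in a real program the two integers would come from cudaDriverGetVersion and cudaRuntimeGetVersion. The struct and helper names below are hypothetical and are not part of PopSift or CUDA, and the "driver version >= runtime version" check is the usual rule of thumb, ignoring the forward-compatibility packages mentioned on the linked page.

```cpp
#include <string>

// Decoded form of the integer the CUDA runtime API uses for versions.
struct CudaVersion {
    int major_version;
    int minor_version;
};

// CUDA encodes a version as 1000 * major + 10 * minor, e.g. 10020 for 10.2.
constexpr CudaVersion decode_cuda_version(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Hypothetical helper: true when the version supported by the driver is at
// least the version of the runtime (SDK) the application was built against.
constexpr bool driver_supports_runtime(int driver_encoded, int runtime_encoded) {
    return driver_encoded >= runtime_encoded;
}

// Formats an encoded version as "major.minor".
inline std::string format_cuda_version(int encoded) {
    const CudaVersion v = decode_cuda_version(encoded);
    return std::to_string(v.major_version) + "." + std::to_string(v.minor_version);
}
```

With the values from this thread, a driver reporting 10020 (the "CUDA Version: 10.2" shown by nvidia-smi) and a runtime of 10000 (the CUDA 10.0 SDK) pass the check, which fits the observation that the version mismatch by itself should not be the problem.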

I'm currently trying to "bisect" the layers of the "main" container to find the one that introduces the problem.


taketwo commented on September 17, 2024

I found the cause, and it (seemingly) has nothing to do with CUDA and/or Docker. In my "main" container, the lld linker is installed and set up to be used by default. That's all. Switching back to the standard gold linker eliminates the issue.

In case you want to reproduce this and check what's going on, simply install the lld package and add the following to the root CMakeLists.txt of PopSift:

set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -fuse-ld=lld")
