Comments (11)

stale commented on September 17, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

from popsift.

simogasp commented on September 17, 2024

I sometimes get a similar error on an MBP when other programs are using the GPU (typically Chrome). Closing those programs makes the system switch to the integrated graphics card, and then I am able to use popsift. Probably a memory size limit?

taketwo commented on September 17, 2024

I get the same error:

$ ./popsift-demo -i image.png --print-dev-info
PopSift version: 1.0.0
image.png
Choosing device 0: GeForce RTX 2070
Device information:
    Name: GeForce RTX 2070
    Compute Capability:    7.5
    Total device mem:      8366915584 B 8170816 kB 7979 MB
    Per-block shared mem:  49152
    Warp size:             32
    Max threads per block: 1024
    Max threads per SM(X): 1024
    Max block sizes:       {1024,1024,64}
    Max grid sizes:        {2147483647,65535,65535}
    Number of SM(x)s:      36
    Concurrent kernels:    yes
    Mapping host memory:   yes
    Unified addressing:    yes

/code/my_projects/popsift/src/popsift/gauss_filter.cu:245
    cudaMemcpyToSymbol failed for Gauss kernel initialization: invalid device symbol

According to nvidia-smi, only 600 MiB out of 8000 MiB on the GPU are occupied by other processes.

simogasp commented on September 17, 2024

Thanks for the report.
I guess we need to call in the big guns :-) @griwodz

taketwo commented on September 17, 2024

Just a few more data points. Out of curiosity, I commented out that memory copy. This led to a similar error when uploading the SIFT constants:

/code/my_projects/popsift/src/popsift/sift_constants.cu:52
    Failed to upload h_consts to device: invalid device symbol

Commenting this one out, I hit the next one at:

/code/my_projects/popsift/src/popsift/common/debug_macros.cu:24
    called from /code/my_projects/popsift/src/popsift/s_pyramid_build.cu:125
    cudaGetLastError failed: invalid device function

griwodz commented on September 17, 2024

Huh. You are getting different error messages on the K4000 and the RTX 2070. That's weird.

Could you try to move the __device__ __constant__ GaussInfo d_gauss; in gauss_filter.cu out of the namespace popsift? Since the binding is symbolic, it is possible that something has changed and the namespace is now a problem.

Another possibility, though I wouldn't know why it should happen if your system has only one CUDA card, is that cudaMemcpyToSymbol cannot figure out which card you are trying to use. The constant memory should exist on all CUDA cards anyway. That could be tested by adding a call to cudaSetDevice(0); at the top of the init_filter function (just for testing, not as a long-term solution).
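
For anyone who wants to try both ideas outside of PopSift, here is a minimal standalone sketch. It is not PopSift code: GaussInfoSketch, d_gauss_sketch, read_first and the popsift_sketch namespace are hypothetical stand-ins for GaussInfo, d_gauss and the popsift namespace. It declares a constant-memory symbol inside a namespace, selects device 0 explicitly, and performs the same kind of cudaMemcpyToSymbol call that fails in the report.

```cuda
// Standalone sketch, NOT PopSift code. All names below are hypothetical
// stand-ins for the symbols discussed in this thread.
#include <cstdio>
#include <cuda_runtime.h>

namespace popsift_sketch {

struct GaussInfoSketch {
    float filter[16];
};

// Constant-memory symbol declared inside a namespace, as in the report.
__device__ __constant__ GaussInfoSketch d_gauss_sketch;

// Reads the constant symbol so it is actually used by device code.
__global__ void read_first(float* out) {
    *out = d_gauss_sketch.filter[0];
}

} // namespace popsift_sketch

// Returns true on success; otherwise prints the CUDA error string.
static bool check(cudaError_t err, const char* what) {
    if (err == cudaSuccess) return true;
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return false;
}

int main() {
    using namespace popsift_sketch;

    // Test only: select device 0 explicitly before touching the symbol.
    if (!check(cudaSetDevice(0), "cudaSetDevice")) return 1;

    GaussInfoSketch h_gauss = {};
    h_gauss.filter[0] = 1.0f;

    // Same kind of call that reports "invalid device symbol" above.
    if (!check(cudaMemcpyToSymbol(d_gauss_sketch, &h_gauss, sizeof(h_gauss),
                                  0, cudaMemcpyHostToDevice),
               "cudaMemcpyToSymbol")) return 1;

    float* d_out = nullptr;
    if (!check(cudaMalloc(&d_out, sizeof(float)), "cudaMalloc")) return 1;

    read_first<<<1, 1>>>(d_out);
    if (!check(cudaGetLastError(), "kernel launch")) return 1;

    float h_out = 0.0f;
    if (!check(cudaMemcpy(&h_out, d_out, sizeof(float),
                          cudaMemcpyDeviceToHost), "cudaMemcpy")) return 1;
    cudaFree(d_out);

    std::printf("read back %.1f\n", h_out);
    return 0;
}
```

If this small program works in an environment where PopSift fails, the namespace and the device selection are probably not the culprits, and the difference is more likely in how the PopSift binaries are built or linked.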

griwodz commented on September 17, 2024

The amount of constant memory on a CUDA card is quite limited, but all documentation insists that this is because the constant cache size is limited.

Do you have any hints on how I can recreate the error (on Linux)?

taketwo commented on September 17, 2024

Hi, thanks for your answer. I've tried both moving d_gauss out of the popsift namespace and setting the device explicitly, all to no avail.

I did not mention this before: I am running PopSift in a Docker container. This morning I tried to build and run it on the host system directly, and there were no issues.

> Do you have any hints on how I can recreate the error (on Linux)?

Unfortunately, the Docker image I use is proprietary, so I cannot share it. Instead, I've tried to create a minimal image based on nvidia/cuda with the same Ubuntu/CUDA version. To my surprise, when I compile and run PopSift there, it also has no issues. So, apparently, there is something very special about my proprietary image. I'm investigating further, but if you have any ideas or hints about what to try, please let me know.

griwodz commented on September 17, 2024

Is it possible that your main Docker container uses a different CUDA SDK than the host machine, but your test container uses the same SDK as the host?

Since late CUDA 10, NVIDIA has been trying to do something about the compatibility hassle (as described here: https://docs.nvidia.com/deploy/cuda-compatibility/index.html), but I have not looked at those compatibility libraries at all.

taketwo commented on September 17, 2024

My "main" container is based on the nvidia/cudagl:10.0-devel-ubuntu18.04 image; I used the same one for my "test" container. So both of them have CUDA 10.0. On my host system I used to have 10.2; however, this morning I downgraded it to 10.0 just in case. nvidia-smi still reports:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+

but as far as I understand, this is because the driver is tied to a certain CUDA version regardless of which CUDA SDK is actually installed. Also, according to the info on the page you posted, this driver is compatible with all 10.x versions.
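
The driver/SDK distinction can also be checked from inside a container with a few lines of CUDA. This is a minimal sketch, not part of PopSift: format_cuda_version is a hypothetical helper, while cudaDriverGetVersion and cudaRuntimeGetVersion are standard CUDA runtime API calls. The first reports the latest CUDA version the installed driver supports, the second reports the version of the runtime library the binary is linked against.

```cuda
// Sketch: compare the CUDA version supported by the driver with the
// version of the CUDA runtime this binary was built against.
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper. CUDA encodes versions as 1000 * major + 10 * minor,
// so 10020 means 10.2.
static std::string format_cuda_version(int v) {
    return std::to_string(v / 1000) + "." + std::to_string((v % 1000) / 10);
}

int main() {
    int driver_version = 0;
    int runtime_version = 0;

    // Latest CUDA version supported by the installed driver.
    if (cudaDriverGetVersion(&driver_version) != cudaSuccess) {
        std::fprintf(stderr, "cudaDriverGetVersion failed\n");
        return 1;
    }

    // Version of the CUDA runtime library linked into this program.
    if (cudaRuntimeGetVersion(&runtime_version) != cudaSuccess) {
        std::fprintf(stderr, "cudaRuntimeGetVersion failed\n");
        return 1;
    }

    std::printf("driver supports CUDA %s, runtime is CUDA %s\n",
                format_cuda_version(driver_version).c_str(),
                format_cuda_version(runtime_version).c_str());

    // A runtime newer than what the driver supports usually cannot run.
    if (runtime_version > driver_version) {
        std::fprintf(stderr, "runtime is newer than the driver supports\n");
        return 2;
    }
    return 0;
}
```

Running the same binary on the host and in each container should show whether the two environments really see the same driver and runtime combination.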

I'm currently trying to "bisect" the layers of the "main" container to find the one that introduces the problem.

taketwo commented on September 17, 2024

I found the cause, and it (seemingly) has nothing to do with CUDA and/or Docker. In my "main" container, the lld linker is installed and set up to be used by default. That's all. Switching back to the standard gold linker eliminates the issue.

In case you want to reproduce this and check what's going on, simply install the lld package and add the following to the root CMakeLists.txt of PopSift:

set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -fuse-ld=lld")
