Comments (9)

Rodousse commented on September 17, 2024

I can reproduce this with an RTX 2080, NVIDIA driver 440.33.01, and CUDA 10.2.
Repro case:

  • I built popsift-demo.
  • I ran it with the following arguments (image from here):
    ./popsift-demo -i apollonian_gasket.ascii.pgm
  • It crashes with these errors:
/home/rdousse/Téléchargements/casablanca.ascii.pgm
Warning: no descriptors extracted
/home/rdousse/dev/eesep_2/thirdParty/build/popsift/src/popsift/features.cu:89 Runtime warning:
    Failed to register feature memory in CUDA.
    Features count: 1121
    Memory size requested: 80712
    an illegal memory access was encountered
/home/rdousse/dev/eesep_2/thirdParty/build/popsift/src/popsift/features.cu:97 Runtime warning:
    Failed to register descriptor memory in CUDA.
    Descriptors count: 0
    Memory size requested: 0
    an illegal memory access was encountered
/home/rdousse/dev/eesep_2/thirdParty/build/popsift/src/popsift/sift_pyramid.cu:307
    Failed to copy device-to-host: an illegal memory access was encountered
    src ptr=7f32b7800000
    dst ptr=7f329c021000

I also tested the code on a GeForce 1080 with the same NVIDIA driver and CUDA version, and everything went well.
Hope this helps.

from popsift.

lzx551402 commented on September 17, 2024

@Rodousse I have now made it work with RTX cards. Here is my solution.

  1. As pointed out by the author @griwodz, the problem lies in the function "BitonicSort", which is used when estimating the orientation. I couldn't identify what actually goes wrong in it, but I re-implemented it as a simple bubble sort, and the orientation step now passes.
  2. Similarly, the program crashes at the descriptor step. This can be worked around by setting "--desc-mode grid".

Overall, I suspect that the problem lies in the use of CUDA warp-level primitives, e.g. __shfl_sync. It seems to be an out-of-bounds issue when calling those functions on RTX cards.
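
For readers who want to try the same kind of workaround, here is a minimal sketch of a sort that avoids warp shuffles entirely: the warp stages its values in shared memory and a single lane bubble-sorts them. This is not the code used in this thread or in popsift; the names bubble_sort_desc, sort_without_shuffles and bubble_sort_demo are hypothetical, and a single warp of 32 values is assumed.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not from popsift): in-place bubble sort, descending,
// executed by a single thread. O(n^2), acceptable for a handful of values.
__device__ void bubble_sort_desc(float* v, int n)
{
    for (int i = 0; i < n - 1; ++i) {
        for (int j = 0; j < n - 1 - i; ++j) {
            if (v[j] < v[j + 1]) {
                float tmp = v[j];
                v[j]      = v[j + 1];
                v[j + 1]  = tmp;
            }
        }
    }
}

// One warp copies its 32 values into shared memory; lane 0 sorts them.
// No __shfl_sync / __ballot_sync is involved, only block-wide barriers
// that every thread reaches.
__global__ void sort_without_shuffles(float* data)
{
    __shared__ float buf[32];
    const int lane = threadIdx.x;
    buf[lane] = data[lane];
    __syncthreads();
    if (lane == 0) bubble_sort_desc(buf, 32);
    __syncthreads();
    data[lane] = buf[lane];
}

// Host-side check: true if the kernel ran and the output is sorted descending.
bool bubble_sort_demo()
{
    float h[32];
    for (int i = 0; i < 32; ++i) h[i] = (float)((i * 7) % 32);  // a permutation of 0..31
    float* d = nullptr;
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return false;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    sort_without_shuffles<<<1, 32>>>(d);
    // This copy waits for the kernel and also reports any error it raised.
    cudaError_t err = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;
    for (int i = 1; i < 32; ++i)
        if (h[i - 1] < h[i]) return false;
    return true;
}
```

The trade-off is that 31 lanes sit idle while lane 0 sorts, so this is slower than a shuffle-based sort, but it is easy to reason about when hunting for an illegal memory access.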

griwodz commented on September 17, 2024

Sorry for not replying for an eternity!
Unfortunately, I don't have any RTX card and I don't know what has changed.

My guess is that one of the intrinsic functions (__ballot, __popc) doesn't return because a thread has died after a memory violation. And that's probably in BitonicSort, but that is pure guesswork.

Do you have any time to search for the problem?

lzx551402 commented on September 17, 2024

Thank you for your reply! I will have time to diagnose this issue in the next few weeks, and will keep you informed if it is resolved.

Rodousse commented on September 17, 2024

@lzx551402 Thanks for the workaround!
I think I'll just wait for the fix to be released.

griwodz commented on September 17, 2024

@lzx551402 Thank you very much for finding the origin of the crashes!
I'll probably get a login for a machine with a new Tesla card in the next few days, and I hope that the problem is the same on that one.

griwodz commented on September 17, 2024

@lzx551402 Thanks for discovering the locations of the RTX 20-series bugs. I managed to borrow an eGPU with a 2080 and fixed the problem in BitonicSort (just one line using a conditional partial warp xor-shuffle).
If you have the time, you could test the branch dev/fixNewWarp (PR #66).
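
The actual one-line fix lives in PR #66 and is not reproduced here. As a general illustration of the rule involved, the sketch below shows a warp-level bitonic sort in which every lane named in the mask executes __shfl_xor_sync unconditionally, and only the choice between minimum and maximum depends on the lane. The CUDA programming guide states that a shuffle may only read from lanes that participate in the same call, so a shuffle placed inside a divergent branch can read undefined values. The names warp_bitonic_sort_asc, sort32 and bitonic_sort_demo are hypothetical, and a full 32-lane warp is assumed.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch (not the popsift code): sorts one value per lane
// across a full warp. After the call, lane i holds the i-th smallest value.
__device__ float warp_bitonic_sort_asc(float v)
{
    const unsigned full_mask = 0xffffffffu;
    const int lane = threadIdx.x & 31;
    for (int k = 2; k <= 32; k <<= 1) {          // size of the bitonic sequences
        for (int j = k >> 1; j > 0; j >>= 1) {   // distance to the partner lane
            // All 32 lanes execute the shuffle; none of them skips it.
            const float other     = __shfl_xor_sync(full_mask, v, j);
            const bool  ascending = ((lane & k) == 0);
            const bool  lower     = ((lane & j) == 0);
            // The lower lane of an ascending pair keeps the minimum,
            // and so does the upper lane of a descending pair.
            v = (lower == ascending) ? fminf(v, other) : fmaxf(v, other);
        }
    }
    return v;
}

__global__ void sort32(float* data)
{
    data[threadIdx.x] = warp_bitonic_sort_asc(data[threadIdx.x]);
}

// Host-side check: true if the kernel ran and the output is sorted ascending.
bool bitonic_sort_demo()
{
    float h[32];
    for (int i = 0; i < 32; ++i) h[i] = (float)((i * 11) % 32);  // a permutation of 0..31
    float* d = nullptr;
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return false;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    sort32<<<1, 32>>>(d);
    // This copy waits for the kernel and also reports any error it raised.
    cudaError_t err = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;
    for (int i = 1; i < 32; ++i)
        if (h[i - 1] > h[i]) return false;
    return true;
}
```

If fewer than 32 lanes take part, the mask has to name exactly the participating lanes, and every one of them must still reach the shuffle.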

I'm not seeing the crash in the descriptor step in loop mode. Can you tell me which parameters you used when it crashed? Perhaps you have an example image?

griwodz commented on September 17, 2024

Hmm, I did actually move a bad __syncthreads() out of a loop in s_desc_loop, so that may have fixed the other problem without actually looking for it.
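
To show why a barrier inside a loop can be a problem in general, here is a small sketch, not the s_desc_loop code: each thread runs its loop a different number of times, so a __syncthreads() inside the loop would not be reached by all threads the same number of times, which CUDA leaves undefined. Placing the barrier after the loop means every thread reaches it exactly once. The names sum_after_divergent_loop and divergent_loop_demo, the block size of 64 and the per-thread counts are hypothetical.

```cuda
#include <cuda_runtime.h>

__global__ void sum_after_divergent_loop(const int* counts, int* out)
{
    __shared__ int partial[64];
    const int t = threadIdx.x;

    int sum = 0;
    // The trip count differs per thread, so there must be no
    // __syncthreads() inside this loop.
    for (int i = 0; i < counts[t]; ++i) {
        sum += i + 1;
    }
    partial[t] = sum;

    __syncthreads();  // reached exactly once by every thread of the block

    if (t == 0) {
        int total = 0;
        for (int i = 0; i < blockDim.x; ++i) total += partial[i];
        *out = total;
    }
}

// Host-side driver: returns the block total, or -1 on a CUDA error.
int divergent_loop_demo()
{
    int h_counts[64];
    for (int i = 0; i < 64; ++i) h_counts[i] = i % 4;  // 0,1,2,3,0,1,...
    int* d_counts = nullptr;
    int* d_out    = nullptr;
    if (cudaMalloc(&d_counts, sizeof(h_counts)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) { cudaFree(d_counts); return -1; }
    cudaMemcpy(d_counts, h_counts, sizeof(h_counts), cudaMemcpyHostToDevice);
    sum_after_divergent_loop<<<1, 64>>>(d_counts, d_out);
    int h_out = -1;
    // This copy waits for the kernel and also reports any error it raised.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counts);
    cudaFree(d_out);
    return (err == cudaSuccess) ? h_out : -1;
}
```

When a barrier really is needed in every iteration, the usual alternative is a loop bound that is the same for all threads, with the per-thread work guarded by an if inside the loop.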

lzx551402 commented on September 17, 2024

@griwodz Thank you so much for fixing the problem! I have just tested both modifications in src/popsift/s_desc_loop.cu and src/popsift/common/warp_bitonic_sort.h, and they have indeed resolved the two issues I encountered.
