
Comments (14)

Column01 commented on July 4, 2024

Tried older ROCm versions, no dice (3.9, 4.0, and 4.1 all give a longer error about /dev/kfd not existing when running rocminfo).

The ROCm repository lists gfx8 GPUs as compatible, but full support is not guaranteed. I think the failure is more likely because this is ROCm inside WSL2 rather than WSL1. I'm going to dual boot Linux and run my workflows there, and hopefully ROCm will work properly...

from antares.

Column01 commented on July 4, 2024

Here is more info if needed. The GPU is an RX580

colin-ubuntu@Colin-Desktop:~$ rocminfo
ROCk module is NOT loaded, possibly no GPU devices
colin-ubuntu@Colin-Desktop:~$ clinfo
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.0 AMD-APP (3305.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               0

Column01 commented on July 4, 2024

I was able to install the ROCm build of torch, but the antares samples still run on the CPU.

ghostplant commented on July 4, 2024

As we said in #269, Antares is meant to launch ROCm device code through the Windows-native ROCm driver and to help port device code to standard Win64 applications. It is not meant to restore the full ROCm stack and make PyTorch work in Linux mode (that may be possible in theory, but it would definitely be a costly task, and I am not sure it is worth the time).

That said, your logs at https://gist.github.com/3c77d7003a0a212d3f30abea8ee2b9d8 and your statement that "running the antares samples will use the CPU" are not expected. I am not sure whether you have installed the Windows AMDGPU driver correctly.

Can you paste the log from running bash -c 'cd .libAntares/cache/_/ && ../../evaluator.c-rocm_win64'? It will show the cause of the error in your case.

Column01 commented on July 4, 2024
+ /opt/rocm/bin/hipcc .antares-module-tempfile.cu --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 --amdgpu-target=gfx1010 --genco -Wno-ignored-attributes -O2 -o .antares-module-tempfile.cu.out
..\..\..\rocclr\hip_code_object.cpp:482: guarantee(false && "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!")
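
When reading a log like the one above, it can help to pull out exactly which targets the module was compiled for. A minimal Python sketch, assuming the hipcc command line has been copied out of the log as a string; `amdgpu_targets` is a hypothetical helper name, not part of Antares or ROCm:

```python
import shlex


def amdgpu_targets(command_line: str) -> list[str]:
    """Return the value of every --amdgpu-target=... flag, in order."""
    prefix = "--amdgpu-target="
    return [
        token[len(prefix):]
        for token in shlex.split(command_line)
        if token.startswith(prefix)
    ]


# Command line copied from the log in this thread.
logged = (
    "/opt/rocm/bin/hipcc .antares-module-tempfile.cu "
    "--amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 "
    "--amdgpu-target=gfx908 --amdgpu-target=gfx1010 --genco "
    "-Wno-ignored-attributes -O2 -o .antares-module-tempfile.cu.out"
)

print(amdgpu_targets(logged))
```

The resulting list is what the runtime has available to match against the installed GPU.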

Column01 commented on July 4, 2024

The AMDGPU drivers are installed correctly on Windows; the DLL mentioned in the docs is present.

Column01 commented on July 4, 2024

If I'm reading the error correctly I might need to install an older ROCm version? But it could totally be related to the fact rocminfo displays that the ROCk module is not running, could it not?

ghostplant commented on July 4, 2024

Your log clearly shows the root cause: none of the --amdgpu-target values exactly matches your GPU type.
The current --amdgpu-target list in the compile arguments includes gfx803, gfx900, gfx906, gfx908, and gfx1010.

My AMD GPU is a Radeon VII, which should match gfx906. If I remove the argument --amdgpu-target=gfx906, I get this error too:

+ /opt/rocm/bin/hipcc .antares-module-tempfile.cu --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx908 --amdgpu-target=gfx1010 --genco -Wno-ignored-attributes -O2 -o .antares-module-tempfile.cu.out
..\..\..\hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")

Once any --amdgpu-target matches the GPU type, it works normally:

+ /opt/rocm/bin/hipcc .antares-module-tempfile.cu --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 --amdgpu-target=gfx1010 --genco -Wno-ignored-attributes -O2 -o .antares-module-tempfile.cu.out

[EvalAgent] Results = {"K/0": 1504583185.0, "TPR": 0.000401532}

This was also tested on an AMD Navi 10, which should match gfx1010.

Also, some GPUs of the same type may still carry a special suffix such as xnack-, which may affect how the Windows ROCm runtime driver matches your compiled target types. I am not sure whether your AMD GPU really and precisely matches gfx803. If it does, this might indicate that the Windows ROCm runtime driver has dropped support for gfx803 cards.
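
To illustrate why a suffix could break the match, here is a rough Python sketch of exact versus base-name matching. It assumes a colon-separated form such as "gfx906:xnack-" for the suffix; that format, and the helper names `base_arch` and `find_match`, are assumptions for illustration only. The thread does not say how the Windows ROCm runtime actually compares targets.

```python
from typing import Optional


def base_arch(target_id: str) -> str:
    """Drop any feature suffix, e.g. 'gfx906:xnack-' -> 'gfx906' (assumed format)."""
    return target_id.split(":", 1)[0]


def find_match(device_id: str, compiled: list[str], exact: bool = True) -> Optional[str]:
    """Return the first compiled target that matches the device, or None."""
    for target in compiled:
        if target == device_id:
            return target
        if not exact and base_arch(target) == base_arch(device_id):
            return target
    return None


compiled = ["gfx803", "gfx900", "gfx906", "gfx908", "gfx1010"]

print(find_match("gfx906", compiled))                      # plain name, exact match
print(find_match("gfx906:xnack-", compiled))               # suffix defeats exact match
print(find_match("gfx906:xnack-", compiled, exact=False))  # base name still matches
```

Under these assumptions, a device that reports a suffixed name finds no binary under exact matching even though its base architecture is in the compiled list.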

Column01 commented on July 4, 2024

I'm running an RX580, no idea what that one is in this naming scheme

ghostplant commented on July 4, 2024

@Column01 I just found this news: https://www.videogames.ai/2021/01/07/RX580-ROCM-40.html It seems that new ROCm drivers (>= 4.0) no longer support gfx803, and the same is true of Linux ROCm. If you still want to use the card for acceleration, you may need to consider "Linux + ROCm < 4.0" or "Windows + DirectX12 (via BACKEND=c-hlsl_win64)".
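
The card names and options mentioned so far in this thread can be summarized in a small lookup. This is only a sketch of the advice above, not an official support matrix; `ARCH_BY_CARD` and `suggest_setup` are hypothetical names, and the table covers only the three cards discussed here.

```python
from typing import Optional

# Card name -> gfx target, as stated or implied in this thread.
ARCH_BY_CARD = {
    "RX580": "gfx803",
    "Radeon VII": "gfx906",  # written "Radeon7" in the thread
    "Navi 10": "gfx1010",
}


def suggest_setup(card: str, rocm_major: int) -> Optional[str]:
    """Return a suggested setup for a card, or None if the card is not in the table."""
    arch = ARCH_BY_CARD.get(card)
    if arch is None:
        return None
    if arch == "gfx803" and rocm_major >= 4:
        # Per the comment above: ROCm >= 4.0 appears to have dropped gfx803.
        return "Linux + ROCm < 4.0, or Windows + DirectX12 (BACKEND=c-hlsl_win64)"
    return f"ROCm with --amdgpu-target={arch}"


print(suggest_setup("RX580", 4))
print(suggest_setup("Radeon VII", 4))
```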

Column01 commented on July 4, 2024

Ugh, this is extremely frustrating. AMD keeps making stupid decisions like this and pissing consumers off. I wholeheartedly regret buying my AMD card.

ghostplant commented on July 4, 2024

@Column01 Windows AMDGPU with the ROCm runtime was first supported by the end of 2020, while ROCm 4.0 was released after that? Maybe an older version of the Windows AMD driver still supports gfx803? But it is indeed annoying.

Fortunately, the DirectX12 runtime can always use the RX580 for acceleration, and I think its performance is not far from that of the ROCm runtime.

Column01 commented on July 4, 2024

I will try 4.0 tomorrow and see if it works.

Column01 commented on July 4, 2024

After lots of tearing my hair out, I gave up. This is not an antares issue, it's an "AMD being stupid" issue. GFX803 is not supported and just straight-up broke at some point. RIP AMD consumer cards for computing...
