Comments (14)
Tried older ROCm versions, no dice (3.9, 4.0, and 4.1 all give a longer error about /dev/kfd not existing when running rocminfo).
So the repository for ROCm lists gfx8 GPUs as compatible but full support is not guaranteed. I think this is more likely due to its being ROCm inside WSL2 and not WSL1. I'm going to dual boot Linux and run my workflows in there and hopefully, ROCm will work properly there...
from antares.
Here is more info if needed. The GPU is an RX580
colin-ubuntu@Colin-Desktop:~$ rocminfo
ROCk module is NOT loaded, possibly no GPU devices
colin-ubuntu@Colin-Desktop:~$ clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (3305.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 0
from antares.
I was able to get torch's ROCm version to install, but running the antares samples will use the CPU
from antares.
As stated in #269, Antares is used to launch ROCm device code through the Windows-native ROCm driver and to help port device code to standard Win64 applications. It is not meant to restore the full ROCm stack and make PyTorch work in Linux mode (that may be possible in theory, but making it happen would be costly, and I am not sure it is worth the time).
However, your logs at https://gist.github.com/3c77d7003a0a212d3f30abea8ee2b9d8 and the statement that "running the antares samples will use the CPU" are not expected. I am not sure whether you have installed the Windows AMDGPU driver correctly.
Can you paste the log from running bash -c 'cd .libAntares/cache/_/ && ../../evaluator.c-rocm_win64'? It will display the reason for the error in your case.
from antares.
+ /opt/rocm/bin/hipcc .antares-module-tempfile.cu --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 --amdgpu-target=gfx1010 --genco -Wno-ignored-attributes -O2 -o .antares-module-tempfile.cu.out
..\..\..\rocclr\hip_code_object.cpp:482: guarantee(false && "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!")
from antares.
The AMDGPU drivers are installed correctly on Windows, and the DLL mentioned in the docs is present.
from antares.
If I'm reading the error correctly, I might need to install an older ROCm version? But it could totally be related to the fact that rocminfo reports that the ROCk module is not loaded, could it not?
from antares.
Your log clearly shows the root cause: none of the existing --amdgpu-target values exactly matches your GPU type.
The current --amdgpu-target list in the compile arguments includes gfx803, gfx900, gfx906, gfx908, and gfx1010.
My AMD GPU is a Radeon VII, which should match gfx906. If I remove the argument --amdgpu-target=gfx906, I also get this error:
+ /opt/rocm/bin/hipcc .antares-module-tempfile.cu --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx908 --amdgpu-target=gfx1010 --genco -Wno-ignored-attributes -O2 -o .antares-module-tempfile.cu.out
..\..\..\hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Once any --amdgpu-target matches the GPU type, it works normally:
+ /opt/rocm/bin/hipcc .antares-module-tempfile.cu --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 --amdgpu-target=gfx1010 --genco -Wno-ignored-attributes -O2 -o .antares-module-tempfile.cu.out
[EvalAgent] Results = {"K/0": 1504583185.0, "TPR": 0.000401532}
This was also tested on an AMD Navi 10, which should match gfx1010.
Besides, some GPUs of the same type may still carry a special suffix such as xnack-, which may affect whether the Windows ROCm runtime driver matches your compiled target types. I am not sure whether your AMD GPU precisely matches gfx803. If it does, this might indicate that the Windows ROCm runtime driver has dropped support for gfx803 cards.
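To make the matching rule above concrete, here is a minimal Python sketch that pulls the --amdgpu-target values out of a logged hipcc command line and checks whether a given GPU type is among them. The function names compiled_targets and is_target_compiled are hypothetical helpers written for illustration; they are not part of Antares or ROCm. The check is a plain exact string match, so it does not model how the runtime treats suffixes such as xnack-.

```python
import shlex
from typing import List

PREFIX = "--amdgpu-target="


def compiled_targets(hipcc_command: str) -> List[str]:
    """Return the value of every --amdgpu-target=<arch> flag, in order."""
    return [
        token[len(PREFIX):]
        for token in shlex.split(hipcc_command)
        if token.startswith(PREFIX)
    ]


def is_target_compiled(gpu_arch: str, hipcc_command: str) -> bool:
    """Exact-match check only; suffix handling is deliberately left out."""
    return gpu_arch in compiled_targets(hipcc_command)


# Command line copied from the failing log above (gfx906 removed).
cmd = (
    "/opt/rocm/bin/hipcc .antares-module-tempfile.cu "
    "--amdgpu-target=gfx803 --amdgpu-target=gfx900 "
    "--amdgpu-target=gfx908 --amdgpu-target=gfx1010 "
    "--genco -Wno-ignored-attributes -O2 "
    "-o .antares-module-tempfile.cu.out"
)

print(compiled_targets(cmd))
print(is_target_compiled("gfx906", cmd))   # False: no binary for a gfx906 card
print(is_target_compiled("gfx1010", cmd))  # True: Navi 10 is covered
```

A False result here corresponds to the hipErrorNoBinaryForGpu case shown in the log. A True result only says a binary was compiled for that name; the driver may still refuse the card.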
from antares.
I'm running an RX580, no idea what that one is in this naming scheme
from antares.
@Column01 Just found the news: https://www.videogames.ai/2021/01/07/RX580-ROCM-40.html It seems that ROCm drivers >= 4.0 no longer support gfx803, and the same applies to ROCm on Linux. If you still want to use the card for acceleration, you may need to consider "Linux + ROCm < 4.0" or "Windows + DirectX12 (over BACKEND=c-hlsl_win64)".
from antares.
Ugh, this is extremely frustrating. AMD keeps making stupid decisions like this and pissing consumers off. I wholeheartedly regret buying my AMD card.
from antares.
@Column01 Windows AMDGPU support for the ROCm runtime was initially added by the end of 2020, while ROCm 4.0 was released after that? Maybe an older version of the Windows AMD driver still supports gfx803? But it is indeed annoying.
Fortunately, the DirectX12 runtime can always use the RX580 for acceleration, and I think the performance is not far from that of the ROCm runtime.
from antares.
I will try 4.0 tomorrow and see if it works.
from antares.
After lots of tearing my hair out, I gave up. This is not an antares issue, it's an "AMD being stupid" issue. GFX803 is not supported and just straight-up broke at some point. RIP AMD consumer cards for computing...
from antares.