Comments (14)
Thanks for your request. We are currently busy updating other functionality. Later we will add documentation that makes it easier to use ROCm programs on Windows.
from antares.
Seconding this. I think OP has a 6800M and can run ROCm on Windows. That is awesome.
from antares.
I understand, thank you for your progress.
from antares.
In the meantime, you can try the first two steps below:
Step-1: Generate Kernel Source
BACKEND=c-rocm_win64 COMPUTE_V1='- einstein_v2("output0[N] = input0[N] + input1[N]", input_dict={"input0": {"dtype": "float32", "shape": [1024 * 512]}, "input1": {"dtype": "float32", "shape": [1024 * 512]}})' antares save kernel0.hip.cc
Step-2: Build Source into HSACO
hipcc kernel0.hip.cc --amdgpu-target=gfx1031 --genco -O2 -o kernel0.hip.hsaco
Step-3: Write a Win64 program that uses kernel0.hip.hsaco without WSL
(a little complex; instructions will be added later)
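To make the Step-1 expression concrete, here is a minimal pure-Python reference for what `output0[N] = input0[N] + input1[N]` is expected to compute, assuming the index `N` ranges over the whole declared shape. The function name `elementwise_add` and the small demo size are illustrative only and are not part of Antares.

```python
from typing import List


def elementwise_add(input0: List[float], input1: List[float]) -> List[float]:
    """Reference for output0[N] = input0[N] + input1[N] (one add per index N)."""
    if len(input0) != len(input1):
        raise ValueError("input0 and input1 must have the same shape")
    return [a + b for a, b in zip(input0, input1)]


# The thread uses shape [1024 * 512]; a tiny shape keeps the demo readable.
n = 8
input0 = [float(i) for i in range(n)]
input1 = [2.0 * i for i in range(n)]
output0 = elementwise_add(input0, input1)
print(output0)
```

A reference like this can be handy for checking the results of the compiled kernel once Step-3 is available.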
from antares.
thank you
from antares.
Thanks for waiting. The feature is now supported. Please try the commands below:
# Upgrade Antares version
$ pip3 install antares==0.3.15.1 --upgrade
# Your gfx1031 card is not the first AMD GPU in the device list, so you need this to select the correct GPU index.
$ export DEVICE_ID=1
# Save the code that you want to compile for Windows ROCm
$ AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares save ./amd_example.cpp
# Compile it into a clean folder: dest-outputs
$ AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares compile ./amd_example.cpp dest-outputs/
# You'll get a simple project that can be built with MINGW64 or VC++ and no longer depends on WSL.
$ cd dest-outputs/ && make
After that, the two files `kernels.bin` and `main.exe` (kept in the same folder) can be run on Windows without GCC or WSL dependencies.
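If you would rather script these steps, the sketch below only assembles the two `antares` command lines from the values used above and prints them; it does not run anything. The helper names `windows_rocm_commands` and `render` are hypothetical, and passing `DEVICE_ID` inline instead of via `export` is an assumption made for compactness.

```python
import shlex
from typing import Dict, List, Tuple

Command = Tuple[Dict[str, str], List[str]]


def windows_rocm_commands(gfx: str, source: str, outdir: str, device_id: int) -> List[Command]:
    """Return (extra environment, argv) pairs mirroring the save and compile steps."""
    env = {"AMDGFX": gfx, "BACKEND": "c-rocm_win64", "DEVICE_ID": str(device_id)}
    return [
        (env, ["antares", "save", source]),
        (env, ["antares", "compile", source, outdir]),
    ]


def render(command: Command) -> str:
    """Format one command the way it would be typed in a POSIX shell."""
    env, argv = command
    assignments = [f"{key}={shlex.quote(value)}" for key, value in sorted(env.items())]
    return " ".join(assignments + [shlex.quote(arg) for arg in argv])


commands = windows_rocm_commands("gfx1031", "./amd_example.cpp", "dest-outputs/", device_id=1)
for command in commands:
    print(render(command))
```

To actually execute a pair, one option is `subprocess.run(argv, env={**os.environ, **env}, check=True)` inside WSL, followed by `make` in the output folder.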
from antares.
Awesome! I will try it out!
from antares.
I'm getting this error:
comp@U_03:~/bm3dhip$ dir
dest-outputs kernel.hip.cpp source.cpp
comp@U_03:~/bm3dhip$ AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares save ./kernel.hip.cpp
>> Backend = c-rocm_win64, Python PID = 16261, Task = lang.generic;
// ---------------------------------------------------------------------------
// GLOBALS: input0:float32[1024, 512], input1:float32[1024, 512] -> output0:float32[1024, 512]
// BACKEND: c-rocm_win64 (default)
// CONFIG: null
// COMPUTE_V1: - einstein_v2("output0[N, M] = input0[N, M] + input1[N, M]", input_dict={"input0": {"dtype": "float32", "shape": [1024, 512]}, "input1": {"dtype": "float32", "shape": [1024, 512]}})
// ---------------------------------------------------------------------------
// LOCAL: template_op_kernel0 -- input0:float32[1024, 512], input1:float32[1024, 512] -> output0:float32[1024, 512]
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>
#ifndef __ROCM_COMMON_MACRO__
#define __ROCM_COMMON_MACRO__
#define __ITEM_0_OF__(v) (v).x
#define __ITEM_1_OF__(v) (v).y
#define __ITEM_2_OF__(v) (v).z
#define __ITEM_3_OF__(v) (v).w
#define __STORE_ITEM_0__(t, out, ido, in, idi) *(t*)(out + ido) = *(t*)(in + idi)
#define __STORE_ITEM_1__(t, out, ido, in, idi)
#define __STORE_ITEM_2__(t, out, ido, in, idi)
#define __STORE_ITEM_3__(t, out, ido, in, idi)
#define __AMDGFX__ gfx1031
#endif
extern "C" __global__ __launch_bounds__(1) void template_op_kernel0(float* __restrict__ input0, float* __restrict__ input1, float* __restrict__ output0) {
// [thread_extent] blockIdx.x = 1024
// [thread_extent] threadIdx.x = 1
// [thread_extent] blockIdx.y = 512
// [thread_extent] threadIdx.y = 1
output0[(((((int)blockIdx.x) * 512) + ((int)blockIdx.y)))] = (input0[(((((int)blockIdx.x) * 512) + ((int)blockIdx.y)))] + input1[(((((int)blockIdx.x) * 512) + ((int)blockIdx.y)))]);
}
// ---------------------------------------------------------------------------
[EvalAgent] Compiling Evaluator: x86_64-w64-mingw32-g++ /home/comp/.local/lib/python3.8/site-packages/antares_core/backends/c-rocm_win64/../../graph_evaluator/run_graph.cpp -o /home/comp/.cache/antares/evaluator.c-rocm_win64.tmp -D__BACKEND__=\"c-rocm_win64\" -D__BACKEND_rocm_win64__ -I/home/comp/.local/lib/python3.8/site-packages/antares_core/backends/c-rocm_win64/include -std=c++17 -Wno-string-compare -Wno-unused-result -Wno-unused-value -O2 -static -lpthread
[EvalAgent] Evaluating Modules .. (with backend = c-rocm_win64)
'\\wsl.localhost\Ubuntu-20.04\home\comp\.cache\antares\cache\_'
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported. Defaulting to Windows directory.
+ /opt/rocm/bin/hipcc /tmp/.antares-module-tempfile.cu --amdgpu-target=gfx1031 --genco -Wno-ignored-attributes -O2 -o /tmp/.antares-module-tempfile.cu.out
[EvalAgent] Results = {"K/0": 1504583185.0, "TPR": 0.00486145}
[Antares] Average time cost / run = 0.00486145 sec, 0.107846 gflops. (Checked: None)
comp@U_03:~/bm3dhip$ AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares compile ./kernel.hip.cpp deest-outputs/
>> Backend = c-rocm_win64, Python PID = 19458, Task = lang.generic;
// ---------------------------------------------------------------------------
// GLOBALS: input0:float32[1024, 512], input1:float32[1024, 512] -> output0:float32[1024, 512]
// BACKEND: c-rocm_win64 (default)
// CONFIG: null
// COMPUTE_V1: - einstein_v2("output0[N, M] = input0[N, M] + input1[N, M]", input_dict={"input0": {"dtype": "float32", "shape": [1024, 512]}, "input1": {"dtype": "float32", "shape": [1024, 512]}})
// GLOBALS: input0:float32[1024, 512], input1:float32[1024, 512] -> output0:float32[1024, 512]
// BACKEND: c-rocm_win64 (default)
// CONFIG: null
// COMPUTE_V1: - einstein_v2("output0[N, M] = input0[N, M] + input1[N, M]", input_dict={"input0": {"dtype": "float32", "shape": [1024, 512]}, "input1": {"dtype": "float32", "shape": [1024, 512]}})
// ---------------------------------------------------------------------------
// LOCAL: template_op_kernel0 -- input0:float32[1024, 512], input1:float32[1024, 512] -> output0:float32[1024, 512]
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>
#ifndef __ROCM_COMMON_MACRO__
#define __ROCM_COMMON_MACRO__
#define __ITEM_0_OF__(v) (v).x
#define __ITEM_1_OF__(v) (v).y
#define __ITEM_2_OF__(v) (v).z
#define __ITEM_3_OF__(v) (v).w
#define __STORE_ITEM_0__(t, out, ido, in, idi) *(t*)(out + ido) = *(t*)(in + idi)
#define __STORE_ITEM_1__(t, out, ido, in, idi)
#define __STORE_ITEM_2__(t, out, ido, in, idi)
#define __STORE_ITEM_3__(t, out, ido, in, idi)
#define __AMDGFX__ gfx1031
#endif
extern "C" __global__ __launch_bounds__(1) void template_op_kernel0(float* __restrict__ input0, float* __restrict__ input1, float* __restrict__ output0) {
// [thread_extent] blockIdx.x = 1024
// [thread_extent] threadIdx.x = 1
// [thread_extent] blockIdx.y = 512
// [thread_extent] threadIdx.y = 1
output0[(((((int)blockIdx.x) * 512) + ((int)blockIdx.y)))] = (input0[(((((int)blockIdx.x) * 512) + ((int)blockIdx.y)))] + input1[(((((int)blockIdx.x) * 512) + ((int)blockIdx.y)))]);
}
// ---------------------------------------------------------------------------
[EvalAgent] Evaluating Modules .. (with backend = c-rocm_win64)
'\\wsl.localhost\Ubuntu-20.04\home\comp\.cache\antares\cache\_'
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported. Defaulting to Windows directory.
+ /opt/rocm/bin/hipcc /tmp/.antares-module-tempfile.cu --amdgpu-target=gfx1031 --genco -Wno-ignored-attributes -O2 -o /tmp/.antares-module-tempfile.cu.out
[EvalAgent] Results = {"K/0": 1504583185.0, "TPR": 0.00469551}
[Antares] Average time cost / run = 0.00469551 sec, 0.111657 gflops. (Checked: None)
'\\wsl.localhost\Ubuntu-20.04\home\comp\.cache\antares\cache\_'
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported. Defaulting to Windows directory.
+ /opt/rocm/bin/hipcc /tmp/.antares-module-tempfile.cu --amdgpu-target=gfx1031 --genco -Wno-ignored-attributes -O2 -o /tmp/.antares-module-tempfile.cu.out
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Traceback (most recent call last):
File "./antares/antares_compiler.py", line 712, in <module>
main_compute()
File "./antares/antares_compiler.py", line 618, in main_compute
hex_code = eval_client.eval(kernel_path=kernel_path, dev_id=0, backend_root=backend_root, compile=1)['HEX']
KeyError: 'HEX'
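As a reading aid for the log above, here is a small host-side sketch of what the generated kernel's indexing and the reported throughput amount to. It copies the index expression `blockIdx.x * 512 + blockIdx.y` and the thread extents from the kernel comments (one thread per block), and it assumes one floating-point add per output element when recomputing gflops. The names `flat_index` and `gflops` are illustrative, not Antares functions.

```python
def flat_index(block_x: int, block_y: int, width: int = 512) -> int:
    """Host-side copy of the kernel's index expression: blockIdx.x * 512 + blockIdx.y."""
    return block_x * width + block_y


GRID_X, GRID_Y = 1024, 512  # thread extents taken from the kernel comments

# Each (blockIdx.x, blockIdx.y) pair should touch exactly one distinct element.
visited = {flat_index(x, y) for x in range(GRID_X) for y in range(GRID_Y)}
covers_all = visited == set(range(GRID_X * GRID_Y))


def gflops(elements: int, seconds_per_run: float) -> float:
    """Throughput, assuming one floating-point add per output element."""
    return elements / seconds_per_run / 1e9


# TPR values as printed in the two [EvalAgent] result lines.
save_gflops = gflops(GRID_X * GRID_Y, 0.00486145)
compile_gflops = gflops(GRID_X * GRID_Y, 0.00469551)
print(covers_all, round(save_gflops, 6), round(compile_gflops, 6))
```

Under that one-add-per-element assumption the recomputed values agree with the gflops figures in the log, which suggests the kernel itself evaluates correctly and the failure happens later, in the `compile` step.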
from antares.
I'm trying to compile a VapourSynth filter that was written in CUDA. It consists of the CUDA/HIP kernel kernel.hip.cpp and a VapourSynth wrapper source.cpp, which needs some libraries that are in a "vapoursynth" folder. The end result will be a DLL. If you wish, I can upload the source code.
from antares.
Please try this version: `pip3 install antares==0.3.15.2`. I think the problem is solved.
from antares.
It worked as intended: I got main.exe and kernel.bin. I haven't run it yet. I would like to get a DLL out instead of an EXE; how could I do that? Do I save both kernel.hip.cpp and source.cpp, then compile both into the clean folder, to be built later with MinGW? This is the source I'm trying to work with:
hip_source.zip
from antares.
In a normal ROCm Linux environment I would use this to compile it:
/opt/rocm/hip/bin/hipcc source.cpp kernel.hip.cpp -o libbm3dhip.so -shared -fPIC -std=c++17 -O3 -I/home/comp/vapoursynth -Wno-unused-result --offload-arch=gfx1031 $(/opt/rocm/hip/bin/hipconfig --cxx_config)
from antares.
You have to use the ROCm driver-level API to dispatch kernel workloads on Windows ROCm, and your source code is not written in that style. `antares compile <kernel.cpp> <outdir>` generates a minimal driver-level example that shows how to do this correctly; you have to follow that. `kernel.bin` is the object you get by precompiling the `bm3d` function in WSL via `hipcc --genco ..`, and the HIP APIs in source.cpp such as `hipSetDevice/hipMalloc/..` should be changed according to https://github.com/microsoft/antares/blob/v0.3.x/backends/c-rocm_win64/include/backend.hpp#L24-L25.
from antares.
I see. I will check whether the friend who is developing this can help me when he has time.
from antares.