Git Product home page Git Product logo

Comments (14)

ghostplant avatar ghostplant commented on July 24, 2024

Thanks for you request. Recently we are busy in updating some other functionalities. Later we'll add corresponding doc that allows you to use Rocm program for Windows more easily.

from antares.

ffleader1 avatar ffleader1 commented on July 24, 2024

Second this. I think OP has a 6800M and can run Rocm on Windows. That is awesome.

from antares.

LuisB79 avatar LuisB79 commented on July 24, 2024

Thanks for you request. Recently we are busy in updating some other functionalities. Later we'll add corresponding doc that allows you to use Rocm program for Windows more easily.

I understand, thank you for your progress.

from antares.

ghostplant avatar ghostplant commented on July 24, 2024

Thanks for you request. Recently we are busy in updating some other functionalities. Later we'll add corresponding doc that allows you to use Rocm program for Windows more easily.

I understand, thank you for your progress.

You can try follow 2 steps in advance:
Step-1: Generate Kernel Source

BACKEND=c-rocm_win64 COMPUTE_V1='- einstein_v2("output0[N] = input0[N] + input1[N]", input_dict={"input0": {"dtype": "float32", "shape": [1024 * 512]}, "input1": {"dtype": "float32", "shape": [1024 * 512]}})' antares save kernel0.hip.cc

Step-2: Build Source into HSACO

hipcc kernel0.hip.cc --amdgpu-target=gfx1031 --genco -O2 -o kernel0.hip.hsaco

Step-3: Write win64 programs without WSL to utilize kernel0.hip.hsaco (a little complex, will be updated later)

from antares.

LuisB79 avatar LuisB79 commented on July 24, 2024

Thanks for you request. Recently we are busy in updating some other functionalities. Later we'll add corresponding doc that allows you to use Rocm program for Windows more easily.

I understand, thank you for your progress.

You can try follow 2 steps in advance: Step-1: Generate Kernel Source

BACKEND=c-rocm_win64 COMPUTE_V1='- einstein_v2("output0[N] = input0[N] + input1[N]", input_dict={"input0": {"dtype": "float32", "shape": [1024 * 512]}, "input1": {"dtype": "float32", "shape": [1024 * 512]}})' antares save kernel0.hip.cc

Step-2: Build Source into HSACO

hipcc kernel0.hip.cc --amdgpu-target=gfx1031 --genco -O2 -o kernel0.hip.hsaco

Step-3: Write win64 programs without WSL to utilize kernel0.hip.hsaco (a little complex, will be updated later)

thank you

from antares.

ghostplant avatar ghostplant commented on July 24, 2024

Thanks for your waiting. The feature is now supported. Please try commands below:

# Upgrade Antares version
$ pip3 install antares==0.3.15.1 --upgrade

# Your GFX1031 video card is not as the first rank of AMDGPUs, so you need this to select the correct GPU index.
$ export DEVICE_ID=1

# Save the code that you want to compile for Windows ROCm
$ AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares save ./amd_example.cpp

# Compile it into a clean folder: dest-outputs
$ AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares compile ./amd_example.cpp dest-outputs/

# You'll get a simple project that can be built by MINGW64 or VC++, which is no longer related to WSL.
$ cd dest-outputs/ && make

After that, two files `kernels.bin` and `main.exe` (put together) can be executed on Windows without GCC and WSL dependencies.

from antares.

LuisB79 avatar LuisB79 commented on July 24, 2024
# Upgrade Antares version
$ pip3 install antares==0.3.15.1 --upgrade

# Your GFX1031 video card is not as the first rank of AMDGPUs, so you need this to select the correct GPU index.
$ export DEVICE_ID=1

# Save the code that you want to compile for Windows ROCm
$ AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares save ./amd_example.cpp

# Compile it into a clean folder: dest-outputs
$ AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares compile ./amd_example.cpp dest-outputs/

# You'll get a simple project that can be built by MINGW64 or VC++, which is no longer related to WSL.
$ cd dest-outputs/ && make

After that, two files `kernels.bin` and `main.exe` (put together) can be executed on Windows without GCC and WSL dependencies.

Awesome!, i will try it out!

from antares.

LuisB79 avatar LuisB79 commented on July 24, 2024

i'm getting this error

comp@U_03:~/bm3dhip$ dir
dest-outputs  kernel.hip.cpp  source.cpp
comp@U_03:~/bm3dhip$ AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares save ./kernel.hip.cpp
  >> Backend = c-rocm_win64, Python PID = 16261, Task = lang.generic;

// ---------------------------------------------------------------------------
// GLOBALS: input0:float32[1024, 512], input1:float32[1024, 512] -> output0:float32[1024, 512]
// BACKEND: c-rocm_win64 (default)
// CONFIG: null
// COMPUTE_V1: - einstein_v2("output0[N, M] = input0[N, M] + input1[N, M]", input_dict={"input0": {"dtype": "float32", "shape": [1024, 512]}, "input1": {"dtype": "float32", "shape": [1024, 512]}})


// ---------------------------------------------------------------------------
// LOCAL: template_op_kernel0 -- input0:float32[1024, 512], input1:float32[1024, 512] -> output0:float32[1024, 512]

#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>

#ifndef __ROCM_COMMON_MACRO__
#define __ROCM_COMMON_MACRO__

#define __ITEM_0_OF__(v) (v).x
#define __ITEM_1_OF__(v) (v).y
#define __ITEM_2_OF__(v) (v).z
#define __ITEM_3_OF__(v) (v).w

#define __STORE_ITEM_0__(t, out, ido, in, idi) *(t*)(out + ido) = *(t*)(in + idi)
#define __STORE_ITEM_1__(t, out, ido, in, idi)
#define __STORE_ITEM_2__(t, out, ido, in, idi)
#define __STORE_ITEM_3__(t, out, ido, in, idi)

#define __AMDGFX__ gfx1031

#endif


extern "C" __global__ __launch_bounds__(1) void template_op_kernel0(float* __restrict__ input0, float* __restrict__ input1, float* __restrict__ output0) {
  // [thread_extent] blockIdx.x = 1024
  // [thread_extent] threadIdx.x = 1
  // [thread_extent] blockIdx.y = 512
  // [thread_extent] threadIdx.y = 1
  output0[(((((int)blockIdx.x) * 512) + ((int)blockIdx.y)))] = (input0[(((((int)blockIdx.x) * 512) + ((int)blockIdx.y)))] + input1[(((((int)blockIdx.x) * 512) + ((int)blockIdx.y)))]);
}

// ---------------------------------------------------------------------------

[EvalAgent] Compiling Evaluator: x86_64-w64-mingw32-g++ /home/comp/.local/lib/python3.8/site-packages/antares_core/backends/c-rocm_win64/../../graph_evaluator/run_graph.cpp -o /home/comp/.cache/antares/evaluator.c-rocm_win64.tmp -D__BACKEND__=\"c-rocm_win64\" -D__BACKEND_rocm_win64__ -I/home/comp/.local/lib/python3.8/site-packages/antares_core/backends/c-rocm_win64/include -std=c++17 -Wno-string-compare -Wno-unused-result -Wno-unused-value -O2 -static -lpthread


[EvalAgent] Evaluating Modules .. (with backend = c-rocm_win64)
'\\wsl.localhost\Ubuntu-20.04\home\comp\.cache\antares\cache\_'
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported.  Defaulting to Windows directory.
+ /opt/rocm/bin/hipcc /tmp/.antares-module-tempfile.cu --amdgpu-target=gfx1031 --genco -Wno-ignored-attributes -O2 -o /tmp/.antares-module-tempfile.cu.out

[EvalAgent] Results = {"K/0": 1504583185.0, "TPR": 0.00486145}

[Antares] Average time cost / run = 0.00486145 sec, 0.107846 gflops. (Checked: None)

comp@U_03:~/bm3dhip$ AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares compile ./kernel.hip.cpp deest-output
s/
  >> Backend = c-rocm_win64, Python PID = 19458, Task = lang.generic;

// ---------------------------------------------------------------------------
// GLOBALS: input0:float32[1024, 512], input1:float32[1024, 512] -> output0:float32[1024, 512]
// BACKEND: c-rocm_win64 (default)
// CONFIG: null
// COMPUTE_V1: - einstein_v2("output0[N, M] = input0[N, M] + input1[N, M]", input_dict={"input0": {"dtype": "float32", "shape": [1024, 512]}, "input1": {"dtype": "float32", "shape": [1024, 512]}})

// GLOBALS: input0:float32[1024, 512], input1:float32[1024, 512] -> output0:float32[1024, 512]
// BACKEND: c-rocm_win64 (default)
// CONFIG: null
// COMPUTE_V1: - einstein_v2("output0[N, M] = input0[N, M] + input1[N, M]", input_dict={"input0": {"dtype": "float32", "shape": [1024, 512]}, "input1": {"dtype": "float32", "shape": [1024, 512]}})


// ---------------------------------------------------------------------------
// LOCAL: template_op_kernel0 -- input0:float32[1024, 512], input1:float32[1024, 512] -> output0:float32[1024, 512]

#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>

#ifndef __ROCM_COMMON_MACRO__
#define __ROCM_COMMON_MACRO__

#define __ITEM_0_OF__(v) (v).x
#define __ITEM_1_OF__(v) (v).y
#define __ITEM_2_OF__(v) (v).z
#define __ITEM_3_OF__(v) (v).w

#define __STORE_ITEM_0__(t, out, ido, in, idi) *(t*)(out + ido) = *(t*)(in + idi)
#define __STORE_ITEM_1__(t, out, ido, in, idi)
#define __STORE_ITEM_2__(t, out, ido, in, idi)
#define __STORE_ITEM_3__(t, out, ido, in, idi)

#define __AMDGFX__ gfx1031

#endif


extern "C" __global__ __launch_bounds__(1) void template_op_kernel0(float* __restrict__ input0, float* __restrict__ input1, float* __restrict__ output0) {
  // [thread_extent] blockIdx.x = 1024
  // [thread_extent] threadIdx.x = 1
  // [thread_extent] blockIdx.y = 512
  // [thread_extent] threadIdx.y = 1
  output0[(((((int)blockIdx.x) * 512) + ((int)blockIdx.y)))] = (input0[(((((int)blockIdx.x) * 512) + ((int)blockIdx.y)))] + input1[(((((int)blockIdx.x) * 512) + ((int)blockIdx.y)))]);
}

// ---------------------------------------------------------------------------

[EvalAgent] Evaluating Modules .. (with backend = c-rocm_win64)
'\\wsl.localhost\Ubuntu-20.04\home\comp\.cache\antares\cache\_'
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported.  Defaulting to Windows directory.
+ /opt/rocm/bin/hipcc /tmp/.antares-module-tempfile.cu --amdgpu-target=gfx1031 --genco -Wno-ignored-attributes -O2 -o /tmp/.antares-module-tempfile.cu.out

[EvalAgent] Results = {"K/0": 1504583185.0, "TPR": 0.00469551}

[Antares] Average time cost / run = 0.00469551 sec, 0.111657 gflops. (Checked: None)

'\\wsl.localhost\Ubuntu-20.04\home\comp\.cache\antares\cache\_'
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported.  Defaulting to Windows directory.
+ /opt/rocm/bin/hipcc /tmp/.antares-module-tempfile.cu --amdgpu-target=gfx1031 --genco -Wno-ignored-attributes -O2 -o /tmp/.antares-module-tempfile.cu.out
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Traceback (most recent call last):
  File "./antares/antares_compiler.py", line 712, in <module>
    main_compute()
  File "./antares/antares_compiler.py", line 618, in main_compute
    hex_code = eval_client.eval(kernel_path=kernel_path, dev_id=0, backend_root=backend_root, compile=1)['HEX']
KeyError: 'HEX'

from antares.

LuisB79 avatar LuisB79 commented on July 24, 2024

i'm trying to compile a vapoursynth filter that was made in cuda, it has the cuda or hip kernel kernel.hip.cpp and a vapoursynth wrapper source.cpp which needs some libraries that are in a "vapoursynth" folder, the end result will be a dll. If you wish i could upload the source code.

from antares.

ghostplant avatar ghostplant commented on July 24, 2024

i'm trying to compile a vapoursynth filter that was made in cuda, it has the cuda or hip kernel kernel.hip.cpp and a vapoursynth wrapper source.cpp which needs some libraries that are in a "vapoursynth" folder, the end result will be a dll. If you wish i could upload the source code.

Please try this version: pip3 install antares==0.3.15.2. I think the problem is solved.

from antares.

LuisB79 avatar LuisB79 commented on July 24, 2024

i'm trying to compile a vapoursynth filter that was made in cuda, it has the cuda or hip kernel kernel.hip.cpp and a vapoursynth wrapper source.cpp which needs some libraries that are in a "vapoursynth" folder, the end result will be a dll. If you wish i could upload the source code.

Please try this version: pip3 install antares==0.3.15.2. I think the problem is solved.

it did worked as intended, i got the main.exe and the kernel.bin, i haven't runned it yet, i'm searching to get a dll out instead of an exe, how could i do that?, do i save both kernel.hip.cpp and source.cpp, then compile both into the clean folder? to later be compiled in mingw?, this is the source i'm trying to work with.
hip_source.zip

from antares.

LuisB79 avatar LuisB79 commented on July 24, 2024

in a normal rocm linux environment i would need to use this to compile it/opt/rocm/hip/bin/hipcc source.cpp kernel.hip.cpp -o libbm3dhip.so -shared -fPIC -std=c++17 -O3 -I/home/comp/vapoursynth -Wno-unused-result --offload-arch=gfx1031 $(/opt/rocm/hip/bin/hipconfig --cxx_config)

from antares.

ghostplant avatar ghostplant commented on July 24, 2024

You have to use ROCm driver-level API to dispatch kernel workloads for Windows ROCm. Definitely, your source code is not based on that style. antares compile <kernel.cpp> <outdir> generates a minimum driver-level example that shows you on how to do this correctly. You have to follow that.

kernel.bin is the object that you should precompile bm3d function in wsl via hipcc --genco .., and for hip APIs like hipSetDevice/hipMalloc/.. in source.cpp, you should change them according to https://github.com/microsoft/antares/blob/v0.3.x/backends/c-rocm_win64/include/backend.hpp#L24-L25.

from antares.

LuisB79 avatar LuisB79 commented on July 24, 2024

I see, i will see if the friend who is developing this can help me when he haves time.

from antares.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.