rocm_sdk_builder's Introduction

ROCM SDK Builder

Purpose

ROCM SDK Builder provides an easy and customizable way to build and install the AMD ROCm machine learning environment on your Linux computer, with support for consumer-level GPUs. The current version is based on ROCm release 6.1.2 but contains a lot of patches and optimizations on top of it.

In addition, ROCM SDK Builder will by default also build and install additional tools and frameworks such as python, pytorch, jupyter-notebook, onnxruntime and deepspeed, built specifically with AMD GPUs as the target. The SDK will be installed under the /opt/rocm_sdk_ directory.

As a new feature, ROCM SDK Builder 6.1.2 now also supports building extra applications which are not built by default:

  • stable diffusion webui
  • llama.cpp
  • vllm

These can be built with commands like ./babs.sh -b binfo/extra/llama.cpp after the base build has been done.

Pytorch gpu benchmarks

This project has so far been tested with at least the following AMD GPUs:

  • AMD RX 7900 XTX (gfx1100)
  • AMD RX 7800 XT (gfx1101)
  • AMD RX 7700S/Framework Laptop 16 (gfx1102)
  • AMD Radeon 780M Laptop iGPU (gfx1103)
  • AMD RX 6800 XT (gfx1030)
  • AMD RX 6800 (gfx1030)
  • AMD RX 6600 (gfx1032)
  • AMD RX 5700 (gfx1010)
  • AMD RX 5500 (gfx1012)
  • AMD Radeon 680M Laptop iGPU (gfx1035)
  • gfx1036

AMD RX 5500 and AMD RX 6600 support is at the moment only partial.

For AMD RX 6600, select the RX 6800 (gfx1030) as a build target and set the extra environment variable HSA_OVERRIDE_GFX_VERSION=10.3.0.
For AMD RX 5500, select the RX 5700 (gfx1010) as a build target and set the extra environment variable HSA_OVERRIDE_GFX_VERSION=10.1.0.
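
The two overrides above can be captured in a small lookup so that launch scripts do not hard-code them. This is only a sketch: gfx_override_for is a hypothetical helper name, not part of ROCM SDK Builder, and it encodes just the two mappings stated above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the builder): print the
# HSA_OVERRIDE_GFX_VERSION value for a GPU that borrows another card's
# build target. Only the two mappings from the text above are included.
gfx_override_for() {
    case "$1" in
        gfx1032) echo "10.3.0" ;;  # RX 6600, built with the gfx1030 target
        gfx1012) echo "10.1.0" ;;  # RX 5500, built with the gfx1010 target
        *)       return 1 ;;       # no override listed for this target
    esac
}

# Usage: export the override only when one is listed for the card.
# if override="$(gfx_override_for gfx1032)"; then
#     export HSA_OVERRIDE_GFX_VERSION="$override"
# fi
```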

In the configuration it is also possible to select other GPUs as the build target, but with some of the older cards more work may be needed. All kinds of feedback are more than welcome and can be discussed, for example, by opening a new issue on GitHub.

Pytorch with AMD GPU

Installation Requirements

ROCM SDK Builder has been tested on Mageia 9, Fedora 39, Fedora 40, Ubuntu 22.04, Ubuntu 23.10, Ubuntu 24.04, Linux Mint 21, Arch, Manjaro and Void Linux distributions.

The build system itself is written in bash to keep external dependencies to a minimum, but the applications that are built and installed have their own build-time dependencies, which can be installed by executing the script:

# ./install_deps.sh

To reduce the run-time dependency variance between different distributions, the build system will itself build and install a standalone python 3.9, which seems to be a pretty trouble-free version with the currently used pytorch rocm-components.

You also need to use the git config command to set your git username and email address; otherwise the 'git am' command that the project uses for applying patches on top of the upstream code versions will fail. This can be done in the following way:

# git config --global user.name "John Doe"
# git config --global user.email "john.doe@example.com"

ROCM SDK Builder will require about 130 GB of free space to build the newest rocm 6.1.2 version. This is mostly divided in the following way:

- src_projects directory, for source code, about 30 GB
- builddir directory for temporary files, about 75 GB
- /opt/rocm_sdk_612, install directory for the sdk, about 20 GB

Once the build is ready, the 'builddir' and 'src_projects' directories can be deleted to free up space. As downloading the sources from scratch can take some time, I recommend keeping at least the source directory.
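
Before starting a build it may be worth checking that enough space is available. The following is a minimal sketch, assuming a POSIX df; has_free_gb is a hypothetical helper and not part of the builder. The build directories and /opt may sit on different filesystems, so each location may need its own check.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the builder): succeed if the filesystem
# that holds directory $1 has at least $2 GB available.
# "df -Pk" prints POSIX-format output in 1024-byte blocks; the fourth
# column of the second line is the available space.
has_free_gb() {
    avail_kb="$(df -Pk "$1" | awk 'NR == 2 { print $4 }')"
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Usage from the rocm_sdk_builder checkout:
# has_free_gb . 130 || echo "less than 130 GB free here" >&2
```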

Installation Directory and Environment Variables

ROCM SDK Builder will by default install the SDK to the /opt/rocm_sdk_ directory. The paths and other environment variables required to execute the applications installed by the SDK can be loaded by executing the command:

# source /opt/rocm_sdk_<version>/bin/env_rocm.sh

Note that this command needs to be executed only once for each bash terminal session, even though we run it in every example below.
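
In scripts it can help to fail early when the environment script is missing, for example because the SDK has not been installed yet. A minimal sketch follows; load_rocm_env is a hypothetical wrapper name, and only the bin/env_rocm.sh location is taken from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the SDK): load the SDK environment from
# the install directory given as $1, but only if env_rocm.sh is readable.
load_rocm_env() {
    env_script="$1/bin/env_rocm.sh"
    if [ ! -r "$env_script" ]; then
        echo "not found: $env_script" >&2
        return 1
    fi
    # "." is the portable spelling of bash's "source"
    . "$env_script"
}

# Usage:
# load_rocm_env /opt/rocm_sdk_612 || exit 1
```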

How to Build and Install ROCm SDK

The following commands will download the rocm sdk 6.1.2 project sources and then build and install rocm_sdk version 6.1.2 to the /opt/rocm_sdk_612 folder.

# git clone https://github.com/lamikr/rocm_sdk_builder.git
# cd rocm_sdk_builder
# git checkout releases/rocm_sdk_builder_612
# ./babs.sh -i
# ./babs.sh -b

The SDK builder will pop up the GPU selection for the SDK build targets before the build starts, and the selections will be stored in the build_cfg.user file. Configuration can also be done afterwards with the ./babs.sh -c command. Note that a build-configuration change does not automatically cause a rebuild of already built projects. To force that, you need to remove the projects you want to rebuild from the builddir folder. To force rebuilding everything, simply remove the 'builddir' directory completely.

GPU Selection for ROCm SDK Build Target

Test the installed SDK

Setup the rocm_sdk

The ROCm SDK builder environment first needs to be loaded into environment variables such as PATH with the following command:

# source /opt/rocm_sdk_612/bin/env_rocm.sh

Note that this command needs to be executed only once for each bash terminal session, even though we run it in every example below.

Verify your GPU with ROCM SDK

# source /opt/rocm_sdk_612/bin/env_rocm.sh
# rocminfo

This command should list both your CPU and AMD GPU as agents and give information about their capabilities.
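
To see at a glance which gfx targets the agents report, the output can be filtered. A minimal sketch, assuming that rocminfo prints the target names as "gfx" followed by hexadecimal digits; list_gfx_targets is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read rocminfo output on stdin and print each
# distinct gfx identifier once. Assumes identifiers look like "gfx"
# followed by one or more hex digits.
list_gfx_targets() {
    grep -o 'gfx[0-9a-f][0-9a-f]*' | sort -u
}

# Usage (after sourcing env_rocm.sh):
# rocminfo | list_gfx_targets
```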

Test Pytorch install

# source /opt/rocm_sdk_612/bin/env_rocm.sh
# cd /opt/rocm_sdk_612/docs/examples/pytorch
# ./run_pytorch_gpu_simple_test.sh

Test Jupyter-notebook usage with Pytorch.

The following command tests that jupyter-notebook opens properly and shows information about the installed pytorch version and your GPU. (Note that AMD GPUs are also handled as CUDA GPUs in pytorch terminology.)

# source /opt/rocm_sdk_612/bin/env_rocm.sh
# cd /opt/rocm_sdk_612/docs/examples/pytorch
# jupyter-notebook pytorch_amd_gpu_intro.ipynb

Test Pytorch MIGraphX integration

# source /opt/rocm_sdk_612/bin/env_rocm.sh
# cd /opt/rocm_sdk_612/docs/examples/pytorch
# python test_torch_migraphx_resnet50.py

Test MIGraphX

# source /opt/rocm_sdk_612/bin/env_rocm.sh
# cd /opt/rocm_sdk_612/docs/examples/migraphx
# ./test_migraphx_install.sh

Test ONNXRuntime

# source /opt/rocm_sdk_612/bin/env_rocm.sh
# cd /opt/rocm_sdk_612/docs/examples/onnxruntime
# python test_onnxruntime_providers.py

This should print: ['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']
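
A script can check the printed list for a specific provider instead of comparing it by eye. This is a sketch under the assumption that the test script prints the list in the form shown above; has_provider is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if provider name $2 appears, in single
# quotes, inside the provider list text given as $1.
has_provider() {
    case "$1" in
        *"'$2'"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage (assumes the test script prints the list as shown above):
# providers="$(python test_onnxruntime_providers.py)"
# has_provider "$providers" ROCMExecutionProvider || echo "ROCm provider missing" >&2
```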

Test HIPCC compiler

The following example shows how to transfer data to the GPU and back by using hipcc.

# source /opt/rocm_sdk_612/bin/env_rocm.sh
# cd /opt/rocm_sdk_612/docs/examples/hipcc/hello_world
# ./build.sh

Test OpenCL Integration

The following code prints out some information about the OpenCL platform and the devices found.

# source /opt/rocm_sdk_612/bin/env_rocm.sh
# cd /opt/rocm_sdk_612/docs/examples/opencl/check_opencl_caps
# make
# ./check_opencl_caps

The following code sends 200 numbers to a GPU kernel, which modifies them and sends them back to userspace.

# source /opt/rocm_sdk_612/bin/env_rocm.sh
# cd /opt/rocm_sdk_612/docs/examples/opencl/hello_world
# make
# ./hello_world

Run Pytorch GPU Benchmark

This test is pretty extensive and takes about 50 minutes on an RX 6800. Test results are collected to the result folder, but the python code which is supposed to parse the results from the CSV files and plot pictures needs to be fixed.

Results for different AMD and Nvidia GPUs are available in results folder.

# git clone https://github.com/lamikr/pytorch-gpu-benchmark/
# cd pytorch-gpu-benchmark
# source /opt/rocm_sdk_612/bin/env_rocm.sh
# ./test.sh

Customizing the SDK Build

Here is short but more detailed information on how the SDK build works and how it can be modified.

Selecting the GPUs for which to build the SDK

GPUs can be selected with the ./babs.sh -c option, which opens a checkbox dialog and saves the selections to the build_cfg.user file.

./babs.sh -c

Adding new GPUs to the list of supported ones should be relatively easy; at the moment I have only added support for the ones I have been able to test myself. At some point I also had support working for the older AMD G2400, but I have not had time to integrate those changes into the newer rocm sdk. (I had it working for rocm sdk 3.0.0.)

Adding New Projects to SDK for build and install

New projects can be added to the builder by modifying files in the binfo directory.

  • First you need to create the <build_order_number>_.binfo file, where you specify details for the project such as the source code location, configure flags and build commands. By default the build system will use the cmake and make commands for building the projects, but you can override those by supplying your own BINFO array commands if the project's standard install command needs some customization. You can check the details from the existing .binfo files, but the principle is the following:
BINFO_APP_POST_INSTALL_CMD_ARRAY=(
    "if [ ! -e ${INSTALL_DIR_PREFIX_SDK_ROOT}/lib/cmake ]; then mkdir -p ${INSTALL_DIR_PREFIX_SDK_ROOT}/lib/cmake; fi"
    "if [ ! -e ${INSTALL_DIR_PREFIX_SDK_ROOT}/lib/libhsakmt.so ]; then ln -s ${INSTALL_DIR_PREFIX_SDK_ROOT}/lib/libhsakmt.so.1 ${INSTALL_DIR_PREFIX_SDK_ROOT}/lib/libhsakmt.so; fi"
)
  • Then you need to add your <build_order_number>_.binfo file to the binfo/binfo_list.sh file.
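
To illustrate how an array of command strings like the one above could be consumed, here is a minimal sketch that runs each string in order and stops at the first failure. It is not the builder's actual implementation, which lives in build/build.sh and may differ; run_cmds is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only, not the builder's real code: run each command string given
# as an argument, in order, and stop at the first one that fails.
run_cmds() {
    for cmd in "$@"; do
        echo "running: $cmd"
        eval "$cmd" || return 1
    done
}

# Usage with a BINFO-style bash array:
# run_cmds "${BINFO_APP_POST_INSTALL_CMD_ARRAY[@]}"
```

Because each entry is passed through eval, compound commands such as the "if ...; then ...; fi" strings shown above work as single entries.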

ROCM SDK Builder Major Components

  • babs.sh, build/build.sh and build/binfo_utils.sh provide the framework for the build system and can be used more or less without modifications in other projects as well. You can get help for the available babs (an acronym for "babs ain't patch build system") commands with the '-h' argument.
[lamikr@localhost rocm_sdk_builder (master)]$ ./babs.sh -h
babs (babs ain't patch build system)

usage:
-h or --help:           Show this help
-i or --init:           Download git repositories listed in binfo directory to 'src_projects' directory
                        and apply all patches from 'patches' directory.
-ap or --apply_patches: Scan 'patches/rocm-version' directory and apply each patch
                        on top of the repositories in 'src_projects' directory.
-co or --checkout:      Checkout version listed in binfo files for each git repository in src_projects directory.
                        Apply of patches of top of the checked out version needs to be performed separately with '-ap' command.
-f or --fetch:          Fetch latest source code for all repositories.
                        Checkout of fetched sources needs to be performed separately with '-co' command.
                        Possible subprojects needs to be fetched separately with '-fs' command. (after '-co' and '-ap')
-fs or --fetch_submod:  Fetch and checkout git submodules for all repositories which have them.
-b or --build:          Start or continue the building of rocm_sdk.
                        Build files are located under 'builddir' directory and install is done under '/opt/rocm_sdk_version' directory.
-v or --version:        Show babs build system version information
  • The binfo folder contains the recipes for each project that is to be built. These recipes intentionally do not support listing dependencies; instead, the build order is managed in the binfo/binfo_list.sh file.

  • The patches directory contains the patches that are applied on top of each project that is downloaded from its upstream repository.

  • src_projects is the directory under which each sub-project's source code is downloaded from the internet.

  • builddir is the location where each project is built before install, and it works as a temporary work environment. The build system can be cleaned to force a rebuild either by removing individual projects from the builddir folder or by removing the whole builddir folder. More fine-grained tuning is also possible by deleting the build-phase result files. (builddir/project/.result_preconfig/config/postconfig/build/install/postinstall)

Rebuilding Individual Projects

Note that the builder will always build projects in the order listed in the binfo/binfo_list.sh file. If you have, for example, made some changes to the project source code under the 'src_projects' directory, rebuilding of individual projects can be triggered in two different ways:

  • deleting the project specific directory from the builddir
  • removing the .result_* files under the build directory.

For example:

# rm -rf builddir/037_magma (would trigger to re-run all build phases)
# rm -f builddir/037_magma/.result_install (would trigger to re-run only the install phase)
# ./babs.sh -b
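
The two removal variants can be wrapped in one small function. A minimal sketch: force_rebuild is a hypothetical helper, and the .result_<phase> file naming is taken from the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: force_rebuild BUILDDIR PROJECT [PHASE]
# Without PHASE the whole project build directory is removed, so every
# build phase runs again. With PHASE (for example "install") only that
# phase's .result_ marker file is removed.
force_rebuild() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: force_rebuild BUILDDIR PROJECT [PHASE]" >&2
        return 1
    fi
    if [ -n "${3:-}" ]; then
        rm -f "$1/$2/.result_$3"
    else
        rm -rf "$1/$2"
    fi
}

# Usage, followed by a normal build:
# force_rebuild builddir 037_magma install && ./babs.sh -b
```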

Additional build commands

./babs.sh also has the following commands:

# ./babs.sh -co (checks out the sources back to the base version for all projects)
# ./babs.sh -ap (applies patches to the checked-out sources for all projects)
# ./babs.sh -f (fetches the latest sources for all projects)
# ./babs.sh -fs (fetches and checks out the git submodules for all projects that have them)
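
According to the babs.sh help text shown earlier, a fetch has to be followed by a separate checkout and patch step, with submodules fetched after those. A minimal sketch of chaining the four steps in that order follows; refresh_sources is a hypothetical wrapper, not a babs.sh feature.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: refresh_sources PATH_TO_BABS
# Runs fetch, checkout, apply-patches and submodule fetch in the order the
# babs.sh help text describes, and stops at the first failing step.
refresh_sources() {
    for flag in -f -co -ap -fs; do
        "$1" "$flag" || { echo "babs step $flag failed" >&2; return 1; }
    done
}

# Usage from the rocm_sdk_builder checkout:
# refresh_sources ./babs.sh && ./babs.sh -b
```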

GPU benchmarks

  • A very simple benchmark is available by executing the commands:
# source /opt/rocm_sdk_612/bin/env_rocm.sh
# jupyter-notebook /opt/rocm_sdk_612/docs/examples/pytorch/pytorch_simple_cpu_vs_gpu_benchmark.ipynb

Pytorch simple CPU vs GPU benchmark

Copyright (C) 2024 by Mika Laitio. Some of the files in this project are licensed under the LGPL 2.1 and some others under the COFFEEWARE license.
See the COPYING file for details.

rocm_sdk_builder's People

Contributors

lamikr, daniandtheweb, jeroen-mostert, flip111, hsmalley, jassoncordones, stefan-olt, mritunjaymusale

rocm_sdk_builder's Issues

integration of stable diffusion and some other interesting machine learning tools

At the moment the rocm sdk builder stack provides the base for integrating and using many nice ML projects but does not itself include them.

Some of the projects, like openai whisper, are easy to take into use with pip install, but I am thinking that there could perhaps be room for adding a second layer that installs apps directly. They would not be installed automatically, but we could still have binfo files to build them after the core has been built.

At the moment I am thinking of some language tools, audio tools and visual tools that could be easily usable.
(whisper, automatic1111, comfyui, sd.cpp-webui, shark for example?)

Not sure whether this second layer would include only the libraries themselves or also attempts to improve the default models that these projects provide.

Build fails with /usr/bin/ld: "cannot find -lmsgpackc-cxx: No such file or directory" for gfx1100

Tried to build it for an AMD RX 7900 XTX (gfx1100). The build fails on Ubuntu 24.04 LTS with the following error:

[ 10%] Linking CXX shared library ../lib/libmigraphx.so
cd /home/bart0/repos/rocm_sdk_builder/builddir/035_AMDMIGraphX/src && /usr/bin/cmake -E cmake_link_script CMakeFiles/migraphx.dir/link.txt --verbose=1
/opt/rocm_sdk_611/bin/clang++ -fPIC -O3 -DNDEBUG -L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hsa/lib -L/opt/rocm_sdk_611/rocblas/lib -L/opt/rocm_sdk_611/hcc/lib -shared -Wl,-soname,libmigraphx.so.2009000 -o ../lib/libmigraphx.so.2009000.0.60101 CMakeFiles/migraphx.dir/adjust_allocation.cpp.o CMakeFiles/migraphx.dir/analyze_streams.cpp.o CMakeFiles/migraphx.dir/apply_alpha_beta.cpp.o CMakeFiles/migraphx.dir/argument.cpp.o CMakeFiles/migraphx.dir/autocast_fp8.cpp.o CMakeFiles/migraphx.dir/auto_contiguous.cpp.o CMakeFiles/migraphx.dir/common.cpp.o CMakeFiles/migraphx.dir/common_dims.cpp.o CMakeFiles/migraphx.dir/compile_src.cpp.o CMakeFiles/migraphx.dir/convert_to_json.cpp.o CMakeFiles/migraphx.dir/cpp_generator.cpp.o CMakeFiles/migraphx.dir/dead_code_elimination.cpp.o CMakeFiles/migraphx.dir/dom_info.cpp.o CMakeFiles/migraphx.dir/dynamic_loader.cpp.o CMakeFiles/migraphx.dir/eliminate_allocation.cpp.o CMakeFiles/migraphx.dir/eliminate_common_subexpression.cpp.o CMakeFiles/migraphx.dir/eliminate_concat.cpp.o CMakeFiles/migraphx.dir/eliminate_contiguous.cpp.o CMakeFiles/migraphx.dir/eliminate_convert.cpp.o CMakeFiles/migraphx.dir/eliminate_data_type.cpp.o CMakeFiles/migraphx.dir/eliminate_identity.cpp.o CMakeFiles/migraphx.dir/eliminate_pad.cpp.o CMakeFiles/migraphx.dir/env.cpp.o CMakeFiles/migraphx.dir/file_buffer.cpp.o CMakeFiles/migraphx.dir/fp_to_double.cpp.o CMakeFiles/migraphx.dir/fuse_concat.cpp.o CMakeFiles/migraphx.dir/fuse_pointwise.cpp.o CMakeFiles/migraphx.dir/fuse_reduce.cpp.o CMakeFiles/migraphx.dir/generate.cpp.o CMakeFiles/migraphx.dir/inline_module.cpp.o CMakeFiles/migraphx.dir/insert_pad.cpp.o CMakeFiles/migraphx.dir/instruction.cpp.o CMakeFiles/migraphx.dir/json.cpp.o CMakeFiles/migraphx.dir/layout_nhwc.cpp.o CMakeFiles/migraphx.dir/load_save.cpp.o CMakeFiles/migraphx.dir/make_op.cpp.o CMakeFiles/migraphx.dir/memory_coloring.cpp.o CMakeFiles/migraphx.dir/module.cpp.o CMakeFiles/migraphx.dir/msgpack.cpp.o 
CMakeFiles/migraphx.dir/normalize_attributes.cpp.o CMakeFiles/migraphx.dir/normalize_ops.cpp.o CMakeFiles/migraphx.dir/op_enums.cpp.o CMakeFiles/migraphx.dir/operation.cpp.o CMakeFiles/migraphx.dir/optimize_module.cpp.o CMakeFiles/migraphx.dir/pad_calc.cpp.o CMakeFiles/migraphx.dir/pass.cpp.o CMakeFiles/migraphx.dir/pass_manager.cpp.o CMakeFiles/migraphx.dir/permutation.cpp.o CMakeFiles/migraphx.dir/preallocate_param.cpp.o CMakeFiles/migraphx.dir/process.cpp.o CMakeFiles/migraphx.dir/program.cpp.o CMakeFiles/migraphx.dir/propagate_constant.cpp.o CMakeFiles/migraphx.dir/promote_literals.cpp.o CMakeFiles/migraphx.dir/quantization.cpp.o CMakeFiles/migraphx.dir/quantize_fp16.cpp.o CMakeFiles/migraphx.dir/quantize_8bits.cpp.o CMakeFiles/migraphx.dir/reduce_dims.cpp.o CMakeFiles/migraphx.dir/register_op.cpp.o CMakeFiles/migraphx.dir/register_target.cpp.o CMakeFiles/migraphx.dir/replace_allocate.cpp.o CMakeFiles/migraphx.dir/rewrite_reduce.cpp.o CMakeFiles/migraphx.dir/simplify_qdq.cpp.o CMakeFiles/migraphx.dir/sqlite.cpp.o CMakeFiles/migraphx.dir/rewrite_gelu.cpp.o CMakeFiles/migraphx.dir/rewrite_pooling.cpp.o CMakeFiles/migraphx.dir/rewrite_quantization.cpp.o CMakeFiles/migraphx.dir/rewrite_rnn.cpp.o CMakeFiles/migraphx.dir/schedule.cpp.o CMakeFiles/migraphx.dir/serialize.cpp.o CMakeFiles/migraphx.dir/shape.cpp.o CMakeFiles/migraphx.dir/simplify_algebra.cpp.o CMakeFiles/migraphx.dir/simplify_dyn_ops.cpp.o CMakeFiles/migraphx.dir/simplify_reshapes.cpp.o CMakeFiles/migraphx.dir/split_single_dyn_dim.cpp.o CMakeFiles/migraphx.dir/target.cpp.o CMakeFiles/migraphx.dir/tmp_dir.cpp.o CMakeFiles/migraphx.dir/value.cpp.o CMakeFiles/migraphx.dir/verify_args.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_abs_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_acosh_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_acos_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_add_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_allocate_hpp.cpp.o 
CMakeFiles/migraphx.dir/ops/migraphx_op_argmax_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_argmin_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_asinh_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_asin_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_as_shape_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_atanh_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_atan_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_broadcast_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_capture_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_ceil_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_clip_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_concat_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_contiguous_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_convert_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_convolution_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_convolution_backwards_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_cosh_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_cos_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_dequantizelinear_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_dimensions_of_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_div_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_dot_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_elu_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_equal_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_erf_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_exp_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_fill_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_flatten_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_floor_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_fmod_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_gather_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_gathernd_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_get_tuple_elem_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_greater_hpp.cpp.o 
CMakeFiles/migraphx.dir/ops/migraphx_op_gru_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_identity_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_if_op_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_im2col_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_isinf_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_isnan_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_layout_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_leaky_relu_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_less_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_load_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_log_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_logical_and_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_logical_or_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_logical_xor_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_logsoftmax_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_loop_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_lrn_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_lstm_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_max_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_min_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_mod_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_mul_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_multibroadcast_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_multinomial_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_nearbyint_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_neg_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_nonmaxsuppression_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_nonzero_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_outline_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_pad_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_pointwise_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_pooling_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_pow_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_prefix_scan_sum_hpp.cpp.o 
CMakeFiles/migraphx.dir/ops/migraphx_op_prelu_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_quant_convolution_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_quant_dot_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_quantizelinear_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_random_uniform_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_random_seed_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_recip_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reduce_max_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reduce_mean_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reduce_min_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reduce_prod_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reduce_sum_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_relu_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reshape_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reshape_lazy_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reverse_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_rnn_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_rnn_last_cell_output_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_rnn_last_hs_output_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_rnn_var_sl_last_output_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_roialign_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_rsqrt_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_run_on_target_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scalar_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatter_none_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatter_add_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatter_mul_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatter_min_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatter_max_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatternd_add_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatternd_mul_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatternd_none_hpp.cpp.o 
CMakeFiles/migraphx.dir/ops/migraphx_op_scatternd_max_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatternd_min_hpp.cpp.o CMakeFiles/migraphx.dir/ops/
migraphx_op_select_module_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sigmoid_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sign_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sinh_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sin_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_slice_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_softmax_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sqdiff_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sqrt_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_squeeze_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_step_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sub_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_tanh_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_tan_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_topk_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_transpose_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_unary_not_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_undefined_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_unique_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_unknown_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_unsqueeze_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_where_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_rnn_variable_seq_lens_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_builtin_hpp.cpp.o  -Wl,-rpath,/home/bart0/repos/rocm_sdk_builder/builddir/035_AMDMIGraphX/lib: -lstdc++fs -ldl /usr/lib/x86_64-linux-gnu/libsqlite3.so -lmsgpackc-cxx
/usr/bin/ld: cannot find -lmsgpackc-cxx: No such file or directory
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [src/CMakeFiles/migraphx.dir/build.make:3426: lib/libmigraphx.so.2009000.0.60101] Error 1
make[2]: Leaving directory '/home/bart0/repos/rocm_sdk_builder/builddir/035_AMDMIGraphX'
make[1]: *** [CMakeFiles/Makefile2:5912: src/CMakeFiles/migraphx.dir/all] Error 2
make[1]: Leaving directory '/home/bart0/repos/rocm_sdk_builder/builddir/035_AMDMIGraphX'
make: *** [Makefile:166: all] Error 2
build failed: AMDMIGraphX

build failed

I know the readme says it's untested, but I still wanted to report.

Guess msgpackc-cxx should be installed correctly:

bart0:~/repos/rocm_sdk_builder$ apt-cache policy libmsgpack-cxx-dev
libmsgpack-cxx-dev:
  Installed: 6.1.0-1build1
  Candidate: 6.1.0-1build1
  Version table:
 *** 6.1.0-1build1 500
        500 http://de.archive.ubuntu.com/ubuntu noble/universe amd64 Packages
        100 /var/lib/dpkg/status

BR

bart0

Fedora 40: Compile failure after GCC14 patch

Initial compile attempt failed with this issue here: ROCm/rocm_smi_lib#170

After applying the following patch, this fixed the initial issue above:

--- a/include/rocm_smi/rocm_smi_utils.h 2024-05-25 00:02:19.127412816 -0400
+++ b/include/rocm_smi/rocm_smi_utils.h 2024-05-25 00:03:25.359416227 -0400
@@ -149,7 +149,7 @@
   __forceinline ~ScopeGuard() {
     if (!dismiss_) release_();
   }
-  __forceinline ScopeGuard& operator=(const ScopeGuard& rhs) {
+  __forceinline ScopeGuard& operator=(ScopeGuard& rhs) {
     dismiss_ = rhs.dismiss_;
     release_ = rhs.release_;
     rhs.dismiss_ = true;

Now the rocm_sdk_builder compilation fails here:

/home/chris/rocm_sdk_builder/src_projects/openmpi/3rd-party/openpmix/include/pmix_deprecated.h:851:32: error: passing argument 2 of 'PMIx_Data_buffer_unload' from incompatible pointer type [-Wincompatible-pointer-types]
  851 |     PMIx_Data_buffer_unload(b, &(d), &(s))
      |                                ^~~~
      |                                |
      |                                void **
/home/chris/rocm_sdk_builder/src_projects/openmpi/oshmem/mca/memheap/base/memheap_base_mkey.c:451:5: note: in expansion of macro 'PMIX_DATA_BUFFER_UNLOAD'
  451 |     PMIX_DATA_BUFFER_UNLOAD(msg, buffer, size);
      |     ^~~~~~~~~~~~~~~~~~~~~~~
/home/chris/rocm_sdk_builder/src_projects/openmpi/3rd-party/openpmix/include/pmix_deprecated.h:352:49: note: expected 'char **' but argument is of type 'void **'
  352 |                                          char **bytes, size_t *sz);
      |                                          ~~~~~~~^~~~~
/home/chris/rocm_sdk_builder/src_projects/openmpi/oshmem/mca/memheap/base/memheap_base_mkey.c: In function 'mca_memheap_modex_recv_all':
/home/chris/rocm_sdk_builder/src_projects/openmpi/3rd-party/openpmix/include/pmix_deprecated.h:851:32: error: passing argument 2 of 'PMIx_Data_buffer_unload' from incompatible pointer type [-Wincompatible-pointer-types]
  851 |     PMIx_Data_buffer_unload(b, &(d), &(s))
      |                                ^~~~
      |                                |
      |                                void **
/home/chris/rocm_sdk_builder/src_projects/openmpi/oshmem/mca/memheap/base/memheap_base_mkey.c:586:5: note: in expansion of macro 'PMIX_DATA_BUFFER_UNLOAD'
  586 |     PMIX_DATA_BUFFER_UNLOAD(msg, send_buffer, size);
      |     ^~~~~~~~~~~~~~~~~~~~~~~
/home/chris/rocm_sdk_builder/src_projects/openmpi/3rd-party/openpmix/include/pmix_deprecated.h:352:49: note: expected 'char **' but argument is of type 'void **'
  352 |                                          char **bytes, size_t *sz);
      |                                          ~~~~~~~^~~~~
make[2]: *** [Makefile:1527: base/memheap_base_mkey.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/home/chris/rocm_sdk_builder/builddir/015_03_openmpi/oshmem/mca/memheap'
make[1]: *** [Makefile:1920: all-recursive] Error 1
make[1]: Leaving directory '/home/chris/rocm_sdk_builder/builddir/015_03_openmpi/oshmem'
make: *** [Makefile:1534: all-recursive] Error 1
build failed: openmpi

build failed

My skill level prevents me from fixing this. Any ideas?

GPU: Radeon RX 7800 XT
CPU: AMD Ryzen™ 5 5600G with Radeon™ Graphics × 12
KERNEL: Linux 6.8.10-300.fc40.x86_64
GCC Info: Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/14/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-14.1.1-20240522/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none,amdgcn-amdhsa --enable-offload-defaulted --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.1.1 20240522 (Red Hat 14.1.1-4) (GCC)

amd-fftw: problems with linking and performance

Follow-up from #74 with a more narrow focus.

On my system (Manjaro, latest rolling) building amd-fftw results in a broken libfftw3.so library: attempting to dlopen it causes a segfault. This can be fixed by changing the build so that the --enable-dynamic-dispatcher option is not passed, but while this produces a working library, it results in what appears to be a badly or completely unoptimized code path. For comparison, the results of a benchmark of this build on a Ryzen 5 7600:

> builddir/020_02_amd_fftw_double_precision/tests/bench -I -opatient 64 128 256 512 1024 2048 4096

(name "fftw3")
(version "fftw-3.3.10-sse2-avx-avx2-avx2_128-avx512")
(aocl-version "AOCL-FFTW 4.2.0 Build 20240622")
(cc "gcc -I/opt/rocm_sdk_612/include -I/opt/rocm_sdk_612/hsa/include -I/opt/rocm_sdk_612/rocm_smi/include -I/opt/rocm_sdk_612/rocblas/include -mavx2 -mno-avx256-split-unaligned-store -mno-avx256-split-unaligned-load -mno-prefer-avx128 -mfma")
(codelet-optim "")
(benchmark-precision "double")
Problem: 64, setup: 41.08 ms, time: 416.05 ns, ``mflops'': 4614.8727
Problem: 128, setup: 99.77 ms, time: 1.15 us, ``mflops'': 3889.3769
Problem: 256, setup: 217.85 ms, time: 2.60 us, ``mflops'': 3942.3856
Problem: 512, setup: 427.52 ms, time: 5.58 us, ``mflops'': 4125.3646
Problem: 1024, setup: 874.12 ms, time: 12.23 us, ``mflops'': 4185.2638
Problem: 2048, setup: 1.98 s, time: 26.82 us, ``mflops'': 4199.1903
Problem: 4096, setup: 4.71 s, time: 58.95 us, ``mflops'': 4168.7358

Now compare this to a benchmark of vanilla fftw-3.3.10 built from source with GCC 14.1.1 with configure --enable-sse2 --enable-avx --enable-avx2 --enable-avx512 --enable-openmp --enable-shared:

(name "fftw3")
(version "fftw-3.3.10-sse2-avx-avx2-avx2_128-avx512")
(cc "gcc -O3 -fomit-frame-pointer -mtune=native -malign-double -fstrict-aliasing -fno-schedule-insns")
(codelet-optim "")
(benchmark-precision "double")
Problem: 64, setup: 39.41 ms, time: 36.59 ns, ``mflops'': 52475.262
Problem: 128, setup: 98.67 ms, time: 68.36 ns, ``mflops'': 65536
Problem: 256, setup: 209.93 ms, time: 144.46 ns, ``mflops'': 70883.405
Problem: 512, setup: 371.50 ms, time: 337.13 ns, ``mflops'': 68342.058
Problem: 1024, setup: 585.39 ms, time: 802.37 ns, ``mflops'': 63811.106
Problem: 2048, setup: 876.51 ms, time: 2.01 us, ``mflops'': 56134.985
Problem: 4096, setup: 1.34 s, time: 5.14 us, ``mflops'': 47839.224

Both are single-CPU tests. The supposedly optimized AMD build is clearly nothing of the sort: it is roughly 11 to 18 times slower than vanilla fftw across these sizes.
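To put a number on the gap, the "mflops" figures from the two runs above can be divided pairwise. This is only a minimal sketch; the variable names are illustrative and the numbers are copied from the benchmark output above.

```python
# Throughput ("mflops") figures copied from the two benchmark runs above.
sizes = [64, 128, 256, 512, 1024, 2048, 4096]
amd = [4614.8727, 3889.3769, 3942.3856, 4125.3646, 4185.2638, 4199.1903, 4168.7358]
vanilla = [52475.262, 65536.0, 70883.405, 68342.058, 63811.106, 56134.985, 47839.224]

# A ratio above 1 means the vanilla build is faster for that transform size.
ratios = [v / a for v, a in zip(vanilla, amd)]

for n, r in zip(sizes, ratios):
    print(f"n={n:5d}  vanilla/amd = {r:5.1f}x")
```

Every ratio lands between 11 and 18, so the slowdown is not limited to one transform size.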

That raises two questions:

  1. Is this only due to omitting --enable-dynamic-dispatcher, or is it a more basic issue? As setting this gives me a library that won't load at all I can't test this.
  2. How much faster is the AMD optimized version of fftw really supposed to be, can someone verify? It's good to know if it's worth going through the hassle to get it to build, as opposed to just using the system fftw. If it doesn't consistently give big wins it's at least worth considering making it optional to avoid any issues.
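For question 1, it may help to test whether a given libfftw3.so loads without crashing the shell or interpreter that runs the test. The sketch below attempts the dlopen in a child process through Python's ctypes and reports whether the child exited cleanly, raised a load error, or was killed by a signal. The library path is a hypothetical placeholder, not a path confirmed by the build.

```python
import subprocess
import sys


def probe_dlopen(lib_path):
    """Try to dlopen lib_path in a child process; return 'ok', 'error' or 'crash'."""
    child_code = "import ctypes, sys; ctypes.CDLL(sys.argv[1])"
    proc = subprocess.run(
        [sys.executable, "-c", child_code, lib_path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if proc.returncode == 0:
        return "ok"
    # On POSIX a negative return code means the child was killed by a signal
    # (for example -11 for SIGSEGV); a positive one is an ordinary load error.
    return "crash" if proc.returncode < 0 else "error"


# Hypothetical path: adjust to wherever the build placed the library.
print(probe_dlopen("/opt/rocm_sdk_612/lib64/libfftw3.so"))
```

Running it against builds made with and without --enable-dynamic-dispatcher would show whether that option alone decides if the library loads.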

Framework Laptop 16 hybrid GPU support

I received a Framework 16 laptop for testing and development with AMD CPUs and GPUs.

  • 7840HS CPU
  • 780M iGPU (gfx1103) with 12 CUs (rocm-smi device id 0x15bf)
  • 7700S GPU (gfx1102) with 32 CUs (rocm-smi device id 0x7480)
  • 32 GB SO-DIMM RAM (need to check whether I can upgrade it to 64 or 96 GB later)

So far tested:

  • gfx1102 (7700S) passes the basic tests. I have not had time to run any benchmarks on it yet.
  • gfx1103 needs more work; I am starting to debug it now.

This is the first time I have been able to test with hybrid GPUs, and I would like to find ways to test all 3 scenarios:

  • Either the 7700S or the 780M alone (should be doable by hiding the other GPU from ROCm)
  • Tasks where it would make sense to split the work between both GPUs
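For the single-GPU scenarios, one common way to hide a device from HIP applications is the HIP_VISIBLE_DEVICES environment variable (ROCR_VISIBLE_DEVICES works similarly at the ROCm runtime level). The sketch below only builds the environment for a child process. Which index maps to the 7700S and which to the 780M is an assumption that should be checked with rocminfo on the actual machine.

```python
import os


def env_with_single_gpu(device_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU to HIP."""
    env = dict(os.environ if base_env is None else base_env)
    # HIP enumerates only the listed device indices for processes started
    # with this environment.
    env["HIP_VISIBLE_DEVICES"] = str(device_index)
    return env


# Assumption: index 0 is one of the two GPUs; verify the order with rocminfo.
env = env_with_single_gpu(0)
print(env["HIP_VISIBLE_DEVICES"])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching a test or benchmark script, so the same script can be run once per GPU without code changes.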

Order of commands in the readme could be a bit better

At the moment the order is:

./install_deps.sh
git config --global user.name "John Doe"
git config --global user.email [email protected]
source /opt/rocm_sdk_<version>/bin/env_rocm.sh
git clone https://github.com/lamikr/rocm_sdk_builder.git
cd rocm_sdk_builder
git checkout releases/rocm_sdk_builder_611
./babs.sh -i
./babs.sh -b

When reading from top to bottom and following the instructions as you go, this order makes more sense:

git clone https://github.com/lamikr/rocm_sdk_builder.git
cd rocm_sdk_builder
git checkout releases/rocm_sdk_builder_611
./install_deps.sh
git config --global user.name "John Doe"
git config --global user.email [email protected]
./babs.sh -i
./babs.sh -b
source /opt/rocm_sdk_<version>/bin/env_rocm.sh

Perhaps the surrounding text can be reordered to match the logical order of the commands.

Lots of compiler warnings in hipcc hello_world example code

When the hipcc example is built and executed, a lot of warnings are generated:

$ cd /opt/rocm_sdk_611/docs/examples/hipcc/hello_world/
$ source /opt/rocm_sdk_611/bin/env_rocm.sh  
$ make
/opt/rocm_sdk_611/bin/hipcc -g -fPIE   -c -o hello_world.o hello_world.cpp
hello_world.cpp:48:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   48 |         hipGetDeviceProperties(&devProp, 0);
      |         ^~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~
/opt/rocm_sdk_611/include/hip/hip_runtime_api.h:91:32: note: expanded from macro 'hipGetDeviceProperties'
   91 | #define hipGetDeviceProperties hipGetDevicePropertiesR0600
      |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:62:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   62 |         hipMalloc((void**)&inputBuffer,
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~
   63 |                 (strlength + 1) * sizeof(char));
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:64:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   64 |         hipMalloc((void**)&outputBuffer,
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~
   65 |                 (strlength + 1) * sizeof(char));
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:66:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   66 |         hipMemcpy(inputBuffer,
      |         ^~~~~~~~~ ~~~~~~~~~~~~
   67 |                 input,
      |                 ~~~~~~
   68 |                 (strlength + 1) * sizeof(char),
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   69 |                 hipMemcpyHostToDevice);
      |                 ~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:77:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   77 |         hipMemcpy(output,
      |         ^~~~~~~~~ ~~~~~~~
   78 |                 outputBuffer,
      |                 ~~~~~~~~~~~~~
   79 |                 (strlength + 1) * sizeof(char),
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   80 |                 hipMemcpyDeviceToHost);
      |                 ~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:81:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   81 |         hipFree(inputBuffer);
      |         ^~~~~~~ ~~~~~~~~~~~
hello_world.cpp:82:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   82 |         hipFree(outputBuffer);
      |         ^~~~~~~ ~~~~~~~~~~~~
7 warnings generated when compiling for gfx1035.
hello_world.cpp:48:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   48 |         hipGetDeviceProperties(&devProp, 0);
      |         ^~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~
/opt/rocm_sdk_611/include/hip/hip_runtime_api.h:91:32: note: expanded from macro 'hipGetDeviceProperties'
   91 | #define hipGetDeviceProperties hipGetDevicePropertiesR0600
      |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:62:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   62 |         hipMalloc((void**)&inputBuffer,
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~
   63 |                 (strlength + 1) * sizeof(char));
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:64:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   64 |         hipMalloc((void**)&outputBuffer,
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~
   65 |                 (strlength + 1) * sizeof(char));
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:66:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   66 |         hipMemcpy(inputBuffer,
      |         ^~~~~~~~~ ~~~~~~~~~~~~
   67 |                 input,
      |                 ~~~~~~
   68 |                 (strlength + 1) * sizeof(char),
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   69 |                 hipMemcpyHostToDevice);
      |                 ~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:77:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   77 |         hipMemcpy(output,
      |         ^~~~~~~~~ ~~~~~~~
   78 |                 outputBuffer,
      |                 ~~~~~~~~~~~~~
   79 |                 (strlength + 1) * sizeof(char),
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   80 |                 hipMemcpyDeviceToHost);
      |                 ~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:81:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   81 |         hipFree(inputBuffer);
      |         ^~~~~~~ ~~~~~~~~~~~
hello_world.cpp:82:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   82 |         hipFree(outputBuffer);
      |         ^~~~~~~ ~~~~~~~~~~~~
7 warnings generated when compiling for host.

It should only print the commands being run and the output of the app itself:

/opt/rocm_sdk_611/bin/hipcc hello_world.o -fPIE -o hello_world
./hello_world
 System minor: 3
 System major: 10
 Agent name: AMD Radeon Graphics
Input string: GdkknVnqkc
Output string: HelloWorld
Test ok!

build failed: MIOpen: Linking error with undefined reference in libamdhip64.so

I'm currently trying to build for an RX 6700 on Linux Mint (Ubuntu 22.04 LTS). Unfortunately I get a linking error with MIOpenDriver:

[100%] Linking CXX executable ../bin/MIOpenDriver
cd /home/stefan/source/rocm_sdk_builder/builddir/034_miopen/driver && /home/stefan/.local/lib/python3.10/site-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/MIOpenDriver.dir/link.txt --verbose=1
/opt/rocm_sdk_611/bin/clang++ -O3 -DNDEBUG -s -L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hsa/lib -L/opt/rocm_sdk_611/rocblas/lib -L/opt/rocm_sdk_611/hcc/lib -pthread CMakeFiles/MIOpenDriver.dir/main.cpp.o CMakeFiles/MIOpenDriver.dir/InputFlags.cpp.o -o ../bin/MIOpenDriver  -Wl,-rpath,/home/stefan/source/rocm_sdk_builder/builddir/034_miopen/lib:/opt/rocm/lib: ../lib/libMIOpen.so.1.0.60101 --hip-link --offload-arch=gfx1031 /opt/rocm_sdk_611/lib64/libamd_comgr.so.2.7.60101 /opt/rocm_sdk_611/lib64/librocblas.so.4.1.60101 /opt/rocm_sdk_611/lib64/libamdhip64.so.6.1.40092-6d684796d /opt/rocm_sdk_611/lib/clang/17/lib/linux/libclang_rt.builtins-x86_64.a /opt/rocm_sdk_611/lib/libboost_filesystem.a /usr/lib/x86_64-linux-gnu/librt.a /opt/rocm/lib/libroctx64.so 
/usr/bin/ld: /opt/rocm_sdk_611/lib64/libamdhip64.so.6.1.40092-6d684796d: undefined reference to `hsa_amd_vmem_address_reserve@ROCR_1'
/usr/bin/ld: /opt/rocm_sdk_611/lib64/libamdhip64.so.6.1.40092-6d684796d: undefined reference to `hsa_amd_vmem_address_free@ROCR_1'
/usr/bin/ld: /opt/rocm_sdk_611/lib64/libamdhip64.so.6.1.40092-6d684796d: undefined reference to `hsa_amd_vmem_handle_create@ROCR_1'
/usr/bin/ld: /opt/rocm_sdk_611/lib64/libamdhip64.so.6.1.40092-6d684796d: undefined reference to `hsa_amd_vmem_unmap@ROCR_1'
/usr/bin/ld: /opt/rocm_sdk_611/lib64/libamdhip64.so.6.1.40092-6d684796d: undefined reference to `hsa_amd_vmem_set_access@ROCR_1'
/usr/bin/ld: /opt/rocm_sdk_611/lib64/libamdhip64.so.6.1.40092-6d684796d: undefined reference to `hsa_amd_vmem_map@ROCR_1'
/usr/bin/ld: /opt/rocm_sdk_611/lib64/libamdhip64.so.6.1.40092-6d684796d: undefined reference to `hsa_amd_vmem_get_access@ROCR_1'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [driver/CMakeFiles/MIOpenDriver.dir/build.make:121: bin/MIOpenDriver] Error 1
make[2]: Leaving directory '/home/stefan/source/rocm_sdk_builder/builddir/034_miopen'
make[1]: *** [CMakeFiles/Makefile2:12331: driver/CMakeFiles/MIOpenDriver.dir/all] Error 2
make[1]: Leaving directory '/home/stefan/source/rocm_sdk_builder/builddir/034_miopen'
make: *** [Makefile:166: all] Error 2
build failed: MIOpen
  error in build cmd: make VERBOSE=1 -j12

build failed

Any idea what could cause that? Is some library missing from the link command? Or was libamdhip64.so not built correctly?
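One thing that may be worth checking (a guess, not a confirmed cause) is whether the link line mixes the SDK under /opt/rocm_sdk_611 with a separate ROCm installation: the command above also mentions /opt/rocm/lib. The sketch below scans a link command for path fragments that point into a ROCm tree outside the SDK prefix. The shortened command string is abridged from the log above, and the function name is illustrative.

```python
import shlex

SDK_PREFIX = "/opt/rocm_sdk_611"  # install prefix seen in the log above


def foreign_rocm_paths(link_cmd, sdk_prefix=SDK_PREFIX):
    """Return path fragments that point into /opt/rocm* but not into the SDK."""
    hits = []
    for token in shlex.split(link_cmd):
        # Break -Wl,-rpath,a:b: style arguments into individual fragments.
        for part in token.replace(",", ":").split(":"):
            idx = part.find("/opt/rocm")
            if idx < 0:
                continue
            path = part[idx:]  # drops prefixes such as -L
            if not path.startswith(sdk_prefix):
                hits.append(path)
    return hits


# Abridged from the failing link command in the log above.
link_cmd = (
    "/opt/rocm_sdk_611/bin/clang++ -O3 -L/opt/rocm_sdk_611/lib64 "
    "-Wl,-rpath,/home/stefan/source/rocm_sdk_builder/builddir/034_miopen/lib:/opt/rocm/lib: "
    "/opt/rocm_sdk_611/lib64/libamdhip64.so.6.1.40092-6d684796d "
    "/opt/rocm/lib/libroctx64.so"
)
print(foreign_rocm_paths(link_cmd))
```

If such paths show up, a different ROCm runtime might be getting picked up at link time, which could explain symbols that the SDK's libamdhip64.so expects but cannot find.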

Build failed in onnxruntime on ubuntu 22.04

Build environment: Ubuntu 22.04 + cmake 3.29.3 (the distro's cmake version was too old for the build system)

Process:

git clone https://github.com/lamikr/rocm_sdk_builder.git
cd rocm_sdk_builder
git checkout releases/rocm_sdk_builder_611
./babs.sh -i
# selected gfx906;gfx90a;gfx940;gfx1102
./babs.sh -co
./babs.sh -ap
./babs.sh -b

Error when building onnxruntime:

/home/ubuntu/rocm_sdk_builder/builddir/040_01_onnxruntime_rocm_training
Building onnxruntime
[0] onnxruntime, build command:
cd /home/ubuntu/rocm_sdk_builder/src_projects/onnxruntime
[1] onnxruntime, build command:
./build_onnxruntime_rocm_training.sh /opt/rocm_sdk_611 "gfx906;gfx90a;gfx940;gfx1102"
using rocm_root_directory specified: /opt/rocm_sdk_611
Using specified amd rocm gpu: "gfx906;gfx90a;gfx940;gfx1102"
2024-06-01 21:21:14,937 tools_python_utils [INFO] - flatbuffers module is not installed. parse_config will not be available
2024-06-01 21:21:16,663 build [DEBUG] - Command line arguments:
  --build_dir /home/ubuntu/rocm_sdk_builder/src_projects/onnxruntime/build/Linux --config Release --enable_training --build_wheel --parallel --skip_tests --build_shared_lib --use_rocm --rocm_home /opt/rocm_sdk_611 --use_migraphx --migraphx_home /opt/rocm_sdk_611 --cmake_extra_defines CMAKE_HIP_COMPILER=/opt/rocm_sdk_611/bin/clang++ CMAKE_INSTALL_PREFIX=/opt/rocm_sdk_611 'CMAKE_HIP_ARCHITECTURES="gfx906;gfx90a;gfx940;gfx1102"'
Namespace(build_dir='/home/ubuntu/rocm_sdk_builder/src_projects/onnxruntime/build/Linux', config=['Release'], update=False, build=False, clean=False, parallel=0, nvcc_threads=-1, test=False, skip_tests=True, compile_no_warning_as_error=False, enable_nvtx_profile=False, enable_memory_profile=False, enable_training=True, enable_training_apis=False, enable_training_ops=False, enable_nccl=False, mpi_home=None, nccl_home=None, use_mpi=False, enable_onnx_tests=False, path_to_protoc_exe=None, fuzz_testing=False, enable_symbolic_shape_infer_tests=False, gen_doc=None, gen_api_doc=False, use_cuda=False, cuda_version=None, cuda_home=None, cudnn_home=None, enable_cuda_line_info=False, enable_cuda_nhwc_ops=False, enable_pybind=False, build_wheel=True, wheel_name_suffix=None, numpy_version=None, skip_keras_test=False,build_csharp=False, build_nuget=False, msbuild_extra_options=None, build_java=False, build_nodejs=False, build_objc=False, build_shared_lib=True, build_apple_framework=False, cmake_extra_defines=[['CMAKE_HIP_COMPILER=/opt/rocm_sdk_611/bin/clang++', 'CMAKE_INSTALL_PREFIX=/opt/rocm_sdk_611', 'CMAKE_HIP_ARCHITECTURES="gfx906;gfx90a;gfx940;gfx1102"']], target=None, x86=False, arm=False, arm64=False, arm64ec=False, buildasx=False, msvc_toolset=None, windows_sdk_version=None, android=False, android_abi='arm64-v8a', android_api=27, android_sdk_path='', android_ndk_path='', android_cpp_shared=False, android_run_emulator=False, use_gdk=False, gdk_edition='.', gdk_platform='Scarlett', ios=False, apple_sysroot='', ios_toolchain_file='', xcode_code_signing_team_id='', xcode_code_signing_identity='', cmake_generator=None, osx_arch='x86_64', apple_deploy_target=None, enable_address_sanitizer=False, enable_qspectre=False, disable_memleak_checker=False, build_wasm=False, build_wasm_static_lib=False, emsdk_version='3.1.51', enable_wasm_simd=False, enable_wasm_threads=False, disable_wasm_exception_catching=False, enable_wasm_api_exception_catching=False, 
enable_wasm_exception_throwing_override=True, wasm_run_tests_in_browser=False, enable_wasm_profiling=False, enable_wasm_debug_info=False, wasm_malloc=None, emscripten_settings=None, use_extensions=False, extensions_overridden_path=None, cmake_path='cmake', ctest_path='ctest', skip_submodule_sync=False, use_mimalloc=False, use_dnnl=False, dnnl_gpu_runtime='', dnnl_opencl_root='', use_openvino=None, dnnl_aarch64_runtime='', dnnl_acl_root='', use_coreml=False, use_webnn=False, use_snpe=False, snpe_root=None, use_nnapi=False, nnapi_min_api=None, use_jsep=False, use_qnn=False, qnn_home=None, use_rknpu=False, use_preinstalled_eigen=False, eigen_path=None, enable_msinternal=False, llvm_path=None, use_vitisai=False, use_tvm=False, tvm_cuda_runtime=False, use_tvm_hash=False, use_tensorrt=False, use_tensorrt_builtin_parser=True, use_tensorrt_oss_parser=False, tensorrt_home=None, test_all_timeout='10800', use_migraphx=True, migraphx_home='/opt/rocm_sdk_611', use_full_protobuf=False, llvm_config='', skip_onnx_tests=False, skip_winml_tests=False, skip_nodejs_tests=False, enable_msvc_static_runtime=False, enable_language_interop_ops=False, use_dml=False, dml_path='', use_winml=False, winml_root_namespace_override=None, dml_external_project=False, use_telemetry=False, enable_wcos=False,enable_lto=False, enable_transformers_tool_test=False, use_acl=None, acl_home=None, acl_libs=None, use_armnn=False, armnn_relu=False, armnn_bn=False, armnn_home=None, armnn_libs=None, build_micro_benchmarks=False, minimal_build=None, include_ops_by_config=None, enable_reduced_operator_type_support=False, disable_contrib_ops=False, disable_ml_ops=False, disable_rtti=False, disable_types=[], disable_exceptions=False, rocm_version=None, use_rocm=True, rocm_home='/opt/rocm_sdk_611', code_coverage=False, enable_lazy_tensor=False, ms_experimental=False, enable_external_custom_op_schemas=False, external_graph_transformer_path=None, enable_cuda_profiling=False, use_cann=False, cann_home=None, 
enable_rocm_profiling=False, use_xnnpack=False, use_azure=False, use_cache=False, use_triton_kernel=False, use_lock_free_queue=False, allow_running_as_root=False)
2024-06-01 21:21:16,670 build [DEBUG] - Defaulting to running update, build [and test for native builds].
migraphx_home = /opt/rocm_sdk_611
rocm_home = /opt/rocm_sdk_611
2024-06-01 21:21:16,670 build [INFO] - Build started

[...]

-- Found pybind11: /opt/rocm_sdk_611/include (found version "")
-- Configuring done (6.2s)
-- Generating done (1.3s)
-- Build files have been written to: /home/ubuntu/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release
2024-06-01 21:21:25,634 build [INFO] - Building targets for Release configuration
2024-06-01 21:21:25,636 build [INFO] - /usr/bin/cmake --build /home/ubuntu/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release --config Release -- -j16
[  2%] Building HIP object _deps/composable_kernel-build/library/src/tensor_operation_instance/gpu/gemm_streamk/CMakeFiles/device_gemm_streamk_instance.dir/device_gemm_xdl_streamk_f16_f16_f16_mk_kn_mn_instance.cpp.o

clang++: error: invalid target ID 'gfx906 --offload-arch=gfx90a --offload-arch=gfx940 --offload-arch=gfx1102'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')

gmake[1]: *** [CMakeFiles/Makefile2:17271: _deps/composable_kernel-build/library/src/tensor_operation_instance/gpu/gemm_streamk/CMakeFiles/device_gemm_streamk_instance.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
[...]
subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', '/home/ubuntu/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release', '--config', 'Release', '--', '-j16']' returned non-zero exit status 2.
build failed: onnxruntime
  error in build cmd: ./build_onnxruntime_rocm_training.sh /opt/rocm_sdk_611 "gfx906;gfx90a;gfx940;gfx1102"

build failed

The generated clang options seem strange:

gfx906 --offload-arch=gfx90a --offload-arch=gfx940 --offload-arch=gfx1102

Any ideas?
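The error message suggests that the whole architecture list reached clang as a single --offload-arch value instead of one flag per GPU. As an illustration of the expected shape only (this is not the actual onnxruntime or CMake logic, and the function name is made up), a semicolon-separated list should expand like this:

```python
def offload_arch_flags(arch_list):
    """Expand 'gfx906;gfx90a' into one --offload-arch flag per architecture."""
    archs = [a.strip() for a in arch_list.split(";") if a.strip()]
    # Each flag must be its own argv entry; joining them into a single
    # string produces exactly the "invalid target ID" seen in the log.
    return [f"--offload-arch={a}" for a in archs]


flags = offload_arch_flags("gfx906;gfx90a;gfx940;gfx1102")
print(flags)
```

A quick experiment that might narrow the problem down is to build with a single target selected: if that succeeds, the quoting of the multi-target list is the likely culprit.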

torch_migraphx won't build with /home/<user>/.local/bin/cmake

I have cmake installed via pip in my user directory (because the one shipped with my Linux Mint 21 (Ubuntu 22.04) is too old for many projects), and torch_migraphx fails to compile with it. Renaming it fixed the problem. For onnx I had to rename it back, otherwise that build wouldn't work.
Doesn't the project come with its own cmake that should be used?
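To see which cmake a build step would pick up, it can help to list every cmake on PATH in lookup order. The following is a minimal sketch with an illustrative helper name; the first entry it prints is the one a plain `cmake` call resolves to.

```python
import os


def find_all_on_path(tool, path=None):
    """Return every executable named `tool` on PATH, in lookup order."""
    path = os.environ.get("PATH", "") if path is None else path
    found = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, tool)
        if directory and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found.append(candidate)
    return found


for candidate in find_all_on_path("cmake"):
    print(candidate)
```

Running this before and after sourcing env_rocm.sh would show whether the copy in ~/.local/bin shadows the SDK's cmake or the other way round.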

rocm sdk 6.1.2 release

I have started porting to version 6.1.2; the work is now on the wip/rocm_sdk_builder_612 branch.

Changelog so far:

  • all rocm base libraries updated to rocm-6.1.2
  • cmake 3.26.6 is now built in an early phase, after python
  • added separate binfo file for pytorch_python dependencies
  • dropped patches that have been merged upstream
  • other packages updated
    • openmpi 5.0.3
    • pytorch v2.3.1
    • pytorch vision v0.18.1
    • onnxruntime v1.18.0
    • deepspeed 31a57fa392 from June 7, 2024

The plan is still to check at least the aotriton, ucc and ucx package updates.

Trying to compile for RX5500: Unknown GPU architecture 'gfx1012'

Hello,
I'm trying to compile for an RX 5500 on Linux Mint (Ubuntu 22.04 LTS), so I selected only gfx1012. Unfortunately, at some point I get an error saying that the GPU architecture is unknown:

-- LIBOMPTARGET: Building DeviceRTL. Using clang: /opt/rocm_sdk_611/bin/clang, llvm-link: /opt/rocm_sdk_611/bin/llvm-link and opt: /opt/rocm_sdk_611/bin/opt
CMake Error at libomptarget/cmake/Modules/LibomptargetUtils.cmake:26 (message):
  LIBOMPTARGET: Unknown GPU architecture 'gfx1012'
Call Stack (most recent call first):
  libomptarget/DeviceRTL/CMakeLists.txt:468 (libomptarget_error_say)


-- Configuring incomplete, errors occurred!
See also "/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeOutput.log".
See also "/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeError.log".
configure failed: llvm-project_openmp
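The README for the current release lists RX 5500 support as partial and suggests building for gfx1010 with HSA_OVERRIDE_GFX_VERSION=10.1.0 set at run time (and likewise gfx1030 with 10.3.0 for the RX 6600). Whether that workaround also applies to the 6.1.1 branch used here is not confirmed. A small sketch of that mapping, with illustrative names:

```python
# Fallbacks taken from the project README; only these two are listed there.
FALLBACKS = {
    "gfx1012": ("gfx1010", "10.1.0"),  # RX 5500 -> build as RX 5700
    "gfx1032": ("gfx1030", "10.3.0"),  # RX 6600 -> build as RX 6800
}


def build_plan(gpu_arch):
    """Return (build target, extra runtime environment) for a GPU architecture."""
    target, override = FALLBACKS.get(gpu_arch, (gpu_arch, None))
    env = {} if override is None else {"HSA_OVERRIDE_GFX_VERSION": override}
    return target, env


print(build_plan("gfx1012"))
```

The returned target is what would be selected in `./babs.sh -i`, and the environment variable would be exported before running applications on the card.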

This is the CMakeError.log:

Checking whether the ASM compiler is GNU using "--version" did not match "(GNU assembler)|(GCC)|(Free Software Foundation)":
clang version 17.0.0 (/home/idn/source/rocm_sdk_builder/src_projects/llvm-project/clang 9daa9b63ac8fd6706fdb9f51be564ef0878d0eae)
Target: x86_64-mageia-linux
Thread model: posix
InstalledDir: /opt/rocm_sdk_611/bin
Performing C++ SOURCE FILE Test OPENMP_HAVE_ONEAPI_COMPILER failed with the following output:
Change Dir: /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_dc872/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_dc872.dir/build.make CMakeFiles/cmTC_dc872.dir/build
gmake[1]: Entering directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
Building CXX object CMakeFiles/cmTC_dc872.dir/src.cxx.o
/opt/rocm_sdk_611/bin/clang++ -DOPENMP_HAVE_ONEAPI_COMPILER  -std=c++17 -MD -MT CMakeFiles/cmTC_dc872.dir/src.cxx.o -MF CMakeFiles/cmTC_dc872.dir/src.cxx.o.d -o CMakeFiles/cmTC_dc872.dir/src.cxx.o -c /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp/src.cxx
/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp/src.cxx:4:30: error: expected unqualified-id
    4 |                              not oneAPI
      |                              ^
1 error generated.
gmake[1]: *** [CMakeFiles/cmTC_dc872.dir/build.make:79: CMakeFiles/cmTC_dc872.dir/src.cxx.o] Error 1
gmake[1]: Leaving directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
gmake: *** [Makefile:127: cmTC_dc872/fast] Error 2


Source file was:
#if (defined(__INTEL_CLANG_COMPILER) || defined(__INTEL_LLVM_COMPILER))
                             int main() { return 0; }
                             #else
                             not oneAPI
                             #endif
Performing C++ SOURCE FILE Test OPENMP_HAVE_WMAYBE_UNINITIALIZED_FLAG failed with the following output:
Change Dir: /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_f8122/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_f8122.dir/build.make CMakeFiles/cmTC_f8122.dir/build
gmake[1]: Entering directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
Building CXX object CMakeFiles/cmTC_f8122.dir/src.cxx.o
/opt/rocm_sdk_611/bin/clang++ -DOPENMP_HAVE_WMAYBE_UNINITIALIZED_FLAG  -Wmaybe-uninitialized -std=c++17 -MD -MT CMakeFiles/cmTC_f8122.dir/src.cxx.o -MF CMakeFiles/cmTC_f8122.dir/src.cxx.o.d -o CMakeFiles/cmTC_f8122.dir/src.cxx.o -c /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp/src.cxx
warning: unknown warning option '-Wmaybe-uninitialized'; did you mean '-Wuninitialized'? [-Wunknown-warning-option]
1 warning generated.
Linking CXX executable cmTC_f8122
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_f8122.dir/link.txt --verbose=1
/opt/rocm_sdk_611/bin/clang++ -L/usr/lib64 -L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hip/lib -Wl,-rpath-link,/usr/lib64 -Wl,-rpath-link,/opt/rocm_sdk_611/lib -Wl,-rpath-link,/opt/rocm_sdk_611/lib64 -Wl,-rpath-link,/opt/rocm_sdk_611/hip/lib  CMakeFiles/cmTC_f8122.dir/src.cxx.o -o cmTC_f8122 
gmake[1]: Leaving directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'


Source file was:
int main() { return 0; }
Performing C++ SOURCE FILE Test LIBOMP_HAVE_WCLASS_MEMACCESS_FLAG failed with the following output:
Change Dir: /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_512d9/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_512d9.dir/build.make CMakeFiles/cmTC_512d9.dir/build
gmake[1]: Entering directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
Building CXX object CMakeFiles/cmTC_512d9.dir/src.cxx.o
/opt/rocm_sdk_611/bin/clang++ -DLIBOMP_HAVE_WCLASS_MEMACCESS_FLAG  -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic    -Wclass-memaccess -std=c++17 -MD -MT CMakeFiles/cmTC_512d9.dir/src.cxx.o -MF CMakeFiles/cmTC_512d9.dir/src.cxx.o.d -o CMakeFiles/cmTC_512d9.dir/src.cxx.o -c /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp/src.cxx
warning: unknown warning option '-Wclass-memaccess'; did you mean '-Wclass-varargs'? [-Wunknown-warning-option]
1 warning generated.
Linking CXX executable cmTC_512d9
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_512d9.dir/link.txt --verbose=1
/opt/rocm_sdk_611/bin/clang++  -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic  -L/usr/lib64 -L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hip/lib -Wl,-rpath-link,/usr/lib64 -Wl,-rpath-link,/opt/rocm_sdk_611/lib -Wl,-rpath-link,/opt/rocm_sdk_611/lib64 -Wl,-rpath-link,/opt/rocm_sdk_611/hip/lib  CMakeFiles/cmTC_512d9.dir/src.cxx.o -o cmTC_512d9 
gmake[1]: Leaving directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'


Source file was:
int main() { return 0; }
Performing C++ SOURCE FILE Test LIBOMP_HAVE_WSTRINGOP_OVERFLOW_FLAG failed with the following output:
Change Dir: /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_99c33/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_99c33.dir/build.make CMakeFiles/cmTC_99c33.dir/build
gmake[1]: Entering directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
Building CXX object CMakeFiles/cmTC_99c33.dir/src.cxx.o
/opt/rocm_sdk_611/bin/clang++ -DLIBOMP_HAVE_WSTRINGOP_OVERFLOW_FLAG  -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic    -Wstringop-overflow=0 -std=c++17 -MD -MT CMakeFiles/cmTC_99c33.dir/src.cxx.o -MF CMakeFiles/cmTC_99c33.dir/src.cxx.o.d -o CMakeFiles/cmTC_99c33.dir/src.cxx.o -c /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp/src.cxx
warning: unknown warning option '-Wstringop-overflow=0'; did you mean '-Wshift-overflow'? [-Wunknown-warning-option]
1 warning generated.
Linking CXX executable cmTC_99c33
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_99c33.dir/link.txt --verbose=1
/opt/rocm_sdk_611/bin/clang++  -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic  -L/usr/lib64 -L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hip/lib -Wl,-rpath-link,/usr/lib64 -Wl,-rpath-link,/opt/rocm_sdk_611/lib -Wl,-rpath-link,/opt/rocm_sdk_611/lib64 -Wl,-rpath-link,/opt/rocm_sdk_611/hip/lib  CMakeFiles/cmTC_99c33.dir/src.cxx.o -o cmTC_99c33 
gmake[1]: Leaving directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'


Source file was:
int main() { return 0; }
Performing C++ SOURCE FILE Test LIBOMP_HAVE_WSTRINGOP_TRUNCATION_FLAG failed with the following output:
Change Dir: /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_f3f72/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_f3f72.dir/build.make CMakeFiles/cmTC_f3f72.dir/build
gmake[1]: Entering directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
Building CXX object CMakeFiles/cmTC_f3f72.dir/src.cxx.o
/opt/rocm_sdk_611/bin/clang++ -DLIBOMP_HAVE_WSTRINGOP_TRUNCATION_FLAG  -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic    -Wstringop-truncation -std=c++17 -MD -MT CMakeFiles/cmTC_f3f72.dir/src.cxx.o -MF CMakeFiles/cmTC_f3f72.dir/src.cxx.o.d -o CMakeFiles/cmTC_f3f72.dir/src.cxx.o -c /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp/src.cxx
warning: unknown warning option '-Wstringop-truncation'; did you mean '-Wstring-concatenation'? [-Wunknown-warning-option]
1 warning generated.
Linking CXX executable cmTC_f3f72
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_f3f72.dir/link.txt --verbose=1
/opt/rocm_sdk_611/bin/clang++  -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic  -L/usr/lib64 -L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hip/lib -Wl,-rpath-link,/usr/lib64 -Wl,-rpath-link,/opt/rocm_sdk_611/lib -Wl,-rpath-link,/opt/rocm_sdk_611/lib64 -Wl,-rpath-link,/opt/rocm_sdk_611/hip/lib  CMakeFiles/cmTC_f3f72.dir/src.cxx.o -o cmTC_f3f72 
gmake[1]: Leaving directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'


Source file was:
int main() { return 0; }
Performing C SOURCE FILE Test LIBOMP_HAVE_MMIC_FLAG failed with the following output:
Change Dir: /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_652e9/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_652e9.dir/build.make CMakeFiles/cmTC_652e9.dir/build
gmake[1]: Entering directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_652e9.dir/src.c.o
/opt/rocm_sdk_611/bin/clang -DLIBOMP_HAVE_MMIC_FLAG  -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic -mmic   -mmic -MD -MT CMakeFiles/cmTC_652e9.dir/src.c.o -MF CMakeFiles/cmTC_652e9.dir/src.c.o.d -o CMakeFiles/cmTC_652e9.dir/src.c.o -c /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp/src.c
clang: error: unknown argument: '-mmic'
clang: error: unknown argument: '-mmic'
gmake[1]: *** [CMakeFiles/cmTC_652e9.dir/build.make:79: CMakeFiles/cmTC_652e9.dir/src.c.o] Error 1
gmake[1]: Leaving directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
gmake: *** [Makefile:127: cmTC_652e9/fast] Error 2


Source file was:
int main(void) { return 0; }
Determining if the _aligned_malloc exist failed with the following output:
Change Dir: /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_60bc7/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_60bc7.dir/build.make CMakeFiles/cmTC_60bc7.dir/build
gmake[1]: Entering directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_60bc7.dir/CheckSymbolExists.c.o
/opt/rocm_sdk_611/bin/clang   -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic  -MD -MT CMakeFiles/cmTC_60bc7.dir/CheckSymbolExists.c.o -MF CMakeFiles/cmTC_60bc7.dir/CheckSymbolExists.c.o.d -o CMakeFiles/cmTC_60bc7.dir/CheckSymbolExists.c.o -c /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp/CheckSymbolExists.c
/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp/CheckSymbolExists.c:8:19: error: use of undeclared identifier '_aligned_malloc'
    8 |   return ((int*)(&_aligned_malloc))[argc];
      |                   ^
1 error generated.
gmake[1]: *** [CMakeFiles/cmTC_60bc7.dir/build.make:79: CMakeFiles/cmTC_60bc7.dir/CheckSymbolExists.c.o] Error 1
gmake[1]: Leaving directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
gmake: *** [Makefile:127: cmTC_60bc7/fast] Error 2


File /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp/CheckSymbolExists.c:
/* */
#include <malloc.h>

int main(int argc, char** argv)
{
  (void)argv;
#ifndef _aligned_malloc
  return ((int*)(&_aligned_malloc))[argc];
#else
  (void)argc;
  return 0;
#endif
}
Performing C SOURCE FILE Test LIBOMP_HAVE_UNDEFINED_VERSION_FLAG failed with the following output:
Change Dir: /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_93e2b/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_93e2b.dir/build.make CMakeFiles/cmTC_93e2b.dir/build
gmake[1]: Entering directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_93e2b.dir/src.c.o
/opt/rocm_sdk_611/bin/clang -DLIBOMP_HAVE_UNDEFINED_VERSION_FLAG  -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic  -Wl,--undefined-version -MD -MT CMakeFiles/cmTC_93e2b.dir/src.c.o -MF CMakeFiles/cmTC_93e2b.dir/src.c.o.d -o CMakeFiles/cmTC_93e2b.dir/src.c.o -c /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp/src.c
clang: warning: -Wl,--undefined-version: 'linker' input unused [-Wunused-command-line-argument]
Linking C executable cmTC_93e2b
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_93e2b.dir/link.txt --verbose=1
/opt/rocm_sdk_611/bin/clang -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic  -Wl,--undefined-version -L/usr/lib64 -L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hip/lib -Wl,-rpath-link,/usr/lib64 -Wl,-rpath-link,/opt/rocm_sdk_611/lib -Wl,-rpath-link,/opt/rocm_sdk_611/lib64 -Wl,-rpath-link,/opt/rocm_sdk_611/hip/lib  CMakeFiles/cmTC_93e2b.dir/src.c.o -o cmTC_93e2b 
/usr/bin/ld: unrecognized option '--undefined-version'
/usr/bin/ld: use the --help option for usage information
clang: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[1]: *** [CMakeFiles/cmTC_93e2b.dir/build.make:100: cmTC_93e2b] Error 1
gmake[1]: Leaving directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
gmake: *** [Makefile:127: cmTC_93e2b/fast] Error 2


Source file was:
int main(void) { return 0; }
Performing C SOURCE FILE Test LIBOMP_HAVE_PSAPI failed with the following output:
Change Dir: /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_78526/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_78526.dir/build.make CMakeFiles/cmTC_78526.dir/build
gmake[1]: Entering directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_78526.dir/src.c.o
/opt/rocm_sdk_611/bin/clang -DLIBOMP_HAVE_PSAPI  -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic  -MD -MT CMakeFiles/cmTC_78526.dir/src.c.o -MF CMakeFiles/cmTC_78526.dir/src.c.o.d -o CMakeFiles/cmTC_78526.dir/src.c.o -c /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp/src.c
/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp/src.c:1:10: fatal error: 'windows.h' file not found
    1 | #include <windows.h>
      |          ^~~~~~~~~~~
1 error generated.
gmake[1]: *** [CMakeFiles/cmTC_78526.dir/build.make:79: CMakeFiles/cmTC_78526.dir/src.c.o] Error 1
gmake[1]: Leaving directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
gmake: *** [Makefile:127: cmTC_78526/fast] Error 2


Source file was:
#include <windows.h>
  #include <psapi.h>
  int main(int artc, char** argv) {
    return EnumProcessModules(NULL, NULL, 0, NULL);
  }
Determining if the function __atomic_load_1 exists failed with the following output:
Change Dir: /home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_056d4/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_056d4.dir/build.make CMakeFiles/cmTC_056d4.dir/build
gmake[1]: Entering directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_056d4.dir/CheckFunctionExists.c.o
/opt/rocm_sdk_611/bin/clang   -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic -DCHECK_FUNCTION_EXISTS=__atomic_load_1 -MD -MT CMakeFiles/cmTC_056d4.dir/CheckFunctionExists.c.o -MF CMakeFiles/cmTC_056d4.dir/CheckFunctionExists.c.o.d -o CMakeFiles/cmTC_056d4.dir/CheckFunctionExists.c.o -c /usr/share/cmake-3.22/Modules/CheckFunctionExists.c
Linking C executable cmTC_056d4
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_056d4.dir/link.txt --verbose=1
/opt/rocm_sdk_611/bin/clang -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic -DCHECK_FUNCTION_EXISTS=__atomic_load_1 -L/usr/lib64 -L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hip/lib -Wl,-rpath-link,/usr/lib64 -Wl,-rpath-link,/opt/rocm_sdk_611/lib -Wl,-rpath-link,/opt/rocm_sdk_611/lib64 -Wl,-rpath-link,/opt/rocm_sdk_611/hip/lib  CMakeFiles/cmTC_056d4.dir/CheckFunctionExists.c.o -o cmTC_056d4 
/usr/bin/ld: CMakeFiles/cmTC_056d4.dir/CheckFunctionExists.c.o: in function `main':
CheckFunctionExists.c:(.text+0x17): undefined reference to `__atomic_load_1'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[1]: *** [CMakeFiles/cmTC_056d4.dir/build.make:100: cmTC_056d4] Error 1
gmake[1]: Leaving directory '/home/idn/source/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/CMakeFiles/CMakeTmp'
gmake: *** [Makefile:127: cmTC_056d4/fast] Error 2

Is this a bug or am I doing something wrong?

/usr/bin/ld: cannot find -lmsgpackc-cxx: No such file or directory

OS: Ubuntu 24.04
GPU: Sapphire Radeon RX 7900 XTX Nitro+ Vapor-X Aktiv PCIe 4.0 x16

cd /home/flip111/programs/src/rocm_sdk_builder/builddir/035_AMDMIGraphX/src && /usr/bin/cmake -E cmake_link_script CMakeFiles/migraphx.dir/link.txt --verbose=1
/opt/rocm_sdk_611/bin/clang++ -fPIC -O3 -DNDEBUG -L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hsa/lib -L/opt/rocm_sdk_611/rocblas/lib -L/opt/rocm_sdk_611/hcc/lib -shared -Wl,-soname,libmigraphx.so.2009000 -o ../lib/libmigraphx.so.2009000.0.60101 CMakeFiles/migraphx.dir/adjust_allocation.cpp.o CMakeFiles/migraphx.dir/analyze_streams.cpp.o CMakeFiles/migraphx.dir/apply_alpha_beta.cpp.o CMakeFiles/migraphx.dir/argument.cpp.o CMakeFiles/migraphx.dir/autocast_fp8.cpp.o CMakeFiles/migraphx.dir/auto_contiguous.cpp.o CMakeFiles/migraphx.dir/common.cpp.o CMakeFiles/migraphx.dir/common_dims.cpp.o CMakeFiles/migraphx.dir/compile_src.cpp.o CMakeFiles/migraphx.dir/convert_to_json.cpp.o CMakeFiles/migraphx.dir/cpp_generator.cpp.o CMakeFiles/migraphx.dir/dead_code_elimination.cpp.o CMakeFiles/migraphx.dir/dom_info.cpp.o CMakeFiles/migraphx.dir/dynamic_loader.cpp.o CMakeFiles/migraphx.dir/eliminate_allocation.cpp.o CMakeFiles/migraphx.dir/eliminate_common_subexpression.cpp.o CMakeFiles/migraphx.dir/eliminate_concat.cpp.o CMakeFiles/migraphx.dir/eliminate_contiguous.cpp.o CMakeFiles/migraphx.dir/eliminate_convert.cpp.o CMakeFiles/migraphx.dir/eliminate_data_type.cpp.o CMakeFiles/migraphx.dir/eliminate_identity.cpp.o CMakeFiles/migraphx.dir/eliminate_pad.cpp.o CMakeFiles/migraphx.dir/env.cpp.o CMakeFiles/migraphx.dir/file_buffer.cpp.o CMakeFiles/migraphx.dir/fp_to_double.cpp.o CMakeFiles/migraphx.dir/fuse_concat.cpp.o CMakeFiles/migraphx.dir/fuse_pointwise.cpp.o CMakeFiles/migraphx.dir/fuse_reduce.cpp.o CMakeFiles/migraphx.dir/generate.cpp.o CMakeFiles/migraphx.dir/inline_module.cpp.o CMakeFiles/migraphx.dir/insert_pad.cpp.o CMakeFiles/migraphx.dir/instruction.cpp.o CMakeFiles/migraphx.dir/json.cpp.o CMakeFiles/migraphx.dir/layout_nhwc.cpp.o CMakeFiles/migraphx.dir/load_save.cpp.o CMakeFiles/migraphx.dir/make_op.cpp.o CMakeFiles/migraphx.dir/memory_coloring.cpp.o CMakeFiles/migraphx.dir/module.cpp.o CMakeFiles/migraphx.dir/msgpack.cpp.o 
CMakeFiles/migraphx.dir/normalize_attributes.cpp.o CMakeFiles/migraphx.dir/normalize_ops.cpp.o CMakeFiles/migraphx.dir/op_enums.cpp.o CMakeFiles/migraphx.dir/operation.cpp.o CMakeFiles/migraphx.dir/optimize_module.cpp.o CMakeFiles/migraphx.dir/pad_calc.cpp.o CMakeFiles/migraphx.dir/pass.cpp.o CMakeFiles/migraphx.dir/pass_manager.cpp.o CMakeFiles/migraphx.dir/permutation.cpp.o CMakeFiles/migraphx.dir/preallocate_param.cpp.o CMakeFiles/migraphx.dir/process.cpp.o CMakeFiles/migraphx.dir/program.cpp.o CMakeFiles/migraphx.dir/propagate_constant.cpp.o CMakeFiles/migraphx.dir/promote_literals.cpp.o CMakeFiles/migraphx.dir/quantization.cpp.o CMakeFiles/migraphx.dir/quantize_fp16.cpp.o CMakeFiles/migraphx.dir/quantize_8bits.cpp.o CMakeFiles/migraphx.dir/reduce_dims.cpp.o CMakeFiles/migraphx.dir/register_op.cpp.o CMakeFiles/migraphx.dir/register_target.cpp.o CMakeFiles/migraphx.dir/replace_allocate.cpp.o CMakeFiles/migraphx.dir/rewrite_reduce.cpp.o CMakeFiles/migraphx.dir/simplify_qdq.cpp.o CMakeFiles/migraphx.dir/sqlite.cpp.o CMakeFiles/migraphx.dir/rewrite_gelu.cpp.o CMakeFiles/migraphx.dir/rewrite_pooling.cpp.o CMakeFiles/migraphx.dir/rewrite_quantization.cpp.o CMakeFiles/migraphx.dir/rewrite_rnn.cpp.o CMakeFiles/migraphx.dir/schedule.cpp.o CMakeFiles/migraphx.dir/serialize.cpp.o CMakeFiles/migraphx.dir/shape.cpp.o CMakeFiles/migraphx.dir/simplify_algebra.cpp.o CMakeFiles/migraphx.dir/simplify_dyn_ops.cpp.o CMakeFiles/migraphx.dir/simplify_reshapes.cpp.o CMakeFiles/migraphx.dir/split_single_dyn_dim.cpp.o CMakeFiles/migraphx.dir/target.cpp.o CMakeFiles/migraphx.dir/tmp_dir.cpp.o CMakeFiles/migraphx.dir/value.cpp.o CMakeFiles/migraphx.dir/verify_args.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_abs_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_acosh_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_acos_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_add_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_allocate_hpp.cpp.o 
CMakeFiles/migraphx.dir/ops/migraphx_op_argmax_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_argmin_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_asinh_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_asin_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_as_shape_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_atanh_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_atan_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_broadcast_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_capture_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_ceil_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_clip_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_concat_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_contiguous_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_convert_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_convolution_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_convolution_backwards_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_cosh_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_cos_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_dequantizelinear_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_dimensions_of_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_div_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_dot_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_elu_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_equal_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_erf_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_exp_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_fill_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_flatten_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_floor_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_fmod_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_gather_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_gathernd_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_get_tuple_elem_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_greater_hpp.cpp.o 
CMakeFiles/migraphx.dir/ops/migraphx_op_gru_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_identity_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_if_op_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_im2col_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_isinf_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_isnan_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_layout_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_leaky_relu_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_less_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_load_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_log_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_logical_and_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_logical_or_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_logical_xor_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_logsoftmax_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_loop_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_lrn_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_lstm_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_max_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_min_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_mod_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_mul_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_multibroadcast_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_multinomial_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_nearbyint_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_neg_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_nonmaxsuppression_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_nonzero_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_outline_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_pad_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_pointwise_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_pooling_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_pow_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_prefix_scan_sum_hpp.cpp.o 
CMakeFiles/migraphx.dir/ops/migraphx_op_prelu_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_quant_convolution_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_quant_dot_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_quantizelinear_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_random_uniform_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_random_seed_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_recip_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reduce_max_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reduce_mean_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reduce_min_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reduce_prod_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reduce_sum_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_relu_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reshape_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reshape_lazy_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_reverse_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_rnn_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_rnn_last_cell_output_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_rnn_last_hs_output_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_rnn_var_sl_last_output_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_roialign_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_rsqrt_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_run_on_target_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scalar_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatter_none_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatter_add_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatter_mul_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatter_min_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatter_max_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatternd_add_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatternd_mul_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatternd_none_hpp.cpp.o 
CMakeFiles/migraphx.dir/ops/migraphx_op_scatternd_max_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_scatternd_min_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_select_module_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sigmoid_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sign_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sinh_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sin_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_slice_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_softmax_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sqdiff_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sqrt_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_squeeze_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_step_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_sub_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_tanh_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_tan_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_topk_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_transpose_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_unary_not_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_undefined_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_unique_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_unknown_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_unsqueeze_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_where_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_op_rnn_variable_seq_lens_hpp.cpp.o CMakeFiles/migraphx.dir/ops/migraphx_builtin_hpp.cpp.o  -Wl,-rpath,/home/flip111/programs/src/rocm_sdk_builder/builddir/035_AMDMIGraphX/lib: -lstdc++fs -ldl /usr/lib/x86_64-linux-gnu/libsqlite3.so -lmsgpackc-cxx
/usr/bin/ld: cannot find -lmsgpackc-cxx: No such file or directory
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [src/CMakeFiles/migraphx.dir/build.make:3426: lib/libmigraphx.so.2009000.0.60101] Error 1
make[2]: Leaving directory '/home/flip111/programs/src/rocm_sdk_builder/builddir/035_AMDMIGraphX'
make[1]: *** [CMakeFiles/Makefile2:5912: src/CMakeFiles/migraphx.dir/all] Error 2
make[1]: Leaving directory '/home/flip111/programs/src/rocm_sdk_builder/builddir/035_AMDMIGraphX'
make: *** [Makefile:166: all] Error 2
build failed: AMDMIGraphX

Update Python

The current Python implementation works great. However, I think it may be time for an update, since a few projects such as Stable Diffusion Next are now dropping support for any Python version older than 3.10 (3.11 would be preferable).

If it's not too hard, it would be great to be able to choose the Python version with a selector, just like the GPU selection menu. This way users could still use Python 3.9 while also being able to install 3.10 or 3.11.

I'll try experimenting with it on my PC.

aotriton build issue

When trying to build aotriton, RAM usage skyrockets and the build even starts using swap memory. Is there a way to fix this? The system becomes unusable during that stage.
I used the rocm_sdk_builder_611_bg12_amdmigraphx branch with these instructions to avoid the earlier AMDMIGraphX build failure.

Unable to build aotriton

I'm running a clean build and aotriton refuses to build. The errors come from files that cannot be found on the system. Could this be related to one of the latest commits applied to its patches?

make[3]: *** [Makefile.compile:11: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco] Error 1
make[3]: *** Waiting for unfinished jobs....
zstd: can't stat /home/daniandtheweb/WorkSpace/rocm_sdk_builder/builddir/038_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco : No such file or directory -- ignored 
make[3]: *** [Makefile.compile:35: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco] Error 1
/opt/rocm_sdk_612/bin/zstd -f /home/daniandtheweb/WorkSpace/rocm_sdk_builder/builddir/038_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco 
zstd: can't stat /home/daniandtheweb/WorkSpace/rocm_sdk_builder/builddir/038_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco : No such file or directory -- ignored 
make[3]: *** [Makefile.compile:47: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco] Error 1
...

csrc/cpu/comm/ccl.cpp:8:10: fatal error: oneapi/ccl.hpp: No such file or directory

While running ./babs.sh -b I received this error:

building 'deepspeed.ops.comm.deepspeed_ccl_comm_op' extension
creating build/temp.linux-x86_64-cpython-39
creating build/temp.linux-x86_64-cpython-39/csrc
creating build/temp.linux-x86_64-cpython-39/csrc/cpu
creating build/temp.linux-x86_64-cpython-39/csrc/cpu/comm
gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -fPIC -I/home/eitch/src/compile_temp/rocm_sdk_builder/src_projects/DeepSpeed/csrc/cpu/includes -I/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/include -I/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/include/TH -I/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/include/THC -I/opt/rocm_sdk_611/include/python3.9 -c csrc/cpu/comm/ccl.cpp -o build/temp.linux-x86_64-cpython-39/csrc/cpu/comm/ccl.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -O2 -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1018\" -DTORCH_EXTENSION_NAME=deepspeed_ccl_comm_op -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
csrc/cpu/comm/ccl.cpp:8:10: fatal error: oneapi/ccl.hpp: No such file or directory
    8 | #include <oneapi/ccl.hpp>
      |          ^~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
build failed: DeepSpeed
  error in build cmd: ./build_deepspeed_rocm.sh

build failed

I'm running on Ubuntu:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:  Ubuntu 23.10
Release:  23.10
Codename: mantic

And I'm using an RX 7900 XTX.

How does rocWMMA/hipBLASLt/etc work on gfx103x?

Hi, Gentoo enthusiast here,

Did it make any sense to add gfx1030 and gfx1035 to rocWMMA/hipBLASLt? As far as I understand their code, it has hard dependencies on the WMMA or MFMA instruction sets, and RDNA2 (gfx103x) supports neither of them: https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1030.html . This means that at least the tests will fail (if it even compiles).

Additionally, I noticed that other libraries such as rocFFT contain code like #if defined(__gfx803__)|| defined(__gfx900__) || ..., which restricts code paths to specific models and causes crashes if you blindly add gfx103x to AMDGPU_TARGETS. I mentioned this earlier in gentoo/gentoo#33400 (comment); the solution I proposed is to build for an officially supported target and patch the runtime so that it tries to load compatible kernels.

CPU architecture selection in blis project

In the BLIS project the CPU architecture is currently set to ZEN2 (a configure option given before the build).
Should the BLIS build script instead detect the CPU architecture actually in use and set that option accordingly?
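
One possible approach, sketched below in bash, is to read the vendor, family and model fields from /proc/cpuinfo and map them to a BLIS configuration name. The function name guess_blis_config is hypothetical and not part of this project, and the family/model thresholds are assumptions that should be checked against BLIS's own CPU detection. Anything unrecognised falls back to auto, which asks the BLIS configure script to pick a configuration itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rocm_sdk_builder): guess a BLIS
# configuration name from a cpuinfo-style file.
# ASSUMPTION: AMD family 23 with model >= 48 is treated as zen2, other
# family 23 CPUs as zen, and family 25 as zen3. Verify before relying on it.
guess_blis_config() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    local key value vendor="" family="" model=""
    while IFS=: read -r key value; do
        key="${key%"${key##*[![:space:]]}"}"        # strip trailing whitespace
        value="${value#"${value%%[![:space:]]*}"}"  # strip leading whitespace
        case "$key" in
            vendor_id)    [ -z "$vendor" ] && vendor="$value" ;;
            "cpu family") [ -z "$family" ] && family="$value" ;;
            model)        [ -z "$model" ]  && model="$value" ;;
        esac
    done < "$cpuinfo"

    if [ "$vendor" = "AuthenticAMD" ]; then
        if [ "$family" = "23" ] && [ "${model:-0}" -ge 48 ]; then
            echo "zen2"; return 0
        elif [ "$family" = "23" ]; then
            echo "zen"; return 0
        elif [ "$family" = "25" ]; then
            echo "zen3"; return 0
        fi
    fi
    # Unknown CPU: let BLIS decide at configure time.
    echo "auto"
}
```

A build script could then pass "$(guess_blis_config)" to the BLIS configure step instead of a fixed name. Only the first processor entry is inspected, which assumes all cores report the same family and model.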

Collection of pytorch gpu benchmark results

Extensive GPU benchmarks on AMD GPUs can now be run with the following steps after building the ROCm SDK. This version has been synced with upstream, which fixed PyTorch 2.0 support in a different way than I had done earlier, and it runs all the tests without raising exceptions.

git clone https://github.com/lamikr/pytorch-gpu-benchmark
cd pytorch-gpu-benchmark
source /opt/rocm_sdk_611/bin/env_rocm.sh
./test.sh

It would be nice to collect results from different computers and create some comparison graphs.
On my AMD RX 6800 the test run took about 50 minutes, and the results were saved to the result folder as the following 8 files:

'AMD Radeon RX 6800_1_gpus__double_model_inference_benchmark.csv'  'AMD Radeon RX 6800_1_gpus__half_model_inference_benchmark.csv'
'AMD Radeon RX 6800_1_gpus__double_model_train_benchmark.csv'      'AMD Radeon RX 6800_1_gpus__half_model_train_benchmark.csv'
'AMD Radeon RX 6800_1_gpus__float_model_inference_benchmark.csv'    config.json
'AMD Radeon RX 6800_1_gpus__float_model_train_benchmark.csv'        system_info.txt

I have now stored the files from my benchmark run in the results/AMD_Radeon_RX_6800 folder of the GPU benchmark repository.

So if you have run the tests, could you send the results as a pull request? At the moment the plot.ipynb code, which should read the CSV files and generate pictures, seems to be broken, so that needs to be fixed...

clean mechanism for python project build files under its src directory

The Python projects pytorch, pytorch_audio, pytorch_vision, onnxruntime, DeepSpeed, torch_migraphx and bitsandbytes do not build cleanly under builddir but instead build under the src directory.

When doing a clean build, files under "src_projects/pytorch/build", for example, do not get cleaned up and can prevent project reconfiguration if patches or other dependencies have changed.

We need a way to clean them properly. I first thought that a new hook call for Python might be needed.

I ended up with an easier solution in which each of these projects has a preconfig_rocm.sh script that is called at project preconfig time. This way it is called only once for each clean build. If it needs to be called again, the user should remove the project's build directory (for example builddir/039_01_pytorch).
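
As a rough illustration of what such a preconfig step might do, the bash sketch below removes a stale build directory from a project's source tree. The function name clean_python_build_dir is hypothetical, and the assumption that the stale files live in a build directory directly under the source directory is taken only from the src_projects/pytorch/build example above; the real preconfig_rocm.sh scripts may do more or something different.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a cleanup helper for a preconfig_rocm.sh-style script.
# ASSUMPTION: stale files live in "<source dir>/build", as in the
# src_projects/pytorch/build example.
clean_python_build_dir() {
    local src_dir="$1"
    if [ -z "$src_dir" ] || [ ! -d "$src_dir" ]; then
        echo "error: source directory not found: '$src_dir'" >&2
        return 1
    fi
    if [ -d "$src_dir/build" ]; then
        echo "removing stale build files: $src_dir/build"
        # "--" guards against directory names that start with a dash.
        rm -rf -- "$src_dir/build"
    fi
}
```

A call such as clean_python_build_dir src_projects/pytorch would then leave the sources untouched and delete only the build subdirectory.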

004_01_roct-thunk-interface_shared fails to build during DEB/RPM package build

Without dpkg or rpmbuild installed, the build errors out with:

[…]
[100%] Linking C shared library libhsakmt.so
[100%] Built target hsakmt
Run CPack packaging tool...
CPack: Create package using DEB
CPack: Install projects
CPack: - Run preinstall target for: hsakmt
CPack: - Install project: hsakmt []
CPack: -   Install component: devel
CPack: Create package
-- CPackDeb: Can not find dpkg in your path, default to i386.
CPack: - package: /scratch/local2/pmenzel/rocm_sdk_builder/builddir/004_01_roct-thunk-interface_shared/hsakmt-roct-dev_6.1.1.60101-local_.deb generated.
CPack: Create package using RPM
CPack: Install projects
CPack: - Run preinstall target for: hsakmt
CPack: - Install project: hsakmt []
CPack: -   Install component: devel
CPack: Create package
CMake Error at /usr/share/cmake-3.25/Modules/Internal/CPack/CPackRPM.cmake:822 (message):
  RPM package requires rpmbuild executable
Call Stack (most recent call first):
  /usr/share/cmake-3.25/Modules/Internal/CPack/CPackRPM.cmake:1968 (cpack_rpm_generate_package)


CPack Error: Error while execution CPackRPM.cmake
CPack Error: Problem compressing the directory
CPack Error: Error when generating package: hsakmt
make: *** [Makefile:71: package] Error 1
build failed: ROCT-Thunk-Interface_shared
  error in build cmd: make package

build failed

make package seems to be explicitly called:

BINFO_APP_BUILD_CMD_ARRAY=(
"make package"
)

I was able to build it manually just with make, but changing the binfo file from make package to make resulted in the same error. (I have no idea whether some build scripts need to be regenerated.)
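As a possible pre-flight check, the availability of the packaging tools could be tested before make package is run. The sketch below is an assumption about how such a check might look and is not part of babs.sh; the function name is hypothetical. In the log above, the missing dpkg only produced a warning, while the missing rpmbuild aborted the build.

```python
import shutil


def missing_packaging_tools(tools=("dpkg", "rpmbuild"), which=shutil.which):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_packaging_tools()
    if missing:
        print("missing packaging tools:", ", ".join(missing))
    else:
        print("dpkg and rpmbuild are both available")
```

The `which` parameter exists only so the lookup can be replaced in tests.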

Nice to have: checking git global variable

It would be good if ./babs.sh -i refused to run when user.name and user.email are not set globally. It would probably also be good to abort the script as soon as one of the patches fails to apply.
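A minimal sketch of such a check is shown below, in Python for illustration; babs.sh itself is a shell script, and the function names here are hypothetical. It relies on git config --global --get exiting with a non-zero status when the key is unset.

```python
import subprocess


def git_global_value(key, run=subprocess.run):
    """Return the global git config value for `key`, or "" if it is unset."""
    result = run(
        ["git", "config", "--global", "--get", key],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else ""


def missing_git_identity(run=subprocess.run):
    """Return the identity keys that have no global value."""
    return [k for k in ("user.name", "user.email") if not git_global_value(k, run)]
```

A caller could abort early when missing_git_identity() returns a non-empty list, before any sources are downloaded or patched.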

gfx803 support

:c any hopes for the gfx803 to be supported and be built for?

hipBLASLt Build Error - TypeError: '<' not supported between instances of 'str' and 'bool'

Ubuntu 22.04 - Building for gfx1101 and gfx1102

Steps to reproduce

# git clone https://github.com/lamikr/rocm_sdk_builder.git
# cd rocm_sdk_builder
# git checkout releases/rocm_sdk_builder_611
# ./babs.sh -i
# ./install_deps.sh
# ./babs.sh -b
Traceback (most recent call last):
  File "/home/minipc/rocm_sdk_builder/builddir/025_02_hipBLASLt/library/../virtualenv/lib/python3.9/site-packages/Tensile/bin/TensileCreateLibrary", line 43, in <module>
    TensileCreateLibrary()
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/Tensile/TensileCreateLibrary.py", line 1218, in TensileCreateLibrary
    kernelMinNaming, _ = getKernelWriters(solutions, kernels)
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/Tensile/TensileCreateLibrary.py", line 630, in getKernelWriters
    kernelSerialNaming   = Solution.getSerialNaming(kernels)
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/Tensile/SolutionStructs.py", line 4823, in getSerialNaming
    data[paramName] = sorted(data[paramName])
TypeError: '<' not supported between instances of 'str' and 'bool'
make[2]: *** [library/CMakeFiles/TENSILE_LIBRARY_TARGET.dir/build.make:74: Tensile/library/TensileManifest.txt] Error 1
make[2]: Leaving directory '/home/minipc/rocm_sdk_builder/builddir/025_02_hipBLASLt'
make[1]: *** [CMakeFiles/Makefile2:249: library/CMakeFiles/TENSILE_LIBRARY_TARGET.dir/all] Error 2
make[1]: Leaving directory '/home/minipc/rocm_sdk_builder/builddir/025_02_hipBLASLt'
make: *** [Makefile:166: all] Error 2
build failed: hipBLASLt
Build failed
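The TypeError in the traceback is what Python 3 raises when sorted() has to compare a str with a bool. The sketch below reproduces that with made-up values and shows one generic way to sort mixed-type lists. The values and the sort_mixed helper are hypothetical, and this is not the fix used by Tensile or by rocm_sdk_builder.

```python
def sort_mixed(values):
    """Sort values of mixed types by grouping on type name, then on string form."""
    return sorted(values, key=lambda v: (type(v).__name__, str(v)))


# Made-up data: a parameter that holds both booleans and strings.
mixed = [True, "Branch", False, "Direct"]

try:
    sorted(mixed)
except TypeError as exc:
    print("plain sorted() fails:", exc)

print(sort_mixed(mixed))
```

Whether such a key is appropriate depends on why the parameter holds both types in the first place, which the traceback alone does not show.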

add triton package

At the moment we include AMD's aotriton, which uses AMD's triton internally.
We should, however, build and install triton separately, as it is used by many machine learning programs.

Upstream triton now supports many AMD ROCm features, so it makes sense to try to use it now (even if it still lacks some features of AMD's internal triton).

nice to have: Support for patches in git submodules

The current patch mechanism only supports patches in main projects, which is one reason why the aotriton project has been flattened to a single repo. (aotriton)

Implementation idea, based on aotriton, which has 3 git submodules located in the directories

third_party/triton/
third_party/pybind11/
third_party/incbin/

Under the project's main patch dir, there could be a sub-directory structure similar to the main project's, to make it easy to mark the directories where the patches should be applied. For example, in the aotriton project the patches directory structure could be

patches/rocm-6.1.1/aotriton/001-aotriton-main-project_change1.patch
patches/rocm-6.1.1/aotriton/third_party/triton/change1.patch
patches/rocm-6.1.1/aotriton/third_party/triton/change2.patch
patches/rocm-6.1.1/aotriton/third_party/pybind11/change1.patch
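The directory mapping proposed above can be sketched as follows. This is only an illustration of the idea in Python, not the existing patch mechanism; plan_patches and the example source directory are hypothetical names.

```python
import tempfile
from pathlib import Path


def plan_patches(patch_root, source_root):
    """Map every *.patch under patch_root to the source dir it would apply in.

    A patch stored at <patch_root>/<subdir>/x.patch is paired with
    <source_root>/<subdir>, so submodule patches land in the submodule.
    """
    patch_root = Path(patch_root)
    source_root = Path(source_root)
    plan = []
    for patch in sorted(patch_root.rglob("*.patch")):
        subdir = patch.parent.relative_to(patch_root)
        plan.append((source_root / subdir, patch))
    return plan


if __name__ == "__main__":
    # Build a small throwaway patch tree that mirrors the layout above.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "patches" / "rocm-6.1.1" / "aotriton"
        (root / "third_party" / "triton").mkdir(parents=True)
        (root / "001-aotriton-main-project_change1.patch").touch()
        (root / "third_party" / "triton" / "change1.patch").touch()
        for apply_dir, patch in plan_patches(root, "src_projects/aotriton"):
            print(apply_dir, "<-", patch.name)
```

Patches in the top-level directory map to the main project, and everything below it maps to the matching sub-directory, so no extra configuration file would be needed to name the submodules.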

Ubuntu 22.04 : build fails in build_deepspeed_rocm.sh (missing oneapi/ccl.hpp)

OS : ubuntu 22.04 (ubuntu 22.04 LTS cloud image : https://cloud-images.ubuntu.com/jammy/current/)
Build process :

git clone https://github.com/lamikr/rocm_sdk_builder.git
cd rocm_sdk_builder
git checkout releases/rocm_sdk_builder_611
./install_deps.sh
./babs.sh -i
./babs.sh -b

Error :

/opt/rocm_sdk_611/lib/python3.9/site-packages/setuptools/command/build_py.py:207: _Warning: Package 'deepspeed.autotuning.config_templates' is absent from the `packages` configuration.
/opt/rocm_sdk_611/lib/python3.9/site-packages/setuptools/command/build_py.py:207: _Warning: Package 'deepspeed.inference.v2.kernels.core_ops.cuda_linear.include' is absent from the `packages` configuration.
/opt/rocm_sdk_611/lib/python3.9/site-packages/setuptools/command/build_py.py:207: _Warning: Package 'deepspeed.inference.v2.kernels.cutlass_ops.shared_resources' is absent from the `packages` configuration.
/opt/rocm_sdk_611/lib/python3.9/site-packages/setuptools/command/build_py.py:207: _Warning: Package 'deepspeed.inference.v2.kernels.includes' is absent from the `packages` configuration.
...
/opt/rocm_sdk_611/lib/python3.9/site-packages/setuptools/command/build_py.py:207: _Warning: Package 'deepspeed.ops.csrc.xpu.packbits' is absent from the `packages` configuration.
...
gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -fPIC -I/home/ubuntu/rocm_sdk_builder/src_projects/DeepSpeed/csrc/cpu/includes -I/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/include -I/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/include/TH -I/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/include/THC -I/opt/rocm_sdk_611/include/python3.9 -c csrc/cpu/comm/ccl.cpp -o build/temp.linux-x86_64-cpython-39/csrc/cpu/comm/ccl.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -O2 -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -DTORCH_EXTENSION_NAME=deepspeed_ccl_comm_op -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
csrc/cpu/comm/ccl.cpp:8:10: fatal error: oneapi/ccl.hpp: No such file or directory
    8 | #include <oneapi/ccl.hpp>
      |          ^~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
build failed: DeepSpeed
  error in build cmd: ./build_deepspeed_rocm.sh
Build failed

rocm_sdk_builder version data

It would be good to also store the babs version and the git commit hash of rocm_sdk_builder.

Suggested content of file /opt/rocm_sdk_xxx/.info/rocm_sdk_builder:

SDK_BUILDER_VERSION: 6.1.1-2
BABS_VERSION: 2024_06_02
SRC_COMMIT: 63h7h226
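A file in this "KEY: value" form is easy to write and read back. The sketch below assumes exactly the three example fields above and uses hypothetical helper names; it is not how the build system currently stores its version data.

```python
def format_builder_info(info):
    """Render a dict as 'KEY: value' lines, one per entry."""
    return "".join(f"{key}: {value}\n" for key, value in info.items())


def parse_builder_info(text):
    """Parse 'KEY: value' lines back into a dict, skipping blank lines."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info


example = {
    "SDK_BUILDER_VERSION": "6.1.1-2",
    "BABS_VERSION": "2024_06_02",
    "SRC_COMMIT": "63h7h226",
}
text = format_builder_info(example)
print(text, end="")
```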

onnxruntime error building for 6.1.2: '...allocated_capacity' is used uninitialized

Attempting to build the wip/rocm_sdk_builder_612 branch with the Python patch from #70 applied (which is likely unrelated, but mentioned for completeness) on a fully up-to-date Manjaro unstable produces a peculiar error that I've found myself unable to troubleshoot. The header path (/usr/include/absl) seems to indicate it's using a global include file from the system's GCC, rather than clang or hipcc; I'm not sure whether that's intended or not.

And it's been a while since I've written C++ but the error itself is a mystery to me as well: it seems to complain about an uninitialized member being present when invoking a copy constructor, but the instances are being initialized using the default constructor.

This is gcc 14.1.1 20240522.

In file included from /usr/include/absl/container/inlined_vector.h:53,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/include/onnxruntime/core/common/inlined_containers_fwd.h:25,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/include/onnxruntime/core/framework/tensor_shape.h:13,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/include/onnxruntime/core/framework/tensor.h:15,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/ml/write_scores_test.cc:5:
In member function ‘void absl::lts_20240116::inlined_vector_internal::Storage<T, N, A>::MemcpyFrom(const absl::lts_20240116::inlined_vector_internal::Storage<T, N, A>&) [with T = float; long unsigned int N = 11; A = std::allocator<float>]’,
    inlined from ‘absl::lts_20240116::InlinedVector<T, N, A>::InlinedVector(const absl::lts_20240116::InlinedVector<T, N, A>&, const allocator_type&) [with T = float; long unsigned int N = 11; A = std::allocator<float>]’ at /usr/include/absl/container/inlined_vector.h:195:26,
    inlined from ‘absl::lts_20240116::InlinedVector<T, N, A>::InlinedVector(const absl::lts_20240116::InlinedVector<T, N, A>&) [with T = float; long unsigned int N = 11; A = std::allocator<float>]’ at /usr/include/absl/container/inlined_vector.h:177:59,
    inlined from ‘virtual void WriteScores_single_score_transform_none_Test::TestBody()’ at /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/ml/write_scores_test.cc:61:29:
/usr/include/absl/container/internal/inlined_vector.h:532:5: error: ‘v1.absl::lts_20240116::InlinedVector<float, 11, std::allocator<float> >::storage_.absl::lts_20240116::inlined_vector_internal::Storage<float, 11, std::allocator<float> >::data_.absl::lts_20240116::inlined_vector_internal::Storage<float, 11, std::allocator<float> >::Data::allocated.absl::lts_20240116::inlined_vector_internal::Storage<float, 11, std::allocator<float> >::Allocated::allocated_capacity’ is used uninitialized [-Werror=uninitialized]
  532 |     data_ = other_storage.data_;
      |     ^~~~~
/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/ml/write_scores_test.cc: In member function ‘virtual void WriteScores_single_score_transform_none_Test::TestBody()’:
/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/ml/write_scores_test.cc:58:24: note: ‘v1’ declared here
   58 |   InlinedVector<float> v1;
      |                        ^~
cc1plus: all warnings being treated as errors
make[2]: *** [CMakeFiles/onnxruntime_test_all.dir/build.make:3296: CMakeFiles/onnxruntime_test_all.dir/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/ml/write_scores_test.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
In file included from /usr/include/absl/container/inlined_vector.h:53,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/include/onnxruntime/core/common/inlined_containers_fwd.h:25,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/include/onnxruntime/core/framework/tensor_shape.h:13,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/include/onnxruntime/core/framework/tensor.h:15,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.h:16,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc:9:
In member function ‘void absl::lts_20240116::inlined_vector_internal::Storage<T, N, A>::MemcpyFrom(const absl::lts_20240116::inlined_vector_internal::Storage<T, N, A>&) [with T = long int; long unsigned int N = 6; A = std::allocator<long int>]’,
    inlined from ‘void absl::lts_20240116::InlinedVector<T, N, A>::MoveAssignment(MemcpyPolicy, absl::lts_20240116::InlinedVector<T, N, A>&&) [with T = long int; long unsigned int N = 6; A = std::allocator<long int>]’ at /usr/include/absl/container/inlined_vector.h:856:24,
    inlined from ‘absl::lts_20240116::InlinedVector<T, N, A>& absl::lts_20240116::InlinedVector<T, N, A>::operator=(absl::lts_20240116::InlinedVector<T, N, A>&&) [with T = long int; long unsigned int N = 6; A = std::allocator<long int>]’ at /usr/include/absl/container/inlined_vector.h:548:21,
    inlined from ‘virtual void onnxruntime::test::ReductionOpTest_OptimizeShapeForFastReduce_KR_neg_Test::TestBody()’ at /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc:4255:43:
/usr/include/absl/container/internal/inlined_vector.h:532:5: error: ‘expected_fast_axes.absl::lts_20240116::InlinedVector<long int, 6, std::allocator<long int> >::storage_.absl::lts_20240116::inlined_vector_internal::Storage<long int, 6, std::allocator<long int> >::data_.absl::lts_20240116::inlined_vector_internal::Storage<long int, 6, std::allocator<long int> >::Data::allocated.absl::lts_20240116::inlined_vector_internal::Storage<long int, 6, std::allocator<long int> >::Allocated::allocated_capacity’ is used uninitialized [-Werror=uninitialized]
  532 |     data_ = other_storage.data_;
      |     ^~~~~
/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc: In member function ‘virtual void onnxruntime::test::ReductionOpTest_OptimizeShapeForFastReduce_KR_neg_Test::TestBody()’:
/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc:4247:70: note: ‘expected_fast_axes’ declared here
 4247 |   TensorShapeVector expected_fast_shape, expected_fast_output_shape, expected_fast_axes;
      |                                                                      ^~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make[2]: *** [CMakeFiles/onnxruntime_test_all.dir/build.make:3618: CMakeFiles/onnxruntime_test_all.dir/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:3251: CMakeFiles/onnxruntime_test_all.dir/all] Error 2
make: *** [Makefile:146: all] Error 2
Traceback (most recent call last):
  File "/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/tools/ci_build/build.py", line 2950, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/tools/ci_build/build.py", line 2842, in main
    build_targets(args, cmake_path, build_dir, configs, num_parallel_jobs, args.target)
  File "/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/tools/ci_build/build.py", line 1731, in build_targets
    run_subprocess(cmd_args, env=env)
  File "/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/tools/ci_build/build.py", line 861, in run_subprocess
    return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/tools/python/util/run.py", line 49, in run
    completed_process = subprocess.run(
                        ^^^^^^^^^^^^^^^
  File "/opt/rocm_sdk_612/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', '/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release', '--config', 'Release', '--', '-j12']' returned non-zero exit status 2.
build failed: onnxruntime
  error in build cmd: ./build_onnxruntime_rocm_training.sh /opt/rocm_sdk_612 "gfx1030"
Build failed

Unable to build onnxruntime (40_01)

The system I'm using is Ubuntu 23.10 running in distrobox (podman container), and I'm building for the 5700XT.
I'm unable to build onnxruntime because its build refuses to run as root. However, when I run the build command without root, it can't install the package, so there's no way to get past this step.

Nice to have: attempt to patch again when sources were already downloaded

I thought I could get away with not setting --global on the git user/email setting. It is required, however.

Now all the sources have been downloaded but the patches were not applied. When I rerun ./babs.sh -i, it doesn't apply the patches. It would be nice if the script could check whether the patches were applied and, if not, attempt to patch again.
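One way such a check could work is to ask git whether a patch can be reverse-applied cleanly, which is only possible when the patch is already present in the tree. The sketch below assumes plain git apply is acceptable for this test; how babs.sh actually applies its patches is not shown here, and patch_is_applied is a hypothetical helper.

```python
import subprocess


def patch_is_applied(repo_dir, patch_file, run=subprocess.run):
    """Return True if patch_file reverse-applies cleanly in repo_dir.

    `git apply --reverse --check` changes nothing; it only reports through
    its exit status whether the reverse patch would apply.
    """
    result = run(
        ["git", "apply", "--reverse", "--check", str(patch_file)],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

The patch path should be absolute, or relative to repo_dir, because the command runs with repo_dir as its working directory.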

rocm_sdk_builder internal release version

At the moment there is no way to check the SDK's release version from the build system, as the 4th number in
/etc/.info/version is 9999

cat /opt/rocm_sdk_611/.info/version
6.1.1-9999

This should now be 6.1.1-1, and soon we could release 6.1.1-2, as many more Linux distros are now supported.
This can be fixed in binfo/001_rocm_core.binfo by passing the BUILD_ID variable.

gfx1102 : import torchaudio : Caught signal 11 (Segmentation fault: address not mapped to object at address 0x41b40)

Built rocm_sdk_builder on Ubuntu 22.04 with Linux Kernel 6.10-rc2

Simple import torchaudio yielded

minipc@minipc:~/aipython$ python
Python 3.9.19 (tags/v3.9.19-dirty:882f62bd93, Jun 14 2024, 12:20:42)
[GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchaudio
[minipc:2334463:0:2334463] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x41b40)
Segmentation fault (core dumped)

Upgraded torchaudio with:

minipc@minipc:~/aipython$ pip install --upgrade torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
...
Successfully installed pytorch-triton-rocm-2.3.1 torch-2.3.1+rocm6.0 torchaudio-2.3.1+rocm6.0

Import afterward resulted in:

minipc@minipc:~/aipython$ python
Python 3.9.19 (tags/v3.9.19-dirty:882f62bd93, Jun 14 2024, 12:20:42)
[GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchaudio
Segmentation fault (core dumped)

Save Python wheel packages in a separate folder

It would be nice to copy all the built Python wheel packages into a separate folder. This would make it easier to install them in a Python virtual environment without needing --system-site-packages. It could also improve the testing process, because different package versions would be saved there and ready to test without rebuilding every time.
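A minimal sketch of the copy step is shown below. It assumes the built wheels can be found as *.whl files somewhere under a search directory; the function name and both directory arguments are hypothetical and do not reflect how the build currently handles wheels.

```python
import shutil
import tempfile
from pathlib import Path


def collect_wheels(search_root, wheel_dir):
    """Copy every *.whl found under search_root into wheel_dir."""
    wheel_dir = Path(wheel_dir)
    wheel_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for wheel in sorted(Path(search_root).rglob("*.whl")):
        if wheel.parent.resolve() == wheel_dir.resolve():
            continue  # already in the target folder
        target = wheel_dir / wheel.name
        shutil.copy2(wheel, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Demonstrate with an empty placeholder file in a throwaway tree.
    with tempfile.TemporaryDirectory() as tmp:
        dist = Path(tmp) / "src" / "example" / "dist"
        dist.mkdir(parents=True)
        (dist / "example-0.0.0-py3-none-any.whl").touch()
        print(collect_wheels(Path(tmp) / "src", Path(tmp) / "wheels"))
```

A virtual environment could then install from that folder with pip install --no-index --find-links pointing at it.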

Nice to have: Contributors.md or txt file

Now that many people have helped to test and provide patches, it would be nice to have a file listing those who have helped.
Adding a name or nickname would be voluntary.

./run_pytorch_gpu_simple_test.sh fails after successful build (gfx1010)

I am using Ubuntu 22.04 with an AMD RX 5700 graphics card (gfx1010). The driver was installed with amdgpu-install from the repo.radeon.com repository for version 6.1.3 (amdgpu-install --usecase=graphics).
In the babs.sh -i step I selected the gfx1010 target and used no HSA_OVERRIDE_GFX_VERSION. After a few tries and executing sudo apt install libstdc++-12-dev libgfortran-12-dev gfortran-12, the whole project compiled in about 16 hours (it probably took so long because of the 16 GB of RAM). The babs.sh -b command reports success, and rocminfo outputs the following:

ROCk module version 6.7.0 is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
Runtime Ext Version:     1.4
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
  Uuid:                    CPU-XX                             
  Marketing Name:          Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3600                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            12                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32690056(0x1f2cf88) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32690056(0x1f2cf88) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32690056(0x1f2cf88) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1010                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 5700                 
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      4096(0x1000) KB                    
  Chip ID:                 29471(0x731f)                      
  ASIC Revision:           2(0x2)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1750                               
  BDFID:                   1792                               
  Internal Node ID:        1                                  
  Compute Unit:            36                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    1280(0x500)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 149                                
  SDMA engine uCode::      35                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1010:xnack-  
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

but the pytorch example exits almost immediately:

./run_pytorch_gpu_simple_test.sh
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1010:xnack-
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1010:xnack-
tensor([-0.8387], device='cuda:0')

The other examples mentioned in the README.md seem to work fine and don't crash, although I don't know exactly what output to expect.
I have tried the releases/rocm_sdk_builder_611 and releases/rocm_sdk_builder_612 branches without any luck so far.
Unfortunately I have no idea whether this is caused by a driver problem, a configuration problem or something else.
The README.md states that the RX 5700 has been tested, but there is no mention of a modified build/install procedure or a specific branch to use. I would appreciate any information on what could be causing this (I think maybe aotriton, but I know very little about ROCm).

cmake fails for AMDMIGraphX: ROCMTest not found

I'm building for gfx1031 on Linux Mint (Ubuntu 22.04 LTS) and it now fails with AMDMIGraphX during the cmake configure:

CMake Error at CMakeLists.txt:311 (include):
  include could not find requested file:

    ROCMTest

and at the end

CMake Error at CMakeLists.txt:319 (rocm_enable_test_package):
  Unknown CMake command "rocm_enable_test_package".

I tried copying the ROCMTest.cmake I found in rocm-cmake, but then I get this error:

CMake Error at cmake/ROCMTest.cmake:25 (rocm_define_property):
  Unknown CMake command "rocm_define_property".
Call Stack (most recent call first):
  CMakeLists.txt:311 (include)

Any idea how to solve that?

rocm-smi builds librocm_smi64.so.2.8 instead of librocm_smi64.so.7.0 breaking rocm-smi

It seems that the upstream public rocm_smi_lib GitHub repo is missing the new
rsmi_pkg_ver-7.0.0 tag that the build system uses to find the soname version numbers for the build.

When the tag is missing, the build falls back to version 2.8.0.

I created the upstream bug ROCm/rocm_smi_lib#178,

but until it is fixed, rocm_sdk_builder will need to patch rocm_smi_lib to handle the issue in some other way.

This should help fix rocm-smi, which fails to find the library.

f6e807687e breaks the build

Something breaks in the build when f6e8076 is applied. A clean build failed while building SuiteSparse with the following error:

/opt/rocm_sdk_611/bin/clang++ -O3 -DNDEBUG -Wno-extra-semi-stmt -Wno-extra-semi-stmt -L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hsa/lib -L/opt/rocm_sdk_611/rocblas/lib -L/opt/rocm_sdk_611/hcc/lib CMakeFiles/mongoose_exe.dir/Executable/mongoose.cpp.o -o suitesparse_mongoose -Wl,-rpath,/home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/builddir/023_04_SuiteSparse/Mongoose:/home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/builddir/023_04_SuiteSparse/SuiteSparse_config: libsuitesparse_mongoose.so.3 ../SuiteSparse_config/libsuitesparseconfig.so.7.7.0
/usr/bin/ld: ../SuiteSparse_config/libsuitesparseconfig.so.7.7.0: undefined reference to `omp_get_wtime@VERSION'

When that patch is reverted, the build works OK. The problem is that the patch is huge and touches multiple files, so it is not easy to track down what breaks.

LLVM default triple

This is a small thing, but:

BINFO_APP_CMAKE_CFG="${BINFO_APP_CMAKE_CFG} -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-mageia-linux"

Explicitly referencing mageia is obviously fine for the owner of this repo :) and it seems to cause no difficulties on other distros, since the relevant part is x86_64. Still, wouldn't it make more sense to use x86_64-unknown-linux-gnu here? At the very least, the explicit reference to mageia looks weird in build commands.
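For concreteness, the suggested change would look roughly like this. It is a sketch of the proposal only, and assumes nothing else in the build depends on the mageia triple:

```shell
# Proposed variant of the line quoted above, using the generic vendor field
# instead of a distro-specific one.
BINFO_APP_CMAKE_CFG="${BINFO_APP_CMAKE_CFG} -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-linux-gnu"
```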
