Comments (5)
We've pinned this down to the libfabric dependency we include with OpenMPI: when libfabric is removed as a dependency, the problem no longer occurs.
This requires rebuilding OpenMPI, so that's painful on a production system.
As a workaround, you can instruct OpenMPI not to use libfabric by passing the following options to mpirun:
mpirun -mca pml ucx -mca btl '^uct,ofi' -mca mtl '^ofi'
Or equivalently, you can set the following environment variables:
export OMPI_MCA_btl='^uct,ofi'
export OMPI_MCA_pml='ucx'
export OMPI_MCA_mtl='^ofi'
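If you'd rather keep this scoped to a single job script, a minimal sketch could look like the following. The function name `disable_libfabric` is only an illustrative assumption, not something OpenMPI provides; the three variables are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of OpenMPI):
# exports the MCA settings from the workaround above.
disable_libfabric() {
    export OMPI_MCA_pml='ucx'        # use the UCX PML
    export OMPI_MCA_btl='^uct,ofi'   # exclude the uct and ofi BTLs
    export OMPI_MCA_mtl='^ofi'       # exclude the ofi MTL
}

disable_libfabric

# Show the OMPI_MCA_* settings that child processes (e.g. mpirun) will inherit.
env | grep '^OMPI_MCA_' | sort
```

Any mpirun launched later in the same script inherits these settings, so the command-line options are then not needed.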
from easybuild-easyconfigs.
SURF (@casparvl) also saw a very similar issue; they worked around it by setting $FI_PROVIDER to verbs
...
The error there was "Invalid argument" though, which is more similar to what was reported in openucx/ucx#9468
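For reference, a rough sketch of that kind of workaround, assuming a POSIX shell and that the verbs provider is actually available on the system (the `fi_info` check is optional and only runs if the libfabric utility is installed):

```shell
#!/bin/sh
# Restrict libfabric to the verbs provider for this shell and its children.
export FI_PROVIDER=verbs
echo "FI_PROVIDER=${FI_PROVIDER}"

# Optional sanity check: list verbs provider info if fi_info is on the PATH.
if command -v fi_info >/dev/null 2>&1; then
    fi_info -p verbs || echo "verbs provider not reported by fi_info"
fi
```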
I've not seen this, but we added the following in December 2021 because of another issue we'd seen:
# avoid libfabric warning "unknown link width 0x10"
# see https://github.com/ComputeCanada/software-stack-config/pull/19
setenv OMPI_MCA_mtl "^ofi"
setenv OMPI_MCA_btl "^openib,ofi"
Yep, we regularly get these as well:
avoid libfabric warning "unknown link width 0x10"
We also solved it by setting those environment variables (I believe; I would need to check exactly what we set).
More detail on what I hit here btw