
oneccl's Introduction

oneAPI Collective Communications Library (oneCCL)

Installation   |   Usage   |   Release Notes   |   Documentation   |   How to Contribute   |   License

oneAPI Collective Communications Library (oneCCL) provides an efficient implementation of communication patterns used in deep learning.

oneCCL is integrated into:

oneCCL is part of oneAPI.

Table of Contents

Prerequisites

  • Ubuntu* 18
  • GNU*: C, C++ 4.8.5 or higher.

Refer to System Requirements for more details.

SYCL support

Intel(R) oneAPI DPC++/C++ Compiler with Level Zero v1.0 support.

To install Level Zero, refer to the instructions in Intel(R) Graphics Compute Runtime repository or to the installation guide for oneAPI users.

BF16 support

  • AVX512F-based implementation requires GCC 4.9 or higher.
  • AVX512_BF16-based implementation requires GCC 10.0 or higher and GNU binutils 2.33 or higher.

Installation

General installation scenario:

cd oneccl
mkdir build
cd build
cmake ..
make -j install

If you need a clean build, create a new build directory and invoke cmake within it.

You can also do the following during installation:
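For example, configuration options that appear elsewhere in this document can be passed to cmake at this step (a sketch; whether each option is available depends on your oneCCL version):

```shell
# Choose compilers explicitly (as in the GCC build scenario below):
cmake .. -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++

# Or configure a SYCL-enabled build with the DPC++ compiler:
cmake .. -DCMAKE_CXX_COMPILER=dpcpp -DCOMPUTE_BACKEND=dpcpp

# A custom install location can be set with the standard CMake variable:
cmake .. -DCMAKE_INSTALL_PREFIX=/opt/oneccl
make -j install
```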

Usage

Launching Example Application

Use the command:

$ source <install_dir>/env/setvars.sh
$ mpirun -n 2 <install_dir>/examples/benchmark/benchmark

Using external MPI

The ccl-bundled-mpi flag in vars.sh can take the values "yes" or "no" to control whether the bundled Intel MPI should be used. The current default is "yes", which means that oneCCL temporarily overrides the MPI implementation in use.

To suppress this behavior and use a user-supplied or system-default MPI, use the following command instead of sourcing setvars.sh:

$ source <install_dir>/env/vars.sh --ccl-bundled-mpi=no

The MPI implementation will then not be overridden. Note that in this case the user must ensure that the system finds all required MPI-related binaries.

Setting workers affinity

There are two ways to set worker threads (workers) affinity: automatically and explicitly.

Automatic setup

  1. Set the CCL_WORKER_COUNT environment variable with the desired number of workers per process.
  2. Set the CCL_WORKER_AFFINITY environment variable with the value auto.

Example:

export CCL_WORKER_COUNT=4
export CCL_WORKER_AFFINITY=auto

With the variables above, oneCCL creates four workers per process; how they are pinned depends on the process launcher.

If the application was launched using the mpirun provided by the oneCCL distribution package, the workers are automatically pinned to the last four cores available to the launched process. The exact IDs of the CPU cores can be controlled by mpirun parameters.

Otherwise, the workers are automatically pinned to the last four cores available on the node.
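Putting the automatic setup together with the benchmark launch from the Usage section (paths are placeholders for your install directory):

```shell
source <install_dir>/env/setvars.sh
# Four workers per process, pinned automatically depending on the launcher.
export CCL_WORKER_COUNT=4
export CCL_WORKER_AFFINITY=auto
mpirun -n 2 <install_dir>/examples/benchmark/benchmark
```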


Explicit setup

  1. Set the CCL_WORKER_COUNT environment variable with the desired number of workers per process.
  2. Set the CCL_WORKER_AFFINITY environment variable with the IDs of cores to pin local workers.

Example:

export CCL_WORKER_COUNT=4
export CCL_WORKER_AFFINITY=3,4,5,6

With the variables above, oneCCL will create four workers per process and pin them to the cores with the IDs of 3, 4, 5, and 6 respectively.

Using oneCCL package from CMake

oneCCLConfig.cmake and oneCCLConfigVersion.cmake are included in the oneCCL distribution.

With these files, you can integrate oneCCL into a user project with the find_package command. Successful invocation of find_package(oneCCL <options>) creates imported target oneCCL that can be passed to the target_link_libraries command.

For example:

project(Foo)
add_executable(foo foo.cpp)

# Search for oneCCL
find_package(oneCCL REQUIRED)

# Connect oneCCL to foo
target_link_libraries(foo oneCCL)
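To configure the project above, CMake must be able to locate the installed package. A common approach (a sketch; the exact subdirectory containing oneCCLConfig.cmake may differ in your install layout) is to point CMAKE_PREFIX_PATH or oneCCL_DIR at the oneCCL installation:

```shell
# Let CMake search the whole install prefix...
cmake -DCMAKE_PREFIX_PATH=<install_dir> -B build -S .

# ...or point directly at the directory containing oneCCLConfig.cmake.
cmake -DoneCCL_DIR=<install_dir>/lib/cmake/oneCCL -B build -S .

cmake --build build
```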

oneCCLConfig files generation

To generate oneCCLConfig files for oneCCL package, use the provided cmake/scripts/config_generation.cmake file:

cmake [-DOUTPUT_DIR=<output_dir>] -P cmake/script/config_generation.cmake

Additional Resources

Blog Posts

Workshop Materials

  • oneAPI, oneCCL and OFI: Path to Heterogeneous Architecture Programming with Scalable Collective Communications: recording and slides

Contribute

See CONTRIBUTING for more information.

License

Distributed under the Apache License 2.0 license. See LICENSE for more information.

Security Policy

See SECURITY for more information.


oneccl's Issues

Question - do I also need set I_MPI_ASYNC_PROGRESS_* when already set CCL_WORKER_* ?

Previously, in Intel MLSL, MLSL_NUM_SERVERS and MLSL_SERVER_AFFINITY had the same functionality as I_MPI_ASYNC_PROGRESS_* (please correct me if my understanding is wrong).
Now, for oneCCL, I didn't find any relationship between CCL_WORKER_* and I_MPI_ASYNC_PROGRESS_* in the source code. I just want to know whether I still need to consider setting I_MPI_ASYNC_PROGRESS_* to enable async progress for Intel MPI 2019 while already setting CCL_WORKER_*, in case there is any scaling issue. Thanks.


AllgatherV crashes when the buffers overlap

Hey,

I'm using oneCCL to implement multinode communication in the marian machine translation toolkit. I am having a problem with a call to ccl::allgatherv. As far as I can tell from the documentation, there is no restriction on buffer overlapping; however, if I don't use a temporary buffer into which I copy the send buffer, as shown here: https://github.com/XapaJIaMnu/marian-dev/blob/d33cea1d649186242c244f6a11d599be68f3499c/src/training/communicator_oneccl.h#L226

I get a crash like this:

2021:03:18-18:25:11:(64995) ERROR: |ERROR| host_event.cpp:33  ~host_event_impl not completed event is destroyed
backtrace() returned 11 addresses
./src/3rd_party/oneCCL/src/libccl.so(+0x1ea707) [0x7f0ecfcf6707]
./src/3rd_party/oneCCL/src/libccl.so(+0x1eacaf) [0x7f0ecfcf6caf]
./src/3rd_party/oneCCL/src/libccl.so(+0x1ead96) [0x7f0ecfcf6d96]
./marian(_ZNK6marian18OneCCLCommunicator15allGatherParamsEv+0xb34) [0x5597b1037c24]
./marian(_ZN6marian14SyncGraphGroup6updateESt6vectorISt10shared_ptrINS_4data5BatchEESaIS5_EEm+0x10fc) [0x5597b0fc0cdc]
./marian(_ZN6marian14SyncGraphGroup6updateESt10shared_ptrINS_4data5BatchEE+0x37a) [0x5597b0fc124a]
./marian(_ZN6marian5TrainINS_14SyncGraphGroupEE3runEv+0x8fc) [0x5597b0cc325c]
./marian(_Z11mainTraineriPPc+0xc9) [0x5597b0bf6729]
./marian(main+0x35) [0x5597b0bd4fa5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f0ecc6bab97]
./marian(_start+0x2a) [0x5597b0bf28ca]
2021:03:18-18:25:11:(64996) ERROR: |ERROR| host_event.cpp:33  ~host_event_impl not completed event is destroyed
backtrace() returned 11 addresses
./src/3rd_party/oneCCL/src/libccl.so(+0x1ea707) [0x7fe469b68707]
./src/3rd_party/oneCCL/src/libccl.so(+0x1eacaf) [0x7fe469b68caf]
./src/3rd_party/oneCCL/src/libccl.so(+0x1ead96) [0x7fe469b68d96]
./marian(_ZNK6marian18OneCCLCommunicator15allGatherParamsEv+0xb34) [0x55debf7dac24]
./marian(_ZN6marian14SyncGraphGroup6updateESt6vectorISt10shared_ptrINS_4data5BatchEESaIS5_EEm+0x10fc) [0x55debf763cdc]
./marian(_ZN6marian14SyncGraphGroup6updateESt10shared_ptrINS_4data5BatchEE+0x37a) [0x55debf76424a]
./marian(_ZN6marian5TrainINS_14SyncGraphGroupEE3runEv+0x8fc) [0x55debf46625c]
./marian(_Z11mainTraineriPPc+0xc9) [0x55debf399729]
./marian(main+0x35) [0x55debf377fa5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7fe46652cb97]
./marian(_start+0x2a) [0x55debf3958ca]
2021:03:18-18:25:11:(65005) ERROR: |ERROR| worker.cpp:288  ccl_worker_func worker 0 caught internal exception: oneCCL: allgatherv_entry.hpp:start:76: EXCEPTION: ALLGATHERV entry failed. atl_status: FAILURE
backtrace() returned 4 addresses
./src/3rd_party/oneCCL/src/libccl.so(+0x1ea707) [0x7fe469b68707]
./src/3rd_party/oneCCL/src/libccl.so(+0x3cef7) [0x7fe4699baef7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fe4695626db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fe46662ca3f]
[2021-03-18 18:25:11] Error: Unhandled exception of type 'N3ccl2v19exceptionE': oneCCL: allgatherv_entry.hpp:start:76: EXCEPTION: ALLGATHERV entry failed. atl_status: FAILURE
[2021-03-18 18:25:11] Error: Aborted from void unhandledException() in /home/nbogoych/marian-dev-master/src/common/logging.cpp:113

[CALL STACK]
[0x55debf30537f]                                                       + 0x1cf37f
[0x7fe466f44ae6]                                                       + 0x92ae6
[0x7fe466f44b21]                                                       + 0x92b21
[0x7fe4699badd3]                                                       + 0x3cdd3
[0x7fe4695626db]                                                       + 0x76db
[0x7fe46662ca3f]    clone                                              + 0x3f

Otherwise, if I do use the workaround, I get correct behaviour; however, every call to allgatherv is accompanied by the following stderr output:

2021:03:18-18:22:38:(64194) ERROR: |ERROR| host_event.cpp:33  ~host_event_impl not completed event is destroyed
backtrace() returned 11 addresses
./src/3rd_party/oneCCL/src/libccl.so(+0x1ea707) [0x7f6991977707]
./src/3rd_party/oneCCL/src/libccl.so(+0x1eacaf) [0x7f6991977caf]
./src/3rd_party/oneCCL/src/libccl.so(+0x1ead96) [0x7f6991977d96]
./marian(_ZNK6marian18OneCCLCommunicator15allGatherParamsEv+0xb34) [0x560cdcb0dc24]
./marian(_ZN6marian14SyncGraphGroup6updateESt6vectorISt10shared_ptrINS_4data5BatchEESaIS5_EEm+0x10fc) [0x560cdca96cdc]
./marian(_ZN6marian14SyncGraphGroup6updateESt10shared_ptrINS_4data5BatchEE+0x37a) [0x560cdca9724a]
./marian(_ZN6marian5TrainINS_14SyncGraphGroupEE3runEv+0x8fc) [0x560cdc79925c]
./marian(_Z11mainTraineriPPc+0xc9) [0x560cdc6cc729]
./marian(main+0x35) [0x560cdc6aafa5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f698e33bb97]
./marian(_start+0x2a) [0x560cdc6c88ca]

Any suggestions?

Cheers,

Nick

Allreduce cpu example fails with CCL_WORKER_COUNT > 1

I started playing with allreduce example from the main repository https://github.com/oneapi-src/oneCCL/blob/master/examples/cpu/cpu_allreduce_test.cpp .

I modified it slightly by increasing the buffer size 100 times:

diff --git a/examples/cpu/cpu_allreduce_test.cpp b/examples/cpu/cpu_allreduce_test.cpp
index 6e9ac4d..5dfe2d9 100644
--- a/examples/cpu/cpu_allreduce_test.cpp
+++ b/examples/cpu/cpu_allreduce_test.cpp
@@ -22,7 +22,7 @@
 using namespace std;

 int main() {
-    const size_t count = 4096;
+    const size_t count = 4096*100;

     size_t i = 0;

When I run it with the CCL_WORKER_COUNT environment variable with a value > 1 it fails with the following errors:

piotrc@machine:~/ws/oneCCL/build$ CCL_WORKER_COUNT=2 mpirun -np 2 examples/cpu/cpu_allreduce_test
[1705415958.879795729] machine:rank1.cpu_allreduce_test: Reading from remote process' memory failed. Disabling CMA support
[1705415958.879801821] machine:rank1.cpu_allreduce_test: Reading from remote process' memory failed. Disabling CMA support
machine:rank1: Assertion failure at psm3/ptl_am/ptl.c:196: nbytes == req->req_data.recv_msglen
machine:rank1: Assertion failure at psm3/ptl_am/ptl.c:196: nbytes == req->req_data.recv_msglen

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 559315 RUNNING AT gbnwp-pod023-1
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 559316 RUNNING AT gbnwp-pod023-1
=   KILLED BY SIGNAL: 6 (Aborted)
===================================================================================

With CCL_WORKER_COUNT=1 it works perfectly.

piotrc@machine:~/ws/oneCCL/build$ mpirun -np 2 examples/cpu/cpu_allreduce_test
PASSED

What am I doing wrong? Why does it fail? Should I use specific flags when compiling, set a specific environment variable, or pass a specific option to mpirun? It is worth mentioning that with a smaller buffer size (for example, 4096 * 10) everything works fine even with CCL_WORKER_COUNT > 1.

Attached CCL_LOG_LEVEL=info logs.txt
Attached CCL_LOG_LEVEL=debug logs_debug.txt

Will it support windows?

If not, is there any possible alternative that can be used on the Windows platform? At the moment I can't run distributed training on Windows.

C++ support for unsigned char

C++ API doesn't support unsigned char or std::byte which are common types for object representation in C++ (link):

For an object of type T, object representation is the sequence of sizeof(T) objects of type unsigned char (or, equivalently, std::byte) beginning at the same address as the T object.

CCL_TYPE_TRAITS(ccl_dtype_char,   char,      sizeof(char))
CCL_TYPE_TRAITS(ccl_dtype_int,    int,       sizeof(int))
CCL_TYPE_TRAITS(ccl_dtype_int64,  int64_t,   sizeof(int64_t))
CCL_TYPE_TRAITS(ccl_dtype_uint64, uint64_t,  sizeof(uint64_t))
CCL_TYPE_TRAITS(ccl_dtype_float,  float,     sizeof(float))
CCL_TYPE_TRAITS(ccl_dtype_double, double,    sizeof(double))

Issue on page /introduction/sample.html

Getting error while compiling the given oneCCL code.

sample.cpp:52:9: error: use of class template 'host_accessor' requires template arguments
host_accessor send_buf_acc(send_buf, write_only);
^

Full Trace:
DUT683-CYP-Mell:/home/gta/ksatya/tests # clang++ -I${CCL_ROOT}/examples/include/ -I/opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/ -I${CCL_ROOT}/include -L${CCL_ROOT}/lib/ -lsycl -lccl -o sample sample.cpp
sample.cpp:52:9: error: use of class template 'host_accessor' requires template arguments
host_accessor send_buf_acc(send_buf, sycl::write_only);
^
/opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/CL/sycl/accessor.hpp:2134:7: note: template is declared here
class host_accessor
^
sample.cpp:53:9: error: use of class template 'host_accessor' requires template arguments
host_accessor recv_buf_acc(recv_buf, sycl::write_only);
^
/opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/CL/sycl/accessor.hpp:2134:7: note: template is declared here
class host_accessor
^
sample.cpp:62:9: error: use of class template 'accessor' requires template arguments
accessor send_buf_acc(send_buf, h, write_only);
^
/opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/CL/sycl/accessor.hpp:784:7: note: template is declared here
class accessor :
^
sample.cpp:75:9: error: use of class template 'accessor' requires template arguments
accessor recv_buf_acc(recv_buf, h, write_only);
^
/opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/CL/sycl/accessor.hpp:784:7: note: template is declared here
class accessor :
^
sample.cpp:88:9: error: use of class template 'host_accessor' requires template arguments
host_accessor recv_buf_acc(recv_buf, read_only);
^
/opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/CL/sycl/accessor.hpp:2134:7: note: template is declared here
class host_accessor
^
In file included from sample.cpp:1:
In file included from /opt/intel/oneapi/ccl/2021.4.0/examples/include/sycl_base.hpp:17:
In file included from /opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/CL/sycl.hpp:15:
In file included from /opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/CL/sycl/backend.hpp:25:
/opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/CL/sycl/queue.hpp:226:12: error: no matching member function for call to 'submit_impl'
return submit_impl(CGF, CodeLoc);
^~~~~~~~~~~
sample.cpp:61:7: note: in instantiation of function template specialization 'sycl::queue::submit<(lambda at sample.cpp:61:14)>' requested here
q.submit([&](auto &h) {
^
/opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/CL/sycl/queue.hpp:948:9: note: candidate function not viable: no known conversion from '(lambda at sample.cpp:61:14)' to 'std::function<void (handler &)>' for 1st argument
event submit_impl(std::function<void(handler &)> CGH,
^
/opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/CL/sycl/queue.hpp:951:9: note: candidate function not viable: requires 3 arguments, but 2 were provided
event submit_impl(std::function<void(handler &)> CGH, queue secondQueue,
^
/opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/CL/sycl/queue.hpp:226:12: error: no matching member function for call to 'submit_impl'
return submit_impl(CGF, CodeLoc);
^~~~~~~~~~~
sample.cpp:74:7: note: in instantiation of function template specialization 'sycl::queue::submit<(lambda at sample.cpp:74:14)>' requested here
q.submit([&](auto &h) {
^
/opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/CL/sycl/queue.hpp:948:9: note: candidate function not viable: no known conversion from '(lambda at sample.cpp:74:14)' to 'std::function<void (handler &)>' for 1st argument
event submit_impl(std::function<void(handler &)> CGH,
^
/opt/intel/oneapi/compiler/2021.4.0/linux/include/sycl/CL/sycl/queue.hpp:951:9: note: candidate function not viable: requires 3 arguments, but 2 were provided
event submit_impl(std::function<void(handler &)> CGH, queue secondQueue,
^
7 errors generated.

Binaries location in 2021.12.0

Hi,
In version 2021.12.0, when installing with a prefix, the libmpi binaries are located in a different location than with 2021.11.2.
I am not sure whether this is intended or there is some concatenation issue with the paths.

  • 2021.11.2
$ find . -iname "*libmpi*"
...
./common/lib/libmpifort.so.12.0
./common/lib/libmpi.so.12
./common/lib/libmpifort.so.12.0.0
./common/lib/libmpifort.so
./common/lib/libmpi.so.12.0
./common/lib/libmpi.so.12.0.0
./common/lib/libmpifort.so.12
./common/lib/libmpicxx.so.12.0
./common/lib/libmpicxx.so
./common/lib/libmpicxx.so.12
./common/lib/libmpi.so
./common/lib/libmpicxx.so.12.0.0
...
  • 2021.12.0
...
$ find . -iname "*libmpi*"
./common/opt/mpi/lib/libmpifort.so.12.0
./common/opt/mpi/lib/libmpi.so.12
./common/opt/mpi/lib/libmpifort.so.12.0.0
./common/opt/mpi/lib/libmpifort.so
./common/opt/mpi/lib/libmpi.so.12.0
./common/opt/mpi/lib/libmpi.so.12.0.0
./common/opt/mpi/lib/libmpifort.so.12
./common/opt/mpi/lib/libmpicxx.so.12.0
./common/opt/mpi/lib/libmpicxx.so
./common/opt/mpi/lib/libmpicxx.so.12
./common/opt/mpi/lib/libmpi.so
./common/opt/mpi/lib/libmpicxx.so.12.0.0
...

torch Distributed Data Parallel with ccl backend failed for torch 2.1.0+cpu and oneccl-bind-pt 2.1.0+cpu while working on torch 2.0.1+cpu and oneccl-bind-pt 2.0.0+cpu

(screenshot of the failure omitted)

I use the transformers Trainer to finetune an LLM with Distributed Data Parallel and the ccl backend. With torch 2.1.0+cpu and oneccl-bind-pt 2.1.0+cpu it fails as described above, but with torch 2.0.1+cpu and oneccl-bind-pt 2.0.0+cpu it works well.

The script I used is https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/examples/finetuning/instruction/finetune_clm.py, command is:

mpirun  --host 172.17.0.2,172.17.0.3 -n 2 -ppn 1 -genv OMP_NUM_THREADS=48 python3 finetune_clm.py     --model_name_or_path mosaicml/mpt-7b-chat     --train_file alpaca_data.json  --bf16 False     --output_dir ./mpt_peft_finetuned_model     --num_train_epochs 1     --max_steps 3     --per_device_train_batch_size 4     --per_device_eval_batch_size 4     --gradient_accumulation_steps 1     --evaluation_strategy "no"     --save_strategy "steps"   --save_steps 2000     --save_total_limit 1     --learning_rate 1e-4      --logging_steps 1     --peft lora     --group_by_length True     --dataset_concatenation     --do_train     --trust_remote_code True     --tokenizer_name "EleutherAI/gpt-neox-20b"     --use_fast_tokenizer True     --max_eval_samples 64     --no_cuda --ddp_backend ccl

Can you help investigate this issue?

make -j install failing with compiler options gcc, g++

We are using g++ compilers in the XGBoost build, so to unify across builds I tried to pass the compiler options as follows:
Step1:
build1]$ cmake .. -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++

Step2:
build1]$ make -j install

then it is failing with the following errors:

[ 54%] Built target ccl_atl_ofi
In member function ‘size_t ccl_sched_bin::erase(size_t, size_t&)’:
cc1plus: error: ‘void* __builtin_memset(void*, int, long unsigned int)’: specified size 18446744073709551608 exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]
In member function ‘size_t ccl_sched_queue::erase(ccl_sched_bin*, size_t)’:
cc1plus: error: ‘void* __builtin_memset(void*, int, long unsigned int)’: specified size 18446744073709551608 exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]
cc1plus: all warnings being treated as errors

below are the versions info:
[xgboost@vsr243 build]$ gcc --version
gcc (GCC) 7.3.0

[xgboost@vsr243 build]$ g++ --version
g++ (GCC) 7.3.0

OneCCL init should take arguments to take configuration parameters

Currently, to make oneCCL "resizable", we have two options for reading the parameters: env or k8s.
For deployments that don't use k8s, asking them to use k8s mode for parameter reading is not a good idea.
The other option is env. This is not feasible in Spark-like deployments, since we run different tasks on the same machine and they may belong to different applications.

I think a good way is to allow ccl#init to take args, so that applications can pass these parameters at the process level.

XGBoost rabit#init also take arguments and tracker address passed in as arg parameter.
https://github.com/dmlc/rabit/blob/0d6a8532124c7ae0b6323b005156847e0d6dee0f/include/rabit/rabit.h#L94


Add support for including oneCCL in a CMake mono build

It would be nice to be able to include oneCCL inside another build directly.

add_subdirectory(oneCCL)

This entails that all CMake configuration arguments should be prefixed by something like ONECCL_ to avoid name conflicts.
Also all targets should be prefixed. E.g. oneccl::.

Another thing is to allow all dependencies, such as googletest, to be excludable from the build and provided externally. They may be defined somewhere else in the super build, e.g. ONECCL_GOOGLE_TEST_EXTERNAL=ON.

Make it possible to start the KVS Store service as an independent service instead of keeping it with one of the ranks

In Spark like deployments, Driver is a single point of failure but not workers.
Keeping KVSStore with one of the worker makes one of the worker process as single point of failures.

If KVS can be started as stand alone process, the integration into spark like deployments will be easy. Driver can start this KVS Store and pass the KVSStore IP_Port to all workers.
Rabit has the similar architecture, tracker( like KVStore here) starts with Driver. All workers connects to tracker.

Errors when building with DPCPP backend

When building with DPCPP backend, the following errors arise:

cmake ../ -GNinja -DCMAKE_CXX_COMPILER=dpcpp -DCOMPUTE_BACKEND=dpcpp -DBUILD_FT=ON
ninja


[3/252] Building CXX object src/CMakeFiles/ccl-objects.dir/atl/atl_base_comm.cpp.o
FAILED: src/CMakeFiles/ccl-objects.dir/atl/atl_base_comm.cpp.o 
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/dpcpp -DCCL_AVX_COMPILER -DCCL_AVX_TARGET_ATTRIBUTES -DCCL_BF16_AVX512BF_COMPILER -DCCL_BF16_COMPILER -DCCL_BF16_GPU_TRUNCATE -DCCL_BF16_TARGET_ATTRIBUTES -DCCL_CXX_COMPILER="\"Clang 14.0.0\"" -DCCL_C_COMPILER="\"GNU 11.2.0\"" -DCCL_FP16_COMPILER -DCCL_FP16_TARGET_ATTRIBUTES -Iinclude -I../../include -I../../src -I../../src/atl -I../../deps/ofi/include -I../../deps/hwloc/include -I../../deps/mpi/include -Wall -Wextra -Wno-unused-parameter -Wno-implicit-fallthrough -Werror -D_GNU_SOURCE -fvisibility=internal -DCCL_ENABLE_SYCL_INTEROP_EVENT=1  -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fstack-protector -DCCL_ENABLE_MPI -pthread -O3 -DNDEBUG  -O3 -fPIC -std=gnu++17 -MD -MT src/CMakeFiles/ccl-objects.dir/atl/atl_base_comm.cpp.o -MF src/CMakeFiles/ccl-objects.dir/atl/atl_base_comm.cpp.o.d -o src/CMakeFiles/ccl-objects.dir/atl/atl_base_comm.cpp.o -c ../../src/atl/atl_base_comm.cpp
In file included from ../../src/atl/atl_base_comm.cpp:17:
In file included from ../../src/atl/mpi/atl_mpi.hpp:20:
In file included from ../../src/atl/mpi/atl_mpi_global_data.hpp:23:
In file included from ../../src/comp/bf16/bf16_intrisics.hpp:23:
In file included from ../../src/common/global/global.hpp:19:
In file included from ../../src/common/env/env.hpp:25:
In file included from ../../src/coll/coll.hpp:20:
In file included from ../../src/common/comm/comm.hpp:43:
In file included from ../../src/unordered_coll/unordered_coll.hpp:19:
../../src/sched/master_sched.hpp:61:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer& get_kernel_timer() {
    ~~~~~^
../../src/sched/master_sched.hpp:71:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer kernel_timer;
    ~~~~~^
2 errors generated.
[4/252] Building CXX object src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi_comm.cpp.o
FAILED: src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi_comm.cpp.o 
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/dpcpp -DCCL_AVX_COMPILER -DCCL_AVX_TARGET_ATTRIBUTES -DCCL_BF16_AVX512BF_COMPILER -DCCL_BF16_COMPILER -DCCL_BF16_GPU_TRUNCATE -DCCL_BF16_TARGET_ATTRIBUTES -DCCL_CXX_COMPILER="\"Clang 14.0.0\"" -DCCL_C_COMPILER="\"GNU 11.2.0\"" -DCCL_FP16_COMPILER -DCCL_FP16_TARGET_ATTRIBUTES -Iinclude -I../../include -I../../src -I../../src/atl -I../../deps/ofi/include -I../../deps/hwloc/include -I../../deps/mpi/include -Wall -Wextra -Wno-unused-parameter -Wno-implicit-fallthrough -Werror -D_GNU_SOURCE -fvisibility=internal -DCCL_ENABLE_SYCL_INTEROP_EVENT=1  -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fstack-protector -DCCL_ENABLE_MPI -pthread -O3 -DNDEBUG  -O3 -fPIC -std=gnu++17 -MD -MT src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi_comm.cpp.o -MF src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi_comm.cpp.o.d -o src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi_comm.cpp.o -c ../../src/atl/ofi/atl_ofi_comm.cpp
In file included from ../../src/atl/ofi/atl_ofi_comm.cpp:16:
In file included from ../../src/atl/ofi/atl_ofi_comm.hpp:19:
In file included from ../../src/atl/ofi/atl_ofi.hpp:22:
In file included from ../../src/atl/ofi/atl_ofi_helper.hpp:38:
In file included from ../../src/common/global/global.hpp:19:
In file included from ../../src/common/env/env.hpp:25:
In file included from ../../src/coll/coll.hpp:20:
In file included from ../../src/common/comm/comm.hpp:43:
In file included from ../../src/unordered_coll/unordered_coll.hpp:19:
../../src/sched/master_sched.hpp:61:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer& get_kernel_timer() {
    ~~~~~^
../../src/sched/master_sched.hpp:71:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer kernel_timer;
    ~~~~~^
2 errors generated.
[5/252] Building CXX object src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi_global_data.cpp.o
FAILED: src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi_global_data.cpp.o 
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/dpcpp -DCCL_AVX_COMPILER -DCCL_AVX_TARGET_ATTRIBUTES -DCCL_BF16_AVX512BF_COMPILER -DCCL_BF16_COMPILER -DCCL_BF16_GPU_TRUNCATE -DCCL_BF16_TARGET_ATTRIBUTES -DCCL_CXX_COMPILER="\"Clang 14.0.0\"" -DCCL_C_COMPILER="\"GNU 11.2.0\"" -DCCL_FP16_COMPILER -DCCL_FP16_TARGET_ATTRIBUTES -Iinclude -I../../include -I../../src -I../../src/atl -I../../deps/ofi/include -I../../deps/hwloc/include -I../../deps/mpi/include -Wall -Wextra -Wno-unused-parameter -Wno-implicit-fallthrough -Werror -D_GNU_SOURCE -fvisibility=internal -DCCL_ENABLE_SYCL_INTEROP_EVENT=1  -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fstack-protector -DCCL_ENABLE_MPI -pthread -O3 -DNDEBUG  -O3 -fPIC -std=gnu++17 -MD -MT src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi_global_data.cpp.o -MF src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi_global_data.cpp.o.d -o src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi_global_data.cpp.o -c ../../src/atl/mpi/atl_mpi_global_data.cpp
In file included from ../../src/atl/mpi/atl_mpi_global_data.cpp:18:
In file included from ../../src/atl/mpi/atl_mpi.hpp:20:
In file included from ../../src/atl/mpi/atl_mpi_global_data.hpp:23:
In file included from ../../src/comp/bf16/bf16_intrisics.hpp:23:
In file included from ../../src/common/global/global.hpp:19:
In file included from ../../src/common/env/env.hpp:25:
In file included from ../../src/coll/coll.hpp:20:
In file included from ../../src/common/comm/comm.hpp:43:
In file included from ../../src/unordered_coll/unordered_coll.hpp:19:
../../src/sched/master_sched.hpp:61:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer& get_kernel_timer() {
    ~~~~~^
../../src/sched/master_sched.hpp:71:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer kernel_timer;
    ~~~~~^
2 errors generated.
[6/252] Building CXX object src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi.cpp.o
FAILED: src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi.cpp.o 
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/dpcpp -DCCL_AVX_COMPILER -DCCL_AVX_TARGET_ATTRIBUTES -DCCL_BF16_AVX512BF_COMPILER -DCCL_BF16_COMPILER -DCCL_BF16_GPU_TRUNCATE -DCCL_BF16_TARGET_ATTRIBUTES -DCCL_CXX_COMPILER="\"Clang 14.0.0\"" -DCCL_C_COMPILER="\"GNU 11.2.0\"" -DCCL_FP16_COMPILER -DCCL_FP16_TARGET_ATTRIBUTES -Iinclude -I../../include -I../../src -I../../src/atl -I../../deps/ofi/include -I../../deps/hwloc/include -I../../deps/mpi/include -Wall -Wextra -Wno-unused-parameter -Wno-implicit-fallthrough -Werror -D_GNU_SOURCE -fvisibility=internal -DCCL_ENABLE_SYCL_INTEROP_EVENT=1  -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fstack-protector -DCCL_ENABLE_MPI -pthread -O3 -DNDEBUG  -O3 -fPIC -std=gnu++17 -MD -MT src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi.cpp.o -MF src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi.cpp.o.d -o src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi.cpp.o -c ../../src/atl/ofi/atl_ofi.cpp
In file included from ../../src/atl/ofi/atl_ofi.cpp:16:
In file included from ../../src/atl/ofi/atl_ofi.hpp:22:
In file included from ../../src/atl/ofi/atl_ofi_helper.hpp:38:
In file included from ../../src/common/global/global.hpp:19:
In file included from ../../src/common/env/env.hpp:25:
In file included from ../../src/coll/coll.hpp:20:
In file included from ../../src/common/comm/comm.hpp:43:
In file included from ../../src/unordered_coll/unordered_coll.hpp:19:
../../src/sched/master_sched.hpp:61:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer& get_kernel_timer() {
    ~~~~~^
../../src/sched/master_sched.hpp:71:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer kernel_timer;
    ~~~~~^
2 errors generated.
[7/252] Building CXX object src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi_comm.cpp.o
FAILED: src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi_comm.cpp.o 
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/dpcpp -DCCL_AVX_COMPILER -DCCL_AVX_TARGET_ATTRIBUTES -DCCL_BF16_AVX512BF_COMPILER -DCCL_BF16_COMPILER -DCCL_BF16_GPU_TRUNCATE -DCCL_BF16_TARGET_ATTRIBUTES -DCCL_CXX_COMPILER="\"Clang 14.0.0\"" -DCCL_C_COMPILER="\"GNU 11.2.0\"" -DCCL_FP16_COMPILER -DCCL_FP16_TARGET_ATTRIBUTES -Iinclude -I../../include -I../../src -I../../src/atl -I../../deps/ofi/include -I../../deps/hwloc/include -I../../deps/mpi/include -Wall -Wextra -Wno-unused-parameter -Wno-implicit-fallthrough -Werror -D_GNU_SOURCE -fvisibility=internal -DCCL_ENABLE_SYCL_INTEROP_EVENT=1  -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fstack-protector -DCCL_ENABLE_MPI -pthread -O3 -DNDEBUG  -O3 -fPIC -std=gnu++17 -MD -MT src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi_comm.cpp.o -MF src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi_comm.cpp.o.d -o src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi_comm.cpp.o -c ../../src/atl/mpi/atl_mpi_comm.cpp
In file included from ../../src/atl/mpi/atl_mpi_comm.cpp:18:
In file included from ../../src/atl/mpi/atl_mpi_comm.hpp:23:
In file included from ../../src/atl/mpi/atl_mpi.hpp:20:
In file included from ../../src/atl/mpi/atl_mpi_global_data.hpp:23:
In file included from ../../src/comp/bf16/bf16_intrisics.hpp:23:
In file included from ../../src/common/global/global.hpp:19:
In file included from ../../src/common/env/env.hpp:25:
In file included from ../../src/coll/coll.hpp:20:
In file included from ../../src/common/comm/comm.hpp:43:
In file included from ../../src/unordered_coll/unordered_coll.hpp:19:
../../src/sched/master_sched.hpp:61:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer& get_kernel_timer() {
    ~~~~~^
../../src/sched/master_sched.hpp:71:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer kernel_timer;
    ~~~~~^
2 errors generated.
[8/252] Building CXX object src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi.cpp.o
FAILED: src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi.cpp.o 
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/dpcpp -DCCL_AVX_COMPILER -DCCL_AVX_TARGET_ATTRIBUTES -DCCL_BF16_AVX512BF_COMPILER -DCCL_BF16_COMPILER -DCCL_BF16_GPU_TRUNCATE -DCCL_BF16_TARGET_ATTRIBUTES -DCCL_CXX_COMPILER="\"Clang 14.0.0\"" -DCCL_C_COMPILER="\"GNU 11.2.0\"" -DCCL_FP16_COMPILER -DCCL_FP16_TARGET_ATTRIBUTES -Iinclude -I../../include -I../../src -I../../src/atl -I../../deps/ofi/include -I../../deps/hwloc/include -I../../deps/mpi/include -Wall -Wextra -Wno-unused-parameter -Wno-implicit-fallthrough -Werror -D_GNU_SOURCE -fvisibility=internal -DCCL_ENABLE_SYCL_INTEROP_EVENT=1  -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fstack-protector -DCCL_ENABLE_MPI -pthread -O3 -DNDEBUG  -O3 -fPIC -std=gnu++17 -MD -MT src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi.cpp.o -MF src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi.cpp.o.d -o src/CMakeFiles/ccl-objects.dir/atl/mpi/atl_mpi.cpp.o -c ../../src/atl/mpi/atl_mpi.cpp
In file included from ../../src/atl/mpi/atl_mpi.cpp:19:
In file included from ../../src/atl/mpi/atl_mpi.hpp:20:
In file included from ../../src/atl/mpi/atl_mpi_global_data.hpp:23:
In file included from ../../src/comp/bf16/bf16_intrisics.hpp:23:
In file included from ../../src/common/global/global.hpp:19:
In file included from ../../src/common/env/env.hpp:25:
In file included from ../../src/coll/coll.hpp:20:
In file included from ../../src/common/comm/comm.hpp:43:
In file included from ../../src/unordered_coll/unordered_coll.hpp:19:
../../src/sched/master_sched.hpp:61:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer& get_kernel_timer() {
    ~~~~~^
../../src/sched/master_sched.hpp:71:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer kernel_timer;
    ~~~~~^
2 errors generated.
[9/252] Building CXX object src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi_helper.cpp.o
FAILED: src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi_helper.cpp.o 
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/dpcpp -DCCL_AVX_COMPILER -DCCL_AVX_TARGET_ATTRIBUTES -DCCL_BF16_AVX512BF_COMPILER -DCCL_BF16_COMPILER -DCCL_BF16_GPU_TRUNCATE -DCCL_BF16_TARGET_ATTRIBUTES -DCCL_CXX_COMPILER="\"Clang 14.0.0\"" -DCCL_C_COMPILER="\"GNU 11.2.0\"" -DCCL_FP16_COMPILER -DCCL_FP16_TARGET_ATTRIBUTES -Iinclude -I../../include -I../../src -I../../src/atl -I../../deps/ofi/include -I../../deps/hwloc/include -I../../deps/mpi/include -Wall -Wextra -Wno-unused-parameter -Wno-implicit-fallthrough -Werror -D_GNU_SOURCE -fvisibility=internal -DCCL_ENABLE_SYCL_INTEROP_EVENT=1  -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fstack-protector -DCCL_ENABLE_MPI -pthread -O3 -DNDEBUG  -O3 -fPIC -std=gnu++17 -MD -MT src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi_helper.cpp.o -MF src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi_helper.cpp.o.d -o src/CMakeFiles/ccl-objects.dir/atl/ofi/atl_ofi_helper.cpp.o -c ../../src/atl/ofi/atl_ofi_helper.cpp
In file included from ../../src/atl/ofi/atl_ofi_helper.cpp:16:
In file included from ../../src/atl/ofi/atl_ofi_helper.hpp:38:
In file included from ../../src/common/global/global.hpp:19:
In file included from ../../src/common/env/env.hpp:25:
In file included from ../../src/coll/coll.hpp:20:
In file included from ../../src/common/comm/comm.hpp:43:
In file included from ../../src/unordered_coll/unordered_coll.hpp:19:
../../src/sched/master_sched.hpp:61:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer& get_kernel_timer() {
    ~~~~~^
../../src/sched/master_sched.hpp:71:10: error: no type named 'kernel_timer' in namespace 'ccl'
    ccl::kernel_timer kernel_timer;
    ~~~~~^
2 errors generated.

The build works with the dpcpp_level_zero backend but fails with the DPC++ backend in the way shown above, for both the closed- and open-source compilers.

[Improvement] Allow multiple CCL inits from same process but from different threads

Currently we can initialize multiple XGBoost Rabit instances from the same process, each from a different thread. In Spark, it is possible to have multiple tasks run on the same executor. An executor is a single JVM process, and its tasks each run in a separate thread.

This is not a critical requirement for us at this stage, but users can run that way, since Spark allows it. So it would be good to support initializing oneCCL from multiple threads.

CMake configuration writes directly to installation directory

Here CMake writes to the install directory at configure time. That directory is usually owned by root, while builds are typically not run as root, so configuration fails with a write error.

if (BUILD_CONFIG)
    configure_file("cmake/templates/oneCCLConfig.cmake.in"
                   "${CCL_INSTALL_LIB}/cmake/oneCCL/oneCCLConfig.cmake"
                   COPYONLY)
    configure_file("cmake/templates/oneCCLConfigVersion.cmake.in"
                   "${CCL_INSTALL_LIB}/cmake/oneCCL/oneCCLConfigVersion.cmake"
                   @ONLY)
endif()

This does not manifest under default conditions because of this dubious change of CMAKE_INSTALL_PREFIX.
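A common fix (untested here, sketched against the snippet above) is to generate the config files into the build tree and let `install()` copy them, so the configure step never touches the install prefix:

```cmake
if (BUILD_CONFIG)
    # Generate into the build tree; never write to the install prefix
    # at configure time.
    configure_file("cmake/templates/oneCCLConfig.cmake.in"
                   "${CMAKE_CURRENT_BINARY_DIR}/cmake/oneCCL/oneCCLConfig.cmake"
                   COPYONLY)
    configure_file("cmake/templates/oneCCLConfigVersion.cmake.in"
                   "${CMAKE_CURRENT_BINARY_DIR}/cmake/oneCCL/oneCCLConfigVersion.cmake"
                   @ONLY)
    # Copy into place only at `make install`, which runs with the
    # appropriate permissions.
    install(DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/cmake/"
            DESTINATION "${CCL_INSTALL_LIB}/cmake")
endif()
```

This keeps the build tree self-contained and defers all writes under the prefix to the install step.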

using namespace in header files

It's a bad practice to inject using namespace into header files like in sycl_base.hpp:

using namespace std;
using namespace cl::sycl;
using namespace cl::sycl::access;

I'll submit a PR fixing this issue and some other polish (e.g. headers that are used but missing from the include lists in some examples) as soon as I get some free time.

oneCCL doesn't compile with -Werror due to -Wsuggest-override in include/oneapi/ccl/exception.hpp

Title sums it up; here's a patch:

diff --git a/include/oneapi/ccl/exception.hpp b/include/oneapi/ccl/exception.hpp
index a5d03b4..6de6271 100644
--- a/include/oneapi/ccl/exception.hpp
+++ b/include/oneapi/ccl/exception.hpp
@@ -44,7 +44,7 @@ public:
         msg = std::string("oneCCL: ") + std::string(info);
     }
 
-    const char *what() const noexcept {
+    const char *what() const noexcept override {
         return msg.c_str();
     }
 };

OneCCL in "resizable" mode is throwing read/write error

Exported the following parameters:
export CCL_ATL_TRANSPORT=ofi
export CCL_WORLD_SIZE=2
export CCL_PM_TYPE=resizable
export CCL_KVS_IP_EXCHANGE=env
export CCL_KVS_IP_PORT=10.x.x.xxx_9877

and ran the example test at:
cd build/_install/examples/cpu/
./allreduce

Here we faced a couple of issues:

- If we provide a port (via CCL_KVS_IP_PORT) that is already in use, the run hangs.
- If we provide a free port, it throws the following error:

[xgboost@vsr243 cpu]$ ./allreduce
CCL_init called.......
Resizable PMI initing......
main host ip: 10.x.x.143
Initing KVS sock connection from kvs.c IP: 127.0.0.1
Initing KVS sock connection from kvs.c Port: 4
KV_init Connection success
Initing KVS_init sock connection from kvs.c IP: 10.x.x.143
Initing KVS_init sock connection from kvs.c Port: 9877
KVS_init connect success..
sending connect request..
connect request sent..
**read/write error**

In my debugging, this read/write error is thrown from: https://github.com/intel/oneccl/blob/d2b9499ace634e230ed7b30ebd9e47a7555a8cee/src/atl/util/pm/pmi_resizable_rt/pmi_resizable/helper.c#L470

Another question:

  1. In the code there is a hardcoded IP and port to which kvs_server_init binds. Why is it hardcoded to 127.0.0.1 with a port starting from 1?
    https://github.com/intel/oneccl/blob/master/src/atl/util/pm/pmi_resizable_rt/pmi_resizable/kvs/kvs.c#L592
 addr.sin_addr.s_addr = inet_addr("127.0.0.1");
 addr.sin_port = 1;

Compile error on the master branch

Env:
Ubuntu 20.04
GCC-10

First error:

torch-ccl/third_party/oneCCL/src/atl/util/pm/pmi_resizable_rt/pmi_resizable_simple.h:124:17: error: field ‘my_proccess_name’ has incomplete type ‘std::string’ {aka ‘std::__cxx11::basic_string’}
124 | std::string my_proccess_name;

After fixing that by adding `#include <string>` to pmi_resizable_simple.h, another error appeared.

torch-ccl/third_party/oneCCL/src/comp/bf16/bf16_intrisics.hpp:74:82: note: use ‘-flax-vector-conversions’ to permit conversions between vectors with differing element types or numbers of subparts
74 | _mm256_storeu_si256((__m256i*)(dst), _mm512_cvtneps_pbh(_mm512_loadu_ps(src)));
torch-ccl/third_party/oneCCL/src/comp/bf16/bf16_intrisics.hpp:74:60: error: cannot convert ‘__m256bh’ to ‘__m256i’
74 | _mm256_storeu_si256((__m256i*)(dst), _mm512_cvtneps_pbh(_mm512_loadu_ps(src)));
| ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
| |
| __m256bh

Any suggestions?
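One possible (untested) workaround: GCC's `_mm512_cvtneps_pbh` returns `__m256bh`, and GCC permits explicit casts between vector types of the same width, so an explicit cast at the call site should satisfy GCC 10 without `-flax-vector-conversions`:

```diff
-    _mm256_storeu_si256((__m256i*)(dst), _mm512_cvtneps_pbh(_mm512_loadu_ps(src)));
+    _mm256_storeu_si256((__m256i*)(dst), (__m256i)_mm512_cvtneps_pbh(_mm512_loadu_ps(src)));
```

This only changes the type seen by the compiler; the stored bits are identical.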

Issue about using shared memory provider

Hi, I am trying to use the shared memory provider of oneCCL. However, there are some problems when enabling it on the benchmark example.

When I run CCL_LOG_LEVEL=info CCL_ATL_TRANSPORT=ofi CCL_ATL_SHM=1 FI_PROVIDER=shm mpirun -n 2 _install/examples/benchmark/benchmark -i 36 -j off -l allreduce -d bfloat16 -y 1048576,8388608,4096000,160000, I got the error message:

Abort(2138767) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Unknown error class, error stack:
MPIR_Init_thread(189)........: 
MPID_Init(1561)..............: 
MPIDI_OFI_mpi_init_hook(1584): 
open_fabric(2663)............: 
find_provider(2819)..........: OFI fi_getinfo() failed (ofi_init.c:2819:find_provider:No data available)

Is there any extra configuration or library needed to support the SHM provider?
