Comments (6)
This is not a bug. You need to use verbs;ofi_rxm, as shown in your fi_info output.
from libfabric.
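For context: as the fi_info output in this thread shows, a layered provider is reported as core;utility, so verbs;ofi_rxm means the ofi_rxm utility provider running on top of the verbs core provider. The sketch below splits such a name into its two parts. split_prov_name is a hypothetical helper written for illustration; it is not a libfabric API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libfabric): split a layered provider
 * name such as "verbs;ofi_rxm" into its core part ("verbs") and its
 * utility part ("ofi_rxm"). A name without ';' is treated as core-only,
 * and util is set to the empty string. Returns 0 on success, or -1 if
 * either output buffer is too small. */
int split_prov_name(const char *name,
                    char *core, size_t core_len,
                    char *util, size_t util_len)
{
    const char *sep = strchr(name, ';');
    size_t n = sep ? (size_t)(sep - name) : strlen(name);
    const char *rest = sep ? sep + 1 : "";

    if (n >= core_len || strlen(rest) >= util_len)
        return -1;

    memcpy(core, name, n);
    core[n] = '\0';
    strcpy(util, rest);
    return 0;
}
```

Splitting "verbs;ofi_rxm" yields "verbs" and "ofi_rxm", which is why the utility provider name appears last.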
Thanks Chien.
I had also tried verbs;ofi_rxm, but although fi_info works, fi_pingpong fails (it looks for ofi_rxm at the end instead of verbs;ofi_rxm):
$ FI_PROVIDER="verbs;ofi_rxm" FI_LOG_LEVEL=Debug fi_pingpong
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable perf_cntr=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable hook=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable hmem=
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_CUDA not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ROCR not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ZE not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_NEURON not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_SYNAPSEAI not supported
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable hmem_disable_p2p=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable mr_cache_max_size=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable mr_cache_max_count=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable mr_cache_monitor=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable mr_cuda_cache_monitor_enabled=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable mr_rocr_cache_monitor_enabled=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable mr_ze_cache_monitor_enabled=
libfabric:4113849:1708031595::core:mr:ofi_default_cache_size():79 default cache size=526983472
libfabric:4113849:1708031595::core:core:fi_param_get_():382 read string var provider=verbs;ofi_rxm
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable universe_size=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable av_remove_cleanup=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable offload_coll_provider=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable provider_path=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable enable_passthru=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable buffer_size=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable tx_size=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable rx_size=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable msg_tx_size=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable msg_rx_size=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable cm_progress_interval=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable cq_eq_fairness=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable data_auto_progress=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable use_rndv_write=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable def_wait_obj=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable def_tcp_wait_obj=
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_rxm (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: verbs (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_hook_perf (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_hook_trace (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_hook_debug (120.10)
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable hmem=
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_CUDA not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ROCR not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ZE not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_NEURON not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_SYNAPSEAI not supported
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable hmem_disable_p2p=
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_hook_hmem (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_hook_dmabuf_peer_mem (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_hook_noop (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: off_coll (120.10)
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable tx_size=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable rx_size=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable tx_iov_limit=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable rx_iov_limit=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable inline_size=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable min_rnr_timer=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable use_odp=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable prefer_xrc=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable xrcd_filename=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable cqread_bunch_size=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable gid_idx=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable device_name=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable use_dmabuf=
libfabric:4113849:1708031595::verbs:core:vrb_read_params():720 dmabuf support is enabled
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable iface=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable dgram_use_name_server=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable dgram_name_server_port=
libfabric:4113849:1708031595::verbs:fabric:verbs_devs_print():889 list of verbs devices found for FI_EP_MSG:
libfabric:4113849:1708031596::verbs:fabric:vrb_get_device_attrs():620 device mlx5_0: first found active port is 1
libfabric:4113849:1708031596::verbs:fabric:vrb_get_device_attrs():620 device mlx5_0: first found active port is 1
libfabric:4113849:1708031596::verbs:fabric:vrb_get_device_attrs():620 device mlx5_0: first found active port is 1
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0-dgram
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0-dgram
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Supported: FI_EP_RDM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Supported: FI_EP_RDM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Supported: FI_EP_RDM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Supported: FI_EP_RDM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Supported: FI_EP_RDM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::core:core:fi_getinfo_():1304 fi_getinfo: provider ofi_rxm returned -61 (No data available)
fi_getinfo(): util/pingpong.c:1489, ret=-61 (No data available)
Thank you.
By default, fi_pingpong uses FI_EP_DGRAM. Try fi_pingpong -e rdm.
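The first failure can be read straight from the log: fi_pingpong requested FI_EP_DGRAM, while ofi_rxm reports "Supported: FI_EP_RDM", so fi_getinfo returned -61 (No data available). A minimal sketch of that logic, assuming a simplified model in which the -e value selects the endpoint type (DGRAM when -e is omitted) and a provider matches only when its supported type equals the requested one. parse_ep_flag, ep_type_ok and the EP_* values are hypothetical stand-ins, not fabtests or libfabric code; the real check in libfabric also handles cases such as an unspecified endpoint type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for libfabric's FI_EP_MSG / FI_EP_RDM / FI_EP_DGRAM. */
enum ep_type { EP_INVALID = -1, EP_MSG, EP_RDM, EP_DGRAM };

/* Hypothetical: map an -e argument to an endpoint type. With no -e given
 * (arg == NULL), fall back to DGRAM, the fi_pingpong default noted above. */
enum ep_type parse_ep_flag(const char *arg)
{
    if (arg == NULL)
        return EP_DGRAM;
    if (strcmp(arg, "msg") == 0)
        return EP_MSG;
    if (strcmp(arg, "rdm") == 0)
        return EP_RDM;
    if (strcmp(arg, "dgram") == 0)
        return EP_DGRAM;
    return EP_INVALID;
}

/* Simplified version of the "Supported: X / Requested: Y" check seen in
 * the debug log: a provider is usable only when the two types are equal. */
bool ep_type_ok(enum ep_type supported, enum ep_type requested)
{
    return requested != EP_INVALID && supported == requested;
}
```

Under this model, ofi_rxm (supported type RDM) is rejected for a plain fi_pingpong run and accepted once -e rdm is passed.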
With both fi_pingpong -e rdm and fi_pingpong -e rdm -p verbs, the output is:
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable perf_cntr=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable hook=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable hmem=
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_CUDA not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ROCR not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ZE not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_NEURON not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_SYNAPSEAI not supported
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable hmem_disable_p2p=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable mr_cache_max_size=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable mr_cache_max_count=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable mr_cache_monitor=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable mr_cuda_cache_monitor_enabled=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable mr_rocr_cache_monitor_enabled=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable mr_ze_cache_monitor_enabled=
libfabric:4117195:1708032872::core:mr:ofi_default_cache_size():79 default cache size=526983472
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable provider=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable universe_size=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable av_remove_cleanup=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable offload_coll_provider=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable provider_path=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable enable_passthru=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable buffer_size=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable tx_size=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable rx_size=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable msg_tx_size=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable msg_rx_size=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable cm_progress_interval=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable cq_eq_fairness=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable data_auto_progress=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable use_rndv_write=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable def_wait_obj=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable def_tcp_wait_obj=
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_rxm (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: verbs (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_hook_perf (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_hook_trace (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_hook_debug (120.10)
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable hmem=
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_CUDA not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ROCR not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ZE not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_NEURON not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_SYNAPSEAI not supported
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable hmem_disable_p2p=
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_hook_hmem (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_hook_dmabuf_peer_mem (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_hook_noop (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: off_coll (120.10)
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable tx_size=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable rx_size=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable tx_iov_limit=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable rx_iov_limit=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable inline_size=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable min_rnr_timer=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable use_odp=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable prefer_xrc=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable xrcd_filename=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable cqread_bunch_size=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable gid_idx=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable device_name=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable use_dmabuf=
libfabric:4117195:1708032872::verbs:core:vrb_read_params():720 dmabuf support is enabled
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable iface=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable dgram_use_name_server=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable dgram_name_server_port=
libfabric:4117195:1708032872::verbs:fabric:verbs_devs_print():889 list of verbs devices found for FI_EP_MSG:
libfabric:4117195:1708032873::verbs:fabric:vrb_get_device_attrs():620 device mlx5_0: first found active port is 1
libfabric:4117195:1708032873::verbs:fabric:vrb_get_device_attrs():620 device mlx5_0: first found active port is 1
libfabric:4117195:1708032874::verbs:fabric:vrb_get_device_attrs():620 device mlx5_0: first found active port is 1
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_RDM
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_RDM
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_RDM
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_RDM
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_RDM
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_RDM
libfabric:4117195:1708032874::core:core:fi_getinfo_():1304 fi_getinfo: provider verbs returned -61 (No data available)
libfabric:4117195:1708032874::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_attr():775 Provider requires use of shared rx context
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_attr():775 Provider requires use of shared rx context
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
libfabric:4117195:1708032874::ofi_rxm:core:ofi_check_fabric_attr():412 Requesting provider verbs, skipping tcp
libfabric:4117195:1708032874::ofi_rxm:core:ofi_check_fabric_attr():412 Requesting provider verbs, skipping tcp
libfabric:4117195:1708032874::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_attr():775 Provider requires use of shared rx context
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_attr():775 Provider requires use of shared rx context
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
libfabric:4117195:1708032874::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_attr():775 Provider requires use of shared rx context
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_attr():775 Provider requires use of shared rx context
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1578 hints->ep_attr->rx_ctx_cnt != FI_SHARED_CONTEXT. Skipping XRC FI_EP_MSG endpoints
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1578 hints->ep_attr->rx_ctx_cnt != FI_SHARED_CONTEXT. Skipping XRC FI_EP_MSG endpoints
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0-dgram
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0-dgram
libfabric:4117195:1708032874::core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
libfabric:4117195:1708032874::core:core:fi_fabric_():1504 Opened fabric: IB-0xfe80000000000000
libfabric:4117195:1708032874::ofi_rxm:core:ofi_check_fabric_attr():412 Requesting provider off_coll, skipping verbs
libfabric:4117195:1708032874::ofi_rxm:core:ofi_check_fabric_attr():412 Requesting provider off_coll, skipping tcp
libfabric:4117195:1708032874::ofi_rxm:core:ofi_check_fabric_attr():412 Requesting provider off_coll, skipping tcp
libfabric:4117195:1708032874::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1194 Need core provider, skipping off_coll
libfabric:4117195:1708032874::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1194 Need core provider, skipping off_coll
libfabric:4117195:1708032874::core:core:fi_getinfo_():1304 fi_getinfo: provider ofi_rxm returned -61 (No data available)
libfabric:4117195:1708032874::core:core:fi_fabric_():1504 Opened fabric: UTIL-COLL
libfabric:4117195:1708032874::core:core:fi_fabric_():1504 Opened fabric: IB-0xfe80000000000000
libfabric:4117195:1708032874::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:vrb_check_hints():268 skipping device mlx5_0-xrc (want mlx5_0)
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:vrb_check_hints():268 skipping device mlx5_0-xrc (want mlx5_0)
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_rai_id():301 rdma_resolve_addr: Invalid argument (22)
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_rai_id():303 src addr: fi_sockaddr_ib://[fe80::b83f:d203:2b:b478]:0xffff:0x13f:0x0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_rai_id():305 dst addr: (null)
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_match_infos():1825 handling of the socket address fails - -22
libfabric:4117195:1708032874:ofi_rxm:verbs:core:vrb_get_match_infos():1845 Handling of the addresses fails, the getting infos is unsuccessful
libfabric:4117195:1708032874:ofi_rxm:core:core:fi_getinfo_():1304 fi_getinfo: provider verbs returned -61 (No data available)
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
fi_domain(): util/pingpong.c:1415, ret=-61 (No data available)
verbs supports msg endpoints (you would need the -e msg argument).
verbs;ofi_rxm supports rdm endpoints (you would need the -e rdm argument).
You can run fi_info -v -p verbs to view the full set of supported capabilities and endpoint types.
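The provider/endpoint pairings described above, plus the mlx5_0-dgram domains that verbs reported for FI_EP_DGRAM in the first log, can be summarized as a small lookup table. This is a sketch of the combinations seen in this thread only, not a query of fi_info; suggest_cmd is a hypothetical helper that builds an fi_pingpong command line from the -p and -e options used in this thread. The provider name is quoted because an unquoted ';' would end the shell command after "verbs".

```c
#include <stdio.h>
#include <string.h>

/* Provider / endpoint-type pairs observed in this thread (not exhaustive;
 * fi_info -v -p verbs is the authoritative list on a given system). */
struct prov_ep {
    const char *prov;  /* value for -p or FI_PROVIDER */
    const char *ep;    /* value for -e */
};

static const struct prov_ep pairs[] = {
    { "verbs",         "msg"   },
    { "verbs",         "dgram" },  /* mlx5_0-dgram domains */
    { "verbs;ofi_rxm", "rdm"   },
};

/* Hypothetical helper: write an fi_pingpong command line for the requested
 * endpoint type into buf. Returns 0 on success, or -1 if no pair matches or
 * the buffer is too small. */
int suggest_cmd(const char *ep, char *buf, size_t len)
{
    for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++) {
        if (strcmp(pairs[i].ep, ep) == 0) {
            int n = snprintf(buf, len, "fi_pingpong -p \"%s\" -e %s",
                             pairs[i].prov, pairs[i].ep);
            return (n >= 0 && (size_t)n < len) ? 0 : -1;
        }
    }
    return -1;
}
```

For example, asking for "rdm" produces fi_pingpong -p "verbs;ofi_rxm" -e rdm.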
From your fi_info output and the log, I'm guessing you do not have IPoIB set up. fi_pingpong requires either an IPv4 or an IPv6 address. Once that is configured, use verbs;ofi_rxm with -e rdm; that should work for you.
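Since fi_pingpong needs an IPv4 or IPv6 address, one quick check is whether the IPoIB interface actually has one. The second log ends with rdma_resolve_addr failing with Invalid argument on an fi_sockaddr_ib source address, which is consistent with that guess. The sketch below uses getifaddrs(3) (available on Linux and the BSDs) to test whether a named interface carries an AF_INET or AF_INET6 address. iface_has_ip is a hypothetical helper, and the interface name ib0 in the usage note is an assumption: IPoIB interfaces are often, but not always, named that way.

```c
#include <ifaddrs.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if interface "name" has at least one
 * IPv4 or IPv6 address, 0 if it has none (or does not exist), and -1 if
 * the interface list cannot be read. */
int iface_has_ip(const char *name)
{
    struct ifaddrs *list, *ifa;
    int found = 0;

    if (getifaddrs(&list) != 0)
        return -1;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        /* Entries without an address (or for other interfaces) are skipped. */
        if (ifa->ifa_addr == NULL || strcmp(ifa->ifa_name, name) != 0)
            continue;
        if (ifa->ifa_addr->sa_family == AF_INET ||
            ifa->ifa_addr->sa_family == AF_INET6) {
            found = 1;
            break;
        }
    }

    freeifaddrs(list);
    return found;
}
```

If iface_has_ip("ib0") returns 0, the IPoIB interface has no usable IP address yet; on Linux, ip addr show ib0 reports the same information from the command line.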