Comments (6)
NCCL will enable P2P if needed, but will not fail if already enabled.
thanks!
I see no P2P communication in nvprof when using BVLC Caffe with NCCL in the multi-GPU case. In the Caffe version without NCCL, I could see P2P transfers between GPUs. Is there a reason why NCCL is not using P2P?
P2P is used, but through CUDA kernels. So you will not see explicit P2P cudaMemcpy operations in the profile; instead you will see CUDA kernels that do the computation and also perform remote P2P writes.
The problem is that cuda-memcheck still complains about peer access already being enabled, which makes it hard to use when debugging NCCL applications. It complains even if the application has no other problems, and it repeats this error message for every device communicator being initialized.
NCCL: Using devices
Rank 0 uses device 0 [0x01] GeForce GTX TITAN X
Rank 1 uses device 1 [0x02] GeForce GTX TITAN X
Rank 2 uses device 2 [0x03] GeForce GTX TITAN X
========= CUDA-MEMCHECK
========= Program hit cudaErrorPeerAccessAlreadyEnabled (error 50) due to "peer access is already enabled" on CUDA API call to cudaDeviceEnablePeerAccess.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2eea03]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.8.0 (cudaDeviceEnablePeerAccess + 0x1a9) [0x38f29]
========= Host Frame:/usr/local/cuda/lib64/libnccl.so.1 [0x56c2]
========= Host Frame:/usr/local/cuda/lib64/libnccl.so.1 (ncclCommInitAll + 0x646) [0x7a66]
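One possible workaround while debugging is to post-process the cuda-memcheck output and drop only the reports for this benign error. The sketch below is a hypothetical helper, not part of NCCL or cuda-memcheck. It assumes the report layout shown in the log above: a "Program hit" line followed by "Saved host backtrace" and "Host Frame:" lines, each prefixed with "=========". It also assumes the "Program hit" line arrives unwrapped, as it would when the output is piped rather than copied from a terminal.

```python
MARK = "========="
BENIGN = "cudaErrorPeerAccessAlreadyEnabled"


def is_continuation(body):
    # Lines assumed to belong to the report that precedes them.
    return body == "" or body.startswith(("Saved host backtrace", "Host Frame:"))


def filter_memcheck(lines):
    """Yield lines, skipping reports for the benign peer-access error."""
    skipping = False
    for line in lines:
        text = line.rstrip("\n")
        # body is the text after the cuda-memcheck prefix, or None for program output.
        body = text[len(MARK):].strip() if text.startswith(MARK) else None
        if body is not None and body.startswith("Program hit"):
            # A new API-error report starts; skip it only if it is the benign one.
            skipping = BENIGN in body
        elif skipping and (body is None or not is_continuation(body)):
            # First line that is not part of the skipped report.
            skipping = False
        if not skipping:
            yield line
```

To use it as a pipe filter, one could pass `sys.stdin` to `filter_memcheck` and write the result to `sys.stdout`. Any error count that cuda-memcheck prints in its summary is left untouched, so it may still include the dropped reports. If your cuda-memcheck version supports the `--report-api-errors` option, that can silence API error reports too, but it applies to all API errors rather than just this one.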