🚀 Feature: Multiple GPU support

Multiple GPU support about yolov5 HOT 10 CLOSED

ultralytics commented on July 3, 2024

Multiple GPU support

from yolov5.

Comments (10)

HaxThePlanet commented on July 3, 2024

Excellent, thanks for the fast response and hard work. This thing is amazing!


github-actions commented on July 3, 2024

Hello @HaxThePlanet, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

Cloud-based AI systems operating on hundreds of HD video streams in realtime.
Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.


glenn-jocher commented on July 3, 2024

@HaxThePlanet good news: yolov5 supports multi-gpu out of the box. Some examples:

python train.py  # will use ALL available cuda resources found on system
python train.py --device 0,1  # specify devices
python train.py --device 0  # specify 1 device 
python train.py --device cpu  # force cpu usage

test.py works exactly the same way. detect.py accepts a --device argument, but is limited to 1 gpu.
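As a rough illustration of the --device values listed above, here is a minimal Python sketch of how such a string could be interpreted. It is not the code YOLOv5 uses: the function name parse_device and the `available` parameter (the number of CUDA devices the caller has already detected) are assumptions made for this example only.

```python
from typing import List, Tuple


def parse_device(device: str, available: int) -> Tuple[str, List[int]]:
    """Interpret a --device style string (hypothetical helper, not YOLOv5's own).

    ''      -> every detected GPU, or CPU when none is detected
    'cpu'   -> force CPU
    '0,1'   -> the listed GPU indices
    """
    device = device.strip().lower()
    if device == "cpu":
        return "cpu", []
    if device == "":
        ids = list(range(available))
        return ("cuda", ids) if ids else ("cpu", [])
    # Split the comma-separated list and convert each entry to an index.
    ids = [int(part) for part in device.split(",") if part.strip()]
    # Reject indices that do not correspond to a detected GPU.
    bad = [i for i in ids if i < 0 or i >= available]
    if bad:
        raise ValueError(f"requested GPU(s) {bad}, but only {available} detected")
    return "cuda", ids
```

On a machine with eight GPUs, an empty string would select all eight under this sketch, while "0,1" would select only the first two.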


AIFAN-Lab commented on July 3, 2024

When I run the command:
python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 16
it prints the following:
{'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.58, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.014, 'hsv_s': 0.68, 'hsv_v': 0.36, 'degrees': 0.0, 'translate': 0.0, 'scale': 0.5, 'shear': 0.0}
Namespace(adam=False, batch_size=16, bucket='', cache_images=False, cfg='./models/yolov5s.yaml', data='./data/coco.yaml', device='', epochs=300, evolve=False, img_size=[640, 640], multi_scale=False, name='', nosave=False, notest=False, rect=False, resume=False, single_cls=False, weights='')
Using CUDA Apex device0 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device1 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device2 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device3 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device4 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device5 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device6 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device7 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
Optimizer groups: 54 .bias, 60 conv.weight, 51 other

The error output is as follows:
/share/home/xx/anaconda3/envs/pt1.5.0/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:303: UserWarning: Single-Process Multi-GPU is not the recommended mode for DDP. In this mode, each DDP instance operates on multiple devices and creates multiple module replicas within one process. The overhead of scatter/gather and GIL contention in every forward pass can slow down training. Please consider using one DDP instance per device or per module replica by explicitly setting device_ids or CUDA_VISIBLE_DEVICES. NB: There is a known issue in nn.parallel.replicate that prevents a single DDP instance to operate on multiple model replicas.
"Single-Process Multi-GPU is not the recommended mode for "
Traceback (most recent call last):
  File "train.py", line 400, in <module>
    train(hyp)
  File "train.py", line 152, in train
    model = torch.nn.parallel.DistributedDataParallel(model)
  File "/share/home/xx/anaconda3/envs/pt1.5.0/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 287, in __init__
    self._ddp_init_helper()
  File "/share/home/xx/anaconda3/envs/pt1.5.0/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 380, in _ddp_init_helper
    expect_sparse_gradient)
RuntimeError: Model replicas must have an equal number of parameters.
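
The warning above recommends one DDP instance per device, selected via device_ids or CUDA_VISIBLE_DEVICES. The sketch below only prepares the per-process environment variables for that layout. It does not launch anything and is not how train.py is organized. RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are the variables read by torch.distributed's environment-variable initialization; the function name per_gpu_env and the default address and port are assumptions chosen for illustration.

```python
from typing import Dict, List


def per_gpu_env(gpu_ids: List[int],
                master_addr: str = "127.0.0.1",   # assumed: single-node training
                master_port: int = 29500) -> List[Dict[str, str]]:
    """Build one set of environment overrides per worker process (hypothetical helper).

    Each worker sees exactly one GPU through CUDA_VISIBLE_DEVICES, so the
    process hosts a single model replica.
    """
    world_size = len(gpu_ids)
    envs = []
    for rank, gpu in enumerate(gpu_ids):
        envs.append({
            "CUDA_VISIBLE_DEVICES": str(gpu),  # the only GPU this worker can see
            "RANK": str(rank),                 # unique index of this worker
            "WORLD_SIZE": str(world_size),     # total number of workers
            "MASTER_ADDR": master_addr,        # rendezvous address
            "MASTER_PORT": str(master_port),   # rendezvous port
        })
    return envs
```

Each dictionary could be merged into os.environ for a separately started worker process, which would then create its own process group and wrap its single replica in DistributedDataParallel.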


glenn-jocher commented on July 3, 2024

@AIFAN-Lab thanks for the bug report. I tested on two GPUs today and everything worked well. Can you try to reproduce this in our docker image to see if it's an environment issue?

Docker Image: https://hub.docker.com/r/ultralytics/yolov5 (see the Docker Quickstart Guide).


AIFAN-Lab commented on July 3, 2024

OK, I will test in the Docker image and report back later.


HaxThePlanet commented on July 3, 2024

Is it still necessary to train the first 1000 or so iterations on a single GPU?


glenn-jocher commented on July 3, 2024

@HaxThePlanet that's never been necessary.


liangshi036 commented on July 3, 2024

> @HaxThePlanet good news: yolov5 supports multi-gpu out of the box. Some examples:
> python train.py  # will use ALL available cuda resources found on system
> python train.py --device 0,1  # specify devices
> python train.py --device 0  # specify 1 device
> python train.py --device cpu  # force cpu usage
> test.py works exactly the same way. detect.py accepts a --device argument, but is limited to 1 gpu.

Would you please add multi-GPU support to detect.py?


glenn-jocher commented on July 3, 2024

@liangshi036 we don't have the resources to implement suggestions, but you can do this yourself and submit a PR!
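
For anyone who wants to attempt this, one possible workaround (not a feature of detect.py) is to split the inputs into one shard per GPU and run a separate single-GPU detection process on each shard. The helper below is a hypothetical sketch of the splitting step only; the name shard_round_robin and the idea of passing file paths as strings are assumptions.

```python
from typing import Dict, List, Sequence


def shard_round_robin(items: Sequence[str],
                      device_ids: Sequence[int]) -> Dict[int, List[str]]:
    """Assign items to devices in round-robin order (hypothetical helper)."""
    if not device_ids:
        raise ValueError("need at least one device id")
    shards: Dict[int, List[str]] = {d: [] for d in device_ids}
    for i, item in enumerate(items):
        # Item i goes to device i mod N, which keeps shard sizes within one of each other.
        shards[device_ids[i % len(device_ids)]].append(item)
    return shards
```

Each shard could then be handled by its own detect.py process started with --device set to the matching GPU index.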

