Dear All, I'm running the asr_librispeech recipe on Ubuntu 16.04

GPU utilization is very low (espresso issue, CLOSED)

freewym commented on August 26, 2024

GPU utilization is very low

from espresso.

Comments (4)

freewym commented on August 26, 2024

Would it be faster if using 2 GPUs? Or does increasing batch size solve this issue? I am trying to figure out whether the problem is because of using more than 2 GPUs, or simply because the batch size you set is not large enough.


yfliao commented on August 26, 2024

Sorry for the late reply.

The max-sentences and max-tokens were set to 32/24 and 39000/26000, respectively, for the transformer/conv_lstm recipes, so batch size is not the problem. However, increasing "update_freq" to 5 made things a little better:


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
| N/A   51C    P0   263W / 300W |  13936MiB / 16160MiB |     97%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   54C    P0   267W / 300W |  13842MiB / 16160MiB |     84%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   59C    P0   277W / 300W |  13868MiB / 16160MiB |     81%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   53C    P0   274W / 300W |   8734MiB / 16160MiB |     88%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+


freewym commented on August 26, 2024

Hmm, it seems the bottleneck is in collecting gradients/stats across multiple GPUs during backprop. I don't have enough computing resources to try 4 GPUs, but I heard from someone else who tried 4 GTX 2080 GPUs in a Sun Grid Engine environment that they didn't encounter this issue. So I am sorry that I am unable to help with this. But if a larger update_freq helps, maybe you can reduce the batch size accordingly so that the effective batch size for each update remains the same as when update_freq=1?
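To make the arithmetic behind that suggestion concrete, here is a minimal sketch. It assumes the effective batch per update is roughly the per-GPU max-tokens times the number of GPUs times update_freq, and that 4 GPUs are in use as in the nvidia-smi output above. The function names are hypothetical helpers for illustration, not part of espresso or fairseq.

```python
def effective_batch(per_gpu_batch: int, num_gpus: int, update_freq: int) -> int:
    """Approximate batch size seen by one optimizer update.

    Gradients are accumulated over `update_freq` steps on each of
    `num_gpus` workers before the parameters are updated.
    """
    return per_gpu_batch * num_gpus * update_freq


def rescaled_per_gpu_batch(per_gpu_batch: int, old_update_freq: int,
                           new_update_freq: int) -> int:
    """Per-GPU batch that keeps the effective batch (approximately) unchanged
    when update_freq changes. Integer division rounds down."""
    return per_gpu_batch * old_update_freq // new_update_freq


if __name__ == "__main__":
    num_gpus = 4  # assumption: the four V100s listed above
    for recipe, max_tokens in (("transformer", 39000), ("conv_lstm", 26000)):
        before = effective_batch(max_tokens, num_gpus, update_freq=1)
        new_max_tokens = rescaled_per_gpu_batch(max_tokens, 1, 5)
        after = effective_batch(new_max_tokens, num_gpus, update_freq=5)
        print(f"{recipe}: max-tokens {max_tokens} -> {new_max_tokens}, "
              f"effective batch {before} -> {after}")
```

With these numbers, max-tokens would drop from 39000 to 7800 (transformer) and from 26000 to 5200 (conv_lstm) at update_freq=5. The same scaling applied to max-sentences does not divide evenly (32 // 5 = 6), so the match there is only approximate.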


yfliao commented on August 26, 2024

Thanks for the reminder. I will reduce the batch size.

