
Comments (3)

TSSlade commented on August 19, 2024

@leebean337 - I also wonder whether under the hood something like this may be going on:

I have been having this bug for some time. For me, it turns out that I keep holding a Python variable (i.e., a torch tensor) that references the model result, so it cannot be safely released while the code can still access it.
My code looks something like:

predictions = []
for batch in dataloader:
    p = model(batch.to(torch.device("cuda:0")))  # p is a CUDA tensor
    predictions.append(p)  # keeping a reference to p prevents its GPU memory from being freed

The fix for this was to transfer p to a list. So, the code should look like:

predictions = []
for batch in dataloader:
    p = model(batch.to(torch.device("cuda:0")))
    predictions.append(p.tolist())  # copy the values into a plain Python list in host memory

This ensures that predictions holds plain values in main memory rather than tensors on the GPU.
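As a sanity check (not from the original comment; torch.cuda.memory_allocated is a standard PyTorch API), allocated GPU memory should stay roughly flat across batches once predictions are stored as plain lists. A minimal sketch, with a dummy model and data standing in for the real ones:

import torch

# dummy stand-ins for the real model and dataloader
model = torch.nn.Linear(10, 2).to("cuda:0")
dataloader = [torch.randn(8, 10) for _ in range(100)]

predictions = []
for batch in dataloader:
    p = model(batch.to(torch.device("cuda:0")))
    predictions.append(p.tolist())  # values copied to host memory
    # should print a roughly constant number of bytes per iteration
    print(torch.cuda.memory_allocated("cuda:0"))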
@abdelrahmanhosny Thanks for pointing this out. I faced the exact same issue in PyTorch 1.5.0: I had no OOM issues during training, but during inference I likewise kept holding a Python variable (i.e., a torch tensor) that referenced the model result, which made the GPU run out of memory after a certain number of batches.

In my case, however, transferring the predictions to a list did not work, as I am generating images with my network, so I had to do the following:

predictions.append(p.detach().cpu().numpy())  # detach from the graph, move to CPU, convert to NumPy

This solved the issue!
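In loop form (a sketch, assuming the same model and dataloader setup as the snippets above):

predictions = []
for batch in dataloader:
    p = model(batch.to(torch.device("cuda:0")))
    # detach from the autograd graph, copy to host memory, convert to NumPy;
    # unlike tolist(), this preserves the array structure of the generated images
    predictions.append(p.detach().cpu().numpy())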

source: pytorch/pytorch#16417 (comment)


MarvinLvn commented on August 19, 2024

The batch_size argument provided in config.yml is not used during validation, which is why changing it does not change the size of the attempted allocation.

To specify the batch size during validation, you must pass the --batch=N argument to the pyannote command. For instance:

pyannote-audio mlt validate --subset=development --batch=16 --from=10 --to=150 --every=10 model_ellis/train/ELLIS.SpeakerDiarization.Classroom.train/ ELLIS.SpeakerDiarization.Classroom

(the default batch size for inference and validation is 32)

Could you try that and let me know if it helps?
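Presumably the same --batch flag works for the apply step as well; this is an assumption, since the exact apply invocation is not shown in this thread (the follow-up below does confirm a batch size of 16 worked for apply), and <validate_dir> here is a hypothetical placeholder:

pyannote-audio mlt apply --batch=16 <validate_dir> ELLIS.SpeakerDiarization.Classroom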


leebean337 commented on August 19, 2024

Thanks Marvin, changing the batch size to 16 for the validate and apply steps fixes this, and thanks for updating your config.yml file.

