
Comments (3)

TSSlade commented on August 19, 2024

@leebean337 - I also wonder whether under the hood something like this may be going on:

I have been having this bug for some time. For me, it turns out that I keep holding a Python variable (i.e., a torch tensor) that references the model result, so it cannot be safely released while the code can still access it.
My code looks something like:

predictions = []
for batch in dataloader:
    p = model(batch.to(torch.device("cuda:0")))  # p is a CUDA tensor
    predictions.append(p)  # keeping a reference to p prevents its GPU memory from being freed

The fix for this was to transfer p to a list. So, the code should look like:

predictions = []
for batch in dataloader:
    p = model(batch.to(torch.device("cuda:0")))
    predictions.append(p.tolist())  # copy the values into a plain Python list in host memory

This ensures that predictions holds plain values in main memory rather than tensors on the GPU.
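As a sanity check (not from the original comment; torch.cuda.memory_allocated is a standard PyTorch API), allocated GPU memory should stay roughly flat across batches once predictions are stored as plain lists. A minimal sketch, with a dummy model and data standing in for the real ones:

import torch

# dummy stand-ins for the real model and dataloader
model = torch.nn.Linear(10, 2).to("cuda:0")
dataloader = [torch.randn(8, 10) for _ in range(100)]

predictions = []
for batch in dataloader:
    p = model(batch.to(torch.device("cuda:0")))
    predictions.append(p.tolist())  # values copied to host memory
    # should print a roughly constant number of bytes per iteration
    print(torch.cuda.memory_allocated("cuda:0"))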
@abdelrahmanhosny Thanks for pointing this out. I faced the exact same issue in PyTorch 1.5.0: I had no OOM issues during training, but during inference I likewise kept holding a Python variable (i.e., a torch tensor) that referenced the model result, which made the GPU run out of memory after a certain number of batches.

In my case, however, transferring the predictions to a list did not work, as I am generating images with my network, so I had to do the following:

predictions.append(p.detach().cpu().numpy())  # detach from the graph, move to CPU, convert to NumPy

This solved the issue!
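In loop form (a sketch, assuming the same model and dataloader setup as the snippets above):

predictions = []
for batch in dataloader:
    p = model(batch.to(torch.device("cuda:0")))
    # detach from the autograd graph, copy to host memory, convert to NumPy;
    # unlike tolist(), this preserves the array structure of the generated images
    predictions.append(p.detach().cpu().numpy())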

source: pytorch/pytorch#16417 (comment)


MarvinLvn commented on August 19, 2024

The batch_size argument provided in config.yml is not used during validation, which is why changing it does not change the size of the attempted allocation.

To specify the batch size during validation, you must pass the --batch=N argument to the pyannote command. For instance:

pyannote-audio mlt validate --subset=development --batch=16 --from=10 --to=150 --every=10 model_ellis/train/ELLIS.SpeakerDiarization.Classroom.train/ ELLIS.SpeakerDiarization.Classroom

(the default batch size for inference and validation is 32)

Could you try that and let me know if it helps?
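Presumably the same --batch flag works for the apply step as well; this is an assumption, since the exact apply invocation is not shown in this thread (the follow-up below does confirm a batch size of 16 worked for apply), and <validate_dir> here is a hypothetical placeholder:

pyannote-audio mlt apply --batch=16 <validate_dir> ELLIS.SpeakerDiarization.Classroom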


leebean337 commented on August 19, 2024

Thanks Marvin, changing the batch size to 16 for the validate and apply steps fixes this, and thanks for updating your config.yml file.

