Would someone share training duration per sample for some of the nets?


Question: training duration per sample about asteroid HOT 7 CLOSED

asteroid-team commented on August 23, 2024

Question: training duration per sample

from asteroid.

Comments (7)

jonashaag commented on August 23, 2024

Just wanted to share some training graphs here. Comparison of the following models:

DPRNN kernel_size 16
DPRNN kernel_size 4
ConvTasNet (Asteroid default hyper params)
SuDoRM-RF: enc_kernel_size/K_eps=21, enc_num_basis/C_eps, out_channels/C, in_channels/C_U = 512, upsampling_depth/Q=5, num_blocks=25
Each model was trained with 2s and 4s non-silent segments of the training sources.

The dataset is a (so far) proprietary reverberated speech dataset using ~1k handpicked room impulse responses combined with the ~44k speech samples from VCTK. The training dataset has ~80% of the rooms and ~80% of the speakers from VCTK, and the validation dataset has the remaining ~20% of the rooms and ~20% of the speakers. I'm combining each room with each speech sample from VCTK for a total dataset size of ~28M training samples + 1.7M validation samples. Each epoch uses a randomly generated subset of the training dataset, and validation uses a random 5k-sample subset of the validation dataset.
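As a rough sanity check of the sizes quoted above, the split arithmetic can be sketched as follows. This is only a sketch: the inputs are the approximate figures from the comment, and it assumes speech samples are spread evenly across speakers, so that ~80% of speakers corresponds to ~80% of samples.

```python
# Approximate figures quoted in the comment (not exact dataset counts).
n_rooms = 1_000      # ~1k room impulse responses
n_speech = 44_000    # ~44k VCTK speech samples


def n_pairs(room_frac, speech_frac):
    """Number of (room, speech sample) combinations for the given split fractions.

    Assumes samples are evenly distributed over speakers (an assumption,
    not something stated in the thread).
    """
    return round(n_rooms * room_frac) * round(n_speech * speech_frac)


train_pairs = n_pairs(0.8, 0.8)  # ~80% of rooms x ~80% of speakers
val_pairs = n_pairs(0.2, 0.2)    # ~20% of rooms x ~20% of speakers

print(f"train: ~{train_pairs / 1e6:.1f}M, val: ~{val_pairs / 1e6:.2f}M")
```

Under these assumptions the result is about 28.2M training and 1.76M validation combinations, in line with the ~28M and ~1.7M quoted.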

In retrospect, I believe the validation subset to be too small so I actually believe the training loss to be more indicative of model performance than validation loss. Also note that the number of non-silent 4s segments in VCTK is much lower than 2s ones, which could mean that the 4s dataset has less variety in its speech samples.

DPRNN training was done on a GTX 1080 Ti and all of the other models were trained on an RTX 2080 Ti (which is ~1.6x as fast as the GTX 1080 Ti for this task). Train loss was SI-SDR with an LR of 5e-4. Note that none of the models were trained to convergence.

Train loss (x is hours, note that DPRNN was trained on a slower GPU)

Val loss (x is hours, note that DPRNN was trained on a slower GPU)


mpariente commented on August 23, 2024

Good question.
I just finished training DPRNN with ks=16 on a single P100 (on WHAM!); the 200 epochs took 2 days and 5 hours.
This is by far the best compromise between performance and training duration.
ConvTasNet and DPRNN (with ks=2) might take 5-6 days on a single GPU.


jonashaag commented on August 23, 2024

Thanks a lot!

So if I'm correct, that would be ~73h of data trained for 200 epochs in ~53h, i.e. ~3.5ms of training time per second of training data.

That is very little compared to the 0.5-1s I have seen elsewhere.
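The estimate above can be reproduced with a few lines of arithmetic. This sketch only uses the numbers quoted in this thread (~73h of data, 200 epochs, 2 days and 5 hours of training); the variable names are just illustrative.

```python
# Figures quoted in the thread.
data_hours = 73             # ~73h of training audio per epoch
epochs = 200
train_hours = 2 * 24 + 5    # 2 days and 5 hours = 53h

audio_seconds = data_hours * 3600 * epochs   # total audio seen during training
train_seconds = train_hours * 3600           # wall-clock training time

# Wall-clock milliseconds spent per second of training audio.
ms_per_audio_second = 1000 * train_seconds / audio_seconds
print(f"~{ms_per_audio_second:.1f} ms per second of training data")
```

This gives roughly 3.6ms per second of audio, consistent with the ~3.5ms estimate.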


mpariente commented on August 23, 2024

Yes, we are super fast 🚀
No, I'm joking. I don't know, where have you seen these numbers and with which models?


mpariente commented on August 23, 2024

It seems I answered the question, so I'm closing this.
Feel free to reopen.


mpariente commented on August 23, 2024

Thanks a lot for the insights!
Do you double the batch size when you train on 2s segments?
I'm half surprised about DPRNN ks=4, because dereverberation might need more context. Also, what is your batch size with it? We struggled to obtain convergence with DPRNN ks=2 for separation.


jonashaag commented on August 23, 2024

Hyper params for DPRNN:

ks  4 seg 2 batch  5 chunk 200
ks  4 seg 4 batch  2 chunk 200
ks 16 seg 2 batch 21 chunk 100
ks 16 seg 4 batch 11 chunk 100

I always use the largest possible batch size (found by trial and error).
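One way to automate that kind of trial and error is a binary search over batch sizes. The sketch below is not what was used for the runs above. `fits` is a hypothetical callable that you would implement yourself, for example by running one training step at the given batch size and returning False on an out-of-memory error. It also assumes that if a batch size fits, every smaller one fits too.

```python
def largest_batch_size(fits, upper=1024):
    """Return the largest batch size in [1, upper] for which fits(b) is True.

    Returns 0 if even batch size 1 does not fit. `fits` is a user-supplied
    (hypothetical) check and is assumed to be monotonic: once a batch size
    fails, all larger ones fail as well.
    """
    lo, hi = 0, upper          # lo: largest size known to fit, hi: largest candidate
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if fits(mid):
            lo = mid           # mid fits, look for something larger
        else:
            hi = mid - 1       # mid does not fit, look below it
    return lo


# Toy stand-in for a real memory check: pretend anything up to 21 fits.
best = largest_batch_size(lambda b: b <= 21)
print(best)
```

With the toy check this finds 21 in about ten probes instead of trying every size one by one.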

Well, as said, none of those were trained to convergence and I also don't have an intuition for how far each of them is from convergence. Maybe it also helps that I used 5e-4 LR instead of 1e-3, which I have observed to be a bit too aggressive for Adam with other (unrelated to speech separation) models.

