Use SpeakerModel and VoxCeleb1 to train a first speaker verification model. Cross


Train First Speaker Verification Model about similaritylearning HOT 10 CLOSED

juanmc2005 commented on June 22, 2024

Train First Speaker Verification Model

from similaritylearning.

Comments (10)

juanmc2005 commented on June 22, 2024

We cannot use VoxCeleb1 data directly with SincNet. We need to split the samples into chunks, as explained in the paper (at least at the beginning).
The missing chunking is most likely what is causing an out-of-memory issue.
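To make the idea concrete, here is a minimal sketch of fixed-duration chunking with NumPy. It is not the pyannote or SincNet implementation; the function name split_into_chunks, the 16 kHz sample rate and the 200 ms duration are all illustrative assumptions.

```python
import numpy as np


def split_into_chunks(waveform, sample_rate, duration):
    """Split a 1-D waveform into consecutive chunks of `duration` seconds.

    Hypothetical helper: the trailing remainder shorter than one chunk is dropped.
    """
    chunk_size = int(round(duration * sample_rate))
    if chunk_size <= 0:
        raise ValueError("duration must cover at least one sample")
    n_chunks = len(waveform) // chunk_size
    # Trim the remainder, then reshape to (n_chunks, chunk_size)
    return waveform[: n_chunks * chunk_size].reshape(n_chunks, chunk_size)


# Assumed values: 1 second of silence at 16 kHz, cut into 200 ms chunks
audio = np.zeros(16000, dtype=np.float32)
chunks = split_into_chunks(audio, sample_rate=16000, duration=0.2)
print(chunks.shape)  # (5, 3200)
```

Feeding chunks of a fixed, small size bounds the memory needed per sample, which is why full-length utterances are the likely culprit here.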

@hbredin is there any way of chunking audio segments already implemented in pyannote?


hbredin commented on June 22, 2024

Not sure what you need.

SpeechSegmentGenerator already yields audio chunks, whose duration is controlled by the duration parameter.

Can you clarify your needs?


juanmc2005 commented on June 22, 2024

Never mind, I misunderstood the duration parameter. I will try changing that on Monday.
Thanks!


juanmc2005 commented on June 22, 2024

@hbredin Please tell me if I can do something to help solve the 3199 issue.
If you want to force it to see what it looks like, you can use the sv-train2 branch and run:
python -W ignore main.py --task speaker --loss softmax --epochs 1500 --no-plot --no-save --batch-size 100 --log-interval 5
The problem occurs about 25% of the way through the first epoch.


hbredin commented on June 22, 2024

I am not sure why this happens.

One way to understand this behavior is to write a simple script that iterates forever on SpeechSegmentGenerator and stops as soon as the number of samples is not 3200.

You can edit SpeechSegmentGenerator temporarily so that it also returns the value of sub_segment and files[i].
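A rough sketch of such a debugging loop is shown below. It does not call the real SpeechSegmentGenerator: fake_generator is a hypothetical stand-in that yields plain lists, and first_bad_chunk is an assumed helper name. With the real generator, the loop would also print sub_segment and files[i] once they are returned.

```python
import itertools


def first_bad_chunk(chunks, expected_samples=3200):
    """Return (index, length) of the first chunk whose length differs from
    `expected_samples`, or None if the iterable ends without one."""
    for index, chunk in enumerate(chunks):
        if len(chunk) != expected_samples:
            return index, len(chunk)
    return None


def fake_generator():
    """Hypothetical stand-in for the real generator: yields chunks forever,
    all of length 3200 except the fourth one, which has 3199 samples."""
    for i in itertools.count():
        yield [0.0] * (3199 if i == 3 else 3200)


print(first_bad_chunk(fake_generator()))  # (3, 3199)
```

Because the loop returns at the first mismatch, it terminates even on an endless generator, provided a bad chunk eventually appears.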


hbredin commented on June 22, 2024

Starting from there, we will be able to investigate what is happening.


juanmc2005 commented on June 22, 2024

Got it. I'm switching to STS for the time being, to integrate the model and dataset.
After that I'll start working this out.
Thanks!


juanmc2005 commented on June 22, 2024

I will unblock this by zero-padding samples that have the wrong dimensions while we keep looking for what is causing the problem.
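As an illustration of this workaround, a minimal zero-padding sketch with NumPy follows. The helper name pad_or_trim and the expected length of 3200 are assumptions, and trimming chunks that are too long is an addition of this sketch, not something stated in the thread.

```python
import numpy as np


def pad_or_trim(chunk, expected_samples=3200):
    """Return `chunk` with exactly `expected_samples` samples.

    Short chunks are right-padded with zeros; long chunks are truncated
    (the truncation is an assumption of this sketch).
    """
    chunk = np.asarray(chunk)
    if len(chunk) >= expected_samples:
        return chunk[:expected_samples]
    padded = np.zeros(expected_samples, dtype=chunk.dtype)
    padded[: len(chunk)] = chunk
    return padded


short = np.ones(3199, dtype=np.float32)
fixed = pad_or_trim(short)
print(fixed.shape)  # (3200,)
```

Padding a single missing sample should have a negligible effect on training, but it hides the underlying bug rather than fixing it.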


juanmc2005 commented on June 22, 2024

The validation code is too expensive to run after each epoch, so I will use a separate script to run validations in parallel whenever a model is saved. This will be done as part of another issue: #25
The remaining task for this issue is being able to train a model without validation.
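One possible shape for that separate script is a loop that polls the checkpoint directory and validates anything new. The sketch below only covers the "find new checkpoints" step; the helper name new_checkpoints, the *.pt pattern and the file names are hypothetical and not taken from the repository.

```python
import tempfile
from pathlib import Path


def new_checkpoints(directory, already_validated, pattern="*.pt"):
    """Return checkpoint paths in `directory` whose file names are not in
    `already_validated`, sorted by name."""
    candidates = sorted(Path(directory).glob(pattern), key=lambda p: p.name)
    return [p for p in candidates if p.name not in already_validated]


# Assumed layout: one checkpoint file per epoch in a single directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("epoch1.pt", "epoch2.pt"):
        (Path(tmp) / name).touch()
    pending = new_checkpoints(tmp, already_validated={"epoch1.pt"})
    pending_names = [p.name for p in pending]

print(pending_names)  # ['epoch2.pt']
```

A validation process could call this helper periodically, evaluate each returned checkpoint, and add its name to the validated set, which keeps the training loop free of validation cost.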


juanmc2005 commented on June 22, 2024

Update: Validation can be done during training for VoxCeleb1, but it would still be useful to parallelize it in order to tackle VoxCeleb2.
Models can be trained using cross entropy, although the results are not very good. I will close this issue and open another one to address that problem.

