Comments (10)
We cannot use VoxCeleb1 data directly with SincNet. We need to split the samples into chunks, as explained in the paper (at least at the beginning).
The missing chunking is most likely what causes the out-of-memory issue.
@hbredin any way of chunking audio segments already implemented in pyannote?
from similaritylearning.
Not sure what you need.
SpeechSegmentGenerator already yields audio chunks, whose duration is controlled by the duration parameter.
Can you clarify your needs?
Never mind, I misunderstood the duration parameter. I will try changing that on Monday.
Thanks!
@hbredin Please tell me if I can do something to help solve the 3199 issue.
If you want to force it to see what it looks like, you can use the sv-train2 branch and run:
python -W ignore main.py --task speaker --loss softmax --epochs 1500 --no-plot --no-save --batch-size 100 --log-interval 5
The problem occurs around 25% of the way through the first epoch.
I am not sure why this happens.
One way to understand this behavior is to write a small script that iterates forever on SpeechSegmentGenerator and stops as soon as the number of samples is not 3200.
You can temporarily edit SpeechSegmentGenerator so that it also returns the values of sub_segment and files[i].
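A minimal sketch of that kind of script, for illustration only. It does not call the real SpeechSegmentGenerator: `fake_generator` is a hypothetical stand-in that yields 1-D NumPy arrays, and `find_first_bad_chunk` and its `limit` parameter are names made up for this example. The only figures taken from the thread are the expected 3200 samples and the 3199 that shows up instead.

```python
import itertools

import numpy as np

EXPECTED_SAMPLES = 3200  # chunk length expected in this thread


def find_first_bad_chunk(chunks, expected=EXPECTED_SAMPLES, limit=None):
    """Return (index, chunk) for the first chunk whose length is not
    `expected`, or None if the iterable ends (or `limit` is reached) first."""
    iterator = chunks if limit is None else itertools.islice(chunks, limit)
    for index, chunk in enumerate(iterator):
        if np.shape(chunk)[0] != expected:
            return index, chunk
    return None


def fake_generator():
    """Hypothetical stand-in for the real generator: yields chunks forever,
    with one chunk a single sample short."""
    for i in itertools.count():
        length = 3199 if i == 7 else 3200
        yield np.zeros(length, dtype=np.float32)


index, chunk = find_first_bad_chunk(fake_generator(), limit=100)
print(index, chunk.shape)
```

With the real generator, the loop body is the natural place to also log whatever extra values it is edited to return (such as sub_segment and files[i]) for the offending chunk.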
Starting from here, we will be able to investigate what is happening
Got it. I'm switching to STS for the time being, to integrate the model and dataset.
After that I'll start working this out.
Thanks!
Will unblock and use 0-padding for samples with wrong dimensions while we still look for what's causing this problem.
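A rough sketch of what such a zero-padding workaround could look like, assuming each sample is a 1-D NumPy array and the target length is the 3200 samples mentioned earlier in this thread. `pad_to_length` is a hypothetical helper, not code from the repository, and truncating chunks that are too long is an extra assumption of this sketch.

```python
import numpy as np


def pad_to_length(chunk, expected=3200):
    """Right-pad a 1-D chunk with zeros up to `expected` samples.

    Chunks longer than `expected` are truncated (an assumption of this
    sketch; the thread only mentions chunks that are too short).
    """
    chunk = np.asarray(chunk)
    missing = expected - chunk.shape[0]
    if missing <= 0:
        return chunk[:expected]
    # mode="constant" pads with zeros by default
    return np.pad(chunk, (0, missing), mode="constant")


short = np.ones(3199, dtype=np.float32)
fixed = pad_to_length(short)
print(fixed.shape, fixed[-1])
```

Padding on the right keeps the original samples aligned at the start of the chunk, so only the trailing sample is synthetic.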
The validation code is too expensive to run after each epoch, so I will use a separate script to run validation in parallel whenever a model is saved. This will be done as part of another issue: #25
The remaining task for this issue is being able to train a model without validation.
Update: Validation can be done in-training for VoxCeleb1, but it would be useful to parallelize anyway to tackle VoxCeleb2.
Models can be trained using cross entropy, although the results are not very good. Will close this issue and open another one to address that problem.