This is really interesting, the results look much better than gentle [which is already

Evaluations with OOD data? (ctc_segmentation, closed)

AdolfVonKleist commented on August 16, 2024


Comments (3)

cornerfarmer commented on August 16, 2024

The dynamic program, as described in the paper and implemented in this repo, is not designed to handle unknown audio in the middle of the text. The alignment algorithm will therefore probably stretch the words of the transcript before and after the unknown part across it. In the end, most of the audio will be aligned correctly, except for the part around the unknown segment. However, one can easily detect such incorrect segments by looking at the confidence score provided by the network.
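As a rough illustration of that confidence check, here is a minimal sketch. It assumes each aligned segment is available as a `(start_sec, end_sec, score)` tuple whose score is in log space (higher is better). The function name `flag_low_confidence` and the default threshold of `-1.5` are hypothetical choices for this example, not values taken from the repo, and the threshold would need tuning per model.

```python
from typing import List, Tuple

# (start_sec, end_sec, score); the score is assumed to be a log-space confidence.
Segment = Tuple[float, float, float]


def flag_low_confidence(segments: List[Segment], threshold: float = -1.5) -> List[int]:
    """Return the indices of segments whose score falls below the threshold."""
    return [i for i, (_, _, score) in enumerate(segments) if score < threshold]


# Hypothetical alignment result: the middle segment was stretched over unknown audio.
example_segments = [(0.0, 1.0, -0.2), (1.0, 9.0, -4.7), (9.0, 10.0, -0.3)]
suspicious = flag_low_confidence(example_segments)
```

Segments flagged this way could then be dropped or re-checked by hand before the data is used for training.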

Additionally, it should be possible to extend the dynamic program to support unknown segments in the middle of the audio. One could, for example, allow the algorithm to skip parts of the text or the audio if this leads to a higher average probability across the whole alignment. However, this needs to be carefully designed; otherwise the algorithm might just skip the whole audio.
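To make the skipping idea concrete, here is a toy sketch. It is not the CTC segmentation algorithm from the paper or the repo. It is a simplified Viterbi-style alignment over a hypothetical `log_probs[t][j]` table (frame `t`, token `j`) that only allows skipping audio frames, not text, and it maximizes the total score with a fixed per-frame `skip_penalty` rather than the average probability. The function name `align_with_skips` is made up for this example.

```python
import math
from typing import List


def align_with_skips(log_probs: List[List[float]], skip_penalty: float) -> List[int]:
    """Label each frame with a token index, or -1 if the frame is skipped.

    Tokens are used in order, and every token must receive at least one frame.
    """
    n_frames, n_tokens = len(log_probs), len(log_probs[0])
    neg_inf = -math.inf
    # score[k]: best total so far with the first k tokens started (k = 0: none yet).
    score = [0.0] + [neg_inf] * n_tokens
    back = []  # back[t][k] = (previous k, label chosen for frame t)
    for t in range(n_frames):
        new_score = [neg_inf] * (n_tokens + 1)
        pointers = [(0, -1)] * (n_tokens + 1)
        for k in range(n_tokens + 1):
            # Skip this frame and stay in state k.
            candidates = [(score[k] + skip_penalty, k, -1)]
            if k >= 1:
                emit = log_probs[t][k - 1]
                candidates.append((score[k] + emit, k, k - 1))          # stay on token k-1
                candidates.append((score[k - 1] + emit, k - 1, k - 1))  # advance to token k-1
            best = max(candidates, key=lambda c: c[0])
            new_score[k] = best[0]
            pointers[k] = (best[1], best[2])
        score = new_score
        back.append(pointers)
    if score[n_tokens] == neg_inf:
        raise ValueError("not enough frames to emit every token")
    # Backtrack from the state in which all tokens have been emitted.
    labels, k = [], n_tokens
    for t in range(n_frames - 1, -1, -1):
        k, label = back[t][k]
        labels.append(label)
    return labels[::-1]


# Two tokens, five frames; frame 2 is unknown audio that matches neither token.
toy = [[-0.1, -5.0], [-0.1, -5.0], [-6.0, -6.0], [-5.0, -0.1], [-5.0, -0.1]]
labels = align_with_skips(toy, skip_penalty=-2.0)
```

The penalty is the delicate part. In this toy setup, a `skip_penalty` of `0.0` makes it optimal to skip every frame except one per token, which is the failure mode described above.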


cornerfarmer commented on August 16, 2024

Thanks for being interested in our work!

Our main motivation for this tool was to align publicly available data in an utterance-wise fashion, so we can use it for supervised ASR training.
Usually, data from e.g. librivox.de consists of long audio files (~1h) together with one long transcript without any alignments.
What makes the automatic alignment particularly challenging is that the speaker often introduces himself/herself and the book at the beginning and the end of every audio file.
As these parts are not contained in the transcript, many forced alignment algorithms fail.
In our evaluation we tried to simulate such situations by prepending/appending audio to our test data.
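A minimal sketch of how such a test case could be constructed is shown below. It assumes audio is held as a plain list of samples and that reference segments are `(start_sec, end_sec)` pairs. The helper name `pad_with_unrelated_audio` is hypothetical and is not part of the evaluation code.

```python
from typing import List, Tuple


def pad_with_unrelated_audio(
    samples: List[float],
    intro: List[float],
    outro: List[float],
    segments: List[Tuple[float, float]],
    sample_rate: int,
) -> Tuple[List[float], List[Tuple[float, float]]]:
    """Surround the audio with untranscribed material and shift the reference times."""
    padded = intro + samples + outro
    offset = len(intro) / sample_rate  # seconds of audio added before the transcript starts
    shifted = [(start + offset, end + offset) for start, end in segments]
    return padded, shifted


# One second of transcribed audio, 0.5 s of intro and 0.25 s of outro at 16 kHz.
padded, shifted = pad_with_unrelated_audio(
    samples=[0.0] * 16000,
    intro=[0.1] * 8000,
    outro=[0.2] * 4000,
    segments=[(0.0, 0.5), (0.5, 1.0)],
    sample_rate=16000,
)
```

The shifted reference times can then be compared with the times the aligner reports for the padded file.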

We have therefore not looked into using this technique for completely unlabelled data, and I am not sure how well it would work, but it sounds like a good idea and might be promising for future work.


AdolfVonKleist commented on August 16, 2024

Thanks, that makes sense and was more or less what I expected. What would be your expectation regarding utterance-internal misalignments in that case? For example, did you look at stitching any of these 'incorrect' segments directly into the middle of the utterances? Do you have any reason to think it would be worse or better than the performance you saw here with appended/prepended segments? Thanks again for sharing this work and the great implementation.

