cornerfarmer / ctc_segmentation

Segment a given audio into utterances using a trained end-to-end ASR model.

Home Page: https://arxiv.org/abs/2007.09127

License: Apache License 2.0

Python 83.45% Shell 16.55%

ctc_segmentation's People

sknadig abhishekyana entn-at ruohoruotsi dophist robinatp benjamesbabala holianh danchukim

ctc_segmentation's Issues

why did you do in this manner?

Hello, I am very grateful for your work. The results are really good. After reading the paper four or five times, I am still confused, especially by equations (1) and (2). The method relies on the undecoded path graph (the frame-wise CTC output) produced by the encoder. So how does an encoder-decoder speech recognition model with both CTC and attention help the segmentation precision?
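
To make the point about relying only on the encoder's frame-wise CTC output concrete, here is a toy sketch of a max-product (Viterbi-style) alignment over per-frame posteriors. It is not the repository's implementation and does not reproduce the paper's equations: the function name `align_tokens`, the three-symbol vocabulary, and the simplification that every token occupies exactly one frame while all other frames are blank are assumptions made for illustration.

```python
import math


def align_tokens(log_probs, tokens, blank=0):
    """Return, for each token, the frame index at which it is emitted.

    log_probs: list of frames, each a list of log posteriors over the vocabulary.
    tokens:    list of vocabulary indices of the known transcription.
    Simplifying assumption: each token uses exactly one frame, the rest are blank.
    """
    num_frames, num_tokens = len(log_probs), len(tokens)
    neg_inf = float("-inf")
    # score[t][j]: best log score after t frames with the first j tokens emitted
    score = [[neg_inf] * (num_tokens + 1) for _ in range(num_frames + 1)]
    moved = [[False] * (num_tokens + 1) for _ in range(num_frames + 1)]
    score[0][0] = 0.0
    for t in range(1, num_frames + 1):
        frame = log_probs[t - 1]
        for j in range(num_tokens + 1):
            stay = score[t - 1][j] + frame[blank]  # consume a blank frame
            move = neg_inf
            if j > 0:
                move = score[t - 1][j - 1] + frame[tokens[j - 1]]  # emit token j
            if move > stay:
                score[t][j], moved[t][j] = move, True
            else:
                score[t][j] = stay
    if score[num_frames][num_tokens] == neg_inf:
        raise ValueError("more tokens than frames; no alignment exists")
    # Backtrack from the final cell to recover the emission frames.
    frames, j = [], num_tokens
    for t in range(num_frames, 0, -1):
        if moved[t][j]:
            frames.append(t - 1)
            j -= 1
    frames.reverse()
    return frames


# Toy posteriors over the vocabulary [blank, "a", "b"] for five frames.
posteriors = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.80, 0.10, 0.10],
    [0.10, 0.10, 0.80],
    [0.90, 0.05, 0.05],
]
log_posteriors = [[math.log(p) for p in row] for row in posteriors]
emission_frames = align_tokens(log_posteriors, [1, 2])  # align the text "ab"
print(emission_frames)
```

Only the frame-wise posteriors and the known text enter the computation; nothing from an attention decoder is used in this sketch.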

TypeError: type object got multiple values for keyword argument 'gratis_blank'

Evaluations with OOD data?

This is really interesting; the results look much better than those of Gentle (which is already a very nice tool).
I am curious: have you also evaluated it in a 'completely unlabelled' context?

Reading the paper, my understanding is that the unlabelled section is limited to data where every target utterance still has some central kernel that does contain a reliable transcription. These recordings then have additional audio/speech data prepended or appended.

Have you looked, or are you also looking, at using this as a means to extend a training corpus with, for instance, ASR hypothesis lattices produced for novel input?

I'm thinking of something like a slightly more structured segue into unsupervised or semi-supervised training, like this:

https://github.com/ShigekiKarita/espnet-semi-supervised

License

Very impressive work!

The repo doesn't include a license file, although the files you added to the espnet repo mention the Apache 2.0 license. Would it be possible to add a license to this project?

Thanks.

questions

where is the exp/tedlium2_rnn/cmvn.ark from?

word-based alignments?

how it works when i use my own CTC probabilities and char_list?
