cornerfarmer / ctc_segmentation
Segment a given audio into utterances using a trained end-to-end ASR model.
Home Page: https://arxiv.org/abs/2007.09127
License: Apache License 2.0
Hello, I am very grateful for your work; the results are really good. After reading the paper four or five times, I am still confused, especially about equations (1) and (2). The method relies on the undecoded path graph obtained from the encoder, so how does an encoder-decoder ASR model trained with both CTC and attention improve segmentation precision?
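For readers with the same question, the general idea of CTC-based alignment can be illustrated with a simplified trellis. The sketch below does not claim to reproduce the paper's equations (1) and (2) exactly. It assumes the input is a T x V matrix of per-frame CTC log-posteriors with blank at index 0. It also assumes each ground-truth token is emitted in exactly one frame, with every other frame explained by blank. The function name `align_tokens` is hypothetical.

```python
import math
from typing import List, Sequence

NEG_INF = -math.inf


def align_tokens(
    log_probs: Sequence[Sequence[float]],
    tokens: Sequence[int],
    blank: int = 0,
) -> List[int]:
    """Return the frame index at which each token is emitted (hypothetical helper).

    log_probs: T x V per-frame CTC log-posteriors (assumed input format).
    tokens:    ground-truth token ids, without blanks.
    """
    num_frames, num_tokens = len(log_probs), len(tokens)
    if num_tokens > num_frames:
        raise ValueError("more tokens than frames: no alignment exists")

    # k[t][j]: best log score after t frames with the first j tokens emitted.
    k = [[NEG_INF] * (num_tokens + 1) for _ in range(num_frames + 1)]
    k[0][0] = 0.0
    for t in range(1, num_frames + 1):
        frame = log_probs[t - 1]
        for j in range(num_tokens + 1):
            stay = k[t - 1][j] + frame[blank]  # frame t-1 explained by blank
            advance = NEG_INF
            if j > 0:  # frame t-1 emits token j-1
                advance = k[t - 1][j - 1] + frame[tokens[j - 1]]
            k[t][j] = max(stay, advance)

    # Backtrack from the cell where all frames and all tokens are consumed.
    emit_frames = [0] * num_tokens
    j = num_tokens
    for t in range(num_frames, 0, -1):
        if j == 0:
            break
        frame = log_probs[t - 1]
        advance = k[t - 1][j - 1] + frame[tokens[j - 1]]
        stay = k[t - 1][j] + frame[blank]
        if advance >= stay:
            emit_frames[j - 1] = t - 1
            j -= 1
    return emit_frames


if __name__ == "__main__":
    # Toy example: vocabulary {0: blank, 1: "a", 2: "b"}, four frames.
    probs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
    log_probs = [[math.log(p) for p in row] for row in probs]
    print(align_tokens(log_probs, [1, 2]))  # [1, 3]
```

In this sketch only the CTC posteriors enter the computation, so the attention decoder plays no role in the alignment step itself. Multiplying the returned frame indices by the model's frame shift converts them to seconds.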
TypeError: type object got multiple values for keyword argument 'gratis_blank'
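This message comes from Python's call machinery. It is raised when the same keyword reaches one call twice, for example once explicitly and once through an unpacked dictionary. When the callee is a class, older Python versions name it "type object". Whether that is the cause in this issue is an assumption. The sketch below shows the pattern and one way to avoid it, using a hypothetical `SegmentationConfig` class rather than the library's own code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentationConfig:
    # Hypothetical stand-in for the real parameter object.
    gratis_blank: bool = False


def build_config_buggy(user_options: dict, gratis_blank: bool = False) -> SegmentationConfig:
    # If user_options already contains "gratis_blank", the keyword is passed
    # twice and Python raises TypeError before the constructor runs.
    return SegmentationConfig(gratis_blank=gratis_blank, **user_options)


def build_config(user_options: dict, gratis_blank: Optional[bool] = None) -> SegmentationConfig:
    # Merge everything into one dict first, so each keyword appears only once.
    options = dict(user_options)
    if gratis_blank is not None:
        options["gratis_blank"] = gratis_blank  # an explicit argument takes precedence
    return SegmentationConfig(**options)


if __name__ == "__main__":
    opts = {"gratis_blank": True}
    try:
        build_config_buggy(opts)
    except TypeError as err:
        print("buggy:", err)
    print("fixed:", build_config(opts))
```

Using `None` as the default lets the fixed version tell "not given" apart from an explicit `False`. A value already present in `user_options` is therefore not silently overwritten.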
This is really interesting; the results look much better than Gentle (which is already a very nice tool).
I am curious: have you also evaluated it in a 'completely unlabelled' context?
From reading the paper, my understanding is that the unlabelled experiment is limited to data where every target utterance still has a central kernel with a reliable transcription. Additional audio/speech data is then prepended and/or appended to these recordings.
Have you looked, or are you looking, into using this as a way to extend a training corpus with, for instance, ASR hypothesis lattices produced for novel input?
I'm thinking of something like this, as a slightly more structured segue into unsupervised or semi-supervised training:
Very impressive work!
The repo doesn't include a license file, although the files you added to the ESPnet repo mention the Apache 2.0 license. Would it be possible to add a license to this project?
Thanks.
Dear author, I want to use evaluate_segments.py to evaluate my output TextGrids and the segmentation quality. Is it suitable for this?
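One simple way to compare output TextGrids against a reference segmentation is the absolute deviation of segment boundaries. This is independent of whatever `evaluate_segments.py` itself computes. The sketch below assumes the segments have already been read from the TextGrids as paired (start, end) tuples in seconds. The function `boundary_errors` and its 0.5 s default tolerance are hypothetical choices for illustration.

```python
from typing import Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def boundary_errors(
    reference: Sequence[Segment],
    hypothesis: Sequence[Segment],
    tolerance: float = 0.5,  # hypothetical threshold in seconds
) -> Tuple[float, float]:
    """Return (mean absolute boundary deviation, fraction of boundaries within tolerance)."""
    if not reference or len(reference) != len(hypothesis):
        raise ValueError("need a non-empty, one-to-one pairing of segments")
    deviations = []
    for (ref_start, ref_end), (hyp_start, hyp_end) in zip(reference, hypothesis):
        deviations.append(abs(ref_start - hyp_start))
        deviations.append(abs(ref_end - hyp_end))
    mean_deviation = sum(deviations) / len(deviations)
    within = sum(d <= tolerance for d in deviations) / len(deviations)
    return mean_deviation, within


if __name__ == "__main__":
    ref = [(0.0, 1.0), (2.0, 3.0)]
    hyp = [(0.25, 1.0), (2.0, 4.0)]
    print(boundary_errors(ref, hyp))  # (0.3125, 0.75)
```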
Thank you very much for your reply.
Hi, I have run the program according to the README. Now I want to use another model from the ESPnet Model Zoo, but "cmvn.ark" is not included in the zip file. How can I get cmvn.ark, and what is it used for? Thanks.
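CMVN stands for cepstral mean and variance normalization. The model expects input features normalized with statistics computed on its training data. Kaldi-style ESPnet recipes typically store those statistics in `cmvn.ark`.

The sketch below assumes the standard Kaldi accumulator layout, a 2 x (D+1) matrix:

- The first row holds per-dimension feature sums, followed by the frame count.
- The second row holds per-dimension sums of squares, followed by 0.

From these, it computes a per-dimension mean and standard deviation and applies them to features. It is written in pure Python. Reading the binary `.ark` file is left out, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def accumulate_stats(features: Sequence[Sequence[float]]) -> Matrix:
    """Build Kaldi-style stats: row 0 = sums + frame count, row 1 = sums of squares + 0."""
    dim = len(features[0])
    sums = [sum(frame[d] for frame in features) for d in range(dim)]
    squares = [sum(frame[d] ** 2 for frame in features) for d in range(dim)]
    return [sums + [float(len(features))], squares + [0.0]]


def stats_to_mean_std(stats: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Derive per-dimension mean and standard deviation from accumulated stats."""
    count = stats[0][-1]
    dim = len(stats[0]) - 1
    mean = [stats[0][d] / count for d in range(dim)]
    var = [stats[1][d] / count - mean[d] ** 2 for d in range(dim)]
    # Floor the variance so constant dimensions do not cause division by zero.
    std = [math.sqrt(max(v, 1e-20)) for v in var]
    return mean, std


def apply_cmvn(features: Sequence[Sequence[float]], mean: Sequence[float],
               std: Sequence[float]) -> Matrix:
    """Normalize every frame to zero mean and unit variance per dimension."""
    return [[(x - m) / s for x, m, s in zip(frame, mean, std)] for frame in features]


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 6.0]]
    mean, std = stats_to_mean_std(accumulate_stats(feats))
    print(apply_cmvn(feats, mean, std))  # [[-1.0, -1.0], [1.0, 1.0]]
```

The normalization has to match what the model saw during training. The statistics therefore generally need to come from the model's own training setup rather than being recomputed on new data.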
Is it possible to get word-based alignments instead of full-utterance alignments?
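Suppose a character-level CTC alignment is already available, with one emission frame per non-space character of the transcript. Word-level boundaries can then be derived by grouping the characters back into words. The sketch below is hypothetical:

- `words_from_char_frames` is not part of the ctc_segmentation API.
- `frame_shift` (seconds per CTC output frame) depends on the model.
- A word is assumed to span from the frame of its first character to the end of the frame of its last character.

```python
from typing import List, Sequence, Tuple


def words_from_char_frames(
    text: str,
    char_frames: Sequence[int],
    frame_shift: float,
) -> List[Tuple[str, float, float]]:
    """Group per-character emission frames into (word, start_s, end_s) segments.

    text:        transcript; words separated by whitespace.
    char_frames: emission frame of each non-space character, in order.
    frame_shift: seconds per CTC output frame (model-dependent assumption).
    """
    words = text.split()
    if sum(len(w) for w in words) != len(char_frames):
        raise ValueError("need exactly one frame index per non-space character")
    segments = []
    pos = 0
    for word in words:
        first = char_frames[pos]
        last = char_frames[pos + len(word) - 1]
        # The word ends at the end of the frame that emitted its last character.
        segments.append((word, first * frame_shift, (last + 1) * frame_shift))
        pos += len(word)
    return segments


if __name__ == "__main__":
    print(words_from_char_frames("hi yo", [0, 2, 5, 6], 0.5))
    # [('hi', 0.0, 1.5), ('yo', 2.5, 3.5)]
```

Another convention would extend each word up to the start of the next one. Which choice is preferable depends on how the boundaries are used downstream.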
I have an issue at lumaku/ctc-segmentation#17. Any suggestions would be helpful. Thanks very much!