[ASR] Add Paraformer model (nemo issue, CLOSED)

ArtyomZemlyak commented on June 6, 2024

[ASR] Add Paraformer model

Comments (6)

titu1994 commented on June 6, 2024

Very interesting papers, especially the CIF-T. Will read them after the long weekend.

My thought was pretty much the same: it is very hard to beat RNNT without sacrificing something (speed or quality).

AEDs, despite being slow and having hallucination issues, enable a lot of sophisticated capabilities that RNNT alone could probably handle, but with poor results. So IMO it makes sense to trade off some of the benefits of RNNT for those new capabilities.

CIF, to me at least, just falls in the middle of the niche already dominated by CTC and RNNT, that is, efficient and highly accurate ASR. AEDs fill another niche: advanced capabilities that aren't efficient with monotonic alignment learning losses.

ArtyomZemlyak commented on June 6, 2024

@titu1994 Thank you very much for your detailed answer!
Yes, I also had doubts about how high-quality Paraformer could actually be.

In our team we already use Fast Conformer (CTC or HAT) (xlarge for Knowledge Distillation and medium for production) and its speed is quite good compared to the regular Conformer, and the quality practically does not drop.

But in search of improvements, it would be interesting to try alternative architectures. One of them is Paraformer (because it is non-autoregressive, NAR), and probably its CIF component as well.

There are also several interesting articles on CIF that might lead to interesting or better results:

https://arxiv.org/pdf/1905.11235.pdf - standard article (CIF: CONTINUOUS INTEGRATE-AND-FIRE FOR END-TO-END SPEECH RECOGNITION) (https://github.com/MingLunHan/CIF-PyTorch)
https://github.com/MingLunHan/CIF-ColDec (add selective context to improve decoding result)
https://github.com/MingLunHan/CIF-HieraDist (transfer of knowledge from a PLM model to an ASR model, at the linguistic and acoustic level) (in this article Branchformer shows a good summary - you can take a look at it)
https://arxiv.org/pdf/2307.14132.pdf (CIF-T: A NOVEL CIF-BASED TRANSDUCER ARCHITECTURE FOR AUTOMATIC SPEECH RECOGNITION) (good article with interesting additions) (here, by the way, Paraformer does not show very good results)
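
For anyone not familiar with CIF, here is a rough sketch of the integrate-and-fire step as I understand it from the first paper above. It is a simplified illustration, not the code from any of the linked repos: the function name `cif`, the plain-list representation, and the assumption that every weight is smaller than the threshold (so a frame fires at most once) are mine. Training-time weight scaling and the handling of the leftover tail are left out.

```python
from typing import List


def cif(frames: List[List[float]], alphas: List[float],
        threshold: float = 1.0) -> List[List[float]]:
    """Integrate weighted frames; fire one vector each time the
    accumulated weight reaches `threshold`.

    Assumption (illustrative): 0 <= alpha < threshold for every frame,
    so a single frame can trigger at most one firing.
    """
    dim = len(frames[0]) if frames else 0
    fired: List[List[float]] = []
    acc = 0.0                 # accumulated weight since the last firing
    state = [0.0] * dim       # weighted sum of frames since the last firing

    for frame, alpha in zip(frames, alphas):
        if acc + alpha < threshold:
            # Not enough weight yet: keep integrating.
            acc += alpha
            state = [s + alpha * x for s, x in zip(state, frame)]
        else:
            # Boundary inside this frame: spend only the weight needed
            # to reach the threshold, then fire.
            used = threshold - acc
            fired.append([s + used * x for s, x in zip(state, frame)])
            # The rest of this frame's weight starts the next token.
            acc = alpha - used
            state = [acc * x for x in frame]

    # Leftover weight below the threshold is simply dropped in this sketch.
    return fired


if __name__ == "__main__":
    frames = [[1.0], [2.0], [3.0], [4.0]]
    alphas = [0.5, 0.75, 0.5, 0.25]   # sums to 2.0, so two firings
    print(cif(frames, alphas))
```

The number of fired vectors follows the sum of the weights, which is why the output length can be predicted in one pass without an autoregressive decoder.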

"NeMo support for AED models will come soon* (no release date for the time being)."

That would be very good!

titu1994 commented on June 6, 2024

I've read this paper before, so some of my comments are below.

MWER is a pain to train with and to implement, plus there's no efficient public implementation as far as I can see. I'd rather stick to RNNT loss or even CTC.
Continuous Integrate-and-Fire is a novel concept but has not gained much traction in the 4 years since its paper in 2019. I've personally experimented a bit with it in some branch and found that, while it works fine, its WER is inferior to RNNT. Maybe there are new variants of CIF that surpass RNNT.
NeMo support for AED models will come soon* (no release date for the time being).
The paper's core contribution seems to be RTF at accuracy comparable to AR models. This is very good, but for RTF, Fast Conformer already does very well (about 0.0003 on long audio files, roughly 3687 times faster than real time). So encoder-level optimizations already surpass this paper with CTC. With RNNT, the RTF is something like 0.009, which is on par with this paper. I'm somewhat confident that RNNT WER would be competitive with this model as well, while also supporting long audio inference and other tasks like speech translation.
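
To make the RTF figures above easier to compare, here is a small sketch of the conversion, assuming the usual definition RTF = processing time / audio duration (so the speed-up over real time is 1 / RTF). The helper names are illustrative, not part of NeMo.

```python
def speedup_from_rtf(rtf: float) -> float:
    """How many times faster than real time a system with this RTF runs."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf


def rtf_from_speedup(speedup: float) -> float:
    """Inverse conversion: real-time factor for a given speed-up."""
    if speedup <= 0:
        raise ValueError("speed-up must be positive")
    return 1.0 / speedup


def processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated compute time for a given amount of audio."""
    return audio_seconds * rtf


if __name__ == "__main__":
    # Figures quoted in the comment above.
    print(speedup_from_rtf(0.009))           # RNNT figure
    print(rtf_from_speedup(3687))            # Fast Conformer CTC figure
    print(processing_seconds(3600, 0.009))   # one hour of audio at RTF 0.009
```

With these numbers, an RTF of 0.009 is roughly 111 times faster than real time, and a 3687x speed-up corresponds to an RTF of about 0.00027, which rounds to the 0.0003 quoted above.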

So it seems Paraformer would be a middle ground between CTC and RNNT in terms of both accuracy and speed.

Still, I need to reread the paper to see if I'm missing something crucial.

Just a note: these are just my personal comments; the team will need to discuss whether the model will be added to NeMo or not.

If you're up for it, we will gladly welcome contributions to add this model too!

kobenaxie commented on June 6, 2024

What about AIF?

github-actions commented on June 6, 2024

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions commented on June 6, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale.
