
Comments (6)

titu1994 commented on June 6, 2024

Very interesting papers, especially the CIF-T. Will read them after the long weekend.

My thought was pretty much the same: it's very hard to beat RNNT without sacrificing something (speed or quality).

AEDs, despite being slow and having hallucination issues, enable lots of sophisticated capabilities that RNNT alone can probably handle, but with poor results. So IMO it makes sense to trade off some of the benefits of RNNT for those new capabilities.

CIF, to me at least, just falls in the middle of the niche already dominated by CTC and RNNT, that is efficient and highly accurate ASR. AEDs fill another niche - advanced capabilities that aren't efficient with monotonic alignment learning losses.

from nemo.

ArtyomZemlyak commented on June 6, 2024

@titu1994 Thank you very much for your detailed answer!
Yes, I also had doubts about how high-quality Paraformer could actually be.

In our team we already use Fast Conformer (CTC or HAT) (xlarge for Knowledge Distillation and medium for production) and its speed is quite good compared to the regular Conformer, and the quality practically does not drop.

But in search of further improvements, it would be interesting to try alternative architectures. One of them is Paraformer (because it is NAR), and perhaps even just its CIF module.

There are also several interesting papers on CIF that might lead to interesting or better results:

  1. https://arxiv.org/pdf/1905.11235.pdf - the original paper (CIF: CONTINUOUS INTEGRATE-AND-FIRE FOR END-TO-END SPEECH RECOGNITION) (https://github.com/MingLunHan/CIF-PyTorch)
  2. https://github.com/MingLunHan/CIF-ColDec (adds selective context to improve decoding results)
  3. https://github.com/MingLunHan/CIF-HieraDist (transfers knowledge from a PLM to an ASR model, at the linguistic and acoustic levels) (in this paper Branchformer shows a good summary - you can take a look at it)
  4. https://arxiv.org/pdf/2307.14132.pdf (CIF-T: A NOVEL CIF-BASED TRANSDUCER ARCHITECTURE FOR AUTOMATIC SPEECH RECOGNITION) (a good paper with interesting additions) (here, by the way, Paraformer does not show very good results)
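
For readers who have not met CIF: as described in the original paper (item 1), each encoder frame gets a weight, the weights are accumulated, and an integrated embedding is emitted ("fired") whenever the accumulated weight reaches a threshold, with the boundary frame's weight split between the two neighbouring tokens. Below is a minimal pure-Python sketch of that idea only, assuming a threshold of 1.0 and per-frame weights in [0, 1]. The function name `cif` and the toy inputs are illustrative and not taken from any of the repositories above; training-time details such as weight scaling and handling of the leftover tail are left out.

```python
def cif(frames, alphas, threshold=1.0):
    """Toy Continuous Integrate-and-Fire over a single utterance.

    frames: list of feature vectors (lists of floats), one per encoder frame.
    alphas: per-frame weights, assumed to lie in [0, 1].
    Returns the list of fired (integrated) embeddings.
    """
    dim = len(frames[0])
    acc_weight = 0.0           # weight accumulated since the last firing
    acc_state = [0.0] * dim    # weighted sum of frames since the last firing
    fired = []

    for h, a in zip(frames, alphas):
        if acc_weight + a < threshold:
            # Not enough weight yet: keep integrating.
            acc_weight += a
            acc_state = [s + a * x for s, x in zip(acc_state, h)]
        else:
            # Boundary frame: spend just enough weight to reach the threshold...
            used = threshold - acc_weight
            fired.append([s + used * x for s, x in zip(acc_state, h)])
            # ...and carry the remainder over to the next token.
            rest = a - used
            acc_weight = rest
            acc_state = [rest * x for x in h]

    return fired


# Four 1-dimensional frames; the weights sum to 2.0, so two tokens fire.
tokens = cif([[1.0], [2.0], [3.0], [4.0]], [0.5, 0.75, 0.25, 0.5])
print(tokens)  # [[1.5], [3.25]]
```

The number of fired embeddings follows the sum of the weights, which is what lets a non-autoregressive decoder such as Paraformer know the output length up front.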

NeMo support for AED models will come soon* (no release date for the time being).

  • That would be very good!


titu1994 commented on June 6, 2024

I've read this paper before, so some of my comments are below.

  1. MWER is a pain to train with and implement, plus there's no public efficient implementation as far as I can see. I'd rather stick to RNNT loss or even CTC.

  2. Continuous Integrate-and-Fire is a novel concept but has not gained much traction in the 4 years since its paper in 2019. I've personally experimented a bit with it in some branch, and found that while it works fine, its WER is inferior to RNNT. Maybe there are new variants of CIF that surpass RNNT.

  3. NeMo support for AED models will come soon* (no release date for the time being).

  4. The paper's core contribution seems to be RTF at comparable accuracy to AR models. This is very good, but for RTF, Fast Conformer already gets very good results (0.0003x on long audio files ~ 3687 times real time factor). So encoder-level optimizations already surpass this paper with CTC. With RNNT, that RTF is something like 0.009, which is on par with this paper. I'm somewhat confident that RNNT WER would be competitive with this model as well, while also supporting long audio inference and other tasks like speech translation.
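
As a quick sanity check on the figures in point 4: RTF is conventionally processing time divided by audio duration, so the "times real time" speed-up is simply its reciprocal. A small sketch of that conversion, assuming this conventional definition; the function names are illustrative and not part of NeMo.

```python
def speedup_from_rtf(rtf: float) -> float:
    """Convert a real-time factor (processing time / audio time) to an x-real-time speed-up."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf


def rtf_from_speedup(speedup: float) -> float:
    """Inverse conversion: x-real-time speed-up back to a real-time factor."""
    if speedup <= 0:
        raise ValueError("speed-up must be positive")
    return 1.0 / speedup


ctc_speedup = speedup_from_rtf(0.0003)   # rounded CTC figure quoted above
rnnt_speedup = speedup_from_rtf(0.009)   # approximate RNNT figure quoted above
ctc_rtf = rtf_from_speedup(3687)         # the quoted ~3687x, converted back

print(round(ctc_speedup), round(rnnt_speedup), round(ctc_rtf, 5))
```

An RTF of exactly 0.0003 corresponds to roughly 3333x, while 3687x corresponds to an RTF of about 0.00027, so the two quoted CTC numbers agree once 0.0003 is read as a rounded value. The RNNT figure of 0.009 works out to roughly 111x real time.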

So it seems Paraformer would be a middle ground between CTC and RNNT in terms of both accuracy and speed.

Still, I need to reread the paper to see if I'm missing something crucial.

Just a note: these are just my personal comments. The team will need to discuss whether the model will be added to NeMo or not.

If you're up for it, we will gladly welcome contributions to add this model too!


kobenaxie commented on June 6, 2024

What about AIF?


github-actions commented on June 6, 2024

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.


github-actions commented on June 6, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale.
