
Comments (7)

pzelasko commented on August 21, 2024

OK, that is pretty similar to Kaldi. I can work with that. I also recently started to dislike the "configuration layer" and somehow prefer to write just the necessary things in a given script/notebook...

Some suggestions:

  • let's aim to keep all the model definitions in the "library" part of Icefall and specifically avoid local for that -- I feel that "well-tuned" configurations should be more easily re-usable than copy-paste style;
  • instead of models1/models2/modelsN, we could do icefall/models/conformer/v{1,2,3}.py, icefall/models/tdnnf/v{1,2,3}.py, etc. (or even better models/conformer/conformer_{1a,1b,1c}.py), I like that approach more because models1/models2 suggests it's a different, new version of the toolkit (like espnet, espnet2)
  • it'd be best to have the lexicon/lang creation in Python in the "library" part (as per k2-fsa/snowfall#191 which maybe I'll finally find the time for now that I'm back) -- I'd like to be able to re-use phone/BPE/subword/character lexicons and lexicon-related code across recipes, also merge multiple lexicons e.g. for multilingual/multi-accent models (see the sketch below)
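
To make that last point concrete, here is a minimal sketch of the kind of reusable lexicon code meant here; the helper name and the word-to-pronunciations representation are illustrative assumptions, not an existing icefall API:

    from collections import defaultdict
    from typing import Dict, List, Tuple

    # A lexicon modeled as: word -> list of pronunciations, each
    # pronunciation a tuple of phones/BPE pieces/characters.
    # (Illustrative representation, not icefall's actual one.)
    Lexicon = Dict[str, List[Tuple[str, ...]]]

    def merge_lexicons(*lexicons: Lexicon) -> Lexicon:
        """Merge lexicons, e.g. for multilingual/multi-accent models.

        Words shared across lexicons keep all distinct pronunciations.
        """
        merged: Lexicon = defaultdict(list)
        for lexicon in lexicons:
            for word, prons in lexicon.items():
                for pron in prons:
                    if pron not in merged[word]:
                        merged[word].append(pron)
        return dict(merged)

    english = {"data": [("d", "ey", "t", "ah"), ("d", "ae", "t", "ah")]}
    spanish = {"data": [("d", "a", "t", "a")]}
    assert len(merge_lexicons(english, spanish)["data"]) == 3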

Mid/long-term considerations:

  • ideally, every "training" script wouldn't be too long (I think ~200, maybe ~300 loc sounds reasonable, unless we come up with something very involved and non-standard), we should regularly move some common patterns to the "library" part as functions/classes...
  • we should at least consider a common function signature for all model forward methods to allow simpler use via torchscript (or we could add methods like .get_inputs_info() [not necessarily with that name] that help users figure out which inputs to provide, although maybe that defeats the purpose?) -- a rough sketch follows below
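
As a rough illustration of the .get_inputs_info() idea (the method name and return format are hypothetical, as the bullet itself notes):

    import torch
    import torch.nn as nn

    class TinyTdnn(nn.Module):  # toy model, for illustration only
        def __init__(self, num_features: int = 80, num_classes: int = 500):
            super().__init__()
            self.tdnn = nn.Conv1d(num_features, num_classes, kernel_size=3, padding=1)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            # features: (N, C, T)
            return self.tdnn(features)

        @staticmethod
        def get_inputs_info() -> dict:
            """Tell callers what forward() expects, e.g. for deployment code."""
            return {"features": "float32 tensor of shape (N, C, T)"}

    print(TinyTdnn.get_inputs_info())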


danpovey commented on August 21, 2024

We can consider such things later on. We are working on a deadline right now, and I want to treat this as a demo of k2/lhotse for now, so we can explore architectures before really settling on common APIs.


pzelasko commented on August 21, 2024

One more suggestion: let's aim to make the library friendly to use with Jupyter notebooks by making most of the functionality importable (even if it's something used by just one recipe, I guess); notebooks are really great for prototyping and tinkering with these things.


danpovey commented on August 21, 2024

OK, that is pretty similar to Kaldi. I can work with that. I also recently started to dislike the "configuration layer" and somehow prefer to write just the necessary things in a given script/notebook...

Cool!

  • let's aim to keep all the model definitions in the "library" part of Icefall and specifically avoid local for that -- I feel that "well-tuned" configurations should be more easily re-usable than copy-paste style;

I guess I'm OK with that, if we mean things like the conformer being in the "library", but I think it could be a good idea to use a local model.py to "put pieces together". The point is, we may want to experiment with a lot of new things, and I want it to be easy to do.

  • instead of models1/models2/modelsN, we could do icefall/models/conformer/v{1,2,3}.py, icefall/models/tdnnf/v{1,2,3}.py, etc. (or even better models/conformer/conformer_{1a,1b,1c}.py), I like that approach more because models1/models2 suggests it's a different, new version of the toolkit (like espnet, espnet2)

OK.
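
For concreteness, that agreed layout would look something like this (file names illustrative):

    icefall/
      models/
        conformer/
          conformer_1a.py
          conformer_1b.py
        tdnnf/
          tdnnf_1a.py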

  • it'd be best to have the lexicon/lang creation in Python in the "library" part (as per snowfall#191, "Building lexicons in Python", which maybe I'll finally find the time for now that I'm back) -- I'd like to be able to re-use phone/BPE/subword/character lexicons and lexicon-related code across recipes, also merge multiple lexicons e.g. for multilingual/multi-accent models

After seeing a draft I might agree. I'm OK to centralize some code after patterns emerge. But I want any utilities and centralized classes to be fairly simple and easily separable, so that you can understand one piece without understanding the whole.

Mid/long-term considerations:

  • ideally, every "training" script wouldn't be too long (I think ~200, maybe ~300 loc sounds reasonable, unless we come up with something very involved and non-standard), we should regularly move some common patterns to the "library" part as functions/classes...

Sure, in the longer term hopefully they can become much shorter.

  • we should at least consider a common function signature for all model forward methods to allow simpler use via torchscript (or we could add methods like .get_inputs_info() [not necessarily with that name] that help users figure out which inputs to provide, although maybe that defeats the purpose?)

I'm OK with settling on a particular order of tensors, e.g. (N, C, T), not sure if that's what you mean? (I think speechbrain does that?).


pzelasko commented on August 21, 2024

I'm OK with settling on a particular order of tensors, e.g. (N, C, T), not sure if that's what you mean? (I think speechbrain does that?).

My point was about the number of args/kwargs: currently the signature of forward() in our models is inconsistent, so you can't use them as drop-in replacements with inference code. 100% plug-and-play doesn't seem attainable (we may want to keep adding inputs to try out new things), but at the very least we could modify all signatures to accept (*args, **kwargs) so that simpler models can work with extra inputs; a minimal sketch follows below.
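
A minimal sketch of that suggestion, with an illustrative toy model (not icefall's actual code): forward() accepts **kwargs, so a simple model keeps working when inference code passes inputs it does not use:

    import torch
    import torch.nn as nn

    class SimpleEncoder(nn.Module):
        def __init__(self, num_features: int = 80, dim: int = 256):
            super().__init__()
            self.proj = nn.Linear(num_features, dim)

        def forward(self, features: torch.Tensor, **kwargs) -> torch.Tensor:
            # Extra inputs (e.g. supervisions) that richer models need
            # are silently ignored, so this stays a drop-in replacement.
            return self.proj(features)

    x = torch.randn(4, 100, 80)                # (N, T, C)
    y = SimpleEncoder()(x, supervisions=None)  # extra kwarg is tolerated
    print(y.shape)                             # torch.Size([4, 100, 256])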


danpovey commented on August 21, 2024

Mm, do speechbrain or ESPNet do that? I'm cautious about this. Seems to me potentially a recipe for bugs.

pzelasko commented on August 21, 2024

Mm, do speechbrain or ESPNet do that? I'm cautious about this. Seems to me potentially a recipe for bugs.

I don't think they do. My knowledge might be out of date but the last time I checked, none of the existing PyTorch-based ASR frameworks I know of seriously addressed the matter of model deployment.

