Comments (8)

popcornell commented on August 23, 2024

@mpariente and I have discussed this since the beginning of Asteroid.
Basically, it is now super flexible, but in my opinion it is also confusing.
To run the full recipe, however, you also have to pass the paths to where the .json data is saved, which the user can modify in run.sh (otherwise the user has to modify paths both in the yaml and in run.sh).

My suggestion is to keep the paths in run.sh and every hyperparameter in the yaml file. Paths are common to multiple runs, so one just specifies them in run.sh and then moves to the yaml to modify hyperparameters for successive runs.

mpariente commented on August 23, 2024

Yes, I agree that this is still confusing and not optimal, and we have discussed it several times.

One of the most important features to have, IMO, is being able to change the hparams of an experiment from the command line, so that you can use a for loop to launch experiments with different hparams without needing to duplicate your conf.yml file.
I personally never duplicate conf files, but we can change the logic to make this simpler.

@jonashaag can you have a look at this PR (#104) and tell me what you think?
I think this is close to what I'd like:

  • There is no duplication between bash and yaml
  • Every yml field can be modified from the command line
  • You could easily pass a config file to start from, I guess

The only thing holding me back from doing that is that it's "complicated" and the code can be confusing.
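
For context, the rough idea is something like this (a minimal sketch, not the actual code from #104; the helper name and the flat-key handling are just illustrative):

import argparse
import yaml

def load_conf_with_overrides(conf_path="conf.yml", cli_args=None):
    # Hypothetical helper, only to illustrate the idea: every leaf of the
    # yaml becomes an argparse option, so any field can be overridden.
    with open(conf_path) as f:
        conf = yaml.safe_load(f)
    parser = argparse.ArgumentParser()
    for section, params in conf.items():
        for key, value in params.items():
            # Naive typing: booleans, None values and duplicate keys across
            # sections would need special handling in real code.
            parser.add_argument(f"--{key}", default=value, type=type(value))
    args = parser.parse_args(cli_args)
    for section, params in conf.items():
        for key in params:
            params[key] = getattr(args, key)
    return conf

# e.g. for lr in 0.001 0.0001; do python train.py --lr $lr; done

With that, a bash for loop over --lr or --batch_size is enough to launch several runs from the same conf.yml.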

jonashaag commented on August 23, 2024

Looks good to me, but I wonder: why roll our own? We could use something like https://github.com/bw2/ConfigArgParse instead (I've never used it, but it looks good).
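
From skimming its README (untested, treat this as a sketch; the option names here are made up), usage would look roughly like:

import configargparse

# Values can come from a config file, environment variables or the command
# line, with the command line taking precedence.
parser = configargparse.ArgumentParser(
    default_config_files=["conf.yml"],
    config_file_parser_class=configargparse.YAMLConfigFileParser,
)
parser.add_argument("-c", "--my-config", is_config_file=True, help="extra config file")
parser.add_argument("--lr", type=float, default=1e-3)
parser.add_argument("--batch_size", type=int, default=8)
options = parser.parse_args()
print(parser.format_values())  # shows where each value came from

That would cover the "every field overridable from the CLI" part without custom code.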

For my use cases, I think the best case would be something along these lines:

  • Have a single "model command runner" Python file that knows how to preprocess, train, eval, and infer for a certain model. So, more or less a mixture of run.sh, train.py and eval.py, with the addition of inference (e.g. directly from .wav files).
  • The runner file reads a model YAML file (like conf.yaml, but without the strictly "localhost" things like data paths). The values read from the YAML file can be changed using CLI args.
  • The runner file also parses "localhost" things like data paths from CLI args. These do not make sense in the model YAML file.
  • Get rid of run.sh entirely: everything is then covered by the Python script, and this way we also do not have to deal with keeping arg parsing consistent between Bash and Python code.
  • (Optional: move dataset preprocessing code to the dataset code itself, since it is currently duplicated for every model/dataset combination.)

mpariente commented on August 23, 2024

I also thought about removing the run.sh file completely and replacing it with an omnipotent Python file.
And yes, in this case, argument parsing gets much simpler.
It's also related to #183, because unit testing Python files would be easier. And we could refactor the evaluation to be part of the lightning module.

Would you be willing to put some time into this, @jonashaag?
There are a lot of things to improve in Asteroid's code, particularly recipe-related things IMO, and your feedback is much appreciated, thanks!

@popcornell, what would you say about going full Python?

mpariente commented on August 23, 2024

Were you thinking about something like this?

python run.py dataprep  # run dataprep, train and eval (default)
python run.py train --someargs  # run train and eval
python run.py eval --otherargs  # run eval only
python run.py predict --files_glob  # perform inference
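
A bare-bones sketch of what such a run.py could look like (just to fix ideas; the stage functions are placeholders, not existing Asteroid code):

import argparse
import yaml

def run_dataprep(conf, args): ...
def run_train(conf, args): ...
def run_eval(conf, args): ...
def run_predict(conf, files_glob): ...

def main():
    parser = argparse.ArgumentParser(description="All-in-one recipe runner (sketch)")
    parser.add_argument("stage", choices=["dataprep", "train", "eval", "predict"])
    parser.add_argument("--conf", default="conf.yml", help="model/hparams yaml")
    # "localhost" things stay on the CLI rather than in the model yaml.
    parser.add_argument("--data_dir", default="data")
    parser.add_argument("--files_glob", default=None, help="wav files to run inference on")
    args = parser.parse_args()

    with open(args.conf) as f:
        conf = yaml.safe_load(f)

    if args.stage == "predict":
        run_predict(conf, args.files_glob)
        return
    # Each stage chains into the following ones, as in the commands above.
    pipeline = ["dataprep", "train", "eval"]
    for stage in pipeline[pipeline.index(args.stage):]:
        {"dataprep": run_dataprep, "train": run_train, "eval": run_eval}[stage](conf, args)

if __name__ == "__main__":
    main()

Per-stage extras (the --someargs above) could then be handled with subparsers or parse_known_args; that's a detail to settle later.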

popcornell commented on August 23, 2024

IDK, for simple things full Python is nice, and one advantage of going full Python is debugging: it's easier, especially with an IDE. But what if we have to check some dependencies in one recipe and pip install them? Sure, there are workarounds, but they are not as immediate as they are in bash. Data download is also not immediate: you always have to use subprocess.
And this comes from someone who is not a bash super fan. I prefer to avoid it when I can, but for some things it is the best tool.

mpariente commented on August 23, 2024

@popcornell is right, parts of the data preparation are a mess in Python. How about this?

  • A centralized bash script for each dataset, registered as a package script (so it is accessible everywhere)
  • A Python wrapper around that script, using subprocess, inside asteroid.data.dataset_name (see the sketch below)
  • The recipe, in Python, just calls asteroid.data.librimix.dataprep in dataprep mode

If we want to rewrite all the dataprep in Python later, we still can.
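
The wrapper part could be as thin as this (a sketch; librimix_dataprep.sh and the dataprep signature are hypothetical, just to show the subprocess call):

# asteroid/data/librimix.py (sketch)
import subprocess

def dataprep(storage_dir, librispeech_dir=None):
    # Call the centralized bash script registered as a package script.
    # The script name and its arguments are hypothetical here.
    cmd = ["librimix_dataprep.sh", str(storage_dir)]
    if librispeech_dir is not None:
        cmd.append(str(librispeech_dir))
    subprocess.run(cmd, check=True)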

jonashaag commented on August 23, 2024

I will try to spend some time understanding the similarities and differences between the models and datasets, and come up with a concept.
