Comments (16)

groadabike avatar groadabike commented on August 23, 2024 1

@faroit

@groadabike great to hear that convtasnet works. Are you getting to results similar to demucs?

Not really; at the moment performance is quite poor.
As soon as I have a decently trained model, I will send it to you.

from asteroid.

mpariente avatar mpariente commented on August 23, 2024

Wow that would be great! We'd be really glad to have your contributions obviously. Music separation was on the list but we didn't get to it yet.

A nice hello from the sigsep gang,

This looks like a very nice and ambitious approach. Would love to contribute here. Would you be interested in adding music separation things such as

  • The musdb dataset

Yes please !

  • Music specific augmentations

Yes please ! Can you link to a paper or an existing implementation? What do you have in mind for it? Implementation in numpy or torch?

  • multichannel support throughout the package

Yes please ! Again, what do you have in mind exactly? We are very interested.
The docs haven't been updated yet, but you might be interested in #42: our STFT now supports an arbitrary number of channels; the only convention is that time is last (as in nn.Conv1d).
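
For concreteness, here is a minimal sketch of the time-last convention (shapes and names are illustrative, not Asteroid's actual API):

```python
import numpy as np

# Tensors follow a (batch, channels, time) layout, as with torch.nn.Conv1d,
# so time is always the last axis (sizes below are made up).
batch, channels, time = 4, 2, 16000  # e.g. stereo, 1 s at 16 kHz
x = np.random.randn(batch, channels, time)

# A frame-wise transform then only touches the last axis, so mono
# (channels=1) and multichannel inputs share the same code path.
frames = x.reshape(batch, channels, -1, 100)  # 100-sample frames along time
print(frames.shape)  # (4, 2, 160, 100)
```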

  • Open-unmix model

Sure, it would be great !

Feel free to join the slack, it will make some things easier.

I'm going to moderately refactor some things over the next few days; maybe I can ping you when it's ready.
Thanks again for your message

mpariente avatar mpariente commented on August 23, 2024

@faroit Released 0.1.0 yesterday, it should be ready to use now! 🚀

faroit avatar faroit commented on August 23, 2024

That's great. I will check it out soon. In the meantime, I wanted to ask whether you would be interested in working on a new evaluation tool that should serve the whole community and replace mir_eval.separation and museval. I can add you to the discussion if you are interested.

groadabike avatar groadabike commented on August 23, 2024

Hi @faroit, I am testing a model (ConvTasNet) using the current version of the MUSDB18 dataset and it works great.
I just made small modifications for my specific problem:

  • Transform to mono
  • Work at 16 kHz
  • Target = vocals.
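
The first two modifications can be sketched like this (a hedged example using scipy, not the dataset's actual code):

```python
import numpy as np
from scipy.signal import resample_poly

sr_in, sr_out = 44100, 16000
stereo = np.random.randn(2, sr_in)  # 1 s of stereo audio at 44.1 kHz

mono = stereo.mean(axis=0)          # downmix to mono
# 16000 / 44100 reduces to 160 / 441
mono_16k = resample_poly(mono, 160, 441)
print(mono_16k.shape)  # (16000,)
```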

Can I ask you why the mixture is the addition of all sources instead of using the mixture.wav from each directory?
What can I expect in the second draft of MUSDB18 dataset?

Thank you

mpariente avatar mpariente commented on August 23, 2024

Please correct me if I'm wrong.

Can I ask you why the mixture is the addition of all sources instead of using the mixture.wav from each directory?

If you pass only ['bass', 'drums'] you'll have a mixture that only contains bass and drums, and separation still makes sense in this way.
If you load the mixture, you'll always have all the sources.
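
In other words (an illustrative numpy sketch, not the dataset code itself):

```python
import numpy as np

# Made-up stems; each entry stands in for one stem wav of a track.
stems = {s: np.random.randn(44100) for s in ("bass", "drums", "vocals", "other")}

targets = ["bass", "drums"]
sources = np.stack([stems[t] for t in targets])  # (n_src, time)
mixture = sources.sum(axis=0)  # contains only the requested sources

assert np.allclose(mixture, stems["bass"] + stems["drums"])
```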

What can I expect in the second draft of MUSDB18 dataset?

I think he meant to add the directions to download and convert MUSDB18 to wav, right?

Now that I have you here, @groadabike, would you like to share your pre-trained model?

groadabike avatar groadabike commented on August 23, 2024

If you pass only ['bass', 'drums'] you'll have a mixture that only contains bass and drums, and separation still makes sense in this way.
If you load the mixture, you'll always have all the sources.

You are right about this. But, currently, all sources become targets, so I had to modify the dataset to get just the vocals and background as targets.

I think he meant to add the directions to download and convert MUSDB18 to wav, right?

Do you mean to have the ability to manage both MUSDB18 and MUSDB18HQ?
I also asked because, in the current MUSDB18 dataset, the sources are returned in a dictionary instead of a vstacked np array.

Now that I have you here, @groadabike, would you like to share your pre-trained model?

Yes, sure!!
Let me fix some errors first. There is no novelty here, just adaptations.
How can I share the model with you?

mpariente avatar mpariente commented on August 23, 2024

Oh great, thank you!

The idea was to return a dict here because the sources are not of the same nature (as opposed to speech/speech separation). You can still use the default Dataset and make vocals and background your target when you receive the dict.
I think that was the intended usage.
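
That intended usage might look like this (a hedged sketch; names are illustrative):

```python
import numpy as np

# The dataset returns a dict of stems; derive a vocals/background pair.
stems = {s: np.random.randn(44100) for s in ("vocals", "bass", "drums", "other")}

vocals = stems["vocals"]
background = sum(v for k, v in stems.items() if k != "vocals")
targets = np.stack([vocals, background])  # (2, time) training target
mixture = vocals + background
```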

Regarding sharing the model (thanks), did you use the plain ConvTasNet importable from asteroid or did you change things there?

faroit avatar faroit commented on August 23, 2024

@groadabike great to hear that convtasnet works. Are you getting to results similar to demucs?

groadabike avatar groadabike commented on August 23, 2024

Regarding sharing the model (thanks), did you use the plain ConvTasNet importable from asteroid or did you change things there?

Yes, I used the plain ConvTasNet model. I am re-running it now; when it finishes, I can tell you how it went.

mpariente avatar mpariente commented on August 23, 2024

Which loss do you use?
Did you check the demucs implementation?

groadabike avatar groadabike commented on August 23, 2024

Which loss do you use?

That was my first error: I hadn't changed it to the L1 loss.
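
For reference, the time-domain L1 loss in question is just the mean absolute error between waveforms (a sketch, not Asteroid's loss API):

```python
import numpy as np

def l1_loss(estimate, reference):
    # Mean absolute error between estimated and reference waveforms.
    return np.mean(np.abs(estimate - reference))

ref = np.array([0.0, 1.0, -1.0, 0.5])
est = np.array([0.0, 0.5, -0.5, 0.5])
print(l1_loss(est, ref))  # 0.25
```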

Did you check the demucs implementation?

Yes, I did.
Just to confirm that I am not mixing up terminology or making another silly mistake:
are these parameters correct?

filterbank:            # Conv-TasNet symbols from Table 1
  n_filters: 256       # N
  kernel_size: 20      # L
  stride: 10           # L/2
masknet:
  n_blocks: 8          # X
  n_repeats: 4         # R
  mask_act: relu       # mask_nonlinear
  conv_kernel_size: 3  # P
  bn_chan: 256         # B
  skip_chan: 256       # Sc
  hid_chan: 512        # H

In terms of augmentation, will there be a standard way to do it in Asteroid, or should I implement it for my specific case? If I am trying to replicate the Demucs results with Asteroid, I should implement, as the Demucs paper describes, shuffling sources within one batch to generate new mixes and randomly swapping channels.
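
The batch-shuffling augmentation described above can be sketched like this (illustrative numpy code, not an Asteroid API):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_src, time = 4, 2, 8
sources = rng.standard_normal((batch, n_src, time))

# Permute each source independently across the batch, then rebuild the
# mixtures; each new mix pairs stems from different original tracks.
shuffled = np.stack(
    [sources[rng.permutation(batch), s] for s in range(n_src)], axis=1
)
new_mix = shuffled.sum(axis=1)  # (batch, time) freshly remixed examples
```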

mpariente avatar mpariente commented on August 23, 2024

These parameters are correct, yes.

There is no default way to do this in Asteroid yet; we've been thinking about it with @faroit, but it is not settled yet.
You can check this script on MUSDB, or the dynamic mixing on WHAM.

YadongChen-1016 avatar YadongChen-1016 commented on August 23, 2024

Hi, I've recently been working on audio source separation tasks using the Conv-TasNet model. However, I have no way to obtain the MUSDB18HQ dataset; I would be very grateful if you could provide it to me.

faroit avatar faroit commented on August 23, 2024

@YadongChen-1016 request here: https://zenodo.org/record/3338373

YadongChen-1016 avatar YadongChen-1016 commented on August 23, 2024

Thanks!
