Comments (16)

groadabike avatar groadabike commented on August 23, 2024 1

@faroit

@groadabike great to hear that convtasnet works. Are you getting to results similar to demucs?

Not really; at the moment performance is quite poor.
As soon as I have a decently trained model, I will send it to you.

from asteroid.

mpariente avatar mpariente commented on August 23, 2024

Wow that would be great! We'd be really glad to have your contributions obviously. Music separation was on the list but we didn't get to it yet.

A nice hello from the sigsep gang,

This looks like a very nice and ambitious approach. Would love to contribute here. Would you be interested in adding music separation things such as

  • The musdb dataset

Yes please !

  • Music specific augmentations

Yes please ! Can you link to a paper or an existing implementation? What do you have in mind for it? Implementation in numpy or torch?

  • multichannel support throughout the package

Yes please ! Again, what do you have in mind exactly? We are very interested.
The docs haven't been updated yet, but you might be interested in #42: our STFT now supports an arbitrary number of channels; the only convention is that time is last (as in nn.Conv1d).
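
For concreteness, here is a minimal sketch of the time-last convention (shapes and names are illustrative, not Asteroid's actual API):

```python
import numpy as np

# Tensors follow a (batch, channels, time) layout, as with torch.nn.Conv1d,
# so time is always the last axis (sizes below are made up).
batch, channels, time = 4, 2, 16000  # e.g. stereo, 1 s at 16 kHz
x = np.random.randn(batch, channels, time)

# A frame-wise transform then only touches the last axis, so mono
# (channels=1) and multichannel inputs share the same code path.
frames = x.reshape(batch, channels, -1, 100)  # 100-sample frames along time
print(frames.shape)  # (4, 2, 160, 100)
```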

  • Open-unmix model

Sure, it would be great !

Feel free to join the slack, it will make some things easier.

I'm going to moderately refactor some things over the next few days; maybe I can ping you when it's ready.
Thanks again for your message

mpariente avatar mpariente commented on August 23, 2024

@faroit Released 0.1.0 yesterday, it should be ready to use now! 🚀

faroit avatar faroit commented on August 23, 2024

That's great. I will check it out soon. In the meantime, I wanted to ask whether you would be interested in working on a new evaluation tool that should serve the whole community and replace mir_eval.separation and museval. I can add you to the discussion if you are interested.

groadabike avatar groadabike commented on August 23, 2024

Hi @faroit, I am testing a model (ConvTasNet) using the current version of the MUSDB18 dataset and it works great.
I just made small modifications for my specific problem:

  • Transform to mono
  • Work at 16 kHz
  • Target = vocals.
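
The first two modifications can be sketched like this (a hedged example using scipy, not the dataset's actual code):

```python
import numpy as np
from scipy.signal import resample_poly

sr_in, sr_out = 44100, 16000
stereo = np.random.randn(2, sr_in)  # 1 s of stereo audio at 44.1 kHz

mono = stereo.mean(axis=0)          # downmix to mono
# 16000 / 44100 reduces to 160 / 441
mono_16k = resample_poly(mono, 160, 441)
print(mono_16k.shape)  # (16000,)
```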

Can I ask you why the mixture is the addition of all sources instead of using the mixture.wav from each directory?
What can I expect in the second draft of MUSDB18 dataset?

Thank you

mpariente avatar mpariente commented on August 23, 2024

Please correct me if I'm wrong.

Can I ask you why the mixture is the addition of all sources instead of using the mixture.wav from each directory?

If you pass only ['bass', 'drums'] you'll have a mixture that only contains bass and drums, and separation still makes sense in this way.
If you load the mixture, you'll always have all the sources.
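
In other words (an illustrative numpy sketch, not the dataset code itself):

```python
import numpy as np

# Made-up stems; each entry stands in for one stem wav of a track.
stems = {s: np.random.randn(44100) for s in ("bass", "drums", "vocals", "other")}

targets = ["bass", "drums"]
sources = np.stack([stems[t] for t in targets])  # (n_src, time)
mixture = sources.sum(axis=0)  # contains only the requested sources

assert np.allclose(mixture, stems["bass"] + stems["drums"])
```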

What can I expect in the second draft of MUSDB18 dataset?

I think he meant to add the directions to download and convert MUSDB18 to wav, right?

Now that I have you here, @groadabike, would you like to share your pre-trained model?

groadabike avatar groadabike commented on August 23, 2024

If you pass only ['bass', 'drums'] you'll have a mixture that only contains bass and drums, and separation still makes sense in this way.
If you load the mixture, you'll always have all the sources.

You are right about this. But, currently, all sources become targets, so I had to modify the dataset to get just the vocals and background as targets.

I think he meant to add the directions to download and convert MUSDB18 to wav, right?

Do you mean to have the ability to manage both MUSDB18 and MUSDB18HQ?
I also asked because, in the current MUSDB18 dataset, the sources are returned in a dictionary instead of a vstacked np array.

Now that I have you here, @groadabike, would you like to share your pre-trained model?

Yes, sure!!
Let me fix some errors first. There is no novelty here, just adaptations.
How can I share the model with you?

mpariente avatar mpariente commented on August 23, 2024

Oh great, thank you!

The idea was to return a dict here because the sources are not of the same nature (as opposed to speech/speech separation). You can still use the default Dataset and make vocals and background your target when you receive the dict.
I think that was the intended usage.
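
That intended usage might look like this (a hedged sketch; names are illustrative):

```python
import numpy as np

# The dataset returns a dict of stems; derive a vocals/background pair.
stems = {s: np.random.randn(44100) for s in ("vocals", "bass", "drums", "other")}

vocals = stems["vocals"]
background = sum(v for k, v in stems.items() if k != "vocals")
targets = np.stack([vocals, background])  # (2, time) training target
mixture = vocals + background
```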

Regarding sharing the model (thanks), did you use the plain ConvTasNet importable from asteroid or did you change things there?

faroit avatar faroit commented on August 23, 2024

@groadabike great to hear that convtasnet works. Are you getting to results similar to demucs?

groadabike avatar groadabike commented on August 23, 2024

Regarding sharing the model (thanks), did you use the plain ConvTasNet importable from asteroid or did you change things there?

Yes, I used the plain ConvTasNet model. I am re-running it now; when it finishes, I can tell you how it went.

mpariente avatar mpariente commented on August 23, 2024

Which loss do you use?
Did you check the demucs implementation?

groadabike avatar groadabike commented on August 23, 2024

Which loss do you use?

That was my first error: I hadn't changed it to the L1 loss.
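
For reference, the time-domain L1 loss in question is just the mean absolute error between waveforms (a sketch, not Asteroid's loss API):

```python
import numpy as np

def l1_loss(estimate, reference):
    # Mean absolute error between estimated and reference waveforms.
    return np.mean(np.abs(estimate - reference))

ref = np.array([0.0, 1.0, -1.0, 0.5])
est = np.array([0.0, 0.5, -0.5, 0.5])
print(l1_loss(est, ref))  # 0.25
```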

Did you check the demucs implementation?

Yes, I did.
Just to confirm that I am not mixing up terminology or making another silly mistake:
are these parameters correct?

filterbank:            # Conv-TasNet symbols from Table 1
  n_filters: 256       # N
  kernel_size: 20      # L
  stride: 10           # L/2
masknet:
  n_blocks: 8          # X
  n_repeats: 4         # R
  mask_act: relu       # mask_nonlinear
  conv_kernel_size: 3  # P
  bn_chan: 256         # B
  skip_chan: 256       # Sc
  hid_chan: 512        # H

In terms of augmentation, will there be a standard way to do it in Asteroid, or should I implement it for my specific case? If I am trying to replicate the Demucs results with Asteroid, I should implement, as the Demucs paper describes, shuffling sources within one batch to generate new mixes and randomly swapping channels.
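
The batch-shuffling augmentation described above can be sketched like this (illustrative numpy code, not an Asteroid API):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_src, time = 4, 2, 8
sources = rng.standard_normal((batch, n_src, time))

# Permute each source independently across the batch, then rebuild the
# mixtures; each new mix pairs stems from different original tracks.
shuffled = np.stack(
    [sources[rng.permutation(batch), s] for s in range(n_src)], axis=1
)
new_mix = shuffled.sum(axis=1)  # (batch, time) freshly remixed examples
```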

mpariente avatar mpariente commented on August 23, 2024

These parameters are correct, yes.

There is no default way to do this in Asteroid yet; we've been thinking about it with @faroit, but it is not settled yet.
You can check this script on MUSDB, or the dynamic mixing on WHAM.

YadongChen-1016 avatar YadongChen-1016 commented on August 23, 2024

Hi, I've recently been working on audio source separation tasks using the Conv-TasNet model. However, I have no way to obtain the MUSDB18HQ dataset; I would be very grateful if you could provide it to me.

faroit avatar faroit commented on August 23, 2024

@YadongChen-1016 request here: https://zenodo.org/record/3338373

YadongChen-1016 avatar YadongChen-1016 commented on August 23, 2024

Thanks!
