First, love the work, been playing around with it for the past day or two.

Starting Point for Dataset about audio-diffusion (closed)

from audio-diffusion.

Comments (4)

teticio commented on May 23, 2024

Hi there.

To be honest, I don't know the answer; I can only say what I have done and speculate a little. My hope is that more people like you will try with different datasets and help us all understand better!

I have found that it works best with electronic music and my guess is that it is useful to have entire songs chopped up into samples. I think of these as "augmentations" in the same way as you might use images taken from different angles to train an image model.

I can tell you for sure that trying to compensate for a lack of data with more epochs will not work well. The model will just overfit to the little data it has and, at best, memorize it. For reference, I used about 400 songs. Perhaps you could slice the 50 songs into 5-second samples starting from 0 seconds, 1 second, ..., 4 seconds. That might help.
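The offset idea above can be sketched in a few lines of Python. This is only an illustration, not the repository's own slicing code: the helper name `offset_slices`, the 22050 Hz sample rate, and the silent dummy track are all assumptions made for the example.

```python
import numpy as np


def offset_slices(audio, sample_rate, slice_seconds=5, offsets_seconds=(0, 1, 2, 3, 4)):
    """Cut `audio` into fixed-length slices, repeating the pass at each start offset.

    Hypothetical helper for illustration; not part of audio-diffusion.
    """
    slice_len = int(slice_seconds * sample_rate)
    slices = []
    for offset in offsets_seconds:
        start = int(offset * sample_rate)
        # Walk the track in back-to-back windows from this offset,
        # dropping the final partial window.
        while start + slice_len <= len(audio):
            slices.append(audio[start:start + slice_len])
            start += slice_len
    return slices


# Hypothetical 22-second mono track (silence, for illustration only).
sample_rate = 22050
track = np.zeros(22 * sample_rate, dtype=np.float32)
slices = offset_slices(track, sample_rate)
print(len(slices), "slices of", len(slices[0]) / sample_rate, "seconds each")
```

Slices taken at different offsets overlap heavily, so they behave as augmentations of the same songs rather than as new data.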

JousterL commented on May 23, 2024

Thanks for the clear response! That definitely helps, in that I didn't think about overlapping samples being 'acceptable' from a dataset perspective. I'll definitely try that.

I am curious how the Mel spectrograms are generated in a 'naive' approach (i.e., when I was first playing around with it, I wasn't careful and fed Mel the entire audio file, but with your suggested parameters of 64x64 resolution). Does it split the song up into an equivalent number of samples, making my custom scripting/sample creation duplicative, or is it squeezing a 3.5-minute song into a tiny window and compressing away all of the detail?

teticio commented on May 23, 2024

The script automatically chops it up into slices. If you want to do the augmentation I suggested, you could create additional input files by using something like ffmpeg to chop off the first 1 second, 2 seconds, etc. But take into account the length of time your 64x64 spectrogram converts back to (use the test_mel notebook to play around with the parameters before training any models). I warn you that 64x64 will not get great results. You might want to try training on Colab: there is a starter notebook for this, train_model, although I haven't adapted it to push the models to the hub (hint: install git lfs on Colab).
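As a rough sketch of the two points above, the snippet below estimates how much audio one spectrogram covers and builds the ffmpeg commands for the offset copies. The `hop_length=512` and `sample_rate=22050` defaults are assumptions (they match librosa's defaults, but check your own settings in the test_mel notebook), and the helper names and the `_skipNs` output naming are hypothetical.

```python
import shlex
from pathlib import Path


def spectrogram_seconds(x_res, hop_length=512, sample_rate=22050):
    """Approximate audio length covered by a spectrogram that is x_res frames wide.

    hop_length and sample_rate are assumed values, not read from the repo.
    """
    return x_res * hop_length / sample_rate


def ffmpeg_trim_commands(input_path, offsets_seconds=(1, 2, 3, 4)):
    """Build one ffmpeg command per offset, each dropping the first N seconds."""
    path = Path(input_path)
    commands = []
    for offset in offsets_seconds:
        # Hypothetical naming scheme: song.mp3 -> song_skip1s.mp3, etc.
        output_path = path.with_name(f"{path.stem}_skip{offset}s{path.suffix}")
        # -ss placed before -i seeks the input to the given offset in seconds.
        commands.append(["ffmpeg", "-ss", str(offset), "-i", str(path), str(output_path)])
    return commands


print(f"64 frames is about {spectrogram_seconds(64):.2f} s of audio")
for command in ffmpeg_trim_commands("song.mp3"):
    print(shlex.join(command))
```

Under these assumed settings a 64-frame slice covers only about a second and a half, so 1-second offsets would shift it by most of its length. The commands are only printed here; to run them, pass each list to `subprocess.run(command, check=True)` on a machine with ffmpeg installed.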

JousterL commented on May 23, 2024

Thank you for the guidance! I'll definitely play around and explore expanded spectrograms.
