Comments (26)

teticio commented on May 26, 2024

Thanks for bringing this to my attention. What version of Diffusers are you using? I just updated to the latest one and it broke in a different way sigh

from audio-diffusion.

teticio commented on May 26, 2024

You need a more recent version. I have updated the requirements.txt and setup.cfg files to require a version >= 0.4.1. It should work with the latest version of diffusers so just do pip install --upgrade diffusers and let me know if that works for you. Thanks!

teticio commented on May 26, 2024

Best to open separate issues, but I will answer. It is pushing your model checkpoints to Hugging Face. It should work, but maybe you have a slow internet connection? Or perhaps there is some issue pushing from Colab? You can run the models locally without pushing to the hub. I'll try to replicate in the meantime.

I added a notebook for you: https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb

If you are doing short samples (like 1 sec or so), you should change the resolution to something like 64,256 (not all resolutions work; best to be powers of 2).

teticio commented on May 26, 2024

You have to play around with it, I'm afraid. It depends on many factors. I only have experience of training with 20,000 - 30,000 music samples of 5 seconds, of music that is relatively homogeneous. For that it took a week on an RTX 2080 Ti GPU. So on Colab, with a 12 hour limit(?), you are going to be limited. Btw, I wasn't able to push to the hub from Colab; maybe you have had more luck.

teticio commented on May 26, 2024

So I did 100 epochs (you can see the tensorboard here: https://huggingface.co/teticio/audio-diffusion-256/tensorboard). After 50 it was pretty good, but it did continue to improve.

Thanks for the info on pushing - that is very useful to know.

teticio commented on May 26, 2024

I don't have one off the top of my head; I'm sure you can figure out a way using, say, librosa.
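
For illustration, here is a minimal numpy sketch of energy-based silence gating, the kind of thing librosa.effects.split does with a dB threshold. The function name, frame length, and threshold are my own choices for the example, not from the repo:

```python
import numpy as np

def gate_silence(audio, sample_rate, frame_ms=20, threshold=0.01):
    """Zero out frames whose RMS energy falls below a threshold.

    A crude way to suppress residual noise in the 'empty' parts of a
    generated clip; librosa.effects.split offers a dB-based equivalent.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    out = audio.copy()
    for start in range(0, len(audio), frame_len):
        frame = audio[start:start + frame_len]
        if np.sqrt(np.mean(frame ** 2)) < threshold:
            out[start:start + frame_len] = 0.0
    return out

# Example: a clip that is half tone, half low-level noise.
sr = 22050
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
noise = 0.001 * np.random.randn(sr)
cleaned = gate_silence(np.concatenate([tone, noise]), sr)
```

The noisy half is zeroed out while the tone survives; in practice you would tune the threshold to your material.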

teticio commented on May 26, 2024

Team haha - it's just me really.

I made adaptations to use grayscale (1 channel) as opposed to colour (3 channels). This was relatively straightforward in train_unconditional.py - the main changes involved replacing 3 with 1 and 'RGB' with 'L'. train_vae.py took a bit more work - the changes there can be found by searching for similar terms, as well as for channels, and, most importantly, I changed the config file ldm_autoencoder_kl.yaml compared to the version I got from the Stable Diffusion repo.

teticio commented on May 26, 2024

'L' = the image mode for grayscale, as opposed to 'RGB'.

I think the best thing to do is to diff against the files that I used as a starting point: https://github.com/huggingface/diffusers/blob/main/examples/train_unconditional.py, https://github.com/CompVis/stable-diffusion/blob/main/configs/autoencoder/autoencoder_kl_32x32x4.yaml and https://github.com/CompVis/stable-diffusion/blob/main/main.py (for train_vae.py).

I found the training speed and convergence was faster and the GPU memory requirement less.
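
As an aside, the 'RGB' vs 'L' distinction refers to PIL image modes; a tiny sketch (the array shape is arbitrary, just for illustration):

```python
import numpy as np
from PIL import Image

# A mel spectrogram rendered as an 8-bit image: a 2-D array maps to mode 'L'.
mel = (np.random.rand(64, 256) * 255).astype(np.uint8)
gray = Image.fromarray(mel)   # PIL infers mode 'L' for a 2-D uint8 array
rgb = gray.convert("RGB")     # what a colour pipeline would use instead

# One channel vs three: the model's in/out channel count changes accordingly.
print(np.array(gray).shape)   # (64, 256)
print(np.array(rgb).shape)    # (64, 256, 3)
```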

GeorvityLabs commented on May 26, 2024

@teticio, I followed your instructions to try training on Colab.
It completes epoch 0 up to 50 steps, but after that the following error pops up:

Epoch 0: 100% 50/50 [00:41<00:00, 1.21it/s, ema_decay=0.946, loss=0.106, lr=1e-5, step=50]
Traceback (most recent call last):
  File "scripts/train_unconditional.py", line 381, in <module>
    main(args)
  File "scripts/train_unconditional.py", line 284, in main
    batch_size=args.eval_batch_size,
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/audiodiffusion/__init__.py", line 256, in __call__
    self.progress_bar(self.scheduler.timesteps[start_step:])):
AttributeError: 'AudioDiffusionPipeline' object has no attribute 'progress_bar'
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 859]].
ERROR:huggingface_hub.repository:Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 859]].

The error was
AttributeError: 'AudioDiffusionPipeline' object has no attribute 'progress_bar'

I've attached a screenshot of the error below.

Hope you can suggest a fix for the same.
Screenshot from 2022-11-02 16-54-33

GeorvityLabs commented on May 26, 2024

> Thanks for bringing this to my attention. What version of Diffusers are you using? I just updated to the latest one and it broke in a different way sigh

I was using the one that was in the requirements:
diffusers>=0.2.4

GeorvityLabs commented on May 26, 2024

> Thanks for bringing this to my attention. What version of Diffusers are you using? I just updated to the latest one and it broke in a different way sigh

Could you make a training Colab notebook? It would be a great way to get started, since only the inference notebook is currently available.

teticio commented on May 26, 2024

I will look into making a colab notebook for training.

As you can see in the requirements, the version will be greater than or equal to that. Depending on when you installed it, it will be one version or another. Can you do pip list and note the version? Also, it would help to know the command-line arguments you used that led to the error.
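
As an alternative to pip list, a small standard-library snippet for checking the installed version programmatically (the package names below are just examples):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# Prints e.g. '0.4.1', or None if diffusers is not installed.
print(installed_version("diffusers"))
```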

GeorvityLabs commented on May 26, 2024

> I will look into making a colab notebook for training.
>
> As you can see in the requirements, the version will be greater than or equal to that. Depending on when you installed it, it will be one version or another. Can you do pip list and note the version? Also, it would help to know the command-line arguments you used that led to the error.

Cool, looking forward to the notebook.
I think it was 0.2.4 in requirements.txt.

GeorvityLabs commented on May 26, 2024

> You need a more recent version. I have updated the requirements.txt and setup.cfg files to require a version >= 0.4.1. It should work with the latest version of diffusers so just do pip install --upgrade diffusers and let me know if that works for you. Thanks!

Yes, this works.
I removed the version specification and just used pip install diffusers, so now it is working.
Definitely looking forward to your notebook as well. Maybe you can mention the recommended number of input samples; you could also include a script at the end of the training notebook where users can generate .wav sample outputs from their trained model.

GeorvityLabs commented on May 26, 2024

I just reached epoch 99, then this happens:

Several commits (11) will be pushed upstream.
WARNING:huggingface_hub.repository:Several commits (11) will be pushed upstream.
100% 1000/1000 [04:03<00:00, 4.10it/s]
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 866], [push command, status code: running, in progress. PID: 1113], [push command, status code: running, in progress. PID: 1336], [push command, status code: running, in progress. PID: 1493], [push command, status code: running, in progress. PID: 1624], [push command, status code: running, in progress. PID: 1751], [push command, status code: running, in progress. PID: 1882], [push command, status code: running, in progress. PID: 2013], [push command, status code: running, in progress. PID: 2144], [push command, status code: running, in progress. PID: 2279], [push command, status code: running, in progress. PID: 2409]].
ERROR:huggingface_hub.repository:Waiting for the following commands to finish before shutting down: (same list of 11 push commands)
Waiting for the following commands to finish before shutting down: (same list of 11 push commands)

Screenshot from 2022-11-02 20-08-03

Any idea why this happens, @teticio?

GeorvityLabs commented on May 26, 2024

@teticio
Thanks for the clarification.

Btw, the notebook you linked is an inference notebook, right?

GeorvityLabs commented on May 26, 2024

> Best to open separate issues, but I will answer. It is pushing your model checkpoints to Hugging Face. It should work, but maybe you have a slow internet connection? Or perhaps there is some issue pushing from Colab? You can run the models locally without pushing to the hub. I'll try to replicate in the meantime.
>
> I added a notebook for you: https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb
>
> If you are doing short samples (like 1 sec or so), you should change the resolution to something like 64,256 (not all resolutions work; best to be powers of 2).

I found the training notebook inside the notebooks folder in the repo.
But I guess you linked another notebook by mistake in your comment above.

GeorvityLabs commented on May 26, 2024

Btw @teticio,

If I have 100 one-second wav files, do I only have to train for 10 epochs using DDIM?
Would that be enough, or is there a formula that relates epochs to the number of audio samples (and audio length)?

GeorvityLabs commented on May 26, 2024

@teticio, I just heard the generation after training for 10 epochs on 100 one-second audio clips; it was just noise.
Now I've changed the number of epochs to 100 and am retraining the model. Do you have any other recommendations?

GeorvityLabs commented on May 26, 2024

> You have to play around with it, I'm afraid. It depends on many factors. I only have experience of training with 20,000 - 30,000 music samples of 5 seconds, of music that is relatively homogeneous. For that it took a week on an RTX 2080 Ti GPU. So on Colab, with a 12 hour limit(?), you are going to be limited. Btw, I wasn't able to push to the hub from Colab; maybe you have had more luck.

That is interesting.
How many epochs did you train the 20k 5-second samples for?
Also, do you have any suggestions on how to denoise the generated output, since noise gets added to empty spaces? Did you ever look into anything in that direction?

I was able to push to Hugging Face by doing the following:

First I cd into the model folder, then I run the following commands:

!git lfs install
!git add .
!git lfs migrate import --everything
!git commit -m "initial commit"
!git push origin main --force

This allowed me to successfully push the model via Colab.

Also, I have a local GPU which I use for model training via Jupyter notebook.

GeorvityLabs commented on May 26, 2024

@teticio, that's good to know.

I was going through the training notebook.
It generates the image and audio.

I was able to save the image to a local path using image.save(),
but how can I save the audio to a local path within the Python script as a .wav file?

For saving the image I used:

image.save("filename.jpg", 'JPEG')

The audio is displayed using:

display(Audio(audio, rate=sample_rate))

To save it I tried (from scipy.io.wavfile import write):

write('test.wav', sample_rate, audio)

but that saves a file which is very low in volume. Do you have a fix to save it properly as a .wav to a local path?
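
For what it's worth, one common cause of very quiet files is writing float samples without scaling them to 16-bit PCM. Here is a sketch using only the standard-library wave module; the normalization step and function name are my own, not from the repo:

```python
import wave
import numpy as np

def save_wav(path, audio, sample_rate):
    """Write a mono float array as 16-bit PCM .wav.

    Normalizes to [-1, 1] and scales to int16; writing raw floats
    is one reason a saved file can sound very quiet.
    """
    audio = np.asarray(audio, dtype=np.float32)
    peak = np.max(np.abs(audio))
    if peak > 0:
        audio = audio / peak
    pcm = (audio * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes = 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())

# Example: save one second of a 440 Hz tone.
save_wav("test.wav", np.sin(np.linspace(0, 2 * np.pi * 440, 22050)), 22050)
```

This makes it easy to loop over 100 generated clips and write each one without clicking through the widget.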

teticio commented on May 26, 2024

The easiest way is to click on the three dots on the audio widget and download it from there.

GeorvityLabs commented on May 26, 2024

> The easiest way is to click on the three dots on the audio widget and download it from there.

Yeah, I wanted to write it in the script, because if I'm generating 100 files, then clicking download 100 times would be a hassle.

So, do you have any workaround for the same?
Maybe some other way to save as .wav.

GeorvityLabs commented on May 26, 2024

@teticio ,

Since the mel inputs here are grayscale, I was wondering if you have implemented any new ideas in this repo compared to the original DDPM implementation in diffusers, which is general purpose (for colour images too).

Since in the case of mel we only use grayscale input and expect only grayscale to be generated, did you make any changes to any parameters or techniques so that DDPM works better with grayscale image training?

If so, I'd love to know some of the novelties you and the team have implemented in this repo.

GeorvityLabs commented on May 26, 2024

@teticio ,

Did you implement any functions to prevent wasted computation?
Because you are only training on grayscale inputs here, you might have changed things from the usual DDPM implementation, which is more focused on natural colour images, which are more complex.

Compared to those natural images, aren't grayscale mel spectrograms simpler images, in terms of features etc.?

So I wanted to know if you have added any unique implementations in this repo compared to vanilla DDPMs; if so, I hope you can mention them here.

GeorvityLabs commented on May 26, 2024

@teticio, that is interesting to hear.
So L is the lightness in the image, right?

Could you go into a bit more detail regarding the changes made in train_vae.py and ldm_autoencoder_kl.yaml, and maybe explain what motivated those changes and how they affect the training process positively?
