Comments (26)
Thanks for bringing this to my attention. What version of Diffusers are you using? I just updated to the latest one and it broke in a different way sigh
from audio-diffusion.
You need a more recent version. I have updated the requirements.txt and setup.cfg files to require a version >= 0.4.1. It should work with the latest version of diffusers, so just do pip install --upgrade diffusers and let me know if that works for you. Thanks!
Best to open separate issues, but I will answer. It is pushing your model checkpoints to huggingface. It should work, but maybe you have a slow internet connection? Or perhaps there is some issue pushing from Colab? You can run the models locally without pushing to the hub. I'll try to replicate in the meantime.
I added a notebook for you https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb
If you are doing short samples (1 sec or so), you should change the resolution to something like 64,256 (not all resolutions work; it is best to stick to powers of 2).
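For context, here is a rough sketch of why a ~1 s clip maps to a width of 64: the spectrogram width is the number of hop-length frames in the clip, rounded up to a power of 2. The sample_rate=22050 and hop_length=512 defaults below are assumptions (check the Mel class in audiodiffusion), not verified values.

```python
# Rough sketch: estimate the spectrogram width (time frames) for a clip length.
# sample_rate=22050 and hop_length=512 are assumed defaults, not verified.
def frames_for_duration(seconds, sample_rate=22050, hop_length=512):
    frames = seconds * sample_rate / hop_length
    n = 1
    while n < frames:   # round up to the next power of 2,
        n *= 2          # since not all resolutions work
    return n

print(frames_for_duration(1))  # ~43 frames -> 64
print(frames_for_duration(5))  # ~215 frames -> 256
```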
You have to play around with it, I'm afraid. It depends on many factors. I only have experience of training with 20,000 - 30,000 music samples of 5 seconds each, of music that is relatively homogeneous. For that it took a week on an RTX 2080 Ti GPU, so on Colab with a 12-hour limit(?) you are going to be limited. Btw, I wasn't able to push to the hub from Colab; maybe you have had more luck.
So I did 100 epochs (you can see the tensorboard here: https://huggingface.co/teticio/audio-diffusion-256/tensorboard). After 50 it was pretty good, but it did continue to improve.
Thanks for the info on pushing - that is very useful to know.
I don't have one off the top of my head; I'm sure you can figure out a way using, say, librosa.
Team, haha - it's just me really.
I made adaptations to use grayscale (1 channel) as opposed to colour (3 channels). This was relatively straightforward in train_unconditional.py - the main changes involved replacing 3 with 1 and 'RGB' with 'L'. The train_vae.py took a bit more work - the changes there can be found by searching for similar terms, as well as for channels - and, most importantly, I changed the config file ldm_autoencoder_kl.yaml compared to the version I got from the Stable Diffusion repo.
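For illustration, the kind of 'RGB' → 'L' swap described above looks something like this in image-loading code (a sketch using Pillow, not the repo's exact code):

```python
from PIL import Image

# Sketch of the grayscale adaptation described above (not the repo's exact code):
# work with spectrogram images in single-channel 'L' mode instead of 'RGB'.
img = Image.new("RGB", (256, 256))   # stand-in for a loaded mel spectrogram
gray = img.convert("L")              # 'L' = 8-bit grayscale, 1 channel

print(gray.mode)             # 'L'
print(len(gray.getbands()))  # 1 channel instead of 3
```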
L is the image mode for grayscale, as opposed to RGB.
I think the best thing to do is to diff against the files that I used as a starting point: https://github.com/huggingface/diffusers/blob/main/examples/train_unconditional.py, https://github.com/CompVis/stable-diffusion/blob/main/configs/autoencoder/autoencoder_kl_32x32x4.yaml and https://github.com/CompVis/stable-diffusion/blob/main/main.py (for train_vae.py).
I found that training converged faster and the GPU memory requirement was lower.
@teticio, I followed your instructions to try training on Colab.
It completes epoch 0 up to 50 steps, but after that the following error pops up:
Epoch 0: 100% 50/50 [00:41<00:00, 1.21it/s, ema_decay=0.946, loss=0.106, lr=1e-5, step=50]
Traceback (most recent call last):
  File "scripts/train_unconditional.py", line 381, in <module>
    main(args)
  File "scripts/train_unconditional.py", line 284, in main
    batch_size=args.eval_batch_size,
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/audiodiffusion/__init__.py", line 256, in __call__
    self.progress_bar(self.scheduler.timesteps[start_step:])):
AttributeError: 'AudioDiffusionPipeline' object has no attribute 'progress_bar'
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 859]].
ERROR:huggingface_hub.repository:Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 859]].
The error was
AttributeError: 'AudioDiffusionPipeline' object has no attribute 'progress_bar'
I've attached a screenshot of the error below.
Hope you can suggest a fix.
Thanks for bringing this to my attention. What version of Diffusers are you using? I just updated to the latest one and it broke in a different way sigh
I was using the one that was in the requirements:
diffusers>=0.2.4
Thanks for bringing this to my attention. What version of Diffusers are you using? I just updated to the latest one and it broke in a different way sigh
Could you make a training Colab notebook? It would be a great way to get started, since only the inference notebook is currently available.
I will look into making a Colab notebook for training.
As you can see in the requirements, the version will be greater than or equal to that; depending on when you installed it, it will be one version or another. Can you do pip list and note the version? Also, it would help to know the command line arguments you used that led to the error.
I will look into making a Colab notebook for training.
As you can see in the requirements, the version will be greater than or equal to that; depending on when you installed it, it will be one version or another. Can you do pip list and note the version? Also, it would help to know the command line arguments you used that led to the error.
Cool, looking forward to the notebook.
I think it was 0.2.4 in the requirements.txt.
You need a more recent version. I have updated the requirements.txt and setup.cfg files to require a version >= 0.4.1. It should work with the latest version of diffusers, so just do pip install --upgrade diffusers and let me know if that works for you. Thanks!
Yes, this works.
I removed the version specification and just used pip install diffusers, so now it is working.
Definitely looking forward to your notebook as well. Maybe you can mention the recommended number of input samples, and perhaps include a script at the end of the training notebook where users can generate .wav sample outputs from their trained model.
I just reached epoch 99, then this happens:
Several commits (11) will be pushed upstream.
WARNING:huggingface_hub.repository:Several commits (11) will be pushed upstream.
100% 1000/1000 [04:03<00:00, 4.10it/s]
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 866], [push command, status code: running, in progress. PID: 1113], [push command, status code: running, in progress. PID: 1336], [push command, status code: running, in progress. PID: 1493], [push command, status code: running, in progress. PID: 1624], [push command, status code: running, in progress. PID: 1751], [push command, status code: running, in progress. PID: 1882], [push command, status code: running, in progress. PID: 2013], [push command, status code: running, in progress. PID: 2144], [push command, status code: running, in progress. PID: 2279], [push command, status code: running, in progress. PID: 2409]].
ERROR:huggingface_hub.repository:Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 866], [push command, status code: running, in progress. PID: 1113], [push command, status code: running, in progress. PID: 1336], [push command, status code: running, in progress. PID: 1493], [push command, status code: running, in progress. PID: 1624], [push command, status code: running, in progress. PID: 1751], [push command, status code: running, in progress. PID: 1882], [push command, status code: running, in progress. PID: 2013], [push command, status code: running, in progress. PID: 2144], [push command, status code: running, in progress. PID: 2279], [push command, status code: running, in progress. PID: 2409]].
Any idea why this happens, @teticio?
@teticio, thanks for the clarification.
Btw, the notebook you linked is an inference notebook, right?
Best to open separate issues, but I will answer. It is pushing your model checkpoints to huggingface. It should work, but maybe you have a slow internet connection? Or perhaps there is some issue pushing from Colab? You can run the models locally without pushing to the hub. I'll try to replicate in the meantime.
I added a notebook for you https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb
If you are doing short samples (1 sec or so), you should change the resolution to something like 64,256 (not all resolutions work; it is best to stick to powers of 2).
I found the training notebook inside the notebooks folder in the repo.
But I guess you linked another notebook by mistake in your comment above.
Btw @teticio, if I have 100 one-second wav files, do I only have to train for 10 epochs using DDIM?
Would that be enough, or is there a formula that relates epochs to the number of audio samples (and audio length)?
@teticio, I just heard the generation after training for 10 epochs on 100 one-second audio clips, and it was just noise.
Now I've changed the number of epochs to 100 and am retraining the model. Do you have any other recommendations?
You have to play around with it, I'm afraid. It depends on many factors. I only have experience of training with 20,000 - 30,000 music samples of 5 seconds each, of music that is relatively homogeneous. For that it took a week on an RTX 2080 Ti GPU, so on Colab with a 12-hour limit(?) you are going to be limited. Btw, I wasn't able to push to the hub from Colab; maybe you have had more luck.
That is interesting.
How many epochs did you train the 20k 5s samples for?
Also, do you have any suggestions on how to denoise the generated output? Since noise gets added to empty spaces, did you ever look into anything in that direction?
I was able to push to Hugging Face by doing the following:
first I cd into the model folder, then I run the following commands
!git lfs install
!git add .
!git lfs migrate import --everything
!git commit -m "initial commit"
!git push origin main --force
This allowed me to successfully push the model via Colab.
Also, I have a local GPU which I use for model training via Jupyter notebook.
@teticio, that's good to know.
I was going through the training notebook. It generates the image and audio.
I was able to save the image to a local path using image.save(), but how can I save the audio to a local path within the Python script as a .wav file?
For saving the image I used:
image.save("filename.jpg", 'JPEG')
The audio is displayed using:
display(Audio(audio, rate=sample_rate))
To save it I tried (from scipy.io.wavfile import write):
write('test.wav', sample_rate, audio)
but that saves a file which is very low in volume. Do you have a fix to save it properly as a .wav to a local path?
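One possible fix (a sketch, assuming `audio` is a float numpy array as returned by the pipeline): the low volume suggests the float samples sit well below full scale, so peak-normalize and convert to 16-bit PCM before writing with scipy:

```python
import numpy as np
from scipy.io.wavfile import write

def save_wav(path, audio, sample_rate):
    """Peak-normalize a float array and write it as a 16-bit PCM WAV."""
    audio = np.asarray(audio, dtype=np.float32)
    peak = np.max(np.abs(audio))
    if peak > 0:
        audio = audio / peak  # scale so the loudest sample hits full scale
    write(path, sample_rate, (audio * 32767).astype(np.int16))

# usage, with the names from the notebook:
#   save_wav("test.wav", audio, sample_rate)
```

Whether you actually want peak normalization depends on your use case; it changes the loudness of every generated file to full scale, which is convenient for listening but not faithful to the raw model output.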
The easiest way is to click on the three dots on the audio widget and download it from there.
The easiest way is to click on the three dots on the audio widget and download it from there.
Yeah. I wanted to write it in the script because, if I'm generating 100 files, clicking download 100 times would be a hassle.
So do you have any workaround for this, maybe some other way to save as .wav?
@teticio, since the mel inputs here are grayscale, I was wondering if you have implemented any new ideas in this repo compared to the original DDPM implementation in diffusers, which is general purpose (for colour images too).
Since in the case of mel we only use grayscale input and expect only grayscale to be generated, did you make any changes to any parameters or techniques so that DDPM works better with grayscale image training?
If so, I'd love to know some of the novelties you and the team have implemented in this repo.
@teticio, did you implement any functions to prevent wasted computation?
Because you are only training on grayscale inputs here, you might have changed things from the usual DDPM implementation, which is more focused on natural colour images that are more complex.
Compared to those natural images, aren't grayscale mel spectrograms simpler images, in terms of features etc.?
So I wanted to know if you have added any unique implementations in this repo compared to vanilla DDPMs; if so, I hope you can mention them here.
@teticio, that is interesting to hear.
So L is the lightness of the image, right?
Could you go into a bit more detail regarding the changes made in train_vae.py and ldm_autoencoder_kl.yaml, and maybe explain what motivated those changes and how they affect the training process positively?