wavegan-pytorch's Introduction

WaveGAN v2 PyTorch

PyTorch implementation of WaveGAN, a machine learning algorithm which learns to generate raw audio waveforms.

  • v2 adds the ability to train WaveGANs capable of generating longer audio examples (up to 4 seconds at 16kHz)
  • v2 adds the ability to train WaveGANs capable of generating multi-channel audio

This is a PyTorch port of WaveGAN (Donahue et al. 2018) (paper) (demo) (sound examples). WaveGAN is a machine learning algorithm which learns to synthesize raw waveform audio by observing many examples of real audio. WaveGAN is comparable to the popular DCGAN approach (Radford et al. 2016) for learning to generate images.

In this repository, we include an implementation of WaveGAN capable of learning to generate up to 4 seconds of audio at 16kHz.

WaveGAN is capable of learning to synthesize audio in many different sound domains. In the above figure, we visualize real and WaveGAN-generated audio of speech, bird vocalizations, drum sound effects, and piano excerpts. These sound examples and more can be heard here.

Requirements

pip install -r requirements.txt

Datasets

WaveGAN can now be trained on datasets of arbitrary audio files (previously required preprocessing). You can use any folder containing audio, but here are a few example datasets to help you get started:

WaveGAN Parameters (params.py)

  • target_signals_dir: folder containing a train subfolder with the training .wav files
  • model_prefix: model name used when saving checkpoints
  • n_iterations: number of training iterations
  • lr_g: generator learning rate
  • lr_d: discriminator learning rate
  • beta1: Adam optimizer first decay rate for moment estimates
  • beta2: Adam optimizer second decay rate for moment estimates
  • decay_lr: flag to decay the learning rate linearly across iterations, reaching zero at iteration 100k
  • generator_batch_size_factor: factor by which the batch size is multiplied when updating the generator; in some cases this gives the generator a more accurate and meaningful signal from the discriminator
  • n_critic: update the generator once every n_critic updates of the critic/discriminator
  • p_coeff: gradient penalty regularization factor
  • batch_size: batch size during training (default 10)
  • noise_latent_dim: dimension of the latent vector used to generate waveforms
  • model_capacity_size: model capacity (default 64); can be reduced to 32 when generating longer windows of 2-4 seconds
  • output_dir: directory that holds saved models and samples produced during training
  • window_length: length of the output utterance: 16384 (1 sec), 32768 (2 sec), or 65536 (4 sec)
  • manual_seed: model random seed
  • num_channels: number of audio channels in the data
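
For orientation, a params.py built from this list might look like the sketch below. The field names follow the list above; every concrete value not stated in the list (learning rates, betas, latent size, seed, paths) is an illustrative assumption, not a repo default.

    # Hypothetical params.py sketch; values marked "assumed" are illustrative only.
    target_signals_dir = "piano"       # must contain a train/ subfolder of .wav files
    model_prefix = "wavegan_piano"     # prefix used when saving checkpoints
    n_iterations = 100000
    lr_g = 1e-4                        # assumed
    lr_d = 3e-4                        # assumed
    beta1 = 0.5                        # assumed
    beta2 = 0.9                        # assumed
    decay_lr = False                   # linearly decay LR to zero at iteration 100k
    generator_batch_size_factor = 1
    n_critic = 5                       # assumed; common WGAN-GP setting
    p_coeff = 10                       # assumed; common WGAN-GP setting
    batch_size = 10                    # default per the list above
    noise_latent_dim = 100             # assumed
    model_capacity_size = 64           # use 32 for 2-4 second windows
    output_dir = "output"              # assumed path
    window_length = 16384              # 16384 = 1 sec, 32768 = 2 sec, 65536 = 4 sec
    manual_seed = 2019                 # assumed
    num_channels = 1                   # mono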

Samples

  • A model trained on the piano dataset to generate 4 seconds of audio, using model capacity 32 for faster training
  • Latent-space interpolation, used to sanity-check the model, gives the following image
  • A sample audio clip can be found at sample (from an early iteration with a 4-second window)

Quality considerations

If your results are too noisy, try adding a post-processing filter. You may also want to adjust the amount of phase shuffle in models.py, or remove it entirely. Increasing either the model size or the filter length in models.py may improve results, but will increase training time.
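
What that post-processing filter looks like is up to you. As one hedged example, a zero-phase Butterworth low-pass from scipy can tame high-frequency GAN noise; the cutoff below is an arbitrary choice and the sample path is hypothetical, not something this repo produces by that name.

    import soundfile as sf
    from scipy.signal import butter, filtfilt

    def lowpass(wave, sample_rate=16000, cutoff_hz=4000, order=4):
        # Zero-phase Butterworth low-pass: filtfilt runs the filter forward
        # and backward so the waveform is not phase-shifted.
        b, a = butter(order, cutoff_hz / (sample_rate / 2), btype="low")
        return filtfilt(b, a, wave)

    wave, sr = sf.read("output/sample_0.wav")  # hypothetical sample path
    sf.write("output/sample_0_filtered.wav", lowpass(wave, sr), sr)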

Monitoring

The train script generates a fixed batch of latent vectors and, throughout training, saves output samples generated from it to the output directory specified in the params.
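
This is the standard fixed-noise monitoring trick: because the same latent batch is fed through the generator at every checkpoint, successive samples are directly comparable as training progresses. A self-contained sketch of the pattern; the tiny stand-in generator and all sizes here are illustrative, not the repo's.

    import torch
    import torch.nn as nn

    latent_dim, n_samples = 100, 10                # illustrative sizes
    torch.manual_seed(2019)                        # make the monitoring batch reproducible
    fixed_z = torch.randn(n_samples, latent_dim)   # drawn once, reused at every checkpoint

    generator = nn.Sequential(nn.Linear(latent_dim, 16384), nn.Tanh())  # stand-in model

    for iteration in range(1, 1001):
        # ... discriminator / generator updates would go here ...
        if iteration % 200 == 0:
            with torch.no_grad():
                fake = generator(fixed_z)          # same z each time, so progress is audible
            # the train script would now write these waveforms to output_dir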

Contributions

This repo is based on the implementations by chrisdonahue, jtcramer, and mazzzystar.

Attribution

If you use this code in your research, please cite it via the following BibTeX:

@inproceedings{donahue2019wavegan,
  title={Adversarial Audio Synthesis},
  author={Donahue, Chris and McAuley, Julian and Puckette, Miller},
  booktitle={ICLR},
  year={2019}
}

wavegan-pytorch's People

Contributors

mostafaelaraby


wavegan-pytorch's Issues

Showing an error every time I try to train

Thank you for making the repository public. I am trying to implement the model in Python 3.9. It showed a package error when using Python 3.11. Now that I am trying to train the model, it shows the below-mentioned error. Please let me know where I am going wrong. I have installed all the required packages.

[screenshot of the error]

Silence on all iterations

Hello.

I'm currently training wavegan-pytorch on a set of fewer than 300 files, and I'm getting silent output at every iteration: it generates 10 .wav files, but they are all silent, with zero values for their whole duration. During installation I had some problems with librosa on my machine, so I changed line 172 of utils.py to:

    #librosa.output.write_wav(output_path, sample, sampling_rate)
    sf.write(output_path, sample, sampling_rate) #'PCM_24')

where sf is the soundfile library. Might this be the problem? Is there a script here to navigate the latent space and debug this?

Thanks in advance.
Luis
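
No latent-space script is referenced in this README, but one is short to write. In the hedged sketch below, the checkpoint path, the latent size of 100, and the way the generator is loaded are all assumptions, since the repo's save format isn't documented here. Silence for every z along the interpolation path would point at the generator itself rather than at the file-writing change.

    import torch
    import soundfile as sf

    # Assumed checkpoint path and save format.
    generator = torch.load("output/wavegan_generator.pt", map_location="cpu")
    generator.eval()

    z_a, z_b = torch.randn(1, 100), torch.randn(1, 100)  # two random latent points
    for i, t in enumerate(torch.linspace(0, 1, steps=8)):
        z = (1 - t) * z_a + t * z_b                      # linear interpolation in z-space
        with torch.no_grad():
            wave = generator(z).squeeze().numpy()
        sf.write(f"interp_{i}.wav", wave, 16000)         # all-zero output => generator issue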

THANK YOU!

I was trying to get the original WaveGAN to run and tearing my hair out. Thank you for making it into a usable Colab notebook; you saved me days of frustrated effort.

demo

How do I generate audio with a trained model?
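
The README doesn't document an inference script, but generation reduces to sampling noise and running the generator forward. A hedged sketch, under the same assumptions about checkpoint path and latent size as the interpolation sketch in the issue above:

    import torch
    import soundfile as sf

    generator = torch.load("output/wavegan_generator.pt", map_location="cpu")  # assumed path
    generator.eval()
    with torch.no_grad():
        z = torch.randn(4, 100)                      # 4 clips; latent dim assumed to be 100
        waves = generator(z).squeeze(1).numpy()      # (batch, window_length) raw audio
    for i, wave in enumerate(waves):
        sf.write(f"generated_{i}.wav", wave, 16000)  # 16 kHz per the README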

Fails in Google Colab

Fails at the last step:

0% 1/250 [00:27<1:52:18, 27.06s/it, Loss_D WD=-0.02465313859283924, Loss_G=-0.41385307908058167, Val_G=-0.030089471489191055]Traceback (most recent call last):
  File "train.py", line 223, in <module>
    wave_gan.train()
  File "train.py", line 199, in train
    save_samples(fake, iter_indx)
  File "/content/wavegan-pytorch/utils.py", line 172, in save_samples
    librosa.output.write_wav(output_path, sample, sampling_rate)
AttributeError: module 'librosa' has no attribute 'output'
  0% 1/250 [00:28<1:57:56, 28.42s/it, Loss_D WD=-0.02465313859283924, Loss_G=-0.41385307908058167, Val_G=-0.030089471489191055]
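
For context: librosa removed the librosa.output module in version 0.8.0, so this call fails on any recent install. The fix is the same soundfile substitution quoted in the "Silence on all iterations" issue above:

    # utils.py, line 172: replace the removed librosa call
    # librosa.output.write_wav(output_path, sample, sampling_rate)
    import soundfile as sf
    sf.write(output_path, sample, sampling_rate)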

WaveGAN paper recommends against upsampling.

Thanks very much for making this repo!

A quick comment: as I understand it, the WaveGAN paper does not actually recommend nearest-neighbor upsampling; better results were instead obtained with the standard ConvTranspose approach. However, in model.py, the class "Transpose1dLayer" has a comment saying that upsampling IS recommended, and the class "WaveGANGenerator" has upsampling enabled by default.
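
For readers comparing the two, the contrast looks roughly like this in PyTorch. This is a sketch of the idea, not the repo's actual Transpose1dLayer; kernel size and stride are illustrative, chosen so both paths grow the time axis by exactly 4x.

    import torch.nn as nn

    def up_block(in_ch, out_ch, kernel=25, stride=4, use_upsample=False):
        # Two ways to grow the time axis by `stride`x in a WaveGAN-style generator.
        if use_upsample:
            # Nearest-neighbor upsample + plain conv: what the code enables by default.
            return nn.Sequential(
                nn.Upsample(scale_factor=stride, mode="nearest"),
                nn.Conv1d(in_ch, out_ch, kernel, stride=1, padding=kernel // 2),
            )
        # Transposed convolution: the approach the paper reports using.
        return nn.ConvTranspose1d(
            in_ch, out_ch, kernel, stride=stride, padding=11, output_padding=1
        )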

macOS running problem

Hi,

I am trying to run this on a MacBook Pro 2015, which has no GPU. I successfully downloaded the piano dataset, but when I start training I get these errors:

[screenshot of the errors]

Does this code work on the SC09 dataset?

Dear mostafaelaraby,

Hi, thank you for your pytorch implementation.

I've run your code on the SC09 speech dataset with several hyperparameter settings, but I couldn't get any plausible sound results (no sound).

Does this code work on the SC09 dataset?
If you have succeeded on SC09, please tell me the hyperparameters.

Regards,

Increasing output duration

Hi,

Is there any way to increase the output file duration? It is currently limited to 4 seconds, which is really short for my use case.
