wavegan's Introduction

WaveGAN (v2)

Official implementation of WaveGAN, a machine learning algorithm which learns to generate raw audio waveforms.

UPDATE (2/2/19): We have made substantial improvements to this repository in response to common requests:

  • Added streaming data loader allowing you to train a WaveGAN on MP3s/WAVs/OGGs/etc. without preprocessing
  • Added ability to train WaveGANs capable of generating longer audio examples (up to 4 seconds at 16kHz)
  • Added support for any audio sample rate
  • Added support for multi-channel audio
  • Compatibility with Python 3 and TensorFlow 1.12.0
  • Old (v1) version still available at this tag

This is the official TensorFlow implementation of WaveGAN (Donahue et al. 2018) (paper) (demo) (sound examples). WaveGAN is a machine learning algorithm which learns to synthesize raw waveform audio by observing many examples of real audio. WaveGAN is comparable to the popular DCGAN approach (Radford et al. 2016) for learning to generate images.

In this repository, we include an implementation of WaveGAN capable of learning to generate up to 4 seconds of audio at 16kHz. For comparison, we also include an implementation of SpecGAN, an approach to audio generation which applies image-generating GANs to image-like audio spectrograms.

WaveGAN is capable of learning to synthesize audio in many different sound domains. In the above figure, we visualize real and WaveGAN-generated audio of speech, bird vocalizations, drum sound effects, and piano excerpts. These sound examples and more can be heard here.

Requirements

pip install tensorflow-gpu==1.12.0
pip install scipy==1.0.0
pip install matplotlib==3.0.2
pip install librosa==0.6.2

Datasets

WaveGAN can now be trained on datasets of arbitrary audio files (previously required preprocessing). You can use any folder containing audio, but here are a few example datasets to help you get started:

Train a WaveGAN

Here is how you would begin (or resume) training a WaveGAN on random clips from a directory containing longer audio, i.e., more than a few seconds per file:

export CUDA_VISIBLE_DEVICES="0"
python train_wavegan.py train ./train \
	--data_dir ./data/dir_with_longer_audio_files

If you are instead training on datasets of short sound effects (e.g., SC09 or drum sound effects), you will want to use this command:

export CUDA_VISIBLE_DEVICES="0"
python train_wavegan.py train ./train \
	--data_dir ./data/sc09/train \
	--data_first_slice \
	--data_pad_end \
	--data_fast_wav

Because our codebase buffers audio clips directly from files, it is important to change the data-related command line arguments to be appropriate for your dataset (see the Data considerations section below).

We currently do not support training on multiple GPUs. If your machine has multiple GPUs, make sure to set the CUDA_VISIBLE_DEVICES environment variable as shown above.

While you can technically train a WaveGAN on CPU, it is prohibitively slow and not recommended. If you do attempt this, add the flag --data_prefetch_gpu_num -1.
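
For example, a CPU-only run might look like the following (a sketch reusing the placeholder paths from above):

export CUDA_VISIBLE_DEVICES="-1"
python train_wavegan.py train ./train \
	--data_dir ./data/dir_with_longer_audio_files \
	--data_prefetch_gpu_num -1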

Data considerations

The WaveGAN training script is configured out-of-the-box to be appropriate for training on random slices from a directory containing longer audio files (e.g., songs, a common use case). If you want to train WaveGAN on shorter sound effects, you will almost certainly want to use the flag --data_first_slice to only extract the first slice from each audio file. If your clips are extremely short (i.e., less than 16384 samples each, as in SC09), you will also want to add --data_pad_end so that they get zero-padded to fill the slice.

If your dataset consists exclusively of "standard" WAV files (16-bit signed PCM or 32-bit float), you can use the flag --data_fast_wav which will use scipy (faster) to decode your audio instead of librosa. This may slightly increase training speed.

If you want to change the generation length, set --data_slice_len to 16384, 32768, or 65536 to generate that many audio samples. If you choose a larger generation length, you will likely want to reduce the number of model parameters to train more quickly (e.g. --wavegan_dim 32). You can also adjust the sampling rate using --data_sample_rate which will effectively change the generation length.

If you have stereo (or multi-channel) audio, adjust --data_num_channels as needed. If you are modeling more than 2 channels, each audio file must have the exact number of channels specified.

If you want to normalize each audio file before training, set --data_normalize.
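
As an illustrative sketch (not a prescribed configuration), the following command combines some of these flags to train a 4-second, 16kHz WaveGAN with a reduced model size on a directory of longer audio files:

export CUDA_VISIBLE_DEVICES="0"
python train_wavegan.py train ./train \
	--data_dir ./data/dir_with_longer_audio_files \
	--data_slice_len 65536 \
	--wavegan_dim 32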

Quality considerations

If your results are too noisy, try adding a post-processing filter with --wavegan_genr_pp. You may also want to change the amount of or remove phase shuffle using --wavegan_disc_phaseshuffle 0. Increasing either the model size (--wavegan_dim) or filter length (--wavegan_kernel_len) may improve results but will increase training time.
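
For example, here is a hedged sketch (same placeholder paths as above) that enables the post-processing filter and removes phase shuffle:

export CUDA_VISIBLE_DEVICES="0"
python train_wavegan.py train ./train \
	--data_dir ./data/dir_with_longer_audio_files \
	--wavegan_genr_pp \
	--wavegan_disc_phaseshuffle 0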

Monitoring

To run a script that will dump a preview of fixed latent vectors at each checkpoint on the CPU

export CUDA_VISIBLE_DEVICES="-1"
python train_wavegan.py preview ./train

To back up checkpoints every hour (GAN training may occasionally collapse so it's good to have backups)

python backup.py ./train 60

To monitor training via tensorboard, use

tensorboard --logdir=./train

If you are training on the SC09 dataset, this command will (slowly) calculate inception score at each checkpoint

export CUDA_VISIBLE_DEVICES="-1"
python train_wavegan.py incept ./train

Train a SpecGAN

The primary focus of this repository is WaveGAN, our raw audio generation method. For comparison, we also include an implementation of SpecGAN, an approach to generating audio by applying image-generating GANs to image-like audio spectrograms. This implementation only generates spectrograms one second in length at 16kHz.

Before training a SpecGAN, we must first compute the mean and variance of each spectrogram bin to use for normalization. This may take a while (you can also measure these statistics on a subset of the data):

python train_specgan.py moments ./train \
	--data_dir ./data/dir_with_mp3s \
	--data_moments_fp ./train/moments.pkl

To begin (or resume) training on GPU:

python train_specgan.py train ./train \
	--data_dir ./data/dir_with_mp3s \
	--data_moments_fp ./train/moments.pkl

Monitoring

To run a script that will dump a preview of fixed latent vectors at each checkpoint on the CPU

export CUDA_VISIBLE_DEVICES="-1"
python train_specgan.py preview ./train \
	--data_moments_fp ./train/moments.pkl

To back up checkpoints every hour (GAN training will occasionally collapse)

python backup.py ./train 60

To monitor training via tensorboard, use

tensorboard --logdir=./train

If you are training on the SC09 dataset, this command will (slowly) calculate inception score at each checkpoint

export CUDA_VISIBLE_DEVICES="-1"
python train_specgan.py incept ./train \
	--data_moments_fp ./train/moments.pkl

Generation

The training scripts for both WaveGAN and SpecGAN create simple TensorFlow MetaGraphs for generating audio waveforms, located in the training directory. An example usage is below; see this Colab notebook for additional features.

import numpy as np
import tensorflow as tf
from IPython.display import display, Audio

# Load the graph
tf.reset_default_graph()
saver = tf.train.import_meta_graph('infer.meta')
graph = tf.get_default_graph()
sess = tf.InteractiveSession()
saver.restore(sess, 'model.ckpt')

# Create 50 random latent vectors z
_z = (np.random.rand(50, 100) * 2.) - 1

# Synthesize G(z)
z = graph.get_tensor_by_name('z:0')
G_z = graph.get_tensor_by_name('G_z:0')
_G_z = sess.run(G_z, {z: _z})

# Play audio in notebook
display(Audio(_G_z[0, :, 0], rate=16000))
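
If you want to save the generated batch to disk instead of (or in addition to) playing it in the notebook, here is a minimal sketch using scipy (already listed in the requirements); the 16-bit conversion is an assumption for producing standard PCM WAVs:

from scipy.io import wavfile
import numpy as np

# Write each generated waveform in the batch as a 16-bit PCM WAV at 16kHz
for i in range(_G_z.shape[0]):
    audio = np.clip(_G_z[i, :, 0], -1., 1.)
    wavfile.write('gen_{}.wav'.format(i), 16000, (audio * 32767.).astype(np.int16))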

Evaluation

Our paper uses Inception score to (roughly) measure model performance. If you plan to directly compare to our reported numbers, you should run this script on a directory of 50,000 16-bit PCM WAV files with 16384 samples each.

python score.py --audio_dir wavs

To reproduce our paper results (9.18 +- 0.04) for the SC09 (download) training dataset, run

python score.py --audio_dir sc09/train  --fix_length --n 18620

Web code

Under web, we also include a JavaScript implementation of WaveGAN (generation only). Using this implementation, we created a procedural drum machine powered by a WaveGAN trained on drum sound effects.

Attribution

If you use this code in your research, cite via the following BibTeX:

@inproceedings{donahue2019wavegan,
  title={Adversarial Audio Synthesis},
  author={Donahue, Chris and McAuley, Julian and Puckette, Miller},
  booktitle={ICLR},
  year={2019}
}

wavegan's People

Contributors

andimarafioti, chrisdonahue


wavegan's Issues

Samplerate questions

Hi!
I have a small question concerning the training data that can be used.
As I understand it, the default sample rate used by WaveGAN is 16kHz. Does this imply that I also have to convert the training data to a 16kHz sample rate? The Bach piano pieces, for example, have a sample rate of 44.1kHz.
Thank you for your reply!

Learned post-processing filters

Hello Chris,

I read your paper, which I liked a lot for its in-depth analysis and the clarity of the discussions.
The idea of using a learnable filter to adaptively remove artifacts is interesting; however, I cannot find it in the WaveGAN code.

Did I miss it? Or could you provide a code example for this part, please?

Thanks !

Checkpoints and global step

Hi! Interesting work!

I'm trying to train the WaveGAN using the speech dataset (SC09) you used here. I have a (possibly very naive) question.

I run

python train_wavegan.py train ./train --data_dir data

When I look at the outputs, I see that the checkpoints are enumerated by the global step (right?). Does the global step then correspond to the number of epochs the generator has been trained for?

For example: model.ckpt-497

Thank you!

Generating spectrograms using GANs

I would like to ask whether, when generating spectrograms using WaveGAN (and GANs in general), it is a common problem for high frequencies to be filtered out of the results.

Issue after shuffle buffer is filled

Hi,

I have been trying to reproduce the results of your paper on the datasets provided (drums, SC09, piano), but here is an issue I have been facing (screenshot attached): after the shuffle buffer is filled, everything hangs. I have left the algorithm running for 10 days and nothing has changed, nor has the process completed. I am using the following command to run the code:
"python3 wavegan/train_wavegan.py train ./train --data_dir sc09/train"
The versions of TensorFlow and the other libraries match what was recommended, and I am running it on an Nvidia Titan V GPU.

How can I verify that the training has been completed? since nothing is printed out to the screen as the model is training.

Any help would be appreciated.

Thanks


Raw Piano Files

Dear chrisdonahue,
Hi! Your WaveGAN work is quite impressive and gave me a lot of inspiration to explore beyond this model. But as I tried to rebuild your model from scratch in PyTorch according to your paper, something seemed to be wrong with my model. Therefore, I would like to ask whether you could provide the raw files of the piano training data, so that I can check whether my model's training method is wrong.
I would really appreciate it if you could do me this favor! Thanks in advance!

Error generating audio ("The operation, 'G_z_spec', does not exist in the graph.")

Hi @chrisdonahue,

when trying to generate audio from my model using the code from 'wavegan_generate.ipynb', I run into the following:
KeyError: "The name 'G_z_spec:0' refers to a Tensor which does not exist. The operation, 'G_z_spec', does not exist in the graph."

I proceeded like this:

# Load the model
import numpy as np
import tensorflow as tf

tf.reset_default_graph()
saver = tf.train.import_meta_graph('train/infer/infer.meta')
graph = tf.get_default_graph()
sess = tf.InteractiveSession()
saver.restore(sess, 'train/model.ckpt-9925')

which went fine: INFO:tensorflow:Restoring parameters from train/model.ckpt-9925

So I continued:

ngenerate = 64
ndisplay = 4
import PIL.Image
from IPython.display import display, Audio
import time as time

# Sample latent vectors
_z = (np.random.rand(ngenerate, 100) * 2.) - 1.

# Generate
z = graph.get_tensor_by_name('z:0')
G_z = graph.get_tensor_by_name('G_z:0')[:, :, 0]
G_z_spec = graph.get_tensor_by_name('G_z_spec:0')

At the last line it throws the cited error.

Thanks,
Joscha

Bias not found in Checkpoint error

An error occurs when running the train_specgan script on the SC09 dataset in train mode. The error persisted across different systems.

The execution was done as follows:
train_specgan -> train_wavegan (gives error)
train_specgan -> train_wavegan(create moments) -> train_wavegan (gives error)
Following is the traceback taken from PyCharm's console:

F:\Apps\Python3\python.exe C:/Users/admin/PycharmProjects/WaveGAN/train_specgan.py train ./train --data_dir ./sc09 --data_moments_fp ./train/moments.pkl
WARNING:tensorflow:From C:\Users\admin\PycharmProjects\WaveGAN\loader.py:59: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.batch(..., drop_remainder=True)`.
--------------------------------------------------------------------------------
Generator vars
[100, 16384] (1638400): G/z_project/dense/kernel:0
[16384] (16384): G/z_project/dense/bias:0
[5, 5, 512, 1024] (13107200): G/upconv_0/conv2d_transpose/kernel:0
[512] (512): G/upconv_0/conv2d_transpose/bias:0
[5, 5, 256, 512] (3276800): G/upconv_1/conv2d_transpose/kernel:0
[256] (256): G/upconv_1/conv2d_transpose/bias:0
[5, 5, 128, 256] (819200): G/upconv_2/conv2d_transpose/kernel:0
[128] (128): G/upconv_2/conv2d_transpose/bias:0
[5, 5, 64, 128] (204800): G/upconv_3/conv2d_transpose/kernel:0
[64] (64): G/upconv_3/conv2d_transpose/bias:0
[5, 5, 1, 64] (1600): G/upconv_4/conv2d_transpose/kernel:0
[1] (1): G/upconv_4/conv2d_transpose/bias:0
Total params: 19065345 (72.73 MB)
--------------------------------------------------------------------------------
Discriminator vars
[5, 5, 1, 64] (1600): D/downconv_0/conv2d/kernel:0
[64] (64): D/downconv_0/conv2d/bias:0
[5, 5, 64, 128] (204800): D/downconv_1/conv2d/kernel:0
[128] (128): D/downconv_1/conv2d/bias:0
[5, 5, 128, 256] (819200): D/downconv_2/conv2d/kernel:0
[256] (256): D/downconv_2/conv2d/bias:0
[5, 5, 256, 512] (3276800): D/downconv_3/conv2d/kernel:0
[512] (512): D/downconv_3/conv2d/bias:0
[5, 5, 512, 1024] (13107200): D/downconv_4/conv2d/kernel:0
[1024] (1024): D/downconv_4/conv2d/bias:0
[16384, 1] (16384): D/output/dense/kernel:0
[1] (1): D/output/dense/bias:0
Total params: 17427969 (66.48 MB)
--------------------------------------------------------------------------------
2018-09-26 23:24:45.642928: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-09-26 23:24:45.748523: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1275] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key D/downconv_0/conv2d/bias not found in checkpoint
Traceback (most recent call last):
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1278, in _do_call
    return fn(*args)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key D/downconv_0/conv2d/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 1725, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 877, in run
    run_metadata_ptr)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1272, in _do_run
    run_metadata)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key D/downconv_0/conv2d/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "C:/Users/admin/PycharmProjects/WaveGAN/train_specgan.py", line 693, in <module>
    train(fps, args)
  File "C:/Users/admin/PycharmProjects/WaveGAN/train_specgan.py", line 263, in train
    save_summaries_secs=args.train_summary_secs) as sess:
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 421, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 832, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 555, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 1018, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 1023, in _create_session
    return self._sess_creator.create_session()
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 712, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 474, in create_session
    self._scaffold.finalize()
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 212, in finalize
    self._saver = training_saver._get_saver_or_default()  # pylint: disable=protected-access
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 853, in _get_saver_or_default
    saver = Saver(sharded=True, allow_empty=True)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 1281, in __init__
    self.build()
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 1293, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 1330, in _build
    build_save=build_save, build_restore=build_restore)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 772, in _build_internal
    restore_sequentially, reshape)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 450, in _AddShardedRestoreOps
    name="restore_shard"))
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 397, in _AddRestoreOps
    restore_sequentially)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 829, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1546, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 3155, in create_op
    op_def=op_def)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Key D/downconv_0/conv2d/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 1737, in restore
    checkpointable.OBJECT_GRAPH_PROTO_KEY)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 348, in get_tensor
    status)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/admin/PycharmProjects/WaveGAN/train_specgan.py", line 693, in <module>
    train(fps, args)
  File "C:/Users/admin/PycharmProjects/WaveGAN/train_specgan.py", line 263, in train
    save_summaries_secs=args.train_summary_secs) as sess:
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 421, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 832, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 555, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 1018, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 1023, in _create_session
    return self._sess_creator.create_session()
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 712, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 483, in create_session
    init_fn=self._scaffold.init_fn)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\session_manager.py", line 281, in prepare_session
    config=config)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\session_manager.py", line 211, in _restore_checkpoint
    saver.restore(sess, ckpt.model_checkpoint_path)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 1743, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key D/downconv_0/conv2d/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "C:/Users/admin/PycharmProjects/WaveGAN/train_specgan.py", line 693, in <module>
    train(fps, args)
  File "C:/Users/admin/PycharmProjects/WaveGAN/train_specgan.py", line 263, in train
    save_summaries_secs=args.train_summary_secs) as sess:
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 421, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 832, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 555, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 1018, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 1023, in _create_session
    return self._sess_creator.create_session()
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 712, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 474, in create_session
    self._scaffold.finalize()
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\monitored_session.py", line 212, in finalize
    self._saver = training_saver._get_saver_or_default()  # pylint: disable=protected-access
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 853, in _get_saver_or_default
    saver = Saver(sharded=True, allow_empty=True)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 1281, in __init__
    self.build()
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 1293, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 1330, in _build
    build_save=build_save, build_restore=build_restore)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 772, in _build_internal
    restore_sequentially, reshape)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 450, in _AddShardedRestoreOps
    name="restore_shard"))
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 397, in _AddRestoreOps
    restore_sequentially)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\saver.py", line 829, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1546, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 3155, in create_op
    op_def=op_def)
  File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key D/downconv_0/conv2d/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]


Process finished with exit code 1

Dataset creation problem

I have been trying to create the dataset in the format required for training using the command you provided. I have WAV files and I changed the ext option of the command to wave, but the tfrecord files created still seem to be empty, and when I train, TensorFlow says we are trying to run infinitely on an empty dataset.

Not an issue, just sharing results of my exploration.

I trained a model using the NIN Ghosts album. I was aiming for an aggressive wall-of-noise sound when I chose Ghosts, and I think the model learned exactly what I had hoped it would. This is an interesting approach to sound design; it's almost like sampling (sound selection), but the outcome is more chaotic. There are some sounds that I would never have thought up and others that would have taken a very long time to program.
https://archive.org/details/nineinchnails_ghosts_I_IV

The raw audio wasn't great, but after some sculpting using compressors, reverb, EQ, distortion, rhythmic tools (gates, filters, etc.) and synthesis, I got some interesting sounds. For this demo, I'm using 5 sounds that are either 1 or 4 seconds long (I trained two different models). I rarely have more than 2 sounds playing at once. The kick drum was created using a VST; that's not coming from the model.
https://soundcloud.com/dustin-williams-25/gan-ghosts-sound-design-demo

Here are the raw, unprocessed WAVs. I'm going to make some changes and retrain a new model. I want to see what I can do to improve the sound quality.
https://drive.google.com/open?id=1PqoxnyIFmutJsyD6KGu_GqIebP4yQkGV

How to set up?

This is a really interesting project! I'm learning how to use Tensorflow from a Data Engineer's perspective. I'm mostly focused on scaling up training and how to deploy models to Google ML Engine for serving. So please forgive me if I'm missing anything obvious.

I've been trying to run it for the past few days but the results have been mixed. My goal isn't to produce anything realistic, exactly the opposite. I want machine generated sounds that are unique to the model. I really like challenging experimental sounds. Ultimately I want to produce some waveforms that I can load in to a wavetable synth. I'll use the synth engine to mold them into something more refined (pads, drones, FX hits, etc).

On the first attempt, I used the default 1-second length and that gave some interesting results. I fed it 1400 random songs (thanks, Internet Archive!) and after a few thousand (epochs?) it started producing proto-sounds. So I decided to bump the length up to 4 seconds (then back down to 2). I initially used 3000 songs; that experiment didn't seem to go well. I let the training run for 1600+ cycles, but the model produced almost a steady tone, nothing interesting.

For my second attempt, I used 32 songs that are more abstract and atmospheric: tonal and a little rhythmic, but mostly just interesting sounds. After about 800 (epochs/cycles?), I tested the model and it produced a steady tone, same as before. I'm wondering if perhaps I didn't let it train long enough before checking the model. I'm a bit unclear on how to best use this script, so I wanted to kill it as quickly as possible and fix my config. That might be a mistake due to a lack of understanding.

python wavegan/train_wavegan.py train ./train --data_dir /home/jupyter/ghosts --data_normalize --wavegan_genr_pp --data_slice_len 65536 --data_overlap_ratio .2 --train_save_secs 300 --wavegan_batchnorm

I'm running a third attempt now. Same 32 songs but only 2 seconds this time.
python wavegan/train_wavegan.py train ./train --data_dir /home/jupyter/ghosts --data_normalize --data_slice_len 32768 --data_overlap_ratio .2 --train_save_secs 300

I have a few questions;

In your example you're using a much more focused set (person talking, drum hits, pianos). What are your thoughts on processing long complex sounds like a song? With the assumption that the model doesn't have to produce anything that would pass as human made.

In your example data you have directories for test/train/valid. When I try to create that structure it gives me error messages and crashes. As far as I can tell it wants data_dir to contain wav/mp3 files. So does the script break the long wav/mp3 into slices and then split those into train/test/valid sets in memory?

How many epochs/cycles does this script run for? I let it run for a couple of days on my first run and I killed it when I thought I had confirmed it worked.

In one of the other questions I read: "you will likely need to reduce the value of dim_mul or train_batch_size to ensure that the model still fits into memory." The Google DataLab VM instance I'm using has 8 CPUs/30GB RAM/1x K80 GPU, and I'm not getting any complaints/errors about memory. Is it OK to assume that this configuration works, even when generating longer sounds (2/4 seconds)?

The last issue is a technical one. It looks like this script is hard-coded to CUDA 9. Any idea what I'd need to do to get it working on the latest drivers? In DataLab I lose the ability to access TensorBoard when I use an older version of CUDA.

print() is a function in Python 3

flake8 testing of https://github.com/chrisdonahue/wavegan on Python 3.6.3

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./backup.py:19:40: E999 SyntaxError: invalid syntax
    print 'Waiting for first checkpoint'
                                       ^
./train_specgan.py:112:11: E999 SyntaxError: invalid syntax
  print '-' * 80
          ^
./train_wavegan.py:39:11: E999 SyntaxError: invalid syntax
  print '-' * 80
          ^
./data/count_tfrecord.py:40:9: E999 SyntaxError: invalid syntax
  print n
        ^
./data/preview_tfrecord.py:50:13: E999 SyntaxError: invalid syntax
    print '-' * 80
            ^
./data/ljspeech/split.py:36:15: E999 SyntaxError: invalid syntax
      print '-' * 80
              ^
./data/sc09/split.py:26:15: E999 SyntaxError: invalid syntax
      print '-' * 80
              ^
./eval/inception/score.py:147:35: E999 SyntaxError: invalid syntax
  print 'Inception score: {} +- {}'.format(mean, std)
                                  ^
./eval/inception/train.py:168:27: E999 SyntaxError: invalid syntax
        print 'Preview: {}'.format(latest_ckpt_fp)
                          ^
./eval/noise/noise.py:56:18: E999 SyntaxError: invalid syntax
  print '{} +- {}'.format(np.mean(X_weighted_mean), np.std(X_weighted_mean))
                 ^
./eval/similarity/sim.py:14:24: E999 SyntaxError: invalid syntax
  print 'Creating model'
                       ^
./web/bundle.py:19:16: E999 SyntaxError: invalid syntax
  print '{}->{}'.format(path, out_path)
               ^
12    E999 SyntaxError: invalid syntax
12

Training the new wavegan on speech dataset

Hi Chris,

I wanted to train the model with the speech dataset you used. I would like to look at the latent space properties.

I already trained the model with the old version and I had no issues in training. On the other hand, the .pkl files I got are not decodable using a pickle open/load.

Since I trained on another dataset with the new version and I could decode the latent vectors, I would like to use the speech dataset. I run into the error "audioread.NoBackendError". Have you tried this training and/or experienced problems while reading the dataset? Should I add some particular option?

Thanks a lot for the help!

Scipy cannot resample audio: is there any preprocessing required for WAV files? (I did use --data_fast_wav as suggested in the documentation)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train_wavegan.py", line 650, in
train(fps, args)
File "train_wavegan.py", line 200, in train
sess.run(D_train_op)
File "/home/s1878561/.local/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 676, in run
run_metadata=run_metadata)
File "/home/s1878561/.local/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1171, in run
run_metadata=run_metadata)
File "/home/s1878561/.local/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1270, in run
raise six.reraise(*original_exc_info)
File "/opt/anaconda3/lib/python3.7/site-packages/six.py", line 693, in reraise
raise value
File "/home/s1878561/.local/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
return self._sess.run(*args, **kwargs)
File "/home/s1878561/.local/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1327, in run
run_metadata=run_metadata)
File "/home/s1878561/.local/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1091, in run
return self._sess.run(*args, **kwargs)
File "/home/s1878561/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/s1878561/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/s1878561/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/s1878561/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnimplementedError: NotImplementedError: Scipy cannot resample audio.
Traceback (most recent call last):

File "/home/s1878561/.local/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 207, in call
ret = func(*args)

File "/home/s1878561/wavegan/loader.py", line 128, in
fast_wav=decode_fast_wav)

File "/home/s1878561/wavegan/loader.py", line 25, in decode_audio
raise NotImplementedError('Scipy cannot resample audio.')

NotImplementedError: Scipy cannot resample audio.

Generate sound of more than 1 second as output

Hello @chrisdonahue, this is a really amazing library as well as an implementation of GANs for audio.
I've been successful in generating piano sounds with the dataset and model provided, but the output is 1 second long. Could we increase this duration? Please let me know. Thanks!

train_wavegan.py freezes at line 202

When I ran python train_wavegan.py train ./train_tmp --data_dir ./data/sc09/train/, it freezes at line 202, and ctrl-c won't kill it. Any ideas what might be happening here? Our conjecture is that there might be some issue with the data loader, and we were wondering if anybody else has run into this and may have a fix? Thanks!

Tested on both Ubuntu Linux and CentOS, with Python 3.5 and 3.6. Tested GPUs are a Tesla V100 and a P4000. All required packages are the same as written in the README. All of the configs above freeze at the same point.

Sample of D_loss and G_loss TensorFlow graphs

Hi @chrisdonahue,

Could you kindly share the D_loss and G_loss TensorFlow graphs from the end of the WaveGAN training for the experiment you mention in your paper? I would like to verify my experiment outcomes against yours.

Thanks in advance

About the stopping criterion you mentioned in other issues

Hi,

Thanks for your excellent work. I am trying to train the model; however, I found that you didn't implement early stopping (based on inception score). Can you explain a little bit about that? For example, how many samples should I use to compute the inception score?

Thanks!

Error in training

Hi
I am having trouble training with my data.
I successfully created a tfrecord file:
screenshot from 2018-07-04 22 46 30

but when I use this data to train the GAN, it has this problem:

screenshot from 2018-07-02 10-51-11

Generate 8-second samples

Hi Chris,
I'd like to generate 8-second samples with your model. Would you be so kind as to give any hints on how to modify the model?

Batch training

Hi @chrisdonahue,

I am trying to apply my sound files to your WaveGAN code.
However, I can't tell whether this code has a batch training block.
In train_wavegan.py, I can find x = loader.decode_extract_and_batch(...).
It returns a dataset iterator, and x is called only once.
In my opinion, this means that the batch size represents the total number of training examples.
I expected code like the snippet below, but could not find it.

dataset = tf.data.Dataset(...)
dataset = dataset.batch(batch_size)
iterator = dataset.make_one_shot_iterator()

x = iterator.get_next()

while True:
    try:
        print(sess.run(x), end=' ')  
    except tf.errors.OutOfRangeError: 
        print('end\n')
        break

Thanks.

Utilizing multiple GPUs

Hey -

I currently have a big AWS EC2 instance with multiple GPUs. My biggest workflow bottleneck whilst experimenting with this is training time, and since I should be able to speed it up 8x, I was just wondering whether you know what the biggest hurdle would be to get it running on multiple GPUs. Cheers.

Issue with moments.pkl

Hi,

I set everything up correctly but I'm not able to run the SpecGAN scripts. I used the command that was given in the help file but I'm not able to generate anything. The program gives me an error for data_moments_fp.

Thanks

Recommended number of training steps to achieve example results

Hi!

I'm really excited about this implementation of generative audio and have just started training on my gaming GTX 1060 laptop.

My focus is to generate different WaveGANs using datasets of different folk musics from Africa and Latin America.

I'm also really interested in applications for live music, as I will be trying to generate sounds offline (maybe also in real time?) for a spatial piece, and I would like to know whether I should use a cloud service to generate my different WaveGAN checkpoints.

Around what training step did the example models start generating the posted results?

Thanks and looking forward how this develops!

Update: I'm at roughly step 20k and the results are coming through quite nicely! Will post some audio soon

Question: Using reflection padding for phase shuffle?

Hi, I'm an undergrad student, and recently I've been studying your paper and code: ADVERSARIAL AUDIO SYNTHESIS.

I am opening this issue out of curiosity about your implementation of the function apply_phaseshuffle(x, rad, pad_type='reflect').

so, you used 'reflect' mode for tf.pad function,

which means, for example, for an original array [1 2 3 4 5 6 7],

it results in [4 3 2 1 2 3 4 5 6 7] for phase = 3 ==> pad_l = 3, pad_r = 0.

Then we crop only the part [4 3 2 1 2 3 4] as the final result of phase shuffling.

At this point, these are my questions:

First, is my understanding of your phase shuffling correct?

If it is correct, I guess it means we totally lose some data (5 6 7) from the original array...

Doesn't this cause any problems? T-T
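
For reference, here is a small NumPy sketch (my own illustration of the operation described above, not code taken from the repository):

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7])
phase = 3                                    # in WaveGAN the shift is drawn uniformly from [-rad, rad]
pad_l, pad_r = max(phase, 0), max(-phase, 0)

# Reflect-pad, then crop back to the original length
padded = np.pad(x, (pad_l, pad_r), mode='reflect')   # [4 3 2 1 2 3 4 5 6 7]
shuffled = padded[pad_r:pad_r + len(x)]              # [4 3 2 1 2 3 4]
print(padded)
print(shuffled)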

parameters

Hi, I am working on audio WAV files at 44100Hz. Can you please suggest which other settings need to be updated?
I have only two classes. I have made a dataset directory and put the WAV files in it.

First, I changed the sample rate to 44100Hz; this works with --data_slice_len 16384, which is the default I think.
TensorBoard shows output: training starts, and scalars and other things are visible.

I also tried changing --data_slice_len to 32768 or 65536 (it's OK if the output file is not exactly 1 second; it can be longer).
But then TensorBoard doesn't show output: there are no scalars, and only the graph and projector options are available.

Can anyone help me with this? It's urgent.

Cannot create a tfrecord file

Hi! This is yuu.

I'm trying to use WaveGAN to learn from voice samples, but I can't create a tfrecord file with make_tfrecord.py. I get an error like the one in the attached screenshot.

What can I do?
Please help me!

Progress of generated sounds

Thank you so much for an amazing paper and repository.

I have built a WaveGAN model in PyTorch (https://github.com/lukysummer/WaveGAN-in-PyTorch), and I am currently training it on the articulation of the number "7", so I am just using all the "Seven_*.wav" files in sc09/ as training inputs. I am currently at around epoch 30, having sampled a generated audio clip every 5 epochs. But the pattern I am finding is that all six generated clips so far show a very fast, spiky repetition of drum-like sounds, such as in this waveform:

image

Did this ever happen early in your training stage? If not, did you see any particular pattern building up in the generated audio (output of Generator) as training progressed?

Also, would the Discriminator and Generator's weights have to be initialized at all?

Thank you so much.

data_sample_rate

I use "train_wavegan" to train and see your previous reply to modify the length of the output file.

Use "--data_sample_rate 4 " but jump out "ZeroDivisionError : integer division by Zero"

How can I fix it? Which is the problem?

Error in training SpecGAN

I am facing the following error when training the SpecGAN on the SC09 dataset:

Traceback (most recent call last):
File "wavegan/train_specgan.py", line 750, in
moments(fps, args)
File "wavegan/train_specgan.py", line 607, in moments
_X_lmags = np.concatenate(_X_lmags, axis=0)
ValueError: need at least one array to concatenate

Any tips on how it can be resolved?

Generating very large audio dataset

Hi @chrisdonahue,

I am trying to generate a very large dataset (1M audio clips) from a trained WaveGAN model. I was able to incorporate a for loop to generate 4 audio clips; however, I have noticed that when I increase that to a million, I get an error related to this line specifically in generate.py:
_z = (np.random.rand(64, 100) * 2.) - 1.

Any tips on how to resolve this issue?

Thanks,
Sara

Full generate.py code with my modification for your reference:

import argparse
import glob
import sys
import os
import librosa
import numpy as np
import tensorflow as tf


def get_arguments():
  parser = argparse.ArgumentParser(description='WaveGan generation script')
  parser.add_argument(
        'checkpoint', type=str, help='Which model checkpoint to generate from e.g. "(fullpath)/model.ckpt-XXX"')
  parser.add_argument('--train_dir', type=str, help='Training directory')
  parser.add_argument('--wav_out_path', type=str, help='Path to output wav file')
  arguments = parser.parse_args()

  return arguments


def main():
  args = get_arguments()
  infer_dir = os.path.join(args.train_dir, 'infer')
  infer_metagraph_fp = os.path.join(infer_dir, 'infer.meta')
  tf.reset_default_graph()
  saver = tf.train.import_meta_graph(infer_metagraph_fp)
  graph = tf.get_default_graph()
  sess = tf.InteractiveSession()
  saver.restore(sess, args.checkpoint)
  _z = (np.random.rand(64, 100) * 2.) - 1.
  z = graph.get_tensor_by_name('z:0')
  G_z = graph.get_tensor_by_name('G_z:0')[:, :, 0]
  waveform = sess.run(G_z, {z: _z})
  ndisplay= 64
  for i in range(ndisplay):
   librosa.output.write_wav(args.wav_out_path+str(i), waveform[i], 16000)
  sess.close()

  print('Finished generating.')


if __name__ == '__main__':
  main()
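
A hypothetical sketch of one workaround (not code from the repository): generate the clips in batches inside main(), reusing the sess, z, G_z, and args names defined above, so that a million latent vectors and waveforms are never allocated at once.

  # Hypothetical batched generation loop (assumes the sess/z/G_z/args defined above)
  batch_size = 64
  total = 1000000
  written = 0
  while written < total:
    n = min(batch_size, total - written)
    _z = (np.random.rand(n, 100) * 2.) - 1.
    waveform = sess.run(G_z, {z: _z})
    for i in range(n):
      librosa.output.write_wav(args.wav_out_path + str(written + i), waveform[i], 16000)
    written += n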

errata

Hi Chris,

I have checked the beautiful piano sound generation using the WaveGAN model based on your piano tfrecords. By the way, in the Colab code you miswrote the name of a tensor: 'G_z_spec:0' should be changed to 'global_step:0'.

  • Namshik

About the seed

Hi,

I already asked you about some details, but I have some other questions. I am pretty new to TensorFlow, so this could be related to that.

  1. Is the same seed used throughout all of training?
  2. In preview mode, I see that the first time the preview is generated, since there's no z, it is defined from the infer metagraph. Is it then a z that has been used during training?

Just wondering: if my dataset is composed of WAV audio files with a duration > 1 s, should I be aware of anything in particular? For example, some parameters (beyond the WAV format options that I have already mastered and applied) that might need to be changed...

About the TIMIT dataset

How do you preprocess the TIMIT dataset? Still by cutting it into one-second pieces? I notice that the audio clips in the TIMIT dataset are 1-2 seconds long; if a clip is 1.2s long and you cut it into one-second pieces, then the second piece contains only 0.2s. Is that proper?

Issue with training flag --data_prefetch_gpu_num -1

I was only able to run the WaveGAN training without errors using the flag --data_prefetch_gpu_num -1 (python3 train_wavegan.py train ./train --data_dir ./Data/data_to_gan --data_prefetch_gpu_num -1); otherwise I get the following error:
"wavegan/loader.py", line 192, in decode_extract_and_batch
tf.data.experimental.prefetch_to_device(
AttributeError: module 'tensorflow.data' has no attribute 'experimental'
I have tensorflow-gpu installed; since the flag --data_prefetch_gpu_num -1 is enabled, does that mean training is running on the CPU or the GPU?
Thanks

Speed and Training duration Issue for Piano and Drums Datasets - Observation

Hi Chris,

I have been testing WaveGAN on the given datasets, namely Piano and Drums. I have noticed that, although the piano dataset has inconsistent, longer audio clips with the specs shown below, the algorithm runs very fast on my Titan V setup, reaching 200k training steps in 2 days. However, for the drums dataset, where the audio files are all 1 second long with the specs shown below, it takes literally days to fill the buffer and train (2 days for only 16 steps). Hence, I was wondering whether this is an issue in the v2 version of the code and how it can be resolved.

Thanks,
Sara
Screenshot 2019-04-22 at 16 09 07
Screenshot 2019-04-22 at 16 13 57

Generate a preview while training

Hi,
This might be more of a TensorFlow-related question. I know that it is already possible to generate a preview while training (just by running the two modes at the same time). For reasons related to the framework (a server) where I am working, I'm trying to integrate the preview into the training, but I'm facing some difficulties. This might be due to how the training section is defined. If I actually try to generate a preview each time the generator is trained, it complains (and rightly so; indeed, the checkpoints are saved based on time, every N seconds). So I'm trying to modify the training (or add some option to do that); have you already done this?
Thanks

What is the meaning of "phase" in this context ?

Hi,

I have read the paper as well as the implementation of your interesting WaveGAN. However, I still don't know what the term "phase" refers to when you introduce the idea of phase shuffling.

According to the implementation of the function apply_phaseshuffle, I think the input signal x is padded by a random number of samples (up to rad) along the sequence dimension. Then the signal is sliced to have length x_len.

What is the relation between this operation and the "phase" in the name "phase shuffling"? And how does this reduce the upsampling artifacts?

I have implemented the whole network except this utility, and my generated signals still have obvious background noise (I am training on part of the SC09 data to generate the word "three" only and have currently reached epoch 1500)... Could this background noise be decreased by adding phase shuffling, or is it irrelevant to the upsampling artifacts?

Thanks in advance

Does the algorithm stop training on its own?

Hi there, I have been training a WaveGAN model for about 6 hours now, using 11 short WAV files.
I have been following the training process using TensorBoard and things look good. However, I am unclear about how long the training process will take and how I would know when it has finished.

Thanks for your help!

[SpecGAN] error generating moments

Hi there,
I tried to generate the moments for SpecGAN using the script:

python train_specgan.py moments --data_dir ../sc09/train/ ./train

where '../sc09/train' is a folder containing the training split of the SC09 dataset (all .wav files). The script failed with this error:

Traceback (most recent call last):
  File "train_specgan.py", line 756, in <module>
    moments(fps, args)
  File "train_specgan.py", line 594, in moments
    prefetch_gpu_num=args.data_prefetch_gpu_num)[0, :, 0, 0]
  File "/raid/home/ntr/wavegan/wavegan/loader.py", line 198, in decode_extract_and_batch
    return iterator.get_next()
  File "/raid/home/ntr/anaconda3/envs/py3tf1.2/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 623, in get_next
    return self._next_internal()
  File "/raid/home/ntr/anaconda3/envs/py3tf1.2/lib/python3.6/site-packages/tensorflow/python/data/experimental/ops/prefetching_ops.py", line 259, in _next_internal
    output_types=self._flat_output_types)
  File "/raid/home/ntr/anaconda3/envs/py3tf1.2/lib/python3.6/site-packages/tensorflow/python/ops/gen_experimental_dataset_ops.py", line 460, in experimental_function_buffering_resource_get_next
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
         [[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[[1,16384,1,1]], output_types=[DT_FLOAT]](IteratorFromStringHandleV2)]] [Op:ExperimentalFunctionBufferingResourceGetNext]

Is there anything I did wrong? Please help me!

Thanks in advance!

the new Generation code

Hi @chrisdonahue,

I used the v2 version of the trained model.

Following the Generation code in the README, I find that there is an error.

The error message is:
"The name 'z:0' refers to a Tensor which does not exist. The operation, 'z', does not exist in the graph."

What do I need to modify in "z = graph.get_tensor_by_name('z:0')"?

Please tell me the answer; thank you.

Issue generating tfrecords

I've had trouble generating proper tfrecords using make_tfrecords.py.

The main issue arises from trying to create tfrecords that can properly be previewed and tfrecords that can be trained.
I have installed all the prereqs like ffmpeg, tensorflow 1.9.0 (Testing on CPU for now), python 3, etc.
The model looks like it trains normally on the tfrecords that you provide through your download link, but not on those created by the script. I tried testing by downloading the piano WAV files, generating tfrecords, and then training on them; however, the model finishes training within a minute. I compared this to training on the records you provide for the piano samples, and there the model trains for the appropriate amount of time.

I have tried commenting out the audio_labels and modifying the script. Currently I get this error when I try to run the script without modification:
Traceback (most recent call last): File "data/make_tfrecord.py", line 121, in <module> 'id': tf.train.Feature(bytes_list=tf.train.BytesList(value=audio_id)), TypeError: 'U' has type str, but expected one of: bytes

Would anyone know how to solve this issue?

Rewriting your code in Keras

Hi Chris, I'm trying to rewrite WaveGAN in Keras. Can you tell me where I might be going wrong with the dimensions here?

Generator:

def defineGen(Gin, d = 1, lr = 1e-3):
    
    shapes = [d*x for x in [256,16,8,4,2,1]]

    x = Dense(shapes[0])(Gin)
    x = Reshape((1,16,16))(x)
    x = Activation('relu')(x)

    x = Conv2DTranspose(25,(shapes[1],shapes[2]),padding='same')(x)
    x = Activation('relu')(x)
    
    x = Conv2DTranspose(25,(shapes[2],shapes[3]),padding='same')(x)
    x = Activation('relu')(x)
    
    x = Conv2DTranspose(25,(shapes[3],shapes[4]),padding='same')(x)
    x = Activation('relu')(x)
    
    x = Conv2DTranspose(25,(shapes[4],shapes[5]),padding='same')(x)
    x = Activation('relu')(x)
    
    x = Conv2DTranspose(25,(shapes[5],1),padding='same')(x)
    G_out = Activation('tanh')(x)
    
    G = Model(inputs=[Gin],outputs=G_out)
    optimizer = SGD(lr =lr)
    
    G.compile(loss = 'binary_crossentropy',optimizer=optimizer)
    
    return G, G_out

G_in1 = Input(shape=[None,100])
G, G_out = defineGen(G_in1)
G.summary()

Discriminator:

def defineDisc(Din, d = 1, lr = 1e-3):
    shapes = [d*x for x in [1,2,4,8,16]]
    
    x = Conv1D(25,kernel_size=(shapes[0]))(Din)
    x = LeakyReLU(alpha=0.1)(x)
    
    # phase shuffle - not implemented yet
    
    x = Conv1D(25,(shapes[1]),strides=4)(x)
    x = LeakyReLU(alpha=0.1)(x)
    
    # phase shuffle - not implemented yet
    
    x = Conv1D(25,(shapes[2]),strides=4)(x)
    x = LeakyReLU(alpha=0.1)(x)
    
    # phase shuffle - not implemented yet
    
    x = Conv1D(25,(shapes[3]),strides=4)(x)
    x = LeakyReLU(alpha=0.1)(x)
    
    # phase shuffle - not implemented yet
    
    x = Conv1D(25,(shapes[4]),strides=4)(x)
    x = LeakyReLU(alpha=0.1)(x)
    
    x = Reshape((256))(x)
    
    Dout = Dense(256)(x)
    
    D = Model(inputs=[Din],outputs = Dout)
    D.compile(loss="binary_crossentropy", optimizer=dopt)
    
    return D, Dout

Din = Input(shape=[16384])
D, D_out = defineDisc(Din)
D.summary()
