magenta / ddsp
DDSP: Differentiable Digital Signal Processing
Home Page: https://magenta.tensorflow.org/ddsp
License: Apache License 2.0
Thank you for your great work.
I want to train on a GPU, but I get an assertion error in 3_training.ipynb:
# Setup the session.
import os
assert "COLAB_TPU_ADDR" in os.environ, "ERROR: Not connected to a TPU runtime; please set the runtime type to 'TPU'."
TPU_ADDRESS = "grpc://" + os.environ["COLAB_TPU_ADDR"]
sess = tf.Session(TPU_ADDRESS)
How can I train DDSP on a GPU?
At the end of the call function of the Autoencoder class in ddsp/training/models.py, we have these lines, which clearly feed loss_obj with the target audio and the synthesized audio, in that order:
if training:
  for loss_obj in self.loss_objs:
    self.add_loss(loss_obj(features['audio'], audio_gen))
However, in ddsp/losses.py, all the implemented loss functions have the following signature:
def call(self, audio, target_audio):
which means that they expect the synthesized audio first and the target audio second.
So the arguments are passed in reverse order. For most loss functions this makes no difference, which is why it has gone unnoticed until now. It is still a bug, though.
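To illustrate why the order only matters for asymmetric losses, here is a minimal NumPy sketch. Both functions are hypothetical stand-ins, not the losses in ddsp/losses.py: an L1 distance, which is symmetric, and a relative error normalized by its second argument, which is not.

```python
import numpy as np


def l1_loss(audio, target_audio):
    """Symmetric: swapping the arguments gives the same value."""
    return float(np.mean(np.abs(audio - target_audio)))


def relative_loss(audio, target_audio, eps=1e-7):
    """Asymmetric: the error is normalized by the magnitude of the target."""
    error = np.abs(audio - target_audio)
    return float(np.mean(error / (np.abs(target_audio) + eps)))


# Toy signals (illustrative values only).
target = np.array([1.0, 2.0, 4.0])
generated = np.array([0.5, 1.0, 2.0])

# Compare each loss with its arguments in both orders.
symmetric_same = l1_loss(generated, target) == l1_loss(target, generated)
asymmetric_same = (
    relative_loss(generated, target) == relative_loss(target, generated))
print(symmetric_same, asymmetric_same)
```

With the symmetric loss the swapped call is harmless; with the asymmetric one the two orders give different values, so the bug would silently change what is being optimized.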
Hi, DDSP seems quite helpful for my project, but I'm using PyTorch rather than TensorFlow. Will it work with PyTorch?
It's truly amazing work, and thanks for the code! I'd like to reproduce the autoencoder demo model you described in the paper. Can I find the violin audio sample anywhere?
Since MIDI pitches essentially correspond to the fundamental frequencies of instrumental notes, it is intuitive to condition f0 on MIDI pitches. I even see that you mention the MIDI data in the NSynth dataset. However, I never found where the MIDI data is used. So,
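For readers unfamiliar with the mapping mentioned above, this is the standard equal-temperament conversion between MIDI note numbers and frequency (A4 = MIDI note 69 = 440 Hz). It is a minimal NumPy sketch; the function names are chosen for illustration and are not meant to describe how the library handles MIDI.

```python
import numpy as np


def midi_to_hz(notes):
    """Convert MIDI note numbers to frequencies in Hz (A4 = 69 = 440 Hz)."""
    notes = np.asarray(notes, dtype=np.float64)
    return 440.0 * 2.0 ** ((notes - 69.0) / 12.0)


def hz_to_midi(frequencies):
    """Inverse mapping: frequencies in Hz to (fractional) MIDI note numbers."""
    frequencies = np.asarray(frequencies, dtype=np.float64)
    return 69.0 + 12.0 * np.log2(frequencies / 440.0)


# A3, A4 and A5 are one octave apart, so each frequency doubles.
f0_hz = midi_to_hz([57, 69, 81])
print(f0_hz)
```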
The Audio Examples link in the timbre_transfer demo is invalid.
Hi, I tried using various 'example_secs' values for ddsp_prepare_tfrecord to find a way to resynthesise complete audio recordings having different lengths. Basically I am trying to train the autoencoder with a collection of train files and then feed test audio files to see how well the autoencoder can perform copy-synthesis with unseen data.
Using the default values:
--example_secs=4
--sliding_window_hop_secs=1
the process finishes without problems, but of course it produces 4-second chunks of data.
I tried with various other settings such as:
--example_secs=0
(to keep entire file)
--example_secs=8 \
--sliding_window_hop_secs=2 \
etc...
When creating the TFRecord file, no error message is printed and the file is created as expected.
However, when trying to read the TFRecord file that was produced, all settings other than the defaults fail with the following error:
InvalidArgumentError: Key: f0_confidence. Can't parse serialized Example.
[[{{node ParseSingleExample/ParseExample/ParseExampleV2}}]]
Any suggestions for keeping the original length of the data when packing it into TFRecord format?
In core.frequency_impulse_response() and core.frequency_filter(), the documentation states:
The frequencies of the last dimension are ordered as [0, f_nyqist / (n_frames -1), ..., f_nyquist], here f_nyquist is (sample_rate / 2).
Shouldn't this be f_nyqist / (n_frequencies -1)?
It didn't make sense to me signal-processing-wise, so I think it's a typo?
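A quick NumPy check of the spacing argument: if the last dimension holds n_frequencies values evenly spaced from 0 to f_nyquist inclusive, the step between bins is f_nyquist / (n_frequencies - 1), independent of the number of frames. The values of sample_rate and n_frequencies below are arbitrary assumptions for illustration.

```python
import numpy as np

sample_rate = 16000   # assumed sample rate
n_frequencies = 5     # assumed size of the last (frequency) dimension
f_nyquist = sample_rate / 2

# Evenly spaced bin centers from 0 Hz up to and including Nyquist.
bin_frequencies = np.linspace(0.0, f_nyquist, n_frequencies)
spacing = bin_frequencies[1] - bin_frequencies[0]

print(bin_frequencies)
print(spacing, f_nyquist / (n_frequencies - 1))
```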
The latest crepe (==0.0.10) has a bug in crepe.predict, and ddsp uses it to calculate f0.
Related issue: marl/crepe#49
Where used:
Lines 267 to 273 in 4472de0
I noticed that the f0 in gansynth_subset, provided via tensorflow_datasets, was calculated with this bug. Fortunately, the impact of the bug is not very large. I've checked some examples (not all), and about 1.2% of the values in each example are wrong.
Hi. What license is assigned to this project?
When I run this code:
data_provider = ddsp.training.data.TFRecordProvider(TRAIN_TFRECORD_FILEPATTERN)
dataset = data_provider.get_dataset(shuffle=False)
ex = next(iter(dataset))
I get an OutOfRangeError:
---------------------------------------------------------------------------
OutOfRangeError Traceback (most recent call last)
/tensorflow-2.1.0/python3.6/tensorflow_core/python/eager/context.py in execution_mode(mode)
1896 ctx.executor = executor_new
-> 1897 yield
1898 finally:
11 frames
OutOfRangeError: End of sequence [Op:IteratorGetNextSync]
During handling of the above exception, another exception occurred:
OutOfRangeError Traceback (most recent call last)
OutOfRangeError: End of sequence
During handling of the above exception, another exception occurred:
StopIteration Traceback (most recent call last)
/tensorflow-2.1.0/python3.6/tensorflow_core/python/data/ops/iterator_ops.py in next(self)
674 return self._next_internal()
675 except errors.OutOfRangeError:
--> 676 raise StopIteration
677
678 @property
StopIteration:
I don't know if this is because there is too little training data; I only uploaded one file for training.
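The traceback ends in StopIteration, which is what next(iter(...)) raises when the iterator yields no elements at all. Here is a minimal sketch of a guard that makes the empty case explicit. It uses a plain Python list as a stand-in for the dataset (an assumption, so that the example runs without TensorFlow), and first_example_or_none is a hypothetical helper.

```python
def first_example_or_none(dataset):
    """Return the first element of an iterable, or None if it yields nothing."""
    return next(iter(dataset), None)


# Stand-in for a dataset that produces no examples.
empty_dataset = []

ex = first_example_or_none(empty_dataset)
if ex is None:
    message = "Dataset is empty: check the file pattern and the records written."
else:
    message = "Got one example."
print(message)
```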
The file https://colab.research.google.com/github/magenta/ddsp/blob/master/ddsp/colab/tutorials/1_synth_and_effects.ipynb cannot be accessed.
Hi, I have been using the DDSP autoencoder for audio-to-audio reconstruction, and it works really well. But when I change frame_rate from 250 to 50/100/200 and example_secs to 2/4, the reconstruction makes no sense at all, which seems very strange to me. Shouldn't DDSP be robust to frame_rate and example_secs? I can see the loss decreasing, but the spectrogram and audio reconstructions are not good. Previously, with a frame rate of 250 and 4-second examples, I got a correlation coefficient above 0.8; now the correlation between the ground-truth and reconstructed spectrograms is 0.
I attached the reconstructed result: https://drive.google.com/file/d/1E3IMQHnQQpYT_uuRQGgdEjvbptXmCEBF/view?usp=sharing
It would be great if you could give me a hint as to why frame_rate and example_secs are so important. Or are there other parameters I have overlooked that should also be changed?
--gin_param="TFRecordProvider.frame_rate = 200" \
--gin_param='TFRecordProvider.example_secs = 2' \
--gin_param='DefaultPreprocessor.time_steps = 400' \
--gin_param='Additive.n_samples = 32000' \
--gin_param='FilteredNoise.n_samples = 32000' \
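For reference, the derived values in the flags above follow from frame_rate and example_secs: time_steps = frame_rate × example_secs and n_samples = sample_rate × example_secs. The sketch below recomputes them, assuming a 16 kHz sample rate (which is what 32000 samples for 2 seconds implies). derive_gin_params is a hypothetical helper, not part of the library.

```python
def derive_gin_params(frame_rate, example_secs, sample_rate=16000):
    """Build gin bindings whose lengths are consistent with each other."""
    time_steps = int(frame_rate * example_secs)   # frames per example
    n_samples = int(sample_rate * example_secs)   # audio samples per example
    return [
        f"TFRecordProvider.frame_rate = {frame_rate}",
        f"TFRecordProvider.example_secs = {example_secs}",
        f"DefaultPreprocessor.time_steps = {time_steps}",
        f"Additive.n_samples = {n_samples}",
        f"FilteredNoise.n_samples = {n_samples}",
    ]


params = derive_gin_params(frame_rate=200, example_secs=2)
for binding in params:
    print(f'--gin_param="{binding}" \\')
```

The output reproduces the five bindings listed above, so those values are at least consistent with one another.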
Hi, and first of all thanks for your work.
When trying to execute the first cell of the train_autoencoder notebook, I get this output:
TensorFlow 2.x selected.
|████████████████████████████████| 92kB 4.4MB/s
|████████████████████████████████| 3.1MB 14.7MB/s
|████████████████████████████████| 3.4MB 40.2MB/s
|████████████████████████████████| 368kB 64.2MB/s
|████████████████████████████████| 59.2MB 48kB/s
|████████████████████████████████| 61kB 9.8MB/s
|████████████████████████████████| 81kB 12.6MB/s
|████████████████████████████████| 235kB 51.3MB/s
|████████████████████████████████| 51kB 8.6MB/s
|████████████████████████████████| 1.2MB 58.3MB/s
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
In the logs, I see 9 info messages, preceded by the following 2 warnings (I don't know if they are related):
warn("IPython.utils.traitlets has moved to a top-level traitlets package.")
/usr/local/lib/python2.7/dist-packages/IPython/utils/traitlets.py:5: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.
Hello. I've been working through the tutorials, running them locally on my machine. When running with a single GPU, I was able to get 3_training.ipynb running just fine after adding:
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
When I have my second GPU enabled as well:
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
I get an error on this line: trainer.build(next(iter(dataset)))
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input dimension 2 must have length of at least 256 but got: 64
[[node replica_1/autoencoder/processor_group/rfft (defined at /site-packages/ddsp/core.py:683) ]]
(1) Invalid argument: Input dimension 2 must have length of at least 256 but got: 64
[[node replica_1/autoencoder/processor_group/rfft (defined at /site-packages/ddsp/core.py:683) ]]
[[replica_1/autoencoder/processor_group/add_4/_10]]
0 successful operations.
0 derived errors ignored. [Op:__inference___call___11281]
Errors may have originated from an input operation.
Input Source operations connected to node replica_1/autoencoder/processor_group/rfft:
replica_1/autoencoder/processor_group/frame/Reshape_4 (defined at /site-packages/ddsp/core.py:670)
Input Source operations connected to node replica_1/autoencoder/processor_group/rfft:
replica_1/autoencoder/processor_group/frame/Reshape_4 (defined at /site-packages/ddsp/core.py:670)
Function call stack:
__call__ -> __call__
I'm running with two 2070 Supers, and I don't fully understand the whole allow_growth thing either, but I'm wondering if you have any idea why I'm able to run on a single GPU but not on both. Let me know if I can provide any more information. Thanks in advance for the help, and thanks for the awesome library!
I successfully cloned the repo to my Saturn Cloud Jupyter notebook, but I've run into a problem. The buttons and interactive displays don't show up in Jupyter like they do in Colab. Is there a workaround I can implement? Please help. I love what you guys are doing so far. Thanks!
Thanks for the work you've done, this work is very exciting because it's intuitive to me! I do have a question though...
It is encouraged to use the multi-resolution spectrogram loss; however, the spectrogram loss does not incorporate a number of perceptual biases:
How come?
Furthermore, the paper mentions computing the reconstruction loss without the log scale. Given that human perception is non-linear, this choice doesn't make sense to me. Why would you compute the loss without the log scale?
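To make the question concrete, here is a minimal NumPy sketch of a multi-resolution spectrogram loss with an optional log-magnitude term. It is an illustration under stated assumptions, not the library's implementation: the FFT sizes, the 75% overlap, the Hann window, and the log_weight and eps parameters are all choices made for this example.

```python
import numpy as np


def magnitude_spectrogram(audio, fft_size, overlap=0.75):
    """Magnitude STFT using a Hann window (no padding, for simplicity)."""
    hop = int(fft_size * (1.0 - overlap))
    window = np.hanning(fft_size)
    n_frames = 1 + (len(audio) - fft_size) // hop
    frames = np.stack([
        audio[i * hop:i * hop + fft_size] * window for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=-1))


def multi_scale_spectral_loss(target, generated,
                              fft_sizes=(512, 256, 128, 64),
                              log_weight=0.0, eps=1e-7):
    """Sum of L1 spectrogram distances over several FFT sizes.

    log_weight > 0 adds an L1 distance between log magnitudes, which
    emphasizes relative (rather than absolute) differences.
    """
    loss = 0.0
    for fft_size in fft_sizes:
        s_target = magnitude_spectrogram(target, fft_size)
        s_generated = magnitude_spectrogram(generated, fft_size)
        loss += np.mean(np.abs(s_target - s_generated))
        if log_weight > 0.0:
            log_diff = np.log(s_target + eps) - np.log(s_generated + eps)
            loss += log_weight * np.mean(np.abs(log_diff))
    return float(loss)


# Toy signals: a 440 Hz sine and the same sine at half amplitude.
t = np.arange(4000) / 16000.0
reference = np.sin(2.0 * np.pi * 440.0 * t)
quieter = 0.5 * reference

linear_only = multi_scale_spectral_loss(reference, quieter)
with_log = multi_scale_spectral_loss(reference, quieter, log_weight=1.0)
print(linear_only, with_log)
```

Comparing linear_only with with_log on your own signals is one way to see how much the log term changes the weighting of quiet versus loud spectral regions.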
Hey, I am trying to run the code in train_autoencoder.ipynb, and I get the following error after running next(iter(dataset)):
InvalidArgumentError: Key: audio. Can't parse serialized Example.
[[{{node ParseSingleExample/ParseExample/ParseExampleV2}}]]
I changed some of the ddsp_prepare_tfrecord settings because my audio clips are usually shorter than 1 second:
!ddsp_prepare_tfrecord \
--input_audio_filepatterns=$AUDIO_FILEPATTERN \
--output_tfrecord_path=$TRAIN_TFRECORD \
--num_shards=1 \
--example_secs 1 \
--sliding_window_hop_secs 0.25 \
--alsologtostderr
I am not sure what went wrong. I can see that the dataset contains: <ParallelMapDataset shapes: {audio: (64000,), f0_confidence: (1000,), f0_hz: (1000,), loudness_db: (1000,)}, types: {audio: tf.float32, f0_confidence: tf.float32, f0_hz: tf.float32, loudness_db: tf.float32}>
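The shapes in that dataset spec correspond to 4-second examples (64000 audio samples and 1000 frames, which is 16 kHz audio at 250 frames per second), while the records were written with --example_secs 1. A minimal sketch of that arithmetic follows. It assumes that the lengths the reader expects must match what was written, which would be one plausible cause of the parse error; expected_lengths is a hypothetical helper.

```python
def expected_lengths(example_secs, sample_rate=16000, frame_rate=250):
    """Per-feature lengths for one example of the given duration."""
    n_frames = int(example_secs * frame_rate)
    return {
        "audio": int(example_secs * sample_rate),
        "f0_hz": n_frames,
        "f0_confidence": n_frames,
        "loudness_db": n_frames,
    }


reader_spec = expected_lengths(example_secs=4)   # matches the shapes shown above
written_spec = expected_lengths(example_secs=1)  # what --example_secs 1 implies

# Features whose expected length differs from the written length.
mismatched = sorted(
    key for key in reader_spec if reader_spec[key] != written_spec[key])
print(reader_spec)
print(written_spec)
print(mismatched)
```

If that assumption holds, the reader would need to be configured with the same example length that was used when writing the records.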
Hey, I'm getting an issue when I try to run the upload function in the timbre transfer demo.
I run the cell, choose the file to upload, and then get this error:
MessageError Traceback (most recent call last)
in ()
10 # Load audio sample here (.mp3 or .wav3 file)
11 # Just use the first file.
---> 12 filenames, audios = upload()
13 audio = audios[0]
14 audio = audio[np.newaxis, :]
3 frames
/usr/local/lib/python3.6/dist-packages/google/colab/_message.py in read_reply_from_input(message_id, timeout_sec)
104 reply.get('colab_msg_id') == message_id):
105 if 'error' in reply:
--> 106 raise MessageError(reply['error'])
107 return reply.get('data', None)
108
MessageError: RangeError: Maximum call stack size exceeded.
Please help, thanks!
When I try to upload my own model in timbre_transfer.ipynb, I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-e3bb52fea3e8> in <module>()
8 model_dir = os.path.join(GCS_CKPT_DIR, 'solo_%s_ckpt' % model.lower())
9 else:
---> 10 raise ValueError
11
12 # Assumes only one checkpoint in the folder, 'model.ckpt-[iter]`.
ValueError:
I noticed that with both the ddsp/ddsp/training/gin/models/ae.gin and ddsp/ddsp/training/gin/models/ae_abs.gin settings, the model uses z as a latent space. To test the model's performance without z, I tried replacing Autoencoder.decoder = @decoders.ZRnnFcDecoder() with Autoencoder.decoder = @decoders.RnnFcDecoder(). Is this the right way to do it? I found that if I do not use z and use ae_abs.gin, which jointly learns an encoder for f0, the loss becomes NaN after around 2000 steps. I suspect this issue comes from the missing z latent...
Hi,
Thank you for this super cool code!
I use the Colab implementation. The outcome of the style transfer seems to be a transfer of the amplitudes, but the frequencies do not seem right. My model also only trains for about 15-30 minutes on a 3-minute source.
Details:
Everything is executable without errors. But I get a lot of warnings in the section "Preprocess raw audio into TFRecord dataset" and in the section "We will now begin training. "
"Preprocess raw audio into TFRecord dataset":
Warnings:
WARNING:tensorflow:From /tensorflow-2.1.0/python3.6/tensorflow_core/python/compat/v2_compat.py:88: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
I0128 20:34:14.087458 139829082847104 fn_api_runner_transforms.py:490] ==================== <function annotate_downstream_side_inputs at 0x7f2bcb8d9510> ====================
I0128 20:34:14.088107 139829082847104 fn_api_runner_transforms.py:490] ==================== <function fix_side_input_pcoll_coders at 0x7f2bcb8d9620> ====================
I0128 20:34:14.088482 139829082847104 fn_api_runner_transforms.py:490] ==================== <function lift_combiners at 0x7f2bcb8d96a8> ====================
I0128 20:34:14.088626 139829082847104 fn_api_runner_transforms.py:490] ==================== <function expand_sdf at 0x7f2bcb8d9730> ====================
I0128 20:34:14.088833 139829082847104 fn_api_runner_transforms.py:490] ==================== <function expand_gbk at 0x7f2bcb8d97b8> ====================
I0128 20:34:14.089247 139829082847104 fn_api_runner_transforms.py:490] ==================== <function sink_flattens at 0x7f2bcb8d98c8> ====================
I0128 20:34:14.089420 139829082847104 fn_api_runner_transforms.py:490] ==================== <function greedily_fuse at 0x7f2bcb8d9950> ====================
I0128 20:34:14.090623 139829082847104 fn_api_runner_transforms.py:490] ==================== <function read_to_impulse at 0x7f2bcb8d99d8> ====================
I0128 20:34:14.090765 139829082847104 fn_api_runner_transforms.py:490] ==================== <function impulse_to_input at 0x7f2bcb8d9a60> ====================
I0128 20:34:14.090906 139829082847104 fn_api_runner_transforms.py:490] ==================== <function inject_timer_pcollections at 0x7f2bcb8d9bf8> ====================
I0128 20:34:14.091151 139829082847104 fn_api_runner_transforms.py:490] ==================== <function sort_stages at 0x7f2bcb8d9c80> ====================
I0128 20:34:14.091248 139829082847104 fn_api_runner_transforms.py:490] ==================== <function window_pcollection_coders at 0x7f2bcb8d9d08> ====================
I0128 20:34:14.092715 139829082847104 statecache.py:137] Creating state cache with size 100
I0128 20:34:14.093585 139829082847104 fn_api_runner.py:1538] Created Worker handler <apache_beam.runners.portability.fn_api_runner.EmbeddedWorkerHandler object at 0x7f2bcb36fef0> for environment urn: "beam:env:embedded_python:v1"
I0128 20:34:14.093782 139829082847104 fn_api_runner.py:693] Running (((((ref_AppliedPTransform_WriteToTFRecord/Write/WriteImpl/DoOnce/Impulse_26)+(ref_AppliedPTransform_WriteToTFRecord/Write/WriteImpl/DoOnce/FlatMap(<lambda at core.py:2530>)_27))+(ref_AppliedPTransform_WriteToTFRecord/Write/WriteImpl/DoOnce/Map(decode)_29))+(ref_AppliedPTransform_WriteToTFRecord/Write/WriteImpl/InitializeWrite_30))+(ref_PCollection_PCollection_18/Write))+(ref_PCollection_PCollection_19/Write)
I0128 20:34:14.110267 139829082847104 fn_api_runner.py:693] Running (((((((((ref_AppliedPTransform_Create/Impulse_3)+(ref_AppliedPTransform_Create/FlatMap(<lambda at core.py:2530>)_4))+(ref_AppliedPTransform_Create/Map(decode)_6))+(ref_AppliedPTransform_Map(_load_audio)_7))+(ref_AppliedPTransform_Map(_add_f0_estimate)_8))+(ref_AppliedPTransform_Map(_add_loudness)_9))+(ref_AppliedPTransform_FlatMap(_split_example)_10))+(ref_AppliedPTransform_Reshuffle/AddRandomKeys_12))+(ref_AppliedPTransform_Reshuffle/ReshufflePerKey/Map(reify_timestamps)_14))+(Reshuffle/ReshufflePerKey/GroupByKey/Write)
I0128 20:34:14.130905 139826085275392 prepare_tfrecord_lib.py:34] Loading 'data/audio/vocal-by-1.wav'.
WARNING:tensorflow:From /tensorflow-2.1.0/python3.6/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0128 20:34:14.580774 139826085275392 deprecation.py:506] From /tensorflow-2.1.0/python3.6/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2020-01-28 20:34:15.311901: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
/usr/local/lib/python3.6/dist-packages/librosa/core/time_frequency.py:1006: RuntimeWarning: divide by zero encountered in log10
- 0.5 * np.log10(f_sq + const[3]))
I0128 20:34:52.714827 139829082847104 fn_api_runner.py:693] Running ((((((Reshuffle/ReshufflePerKey/GroupByKey/Read)+(ref_AppliedPTransform_Reshuffle/ReshufflePerKey/FlatMap(restore_timestamps)_19))+(ref_AppliedPTransform_Reshuffle/RemoveRandomKeys_20))+(ref_AppliedPTransform_Map(_float_dict_to_tfexample)_21))+(ref_AppliedPTransform_WriteToTFRecord/Write/WriteImpl/ParDo(_RoundRobinKeyFn)_31))+(ref_AppliedPTransform_WriteToTFRecord/Write/WriteImpl/WindowInto(WindowIntoFn)_32))+(WriteToTFRecord/Write/WriteImpl/GroupByKey/Write)
I0128 20:34:54.352579 139829082847104 fn_api_runner.py:693] Running ((WriteToTFRecord/Write/WriteImpl/GroupByKey/Read)+(ref_AppliedPTransform_WriteToTFRecord/Write/WriteImpl/WriteBundles_37))+(ref_PCollection_PCollection_25/Write)
W0128 20:34:54.418104 139826085275392 tfrecordio.py:60] Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
I0128 20:34:54.581911 139829082847104 fn_api_runner.py:693] Running ((ref_PCollection_PCollection_18/Read)+(ref_AppliedPTransform_WriteToTFRecord/Write/WriteImpl/PreFinalize_38))+(ref_PCollection_PCollection_26/Write)
I0128 20:34:54.591270 139829082847104 fn_api_runner.py:693] Running (ref_PCollection_PCollection_18/Read)+(ref_AppliedPTransform_WriteToTFRecord/Write/WriteImpl/FinalizeWrite_39)
I0128 20:34:54.597444 139826069305088 filebasedsink.py:294] Starting finalize_write threads with num_shards: 10 (skipped: 0), batches: 10, num_threads: 10
I0128 20:34:54.700012 139826069305088 filebasedsink.py:331] Renamed 10 shards in 0.10 seconds.
Maybe this is not a problem?
Then, when I train a model, I get a lot of warnings again:
WARNING:tensorflow:From /tensorflow-2.1.0/python3.6/tensorflow_core/python/compat/v2_compat.py:88: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/ddsp/training/train_util.py:189: The name tf.estimator.tpu.RunConfig is deprecated. Please use tf.compat.v1.estimator.tpu.RunConfig instead.
W0128 20:38:03.140560 139811238639488 module_wrapper.py:138] From /usr/local/lib/python3.6/dist-packages/ddsp/training/train_util.py:189: The name tf.estimator.tpu.RunConfig is deprecated. Please use tf.compat.v1.estimator.tpu.RunConfig instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/ddsp/training/train_util.py:191: The name tf.estimator.tpu.TPUConfig is deprecated. Please use tf.compat.v1.estimator.tpu.TPUConfig instead.
W0128 20:38:03.140788 139811238639488 module_wrapper.py:138] From /usr/local/lib/python3.6/dist-packages/ddsp/training/train_util.py:191: The name tf.estimator.tpu.TPUConfig is deprecated. Please use tf.compat.v1.estimator.tpu.TPUConfig instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/ddsp/training/train_util.py:199: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.
W0128 20:38:03.141086 139811238639488 module_wrapper.py:138] From /usr/local/lib/python3.6/dist-packages/ddsp/training/train_util.py:199: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.
INFO:tensorflow:Using config: {'_model_dir': '/content/models/ddsp-solo-instrument', '_tf_random_seed': None, '_save_summary_steps': 300, '_save_checkpoints_steps': 300, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 100, '_keep_checkpoint_every_n_hours': 1, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=300, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I0128 20:38:03.141758 139811238639488 estimator.py:216] Using config: {'_model_dir': '/content/models/ddsp-solo-instrument', '_tf_random_seed': None, '_save_summary_steps': 300, '_save_checkpoints_steps': 300, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 100, '_keep_checkpoint_every_n_hours': 1, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=300, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu False
I0128 20:38:03.142008 139811238639488 tpu_context.py:221] _TPUContext: eval_on_tpu False
WARNING:tensorflow:From /tensorflow-2.1.0/python3.6/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0128 20:38:03.145722 139811238639488 deprecation.py:506] From /tensorflow-2.1.0/python3.6/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /tensorflow-2.1.0/python3.6/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0128 20:38:03.146073 139811238639488 deprecation.py:323] From /tensorflow-2.1.0/python3.6/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
I0128 20:38:03.266683 139811238639488 estimator.py:1151] Calling model_fn.
INFO:tensorflow:Running train on CPU
I0128 20:38:03.266903 139811238639488 tpu_estimator.py:3124] Running train on CPU
I0128 20:38:04.029543 139811238639488 processors.py:138] Connecting node (additive):
I0128 20:38:04.029708 139811238639488 processors.py:140] Input 0: amps
I0128 20:38:04.029782 139811238639488 processors.py:140] Input 1: harmonic_distribution
I0128 20:38:04.029845 139811238639488 processors.py:140] Input 2: f0_hz
I0128 20:38:04.095593 139811238639488 processors.py:138] Connecting node (filtered_noise):
I0128 20:38:04.095721 139811238639488 processors.py:140] Input 0: noise_magnitudes
I0128 20:38:04.194273 139811238639488 processors.py:138] Connecting node (add):
I0128 20:38:04.194403 139811238639488 processors.py:140] Input 0: filtered_noise/signal
I0128 20:38:04.194476 139811238639488 processors.py:140] Input 1: additive/signal
I0128 20:38:04.194946 139811238639488 processors.py:138] Connecting node (reverb):
I0128 20:38:04.195056 139811238639488 processors.py:140] Input 0: add/signal
I0128 20:38:04.302336 139811238639488 processors.py:157] ProcessorGroup output node (reverb)
I0128 20:38:04.933219 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack/fc/dense/kernel:0 (shape=(1, 512), dtype=<dtype: 'float32'>).
I0128 20:38:04.933395 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack/fc/dense/bias:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.933452 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack/fc/layer_normalization/gamma:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.933498 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack/fc/layer_normalization/beta:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.933540 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack/fc_1/dense/kernel:0 (shape=(512, 512), dtype=<dtype: 'float32'>).
I0128 20:38:04.933584 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack/fc_1/dense/bias:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.933624 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack/fc_1/layer_normalization_1/gamma:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.933666 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack/fc_1/layer_normalization_1/beta:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.933704 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack/fc_2/dense/kernel:0 (shape=(512, 512), dtype=<dtype: 'float32'>).
I0128 20:38:04.933745 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack/fc_2/dense/bias:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.933784 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack/fc_2/layer_normalization_2/gamma:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.933821 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack/fc_2/layer_normalization_2/beta:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.933857 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_1/fc/dense/kernel:0 (shape=(1, 512), dtype=<dtype: 'float32'>).
I0128 20:38:04.933896 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_1/fc/dense/bias:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.934019 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_1/fc/layer_normalization_3/gamma:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.934084 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_1/fc/layer_normalization_3/beta:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.934144 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_1/fc_1/dense/kernel:0 (shape=(512, 512), dtype=<dtype: 'float32'>).
I0128 20:38:04.934210 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_1/fc_1/dense/bias:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.934275 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_1/fc_1/layer_normalization_4/gamma:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.934336 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_1/fc_1/layer_normalization_4/beta:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.934394 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_1/fc_2/dense/kernel:0 (shape=(512, 512), dtype=<dtype: 'float32'>).
I0128 20:38:04.934453 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_1/fc_2/dense/bias:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.934509 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_1/fc_2/layer_normalization_5/gamma:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.934564 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_1/fc_2/layer_normalization_5/beta:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.934619 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/gru/kernel:0 (shape=(1024, 1536), dtype=<dtype: 'float32'>).
I0128 20:38:04.934680 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/gru/recurrent_kernel:0 (shape=(512, 1536), dtype=<dtype: 'float32'>).
I0128 20:38:04.934738 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/gru/bias:0 (shape=(1536,), dtype=<dtype: 'float32'>).
I0128 20:38:04.934794 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_2/fc/dense/kernel:0 (shape=(1536, 512), dtype=<dtype: 'float32'>).
I0128 20:38:04.934854 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_2/fc/dense/bias:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.934910 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_2/fc/layer_normalization_6/gamma:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.934981 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_2/fc/layer_normalization_6/beta:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.935039 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_2/fc_1/dense/kernel:0 (shape=(512, 512), dtype=<dtype: 'float32'>).
I0128 20:38:04.935099 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_2/fc_1/dense/bias:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.935154 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_2/fc_1/layer_normalization_7/gamma:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.935209 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_2/fc_1/layer_normalization_7/beta:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.935270 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_2/fc_2/dense/kernel:0 (shape=(512, 512), dtype=<dtype: 'float32'>).
I0128 20:38:04.935333 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_2/fc_2/dense/bias:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.935391 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_2/fc_2/layer_normalization_8/gamma:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.935447 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/fc_stack_2/fc_2/layer_normalization_8/beta:0 (shape=(512,), dtype=<dtype: 'float32'>).
I0128 20:38:04.935502 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/dense/kernel:0 (shape=(512, 126), dtype=<dtype: 'float32'>).
I0128 20:38:04.935561 139811238639488 models.py:230] adding trainable variable rnn_fc_decoder/dense/bias:0 (shape=(126,), dtype=<dtype: 'float32'>).
I0128 20:38:04.935617 139811238639488 models.py:230] adding trainable variable ir:0 (shape=(48000,), dtype=<dtype: 'float32'>).
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/ddsp/training/train_util.py:126: The name tf.estimator.tpu.TPUEstimatorSpec is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimatorSpec instead.
W0128 20:38:06.495219 139811238639488 module_wrapper.py:138] From /usr/local/lib/python3.6/dist-packages/ddsp/training/train_util.py:126: The name tf.estimator.tpu.TPUEstimatorSpec is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimatorSpec instead.
INFO:tensorflow:Done calling model_fn.
I0128 20:38:06.514110 139811238639488 estimator.py:1153] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0128 20:38:06.515020 139811238639488 basic_session_run_hooks.py:546] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0128 20:38:07.337075 139811238639488 monitored_session.py:246] Graph was finalized.
2020-01-28 20:38:07.447110: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
INFO:tensorflow:Running local_init_op.
I0128 20:38:08.580479 139811238639488 session_manager.py:504] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0128 20:38:08.617971 139811238639488 session_manager.py:507] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /content/models/ddsp-solo-instrument/model.ckpt.
I0128 20:38:10.710745 139811238639488 basic_session_run_hooks.py:613] Saving checkpoints for 0 into /content/models/ddsp-solo-instrument/model.ckpt.
INFO:tensorflow:global_step/sec: 0.166465
I0128 20:38:26.712445 139811238639488 tpu_estimator.py:2307] global_step/sec: 0.166465
INFO:tensorflow:examples/sec: 2.66345
I0128 20:38:26.713377 139811238639488 tpu_estimator.py:2308] examples/sec: 2.66345
INFO:tensorflow:global_step/sec: 0.556173
I0128 20:38:28.510447 139811238639488 tpu_estimator.py:2307] global_step/sec: 0.556173
INFO:tensorflow:examples/sec: 8.89877
I0128 20:38:28.510792 139811238639488 tpu_estimator.py:2308] examples/sec: 8.89877
INFO:tensorflow:global_step/sec: 0.587273
I0128 20:38:30.213216 139811238639488 tpu_estimator.py:2307] global_step/sec: 0.587273
...
Training the model executes with:
INFO:tensorflow:global_step/sec: 0.557365
I0128 21:08:40.238079 139811238639488 tpu_estimator.py:2307] global_step/sec: 0.557365
INFO:tensorflow:examples/sec: 8.91784
I0128 21:08:40.238506 139811238639488 tpu_estimator.py:2308] examples/sec: 8.91784
INFO:tensorflow:global_step/sec: 0.548976
I0128 21:08:42.059640 139811238639488 tpu_estimator.py:2307] global_step/sec: 0.548976
INFO:tensorflow:examples/sec: 8.78361
I0128 21:08:42.060112 139811238639488 tpu_estimator.py:2308] examples/sec: 8.78361
INFO:tensorflow:Saving checkpoints for 1000 into /content/models/ddsp-solo-instrument/model.ckpt.
I0128 21:08:42.060704 139811238639488 basic_session_run_hooks.py:613] Saving checkpoints for 1000 into /content/models/ddsp-solo-instrument/model.ckpt.
INFO:tensorflow:Loss for final step: 6.1354375.
I0128 21:08:42.603451 139811238639488 estimator.py:375] Loss for final step: 6.1354375.
INFO:tensorflow:training_loop marked as finished
I0128 21:08:42.604181 139811238639488 error_handling.py:108] training_loop marked as finished
When I upload the model to the style transfer colab, the resynthesized sample sounds like rhythmic noise / wind.
I wonder where I went wrong, and whether I need to adapt any of the following:
num_shards=10
num_train_steps=1000
gin_param=batch_size=16
Generated model data looks like:
I would really appreciate any leads!
Have a great day!
Running into an issue running the block that installs dependencies:
ERROR: pydrive 1.3.1 has requirement oauth2client>=4.0.0, but you'll have oauth2client 3.0.0 which is incompatible.
ERROR: google-api-python-client 1.7.12 has requirement httplib2<1dev,>=0.17.0, but you'll have httplib2 0.12.0 which is incompatible.
ERROR: chainer 6.5.0 has requirement typing<=3.6.6, but you'll have typing 3.7.4.1 which is incompatible.
ERROR: chainer 6.5.0 has requirement typing-extensions<=3.6.6, but you'll have typing-extensions 3.7.4.2 which is incompatible.
The timbre transfer colab worked fine for me.
Thank you for your work on these colab demos. I find them super useful.
# Processor group DAG
dag = [
(additive, ['amps', 'harmonic_distribution', 'f0_hz']),
(noise, ['magnitudes']),
(add, ['additive/signal', 'noise/signal']),
(reverb, ['ir', 'add/signal'])
]
processor_group = ddsp.processors.ProcessorGroup(dag=dag)
audio_out = processor_group.get_signal(inputs)
# Listen
play(audio_out)
specplot(audio_out)
error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-9-9c0cecf1091d> in <module>
8
9 processor_group = ddsp.processors.ProcessorGroup(dag=dag)
---> 10 audio_out = processor_group.get_signal(inputs)
11
12 # Listen
~/anaconda3/envs/hw/lib/python3.6/site-packages/ddsp/processors.py in get_signal(self, *args, **kwargs)
113 def get_signal(self, *args: tf.Tensor, **kwargs: tf.Tensor) -> tf.Tensor:
114 """Convert input tensors arguments into a signal tensor."""
--> 115 outputs = self.get_outputs(*args, **kwargs)
116 signal = outputs[self.name]['signal']
117 return signal
~/anaconda3/envs/hw/lib/python3.6/site-packages/ddsp/processors.py in get_outputs(self, dag_inputs)
144
145 # Build the processor (does nothing if not the first time).
--> 146 processor.build(*[tensor.shape for tensor in inputs])
147 # Run processor.
148 controls = processor.get_controls(*inputs)
TypeError: build() takes 2 positional arguments but 3 were given
audios[0:1].astype(np.float32) in the basic timbre_transfer.ipynb does not work, as audios is a list.
audios[0].astype(np.float32) works.
Hi, I'm trying to train a model locally (adapting the code from train_autoencoder.ipynb), and I'm getting the error in the title just before the model is supposed to start training. I will copy the complete log below. My configuration is as follows:
2020-02-21 13:39:39.259132: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-02-21 13:39:41.110202: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
I0221 13:39:43.156791 2672 train_util.py:56] Defaulting to MirroredStrategy
2020-02-21 13:39:43.164404: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-02-21 13:39:43.237886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 SUPER computeCapability: 7.5
coreClock: 1.68GHz coreCount: 34 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-02-21 13:39:43.241122: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-02-21 13:39:43.246274: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-02-21 13:39:43.250949: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-02-21 13:39:43.253287: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-02-21 13:39:43.257189: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-02-21 13:39:43.261498: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-02-21 13:39:43.269133: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-21 13:39:43.271574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0
2020-02-21 13:39:43.272927: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-02-21 13:39:43.275556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 SUPER computeCapability: 7.5
coreClock: 1.68GHz coreCount: 34 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-02-21 13:39:43.278705: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-02-21 13:39:43.280447: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-02-21 13:39:43.282142: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-02-21 13:39:43.283834: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-02-21 13:39:43.285671: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-02-21 13:39:43.287438: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-02-21 13:39:43.289994: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-21 13:39:43.291835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0
2020-02-21 13:39:43.970857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1099] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-21 13:39:43.973353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] 0
2020-02-21 13:39:43.974871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 0: N
2020-02-21 13:39:43.976781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6306 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0221 13:39:43.974044 2672 mirrored_strategy.py:501] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0221 13:39:44.343264 2672 train_util.py:201] Building the model...
WARNING:tensorflow:From c:\users\andrey\anaconda3\envs\test\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py:1809: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0221 13:39:48.817270 3952 deprecation.py:506] From c:\users\andrey\anaconda3\envs\test\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py:1809: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2020-02-21 13:39:52.821030: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-02-21 13:39:53.103556: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-02-21 13:39:53.327462: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
I0221 13:39:54.833573 2672 train_util.py:172] Restoring from checkpoint...
I0221 13:39:54.833573 2672 train_util.py:184] No checkpoint, skipping.
I0221 13:39:54.833573 2672 train_util.py:256] Creating metrics for ListWrapper(['spectral_loss', 'total_loss'])
2020-02-21 13:40:02.551385: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-02-21 13:40:02.554137: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
Fatal Python error: Aborted
Thread 0x00000a70 (most recent call first):
File "c:\users\andrey\anaconda3\envs\test\lib\site-packages\tensorflow\python\eager\execute.py", line 60 in quick_execute
File "c:\users\andrey\anaconda3\envs\test\lib\site-packages\tensorflow\python\eager\function.py", line 598 in call
File "c:\users\andrey\anaconda3\envs\test\lib\site-packages\tensorflow\python\eager\function.py", line 1741 in _call_flat
File "c:\users\andrey\anaconda3\envs\test\lib\site-packages\tensorflow\python\eager\function.py", line 1660 in _filtered_call
File "c:\users\andrey\anaconda3\envs\test\lib\site-packages\tensorflow\python\eager\def_function.py", line 646 in _call
File "c:\users\andrey\anaconda3\envs\test\lib\site-packages\tensorflow\python\eager\def_function.py", line 576 in __call__
File "c:\users\andrey\anaconda3\envs\test\lib\site-packages\ddsp\training\train_util.py", line 273 in train
File "c:\users\andrey\anaconda3\envs\test\lib\site-packages\gin\config.py", line 1055 in gin_wrapper
File "c:\users\andrey\anaconda3\envs\test\lib\site-packages\ddsp\training\ddsp_run.py", line 151 in main
File "c:\users\andrey\anaconda3\envs\test\lib\site-packages\absl\app.py", line 250 in _run_main
File "c:\users\andrey\anaconda3\envs\test\lib\site-packages\absl\app.py", line 299 in run
File "c:\users\andrey\anaconda3\envs\test\lib\site-packages\ddsp\training\ddsp_run.py", line 172 in console_entry_point
File "C:\Users\andrey\Anaconda3\envs\test\Scripts\ddsp_run.exe\__main__.py", line 7 in <module>
File "c:\users\andrey\anaconda3\envs\test\lib\runpy.py", line 85 in _run_code
File "c:\users\andrey\anaconda3\envs\test\lib\runpy.py", line 193 in _run_module_as_main
I can't put my finger on where the problem is because:
This is with a Windows system. On Ubuntu the situation was the same, but I was getting the following error:
Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
Any help will be appreciated.
This is great! I'm not too experienced with ML development but follow a lot of audio ML research, and I've been thinking that this approach should be the way to do things for a good while. Looking forward to playing around with ddsp for an upcoming project.
Got a few questions...
In some places, harmonic distribution seems nearly synonymous with the amplitude distribution a(n), as a model of variations between partials' spectral magnitudes, but then it's also referenced to model spectral centroid? Can you elaborate on the difference between harmonic distribution and a(n)? I use "overtone distribution" in my code to refer to discrete frequency distributions of partials relative to a fundamental (inharmonic timbre stuff)... probably contributing to my confusion. 😛
I'll be synthesizing novel inharmonic timbres with retuned pitches, using (mostly) harmonic timbres for inputs. Remapping/interpolating f(0) seems easy with the current model. I'm wondering if it's viable to remap overtone partials to an arbitrary frequency distribution with the current model... ie, instead of multiplying the fundamental by integers, simply multiply it by some predefined set of rational numbers/floats. As I'll be synthesizing novel timbres, I won't necessarily have training sets to provide as inputs to train an unconstrained oscillator bank via a loss function... so I'm thinking the process could just be training the current model, still limited to f(0), for the given input and then remapping partial frequencies onto inharmonic frequency sets at the additive synth while still using other features that are generated by the encoder and/or interpolated. Make sense/any immediate issues with that idea?
You use 101 partials in the synth... which for harmonic timbres would extend past the 8kHz nyquist limit for any pitch >~80Hz. Is that just to cover the entire frequency range for any reasonable pitch? Also curious about why you limited it to the 16kHz sample rate... real-time constraint or faster training or something?
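A small sketch in plain Python (not DDSP code) may make the arithmetic in these questions concrete. It computes partial frequencies as f0 times a set of ratios, which are integers for a harmonic timbre or arbitrary floats for an inharmonic one, and drops anything at or above the Nyquist frequency. The function name `partial_frequencies` and the example ratio set are hypothetical, chosen for illustration only.

```python
def partial_frequencies(f0_hz, ratios, sample_rate=16000):
    """Return partial frequencies (Hz) below Nyquist for one fundamental.

    Hypothetical helper for illustration; not part of the ddsp API.
    """
    nyquist = sample_rate / 2.0
    return [f0_hz * r for r in ratios if f0_hz * r < nyquist]


# Harmonic case: 101 integer multiples of the fundamental.
harmonic_ratios = range(1, 102)
n_low = len(partial_frequencies(79.0, harmonic_ratios))
n_a440 = len(partial_frequencies(440.0, harmonic_ratios))
print(n_low)   # 101: every partial fits below 8 kHz
print(n_a440)  # 18: 440 * 19 already exceeds 8 kHz

# Inharmonic case: an arbitrary, made-up set of float ratios.
stretched_ratios = [1.0, 2.1, 3.3, 4.6, 6.0]
stretched = partial_frequencies(440.0, stretched_ratios)
print(stretched)
```

With a 16 kHz sample rate, all 101 harmonics only fit when f0 is below 8000 / 101, roughly 79 Hz, which matches the estimate in the question.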
Thanks and stay safe out there! Sorry for the wall of text.
In ddsp.spectral_ops.compute_f0, when len(audio) is 1025253 and sample_rate is 16000, n_samples, which is equivalent to 1025253 / 16000 * 16000, becomes 1025252.9999999999, which then causes the assertion further down to fail (assert n_padding % 1 == 0).
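The failure described here comes from round-tripping an integer sample count through a floating-point duration. As a minimal sketch in plain Python, independent of the actual compute_f0 implementation, rounding before converting back to an integer makes the sample count exact again. The helper name `samples_from_duration` is hypothetical.

```python
def samples_from_duration(n_secs, sample_rate):
    """Convert a duration in seconds back to a whole number of samples.

    Hypothetical helper: rounds to the nearest integer so that tiny
    floating-point errors do not leave a fractional sample count.
    """
    return int(round(n_secs * sample_rate))


audio_len = 1025253
sample_rate = 16000

n_secs = audio_len / float(sample_rate)
naive = n_secs * sample_rate  # may differ from audio_len by a tiny amount
robust = samples_from_duration(n_secs, sample_rate)

print(naive)
print(robust == audio_len)  # True
```

Keeping the count as an integer from the start (for example, using len(audio) directly) would avoid the round trip altogether.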
Once training is stopped and launched again, it continues from the last checkpoint; however, the optimizer schedule is always reinitialized.
This happens in the __init__ method of the class Trainer in ddsp/training/train_util.py:
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=learning_rate,
    decay_steps=lr_decay_steps,
    decay_rate=lr_decay_rate)
with self.strategy.scope():
  optimizer = tf.keras.optimizers.Adam(lr_schedule)
  self.optimizer = optimizer
A new instance of Trainer is created each time ddsp_run is executed
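To see why a reset would matter, recall that ExponentialDecay (without staircase) computes initial_learning_rate * decay_rate ** (step / decay_steps). The sketch below reimplements that formula in plain Python, with made-up hyperparameter values, to compare the learning rate a resumed run would be expected to use against the one it gets if the schedule is evaluated from step 0 again.

```python
def exponential_decay(step, initial_learning_rate, decay_steps, decay_rate):
    """Learning rate at `step` under continuous exponential decay."""
    return initial_learning_rate * decay_rate ** (step / decay_steps)


# Illustrative values only; not taken from any ddsp gin config.
initial_lr = 1e-3
decay_steps = 10000
decay_rate = 0.98

resumed_step = 50000
lr_expected = exponential_decay(resumed_step, initial_lr, decay_steps, decay_rate)
lr_if_reset = exponential_decay(0, initial_lr, decay_steps, decay_rate)

print(lr_expected)  # decayed rate the resumed run should continue with
print(lr_if_reset)  # back to the initial rate if the step count starts at 0
```

The gap between the two values grows with the number of steps already trained, so the effect is easiest to notice when resuming late in a long run.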
Hey, I just got a reconstruction result that seems too good to be true. I sense that the idea behind the model is really good, but the result is still amazing to me. I used your demo autoencoder to reconstruct audio of the human voice, and the result is really good. But I can't understand how this can be achieved using only f0 and loudness information. For example, the vowels 'a' and 'e' are definitely different; how is this reflected in f0 and loudness? I thought there might be some difference between musical instruments and the human voice. I just can't see how these features are enough.
By the way, if I want to add z as a latent space besides f0 and loudness, how can I tell the model to use it? I thought you mentioned in the paper that z may correspond to timbre information, but I couldn't find it in timbre_transfer.ipynb. Can you achieve timbre transfer without z?
Traceback (most recent call last):
File "/usr/local/bin/ddsp_prepare_tfrecord", line 8, in <module>
sys.exit(console_entry_point())
File "/usr/local/lib/python3.6/dist-packages/ddsp/training/data_preparation/prepare_tfrecord.py", line 91, in console_entry_point
tf.disable_v2_behavior()
AttributeError: module 'tensorflow_core.compat.v2' has no attribute 'disable_v2_behavior'
I really enjoy tinkering with ddsp. It would be a bit more approachable if we could experiment more easily with 44.1kHz or the other standard audio formats. Could you perhaps make it more straightforward in the demos, or alternatively document what we should set differently to accommodate other sample rates for a whole training-synthesizing pipeline, or at least give some "best practices"?
The ddsp_prepare_tfrecord function, for example, is not very forgiving with custom sample rates (it asserts because of crepe's 16kHz resampling producing some decimal paddings?).
I suppose 16kHz is a default because we are stuck in the world of speech synthesis (of the '90s?), but what might be acceptable for telephony is just not acceptable for many audio use cases.
I hope you don't mind me opening an issue just because of an idea/rant, feel free to close it anytime, and keep up doing these amazing contributions to the world of audio/music!
Hey, I have a question about model save and restore. As you said here:
Saving weights in checkpoint format because saved_model requires handling variable batch size, which some synths and effects can't.
Do you mean the model also saves some synths' and effects' variables?
I am struggling with this because I'd like to do some transfer learning with a new encoder while reusing the pre-trained model's decoder weights. I found that you use tf.train.Checkpoint.restore to restore the whole model, and that trainer.restore(model_dir) restores the model during training. But this coding style seems to make it hard to restore only part of the model's weights.
Is there a way to restore only part of the pretrained model (like the decoder) in the restore step and pass it to the new model's decoder? Another solution I can think of is to restore the whole model and replace all the parts except the decoder, which seems really weird and might not work.
The link to the DDSP Timbre Transfer Colab at the beginning of the notebook is dead.
I'm having trouble with the Colab Training Demo. I keep getting this error:
I0319 03:02:50.860800 140595398194944 prepare_tfrecord_lib.py:30] Loading 'data/audio/19063.wav'.
/usr/local/lib/python3.6/dist-packages/librosa/core/time_frequency.py:1006: RuntimeWarning: divide by zero encountered in log10
- 0.5 * np.log10(f_sq + const[3]))
I'm running everything in Google Colab with the GPU runtime selected. I've tried multiple MP3s and wav files but I keep getting the error. I'm not sure how to determine what the issue is and any help is appreciated.
I often cannot download the result from the small sound file GUI (the vertical three dots) in the colab when files are long (more than about 45 seconds). This happens using upload and .wav, both with pretrained and custom checkpoints.
When using a small 13 MB file (for example) there is no problem: the download button opens an OS window for downloading, or it downloads immediately to the system download folder as download.wav, depending on the browser used and its settings.
FYI, I have tried with Chrome, Firefox and Brave, running Debian Buster,
and macOS Mojave with Firefox, Chrome and Safari.
Sometimes I am unable to download, sometimes not.
I tried clearing my cache, logging out of my Google account, restarting the runtime and refreshing the page. No help. The download button in the sound file view just results in "download failed due to network connection error", or sometimes nothing at all: no error, no OS window pop-up.
I have double-checked that it is not my network connection causing the trouble, and that it is not something I am doing wrong on my system. When I experience this bug I immediately go to another running colab (for example, some librosa stuff, or even another ddsp demo such as one from the tutorials), and there I can download results from processes with no problem.
I would love to write the resynthesis output from model(af, ...) to the files directory, but I am not able to find any write-out or store functions in the ddsp lib.
PS. ddsp makes such nice-sounding material compared to other ML resynthesis methods I have come across. Good job, and many thanks for sharing.
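For writing the resynthesis output to the files directory, one option that does not depend on ddsp is Python's standard wave module; the resulting file can then be downloaded from the Colab file browser. The sketch below assumes the audio has already been converted to a flat sequence of floats in [-1, 1] (for example via np.asarray(audio).ravel()) and that the sample rate is 16 kHz. The function name `write_wav` and the output path are hypothetical.

```python
import array
import math
import sys
import wave


def write_wav(path, samples, sample_rate=16000):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = array.array(
        'h', (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == 'big':
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())


# Stand-in for model output: one second of a 440 Hz sine at 16 kHz.
audio = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
write_wav('resynthesis.wav', audio)
```

Values outside [-1, 1] are clipped rather than rescaled, so a very loud output would need normalizing first.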
In train_autoencoder.ipynb, this exception handler needs tf.errors to be imported:
try:
  ex = next(iter(dataset))
except OutOfRangeError:
Hey,
I'm struggling with getting a simple (e.g. additive) synth to run with variable length data. Say, you are trying to train an autoencoder but the data is not always exactly the same length.
n_samples at initialization. Why is this? It makes much more sense to me to have this argument when calling the synth.
n_samples I need at that time step. However, I'm not sure what the correct way of doing this would be. I tried passing a tensor shape as n_samples, but this leads to a crash, see below.
Short code example:
def test(inp):
  # inp is a dummy -- it just represents "something with variable length"
  # Passing the tensor shape directly (no int() conversion) triggers the crash.
  osc = ddsp.synths.Additive(n_samples=tf.shape(inp)[0])
  # dummy values for amplitude/harmonics/f0
  audio = osc([[[3.], [2.], [5.]]], [[[3.], [2.], [5.]]], [[[441.], [442.], [443.]]])
  return audio
test(tf.random.normal([102]))
This leads to the following crash
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-79-5d43e4898acc> in <module>
----> 1 test(tf.random.normal([102]))
<ipython-input-78-211339ba06f2> in test(inp)
1 def test(inp):
2 osc = ddsp.synths.Additive(n_samples=tf.shape(inp)[0])
----> 3 audio = osc([[[3.], [2.], [5.]]], [[[3.], [2.], [5.]]], [[[441.], [442.], [443.]]])
4 return audio
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
820 with base_layer_utils.autocast_context_manager(
821 self._compute_dtype):
--> 822 outputs = self.call(cast_inputs, *args, **kwargs)
823 self._handle_activity_regularization(inputs, outputs)
824 self._set_mask_metadata(inputs, outputs, input_masks)
/usr/local/lib/python3.6/dist-packages/ddsp/processors.py in call(self, *args, **kwargs)
59 """Convert input tensors arguments into a signal tensor."""
60 controls = self.get_controls(*args, **kwargs)
---> 61 signal = self.get_signal(**controls)
62 return signal
63
/usr/local/lib/python3.6/dist-packages/ddsp/synths.py in get_signal(self, amplitudes, harmonic_distribution, f0_hz)
97 harmonic_distribution=harmonic_distribution,
98 n_samples=self.n_samples,
---> 99 sample_rate=self.sample_rate)
100 return signal
101
/usr/local/lib/python3.6/dist-packages/ddsp/core.py in harmonic_synthesis(frequencies, amplitudes, harmonic_shifts, harmonic_distribution, n_samples, sample_rate, amp_resample_method)
405 frequency_envelopes = resample(harmonic_frequencies, n_samples) # cycles/sec
406 amplitude_envelopes = resample(harmonic_amplitudes, n_samples,
--> 407 method=amp_resample_method)
408
409 # Synthesize from harmonics [batch_size, n_samples].
/usr/local/lib/python3.6/dist-packages/ddsp/core.py in resample(inputs, n_timesteps, method, add_endpoint)
124
125 elif method == 'window':
--> 126 outputs = upsample_with_windows(inputs, n_timesteps, add_endpoint)
127
128 else:
/usr/local/lib/python3.6/dist-packages/ddsp/core.py in upsample_with_windows(inputs, n_timesteps, add_endpoint)
170 n_frames, n_timesteps))
171
--> 172 if n_timesteps % n_intervals != 0.0:
173 minus_one = '' if add_endpoint else ' - 1'
174 raise ValueError(
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/math_ops.py in tensor_not_equals(self, other)
1363 if ops.Tensor._USE_EQUALITY and ops.executing_eagerly_outside_functions():
1364 if fwd_compat.forward_compatible(2019, 9, 25):
-> 1365 return gen_math_ops.not_equal(self, other, incompatible_shape_error=False)
1366 else:
1367 return gen_math_ops.not_equal(self, other)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_math_ops.py in not_equal(x, y, incompatible_shape_error, name)
6435 _ctx._context_handle, tld.device_name, "NotEqual", name,
6436 tld.op_callbacks, x, y, "incompatible_shape_error",
-> 6437 incompatible_shape_error)
6438 return _result
6439 except _core._FallbackException:
TypeError: Cannot convert 0.0 to EagerTensor of dtype int32
which I assume is due to n_timesteps being a tensor in upsample_with_windows. In this case it can be fixed by converting the shape to an int() explicitly, but this won't work when using @tf.function because the shape is not known at the time the code is actually run. Any workarounds? My colleague proposed simply initializing one synth for every possible length and choosing the correct one on the fly, but this seems wasteful.
I hope this is the right place to ask this. All the tutorials seem to be using fixed-length data (e.g. always 4 seconds) but I don't think variable lengths are a particularly exotic scenario.
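One possible eager-mode variation on the colleague's proposal, short of pre-building a synth for every length, is to create synths lazily and cache them by length. The sketch below shows only the caching pattern in plain Python; `make_synth` is a hypothetical stand-in for something like ddsp.synths.Additive(n_samples=n), and this does not address the @tf.function case, where the length is not a Python integer.

```python
import functools


def make_synth(n_samples):
    """Hypothetical stand-in for constructing a synth of fixed length."""
    return {'n_samples': n_samples}


@functools.lru_cache(maxsize=None)
def get_synth(n_samples):
    """Build a synth for this length on first use, then reuse it."""
    return make_synth(n_samples)


# Only the lengths that actually occur in the data get a synth.
for length in [64000, 48000, 64000]:
    synth = get_synth(int(length))
    print(synth['n_samples'])

n_built = get_synth.cache_info().currsize
print(n_built)  # 2 distinct lengths were built
```

Setting a finite maxsize would bound memory use if the data contains many distinct lengths.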
I would like to use a custom training loop with ddsp_run. Can I somehow swap Trainer for a custom trainer (overriding step_fn) in the gin config file?
I assume I cannot just do
train.trainer = @my_module.MyCustomTrainer()
because then the model and strategy arguments would not get passed into the trainer's constructor here.
I suppose I could define a function like
@gin.configurable
def get_trainer(*args, trainer_class=Trainer, **kwargs):
  return trainer_class(*args, **kwargs)
and then call it from ddsp_run instead of instantiating Trainer directly, but that seems a bit clumsy.
Is there a better way to do it?
I am having trouble training models that don't rely on an f0 estimate from the Crepe pitch estimator. In my tests, whenever fundamental frequency estimation is part of the differentiable graph, I cannot get the additive synthesizer to converge at all.
To reproduce it, I create a batch consisting of one sample generated with the additive synth as in the synths and effects tutorial notebook. I then try overfitting an autoencoder on that one sample, with code adapted from the training on one sample notebook.
The decoder uses an additive synthesizer too so, in theory, it should easily reconstruct the sample. Here is a Colab notebook that demonstrates the behavior. To make the model converge, replace f0_encoder=f0_encoder with f0_encoder=None.
With the f0 encoder: after the first few training steps, the loss does not improve anymore (it stays around 18 to 19).
With f0_encoder=None: the model converges immediately, with the loss going down to 3 in a short time.
This is happening just trying to fit one sample. I tried fitting multiple samples too without success.
import time
import ddsp
from ddsp.training import (data, decoders, encoders, models, preprocessing,
train_util)
import gin
import numpy as np
import tensorflow.compat.v2 as tf
import itertools
sample_rate = 16000
### Generate an audio sample using the additive synth
n_frames = 1000
hop_size = 64
n_samples = n_frames * hop_size
# Amplitude [batch, n_frames, 1].
# Make amplitude linearly decay over time.
amps = np.linspace(1.0, -3.0, n_frames,dtype=np.float32)
amps = amps[np.newaxis, :, np.newaxis]
# Harmonic Distribution [batch, n_frames, n_harmonics].
# Make harmonics decrease linearly with frequency.
n_harmonics = 20
harmonic_distribution = np.ones([n_frames, 1],dtype=np.float32) * np.linspace(1.0, -1.0, n_harmonics,dtype=np.float32)[np.newaxis, :]
harmonic_distribution = harmonic_distribution[np.newaxis, :, :]
# Fundamental frequency in Hz [batch, n_frames, 1].
f0_hz = 440.0 * np.ones([1, n_frames, 1],dtype=np.float32)
# Create synthesizer object.
additive_synth = ddsp.synths.Additive(n_samples=n_samples,
scale_fn=ddsp.core.exp_sigmoid,
sample_rate=sample_rate)
# Generate some audio.
audio = additive_synth(amps, harmonic_distribution, f0_hz)
# Create a batch of data (1 example) to train on
batch = {"audio": audio, "f0_hz": f0_hz, "amplitudes": amps, "loudness_db": np.ones_like(amps)}
dataset_iter = itertools.repeat(batch)
batch = next(dataset_iter)
audio = batch['audio']
n_samples = audio.shape[1]
### Create an autoencoder
# Create Neural Networks.
preprocessor = preprocessing.DefaultPreprocessor(time_steps=n_samples)
# f0 encoder
f0_encoder = encoders.ResnetF0Encoder(size="small")
encoder = encoders.MfccTimeDistributedRnnEncoder(rnn_channels=256,
                                                 rnn_type='gru',
                                                 z_dims=16,
                                                 z_time_steps=125,
                                                 f0_encoder=f0_encoder)
# set f0_encoder=None to use Crepe
decoder = decoders.RnnFcDecoder(rnn_channels=256,
                                rnn_type='gru',
                                ch=256,
                                layers_per_stack=1,
                                output_splits=(('amps', 1),
                                               ('harmonic_distribution', 45)))
# Create Processors.
additive = ddsp.synths.Additive(n_samples=n_samples,
                                sample_rate=sample_rate,
                                name='additive')
# Create ProcessorGroup.
dag = [(additive, ['amps', 'harmonic_distribution', 'f0_hz'])]
processor_group = ddsp.processors.ProcessorGroup(dag=dag,
                                                 name='processor_group')
# Loss_functions
spectral_loss = ddsp.losses.SpectralLoss(loss_type='L1',
                                         mag_weight=1.0,
                                         logmag_weight=1.0)
strategy = train_util.get_strategy()
with strategy.scope():
  # Put it together in a model.
  model = models.Autoencoder(preprocessor=preprocessor,
                             encoder=encoder,
                             decoder=decoder,
                             processor_group=processor_group,
                             losses=[spectral_loss])
  trainer = train_util.Trainer(model, strategy, learning_rate=1e-3)
### Try overfitting to the synthetic sample
# Build model, easiest to just run forward pass.
trainer.build(batch)
for i in range(3000):
  losses = trainer.train_step(dataset_iter)
  res_str = 'step: {}\t'.format(i)
  for k, v in losses.items():
    res_str += '{}: {:.2f}\t'.format(k, v)
  print(res_str)
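For reference, the target in this report is a plain harmonic sum. Below is a minimal pure-Python sketch of that idea, assuming fixed per-harmonic amplitudes. It does not reproduce ddsp.synths.Additive, which applies further processing such as scale_fn, and additive_sample is a hypothetical helper:

```python
import math

def additive_sample(f0_hz, harmonic_amps, n_samples, sample_rate=16000):
    """Sum of sinusoids at integer multiples of f0_hz (hypothetical helper)."""
    audio = []
    for n in range(n_samples):
        t = n / sample_rate
        value = 0.0
        for k, amp in enumerate(harmonic_amps, start=1):
            freq = k * f0_hz
            if freq < sample_rate / 2:  # skip harmonics at or above Nyquist
                value += amp * math.sin(2.0 * math.pi * freq * t)
        audio.append(value)
    return audio

# 20 equally weighted harmonics of 440 Hz, one 64-sample frame.
amps = [1.0 / 20] * 20
audio = additive_sample(440.0, amps, n_samples=64)
print(len(audio))
```

Because the signal is fully determined by f0 and the harmonic amplitudes, a decoder that is handed the correct f0 only has to learn the amplitudes, which may be why the run with a given f0 converges so quickly.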
I noticed that the evaluate_or_sample function doesn't release memory between checkpoints.
For example, if I run ddsp_run in eval mode while another ddsp_run is training, the size of the evaluation process keeps growing as it loads and evaluates the checkpoints that are being generated by the training process. While killing and rerunning the eval process solves the issue, it is not an ideal solution.
SpectralLoss:
I am trying to use alternative loss weights, but all of them throw an error except logmag_weight and mag_weight:
INFO:tensorflow:Error reported to Coordinator: in user code:
/usr/local/lib/python3.6/dist-packages/ddsp/training/models.py:64 __call__ *
results = super().__call__(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/ddsp/training/models.py:123 call *
loss = loss_obj(features['audio'], audio_gen)
/usr/local/lib/python3.6/dist-packages/ddsp/losses.py:107 call *
target = diff(target_mag, axis=1)
/usr/local/lib/python3.6/dist-packages/ddsp/spectral_ops.py:158 diff *
size = shape.as_list()
AttributeError: 'list' object has no attribute 'as_list'
or in case of trying to use loudness:
INFO:tensorflow:Error reported to Coordinator: in user code:
/usr/local/lib/python3.6/dist-packages/ddsp/training/models.py:64 __call__ *
results = super().__call__(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/ddsp/training/models.py:123 call *
loss = loss_obj(features['audio'], audio_gen)
/usr/local/lib/python3.6/dist-packages/ddsp/losses.py:138 call *
target = spectral_ops.compute_loudness(target_audio, n_fft=2048)
/usr/local/lib/python3.6/dist-packages/ddsp/spectral_ops.py:209 compute_loudness *
s = stft_fn(audio, frame_size=n_fft, overlap=overlap, pad_end=True)
/usr/local/lib/python3.6/dist-packages/ddsp/spectral_ops.py:61 stft_np *
audio = np.pad(audio, padding, 'constant')
<__array_function__ internals>:6 pad **
/usr/local/lib/python3.6/dist-packages/numpy/lib/arraypad.py:741 pad
array = np.asarray(array)
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py:85 asarray
return array(a, dtype, copy=False, order=order)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:749 __array__
" array.".format(self.name))
NotImplementedError: Cannot convert a symbolic Tensor (fn:0) to a numpy array.
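The first traceback ends with shape.as_list() being called on a plain Python list. Below is a small sketch of a defensive conversion, assuming the shape may arrive either as a list/tuple or as an object exposing as_list() (as tf.TensorShape does). shape_to_list and FakeTensorShape are hypothetical names used for illustration, not part of DDSP:

```python
def shape_to_list(shape):
    """Return the shape as a plain list, whatever container it came in."""
    if hasattr(shape, "as_list"):
        return shape.as_list()
    return list(shape)

class FakeTensorShape:
    """Stand-in for an object with an as_list() method (illustration only)."""

    def __init__(self, dims):
        self._dims = list(dims)

    def as_list(self):
        return list(self._dims)

print(shape_to_list([1, 1000, 513]))
print(shape_to_list(FakeTensorShape((1, 1000, 513))))
```

The second traceback is a different problem: a NumPy code path (np.pad) receives a symbolic tensor, so a shape helper like this would not address it.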
Hello!
Running through the training tutorial linked from the README.md (3_training.ipynb) without any alterations in Google Colab crashes at the "Build Model" cell. Specifically, these lines:
dataset = trainer.distribute_dataset(dataset)
trainer.build(next(iter(dataset)))
It seems as though the decoder is expecting the latent variable z from the conditioning dict, but it's not present. Here's the stack trace:
>>> dataset = trainer.distribute_dataset(dataset)
>>> trainer.build(next(iter(dataset)))
/usr/local/lib/python3.6/dist-packages/ddsp/training/models.py:64 __call__ *
results = super().__call__(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/ddsp/training/models.py:120 call *
audio_gen = self.decode(conditioning, training=training)
/usr/local/lib/python3.6/dist-packages/ddsp/training/models.py:114 decode *
processor_inputs = self.decoder(conditioning, training=training)
/usr/local/lib/python3.6/dist-packages/ddsp/training/decoders.py:42 call *
x = self.decode(conditioning)
/usr/local/lib/python3.6/dist-packages/ddsp/training/decoders.py:85 decode *
inputs = [conditioning[k] for k in self.input_keys]
KeyError: 'z'
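The last frame of the trace shows the decoder indexing conditioning with each of its input_keys. Below is a sketch of a pre-flight check under that assumption. missing_inputs is a hypothetical helper, and the key names other than z are illustrative:

```python
def missing_inputs(conditioning, input_keys):
    """Return the keys the decoder would look up but the dict lacks."""
    return [k for k in input_keys if k not in conditioning]

# Conditioning as it might look when no encoder has produced a latent.
conditioning = {"f0_scaled": [0.5], "ld_scaled": [0.3]}
input_keys = ("ld_scaled", "f0_scaled", "z")

missing = missing_inputs(conditioning, input_keys)
if missing:
    print("conditioning is missing:", missing)
```

If z shows up in the result, the model presumably needs either an encoder that produces it or a decoder configured without z among its inputs.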
Hello!
Thank you for all your great work on this library.
I want to reproduce the solo violin experiment from the library.
I've downloaded the mp3s (the wavs are paywalled unfortunately) of the pieces performed by John Gardner from the link provided in the paper.
However, I'm getting the following error when I run ddsp_prepare_tfrecord.
File "/home/myuser/.local/lib/python3.7/site-packages/ddsp/spectral_ops.py", line 261, in compute_f0
assert n_padding % 1 == 0
RuntimeError: AssertionError [while running 'Map(_add_f0_estimate)']
This seems to be an issue with some of the clips, like V. Sarabande.mp3.
Thank you!
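The report does not show how n_padding is computed, so the following is only a guess at the failure mode. If the padding is derived from a hop size of sample_rate / frame_rate, a non-integer hop produces a fractional padding and an assertion like the one above fails. padding_for is a hypothetical helper, and its formula is an assumption, not the code in spectral_ops.py:

```python
import math

def padding_for(n_samples, sample_rate, frame_rate):
    """Samples needed to reach a whole number of frames (assumed formula)."""
    hop_size = sample_rate / frame_rate  # may be fractional
    n_frames = math.ceil(n_samples / hop_size)
    return n_frames * hop_size - n_samples

for sr in (16000, 44100):
    pad = padding_for(1000, sr, frame_rate=250)
    print(sr, pad, "integer" if pad % 1 == 0 else "fractional")
```

Under this assumption, it would be worth checking the sample rate and length of the clips that trigger the error.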
Hi, I have a question about the embedding loss used in the autoencoder, specifically PretrainedCREPEEmbeddingLoss as used in ae_abs.gin. From the code, it seems to regularize the latent f0 of the original and the reconstructed audio, right? I am not sure why you do this. Is it something like a cycle loss, since you compute the latent again from the reconstructed audio?
I am also not sure what this loss is used for. Is it for training the CREPE model? The logic seems a little odd to me here.
Thank you!
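The loss described here compares the target and the resynthesized audio in the feature space of a pretrained network, rather than sample by sample. Below is a minimal sketch of that pattern, assuming an L1 distance. embed is a toy stand-in for the pretrained CREPE activations, and embedding_loss is a hypothetical name:

```python
def embed(audio):
    """Toy stand-in for a frozen pretrained network (illustration only)."""
    energy = sum(x * x for x in audio) / len(audio)
    crossings = sum(1 for a, b in zip(audio, audio[1:]) if a * b < 0)
    return [energy, crossings / len(audio)]

def embedding_loss(target_audio, generated_audio, weight=1.0):
    """Mean absolute difference between the two embeddings."""
    target_emb = embed(target_audio)
    gen_emb = embed(generated_audio)
    diffs = [abs(t - g) for t, g in zip(target_emb, gen_emb)]
    return weight * sum(diffs) / len(diffs)

target = [0.0, 1.0, 0.0, -1.0] * 4
same = embedding_loss(target, target)
quieter = embedding_loss(target, [0.5 * x for x in target])
print(same, quieter)
```

In such a setup the embedding network is typically kept fixed, so the loss acts as a perceptual distance on the generated audio rather than as a way to train the embedding model itself.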
Posting this again because the issue was closed before my question was resolved: #12
@jesseengel Thanks for the response!
You didn't quite answer my second question. Let me ask it in a different way...
With regards to the autoencoder configuration...
ddsp/ddsp/training/gin/models/ae.gin
Line 40 in cd98116
Is there a benefit to training on an amplitude spectrogram? It looks like it's also used in the loss in addition to the log scaled spectrogram.
The amplitude spectrogram is not meaningful from a human perspective, right?
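One way to see what each term contributes: the linear-magnitude term is dominated by the loudest bins, while the log-magnitude term weighs relative errors, including those in quiet bins. Below is a sketch assuming an L1 distance and weights named after the mag_weight and logmag_weight arguments of SpectralLoss. spectral_l1 is a hypothetical helper operating on precomputed magnitudes, not the DDSP implementation:

```python
import math

def spectral_l1(target_mag, gen_mag, mag_weight=1.0, logmag_weight=1.0,
                eps=1e-5):
    """Weighted sum of L1 errors on linear and log magnitudes."""
    n = len(target_mag)
    lin = sum(abs(t - g) for t, g in zip(target_mag, gen_mag)) / n
    log = sum(abs(math.log(t + eps) - math.log(g + eps))
              for t, g in zip(target_mag, gen_mag)) / n
    return mag_weight * lin + logmag_weight * log

target = [10.0, 0.01]      # one loud bin, one quiet bin
loud_err = [9.0, 0.01]     # 10% error in the loud bin
quiet_err = [10.0, 0.001]  # 10x error in the quiet bin

for name, gen in (("loud_err", loud_err), ("quiet_err", quiet_err)):
    lin_only = spectral_l1(target, gen, logmag_weight=0.0)
    log_only = spectral_l1(target, gen, mag_weight=0.0)
    print(name, lin_only, log_only)
```

The linear term penalizes the loud-bin error far more than the quiet-bin one, and the log term does the opposite, which is one reason the two might be combined.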
What would it take to use DDSP to change the words a singer is singing in a song while still keeping the melody? So, a combination of TTS and DDSP, I would think. For example, one could feed in new lyrics (text) to an existing song like Creep (Radiohead) and have Thom say "I love feet" instead of "I'm a creep".
I think this project, and similar projects, seem the closest to actually doing this, but I haven't seen any specific mention of it. Any tips or additional info would be appreciated.