mbinkowski / deepspeechdistances
Authors' implementation of DeepSpeech Distances.
License: Apache License 2.0
Hi,
Thanks for the Colab notebook.
The current code to evaluate DeepSpeech Distances seems to require a GPU. Could you add an option for CPU?
For example, even in the Colab notebook, when I switch to a CPU-only runtime I get:
INFO:tensorflow:Restoring parameters from /content/drive/My Drive/DeepSpeechDistances/checkpoint/ds2_large/model.ckpt-54800
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
/tensorflow-1.15.2/python3.7/tensorflow_core/python/client/session.py in _do_call(self, fn, *args)
1364 try:
-> 1365 return fn(*args)
1366 except errors.OpError as e:
11 frames
InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by {{node ForwardPass/ds2_encoder/cudnn_gru/cudnn_gru/CudnnRNNCanonicalToParams}}with these attrs: [rnn_mode="gru", seed2=598, seed=0, dropout=0, num_params=60, input_mode="linear_input", T=DT_HALF, direction="bidirectional"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
device='GPU'; T in [DT_DOUBLE]
device='GPU'; T in [DT_FLOAT]
device='GPU'; T in [DT_HALF]
[[ForwardPass/ds2_encoder/cudnn_gru/cudnn_gru/CudnnRNNCanonicalToParams]]
During handling of the above exception, another exception occurred:
InvalidArgumentError Traceback (most recent call last)
InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node ForwardPass/ds2_encoder/cudnn_gru/cudnn_gru/CudnnRNNCanonicalToParams (defined at /tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py:1748) with these attrs: [rnn_mode="gru", seed2=598, seed=0, dropout=0, num_params=60, input_mode="linear_input", T=DT_HALF, direction="bidirectional"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
device='GPU'; T in [DT_DOUBLE]
device='GPU'; T in [DT_FLOAT]
device='GPU'; T in [DT_HALF]
[[ForwardPass/ds2_encoder/cudnn_gru/cudnn_gru/CudnnRNNCanonicalToParams]]
During handling of the above exception, another exception occurred:
InvalidArgumentError Traceback (most recent call last)
/tensorflow-1.15.2/python3.7/tensorflow_core/python/training/saver.py in restore(self, sess, save_path)
1324 # We add a more reasonable error message here to help users (b/110263146)
1325 raise _wrap_restore_error_with_msg(
-> 1326 err, "a mismatch between the current graph and the graph")
1327
1328 @staticmethod
InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node ForwardPass/ds2_encoder/cudnn_gru/cudnn_gru/CudnnRNNCanonicalToParams (defined at /tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py:1748) with these attrs: [rnn_mode="gru", seed2=598, seed=0, dropout=0, num_params=60, input_mode="linear_input", T=DT_HALF, direction="bidirectional"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
device='GPU'; T in [DT_DOUBLE]
device='GPU'; T in [DT_FLOAT]
device='GPU'; T in [DT_HALF]
[[ForwardPass/ds2_encoder/cudnn_gru/cudnn_gru/CudnnRNNCanonicalToParams]]
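The key facts in the trace above are that the runtime registered only CPU devices, while the failing op lists kernels for GPU only. A minimal Python sketch of reading that off the message text follows. The helper name `kernel_devices` and the shortened `SAMPLE` string are illustrative assumptions, and the sketch only diagnoses the mismatch; it does not make the checkpoint run on CPU.

```python
import re

# Shortened excerpt of the error message quoted above (illustrative sample).
SAMPLE = """No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams'
Registered devices: [CPU, XLA_CPU]
Registered kernels:
device='GPU'; T in [DT_DOUBLE]
device='GPU'; T in [DT_FLOAT]
device='GPU'; T in [DT_HALF]
"""


def kernel_devices(error_text):
    """Return (registered_devices, kernel_devices) parsed from the message."""
    match = re.search(r"Registered devices: \[([^\]]*)\]", error_text)
    registered = [d.strip() for d in match.group(1).split(",")] if match else []
    # Each "device='X'" line names a device type that has a kernel for the op.
    kernels = sorted(set(re.findall(r"device='([A-Za-z_]+)'", error_text)))
    return registered, kernels


registered, kernels = kernel_devices(SAMPLE)
# Device types that have a kernel but are absent from this runtime.
unavailable = [k for k in kernels if k not in registered]
print("registered devices:", registered)
print("op has kernels for:", kernels)
print("kernel devices missing from this runtime:", unavailable)
```

If the last list is non-empty and covers every kernel device, the op cannot be placed on any device in the current runtime, which matches the failure reported here.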
HIGH FIDELITY SPEECH SYNTHESIS WITH ADVERSARIAL NETWORKS
I read this paper and thought it was really good.
However, I have some questions about it.
The paper says that GAN-TTS can turn text into speech,
but the model description does not cover the text input stage.
Where can I learn how GAN-TTS handles text?
Does the paper only cover speech synthesis that turns one wave file into another via a GAN?
If that is the case, do I need to build part of a TTS model such as Tacotron myself?
Hi Mikolaj,
Thanks for open-sourcing those metrics, and congratulations on the ICLR acceptance of your paper!
I wonder if those metrics could also be used for speech with sample rates other than 24kHz. Since 22050Hz is also popular for speech synthesis, I'm thinking of using those metrics after resampling the speech from 22050Hz to 24000Hz with ffmpeg. Will that be okay?
Looking forward to your reply, thanks in advance.
Seungwon
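For reference, a minimal sketch of the resampling step described above, in Python. The helper names and file paths are hypothetical; `-i` and `-ar` are standard ffmpeg options for the input file and the output audio sample rate. Whether scores computed on resampled audio are comparable to results on native 24kHz audio is the open question here, and the sketch does not settle it.

```python
from fractions import Fraction


def resample_ratio(src_hz, dst_hz):
    """Exact up/down ratio needed to go from src_hz to dst_hz."""
    return Fraction(dst_hz, src_hz)


def ffmpeg_resample_cmd(src_path, dst_path, dst_hz=24000):
    """Build (but do not run) an ffmpeg command that resamples one file."""
    return ["ffmpeg", "-i", src_path, "-ar", str(dst_hz), dst_path]


ratio = resample_ratio(22050, 24000)
print(ratio)  # 160/147

# Hypothetical file names, for illustration only.
cmd = ffmpeg_resample_cmd("sample_22050.wav", "sample_24000.wav")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The ratio 160/147 is not an integer, so every output sample is interpolated; the resampled audio still has no content above the original 11025Hz Nyquist limit.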
Hi Mikolaj,
Thanks for creating this useful repo. While following your Colab notebook, I got stuck on the following error.
The step
reference_path = os.path.join(SAMPLE_PATH, 'ref', '*.wav')
eval_paths = [os.path.join(SAMPLE_PATH, f'noisy_{i+1}', '*.wav')
              for i in range(NUM_NOISE_LEVELS)]
evaluator = audio_distance.AudioDistance(
    load_path=os.path.join(PATH, 'checkpoint', 'ds2_large', 'model.ckpt-54800'),
    meta_path=os.path.join(PATH, 'checkpoint', 'collection-stripped-meta.meta'),
    required_sample_size=NUM_SPLITS * SAMPLES_PER_SPLIT,
    num_splits=NUM_SPLITS)
evaluator.load_real_data(reference_path)
The error
NotImplementedError Traceback (most recent call last)
<ipython-input-5-f43dcce18c53> in <module>()
7 meta_path=os.path.join(PATH, 'checkpoint', 'collection-stripped-meta.meta'),
8 required_sample_size=NUM_SPLITS * SAMPLES_PER_SPLIT,
----> 9 num_splits=NUM_SPLITS)
10
11 evaluator.load_real_data(reference_path)
9 frames
/content/drive/My Drive/DeepSpeechDistances/audio_distance.py in __init__(self, load_path, meta_path, keep_features, required_sample_size, num_splits, do_kdsd, do_conditional_dsds, sample_freq)
81 tf.split(self.eval_features, num_splits))
82
---> 83 dists = [frechet_dist(ref, ev) for ref, ev in zipped]
84 self.dists = [(tf.reduce_mean(dists), tf.math.reduce_std(dists))]
85 if self.do_kdsd:
/content/drive/My Drive/DeepSpeechDistances/audio_distance.py in <listcomp>(.0)
81 tf.split(self.eval_features, num_splits))
82
---> 83 dists = [frechet_dist(ref, ev) for ref, ev in zipped]
84 self.dists = [(tf.reduce_mean(dists), tf.math.reduce_std(dists))]
85 if self.do_kdsd:
/tensorflow-1.15.2/python3.7/tensorflow_gan/python/eval/classifier_metrics.py in frechet_classifier_distance_from_activations(activations1, activations2)
791 """
792 return _frechet_classifier_distance_from_activations_helper(
--> 793 activations1, activations2, streaming=False)
794
795
/tensorflow-1.15.2/python3.7/tensorflow_gan/python/eval/classifier_metrics.py in _frechet_classifier_distance_from_activations_helper(activations1, activations2, streaming)
714 num_examples_real = tf.cast(tf.shape(input=activations1)[0], tf.float64)
715 sigma = (num_examples_real / (num_examples_real - 1) *
--> 716 tfp.stats.covariance(activations1),)
717 # Calculate the unbiased covariance matrix of second activations.
718 num_examples_generated = tf.cast(
/tensorflow-1.15.2/python3.7/tensorflow_probability/python/stats/sample_stats.py in covariance(x, y, sample_axis, event_axis, keepdims, name)
436 # batch of covariance.
437 event_shape**2,
--> 438 tf.ones([sample_ndims], tf.int32)),
439 0))
440 # Permuting by the argsort inverts the permutation, making
/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py in ones(shape, dtype, name)
2558 # Create a constant if it won't be very big. Otherwise create a fill op
2559 # to prevent serialized GraphDefs from becoming too large.
-> 2560 output = _constant_if_small(one, shape, dtype, name)
2561 if output is not None:
2562 return output
/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py in _constant_if_small(value, shape, dtype, name)
2293 def _constant_if_small(value, shape, dtype, name):
2294 try:
-> 2295 if np.prod(shape) < 1000:
2296 return constant(value, shape=shape, dtype=dtype, name=name)
2297 except TypeError:
<__array_function__ internals> in prod(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py in prod(a, axis, dtype, out, keepdims, initial, where)
3050 array([1, 2, 6])
3051 >>> a = np.array([[1, 2, 3], [4, 5, 6]])
-> 3052 >>> np.cumprod(a, dtype=float) # specify type of output
3053 array([ 1., 2., 6., 24., 120., 720.])
3054
/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
84 else:
85 return reduction(axis=axis, out=out, **passkwargs)
---> 86
87 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
88
/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py in __array__(self)
734 def __array__(self):
735 raise NotImplementedError("Cannot convert a symbolic Tensor ({}) to a numpy"
--> 736 " array.".format(self.name))
737
738 def __len__(self):
NotImplementedError: Cannot convert a symbolic Tensor (covariance/Size_2:0) to a numpy array.
I searched and found that the NumPy version could be the cause, so I ran it again after !pip install numpy==1.19.5,
but the error is still there.
Thanks for your time.
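One thing that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis: in a notebook, `pip install` changes the package on disk, but a NumPy that was already imported stays loaded until the runtime is restarted. A minimal Python sketch of that check follows; the helper name `needs_restart` is hypothetical.

```python
PINNED = "1.19.5"  # the version pinned in the post above


def needs_restart(loaded_version, pinned_version=PINNED):
    """True when the interpreter still holds a different NumPy than the pinned one."""
    return loaded_version != pinned_version


try:
    import numpy
    loaded = numpy.__version__
except ImportError:  # NumPy is not installed in this environment
    loaded = None

if loaded is None:
    print("NumPy is not installed")
elif needs_restart(loaded):
    print(f"Loaded NumPy {loaded}, expected {PINNED}: restart the runtime and re-run")
else:
    print(f"NumPy {loaded} is loaded as pinned")
```

Running this in the same session that raised the error shows which NumPy version the traceback actually came from.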