mbinkowski / deepspeechdistances

Authors' implementation of DeepSpeech Distances.

License: Apache License 2.0

Python 26.09% Jupyter Notebook 73.91%

deepspeechdistances's Issues

Evaluation on CPU

Hi,
Thanks for the Colab notebook.

The current code for evaluating DeepSpeech Distances seems to require a GPU. Could you add an option to run it on CPU?
For example, even on Colab, when I switch to a CPU-only runtime I get:

INFO:tensorflow:Restoring parameters from /content/drive/My Drive/DeepSpeechDistances/checkpoint/ds2_large/model.ckpt-54800
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/tensorflow-1.15.2/python3.7/tensorflow_core/python/client/session.py in _do_call(self, fn, *args)
   1364     try:
-> 1365       return fn(*args)
   1366     except errors.OpError as e:

11 frames
InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by {{node ForwardPass/ds2_encoder/cudnn_gru/cudnn_gru/CudnnRNNCanonicalToParams}}with these attrs: [rnn_mode="gru", seed2=598, seed=0, dropout=0, num_params=60, input_mode="linear_input", T=DT_HALF, direction="bidirectional"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

	 [[ForwardPass/ds2_encoder/cudnn_gru/cudnn_gru/CudnnRNNCanonicalToParams]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node ForwardPass/ds2_encoder/cudnn_gru/cudnn_gru/CudnnRNNCanonicalToParams (defined at /tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py:1748) with these attrs: [rnn_mode="gru", seed2=598, seed=0, dropout=0, num_params=60, input_mode="linear_input", T=DT_HALF, direction="bidirectional"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

	 [[ForwardPass/ds2_encoder/cudnn_gru/cudnn_gru/CudnnRNNCanonicalToParams]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
/tensorflow-1.15.2/python3.7/tensorflow_core/python/training/saver.py in restore(self, sess, save_path)
   1324       # We add a more reasonable error message here to help users (b/110263146)
   1325       raise _wrap_restore_error_with_msg(
-> 1326           err, "a mismatch between the current graph and the graph")
   1327 
   1328   @staticmethod

InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node ForwardPass/ds2_encoder/cudnn_gru/cudnn_gru/CudnnRNNCanonicalToParams (defined at /tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py:1748) with these attrs: [rnn_mode="gru", seed2=598, seed=0, dropout=0, num_params=60, input_mode="linear_input", T=DT_HALF, direction="bidirectional"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

	 [[ForwardPass/ds2_encoder/cudnn_gru/cudnn_gru/CudnnRNNCanonicalToParams]]
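
Reading the message above: the op `CudnnRNNCanonicalToParams` lists kernels for `GPU` only, while the runtime registers only `CPU` and `XLA_CPU`, so no registered device can run it. A minimal sketch of that check follows; `diagnose_missing_kernel` is a hypothetical helper written for illustration, not part of TensorFlow or of this repository, and it assumes the message layout shown in the traceback.

```python
import re


def diagnose_missing_kernel(message):
    """Summarise a TensorFlow 'No OpKernel was registered' error message.

    Assumes the layout seen in the traceback above: an "Op '<name>'" mention,
    a "Registered devices: [...]" line and "device='<type>'" kernel lines.
    """
    op = re.search(r"support Op '([^']+)'", message)
    devices = re.search(r"Registered devices: \[([^\]]*)\]", message)
    # Device types that have at least one kernel for this op.
    kernel_devices = set(re.findall(r"device='([^']+)'", message))
    # Device types that the current runtime actually exposes.
    available = set()
    if devices:
        available = {d.strip() for d in devices.group(1).split(",") if d.strip()}
    return {
        "op": op.group(1) if op else None,
        "available_devices": sorted(available),
        "kernel_devices": sorted(kernel_devices),
        # The op can only run if some device has both a kernel and is present.
        "runnable": bool(kernel_devices & available),
    }
```

Calling it on the text of the exception (for example `diagnose_missing_kernel(str(err))`) gives `runnable: False` here, which matches the report that the checkpoint cannot be restored on a CPU-only runtime as it stands.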

I have a question regarding the paper

HIGH FIDELITY SPEECH SYNTHESIS WITH ADVERSARIAL NETWORKS

I read this paper and found it really good.
However, I have some questions about it.

The paper says that GAN-TTS can turn text into speech,
but the text input part is not described in the model.

Where can I learn how GAN-TTS handles text?
Is the paper only about speech synthesis that turns one wave file into another via a GAN?
If that's the case, should I build one of the TTS models, like Tacotron, myself?

Can those metrics be used for sample rates other than 24kHz?

Hi Mikolaj,
Thanks for open-sourcing these metrics, and congratulations on the ICLR acceptance of your paper!

I wonder if these metrics could also be used for speech with sample rates other than 24kHz. Since 22050Hz is also popular for speech synthesis, I'm thinking of using them after resampling the speech from 22050Hz to 24000Hz with ffmpeg. Will that be okay?

Looking forward to your reply, thanks in advance.
Seungwon
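
For reference, the resampling step described in the question could be sketched as below. This only illustrates the ffmpeg call (`-ar` sets the output sample rate); whether the metrics stay meaningful on resampled audio is not answered in this thread. The function names and file names are hypothetical, and ffmpeg is assumed to be installed and on the PATH.

```python
import shutil
import subprocess

TARGET_RATE = 24000  # the 24 kHz rate mentioned in the question


def build_resample_command(src, dst, rate=TARGET_RATE):
    """Return the ffmpeg argument list that resamples `src` to `rate` Hz."""
    # -y overwrites dst if it already exists; -ar sets the output sample rate.
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(rate), str(dst)]


def resample(src, dst, rate=TARGET_RATE):
    """Run ffmpeg to write a resampled copy of `src` to `dst`."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_resample_command(src, dst, rate), check=True)
```

Usage would look like `resample("sample_22050.wav", "sample_24000.wav")`, with both paths being placeholders.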

Error in the demo notebook

Hi Mikolaj,

Thanks for creating this useful repo. While following your Colab notebook, I got stuck on the following error.

The step

reference_path = os.path.join(SAMPLE_PATH, 'ref', '*.wav')
eval_paths = [os.path.join(SAMPLE_PATH, f'noisy_{i+1}', '*.wav') for i
              in range(NUM_NOISE_LEVELS)]

evaluator = audio_distance.AudioDistance(
    load_path=os.path.join(PATH, 'checkpoint', 'ds2_large', 'model.ckpt-54800'),
    meta_path=os.path.join(PATH, 'checkpoint', 'collection-stripped-meta.meta'),
    required_sample_size=NUM_SPLITS * SAMPLES_PER_SPLIT,
    num_splits=NUM_SPLITS)

evaluator.load_real_data(reference_path)

The error

NotImplementedError                       Traceback (most recent call last)
<ipython-input-5-f43dcce18c53> in <module>()
      7     meta_path=os.path.join(PATH, 'checkpoint', 'collection-stripped-meta.meta'),
      8     required_sample_size=NUM_SPLITS * SAMPLES_PER_SPLIT,
----> 9     num_splits=NUM_SPLITS)
     10 
     11 evaluator.load_real_data(reference_path)

9 frames
/content/drive/My Drive/DeepSpeechDistances/audio_distance.py in __init__(self, load_path, meta_path, keep_features, required_sample_size, num_splits, do_kdsd, do_conditional_dsds, sample_freq)
     81                  tf.split(self.eval_features, num_splits))
     82 
---> 83     dists = [frechet_dist(ref, ev) for ref, ev in zipped]
     84     self.dists = [(tf.reduce_mean(dists), tf.math.reduce_std(dists))]
     85     if self.do_kdsd:

/content/drive/My Drive/DeepSpeechDistances/audio_distance.py in <listcomp>(.0)
     81                  tf.split(self.eval_features, num_splits))
     82 
---> 83     dists = [frechet_dist(ref, ev) for ref, ev in zipped]
     84     self.dists = [(tf.reduce_mean(dists), tf.math.reduce_std(dists))]
     85     if self.do_kdsd:

/tensorflow-1.15.2/python3.7/tensorflow_gan/python/eval/classifier_metrics.py in frechet_classifier_distance_from_activations(activations1, activations2)
    791   """
    792   return _frechet_classifier_distance_from_activations_helper(
--> 793       activations1, activations2, streaming=False)
    794 
    795 

/tensorflow-1.15.2/python3.7/tensorflow_gan/python/eval/classifier_metrics.py in _frechet_classifier_distance_from_activations_helper(activations1, activations2, streaming)
    714     num_examples_real = tf.cast(tf.shape(input=activations1)[0], tf.float64)
    715     sigma = (num_examples_real / (num_examples_real - 1) *
--> 716              tfp.stats.covariance(activations1),)
    717     # Calculate the unbiased covariance matrix of second activations.
    718     num_examples_generated = tf.cast(

/tensorflow-1.15.2/python3.7/tensorflow_probability/python/stats/sample_stats.py in covariance(x, y, sample_axis, event_axis, keepdims, name)
    436                 # batch of covariance.
    437                 event_shape**2,
--> 438                 tf.ones([sample_ndims], tf.int32)),
    439             0))
    440     # Permuting by the argsort inverts the permutation, making

/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py in ones(shape, dtype, name)
   2558         # Create a constant if it won't be very big. Otherwise create a fill op
   2559         # to prevent serialized GraphDefs from becoming too large.
-> 2560         output = _constant_if_small(one, shape, dtype, name)
   2561         if output is not None:
   2562           return output

/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py in _constant_if_small(value, shape, dtype, name)
   2293 def _constant_if_small(value, shape, dtype, name):
   2294   try:
-> 2295     if np.prod(shape) < 1000:
   2296       return constant(value, shape=shape, dtype=dtype, name=name)
   2297   except TypeError:

<__array_function__ internals> in prod(*args, **kwargs)

/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py in prod(a, axis, dtype, out, keepdims, initial, where)
   3050     array([1, 2, 6])
   3051     >>> a = np.array([[1, 2, 3], [4, 5, 6]])
-> 3052     >>> np.cumprod(a, dtype=float) # specify type of output
   3053     array([   1.,    2.,    6.,   24.,  120.,  720.])
   3054 

/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     84             else:
     85                 return reduction(axis=axis, out=out, **passkwargs)
---> 86 
     87     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
     88 

/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py in __array__(self)
    734   def __array__(self):
    735     raise NotImplementedError("Cannot convert a symbolic Tensor ({}) to a numpy"
--> 736                               " array.".format(self.name))
    737 
    738   def __len__(self):

NotImplementedError: Cannot convert a symbolic Tensor (covariance/Size_2:0) to a numpy array.

I searched and found that the NumPy version could be the cause, so I ran !pip install numpy==1.19.5 and tried again, but the error is still there.
Thanks for your time.
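
One thing worth checking before going further: the NumPy frames in the traceback show source lines that do not look like the code being executed (a docstring example at line 3052, a blank line at line 86), which can happen when the package on disk was replaced by `pip` after the module was already imported, so the pinned version may not be the one actually loaded until the runtime is restarted. A minimal sketch for checking the loaded version follows. The helper names are hypothetical, and the 1.20 threshold is an assumption based on commonly reported cases of this error with TensorFlow 1.x, not something confirmed in this thread.

```python
def version_tuple(version):
    """Turn '1.19.5' into (1, 19, 5), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def numpy_may_break_tf1(version):
    # Assumption: NumPy 1.20 and later are the versions commonly reported
    # to trigger "Cannot convert a symbolic Tensor" with TensorFlow 1.x.
    return version_tuple(version) >= (1, 20)


if __name__ == "__main__":
    try:
        import numpy as np
    except ImportError:
        print("NumPy is not installed")
    else:
        # np.__version__ reflects the module loaded in this process,
        # which can differ from what pip installed a moment ago.
        print("loaded NumPy:", np.__version__)
        print("possibly affected:", numpy_may_break_tf1(np.__version__))
```

If the printed version is not the pinned one, restarting the runtime after the `pip install` and re-running the notebook from the top would be the next thing to try.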
