Comments (10)

ninaburns commented on May 23, 2024

I am new to style transfer and this really helped me follow along, thank you. Just curious, what was the reason for the 'byte mask fix' in flowtron.py of your fork? Is that necessary to run your style transfer example?

from flowtron.

karkirowle commented on May 23, 2024

@ninaburns There is some discussion of why that's needed in this issue.
Basically, you need this fix if you want to use the latest PyTorch version. That is the preferred route in Colab, because otherwise you also have to install an older version of PyTorch, which takes more time. Since you have to set up your environment in Colab each time, this is decisive. The disadvantage of this approach is that I have to point the Colab at my bug-fixed fork, and the Colab might break over time (i.e. the PyTorch version is not pinned).
I also tried setting it up with the older version, which introduced all sorts of problems with the GPU computation. I think the less efficient GPU computations hit Colab's hard limit.

ninaburns commented on May 23, 2024

Gotcha. I should have seen the issue you mentioned! Thanks for the explanation, it makes sense.

DamienToomey commented on May 23, 2024

Hi, has anybody managed to do style transfer with the code shared by @karkirowle ?

I have also read the content from issue #9 but I am still struggling.

In particular, I am trying to reproduce the demo 4.4.4 Sampling the Posterior (Unseen speaker). I am using the LibriTTS model, with the wav file ravdess_surprised_prior.wav (emotion "surprised") from the demo as the style reference.

I use average_over_time = True and speaker_id = 2092 in the code from @karkirowle.

The generated audio is of good quality but it does not contain the "surprised" emotion which is distinctly heard in the demo.

karkirowle commented on May 23, 2024

Hi @DamienToomey !
If I understand correctly, you are using the LibriTTS model and the single "surprised" wav file from the NVIDIA demo page you linked as the style.
Please note that the results will almost certainly differ from those on the demo site. The demo uses the Sally TTS model, which is not publicly available as far as I know. Also, we don't know the seed, standard deviation, etc. that were used for synthesis.
To get better results, my suggestion would be to use more audio files from RAVDESS that contain the same style but come from different speakers. This should be more successful, because it averages out the per-speaker noise and keeps only the surprised style, so the results are more robust. If you are not satisfied with the variety of the intonation, you can try changing the random seed and the std. It may still not work after all this. I hope this helps!
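
To make the averaging idea concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the latents are toy lists rather than tensors produced by the Flowtron model, and the helper names `mean_over_time` and `style_vector` are hypothetical, not part of the repository. Each reference file contributes one time-averaged vector, and those vectors are then averaged across files (speakers).

```python
from statistics import fmean


def mean_over_time(z):
    """Collapse one latent sequence (frames x dims) into a single vector."""
    return [fmean(column) for column in zip(*z)]


def style_vector(latents):
    """Average the per-file, time-averaged vectors across all reference files."""
    per_file = [mean_over_time(z) for z in latents]
    return [fmean(column) for column in zip(*per_file)]


# Toy latents standing in for two reference files from different speakers.
speaker_a = [[1.0, 0.0], [3.0, 2.0]]              # 2 frames x 2 dims
speaker_b = [[0.0, 4.0], [0.0, 2.0], [0.0, 0.0]]  # 3 frames x 2 dims

style = style_vector([speaker_a, speaker_b])
print(style)
```

Averaging over time first means that files of different lengths contribute equally, so one long recording does not dominate the resulting style vector.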

rafaelvalle commented on May 23, 2024

@karkirowle I sent you an e-mail a few days ago regarding your article on Flowtron and style transfer.
Did you receive it?

karkirowle commented on May 23, 2024

@rafaelvalle No, I haven't. Which e-mail address did you use: the one in the Colab or the one on my blog? I can have a second look, or could you try resending it just in case?

rafaelvalle commented on May 23, 2024

@karkirowle I sent and re-sent it to [email protected]
@karkirowle Sent yet another one with a suggestion for experiments.

rafaelvalle commented on May 23, 2024

Please take a look at the link below for a style transfer demo.
https://github.com/NVIDIA/flowtron/blob/master/inference_style_transfer.ipynb

TotzkePaul commented on May 23, 2024

I get this error with inference_style_transfer.ipynb.
I'm on Windows and using the LJS model.

TypeError                                 Traceback (most recent call last)
in ()
      8 in_lens = torch.LongTensor([text.shape[1]]).cuda()
      9 with torch.no_grad():
---> 10     z = model(mel, sid, text, in_lens, None)[0]
     11     z_values.append(z.permute(1, 2, 0))

C:\Users\totzke\miniconda3\envs\flow\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

H:\TTS\flowtron\flowtron.py in forward(self, mel, speaker_vecs, text, in_lens, out_lens)
    592         for i, flow in enumerate(self.flows):
    593             mel, log_s, gate, attn = flow(
--> 594                 mel, encoder_outputs, mask, out_lens)
    595             log_s_list.append(log_s)
    596             attns_list.append(attn)

C:\Users\totzke\miniconda3\envs\flow\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

H:\TTS\flowtron\flowtron.py in forward(self, mel, text, mask, out_lens)
    394         # backwards flow, send padded zeros back to end
    395         for k in range(mel.size(1)):
--> 396             mel[:, k] = mel[:, k].roll(out_lens[k].item(), dims=0)
    397
    398         mel, log_s, gates, attn = self.ar_step(mel, text, mask, out_lens)

TypeError: 'NoneType' object is not subscriptable
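
Reading the traceback, the notebook cell passes None as the last argument (out_lens), and the backward flow then indexes out_lens[k]. The sketch below reproduces that failure mode in plain Python. It is a toy stand-in, not the Flowtron code: `roll_columns` is a hypothetical helper that works on lists instead of tensors. Passing real per-item lengths in place of None may avoid the error, but that workaround is an assumption and is not confirmed in this thread.

```python
def roll_columns(columns, out_lens):
    """Toy stand-in for the loop at flowtron.py line 396, using lists."""
    rolled = []
    for k, col in enumerate(columns):
        shift = out_lens[k] % len(col)  # raises TypeError when out_lens is None
        # Shift elements to the right by `shift`, wrapping around the end.
        rolled.append(col[-shift:] + col[:-shift] if shift else list(col))
    return rolled


columns = [[1, 2, 3, 0]]  # one batch item whose last frame is padding

error_message = None
try:
    roll_columns(columns, None)  # mirrors model(mel, sid, text, in_lens, None)
except TypeError as exc:
    error_message = str(exc)

# With explicit lengths the indexing succeeds.
rolled = roll_columns(columns, [3])
print(error_message)
print(rolled)
```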
