Comments (10)
I am new to style transfer and this really helped me follow along, thank you. Just curious, what was the reason for the 'byte mask fix' in flowtron.py of your fork? Is that necessary to run your style transfer example?
from flowtron.
@ninaburns there is a bit of discussion about why that's needed in this issue
Basically, if you want to use the latest PyTorch version, you have to apply this fix. This is the preferred approach in Colab, because otherwise you would also have to install an older PyTorch version, which takes more time. Since you have to set up your environment from scratch in each Colab session, this matters. The disadvantage of this approach is that I have to point the Colab notebook at my bug-fixed fork, and the notebook might break over time (i.e. the PyTorch version is not pinned).
I also tried setting everything up with the older PyTorch version, which introduced all sorts of problems with GPU computation. I think what happened is that the less efficient GPU computations hit Colab's hard usage limits.
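For readers wondering what such a fix looks like: a minimal sketch of the byte-vs-bool mask issue (my reconstruction of the general pattern, not the exact diff from the fork) is below. PyTorch 1.2 and later expect bool masks in masked_fill, while older code built ByteTensor masks.

```python
import torch

# PyTorch >= 1.2 expects a bool mask in masked_fill; ByteTensor masks are
# deprecated and eventually error out on newer versions.
lengths = torch.tensor([3, 5])
max_len = int(lengths.max())
ids = torch.arange(max_len)
mask = ids.unsqueeze(0) < lengths.unsqueeze(1)   # bool mask, shape (2, 5)
# legacy code often did: mask = (ids < lengths.unsqueeze(1)).byte()
scores = torch.zeros(2, max_len)
scores = scores.masked_fill(~mask, float('-inf'))  # fill padded positions
```

The one-line change is usually just building the mask as bool (or calling `.bool()` on it) before handing it to masked_fill.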
Gotcha. I should have seen the issue you mentioned! Thanks for the explanation, it makes sense.
Hi, has anybody managed to do style transfer with the code shared by @karkirowle?
I have also read the content of issue #9 but I am still struggling.
In particular, I am trying to reproduce demo 4.4.4, Sampling the Posterior (Unseen Speaker). I am using the LibriTTS model, with the wav file ravdess_surprised_prior.wav (emotion "surprised") from the demo as the style.
I use average_over_time = True and speaker_id = 2092 in the code from @karkirowle.
The generated audio is of good quality, but it does not contain the "surprised" emotion that is distinctly heard in the demo.
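For context, the idea behind average_over_time can be sketched as follows. This is an illustrative toy example, not Flowtron's actual API; the shapes (80 mel channels, frame counts) and the std value are assumptions.

```python
import torch

# Posterior-sampling idea: encode the style clip into z-space, average
# it over time, then sample new z frames around that mean.
torch.manual_seed(0)
z_style = torch.randn(80, 200)      # stand-in for z from the style wav (channels x frames)
average_over_time = True
if average_over_time:
    mu = z_style.mean(dim=1, keepdim=True)   # (80, 1): one mean per channel
else:
    mu = z_style                             # keep the full time trajectory
sigma = 0.7                                  # sampling std, a tunable knob
n_frames = 300                               # length of the utterance to synthesize
z_sample = mu + sigma * torch.randn(80, n_frames)  # broadcasts the mean over time
```

With averaging on, only the time-invariant part of the style clip is carried over, which is why a single clip can end up sounding flatter than expected.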
Hi @DamienToomey !
If I understand correctly, you are using the LibriTTS model, with the (single) surprised wav file from the NVIDIA demo page you linked as the style.
Please note that the results will almost certainly differ from the demo site. The demo uses the Sally TTS model, which is not publicly available as far as I know. We also don't know the seed, standard deviation, etc. that were used for synthesis.
To get better results, my suggestion would be to use more audio files from RAVDESS containing the same style, but from different speakers. This is likely to be more successful, because it averages out the per-speaker noise and copies only the surprised style, giving you more robust results. If you are not satisfied with the variety of the intonation, you can also try changing the random seed and the standard deviation. It might be that it still doesn't work after all this. I hope this helps!
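The pooling suggestion above can be sketched like this. Again a hedged toy example with assumed shapes, where the random tensors stand in for real per-clip z values:

```python
import torch

# Pool z over several "surprised" clips from different speakers, so
# per-speaker noise averages out while the shared style survives.
torch.manual_seed(0)
z_list = [torch.randn(80, t) for t in (180, 210, 240)]  # clips of varying length
per_clip_means = torch.stack([z.mean(dim=1) for z in z_list])  # (3, 80)
mu = per_clip_means.mean(dim=0)                                # (80,) pooled style vector
sigma = 0.5                                                    # worth sweeping
z_sample = mu.unsqueeze(1) + sigma * torch.randn(80, 300)      # new utterance z
```

Averaging each clip over time first, then across clips, keeps a short clip from being drowned out by a long one.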
@karkirowle I sent you an e-mail a few days ago regarding your article on Flowtron and style transfer.
Did you receive it?
@rafaelvalle No, I haven't. To which e-mail address did you send it? Did you use the address in the Colab or the one on my blog? I can have another look, or you could try resending it, just in case.
@karkirowle I sent and re-sent it to [email protected]
@karkirowle Sent yet another one with a suggestion for experiments.
Please take a look at the link below for a style transfer demo.
https://github.com/NVIDIA/flowtron/blob/master/inference_style_transfer.ipynb
I get this error with inference_style_transfer.ipynb.
I'm on Windows and using the LJS model:
TypeError                                 Traceback (most recent call last)
in <module>()
      8 in_lens = torch.LongTensor([text.shape[1]]).cuda()
      9 with torch.no_grad():
---> 10     z = model(mel, sid, text, in_lens, None)[0]
     11     z_values.append(z.permute(1, 2, 0))

C:\Users\totzke\miniconda3\envs\flow\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

H:\TTS\flowtron\flowtron.py in forward(self, mel, speaker_vecs, text, in_lens, out_lens)
    592         for i, flow in enumerate(self.flows):
    593             mel, log_s, gate, attn = flow(
--> 594                 mel, encoder_outputs, mask, out_lens)
    595             log_s_list.append(log_s)
    596             attns_list.append(attn)

C:\Users\totzke\miniconda3\envs\flow\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

H:\TTS\flowtron\flowtron.py in forward(self, mel, text, mask, out_lens)
    394         # backwards flow, send padded zeros back to end
    395         for k in range(mel.size(1)):
--> 396             mel[:, k] = mel[:, k].roll(out_lens[k].item(), dims=0)
    397
    398         mel, log_s, gates, attn = self.ar_step(mel, text, mask, out_lens)

TypeError: 'NoneType' object is not subscriptable
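The error happens because out_lens is None when flowtron.py's backward flow indexes out_lens[k]. A workaround worth trying (an assumption on my part, not a verified fix for every Flowtron revision) is to pass explicit output lengths instead of None:

```python
import torch

# out_lens holds one frame count per batch item; passing it explicitly
# avoids flowtron.py subscripting None at mel[:, k].roll(out_lens[k]...).
mel = torch.randn(1, 80, 240)               # (batch, n_mel, frames), illustrative shape
out_lens = torch.LongTensor([mel.size(2)])  # frame count for each batch item
# z = model(mel, sid, text, in_lens, out_lens)[0]   # instead of ..., None)
```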