Comments (9)
Hi @DiDimus, if your machine has several GPUs, please try selecting one by index, for example CUDA_VISIBLE_DEVICES=0
for the first GPU.
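For anyone who prefers to set this from inside a script rather than on the command line, here is a minimal sketch. It assumes the variable is set before any CUDA-using library (such as torch) is imported, since CUDA reads it only once at initialization. The `visible` name is just for illustration.

```python
import os

# Set this before importing torch (or any other CUDA-using library);
# changing it after CUDA has been initialized has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only the first physical GPU

# The value is a comma-separated list of device indices.
visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(visible)
```

Inside the process the selected GPU is then renumbered as device 0, whichever physical index was chosen.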
from stylespeech.
Thanks, but the result is exactly the same. I think the problem is in the software environment. Do you have a Docker image for this project? Which OS do you use?
from stylespeech.
Gotcha, you can refer to this:
https://github.com/keonlee9420/Daft-Exprt/blob/main/Dockerfile
I think the Dockerfile should also work for this project. Please try it out and let me know the result.
from stylespeech.
This is clearly an error in the project code: the predicted tensor size does not match the mask size.
With duration_control = 0.3: RuntimeError: The size of tensor a (25) must match the size of tensor b (31)
x shape is torch.Size([1, 31, 256]); mask shape is torch.Size([1, 25])
The right value is 31 (104 * 0.3 = 31.2)
With duration_control = 0.5: RuntimeError: The size of tensor a (47) must match the size of tensor b (52)
x shape is torch.Size([1, 52, 256]); mask shape is torch.Size([1, 47])
The right value is 52 (104 * 0.5)
With duration_control = 1.0: all OK
x shape is torch.Size([1, 104, 256]); mask shape is torch.Size([1, 104])
With duration_control = 2.0: all OK
x shape is torch.Size([1, 208, 256]); mask shape is torch.Size([1, 208])
from stylespeech.
Yes, @Vadim2S. Problem found, thanks. The Docker setup from Daft-Exprt didn't help :(
from stylespeech.
Hey guys, I just found out that you had an issue with control values lower than 1. Sorry for the late correction. Thanks to @Vadim2S, I can confirm that there is an error in the current code. I'll fix it and push soon. Thank you all for the report!
from stylespeech.
Temporary workaround:
/model/modules.py #177 class LengthRegulator(nn.Module):
change
    if max_len is not None:
        output = pad(output, max_len)
    else:
        output = pad(output)
to:
    if max_len is not None:
        output = pad(output, max_len)
        #VVS
        mel_len.clear()
        mel_len.append(output.shape[1])
    else:
        output = pad(output)
P.S. The predicted duration is a real (floating-point) value, and in LengthRegulator.expand you do
    for i, vec in enumerate(batch):
        expand_size = predicted[i].item()
        out.append(vec.expand(max(int(expand_size), 0), -1))
    out = torch.cat(out, 0)
Of course, out ends up shorter than max_len because int() rounds each duration down. I presume you must extend out to max_len later.
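To illustrate the rounding effect described above, here is a small pure-Python sketch. It is not the project's code: `expanded_length` is a hypothetical stand-in for the per-token `max(int(expand_size), 0)` step, and the duration values are made up so that they sum to 104 frames.

```python
def expanded_length(durations, duration_control):
    """Total frames after scaling each duration and truncating it with int()."""
    return sum(max(int(d * duration_control), 0) for d in durations)


# Hypothetical per-token durations (in frames); 13 * (3 + 5) = 104 in total.
durations = [3.0, 5.0] * 13

for control in (0.3, 0.5, 1.0, 2.0):
    actual = expanded_length(durations, control)
    scaled_total = int(sum(durations) * control)  # truncating once, at the end
    print(control, actual, scaled_total)
```

With these made-up durations the per-token total falls short of the scaled total for controls of 0.3 and 0.5 and matches it for 1.0 and 2.0, which is the same pattern as in the shapes reported earlier in this thread.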
from stylespeech.
I fixed the code and it's working now. The problem originated from the value of max_len at inference time in VarianceAdaptor: it should be None, but the max_len of the reference audio was wrongly passed.
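A rough sketch of why the max_len value matters, using plain Python lists instead of tensors. The `pad` and `length_regulate` functions here are simplified, hypothetical stand-ins rather than the repository's implementation, and the numbers 25 and 31 are borrowed from the report above purely for illustration.

```python
def pad(sequences, max_len=None, fill=0.0):
    """Right-pad every sequence to max_len (default: the longest sequence)."""
    if max_len is None:
        max_len = max(len(seq) for seq in sequences)
    return [seq + [fill] * (max_len - len(seq)) for seq in sequences]


def length_regulate(expanded, max_len=None):
    """Return the padded batch plus the unpadded length of each sequence."""
    mel_len = [len(seq) for seq in expanded]
    return pad(expanded, max_len), mel_len


expanded = [[1.0] * 25]  # one utterance already expanded to 25 frames

# A max_len coming from somewhere else (31 here) widens the output,
# while mel_len still reports 25, so a mask built from mel_len is too short.
padded, mel_len = length_regulate(expanded, max_len=31)
print(len(padded[0]), mel_len)

# With max_len=None the batch is padded only to its own longest sequence,
# so the output width and mel_len agree.
padded_ok, mel_len_ok = length_regulate(expanded, max_len=None)
print(len(padded_ok[0]), mel_len_ok)
```

In this sketch only the second call yields an output whose width agrees with mel_len, which is the condition the mask needs.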
from stylespeech.
Thanks! Tested. Low duration values work OK!
from stylespeech.