r9y9 / deepvoice3_pytorch
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
Home Page: https://r9y9.github.io/deepvoice3_pytorch/
License: Other
Hello. I am currently trying to train a VCTK model with the Deep Voice 3 multi-speaker model.
While it mostly works okay, the training sometimes crashes with the following error.
2734it [13:58, 3.26it/s]Traceback (most recent call last):
File "train.py", line 957, in <module>
train_seq2seq=train_seq2seq, train_postnet=train_postnet)
File "train.py", line 585, in train
in tqdm(enumerate(data_loader)):
File "H:\envs\pytorch\lib\site-packages\tqdm\_tqdm.py", line 959, in __iter__
for obj in iterable:
File "H:\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 281, in __next__
return self._process_next_batch(batch)
File "H:\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
MemoryError: Traceback (most recent call last):
File "H:\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 55, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "H:\Tensorflow_Study\git\deepvoice3_pytorch\train.py", line 329, in collate_fn
dtype=np.float32)
MemoryError
Forcing garbage collection sporadically (using gc.collect()) doesn't help.
Currently, I have 16 GB of RAM with 48 GB of virtual memory available on my SSD (just in case).
(Using Windows 10 with PyTorch 0.3.1 (with CUDA 8.0, GTX1060 6GB))
Also, I observe in Resource Monitor that the memory usage reported under Commit (KB) and Working Set (KB) differs significantly, as shown below. (Sorry for the non-English screenshot.)
Thank you for creating such a wonderful implementation!
:)
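A minimal experiment, assuming the crash really is host-memory pressure in the loader workers: batch_size, num_workers, and pin_memory are ordinary hyperparameters in this repo, so they can be shrunk from the command line (the data root below is a placeholder for your preprocessed VCTK directory):
python train.py --data-root=./data/vctk --hparams="builder=deepvoice3_multispeaker,preset=deepvoice3_vctk,batch_size=8,num_workers=1,pin_memory=False"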
@r9y9 To my understanding, x = x * (s * math.sqrt(1.0 / s)) is equivalent to x = x * math.sqrt(s). Is this right? If it is, why do we multiply x by math.sqrt(s) instead of dividing by math.sqrt(s)?
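A quick numeric check of the identity in question (plain Python, independent of the repo):
import math
s = 4.0
# s * sqrt(1/s) == sqrt(s) for any s > 0, so the reading above is correct
assert abs(s * math.sqrt(1.0 / s) - math.sqrt(s)) < 1e-12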
Hi, Ryuichi. Could you share the Korean single-speaker data? I had difficulties trying to download the data from the link you provided.
Hi again,
I trained a single-speaker Korean model successfully and am moving on to multiple Korean speakers. Again, I encountered the assertion error shown below. I tracked it down, and it looks like self.encoder
in the AttentionSeq2Seq class is producing the error messages. Could you let me know where the following self.encoder function is defined so that I can look into it further? Increasing max_positions doesn't help this time.
encoder_outputs = self.encoder(
text_sequences, lengths=input_lengths, speaker_embed=speaker_embed)
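A generic way to locate that definition at runtime, using only the attribute path that appears in these traces (model.seq2seq.encoder) and the standard inspect module:
import inspect
enc_cls = type(model.seq2seq.encoder)  # model built as in train.py
print(inspect.getsourcefile(enc_cls))  # e.g. deepvoice3_pytorch/deepvoice3.py
print(inspect.getsource(enc_cls))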
Thanks in advance,
[kwon@ssi-dnn-slave-002 deepvoice3_pytorch2]$ CUDA_VISIBLE_DEVICES=2 python train.py --data-root=./data/nikl_m/ --hparams="frontend=ko,builder=deepvoice3,preset=deepvoice3_niklm,builder=deepvoice3_multispeaker" --checkpoint-dir checkpoint_nikl_m
Command line args:
{'--checkpoint': None,
'--checkpoint-dir': 'checkpoint_nikl_m',
'--checkpoint-postnet': None,
'--checkpoint-seq2seq': None,
'--data-root': './data/nikl_m/',
'--help': False,
'--hparams': 'frontend=ko,builder=deepvoice3,preset=deepvoice3_niklm,builder=deepvoice3_multispeaker',
'--load-embedding': None,
'--log-event-path': None,
'--reset-optimizer': False,
'--restore-parts': None,
'--speaker-id': None,
'--train-postnet-only': False,
'--train-seq2seq-only': False}
Training whole model
Training seq2seq model
Hyperparameters:
adam_beta1: 0.5
adam_beta2: 0.9
adam_eps: 1e-06
allow_clipping_in_normalization: False
batch_size: 16
binary_divergence_weight: 0.1
builder: deepvoice3_multispeaker
checkpoint_interval: 10000
clip_thresh: 0.1
converter_channels: 256
decoder_channels: 256
downsample_step: 4
dropout: 0.050000000000000044
embedding_weight_std: 0.1
encoder_channels: 256
eval_interval: 10000
fft_size: 1024
fmax: 7600
fmin: 125
force_monotonic_attention: True
freeze_embedding: False
frontend: ko
guided_attention_sigma: 0.2
hop_size: 256
initial_learning_rate: 0.0005
kernel_size: 3
key_position_rate: 1.385
key_projection: False
lr_schedule: noam_learning_rate_decay
lr_schedule_kwargs: {}
masked_loss_weight: 0.5
max_positions: 512
min_level_db: -100
n_speakers: 1
name: deepvoice3
nepochs: 10000
num_mels: 80
num_workers: 2
outputs_per_step: 1
padding_idx: 0
pin_memory: True
power: 1.4
preemphasis: 0.97
preset: deepvoice3_niklm
presets: {'deepvoice3_niklm': {'n_speakers': 119, 'speaker_embed_dim': 16, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.1, 'speaker_embedding_weight_std': 0.05, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 256, 'encoder_channels': 512, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.4, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 3000, 'query_position_rate': 2.0, 'key_position_rate': 7.6, 'key_projection': True, 'value_projection': True, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}, 'deepvoice3_ljspeech': {'n_speakers': 1, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.1, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 256, 'encoder_channels': 512, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 600, 'query_position_rate': 1.0, 'key_position_rate': 1.385, 'key_projection': True, 'value_projection': True, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}, 'deepvoice3_vctk': {'n_speakers': 108, 'speaker_embed_dim': 16, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.1, 'speaker_embedding_weight_std': 0.05, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 256, 'encoder_channels': 512, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.4, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 512, 'query_position_rate': 2.0, 'key_position_rate': 7.6, 'key_projection': True, 'value_projection': True, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}, 'nyanko_ljspeech': {'n_speakers': 1, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.01, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 128, 'encoder_channels': 256, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 512, 'query_position_rate': 1.0, 'key_position_rate': 1.385, 'key_projection': False, 'value_projection': False, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}}
priority_freq: 3000
priority_freq_weight: 0.0
query_position_rate: 1.0
ref_level_db: 20
replace_pronunciation_prob: 0.5
rescaling: False
rescaling_max: 0.999
sample_rate: 22050
save_optimizer_state: True
speaker_embed_dim: 16
speaker_embedding_weight_std: 0.01
text_embed_dim: 256
trainable_positional_encodings: False
use_decoder_state_for_postnet_input: True
use_guided_attention: True
use_memory_mask: True
value_projection: False
weight_decay: 0.0
window_ahead: 3
window_backward: 1
Override hyper parameters with preset "deepvoice3_niklm": {
"n_speakers": 119,
"speaker_embed_dim": 16,
"downsample_step": 4,
"outputs_per_step": 1,
"embedding_weight_std": 0.1,
"speaker_embedding_weight_std": 0.05,
"dropout": 0.050000000000000044,
"kernel_size": 3,
"text_embed_dim": 256,
"encoder_channels": 512,
"decoder_channels": 256,
"converter_channels": 256,
"use_guided_attention": true,
"guided_attention_sigma": 0.4,
"binary_divergence_weight": 0.1,
"use_decoder_state_for_postnet_input": true,
"max_positions": 3000,
"query_position_rate": 2.0,
"key_position_rate": 7.6,
"key_projection": true,
"value_projection": true,
"clip_thresh": 0.1,
"initial_learning_rate": 0.0005
}
0it [00:00, ?it/s]
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
... (the same assertion repeats for threads [1,0,0] through [15,0,0] of block [0,0,0]) ...
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/generic/THCTensorCopy.c line=70 error=59 : device-side assert triggered
Traceback (most recent call last):
File "train.py", line 967, in <module>
train_seq2seq=train_seq2seq, train_postnet=train_postnet)
File "train.py", line 661, in train
input_lengths=input_lengths)
File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
result = self.forward(*input, **kwargs)
File "/home/kwon/3rdParty/deepvoice3_pytorch2/deepvoice3_pytorch/__init__.py", line 80, in forward
text_positions, frame_positions, input_lengths)
File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
result = self.forward(*input, **kwargs)
File "/home/kwon/3rdParty/deepvoice3_pytorch2/deepvoice3_pytorch/__init__.py", line 117, in forward
print(text_sequences)
File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 119, in __repr__
return 'Variable containing:' + self.data.__repr__()
File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 133, in __repr__
return str(self)
File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 140, in __str__
return _tensor_str._str(self)
File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/_tensor_str.py", line 297, in _str
strt = _matrix_str(self)
File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/_tensor_str.py", line 216, in _matrix_str
min_sz=5 if not print_full_mat else 0)
File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/_tensor_str.py", line 79, in _number_format
tensor = torch.DoubleTensor(tensor.size()).copy_(tensor).abs_().view(tensor.nelement())
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/generic/THCTensorCopy.c:70
Sorry if this is off-topic (Deep Voice vs. Tacotron), but it seems the Tacotron 2 paper is now released.
The speech samples sound better than ever (I think):
https://google.github.io/tacotron/publications/tacotron2/index.html
I must admit I'm not too well versed in how much this differs from the original Tacotron. But perhaps the changes made could also be used in your projects?
Ran the following command on downloaded LJSpeech dataset:
python3 preprocess.py ljspeech ~/data/LJSpeech-1.0/ ./data/ljspeech
No preprocessed data was generated; instead I got an error:
NameError: name 'hparams' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "preprocess.py", line 47, in
preprocess(mod, in_dir, out_dir, num_workers)
File "preprocess.py", line 21, in preprocess
metadata = mod.build_from_path(in_dir, out_dir, num_workers, tqdm=tqdm)
File "/home/coglac/Documents/deepvoice3_pytorch/ljspeech.py", line 34, in build_from_path
return [future.result() for future in tqdm(futures)]
File "/home/coglac/Documents/deepvoice3_pytorch/ljspeech.py", line 34, in
return [future.result() for future in tqdm(futures)]
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 398, in result
return self.__get_result()
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
raise self._exception
NameError: name 'hparams' is not defined
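The NameError suggests a module in the worker referenced hparams without importing it; under that assumption, a minimal sketch of a fix is a one-line import at the top of the offending file (which file that is cannot be seen from the truncated remote traceback):
from hparams import hparams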
Hi r9y9,
I am a newcomer to the TTS area. Apart from MOS, do you know what else can be used to evaluate the quality of synthesized speech? Looking forward to your kind reply. Thanks~
Hi r9y9, first I just want to say that your repos are great and I have personally learned a lot from them. So a big thanks to you.
So I too have been trying to replicate the results of the big TTS papers. However the main thing that is frustrating me is the lack of a high quality TTS dataset (although 50 gpus would help too!).
I just wanted to throw this idea out there - what if random people on the internet interested in TTS/ML collaborated to create a good dataset? If enough people joined in (20+) the segmentation and labelling work should only be a couple of hours per person.
Here is a list of the options that occurred to me (and I by no means consider this list complete):
1 - Find a 20+ hour high-quality, open-source audiobook online. Given how massive the internet is - surely there is a possibility of a hi-fi audiobook that isn't poorly recorded, overly-compressed or too 'performed'. Working together, scouring the internet... who knows - a gem might be out there.
2 - Podcasts - there's an endless supply of these. But podcasts bring their own unique difficulties - e.g., were different eq/compression/mic/mastering used by the sound engineer across different episodes? Again, with enough searching, a candidate with consistent sound-quality may reveal itself.
3 - Commercial audiobooks - this would unfortunately render the whole dataset closed-source and for personal research only. However, I don't see how there would be any problems if all collaborators purchased the audiobook and didn't redistribute the dataset beyond the initial group of collaborators.
4 - Crowdfunding it - probably the least realistic option. Still though, if enough people were interested, 100 or so, then it might be possible. One studio, one sound engineer, one professional reader and someone to oversee the project for a week or two weeks max? Would $10,000 cover it? $20,000? I'm no expert in studio time and sound-engineering rates etc so I can't say for certain.
So to wrap this up - I just wanted to put this idea out there. I'm very curious what you, or any others reading this, think - even if you feel it's unrealistic. I know buying 50 gpus is unfeasible for most of us - but working together to solve the dataset problem? Personally, I'm optimistic.
Hi,
Is a GPU a hard requirement on the machine? I tried to run synthesis.py on the pre-trained model, but got the following error.
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
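For CPU-only inference, a GPU-saved checkpoint can at least be loaded by remapping its storages, which is a standard torch.load option; whether the rest of synthesis.py then avoids .cuda() calls is a separate question:
import torch
checkpoint = torch.load("path/to/checkpoint.pth",
                        map_location=lambda storage, loc: storage)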
Data: https://keithito.com/LJ-Speech-Dataset/
Data: VCTK
I'd like to train this model on 8 V100 GPUs - does it support multi GPU training?
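The runs shown in these issues are all single-GPU (e.g. via CUDA_VISIBLE_DEVICES). A generic PyTorch sketch of data-parallel wrapping looks like this; whether it drops into this training script cleanly is untested:
import torch
model = torch.nn.DataParallel(model, device_ids=list(range(8)))  # model built as in train.py
model = model.cuda()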
I downloaded pretrained models and upon running any of them I receive the following error:
My pytorch version is: 0.3.0.post4
RuntimeError: invalid argument 2: sizes do not match at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:101
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "synthesis.py", line 125, in
model.load_state_dict(checkpoint["state_dict"])
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 487, in load_state_dict
.format(name, own_state[name].size(), param.size()))
RuntimeError: While copying the parameter named seq2seq.encoder.embed_tokens.weight, whose dimensions in the model are torch.Size([149, 128]) and whose dimensions in the checkpoint are torch.Size([149, 256]).
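The shapes suggest an hparams mismatch rather than a broken checkpoint: the checkpoint was trained with text_embed_dim=256 (the deepvoice3 presets dumped elsewhere on this page) while the default hparams build a 128-dim embedding (the nyanko value). Passing the preset that matches the checkpoint should resolve it, e.g.:
python3 synthesis.py --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech" <checkpoint> <text_list_file> <dst_dir>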
python3 preprocess.py ljspeech ./data/LJSpeech-1.0/ ./data/ljspeech
0%| | 0/13100 [00:00<?, ?it/s]concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.5/concurrent/futures/process.py", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/data1/demobin/deepvoice3_pytorch/ljspeech.py", line 57, in _process_utterance
spectrogram = audio.spectrogram(wav).astype(np.float32)
File "/data1/demobin/deepvoice3_pytorch/audio.py", line 32, in spectrogram
D = _lws_processor().stft(preemphasis(y)).T
File "/data1/demobin/deepvoice3_pytorch/audio.py", line 53, in _lws_processor
return lws.lws(hparams.fft_size, hparams.hop_size, mode="speech")
File "lws.pyx", line 357, in lws.lws.__init__ (lws.bycython.cpp:15047)
TypeError: unorderable types: NoneType() > int()
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "preprocess.py", line 55, in <module>
preprocess_ljspeech(in_dir, out_dir, num_workers)
File "preprocess.py", line 21, in preprocess_ljspeech
metadata = ljspeech.build_from_path(in_dir, out_dir, num_workers, tqdm=tqdm)
File "/data1/demobin/deepvoice3_pytorch/ljspeech.py", line 34, in build_from_path
return [future.result() for future in tqdm(futures)]
File "/data1/demobin/deepvoice3_pytorch/ljspeech.py", line 34, in <listcomp>
return [future.result() for future in tqdm(futures)]
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 398, in result
return self.__get_result()
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
raise self._exception
TypeError: unorderable types: NoneType() > int()
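unorderable types: NoneType() > int() inside lws.lws(hparams.fft_size, hparams.hop_size, ...) means one of those two values is None at call time. A one-line sanity check, assuming the repo's hparams module:
from hparams import hparams
print(hparams.fft_size, hparams.hop_size)  # both should be ints (e.g. 1024, 256), never None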
I got the following error when I tried to train the model. Could this be because some of my utterances are very long (around 30 seconds)?
======
Los event path: ./log/aclclp
^M0it [00:00, ?it/s]
Traceback (most recent call last):
File "train.py", line 950, in
train_seq2seq=train_seq2seq, train_postnet=train_postnet)
File "train.py", line 685, in train
priority_w=hparams.priority_freq_weight)
File "train.py", line 510, in spec_loss
l1_loss = w * masked_l1(y_hat, y, mask=mask) + (1 - w) * l1(y_hat, y)
File "/home/chester/hdd22t/virtualenv/deepvoice3-pytorch-r9y9/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "train.py", line 280, in forward
loss = self.criterion(input * mask_, target * mask_)
RuntimeError: The size of tensor a (1025) must match the size of tensor b (513) at non-singleton dimension 2
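The two sizes in the error are linear-spectrogram bin counts, which are fft_size // 2 + 1:
# 1025 bins implies the cached features were extracted with fft_size=2048,
# while this training run expects fft_size=1024 (513 bins):
assert 2048 // 2 + 1 == 1025
assert 1024 // 2 + 1 == 513
Under that assumption, re-running preprocess.py with hparams matching the training run should fix it; long utterances are unlikely to be the cause.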
I found that Tacotron will skip some words when synthesizing a long sentence (one with 30 words or so). Does Deep Voice 3 have that problem?
Greetings!
I have successfully preprocessed LJSpeech dataset and trained model for a while with preset hyperparameters:
python3 train.py --data-root=./data/ljspeech \
--hparams="builder=deepvoice3,preset=deepvoice3_ljspeech"
But when trying to generate audio from text:
python3 synthesis.py ./checkpoints/checkpoint_step000270000.pth ./text_list.txt ./generated \
--hparams="builder=deepvoice3,preset=deepvoice3_ljspeech"
I'm getting the error:
*** Error in `python3': free(): invalid next size (fast): 0x000000000db7b050 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f1138cbd7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f1138cc637a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f1138cca53c]
/usr/local/cuda-8.0/lib64/libcudnn.so.6(cudnnDestroyConvolutionDescriptor+0x9)[0x7f10e47eac69]
/usr/local/lib/python3.5/dist-packages/torch/_C.cpython-35m-x86_64-linux-gnu.so(+0x2dedf7)[0x7f10cc728df7]
/usr/local/lib/python3.5/dist-packages/torch/_C.cpython-35m-x86_64-linux-gnu.so(_ZN5torch5cudnn30cudnn_convolution_full_forwardEP8THCStateP12cudnnContext15cudnnDataType_tPNS_12THVoidTensorES7_S7_S7_St6vectorIiSaIiEESA_SA_ibb+0x6a4)[0x7f10cd5f9ee4]
/usr/local/lib/python3.5/dist-packages/torch/_C.cpython-35m-x86_64-linux-gnu.so(_ZN5torch8autograd11ConvForward5applyERKSt6vectorINS0_8VariableESaIS3_EE+0x1192)[0x7f10cc9694a2]
/usr/local/lib/python3.5/dist-packages/torch/_C.cpython-35m-x86_64-linux-gnu.so(+0x40d26e)[0x7f10cc85726e]
python3(PyObject_Call+0x47)[0x5c1797]
python3(PyEval_EvalFrameEx+0x4ec6)[0x53bba6]
python3[0x540199]
python3(PyEval_EvalFrameEx+0x50b2)[0x53bd92]
python3(PyEval_EvalCodeEx+0x13b)[0x540f9b]
python3[0x4ebe37]
python3(PyObject_Call+0x47)[0x5c1797]
python3(PyEval_EvalFrameEx+0x252b)[0x53920b]
python3(PyEval_EvalCodeEx+0x13b)[0x540f9b]
python3[0x4ebd23]
python3(PyObject_Call+0x47)[0x5c1797]
python3[0x4fb9ce]
python3(PyObject_Call+0x47)[0x5c1797]
python3[0x574b36]
python3(PyObject_Call+0x47)[0x5c1797]
python3(PyEval_EvalFrameEx+0x4ec6)[0x53bba6]
python3(PyEval_EvalCodeEx+0x13b)[0x540f9b]
python3[0x4ebe37]
python3(PyObject_Call+0x47)[0x5c1797]
python3(PyEval_EvalFrameEx+0x252b)[0x53920b]
python3(PyEval_EvalCodeEx+0x13b)[0x540f9b]
python3[0x4ebe37]
python3(PyObject_Call+0x47)[0x5c1797]
python3[0x4fb9ce]
python3(PyObject_Call+0x47)[0x5c1797]
python3[0x574b36]
python3(PyObject_Call+0x47)[0x5c1797]
python3(PyEval_EvalFrameEx+0x4ec6)[0x53bba6]
python3(PyEval_EvalCodeEx+0x13b)[0x540f9b]
python3[0x4ebe37]
python3(PyObject_Call+0x47)[0x5c1797]
python3(PyEval_EvalFrameEx+0x252b)[0x53920b]
python3(PyEval_EvalCodeEx+0x13b)[0x540f9b]
python3[0x4ebd23]
python3(PyObject_Call+0x47)[0x5c1797]
python3[0x4fb9ce]
python3(PyObject_Call+0x47)[0x5c1797]
python3[0x574b36]
python3(PyObject_Call+0x47)[0x5c1797]
python3(PyEval_EvalFrameEx+0x4ec6)[0x53bba6]
python3(PyEval_EvalCodeEx+0x13b)[0x540f9b]
python3[0x4ebe37]
python3(PyObject_Call+0x47)[0x5c1797]
python3(PyEval_EvalFrameEx+0x252b)[0x53920b]
python3(PyEval_EvalCodeEx+0x13b)[0x540f9b]
python3[0x4ebe37]
...
....
After debugging, I found that the problem appears in this loop at the first iteration (deepvoice3.py, line 90):
for f in self.convolutions:
x = f(x, speaker_embed_btc) if isinstance(f, Conv1dGLU) else f(x)
but I still can't solve it.
I tried Python 3.5.2 and 3.6.3 with tensorflow 1.3.0 and torch 0.3.1 (and also 0.3.0.post4).
CUDA version is 8.0, GPU: Titan X
Any help would be appreciated.
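Since the backtrace dies inside cudnnDestroyConvolutionDescriptor, one cheap experiment is to take cuDNN out of the picture and see whether the crash moves; this is a diagnostic aid, not a fix:
import torch
torch.backends.cudnn.enabled = False  # fall back to non-cuDNN convolutions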
I got a fatal error when testing synthesis.py. Could you help?
python3.5 synthesis.py --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech" /home/ml/deepvoice3_pytorch/models/20171213_deepvoice3_checkpoint_step000210000.pth ./text_list.txt ./output/
python3.5 synthesis.py --hparams="uilder=nyanko,preset=nyanko_ljspeech" "/home/ml/deepvoice3_pytorch/models/20171129_nyanko_checkpoint_step000585000.pth" "/home/ml/deepvoice3_pytorch/text_list.txt" "/home/ml/deepvoice3_pytorch/output"
Command line args:
{'--checkpoint-postnet': None,
'--checkpoint-seq2seq': None,
'--file-name-suffix': '',
'--help': False,
'--hparams': 'uilder=nyanko,preset=nyanko_ljspeech',
'--max-decoder-steps': '500',
'--output-html': False,
'--replace_pronunciation_prob': '0.0',
'--speaker_id': None,
'': '/home/ml/deepvoice3_pytorch/models/20171129_nyanko_checkpoint_step000585000.pth',
'<dst_dir>': '/home/ml/deepvoice3_pytorch/output',
'<text_list_file>': '/home/ml/deepvoice3_pytorch/text_list.txt'}
Traceback (most recent call last):
File "synthesis.py", line 98, in
hparams.parse(args["--hparams"])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/training/python/training/hparam.py", line 472, in parse
values_map = parse_values(values, type_map)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/training/python/training/hparam.py", line 206, in parse_values
raise ValueError('Unknown hyperparameter type for %s' % name)
ValueError: Unknown hyperparameter type for uilder
ml@tesla1a:~/deepvoice3_pytorch$ python3.5 synthesis.py --hparams="uilder=nyanko,preset=nyanko_ljspeech" "/home/ml/deepvoice3_pytorch/models/20171129_nyanko_checkpoint_step000585000.pth" "/home/ml/deepvoice3_pytorch/text_list.txt" "/home/ml/deepvoice3_pytorch/output"
Traceback (most recent call last):
File "synthesis.py", line 26, in
import torch
File "/usr/local/lib/python3.5/dist-packages/torch/init.py", line 56, in
from torch._C import *
ImportError: dlopen: cannot load any more object with static TLS
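Two separate problems are visible above. First, the --hparams string contains uilder=nyanko (missing the leading "b"), which is exactly what the ValueError reports; the corrected invocation would be:
python3.5 synthesis.py --hparams="builder=nyanko,preset=nyanko_ljspeech" "/home/ml/deepvoice3_pytorch/models/20171129_nyanko_checkpoint_step000585000.pth" "/home/ml/deepvoice3_pytorch/text_list.txt" "/home/ml/deepvoice3_pytorch/output"
Second, "cannot load any more object with static TLS" is a known glibc limitation that tends to appear when torch is imported after other large shared libraries; reordering imports so that torch comes first is a commonly reported workaround, though this is environment-specific.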
Hi,
After successfully (1) installing all prerequisites and (2) running pre-processing,
starting the training phase with:
python train.py --preset=presets/deepvoice3_ljspeech.json --data-root=./data/ljspeech/
prints a report of the input parameters and eventually hangs on:
0it [00:00, ?it/s].
The command watch -n 1 nvidia-smi reports VRAM usage of around 499M, with no activity on the GPU.
python3 train.py --data-root=./data/ljspeech --checkpoint-dir=checkpoints_nyanko --hparams="use_preset=True,builder=nyanko" --log-event-path=log/nyanko_preset
Command line args:
{'--checkpoint': None,
'--checkpoint-dir': 'checkpoints_nyanko',
'--checkpoint-postnet': None,
'--checkpoint-seq2seq': None,
'--data-root': './data/ljspeech',
'--help': False,
'--hparams': 'use_preset=True,builder=nyanko',
'--log-event-path': 'log/nyanko_preset',
'--reset-optimizer': False,
'--train-postnet-only': False,
'--train-seq2seq-only': False}
Training whole model
Training seq2seq model
Hyperparameters:
adam_beta1: 0.5
adam_beta2: 0.9
adam_eps: 1e-06
batch_size: 16
binary_divergence_weight: 0.1
builder: nyanko
checkpoint_interval: 5000
clip_thresh: 0.1
converter_channels: 256
decoder_channels: 256
downsample_step: 4
dropout: 0.050000000000000044
encoder_channels: 256
fft_size: 1024
force_monotonic_attention: True
frontend: en
guided_attention_sigma: 0.2
hop_size: 256
initial_learning_rate: 0.0005
kernel_size: 3
key_position_rate: 1.385
lr_schedule: noam_learning_rate_decay
lr_schedule_kwargs: {}
masked_loss_weight: 0.0
max_positions: 512
min_level_db: -100
name: deepvoice3
nepochs: 2000
num_mels: 80
num_workers: 2
outputs_per_step: 1
padding_idx: 0
pin_memory: True
power: 1.4
preemphasis: 0.97
presets: {'nyanko': {'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'outputs_per_step': 1, 'text_embed_dim': 128, 'initial_learning_rate': 0.0005, 'binary_divergence_weight': 0.1, 'kernel_size': 3, 'downsample_step': 4, 'decoder_channels': 256, 'dropout': 0.050000000000000044, 'clip_thresh': 0.1, 'encoder_channels': 256, 'converter_channels': 256, 'use_decoder_state_for_postnet_input': True}, 'deepvoice3': {'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'outputs_per_step': 4, 'text_embed_dim': 256, 'initial_learning_rate': 0.001, 'binary_divergence_weight': 0.0, 'kernel_size': 7, 'downsample_step': 1, 'decoder_channels': 256, 'dropout': 0.050000000000000044, 'clip_thresh': 1.0, 'encoder_channels': 256, 'converter_channels': 256, 'use_decoder_state_for_postnet_input': True}, 'latest': {}}
priority_freq: 3000
priority_freq_weight: 0.0
query_position_rate: 1.0
ref_level_db: 20
replace_pronunciation_prob: 0.5
sample_rate: 22050
text_embed_dim: 128
trainable_positional_encodings: False
use_decoder_state_for_postnet_input: True
use_guided_attention: True
use_memory_mask: True
use_preset: True
weight_decay: 0.0
Override hyper parameters with preset "nyanko": {
"use_guided_attention": true,
"guided_attention_sigma": 0.2,
"outputs_per_step": 1,
"text_embed_dim": 128,
"initial_learning_rate": 0.0005,
"binary_divergence_weight": 0.1,
"kernel_size": 3,
"downsample_step": 4,
"decoder_channels": 256,
"dropout": 0.050000000000000044,
"clip_thresh": 0.1,
"encoder_channels": 256,
"converter_channels": 256,
"use_decoder_state_for_postnet_input": true
}
Los event path: log/nyanko_preset
0it [00:00, ?it/s]Traceback (most recent call last):
File "train.py", line 777, in <module>
train_seq2seq=train_seq2seq, train_postnet=train_postnet)
File "train.py", line 466, in train
in tqdm(enumerate(data_loader)):
File "/usr/local/lib/python3.5/dist-packages/tqdm/_tqdm.py", line 816, in __iter__
for obj in iterable:
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 201, in __next__
return self._process_next_batch(batch)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 62, in _pin_memory_loop
batch = pin_memory_batch(batch)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 123, in pin_memory_batch
return [pin_memory_batch(sample) for sample in batch]
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 123, in <listcomp>
return [pin_memory_batch(sample) for sample in batch]
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 123, in pin_memory_batch
return [pin_memory_batch(sample) for sample in batch]
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 123, in <listcomp>
return [pin_memory_batch(sample) for sample in batch]
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 117, in pin_memory_batch
return batch.pin_memory()
File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 82, in pin_memory
return type(self)().set_(storage.pin_memory()).view_as(self)
File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 198, in view_as
return self.view(tensor.size())
RuntimeError: invalid argument 2: size '[16 x 126]' is invalid for input of with 126 elements at /home/demobin/github/pytorch/torch/lib/TH/THStorage.c:41
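The exception is raised from pin_memory_batch, so a quick experiment is to disable pinned memory; pin_memory is an ordinary hyperparameter in the dump above and can be overridden on the command line (whether it is the root cause here is an assumption):
python3 train.py --data-root=./data/ljspeech --checkpoint-dir=checkpoints_nyanko --hparams="use_preset=True,builder=nyanko,pin_memory=False" --log-event-path=log/nyanko_preset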
Hi,
I am new to PyTorch and am following the JSUT example here. I encountered the following assertion error, which is hard for me to investigate further. Could anyone help me out?
[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ python -V
Python 3.5.4 :: Anaconda custom (64-bit)
[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ ls /home/kwon/copora/jsut_ver1.1
basic5000 ChangeLog.txt countersuffix26 LICENCE.txt loanword128 onomatopee300 precedent130 README_en.txt README_ja.txt repeat500 travel1000 utparaphrase512 voiceactress100
[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ python preprocess.py jsut /home/kwon/copora/jsut_ver1.1 ./data/jsut
0%| | 0/7696 [00:00<?, ?it/s]concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/kwon/anaconda3/lib/python3.5/concurrent/futures/process.py", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/kwon/3rdParty/deepvoice3_pytorch/jsut.py", line 52, in _process_utterance
mel_spectrogram = audio.melspectrogram(wav).astype(np.float32)
File "/home/kwon/3rdParty/deepvoice3_pytorch/audio.py", line 50, in melspectrogram
assert S.max() <= 0 and S.min() - hparams.min_level_db >= 0
AssertionError
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "preprocess.py", line 47, in <module>
preprocess(mod, in_dir, out_dir, num_workers)
File "preprocess.py", line 21, in preprocess
metadata = mod.build_from_path(in_dir, out_dir, num_workers, tqdm=tqdm)
File "/home/kwon/3rdParty/deepvoice3_pytorch/jsut.py", line 25, in build_from_path
return [future.result() for future in tqdm(futures)]
File "/home/kwon/3rdParty/deepvoice3_pytorch/jsut.py", line 25, in <listcomp>
return [future.result() for future in tqdm(futures)]
File "/home/kwon/anaconda3/lib/python3.5/concurrent/futures/_base.py", line 405, in result
return self.__get_result()
File "/home/kwon/anaconda3/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
raise self._exception
AssertionError
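The failing line is a range check on the normalized spectrogram; in isolation it enforces this (S is a NumPy array of dB values):
def check_range(S, min_level_db=-100):
    # the check from audio.py: the dB-scaled spectrogram must lie in [min_level_db, 0]
    assert S.max() <= 0 and S.min() - min_level_db >= 0
If the JSUT recordings are louder or quieter than the defaults expect, plausible ways around it are rescaling the audio (rescaling=True / rescaling_max) or allow_clipping_in_normalization=True; both knobs appear in the hyperparameter dumps on this page, though whether they fully explain this failure is an assumption.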
Hi,
When you trained on the JSUT corpus, did you use the original Japanese script? What I'm curious about is that Chinese characters are not phonetic, so I doubt the network can learn from them. I think they need to be converted into a phonetic transcription (romaji).
Thanks for your excellent implementation of Deep Voice 3. I am attempting to retrain a DeepVoice3 model using the LJSpeech data. My interest in training a new model is that I want to make some small model parameter changes in order to enable fine-tuning using some Spanish data that I have.
As a first step I tried to retrain the baseline model and I have run into some issues.
With my installation, I have been able to successfully synthesize using the pre-trained DeepVoice3 model with git commit 4357976 as your instructions indicate. That synthesized audio sounds very much like the samples linked from the instructions page.
However, I am trying to train now with the latest git commit (commit 48d1014, dated Feb 7). I am using the LJSpeech data set downloaded from the link you provided. I have run the pre-processing and training steps as indicated in your instructions. I am using the default preset parameters for deepvoice3_ljspeech.
I have let the training process run for a while. When I synthesize using the checkpoint saved at 210K iterations, the alignment is bad and the audio is very robotic and mostly unintelligible.
When I synthesize using the checkpoint saved at 700K iterations, the alignment is better (but not great); the audio is improved but still robotic and choppy.
I can post the synthesized wav files via dropbox if you are interested. I expected to have good alignment and audio at 210K iterations as that is what the pretrained model used.
Any ideas what has changed between git commits 4357976 and 48d1014 that could have caused this issue? When I diff the two commits, I see some changes in audio.py, some places where support for multi-voice has been added, and some other changes I do not yet understand. There are some additions to hparams.py, but I only noticed one difference: in the current commit, masked_loss_weight defaults to 0.5, but in the prior commit the default was 0.0.
I have just started a new training run with masked_loss_weight set to 0.0. In the meantime, do you have thoughts on anything else that might be causing the issues I am seeing?
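For reference, the override described in the last paragraph can be expressed directly on the command line, using the preset syntax shown throughout these issues:
python train.py --data-root=./data/ljspeech --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech,masked_loss_weight=0.0"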
When I used
x = F.relu(self.fc1(x), inplace=True)
CUDA would run out of memory.
So I set inplace=False, and that solved the problem!
x = F.relu(self.fc1(x), inplace=False)
For easy testing, is there any way to add a demo server similar to the one (demo_server.py) here: https://github.com/keithito/tacotron ?
Great work! Thanks a lot.
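A minimal sketch of such a server using only the standard library plus SciPy; tts(text) stands in for whatever synthesis routine the repo exposes and is an assumption, as is the 22050 Hz sample rate:
import io
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

import numpy as np
from scipy.io import wavfile

class TTSHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /synthesize?text=hello
        text = parse_qs(urlparse(self.path).query).get("text", [""])[0]
        wav = tts(text)  # hypothetical: float waveform in [-1, 1] from the model
        buf = io.BytesIO()
        wavfile.write(buf, 22050, (wav * 32767).astype(np.int16))
        self.send_response(200)
        self.send_header("Content-Type", "audio/wav")
        self.end_headers()
        self.wfile.write(buf.getvalue())

HTTPServer(("0.0.0.0", 9000), TTSHandler).serve_forever()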
With Deep Voice 3, I get the following error:
collected_files = self.file_data_source.collect_files()
File "train.py", line 126, in collect_files
assert len(l) == 4 or len(l) == 5
AssertionError
Is there something wrong with how my text file is written? The data is JSUT.
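For context, the assertion at train.py line 126 only checks the field count of each metadata row: train.txt is expected to hold pipe-separated lines with 4 fields for a single speaker (5 with a speaker id), along the lines of (illustrative values, following the npy naming seen elsewhere in these issues):
jsut-spec-00001.npy|jsut-mel-00001.npy|425|こんにちは、今日はいい天気ですね。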
Hi, thanks for the fantastic DeepVoice3 implementation!
When trying to train the Nyanko model starting from your pre-trained checkpoint using the following args:
--hparams="builder=nyanko,preset=nyanko_ljspeech"
--checkpoint=checkpoints.pretrained/20171129_nyanko_checkpoint_step000585000.pth
I'm getting the error:
Load checkpoint from: checkpoints.pretrained/20171129_nyanko_checkpoint_step000585000.pth
Traceback (most recent call last):
File "train.py", line 936, in <module>
load_checkpoint(checkpoint_path, model, optimizer, reset_optimizer)
File "train.py", line 820, in load_checkpoint
model.load_state_dict(checkpoint["state_dict"])
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 490, in load_state_dict
.format(name))
KeyError: 'unexpected key "seq2seq.decoder.attention.in_projection.bias" in state_dict'
It looks like in_projection is missing from the AttentionLayer implementation in deepvoice3_pytorch/deepvoice3.py, but it is still present in the Nyanko pre-trained model: https://github.com/r9y9/deepvoice3_pytorch#pretrained-models
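Until checkpoint and code are matched up, one workaround sketch is to drop the unexpected keys before loading (plain PyTorch; this assumes the extra in_projection entries are the only mismatch):
import torch
checkpoint = torch.load("20171129_nyanko_checkpoint_step000585000.pth")
state = {k: v for k, v in checkpoint["state_dict"].items() if "in_projection" not in k}
model.load_state_dict(state)  # model built as in train.py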
Hi @r9y9, I'm training on German audio. I have added the German characters (Ä, Ö, Ü, ß, ä, ö, ü) to the symbol set and am using basic_cleaners.
The problem is the alignment on test audio. Look at some of the samples; and, of course, the audio is horrible too. I have tested with up to 500k steps, always with the same results. When I generate audio with synthesis.py, I get similar results. Any hints on where I'd need to add more info?
Thanks for any recommendations... (I converted the German training data to the ljspeech format.)
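For anyone comparing notes, the symbol-set change described above would look roughly like this in a keithito-style frontend (the exact file and variable names are assumptions):
# e.g. in the frontend's text/symbols.py
_pad = '_'
_eos = '~'
_characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÄÖÜäöüß!\'(),-.:;? '
symbols = [_pad, _eos] + list(_characters)
If the symbols are in place and alignment still never forms, data quality and text normalization are more likely culprits than the symbol table itself.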
Hi,
I have tried to train the LJSpeech model with the latest master, and it gives me an error like this with
num_workers = 2
It looks like _frontend didn't get assigned in the worker processes. I tried injecting the _frontend object into the TextDataSource, but that failed. Is there a fix for this?
When I set num_workers = 0, it trains fine.
A quick Google search tells me that with num_workers = 0 all the work is done in the main process.
My question is: will that slow down my training process significantly?
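For what it's worth, the repo resolves the frontend from hparams (visible in the frontend=ko / frontend=en values dumped on this page); a sketch of doing that resolution lazily inside each worker instead of relying on a module-level global set in the main process:
from deepvoice3_pytorch import frontend as frontend_module
from hparams import hparams

def get_frontend():
    # resolved at call time, so spawned DataLoader workers don't see an unset global
    return getattr(frontend_module, hparams.frontend)
As for num_workers=0: it moves all dataset loading into the main process, which only slows training if feature loading (reading cached .npy files here) cannot keep up with the GPU; with precomputed features the impact is often small.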
Any plans for a WORLD vocoder for multi-speaker TTS?
Hi @r9y9, you mention that aligning VCTK with gentle does not work; can you tell us what happens? Is it the quality of the alignment, and how did you see it?
Hi r9y9,
Thanks for the amazing library. I'm only beginning to learn ML, and I love what this can do! Ultimately I'm trying to build what lyrebird.ai has been doing. I finally managed to set it all up and started training a single speaker with LJSpeech.
However, I'm seeing the same training speed of ~3 s/it on both my desktop (specs below) and my MBP (2.5 GHz, i8, 4 cores). Is there a way I can speed things up? I know I don't have ideal AI training hardware, but I'm looking forward to the results.
*Both setups have all CPU cores running at 100%
OS: Ubuntu 16.04.4
CPU: i7-7820X (8 CORE)
GPU: 2x 1080 Ti
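Identical it/s on a 2x 1080 Ti desktop and a MacBook Pro, with all CPU cores pegged, suggests training may be running on the CPU. A quick check before anything else:
import torch
print(torch.__version__)
print(torch.cuda.is_available())  # False here would explain the symptoms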
Hi @r9y9,
Thanks for working on this project. I trained a model with --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech"
at the latest commit. However, when I synthesize speech, I get the following errors:
python synthesis.py --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech" checkpoints_deepvoice3/checkpoint_step000630000.pth test.txt samples
Command line args:
{'--checkpoint-postnet': None,
'--checkpoint-seq2seq': None,
'--file-name-suffix': '',
'--help': False,
'--hparams': 'builder=deepvoice3,preset=deepvoice3_ljspeech',
'--max-decoder-steps': '500',
'--output-html': False,
'--replace_pronunciation_prob': '0.0',
'--speaker_id': None,
'<checkpoint>': 'checkpoints_deepvoice3/checkpoint_step000630000.pth',
'<dst_dir>': 'samples',
'<text_list_file>': 'test.txt'}
Override hyper parameters with preset "deepvoice3_ljspeech": {
"n_speakers": 1,
"downsample_step": 4,
"outputs_per_step": 1,
"embedding_weight_std": 0.1,
"dropout": 0.050000000000000044,
"kernel_size": 3,
"text_embed_dim": 256,
"encoder_channels": 512,
"decoder_channels": 256,
"converter_channels": 256,
"use_guided_attention": true,
"guided_attention_sigma": 0.2,
"binary_divergence_weight": 0.1,
"use_decoder_state_for_postnet_input": true,
"max_positions": 512,
"query_position_rate": 1.0,
"key_position_rate": 1.385,
"key_projection": true,
"value_projection": true,
"clip_thresh": 0.1,
"initial_learning_rate": 0.0005
}
*** Error in `python': free(): invalid next size (fast): 0x0000000004da9360 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fcd2c2417e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fcd2c24a37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fcd2c24e53c]
/home/fatman/anaconda2/envs/dev3/bin/../lib/libcudnn.so.6(cudnnDestroyConvolutionDescriptor+0x9)[0x7fccdeb64c69]
/home/fatman/anaconda2/envs/dev3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so(+0x2dedf7)[0x7fccb75acdf7]
/home/fatman/anaconda2/envs/dev3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so(_ZN5torch5cudnn30cudnn_convolution_full_forwardEP8THCStateP12cudnnContext15cudnnDataType_tPNS_12THVoidTensorES7_S7_S7_St6vectorIiSaIiEESA_SA_ibb+0x6a4)[0x7fccb847dee4]
/home/fatman/anaconda2/envs/dev3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so(_ZN5torch8autograd11ConvForward5applyERKSt6vectorINS0_8VariableESaIS3_EE+0x1192)[0x7fccb77ed4a2]
Detailed logs are here.
The text file contains only a single line:
Generative adversarial network or variational auto-encoder.
Thanks.
Hi there,
I changed hparams.py to
fft_size=2052, # default 1024
hop_size=114, # default 256
And I get an inaudible result!
What should I do if I want to increase fft_size and reduce hop_size? What did I do wrong?
Thanks a lot for any help!
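One constraint holds regardless: the model's spectrogram channel count is tied to fft_size (bins = fft_size // 2 + 1), so after editing hparams.py you must re-run preprocess.py so the cached features match, and synthesize with the same values used in training:
fft_size = 2052
print(fft_size // 2 + 1)  # 1027 linear-spectrogram bins instead of the default 513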
Hi again,
I am applying this repository to a Korean speech corpus (http://www.korean.go.kr/front/board/boardStandardView.do?board_id=4&mn_id=17&b_seq=464) and have encountered the following error. Could you have a look at it? I will be happy to open a PR once it gets working.
I formatted the Korean corpus into npy files, just as the ljspeech setup does, as a single speaker, and ran training with a single GPU or multiple GPUs. But it shows a series of error messages like Assertion `srcIndex < srcSelectDimSize`
failed.
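As a note on this class of failure: `srcIndex < srcSelectDimSize` from indexSelect kernels almost always means an embedding lookup received an index outside its table, e.g. a character the frontend does not know or a position beyond max_positions. A diagnostic sketch, assuming the repo's frontend API (text_to_sequence / n_vocab):
from deepvoice3_pytorch import frontend as frontend_module
from hparams import hparams

_frontend = getattr(frontend_module, hparams.frontend)
seq = _frontend.text_to_sequence("안녕하세요", p=0.0)
assert max(seq) < _frontend.n_vocab, "text index exceeds the embedding table"
It is also worth double-checking which frontend actually ran: the "Command line args" dump below shows an --hparams string without the frontend=jp override that appears in the command, and the hyperparameter dump lists frontend: en.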
[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ ls data/nikl | head -3
nikl-mel-00001.npy
nikl-mel-00002.npy
nikl-mel-00003.npy
[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ ls data/nikl | tail -3
nikl-spec-00929.npy
nikl-spec-00930.npy
train.txt
[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ ls data/nikl/*.npy | wc -l
1860
CUDA_VISIBLE_DEVICES=3 python train.py \
--data-root=./data/nikl/ \
--hparams="frontend=jp,builder=deepvoice3,preset=deepvoice3_ljspeech" \
--checkpoint-dir checkpoint_nikl
Command line args:
{'--checkpoint': None,
'--checkpoint-dir': 'checkpoint_nikl',
'--checkpoint-postnet': None,
'--checkpoint-seq2seq': None,
'--data-root': './data/nikl/',
'--help': False,
'--hparams': 'builder=deepvoice3,preset=deepvoice3_ljspeech',
'--load-embedding': None,
'--log-event-path': None,
'--reset-optimizer': False,
'--restore-parts': None,
'--speaker-id': None,
'--train-postnet-only': False,
'--train-seq2seq-only': False}
Training whole model
Training seq2seq model
Hyperparameters:
adam_beta1: 0.5
adam_beta2: 0.9
adam_eps: 1e-06
allow_clipping_in_normalization: True
batch_size: 16
binary_divergence_weight: 0.1
builder: deepvoice3
checkpoint_interval: 10000
clip_thresh: 0.1
converter_channels: 256
decoder_channels: 256
downsample_step: 4
dropout: 0.050000000000000044
embedding_weight_std: 0.1
encoder_channels: 256
eval_interval: 10000
fft_size: 1024
force_monotonic_attention: True
freeze_embedding: False
frontend: en
guided_attention_sigma: 0.2
hop_size: 256
initial_learning_rate: 0.0005
kernel_size: 3
key_position_rate: 1.385
key_projection: False
lr_schedule: noam_learning_rate_decay
lr_schedule_kwargs: {}
masked_loss_weight: 0.5
max_positions: 512
min_level_db: -100
n_speakers: 1
name: deepvoice3
nepochs: 2000
num_mels: 80
num_workers: 2
outputs_per_step: 1
padding_idx: 0
pin_memory: True
power: 1.4
preemphasis: 0.97
preset: deepvoice3_ljspeech
presets: {'deepvoice3_ljspeech': {'n_speakers': 1, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.1, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 256, 'encoder_channels': 512, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 512, 'query_position_rate': 1.0, 'key_position_rate': 1.385, 'key_projection': True, 'value_projection': True, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}, 'deepvoice3_vctk': {'n_speakers': 108, 'speaker_embed_dim': 16, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.1, 'speaker_embedding_weight_std': 0.05, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 256, 'encoder_channels': 512, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.4, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 1024, 'query_position_rate': 2.0, 'key_position_rate': 7.6, 'key_projection': True, 'value_projection': True, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}, 'nyanko_ljspeech': {'n_speakers': 1, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.01, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 128, 'encoder_channels': 256, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 512, 'query_position_rate': 1.0, 'key_position_rate': 1.385, 'key_projection': False, 'value_projection': False, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}}
priority_freq: 3000
priority_freq_weight: 0.0
query_position_rate: 1.0
ref_level_db: 20
replace_pronunciation_prob: 0.5
sample_rate: 22050
save_optimizer_state: True
speaker_embed_dim: 16
speaker_embedding_weight_std: 0.01
text_embed_dim: 128
trainable_positional_encodings: False
use_decoder_state_for_postnet_input: True
use_guided_attention: True
use_memory_mask: True
value_projection: False
weight_decay: 0.0
window_ahead: 3
window_backward: 1
Override hyper parameters with preset "deepvoice3_ljspeech": {
"n_speakers": 1,
"downsample_step": 4,
"outputs_per_step": 1,
"embedding_weight_std": 0.1,
"dropout": 0.050000000000000044,
"kernel_size": 3,
"text_embed_dim": 256,
"encoder_channels": 512,
"decoder_channels": 256,
"converter_channels": 256,
"use_guided_attention": true,
"guided_attention_sigma": 0.2,
"binary_divergence_weight": 0.1,
"use_decoder_state_for_postnet_input": true,
"max_positions": 512,
"query_position_rate": 1.0,
"key_position_rate": 1.385,
"key_projection": true,
"value_projection": true,
"clip_thresh": 0.1,
"initial_learning_rate": 0.0005
}
Los event path: log/run-test2018-01-30_15:05:32.238606
34it [00:08, 4.24it/s]
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/THCTensorIndex.cu:325: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [106,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
... (the same assertion repeats for many more threads and blocks) ...
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/THCTensorIndex.cu:325: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [46,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.cu line=58 error=59 : device-side assert triggered
Traceback (most recent call last):
File "train.py", line 941, in <module>
train_seq2seq=train_seq2seq, train_postnet=train_postnet)
File "train.py", line 642, in train
input_lengths=input_lengths)
File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
result = self.forward(*input, **kwargs)
File "/home/kwon/3rdParty/deepvoice3_pytorch/deepvoice3_pytorch/__init__.py", line 94, in forward
linear_outputs = self.postnet(postnet_inputs, speaker_embed)
File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
result = self.forward(*input, **kwargs)
File "/home/kwon/3rdParty/deepvoice3_pytorch/deepvoice3_pytorch/deepvoice3.py", line 597, in forward
return F.sigmoid(x)
File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 817, in sigmoid
return input.sigmoid()
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.cu:58
Noticed while working on #21.
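For what it's worth, this srcIndex < srcSelectDimSize assertion usually means an out-of-range index reached an embedding lookup or index_select (e.g., a text or speaker id larger than the table size). A quick CPU-side sanity check can localize it before the opaque device-side assert fires; the names and sizes below are illustrative, not the repository's actual variables:

import torch
import torch.nn as nn

n_vocab = 256                                     # assumed embedding table size
embed = nn.Embedding(n_vocab, 128)
text_sequences = torch.randint(0, 300, (4, 50))   # deliberately contains ids >= n_vocab

# On CPU this raises a readable error instead of a device-side assert.
max_id = int(text_sequences.max())
if max_id >= embed.num_embeddings:
    raise ValueError('index %d >= num_embeddings %d' % (max_id, embed.num_embeddings))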
Trained 300k steps, but the model did not generalize well. Need to figure out how we can improve it.
Does synthesis.py implement the algorithm mentioned in "appendix B - OPTIMIZING DEEP VOICE 3 FOR DEPLOYMENT" of the original paper?
https://arxiv.org/abs/1710.07654
Hi, I cannot find the shift-by-one operation on the decoder input data (mel). Is this a bug?
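For reference, a minimal sketch of what the usual shift-by-one looks like for teacher forcing (not the repository's actual code; shapes are assumed): the decoder at step t is fed the ground-truth frame t-1, starting from an all-zero GO frame.

import torch

mel = torch.randn(4, 100, 80)         # (batch, frames, mel_dim), assumed layout
go_frame = torch.zeros(4, 1, 80)      # all-zero GO frame for the first step
decoder_inputs = torch.cat([go_frame, mel[:, :-1, :]], dim=1)  # frame t-1 conditions frame t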
position_enc = np.array([
    [position_rate * pos / np.power(10000, 2 * (i // 2) / d_pos_vec) for i in range(d_pos_vec)]
    if pos != 0 else np.zeros(d_pos_vec) for pos in range(n_position)])
Hey! I wonder what the motivation is behind repeating the positional encoding values twice?
In the paper it's done this way:
position_rate * pos / np.power(10000, i / d_pos_vec)...
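A small sketch contrasting the two exponent schedules in question: with 2 * (i // 2), consecutive channels come in pairs that share a frequency (the usual sin/cos pairing of the Transformer-style encoding), whereas a plain i gives every channel its own frequency.

d_pos_vec = 8
paired = [2 * (i // 2) / d_pos_vec for i in range(d_pos_vec)]  # 0.0, 0.0, 0.25, 0.25, ...
plain = [i / d_pos_vec for i in range(d_pos_vec)]              # 0.0, 0.125, 0.25, ...
print(paired)
print(plain)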
Since the synthesis script has been altered to accept a builder param called deepvoice3_multispeaker instead of deepvoice3_vctk, please update the table in the pretrained-models section of the README to reflect the new hyperparameters for VCTK. It will eliminate confusion for people using this platform.
Reference Issue #14
The table entry should read:
--hparams="builder=deepvoice3_multispeaker,preset=deepvoice3_vctk"
deepvoice3_pytorch/__init__.py
The line from .version import __version__ raises an error: version.py is not provided.
deepvoice3_pytorch/builder.py
The builder name deepvoice3_multispeaker is inconsistent with hparams.py.
deepvoice3_pytorch/deepvoice3.py
Line 474: (done > 0.5).all(); maybe (done.data > 0.5).all() is better.
I created a new environment for this project and made it through the preprocessing for the LJ dataset, and now I'm stuck at the training portion. I get this error:
Traceback (most recent call last):
File "train.py", line 906, in <module>
model = build_model()
File "train.py", line 799, in build_model
value_projection=hparams.value_projection,
File "/mnt/deepvoice3_pytorch/deepvoice3_pytorch/builder.py", line 46, in deepvoice3
(h, k, 1), (h, k, 3)],
File "/mnt/deepvoice3_pytorch/deepvoice3_pytorch/deepvoice3.py", line 54, in __init__
dilation=1, std_mul=std_mul))
File "/mnt/deepvoice3_pytorch/deepvoice3_pytorch/modules.py", line 104, in Conv1d
return nn.utils.weight_norm(m)
AttributeError: module 'torch.nn.utils' has no attribute 'weight_norm'
when running python train.py --data-root=./data/ljspeech/ --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech".
I installed PyTorch with conda install pytorch torchvision cuda90 -c pytorch. Any help would be appreciated.
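For reference, nn.utils.weight_norm only appeared around PyTorch 0.2, so this AttributeError usually means conda resolved an older version. A minimal sanity check:

import torch
import torch.nn as nn

print(torch.__version__)  # should be recent enough to provide weight_norm
# This call raises the same AttributeError on versions that lack weight_norm.
m = nn.utils.weight_norm(nn.Conv1d(16, 32, kernel_size=3))
print(m)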
Do the pretrained DeepVoice3 models really only need 21k steps to train?
In my experiment, 21k steps seems far too few.
Maybe you meant to write 210k rather than 21k?
And likewise for the 30k for Nyanko and the 58.5k for the multi-speaker DeepVoice3?
When I ran train.py
(python3 train.py --data-root=./datapath/ljspeech/ --hparams="batch_size=10")
this error came up:
Exception ignored in: <bound method Image.__del__ of <tkinter.PhotoImage object at 0x7f1b5f86a710>>
Traceback (most recent call last):
File "/usr/lib/python3.5/tkinter/__init__.py", line 3359, in __del__
self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
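The "main thread is not in main loop" error comes from matplotlib's default tkinter backend being touched outside the main thread. A common workaround (a sketch, assuming train.py saves its alignment/spectrogram plots through matplotlib) is to force the non-interactive Agg backend before pyplot is first imported:

import matplotlib
matplotlib.use('Agg')            # must run before the first `import matplotlib.pyplot`
import matplotlib.pyplot as plt  # figures can now be saved with savefig() without a GUI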
I was able to run "python3.5 synthesis.py ....", but it generated an error at the end each time.
..............................
Finished! Check out ./output for generated audio samples.
*** Error in `python3.5': corrupted size vs. prev_size: 0x0000000001682bd0 ***
Aborted (core dumped)
According to the fairseq team, there is a big speed difference at inference time between their own temporal convolution module and the stock nn.Conv1d.
Have you checked the speed of these two modules while removing the fairseq-py dependency?
By the way, I agree with the dependency-free implementation. It makes it easy to follow the overall code flow.
Good job!
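I have not measured it here, but a simple timing harness along these lines (shapes are made up) would answer the question for nn.Conv1d versus any drop-in replacement:

import time
import torch
import torch.nn as nn

def bench(module, x, n_iter=100):
    # Warm up, then average repeated forward passes.
    for _ in range(10):
        module(x)
    start = time.perf_counter()
    for _ in range(n_iter):
        module(x)
    return (time.perf_counter() - start) / n_iter

x = torch.randn(1, 256, 512)  # (batch, channels, time), assumed shape
conv = nn.Conv1d(256, 256, kernel_size=3, padding=1)
with torch.no_grad():
    print('nn.Conv1d: %.6f s/iter' % bench(conv, x))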
When I try to train a dataset with the command from the tutorial (python train.py --data-root=./data/ljspeech/ --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech") I get an error telling me that _frontend is a NoneType object and has no 'text_to_sequence' attribute. Do I need to modify anything to get this to work again?
AttributeError: 'NoneType' object has no attribute 'text_to_sequence'
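For context, a sketch of how the frontend lookup normally works (assuming the frontend package layout with submodules such as en and ko); the AttributeError above suggests _frontend was never assigned from hparams.frontend:

from deepvoice3_pytorch import frontend

_frontend = getattr(frontend, 'en')  # 'en', 'ko', ..., per hparams.frontend
print(_frontend.text_to_sequence('Hello world.'))  # the call that fails when _frontend is None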
addresses #1 (comment)