deepvoice3_pytorch's Issues

JSUT data

Hi,

When you trained on the JSUT corpus, did you use the original Japanese script? What I'm curious about is that kanji (Chinese characters) are not phonetic, so I doubt the network can learn from them. I thought they would need to be converted into a phonetic transcription (romaji).
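
For illustration, a minimal sketch of the conversion idea (this is not what the repo's jp frontend does internally; pykakasi is an assumed third-party package): turn raw Japanese text, kanji included, into romaji before feeding it to the model.

import pykakasi

# Convert each token of a Japanese sentence to its Hepburn romanization.
kks = pykakasi.kakasi()
for item in kks.convert("深層学習で音声合成"):
    print(item["orig"], "->", item["hepburn"])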

AttributeError: module 'torch.nn.utils' has no attribute 'weight_norm'

I created a new environment for this project and made it through preprocessing for the LJ dataset, and now I'm stuck at the training step. I get this error

Traceback (most recent call last):
  File "train.py", line 906, in <module>
    model = build_model()
  File "train.py", line 799, in build_model
    value_projection=hparams.value_projection,
  File "/mnt/deepvoice3_pytorch/deepvoice3_pytorch/builder.py", line 46, in deepvoice3
    (h, k, 1), (h, k, 3)],
  File "/mnt/deepvoice3_pytorch/deepvoice3_pytorch/deepvoice3.py", line 54, in __init__
    dilation=1, std_mul=std_mul))
  File "/mnt/deepvoice3_pytorch/deepvoice3_pytorch/modules.py", line 104, in Conv1d
    return nn.utils.weight_norm(m)
AttributeError: module 'torch.nn.utils' has no attribute 'weight_norm'

when running python train.py --data-root=./data/ljspeech/ --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech"

I installed pytorch with conda install pytorch torchvision cuda90 -c pytorch. Any help would be appreciated.
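
As a quick sanity check in the failing environment (nn.utils.weight_norm has shipped with PyTorch since around 0.2, so an old or broken install is the usual suspect):

import torch
import torch.nn as nn

print(torch.__version__)
# Wrapping a layer in weight normalization adds weight_g / weight_v parameters.
m = nn.utils.weight_norm(nn.Conv1d(16, 32, kernel_size=3))
print(m.weight_g.size(), m.weight_v.size())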

Phonemes

Hi there,

I was wondering if you were ever considering making adjustments for the "joint representation of characters and phonemes" that section 3.2 of the Deep Voice 3 paper mentions.
[screenshot: section 3.2 of the Deep Voice 3 paper]

Thanks in advance,

B1gM

Crowdsourcing a high-quality, open-source TTS dataset

Hi r9y9, first I just want to say that your repos are great and I have personally learned a lot from them. So a big thanks to you.

So I too have been trying to replicate the results of the big TTS papers. However, the main thing that is frustrating me is the lack of a high-quality TTS dataset (although 50 GPUs would help too!).

I just wanted to throw this idea out there: what if random people on the internet interested in TTS/ML collaborated to create a good dataset? If enough people joined in (20+), the segmentation and labelling work should only be a couple of hours per person.

Here is a list of the options that occurred to me (and I by no means consider this list complete):

1 - Find a 20+ hour high-quality, open-source audiobook online. Given how massive the internet is, surely there is a chance of finding a hi-fi audiobook that isn't poorly recorded, overly compressed or too 'performed'. Working together, scouring the internet... who knows - a gem might be out there.

2 - Podcasts - there's an endless supply of these. But podcasts bring their own unique difficulties - e.g., were different eq/compression/mic/mastering used by the sound engineer across different episodes? Again, with enough searching, a candidate with consistent sound quality may reveal itself.

3 - Commercial audiobooks - this would unfortunately render the whole dataset closed-source and for personal research only. However, I don't see how there would be any problems if all collaborators purchased the audiobook and didn't redistribute the dataset beyond the initial group of collaborators.

4 - Crowdfunding it - probably the least realistic option. Still, if enough people were interested, 100 or so, then it might be possible. One studio, one sound engineer, one professional reader and someone to oversee the project for a week or two max? Would $10,000 cover it? $20,000? I'm no expert in studio time and sound-engineering rates, so I can't say for certain.

So to wrap this up - I just wanted to put this idea out there. I'm very curious what you, or any others reading this, think - even if you feel it's unrealistic. I know buying 50 gpus is unfeasible for most of us - but working together to solve the dataset problem? Personally, I'm optimistic.

Error in `python3': free(): invalid next size (fast) when running synthesis.py

Greetings!
I have successfully preprocessed the LJSpeech dataset and trained a model for a while with the preset hyperparameters:

python3 train.py --data-root=./data/ljspeech \
--hparams="builder=deepvoice3,preset=deepvoice3_ljspeech"

But when trying to generate audio from text:

python3 synthesis.py ./checkpoints/checkpoint_step000270000.pth ./text_list.txt ./generated \ 
--hparams="builder=deepvoice3,preset=deepvoice3_ljspeech"

I'm getting the error:

*** Error in `python3': free(): invalid next size (fast): 0x000000000db7b050 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f1138cbd7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f1138cc637a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f1138cca53c]
/usr/local/cuda-8.0/lib64/libcudnn.so.6(cudnnDestroyConvolutionDescriptor+0x9)[0x7f10e47eac69]
/usr/local/lib/python3.5/dist-packages/torch/_C.cpython-35m-x86_64-linux-gnu.so(+0x2dedf7)[0x7f10cc728df7]
/usr/local/lib/python3.5/dist-packages/torch/_C.cpython-35m-x86_64-linux-gnu.so(_ZN5torch5cudnn30cudnn_convolution_full_forwardEP8THCStateP12cudnnContext15cudnnDataType_tPNS_12THVoidTensorES7_S7_S7_St6vectorIiSaIiEESA_SA_ibb+0x6a4)[0x7f10cd5f9ee4]
/usr/local/lib/python3.5/dist-packages/torch/_C.cpython-35m-x86_64-linux-gnu.so(_ZN5torch8autograd11ConvForward5applyERKSt6vectorINS0_8VariableESaIS3_EE+0x1192)[0x7f10cc9694a2]
/usr/local/lib/python3.5/dist-packages/torch/_C.cpython-35m-x86_64-linux-gnu.so(+0x40d26e)[0x7f10cc85726e]
python3(PyObject_Call+0x47)[0x5c1797]
python3(PyEval_EvalFrameEx+0x4ec6)[0x53bba6]
python3[0x540199]
python3(PyEval_EvalFrameEx+0x50b2)[0x53bd92]
python3(PyEval_EvalCodeEx+0x13b)[0x540f9b]
python3[0x4ebe37]
python3(PyObject_Call+0x47)[0x5c1797]
... (the remaining frames repeat the same PyEval_EvalFrameEx / PyObject_Call cycle)

After debugging, I found that the problem appears at the first iteration of this loop (deepvoice3.py, line 90):

for f in self.convolutions:
    x = f(x, speaker_embed_btc) if isinstance(f, Conv1dGLU) else f(x)

but I still can't solve it.

I tried Python 3.5.2 and 3.6.3 with tensorflow 1.3.0 and torch 0.3.1 (I also tried 0.3.0.post4).
CUDA version is 8.0, GPU: Titan X.
Any help would be appreciated.
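
The backtrace points into cudnnDestroyConvolutionDescriptor, so one way to narrow this down (a debugging sketch, not a fix) is to rule the cuDNN convolution path in or out by disabling it before the model is built:

import torch

# If synthesis no longer crashes with cuDNN disabled, the memory corruption
# is likely in the cuDNN convolution path rather than in the model code.
torch.backends.cudnn.enabled = False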

Assertion `srcIndex < srcSelectDimSize` failed

Hi again,

I am applying this repository to a Korean speech corpus (http://www.korean.go.kr/front/board/boardStandardView.do?board_id=4&mn_id=17&b_seq=464) and have encountered the following error. Could you have a look at it? I will be happy to open a PR once it gets working.

I formatted the Korean corpus into npy files with the same layout ljspeech uses, as a single speaker, and ran training with a single GPU and with multiple GPUs. But it shows a series of error messages like Assertion `srcIndex < srcSelectDimSize` failed.

[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ ls data/nikl | head -3
nikl-mel-00001.npy
nikl-mel-00002.npy
nikl-mel-00003.npy
[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ ls data/nikl | tail -3
nikl-spec-00929.npy
nikl-spec-00930.npy
train.txt
[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ ls data/nikl/*.npy | wc -l
1860


CUDA_VISIBLE_DEVICES=3 python train.py \
  --data-root=./data/nikl/ \
  --hparams="frontend=jp,builder=deepvoice3,preset=deepvoice3_ljspeech" \
  --checkpoint-dir checkpoint_nikl


Command line args:
 {'--checkpoint': None,
 '--checkpoint-dir': 'checkpoint_nikl',
 '--checkpoint-postnet': None,
 '--checkpoint-seq2seq': None,
 '--data-root': './data/nikl/',
 '--help': False,
 '--hparams': 'builder=deepvoice3,preset=deepvoice3_ljspeech',
 '--load-embedding': None,
 '--log-event-path': None,
 '--reset-optimizer': False,
 '--restore-parts': None,
 '--speaker-id': None,
 '--train-postnet-only': False,
 '--train-seq2seq-only': False}
Training whole model
Training seq2seq model
Hyperparameters:
  adam_beta1: 0.5
  adam_beta2: 0.9
  adam_eps: 1e-06
  allow_clipping_in_normalization: True
  batch_size: 16
  binary_divergence_weight: 0.1
  builder: deepvoice3
  checkpoint_interval: 10000
  clip_thresh: 0.1
  converter_channels: 256
  decoder_channels: 256
  downsample_step: 4
  dropout: 0.050000000000000044
  embedding_weight_std: 0.1
  encoder_channels: 256
  eval_interval: 10000
  fft_size: 1024
  force_monotonic_attention: True
  freeze_embedding: False
  frontend: en
  guided_attention_sigma: 0.2
  hop_size: 256
  initial_learning_rate: 0.0005
  kernel_size: 3
  key_position_rate: 1.385
  key_projection: False
  lr_schedule: noam_learning_rate_decay
  lr_schedule_kwargs: {}
  masked_loss_weight: 0.5
  max_positions: 512
  min_level_db: -100
  n_speakers: 1
  name: deepvoice3
  nepochs: 2000
  num_mels: 80
  num_workers: 2
  outputs_per_step: 1
  padding_idx: 0
  pin_memory: True
  power: 1.4
  preemphasis: 0.97
  preset: deepvoice3_ljspeech
  presets: {'deepvoice3_ljspeech': {'n_speakers': 1, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.1, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 256, 'encoder_channels': 512, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 512, 'query_position_rate': 1.0, 'key_position_rate': 1.385, 'key_projection': True, 'value_projection': True, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}, 'deepvoice3_vctk': {'n_speakers': 108, 'speaker_embed_dim': 16, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.1, 'speaker_embedding_weight_std': 0.05, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 256, 'encoder_channels': 512, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.4, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 1024, 'query_position_rate': 2.0, 'key_position_rate': 7.6, 'key_projection': True, 'value_projection': True, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}, 'nyanko_ljspeech': {'n_speakers': 1, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.01, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 128, 'encoder_channels': 256, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 512, 'query_position_rate': 1.0, 'key_position_rate': 1.385, 'key_projection': False, 'value_projection': False, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}}
  priority_freq: 3000
  priority_freq_weight: 0.0
  query_position_rate: 1.0
  ref_level_db: 20
  replace_pronunciation_prob: 0.5
  sample_rate: 22050
  save_optimizer_state: True
  speaker_embed_dim: 16
  speaker_embedding_weight_std: 0.01
  text_embed_dim: 128
  trainable_positional_encodings: False
  use_decoder_state_for_postnet_input: True
  use_guided_attention: True
  use_memory_mask: True
  value_projection: False
  weight_decay: 0.0
  window_ahead: 3
  window_backward: 1
Override hyper parameters with preset "deepvoice3_ljspeech": {
    "n_speakers": 1,
    "downsample_step": 4,
    "outputs_per_step": 1,
    "embedding_weight_std": 0.1,
    "dropout": 0.050000000000000044,
    "kernel_size": 3,
    "text_embed_dim": 256,
    "encoder_channels": 512,
    "decoder_channels": 256,
    "converter_channels": 256,
    "use_guided_attention": true,
    "guided_attention_sigma": 0.2,
    "binary_divergence_weight": 0.1,
    "use_decoder_state_for_postnet_input": true,
    "max_positions": 512,
    "query_position_rate": 1.0,
    "key_position_rate": 1.385,
    "key_projection": true,
    "value_projection": true,
    "clip_thresh": 0.1,
    "initial_learning_rate": 0.0005
}
Los event path: log/run-test2018-01-30_15:05:32.238606
34it [00:08,  4.24it/s]
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/THCTensorIndex.cu:325: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [106,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
... (the same assertion repeats for many more threads and blocks) ...
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.cu line=58 error=59 : device-side assert triggered

Traceback (most recent call last):
  File "train.py", line 941, in <module>
    train_seq2seq=train_seq2seq, train_postnet=train_postnet)
  File "train.py", line 642, in train
    input_lengths=input_lengths)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kwon/3rdParty/deepvoice3_pytorch/deepvoice3_pytorch/__init__.py", line 94, in forward
    linear_outputs = self.postnet(postnet_inputs, speaker_embed)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kwon/3rdParty/deepvoice3_pytorch/deepvoice3_pytorch/deepvoice3.py", line 597, in forward
    return F.sigmoid(x)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 817, in sigmoid
    return input.sigmoid()
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.cu:58
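
This assertion usually means an index fed to an embedding lookup exceeds the table size. Two suspects here: text symbol ids (note the command uses frontend=jp on Korean text) and the positional embeddings, which are sized by max_positions (512 in the dump above). A hedged sketch for the latter, assuming the ljspeech-style train.txt layout mel_file|spec_file|n_frames|text:

max_positions = 512   # from the hparams dump above
downsample_step = 4

# Print utterances whose text length or downsampled frame count would
# index past the positional-embedding tables.
with open("data/nikl/train.txt", encoding="utf-8") as f:
    for i, line in enumerate(f):
        parts = line.strip().split("|")
        n_frames, text = int(parts[2]), parts[3]
        if len(text) >= max_positions or n_frames // downsample_step >= max_positions:
            print(i, len(text), n_frames)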

Alignment problems with German text?

Hi @r9y9, I'm training on German audio. I have added the German characters (Ä, Ö, Ü, ß, ä, ö, ü) to the symbol set and am using basic_cleaners.

The problem is the alignment on test audio. Look at some of the samples. And, of course, the audio is horrible too. I have tested with up to 500k steps, always with the same results. When I generate audio with synthesis.py, I get similar results. Any hints on where I'd need to add more info?

[alignment plot: step000180000_text4_single_alignment]

Thanks for any recommendations... (I converted the German training data to the ljspeech format.)

positional encoding

    position_enc = np.array([
        [position_rate * pos / np.power(10000, 2 * (i // 2) / d_pos_vec) for i in range(d_pos_vec)]
        if pos != 0 else np.zeros(d_pos_vec) for pos in range(n_position)])

Hey! I wonder, what is the motivation behind repeating each positional encoding value twice?
In the paper it's done this way:

position_rate * pos / np.power(10000,  i / d_pos_vec)...
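
For comparison, a small sketch of the two variants. The 2 * (i // 2) in the repo repeats each frequency so that the sin/cos later applied to even/odd channels share a wavelength, matching the Transformer ("Attention Is All You Need") positional encoding; the paper's plain i / d_pos_vec gives every channel its own frequency.

import numpy as np

d_pos_vec, position_rate, pos = 8, 1.0, 2
repo = [position_rate * pos / np.power(10000, 2 * (i // 2) / d_pos_vec)
        for i in range(d_pos_vec)]
paper = [position_rate * pos / np.power(10000, i / d_pos_vec)
         for i in range(d_pos_vec)]
print(repo)   # frequencies come in identical pairs
print(paper)  # every channel has its own frequency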

RuntimeError: main thread is not in main loop

When I ran train.py
(python3 train.py --data-root=./datapath/ljspeech/ --hparams="batch_size=10")

this error came up:
Exception ignored in: <bound method Image.__del__ of <tkinter.PhotoImage object at 0x7f1b5f86a710>>
Traceback (most recent call last):
File "/usr/lib/python3.5/tkinter/__init__.py", line 3359, in __del__
self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
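
A common workaround, assuming the crash comes from matplotlib's default Tk backend being touched from a non-main thread while saving alignment plots: force a non-interactive backend before pyplot is imported anywhere.

import matplotlib
matplotlib.use("Agg")  # must run before the first "import matplotlib.pyplot"
import matplotlib.pyplot as plt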

VCTK alignment

Hi @r9y9, you mention that aligning VCTK with gentle does not work; can you tell what is happening? Is it the quality of the alignment, and how did you see it?

Getting error when num_workers > 0

Hi,
I have tried to train the LJSpeech model with the latest master, and it gives me an error like this, with

num_workers = 2
[screenshot of the error]
It looks like _frontend didn't get assigned for the worker processes. I tried injecting the _frontend object into the TextDataSource, but it failed. Is there a fix for this?

When I set num_workers = 0, it trains okay.
A quick Google search tells me that with num_workers = 0, all the work is done in the main process.
My question is: will it slow down my training process significantly?
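
A minimal sketch of the mechanics (names hypothetical, not the repo's actual fix): module-level globals such as _frontend are not always visible inside DataLoader worker processes (notably on spawn-based platforms), and worker_init_fn, available in recent PyTorch versions, can set them per worker. As for speed: num_workers = 0 runs loading in the main process, which only slows training if data loading rather than the GPU is the bottleneck.

import torch
from torch.utils.data import DataLoader, Dataset

_frontend = None  # stands in for the module-level _frontend in train.py

class Dummy(Dataset):
    def __len__(self):
        return 4
    def __getitem__(self, i):
        # Fails inside a worker process if the global was never set there.
        assert _frontend is not None
        return torch.tensor(i)

def init_worker(worker_id):
    global _frontend
    _frontend = "en"  # e.g. getattr(deepvoice3_pytorch.frontend, "en")

if __name__ == "__main__":
    loader = DataLoader(Dummy(), batch_size=2, num_workers=2,
                        worker_init_fn=init_worker)
    for batch in loader:
        print(batch)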

Issue training with DeepVoice3 model with LJSpeech Data

Thanks for your excellent implementation of Deep Voice 3. I am attempting to retrain a DeepVoice3 model using the LJSpeech data. My interest in training a new model is that I want to make some small model parameter changes in order to enable fine-tuning using some Spanish data that I have.

As a first step I tried to retrain the baseline model and I have run into some issues.

With my installation, I have been able to successfully synthesize using the pre-trained DeepVoice3 model with git commit 4357976 as your instructions indicate. That synthesized audio sounds very much like the samples linked from the instructions page.

However, I am trying to train now with the latest git commit (commit 48d1014, dated Feb 7). I am using the LJSpeech data set downloaded from the link you provided. I have run the pre-processing and training steps as indicated in your instructions. I am using the default preset parameters for deepvoice3_ljspeech.

I have let the training process run for a while. When I synthesize using the checkpoint saved at 210K iterations, the alignment is bad and the audio is very robotic and mostly unintelligible.

[alignment plot: 0_checkpoint_step000210000_alignment]

When I synthesize using the checkpoint saved at 700K iterations, the alignment is better (but not great); the audio is improved but still robotic and choppy.

[alignment plot: 0_checkpoint_step000700000_alignment]

I can post the synthesized wav files via dropbox if you are interested. I expected to have good alignment and audio at 210K iterations as that is what the pretrained model used.

Any ideas what has changed between git commits 4357976 and 48d1014 that could have caused this issue? When I diff the two commits, I see some changes in audio.py, some places where support for multi-voice has been added, and some other changes I do not yet understand. There are some additions to hparams.py, but I only noticed one difference: in the current commit, masked_loss_weight defaults to 0.5, but in the prior commit the default was 0.0.

I have just started a new training run with masked_loss_weight set to 0.0. In the meantime, do you have thoughts on anything else that might be causing the issues I am seeing?
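
For reference, individual hparams can be appended to the same --hparams string used throughout this page, so the experiment described above can be run without editing hparams.py (a sketch of that run, not a confirmed fix):

python train.py --data-root=./data/ljspeech/ \
  --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech,masked_loss_weight=0.0"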

train: RuntimeError: invalid argument 2: size '[16 x 126]' is invalid for input of with 126 elements at /home/demobin/github/pytorch/torch/lib/TH/THStorage.c:41

python3 train.py --data-root=./data/ljspeech --checkpoint-dir=checkpoints_nyanko --hparams="use_preset=True,builder=nyanko" --log-event-path=log/nyanko_preset

Command line args:
 {'--checkpoint': None,
 '--checkpoint-dir': 'checkpoints_nyanko',
 '--checkpoint-postnet': None,
 '--checkpoint-seq2seq': None,
 '--data-root': './data/ljspeech',
 '--help': False,
 '--hparams': 'use_preset=True,builder=nyanko',
 '--log-event-path': 'log/nyanko_preset',
 '--reset-optimizer': False,
 '--train-postnet-only': False,
 '--train-seq2seq-only': False}
Training whole model
Training seq2seq model
Hyperparameters:
  adam_beta1: 0.5
  adam_beta2: 0.9
  adam_eps: 1e-06
  batch_size: 16
  binary_divergence_weight: 0.1
  builder: nyanko
  checkpoint_interval: 5000
  clip_thresh: 0.1
  converter_channels: 256
  decoder_channels: 256
  downsample_step: 4
  dropout: 0.050000000000000044
  encoder_channels: 256
  fft_size: 1024
  force_monotonic_attention: True
  frontend: en
  guided_attention_sigma: 0.2
  hop_size: 256
  initial_learning_rate: 0.0005
  kernel_size: 3
  key_position_rate: 1.385
  lr_schedule: noam_learning_rate_decay
  lr_schedule_kwargs: {}
  masked_loss_weight: 0.0
  max_positions: 512
  min_level_db: -100
  name: deepvoice3
  nepochs: 2000
  num_mels: 80
  num_workers: 2
  outputs_per_step: 1
  padding_idx: 0
  pin_memory: True
  power: 1.4
  preemphasis: 0.97
  presets: {'nyanko': {'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'outputs_per_step': 1, 'text_embed_dim': 128, 'initial_learning_rate': 0.0005, 'binary_divergence_weight': 0.1, 'kernel_size': 3, 'downsample_step': 4, 'decoder_channels': 256, 'dropout': 0.050000000000000044, 'clip_thresh': 0.1, 'encoder_channels': 256, 'converter_channels': 256, 'use_decoder_state_for_postnet_input': True}, 'deepvoice3': {'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'outputs_per_step': 4, 'text_embed_dim': 256, 'initial_learning_rate': 0.001, 'binary_divergence_weight': 0.0, 'kernel_size': 7, 'downsample_step': 1, 'decoder_channels': 256, 'dropout': 0.050000000000000044, 'clip_thresh': 1.0, 'encoder_channels': 256, 'converter_channels': 256, 'use_decoder_state_for_postnet_input': True}, 'latest': {}}
  priority_freq: 3000
  priority_freq_weight: 0.0
  query_position_rate: 1.0
  ref_level_db: 20
  replace_pronunciation_prob: 0.5
  sample_rate: 22050
  text_embed_dim: 128
  trainable_positional_encodings: False
  use_decoder_state_for_postnet_input: True
  use_guided_attention: True
  use_memory_mask: True
  use_preset: True
  weight_decay: 0.0
Override hyper parameters with preset "nyanko": {
    "use_guided_attention": true,
    "guided_attention_sigma": 0.2,
    "outputs_per_step": 1,
    "text_embed_dim": 128,
    "initial_learning_rate": 0.0005,
    "binary_divergence_weight": 0.1,
    "kernel_size": 3,
    "downsample_step": 4,
    "decoder_channels": 256,
    "dropout": 0.050000000000000044,
    "clip_thresh": 0.1,
    "encoder_channels": 256,
    "converter_channels": 256,
    "use_decoder_state_for_postnet_input": true
}
Los event path: log/nyanko_preset
0it [00:00, ?it/s]Traceback (most recent call last):
  File "train.py", line 777, in <module>
    train_seq2seq=train_seq2seq, train_postnet=train_postnet)
  File "train.py", line 466, in train
    in tqdm(enumerate(data_loader)):
  File "/usr/local/lib/python3.5/dist-packages/tqdm/_tqdm.py", line 816, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 201, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 62, in _pin_memory_loop
    batch = pin_memory_batch(batch)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 123, in pin_memory_batch
    return [pin_memory_batch(sample) for sample in batch]
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 123, in <listcomp>
    return [pin_memory_batch(sample) for sample in batch]
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 123, in pin_memory_batch
    return [pin_memory_batch(sample) for sample in batch]
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 123, in <listcomp>
    return [pin_memory_batch(sample) for sample in batch]
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 117, in pin_memory_batch
    return batch.pin_memory()
  File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 82, in pin_memory
    return type(self)().set_(storage.pin_memory()).view_as(self)
  File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 198, in view_as
    return self.view(tensor.size())
RuntimeError: invalid argument 2: size '[16 x 126]' is invalid for input of with 126 elements at /home/demobin/github/pytorch/torch/lib/TH/THStorage.c:41

TODOs, status and progress

Single speaker model

Data: https://keithito.com/LJ-Speech-Dataset/

  • Convolution layers
  • Multi-hop attention layers
  • Attention mask for input zero padding
  • Alignments are learned almost monotonically
  • Incremental inference (greedy decoding)
  • Force monotonic attention
  • Done flag prediction
  • Get reasonable sound quality as Tacotron (https://github.com/r9y9/tacotron_pytorch)
  • Audio samples (en)
  • Audio samples (jp)
  • Pre-trained models

Multi-speaker model

Data: VCTK

  • Preprocessor for VCTK
  • Speaker embedding
  • Get reasonable sound quality
  • Audio samples
  • Pre-trained model

Misc

From https://arxiv.org/abs/1710.08969

  • Guided attention
  • Downsample mel-spectrogram / upsample converter
  • Binary divergence
  • Separate training for encoder+decoder and converter

Notes (to be moved to README.md)

  • Multiple attention layers are hard to learn. Empirically, one or two (first and last) attention layers seem to be enough.
  • With guided attention (see https://arxiv.org/abs/1710.08969), alignments become monotonic more quickly and reliably when using multiple attention layers. With guided attention, I can confirm that five attention layers become monotonic, though I don't get speech quality improvements.
  • Positional encoding (i.e., using text positions and frame positions in the decoder) is essential for learning monotonic alignments (without it I cannot get the model to work). However, I'm still not sure why the position rate matters; 1.0 for both encoder/decoder worked in my previous experiment.
  • Weight initialization is quite important, particularly for deeper (e.g., > 8 layers) networks. I noticed this when I tried to replicate https://arxiv.org/abs/1710.08969. They use more than 20 layers in the decoder! Very hard to train. Work in progress in #3. Speech samples (model: encoder/converter from https://arxiv.org/abs/1710.08969 and decoder from DeepVoice3): https://www.dropbox.com/sh/q9xfgscgh3k5lqa/AACPgWCprBfNgjRravscdDYCa?dl=0.
  • Adam with step lr decay works. However, for deeper networks, I find Adam + Noam's lr scheduler more stable (see the sketch below).
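
A minimal sketch of the Noam-style decay named in the hparams dumps above (warmup_steps=4000 is an assumption, not necessarily the repo's default): linear warmup followed by inverse-square-root decay.

def noam_learning_rate_decay(init_lr, step, warmup_steps=4000):
    # Linear warmup for warmup_steps, then decay proportional to 1/sqrt(step).
    step = max(step, 1)  # avoid division by zero at step 0
    return init_lr * warmup_steps ** 0.5 * min(
        step * warmup_steps ** -1.5, step ** -0.5)

for step in (1, 1000, 4000, 16000, 64000):
    print(step, noam_learning_rate_decay(0.0005, step))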

cuda out of memory?

When I used
x = F.relu(self.fc1(x), inplace=True)
CUDA would run out of memory.
So I set inplace=False, and that solved the problem:
x = F.relu(self.fc1(x), inplace=False)

Please correct hyperparams

Since the synthesis script has been altered to accept a builder param called deepvoice3_multispeaker instead of deepvoice3_vctk, please change the table in the pretrained-models section of the README to reflect the new hyperparams for VCTK. It will eliminate confusion for people using this platform.

Reference Issue #14

The table entry should read:

--hparams="builder=deepvoice3_multispeaker,preset=deepvoice3_vctk"

Korean data

Hi ryuuiti. Could you share the Korean single-speaker data? I ran into difficulties when trying to download the data from the link you provided.

Memory corruption when synthesising speech

Hi @r9y9 ,
Thanks for working on this project. I trained a model with --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech" on the latest commit. However, when I synthesize speech, I get the following errors:

 python synthesis.py --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech" checkpoints_deepvoice3/checkpoint_step000630000.pth test.txt samples
Command line args:
 {'--checkpoint-postnet': None,
 '--checkpoint-seq2seq': None,
 '--file-name-suffix': '',
 '--help': False,
 '--hparams': 'builder=deepvoice3,preset=deepvoice3_ljspeech',
 '--max-decoder-steps': '500',
 '--output-html': False,
 '--replace_pronunciation_prob': '0.0',
 '--speaker_id': None,
 '<checkpoint>': 'checkpoints_deepvoice3/checkpoint_step000630000.pth',
 '<dst_dir>': 'samples',
 '<text_list_file>': 'test.txt'}
Override hyper parameters with preset "deepvoice3_ljspeech": {
    "n_speakers": 1,
    "downsample_step": 4,
    "outputs_per_step": 1,
    "embedding_weight_std": 0.1,
    "dropout": 0.050000000000000044,
    "kernel_size": 3,
    "text_embed_dim": 256,
    "encoder_channels": 512,
    "decoder_channels": 256,
    "converter_channels": 256,
    "use_guided_attention": true,
    "guided_attention_sigma": 0.2,
    "binary_divergence_weight": 0.1,
    "use_decoder_state_for_postnet_input": true,
    "max_positions": 512,
    "query_position_rate": 1.0,
    "key_position_rate": 1.385,
    "key_projection": true,
    "value_projection": true,
    "clip_thresh": 0.1,
    "initial_learning_rate": 0.0005
}
*** Error in `python': free(): invalid next size (fast): 0x0000000004da9360 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fcd2c2417e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fcd2c24a37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fcd2c24e53c]
/home/fatman/anaconda2/envs/dev3/bin/../lib/libcudnn.so.6(cudnnDestroyConvolutionDescriptor+0x9)[0x7fccdeb64c69]
/home/fatman/anaconda2/envs/dev3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so(+0x2dedf7)[0x7fccb75acdf7]
/home/fatman/anaconda2/envs/dev3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so(_ZN5torch5cudnn30cudnn_convolution_full_forwardEP8THCStateP12cudnnContext15cudnnDataType_tPNS_12THVoidTensorES7_S7_S7_St6vectorIiSaIiEESA_SA_ibb+0x6a4)[0x7fccb847dee4]
/home/fatman/anaconda2/envs/dev3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so(_ZN5torch8autograd11ConvForward5applyERKSt6vectorINS0_8VariableESaIS3_EE+0x1192)[0x7fccb77ed4a2]

Detailed logs are here.
The text file contains only a single line:
Generative adversarial network or variational auto-encoder.
Thanks.

Tacotron 2

Sorry if this is off-topic (DeepVoice vs. Tacotron), but it seems the Tacotron 2 paper is now released.
The speech samples sound better than ever (I think):
https://google.github.io/tacotron/publications/tacotron2/index.html

I must admit that I'm not too well versed in how much this differs from the original Tacotron. But perhaps the changes made could also be used in your projects?

preprocess: TypeError: unorderable types: NoneType() > int()

python3 preprocess.py ljspeech ./data/LJSpeech-1.0/ ./data/ljspeech
  0%|                                                                                   | 0/13100 [00:00<?, ?it/s]concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.5/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/data1/demobin/deepvoice3_pytorch/ljspeech.py", line 57, in _process_utterance
    spectrogram = audio.spectrogram(wav).astype(np.float32)
  File "/data1/demobin/deepvoice3_pytorch/audio.py", line 32, in spectrogram
    D = _lws_processor().stft(preemphasis(y)).T
  File "/data1/demobin/deepvoice3_pytorch/audio.py", line 53, in _lws_processor
    return lws.lws(hparams.fft_size, hparams.hop_size, mode="speech")
  File "lws.pyx", line 357, in lws.lws.__init__ (lws.bycython.cpp:15047)
TypeError: unorderable types: NoneType() > int()
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "preprocess.py", line 55, in <module>
    preprocess_ljspeech(in_dir, out_dir, num_workers)
  File "preprocess.py", line 21, in preprocess_ljspeech
    metadata = ljspeech.build_from_path(in_dir, out_dir, num_workers, tqdm=tqdm)
  File "/data1/demobin/deepvoice3_pytorch/ljspeech.py", line 34, in build_from_path
    return [future.result() for future in tqdm(futures)]
  File "/data1/demobin/deepvoice3_pytorch/ljspeech.py", line 34, in <listcomp>
    return [future.result() for future in tqdm(futures)]
  File "/usr/lib/python3.5/concurrent/futures/_base.py", line 398, in result
    return self.__get_result()
  File "/usr/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
    raise self._exception
TypeError: unorderable types: NoneType() > int()
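
A guess at the cause, with a quick check: lws.lws(hparams.fft_size, hparams.hop_size, mode="speech") raises this TypeError when it compares a None argument against an int, i.e. when one of these hparams is unset in the checkout being used.

from hparams import hparams  # the repo's hparams module

# Both must be ints; a None here reproduces "NoneType() > int()".
print(hparams.fft_size, hparams.hop_size)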

Changing fft_size, hop_size in hparams.py?

Hi there,

I changed hparams.py to

fft_size=2052, # default 1024
hop_size=114, # default 256

and I get inaudible results!

What should I do if I want to increase fft_size and reduce hop_size? What did I do wrong?

Thanks a lot for any help!
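
One thing worth checking (a sketch of the arithmetic, not a confirmed diagnosis): the linear-spectrogram dimension is fft_size // 2 + 1, so changing fft_size changes the network's output size and invalidates any previously preprocessed data, and hop_size changes the frame rate relative to sample_rate; both preprocessing and training must be re-run with the same values.

fft_size, hop_size, sample_rate = 2052, 114, 22050
print(fft_size // 2 + 1)       # 1027 linear bins instead of the default 513
print(sample_rate / hop_size)  # ~193 frames per second instead of ~86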

How do speeds compare between fairseq-py's conv_tbc and nn.Conv1d at inference time?

The fairseq team says there is a big speed difference between their own temporal convolution (ConvTBC) and the original nn.Conv1d at inference time.

Have you checked the speed of these two modules while removing the fairseq-py dependency?

By the way, I agree with the implementation without the dependency. It makes it easier to see the overall code flow.
Good job!

"ImportError: dlopen: cannot load any more object with static TLS" in python3.5 synthesis.py ........

I got fatal error when testing synthesis.py. Could you help?

python3.5 synthesis.py --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech" /home/ml/deepvoice3_pytorch/models/20171213_deepvoice3_checkpoint_step000210000.pth ./text_list.txt ./output/

python3.5 synthesis.py --hparams="uilder=nyanko,preset=nyanko_ljspeech" "/home/ml/deepvoice3_pytorch/models/20171129_nyanko_checkpoint_step000585000.pth" "/home/ml/deepvoice3_pytorch/text_list.txt" "/home/ml/deepvoice3_pytorch/output"
Command line args:
{'--checkpoint-postnet': None,
'--checkpoint-seq2seq': None,
'--file-name-suffix': '',
'--help': False,
'--hparams': 'uilder=nyanko,preset=nyanko_ljspeech',
'--max-decoder-steps': '500',
'--output-html': False,
'--replace_pronunciation_prob': '0.0',
'--speaker_id': None,
'': '/home/ml/deepvoice3_pytorch/models/20171129_nyanko_checkpoint_step000585000.pth',
'<dst_dir>': '/home/ml/deepvoice3_pytorch/output',
'<text_list_file>': '/home/ml/deepvoice3_pytorch/text_list.txt'}
Traceback (most recent call last):
File "synthesis.py", line 98, in <module>
hparams.parse(args["--hparams"])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/training/python/training/hparam.py", line 472, in parse
values_map = parse_values(values, type_map)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/training/python/training/hparam.py", line 206, in parse_values
raise ValueError('Unknown hyperparameter type for %s' % name)
ValueError: Unknown hyperparameter type for uilder
ml@tesla1a:~/deepvoice3_pytorch$ python3.5 synthesis.py --hparams="uilder=nyanko,preset=nyanko_ljspeech" "/home/ml/deepvoice3_pytorch/models/20171129_nyanko_checkpoint_step000585000.pth" "/home/ml/deepvoice3_pytorch/text_list.txt" "/home/ml/deepvoice3_pytorch/output"
Traceback (most recent call last):
File "synthesis.py", line 26, in <module>
import torch
File "/usr/local/lib/python3.5/dist-packages/torch/__init__.py", line 56, in <module>
from torch._C import *
ImportError: dlopen: cannot load any more object with static TLS

21k / 30k / 58.5k wrong?

Do the pretrained DeepVoice3 models really need only 21k steps to train?
In my experiments, 21k steps seems far too few.
Maybe you wrote 210k as 21k?
And 300k for Nyanko, 585k for the multi-speaker DeepVoice3?

error on training

I got the following error when trying to train a model. Could this be because some of my speech clips are very long (around 30 seconds)?

======
Los event path: ./log/aclclp
0it [00:00, ?it/s]
Traceback (most recent call last):
File "train.py", line 950, in
train_seq2seq=train_seq2seq, train_postnet=train_postnet)
File "train.py", line 685, in train
priority_w=hparams.priority_freq_weight)
File "train.py", line 510, in spec_loss
l1_loss = w * masked_l1(y_hat, y, mask=mask) + (1 - w) * l1(y_hat, y)
File "/home/chester/hdd22t/virtualenv/deepvoice3-pytorch-r9y9/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "train.py", line 280, in forward
loss = self.criterion(input * mask_, target * mask_)
RuntimeError: The size of tensor a (1025) must match the size of tensor b (513) at non-singleton dimension 2
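
A hint at the likely cause: both sizes in the message are fft_size // 2 + 1 for different fft_size values, so the preprocessed spectrogram targets and the hparams used at training time were probably produced with different fft_size settings (rather than by long utterances). Re-running preprocess.py and train.py with identical hparams should make the sizes agree.

# 1025 and 513 are both fft_size // 2 + 1:
print(2048 // 2 + 1)  # 1025
print(1024 // 2 + 1)  # 513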

hparams is not defined while running preprocess.py

I ran the following command on the downloaded LJSpeech dataset:

python3 preprocess.py ljspeech ~/data/LJSpeech-1.0/ ./data/ljspeech

No preprocessed data was generated, and instead I got an error:

NameError: name 'hparams' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "preprocess.py", line 47, in <module>
preprocess(mod, in_dir, out_dir, num_workers)
File "preprocess.py", line 21, in preprocess
metadata = mod.build_from_path(in_dir, out_dir, num_workers, tqdm=tqdm)
File "/home/coglac/Documents/deepvoice3_pytorch/ljspeech.py", line 34, in build_from_path
return [future.result() for future in tqdm(futures)]
File "/home/coglac/Documents/deepvoice3_pytorch/ljspeech.py", line 34, in <listcomp>
return [future.result() for future in tqdm(futures)]
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 398, in result
return self.__get_result()
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
raise self._exception
NameError: name 'hparams' is not defined

No activity on training

Hi,

After successful (1) installation of all prerequisites and (2) preprocessing,
starting the training phase with:
python train.py --preset=presets/deepvoice3_ljspeech.json --data-root=./data/ljspeech/
continues with a report of input parameters and eventually hangs on:
0it [00:00, ?it/s]

The command watch -n 1 nvidia-smi reports VRAM usage in the 499 MB range, with no activity on the GPU.

Some modifications on my side

  1. deepvoice3_pytorch/__init__.py
    from .version import __version__
    This line raises an error: version.py is not provided.

  2. deepvoice3_pytorch/builder.py
    deepvoice3_multispeaker is inconsistent with hparams.py.

  3. deepvoice3_pytorch/deepvoice3.py
    Line 474, (done > 0.5).all(): maybe done.data is better.

RuntimeError: invalid argument 2: sizes do not match

I downloaded the pretrained models, and upon running any of them I receive the following error:

My pytorch version is: 0.3.0.post4

RuntimeError: invalid argument 2: sizes do not match at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:101

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "synthesis.py", line 125, in <module>
model.load_state_dict(checkpoint["state_dict"])
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 487, in load_state_dict
.format(name, own_state[name].size(), param.size()))
RuntimeError: While copying the parameter named seq2seq.encoder.embed_tokens.weight, whose dimensions in the model are torch.Size([149, 128]) and whose dimensions in the checkpoint are torch.Size([149, 256]).
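
The two shapes differ only in the embedding dimension (128 in the model you built vs 256 in the checkpoint), which matches text_embed_dim: the bare default in the dumps on this page is 128, while the deepvoice3_ljspeech preset sets 256. So the hparams at synthesis time likely don't match those the checkpoint was trained with; a guess at the fix is to pass the matching preset:

python synthesis.py --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech" <checkpoint> <text_list_file> <dst_dir>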

AssertionError

Hi,

I am new to PyTorch and am following the JSUT example here. I encountered the following assertion error, which is hard for me to investigate further. Could anyone help me out?

[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ python -V
Python 3.5.4 :: Anaconda custom (64-bit)
[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ ls /home/kwon/copora/jsut_ver1.1
basic5000  ChangeLog.txt  countersuffix26  LICENCE.txt  loanword128  onomatopee300  precedent130  README_en.txt  README_ja.txt  repeat500  travel1000  utparaphrase512  voiceactress100
[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ python preprocess.py jsut /home/kwon/copora/jsut_ver1.1 ./data/jsut
  0%|                                                                                                                                                                               | 0/7696 [00:00<?, ?it/s]concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/kwon/anaconda3/lib/python3.5/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/kwon/3rdParty/deepvoice3_pytorch/jsut.py", line 52, in _process_utterance
    mel_spectrogram = audio.melspectrogram(wav).astype(np.float32)
  File "/home/kwon/3rdParty/deepvoice3_pytorch/audio.py", line 50, in melspectrogram
    assert S.max() <= 0 and S.min() - hparams.min_level_db >= 0
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "preprocess.py", line 47, in <module>
    preprocess(mod, in_dir, out_dir, num_workers)
  File "preprocess.py", line 21, in preprocess
    metadata = mod.build_from_path(in_dir, out_dir, num_workers, tqdm=tqdm)
  File "/home/kwon/3rdParty/deepvoice3_pytorch/jsut.py", line 25, in build_from_path
    return [future.result() for future in tqdm(futures)]
  File "/home/kwon/3rdParty/deepvoice3_pytorch/jsut.py", line 25, in <listcomp>
    return [future.result() for future in tqdm(futures)]
  File "/home/kwon/anaconda3/lib/python3.5/concurrent/futures/_base.py", line 405, in result
    return self.__get_result()
  File "/home/kwon/anaconda3/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
    raise self._exception
AssertionError
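
The failing assert requires the dB-scaled mel spectrogram to stay within [min_level_db, 0] (min_level_db is -100 in the dumps above), and unusually quiet or hot recordings can fall outside that range. For what it's worth, later hparams dumps on this page include an allow_clipping_in_normalization flag that appears to relax exactly this kind of range check; a hedged workaround, assuming your checkout has it:

allow_clipping_in_normalization = True  # set in hparams.py before preprocessing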

Persistent MemoryError while training on VCTK

Hello. I am currently trying to train a VCTK model with the DeepVoice3 multi-speaker model.
While it seems to work okay, sometimes the training crashes with the following error.

2734it [13:58,  3.26it/s]Traceback (most recent call last):
  File "train.py", line 957, in <module>
    train_seq2seq=train_seq2seq, train_postnet=train_postnet)
  File "train.py", line 585, in train
    in tqdm(enumerate(data_loader)):
  File "H:\envs\pytorch\lib\site-packages\tqdm\_tqdm.py", line 959, in __iter__
    for obj in iterable:
  File "H:\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "H:\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
MemoryError: Traceback (most recent call last):
  File "H:\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "H:\Tensorflow_Study\git\deepvoice3_pytorch\train.py", line 329, in collate_fn
    dtype=np.float32)
MemoryError

Forcing garbage collection sporadically (using gc.collect()) doesn't help the issue.
Currently, I have 16 GB of RAM with 48 GB of virtual memory available on my SSD (just in case).
(Using Windows 10 with PyTorch 0.3.1, CUDA 8.0, GTX 1060 6GB)

Also, I observe in Resource Monitor that the memory usage in Commit (KB) and Working Set (KB) is significantly different, as shown below. (Sorry for the non-English.)
[screenshot: Resource Monitor memory usage]

Thank you for creating such wonderful implementation!
:)
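
A hedged mitigation, using only hparams that already appear in the dumps on this page (the data-root path is an assumption): smaller batches and fewer loader workers reduce peak host memory inside collate_fn, at the cost of slower epochs.

python train.py --data-root=./data/vctk --hparams="builder=deepvoice3_multispeaker,preset=deepvoice3_vctk,batch_size=8,num_workers=0"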

KeyError: 'unexpected key "seq2seq.decoder.attention.in_projection.bias" in state_dict'

Hi, thanks for the fantastic DeepVoice3 implementation!

When trying to train the Nyanko model starting from your pre-trained checkpoint, using the following args:

--hparams="builder=nyanko,preset=nyanko_ljspeech" 
--checkpoint=checkpoints.pretrained/20171129_nyanko_checkpoint_step000585000.pth

I'm getting the error:

Load checkpoint from: checkpoints.pretrained/20171129_nyanko_checkpoint_step000585000.pth
Traceback (most recent call last):
  File "train.py", line 936, in <module>
    load_checkpoint(checkpoint_path, model, optimizer, reset_optimizer)
  File "train.py", line 820, in load_checkpoint
    model.load_state_dict(checkpoint["state_dict"])
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 490, in load_state_dict
    .format(name))
KeyError: 'unexpected key "seq2seq.decoder.attention.in_projection.bias" in state_dict'

Looks like in_projection has been removed from the AttentionLayer implementation in deepvoice3_pytorch/deepvoice3.py but is still present in the Nyanko pre-trained model: https://github.com/r9y9/deepvoice3_pytorch#pretrained-models
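
In the meantime, a workaround sketch (not an official fix): drop checkpoint entries whose keys no longer exist in the current model before loading. Here `model` is assumed to be built the same way train.py builds it.

import torch

checkpoint = torch.load(
    "checkpoints.pretrained/20171129_nyanko_checkpoint_step000585000.pth",
    map_location=lambda storage, loc: storage)
state = checkpoint["state_dict"]
own = model.state_dict()  # `model` built via the repo's build_model()
# Keep only parameters the current architecture still defines.
own.update({k: v for k, v in state.items() if k in own})
model.load_state_dict(own)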

AttributeError: 'NoneType' object has no attribute 'text_to_sequence'

When I try to train a dataset with the command from the tutorial (python train.py --data-root=./data/ljspeech/ --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech"), I get an error telling me that _frontend is a NoneType object and has no 'text_to_sequence' attribute. Do I need to modify anything to get this to work again?

AttributeError: 'NoneType' object has no attribute 'text_to_sequence'

Speed up training.

Hi r9y9,
Thanks for the amazing library. I'm only beginning to learn ML, and I love what this can do! Ultimately I'm trying to create what lyrebird.ai has been doing. I finally managed to set it all up and started training the single-speaker model with ljspeech.

However, I'm experiencing the same training speed of ~3 s/it between my desktop (specs below) and my MBP (2.5 GHz, i8, 4 cores). Is there a way I can speed things up? I know I don't have ideal AI-training hardware specs, but I'm kinda looking forward to the results.

*Both setups have all CPU cores running at 100%

OS: Ubuntu 16.04.4
CPU: i7-7820X (8 CORE)
GPU: 2x 1080 Ti

Does this implementation ignore words?

I found that Tacotron will ignore some words when synthesizing a long sentence (e.g., a sentence with 30 words). Does Deep Voice 3 have that problem?

Multi GPU Support

I'd like to train this model on 8 V100 GPUs - does it support multi GPU training?
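
Not something the repo documents, but a generic PyTorch sketch of the usual approach: wrap the built model in nn.DataParallel so each batch is split across the visible GPUs. Note this only helps if the GPU, not data loading, is the bottleneck.

import torch
import torch.nn as nn

# build_model() stands in for however train.py constructs the network.
model = build_model()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.cuda()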

Another Assertion error

Hi again,

I trained a single Korean speaker successfully and am moving on to multiple Korean speakers. Again, I encountered the assertion error shown below. I tracked it down, and it looks like self.encoder in the AttentionSeq2Seq class produces these error messages. Could you let me know where the following self.encoder function is defined so that I can look into it further? Increasing max_positions doesn't work this time.

encoder_outputs = self.encoder(
    text_sequences, lengths=input_lengths, speaker_embed=speaker_embed)

Thanks in advance,

[kwon@ssi-dnn-slave-002 deepvoice3_pytorch2]$ CUDA_VISIBLE_DEVICES=2 python train.py   --data-root=./data/nikl_m/   --hparams="frontend=ko,builder=deepvoice3,preset=deepvoice3_niklm,builder=deepvoice3_multispeaker"   --checkpoint-dir checkpoint_nikl_m
Command line args:
 {'--checkpoint': None,
 '--checkpoint-dir': 'checkpoint_nikl_m',
 '--checkpoint-postnet': None,
 '--checkpoint-seq2seq': None,
 '--data-root': './data/nikl_m/',
 '--help': False,
 '--hparams': 'frontend=ko,builder=deepvoice3,preset=deepvoice3_niklm,builder=deepvoice3_multispeaker',
 '--load-embedding': None,
 '--log-event-path': None,
 '--reset-optimizer': False,
 '--restore-parts': None,
 '--speaker-id': None,
 '--train-postnet-only': False,
 '--train-seq2seq-only': False}
Training whole model
Training seq2seq model
Hyperparameters:
  adam_beta1: 0.5
  adam_beta2: 0.9
  adam_eps: 1e-06
  allow_clipping_in_normalization: False
  batch_size: 16
  binary_divergence_weight: 0.1
  builder: deepvoice3_multispeaker
  checkpoint_interval: 10000
  clip_thresh: 0.1
  converter_channels: 256
  decoder_channels: 256
  downsample_step: 4
  dropout: 0.050000000000000044
  embedding_weight_std: 0.1
  encoder_channels: 256
  eval_interval: 10000
  fft_size: 1024
  fmax: 7600
  fmin: 125
  force_monotonic_attention: True
  freeze_embedding: False
  frontend: ko
  guided_attention_sigma: 0.2
  hop_size: 256
  initial_learning_rate: 0.0005
  kernel_size: 3
  key_position_rate: 1.385
  key_projection: False
  lr_schedule: noam_learning_rate_decay
  lr_schedule_kwargs: {}
  masked_loss_weight: 0.5
  max_positions: 512
  min_level_db: -100
  n_speakers: 1
  name: deepvoice3
  nepochs: 10000
  num_mels: 80
  num_workers: 2
  outputs_per_step: 1
  padding_idx: 0
  pin_memory: True
  power: 1.4
  preemphasis: 0.97
  preset: deepvoice3_niklm
  presets: {'deepvoice3_niklm': {'n_speakers': 119, 'speaker_embed_dim': 16, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.1, 'speaker_embedding_weight_std': 0.05, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 256, 'encoder_channels': 512, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.4, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 3000, 'query_position_rate': 2.0, 'key_position_rate': 7.6, 'key_projection': True, 'value_projection': True, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}, 'deepvoice3_ljspeech': {'n_speakers': 1, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.1, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 256, 'encoder_channels': 512, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 600, 'query_position_rate': 1.0, 'key_position_rate': 1.385, 'key_projection': True, 'value_projection': True, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}, 'deepvoice3_vctk': {'n_speakers': 108, 'speaker_embed_dim': 16, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.1, 'speaker_embedding_weight_std': 0.05, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 256, 'encoder_channels': 512, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.4, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 512, 'query_position_rate': 2.0, 'key_position_rate': 7.6, 'key_projection': True, 'value_projection': True, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}, 'nyanko_ljspeech': {'n_speakers': 1, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.01, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 128, 'encoder_channels': 256, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 512, 'query_position_rate': 1.0, 'key_position_rate': 1.385, 'key_projection': False, 'value_projection': False, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}}
  priority_freq: 3000
  priority_freq_weight: 0.0
  query_position_rate: 1.0
  ref_level_db: 20
  replace_pronunciation_prob: 0.5
  rescaling: False
  rescaling_max: 0.999
  sample_rate: 22050
  save_optimizer_state: True
  speaker_embed_dim: 16
  speaker_embedding_weight_std: 0.01
  text_embed_dim: 256
  trainable_positional_encodings: False
  use_decoder_state_for_postnet_input: True
  use_guided_attention: True
  use_memory_mask: True
  value_projection: False
  weight_decay: 0.0
  window_ahead: 3
  window_backward: 1
Override hyper parameters with preset "deepvoice3_niklm": {
    "n_speakers": 119,
    "speaker_embed_dim": 16,
    "downsample_step": 4,
    "outputs_per_step": 1,
    "embedding_weight_std": 0.1,
    "speaker_embedding_weight_std": 0.05,
    "dropout": 0.050000000000000044,
    "kernel_size": 3,
    "text_embed_dim": 256,
    "encoder_channels": 512,
    "decoder_channels": 256,
    "converter_channels": 256,
    "use_guided_attention": true,
    "guided_attention_sigma": 0.4,
    "binary_divergence_weight": 0.1,
    "use_decoder_state_for_postnet_input": true,
    "max_positions": 3000,
    "query_position_rate": 2.0,
    "key_position_rate": 7.6,
    "key_projection": true,
    "value_projection": true,
    "clip_thresh": 0.1,
    "initial_learning_rate": 0.0005
}

0it [00:00, ?it/s]
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
(the same assertion repeats for threads [1,0,0] through [15,0,0])
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/generic/THCTensorCopy.c line=70 error=59 : device-side assert triggered

Traceback (most recent call last):
  File "train.py", line 967, in <module>
    train_seq2seq=train_seq2seq, train_postnet=train_postnet)
  File "train.py", line 661, in train
    input_lengths=input_lengths)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kwon/3rdParty/deepvoice3_pytorch2/deepvoice3_pytorch/__init__.py", line 80, in forward
    text_positions, frame_positions, input_lengths)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kwon/3rdParty/deepvoice3_pytorch2/deepvoice3_pytorch/__init__.py", line 117, in forward
    print(text_sequences)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 119, in __repr__
    return 'Variable containing:' + self.data.__repr__()
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 133, in __repr__
    return str(self)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 140, in __str__
    return _tensor_str._str(self)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/_tensor_str.py", line 297, in _str
    strt = _matrix_str(self)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/_tensor_str.py", line 216, in _matrix_str
    min_sz=5 if not print_full_mat else 0)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/_tensor_str.py", line 79, in _number_format
    tensor = torch.DoubleTensor(tensor.size()).copy_(tensor).abs_().view(tensor.nelement())
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/generic/THCTensorCopy.c:70
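
For anyone hitting the same thing: the `srcIndex < srcSelectDimSize` assertion from indexSelect almost always means an embedding lookup received an index at least as large as the embedding table, e.g. a speaker id >= n_speakers or a character id >= the vocabulary size. (Note also that the command above passes builder= twice; the printed hparams show the second value, deepvoice3_multispeaker, is the one that took effect.) Running with CUDA_LAUNCH_BLOCKING=1 makes the failing call appear at the right spot in the traceback. A CPU-side sanity check before training, with hypothetical names for the loaded metadata:

    # Hypothetical names: text_ids is a list of integer id sequences,
    # speaker_ids a list of ints, n_vocab / n_speakers the embedding table sizes.
    max_char = max(max(seq) for seq in text_ids)
    max_spk = max(speaker_ids)
    assert max_char < n_vocab, "char id %d >= vocab size %d" % (max_char, n_vocab)
    assert max_spk < n_speakers, "speaker id %d >= n_speakers %d" % (max_spk, n_speakers)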

An error occurs when loading the text

With Deep Voice 3, I get the following error.

collected_files = self.file_data_source.collect_files()
File "train.py", line 126, in collect_files
assert len(l) == 4 or len(l) == 5
AssertionError

Is my text file formatted incorrectly? The data is JSUT.
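
The failing assertion means each metadata line must split into exactly 4 or 5 fields. Assuming the separator is "|" (my assumption, based on the train.txt format this repo's preprocessing writes), a quick validator for the file would be:

    # A sketch: flag metadata lines that would trip the len(l) == 4 or len(l) == 5 assert.
    with open("data/jsut/train.txt", encoding="utf-8") as f:  # path is hypothetical
        for i, line in enumerate(f, 1):
            fields = line.rstrip("\n").split("|")
            if len(fields) not in (4, 5):
                print("line %d has %d fields: %r" % (i, len(fields), line))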
