Hello Tarteel team, I would like to thank you for your hard work.
Currently I am experimenting with a Qur'an tutor for Surat Al-Ikhlas, the same idea as Surat Al-Fatihah but with a different audio set recorded by well-known reciters.
I prepared all the files for training, but I am facing a problem in the training phase.
!python /content/OpenNMT-py/train.py -model_type audio -enc_rnn_size 512 -dec_rnn_size 512 -audio_enc_pooling 1,2 -dropout 0 -enc_layers 2 -dec_layers 1 -rnn_type LSTM -data /content/OpenNMT-py/data/speech/demo -save_model demo-model -global_attention mlp -gpu_ranks 0 -batch_size 8 -optim adam -max_grad_norm 100 -learning_rate 0.0003 -learning_rate_decay 0.8 -train_steps 2000
[2020-03-04 21:03:57,891 INFO] * tgt vocab size = 15
[2020-03-04 21:03:57,892 INFO] Building model...
[2020-03-04 21:04:02,067 INFO] NMTModel(
(encoder): AudioEncoder(
(W): Linear(in_features=512, out_features=512, bias=False)
(batchnorm_0): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rnn_0): LSTM(161, 512)
(pool_0): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
(rnn_1): LSTM(512, 512)
(pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(batchnorm_1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(decoder): InputFeedRNNDecoder(
(embeddings): Embeddings(
(make_embedding): Sequential(
(emb_luts): Elementwise(
(0): Embedding(15, 500, padding_idx=1)
)
)
)
(dropout): Dropout(p=0.0, inplace=False)
(rnn): StackedLSTM(
(dropout): Dropout(p=0.0, inplace=False)
(layers): ModuleList(
(0): LSTMCell(1012, 512)
)
)
(attn): GlobalAttention(
(linear_context): Linear(in_features=512, out_features=512, bias=False)
(linear_query): Linear(in_features=512, out_features=512, bias=True)
(v): Linear(in_features=512, out_features=1, bias=False)
(linear_out): Linear(in_features=1024, out_features=512, bias=True)
)
)
(generator): Sequential(
(0): Linear(in_features=512, out_features=15, bias=True)
(1): Cast()
(2): LogSoftmax()
)
)
[2020-03-04 21:04:02,067 INFO] encoder: 3747840
[2020-03-04 21:04:02,067 INFO] decoder: 4190555
[2020-03-04 21:04:02,067 INFO] * number of parameters: 7938395
[2020-03-04 21:04:02,068 INFO] Starting training on GPU: [0]
[2020-03-04 21:04:02,068 INFO] Start training loop and validate every 10000 steps...
[2020-03-04 21:04:02,069 INFO] Loading dataset from /content/OpenNMT-py/data/speech/demo.train.0.pt
[2020-03-04 21:04:02,070 INFO] number of examples: 15
Traceback (most recent call last):
File "/content/OpenNMT-py/train.py", line 6, in <module>
main()
File "/content/OpenNMT-py/onmt/bin/train.py", line 204, in main
train(opt)
File "/content/OpenNMT-py/onmt/bin/train.py", line 88, in train
single_main(opt, 0)
File "/content/OpenNMT-py/onmt/train_single.py", line 143, in main
valid_steps=opt.valid_steps)
File "/content/OpenNMT-py/onmt/trainer.py", line 244, in train
report_stats)
File "/content/OpenNMT-py/onmt/trainer.py", line 365, in _gradient_accumulation
with_align=self.with_align)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/content/OpenNMT-py/onmt/models/model.py", line 45, in forward
enc_state, memory_bank, lengths = self.encoder(src, lengths)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/content/OpenNMT-py/onmt/encoders/audio_encoder.py", line 119, in forward
memory_bank = pool(memory_bank)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/pooling.py", line 76, in forward
self.return_indices)
File "/usr/local/lib/python3.6/dist-packages/torch/_jit_internal.py", line 181, in fn
return if_false(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 457, in _max_pool1d
input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (7x1x1). Calculated output size: (7x1x0). Output size is too small
I know the problem comes from the pooling size (the encoder's time dimension shrinks to a single frame before the kernel-size-2 pooling layer), but I don't know how to fix it.
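If it helps, here is a minimal sketch of the length arithmetic behind the error, assuming (as the model printout suggests) that each entry of -audio_enc_pooling becomes a MaxPool1d(kernel_size=p, stride=p) after the corresponding RNN layer, and that PyTorch computes the pooled length as floor((L - p) / p) + 1. The helper names below (pooled_length, encoder_lengths) are hypothetical, just for illustration; with a very short utterance the time dimension can reach 1 before pool_1, and a kernel of 2 then yields length 0, which is exactly the "Output size is too small" RuntimeError. A likely workaround is to use pooling factors of 1 (e.g. -audio_enc_pooling 1,1) or to make sure every training clip has enough audio frames to survive the configured pooling.

```python
def pooled_length(length, kernel):
    """Output length of MaxPool1d(kernel_size=kernel, stride=kernel):
    floor((length - kernel) / kernel) + 1."""
    return (length - kernel) // kernel + 1

def encoder_lengths(length, pooling):
    """Trace the time dimension through each pooling layer in order."""
    lengths = [length]
    for p in pooling:
        lengths.append(pooled_length(lengths[-1], p))
    return lengths

# A 1-frame sequence survives pool_0 (kernel 1) but dies at pool_1 (kernel 2):
print(encoder_lengths(1, [1, 2]))   # [1, 1, 0] -> length 0 triggers the error
# With pooling factors 1,1 every length is preserved:
print(encoder_lengths(1, [1, 1]))   # [1, 1, 1]
```

So the crash is not in the model itself but in the interaction between the utterance lengths in the data and the -audio_enc_pooling 1,2 setting.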