
end2end-asr-pytorch's Introduction

End-to-End Speech Recognition on PyTorch

Transformer-based Speech Recognition Model

License: MIT

If you use any source code included in this toolkit in your work, please cite the following papers.

  • Winata, G. I., Madotto, A., Wu, C. S., & Fung, P. (2019). Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) (pp. 271-280).
  • Winata, G. I., Cahyawijaya, S., Lin, Z., Liu, Z., & Fung, P. (2019). Lightweight and Efficient End-to-End Speech Recognition Using Low-Rank Transformer. arXiv preprint arXiv:1910.13923. (Accepted by ICASSP 2020)
  • Zhou, S., Dong, L., Xu, S., & Xu, B. (2018). Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese. Proc. Interspeech 2018, 791-795.

Highlights

  • supports batch parallelization on multiple GPUs
  • supports training and evaluation on multiple datasets

Requirements

Results

AiShell-1

Decoding strategy             CER
Greedy                        14.5%
Beam search (beam width = 8)  13.5%

Data

AiShell-1 (Chinese)

To preprocess the data, you first need to download it from https://www.openslr.org/33/. I will add a script to automate the process.

❱❱❱ python data/aishell.py
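
In the meantime, a minimal download sketch (the archive URL comes from the OpenSLR page; the extraction directory is an assumption, so check data/aishell.py for the layout it actually expects):

# Sketch: fetch and unpack AiShell-1 before running data/aishell.py.
# The URL is from https://www.openslr.org/33/ (~15 GB download);
# "Aishell_dataset" is an assumed target directory.
import tarfile
import urllib.request

url = "https://www.openslr.org/resources/33/data_aishell.tgz"
urllib.request.urlretrieve(url, "data_aishell.tgz")
with tarfile.open("data_aishell.tgz") as tar:
    tar.extractall("Aishell_dataset")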

Librispeech (English)

To download the data automatically:

❱❱❱ python data/librispeech.py

Training

usage: train.py [-h] [--train-manifest-list] [--valid-manifest-list] [--test-manifest-list] [--cuda] [--verbose] [--batch-size] [--labels-path] [--lr] [--name] [--save-folder] [--save-every] [--feat_extractor] [--emb_trg_sharing] [--shuffle] [--sample_rate] [--label-smoothing] [--window-size] [--window-stride] [--window] [--epochs] [--src-max-len] [--tgt-max-len] [--warmup] [--momentum] [--lr-anneal] [--num-layers] [--num-heads] [--dim-model] [--dim-key] [--dim-value] [--dim-input] [--dim-inner] [--dim-emb]

Parameters

- feat_extractor: "emb_cnn" or "vgg_cnn" as the feature extractor, or set "" for none (a rough sketch follows this list)
    - emb_cnn: adds a 4-layer 2D CNN
    - vgg_cnn: adds a 6-layer 2D CNN
- cuda: train on GPU
- shuffle: randomly shuffle every batch
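
For intuition, here is a rough, illustrative sketch of a VGG-style 2D CNN front-end over spectrogram input; the repo's actual emb_cnn/vgg_cnn definitions may differ in layer counts and channel sizes:

import torch.nn as nn

# Illustrative VGG-style front-end: stacked 3x3 convolutions with pooling
# that downsamples the time and frequency axes before the transformer.
vgg_cnn = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),  # halve time and frequency resolution
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
)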

Example

❱❱❱ python train.py --train-manifest-list data/manifests/aishell_train_manifest.csv --valid-manifest-list data/manifests/aishell_dev_manifest.csv --test-manifest-list data/manifests/aishell_test_manifest.csv --cuda --batch-size 12 --labels-path data/labels/aishell_labels.json --lr 1e-4 --name aishell_drop0.1_cnn_batch12_4_vgg_layer4 --save-folder save/ --save-every 5 --feat_extractor vgg_cnn --dropout 0.1 --num-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 161 --dim-inner 2048 --dim-emb 512 --shuffle --min-lr 1e-6 --k-lr 1

Use python train.py --help for more parameters and options.

Results

AiShell-1 Loss Curve

Multi-GPU Training

usage: train.py [--parallel] [--device-ids]

Parameters

- parallel: split each batch across the GPUs (the batch size has to be divisible by the number of GPUs; see the sketch below)
- device-ids: GPU ids
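
For reference, a minimal sketch of how PyTorch's nn.DataParallel splits a batch across devices (a stand-in model, not the project's exact wrapper; requires two visible GPUs to run):

import torch
import torch.nn as nn

model = nn.Linear(161, 512).cuda()           # stand-in for the ASR model
model = nn.DataParallel(model, device_ids=[0, 1])

x = torch.randn(12, 161).cuda()              # 12 is divisible by 2 GPUs
y = model(x)                                 # each replica sees a chunk of 6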

Example

❱❱❱ CUDA_VISIBLE_DEVICES=0,1 python train.py --train-manifest-list data/manifests/aishell_train_manifest.csv --valid-manifest-list data/manifests/aishell_dev_manifest.csv --test-manifest-list data/manifests/aishell_test_manifest.csv --cuda --batch-size 12 --labels-path data/labels/aishell_labels.json --lr 1e-4 --name aishell_drop0.1_cnn_batch12_4_vgg_layer4 --save-folder save/ --save-every 5 --feat_extractor vgg_cnn --dropout 0.1 --num-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 161 --dim-inner 2048 --dim-emb 512 --shuffle --min-lr 1e-6 --k-lr 1 --parallel --device-ids 0 1

Test

usage: test.py [-h] [--test-manifest-list] [--cuda] [--verbose] [--continue_from]

Parameters

- cuda: test on GPU
- continue_from: path to the trained model

Example

❱❱❱ python test.py --test-manifest-list libri_test_clean_manifest.csv --cuda --continue_from save/model

Use python test.py --help for more parameters and options.

Custom Dataset

Manifest file

To use your own dataset, you must create a CSV manifest file using the following format:

/path/to/audio.wav,/path/to/text.txt
/path/to/audio2.wav,/path/to/text2.txt
...

Each line contains the path to an audio file and the path to its transcript file, separated by a comma.
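
A minimal sketch for generating such a manifest (the directory layout and file names here are hypothetical; adapt them to your corpus):

from pathlib import Path

# Hypothetical layout: my_corpus/wav/xxx.wav paired with my_corpus/txt/xxx.txt
wav_dir = Path("my_corpus/wav")
txt_dir = Path("my_corpus/txt")

with open("my_train_manifest.csv", "w") as manifest:
    for wav in sorted(wav_dir.glob("*.wav")):
        txt = txt_dir / (wav.stem + ".txt")
        if txt.exists():
            manifest.write(f"{wav},{txt}\n")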

Label file

You need to specify all characters in the corpus by using the following JSON format:

[ 
  "_",
  "'",
  "A",
  ...,
  "Z",
  " "
]
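
For reference, a sketch that collects the character set from the transcripts listed in a manifest (file names here are hypothetical; match the casing your transcripts actually use):

import json

chars = set()
with open("my_train_manifest.csv") as manifest:      # hypothetical name
    for line in manifest:
        _, transcript_path = line.strip().split(",")
        with open(transcript_path, encoding="utf8") as f:
            chars.update(f.read().replace("\n", ""))

chars.discard("_")
labels = ["_"] + sorted(chars)   # "_" kept first, mirroring the example above
with open("my_labels.json", "w") as out:
    json.dump(labels, out, ensure_ascii=False, indent=2)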

Bug Report

Feel free to create an issue.

end2end-asr-pytorch's People

Contributors

gentaiscool, ppfliu, samuelcahyawijaya


end2end-asr-pytorch's Issues

create_manifest missing in utils directory

from utils import create_manifest

Traceback (most recent call last):
File "", line 1, in
from utils import create_manifest

ImportError: cannot import name 'create_manifest' from 'utils' (/home/mandrake/end2end-asr-pytorch/utils/__init__.py)

This results in an error, since the manifest ends up with no entries, as shown below:

==================================================
THE EXPERIMENT LOG IS SAVED IN: log/aishell_drop0.1_cnn_batch12_4_vgg_layer4
TRAINING MANIFEST: ['data/manifests/aishell_train_manifest.csv']
VALID MANIFEST: ['data/manifests/aishell_dev_manifest.csv']
TEST MANIFEST: ['data/manifests/aishell_test_manifest.csv']

0it [00:00, ?it/s]
Traceback (most recent call last):
File "train.py", line 117, in
trainer.train(model, train_loader, train_sampler, valid_loader_list, opt, loss_type, start_epoch, num_epochs, label2id, id2label, metrics)
File "/home/mandrake/end2end-asr-pytorch/trainer/asr/trainer.py", line 107, in train
(epoch+1), total_loss/(len(train_loader)), total_cer*100/total_char, opt._rate))
ZeroDivisionError: division by zero

Please do respond. Thanks!

CER on test

Hi, I am using my own dataset with this model. I find that the CER on the test set is 60-70%, but the CER on the validation set is 40% and the CER on the training set is 20%. Even when I feed the training or validation set through the test code, the CER is still 60-70%.
Could you give me some advice on this question?

Question about WER and feature extraction

First, thanks for open-sourcing this! Here are two questions:
What WER or CER can your model achieve on Librispeech, and how many epochs does it need?
Why don't you use fbank features as input?

Issue in librispeech.py

Hi!

As per your instructions, I ran the python librispeech.py command and the data began downloading. The manifests were also created. However, upon checking the directories, I found that the .wav/.flac files did not exist, and because of that the manifest files were all empty.

Am I doing something wrong on my part here or is this a bug?
Thanks!

Could not automatically download the AiShell-1 (Chinese) data

Can you check the script "data/aishell.py"? I got this error:

Traceback (most recent call last):
  File "data/aishell.py", line 188, in <module>
    tr_file_list = traverse(root, "transcript/train", search_fix="")
  File "data/aishell.py", line 19, in traverse
    for s_p in sorted(os.listdir(p)):
FileNotFoundError: [Errno 2] No such file or directory: 'Aishell_dataset/transcript/train'

a problem in multi-GPU training at the last batch

The problem appeared after #24, so I preserve the link, but I think it is a different issue. Thank you for solving that problem.

I merged your code on the master branch; however, I got another error when running on multiple GPUs. This issue only appears at the last batch, which is smaller than the normal batches, so I could bypass it by setting the drop_last=True option on the train_loader and valid_loader variables in train.py (see the sketch below).
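
A sketch of that workaround with a plain torch DataLoader (the repo uses its own loader classes, so the exact constructor may differ):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10, 161))   # 10 examples, as in the repro
# drop_last=True discards the ragged final batch (10 % 8 = 2 examples),
# so every GPU replica always receives data
train_loader = DataLoader(dataset, batch_size=8, shuffle=True, drop_last=True)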

I tried to reproduce the error with a toy example. It seems to be related to the batch size and the number of GPUs. I made data list files that contain 10 examples each and ran the example on 4 GPUs with batch size 8.

python train.py --train-manifest-list ~/asr/data/librispeech/libri_asis_10 --valid-manifest-list ~/asr/data/librispeech/libri_dev_10 --test-manifest-list ~/asr/data/librispeech/libri_test_clean --labels-path data/labels/labels.json --cuda --save-every 1 --save-folder trained_models/ --name librispeech_drop0.1_cnn_batch12_4_vgg_layer4_lr0.1 --epochs 10 --cuda --batch-size 8 --lr 0.1 --save-folder save/ --save-every 1 --feat_extractor vgg_cnn --dropout 0.1 --num-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 161 --dim-inner 2048 --dim-emb 512 --shuffle --min-lr 1e-6 --k-lr 1 --parallel --device-ids 0 1 2 3

I encountered the following error. The same error appears whenever the batch size is not a multiple of the number of GPUs; for example, it occurs when the batch size is 6 and the number of GPUs is 4.

In the following run, device 2 has no examples in the last batch, since only 2 examples remain in that batch while there are 4 GPUs. Devices 0 and 1 receive examples, while devices 2 and 3 do not.

I think this is a natural situation, yet the program raises an error. I also tested the official example at https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html, which did not give me any errors, so I don't know why this error appears in this code.

I would like to know whether this error appears with your version of the code as well. I suspect I may have made a mistake when merging your code into my own.

==================================================
THE EXPERIMENT LOG IS SAVED IN: log/librispeech_drop0.1_cnn_batch12_4_vgg_layer4_lr0.1
TRAINING MANIFEST: ['/home/hh1208-kang/asr/data/librispeech/libri_asis_10']
VALID MANIFEST: ['/home/hh1208-kang/asr/data/librispeech/libri_dev_10']
TEST MANIFEST: ['/home/hh1208-kang/asr/data/librispeech/libri_test_clean']
==================================================
load with device_ids [0, 1, 2, 3]
(Epoch 1) TRAIN LOSS:4.3495 CER:96.68% LR:0.0000010: 50%|███████████████████████████████████████████████ | 1/2 [00:09<00:09, 9.41s/it]Traceback (most recent call last):
File "train.py", line 116, in
trainer.train(model, train_loader, train_sampler, valid_loader_list, opt, loss_type, start_epoch, num_epochs, label2id, id2label, metrics)
File "/home/hh1208-kang/end2end-asr-pytorch-user/trainer/asr/trainer.py", line 58, in train
pred, gold, hyp_seq, gold_seq = model(src, src_lengths, tgt, verbose=False)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in replica 2 on device 2.
Original Traceback (most recent call last):
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
TypeError: forward() missing 3 required positional arguments: 'padded_input', 'input_lengths', and 'padded_target'

P.S. Here is my debugging example, based on the PyTorch tutorial above, which might help explain the problem.

python train.py --train-manifest-list ~/asr/data/librispeech/libri_asis_10 --valid-manifest-list ~/asr/data/librispeech/libri_dev_10 --test-manifest-list ~/asr/data/librispeech/libri_test_clean --labels-path data/labels/labels.json --cuda --save-every 1 --save-folder trained_models/ --name librispeech_drop0.1_cnn_batch12_4_vgg_layer4_lr0.1 --epochs 10 --cuda --batch-size 4 --lr 0.1 --save-folder save/ --save-every 1 --feat_extractor vgg_cnn --dropout 0.1 --num-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 161 --dim-inner 2048 --dim-emb 512 --shuffle --min-lr 1e-6 --k-lr 1 --parallel --device-ids 0 1 2 3

==================================================
THE EXPERIMENT LOG IS SAVED IN: log/librispeech_drop0.1_cnn_batch12_4_vgg_layer4_lr0.1
TRAINING MANIFEST: ['/home/hh1208-kang/asr/data/librispeech/libri_asis_10']
VALID MANIFEST: ['/home/hh1208-kang/asr/data/librispeech/libri_dev_10']
TEST MANIFEST: ['/home/hh1208-kang/asr/data/librispeech/libri_test_clean']
==================================================
load with device_ids [0, 1, 2, 3]
0%| | 0/2 [00:00<?, ?it/s]In Model: input size torch.Size([1, 128, 40, 395]) output size torch.Size([1, 395, 512]) torch.Size([1, 1000, 32])
In Model: input size torch.Size([1, 128, 40, 395]) output size torch.Size([1, 395, 512]) torch.Size([1, 1000, 32])
In Model: input size torch.Size([1, 128, 40, 395]) output size torch.Size([1, 395, 512]) torch.Size([1, 1000, 32])
In Model: input size torch.Size([1, 128, 40, 395]) output size torch.Size([1, 395, 512]) torch.Size([1, 1000, 32])
(Epoch 1) TRAIN LOSS:4.3895 CER:97.83% LR:0.0000010: 50%|████████████████████████████████████████████████ | 1/2 [00:11<00:11, 11.38s/it]In Model: input size torch.Size([1, 128, 40, 403]) output size torch.Size([1, 403, 512]) torch.Size([1, 1000, 32])
In Model: input size torch.Size([1, 128, 40, 403]) output size torch.Size([1, 403, 512]) torch.Size([1, 1000, 32])
In Model: input size torch.Size([1, 128, 40, 403]) output size torch.Size([1, 403, 512]) torch.Size([1, 1000, 32])
In Model: input size torch.Size([1, 128, 40, 403]) output size torch.Size([1, 403, 512]) torch.Size([1, 1000, 32])
(Epoch 1) TRAIN LOSS:4.3098 CER:95.95% LR:0.0000010: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00, 8.09s/it]

0%| | 0/3 [00:00<?, ?it/s]In Model: input size torch.Size([1, 128, 40, 312]) output size torch.Size([1, 312, 512]) torch.Size([1, 1000, 32])
In Model: input size torch.Size([1, 128, 40, 312]) output size torch.Size([1, 312, 512]) torch.Size([1, 1000, 32])
In Model: input size torch.Size([1, 128, 40, 312]) output size torch.Size([1, 312, 512]) torch.Size([1, 1000, 32])
In Model: input size torch.Size([1, 128, 40, 312]) output size torch.Size([1, 312, 512]) torch.Size([1, 1000, 32])
VALID SET 0 LOSS:4.2888 CER:106.35%: 33%|█████████████████████████████████████▎ | 1/3 [00:00<00:01, 1.74it/s]In Model: input size torch.Size([1, 128, 40, 735]) output size torch.Size([1, 735, 512]) torch.Size([1, 1000, 32])
In Model: input size torch.Size([1, 128, 40, 735]) output size torch.Size([1, 735, 512]) torch.Size([1, 1000, 32])
In Model: input size torch.Size([1, 128, 40, 735]) output size torch.Size([1, 735, 512]) torch.Size([1, 1000, 32])
In Model: input size torch.Size([1, 128, 40, 735]) output size torch.Size([1, 735, 512]) torch.Size([1, 1000, 32])
VALID SET 0 LOSS:4.2631 CER:107.82%: 67%|██████████████████████████████████████████████████████████████████████████▋ | 2/3 [00:00<00:00, 2.08it/s]In Model: input size torch.Size([1, 128, 40, 457]) output size torch.Size([1, 457, 512]) torch.Size([1, 1000, 32])
In Model: input size torch.Size([1, 128, 40, 457]) output size torch.Size([1, 457, 512]) torch.Size([1, 1000, 32])
Traceback (most recent call last):
File "train.py", line 122, in
label2id, id2label, metrics)
File "/home/hh1208-kang/end2end-asr-pytorch-user/trainer/asr/trainer.py", line 145, in train
pred, gold, hyp_seq, gold_seq = model(src, src_lengths, tgt, verbose=False)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in replica 2 on device 2.
Original Traceback (most recent call last):
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
TypeError: forward() missing 3 required positional arguments: 'padded_input', 'input_lengths', and 'padded_target'

Aishell

We tried the AiShell-1 dataset using the custom dataset method and got a bad result.
"""
python train.py --cuda --train-manifest data_aishell/aishell1.csv --device-ids 1 --val-manifest data_aishell/aishell1_dev.csv --test-manifest data_aishell/aishell1_test.csv --labels-path data/labels/aishell_labels.json --emb_cnn --shuffle --epochs 1000 --batch-size 32
"""
At epoch 50: train CER 48, valid CER 85, WER 100.
The valid CER stayed above 80 for the whole 50 epochs.
Any idea what might be causing this? For example, do we need 1000 epochs to get a better result, or should we build a new labels.json for AiShell-1? We used the original aishell_labels.json from the project.

problem of training on multi-gpu

When I train on multiple GPUs, I encounter this problem. Could you help me figure it out?
File "multi_train.py", line 91, in
trainer.train(model, train_loader, train_sampler, valid_loaders, opt, loss_type, start_epoch, num_epochs, label2id, id2label, metrics, logger)
File "/asr-data/end2end-asr-pytorch/trainer/asr/multi_trainer.py", line 66, in train
for x in hyp]) for hyp in hyp_seq]
File "/asr-data/end2end-asr-pytorch/trainer/asr/multi_trainer.py", line 66, in
for x in hyp]) for hyp in hyp_seq]
File "/asr-data/end2end-asr-pytorch/trainer/asr/multi_trainer.py", line 66, in
for x in hyp]) for hyp in hyp_seq]
KeyError: 72340172838076672

about the result

Hello, could you tell me your best WER on the Librispeech data? I got 17% CER and 33% WER on the test set, which is far from the result on the validation set (25% WER).

Version of PyTorch

Hello, I want to train my dataset on Google Colab. Can the latest version of PyTorch be used?

CUDA out of memory during validation

First, I want to thank you for your work. I ran your code with the Librispeech train-100 set as training data and dev-clean as validation data on a GeForce RTX 2070. After several OOM errors, I set the batch size to 4 and could finally train normally. But after one epoch, I also hit an OOM error during validation. Would setting the batch size even smaller avoid this problem? I notice that in the Librispeech preprocessing script the training data is pruned to a min/max duration, but the validation and test data are not.
I would also like to know whether there is a Librispeech result for this code; I only see the AiShell result in the README.
Thanks.

Librispeech `/bin/sh: 1: sox: not found`

Hello,

I'm getting the error /bin/sh: 1: sox: not found while running the command python data/librispeech.py.

I suspect that the error comes from this line:

subprocess.call(["sox {}  -r {} -b 16 -c 1 {}".format(full_recording_path, str(args.sample_rate),
                                                          wav_recording_path)], shell=True)
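
The message means the sox command-line tool is missing from the system rather than anything being wrong in the script; installing it (e.g. apt-get install sox on Debian/Ubuntu, brew install sox on macOS) should fix it. A small preflight check one could add:

import shutil

# fail early with a clear message instead of a cryptic /bin/sh error
if shutil.which("sox") is None:
    raise OSError("sox not found on PATH; install it, e.g. 'apt-get install sox'")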

EOS, SOS characters in dataloader and decoder

I have a question about decoder inputs. I think the following pre-processing adds SOS and EOS token to label y.

seq_in = [torch.cat([sos, y], dim=0) for y in seq]

seq_out = [torch.cat([y, eos], dim=0) for y in seq]

It seems SpectrogramDataset also contains a step that adds SOS and EOS to label y.

transcript = constant.SOS_CHAR + transcript_file.read().replace('\n', '').lower() + constant.EOS_CHAR

But I think SpectrogramDataset should not do this. I think the decoder currently processes the label like this:
y = HELLO

seq_in: SOS, SOS, H, E, L, L, O, EOS
seq_out: SOS, H, E, L, L, O, EOS, EOS
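
For comparison, a tiny sketch of what the pairs would look like if the markers were added in exactly one place (my reading of the intended behavior, not confirmed against the repo):

y = list("HELLO")
seq_in = ["SOS"] + y        # SOS, H, E, L, L, O
seq_out = y + ["EOS"]       # H, E, L, L, O, EOS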

I'll be very grateful if you confirm whether this is correct or not.

INCOMPLETE TRAINING OF CUSTOM DATASET

Hello, I got the following error while training on Google Colab, and I am still unable to train my dataset to completion.

VALID SET 0 LOSS:2.7531 CER:103.64%: 41% 2312/5635 [02:02<07:07, 7.77it/s]Traceback (most recent call last):
File "/content/drive/My Drive/end2end-asr-pytorch-master/train.py", line 117, in
trainer.train(model, train_loader, train_sampler, valid_loader_list, opt, loss_type, start_epoch, num_epochs, label2id, id2label, metrics)
File "/content/drive/My Drive/end2end-asr-pytorch-master/trainer/asr/trainer.py", line 119, in train
for i, (data) in enumerate(valid_pbar):
File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 979, in iter
for obj in iterable:
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 819, in next
return self._process_data(data)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 385, in reraise
raise self.exc_type(msg)
TypeError: function takes exactly 5 arguments (1 given)

[Question] What is the meaning of src/tgt max len?

I found that src max len is first used in the dataloader for the spectrogram. If the sample rate is 16 kHz, the window size is 320, and the hop size is 160, then the spectrogram length for 1 s is about 51 frames. If we set max src len to 4000, is the maximum src input then about 80 s? I think maybe I'm wrong somewhere.
Besides, tgt len relates to the transcript; how should I understand its default value of 1000?

Questions about dataset preprocessing

Hello, when I use 'aishell.py' to process the data, it fails to execute successfully. Can you tell me how to process the data? For example, what should the generated csv file look like? Thank you!

Some questions

To express this more clearly, I will write in Chinese directly; please don't mind.

  1. In lines 136-137 of data_loader.py, why do we need to add SOS_CHAR and EOS_CHAR when reading the transcript file?
    with open(transcript_path, 'r', encoding='utf8') as transcript_file:
    transcript = constant.SOS_CHAR + transcript_file.read().replace('\n', '').lower() + constant.EOS_CHAR

test.py gets different results from train.py

When I load the checkpoint and use test.py to test the validation set, I get much higher WER and CER than the validation CER shown during training. Why does this happen?

Report bugs

Some bugs need to be fixed. For example, in line 97 of trainer/asr/trainer.py:
opt.optimizer.step() -> opt.step()

Problem in Training on multi-gpu

I saw you fixed an issue in #6, and I think this is not my problem.

I slightly modified your data-loading code for my convenience: instead of using labels as separate files, I changed it to use a single long list file.

Anyway, the problem appeared in the multi-gpu training.

The following training command with a single device (GPU) did not cause any problems.

CUDA_VISIBLE_DEVICES=0 python train.py --train-manifest-list ~/asr/data/librispeech/libri_asis --valid-manifest-list ~/asr/data/librispeech/libri_dev --test-manifest-list ~/asr/data/librispeech/libri_test_clean --labels-path data/labels/labels.json --cuda --device-ids 0 --parallel --save-every 1 --save-folder train_models/librispeech_transformer --name librispeech_transformer --warmup 8000 --epochs 10 --label-smoothing 0.15 --window-size 0.025 --window-stride 0.01 --window hann --lr 0.1 --feat_extractor None --num-layers 6 --dropout 0.2 --dim-inner 2048 --num-heads 8 --dim-input 201 --batch-size 4

However, when I ran it with multiple GPUs, I got error messages.

Here is the command,

CUDA_VISIBLE_DEVICES=0,1 python train.py --train-manifest-list ~/asr/data/librispeech/libri_asis --valid-manifest-list ~/asr/data/librispeech/libri_dev --test-manifest-list ~/asr/data/librispeech/libri_test_clean --labels-path data/labels/labels.json --cuda --device-ids 0 1 --parallel --save-every 1 --save-folder train_models/librispeech_transformer --name librispeech_transformer --warmup 8000 --epochs 10 --label-smoothing 0.15 --window-size 0.025 --window-stride 0.01 --window hann --lr 0.1 --feat_extractor None --num-layers 6 --dropout 0.2 --dim-inner 2048 --num-heads 8 --dim-input 201 --batch-size 8

and these are the error messages:

==================================================
THE EXPERIMENT LOG IS SAVED IN: log/librispeech_transformer
TRAINING MANIFEST: ['/home/hh1208-kang/asr/data/librispeech/libri_asis']
VALID MANIFEST: ['/home/hh1208-kang/asr/data/librispeech/libri_dev']
TEST MANIFEST: ['/home/hh1208-kang/asr/data/librispeech/libri_test_clean']
==================================================
the model is initialized without feature extractor
load with device_ids [0, 1]
0%| | 0/35156 [00:00<?, ?it/s]Traceback (most recent call last):
File "train.py", line 117, in
trainer.train(model, train_loader, train_sampler, valid_loader_list, opt, loss_type, start_epoch, num_epochs, label2id, id2label, metrics)
File "/home/hh1208-kang/end2end-asr-pytorch/trainer/asr/trainer.py", line 59, in train
src, src_lengths, tgt, verbose=False)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/home/hh1208-kang/end2end-asr-pytorch/utils/parallel.py", line 147, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/hh1208-kang/end2end-asr-pytorch/utils/parallel.py", line 190, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/home/hh1208-kang/end2end-asr-pytorch/utils/parallel.py", line 146, in forward
replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
File "/home/hh1208-kang/end2end-asr-pytorch/utils/parallel.py", line 184, in replicate
return replicate(module, device_ids)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/parallel/replicate.py", line 88, in replicate
param_copies = _broadcast_coalesced_reshape(params, devices, detach)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/parallel/replicate.py", line 71, in _broadcast_coalesced_reshape
tensor_copies = Broadcast.apply(devices, *tensors)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/nn/parallel/_functions.py", line 21, in forward
outputs = comm.broadcast_coalesced(inputs, ctx.target_gpus)
File "/home/hh1208-kang/venv/lib/python3.5/site-packages/torch/cuda/comm.py", line 39, in broadcast_coalesced
return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]

My machine has 4 GTX TITAN GPUs. I would be very grateful for any suggestions or advice on how to solve this problem.
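
For what it's worth, the usual cause of "all tensors must be on devices[0]" in plain PyTorch is that the module's parameters are not on device_ids[0] before wrapping; a generic sketch of that remedy (not necessarily this repo's fix):

import torch.nn as nn

model = nn.Linear(201, 512)              # stand-in for the ASR model
model = model.to("cuda:0")               # parameters must live on device_ids[0]
model = nn.DataParallel(model, device_ids=[0, 1])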

How to speed up training?

I use Librispeech 100h to train the model with six 2080 Ti GPUs, but I found that GPU usage is very low (about 20% with batch size 24). I have used HDF5 for feature caching and tried increasing the number of workers in the dataloader, but none of this seems to speed things up. Is there any way to speed up training, or what is the bottleneck of the training procedure?
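
A generic sketch of the DataLoader settings usually tuned first when GPUs sit idle (standard PyTorch options, not repo-specific):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 161))  # stand-in dataset
loader = DataLoader(
    dataset,
    batch_size=24,
    num_workers=8,     # overlap feature extraction / disk reads with GPU work
    pin_memory=True,   # faster host-to-GPU copies
)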

Questions about Aishell train loss.

Hello~
When I run the command given in the example directly, the loss differs from the given result. Could you please tell me whether the parameters given in the example are the same ones used to produce the reported result?

trainer.py has a problem (ZeroDivisionError: division by zero)

Hi @gentaiscool, I have a problem. I entered the following:

python train.py --train-manifest-list data/manifests/libri_train_manifest.csv --valid-manifest-list data/manifests/libri_val_manifest.csv --test-manifest-list data/manifests/libri_test_clean_manifest.csv --cuda --batch-size 12 --labels-path data/labels/labels.json --lr 1e-4 --name libri_drop0.1_cnn_batch12_4_vgg_layer4 --save-folder save/ --save-every 5 --feat_extractor vgg_cnn --dropout 0.1 --num-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 161 --dim-inner 2048 --dim-emb 512 --shuffle --min-lr 1e-6 --k-lr 1

and got the following error:
Traceback (most recent call last):
File "train.py", line 117, in
trainer.train(model, train_loader, train_sampler, valid_loader_list, opt, loss_type, start_epoch, num_epochs, label2id, id2label, metrics)
File "/data1/ytz109/end2end-asr-pytorch-master/trainer/asr/trainer.py", line 162, in train
total_valid_loss/(i+1), total_valid_cer*100/total_valid_char))
ZeroDivisionError: division by zero

=================================================================
I find that total_valid_char = 0.
Where is my configuration wrong? Thank you.
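
A defensive sketch of the failing expression (total_valid_char stays 0 when no validation samples load, typically because the manifest paths are wrong; variable names follow the traceback above):

def safe_cer(total_valid_cer, total_valid_char):
    # total_valid_char == 0 means no characters were decoded at all,
    # which is what triggers the ZeroDivisionError in trainer.py
    if total_valid_char == 0:
        raise RuntimeError("no validation samples loaded; check the manifest paths")
    return total_valid_cer * 100 / total_valid_char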

Program Bug

The multi-head attention in DecoderLayer needs to compute Q, K, and V. The shape of the decoder input Q is (batch, tgt_max_len, dim_emb), and query_linear is nn.Linear(dim_model, num_heads * dim_key); if dim_model != dim_emb, "RuntimeError: The size of tensor a (256) must match the size of tensor b (512) at non-singleton dimension 2" will occur!
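
One plausible minimal reproduction of that exact message (a broadcast between a dim_emb-sized and a dim_model-sized tensor; illustrative, not the repo's actual code path):

import torch

a = torch.randn(4, 10, 256)  # (batch, tgt_max_len, dim_emb)
b = torch.randn(4, 10, 512)  # (batch, tgt_max_len, dim_model)
a + b  # RuntimeError: The size of tensor a (256) must match the size of
       # tensor b (512) at non-singleton dimension 2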

A problem about LibriSpeech's testing results

I have some questions for you.

Is the low-rank transformer model poor at recognizing longer English sentences (more than 30 words)? I found that the WER is high; the test results are shown below:


Epoch 75 ,"Test_clean, WER=15.98%, CER=9.79%" ,"Test_other, WER=31.55%, CER=17.71%"

For example:
hyp = "as the chase drives away mary stands bewildered and perplexed on the doorstep her mind in a tumult of excitement in which hatred of the doctor distrust and suspicion of her"

gold = "as the chaise drives away mary stands bewildered and perplexed on the door step her mind in a tumult of excitement in which hatred of the doctor distrust and suspicion of her mother disappointment vexation and ill humor surge and swell among those delicate organizations on which the structure and development of the soul so closely depend doing perhaps an irreparable injury"


The later part of the sequence is not recognized; is there any way to improve this?

Thanks

Training result

Could you tell me the accuracy of the trained model? The result of my training is very bad. Thanks!

Some small bugs

  1. When preprocessing the Librispeech data, all characters are converted to uppercase:
    def _preprocess_transcript(phrase):
    return phrase.strip().upper()
    But when reading the data, data_loader.py converts all characters to lowercase, so the transcript data is inconsistent:
    def parse_transcript(self, transcript_path):
    with open(transcript_path, 'r', encoding='utf8') as transcript_file:
    transcript = constant.SOS_CHAR + transcript_file.read().replace('\n', '').lower() + constant.EOS_CHAR
