It turns out that it takes more time and memory to decode test-other. I switched to a new machine with 100 GB of CPU RAM, though the maximum RAM used in decoding test-other is about 58.16 GB.
Attached is the complete decoding log for test-clean and test-other.
log-decode-2021-10-26-10-33-20.txt
You can find the decoding times in the attached log:
- test-clean: 1 hour and 46 minutes
- test-other: 3 hours and 21 minutes
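These durations can be checked directly against the log timestamps, e.g. from the batch-0 line to the WER-summary line of each test set:

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"

# test-clean: first batch at 10:37:36, WER summary at 12:23:47
clean = datetime.strptime("2021-10-26 12:23:47", fmt) - datetime.strptime(
    "2021-10-26 10:37:36", fmt
)
# test-other: first batch at 12:24:10, WER summary at 15:45:15
other = datetime.strptime("2021-10-26 15:45:15", fmt) - datetime.strptime(
    "2021-10-26 12:24:10", fmt
)
print(clean)  # 1:46:11
print(other)  # 3:21:05
```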
The memory usage for decoding test-clean is given below:
There is a peak (20.56 GB) at 10:36:00, which is caused by model averaging. Apart from that peak, decoding test-clean on CPU uses less than 18 GB of RAM on average.
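If you want to check the peak RAM usage on your own machine without cluster monitoring tools, the standard-library resource module reports the peak resident set size of the current process. This is a generic sketch, not part of the decoding script:

```python
import resource

# Peak resident set size of the current process so far.
# Note the units differ by platform: KiB on Linux, bytes on macOS.
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS: {peak_kb / 1024 / 1024:.2f} GB")
```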
The memory usage for test-other is shown below:
The memory consumption goes up between 14:25 and 14:39, which happens while decoding the batches between 400 and 500. See the log below:
2021-10-26 14:03:56,823 INFO [decode-500-vgg-att0.8.py:403] batch 400/757, cuts processed until now is 1549
2021-10-26 14:52:47,464 INFO [decode-500-vgg-att0.8.py:403] batch 500/757, cuts processed until now is 1946
Here is the screenshot of the decoding log. It took about 10 hours.
I cannot reproduce your results. By the way, I am using --max-duration 30, not 80.
Here is part of the decoding log extracted from the attached file for easier reference:
CUDA_VISIBLE_DEVICES= ./conformer_ctc/decode-500-vgg-att0.8.py \
--max-duration 30 --concatenate-cuts 0 --bucketing-sampler 1 \
--method attention-decoder --epoch 34 --avg 20
2021-10-26 10:33:20,093 INFO [decode-500-vgg-att0.8.py:465] Decoding started
2021-10-26 10:33:20,093 INFO [decode-500-vgg-att0.8.py:466] {'exp_dir': PosixPath('conformer_ctc/exp_500_att_0.8_vgg'), 'lang_dir': PosixPath('data/lang_bpe_500'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 80, 'nhead': 8, 'attention_dim': 512, 'subsampling_factor': 4, 'num_decoder_layers': 6, 'vgg_frontend': True, 'is_espnet_structure': True, 'mmi_loss': False, 'use_feat_batchnorm': True, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 34, 'avg': 20, 'method': 'attention-decoder', 'num_paths': 100, 'lattice_score_scale': 1.0, 'export': False, 'full_libri': True, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-10-26 10:33:20,468 INFO [lexicon.py:113] Loading pre-compiled data/lang_bpe_500/Linv.pt
2021-10-26 10:33:20,740 INFO [decode-500-vgg-att0.8.py:476] device: cpu
2021-10-26 10:33:28,599 INFO [decode-500-vgg-att0.8.py:519] Loading pre-compiled G_4_gram.pt
2021-10-26 10:35:48,692 INFO [decode-500-vgg-att0.8.py:558] averaging ['conformer_ctc/exp_500_att_0.8_vgg/epoch-15.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-16.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-17.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-18.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-19.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-20.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-21.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-22.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-23.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-24.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-25.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-26.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-27.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-28.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-29.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-30.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-31.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-32.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-33.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-34.pt']
2021-10-26 10:37:26,867 INFO [decode-500-vgg-att0.8.py:571] Number of model parameters: 102568040
2021-10-26 10:37:36,221 INFO [decode-500-vgg-att0.8.py:403] batch 0/787, cuts processed until now is 3
2021-10-26 10:51:13,182 INFO [decode-500-vgg-att0.8.py:403] batch 100/787, cuts processed until now is 348
2021-10-26 11:04:48,234 INFO [decode-500-vgg-att0.8.py:403] batch 200/787, cuts processed until now is 690
2021-10-26 11:18:20,976 INFO [decode-500-vgg-att0.8.py:403] batch 300/787, cuts processed until now is 1017
2021-10-26 11:31:43,197 INFO [decode-500-vgg-att0.8.py:403] batch 400/787, cuts processed until now is 1381
2021-10-26 11:44:47,576 INFO [decode-500-vgg-att0.8.py:403] batch 500/787, cuts processed until now is 1694
2021-10-26 11:57:59,318 INFO [decode-500-vgg-att0.8.py:403] batch 600/787, cuts processed until now is 2024
2021-10-26 12:11:09,544 INFO [decode-500-vgg-att0.8.py:403] batch 700/787, cuts processed until now is 2352
2021-10-26 12:23:47,024 INFO [decode-500-vgg-att0.8.py:452]
For test-clean, WER of different settings are:
ngram_lm_scale_1.0_attention_scale_1.1 2.6 best for test-clean
ngram_lm_scale_0.7_attention_scale_1.0 2.61
ngram_lm_scale_0.9_attention_scale_0.9 2.61
ngram_lm_scale_0.9_attention_scale_1.0 2.61
ngram_lm_scale_1.0_attention_scale_1.2 2.61
ngram_lm_scale_1.1_attention_scale_1.1 2.61
... ... ...
2021-10-26 12:24:10,000 INFO [decode-500-vgg-att0.8.py:403] batch 0/757, cuts processed until now is 5
2021-10-26 12:42:13,595 INFO [decode-500-vgg-att0.8.py:403] batch 100/757, cuts processed until now is 377
2021-10-26 13:22:32,513 INFO [decode-500-vgg-att0.8.py:403] batch 200/757, cuts processed until now is 792
2021-10-26 13:42:37,275 INFO [decode-500-vgg-att0.8.py:403] batch 300/757, cuts processed until now is 1180
2021-10-26 14:03:56,823 INFO [decode-500-vgg-att0.8.py:403] batch 400/757, cuts processed until now is 1549
2021-10-26 14:52:47,464 INFO [decode-500-vgg-att0.8.py:403] batch 500/757, cuts processed until now is 1946
2021-10-26 15:11:51,084 INFO [decode-500-vgg-att0.8.py:403] batch 600/757, cuts processed until now is 2342
2021-10-26 15:29:53,848 INFO [decode-500-vgg-att0.8.py:403] batch 700/757, cuts processed until now is 2734
2021-10-26 15:45:15,609 INFO [decode-500-vgg-att0.8.py:452]
For test-other, WER of different settings are:
ngram_lm_scale_1.2_attention_scale_1.2 5.71 best for test-other
ngram_lm_scale_1.3_attention_scale_1.5 5.71
ngram_lm_scale_1.5_attention_scale_1.7 5.71
ngram_lm_scale_1.5_attention_scale_2.0 5.71
ngram_lm_scale_1.2_attention_scale_1.1 5.72
ngram_lm_scale_1.5_attention_scale_1.9 5.72
from icefall.
For comparison, attached is the log for GPU decoding that I just obtained on the same machine.
log-decode-2021-10-26-16-05-54.txt
The following is the CPU RAM usage for decoding test-clean on GPU:
Note: I am using #84 to average the checkpoints on GPU, so averaging does not take much CPU RAM.
The time taken to average 20 checkpoints on CPU is 1 minute 38 seconds (see the log from #69 (comment)):
2021-10-26 10:35:48,692 INFO [decode-500-vgg-att0.8.py:558] averaging ['conformer_ctc/exp_500_att_0.8_vgg/epoch-15.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-16.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-17.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-18.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-19.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-20.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-21.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-22.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-23.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-24.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-25.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-26.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-27.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-28.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-29.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-30.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-31.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-32.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-33.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-34.pt']
2021-10-26 10:37:26,867 INFO [decode-500-vgg-att0.8.py:571] Number of model parameters: 102568040
while averaging on GPU takes 1 minute 20 seconds:
2021-10-26 16:06:44,528 INFO [decode-500-vgg-att0.8.py:558] averaging ['conformer_ctc/exp_500_att_0.8_vgg/epoch-15.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-16.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-17.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-18.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-19.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-20.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-21.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-22.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-23.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-24.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-25.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-26.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-27.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-28.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-29.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-30.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-31.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-32.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-33.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-34.pt']
2021-10-26 16:08:04,477 INFO [decode-500-vgg-att0.8.py:571] Number of model parameters: 102568040
As the number of checkpoints to be averaged grows, the advantage of using the GPU for averaging becomes more obvious.
The GPU memory usage reported by nvidia-smi while decoding test-clean is:
For test-other, the CPU RAM usage is:
We can see that the maximum RAM usage is about 10.19 GB.
Note that at 16:29:48 there is an OOM error, which frees some cached memory; that is why the GPU RAM drops from 275357 MB to 25205 MB.
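The pattern behind that drop is roughly: catch the CUDA OOM, free cached memory, prune the lattice, and retry with tighter limits. A schematic sketch follows; the function names and the retry parameter here are illustrative, not the actual decode.py code:

```python
def intersect(max_active_states):
    # Stand-in for the real intersection call; pretend large beams OOM.
    if max_active_states > 5000:
        raise RuntimeError("CUDA out of memory. Tried to allocate 8.00 GiB")
    return f"lattice(max_active_states={max_active_states})"

def intersect_with_fallback():
    try:
        return intersect(max_active_states=10000)
    except RuntimeError as e:
        if "out of memory" not in str(e):
            raise
        # The real code would call torch.cuda.empty_cache() here (hence the
        # memory drop in nvidia-smi) and prune arcs before retrying.
        return intersect(max_active_states=5000)

print(intersect_with_fallback())  # lattice(max_active_states=5000)
```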
The following is part of the decoding log extracted from the attached file for ease of reference:
CUDA_VISIBLE_DEVICES=0 ./conformer_ctc/decode-500-vgg-att0.8.py \
--max-duration 30 --concatenate-cuts 0 --bucketing-sampler 1 \
--method attention-decoder --epoch 34 --avg 20
2021-10-26 16:05:54,451 INFO [decode-500-vgg-att0.8.py:465] Decoding started
2021-10-26 16:05:54,452 INFO [decode-500-vgg-att0.8.py:466] {'exp_dir': PosixPath('conformer_ctc/exp_500_att_0.8_vgg'), 'lang_dir': PosixPath('data/lang_bpe_500'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 80, 'nhead': 8, 'attention_dim': 512, 'subsampling_factor': 4, 'num_decoder_layers': 6, 'vgg_frontend': True, 'is_espnet_structure': True, 'mmi_loss': False, 'use_feat_batchnorm': True, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 34, 'avg': 20, 'method': 'attention-decoder', 'num_paths': 100, 'lattice_score_scale': 1.0, 'export': False, 'full_libri': True, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-10-26 16:05:54,819 INFO [lexicon.py:113] Loading pre-compiled data/lang_bpe_500/Linv.pt
2021-10-26 16:05:55,204 INFO [decode-500-vgg-att0.8.py:476] device: cuda:0
2021-10-26 16:06:08,230 INFO [decode-500-vgg-att0.8.py:519] Loading pre-compiled G_4_gram.pt
2021-10-26 16:06:44,528 INFO [decode-500-vgg-att0.8.py:558] averaging ['conformer_ctc/exp_500_att_0.8_vgg/epoch-15.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-16.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-17.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-18.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-19.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-20.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-21.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-22.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-23.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-24.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-25.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-26.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-27.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-28.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-29.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-30.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-31.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-32.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-33.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-34.pt']
2021-10-26 16:08:04,477 INFO [decode-500-vgg-att0.8.py:571] Number of model parameters: 102568040
2021-10-26 16:08:06,127 INFO [decode-500-vgg-att0.8.py:403] batch 0/787, cuts processed until now is 3
2021-10-26 16:09:45,247 INFO [decode-500-vgg-att0.8.py:403] batch 100/787, cuts processed until now is 348
2021-10-26 16:11:27,598 INFO [decode-500-vgg-att0.8.py:403] batch 200/787, cuts processed until now is 690
2021-10-26 16:13:10,609 INFO [decode-500-vgg-att0.8.py:403] batch 300/787, cuts processed until now is 1017
2021-10-26 16:14:50,366 INFO [decode-500-vgg-att0.8.py:403] batch 400/787, cuts processed until now is 1381
2021-10-26 16:16:34,659 INFO [decode-500-vgg-att0.8.py:403] batch 500/787, cuts processed until now is 1694
2021-10-26 16:18:16,964 INFO [decode-500-vgg-att0.8.py:403] batch 600/787, cuts processed until now is 2024
2021-10-26 16:19:58,504 INFO [decode-500-vgg-att0.8.py:403] batch 700/787, cuts processed until now is 2352
2021-10-26 16:22:38,703 INFO [decode-500-vgg-att0.8.py:452]
For test-clean, WER of different settings are:
ngram_lm_scale_1.0_attention_scale_1.1 2.6 best for test-clean
ngram_lm_scale_1.3_attention_scale_1.3 2.6
ngram_lm_scale_1.3_attention_scale_1.5 2.6
ngram_lm_scale_1.5_attention_scale_1.3 2.6
ngram_lm_scale_1.9_attention_scale_1.9 2.6
ngram_lm_scale_2.0_attention_scale_2.0 2.6
... ...
2021-10-26 16:22:39,845 INFO [decode-500-vgg-att0.8.py:403] batch 0/757, cuts processed until now is 5
2021-10-26 16:24:21,415 INFO [decode-500-vgg-att0.8.py:403] batch 100/757, cuts processed until now is 377
2021-10-26 16:25:59,809 INFO [decode-500-vgg-att0.8.py:403] batch 200/757, cuts processed until now is 792
2021-10-26 16:27:37,661 INFO [decode-500-vgg-att0.8.py:403] batch 300/757, cuts processed until now is 1180
2021-10-26 16:29:13,849 INFO [decode-500-vgg-att0.8.py:403] batch 400/757, cuts processed until now is 1549
2021-10-26 16:29:48,348 INFO [decode.py:588] Caught exception:
CUDA out of memory. Tried to allocate 8.00 GiB (GPU 0; 31.75 GiB total capacity; 20.32 GiB already allocated; 7.15 GiB free; 23.38 GiB reserved in total by PyTorch)
2021-10-26 16:29:48,349 INFO [decode.py:589] num_arcs before pruning: 254683
2021-10-26 16:29:48,378 INFO [decode.py:596] num_arcs after pruning: 7851
2021-10-26 16:30:52,598 INFO [decode-500-vgg-att0.8.py:403] batch 500/757, cuts processed until now is 1946
2021-10-26 16:32:29,012 INFO [decode-500-vgg-att0.8.py:403] batch 600/757, cuts processed until now is 2342
2021-10-26 16:34:05,310 INFO [decode-500-vgg-att0.8.py:403] batch 700/757, cuts processed until now is 2734
2021-10-26 16:36:15,618 INFO [decode-500-vgg-att0.8.py:452]
For test-other, WER of different settings are:
ngram_lm_scale_1.3_attention_scale_1.7 5.72 best for test-other
ngram_lm_scale_1.5_attention_scale_2.0 5.72
ngram_lm_scale_1.2_attention_scale_1.2 5.73
ngram_lm_scale_1.3_attention_scale_1.5 5.73
ngram_lm_scale_1.3_attention_scale_1.9 5.73
ngram_lm_scale_1.5_attention_scale_1.7 5.73
ngram_lm_scale_1.7_attention_scale_2.0 5.73
The following table compares the decoding time between CPU and GPU:

|     | test-clean        | test-other         |
|-----|-------------------|--------------------|
| CPU | 1 hour 46 minutes | 3 hours 21 minutes |
| GPU | 14 minutes        | 14 minutes         |
(Note: As you can see, the GPU RAM is not fully used. If you increase --max-duration, decoding may take less time.)
I found that converting the model to TorchScript and running model = torch.utils.mobile_optimizer.optimize_for_mobile(model) reduces the CPU recognition time by about 20%. Further improvements could probably be achieved with quantization, but I didn't try it.
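For completeness, here is a minimal sketch of that optimization on a toy module (the real conformer model would first have to be TorchScript-compatible; TinyEncoder is purely illustrative):

```python
import torch
import torch.nn as nn
from torch.utils.mobile_optimizer import optimize_for_mobile

class TinyEncoder(nn.Module):
    """Toy stand-in for the acoustic model; 80-dim fbank input."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(80, 32)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = TinyEncoder().eval()
scripted = torch.jit.script(model)         # TorchScript first
optimized = optimize_for_mobile(scripted)  # then the mobile/CPU optimization

out = optimized(torch.randn(1, 80))
print(out.shape)  # torch.Size([1, 32])
```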
Also, if you're willing to sacrifice some WER, you can change the decoding method argument to 1best.
Using the decoding method ctc-decoding is faster, as it requires no LMs and fewer FSA operations, though its WER is not as good as that of the other methods.
#58 shows that the WERs using CTC decoding on the LibriSpeech test sets are:
ctc-decoding 3.26 best for test-clean
ctc-decoding 8.21 best for test-other
These methods may not work for me; I don't want to sacrifice performance. My expectation is to spend 30 to 60 minutes decoding on the CPU. Thanks for your advice.
Attention models are never going to be that fast to decode. I believe the ESPnet setup takes something like 24 hours to decode with attention decoding, and that is on GPU. (But I may be mistaken.)
Hi,
Recently I ran the LibriSpeech experiment in icefall. I found that decoding takes a long time on the CPU. I know that decoding on the GPU is fast, but I need to decode on the CPU. So the questions I want to ask are:
- Is there a way to make it run faster on the CPU?
- How long did it take you to decode on the CPU?
Here is the screenshot of the decoding log. It took about 10 hours.
Thanks!
Hi, I also run decoding on the CPU. Can you tell me your CPU configuration?
@CSerV @csukuangfj When I run the decoding in the following steps:
#########
2021-10-09 15:33:49,785 INFO [decode.py:538] Decoding started
2021-10-09 15:33:49,786 INFO [decode.py:539] {'lm_dir': PosixPath('data/lm'), 'subsampling_factor': 4, 'vgg_frontend': False, 'use_feat_batchnorm': True, 'feature_dim': 80, 'nhead': 8, 'attention_dim': 512, 'num_decoder_layers': 6, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 34, 'avg': 20, 'method': 'attention-decoder', 'num_paths': 100, 'nbest_scale': 0.5, 'export': False, 'exp_dir': PosixPath('conformer_ctc/exp'), 'lang_dir': PosixPath('data/lang_bpe'), 'full_libri': True, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 50, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-10-09 15:33:50,955 INFO [lexicon.py:113] Loading pre-compiled data/lang_bpe/Linv.pt
2021-10-09 15:33:51,271 INFO [decode.py:549] device: cpu
2021-10-09 15:35:21,894 INFO [decode.py:604] Loading pre-compiled G_4_gram.pt
2021-10-09 15:35:54,747 INFO [decode.py:640] averaging ['conformer_ctc/exp/epoch-15.pt', 'conformer_ctc/exp/epoch-16.pt', 'conformer_ctc/exp/epoch-17.pt', 'conformer_ctc/exp/epoch-18.pt', 'conformer_ctc/exp/epoch-19.pt', 'conformer_ctc/exp/epoch-20.pt', 'conformer_ctc/exp/epoch-21.pt', 'conformer_ctc/exp/epoch-22.pt', 'conformer_ctc/exp/epoch-23.pt', 'conformer_ctc/exp/epoch-24.pt', 'conformer_ctc/exp/epoch-25.pt', 'conformer_ctc/exp/epoch-26.pt', 'conformer_ctc/exp/epoch-27.pt', 'conformer_ctc/exp/epoch-28.pt', 'conformer_ctc/exp/epoch-29.pt', 'conformer_ctc/exp/epoch-30.pt', 'conformer_ctc/exp/epoch-31.pt', 'conformer_ctc/exp/epoch-32.pt', 'conformer_ctc/exp/epoch-33.pt', 'conformer_ctc/exp/epoch-34.pt']
2021-10-09 16:06:26,042 INFO [decode.py:653] Number of model parameters: 116147120
#######
It takes a long time before "Number of model parameters" appears, while in your screenshot it takes only 20 seconds. At the same time, the CPU memory usage can go up to 78 GB, which is very high. Did you make any optimizations?
Here are some screenshots of the CPU configuration. I didn't make any optimizations.
cat /proc/cpuinfo
processor : 63
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
stepping : 7
microcode : 0x5003006
cpu MHz : 2300.000
cache size : 22528 KB
physical id : 1
siblings : 32
core id : 12
cpu cores : 16
apicid : 57
initial apicid : 57
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni spec_ctrl intel_stibp flush_l1d arch_capabilities
bogomips : 4604.99
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
Hi, I also run decoding on the CPU. Can you tell me your CPU configuration?
I am using a virtual machine with 10 CPUs and 50 GB of CPU RAM to test decoding with attention-decoder on CPU.
The decoding command is
CUDA_VISIBLE_DEVICES= ./conformer_ctc/decode-500-vgg-att0.8.py \
--max-duration 30 \
--concatenate-cuts 0 \
--bucketing-sampler 1 \
--method attention-decoder \
--epoch 34 \
--avg 20
The decoding logs are:
2021-10-25 16:12:01,444 INFO [decode-500-vgg-att0.8.py:465] Decoding started
2021-10-25 16:12:01,445 INFO [decode-500-vgg-att0.8.py:466] {'exp_dir': PosixPath('conformer_ctc/exp_500_att_0.8_vgg'), 'lang_dir': PosixPath('data/lang_bpe_500'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 80, 'nhead': 8, 'attention_dim': 512, 'subsampling_factor': 4, 'num_decoder_layers': 6, 'vgg_frontend': True, 'is_espnet_structure': True, 'mmi_loss': False, 'use_feat_batchnorm': True, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 34, 'avg': 20, 'method': 'attention-decoder', 'num_paths': 100, 'lattice_score_scale': 1.0, 'export': False, 'full_libri': True, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-10-25 16:12:01,798 INFO [lexicon.py:113] Loading pre-compiled data/lang_bpe_500/Linv.pt
2021-10-25 16:12:02,052 INFO [decode-500-vgg-att0.8.py:476] device: cpu
2021-10-25 16:12:09,762 INFO [decode-500-vgg-att0.8.py:519] Loading pre-compiled G_4_gram.pt
2021-10-25 16:14:50,914 INFO [decode-500-vgg-att0.8.py:558] averaging ['conformer_ctc/exp_500_att_0.8_vgg/epoch-15.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-16.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-17.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-18.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-19.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-20.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-21.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-22.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-23.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-24.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-25.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-26.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-27.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-28.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-29.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-30.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-31.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-32.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-33.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-34.pt']
2021-10-25 16:15:34,641 INFO [decode-500-vgg-att0.8.py:571] Number of model parameters: 102568040
/ceph-fj/fangjun/open-source/lhotse/lhotse/dataset/sampling/single_cut.py:170: UserWarning: The first cut drawn in batch collection violates the max_frames, max_cuts, or max_duration constraints - we'll return it anyway. Consider increasing max_frames/max_cuts/max_duration.
warnings.warn(
2021-10-25 16:15:43,896 INFO [decode-500-vgg-att0.8.py:403] batch 0/787, cuts processed until now is 3
At the same time, the CPU memory usage can go up to 78 GB, which is very high. Did you make any optimizations?
I don't think it should use that much CPU memory. Here is the memory usage reported by our cluster management tools:
The maximum CPU RAM used in the decoding process is about 25.73 GB. We don't use any optimizations.
Note that the code uses only two CPU threads during the pruned intersection. If you have more CPU RAM, you can increase --max-duration, which can decrease the total decoding time.
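Relatedly, the PyTorch side of the computation (the conformer forward pass) honors the intra-op thread setting, so on a many-core machine it may be worth checking. This does not change the two threads used by the pruned intersection; the value 8 below is just an example:

```python
import torch

# Intra-op parallelism for PyTorch CPU ops (matmuls, convolutions, ...).
# This does NOT affect the threads used by the pruned intersection.
torch.set_num_threads(8)
print(torch.get_num_threads())  # 8
```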
How long did it take you to decode on the CPU?
We have not performed such a test yet. I am decoding on CPU right now.
I ran decoding on CPU using a model from speechbrain some time ago. It takes 45.12778 hours to decode test-clean; see speechbrain/speechbrain#928 (comment). (I was not able to decode test-other on CPU with speechbrain, as it consumes lots of memory; my virtual machine was killed due to OOM.)
How long did it take you to decode on the CPU?
Here is the decoding log on CPU for test-clean that I just obtained. You can see that it takes about 1 hour and 46 minutes to decode test-clean. (Note: I use a model with a vocab size of 500, not 5000.)
$ CUDA_VISIBLE_DEVICES= ./conformer_ctc/decode-500-vgg-att0.8.py --max-duration 30 --concatenate-cuts 0 --bucketing-sampler 1 --method attention-decoder --epoch 34 --avg 20
2021-10-25 16:12:01,444 INFO [decode-500-vgg-att0.8.py:465] Decoding started
2021-10-25 16:12:01,445 INFO [decode-500-vgg-att0.8.py:466] {'exp_dir': PosixPath('conformer_ctc/exp_500_att_0.8_vgg'), 'lang_dir': PosixPath('data/lang_bpe_500'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 80, 'nhead': 8, 'attention_dim': 512, 'subsampling_factor': 4, 'num_decoder_layers': 6, 'vgg_frontend': True, 'is_espnet_structure': True, 'mmi_loss': False, 'use_feat_batchnorm': True, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 34, 'avg': 20, 'method': 'attention-decoder', 'num_paths': 100, 'lattice_score_scale': 1.0, 'export': False, 'full_libri': True, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-10-25 16:12:01,798 INFO [lexicon.py:113] Loading pre-compiled data/lang_bpe_500/Linv.pt
2021-10-25 16:12:02,052 INFO [decode-500-vgg-att0.8.py:476] device: cpu
2021-10-25 16:12:09,762 INFO [decode-500-vgg-att0.8.py:519] Loading pre-compiled G_4_gram.pt
2021-10-25 16:14:50,914 INFO [decode-500-vgg-att0.8.py:558] averaging ['conformer_ctc/exp_500_att_0.8_vgg/epoch-15.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-16.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-17.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-18.pt','conformer_ctc/exp_500_att_0.8_vgg/epoch-19.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-20.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-21.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-22.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-23.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-24.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-25.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-26.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-27.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-28.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-29.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-30.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-31.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-32.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-33.pt', 'conformer_ctc/exp_500_att_0.8_vgg/epoch-34.pt']
2021-10-25 16:15:34,641 INFO [decode-500-vgg-att0.8.py:571] Number of model parameters: 102568040
/ceph-fj/fangjun/open-source/lhotse/lhotse/dataset/sampling/single_cut.py:170: UserWarning: The first cut drawn in batch collection violates the max_frames, max_cuts, or max_duration constraints - we'll return it anyway. Consider increasing max_frames/max_cuts/max_duration.
warnings.warn(
2021-10-25 16:15:43,896 INFO [decode-500-vgg-att0.8.py:403] batch 0/787, cuts processed until now is 3
2021-10-25 16:29:17,429 INFO [decode-500-vgg-att0.8.py:403] batch 100/787, cuts processed until now is 348
2021-10-25 16:42:46,474 INFO [decode-500-vgg-att0.8.py:403] batch 200/787, cuts processed until now is 690
2021-10-25 16:56:16,433 INFO [decode-500-vgg-att0.8.py:403] batch 300/787, cuts processed until now is 1017
2021-10-25 17:09:35,900 INFO [decode-500-vgg-att0.8.py:403] batch 400/787, cuts processed until now is 1381
2021-10-25 17:22:36,109 INFO [decode-500-vgg-att0.8.py:403] batch 500/787, cuts processed until now is 1694
2021-10-25 17:35:43,517 INFO [decode-500-vgg-att0.8.py:403] batch 600/787, cuts processed until now is 2024
2021-10-25 17:48:50,643 INFO [decode-500-vgg-att0.8.py:403] batch 700/787, cuts processed until now is 2352
2021-10-25 18:01:22,994 INFO [decode-500-vgg-att0.8.py:452]
For test-clean, WER of different settings are:
ngram_lm_scale_1.0_attention_scale_1.1 2.6 best for test-clean
ngram_lm_scale_0.7_attention_scale_1.0 2.61
ngram_lm_scale_0.9_attention_scale_0.9 2.61
ngram_lm_scale_0.9_attention_scale_1.0 2.61
The CPU RAM usage is given below. You can see that the maximum RAM usage is less than 26 GB.
It takes a long time before "Number of model parameters" appears, while in your screenshot it takes only 20 seconds. At the same time, the CPU memory usage can go up to 78 GB, which is very high. Did you make any optimizations?
This does not look right to me. Model averaging should not take 78 GB of CPU RAM. Can you check that no other memory-consuming processes are running while you are decoding?