Comments (24)
from icefall.
Are you able to get the stack trace after it gets stuck in `get_lattice` by manually pressing Ctrl+C?
No, actually it does not even stop with Ctrl+C; I need to kill the process with Ctrl+Z.
> what is the value of `--max-duration`?

500 s.

> how large is your GPU RAM?

It's 48 GB, and 10 GB is occupied during decoding.
Let me try to debug further and see if I can get a more detailed stack trace.
Thanks,
Divyesh Rajpura
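When Ctrl+C no longer interrupts the process, one option (a sketch, not part of icefall) is to register Python's `faulthandler` in the decoding script so that a signal dumps every thread's Python stack:

```python
import faulthandler
import signal

# Register a handler so that `kill -USR1 <pid>` prints the Python stack of
# every thread, even when Ctrl+C no longer works (e.g. because the process
# is blocked inside a native extension such as k2).
faulthandler.register(signal.SIGUSR1, all_threads=True)

# Alternatively, dump the current tracebacks to a file on demand:
with open("traceback.txt", "w") as f:
    faulthandler.dump_traceback(file=f, all_threads=True)
```

Note that this shows only Python-level frames; if the hang is inside k2's C++ code, attaching with `gdb -p <pid>` or `py-spy dump --native` gives the native stack instead.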
After changing to `modified=True`, I am able to run decoding on GPU with a larger `--max-duration`.
By reducing the vocab size to 500, the GPU memory usage drops to ~10 GB. However, I observed a relative degradation of ~10% (when decoding with an external LM) compared to a vocab size of 5000.
When I revert to `modified=False`, I can still run decoding on GPU with a large `--max-duration`, but the memory consumption again reaches ~38 GB.
If you can provide some detail or a reference explaining the underlying concept behind `modified`, that would be really helpful.
Thanks for your time, effort, and suggestions.
Thanks,
Divyesh Rajpura
Hi, I am facing a similar issue.
I have trained a zipformer with `--use-transducer false` and `--use-ctc true`. During decoding, it gets stuck at the `get_lattice` stage in `zipformer/ctc_decoding.py`.
Below are the losses at the end of the 10th epoch, which I am using for decoding:

```
Training:   loss[loss=0.03422, ctc_loss=0.03422, over 12273.00 frames],
            tot_loss[loss=0.06499, ctc_loss=0.06499, over 2670542.70 frames]
Validation: loss=0.02084, ctc_loss=0.02084
```
Below are the library versions:
- torch: 2.2.1
- k2: 1.24.4
- icefall: pulled on 18th Mar 2024
Thanks!
Please find the attached screenshot for your reference.
Thanks for the screenshot.
However, the screenshot does not show that it gets stuck at `get_lattice`.
Could you tell us how you found that it gets stuck at `get_lattice`?
I added logging at various stages to figure out which step was causing the issue.
Are you able to get the stack trace after it gets stuck in `get_lattice` by manually pressing Ctrl+C?
Also, what is the value of `--max-duration`, and how large is your GPU RAM?
Does it work when you reduce `--max-duration`?
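For context, `--max-duration` caps the total audio duration (in seconds) of each batch, so reducing it directly shrinks the amount of lattice the decoder must hold at once. A simplified greedy batcher illustrating the idea (a sketch, not Lhotse's actual sampler):

```python
def batch_by_max_duration(durations, max_duration):
    """Greedily group utterances so each batch's total duration stays
    within max_duration (seconds). A simplified sketch of what a
    duration-bounded sampler does, not Lhotse's implementation."""
    batches, current, total = [], [], 0.0
    for d in durations:
        if current and total + d > max_duration:
            batches.append(current)
            current, total = [], 0.0
        current.append(d)
        total += d
    if current:
        batches.append(current)
    return batches

# With 2-15 s utterances and --max-duration 15, most batches hold
# only one or two utterances:
print(batch_by_max_duration([2, 15, 7, 7, 12], 15))
# -> [[2], [15], [7, 7], [12]]
```

Note that an utterance longer than the cap still forms its own batch; the cap bounds the per-batch sum, not individual durations.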
@csukuangfj, I have changed the device from GPU to CPU.
With this change, the decoding terminated automatically; below is the stack trace:
```
[F] /var/www/k2/csrc/array.h:176:k2::Array1<T> k2::Array1<T>::Arange(int32_t, int32_t) const [with T = char; int32_t = int] Check failed: start >= 0 (-152562455 vs. 0)

[ Stack-Trace: ]
/opt/conda/lib/python3.10/site-packages/k2/lib64/libk2_log.so(k2::internal::GetStackTrace()+0x34) [0x7fbc188e49b4]
/opt/conda/lib/python3.10/site-packages/k2/lib64/libk2context.so(k2::Array1<char>::Arange(int, int) const+0x69d) [0x7fbc18e7fded]
/opt/conda/lib/python3.10/site-packages/k2/lib64/libk2context.so(k2::MultiGraphDenseIntersectPruned::PruneTimeRange(int, int)+0x6a6) [0x7fbc1904a236]
/opt/conda/lib/python3.10/site-packages/k2/lib64/libk2context.so(std::_Function_handler<void (), k2::MultiGraphDenseIntersectPruned::Intersect(k2::DenseFsaVec*)::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x1e7) [0x7fbc1904d067]
/opt/conda/lib/python3.10/site-packages/k2/lib64/libk2context.so(k2::ThreadPool::ProcessTasks()+0x163) [0x7fbc191ec283]
/opt/conda/lib/python3.10/site-packages/torch/lib/../../../.././libstdc++.so.6(+0xdbbf4) [0x7fbccb0c7bf4]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fbcd3713ac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7fbcd37a4a04]

Aborted (core dumped)
```
Thanks,
Divyesh Rajpura
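For what it's worth, the negative `start` value (-152562455) in the check failure is exactly what a signed 32-bit index looks like after overflowing. A minimal illustration (the large offset below is hypothetical, chosen only to reproduce the value in the log):

```python
def to_int32(x: int) -> int:
    """Wrap an unbounded Python int to a signed 32-bit value,
    mimicking C's int32_t overflow behaviour."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

# Any index past 2**31 - 1 wraps around to a negative number:
print(to_int32(2**31 - 1))      # 2147483647 (largest representable)
print(to_int32(2**31))          # -2147483648
print(to_int32(4_142_404_841))  # -152562455, matching the check failure
```

So a lattice large enough that an internal offset exceeds 2³¹ − 1 would produce exactly this kind of negative-index failure.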
@csukuangfj, @JinZr
Just wanted to check whether you have had a chance to look into the above issue.
> Below are the library versions
> torch: 2.2.1
> k2: 1.24.4

Could you tell us the exact k2 version you are using?
![Screenshot 2024-04-16 at 09 31 50](https://private-user-images.githubusercontent.com/5284924/322671745-5298bcc1-4c29-4af6-af4a-af9118980aa1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkxNDg5NTIsIm5iZiI6MTcxOTE0ODY1MiwicGF0aCI6Ii81Mjg0OTI0LzMyMjY3MTc0NS01Mjk4YmNjMS00YzI5LTRhZjYtYWY0YS1hZjkxMTg5ODBhYTEucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDYyMyUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA2MjNUMTMxNzMyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NzQzYTM2NmYzNGRhMDRmNWVkNmM1YWQwMGM0Y2I5YjFjNWYyNTk3NGE3ZTU0MDcwNzliNGZmZjkwODI3ZTBlNCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.gJL3weHTjbcGEtzQhSUNdyI6F_sMsxI7SCjXQ_zBH5g)
@csukuangfj, thanks for the response.
The exact k2 version is 1.24.4.dev20240223+cuda12.1.torch2.2.1.
It looks to me like the lattice is so large that its indexes cannot be represented using an int32_t, so they overflow.
Could you reduce `--max-duration`?
In the extreme case, you can set `--max-duration=1` and increase it incrementally.
I have already tried reducing it to 100 s; let me reduce it further.
It would also be good to know the exact wave file that is causing this error.
You can check whether that wave is very long.
The audio durations in the test data range from 2 to 15 s.
Reducing `--max-duration` helped. The maximum value of `--max-duration` that worked for me is 15. If I increase it further, decoding aborts with `Aborted (core dumped)` or with the error I posted above.
Current GPU memory utilisation is ~47 GB. Isn't this too high?
Thanks for your time and effort.
Thanks,
Divyesh Rajpura
> Current GPU memory utilisation is ~47 GB. Isn't this too high?

> The max value for --max-duration worked for me is 15
Could you tell us which decoding method you are using?
It would be great if you could share the exact command you are using.
Sure. The decoding method is `ctc-decoding`, and below is the command I am using:

```bash
export CUDA_VISIBLE_DEVICES=0
python3 zipformer/ctc_decode.py \
  --epoch 10 \
  --avg 1 \
  --exp-dir exp/dnn/zipformer_ctc \
  --use-transducer 0 \
  --use-ctc 1 \
  --max-duration 15 \
  --causal 0 \
  --decoding-method ctc-decoding \
  --manifest-dir exp/fbank \
  --lang-dir exp/lang/bpe_5000/ \
  --bpe-model exp/lang/bpe_5000/bpe.model
```
Thanks,
Divyesh Rajpura
> --lang-dir exp/lang/bpe_5000/

I see the issue. Your vocab size is 5000, which is way larger than our 500.
Please change `modified=False` to `modified=True`, and you should be able to use a larger `--max-duration`.
I also suggest that you try a smaller vocab size; a larger vocab size does not necessarily imply better performance.
Thanks for your suggestion, @csukuangfj.

> A larger vocab size does not necessarily imply a better performance.

Will give it a try with a reduced vocab size as well.
Thanks,
Divyesh Rajpura
> If you can provide some detail or reference to understand the underlying concept for using modified, that would be really helpful.

@divyeshrajpura4114 Could you refer to the k2 documentation for that?