Comments (4)
By the way, you can replace ctc_loss != float('inf') with ~torch.isinf(ctc_loss). Note the negation: torch.isinf is True exactly where the loss is infinite.
from icefall.
The minimum number of frames (before subsampling) is 11.
If you are using conformer_ctc, which has a subsampling factor of 4, you will get ((11 - 1) / 2 - 1) / 2 = 2 frames.
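The arithmetic above can be checked with a small helper. This is only a sketch: it assumes each of the two stride-2 stages rounds down (floor division), and the function name frames_after_subsampling is hypothetical, not something taken from icefall.

```python
def frames_after_subsampling(num_frames: int) -> int:
    """Apply ((T - 1) / 2 - 1) / 2 from the comment above.

    Assumption: both divisions round down, which is how a stride-2
    stage without padding would usually behave.
    """
    return ((num_frames - 1) // 2 - 1) // 2


if __name__ == "__main__":
    # 11 input frames is the case worked out in the comment above.
    for t in (11, 100, 1000):
        print(f"{t} input frames -> {frames_after_subsampling(t)} output frames")
```

With 11 input frames this reproduces the value 2 quoted in the comment.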
One of the assumptions of CTC is that the output label sequence is not longer than the input sequence (measured in frames after subsampling).
If that assumption does not hold, you will get an empty lattice, and hence an inf value for the total score.
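A rough way to test that assumption for a single utterance is sketched below. The helper names are hypothetical. The subsampling formula is the one quoted in this thread, with floor division assumed. The check is slightly stricter than the statement above: in a standard CTC topology two identical adjacent labels need a blank between them, so each adjacent repeat costs one extra frame.

```python
from typing import Sequence


def num_output_frames(num_frames: int) -> int:
    # Formula from this thread for a subsampling factor of 4
    # (floor division is an assumption).
    return ((num_frames - 1) // 2 - 1) // 2


def min_frames_for_labels(labels: Sequence[int]) -> int:
    # One frame per label, plus one blank frame between each pair of
    # identical adjacent labels (standard CTC topology assumed).
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def ctc_alignment_exists(num_frames: int, labels: Sequence[int]) -> bool:
    """Return True if the labels can fit into the subsampled frames."""
    return min_frames_for_labels(labels) <= num_output_frames(num_frames)


if __name__ == "__main__":
    print(ctc_alignment_exists(11, [5, 7]))     # 2 labels, 2 output frames
    print(ctc_alignment_exists(11, [5, 7, 9]))  # 3 labels, 2 output frames
    print(ctc_alignment_exists(15, [5, 5]))     # repeated label, 3 output frames
```

An utterance for which this returns False is one that would be expected to produce an empty lattice and an inf loss.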
And what is the recommended minimum threshold for this value?
You can do one of the following:
- (1) Exclude inf values from the CTC loss; see icefall/egs/librispeech/ASR/conformer_ctc/train.py, lines 367 to 373 in 243fb97. (Note that you have to first set params.reduction to none, filter inf values from ctc_loss, and then compute the sum.)
- (2) Filter utterances that are too short from the manifest.
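Option (2) could look roughly like the sketch below. It uses a plain Python list as a stand-in for the manifest; the Utterance class, its fields, and the helper names are hypothetical and only illustrate the predicate. With a real manifest the same predicate would be applied through whatever filtering mechanism the manifest library provides. The subsampling formula is the one quoted earlier in this thread, with floor division assumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    # Hypothetical stand-in for one manifest entry.
    utt_id: str
    num_frames: int    # feature frames before subsampling
    tokens: List[str]  # label sequence used for CTC training


def num_output_frames(num_frames: int) -> int:
    # Subsampling factor of 4, as discussed above (floor division assumed).
    return ((num_frames - 1) // 2 - 1) // 2


def keep_utterance(utt: Utterance) -> bool:
    """Keep only utterances whose labels fit into the subsampled frames."""
    return len(utt.tokens) <= num_output_frames(utt.num_frames)


manifest = [
    Utterance("utt1", 200, list("hello")),
    Utterance("utt2", 11, list("hello")),  # too short for 5 labels
    Utterance("utt3", 30, ["a", "b"]),
]

filtered = [utt for utt in manifest if keep_utterance(utt)]

if __name__ == "__main__":
    print([utt.utt_id for utt in filtered])
```

Filtering once on the manifest keeps the training loop unchanged, whereas option (1) handles the problem at loss time and still spends computation on the unusable utterances.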
Thanks, I will try to simply filter short utterances and adjust some threshold for experiments.
My solution:
ctc_loss = k2.ctc_loss(
    decoding_graph=decoding_graph,
    dense_fsa_vec=dense_fsa_vec,
    output_beam=params.beam_size,
    reduction='none',  # keep per-utterance losses so inf can be filtered
    use_double_scores=params.use_double_scores,
)
# Replace inf losses with 0, then apply the sum reduction manually.
# torch.zeros_like keeps the dtype and device of ctc_loss, which also
# works when use_double_scores makes the loss float64.
ctc_loss = torch.sum(
    torch.where(ctc_loss != float('inf'), ctc_loss, torch.zeros_like(ctc_loss))
)