Comments (6)

MicKot commented on June 23, 2024

Setting chunk-size=-1 and left-context-frames=-1 does not mean 'non-streaming'. It only means the model gets full context; the model is still 'streaming', i.e. the convolutions are still causal. If you export the model with the non-streaming script, you lose the ability to use the cache, which is the whole point of training a streaming model. In my testing, chunk-size=512 and left-context-frames=512 gave the best WER, which is not surprising given that context is king. If WER is what you care about, that setting still lets you 'stream', just not in real time.
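To make the point about causal convolutions and the cache concrete, here is a minimal pure-Python sketch. It is a toy illustration, not icefall's or Zipformer's implementation: `causal_conv`, the kernel values, and the chunk length of 3 are all assumptions made for the example. It shows that a causal convolution run chunk by chunk, carrying a small cache of past inputs, reproduces the output of a single full-context pass.

```python
def causal_conv(x, kernel, cache=None):
    """Toy causal 1-D convolution (hypothetical helper, not icefall code).

    Output frame t depends only on the current and the previous
    len(kernel) - 1 input frames. `cache` holds the last len(kernel) - 1
    inputs of the previous chunk (zeros at the start of the stream).
    Returns (output, new_cache).
    """
    k = len(kernel)
    if cache is None:
        cache = [0.0] * (k - 1)
    padded = list(cache) + list(x)
    y = [sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(x))]
    # Keep the most recent k - 1 inputs for the next chunk.
    new_cache = padded[len(padded) - (k - 1):]
    return y, new_cache


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [0.25, 0.25, 0.5]

# One pass over the whole utterance (full context, no cache needed).
full, _ = causal_conv(signal, kernel)

# Same utterance processed in chunks of 3 frames, carrying the cache.
streamed, cache = [], None
for start in range(0, len(signal), 3):
    out, cache = causal_conv(signal[start:start + 3], kernel, cache)
    streamed.extend(out)

print(full == streamed)  # True
```

Because no output frame looks at future input, the chunk size changes only how much is processed per call, not the result of this layer. In a real model the chunk size and left context still affect accuracy through attention, which this sketch does not model.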

from icefall.

JinZr commented on June 23, 2024

1215thebqtic commented on June 23, 2024

Hi Jin, thanks for your reply! The model is trained on more than 100k hours of data, and training takes about 20 days because of limited GPUs. With the streaming model, the non-streaming decoding options (--chunk-size=-1, --left-context-frames=-1) give a 20%~30% relative WER improvement over the streaming decoding options (--chunk-size=32, --left-context-frames=128), so I decided to export the non-streaming model.

csukuangfj commented on June 23, 2024

I'm looking into this.

1215thebqtic commented on June 23, 2024

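"I'm looking into this."

Hello, I found that the error is caused by an if/else statement inside this function:

def _get_chunk_scale(self, chunk_size: int):

When exporting to ONNX, the dummy input is

x = torch.zeros(1, 100, 80, dtype=torch.float32)

so the export takes the `if` branch of that if/else. The utterances I used to test the exported model were all longer than ten seconds, which is why I got the dimension-mismatch error. If I change the dummy input to x = torch.zeros(1, 1000, 80, dtype=torch.float32), the export takes the `else` branch instead, and recognition works without errors. However, I don't know how to merge or split this if/else so that the exported model works for both short and long utterances.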
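The behaviour reported here is what one would expect when a shape-dependent branch is evaluated once at export time and only the branch that ran is kept. The sketch below is a pure-Python analogy of that effect, not torch.jit or ONNX code and not the body of `_get_chunk_scale`: `EDGE`, `chunk_scale`, and `export_like_trace` are hypothetical names, and the crop-or-pad logic is an assumed stand-in for a function whose output length must match its input length.

```python
# Hypothetical per-position scale; its length plays the role of a kernel size.
EDGE = [0.1, 0.2, 0.3, 0.4, 0.5]


def chunk_scale(chunk_size):
    """Toy shape-dependent function: crop for short inputs, zero-pad for long ones."""
    if chunk_size < len(EDGE):
        return EDGE[:chunk_size]
    return EDGE + [0.0] * (chunk_size - len(EDGE))


def export_like_trace(example_chunk_size):
    """Mimic a trace-style export: decide the branch once, from the example input."""
    took_crop_branch = example_chunk_size < len(EDGE)

    def exported(chunk_size):
        # The condition is frozen; only one branch survives the "export".
        if took_crop_branch:
            return EDGE[:chunk_size]
        return EDGE + [0.0] * (chunk_size - len(EDGE))

    return exported


traced_short = export_like_trace(example_chunk_size=3)   # short dummy input
traced_long = export_like_trace(example_chunk_size=12)   # long dummy input

for n in (3, 12):
    # Columns: input length, correct length, short-example export, long-example export
    print(n, len(chunk_scale(n)), len(traced_short(n)), len(traced_long(n)))
# 3 3 3 5
# 12 12 5 12
```

In this toy, each frozen export returns the right length only for inputs on the same side of the threshold as its example input, which mirrors the observation that a longer dummy input fixes long utterances but does not give one export that handles both. Keeping the branch in the exported graph, rather than the result of evaluating it once, is what removes the dependence on the example input.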

csukuangfj commented on June 23, 2024

Replied in the Next-gen Kaldi WeChat group.

The fix is

diff --git a/egs/librispeech/ASR/zipformer/scaling_converter.py b/egs/librispeech/ASR/zipformer/scaling_converter.py
index 76622fa1..346db55e 100644
--- a/egs/librispeech/ASR/zipformer/scaling_converter.py
+++ b/egs/librispeech/ASR/zipformer/scaling_converter.py
@@ -36,7 +36,7 @@ from scaling import (
     SwooshROnnx,
     Whiten,
 )
-from zipformer import CompactRelPositionalEncoding
+from zipformer import CompactRelPositionalEncoding, ChunkCausalDepthwiseConv1d


 # Copied from https://pytorch.org/docs/1.9.0/_modules/torch/nn/modules/module.html#Module.get_submodule  # noqa
@@ -93,6 +93,10 @@ def convert_scaled_to_non_scaled(
             # the input changes, so we have to use torch.jit.script()
             # to replace torch.jit.trace()
             d[name] = torch.jit.script(m)
+        elif is_onnx and isinstance(m, ChunkCausalDepthwiseConv1d):
+            # to export a zipformer model that is trained with --causal=1
+            # but to export it with --chunk-size=-1 and --left-chunk-size=-1
+            d[name] = torch.jit.script(m)

     for k, v in d.items():
         if "." in k:
