Comments (9)
Please show the detailed error logs and upload the wav file.
from funasr.
I got the same issue when using the Cantonese model. Here is the full log, @LauraGPT:
```
Traceback (most recent call last):
  File "/home/user/miniconda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/user/miniconda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/tts/sovits/GPT-SoVITS/tools/asr/funasr_cantonese.py", line 35, in <module>
    rec_result = inference_pipeline(input="/data/tts/sovits/audio_res/e1/12_4.wav")
  File "/home/user/miniconda/lib/python3.9/site-packages/modelscope/pipelines/audio/funasr_pipeline.py", line 73, in __call__
    output = self.model(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/modelscope/models/base/base_model.py", line 35, in __call__
    return self.postprocess(self.forward(*args, **kwargs))
  File "/home/user/miniconda/lib/python3.9/site-packages/modelscope/models/audio/funasr/model.py", line 61, in forward
    output = self.model.generate(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 248, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 394, in inference_with_vad
    results = self.inference(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 285, in inference
    res = model.inference(**batch, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/model.py", line 996, in inference
    nbest_hyps = self.beam_search(
  File "/home/user/miniconda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/beam_search.py", line 410, in forward
    best = self.search(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/beam_search.py", line 309, in search
    scores, states = self.score_full(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/beam_search.py", line 176, in score_full
    scores[k], states[k] = d.score(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/scama/decoder.py", line 419, in score
    logp, state = self.forward_one_step(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/scama/decoder.py", line 457, in forward_one_step
    x = torch.cat((x, pre_acoustic_embeds), dim=-1)
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 1 but got size 2 for tensor number 1 in the list.
```
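For context, the `RuntimeError` comes from the `torch.cat` call in the innermost frame: when concatenating along one dimension, all other dimensions must match across the inputs. The same constraint illustrated with numpy (`torch.cat` behaves the same way; the shapes here are made up to mirror the batch-size-1 vs batch-size-2 mismatch):

```python
import numpy as np

# Concatenating along the last axis requires all other axes to agree.
a = np.zeros((1, 5, 8))  # batch size 1
b = np.zeros((2, 5, 4))  # batch size 2 -> mismatch in axis 0
try:
    np.concatenate((a, b), axis=-1)
except ValueError as err:
    print("concatenate failed:", err)
```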
The code I used:

```python
from funasr import AutoModel

path_asr = "iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online"
path_vad = "iic/speech_fsmn_vad_zh-cn-16k-common-pytorch"
path_punc = "iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"

model = AutoModel(
    model=path_asr,
    vad_model=path_vad,
    vad_model_revision="v2.0.4",
    punc_model=path_punc,
    punc_model_revision="v2.0.4",
)

res = model.generate(
    input="/data/tts/sovits/audio_res/e1/12_4.wav"  # Failed
    # input="/data/tts/sovits/audio_res/e1/12_12.wav"  # Success
)
print(res)
```
Here is the audio file I used:
Desktop.zip
The audio file that fails in my code is processed successfully by the modelscope online demo: https://www.modelscope.cn/models/iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online/summary.
Maybe a recent update broke some functionality?
After some digging, I found that UniASR seems unable to handle batch sizes greater than 1. When the VAD model is enabled and splits the audio into pieces, the error is triggered.
A temporary workaround is to disable the VAD model.
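A minimal sketch of that workaround, reusing the ASR model ID from the snippet above (punctuation is left out here, since it can be applied manually afterwards):

```python
from funasr import AutoModel

path_asr = "iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online"

# No vad_model argument: the whole file is decoded as one utterance,
# avoiding the batched (size > 1) path that triggers the shape mismatch.
model = AutoModel(model=path_asr)
res = model.generate(input="/data/tts/sovits/audio_res/e1/12_4.wav")
print(res)
```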
@kexul Do you mean this is a problem with the VAD model, and that simply not using VAD is enough?
> @kexul Do you mean this is a problem with the VAD model, and that simply not using VAD is enough?

Yes. On my side everything runs once I turn VAD off; you can give it a try.
@kexul Thanks, I'll give it a try.
@linrb685 If you still want VAD and punctuation, you can run them manually 🤣:

```python
import soundfile
from pathlib import Path
from funasr import AutoModel

path_asr = "iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online"
path_vad = "iic/speech_fsmn_vad_zh-cn-16k-common-pytorch"
path_punc = "iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"

model = AutoModel(model=path_asr)
vad_model = AutoModel(model=path_vad)
punc_model = AutoModel(model=path_punc)

for item in Path('.').glob('*.wav'):
    print(str(item))
    text = model.generate(input=str(item))[0]['text']
    print(text)

    # VAD returns a list of [start_ms, end_ms] segments
    res_vad = vad_model.generate(input=str(item))[0]['value']
    wav, sr = soundfile.read(str(item))

    full_text = []
    for span in res_vad:
        # convert millisecond timestamps to sample indices
        wav_span = wav[int(span[0] * sr / 1000):int(span[1] * sr / 1000)]
        soundfile.write('temp.wav', wav_span, sr)
        text = model.generate(input='temp.wav')[0]['text']
        full_text.append(text)

    full_text = ' '.join(full_text)
    punc_text = punc_model.generate(input=full_text)[0]['text']
    print(punc_text)
```
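The slicing in the loop above converts the VAD's millisecond timestamps into sample indices. A quick check of the arithmetic at 16 kHz (the span values here are made up for illustration):

```python
sr = 16000                         # sample rate in Hz
span = [1070, 2530]                # hypothetical VAD segment in milliseconds
start = int(span[0] * sr / 1000)   # 1070 ms -> sample 17120
end = int(span[1] * sr / 1000)     # 2530 ms -> sample 40480
print(start, end)
```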
@kexul Thanks. VAD isn't essential for us, though we may consider adding it. I haven't hit the issue yet; it needs more testing.