Comments (9)
Please show the detailed error logs and upload the wav file.
from funasr.
I got the same issue when using the Cantonese model. Here is the full log, @LauraGPT:
```
Traceback (most recent call last):
  File "/home/user/miniconda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/user/miniconda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/tts/sovits/GPT-SoVITS/tools/asr/funasr_cantonese.py", line 35, in <module>
    rec_result = inference_pipeline(input="/data/tts/sovits/audio_res/e1/12_4.wav")
  File "/home/user/miniconda/lib/python3.9/site-packages/modelscope/pipelines/audio/funasr_pipeline.py", line 73, in __call__
    output = self.model(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/modelscope/models/base/base_model.py", line 35, in __call__
    return self.postprocess(self.forward(*args, **kwargs))
  File "/home/user/miniconda/lib/python3.9/site-packages/modelscope/models/audio/funasr/model.py", line 61, in forward
    output = self.model.generate(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 248, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 394, in inference_with_vad
    results = self.inference(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 285, in inference
    res = model.inference(**batch, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/model.py", line 996, in inference
    nbest_hyps = self.beam_search(
  File "/home/user/miniconda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/beam_search.py", line 410, in forward
    best = self.search(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/beam_search.py", line 309, in search
    scores, states = self.score_full(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/beam_search.py", line 176, in score_full
    scores[k], states[k] = d.score(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/scama/decoder.py", line 419, in score
    logp, state = self.forward_one_step(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/scama/decoder.py", line 457, in forward_one_step
    x = torch.cat((x, pre_acoustic_embeds), dim=-1)
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 1 but got size 2 for tensor number 1 in the list.
```
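For context, the `RuntimeError` comes from the `torch.cat` call in the innermost frame: when concatenating along one dimension, all other dimensions must match across the inputs. The same constraint illustrated with numpy (`torch.cat` behaves the same way; the shapes here are made up to mirror the batch-size-1 vs batch-size-2 mismatch):

```python
import numpy as np

# Concatenating along the last axis requires all other axes to agree.
a = np.zeros((1, 5, 8))  # batch size 1
b = np.zeros((2, 5, 4))  # batch size 2 -> mismatch in axis 0
try:
    np.concatenate((a, b), axis=-1)
except ValueError as err:
    print("concatenate failed:", err)
```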
The code I used:

```python
from funasr import AutoModel

path_asr = "iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online"
path_vad = "iic/speech_fsmn_vad_zh-cn-16k-common-pytorch"
path_punc = "iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"

model = AutoModel(
    model=path_asr,
    vad_model=path_vad,
    vad_model_revision="v2.0.4",
    punc_model=path_punc,
    punc_model_revision="v2.0.4",
)

res = model.generate(
    input="/data/tts/sovits/audio_res/e1/12_4.wav"  # Failed
    # input="/data/tts/sovits/audio_res/e1/12_12.wav"  # Success
)
print(res)
```
Here is the audio file I used:
Desktop.zip
The audio file that fails in my code is processed successfully by the modelscope online demo: https://www.modelscope.cn/models/iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online/summary.
Maybe a recent update broke some functionality?
After some digging, I found that UniASR seems unable to handle batch sizes greater than 1. When the VAD model is enabled and splits the audio into pieces, the error is triggered.
A temporary workaround is to disable the VAD model.
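A minimal sketch of that workaround, reusing the ASR model ID from the snippet above (punctuation is left out here, since it can be applied manually afterwards):

```python
from funasr import AutoModel

path_asr = "iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online"

# No vad_model argument: the whole file is decoded as one utterance,
# avoiding the batched (size > 1) path that triggers the shape mismatch.
model = AutoModel(model=path_asr)
res = model.generate(input="/data/tts/sovits/audio_res/e1/12_4.wav")
print(res)
```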
@kexul Do you mean this is a problem with the VAD model, and that simply not using VAD is enough?
> @kexul Do you mean this is a problem with the VAD model, and that simply not using VAD is enough?

Yes. On my side everything runs once I turn VAD off; you can give it a try.
@kexul Thanks, I'll give it a try.
@linrb685 If you still want VAD and punctuation, you can run them manually 🤣:

```python
import soundfile
from pathlib import Path
from funasr import AutoModel

path_asr = "iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online"
path_vad = "iic/speech_fsmn_vad_zh-cn-16k-common-pytorch"
path_punc = "iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"

model = AutoModel(model=path_asr)
vad_model = AutoModel(model=path_vad)
punc_model = AutoModel(model=path_punc)

for item in Path('.').glob('*.wav'):
    print(str(item))
    text = model.generate(input=str(item))[0]['text']
    print(text)

    # VAD returns a list of [start_ms, end_ms] segments
    res_vad = vad_model.generate(input=str(item))[0]['value']
    wav, sr = soundfile.read(str(item))

    full_text = []
    for span in res_vad:
        # convert millisecond timestamps to sample indices
        wav_span = wav[int(span[0] * sr / 1000):int(span[1] * sr / 1000)]
        soundfile.write('temp.wav', wav_span, sr)
        text = model.generate(input='temp.wav')[0]['text']
        full_text.append(text)

    full_text = ' '.join(full_text)
    punc_text = punc_model.generate(input=full_text)[0]['text']
    print(punc_text)
```
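The slicing in the loop above converts the VAD's millisecond timestamps into sample indices. A quick check of the arithmetic at 16 kHz (the span values here are made up for illustration):

```python
sr = 16000                         # sample rate in Hz
span = [1070, 2530]                # hypothetical VAD segment in milliseconds
start = int(span[0] * sr / 1000)   # 1070 ms -> sample 17120
end = int(span[1] * sr / 1000)     # 2530 ms -> sample 40480
print(start, end)
```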
@kexul Thanks. VAD isn't essential for us, though we may consider adding it. I haven't hit the issue yet; it needs more testing.