Comments (9)

LauraGPT avatar LauraGPT commented on June 25, 2024

Please post the detailed error log and upload the wav file.

from funasr.

kexul avatar kexul commented on June 25, 2024

I hit the same issue when using the Cantonese model. Here is the full log, @LauraGPT:

Sizes of tensors must match except in dimension 2. Expected size 1 but got size 2 for tensor number 1 in the list.
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/scama/decoder.py", line 457, in forward_one_step
    x = torch.cat((x, pre_acoustic_embeds), dim=-1)
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/scama/decoder.py", line 419, in score
    logp, state = self.forward_one_step(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/beam_search.py", line 176, in score_full
    scores[k], states[k] = d.score(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/beam_search.py", line 309, in search
    scores, states = self.score_full(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/beam_search.py", line 410, in forward
    best = self.search(
  File "/home/user/miniconda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/model.py", line 996, in inference
    nbest_hyps = self.beam_search(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 285, in inference
    res = model.inference(**batch, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 394, in inference_with_vad
    results = self.inference(
  File "/home/user/miniconda/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 248, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/home/user/miniconda/lib/python3.9/site-packages/modelscope/models/audio/funasr/model.py", line 61, in forward
    output = self.model.generate(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.9/site-packages/modelscope/models/base/base_model.py", line 35, in __call__
    return self.postprocess(self.forward(*args, **kwargs))
  File "/home/user/miniconda/lib/python3.9/site-packages/modelscope/pipelines/audio/funasr_pipeline.py", line 73, in __call__
    output = self.model(*args, **kwargs)
  File "/data/tts/sovits/GPT-SoVITS/tools/asr/funasr_cantonese.py", line 35, in <module>
    rec_result = inference_pipeline(input="/data/tts/sovits/audio_res/e1/12_4.wav")
  File "/home/user/miniconda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/user/miniconda/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 1 but got size 2 for tensor number 1 in the list.

Code I used:

from funasr import AutoModel

path_asr  =  "iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online"
path_vad  =  "iic/speech_fsmn_vad_zh-cn-16k-common-pytorch"
path_punc =  "iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"

model = AutoModel(
    model               = path_asr,
    vad_model           = path_vad,
    vad_model_revision  = "v2.0.4",
    punc_model          = path_punc,
    punc_model_revision = "v2.0.4",
)



res = model.generate(
    input="/data/tts/sovits/audio_res/e1/12_4.wav"              # Failed 
    # input="/data/tts/sovits/audio_res/e1/12_12.wav"         # Success
)
print(res)

Here is the audio file I used:
Desktop.zip

kexul avatar kexul commented on June 25, 2024

The audio file that fails with my code is processed successfully by the online demo on ModelScope: https://www.modelscope.cn/models/iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online/summary.
Maybe a recent update broke some functionality?

kexul avatar kexul commented on June 25, 2024

After some digging, I found that UniASR does not seem to handle batch size > 1. When the VAD model is enabled and splits the audio into pieces, the error is triggered.
A temporary workaround is to disable the VAD model.
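A minimal sketch of that workaround, using only the `AutoModel(model=...)` and `generate(input=...)` calls that already appear in this thread. The helper names `build_asr_without_vad` and `transcribe` are hypothetical, and the `[0]["text"]` result shape is assumed from the snippets posted here rather than from the FunASR documentation.

```python
# Model id copied from the report above.
ASR_MODEL = "iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online"


def build_asr_without_vad(model_id=ASR_MODEL):
    """Build the ASR model alone (hypothetical helper, not part of FunASR)."""
    # Imported lazily so this helper can be defined without funasr installed.
    from funasr import AutoModel

    # No vad_model / punc_model arguments: the audio is not split into
    # segments, so the decoder is not handed several segments at once.
    return AutoModel(model=model_id)


def transcribe(model, wav_path):
    """Return the recognized text for one file.

    Assumes generate() returns a list whose first item has a "text" key,
    as in the snippets in this thread.
    """
    return model.generate(input=wav_path)[0]["text"]
```

Usage would look like `transcribe(build_asr_without_vad(), "12_4.wav")`. Long recordings lose the segmentation that VAD provided, so this is a stopgap rather than a fix.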

linrb685 avatar linrb685 commented on June 25, 2024

@kexul Do you mean this is a problem with the VAD model? Is it enough to just not use VAD?

kexul avatar kexul commented on June 25, 2024

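@kexul Do you mean this is a problem with the VAD model? Is it enough to just not use VAD?

Yes. Once I turned VAD off on my side, everything ran fine. Give it a try.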

linrb685 avatar linrb685 commented on June 25, 2024

@kexul Thanks, I'll give it a try.

kexul avatar kexul commented on June 25, 2024

@linrb685 If you still want vad and punct, you can do them manually 🤣:

import soundfile
from pathlib import Path
from funasr import AutoModel

path_asr  =  "iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online"
path_vad  =  "iic/speech_fsmn_vad_zh-cn-16k-common-pytorch"
path_punc =  "iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"

# Load the three models separately so UniASR only ever sees one segment at a time
model = AutoModel(model=path_asr)
vad_model = AutoModel(model=path_vad)
punc_model = AutoModel(model=path_punc)


for item in Path('.').glob('*.wav'):
    print(str(item))
    # Whole-file recognition without VAD, printed for comparison
    text = model.generate(input=str(item))[0]['text']
    print(text)

    # Each span is treated as [start_ms, end_ms]
    res_vad = vad_model.generate(input=str(item))[0]['value']
    wav, sr = soundfile.read(str(item))

    full_text = []
    for span in res_vad:
        # Convert milliseconds to sample indices and cut out the segment
        wav_span = wav[int(span[0]*sr/1000):int(span[1]*sr/1000)]
        # soundfile.write returns None, so there is nothing to assign
        soundfile.write('temp.wav', wav_span, sr)
        text = model.generate(input='temp.wav')[0]['text']
        full_text.append(text)

    full_text = ' '.join(full_text)

    # Restore punctuation on the concatenated transcript
    punc_text = punc_model.generate(input=full_text)[0]['text']
    print(punc_text)
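The millisecond-to-sample slicing in the loop above can be pulled into a small pure-Python helper, which makes it easier to check on its own. This is only a sketch: the function name `vad_spans_to_sample_ranges` is hypothetical, the `[start_ms, end_ms]` span layout is assumed from the script above, and the clamping and empty-span filtering are additions that the original script does not perform.

```python
def vad_spans_to_sample_ranges(spans_ms, sample_rate, num_samples):
    """Convert [start_ms, end_ms] spans into (start, end) sample index pairs.

    Indices are clamped to [0, num_samples], and spans that end up empty
    are dropped so that no zero-length clip is passed to the recognizer.
    """
    ranges = []
    for start_ms, end_ms in spans_ms:
        start = max(0, int(start_ms * sample_rate / 1000))
        end = min(num_samples, int(end_ms * sample_rate / 1000))
        if end > start:
            ranges.append((start, end))
    return ranges


# Example: 16 kHz audio with 40000 samples (2.5 s)
print(vad_spans_to_sample_ranges([[0, 1000], [1500, 3000]], 16000, 40000))
```

In the loop, each `(start, end)` pair would replace the inline slice, as in `wav[start:end]`.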

linrb685 avatar linrb685 commented on June 25, 2024

@kexul Thanks. VAD is not essential for us, but we may consider adding it. We have not hit the error so far and need to test more.
