funaudiollm / cosyvoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Home Page: https://funaudiollm.github.io/

License: Apache License 2.0

Python 99.77% Dockerfile 0.23%
audio-generation gpt-4o text-to-speech tts cantonese chatbot chatgpt chinese english fine-grained fine-tuning japanese korean multi-lingual natural-language-generation python cosyvoice cross-lingual voice-cloning

cosyvoice's Introduction

CosyVoice

👉🏻 CosyVoice Demos 👈🏻

[CosyVoice Paper][CosyVoice Studio][CosyVoice Code]

For SenseVoice, visit SenseVoice repo and SenseVoice space.

Roadmap

  • 2024/07

    • Flow matching training support
    • WeTextProcessing support when ttsfrd is not available
    • FastAPI server and client
  • 2024/08

    • Repetition Aware Sampling (RAS) inference for LLM stability
    • Streaming inference mode support, including KV cache and SDPA for RTF optimization
  • 2024/09

    • 50 Hz LLM model which supports 10 languages
  • 2024/10

    • 50 Hz LLaMA-based LLM model which supports LoRA fine-tuning
  • TBD

    • Support for more instruction modes
    • Voice conversion
    • Music generation
    • Training script sample based on Mandarin
    • CosyVoice-500M trained with more multi-lingual data
    • More...

Install

Clone and install

  • Clone the repo
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
# If you failed to clone the submodule due to network failures, run the following command until it succeeds
cd CosyVoice
git submodule update --init --recursive
conda create -n cosyvoice python=3.8
conda activate cosyvoice
# pynini is required by WeTextProcessing; use conda to install it, as it works on all platforms.
conda install -y -c conda-forge pynini==2.1.5
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com

# If you encounter sox compatibility issues
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel

Model download

We strongly recommend that you download our pretrained CosyVoice-300M, CosyVoice-300M-SFT, and CosyVoice-300M-Instruct models, as well as the CosyVoice-ttsfrd resource.

If you are an expert in this field and are only interested in training your own CosyVoice model from scratch, you can skip this step.

# Download the models via the ModelScope SDK
from modelscope import snapshot_download
snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')
snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct')
snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
# Download the models via git; make sure git lfs is installed
mkdir -p pretrained_models
git clone https://www.modelscope.cn/iic/CosyVoice-300M.git pretrained_models/CosyVoice-300M
git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git pretrained_models/CosyVoice-300M-SFT
git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git pretrained_models/CosyVoice-300M-Instruct
git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd

Optionally, you can unzip the ttsfrd resource and install the ttsfrd package for better text normalization performance.

Note that this step is not necessary. If you do not install the ttsfrd package, WeTextProcessing will be used by default.

cd pretrained_models/CosyVoice-ttsfrd/
unzip resource.zip -d .
pip install ttsfrd-0.3.6-cp38-cp38-linux_x86_64.whl
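For reference, a minimal sketch of the default WeTextProcessing fallback (assuming the tn package installed from requirements.txt exposes the Normalizer API shown here; the sample sentence is only illustrative):

# Hedged sketch: Chinese text normalization with WeTextProcessing instead of ttsfrd.
# Assumes WeTextProcessing provides tn.chinese.normalizer.Normalizer with a normalize() method.
from tn.chinese.normalizer import Normalizer

normalizer = Normalizer()
print(normalizer.normalize('共找到114.5万个结果'))  # digits etc. are expanded into spoken Chinese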

Basic Usage

For zero_shot/cross_lingual inference, please use the CosyVoice-300M model. For sft inference, please use the CosyVoice-300M-SFT model. For instruct inference, please use the CosyVoice-300M-Instruct model. First, add third_party/Matcha-TTS to your PYTHONPATH.

export PYTHONPATH=third_party/Matcha-TTS
from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav
import torchaudio

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
# sft usage
print(cosyvoice.list_avaliable_spks())
# change stream=True for chunk stream inference
for i, j in enumerate(cosyvoice.inference_sft('你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?', '中文女', stream=False)):
    torchaudio.save('sft_{}.wav'.format(i), j['tts_speech'], 22050)

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')
# zero_shot usage, <|zh|><|en|><|jp|><|yue|><|ko|> for Chinese/English/Japanese/Cantonese/Korean
prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):
    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], 22050)
# cross_lingual usage
prompt_speech_16k = load_wav('cross_lingual_prompt.wav', 16000)
for i, j in enumerate(cosyvoice.inference_cross_lingual('<|en|>And then later on, fully acquiring that company. So keeping management in line, interest in line with the asset that\'s coming into the family is a reason why sometimes we don\'t buy the whole thing.', prompt_speech_16k, stream=False)):
    torchaudio.save('cross_lingual_{}.wav'.format(i), j['tts_speech'], 22050)

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-Instruct')
# instruct usage, support <laughter></laughter><strong></strong>[laughter][breath]
for i, j in enumerate(cosyvoice.inference_instruct('在面对挑战时,他展现了非凡的<strong>勇气</strong>与<strong>智慧</strong>。', '中文男', 'Theo \'Crimson\', is a fiery, passionate rebel leader. Fights with fervor for justice, but struggles with impulsiveness.', stream=False)):
    torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], 22050)
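
For streaming synthesis, the same calls accept stream=True and yield audio chunks incrementally, as noted in the comments above. A minimal sketch (the chunk handling here is illustrative, not part of the official examples):

# Hedged sketch: chunked streaming inference with the SFT model.
# Assumes each yielded dict contains a 'tts_speech' tensor, as in the examples above.
import torch

chunks = []
for i, j in enumerate(cosyvoice.inference_sft('你好,我是通义生成式语音大模型。', '中文女', stream=True)):
    chunks.append(j['tts_speech'])      # play or forward each chunk as soon as it arrives
speech = torch.concat(chunks, dim=1)    # or stitch the chunks into one waveform
torchaudio.save('sft_stream.wav', speech, 22050)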

Start web demo

You can use our web demo page to get familiar with CosyVoice quickly. We support sft/zero_shot/cross_lingual/instruct inference in the web demo.

Please see the demo website for details.

# change iic/CosyVoice-300M-SFT for sft inference, or iic/CosyVoice-300M-Instruct for instruct inference
python3 webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M

Advanced Usage

For advanced users, we provide training and inference scripts in examples/libritts/cosyvoice/run.sh. You can get familiar with CosyVoice by following this recipe.

Build for deployment

Optionally, if you want to use gRPC or FastAPI for service deployment, you can run the following steps. Otherwise, you can simply skip this section.

cd runtime/python
docker build -t cosyvoice:v1.0 .
# change iic/CosyVoice-300M to iic/CosyVoice-300M-Instruct if you want to use instruct inference
# for grpc usage
docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v1.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime/python/grpc && python3 server.py --port 50000 --max_conc 4 --model_dir iic/CosyVoice-300M && sleep infinity"
cd grpc && python3 client.py --port 50000 --mode <sft|zero_shot|cross_lingual|instruct>
# for fastapi usage
docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v1.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime/python/fastapi && MODEL_DIR=iic/CosyVoice-300M fastapi dev --port 50000 server.py && sleep infinity"
cd fastapi && python3 client.py --port 50000 --mode <sft|zero_shot|cross_lingual|instruct>

Discussion & Communication

You can discuss directly on GitHub Issues.

You can also scan the QR code to join our official DingTalk chat group.

Acknowledge

  1. We borrowed a lot of code from FunASR.
  2. We borrowed a lot of code from FunCodec.
  3. We borrowed a lot of code from Matcha-TTS.
  4. We borrowed a lot of code from AcademiCodec.
  5. We borrowed a lot of code from WeNet.

Disclaimer

The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.

cosyvoice's People

Contributors

aluminumbox, cyz-hzlh, dbink, eltociear, iflamed, junityzhan, lauragpt, passerbya, tyanz, v3ucn, zhihaodu, zhuzizyf


cosyvoice's Issues

Fine-tuning

The training details seem rather sparse. If I want to fine-tune with my own data, how should I go about it?

Error when running python3 webui.py --port 50000 --model_dir speech_tts/CosyVoice-300M after the environment and models are set up

(cosyvoice) root@sincere-gold-5436-6f97dcc54d-85z5s:/home/tom/fssd/CosyVoice# python3 webui.py --port 50000 --model_dir speech_tts/CosyVoice-300M
2024-07-06 23:18:27,962 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2024-07-06 23:18:27,963 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-07-06 23:18:27,996 - modelscope - INFO - Updating the files for the changes of local files, first time updating will take longer time! Please wait till updating done!
2024-07-06 23:18:28,006 - modelscope - INFO - AST-Scanning the path "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope" with the following sub folders ['models', 'metrics', 'pipelines', 'preprocessors', 'trainers', 'msdatasets', 'exporters']
Traceback (most recent call last):
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/ast_utils.py", line 467, in _get_single_file_scan_result
output = self.astScaner.generate_ast(file)
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/ast_utils.py", line 366, in generate_ast
output = self.scan_import(node, show_offsets=False)
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/ast_utils.py", line 165, in scan_import
local_out = _scan_import(el, type(el).name)
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/ast_utils.py", line 134, in _scan_import
return self.scan_import(
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/ast_utils.py", line 152, in scan_import
attr = getattr(node, field)
AttributeError: 'ClassDef' object has no attribute 'type_params'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "webui.py", line 31, in
from cosyvoice.cli.cosyvoice import CosyVoice
File "/home/tom/fssd/CosyVoice/cosyvoice/cli/cosyvoice.py", line 17, in
from modelscope import snapshot_download
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/init.py", line 4, in
from modelscope.utils.import_utils import (LazyImportModule,
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/init.py", line 1, in
from .hub import create_model_if_not_exist, read_config
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/hub.py", line 12, in
from modelscope.utils.config import Config
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/config.py", line 22, in
from modelscope.utils.import_utils import import_modules_from_file
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/import_utils.py", line 380, in
class LazyImportModule(ModuleType):
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/import_utils.py", line 383, in LazyImportModule
AST_INDEX = load_index()
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/ast_utils.py", line 723, in load_index
_update_index(index, files_mtime)
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/ast_utils.py", line 639, in _update_index
updated_index = file_scanner.get_files_scan_results(updated_files)
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/ast_utils.py", line 533, in get_files_scan_results
decorator_list, import_list = self._get_single_file_scan_result(
File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/ast_utils.py", line 470, in _get_single_file_scan_result
raise Exception(
Exception: During ast indexing the file /opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/models/audio/aec/layers/activations.py, a related error excepted in the file /opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/utils/ast_utils.py at line: 152: "attr = getattr(node, field)" with error msg: "AttributeError: 'ClassDef' object has no attribute 'type_params'", please double check the origin file /opt/conda/envs/cosyvoice/lib/python3.8/site-packages/modelscope/models/audio/aec/layers/activations.py to see whether the file is correctly edited.
(cosyvoice) root@sincere-gold-5436-6f97dcc54d-85z5s:/home/tom/fssd/CosyVoice#

English short words inference error

Run the inference code as shown below:

ref_path="female01.wav"
cosyvoice = CosyVoice("pretrained_models/CosyVoice-300M")
# zero_shot usage
prompt_speech_16k = load_wav(ref_path, 16000)
output = cosyvoice.inference_cross_lingual(
    "hello world",
    prompt_speech_16k,
)

The following error is encountered:

Traceback (most recent call last):
  File "my_local_infer.py", line 18, in <module>
    output = cosyvoice.inference_cross_lingual(
  File "/data/CosyVoice/cosyvoice/cli/cosyvoice.py", line 75, in inference_cross_lingual
    return {'tts_speech': torch.concat(tts_speeches, dim=1)}
RuntimeError: torch.cat(): expected a non-empty list of Tensors

Zero-shot/cross-lingual generation with a speech prompt longer than 30 s raises an ONNXRuntimeError

Describe the bug
Zero-shot/cross-lingual generation with a speech prompt longer than 30 s raises an ONNXRuntimeError:
[E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Add node. Name:'/Add_2' Status Message: /Add_2: right operand cannot broadcast on dim 0 LeftShape: {1,1671,1280}, RightShape: {1500,1280} at

speech_token = self.speech_tokenizer_session.run(None, {self.speech_tokenizer_session.get_inputs()[0].name: feat.detach().cpu().numpy(),

Expected behavior
It should raise an explicit exception or trim the speech prompt (for cross-lingual).
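
A possible caller-side workaround until this is handled in the library (a hedged sketch; the 30 s cap and the (1, T) tensor layout of the load_wav output are assumptions based on this report and the README examples):

# Hedged workaround sketch: trim the prompt to 30 s before it reaches the speech tokenizer.
from cosyvoice.utils.file_utils import load_wav

MAX_PROMPT_SECONDS = 30            # assumed limit of the speech tokenizer
prompt_speech_16k = load_wav('long_prompt.wav', 16000)
max_samples = MAX_PROMPT_SECONDS * 16000
if prompt_speech_16k.shape[1] > max_samples:
    prompt_speech_16k = prompt_speech_16k[:, :max_samples]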

pip install error under Windows 10

(cosyvoice) PS J:\githubs\CosyVoice> pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/, https://download.pytorch.org/whl/cu118
Collecting conformer==0.3.2 (from -r requirements.txt (line 2))
Downloading https://mirrors.aliyun.com/pypi/packages/3f/7d/714601ab8d790d77d4158af743895fb999216cb02fc6283ab8e54911a887/conformer-0.3.2-py3-none-any.whl (4.3 kB)
Collecting deepspeed==0.14.2 (from -r requirements.txt (line 3))
Downloading https://mirrors.aliyun.com/pypi/packages/f0/84/a7b8ff287f7e1a5f01a010880b0bc5e58f718eaca784b4be592034eab3de/deepspeed-0.14.2.tar.gz (1.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 663.4 kB/s eta 0:00:00
Preparing metadata (setup.py) ... error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [9 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\mac\AppData\Local\Temp\pip-install-tuexzav1\deepspeed_d61cae8dc0c942e1827f90d08a2215a1\setup.py", line 148, in
assert torch_available, "Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops."
AssertionError: Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops.
[WARNING] Unable to import torch, pre-compiling ops will be disabled. Please visit https://pytorch.org/ to see how to properly install torch on your system.
[WARNING] unable to import torch, please install it if you want to pre-compile any deepspeed ops.
DS_BUILD_OPS=1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Zero Shot Inference Error

When doing zero-shot inference, I uploaded a Chinese MP3 and it threw the error below.
When I trimmed the audio to a shorter length, the error went away.
2024-07-08 17:52:16,887 DEBUG Used_phis: defaultdict(<class 'set'>,
{State(pc_initial=0 nstack_initial=0): set(),
State(pc_initial=186 nstack_initial=0): set(),
State(pc_initial=190 nstack_initial=0): set()})
2024-07-08 17:52:16,888 DEBUG defmap: {}
2024-07-08 17:52:16,888 DEBUG phismap: defaultdict(<class 'set'>, {})
2024-07-08 17:52:16,888 DEBUG changing phismap: defaultdict(<class 'set'>, {})
2024-07-08 17:52:16,888 DEBUG keep phismap: {}
2024-07-08 17:52:16,888 DEBUG new_out: defaultdict(<class 'dict'>, {})
2024-07-08 17:52:16,888 DEBUG ----------------------DONE Prune PHIs-----------------------
2024-07-08 17:52:16,888 DEBUG block_infos State(pc_initial=0 nstack_initial=0):
AdaptBlockInfo(insts=((0, {}), (2, {}), (4, {'res': '$x4.0'}), (6, {'res': '$const6.1'}), (8, {'index': '$const6.1', 'target': '$x4.0', 'res': '$8binary_subscr.2'}), (10, {}), (12, {}), (14, {}), (16, {}), (18, {'res': '$x18.3'}), (20, {'res': '$const20.4'}), (22, {'index': '$const20.4', 'target': '$x18.3', 'res': '$22binary_subscr.5'}), (24, {}), (26, {}), (28, {}), (30, {}), (32, {'op': '+', 'lhs': '$8binary_subscr.2', 'rhs': '$22binary_subscr.5', 'res': '$binop_add32.6'}), (34, {}), (36, {'res': '$const36.7'}), (38, {'res': '$x38.8'}), (40, {'res': '$const40.9'}), (42, {'index': '$const40.9', 'target': '$x38.8', 'res': '$42binary_subscr.10'}), (44, {}), (46, {}), (48, {}), (50, {}), (52, {'op': '*', 'lhs': '$const36.7', 'rhs': '$42binary_subscr.10', 'res': '$binop_mul52.11'}), (54, {}), (56, {'op': '-', 'lhs': '$binop_add32.6', 'rhs': '$binop_mul52.11', 'res': '$binop_sub56.12'}), (58, {}), (60, {'value': '$binop_sub56.12'}), (62, {'res': '$x62.13'}), (64, {'res': '$const64.14'}), (66, {'index': '$const64.14', 'target': '$x62.13', 'res': '$66binary_subscr.15'}), (68, {}), (70, {}), (72, {}), (74, {}), (76, {'res': '$x76.16'}), (78, {'res': '$const78.17'}), (80, {'index': '$const78.17', 'target': '$x76.16', 'res': '$80binary_subscr.18'}), (82, {}), (84, {}), (86, {}), (88, {}), (90, {'op': '-', 'lhs': '$66binary_subscr.15', 'rhs': '$80binary_subscr.18', 'res': '$binop_sub90.19'}), (92, {}), (94, {'res': '$const94.20'}), (96, {'op': '/', 'lhs': '$binop_sub90.19', 'rhs': '$const94.20', 'res': '$binop_truediv96.21'}), (98, {}), (100, {'value': '$binop_truediv96.21'}), (102, {'idx': 0, 'res': '$102load_global.22'}), (104, {}), (106, {}), (108, {}), (110, {}), (112, {}), (114, {'item': '$102load_global.22', 'res': '$114load_attr.24'}), (116, {}), (118, {}), (120, {}), (122, {}), (124, {'res': '$b124.25'}), (126, {}), (128, {}), (130, {'func': '$114load_attr.24', 'args': ['$b124.25'], 'kw_names': None, 'res': '$130call.26'}), (132, {}), (134, {}), (136, {}), (138, {}), (140, {'idx': 0, 'res': '$140load_global.27'}), (142, {}), (144, {}), (146, {}), (148, {}), (150, {}), (152, {'item': '$140load_global.27', 'res': '$152load_attr.29'}), (154, {}), (156, {}), (158, {}), (160, {}), (162, {'res': '$a162.30'}), (164, {}), (166, {}), (168, {'func': '$152load_attr.29', 'args': ['$a162.30'], 'kw_names': None, 'res': '$168call.31'}), (170, {}), (172, {}), (174, {}), (176, {}), (178, {'lhs': '$130call.26', 'rhs': '$168call.31', 'res': '$178compare_op.32'}), (180, {}), (182, {}), (184, {'pred': '$178compare_op.32'})), outgoing_phis={}, blockstack=(), active_try_block=None, outgoing_edgepushed={186: (), 190: ()})
2024-07-08 17:52:16,889 DEBUG block_infos State(pc_initial=186 nstack_initial=0):
AdaptBlockInfo(insts=((186, {'res': '$const186.0'}), (188, {'retval': '$const186.0', 'castval': '$188return_value.1'})), outgoing_phis={}, blockstack=(), active_try_block=None, outgoing_edgepushed={})
2024-07-08 17:52:16,889 DEBUG block_infos State(pc_initial=190 nstack_initial=0):
AdaptBlockInfo(insts=((190, {'res': '$b190.0'}), (192, {'value': '$b190.0', 'res': '$192unary_negative.1'}), (194, {'res': '$a194.2'}), (196, {'op': '/', 'lhs': '$192unary_negative.1', 'rhs': '$a194.2', 'res': '$binop_truediv196.3'}), (198, {}), (200, {'retval': '$binop_truediv196.3', 'castval': '$200return_value.4'})), outgoing_phis={}, blockstack=(), active_try_block=None, outgoing_edgepushed={})
2024-07-08 17:52:16,890 DEBUG label 0:
x = arg(0, name=x) ['x']
$const6.1 = const(int, 1) ['$const6.1']
$8binary_subscr.2 = getitem(value=x, index=$const6.1, fn=) ['$8binary_subscr.2', '$const6.1', 'x']
$const20.4 = const(int, -1) ['$const20.4']
$22binary_subscr.5 = getitem(value=x, index=$const20.4, fn=) ['$22binary_subscr.5', '$const20.4', 'x']
$binop_add32.6 = $8binary_subscr.2 + $22binary_subscr.5 ['$22binary_subscr.5', '$8binary_subscr.2', '$binop_add32.6']
$const36.7 = const(int, 2) ['$const36.7']
$const40.9 = const(int, 0) ['$const40.9']
$42binary_subscr.10 = getitem(value=x, index=$const40.9, fn=) ['$42binary_subscr.10', '$const40.9', 'x']
$binop_mul52.11 = $const36.7 * $42binary_subscr.10 ['$42binary_subscr.10', '$binop_mul52.11', '$const36.7']
a = $binop_add32.6 - $binop_mul52.11 ['$binop_add32.6', '$binop_mul52.11', 'a']
$const64.14 = const(int, 1) ['$const64.14']
$66binary_subscr.15 = getitem(value=x, index=$const64.14, fn=) ['$66binary_subscr.15', '$const64.14', 'x']
$const78.17 = const(int, -1) ['$const78.17']
$80binary_subscr.18 = getitem(value=x, index=$const78.17, fn=) ['$80binary_subscr.18', '$const78.17', 'x']
$binop_sub90.19 = $66binary_subscr.15 - $80binary_subscr.18 ['$66binary_subscr.15', '$80binary_subscr.18', '$binop_sub90.19']
$const94.20 = const(int, 2) ['$const94.20']
b = $binop_sub90.19 / $const94.20 ['$binop_sub90.19', '$const94.20', 'b']
$102load_global.22 = global(np: <module 'numpy' from 'C:\Users\sekkitshi\AppData\Local\miniconda3\envs\cosyvoice\Lib\site-packages\numpy\init.py'>) ['$102load_global.22']
$114load_attr.24 = getattr(value=$102load_global.22, attr=abs) ['$102load_global.22', '$114load_attr.24']
$130call.26 = call $114load_attr.24(b, func=$114load_attr.24, args=[Var(b, pitch.py:429)], kws=(), vararg=None, varkwarg=None, target=None) ['$114load_attr.24', '$130call.26', 'b']
$140load_global.27 = global(np: <module 'numpy' from 'C:\Users\sekkitshi\AppData\Local\miniconda3\envs\cosyvoice\Lib\site-packages\numpy\init.py'>) ['$140load_global.27']
$152load_attr.29 = getattr(value=$140load_global.27, attr=abs) ['$140load_global.27', '$152load_attr.29']
$168call.31 = call $152load_attr.29(a, func=$152load_attr.29, args=[Var(a, pitch.py:428)], kws=(), vararg=None, varkwarg=None, target=None) ['$152load_attr.29', '$168call.31', 'a']
$178compare_op.32 = $130call.26 >= $168call.31 ['$130call.26', '$168call.31', '$178compare_op.32']
bool184 = global(bool: <class 'bool'>) ['bool184']
$184pred = call bool184($178compare_op.32, func=bool184, args=(Var($178compare_op.32, pitch.py:431),), kws=(), vararg=None, varkwarg=None, target=None) ['$178compare_op.32', '$184pred', 'bool184']
branch $184pred, 186, 190 ['$184pred']
label 186:
$const186.0 = const(int, 0) ['$const186.0']
$188return_value.1 = cast(value=$const186.0) ['$188return_value.1', '$const186.0']
return $188return_value.1 ['$188return_value.1']
label 190:
$192unary_negative.1 = unary(fn=, value=b) ['$192unary_negative.1', 'b']
$binop_truediv196.3 = $192unary_negative.1 / a ['$192unary_negative.1', '$binop_truediv196.3', 'a']
$200return_value.4 = cast(value=$binop_truediv196.3) ['$200return_value.4', '$binop_truediv196.3']
return $200return_value.4 ['$200return_value.4']

2024-07-08 17:52:17.2848084 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running Add node. Name:'/Add_2' Status Message: D:\a_work\1\s\onnxruntime\core/providers/cpu/math/element_wise_ops.h:560 onnxruntime::BroadcastIterator::Append axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 1500 by 1932

Traceback (most recent call last):
File "C:\Users\sekkitshi\AppData\Local\miniconda3\envs\cosyvoice\Lib\site-packages\gradio\queueing.py", line 521, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\sekkitshi\AppData\Local\miniconda3\envs\cosyvoice\Lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\sekkitshi\AppData\Local\miniconda3\envs\cosyvoice\Lib\site-packages\gradio\blocks.py", line 1945, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\sekkitshi\AppData\Local\miniconda3\envs\cosyvoice\Lib\site-packages\gradio\blocks.py", line 1513, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\sekkitshi\AppData\Local\miniconda3\envs\cosyvoice\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\sekkitshi\AppData\Local\miniconda3\envs\cosyvoice\Lib\site-packages\anyio_backends_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "C:\Users\sekkitshi\AppData\Local\miniconda3\envs\cosyvoice\Lib\site-packages\anyio_backends_asyncio.py", line 859, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\sekkitshi\AppData\Local\miniconda3\envs\cosyvoice\Lib\site-packages\gradio\utils.py", line 831, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "X:\tmp\CosyVoice_For_Windows-pack\webui.py", line 126, in generate_audio
output = cosyvoice.inference_zero_shot(tts_text, prompt_text, prompt_speech_16k)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "X:\tmp\CosyVoice_For_Windows-pack\cosyvoice\cli\cosyvoice.py", line 63, in inference_zero_shot
model_input = self.frontend.frontend_zero_shot(i, prompt_text, prompt_speech_16k)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "X:\tmp\CosyVoice_For_Windows-pack\cosyvoice\cli\frontend.py", line 120, in frontend_zero_shot
speech_token, speech_token_len = self._extract_speech_token(prompt_speech_16k)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "X:\tmp\CosyVoice_For_Windows-pack\cosyvoice\cli\frontend.py", line 66, in _extract_speech_token
speech_token = self.speech_tokenizer_session.run(None, {self.speech_tokenizer_session.get_inputs()[0].name: feat.detach().cpu().numpy(),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\sekkitshi\AppData\Local\miniconda3\envs\cosyvoice\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run return self._sess.run(output_names, input_feed, run_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Add node. Name:'/Add_2' Status Message: D:\a_work\1\s\onnxruntime\core/providers/cpu/math/element_wise_ops.h:560 onnxruntime::BroadcastIterator::Append axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 1500 by 1932

Chinese ellipsis bug

Example:
"少爷你听话,你不能出去,让小楚去,让他去吧……"

This sentence contains none of the punctuation marks in pounc = ['。', '?', '!', ';', ':', '.', '?', '!', ';'], so no characters are emitted and synthesis fails.

Code: cosyvoice/utils/frontend_utils.py

if lang == "zh":
    pounc = ['。', '?', '!', ';', ':', '.', '?', '!', ';']
else:
    pounc = ['.', '?', '!', ';', ':']
if comma_split:
    pounc.extend([',', ','])
st = 0
utts = []
for i, c in enumerate(text):
    if c in pounc:
        if len(text[st: i]) > 0:
            utts.append(text[st: i] + c)
        if i + 1 < len(text) and text[i + 1] in ['"', '”']:
            tmp = utts.pop(-1)
            utts.append(tmp + text[i + 1])
            st = i + 2
        else:
            st = i + 1
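
A simple workaround on the caller side (a hedged sketch; appending a terminal '。' before synthesis is an assumption, not an official fix):

# Hedged workaround sketch: make sure the input ends with a sentence-final mark,
# so the splitter in cosyvoice/utils/frontend_utils.py emits at least one segment.
pounc = ['。', '?', '!', ';', ':', '.', '?', '!', ';']
text = '少爷你听话,你不能出去,让小楚去,让他去吧……'
if text and text[-1] not in pounc:
    text += '。'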

How to train a new language

I want to train other languages on top of this model architecture and would like to ask:

  1. What data format is required for training? Is there a reference example?
  2. Where can I find the pre-training code for reference?

Typo on demo page

I am writing to inform you about a small typo I noticed on your demo page.

The incorrect part is the Korean original recording.
The incorrect text (as written) reads: “운 선생의 경극을 놓쳤지만, 밤의 리월에는 가볼 만한 곳이 많아.”
It should be corrected to (what the recording actually says): “어디가서 눈을 피하지. 난 괜찮은데, 넌 감기 걸릴지도 모르니.”
These are completely different sentences.

Thank you for sharing your research.

Environment setup complains that Python 3.8 is too old

ERROR: Ignored the following versions that require a different python version: 3.2 Requires-Python >=3.9; 3.2.1 Requires-Python >=3.9; 3.2rc0 Requires-Python >=3.9; 3.3 Requires-Python >=3.10; 3.3rc0 Requires-Python >=3.10; 3.8.0 Requires-Python >=3.9; 3.8.0rc1 Requires-Python >=3.9; 3.8.1 Requires-Python >=3.9; 3.8.2 Requires-Python >=3.9; 3.8.3 Requires-Python >=3.9; 3.8.4 Requires-Python >=3.9; 3.9.0 Requires-Python >=3.9; 3.9.0rc2 Requires-Python >=3.9; 3.9.1 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu==1.16.0 (from versions: none)
ERROR: No matching distribution found for onnxruntime-gpu==1.16.0

pydoc.ErrorDuringImport: problem in cosyvoice.hifigan.generator - ModuleNotFoundError: No module named 'academicodec'


Problem running the code on Ubuntu

pydoc.ErrorDuringImport: problem in cosyvoice.flow.flow_matching - ModuleNotFoundError: No module named 'matcha.models'; 'matcha' is not a package. What should I do about this error?
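
This usually means the Matcha-TTS submodule is not on the import path; as the Basic Usage section notes, third_party/Matcha-TTS must be added to PYTHONPATH. A hedged sketch of doing the same from inside Python (equivalent to the export command in the README):

# Hedged sketch: put the Matcha-TTS submodule on the import path before importing cosyvoice.
import sys
sys.path.insert(0, 'third_party/Matcha-TTS')  # or: export PYTHONPATH=third_party/Matcha-TTS

from cosyvoice.cli.cosyvoice import CosyVoice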

Is synthesis in other languages supported?

From the README it looks like only Chinese and English TTS are supported. Are other languages supported? If so, how do I specify the target language?

ImportError: cannot import name 'Annotated' from 'pydantic.typing'

example.py is copied from the official example.

2024-07-05 16:29:50,876 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2024-07-05 16:29:50,876 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-07-05 16:29:52,012 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 51af4c199ce0493bf05f6e6a4c460b07 and a total number of 980 components indexed
transformer is not installed, please install it if you want to use related modules
Traceback (most recent call last):
File "example.py", line 1, in
from cosyvoice.cli.cosyvoice import CosyVoice
File "/mnt/bigclass/project/cosyvoice/CosyVoice/cosyvoice/cli/cosyvoice.py", line 18, in
from cosyvoice.cli.frontend import CosyVoiceFrontEnd
File "/mnt/bigclass/project/cosyvoice/CosyVoice/cosyvoice/cli/frontend.py", line 23, in
import inflect
File "/mnt/bigclass/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/inflect/init.py", line 77, in
from pydantic.typing import Annotated
ImportError: cannot import name 'Annotated' from 'pydantic.typing' (/mnt/bigclass/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/pydantic/typing.py)

I updated pydantic from 2.7.0 to 2.8.3, but the error persists.
Do I need to add transformers to requirements.txt?

Error during zero-shot TTS for text that does not end with punctuation

In zero-shot mode, if the prompt text or target text does not end with punctuation (e.g. '.', '?', or '!'), the following error occurs:

Traceback (most recent call last):
File "./test.py", line 22, in
output = cosyvoice.inference_zero_shot(target_text, prompt_text, prompt_speech_16k)
File "/root/autodl-tmp/CosyVoice/cosyvoice/cli/cosyvoice.py", line 63, in inference_zero_shot
return {'tts_speech': torch.concat(tts_speeches, dim=1)}
RuntimeError: torch.cat(): expected a non-empty list of Tensors.

Is streaming TTS output supported?

Great work! Is there streaming TTS output like fish-speech, to shorten the response time of a talking digital human?

Could you provide a publicly downloadable address for the Docker image?


speaker

Can it identify the speaker in an audio clip? Is there an interface for that?

In the webui, if the input text contains no terminal punctuation such as '。', no audio can be generated.


Could you provide an Alibaba Cloud mirror? The original image is very hard to download now.


Inference performance is relatively poor and speech synthesis is slow

Describe the bug

Tested on an RTX 4090 (24 GB) server and found inference to be very slow.

Model: CosyVoice-300M-SFT
Input: 虽然技术路线和《Her》有所差别,但从直观效果来看,也算得上是给网友们带来了新的玩具。
Time taken: 5.51 s

Is it possible to make inference faster, please?
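
For reference, a rough way to measure the real-time factor (RTF) with the public API from the Basic Usage section (a hedged sketch; the 22050 Hz output rate is taken from the README examples):

# Hedged sketch: RTF = synthesis time / generated audio duration for one SFT request.
import time
import torch
from cosyvoice.cli.cosyvoice import CosyVoice

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
start = time.time()
chunks = [j['tts_speech'] for j in cosyvoice.inference_sft(
    '虽然技术路线和《Her》有所差别,但从直观效果来看,也算得上是给网友们带来了新的玩具。', '中文女', stream=False)]
speech = torch.concat(chunks, dim=1)
rtf = (time.time() - start) / (speech.shape[1] / 22050)
print(f'RTF: {rtf:.2f}')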

SFT inference with a preset voice skips sentences

SFT inference with a preset voice skips sentences. Voice: 中文女
Input:
那有哪些美剧是不太适合学英语的呢?我来给大家举几个例子吧。第一个《破产姐妹》,我不知道为什么总有人推荐这一部,我先声明一下,我真的很喜欢很喜欢破产姐妹,它真的很下饭。我大学有一段时间就是天天去食堂打包吃的,然后回到宿舍,我就边看边吃,甚至听到她那个片头曲,我就会很有食欲,但是我真的真的没有办法用它来学英语。一个是语速太快了;第二全是开车的台词,你说是生活中、考试试中哪儿会用到?所以我觉得破产姐妹下饭必备,学英语还是算了。

The clause "一个是语速太快了" was not synthesized; it was skipped entirely.

(cosyvoice) E:\CosyVoice_For_Windows-main\CosyVoice_For_Windows-main>install deepspeed from https://github.com/S95Sedan/Deepspeed-Windows/releases/tag/v14.0%2Bpy311 install: target 'https://github.com/S95Sedan/Deepspeed-Windows/releases/tag/v14.0%2Bpy311' is not a directory. What is going on here?


Missing information about spk_id

Thanks to your team for the great work.

I had a problem when using
output = cosyvoice.inference_instruct('在面对挑战时,他展现了非凡的<strong>勇气</strong>与<strong>智慧</strong>。', '中文男', 'Theo \'Crimson\', is a fiery, passionate rebel leader. Fights with fervor for justice, but struggles with impulsiveness.')
There is a spk_id parameter; besides the '中文男' shown above, could you tell me what other values can be used?
Thank you for your help😊
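
The available speaker IDs can be listed with the API used in the Basic Usage section (a hedged sketch; the method name list_avaliable_spks is copied verbatim from the README, and whether the Instruct model ships the same speaker list as the SFT model is an assumption):

# Hedged sketch: print the speaker IDs (spk_id values) bundled with a pretrained model.
from cosyvoice.cli.cosyvoice import CosyVoice

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-Instruct')
print(cosyvoice.list_avaliable_spks())  # e.g. ['中文女', '中文男', ...]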

How do I set up the equivalent of 'sudo apt-get install sox libsox-dev' on Windows?


RuntimeError: GET was unable to find an engine to execute this computation

Describe the bug
python3 webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M

The web page shows an error tip,

and the console error stack is as below:


Additional context
2024-07-08 15:38:37,990 INFO get sft inference request
Traceback (most recent call last):
File "/data/home/xlw/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/gradio/queueing.py", line 521, in process_events
response = await route_utils.call_process_api(
File "/data/home/xlw/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/gradio/route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "/data/home/xlw/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/gradio/blocks.py", line 1945, in process_api
result = await self.call_function(
File "/data/home/xlw/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/gradio/blocks.py", line 1513, in call_function
prediction = await anyio.to_thread.run_sync(
File "/data/home/xlw/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/data/home/xlw/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/data/home/xlw/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/data/home/xlw/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/gradio/utils.py", line 831, in wrapper
response = f(*args, **kwargs)
File "webui.py", line 120, in generate_audio
output = cosyvoice.inference_sft(tts_text, sft_dropdown)
File "/data/home/xlw/workspace/github.com/CosyVoice/cosyvoice/cli/cosyvoice.py", line 51, in inference_sft
model_output = self.model.inference(**model_input)
File "/data/home/xlw/workspace/github.com/CosyVoice/cosyvoice/cli/model.py", line 58, in inference
tts_speech = self.hift.inference(mel=tts_mel).cpu()
File "/data/home/xlw/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data/home/xlw/workspace/github.com/CosyVoice/cosyvoice/hifigan/generator.py", line 391, in inference
return self.forward(x=mel)
File "/data/home/xlw/workspace/github.com/CosyVoice/cosyvoice/hifigan/generator.py", line 348, in forward
x = self.upsi
File "/data/home/xlw/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/data/home/xlw/miniconda3/envs/cosyvoice/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 801, in forward
return F.conv_transpose1d(
RuntimeError: GET was unable to find an engine to execute this computation

DeepSpeed installation failed

Permissions were fully granted, and I also set permissions in miniconda, but the problem still occurs. Has anyone run into the same issue? I checked the directory and the pip-install-6nshc3v0 directory does not exist; it may have already been removed.

(cosy) H:\AIaudio_Live\cosyvoice\CosyVoice>pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/, https://download.pytorch.org/whl/cu118
Collecting conformer==0.3.2 (from -r requirements.txt (line 2))
Using cached https://mirrors.aliyun.com/pypi/packages/3f/7d/714601ab8d790d77d4158af743895fb999216cb02fc6283ab8e54911a887/conformer-0.3.2-py3-none-any.whl (4.3 kB)
Collecting deepspeed==0.14.2 (from -r requirements.txt (line 3))
Using cached https://mirrors.aliyun.com/pypi/packages/f0/84/a7b8ff287f7e1a5f01a010880b0bc5e58f718eaca784b4be592034eab3de/deepspeed-0.14.2.tar.gz (1.3 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
fatal: not a git repository (or any of the parent directories): .git
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\Administrator\AppData\Local\Temp\pip-install-6nshc3v0\deepspeed_b7a9ce2f848f4d0b81eaf02517880495\setup.py", line 222, in
create_dir_symlink('..\..\csrc', '.\deepspeed\ops\csrc')
File "C:\Users\Administrator\AppData\Local\Temp\pip-install-6nshc3v0\deepspeed_b7a9ce2f848f4d0b81eaf02517880495\setup.py", line 214, in create_dir_symlink
os.remove(dest)
PermissionError: [WinError 5] 拒绝访问。: '.\deepspeed\ops\csrc'

I looked at microsoft/DeepSpeed#1189; it seems Windows is not well supported? Do I have to deploy on Linux or use WSL?

Speech cloned with the 3-second rapid cloning has unnatural pauses; is there a way to optimize this?


ttsfrd support for Apple Silicon (arm64)

Is your feature request related to a problem? Please describe.
I want to run CosyVoice on a MacBook with an Apple Silicon CPU.

Describe the solution you'd like
provide ttsfrd-0.3.6-cp38-cp38-linux_arm64.whl


no module named 'matcha.models'.

After installing the environment and downloading the models following the README, I ran export PYTHONPATH=third_party/AcademiCodec:third_party/Matcha-TTS and then the following script:

from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav
import torchaudio

cosyvoice = CosyVoice('speech_tts/CosyVoice-300M-SFT')
# sft usage
print(cosyvoice.list_avaliable_spks())
output = cosyvoice.inference_sft('你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?', '中文女')
torchaudio.save('sft.wav', output['tts_speech'], 22050)

cosyvoice = CosyVoice('speech_tts/CosyVoice-300M')
# zero_shot usage
prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
output = cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k)
torchaudio.save('zero_shot.wav', output['tts_speech'], 22050)

# cross_lingual usage
prompt_speech_16k = load_wav('cross_lingual_prompt.wav', 16000)
output = cosyvoice.inference_cross_lingual('<|en|>And then later on, fully acquiring that company. So keeping management in line, interest in line with the asset that\'s coming into the family is a reason why sometimes we don\'t buy the whole thing.', prompt_speech_16k)
torchaudio.save('cross_lingual.wav', output['tts_speech'], 22050)

cosyvoice = CosyVoice('speech_tts/CosyVoice-300M-Instruct')
# instruct usage
output = cosyvoice.inference_instruct('在面对挑战时,他展现了非凡的勇气智慧。', '中文男', 'Theo \'Crimson\', is a fiery, passionate rebel leader. Fights with fervor for justice, but struggles with impulsiveness.')
torchaudio.save('instruct.wav', output['tts_speech'], 22050)

It then fails with the following error:

Traceback (most recent call last):
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/pydoc.py", line 343, in safeimport
module = import(path)
File "/media/user/E/CosyVoice-main/cosyvoice/flow/flow_matching.py", line 16, in
from matcha.models.components.flow_matching import BASECFM
ModuleNotFoundError: No module named 'matcha.models'; 'matcha' is not a package

During handling of the above exception, another exception occurred:
cosyvoice = CosyVoice('speech_tts/CosyVoice-300M-SFT')
File "/media/user/E/CosyVoice-main/cosyvoice/cli/cosyvoice.py", line 29, in init
configs = load_hyperpyyaml(f)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/hyperpyyaml/core.py", line 188, in load_hyperpyyaml
hparams = yaml.load(yaml_stream, Loader=loader)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/yaml/init.py", line 81, in load
return loader.get_single_data()
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 116, in get_single_data
return self.construct_document(node)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 120, in construct_document
data = self.construct_object(node)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 147, in construct_object
data = self.construct_non_recursive_object(node)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 188, in construct_non_recursive_object
for _dummy in generator:
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 633, in construct_yaml_map
value = self.construct_mapping(node)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 429, in construct_mapping
return BaseConstructor.construct_mapping(self, node, deep=deep)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 244, in construct_mapping
value = self.construct_object(value_node, deep=deep)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 147, in construct_object
data = self.construct_non_recursive_object(node)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 183, in construct_non_recursive_object
data = constructor(self, tag_suffix, node)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/hyperpyyaml/core.py", line 480, in _construct_object
args, kwargs = _load_node(loader, node)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/hyperpyyaml/core.py", line 434, in _load_node
kwargs = loader.construct_mapping(node, deep=True)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 429, in construct_mapping
return BaseConstructor.construct_mapping(self, node, deep=deep)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 244, in construct_mapping
value = self.construct_object(value_node, deep=deep)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 147, in construct_object
data = self.construct_non_recursive_object(node)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 183, in construct_non_recursive_object
data = constructor(self, tag_suffix, node)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/hyperpyyaml/core.py", line 470, in construct_object
callable
= pydoc.locate(callable_string)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/pydoc.py", line 1626, in locate
nextmodule = safeimport('.'.join(parts[:n+1]), forceload)
File "/home/user/anaconda3/envs/cosyvoice/lib/python3.8/pydoc.py", line 358, in safeimport
raise ErrorDuringImport(path, sys.exc_info())
pydoc.ErrorDuringImport: problem in cosyvoice.flow.flow_matching - ModuleNotFoundError: No module named 'matcha.models'; 'matcha' is not a package
