lenml / chattts-forge

🍦 ChatTTS-Forge is a project built around a TTS generation model; it implements an API server and a Gradio-based WebUI.

Home Page: https://huggingface.co/spaces/lenML/ChatTTS-Forge

License: GNU Affero General Public License v3.0

Python 89.82% JavaScript 8.09% HTML 0.27% Dockerfile 0.08% Jupyter Notebook 0.64% CSS 1.10%

chattts ssml tts chattts-forge agent gpt llm text-to-speech colab llama

chattts-forge's Issues

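Process exits frequently for an unknown reason

The API works normally, but in the WebUI every time I switch the style or the speaker, or enter new text, the process exits, and the log contains no useful hint.
The log is as follows: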

2024-06-11 11:54:05,705 - modules.models - INFO - ChatTTS models loaded
2024-06-11 11:54:05,722 - modules.generate_audio - INFO - ('spk', 'female2')
2024-06-11 11:54:05,722 - modules.generate_audio - INFO - {'text': ['公司的年度总结会议将在下周三举行，请各部门提前准备好相关材料，确保会议顺利进行 [lbreak] '], 'infer_seed': 42, 'temperature': 0.3, 'top_P': 0.7, 'top_K': 20, 'prompt1': '', 'prompt2': '', 'prefix': ''}
0%| | 0/2048 [00:00<?, ?it/s]C:\Users\embra.conda\envs\ChatTTS-Forge\Lib\site-packages\transformers\models\llama\modeling_llama.py:649: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
100%|██████████| 2048/2048 [00:05<00:00, 395.60it/s]
2024-06-11 11:54:11,176 - modules.repos_static.resemble_enhance.hparams - INFO - Reading hparams from C:\0_AI_0\ChatTTS-Forge\20\models\resemble-enhance\hparams.yaml
2024-06-11 11:54:11,735 - modules.repos_static.resemble_enhance.enhancer.lcfm.lcfm - INFO - Freeze ae (encoder and decoder)
100%|██████████| 1/1 [00:02<00:00, 2.12s/it]
2024-06-11 11:54:15,278 - modules.repos_static.resemble_enhance.inference - INFO - Elapsed time: 2.198 s, 170.603 kHz

Process finished with exit code 0

Suggestion: add a workers parameter to increase concurrency

Can streaming inference be supported for audio generation?

API endpoint error

API endpoint: http://localhost:8000/v1/audio/speech

Error:

2024-06-10 08:42:17,204 - modules.generate_audio - INFO - ('spk', 'female2')
2024-06-10 08:42:17,204 - modules.generate_audio - INFO - {'text': ['string'], 'infer_seed': 42, 'temperature': 0.3, 'top_P': 0.7, 'top_K': 20, 'prompt1': '', 'prompt2': '', 'prefix': ''}
0%| | 0/2048 [00:00<?, ?it/s]C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py:649: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
100%|████████████████████████████████████████████████████████████████████████████| 2048/2048 [00:01<00:00, 1322.41it/s]
C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\vocos\heads.py:67: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at ..\aten\src\ATen\EmptyTensor.cpp:42.)
S = mag * (x + 1j * y)
INFO: 127.0.0.1:4774 - "POST /v1/audio/speech HTTP/1.1" 200 OK
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 408, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\uvicorn\middleware\proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\fastapi\applications.py", line 1106, in __call__
await super().__call__(scope, receive, send)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\middleware\errors.py", line 184, in __call__
raise exc
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\middleware\errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\middleware\cors.py", line 91, in __call__
await self.simple_response(scope, receive, send, request_headers=headers)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\middleware\cors.py", line 146, in simple_response
await self.app(scope, receive, send)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
raise exc
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 20, in __call__
raise e
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 17, in __call__
await self.app(scope, receive, send)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\routing.py", line 276, in handle
await self.app(scope, receive, send)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\routing.py", line 69, in app
await response(scope, receive, send)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\responses.py", line 270, in __call__
async with anyio.create_task_group() as task_group:
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 597, in __aexit__
raise exceptions[0]
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\responses.py", line 273, in wrap
await func()
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\responses.py", line 262, in stream_response
async for chunk in self.body_iterator:
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\concurrency.py", line 63, in iterate_in_threadpool
yield await anyio.to_thread.run_sync(_next, iterator)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
result = context.run(func, *args)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\concurrency.py", line 53, in _next
return next(iterator)
TypeError: '_TemporaryFileWrapper' object is not an iterator
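
The final `TypeError` suggests that a temporary file object reached the streaming response as its body iterator: the traceback ends in `next(iterator)`, and the wrapper returned by `tempfile.NamedTemporaryFile` is iterable but is not itself an iterator. One possible workaround, sketched here with the standard library only, is to wrap the file in a generator that yields fixed-size chunks. The helper name `iter_file_chunks` and the chunk size are illustrative assumptions, not code from the project.

```python
import tempfile
from typing import BinaryIO, Iterator


def iter_file_chunks(file_obj: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the remaining content of file_obj in chunks of at most chunk_size bytes."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# A generator is a true iterator, so next() works on it.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"RIFF" + b"\x00" * 100)  # stand-in for generated audio bytes
    tmp.seek(0)
    chunks = list(iter_file_chunks(tmp, chunk_size=32))
```

A generator like this could be passed wherever the file object is currently passed, assuming the endpoint really does hand the temporary file to the response directly.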

Optimize `audio_data_to_segment` Function to Reduce Processing Time by ~2000ms

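Read README.md and dependencies.md

I have read README.md and dependencies.md

Search issues and discussions

I have confirmed that no previous issue or discussion covers this bug

Check the Forge version

I have confirmed that the problem occurs in the latest code or a stable release

Your issue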

Issue Description:

Summary:

The current implementation of the audio_data_to_segment function in code/ChatTTS-Forge/modules/SynthesizeSegments.py is inefficient and results in significant processing time. By optimizing the function, we can reduce the processing time by approximately 2000 milliseconds.

Current Implementation:

The current function converts audio data to a byte stream and then reads it back into an AudioSegment object, which is time-consuming.

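import io
from pydub import AudioSegment
from scipy.io.wavfile import write  # assumed origin of write(); the issue omits the imports

def audio_data_to_segment(audio_data, sr):
    # Serialize the samples to an in-memory WAV file
    byte_io = io.BytesIO()
    write(byte_io, rate=sr, data=audio_data)
    byte_io.seek(0)

    # Parse the WAV bytes back into an AudioSegment
    return AudioSegment.from_file(byte_io, format="wav")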

Proposed Optimization:

The optimized function ensures the audio data is in the correct format and directly creates an AudioSegment object from the byte data, significantly reducing the processing time.

import numpy as np
from pydub import AudioSegment

def audio_data_to_segment(audio_data, sr):
    # Clip to [-1.0, 1.0] so out-of-range floats cannot overflow, then convert float32 to int16
    audio_data = (np.clip(audio_data, -1.0, 1.0) * 32767).astype(np.int16)
    audio_segment = AudioSegment(
        audio_data.tobytes(),
        frame_rate=sr,
        sample_width=audio_data.dtype.itemsize,  # 2 bytes for int16
        channels=1,  # Assuming mono audio
    )
    return audio_segment

Performance Improvement:

Testing has shown that the optimized function can reduce the processing time by nearly 2000 milliseconds, making the system more efficient and responsive.

Action Required:

Please review the proposed changes and consider integrating the optimized function into the project to improve performance.

Thank you for your attention to this matter.
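
The proposed conversion relies on the samples being floats in the range [-1.0, 1.0]. As a hedged illustration of that scaling step, the sketch below performs it with the standard library only, clipping out-of-range values first, and wraps the result in a WAV container to confirm that a 2-byte sample width is accepted. The helper name `floats_to_pcm16` and the 24000 Hz sample rate are assumptions made for the example.

```python
import array
import io
import sys
import wave

SAMPLE_RATE = 24000  # assumed sample rate, for illustration only


def floats_to_pcm16(samples):
    """Convert floats to little-endian 16-bit PCM bytes, clipping to [-1.0, 1.0] first."""
    pcm = array.array("h", (int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return pcm.tobytes()


pcm_bytes = floats_to_pcm16([0.0, 0.5, 1.0, -1.0, 2.0])  # 2.0 is clipped to 1.0

# Wrap the raw PCM in a WAV container; sample width 2 matches int16.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(pcm_bytes)
wav_bytes = buffer.getvalue()
```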

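[ISSUE] The voice jumps in long text: in a 30 s clip, the first 20 s use one voice and the rest another

Read README.md and dependencies.md

I have read README.md and dependencies.md

Search issues and discussions

I have confirmed that no previous issue or discussion covers this bug

Check the Forge version

I have confirmed that the problem occurs in the latest code or a stable release

Your issue

[ISSUE] In the WebUI, the voice jumps in long text: in a 30 s clip, the first 20 s use one voice and the rest another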

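Problems remain when denoising and enhancement are enabled

With enhancement enabled, the model cannot be downloaded from the console; I opened dl_enhance.py, copied the link, and downloaded it manually. Enhancement is also quite slow: it took five to ten minutes to generate the sample.
With denoising enabled, generation does not work at all: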
Traceback (most recent call last):
File "D:\BaiduNetdiskDownload\ChatTTS-Forge\venv\lib\site-packages\gradio\queueing.py", line 532, in process_events
response = await route_utils.call_process_api(
File "D:\BaiduNetdiskDownload\ChatTTS-Forge\venv\lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "D:\BaiduNetdiskDownload\ChatTTS-Forge\venv\lib\site-packages\gradio\blocks.py", line 1923, in process_api
result = await self.call_function(
File "D:\BaiduNetdiskDownload\ChatTTS-Forge\venv\lib\site-packages\gradio\blocks.py", line 1509, in call_function
prediction = await anyio.to_thread.run_sync(
File "D:\BaiduNetdiskDownload\ChatTTS-Forge\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "D:\BaiduNetdiskDownload\ChatTTS-Forge\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "D:\BaiduNetdiskDownload\ChatTTS-Forge\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
result = context.run(func, *args)
File "D:\BaiduNetdiskDownload\ChatTTS-Forge\venv\lib\site-packages\gradio\utils.py", line 832, in wrapper
response = f(*args, **kwargs)
File "D:\BaiduNetdiskDownload\ChatTTS-Forge\modules\webui\webui_utils.py", line 221, in tts_generate
audio_data, sample_rate = apply_audio_enhance(
File "D:\BaiduNetdiskDownload\ChatTTS-Forge\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\BaiduNetdiskDownload\ChatTTS-Forge\modules\webui\webui_utils.py", line 102, in apply_audio_enhance
tensor, sr = enhancer.denoise(tensor, sr)
File "D:\BaiduNetdiskDownload\ChatTTS-Forge\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
TypeError: ResembleEnhance.denoise() missing 1 required positional argument: 'device'

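Testing the new requirements.dev.txt shows that the Python version and the pinned packages conflict; not sure whether this is an isolated case

Read README.md and dependencies.md

I have read README.md and dependencies.md

Search issues and discussions

I have confirmed that no previous issue or discussion covers this bug

Check the Forge version

I have confirmed that the problem occurs in the latest code or a stable release

Your issue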

ERROR: Ignored the following versions that require a different python version: 3.3 Requires-Python >=3.10; 3.3rc0 Requires-Python >=3.10
ERROR: Could not find a version that satisfies the requirement networkx==3.3 (from versions: 0.34, 0.35, 0.35.1, 0.36, 0.37, 0.99, 1.0rc1, 1.0, 1.0.1, 1.1, 1.2rc1, 1.2, 1.3rc1, 1.3, 1.4rc1, 1.4, 1.5rc1, 1.5, 1.6rc1, 1.6, 1.7rc1, 1.7, 1.8rc1, 1.8, 1.8.1, 1.9rc1, 1.9, 1.9.1, 1.10rc2, 1.10, 1.11rc1, 1.11rc2, 1.11, 2.0, 2.1, 2.2rc1, 2.2, 2.3rc3, 2.3rc4, 2.3, 2.4rc1, 2.4rc2, 2.4, 2.5rc1, 2.5, 2.5.1, 2.6rc1, 2.6rc2, 2.6, 2.6.1, 2.6.2, 2.6.3, 2.7rc1, 2.7, 2.7.1, 2.8rc1, 2.8, 2.8.1rc1, 2.8.1, 2.8.2, 2.8.3, 2.8.4, 2.8.5, 2.8.6, 2.8.7, 2.8.8, 3.0b1, 3.0rc1, 3.0, 3.1rc0, 3.1, 3.2rc0, 3.2, 3.2.1)
ERROR: No matching distribution found for networkx==3.3
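
The pip output above states that networkx 3.3 declares `Requires-Python >=3.10`, so the pinned requirement cannot be installed on an older interpreter. A small pre-install check along these lines could surface the mismatch earlier. This is only a sketch: the function name `python_is_supported` is hypothetical, and the minimum version is taken from the error message rather than from the project's documentation.

```python
import sys

MIN_PYTHON = (3, 10)  # from "Requires-Python >=3.10" in the pip output above


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter's major.minor version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} is too old; "
        f"networkx==3.3 needs {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer."
    )
```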

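Question about adding speakers

Hi, I'd like to ask how the speakers under the data path were added. Was a VQ encoder used to convert the speaker's audio into an embedding that is then saved as a .pt file, or is there some other method? Thanks.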

After today's update: 'ValueError: high is out of bounds for int32'

After updating today I get an error. The log is as follows:
(Chattts-ui) PS C:\0_AI_0\ChatTTS-Forge\10> python webui.py --half
device use cuda
Traceback (most recent call last):
File "C:\0_AI_0\ChatTTS-Forge\10\webui.py", line 19, in <module>
from modules.synthesize_audio import synthesize_audio
File "C:\0_AI_0\ChatTTS-Forge\10\modules\synthesize_audio.py", line 4, in <module>
from modules.SynthesizeSegments import SynthesizeSegments, combine_audio_segments
File "C:\0_AI_0\ChatTTS-Forge\10\modules\SynthesizeSegments.py", line 57, in <module>
class SynthesizeSegments:
File "C:\0_AI_0\ChatTTS-Forge\10\modules\SynthesizeSegments.py", line 58, in SynthesizeSegments
batch_default_spk_seed = int(np.random.randint(0, 2**32 - 1))
File "numpy\random\mtrand.pyx", line 780, in numpy.random.mtrand.RandomState.randint
File "numpy\random\_bounded_integers.pyx", line 1423, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32

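I asked Gemini, and it explained:
"The ValueError: high is out of bounds for int32 error you encountered in the ChatTTS-ui project originates from line 58 of C:\0_AI_0\ChatTTS-Forge\10\modules\SynthesizeSegments.py. The code tries to generate a random integer (batch_default_spk_seed) with np.random.randint(0, 2**32 - 1). However, the largest value a 32-bit integer (int32) can represent is 2,147,483,647. The code tries to generate a number as large as 2**32 - 1 (that is, 4,294,967,295), which exceeds the int32 limit."

For now I changed the code as follows:
batch_default_spk_seed = int(np.random.randint(0, 2**32 - 1, dtype=np.int64))

The program now runs without problems.

Just reporting this; I won't submit a patch.

Happy coding!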
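
The traceback shows NumPy taking its `_rand_int32` path, where the upper bound cannot exceed the signed 32-bit maximum. Besides passing `dtype=np.int64` as in the report, another option is to draw the seed with Python's arbitrary-precision integers. The sketch below assumes nothing about the project beyond the range used in the original line; `make_seed` is a hypothetical helper name.

```python
import random

INT32_MAX = 2**31 - 1   # largest value a signed 32-bit integer can hold
SEED_UPPER = 2**32 - 1  # exclusive upper bound used in the original line


def make_seed(rng=random):
    """Return a seed in [0, 2**32 - 1) using Python's unbounded integers."""
    return rng.randrange(0, SEED_UPPER)


seed = make_seed()
```

Because `SEED_UPPER` is larger than `INT32_MAX`, any approach that stores the bound in a 32-bit integer fails, which matches the error in the log.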

Question about the OpenAI-format API endpoint

Your project's documentation says: OpenAI-style API: /v1/openai/audio/speech provides a speech generation endpoint similar to OpenAI's.
But the official documentation uses the form https://api.openai.com/v1/audio/speech

Error log:
INFO: 192.168.1.188:61456 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
INFO: 192.168.1.188:61450 - "POST /v1/openai/audio/speech HTTP/1.1" 200 OK
Model and voice used in the request:
"model": "chattts-4w",
"voice": "female2",

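The log above shows `/v1/audio/speech` returning 404 and `/v1/openai/audio/speech` returning 200, so a client has to target the prefixed path. The following sketch builds such a request with the standard library without sending it. The host and port, the helper name `build_speech_request`, and the `input` field (borrowed from OpenAI's request format) are assumptions; only the path, model, and voice come from the report.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"       # assumed host and port
SPEECH_PATH = "/v1/openai/audio/speech"  # the path that returned 200 in the log


def build_speech_request(text, model="chattts-4w", voice="female2"):
    """Build (but do not send) a POST request for the OpenAI-style speech endpoint."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        BASE_URL + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_speech_request("Hello")
# To send it against a running server: urllib.request.urlopen(request).read()
```
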
no file mp_rank_00_model_states.pt

Where can mp_rank_00_model_states.pt be downloaded? I can't find it anywhere.

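Why does a single long text produce several different voices

My long text comes out sounding like a conversation between different people, and I don't know which setting is wrong.
My configuration is as follows: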

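Suggestion: follow Google's conventions for imports

Read README.md and dependencies.md

I have read README.md and dependencies.md

Search issues and discussions

I have confirmed that no previous issue or discussion covers this bug

Check the Forge version

I have confirmed that the problem occurs in the latest code or a stable release

Your issue

Google's Python style guide

Could the code follow its recommendations?

As it stands, the code is uncomfortable to read however I look at it.

Of course, this is only a refinement, purely a personal compulsion.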

The /v1/openai/audio/speech endpoint has a bug

Describe the bug
500 error

Inference Seed return error in webui

The first press of the button is normal, but pressing it again will return an error.

Traceback (most recent call last):
File "D:\ChatTTS-Forge\venv\lib\site-packages\gradio\queueing.py", line 521, in process_events
response = await route_utils.call_process_api(
File "D:\ChatTTS-Forge\venv\lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "D:\ChatTTS-Forge\venv\lib\site-packages\gradio\blocks.py", line 1941, in process_api
inputs = await self.preprocess_data(
File "D:\ChatTTS-Forge\venv\lib\site-packages\gradio\blocks.py", line 1655, in preprocess_data
processed_input.append(block.preprocess(inputs_cached))
File "D:\ChatTTS-Forge\venv\lib\site-packages\gradio\components\number.py", line 118, in preprocess
elif self.minimum is not None and payload < self.minimum:
TypeError: '<' not supported between instances of 'str' and 'int'
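
The traceback shows Gradio's number component comparing a `str` payload against an integer minimum. One defensive approach, sketched here without Gradio, is to coerce the seed to an integer before any range check. The helper name `coerce_seed`, the fallback of 42 (the `infer_seed` value seen in the logs on this page), and the minimum of -1 are all illustrative assumptions.

```python
def coerce_seed(value, default=42, minimum=-1):
    """Turn a seed that may arrive as str, float, or None into an int not below minimum."""
    try:
        seed = int(float(value))
    except (TypeError, ValueError, OverflowError):
        # None, empty or non-numeric strings, NaN, and infinity fall back to the default
        return default
    return max(seed, minimum)


examples = [coerce_seed("123"), coerce_seed(""), coerce_seed(None), coerce_seed(7.0)]
```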

Could you add versioned releases and a changelog, so it is clearer what each update changed?

As the title says.

Hi, enabling enhancement or denoising raises an error.

rompt1': '', 'prompt2': '', 'prefix': ''}
100%|█████████████████████████████████████████████████████████████████████████████| 2048/2048 [00:13<00:00, 151.70it/s]
INFO:modules.repos_static.resemble_enhance.hparams:Reading hparams from D:\AI\ChatTTS-Forge\models\resemble-enhance\hparams.yaml
INFO:modules.repos_static.resemble_enhance.enhancer.lcfm.lcfm:Freeze ae (encoder and decoder)
Traceback (most recent call last):
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\gradio\queueing.py", line 532, in process_events
    response = await route_utils.call_process_api(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\gradio\blocks.py", line 1928, in process_api
    result = await self.call_function(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\gradio\blocks.py", line 1514, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\gradio\utils.py", line 832, in wrapper
    response = f(*args, **kwargs)
  File "D:\AI\ChatTTS-Forge\modules\webui\webui_utils.py", line 221, in tts_generate
    audio_data, sample_rate = apply_audio_enhance(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\AI\ChatTTS-Forge\modules\webui\webui_utils.py", line 94, in apply_audio_enhance
    enhancer = load_enhancer(device)
  File "D:\AI\ChatTTS-Forge\modules\Enhancer\ResembleEnhance.py", line 24, in load_enhancer
    resemble_enhance.load_model()
  File "D:\AI\ChatTTS-Forge\modules\Enhancer\ResembleEnhance.py", line 38, in load_model
    state_dict = torch.load(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\torch\serialization.py", line 997, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\torch\serialization.py", line 444, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\torch\serialization.py", line 425, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'D:\\AI\\ChatTTS-Forge\\models\\resemble-enhance\\mp_rank_00_model_states.pt'

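The above is the error with enhancement enabled.
Below is the error with denoising enabled. With both enhancement and denoising disabled, no error occurs.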

(ChatTTS_Forge) D:\AI\ChatTTS-Forge>python webui.py
INFO:modules.webui.app:WebUI module initialized
INFO:modules.webui.localization:Loaded localization file D:\AI\ChatTTS-Forge\language\zh-CN.json
INFO:modules.generate_audio:LRU cache enabled with size 64
INFO:modules.devices.devices:Using full precision: torch.float32
INFO:modules.devices.devices:Using device: cuda
Running on local URL:  http://0.0.0.0:7860
INFO:httpx:HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://checkip.amazonaws.com/ "HTTP/1.1 200 "
INFO:httpx:HTTP Request: GET http://localhost:7860/startup-events "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: HEAD http://localhost:7860/ "HTTP/1.1 200 OK"

To create a public link, set `share=True` in `launch()`.
INFO:modules.models:Loading ChatTTS models
INFO:modules.ChatTTS.ChatTTS.core:Load from local: ./models/ChatTTS
INFO:modules.ChatTTS.ChatTTS.core:vocos loaded.
INFO:modules.ChatTTS.ChatTTS.core:dvae loaded.
INFO:modules.ChatTTS.ChatTTS.core:gpt loaded.
INFO:modules.ChatTTS.ChatTTS.core:decoder loaded.
INFO:modules.ChatTTS.ChatTTS.core:tokenizer loaded.
INFO:modules.ChatTTS.ChatTTS.core:All initialized.
INFO:modules.models:ChatTTS models loaded
INFO:modules.generate_audio:('spk', 'female2')
INFO:modules.generate_audio:{'text': ['chat T T S 是一款强大的对话式文本转语音模型。它有中英混读和多说话人的能力。'], 'infer_seed': 42, 'temperature': 0.3, 'top_P': 0.7, 'top_K': 20, 'prompt1': '', 'prompt2': '', 'prefix': ''}
  0%|                                                                                         | 0/2048 [00:00<?, ?it/s]D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\transformers\models\llama\modeling_llama.py:649: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
100%|█████████████████████████████████████████████████████████████████████████████| 2048/2048 [00:11<00:00, 177.40it/s]
INFO:modules.repos_static.resemble_enhance.hparams:Reading hparams from D:\AI\ChatTTS-Forge\models\resemble-enhance\hparams.yaml
INFO:modules.repos_static.resemble_enhance.enhancer.lcfm.lcfm:Freeze ae (encoder and decoder)
Traceback (most recent call last):
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\gradio\queueing.py", line 532, in process_events
    response = await route_utils.call_process_api(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\gradio\blocks.py", line 1928, in process_api
    result = await self.call_function(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\gradio\blocks.py", line 1514, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\gradio\utils.py", line 832, in wrapper
    response = f(*args, **kwargs)
  File "D:\AI\ChatTTS-Forge\modules\webui\webui_utils.py", line 221, in tts_generate
    audio_data, sample_rate = apply_audio_enhance(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\AI\ChatTTS-Forge\modules\webui\webui_utils.py", line 94, in apply_audio_enhance
    enhancer = load_enhancer(device)
  File "D:\AI\ChatTTS-Forge\modules\Enhancer\ResembleEnhance.py", line 24, in load_enhancer
    resemble_enhance.load_model()
  File "D:\AI\ChatTTS-Forge\modules\Enhancer\ResembleEnhance.py", line 38, in load_model
    state_dict = torch.load(
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\torch\serialization.py", line 997, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\torch\serialization.py", line 444, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "D:\AI\anaconda3\envs\ChatTTS_Forge\lib\site-packages\torch\serialization.py", line 425, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'D:\\AI\\ChatTTS-Forge\\models\\resemble-enhance\\mp_rank_00_model_states.pt'

It looks like the models are missing, but I could not find where to download these two models.
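
Both tracebacks end in a `FileNotFoundError` for `mp_rank_00_model_states.pt`, while `hparams.yaml` is read successfully from the same directory. A check like the one below could report missing files before the enhancer is loaded. It is a sketch only: the helper name `missing_model_files` is hypothetical, and the file list contains just the two names that appear in the logs.

```python
from pathlib import Path

# File names seen in the logs above; the real model may need others as well.
REQUIRED_FILES = ("hparams.yaml", "mp_rank_00_model_states.pt")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that do not exist in model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


missing = missing_model_files("models/resemble-enhance")
if missing:
    print("Missing enhancer files:", ", ".join(missing))
```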

SSML error: wave.Error: bad sample width

SSML fails. I tried every one of the Examples and the error is always the same:
Traceback (most recent call last):
File "F:\audio\ChatTTS-Forge\python\lib\site-packages\gradio\queueing.py", line 532, in process_events
response = await route_utils.call_process_api(
File "F:\audio\ChatTTS-Forge\python\lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "F:\audio\ChatTTS-Forge\python\lib\site-packages\gradio\blocks.py", line 1933, in process_api
data = await self.postprocess_data(block_fn, result["prediction"], state)
File "F:\audio\ChatTTS-Forge\python\lib\site-packages\gradio\blocks.py", line 1756, in postprocess_data
prediction_value = block.postprocess(prediction_value)
File "F:\audio\ChatTTS-Forge\python\lib\site-packages\gradio\components\audio.py", line 266, in postprocess
file_path = processing_utils.save_audio_to_cache(
File "F:\audio\ChatTTS-Forge\python\lib\site-packages\gradio\processing_utils.py", line 244, in save_audio_to_cache
audio_to_file(sample_rate, data, filename, format=format)
File "F:\audio\ChatTTS-Forge\python\lib\site-packages\gradio\processing_utils.py", line 575, in audio_to_file
file = audio.export(filename, format=format)
File "F:\audio\ChatTTS-Forge\python\lib\site-packages\pydub\audio_segment.py", line 890, in export
wave_data.setsampwidth(self.sample_width)
File "F:\audio\ChatTTS-Forge\python\lib\wave.py", line 353, in setsampwidth
raise Error('bad sample width')
wave.Error: bad sample width
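
The error is raised by the standard library's `wave` module, which pydub calls during export and which accepts only sample widths of 1 to 4 bytes. The sketch below probes which widths are accepted. Whether the SSML path actually produces 8-byte (float64) samples is an assumption that the report does not confirm; `sample_width_is_valid` is a hypothetical helper.

```python
import io
import wave


def sample_width_is_valid(width_bytes):
    """Return True if the wave module accepts this sample width (in bytes)."""
    wav = wave.open(io.BytesIO(), "wb")
    # Set valid placeholder parameters so the header can be written on close.
    wav.setnchannels(1)
    wav.setframerate(24000)
    wav.setsampwidth(2)
    try:
        wav.setsampwidth(width_bytes)
        return True
    except wave.Error:
        return False
    finally:
        wav.close()


results = {width: sample_width_is_valid(width) for width in (1, 2, 4, 8)}
```

If the audio array is float64, converting it to int16 or float32 before handing it to the audio component would keep the width within the accepted range.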

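Question about tone control

Thanks for sharing!
The use decoder, prompt1, prompt2, and prefix fields at the bottom left of the WebUI appear to control tone, but there is no explanation of what to enter in them. The style selection also seems to have no effect.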

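Local deployment fails to generate audio, apparently because CUDA cannot be used

The frontend starts, but clicking Generate reports 'Error'.
Log output: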
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
DEBUG:matplotlib.pyplot:Loaded backend agg version v2.2.
ERROR:modules.ssml:apply style failed, Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
INFO:modules.ssml:collect len(segments): 1
DEBUG:matplotlib.pyplot:Loaded backend QtAgg version 5.15.10.
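It looks like this is caused by CUDA acceleration being unavailable. My machine should be fine: two other WebUIs that are also based on ChatTTS run without problems.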

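Language issue

The interface language is currently inconsistent. It could be unified in English to make international adoption easier, with a language-file feature similar to Fooocus added so that users can translate the interface into other languages themselves; at the moment the mix of languages is a bit messy. The content of the documentation could also be shown in small print under each feature, which would make it easier to follow.
In addition, a line with inbrowser=True could be added at the end of webui.py so that the browser opens automatically, saving the user one step.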

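Great project 👍

The project runs, and the test API in the Swagger docs works normally.

However, the Playground has no button for generating speech, and I don't know how to debug the API directly through the Playground.

Please add usage instructions for the Playground, thanks.

I hope your project keeps getting better; at the moment it is the best one available.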

UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 4556: illegal multibyte sequence

(chattts-forge) I:\AI\ChatTTS-Forge>python webui.py
fatal: No names found, cannot describe anything.
2024-06-11 18:57:06,415 - httpx - INFO - HTTP Request: GET https://api.gradio.app/gradio-messaging/en "HTTP/1.1 200 OK"
2024-06-11 18:57:07,358 - modules.generate_audio - INFO - LRU cache enabled with size 64
2024-06-11 18:57:07,358 - modules.devices.devices - INFO - Using full precision: torch.float32
2024-06-11 18:57:07,360 - modules.devices.devices - INFO - Using device: cuda
2024-06-11 18:57:07,644 - modules.webui.app - INFO - WebUI module initialized
2024-06-11 18:57:07,644 - modules.webui.localization - INFO - Loaded localization file I:\AI\ChatTTS-Forge\language\zh-CN.json
Traceback (most recent call last):
File "I:\AI\ChatTTS-Forge\webui.py", line 152, in <module>
process_webui_args(args)
File "I:\AI\ChatTTS-Forge\webui.py", line 93, in process_webui_args
demo = create_interface()
^^^^^^^^^^^^^^^^^^
File "I:\AI\ChatTTS-Forge\modules\webui\app.py", line 67, in create_interface
gradio_extensions.reload_javascript()
File "I:\AI\ChatTTS-Forge\modules\webui\gradio_extensions.py", line 49, in reload_javascript
js = javascript_html()
^^^^^^^^^^^^^^^^^
File "I:\AI\ChatTTS-Forge\modules\webui\gradio_extensions.py", line 34, in javascript_html
head += sf("js/index.js")
^^^^^^^^^^^^^^^^^
File "I:\AI\ChatTTS-Forge\modules\webui\gradio_extensions.py", line 29, in sf
return s(read_file(fp))
^^^^^^^^^^^^^
File "I:\AI\ChatTTS-Forge\modules\webui\gradio_extensions.py", line 18, in read_file
return f.read()
^^^^^^^^
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 4556: illegal multibyte sequence
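
The traceback ends in `read_file` calling `f.read()` and failing in the `gbk` codec, which suggests the file was opened with the platform's default encoding while `js/index.js` is presumably stored as UTF-8. Passing the encoding explicitly avoids depending on the system locale. The sketch below is a minimal illustration under that assumption; `read_text_utf8` is a hypothetical stand-in for the project's `read_file`.

```python
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the platform's default encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Round-trip check with non-ASCII content.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "index.js")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("// 中文注释\nconsole.log('ok');\n")
    content = read_text_utf8(sample_path)
```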

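NumPy 2.0.0 does not work

Read README.md and dependencies.md

I have read README.md and dependencies.md

Search issues and discussions

I have confirmed that no previous issue or discussion covers this bug

Check the Forge version

I have confirmed that the problem occurs in the latest code or a stable release

Your issue

A strange problem occurs under NumPy 2.0.0: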

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

I don't know whether this is a problem on my side.

Changing requirements.txt as follows:

numpy==1.26.4
scipy
lxml
pydub
fastapi
soundfile
pyrubberband
omegaconf
pypinyin
vocos
pandas
vector_quantize_pytorch
einops
transformers~=4.41.1
omegaconf~=2.3.0
tqdm
# torch
# torchvision
# torchaudio
gradio
emojiswitch
python-dotenv
zhon
mistune==3.0.2
cn2an
# audio_denoiser
python-box
modelscope

resolves the problem above.

Request: XTTSv2 API

Discussed in #38

Originally posted by anthonyshibo June 12, 2024
Thanks to the author; this is extremely practical for anyone who wants to port projects from abroad.

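[BUG:WebUI] After the UI starts, the input text is not fully synthesized; only the first word is synthesized

Read README.md and dependencies.md

I have read README.md and dependencies.md

Search issues and discussions

I have confirmed that no previous issue or discussion covers this bug

Check the Forge version

I have confirmed that the problem occurs in the latest code or a stable release

Forge commit or tag

v0.6.0

Python version

python3.10

PyTorch version

torch==2.1.1

Operating system

Ubuntu 22.04.1 LTS

Browser

chrome

Bug description

Steps to reproduce

Start the WebUI and run it directly

Expected result

The output contains only a single character, not the whole text

Actual result

The output contains only a single character, not the whole text

Error messages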

INFO:modules.repos_static.resemble_enhance.inference:Elapsed time: 0.239 s, 62.875 kHz
100%|█████████████████████████████████████| 2048/2048 [00:01<00:00, 1058.83it/s]
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  4.51it/s]
INFO:modules.repos_static.resemble_enhance.inference:Elapsed time: 0.223 s, 67.437 kHz
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  3.14it/s]
INFO:modules.repos_static.resemble_enhance.inference:Elapsed time: 0.322 s, 46.701 kHz

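[Feature] Enable the PyCharm entry in .gitignore

verify

I have read the project documentation carefully and confirmed that the existing features do not meet my need
I have searched existing issues and confirmed that this one is not a duplicate
I confirm that this is a feature request, not a discussion

Feature description

I noticed that the PyCharm section in the .gitignore file is commented out. Could it be enabled?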


# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

Possible solution


# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

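[BUG:API] On Windows 11 with a manual build, even though ffmpeg has been added to the environment variables and its files were manually copied into the ffmpeg directory, starting the API still reports that the ffmpeg extension cannot be found

Read README.md and dependencies.md

I have read README.md and dependencies.md

Search issues and discussions

I have confirmed that no previous issue or discussion covers this bug

Check the Forge version

I have confirmed that the problem occurs in the latest code or a stable release

Your issue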

fatal: No names found, cannot describe anything.
2024-06-22 15:47:53,492 - torio._extension.utils - DEBUG - Loading FFmpeg6
2024-06-22 15:47:53,493 - torio._extension.utils - DEBUG - Failed to load FFmpeg6 extension.
Traceback (most recent call last):
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torio_extension\utils.py", line 116, in _find_ffmpeg_extension
ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torio_extension\utils.py", line 108, in _find_versionsed_ffmpeg_extension
_load_lib(lib)
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torio_extension\utils.py", line 94, in load_lib
torch.ops.load_library(path)
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torch_ops.py", line 1032, in load_library
ctypes.CDLL(path)
File "C:\Users\Administrator.conda\envs\chatForge\lib\ctypes_init.py", line 374, in init
self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\Administrator.conda\envs\chatForge\Lib\site-packages\torio\lib\libtorio_ffmpeg6.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
2024-06-22 15:47:53,495 - torio._extension.utils - DEBUG - Loading FFmpeg5
2024-06-22 15:47:53,496 - torio._extension.utils - DEBUG - Failed to load FFmpeg5 extension.
Traceback (most recent call last):
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torio_extension\utils.py", line 116, in _find_ffmpeg_extension
ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torio_extension\utils.py", line 108, in _find_versionsed_ffmpeg_extension
_load_lib(lib)
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torio_extension\utils.py", line 94, in load_lib
torch.ops.load_library(path)
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torch_ops.py", line 1032, in load_library
ctypes.CDLL(path)
File "C:\Users\Administrator.conda\envs\chatForge\lib\ctypes_init.py", line 374, in init
self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\Administrator.conda\envs\chatForge\Lib\site-packages\torio\lib\libtorio_ffmpeg5.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
2024-06-22 15:47:53,497 - torio._extension.utils - DEBUG - Loading FFmpeg4
2024-06-22 15:47:53,498 - torio._extension.utils - DEBUG - Failed to load FFmpeg4 extension.
Traceback (most recent call last):
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torio_extension\utils.py", line 116, in _find_ffmpeg_extension
ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torio_extension\utils.py", line 108, in _find_versionsed_ffmpeg_extension
_load_lib(lib)
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torio_extension\utils.py", line 94, in load_lib
torch.ops.load_library(path)
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torch_ops.py", line 1032, in load_library
ctypes.CDLL(path)
File "C:\Users\Administrator.conda\envs\chatForge\lib\ctypes_init.py", line 374, in init
self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\Administrator.conda\envs\chatForge\Lib\site-packages\torio\lib\libtorio_ffmpeg4.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
2024-06-22 15:47:53,499 - torio._extension.utils - DEBUG - Loading FFmpeg
2024-06-22 15:47:53,500 - torio._extension.utils - DEBUG - Failed to load FFmpeg extension.
Traceback (most recent call last):
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torio_extension\utils.py", line 116, in _find_ffmpeg_extension
ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
File "C:\Users\Administrator.conda\envs\chatForge\lib\site-packages\torio_extension\utils.py", line 106, in _find_versionsed_ffmpeg_extension
raise RuntimeError(f"FFmpeg{version} extension is not available.")
RuntimeError: FFmpeg extension is not available.
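
The "or one of its dependencies" wording in the log above suggests that the torio extension file may be present while the FFmpeg shared libraries it links against are not being found. Having `ffmpeg.exe` on PATH does not by itself show that those DLLs are there. The following is a minimal diagnostic sketch under that assumption. The helper name `find_ffmpeg_shared_libs` and the list of library prefixes are hypothetical and not part of Forge or torchaudio. It only lists DLLs in the given directories; it does not prove that Python will actually load them.

```python
import os
from pathlib import Path


def find_ffmpeg_shared_libs(search_dirs=None, prefixes=("avcodec", "avformat", "avutil")):
    """Return {prefix: [paths]} of FFmpeg-looking DLLs found in the given directories.

    When search_dirs is None, the directories listed in PATH are scanned.
    """
    if search_dirs is None:
        search_dirs = os.environ.get("PATH", "").split(os.pathsep)
    found = {prefix: [] for prefix in prefixes}
    for directory in search_dirs:
        folder = Path(directory)
        if not directory or not folder.is_dir():
            continue
        try:
            entries = list(folder.iterdir())
        except OSError:
            continue  # unreadable directory, skip it
        for entry in entries:
            name = entry.name.lower()
            for prefix in prefixes:
                if name.startswith(prefix) and name.endswith(".dll"):
                    found[prefix].append(str(entry))
    return found


if __name__ == "__main__":
    for prefix, paths in find_ffmpeg_shared_libs().items():
        print(prefix, "->", paths or "not found")
```

If every prefix comes back empty, the installed FFmpeg may be a static build that ships only executables. That would be consistent with the log, but it is not confirmed by it.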

Problem with google_text_synthesize in the API

Hello, when I pass SSML to the google_text_synthesize method, I get:
File "F:\ChatTTS-Forge\modules\api\impl\google_api.py", line 159, in google_text_synthesize
sf.write(buffer, audio_data, sample_rate, format="wav")
File "C:\Users\Mayn\AppData\Local\Programs\Python\Python310\lib\site-packages\soundfile.py", line 429, in write
channels = data.shape[1]
IndexError: tuple index out of range
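ChatGPT's reply was: "The problem most likely lies in the format and shape of the audio data passed to soundfile.write(). We need to make sure audio_data is in the correct format so that soundfile.write() can handle it. First, let's make sure audio_data is a NumPy array and that its shape is suitable for soundfile.write()."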
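
The traceback fails at `data.shape[1]`, which is what would happen if `audio_data` reached `soundfile.write()` as a zero-dimensional value (for example a scalar or `None`-like placeholder) rather than a sample array. As a minimal sketch of the check suggested in the reply, the hypothetical helper `to_writable_audio` below (not part of Forge) coerces the input to a NumPy array and rejects shapes that cannot be written. It does not explain why the SSML path produced such a value in the first place.

```python
import numpy as np


def to_writable_audio(audio_data):
    """Coerce audio into a float32 array shaped (frames,) or (frames, channels)."""
    data = np.asarray(audio_data, dtype=np.float32)
    if data.ndim == 0:
        # A 0-d array has an empty shape tuple, so shape[1] cannot be read.
        raise ValueError("audio_data is a scalar; expected a sequence of samples")
    if data.ndim > 2:
        raise ValueError(f"unsupported audio shape {data.shape}")
    return data
```

Calling this just before `sf.write(...)` would turn the opaque `IndexError` into a message that names the actual shape problem.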

The project requires Python 3.10, but the Python version in the Docker image is 3.9.19, which causes a runtime error

The commands in the Docker section of the README contain a few typos

Original:
Download models: python -m scripts/download_models --source huggingface
Corrected:
python -m scripts.download_models --source huggingface

Original:
webui: docker-compose -f ./docker-cmopose.webui.yml up -d
api: docker-compose -f ./docker-cmopose.api.yml up -d
Corrected:
webui: docker-compose -f ./docker-compose.webui.yml up -d
api: docker-compose -f ./docker-compose.api.yml up -d

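ffmpeg lookup problem on Mac

Read README.md and dependencies.md

I have read the README.md and dependencies.md files

Search issues and discussions

I have confirmed that no existing issue or discussion covers this bug

Check Forge version

I have confirmed that the problem occurs in the latest code or a stable release

Your issue

On macOS, ffmpeg cannot be loaded even when it is installed. On Windows the workaround is to copy the files into place, but copying is inconvenient on a Mac, so I used the following approach:

modules/ffmpeg_env.py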

import logging
import os
import platform
import subprocess

from modules.utils.constants import ROOT_DIR

logger = logging.getLogger(__name__)


def find_ffmpeg_path():
    try:
        # Run `which` to look up the ffmpeg executable
        result = subprocess.run(['which', 'ffmpeg'], capture_output=True, text=True)
        ffmpeg_exe = result.stdout.strip()
        if ffmpeg_exe:
            print(f"ffmpeg found at: {ffmpeg_exe}")
            # PATH entries must be directories, so return the containing directory
            return os.path.dirname(ffmpeg_exe)
        else:
            print("ffmpeg not found.")
            return None
    except Exception as e:
        print(f"Error finding ffmpeg: {e}")
        return None


def setup_ffmpeg_path():
    ffmpeg_path = os.path.join(ROOT_DIR, "ffmpeg")

    # Get the current system name
    system = platform.system()

    if system == 'Darwin':
        print("Current system: macOS")
        # Look up the installed ffmpeg; keep the bundled directory if none is found
        ffmpeg_path = find_ffmpeg_path() or ffmpeg_path
    else:
        print(f"Current system: {system}")

    logger.info("ffmpeg_path=" + str(ffmpeg_path))
    os.environ["PATH"] = ffmpeg_path + os.pathsep + os.environ["PATH"]

    import pydub.utils

    if pydub.utils.which("ffmpeg") is None:
        logger.error("ffmpeg not found in PATH")
        raise Exception("ffmpeg not found in PATH")

Modified part of openai_api.py with GPT-4 so that it runs correctly

File modified with GPT-4: https://github.com/lenML/ChatTTS-Forge/blob/main/modules/api/impl/openai_api.py
Full code:
from fastapi import HTTPException, Body
from fastapi.responses import StreamingResponse
import io
from numpy import clip
import soundfile as sf
from pydantic import BaseModel, Field
from fastapi.responses import FileResponse
from modules.synthesize_audio import synthesize_audio
from modules.normalization import text_normalize
from modules import generate_audio as generate
from typing import Literal
import pyrubberband as pyrb
from modules.api import utils as api_utils
from modules.api.Api import APIManager
import numpy as np


class AudioSpeechRequest(BaseModel):
    input: str  # text to synthesize
    model: str = "chattts-4w"
    voice: str = "female2"
    response_format: Literal["mp3", "wav"] = "mp3"
    speed: float = Field(1, ge=0.1, le=10, description="Speed of the audio")
    style: str = ""
    batch_size: int = Field(1, ge=1, le=20, description="Batch size")
    spliter_threshold: float = Field(
        100, ge=10, le=1024, description="Threshold for sentence spliter"
    )
    seed: int = 42  # default value
    temperature: float = Field(0.3, ge=0.0, le=1.0, description="Temperature for audio generation")


async def openai_speech_api(
    request: AudioSpeechRequest = Body(
        ..., description="JSON body with model, input text, and voice"
    )
):
    try:
        model = request.model
        input_text = request.input
        voice = request.voice
        style = request.style
        response_format = request.response_format
        batch_size = request.batch_size
        spliter_threshold = request.spliter_threshold
        speed = request.speed
        speed = clip(speed, 0.1, 10)
        temperature = request.temperature

        if not input_text:
            raise HTTPException(status_code=400, detail="Input text is required.")

        # Normalize the text
        text = text_normalize(input_text, is_end=True)

        # Calculate speaker and style based on input voice
        params = api_utils.calc_spk_style(spk=voice, style=style)

        spk = params.get("spk", -1)
        seed = params.get("seed", request.seed or 42)
        prompt1 = params.get("prompt1", "")
        prompt2 = params.get("prompt2", "")
        prefix = params.get("prefix", "")

        # Generate audio
        sample_rate, audio_data = synthesize_audio(
            text,
            temperature=temperature,
            top_P=0.7,
            top_K=20,
            spk=spk,
            infer_seed=seed,
            batch_size=batch_size,
            spliter_threshold=spliter_threshold,
            prompt1=prompt1,
            prompt2=prompt2,
            prefix=prefix,
        )

        if speed != 1:
            audio_data = pyrb.time_stretch(audio_data, sample_rate, speed)

        # Convert audio data to wav format
        buffer = io.BytesIO()
        sf.write(buffer, audio_data, sample_rate, format="wav")
        buffer.seek(0)

        if response_format == "mp3":
            # Convert wav to mp3
            buffer = api_utils.wav_to_mp3(buffer)

        return StreamingResponse(buffer, media_type="audio/mp3")

    except Exception as e:
        import logging

        logging.exception(e)
        raise HTTPException(status_code=500, detail=str(e))


def setup(api_manager: APIManager):
    api_manager.post(
        "/v1/audio/speech",
        response_class=FileResponse,
        description="""
openai api document:
https://platform.openai.com/docs/guides/text-to-speech

The following attributes are custom to this system and are not in the OpenAI documentation:

batch_size: whether to enable batch synthesis; a value less than or equal to 1 disables batching (not recommended)
spliter_threshold: sentence-splitting threshold when batch synthesis is enabled
style: style

model accepts any value
""",
    )(openai_speech_api)
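
For reference, a client call against the `/v1/audio/speech` route registered above could look roughly like the sketch below. The field names and ranges are taken from `AudioSpeechRequest` in the posted code. The helper name `build_speech_request` and the base URL `http://localhost:8000` are assumptions for illustration, since the host and port are not stated in the issue. The request is only built, not sent.

```python
import json
import urllib.request


def build_speech_request(text, base_url="http://localhost:8000", voice="female2",
                         response_format="mp3", speed=1.0, temperature=0.3, seed=42):
    """Build (but do not send) a POST request for the /v1/audio/speech route."""
    if not text:
        raise ValueError("text must not be empty")
    if not 0.1 <= speed <= 10:
        raise ValueError("speed must be within [0.1, 10]")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be within [0.0, 1.0]")
    payload = {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
        "temperature": temperature,
        "seed": seed,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it would look like this (requires a running server):
# with urllib.request.urlopen(build_speech_request("hello")) as resp:
#     open("out.mp3", "wb").write(resp.read())
```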

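sndfile cannot be found on Mac M1

Read README.md and dependencies.md

I have read the README.md and dependencies.md files

Search issues and discussions

I have confirmed that no existing issue or discussion covers this bug

Check Forge version

I have confirmed that the problem occurs in the latest code or a stable release

Your issue

The error is as follows: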

/Users/laoshi/PycharmProjects/ChatTTS-Forge/.venv/lib/python3.9/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Traceback (most recent call last):
  File "/Users/laoshi/PycharmProjects/ChatTTS-Forge/.venv/lib/python3.9/site-packages/soundfile.py", line 267, in <module>
    _snd = _ffi.dlopen('sndfile')
  File "/Users/laoshi/PycharmProjects/ChatTTS-Forge/.venv/lib/python3.9/site-packages/cffi/api.py", line 150, in dlopen
    lib, function_cache = _make_ffi_library(self, name, flags)
  File "/Users/laoshi/PycharmProjects/ChatTTS-Forge/.venv/lib/python3.9/site-packages/cffi/api.py", line 832, in _make_ffi_library
    backendlib = _load_backend_lib(backend, libname, flags)
  File "/Users/laoshi/PycharmProjects/ChatTTS-Forge/.venv/lib/python3.9/site-packages/cffi/api.py", line 827, in _load_backend_lib
    raise OSError(msg)
OSError: ctypes.util.find_library() did not manage to locate a library called 'sndfile'

Solution:

Following the steps in dependencies.md does not work.

It can be resolved by following https://github.com/bastibe/python-soundfile/.
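
The `OSError` above comes from `ctypes.util.find_library('sndfile')` returning nothing. One way to narrow this down is to compare what `find_library` reports with what is actually on disk. The sketch below does that. The helper name `locate_sndfile` and the two Homebrew directories are assumptions (Homebrew commonly uses `/opt/homebrew` on Apple Silicon and `/usr/local` on Intel Macs), so adjust them for your installation.

```python
import ctypes.util
import os

# Assumed Homebrew library directories; not taken from the issue.
CANDIDATE_DIRS = ("/opt/homebrew/lib", "/usr/local/lib")


def locate_sndfile(candidate_dirs=CANDIDATE_DIRS):
    """Return (name seen by find_library or None, libsndfile files found on disk)."""
    visible = ctypes.util.find_library("sndfile")
    on_disk = [
        os.path.join(directory, name)
        for directory in candidate_dirs
        for name in ("libsndfile.dylib", "libsndfile.1.dylib")
        if os.path.isfile(os.path.join(directory, name))
    ]
    return visible, on_disk


if __name__ == "__main__":
    visible, on_disk = locate_sndfile()
    print("find_library:", visible)
    print("files on disk:", on_disk)
```

If files exist on disk but `find_library` returns `None`, the library is installed somewhere the loader does not search, which matches the symptom in the traceback.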

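Add a feature to save a speaker voice as a .pt file?

Great project, thank you for the effort. Could you add a feature to save a speaker voice? Otherwise, when using seeds, the voice produced by the same seed is different after the next restart. Thanks!

Please improve long-text generation

I hope it can generate long audio the way the neighboring ChatTTS_colab project does; then this project would be perfect.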
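
Long-audio generation is usually approached by splitting the text into sentence-sized chunks, synthesizing each chunk, and concatenating the audio. The following is a minimal sketch of the splitting step only. It is not Forge's or ChatTTS_colab's implementation, and the function name `split_long_text` and the default `max_chars=100` are assumptions for illustration.

```python
import re

# Zero-width split point right after sentence-ending punctuation (Chinese and English).
_SENTENCE_END = re.compile(r"(?<=[。！？!?.；;])")


def split_long_text(text, max_chars=100):
    """Group sentences into chunks of at most max_chars characters where possible.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    sentences = [s for s in _SENTENCE_END.split(text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Note that the simple pattern also splits at decimal points such as `3.14`, so real text would need a normalization pass first.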

FileNotFoundError: [WinError 2] The system cannot find the file specified.

Running on local URL: http://0.0.0.0:7860
INFO:httpx:HTTP Request: GET http://localhost:7860/startup-events "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: HEAD http://localhost:7860/ "HTTP/1.1 200 OK"

To create a public link, set share=True in launch().
INFO:modules.models:Loading ChatTTS models
INFO:modules.ChatTTS.ChatTTS.core:Load from local: ./models/ChatTTS
INFO:modules.ChatTTS.ChatTTS.core:vocos loaded.
INFO:modules.ChatTTS.ChatTTS.core:dvae loaded.
INFO:modules.ChatTTS.ChatTTS.core:gpt loaded.
INFO:modules.ChatTTS.ChatTTS.core:decoder loaded.
INFO:modules.ChatTTS.ChatTTS.core:tokenizer loaded.
INFO:modules.ChatTTS.ChatTTS.core:All initialized.
INFO:modules.models:ChatTTS models loaded
INFO:modules.generate_audio:('spk', 'female2')
INFO:modules.generate_audio:{'text': ['chat T T S 是一款强大的对话式文本转语音模型。它有中英混读和多说话人的能力。'], 'infer_seed': 42, 'temperature': 0.3, 'top_P': 0.7, 'top_K': 20, 'prompt1': '', 'prompt2': '', 'prefix': ''}
0%| | 0/2048 [00:00<?, ?it/s]D:\chatTTS\ChatTTS-Forge\venv\lib\site-packages\transformers\models\llama\modeling_llama.py:649: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
100%|█████████████████████████████████████████████████████████████████████████████| 2048/2048 [00:02<00:00, 685.41it/s]
D:\chatTTS\ChatTTS-Forge\venv\lib\site-packages\pydub\utils.py:198: RuntimeWarning: Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work
warn("Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work", RuntimeWarning)
Traceback (most recent call last):
File "D:\chatTTS\ChatTTS-Forge\venv\lib\site-packages\gradio\queueing.py", line 532, in process_events
response = await route_utils.call_process_api(
File "D:\chatTTS\ChatTTS-Forge\venv\lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "D:\chatTTS\ChatTTS-Forge\venv\lib\site-packages\gradio\blocks.py", line 1928, in process_api
result = await self.call_function(
File "D:\chatTTS\ChatTTS-Forge\venv\lib\site-packages\gradio\blocks.py", line 1514, in call_function
prediction = await anyio.to_thread.run_sync(
File "D:\chatTTS\ChatTTS-Forge\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "D:\chatTTS\ChatTTS-Forge\venv\lib\site-packages\anyio_backends_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "D:\chatTTS\ChatTTS-Forge\venv\lib\site-packages\anyio_backends_asyncio.py", line 859, in run
result = context.run(func, *args)
File "D:\chatTTS\ChatTTS-Forge\venv\lib\site-packages\gradio\utils.py", line 832, in wrapper
response = f(*args, **kwargs)
File "D:\chatTTS\ChatTTS-Forge\modules\webui\webui_utils.py", line 205, in tts_generate
sample_rate, audio_data = synthesize_audio(
File "D:\chatTTS\ChatTTS-Forge\modules\synthesize_audio.py", line 63, in synthesize_audio
audio_segments = synthesizer.synthesize_segments(text_segments)
File "D:\chatTTS\ChatTTS-Forge\modules\SynthesizeSegments.py", line 219, in synthesize_segments
self.process_voice_segments(segments, bucket, audio_segments)
File "D:\chatTTS\ChatTTS-Forge\modules\SynthesizeSegments.py", line 179, in process_voice_segments
audio_segment = audio_data_to_segment(audio_data, sr)
File "D:\chatTTS\ChatTTS-Forge\modules\SynthesizeSegments.py", line 25, in audio_data_to_segment
return AudioSegment.from_file(byte_io, format="wav")
File "D:\chatTTS\ChatTTS-Forge\venv\lib\site-packages\pydub\audio_segment.py", line 728, in from_file
info = mediainfo_json(orig_file, read_ahead_limit=read_ahead_limit)
File "D:\chatTTS\ChatTTS-Forge\venv\lib\site-packages\pydub\utils.py", line 274, in mediainfo_json
res = Popen(command, stdin=stdin_parameter, stdout=PIPE, stderr=PIPE)
File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 969, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1438, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] 系统找不到指定的文件。

Version information:
🍦 [ChatTTS-Forge]) version: [v0.5.5-36-gb34a0f8] | branch: main | python: 3.10.6 | torch: 2.2.2+cu121
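
In the traceback above, pydub first warns that it could not find ffprobe, and the failure then occurs when `mediainfo_json` tries to launch a subprocess. That points to a missing executable rather than a missing audio file. A quick way to check is sketched below. The helper name `missing_audio_tools` is hypothetical.

```python
import shutil


def missing_audio_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names of the given executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_audio_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are both on PATH")
```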

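The Apple M-series chip is not recognized on Mac; how do I configure it?

webui.py exits automatically after startup (latest code)

As shown in the screenshot, it exits automatically after startup:

Cannot install the dependency 'resemble-enhance'

The error log is as follows: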
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [15 lines of output]
test.c
LINK : fatal error LNK1181: 无法打开输入文件“aio.lib”
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\embra\AppData\Local\Temp\pip-install-ge4atg9m\deepspeed_a86ec1095150421b9182b476bcffcf33\setup.py", line 182, in
abort(f"Unable to pre-compile {op_name}")
File "C:\Users\embra\AppData\Local\Temp\pip-install-ge4atg9m\deepspeed_a86ec1095150421b9182b476bcffcf33\setup.py", line 52, in abort
assert False, msg
AssertionError: Unable to pre-compile async_io
DS_BUILD_OPS=1
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] One can disable async_io with DS_BUILD_AIO=0
[ERROR] Unable to pre-compile async_io
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

For Windows users, installing libaio is really too difficult! Could the maintainers please find another approach?
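
The log itself mentions that async_io can be disabled with `DS_BUILD_AIO=0`. A minimal sketch of passing that variable to pip is shown below. Whether this alone is enough on Windows has not been verified here, and the log also shows `DS_BUILD_OPS=1` being set, which may matter as well. The helper name `pip_install_command_and_env` is hypothetical.

```python
import os
import sys


def pip_install_command_and_env(package="resemble-enhance", base_env=None):
    """Return (command, env) for a pip install with DeepSpeed's async_io build disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["DS_BUILD_AIO"] = "0"  # taken from the hint printed in the install log
    command = [sys.executable, "-m", "pip", "install", package]
    return command, env


# To actually run the install:
# import subprocess
# command, env = pip_install_command_and_env()
# subprocess.run(command, env=env, check=True)
```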

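[Feature] Please provide an offline all-in-one package for easier use

verify

I have read the project documentation carefully and confirmed that existing features do not meet my need
I have searched the existing issues and confirmed that this one is not a duplicate
I confirm this is a feature request, not a question or discussion

Feature description

Please provide an offline all-in-one package to lower the barrier to entry

Possible solution

The similar project ChatTTS_colab provides an offline all-in-one package that can serve as a reference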
https://github.com/6drf21e/ChatTTS_colab

Generating audio fails on Windows when Enable Enhance is checked

Traceback (most recent call last):
  File "E:\ChatTTS-Forge\py310\lib\site-packages\gradio\queueing.py", line 528, in process_events
    response = await route_utils.call_process_api(
  File "E:\ChatTTS-Forge\py310\lib\site-packages\gradio\route_utils.py", line 270, in call_process_api
    output = await app.get_blocks().process_api(
  File "E:\ChatTTS-Forge\py310\lib\site-packages\gradio\blocks.py", line 1908, in process_api
    result = await self.call_function(
  File "E:\ChatTTS-Forge\py310\lib\site-packages\gradio\blocks.py", line 1485, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "E:\ChatTTS-Forge\py310\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "E:\ChatTTS-Forge\py310\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "E:\ChatTTS-Forge\py310\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "E:\ChatTTS-Forge\py310\lib\site-packages\gradio\utils.py", line 808, in wrapper
    response = f(*args, **kwargs)
  File "E:\ChatTTS-Forge\modules\webui\webui_utils.py", line 193, in tts_generate
    audio_data, sample_rate = apply_audio_enhance(
  File "E:\ChatTTS-Forge\py310\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\ChatTTS-Forge\modules\webui\webui_utils.py", line 80, in apply_audio_enhance
    enhancer = load_enhancer(device)
  File "E:\ChatTTS-Forge\modules\Enhancer\ResembleEnhance.py", line 24, in load_enhancer
    resemble_enhance.load_model()
  File "E:\ChatTTS-Forge\modules\Enhancer\ResembleEnhance.py", line 36, in load_model
    hparams = HParams.load(Path(MODELS_DIR) / "resemble-enhance")
  File "E:\ChatTTS-Forge\modules\repos_static\resemble_enhance\hparams.py", line 109, in load
    hps.append(cls.from_yaml(run_dir / "hparams.yaml"))
  File "E:\ChatTTS-Forge\modules\repos_static\resemble_enhance\hparams.py", line 94, in from_yaml
    return cls(**dict(OmegaConf.merge(cls(), OmegaConf.load(path))))
  File "E:\ChatTTS-Forge\py310\lib\site-packages\omegaconf\omegaconf.py", line 190, in load
    obj = yaml.load(f, Loader=get_yaml_loader())
  File "E:\ChatTTS-Forge\py310\lib\site-packages\yaml\__init__.py", line 81, in load
    return loader.get_single_data()
  File "E:\ChatTTS-Forge\py310\lib\site-packages\yaml\constructor.py", line 51, in get_single_data
    return self.construct_document(node)
  File "E:\ChatTTS-Forge\py310\lib\site-packages\yaml\constructor.py", line 60, in construct_document
    for dummy in generator:
  File "E:\ChatTTS-Forge\py310\lib\site-packages\yaml\constructor.py", line 413, in construct_yaml_map
    value = self.construct_mapping(node)
  File "E:\ChatTTS-Forge\py310\lib\site-packages\omegaconf\_utils.py", line 151, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "E:\ChatTTS-Forge\py310\lib\site-packages\yaml\constructor.py", line 218, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "E:\ChatTTS-Forge\py310\lib\site-packages\yaml\constructor.py", line 143, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "E:\ChatTTS-Forge\py310\lib\site-packages\yaml\constructor.py", line 100, in construct_object
    data = constructor(self, node)
  File "E:\ChatTTS-Forge\py310\lib\site-packages\omegaconf\_utils.py", line 183, in <lambda>
    lambda loader, node: pathlib.PosixPath(*loader.construct_sequence(node)),
  File "E:\ChatTTS-Forge\py310\lib\pathlib.py", line 962, in __new__
    raise NotImplementedError("cannot instantiate %r on your system"
NotImplementedError: cannot instantiate 'PosixPath' on your system

It runs normally when Enable Enhance is unchecked
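
The traceback ends in `pathlib.PosixPath(...)` being constructed while loading `hparams.yaml`, which Python refuses to do on Windows. A workaround often used for this class of error is to alias `pathlib.PosixPath` to `pathlib.WindowsPath` for the duration of the load. The sketch below shows that idea. The context manager name `posix_path_as_windows_path` is hypothetical, and this is not presented as the fix the project adopted.

```python
import contextlib
import os
import pathlib


@contextlib.contextmanager
def posix_path_as_windows_path():
    """On Windows, temporarily make pathlib.PosixPath resolve to WindowsPath."""
    if os.name != "nt":
        # Nothing to patch on POSIX systems.
        yield
        return
    original = pathlib.PosixPath
    pathlib.PosixPath = pathlib.WindowsPath
    try:
        yield
    finally:
        # Always restore the real class, even if loading fails.
        pathlib.PosixPath = original


# Hypothetical usage around the failing call from the traceback:
# with posix_path_as_windows_path():
#     hparams = HParams.load(Path(MODELS_DIR) / "resemble-enhance")
```

This works only because the loader looks up `pathlib.PosixPath` at call time. Regenerating the YAML without the POSIX-specific path would be the cleaner long-term option.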

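[BUG:FT] The adjuster cannot be used

Read README.md and dependencies.md

I have read the README.md and dependencies.md files

Search issues and discussions

I have confirmed that no existing issue or discussion covers this bug

Check Forge version

I have confirmed that the problem occurs in the latest code or a stable release

Forge commit or tag

Adjuster problem

Python version

3.10.11

PyTorch version

torch: 2.3.1+cpu

Operating system

windows11

Bug description

Audio cannot be generated when the adjuster is used. ffmpeg works normally: ffmpeg: N-115838-g4e4444f97c-20240615

Steps to reproduce

Change the adjuster parameters and generate audio.

Expected result

The adjuster works normally

Actual result

The console reports that rubberband-cli is missing

Error message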

2024-06-24 14:40:58,256 - httpx - INFO - HTTP Request: GET https://api.gradio.app/gradio-messaging/en "HTTP/1.1 200 OK"
2024-06-24 14:40:59,899 - numexpr.utils - INFO - Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-06-24 14:40:59,899 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\pythonEmbed\lib\site-packages\transformers\utils\hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
2024-06-24 14:41:04,535 - modules.generate_audio - INFO - LRU cache enabled with size 64
2024-06-24 14:41:04,535 - modules.devices.devices - INFO - Using full precision: torch.float32
2024-06-24 14:41:04,535 - modules.devices.devices - INFO - Using device: cpu
2024-06-24 14:41:04,540 - modules.webui.app - INFO - WebUI module initialized
2024-06-24 14:41:04,540 - modules.webui.localization - INFO - Loaded localization file D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\language\zh-CN.json
2024-06-24 14:41:05,020 - httpx - INFO - HTTP Request: GET https://checkip.amazonaws.com/ "HTTP/1.1 200 "
Running on local URL:  http://0.0.0.0:7860
2024-06-24 14:41:05,540 - httpx - INFO - HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
2024-06-24 14:41:07,210 - httpx - INFO - HTTP Request: GET http://localhost:7860/startup-events "HTTP/1.1 200 OK"
2024-06-24 14:41:09,264 - httpx - INFO - HTTP Request: HEAD http://localhost:7860/ "HTTP/1.1 200 OK"

To create a public link, set `share=True` in `launch()`.
2024-06-24 14:41:35,501 - modules.models - INFO - Loading ChatTTS models
2024-06-24 14:41:35,501 - modules.ChatTTS.ChatTTS.core - INFO - Load from local: ./models/ChatTTS
2024-06-24 14:41:35,775 - modules.ChatTTS.ChatTTS.core - INFO - vocos loaded.
2024-06-24 14:41:35,885 - modules.ChatTTS.ChatTTS.core - INFO - dvae loaded.
2024-06-24 14:41:39,120 - modules.ChatTTS.ChatTTS.core - INFO - gpt loaded.
2024-06-24 14:41:39,363 - modules.ChatTTS.ChatTTS.core - INFO - decoder loaded.
2024-06-24 14:41:39,380 - modules.ChatTTS.ChatTTS.core - INFO - tokenizer loaded.
2024-06-24 14:41:39,380 - modules.ChatTTS.ChatTTS.core - INFO - All initialized.
2024-06-24 14:41:39,383 - modules.models - INFO - ChatTTS models loaded
Traceback (most recent call last):
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\pythonEmbed\lib\site-packages\pyrubberband\pyrb.py", line 74, in __rubberband
    subprocess.check_call(arguments, stdout=DEVNULL, stderr=DEVNULL)
  File "subprocess.py", line 364, in check_call
  File "subprocess.py", line 345, in call
  File "subprocess.py", line 971, in __init__
  File "subprocess.py", line 1456, in _execute_child
FileNotFoundError: [WinError 2] 系统找不到指定的文件。

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\pythonEmbed\lib\site-packages\gradio\queueing.py", line 532, in process_events
    response = await route_utils.call_process_api(
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\pythonEmbed\lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\pythonEmbed\lib\site-packages\gradio\blocks.py", line 1928, in process_api
    result = await self.call_function(
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\pythonEmbed\lib\site-packages\gradio\blocks.py", line 1514, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\pythonEmbed\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\pythonEmbed\lib\site-packages\anyio\_backends\_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\pythonEmbed\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\pythonEmbed\lib\site-packages\gradio\utils.py", line 832, in wrapper
    response = f(*args, **kwargs)
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\pythonEmbed\lib\site-packages\gradio\utils.py", line 832, in wrapper
    response = f(*args, **kwargs)
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\modules\webui\webui_utils.py", line 256, in tts_generate
    audio_data, sample_rate = handler.enqueue()
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\modules\api\impl\handler\TTSHandler.py", line 95, in enqueue
    audio_data = apply_prosody_to_audio_data(
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\modules\utils\audio.py", line 106, in apply_prosody_to_audio_data
    audio_data = pyrb.time_stretch(audio_data, sr=sr, rate=rate)
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\pythonEmbed\lib\site-packages\pyrubberband\pyrb.py", line 142, in time_stretch
    return __rubberband(y, sr, **rbargs)
  File "D:\BaiduNetdiskDownload\ChatTTS-Forge20240624\pythonEmbed\lib\site-packages\pyrubberband\pyrb.py", line 84, in __rubberband
    six.raise_from(RuntimeError('Failed to execute rubberband. '
  File "<string>", line 3, in raise_from
RuntimeError: Failed to execute rubberband. Please verify that rubberband-cli is installed.
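
The first traceback shows pyrubberband failing to launch an external process, which is why a working ffmpeg does not help here: time stretching relies on a separate `rubberband` executable. As a minimal sketch, a guard like the one below could skip the stretch instead of crashing when that executable is absent. The function name `effective_rate` is hypothetical and not part of Forge.

```python
import logging
import shutil

logger = logging.getLogger(__name__)


def effective_rate(rate, which=shutil.which):
    """Return the requested rate, or 1.0 when the rubberband executable is missing."""
    if rate != 1 and which("rubberband") is None:
        logger.warning("rubberband not found on PATH; ignoring rate=%s", rate)
        return 1.0
    return rate
```

The `which` parameter is injectable only to make the function easy to test; callers would normally leave it at its default.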

How do the experimental features work? They don't seem to be selectable

As the title says

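[ISSUE] The --use_cpu argument is not present in webui.py

Read README.md and dependencies.md

I have read the README.md and dependencies.md files

Search issues and discussions

I have confirmed that no existing issue or discussion covers this bug

Check Forge version

I have confirmed that the problem occurs in the latest code or a stable release

Your issue

Adding --use_cpu=“all” at startup has no effect, and I could not find any code in webui.py that accepts this argument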
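
For illustration, a launcher that accepts such a flag could declare it with `argparse` roughly as follows. This is a hypothetical sketch, not Forge's actual argument handling, and the parser description and help text are made up for the example.

```python
import argparse


def build_parser():
    """Build a parser with a --use_cpu option that takes one or more module names."""
    parser = argparse.ArgumentParser(description="hypothetical launcher options")
    parser.add_argument(
        "--use_cpu",
        nargs="+",
        default=[],
        metavar="MODULE",
        help='modules to run on CPU, e.g. "all"',
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--use_cpu", "all"])
    print(args.use_cpu)
```

One detail worth checking in the original command: full-width quotes such as “all” are not stripped by the shell, so the program would receive them as part of the value.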

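[BUG:WebUI] Random speaker voice draw fails

Read README.md and dependencies.md

I have read the README.md and dependencies.md files

Search issues and discussions

I have confirmed that no existing issue or discussion covers this bug

Check Forge version

I have confirmed that the problem occurs in the latest code or a stable release

Forge commit or tag

123

Python version

3.10

PyTorch version

Operating system

Windows

Browser

No response

Bug description

Steps to reproduce

Expected result

Works normally

Actual result

An error occurs

Error message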

2024-06-24 19:00:36,090 - modules.models - INFO - Loading ChatTTS models
2024-06-24 19:00:36,091 - modules.ChatTTS.ChatTTS.core - INFO - Load from local: ./models/ChatTTS
2024-06-24 19:00:36,335 - modules.ChatTTS.ChatTTS.core - INFO - vocos loaded.
2024-06-24 19:00:36,457 - modules.ChatTTS.ChatTTS.core - INFO - dvae loaded.
2024-06-24 19:00:39,480 - modules.ChatTTS.ChatTTS.core - INFO - gpt loaded.
2024-06-24 19:00:39,665 - modules.ChatTTS.ChatTTS.core - INFO - decoder loaded.
2024-06-24 19:00:39,681 - modules.ChatTTS.ChatTTS.core - INFO - tokenizer loaded.
2024-06-24 19:00:39,681 - modules.ChatTTS.ChatTTS.core - INFO - All initialized.
2024-06-24 19:00:39,682 - modules.models - INFO - ChatTTS models loaded
C:\Users\win10\.conda\envs\ChatTTS\Lib\site-packages\transformers\models\llama\modeling_llama.py:649: Use
rWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\
transformers\cuda\sdp_utils.cpp:455.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
2024-06-24 19:02:32,291 - modules.ChatTTS.ChatTTS.core - WARNING - Invalid characters found! : {'?'}     
Traceback (most recent call last):
  File "C:\Users\win10\.conda\envs\ChatTTS\Lib\site-packages\gradio\queueing.py", line 532, in process_ev
ents
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\.conda\envs\ChatTTS\Lib\site-packages\gradio\route_utils.py", line 276, in call_pr
ocess_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\.conda\envs\ChatTTS\Lib\site-packages\gradio\blocks.py", line 1928, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\.conda\envs\ChatTTS\Lib\site-packages\gradio\blocks.py", line 1514, in call_functi
on
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\.conda\envs\ChatTTS\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync   
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\.conda\envs\ChatTTS\Lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in 
run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\Users\win10\.conda\envs\ChatTTS\Lib\site-packages\anyio\_backends\_asyncio.py", line 859, in r
un
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\.conda\envs\ChatTTS\Lib\site-packages\gradio\utils.py", line 832, in wrapper      
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\.conda\envs\ChatTTS\Lib\site-packages\gradio\utils.py", line 832, in wrapper      
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\.conda\envs\ChatTTS\Lib\site-packages\torch\utils\_contextlib.py", line 115, in de
corate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\Documents\GitHub\ChatTTS-Forge\modules\webui\speaker\speaker_creator.py", line 85,
 in test_spk_voice
    return tts_generate(spk=seed, text=text, progress=progress)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\Documents\GitHub\ChatTTS-Forge\modules\webui\webui_utils.py", line 247, in tts_gen
erate
    handler = TTSHandler(
              ^^^^^^^^^^^
  File "C:\Users\win10\Documents\GitHub\ChatTTS-Forge\modules\api\impl\handler\TTSHandler.py", line 31, i
n __init__
    assert isinstance(spk, Speaker), "spk should be Speaker"
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: spk should be Speaker

[BUG:API] ERROR:root:float() argument must be a string or a real number, not 'NoneType'

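Read README.md and dependencies.md

I have read the README.md and dependencies.md files

Search issues and discussions

I have confirmed that no existing issue or discussion covers this bug

Check Forge version

I have confirmed that the problem occurs on the latest code or a stable release

Forge commit or tag

main

Python version

3.10

PyTorch version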

2.31.

Operating system

linux

Bug description

When requesting TTS, long-text splitting appears to be broken.

Bug endpoint

/v1/tts

Reproduction parameters

curl --location 'http://localhost/v1/tts?text=%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%
E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82%E7%9C%9F%E7%9A%84%E6%98%AF%E5%BF%AB%E5%A6%82%E7%8B%97%E7%A8%B3%E5%A6%82%E7%8B%97%E3%80%82&spk=Bob&style=advertisement_upbeat_p&temperature=0.3&top_P=0.5&top_K=20&seed=42&format=wav&bs=8&thr=100&prompt1=null&prompt2=null&prefix=null' \
--header 'accept: */*'

Expected result

No error

Actual result

An error is raised

ERROR:root:float() argument must be a string or a real number, not 'NoneType'
Traceback (most recent call last):
  File "/root/autodl-tmp/ChatTTS-Forge/modules/api/impl/tts_api.py", line 96, in synthesize_tts
    sample_rate, audio_data = synthesize_audio(
  File "/root/autodl-tmp/ChatTTS-Forge/modules/synthesize_audio.py", line 54, in synthesize_audio
    combined_audio = combine_audio_segments(audio_segments)
  File "/root/autodl-tmp/ChatTTS-Forge/modules/SynthesizeSegments.py", line 33, in combine_audio_segments
    combined_audio += segment
  File "/root/autodl-tmp/vms/chattts/lib/python3.10/site-packages/pydub/audio_segment.py", line 366, in __add__
    return self.apply_gain(arg)
  File "/root/autodl-tmp/vms/chattts/lib/python3.10/site-packages/pydub/audio_segment.py", line 1172, in apply_gain
    db_to_float(float(volume_change))))
TypeError: float() argument must be a string or a real number, not 'NoneType'

Error message

No response
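
The traceback above shows `combined_audio += segment` reaching pydub's `AudioSegment.__add__`, which falls through to `apply_gain(arg)` and then fails on `float(None)`. That is consistent with one of the entries in `audio_segments` being `None` rather than an audio segment. Why a segment is `None` is not established by this report. As a minimal sketch of a defensive combine step, assuming that skipping missing segments is acceptable, the hypothetical helper below concatenates anything that supports `+` and counts the entries it skipped:

```python
from typing import Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def combine_segments(segments: Iterable[Optional[T]], empty: T) -> Tuple[T, int]:
    """Concatenate segments with `+`, skipping None entries.

    Returns the combined value and the number of skipped entries, so the
    caller can log or raise if anything was missing.
    """
    combined = empty
    skipped = 0
    for segment in segments:
        if segment is None:
            # A None here would otherwise be treated as a gain value by pydub.
            skipped += 1
            continue
        combined = combined + segment
    return combined, skipped


# Demonstration with strings standing in for audio segments.
demo, missing = combine_segments(["ab", None, "cd"], "")
print(demo, missing)
```

With pydub, the call would look like `combine_segments(audio_segments, AudioSegment.empty())`. A non-zero skip count points back to the splitting or generation step that produced the `None`, which is where the underlying bug would still need to be fixed.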

Missing `prompt1`, `prompt2`, and `prefix` Parameters in `google_text_synthesize` Function of `google_api.py`

Issue Description

Hello,

I have noticed a potential issue in the google_text_synthesize function within the google_api.py module of the ChatTTS-Forge project. Specifically, the ChatTTSConfig object is being instantiated without the prompt1, prompt2, and prefix parameters. The current code snippet is as follows:

tts_config = ChatTTSConfig(
    style=params.get("style", ""),
    temperature=voice.temperature,
    top_k=voice.topK,
    top_p=voice.topP,
)

However, it should include the prompt1, prompt2, and prefix parameters to ensure that the TTS generation can accurately reflect the desired emotional tone and style. The corrected code should look like this:

tts_config = ChatTTSConfig(
    style=params.get("style", ""),
    temperature=voice.temperature,
    top_k=voice.topK,
    top_p=voice.topP,
    prompt1=params.get("prompt1", ""),
    prompt2=params.get("prompt2", ""),
    prefix=params.get("prefix", ""),
)

Without these parameters, downstream components will not be able to access prompt1, prompt2, and prefix, which may result in the generated speech lacking the intended emotional variation corresponding to different styles.

Could you please confirm whether this omission was intentional or an oversight? If it was an oversight, could you update the code to include these parameters?

Thank you for your attention to this matter.
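
The effect described in this issue can be illustrated with a small sketch. `ConfigSketch` is a hypothetical stand-in for `ChatTTSConfig`; its field names follow the snippets above, but the default values are assumptions, and `build_config` is an illustrative helper rather than project code. The point is only that fields left out of the constructor call keep their defaults, so the values sent in the request never reach downstream code.

```python
from dataclasses import dataclass


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for ChatTTSConfig; defaults are assumed."""
    style: str = ""
    prompt1: str = ""
    prompt2: str = ""
    prefix: str = ""


def build_config(params: dict, forward_prompts: bool) -> ConfigSketch:
    """Build a config from request params, optionally forwarding prompt fields."""
    kwargs = {"style": params.get("style", "")}
    if forward_prompts:
        for key in ("prompt1", "prompt2", "prefix"):
            kwargs[key] = params.get(key, "")
    return ConfigSketch(**kwargs)


# Illustrative request parameters (placeholder values).
params = {"style": "chat", "prompt1": "p1", "prompt2": "p2", "prefix": "pre"}

without = build_config(params, forward_prompts=False)
with_prompts = build_config(params, forward_prompts=True)
print(repr(without.prompt1), repr(with_prompts.prompt1))
```

In the first call the prompt fields stay at their empty defaults even though the request supplied them, which matches the behaviour the issue reports for `google_text_synthesize`.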

