kuangdd / ttskit

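Text-to-speech toolkit. An easy-to-use Chinese speech synthesis toolbox that includes a speech encoder, a speech synthesizer, a vocoder, and a visualization module.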

License: MIT License

Python 99.39% Dockerfile 0.09% HTML 0.52%

audio chinese tts vc

ttskit's Issues

pyworld currently cannot be installed on Python 3.8+, can it? Yet the notes on cloning the project say the pyworld dependency has been removed

from ttskit.resource import _speaker_dict

from ttskit.resource import _speaker_dict

How to solve this problem

module 'ttskit' has no attribute 'tts'

ttskit.tts('这是个示例', audio='14')

AttributeError: module 'ttskit' has no attribute 'tts'

Cannot install on macOS with Python 3.9

Building wheels for collected packages: llvmlite
Building wheel for llvmlite (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [26 lines of output]
running bdist_wheel
/Library/Frameworks/Python.framework/Versions/3.9/bin/python3 /private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py
LLVM version... Traceback (most recent call last):
File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 105, in main_posix
out = subprocess.check_output([llvm_config, '--version'])
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 951, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1821, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'llvm-config'

  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 168, in <module>
      main()
    File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 162, in main
      main_posix('osx', '.dylib')
    File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 107, in main_posix
      raise RuntimeError("%s failed executing, please point LLVM_CONFIG "
  RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config
  error: command '/Library/Frameworks/Python.framework/Versions/3.9/bin/python3' failed with exit code 1
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llvmlite
Running setup.py clean for llvmlite
Failed to build llvmlite
Installing collected packages: llvmlite, zope.interface, zope.event, torch, numba, greenlet, matplotlib, gevent, umap-learn, ttskit
Attempting uninstall: llvmlite
Found existing installation: llvmlite 0.38.0
Uninstalling llvmlite-0.38.0:
Successfully uninstalled llvmlite-0.38.0
Running setup.py install for llvmlite ... error
error: subprocess-exited-with-error

× Running setup.py install for llvmlite did not run successfully.
│ exit code: 1
╰─> [29 lines of output]
running install
running build
got version from file /private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/llvmlite/_version.py {'version': '0.31.0', 'full': 'fe7d985f6421d87f613bd414479d29d912771562'}
running build_ext
/Library/Frameworks/Python.framework/Versions/3.9/bin/python3 /private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py
LLVM version... Traceback (most recent call last):
File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 105, in main_posix
out = subprocess.check_output([llvm_config, '--version'])
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 951, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1821, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'llvm-config'

  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 168, in <module>
      main()
    File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 162, in main
      main_posix('osx', '.dylib')
    File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 107, in main_posix
      raise RuntimeError("%s failed executing, please point LLVM_CONFIG "
  RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config
  error: command '/Library/Frameworks/Python.framework/Versions/3.9/bin/python3' failed with exit code 1
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
WARNING: No metadata found in /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages
Rolling back uninstall of llvmlite
Moving to /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/llvmlite-0.38.0.dist-info/
from /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/~lvmlite-0.38.0.dist-info
Moving to /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/llvmlite/
from /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/~lvmlite
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> llvmlite

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
[wuyu@wuyudeMacBook-Pro Documents]$>

Found one that works

Microsoft TTS Downloader

import error

INFO:audio_player:ImportError: No module named 'sounddevice'
INFO:audio_player:ImportError: No module named 'pyaudio'
INFO:audio_griffinlim:ImportError: No module named 'tensorflow'

But everything works normally. Will these messages have any effect?

No such file or directory: '..../reference_audio.tar'

the calling code:

    def text2wave(
        self,
        txt: str,
        audio="14",
        speaker="",
        sampling_rate=22050,
        processes=2,
        maxlen=60,
        save_to: Optional[str] = None,
    ) -> Optional[bytes]:
        wav = sdk_api.tts_sdk(
            txt, audio=audio, processes=processes, sampling_rate=sampling_rate
        )

        if save_to is not None:
            with open(save_to, "wb") as f:
                f.write(wav)
        else:
            return wav

anything wrong with my call?

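Will there be a Docker version?

Setting up the environment takes too long every time.

The online demo site cannot be viewed. Could the audio demos be put in the md file? What does the generated output sound like?

I finally got it running. Could someone help me figure out what this error is?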

cannot import name '_replace_tone2_style_dict_to_default' from 'pypinyin.utils'

Traceback (most recent call last):
File "", line 1, in
File "/home/new/ly_test/ttskit-main/ttskit/init.py", line 50, in
import sdk_api
File "/home/new/ly_test/ttskit-main/ttskit/sdk_api.py", line 37, in
from ttskit.mellotron import inference as mellotron
File "/home/new/ly_test/ttskit-main/ttskit/mellotron/inference.py", line 26, in
from .data_utils import transform_mel, transform_text, transform_f0, transform_embed, transform_speaker
File "/home/new/ly_test/ttskit-main/ttskit/mellotron/data_utils.py", line 24, in
from mellotron.text import text_to_sequence, cmudict
File "/home/new/ly_test/ttskit-main/ttskit/mellotron/text/init.py", line 7, in
from phkit.chinese import text_to_sequence as text_to_sequence_phkit, sequence_to_text, text2pinyin
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/init.py", line 94, in
from phkit.chinese import doc as doc_chinese
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/chinese/init.py", line 37, in
from .pinyin import text2pinyin, split_pinyin, change_diao
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/chinese/pinyin.py", line 11, in
from ..pinyinkit import text2pinyin, split_pinyin, change_diao
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/pinyinkit/init.py", line 6, in
from .core import lazy_pinyin, pinyin, slug, Style, initialize
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/pinyinkit/core.py", line 20, in
from pypinyin.utils import _replace_tone2_style_dict_to_default
ImportError: cannot import name '_replace_tone2_style_dict_to_default' from 'pypinyin.utils' (/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/pypinyin/utils.py)

How can I solve this ImportError: cannot import name '_replace_tone2_style_dict_to_default' from 'pypinyin.utils'?

Installation errors on Windows 10 with Python 3.9

I tried to install and test the repo, but the installation fails with errors because packages listed in the setup script cannot be built. I am testing on a Windows machine with a Python 3.9 environment. I suspect this repo only supports lower Python versions. Can you list the environment requirements in the README?

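Where is the audio after the web API finishes synthesizing?

Can the audio be saved after a web API request? I did not see any audio once the request finished.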

pad_center() takes 1 positional argument but 2 were given

I've just started testing, using the example from the README:
from ttskit import sdk_api
wav = sdk_api.tts_sdk('文本', audio='24')
but I get
File ~\anaconda3\lib\site-packages\ttskit\mellotron\stft.py:67 in __init__ fft_window = pad_center(fft_window, filter_length)
TypeError: pad_center() takes 1 positional argument but 2 were given

What is the problem?
My environment is Anaconda3 on Windows 11, using Spyder + IPython.
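
This error is consistent with librosa 0.10 and later, where `pad_center` accepts `size` only as a keyword argument. The sketch below reproduces the failure and the fix with a stand-in function (hypothetical, not librosa's implementation) that has the same keyword-only signature, so it runs without librosa installed:

```python
def pad_center(data, *, size):
    """Stand-in with a keyword-only `size`, mimicking the librosa >= 0.10 signature.

    Centers `data` (a list) inside a zero-padded list of length `size`.
    """
    if size < len(data):
        raise ValueError("size must be at least len(data)")
    left = (size - len(data)) // 2
    right = size - len(data) - left
    return [0.0] * left + list(data) + [0.0] * right


window = [1.0, 2.0, 3.0]

# Old positional call style, as in ttskit/mellotron/stft.py line 67: fails.
try:
    pad_center(window, 7)
    positional_ok = True
    message = ""
except TypeError as exc:
    positional_ok = False
    message = str(exc)

# Keyword call style: works.
padded = pad_center(window, size=7)
```

In ttskit itself, the equivalent change would presumably be `pad_center(fft_window, size=filter_length)` in `stft.py`; installing a librosa release older than 0.10 is another possible workaround.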

How do I download the training data?

How can I play the audio from Python code?

from ttskit import sdk_api

wav = sdk_api.tts_sdk('文本', audio='24')
How do I get it to play the sound?
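
One possible approach, assuming `tts_sdk` returns the bytes of a complete WAV file (the calling code in another issue on this page writes the return value straight to a file opened in binary mode): write the bytes to disk and hand the file to a player. The sketch builds a short test tone with the standard library as a stand-in for the ttskit output. `make_tone_wav`, `save_wav`, and `play_wav` are hypothetical helper names, and `winsound` is only available on Windows.

```python
import io
import math
import struct
import sys
import wave


def make_tone_wav(seconds=0.1, rate=22050, freq=440.0):
    """Build WAV bytes for a sine tone; stands in for sdk_api.tts_sdk(...) output."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()


def save_wav(wav_bytes, path):
    """Write WAV bytes to `path` and return the number of audio frames."""
    with open(path, "wb") as f:
        f.write(wav_bytes)
    with wave.open(path, "rb") as w:
        return w.getnframes()


def play_wav(path):
    """Play a WAV file on Windows; elsewhere, just report where it was saved."""
    if sys.platform == "win32":
        import winsound
        winsound.PlaySound(path, winsound.SND_FILENAME)
    else:
        print(f"Saved {path}; open it with an audio player.")
```

With ttskit, the call sequence would then be roughly `wav = sdk_api.tts_sdk('文本', audio='24')`, `save_wav(wav, 'out.wav')`, `play_wav('out.wav')`.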

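Synthesizing a dozen or so characters takes more than a second. Is that normal?

A cloud provider's TTS API that I use generally synthesizes in under 200 ms. Can ttskit get below 500 ms?

Question: error 'NoneType' object has no attribute 'inference'

I installed the ttskit package directly with pip. The environment is Ubuntu 18.04 + Python 3.8.0.
Running tkcli produces the following error: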
Input text (输入文本或exit退出，不输入则随机):
你好
Input kwargs (输入控制参数，格式：audio=1,speaker=biaobei，不输入则默认)

dictionary update sequence element #0 has length 1; 2 is required
Text: 你好
Kwargs: {'audio': '6', 'speaker': 'tmp'}
TTS running ...
INFO:sdk_api:Synthesizing: 你好
load phrase: 36778it [00:00, 239649.94it/s]
load pinyin: 41459it [00:00, 592173.16it/s]
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
DEBUG:jieba:Loading model from cache /tmp/jieba.cache
Loading model cost 0.393 seconds.
DEBUG:jieba:Loading model cost 0.393 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
Traceback (most recent call last):
File "/home/season/.local/bin/tkcli", line 8, in
sys.exit(tts_cli())
File "/home/season/.local/lib/python3.8/site-packages/ttskit/cli_api.py", line 155, in tts_cli
sdk_api.tts_sdk(text=text,
File "/home/season/.local/lib/python3.8/site-packages/ttskit/sdk_api.py", line 425, in tts_sdk
wav = tts_sdk_base_one(kw)
File "/home/season/.local/lib/python3.8/site-packages/ttskit/sdk_api.py", line 395, in tts_sdk_base_one
return tts_sdk_base(**kwargs)
File "/home/season/.local/lib/python3.8/site-packages/ttskit/sdk_api.py", line 377, in tts_sdk_base
wavs = melgan.generate_wave(mel=mels_postnet)
File "/home/season/.local/lib/python3.8/site-packages/ttskit/melgan/inference.py", line 63, in generate_wave
wav = _model.inference(mel)
AttributeError: 'NoneType' object has no attribute 'inference'

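Request for an SDK option that returns numpy.array, plus my own workaround

To apply complex processing such as denoising or pitch shifting to the generated audio, or to splice the output of several speakers into one clip, you inevitably need the audio as a numpy.array. With the current SDK the only way is to write the return value to a file and read it back in, which is cumbersome. I therefore suggest that the author add an SDK parameter that returns a numpy.array directly. (Apologies if this already exists and I just did not find it.)

For now I have modified the end of the tts_sdk() function in sdk_api.py (around line 445) as follows to get this behavior.

Original code: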

...
    return wav

Modified code:

...
    wav_array = np.array(wav_out.get_array_of_samples())
    if kwargs.get('array', False):return wav_array
    else:return wav

Usage example:

from ttskit import sdk_api
wav_array = sdk_api.tts_sdk(text='返回数组',array = True)

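With a return value like this, it is easy to apply complex processing such as a Fourier transform to the returned audio. I am not fully familiar with this library's code, so I am not sure whether the change could cause unforeseen errors. In my small-scale tests the modification was stable and workable. I hope the author can read my code, confirm that it is safe and effective, and merge it into the library. Thanks!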
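
An alternative that avoids patching sdk_api.py, assuming the value returned by `tts_sdk` is a complete 16-bit PCM WAV file held in memory, is to decode it with the standard `wave` module. The helper names `wav_bytes_to_samples` and `_demo_wav` are hypothetical; the sketch returns a standard-library `array`, which `numpy.asarray` can convert if NumPy is installed.

```python
import array
import io
import struct
import sys
import wave


def wav_bytes_to_samples(wav_bytes):
    """Decode in-memory 16-bit PCM WAV bytes into (samples, sample_rate).

    For multi-channel audio the samples are interleaved.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, rate


def _demo_wav():
    """Build a tiny WAV in memory; stands in for sdk_api.tts_sdk(...) output."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(22050)
        w.writeframes(struct.pack("<4h", 0, 1000, -1000, 32767))
    return buf.getvalue()


samples, rate = wav_bytes_to_samples(_demo_wav())
```

Decoding outside the SDK keeps the installed package untouched, at the cost of one extra parse of the WAV header per call.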

Concurrent requests fail with "stack expects each tensor to be equal size". Could someone take a look? I suspect multithreading is not supported

ERROR:ttskit.web_api:Exception on /tts [GET]
Traceback (most recent call last):
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "D:\codedemo\tts\ttskit-main\ttskit\web_api.py", line 50, in tts_web
wav = sdk_api.tts_sdk(text=text, speaker=speaker, audio=audio)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 425, in tts_sdk
wav = tts_sdk_base_one(kw)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 395, in tts_sdk_base_one
return tts_sdk_base(**kwargs)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 364, in tts_sdk_base
mels, mels_postnet, gates, alignments = mellotron.generate_mel(text_data, style_data, speaker_data, f0_data)
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\inference.py", line 77, in generate_mel
mels, mels_postnet, gates, alignments = _model.inference((text, style, speaker, f0))
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 677, in inference
mel_outputs, gate_outputs, alignments = self.decoder.inference(
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 524, in inference
mel_outputs, gate_outputs, alignments = self.parse_decoder_outputs(
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 357, in parse_decoder_outputs
alignments = torch.stack(alignments).transpose(0, 1)
RuntimeError: stack expects each tensor to be equal size, but got [1, 20] at entry 0 and [1, 68] at entry 9
ERROR:ttskit.web_api:Exception on /tts [GET]
Traceback (most recent call last):
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "D:\codedemo\tts\ttskit-main\ttskit\web_api.py", line 50, in tts_web
wav = sdk_api.tts_sdk(text=text, speaker=speaker, audio=audio)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 425, in tts_sdk
wav = tts_sdk_base_one(kw)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 395, in tts_sdk_base_one
return tts_sdk_base(**kwargs)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 364, in tts_sdk_base
mels, mels_postnet, gates, alignments = mellotron.generate_mel(text_data, style_data, speaker_data, f0_data)
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\inference.py", line 77, in generate_mel
mels, mels_postnet, gates, alignments = _model.inference((text, style, speaker, f0))
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 677, in inference
mel_outputs, gate_outputs, alignments = self.decoder.inference(
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 524, in inference
mel_outputs, gate_outputs, alignments = self.parse_decoder_outputs(
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 357, in parse_decoder_outputs
alignments = torch.stack(alignments).transpose(0, 1)
RuntimeError: stack expects each tensor to be equal size, but got [1, 20] at entry 0 and [1, 68] at entry 8
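
If the model keeps per-request state that is not thread-safe (an assumption; the traceback alone does not prove it), one workaround is to serialize synthesis calls behind a lock so that only one request reaches the model at a time. A minimal sketch, where `synthesize` is a hypothetical stand-in for `sdk_api.tts_sdk`:

```python
import threading
import time

_tts_lock = threading.Lock()
_active = 0       # number of calls currently inside synthesize()
max_active = 0    # highest overlap ever observed


def synthesize(text):
    """Stand-in for sdk_api.tts_sdk; records how many calls overlap."""
    global _active, max_active
    _active += 1
    max_active = max(max_active, _active)
    time.sleep(0.01)  # pretend to run the model
    _active -= 1
    return text.encode("utf-8")


def synthesize_serialized(text):
    """Allow only one synthesis call at a time."""
    with _tts_lock:
        return synthesize(text)


results = {}


def worker(i):
    results[i] = synthesize_serialized(f"request {i}")


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock trades throughput for correctness. Running several worker processes, each with its own copy of the model, is another option when requests need to run in parallel.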

Support for ARM MacBooks?

Because I am on an ARM Mac, I replaced tensorflow by running pip install tensorflow-macos; nothing else was changed.
Environment: ARM macOS 12.6 + Python 3.9.16
After starting Python:

>>> from ttskit import http_server
>>> http_server.start_sever()

The second command fails:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/cherish/py3.9/lib/python3.9/site-packages/ttskit/http_server.py", line 95, in start_sever
from . import sdk_api
File "/Users/cherish/py3.9/lib/python3.9/site-packages/ttskit/sdk_api.py", line 55, in <module>
_stft = TacotronSTFT(
File "/Users/cherish/py3.9/lib/python3.9/site-packages/ttskit/mellotron/layers.py", line 64, in __init__
self.stft_fn = STFT(filter_length, hop_length, win_length)
File "/Users/cherish/py3.9/lib/python3.9/site-packages/ttskit/mellotron/stft.py", line 67, in __init__
fft_window = pad_center(fft_window, filter_length)
TypeError: pad_center() takes 1 positional argument but 2 were given

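Can this run on Android devices?

It runs fine on Windows. I would like to port the speech synthesis to Android devices (Android 6, Android 7). Could you offer some pointers?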

list of speaker's sound category and quality

Classified by ear, so accuracy is not guaranteed.
Some voices seem buggy; some are unclear and have a repetition problem.

'Aibao',child female
'Aicheng',teen male
'Aida',adult male
'Aijia',adult female
'Aijing',adult female
'Aimei',teen female
'Aina',teen female
'Aiqi',teen female
'Aitong',child female
'Aiwei',child female
'Aixia',teen female
'Aiya',adult female
'Aiyu',adult female
'Aiyue',adult female
'Siyue',adult female
'Xiaobei',child female
'Xiaogang',adult female
'Xiaomei',child female
'Xiaomeng',teen female
'Xiaowei',adult female
'Xiaoxue',adult female
'Xiaoyun',adult male
'Yina',teen female
'biaobei',teen female
'cctvfa',adult female
'cctvfb',adult male
'cctvma',buggy
'cctvmb',buggy
'cctvmc',adult female
'cctvmd',buggy

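Does anyone know how to train on your own data?

I want to train my own speaker, but after going through the whole repository I could not find a systematic tutorial, so I would like to ask how to train on my own data.

Has multiprocessing actually been tested? The dataloader hangs indefinitely

Two questions, thanks in advance

Many thanks for providing such a good tool. I have two questions:
1. Can the speaking rate be changed?
2. Generating the audio file feels very slow to me. Is this normal?

Also, for long-text synthesis, would it perhaps be better to split sentences at punctuation marks?

Thanks again!!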
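
The punctuation-based splitting suggested in this issue could look like the following sketch. It is not ttskit's own segmentation; the function name `split_sentences` and the `maxlen` fallback for overlong pieces are assumptions made for illustration.

```python
import re

# A run of non-terminator characters followed by any sentence-ending
# punctuation (Chinese or ASCII) that closes it.
_SENTENCE = re.compile(r"[^。！？!?；;]+[。！？!?；;]*")


def split_sentences(text, maxlen=60):
    """Split text after sentence-ending punctuation, then cap each piece at maxlen."""
    pieces = []
    for part in _SENTENCE.findall(text):
        part = part.strip()
        # Hard-wrap pieces that are still too long for one synthesis call.
        while len(part) > maxlen:
            pieces.append(part[:maxlen])
            part = part[maxlen:]
        if part:
            pieces.append(part)
    return pieces


sentences = split_sentences("今天天气很好。我们去公园吧！你觉得呢？")
```

Each returned piece could then be synthesized separately and the resulting audio concatenated in order.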

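Thanks for sharing. What does the audio parameter (speaker reference audio) mean?

The speaker parameter is the speaker; how does it differ from audio (the reference audio)? Is this voice cloning? An example using both parameters would be ideal.

Can the pronunciation of polyphonic characters be marked manually?

Polyphonic characters such as “还” and “长” are always misread. Is there a way to mark the pronunciation manually to reduce misreadings?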

import ttskit and failed in importing name _speaker_dict

Windows 10

Python 3.6.5

Installed ttskit completely, then installed the following manually:

tensorflow
pyaudio
sounddevice
pyworld

When I execute "import ttskit" in python cli:

>>> import ttskit
2021-10-28 12:24:34.621870: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-10-28 12:24:34.627138: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "F:\Python\Python36\lib\site-packages\ttskit\__init__.py", line 50, in <module>
    import sdk_api
  File "F:\Python\Python36\lib\site-packages\ttskit\sdk_api.py", line 41, in <module>
    from ttskit.resource import _speaker_dict
ImportError: cannot import name '_speaker_dict'
>>>

How can reading English letters aloud be supported?

For example, reading out abbreviations such as TTS. Thanks!

吗

Hello, there is an error here: ImportError: cannot import name '_speaker_dict'

Is the resource here the resource inside the ttskit package? I do not see this function in it.
(pytorch1.6) C:\Users\Administrator>python E:\TTS\ttskit\myTest.py
Traceback (most recent call last):
File "E:\TTS\ttskit\myTest.py", line 3, in
from ttskit import sdk_api
File "E:\TTS\ttskit\ttskit\sdk_api.py", line 49, in
from .resource import _speaker_dict
ImportError: cannot import name '_speaker_dict'

[CONTRIBUTION] Speech Dataset Generator

Hi everyone!

My name is David Martin Rius and I have just published this project on GitHub: https://github.com/davidmartinrius/speech-dataset-generator/

Now you can create datasets automatically with any audio or lists of audios.

I hope you find it useful.

Here are the key functionalities of the project:

Dataset Generation: Creation of multilingual datasets with Mean Opinion Score (MOS).
Silence Removal: It includes a feature to remove silences from audio files, enhancing the overall quality.
Sound Quality Improvement: It improves the quality of the audio when needed.
Audio Segmentation: It can segment audio files within specified second ranges.
Transcription: The project transcribes the segmented audio, providing a textual representation.
Gender Identification: It identifies the gender of each speaker in the audio.
Pyannote Embeddings: Utilizes pyannote embeddings for speaker detection across multiple audio files.
Automatic Speaker Naming: Automatically assigns names to speakers detected in multiple audios.
Multiple Speaker Detection: Capable of detecting multiple speakers within each audio file.
Store speaker embeddings: The speakers are detected and stored in a Chroma database, so you do not need to assign a speaker name.
Syllabic and words-per-minute metrics

Feel free to explore the project at https://github.com/davidmartinrius/speech-dataset-generator

David Martin Rius

ydub/audio_segment.py", line 374, in radd raise TypeError("Gains must be the second addend after the " TypeError: Gains must be the second addend after the AudioSegment 2022-10-27T02:42:03Z {'REMOTE_ADDR': '127.0.0.1', 'REMOTE_PORT': '50648', 'HTTP_HOST': 'localhost:9000', (hidden keys: 26)} failed with TypeError

ydub/audio_segment.py", line 374, in radd
raise TypeError("Gains must be the second addend after the "
TypeError: Gains must be the second addend after the AudioSegment
2022-10-27T02:42:03Z {'REMOTE_ADDR': '127.0.0.1', 'REMOTE_PORT': '50648', 'HTTP_HOST': 'localhost:9000', (hidden keys: 26)} failed with TypeError

This problem appears during training. How can it be solved?

D:\ai\ttskit\ttskit\mellotron\stft.py:67: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = pad_center(fft_window, filter_length)
D:\ai\ttskit\ttskit\mellotron\layers.py:66: FutureWarning: Pass sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0 as keyword args. From version 0.10 passing these as positional arguments will result in an error
sampling_rate, filter_length, n_mel_channels, mel_fmin, mel_fmax)
D:\ai\ttskit\ttskit\mellotron\stft.py:67: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = pad_center(fft_window, filter_length)
D:\ai\ttskit\ttskit\mellotron\layers.py:66: FutureWarning: Pass sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0 as keyword args. From version 0.10 passing these as positional arguments will result in an error
sampling_rate, filter_length, n_mel_channels, mel_fmin, mel_fmax)

Long-text synthesis always produces only the last sentence.

Long-text synthesis always produces only the last sentence.
#!/usr/bin/env python

# -*- coding: utf-8 -*-

from ttskit import sdk_api
var='工业和信息化部总工程师田玉龙在国新办新闻发布会上介绍'
wav = sdk_api.tts_sdk_for(var,speaker='cctvfa', output=r'E:\TTS\ttskits\my9.wav')

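What is the general approach for the VC features? I suggest making them real-time if possible

Could the real-time speaker voice cloning from Real-Time-Voice-Cloning be implemented?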

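Quick-start workflow for the web version (tested and working)

Download the code from GitHub and unzip it; use the ttskit-main folder as your project folder.
Download resource from Baidu Netdisk (download link) and place it in the ttskit-main\ttskit folder, overwriting the existing resource folder.
The above roughly follows the steps given by the author, but there is one small catch:
when replacing the folder, do not overwrite ttskit-main\ttskit\resource\__init__.py.
Alternatively, after replacing everything, unzip a fresh copy of the GitHub download and restore just resource\__init__.py from it.
Then open a command line in the ttskit-main directory and run pip install -U ttskit.
Once pip finishes, create a demo.py file in the ttskit-main folder with the following code:
```
from ttskit import http_server

http_server.start_sever()
```
Then run py demo.py on the command line.
After a while, a URL appears at the bottom of the command-line output; copy it into your browser.
Example screenshot: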

Don't waste your time; this project does not run at all

How do I make it run on the CPU?
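
One general PyTorch-level option, not a ttskit-specific setting, is to hide all CUDA devices from the process by setting `CUDA_VISIBLE_DEVICES` to an empty string before anything imports torch. Whether ttskit then loads its models on the CPU without further changes is an assumption that would need checking.

```python
import os

# Must run before torch (or ttskit, which imports torch) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# After this point, `import torch; torch.cuda.is_available()` should report
# False, so code that picks its device from that check falls back to the CPU.
cuda_hidden = os.environ["CUDA_VISIBLE_DEVICES"] == ""
```

The same effect can be had from the shell by exporting the variable before launching Python.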

ImportError

Exception occurred: ImportError
cannot import name 'replace_tone2_style_dict_to_default' from 'pypinyin.utils' (D:\Program Files\Python\lib\site-packages\pypinyin\utils.py)
File "E:\Work\Github\ttskit\TTS_kit\ttskit\mellotron\text_init.py", line 7, in
from phkit.chinese import text_to_sequence as text_to_sequence_phkit, sequence_to_text, text2pinyin
File "E:\Work\Github\ttskit\TTS_kit\ttskit\mellotron\data_utils.py", line 24, in
from mellotron.text import text_to_sequence, cmudict
File "E:\Work\Github\ttskit\TTS_kit\ttskit\mellotron\inference.py", line 26, in
from .data_utils import transform_mel, transform_text, transform_f0, transform_embed, transform_speaker
File "E:\Work\Github\ttskit\TTS_kit\ttskit\sdk_api.py", line 37, in
from ttskit.mellotron import inference as mellotron
File "E:\Work\Github\ttskit\TTS_kit\ttskit_init_.py", line 50, in
import sdk_api
File "E:\Work\Github\ttskit\TTS_kit\test.py", line 31, in test_http_server
from ttskit import http_server
File "E:\Work\Github\ttskit\TTS_kit\test.py", line 42, in
test_http_server()

$ tkcli -h
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 9010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "/usr/local/bin/tkcli", line 5, in <module>
    from ttskit.cli_api import tts_cli
  File "/usr/local/lib/python3.10/dist-packages/ttskit/cli_api.py", line 68, in <module>
    from . import sdk_api
  File "/usr/local/lib/python3.10/dist-packages/ttskit/sdk_api.py", line 55, in <module>
    _stft = TacotronSTFT(
  File "/usr/local/lib/python3.10/dist-packages/ttskit/mellotron/layers.py", line 64, in __init__
    self.stft_fn = STFT(filter_length, hop_length, win_length)
  File "/usr/local/lib/python3.10/dist-packages/ttskit/mellotron/stft.py", line 67, in __init__
    fft_window = pad_center(fft_window, filter_length)
TypeError: pad_center() takes 1 positional argument but 2 were given