
ttskit's People

Contributors

kuangdd

ttskit's Issues

Cannot install on macOS with Python 3.9

Building wheels for collected packages: llvmlite
Building wheel for llvmlite (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [26 lines of output]
running bdist_wheel
/Library/Frameworks/Python.framework/Versions/3.9/bin/python3 /private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py
LLVM version... Traceback (most recent call last):
File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 105, in main_posix
out = subprocess.check_output([llvm_config, '--version'])
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1821, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'llvm-config'

  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 168, in <module>
      main()
    File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 162, in main
      main_posix('osx', '.dylib')
    File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 107, in main_posix
      raise RuntimeError("%s failed executing, please point LLVM_CONFIG "
  RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config
  error: command '/Library/Frameworks/Python.framework/Versions/3.9/bin/python3' failed with exit code 1
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llvmlite
Running setup.py clean for llvmlite
Failed to build llvmlite
Installing collected packages: llvmlite, zope.interface, zope.event, torch, numba, greenlet, matplotlib, gevent, umap-learn, ttskit
Attempting uninstall: llvmlite
Found existing installation: llvmlite 0.38.0
Uninstalling llvmlite-0.38.0:
Successfully uninstalled llvmlite-0.38.0
Running setup.py install for llvmlite ... error
error: subprocess-exited-with-error

× Running setup.py install for llvmlite did not run successfully.
│ exit code: 1
╰─> [29 lines of output]
running install
running build
got version from file /private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/llvmlite/_version.py {'version': '0.31.0', 'full': 'fe7d985f6421d87f613bd414479d29d912771562'}
running build_ext
/Library/Frameworks/Python.framework/Versions/3.9/bin/python3 /private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py
LLVM version... Traceback (most recent call last):
File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 105, in main_posix
out = subprocess.check_output([llvm_config, '--version'])
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1821, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'llvm-config'

  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 168, in <module>
      main()
    File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 162, in main
      main_posix('osx', '.dylib')
    File "/private/var/folders/yn/4byzlmls27n1sn4b19pwp6nm0000gn/T/pip-install-o9dwgbtj/llvmlite_14beb0db95e84b99a61cb1db7d4980bb/ffi/build.py", line 107, in main_posix
      raise RuntimeError("%s failed executing, please point LLVM_CONFIG "
  RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config
  error: command '/Library/Frameworks/Python.framework/Versions/3.9/bin/python3' failed with exit code 1
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
WARNING: No metadata found in /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages
Rolling back uninstall of llvmlite
Moving to /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/llvmlite-0.38.0.dist-info/
from /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/~lvmlite-0.38.0.dist-info
Moving to /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/llvmlite/
from /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/~lvmlite
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> llvmlite

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
[wuyu@wuyudeMacBook-Pro Documents]$>
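
The log above shows pip building llvmlite 0.31.0 from source (while 0.38.0 is already installed), and the build stops because it cannot run `llvm-config`. The error message itself suggests pointing `LLVM_CONFIG` at that binary. The following is a minimal sketch of a pre-install check, assuming the build looks at the `LLVM_CONFIG` environment variable first and `PATH` second; `find_llvm_config` is a hypothetical helper, not part of llvmlite or ttskit.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_llvm_config(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the llvm-config path a source build could use, or None."""
    # An explicit LLVM_CONFIG setting wins, as the error message suggests.
    explicit = env.get("LLVM_CONFIG")
    if explicit:
        return explicit
    # Otherwise fall back to searching PATH for the executable.
    return which("llvm-config")


if __name__ == "__main__":
    path = find_llvm_config()
    print(path or "llvm-config not found; set LLVM_CONFIG or add it to PATH")
```

If this prints "not found", the source build will likely fail the same way; installing LLVM or using a Python version for which a prebuilt wheel of the pinned llvmlite exists may avoid the build altogether.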

import error

INFO:audio_player:ImportError: No module named 'sounddevice'
INFO:audio_player:ImportError: No module named 'pyaudio'
INFO:audio_griffinlim:ImportError: No module named 'tensorflow'

Everything still works, though. Will these import errors have any effect?
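
Messages logged at INFO level like these usually come from optional imports: the module is tried, the failure is logged, and only the feature that needs it is disabled. A minimal sketch of that pattern follows; it is an illustration of the general technique, not ttskit's actual code, and `optional_import` and `can_play` are hypothetical names.

```python
import importlib
import logging

logger = logging.getLogger("audio_player")


def optional_import(name: str):
    """Import a module if it is installed; log and return None otherwise."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Logged at INFO because the missing module only disables one feature.
        logger.info("ImportError: %s", exc)
        return None


def can_play(backend_names=("sounddevice", "pyaudio")) -> bool:
    """Report whether at least one playback backend could be imported."""
    return any(optional_import(name) is not None for name in backend_names)
```

Under this reading, synthesis keeps working without the modules, and only direct audio playback (sounddevice, pyaudio) or a TensorFlow-based Griffin-Lim path would be unavailable.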

No such file or directory: '..../reference_audio.tar'

the calling code:

    def text2wave(
        self,
        txt: str,
        audio="14",
        speaker="",
        sampling_rate=22050,
        processes=2,
        maxlen=60,
        save_to: Optional[str] = None,
    ) -> Optional[bytes]:
        wav = sdk_api.tts_sdk(
            txt, audio=audio, processes=processes, sampling_rate=sampling_rate
        )

        if save_to is not None:
            with open(save_to, "wb") as f:
                f.write(wav)
        else:
            return wav

anything wrong with my call?

cannot import name '_replace_tone2_style_dict_to_default' from 'pypinyin.utils'

Traceback (most recent call last):
File "", line 1, in <module>
File "/home/new/ly_test/ttskit-main/ttskit/__init__.py", line 50, in <module>
import sdk_api
File "/home/new/ly_test/ttskit-main/ttskit/sdk_api.py", line 37, in <module>
from ttskit.mellotron import inference as mellotron
File "/home/new/ly_test/ttskit-main/ttskit/mellotron/inference.py", line 26, in <module>
from .data_utils import transform_mel, transform_text, transform_f0, transform_embed, transform_speaker
File "/home/new/ly_test/ttskit-main/ttskit/mellotron/data_utils.py", line 24, in <module>
from mellotron.text import text_to_sequence, cmudict
File "/home/new/ly_test/ttskit-main/ttskit/mellotron/text/__init__.py", line 7, in <module>
from phkit.chinese import text_to_sequence as text_to_sequence_phkit, sequence_to_text, text2pinyin
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/__init__.py", line 94, in <module>
from phkit.chinese import doc as doc_chinese
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/chinese/__init__.py", line 37, in <module>
from .pinyin import text2pinyin, split_pinyin, change_diao
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/chinese/pinyin.py", line 11, in <module>
from ..pinyinkit import text2pinyin, split_pinyin, change_diao
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/pinyinkit/__init__.py", line 6, in <module>
from .core import lazy_pinyin, pinyin, slug, Style, initialize
File "/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/phkit/pinyinkit/core.py", line 20, in <module>
from pypinyin.utils import _replace_tone2_style_dict_to_default
ImportError: cannot import name '_replace_tone2_style_dict_to_default' from 'pypinyin.utils' (/root/miniconda/envs/ly_tts_try/lib/python3.7/site-packages/pypinyin/utils.py)

How can I fix this error: ImportError: cannot import name '_replace_tone2_style_dict_to_default' from 'pypinyin.utils'?
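
The traceback shows phkit importing a private (underscore-prefixed) helper from `pypinyin.utils` that the installed pypinyin does not define. Private names can change between releases, so a version mismatch between phkit and pypinyin is a plausible, though unconfirmed, cause. A small sketch for checking whether an installed module still exposes a given name (`module_has_name` is a hypothetical helper):

```python
import importlib


def module_has_name(module_name: str, attr: str) -> bool:
    """Return True if the module can be imported and defines `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    name = "_replace_tone2_style_dict_to_default"
    print(name, "available:", module_has_name("pypinyin.utils", name))
```

If this prints `False`, trying an older pypinyin release in a separate environment is one way to test the version-mismatch assumption.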

Installation errors on windows 10 and python 3.9 environment

I tried to install and test the repo, but the installation fails with errors because the packages listed in the setup script cannot be built. I am testing on a Windows machine with a Python 3.9 environment, and I suspect the repo only supports lower Python versions. Could you list the environment requirements in the README?

pad_center() takes 1 positional argument but 2 were given

I have just started testing with the example from the README:
from ttskit import sdk_api
wav = sdk_api.tts_sdk('文本', audio='24')
but I get:
File ~\anaconda3\lib\site-packages\ttskit\mellotron\stft.py:67 in __init__ fft_window = pad_center(fft_window, filter_length)
TypeError: pad_center() takes 1 positional argument but 2 were given

What is the problem?
My environment is Anaconda3 on Windows 11, using Spyder + IPython.
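
This `TypeError` is what Python raises when a keyword-only parameter is passed positionally. A FutureWarning quoted in another issue on this page says that from version 0.10 the `size` argument must be passed as a keyword, which matches `librosa.util.pad_center` in newer librosa releases. The sketch below reproduces the behaviour with a stand-in function so it runs without librosa; the stand-in only mimics the keyword-only signature and is not librosa's implementation.

```python
def pad_center(data, *, size):
    """Stand-in with a keyword-only `size` (NOT librosa's implementation)."""
    total = size - len(data)
    if total < 0:
        raise ValueError("size must be at least len(data)")
    left = total // 2
    # Zero-pad on both sides so the data sits in the middle.
    return [0] * left + list(data) + [0] * (total - left)


window = [1, 1, 1, 1]

try:
    pad_center(window, 8)  # positional call, as in stft.py line 67
except TypeError as exc:
    print("positional call fails:", exc)

padded = pad_center(window, size=8)  # keyword call succeeds
print(padded)
```

Under that assumption, two workarounds are possible: change the call in `stft.py` to `pad_center(fft_window, size=filter_length)`, or install a librosa release older than 0.10.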

Question: error 'NoneType' object has no attribute 'inference'

I installed the ttskit package directly with pip. Environment: Ubuntu 18.04 + Python 3.8.0.
Running tkcli produces the following error:
Input text (输入文本或exit退出,不输入则随机):
你好
Input kwargs (输入控制参数,格式:audio=1,speaker=biaobei,不输入则默认)

dictionary update sequence element #0 has length 1; 2 is required
Text: 你好
Kwargs: {'audio': '6', 'speaker': 'tmp'}
TTS running ...
INFO:sdk_api:Synthesizing: 你好
load phrase: 36778it [00:00, 239649.94it/s]
load pinyin: 41459it [00:00, 592173.16it/s]
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
DEBUG:jieba:Loading model from cache /tmp/jieba.cache
Loading model cost 0.393 seconds.
DEBUG:jieba:Loading model cost 0.393 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
Traceback (most recent call last):
File "/home/season/.local/bin/tkcli", line 8, in <module>
sys.exit(tts_cli())
File "/home/season/.local/lib/python3.8/site-packages/ttskit/cli_api.py", line 155, in tts_cli
sdk_api.tts_sdk(text=text,
File "/home/season/.local/lib/python3.8/site-packages/ttskit/sdk_api.py", line 425, in tts_sdk
wav = tts_sdk_base_one(kw)
File "/home/season/.local/lib/python3.8/site-packages/ttskit/sdk_api.py", line 395, in tts_sdk_base_one
return tts_sdk_base(**kwargs)
File "/home/season/.local/lib/python3.8/site-packages/ttskit/sdk_api.py", line 377, in tts_sdk_base
wavs = melgan.generate_wave(mel=mels_postnet)
File "/home/season/.local/lib/python3.8/site-packages/ttskit/melgan/inference.py", line 63, in generate_wave
wav = _model.inference(mel)
AttributeError: 'NoneType' object has no attribute 'inference'
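
The last frame shows `_model` being `None` inside `generate_wave`, which suggests the vocoder model was never loaded before use (for example because its resource file was missing); the traceback alone does not confirm the cause. A minimal sketch of a guard that fails with a clearer message follows. The names mirror the traceback, but the loading logic is hypothetical and not ttskit's actual code.

```python
from typing import Any, Callable, Optional

# Module-level cache; stays None until a model has been loaded.
_model: Optional[Any] = None


def load_model(loader: Callable[[], Any]) -> None:
    """Load the model once using a caller-supplied loader (hypothetical)."""
    global _model
    _model = loader()


def generate_wave(mel: Any) -> Any:
    """Run the vocoder, failing early if no model has been loaded."""
    if _model is None:
        raise RuntimeError(
            "vocoder model is not loaded; check that the model file exists "
            "and that load_model() ran before synthesis"
        )
    return _model.inference(mel)
```

A check like this turns the opaque `AttributeError` into a message that points at the missing load step.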

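Request: an SDK option that returns numpy.array, plus my own workaround

To denoise the audio, shift its pitch up or down, or splice the output of several speakers into one clip, you need the audio as a numpy.array. With the current SDK the only way is to write the return value to a file and read it back in, which is cumbersome. I therefore suggest adding an SDK parameter that returns a numpy.array directly. (Apologies if this already exists and I missed it.)

For now I changed the end of tts_sdk() in sdk_api.py (around line 445) as follows.

Original code:

...
    return wav

Modified code:

...
    wav_array = np.array(wav_out.get_array_of_samples())
    if kwargs.get('array', False):
        return wav_array
    else:
        return wav

Usage example:

from ttskit import sdk_api
wav_array = sdk_api.tts_sdk(text='返回数组', array=True)

With a return value like this, it becomes easy to apply a Fourier transform or other complex processing to the audio. I am not fully familiar with this library's code, so I cannot be sure the change has no unknown side effects. In my small-scale tests the modification was stable and worked. I hope the author can review it and, once it is confirmed safe and effective, merge it into the library. Thanks!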
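
An alternative that needs no change to the library is to decode the returned bytes in memory. The sketch below assumes `tts_sdk` returns a complete 16-bit PCM WAV file as bytes (another issue on this page writes the return value straight to a `.wav` file); `wav_bytes_to_samples` is a hypothetical helper built only on the standard library.

```python
import array
import io
import sys
import wave


def wav_bytes_to_samples(wav_bytes: bytes):
    """Decode 16-bit PCM WAV bytes into (samples, sample_rate)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        if reader.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = reader.getframerate()
        frames = reader.readframes(reader.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, rate
```

The resulting `array.array` supports the buffer protocol, so `numpy.asarray(samples)` yields an `int16` ndarray for further processing. Multi-channel audio would come back interleaved.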

"stack expects each tensor to be equal size" under concurrent requests. Could someone take a look? I suspect multithreading is not supported.

ERROR:ttskit.web_api:Exception on /tts [GET]
Traceback (most recent call last):
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "D:\codedemo\tts\ttskit-main\ttskit\web_api.py", line 50, in tts_web
wav = sdk_api.tts_sdk(text=text, speaker=speaker, audio=audio)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 425, in tts_sdk
wav = tts_sdk_base_one(kw)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 395, in tts_sdk_base_one
return tts_sdk_base(**kwargs)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 364, in tts_sdk_base
mels, mels_postnet, gates, alignments = mellotron.generate_mel(text_data, style_data, speaker_data, f0_data)
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\inference.py", line 77, in generate_mel
mels, mels_postnet, gates, alignments = _model.inference((text, style, speaker, f0))
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 677, in inference
mel_outputs, gate_outputs, alignments = self.decoder.inference(
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 524, in inference
mel_outputs, gate_outputs, alignments = self.parse_decoder_outputs(
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 357, in parse_decoder_outputs
alignments = torch.stack(alignments).transpose(0, 1)
RuntimeError: stack expects each tensor to be equal size, but got [1, 20] at entry 0 and [1, 68] at entry 9
ERROR:ttskit.web_api:Exception on /tts [GET]
Traceback (most recent call last):
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\D\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "D:\codedemo\tts\ttskit-main\ttskit\web_api.py", line 50, in tts_web
wav = sdk_api.tts_sdk(text=text, speaker=speaker, audio=audio)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 425, in tts_sdk
wav = tts_sdk_base_one(kw)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 395, in tts_sdk_base_one
return tts_sdk_base(**kwargs)
File "D:\codedemo\tts\ttskit-main\ttskit\sdk_api.py", line 364, in tts_sdk_base
mels, mels_postnet, gates, alignments = mellotron.generate_mel(text_data, style_data, speaker_data, f0_data)
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\inference.py", line 77, in generate_mel
mels, mels_postnet, gates, alignments = _model.inference((text, style, speaker, f0))
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 677, in inference
mel_outputs, gate_outputs, alignments = self.decoder.inference(
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 524, in inference
mel_outputs, gate_outputs, alignments = self.parse_decoder_outputs(
File "D:\codedemo\tts\ttskit-main\ttskit\mellotron\model.py", line 357, in parse_decoder_outputs
alignments = torch.stack(alignments).transpose(0, 1)
RuntimeError: stack expects each tensor to be equal size, but got [1, 20] at entry 0 and [1, 68] at entry 8
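
If the model keeps per-request state in shared objects, two overlapping requests could mix tensors of different lengths, which would be consistent with the mismatched sizes above; this is an assumption, not something the traceback proves. One way to test it is to serialize synthesis behind a lock, as in this minimal sketch (`synthesize_serialized` is a hypothetical wrapper):

```python
import threading
from typing import Any, Callable

# One process-wide lock so only a single synthesis runs at a time.
_synthesis_lock = threading.Lock()


def synthesize_serialized(synthesize: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call `synthesize` while holding the lock, blocking concurrent callers."""
    with _synthesis_lock:
        return synthesize(*args, **kwargs)
```

In the Flask handler this would wrap the existing call, for example `synthesize_serialized(sdk_api.tts_sdk, text=text, speaker=speaker, audio=audio)`. If the error disappears, shared state is the likely culprit, at the cost of requests being handled one at a time.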

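Support for ARM MacBooks?

Because I am on an ARM Mac, I installed tensorflow-macos (pip install tensorflow-macos) in place of tensorflow; nothing else was changed.
Environment: ARM Mac, macOS 12.6 + Python 3.9.16
After starting Python:

>>> from ttskit import http_server
>>> http_server.start_sever()

The second command fails: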

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/cherish/py3.9/lib/python3.9/site-packages/ttskit/http_server.py", line 95, in start_sever
from . import sdk_api
File "/Users/cherish/py3.9/lib/python3.9/site-packages/ttskit/sdk_api.py", line 55, in <module>
_stft = TacotronSTFT(
File "/Users/cherish/py3.9/lib/python3.9/site-packages/ttskit/mellotron/layers.py", line 64, in __init__
self.stft_fn = STFT(filter_length, hop_length, win_length)
File "/Users/cherish/py3.9/lib/python3.9/site-packages/ttskit/mellotron/stft.py", line 67, in __init__
fft_window = pad_center(fft_window, filter_length)
TypeError: pad_center() takes 1 positional argument but 2 were given

List of speakers' voice categories and quality

This classification is based on my own listening, with no guarantee of accuracy.
Some voices seem buggy, and some are unclear or repeat parts of the text.

  1. 'Aibao',child female
  2. 'Aicheng',teen male
  3. 'Aida',adult male
  4. 'Aijia',adult female
  5. 'Aijing',adult female
  6. 'Aimei',teen female
  7. 'Aina',teen female
  8. 'Aiqi',teen female
  9. 'Aitong',child female
  10. 'Aiwei',child female
  11. 'Aixia',teen female
  12. 'Aiya',adult female
  13. 'Aiyu',adult female
  14. 'Aiyue',adult female
  15. 'Siyue',adult female
  16. 'Xiaobei',child female
  17. 'Xiaogang',adult female
  18. 'Xiaomei',child female
  19. 'Xiaomeng',teen female
  20. 'Xiaowei',adult female
  21. 'Xiaoxue',adult female
  22. 'Xiaoyun',adult male
  23. 'Yina',teen female
  24. 'biaobei',teen female
  25. 'cctvfa',adult female
  26. 'cctvfb',adult male
  27. 'cctvma',buggy
  28. 'cctvmb',buggy
  29. 'cctvmc',adult female
  30. 'cctvmd',buggy

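Two questions, thanks in advance

Thank you very much for providing such a good tool. I have two questions:
1. Is changing the speaking rate supported?
2. Generating the audio file feels very slow to me. Is that normal?

Also, for long text, might it be better to split sentences at punctuation marks?

Thanks again!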
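
The punctuation-based splitting suggested above can be sketched as follows. This is an illustration of the idea, not how ttskit currently segments text; the set of sentence-ending marks is an assumption, and the ASCII period is left out so that decimals such as 3.14 stay intact.

```python
import re

# A run of non-terminator characters followed by any trailing terminators.
_SENTENCE = re.compile(r"[^。!?;!?;]+[。!?;!?;]*")


def split_sentences(text: str) -> list:
    """Split text into sentences at Chinese and ASCII end punctuation."""
    parts = (match.strip() for match in _SENTENCE.findall(text))
    return [part for part in parts if part]


if __name__ == "__main__":
    for sentence in split_sentences("你好。今天天气不错!走吧"):
        print(sentence)
```

Each sentence could then be synthesized separately and the audio joined afterwards, which also keeps every input short.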

import ttskit and failed in importing name _speaker_dict

Windows 10

Python 3.6.5

ttskit installed completely; I also installed the following packages manually:

  • tensorflow
  • pyaudio
  • sounddevice
  • pyworld

When I execute "import ttskit" in python cli:

>>> import ttskit
2021-10-28 12:24:34.621870: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-10-28 12:24:34.627138: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "F:\Python\Python36\lib\site-packages\ttskit\__init__.py", line 50, in <module>
    import sdk_api
  File "F:\Python\Python36\lib\site-packages\ttskit\sdk_api.py", line 41, in <module>
    from ttskit.resource import _speaker_dict
ImportError: cannot import name '_speaker_dict'
>>>

Hi, there is an error here: ImportError: cannot import name '_speaker_dict'

Is the resource here the one inside the ttskit package? I do not see this function there.
(pytorch1.6) C:\Users\Administrator>python E:\TTS\ttskit\myTest.py
Traceback (most recent call last):
File "E:\TTS\ttskit\myTest.py", line 3, in <module>
from ttskit import sdk_api
File "E:\TTS\ttskit\ttskit\sdk_api.py", line 49, in <module>
from .resource import _speaker_dict
ImportError: cannot import name '_speaker_dict'

[CONTRIBUTION] Speech Dataset Generator

Hi everyone!

My name is David Martin Rius and I have just published this project on GitHub: https://github.com/davidmartinrius/speech-dataset-generator/

Now you can create datasets automatically with any audio or lists of audios.

I hope you find it useful.

Here are the key functionalities of the project:

  1. Dataset Generation: Creation of multilingual datasets with Mean Opinion Score (MOS).

  2. Silence Removal: It includes a feature to remove silences from audio files, enhancing the overall quality.

  3. Sound Quality Improvement: It improves the quality of the audio when needed.

  4. Audio Segmentation: It can segment audio files within specified second ranges.

  5. Transcription: The project transcribes the segmented audio, providing a textual representation.

  6. Gender Identification: It identifies the gender of each speaker in the audio.

  7. Pyannote Embeddings: Utilizes pyannote embeddings for speaker detection across multiple audio files.

  8. Automatic Speaker Naming: Automatically assigns names to speakers detected in multiple audios.

  9. Multiple Speaker Detection: Capable of detecting multiple speakers within each audio file.

  10. Store speaker embeddings: The speakers are detected and stored in a Chroma database, so you do not need to assign a speaker name.

  11. Syllabic and words-per-minute metrics

Feel free to explore the project at https://github.com/davidmartinrius/speech-dataset-generator

David Martin Rius

ydub/audio_segment.py", line 374, in __radd__ raise TypeError("Gains must be the second addend after the " TypeError: Gains must be the second addend after the AudioSegment 2022-10-27T02:42:03Z {'REMOTE_ADDR': '127.0.0.1', 'REMOTE_PORT': '50648', 'HTTP_HOST': 'localhost:9000', (hidden keys: 26)} failed with TypeError

ydub/audio_segment.py", line 374, in __radd__
raise TypeError("Gains must be the second addend after the "
TypeError: Gains must be the second addend after the AudioSegment
2022-10-27T02:42:03Z {'REMOTE_ADDR': '127.0.0.1', 'REMOTE_PORT': '50648', 'HTTP_HOST': 'localhost:9000', (hidden keys: 26)} failed with TypeError

This problem appears during training. How can I solve it?

D:\ai\ttskit\ttskit\mellotron\stft.py:67: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = pad_center(fft_window, filter_length)
D:\ai\ttskit\ttskit\mellotron\layers.py:66: FutureWarning: Pass sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0 as keyword args. From version 0.10 passing these as positional arguments will result in an error
sampling_rate, filter_length, n_mel_channels, mel_fmin, mel_fmax)
D:\ai\ttskit\ttskit\mellotron\stft.py:67: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = pad_center(fft_window, filter_length)
D:\ai\ttskit\ttskit\mellotron\layers.py:66: FutureWarning: Pass sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0 as keyword args. From version 0.10 passing these as positional arguments will result in an error
sampling_rate, filter_length, n_mel_channels, mel_fmin, mel_fmax)

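Synthesizing long text always produces only the last sentence.

When I synthesize audio from long text, the output always contains only the last sentence.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from ttskit import sdk_api
var = '工业和信息化部总工程师田玉龙在国新办新闻发布会上介绍'
wav = sdk_api.tts_sdk_for(var, speaker='cctvfa', output=r'E:\TTS\ttskits\my9.wav')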
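
If each sentence is synthesized correctly but later results overwrite earlier ones, joining the per-sentence audio yourself would work around it; the cause is not confirmed by the report. The sketch below assumes every synthesis call returns a complete WAV file as bytes with identical format; `concat_wav_bytes` is a hypothetical helper using only the standard library.

```python
import io
import wave
from typing import Iterable


def concat_wav_bytes(chunks: Iterable[bytes]) -> bytes:
    """Concatenate WAV byte strings that share channels, width and rate."""
    params = None
    frames = []
    for chunk in chunks:
        with wave.open(io.BytesIO(chunk), "rb") as reader:
            fmt = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError("all chunks must share channels, sample width and rate")
            frames.append(reader.readframes(reader.getnframes()))
    if params is None:
        raise ValueError("no audio chunks given")
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(params[0])
        writer.setsampwidth(params[1])
        writer.setframerate(params[2])
        writer.writeframes(b"".join(frames))
    return out.getvalue()
```

With this, one `sdk_api.tts_sdk(...)` call per sentence could be collected into a list and merged before writing a single output file.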

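Quick start for the web version (tested and working)

  1. Download the code from GitHub and unzip it; use the ttskit-main folder as your project folder.
  2. Download resource from Baidu Netdisk (download link) and put it into ttskit-main\ttskit, overwriting the existing resource folder.
  3. The steps above roughly follow the author's instructions, but there is one small catch.
  4. When replacing the folder, do not overwrite ttskit-main\ttskit\resource\__init__.py.
  5. Alternatively, after overwriting everything, unzip the GitHub download once more and restore resource\__init__.py from that copy.
  6. Then open a command line in the ttskit-main directory and run pip install -U ttskit
  7. When pip finishes, create a demo.py file in the ttskit-main folder with the following code:
    from ttskit import http_server
    
    http_server.start_sever()
  8. Then run py demo.py on the command line.
  9. After a while a URL appears at the bottom of the command-line output; copy it into your browser.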
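
Once the server is running, a request URL can also be built from a script. Tracebacks elsewhere on this page show a GET route `/tts` that takes `text`, `speaker` and `audio`, and one log shows the host `localhost:9000`; the sketch below assumes those values, so substitute the address your own server prints. `build_tts_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_tts_url(text, speaker="biaobei", audio="24", host="localhost", port=9000):
    """Build a GET URL for the /tts route (host and port are assumptions)."""
    # urlencode percent-encodes non-ASCII text such as Chinese characters.
    query = urlencode({"text": text, "speaker": speaker, "audio": audio})
    return f"http://{host}:{port}/tts?{query}"


if __name__ == "__main__":
    print(build_tts_url("你好"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the synthesized audio if the route behaves as assumed.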

ImportError

Exception occurred: ImportError
cannot import name '_replace_tone2_style_dict_to_default' from 'pypinyin.utils' (D:\Program Files\Python\lib\site-packages\pypinyin\utils.py)
File "E:\Work\Github\ttskit\TTS_kit\ttskit\mellotron\text\__init__.py", line 7, in <module>
from phkit.chinese import text_to_sequence as text_to_sequence_phkit, sequence_to_text, text2pinyin
File "E:\Work\Github\ttskit\TTS_kit\ttskit\mellotron\data_utils.py", line 24, in <module>
from mellotron.text import text_to_sequence, cmudict
File "E:\Work\Github\ttskit\TTS_kit\ttskit\mellotron\inference.py", line 26, in <module>
from .data_utils import transform_mel, transform_text, transform_f0, transform_embed, transform_speaker
File "E:\Work\Github\ttskit\TTS_kit\ttskit\sdk_api.py", line 37, in <module>
from ttskit.mellotron import inference as mellotron
File "E:\Work\Github\ttskit\TTS_kit\ttskit\__init__.py", line 50, in <module>
import sdk_api
File "E:\Work\Github\ttskit\TTS_kit\test.py", line 31, in test_http_server
from ttskit import http_server
File "E:\Work\Github\ttskit\TTS_kit\test.py", line 42, in <module>
test_http_server()

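Question about text length and synthesis quality

Once the text gets longer, there is a high chance of getting very poor audio whose content cannot be made out.
For example, I synthesized clips of 2, 4, 6, and so on up to 20 characters in a row.
The 16-, 18- and 20-character clips are essentially unlistenable, and their duration is always the same fixed value, 11.629931972789116 s.
Is there a way to solve this? Where does the problem come from? Could it be related to synthesizing so frequently?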
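
A fixed duration for every failed clip is what one would expect if the decoder never emits a stop signal and runs until a step limit. The arithmetic below is a sketch under loudly stated assumptions: a limit of 1000 decoder steps and a hop length of 256 samples are common Tacotron-style defaults, not values confirmed for ttskit; only the 22050 Hz sampling rate appears elsewhere on this page.

```python
def max_duration_seconds(max_decoder_steps=1000, hop_length=256, sampling_rate=22050):
    """Audio length produced when decoding stops only at the step limit."""
    # Each decoder step yields one mel frame, i.e. hop_length audio samples.
    return max_decoder_steps * hop_length / sampling_rate


if __name__ == "__main__":
    print(round(max_duration_seconds(), 3))  # 11.61
```

The result, about 11.61 s, is close to (though not identical with) the reported 11.63 s, so hitting a decoder step limit is a plausible explanation for the garbled long clips; splitting the input into shorter pieces is one thing to try.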

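A few questions

It took me two days, but I finally got it running. Very nice, thumbs up.
I would also like to ask:
1. How do I add a custom speaker, and how do I train one?
2. How do I add support for English?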

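Fails to run in the WSL2 subsystem

In a WSL2 Ubuntu 22.04 subsystem, running the command-line tool raises an error. What is the problem and how can it be solved? In principle, without an Nvidia GPU, it should fall back to running on the CPU by default.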

$ tkcli -h
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 9010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "/usr/local/bin/tkcli", line 5, in <module>
    from ttskit.cli_api import tts_cli
  File "/usr/local/lib/python3.10/dist-packages/ttskit/cli_api.py", line 68, in <module>
    from . import sdk_api
  File "/usr/local/lib/python3.10/dist-packages/ttskit/sdk_api.py", line 55, in <module>
    _stft = TacotronSTFT(
  File "/usr/local/lib/python3.10/dist-packages/ttskit/mellotron/layers.py", line 64, in __init__
    self.stft_fn = STFT(filter_length, hop_length, win_length)
  File "/usr/local/lib/python3.10/dist-packages/ttskit/mellotron/stft.py", line 67, in __init__
    fft_window = pad_center(fft_window, filter_length)
TypeError: pad_center() takes 1 positional argument but 2 were given
