
kan-tts's Introduction




Introduction

ModelScope is built upon the notion of "Model-as-a-Service" (MaaS). It seeks to bring together the most advanced machine learning models from the AI community, and to streamline the process of leveraging AI models in real-world applications. The core ModelScope library open-sourced in this repository provides the interfaces and implementations that allow developers to perform model inference, training, and evaluation.

In particular, with rich layers of API abstraction, the ModelScope library offers a unified experience for exploring state-of-the-art models across domains such as CV, NLP, Speech, Multi-Modality, and Scientific Computation. Model contributors in different areas can integrate models into the ModelScope ecosystem through the layered APIs, allowing easy and unified access to their models. Once integrated, model inference, fine-tuning, and evaluation can be done with only a few lines of code. Meanwhile, flexibility is also provided so that different components in the model applications can be customized wherever necessary.

Apart from harboring implementations of a wide range of models, the ModelScope library also enables the necessary interactions with ModelScope backend services, particularly the Model-Hub and Dataset-Hub. Such interactions allow the management of various entities (models and datasets) to be performed seamlessly under the hood, including entity lookup, version control, cache management, and more.

Models and Online Accessibility

Hundreds of models are publicly available on ModelScope (700+ and counting), covering the latest developments in areas such as NLP, CV, Audio, Multi-Modality, and AI for Science. Many of these models represent the SOTA in their specific fields and made their open-source debut on ModelScope. Users can visit ModelScope (modelscope.cn) and experience first-hand how these models perform, with just a few clicks. Immediate developer experience is also possible through the ModelScope Notebook, which is backed by a ready-to-use CPU/GPU development environment in the cloud - only one click away on ModelScope.



Some representative examples include:

LLM:

Multi-Modal:

CV:

Audio:

AI for Science:

Note: Most models on ModelScope are public and can be downloaded without account registration on the ModelScope website (www.modelscope.cn). Please refer to the instructions for model download, for downloading models either with the API provided by the modelscope library or with git.
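
For example, a model can be fetched programmatically with the hub API; a minimal sketch, using the word-segmentation model from the QuickTour below:

>>> from modelscope.hub.snapshot_download import snapshot_download
>>> model_dir = snapshot_download('damo/nlp_structbert_word-segmentation_chinese-base')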

QuickTour

We provide a unified interface for inference using pipeline, and for fine-tuning and evaluation using Trainer, across different tasks.

For any given task with any type of input (image, text, audio, video...), an inference pipeline can be implemented with only a few lines of code, which will automatically load the underlying model to produce the inference result, as exemplified below:

>>> from modelscope.pipelines import pipeline
>>> word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')
>>> word_segmentation('今天天气不错,适合出去游玩')
{'output': '今天 天气 不错 , 适合 出去 游玩'}

Given an image, portrait matting (a.k.a. background removal) can be accomplished with the following code snippet:

[input image]

>>> import cv2
>>> from modelscope.pipelines import pipeline

>>> portrait_matting = pipeline('portrait-matting')
>>> result = portrait_matting('https://modelscope.oss-cn-beijing.aliyuncs.com/test/images/image_matting.png')
>>> cv2.imwrite('result.png', result['output_img'])

The output image with the background removed is: [output image]

Fine-tuning and evaluation can also be done with a few more lines of code to set up the training dataset and trainer, with the heavy lifting of training and evaluating a model encapsulated in the implementation of the trainer.train() and trainer.evaluate() interfaces.

For example, the GPT-3 base model (1.3B) can be fine-tuned with the Chinese poetry dataset, resulting in a model that can be used for Chinese poetry generation.

>>> from modelscope.metainfo import Trainers
>>> from modelscope.msdatasets import MsDataset
>>> from modelscope.trainers import build_trainer

>>> train_dataset = MsDataset.load('chinese-poetry-collection', split='train').remap_columns({'text1': 'src_txt'})
>>> eval_dataset = MsDataset.load('chinese-poetry-collection', split='test').remap_columns({'text1': 'src_txt'})
>>> max_epochs = 10
>>> tmp_dir = './gpt3_poetry'

>>> kwargs = dict(
     model='damo/nlp_gpt3_text-generation_1.3B',
     train_dataset=train_dataset,
     eval_dataset=eval_dataset,
     max_epochs=max_epochs,
     work_dir=tmp_dir)

>>> trainer = build_trainer(name=Trainers.gpt3_trainer, default_args=kwargs)
>>> trainer.train()
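
Evaluation follows the same pattern; a minimal sketch, assuming the trainer built above and that evaluate() defaults to the most recent checkpoint under work_dir:

>>> metrics = trainer.evaluate()
>>> print(metrics)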

Why should I use the ModelScope library

  1. A unified and concise user interface is abstracted for different tasks and different models. Model inference and training can be implemented in as few as 3 and 10 lines of code, respectively, making it convenient for users to explore models across different fields in the ModelScope community. All models integrated into ModelScope are ready to use, which makes it easy to get started with AI, in both educational and industrial settings.

  2. ModelScope offers a model-centric development and application experience. It streamlines support for model training, inference, export, and deployment, and makes it easy for users to build their own MLOps on the ModelScope ecosystem.

  3. For the model inference and training processes, a modular design is in place, and a wealth of functional module implementations are provided, making it convenient for users to customize their own model inference, training, and other processes.

  4. For distributed model training, especially of large models, rich training strategy support is provided, including data parallelism, model parallelism, hybrid parallelism, and so on; see the launch sketch below.
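
As a generic single-node launch sketch (standard PyTorch torchrun flags, not a ModelScope-specific CLI; finetune.py is a hypothetical script that builds and runs a trainer as in the QuickTour above):

# hypothetical: finetune.py builds a ModelScope trainer and calls trainer.train()
torchrun --nproc_per_node=8 finetune.py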

Installation

Docker

The ModelScope Library currently supports popular deep learning frameworks for model training and inference, including PyTorch, TensorFlow, and ONNX. All releases are tested and run on Python 3.7+, PyTorch 1.8+, and TensorFlow 1.15 or TensorFlow 2.0+.

To allow out-of-the-box usage of all models on ModelScope, official docker images are provided for all releases. Based on these images, developers can skip all environment installation and configuration and start directly. Currently, the latest versions of the CPU image and GPU image can be obtained from:

CPU docker image

# py37
registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-py37-torch1.11.0-tf1.15.5-1.6.1

# py38
registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-py38-torch2.0.1-tf2.13.0-1.9.5

GPU docker image

# py37
registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.3.0-py37-torch1.11.0-tf1.15.5-1.6.1

# py38
registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.8.0-py38-torch2.0.1-tf2.13.0-1.9.5

Setup Local Python Environment

One can also set up a local ModelScope environment using pip and conda. ModelScope supports Python 3.7 and above. We suggest Anaconda for creating the local Python environment:

conda create -n modelscope python=3.8
conda activate modelscope

PyTorch or TensorFlow can be installed separately according to each model's requirements.

  • Install pytorch doc
  • Install tensorflow doc

After installing the necessary machine-learning framework, you can install the modelscope library as follows:

If you only want to play around with the modelscope framework, or try out model/dataset download, you can install the core modelscope components:

pip install modelscope

If you want to use multi-modal models:

pip install modelscope[multi-modal]

If you want to use nlp models:

pip install modelscope[nlp] -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

If you want to use cv models:

pip install modelscope[cv] -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

If you want to use audio models:

pip install modelscope[audio] -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

If you want to use science models:

pip install modelscope[science] -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

Notes:

  1. Currently, some audio-task models only support Python 3.7 and TensorFlow 1.15.4 on Linux. Most other models can be installed and used on Windows and Mac (x86).

  2. Some models in the audio field use the third-party library SoundFile for wav file processing. On Linux, users need to manually install libsndfile, which SoundFile depends on (doc link). On Windows and macOS, it is installed automatically. For example, on Ubuntu, you can use the following commands:

    sudo apt-get update
    sudo apt-get install libsndfile1
  3. Some models in computer vision need mmcv-full. You can refer to the mmcv installation guide; a minimal installation is as follows:

    pip uninstall mmcv # if you have installed mmcv, uninstall it
    pip install -U openmim
    mim install mmcv-full

Learn More

We provide additional documentation, including:

License

This project is licensed under the Apache License (Version 2.0).

Citation

@Misc{modelscope,
  title = {ModelScope: bring the notion of Model-as-a-Service to life.},
  author = {The ModelScope Team},
  howpublished = {\url{https://github.com/modelscope/modelscope}},
  year = {2023}
}

kan-tts's People

Contributors

alibaba-oss, ginchow, hukai-sun, tonyhehahaha


kan-tts's Issues

SyntaxError: invalid syntax (3690364265.py, line 2) in notebook

Feature extraction

python kantts/preprocess/data_process.py --voice_input_dir ptts_spk0_autolabel --voice_output_dir training_stage/test_male_ptts_feats --audio_config kantts/configs/audio_config_se_16k.yaml --speaker F7 --se_model speech_personal_sambert-hifigan_nsf_tts_zh-cn_pretrain_16k/basemodel_16k/speaker_embedding/se.*
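
A likely cause, assuming the command above was pasted directly into a notebook code cell: shell commands in Jupyter must be prefixed with !, e.g.:

!python kantts/preprocess/data_process.py --voice_input_dir ptts_spk0_autolabel --voice_output_dir training_stage/test_male_ptts_feats --audio_config kantts/configs/audio_config_se_16k.yaml --speaker F7 --se_model speech_personal_sambert-hifigan_nsf_tts_zh-cn_pretrain_16k/basemodel_16k/speaker_embedding/se.*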

Expanding epochs

stage0=training_stage
voice=test_male_ptts_feats

# merge the validation list into the training list
cat $stage0/$voice/am_valid.lst >> $stage0/$voice/am_train.lst
# append shuffled copies of the training list until it reaches at least 400 lines
lines=0
while [ $lines -lt 400 ]
do
    shuf $stage0/$voice/am_train.lst >> $stage0/$voice/am_train.lst.tmp
    lines=$(wc -l < "$stage0/$voice/am_train.lst.tmp")
done
mv $stage0/$voice/am_train.lst.tmp $stage0/$voice/am_train.lst

About ttsfrd

Hi, could you distribute an aarch64 build of ttsfrd? In the TTS model deployment chain this seems to be the only missing piece for aarch64; with it, the whole pipeline could run on all kinds of edge-computing devices.

A question about multi-speaker training

Can the speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k model be trained on multiple speakers at the same time?

auto_label error: Authentication token does not exist, failed to access model

P.S. Solution: change v1.0.4 to v1.0.7. The version referenced at https://modelscope.cn/models/Jinglin/personal_voice/summary is outdated, which is why it fails to run.

Code to reproduce:

from modelscope.tools import run_auto_label

input_wav = "./test_wavs/"
output_data = "./output_training_data/"

ret, report = run_auto_label(input_wav=input_wav, work_dir=output_data, resource_revision="v1.0.4")

Error:

2023-09-15 10:20:33,746 - modelscope - INFO - PyTorch version 2.0.1 Found.
2023-09-15 10:20:33,746 - modelscope - INFO - Loading ast index from /home/ubuntu/.cache/modelscope/ast_indexer
2023-09-15 10:20:35,583 - modelscope - INFO - Loading done! Current index file version is 1.9.0, with md5 66c797046ce7835fbc6d499ee4dcf5e4 and a total number of 921 components indexed
2023-09-15 10:20:40,467 - modelscope - INFO - Use user-specified model revision: v1.0.4
INFO:root:2023-09-15 10:20:54
INFO:root:TTS-AutoLabel version: 1.1.8
INFO:root:TTS-AutoLabel resource path: /home/ubuntu/.cache/modelscope/hub/damo/speech_ptts_autolabel_16k/model
INFO:root:Target sampling rate: 16000
INFO:root:Input wav dir: /cfs/user/ubuntu/work/tts/ali-tts/test_female
INFO:root:Output data dir: /cfs/user/ubuntu/work/tts/ali-tts/output_training_data
INFO:root:wav_preprocess start...
INFO:root:---  There is this folder!  ---
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 21.49it/s]
INFO:root:[VAD] chunk recordings for training.
INFO:root:wav cut by vad start...
2023-09-15 10:20:55,457 - modelscope - ERROR - Authentication token does not exist, failed to access model /home/ubuntu/.cache/modelscope/hub/damo/speech_ptts_autolabel_16k/model/speech_vad_assert/fsmn_vad_16k which may not exist or may be                 private. Please login first.
Traceback (most recent call last):
  File "/cfs/user/ubuntu/anaconda3/envs/modelscope/lib/python3.8/site-packages/modelscope/hub/errors.py", line 81, in handle_http_response
    response.raise_for_status()
  File "/cfs/user/ubuntu/anaconda3/envs/modelscope/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://www.modelscope.cn/api/v1/models//home/ubuntu/.cache/modelscope/hub/damo/speech_ptts_autolabel_16k/model/speech_vad_assert/fsmn_vad_16k/revisions?EndTime=1693929600

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/cfs/user/ubuntu/anaconda3/envs/modelscope/lib/python3.8/site-packages/tts_autolabel/audio2phone/funasr_onnx/vad_bin.py", line 42, in __init__
    model_dir = snapshot_download(model_dir, cache_dir=cache_dir)
  File "/cfs/user/ubuntu/anaconda3/envs/modelscope/lib/python3.8/site-packages/modelscope/hub/snapshot_download.py", line 96, in snapshot_download
    revision = _api.get_valid_revision(
  File "/cfs/user/ubuntu/anaconda3/envs/modelscope/lib/python3.8/site-packages/modelscope/hub/api.py", line 464, in get_valid_revision
    revisions = self.list_model_revisions(
  File "/cfs/user/ubuntu/anaconda3/envs/modelscope/lib/python3.8/site-packages/modelscope/hub/api.py", line 433, in list_model_revisions
    handle_http_response(r, logger, cookies, model_id)
  File "/cfs/user/ubuntu/anaconda3/envs/modelscope/lib/python3.8/site-packages/modelscope/hub/errors.py", line 88, in handle_http_response
    raise HTTPError('Response details: %s' % message) from error
requests.exceptions.HTTPError: Response details: 404 page not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "label.py", line 6, in <module>
    ret, report = run_auto_label(input_wav=input_wav, work_dir=output_data, resource_revision="v1.0.4")
  File "/cfs/user/ubuntu/anaconda3/envs/modelscope/lib/python3.8/site-packages/modelscope/tools/speech_tts_autolabel.py", line 78, in run_auto_label
    ret_code, report = auto_labeling.run()
  File "/cfs/user/ubuntu/anaconda3/envs/modelscope/lib/python3.8/site-packages/tts_autolabel/auto_label.py", line 853, in run
    self.wav_cut_by_vad(self.resample_wav_dir, self.cut_wav_dir)
  File "/cfs/user/ubuntu/anaconda3/envs/modelscope/lib/python3.8/site-packages/tts_autolabel/auto_label.py", line 437, in wav_cut_by_vad
    vad_cut(input_wav_dir, output_wav_dir, self.resource_dir)
  File "/cfs/user/ubuntu/anaconda3/envs/modelscope/lib/python3.8/site-packages/tts_autolabel/audiocut/vad.py", line 350, in vad_cut
    vad_pipeline_superhigh = Fsmn_vad(
  File "/cfs/user/ubuntu/anaconda3/envs/modelscope/lib/python3.8/site-packages/tts_autolabel/audio2phone/funasr_onnx/vad_bin.py", line 44, in __init__
    raise "model_dir must be model_name in modelscope or local path downloaded from modelscope, but is {}".format(
TypeError: exceptions must derive from BaseException
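
Applying the fix noted at the top of this issue (changing v1.0.4 to v1.0.7), the call becomes:

ret, report = run_auto_label(input_wav=input_wav, work_dir=output_data, resource_revision="v1.0.7")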

[ONNXRuntimeError] : 7 : INVALID_PROTOBUF ERROR when running KAN-TTS preprocess!

When running the KAN-TTS preprocess script, I hit an ONNXRuntimeError: a Protobuf parsing error occurred while loading the se.onnx model.

The training procedure I followed is this one: https://www.modelscope.cn/models/damo/speech_personal_sambert-hifigan_nsf_tts_zh-cn_pretrain_16k/summary


The command and its output are as follows:

python kantts/preprocess/data_process.py --voice_input_dir /home/speech_personal_sambert_modelscope/KAN-TTS/resource/test_male_autolabel --voice_output_dir training_stage/test_male_ptts_feats --audio_config kantts/configs/audio_config_se_16k.yaml --speaker test_male --se_model speech_personal_sambert-hifigan_nsf_tts_zh-cn_pretrain_16k/basemodel_16k/speaker_embedding/se.onnx

(maas) [root@VM-16-3-centos KAN-TTS]# python kantts/preprocess/data_process.py --voice_input_dir /home/speech_personal_sambert_modelscope/KAN-TTS/resource/test_male_autolabel --voice_output_dir training_stage/test_male_ptts_feats --audio_config kantts/configs/audio_config_se_16k.yaml --speaker test_male --se_model speech_personal_sambert-hifigan_nsf_tts_zh-cn_pretrain_16k/basemodel_16k/speaker_embedding/se.onnx
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 13163.76it/s]
2023-03-30:23:02:36 INFO [TextScriptConvertor.py:469] TextScriptConvertor.process:
Save script to: training_stage/test_male_ptts_feats/Script.xml
2023-03-30:23:02:36 INFO [TextScriptConvertor.py:490] TextScriptConvertor.process:
Save metafile to: training_stage/test_male_ptts_feats/raw_metafile.txt
2023-03-30:23:02:36 INFO [audio_processor.py:90] [AudioProcessor] Initialize AudioProcessor.
2023-03-30:23:02:36 INFO [audio_processor.py:91] [AudioProcessor] config params:
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] wav_normalize: True
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] trim_silence: True
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] trim_silence_threshold_db: 60
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] preemphasize: False
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] sampling_rate: 16000
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] hop_length: 200
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] win_length: 1000
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] n_fft: 2048
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] n_mels: 80
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] fmin: 0.0
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] fmax: 8000.0
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] phone_level_feature: True
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] se_feature: True
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] norm_type: mean_std
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] max_norm: 1.0
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] symmetric: False
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] min_level_db: -100.0
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] ref_level_db: 20
2023-03-30:23:02:36 INFO [audio_processor.py:93] [AudioProcessor] num_workers: 16
2023-03-30:23:02:36 INFO [audio_processor.py:201] [AudioProcessor] Amplitude normalization started
2023-03-30:23:02:36 INFO [utils.py:184] Volume statistic proceeding...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 191.17it/s]
2023-03-30:23:02:36 INFO [utils.py:170] Average amplitude RMS : 0.054727649999999996
2023-03-30:23:02:36 INFO [utils.py:186] Volume statistic done.
2023-03-30:23:02:36 INFO [utils.py:194] Volume normalization proceeding...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 2199.60it/s]
2023-03-30:23:02:36 INFO [utils.py:221] Volume normalization done.
2023-03-30:23:02:36 INFO [audio_processor.py:204] [AudioProcessor] Amplitude normalization finished
2023-03-30:23:02:36 INFO [audio_processor.py:394] [AudioProcessor] Duration generation started
  0%|                                                                                                                                                                | 0/20 [00:00<?, ?it/s]2023-03-30:23:02:36 INFO [audio_processor.py:411] [AudioProcessor] Duration align with mel is proceeding...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 182.71it/s]
2023-03-30:23:02:36 INFO [audio_processor.py:453] [AudioProcessor] Duration generate finished
2023-03-30:23:02:36 INFO [audio_processor.py:278] [AudioProcessor] Trim silence with interval started
2023-03-30:23:02:36 INFO [audio_processor.py:216] [AudioProcessor] Start to load pcm from training_stage/test_male_ptts_feats/wav
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 179.38it/s]
  0%|                                                                                                                                                                | 0/20 [00:00<?, ?it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 4150.72it/s]
2023-03-30:23:02:37 INFO [audio_processor.py:314] [AudioProcessor] Trim silence finished
2023-03-30:23:02:37 INFO [audio_processor.py:322] [AudioProcessor] Melspec extraction started
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 56.95it/s]
2023-03-30:23:02:37 INFO [audio_processor.py:361] [AudioProcessor] Melspec extraction finished
2023-03-30:23:02:37 INFO [audio_processor.py:365] Melspec statistic proceeding...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 39107.73it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 10867.48it/s]
2023-03-30:23:02:37 INFO [audio_processor.py:368] Melspec statistic done
2023-03-30:23:02:37 INFO [audio_processor.py:374] [AudioProcessor] melspec mean and std saved to:
training_stage/test_male_ptts_feats/mel/mel_mean.txt,
training_stage/test_male_ptts_feats/mel/mel_std.txt
2023-03-30:23:02:37 INFO [audio_processor.py:378] [AudioProcessor] Melspec mean std norm is proceeding...
2023-03-30:23:02:37 INFO [audio_processor.py:384] [AudioProcessor] Melspec normalization finished
2023-03-30:23:02:37 INFO [audio_processor.py:385] [AudioProcessor] Normed Melspec saved to training_stage/test_male_ptts_feats/mel
2023-03-30:23:02:37 INFO [audio_processor.py:467] [AudioProcessor] Pitch extraction started
  0%|                                                                                                                                                                | 0/20 [00:00<?, ?it/s]2023-03-30:23:02:37 INFO [audio_processor.py:483] [AudioProcessor] Pitch align with mel is proceeding...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 73.88it/s]
2023-03-30:23:02:37 INFO [audio_processor.py:510] [AudioProcessor] Pitch normalization is proceeding...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 72565.81it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 33209.06it/s]
2023-03-30:23:02:37 INFO [audio_processor.py:518] [AudioProcessor] f0 mean and std saved to:
training_stage/test_male_ptts_feats/f0/f0_mean.txt,
training_stage/test_male_ptts_feats/f0/f0_std.txt
2023-03-30:23:02:37 INFO [audio_processor.py:521] [AudioProcessor] Pitch mean std norm is proceeding...
2023-03-30:23:02:37 INFO [audio_processor.py:548] [AudioProcessor] Pitch turn to phone-level is proceeding...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 182.40it/s]
2023-03-30:23:02:37 INFO [audio_processor.py:580] [AudioProcessor] Pitch normalization finished
2023-03-30:23:02:37 INFO [audio_processor.py:581] [AudioProcessor] Normed f0 saved to training_stage/test_male_ptts_feats/f0
2023-03-30:23:02:37 INFO [audio_processor.py:582] [AudioProcessor] Pitch extraction finished
2023-03-30:23:02:37 INFO [audio_processor.py:593] [AudioProcessor] Energy extraction started
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 116.35it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 59033.13it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 31968.78it/s]
2023-03-30:23:02:38 INFO [audio_processor.py:638] [AudioProcessor] energy mean and std saved to:
training_stage/test_male_ptts_feats/energy/energy_mean.txt,
training_stage/test_male_ptts_feats/energy/energy_std.txt
2023-03-30:23:02:38 INFO [audio_processor.py:642] [AudioProcessor] Energy mean std norm is proceeding...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 186.50it/s]
2023-03-30:23:02:38 INFO [audio_processor.py:690] [AudioProcessor] Energy normalization finished
2023-03-30:23:02:38 INFO [audio_processor.py:691] [AudioProcessor] Normed Energy saved to training_stage/test_male_ptts_feats/energy
2023-03-30:23:02:38 INFO [audio_processor.py:692] [AudioProcessor] Energy extraction finished
2023-03-30:23:02:38 INFO [audio_processor.py:774] [AudioProcessor] All features extracted successfully!
2023-03-30:23:02:38 INFO [data_process.py:192] Processing audio done.
2023-03-30:23:02:38 INFO [se_processor.py:63] [SpeakerEmbeddingProcessor] Speaker embedding extractor started
2023-03-30:23:02:38 ERROR [data_process.py:237] [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from speech_personal_sambert-hifigan_nsf_tts_zh-cn_pretrain_16k/basemodel_16k/speaker_embedding/se.onnx failed:Protobuf parsing failed.
Traceback (most recent call last):
  File "kantts/preprocess/data_process.py", line 234, in <module>
    args.se_model,
  File "kantts/preprocess/data_process.py", line 199, in process_data
    se_model,
  File "/home/speech_personal_sambert_modelscope/KAN-TTS/kantts/preprocess/se_processor/se_processor.py", line 67, in process
    sess = onnxruntime.InferenceSession(se_onnx, sess_options=opts)
  File "/root/anaconda3/envs/maas/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 360, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/root/anaconda3/envs/maas/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 397, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from speech_personal_sambert-hifigan_nsf_tts_zh-cn_pretrain_16k/basemodel_16k/speaker_embedding/se.onnx failed:Protobuf parsing failed.
(maas) [root@VM-16-3-centos KAN-TTS]# 
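
A hedged diagnostic, assuming the model files were fetched with git and the repository stores se.onnx via git-lfs: INVALID_PROTOBUF often means the file on disk is a small git-lfs pointer rather than the actual model, which can be checked and fixed like this:

# if this prints "version https://git-lfs.github.com/spec/v1", the binary was never pulled
head -c 100 speech_personal_sambert-hifigan_nsf_tts_zh-cn_pretrain_16k/basemodel_16k/speaker_embedding/se.onnx
# run inside the model repository to fetch the real binaries
git lfs pull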


"Load pinyin_en_mix_dict failed" is printed repeatedly while running

While running, "Load pinyin_en_mix_dict failed" is printed over and over. Audio is still produced normally, but does this log line indicate a problem? Am I missing something in my configuration?

I am using the SambertHifigan TTS Chinese multi-speaker pretrained 16k model directly.
The main script is as follows:
#!/bin/bash

# SambertHifigan TTS - Chinese - multi-speaker pretrained - 16k

git clone -b pretrain http://www.modelscope.cn/speech_tts/speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k.git

# speaker_list: 'F7,F74,FBYN,FRXL,M7,xiaoyu' - all except M7 are female

res_zip=../funtts/speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/resource.zip
am_ckpt=../funtts/speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/basemodel_16k/sambert/ckpt/checkpoint_980000.pth
voc_ckpt=../funtts/speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/basemodel_16k/hifigan/ckpt/checkpoint_2000000.pth
spk=xiaoyu

outdir=out_$spk
[ -d $outdir ] && rm -rf $outdir; mkdir -p $outdir

python ./kantts/bin/text_to_wav.py \
    --txt ./test_data/txt \
    --output_dir $outdir \
    --res_zip $res_zip \
    --am_ckpt $am_ckpt \
    --voc_ckpt $voc_ckpt \
    --speaker $spk

The log printed during the run:
Converting text to symbols...
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
text.cc: festival_Text_init
AM is infering...
Loading checkpoint: ../funtts/speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/basemodel_16k/sambert/ckpt/checkpoint_980000.pth
Inference sentence: 0_0
x_band_width:7, h_band_width: 7
Inference sentence: 1_0
x_band_width:6, h_band_width: 6
Inference sentence: 2_0
x_band_width:8, h_band_width: 8
Inference sentence: 3_0
x_band_width:7, h_band_width: 7
Vocoder is infering...
Loss = {'discriminator_adv_loss': {'enable': True, 'params': {'average_by_discriminators': False}, 'weights': 1.0}, 'feat_match_loss': {'enable': True, 'params': {'average_by_discriminators': False, 'average_by_layers': False}, 'weights': 2.0}, 'generator_adv_loss': {'enable': True, 'params': {'average_by_discriminators': False}, 'weights': 1.0}, 'mel_loss': {'enable': True, 'params': {'fft_size': 2048, 'fmax': 8000, 'fmin': 0, 'fs': 16000, 'hop_size': 200, 'log_base': None, 'num_mels': 80, 'win_length': 1000, 'window': 'hann'}, 'weights': 45.0}, 'stft_loss': {'enable': False}, 'subband_stft_loss': {'enable': False, 'params': {'fft_sizes': [384, 683, 171], 'hop_sizes': [35, 75, 15], 'win_lengths': [150, 300, 60], 'window': 'hann_window'}}}
Model = {'Generator': {'optimizer': {'params': {'betas': [0.5, 0.9], 'lr': 0.0002, 'weight_decay': 0.0}, 'type': 'Adam'}, 'params': {'bias': True, 'channels': 256, 'in_channels': 80, 'kernel_size': 7, 'nonlinear_activation': 'LeakyReLU', 'nonlinear_activation_params': {'negative_slope': 0.1}, 'out_channels': 1, 'resblock_dilations': [[1, 3, 5, 7], [1, 3, 5, 7], [1, 3, 5, 7]], 'resblock_kernel_sizes': [3, 7, 11], 'upsample_kernal_sizes': [20, 10, 4, 4], 'upsample_scales': [10, 5, 2, 2], 'use_weight_norm': True}, 'scheduler': {'params': {'gamma': 0.5, 'milestones': [200000, 400000, 600000, 800000]}, 'type': 'MultiStepLR'}}, 'MultiPeriodDiscriminator': {'optimizer': {'params': {'betas': [0.5, 0.9], 'lr': 0.0002, 'weight_decay': 0.0}, 'type': 'Adam'}, 'params': {'discriminator_params': {'bias': True, 'channels': 32, 'downsample_scales': [3, 3, 3, 3, 1], 'in_channels': 1, 'kernel_sizes': [5, 3], 'max_downsample_channels': 1024, 'nonlinear_activation': 'LeakyReLU', 'nonlinear_activation_params': {'negative_slope': 0.1}, 'out_channels': 1, 'use_spectral_norm': False}, 'periods': [2, 3, 5, 7, 11]}, 'scheduler': {'params': {'gamma': 0.5, 'milestones': [200000, 400000, 600000, 800000]}, 'type': 'MultiStepLR'}}, 'MultiScaleDiscriminator': {'optimizer': {'params': {'betas': [0.5, 0.9], 'lr': 0.0002, 'weight_decay': 0.0}, 'type': 'Adam'}, 'params': {'discriminator_params': {'bias': True, 'channels': 128, 'downsample_scales': [4, 4, 4, 4, 1], 'in_channels': 1, 'kernel_sizes': [15, 41, 5, 3], 'max_downsample_channels': 1024, 'max_groups': 16, 'nonlinear_activation': 'LeakyReLU', 'nonlinear_activation_params': {'negative_slope': 0.1}, 'out_channels': 1}, 'downsample_pooling': 'DWT', 'downsample_pooling_params': {'kernel_size': 4, 'padding': 2, 'stride': 2}, 'follow_official_norm': True, 'scales': 3}, 'scheduler': {'params': {'gamma': 0.5, 'milestones': [200000, 400000, 600000, 800000]}, 'type': 'MultiStepLR'}}}
allow_cache = True
audio_config = {'fmax': 8000.0, 'fmin': 0.0, 'hop_length': 200, 'max_norm': 1.0, 'min_level_db': -100.0, 'n_fft': 2048, 'n_mels': 80, 'norm_type': 'mean_std', 'num_workers': 16, 'phone_level_feature': True, 'preemphasize': False, 'ref_level_db': 20, 'sampling_rate': 16000, 'symmetric': False, 'trim_silence': True, 'trim_silence_threshold_db': 60, 'wav_normalize': True, 'win_length': 1000}
batch_max_steps = 9600
batch_size = 16
create_time = 2022-09-18 14:11:30
discriminator_grad_norm = -1
discriminator_train_start_steps = 0
eval_interval_steps = 10000
generator_grad_norm = -1
generator_train_start_steps = 1
git_revision_hash = 22ae438
log_interval_steps = 1000
model_type = hifigan
num_save_intermediate_results = 4
num_workers = 2
pin_memory = True
remove_short_samples = False
save_interval_steps = 20000
train_max_steps = 2500000
Loaded model parameters from ../funtts/speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/basemodel_16k/hifigan/ckpt/checkpoint_2000000.pth.
Removing weight norm...
Finished generation of 4 utterances (RTF = 0.310).
['out_xiaoyu/0_0_mel_gen.wav', 'out_xiaoyu/1_0_mel_gen.wav', 'out_xiaoyu/2_0_mel_gen.wav', 'out_xiaoyu/3_0_mel_gen.wav']
Text to wav finished!

Conda list:

Name Version Build Channel

_libgcc_mutex 0.1 main https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
_openmp_mutex 5.1 1_gnu https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
absl-py 1.4.0 pypi_0 pypi
addict 2.4.0 pypi_0 pypi
aiohttp 3.8.4 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
aliyun-python-sdk-core 2.13.36 pypi_0 pypi
aliyun-python-sdk-kms 2.16.1 pypi_0 pypi
aniso8601 9.0.1 pypi_0 pypi
async-timeout 4.0.2 pypi_0 pypi
attrs 23.1.0 pypi_0 pypi
audioread 3.0.0 pypi_0 pypi
autopep8 2.0.2 pypi_0 pypi
bitstring 4.0.2 pypi_0 pypi
ca-certificates 2023.05.30 h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
certifi 2023.5.7 pypi_0 pypi
cffi 1.15.1 pypi_0 pypi
cfgv 3.3.1 pypi_0 pypi
charset-normalizer 3.1.0 pypi_0 pypi
click 8.0.4 pypi_0 pypi
cmake 3.26.4 pypi_0 pypi
coloredlogs 14.0 pypi_0 pypi
contourpy 1.1.0 pypi_0 pypi
crcmod 1.7 pypi_0 pypi
cryptography 41.0.1 pypi_0 pypi
cycler 0.11.0 pypi_0 pypi
cython 0.29.35 pypi_0 pypi
datasets 2.8.0 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
dill 0.3.6 pypi_0 pypi
distance 0.1.3 pypi_0 pypi
distlib 0.3.6 pypi_0 pypi
dnspython 2.3.0 pypi_0 pypi
easyasr 0.0.7 pypi_0 pypi
edit-distance 1.0.6 pypi_0 pypi
editdistance 0.5.2 pypi_0 pypi
einops 0.6.1 pypi_0 pypi
espnet-tts-frontend 0.0.3 pypi_0 pypi
et-xmlfile 1.1.0 pypi_0 pypi
eventlet 0.33.3 pypi_0 pypi
filelock 3.12.2 pypi_0 pypi
flask 2.1.3 pypi_0 pypi
flask-cors 3.0.10 pypi_0 pypi
flask-restful 0.3.10 pypi_0 pypi
flask-socketio 4.3.2 pypi_0 pypi
flask-talisman 1.0.0 pypi_0 pypi
fonttools 4.40.0 pypi_0 pypi
frozenlist 1.3.3 pypi_0 pypi
fsspec 2023.6.0 pypi_0 pypi
funasr 0.6.1 pypi_0 pypi
future 0.18.3 pypi_0 pypi
g2p 1.1.20230511 pypi_0 pypi
g2p-en 2.1.0 pypi_0 pypi
gast 0.5.4 pypi_0 pypi
greenlet 2.0.2 pypi_0 pypi
grpcio 1.54.2 pypi_0 pypi
h5py 3.8.0 pypi_0 pypi
huggingface-hub 0.15.1 pypi_0 pypi
humanfriendly 10.0 pypi_0 pypi
hyperpyyaml 1.2.1 pypi_0 pypi
identify 2.5.24 pypi_0 pypi
idna 3.4 pypi_0 pypi
importlib-metadata 6.6.0 pypi_0 pypi
importlib-resources 5.12.0 pypi_0 pypi
inflect 6.0.4 pypi_0 pypi
itsdangerous 2.1.2 pypi_0 pypi
jaconv 0.3.4 pypi_0 pypi
jamo 0.4.1 pypi_0 pypi
jedi 0.18.2 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
jmespath 0.10.0 pypi_0 pypi
joblib 1.2.0 pypi_0 pypi
kaldiio 2.18.0 pypi_0 pypi
kantts 0.0.1 pypi_0 pypi
kiwisolver 1.4.4 pypi_0 pypi
kwsbp 0.0.6 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi 3.4.4 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgcc-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgomp 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
librosa 0.9.2 pypi_0 pypi
libstdcxx-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
lit 16.0.5.post0 pypi_0 pypi
llvmlite 0.40.1rc1 pypi_0 pypi
lxml 4.9.2 pypi_0 pypi
markdown 3.4.3 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
matplotlib 3.7.1 pypi_0 pypi
mindaec 0.0.2 pypi_0 pypi
mir-eval 0.7 pypi_0 pypi
modelscope 1.6.1 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
msgpack 1.0.5 pypi_0 pypi
multidict 6.0.4 pypi_0 pypi
multiprocess 0.70.14 pypi_0 pypi
munkres 1.1.4 pypi_0 pypi
ncurses 6.4 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
networkx 2.8.4 pypi_0 pypi
nltk 3.8.1 pypi_0 pypi
nodeenv 1.8.0 pypi_0 pypi
numba 0.57.0 pypi_0 pypi
numpy 1.22.0 pypi_0 pypi
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cuda-cupti-cu11 11.7.101 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
nvidia-cufft-cu11 10.9.0.58 pypi_0 pypi
nvidia-curand-cu11 10.2.10.91 pypi_0 pypi
nvidia-cusolver-cu11 11.4.0.1 pypi_0 pypi
nvidia-cusparse-cu11 11.7.4.91 pypi_0 pypi
nvidia-nccl-cu11 2.14.3 pypi_0 pypi
nvidia-nvtx-cu11 11.7.91 pypi_0 pypi
openpyxl 3.1.2 pypi_0 pypi
openssl 3.0.8 h7f8727e_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
oss2 2.18.0 pypi_0 pypi
packaging 23.1 pypi_0 pypi
pandas 1.5.3 pypi_0 pypi
panphon 0.20.0 pypi_0 pypi
parso 0.8.3 pypi_0 pypi
pexpect 4.8.0 pypi_0 pypi
pickleshare 0.7.5 pypi_0 pypi
pillow 9.5.0 pypi_0 pypi
pip 23.1.2 py38h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
platformdirs 3.5.3 pypi_0 pypi
pooch 1.7.0 pypi_0 pypi
pre-commit 3.3.3 pypi_0 pypi
prompt-toolkit 3.0.38 pypi_0 pypi
protobuf 3.20.0 pypi_0 pypi
ptflops 0.7 pypi_0 pypi
ptyprocess 0.7.0 pypi_0 pypi
py-sound-connect 0.2.1 pypi_0 pypi
pyarrow 12.0.1 pypi_0 pypi
pycodestyle 2.10.0 pypi_0 pypi
pycparser 2.21 pypi_0 pypi
pycryptodome 3.18.0 pypi_0 pypi
pydantic 1.10.9 pypi_0 pypi
pygments 2.15.1 pypi_0 pypi
pyparsing 3.0.9 pypi_0 pypi
pypinyin 0.44.0 pypi_0 pypi
pysptk 0.1.21 pypi_0 pypi
python 3.8.16 h955ad1f_4 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-dateutil 2.8.2 pypi_0 pypi
python-engineio 3.14.2 pypi_0 pypi
python-socketio 4.6.1 pypi_0 pypi
pytorch-wavelets 1.3.0 pypi_0 pypi
pytorch-wpe 0.0.1 pypi_0 pypi
pytz 2023.3 pypi_0 pypi
pywavelets 1.4.1 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 8.2 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
regex 2023.6.3 pypi_0 pypi
requests 2.31.0 pypi_0 pypi
resampy 0.4.2 pypi_0 pypi
responses 0.18.0 pypi_0 pypi
rotary-embedding-torch 0.2.3 pypi_0 pypi
ruamel-yaml 0.17.28 pypi_0 pypi
ruamel-yaml-clib 0.2.7 pypi_0 pypi
scikit-learn 1.2.2 pypi_0 pypi
scipy 1.10.1 pypi_0 pypi
sentencepiece 0.1.99 pypi_0 pypi
setuptools 67.8.0 py38h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
simplejson 3.19.1 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sortedcontainers 2.4.0 pypi_0 pypi
soundfile 0.12.1 pypi_0 pypi
sox 1.4.1 pypi_0 pypi
speechbrain 0.5.14 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
sympy 1.12 pypi_0 pypi
tensorboard 1.15.0 pypi_0 pypi
tensorboardx 2.6 pypi_0 pypi
text-unidecode 1.3 pypi_0 pypi
textgrid 1.5 pypi_0 pypi
threadpoolctl 3.1.0 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tomli 2.0.1 pypi_0 pypi
torch 2.0.1 pypi_0 pypi
torch-complex 0.4.3 pypi_0 pypi
torchaudio 2.0.2 pypi_0 pypi
tqdm 4.65.0 pypi_0 pypi
traitlets 5.9.0 pypi_0 pypi
triton 2.0.0 pypi_0 pypi
ttsfrd 0.2.1 pypi_0 pypi
typeguard 2.13.3 pypi_0 pypi
typing-extensions 4.6.3 pypi_0 pypi
unicodecsv 0.14.1 pypi_0 pypi
unidecode 1.3.6 pypi_0 pypi
urllib3 2.0.3 pypi_0 pypi
virtualenv 20.23.0 pypi_0 pypi
wcwidth 0.2.6 pypi_0 pypi
werkzeug 2.0.3 pypi_0 pypi
wheel 0.38.4 py38h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xxhash 3.2.0 pypi_0 pypi
xz 5.4.2 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yapf 0.40.0 pypi_0 pypi
yarl 1.9.2 pypi_0 pypi
zipp 3.15.0 pypi_0 pypi
zlib 1.2.13 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main

git rev-parse HEAD:
8caf892

Choice of activation function in the HiFi-GAN vocoder

Hello,

I noticed that in the upsample part, this HiFi-GAN first applies

x = torch.sin(x) + x

This seems to replace the activation function used in the original implementation. Is there a reference for this choice, or was it validated by experiments?

Multi-GPU training not working?

[WeChat screenshot 20230117111115]

If your GPU devices are enough, you can use distributed training, which is a lot faster than single GPU training. For example, assign GPU device indexes with CUDA_VISIBLE_DEVICES system variable, --nproc_per_node denotes the count of GPU devices.

But the script has no --nproc_per_node argument?
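
A sketch of the intended launch, assuming --nproc_per_node belongs to the PyTorch launcher rather than the training script itself (the script reads a --local_rank argument that the launcher injects); the train_sambert.py flags are mirrored from the train_hifigan.py invocations elsewhere in these issues:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 \
    kantts/bin/train_sambert.py --model_config <config.yaml> --root_dir <feats_dir> --stage_dir <ckpt_dir>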

Error when fine-tuning Sambert-hifigan with general-format data

The fine-tuning steps follow: https://mp.weixin.qq.com/s/Xo-pMe3-P-fJ-32Z1JLonA
Already tried:
kantts/configs/sambert_16k_MAS.yaml (speaker already modified)
speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/basemodel_16k/sambert/config.yaml (speaker already modified)

The general-format data layout is as follows:

.
├── prosody
│   └── prosody.txt
└── wav
    ├── 1.wav
    ├── 2.wav
    ├── ...
    └── 9000.wav

The error message is as follows:

Traceback (most recent call last):
  File "kantts/bin/train_sambert.py", line 231, in <module>
    args.local_rank,
  File "kantts/bin/train_sambert.py", line 122, in train
    pin_memory=config["pin_memory"],
  File "/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 262, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore
  File "/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 104, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

kantts/bin/text_to_wav.py errors out when loading the SambertHifigan TTS Chinese multi-speaker pretrained 16k model for synthesis

Error message:
Converting text to symbols...
Traceback (most recent call last):
  File "./kantts/bin/text_to_wav.py", line 168, in <module>
    text_to_wav(
  File "./kantts/bin/text_to_wav.py", line 102, in text_to_wav
    speaker = config["linguistic_unit"]["speaker_list"].split(",")[0]
KeyError: 'linguistic_unit'

The run script:
#!/bin/bash

# SambertHifigan TTS - Chinese - multi-speaker pretrained - 16k
# speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k

outdir=out
[ -d $outdir ] && rm -rf $outdir; mkdir -p $outdir

python ./kantts/bin/text_to_wav.py \
    --txt ./test_data/txt \
    --output_dir $outdir \
    --res_zip ../funtts/speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/resource.zip \
    --am_ckpt ../funtts/speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/basemodel_16k/hifigan/ckpt/checkpoint_2000000.pth \
    --voc_ckpt ../funtts/speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/basemodel_16k/hifigan/ckpt/checkpoint_2000000.pth

Runtime environment: Linux
conda list:

packages in environment at /home/eeodev/anaconda3/envs/funtts:

Name Version Build Channel

_libgcc_mutex 0.1 main https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
_openmp_mutex 5.1 1_gnu https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
absl-py 1.4.0 pypi_0 pypi
addict 2.4.0 pypi_0 pypi
aiohttp 3.8.4 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
aliyun-python-sdk-core 2.13.36 pypi_0 pypi
aliyun-python-sdk-kms 2.16.1 pypi_0 pypi
aniso8601 9.0.1 pypi_0 pypi
async-timeout 4.0.2 pypi_0 pypi
attrs 23.1.0 pypi_0 pypi
audioread 3.0.0 pypi_0 pypi
autopep8 2.0.2 pypi_0 pypi
bitstring 4.0.2 pypi_0 pypi
ca-certificates 2023.05.30 h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
certifi 2023.5.7 pypi_0 pypi
cffi 1.15.1 pypi_0 pypi
cfgv 3.3.1 pypi_0 pypi
charset-normalizer 3.1.0 pypi_0 pypi
click 8.0.4 pypi_0 pypi
cmake 3.26.4 pypi_0 pypi
coloredlogs 14.0 pypi_0 pypi
contourpy 1.1.0 pypi_0 pypi
crcmod 1.7 pypi_0 pypi
cryptography 41.0.1 pypi_0 pypi
cycler 0.11.0 pypi_0 pypi
cython 0.29.35 pypi_0 pypi
datasets 2.8.0 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
dill 0.3.6 pypi_0 pypi
distance 0.1.3 pypi_0 pypi
distlib 0.3.6 pypi_0 pypi
dnspython 2.3.0 pypi_0 pypi
easyasr 0.0.7 pypi_0 pypi
edit-distance 1.0.6 pypi_0 pypi
editdistance 0.5.2 pypi_0 pypi
einops 0.6.1 pypi_0 pypi
espnet-tts-frontend 0.0.3 pypi_0 pypi
et-xmlfile 1.1.0 pypi_0 pypi
eventlet 0.33.3 pypi_0 pypi
filelock 3.12.2 pypi_0 pypi
flask 2.1.3 pypi_0 pypi
flask-cors 3.0.10 pypi_0 pypi
flask-restful 0.3.10 pypi_0 pypi
flask-socketio 4.3.2 pypi_0 pypi
flask-talisman 1.0.0 pypi_0 pypi
fonttools 4.40.0 pypi_0 pypi
frozenlist 1.3.3 pypi_0 pypi
fsspec 2023.6.0 pypi_0 pypi
funasr 0.6.1 pypi_0 pypi
future 0.18.3 pypi_0 pypi
g2p 1.1.20230511 pypi_0 pypi
g2p-en 2.1.0 pypi_0 pypi
gast 0.5.4 pypi_0 pypi
greenlet 2.0.2 pypi_0 pypi
grpcio 1.54.2 pypi_0 pypi
h5py 3.8.0 pypi_0 pypi
huggingface-hub 0.15.1 pypi_0 pypi
humanfriendly 10.0 pypi_0 pypi
hyperpyyaml 1.2.1 pypi_0 pypi
identify 2.5.24 pypi_0 pypi
idna 3.4 pypi_0 pypi
importlib-metadata 6.6.0 pypi_0 pypi
importlib-resources 5.12.0 pypi_0 pypi
inflect 6.0.4 pypi_0 pypi
itsdangerous 2.1.2 pypi_0 pypi
jaconv 0.3.4 pypi_0 pypi
jamo 0.4.1 pypi_0 pypi
jedi 0.18.2 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
jmespath 0.10.0 pypi_0 pypi
joblib 1.2.0 pypi_0 pypi
kaldiio 2.18.0 pypi_0 pypi
kantts 0.0.1 pypi_0 pypi
kiwisolver 1.4.4 pypi_0 pypi
kwsbp 0.0.6 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi 3.4.4 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgcc-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgomp 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
librosa 0.9.2 pypi_0 pypi
libstdcxx-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
lit 16.0.5.post0 pypi_0 pypi
llvmlite 0.40.1rc1 pypi_0 pypi
lxml 4.9.2 pypi_0 pypi
markdown 3.4.3 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
matplotlib 3.7.1 pypi_0 pypi
mindaec 0.0.2 pypi_0 pypi
mir-eval 0.7 pypi_0 pypi
modelscope 1.6.1 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
msgpack 1.0.5 pypi_0 pypi
multidict 6.0.4 pypi_0 pypi
multiprocess 0.70.14 pypi_0 pypi
munkres 1.1.4 pypi_0 pypi
ncurses 6.4 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
networkx 2.8.4 pypi_0 pypi
nltk 3.8.1 pypi_0 pypi
nodeenv 1.8.0 pypi_0 pypi
numba 0.57.0 pypi_0 pypi
numpy 1.22.0 pypi_0 pypi
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cuda-cupti-cu11 11.7.101 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
nvidia-cufft-cu11 10.9.0.58 pypi_0 pypi
nvidia-curand-cu11 10.2.10.91 pypi_0 pypi
nvidia-cusolver-cu11 11.4.0.1 pypi_0 pypi
nvidia-cusparse-cu11 11.7.4.91 pypi_0 pypi
nvidia-nccl-cu11 2.14.3 pypi_0 pypi
nvidia-nvtx-cu11 11.7.91 pypi_0 pypi
openpyxl 3.1.2 pypi_0 pypi
openssl 3.0.8 h7f8727e_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
oss2 2.18.0 pypi_0 pypi
packaging 23.1 pypi_0 pypi
pandas 1.5.3 pypi_0 pypi
panphon 0.20.0 pypi_0 pypi
parso 0.8.3 pypi_0 pypi
pexpect 4.8.0 pypi_0 pypi
pickleshare 0.7.5 pypi_0 pypi
pillow 9.5.0 pypi_0 pypi
pip 23.1.2 py38h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
platformdirs 3.5.3 pypi_0 pypi
pooch 1.7.0 pypi_0 pypi
pre-commit 3.3.3 pypi_0 pypi
prompt-toolkit 3.0.38 pypi_0 pypi
protobuf 3.20.0 pypi_0 pypi
ptflops 0.7 pypi_0 pypi
ptyprocess 0.7.0 pypi_0 pypi
py-sound-connect 0.2.1 pypi_0 pypi
pyarrow 12.0.1 pypi_0 pypi
pycodestyle 2.10.0 pypi_0 pypi
pycparser 2.21 pypi_0 pypi
pycryptodome 3.18.0 pypi_0 pypi
pydantic 1.10.9 pypi_0 pypi
pygments 2.15.1 pypi_0 pypi
pyparsing 3.0.9 pypi_0 pypi
pypinyin 0.44.0 pypi_0 pypi
pysptk 0.1.21 pypi_0 pypi
python 3.8.16 h955ad1f_4 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-dateutil 2.8.2 pypi_0 pypi
python-engineio 3.14.2 pypi_0 pypi
python-socketio 4.6.1 pypi_0 pypi
pytorch-wavelets 1.3.0 pypi_0 pypi
pytorch-wpe 0.0.1 pypi_0 pypi
pytz 2023.3 pypi_0 pypi
pywavelets 1.4.1 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 8.2 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
regex 2023.6.3 pypi_0 pypi
requests 2.31.0 pypi_0 pypi
resampy 0.4.2 pypi_0 pypi
responses 0.18.0 pypi_0 pypi
rotary-embedding-torch 0.2.3 pypi_0 pypi
ruamel-yaml 0.17.28 pypi_0 pypi
ruamel-yaml-clib 0.2.7 pypi_0 pypi
scikit-learn 1.2.2 pypi_0 pypi
scipy 1.10.1 pypi_0 pypi
sentencepiece 0.1.99 pypi_0 pypi
setuptools 67.8.0 py38h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
simplejson 3.19.1 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sortedcontainers 2.4.0 pypi_0 pypi
soundfile 0.12.1 pypi_0 pypi
sox 1.4.1 pypi_0 pypi
speechbrain 0.5.14 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
sympy 1.12 pypi_0 pypi
tensorboard 1.15.0 pypi_0 pypi
tensorboardx 2.6 pypi_0 pypi
text-unidecode 1.3 pypi_0 pypi
textgrid 1.5 pypi_0 pypi
threadpoolctl 3.1.0 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tomli 2.0.1 pypi_0 pypi
torch 2.0.1 pypi_0 pypi
torch-complex 0.4.3 pypi_0 pypi
torchaudio 2.0.2 pypi_0 pypi
tqdm 4.65.0 pypi_0 pypi
traitlets 5.9.0 pypi_0 pypi
triton 2.0.0 pypi_0 pypi
ttsfrd 0.2.1 pypi_0 pypi
typeguard 2.13.3 pypi_0 pypi
typing-extensions 4.6.3 pypi_0 pypi
unicodecsv 0.14.1 pypi_0 pypi
unidecode 1.3.6 pypi_0 pypi
urllib3 2.0.3 pypi_0 pypi
virtualenv 20.23.0 pypi_0 pypi
wcwidth 0.2.6 pypi_0 pypi
werkzeug 2.0.3 pypi_0 pypi
wheel 0.38.4 py38h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xxhash 3.2.0 pypi_0 pypi
xz 5.4.2 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yapf 0.40.0 pypi_0 pypi
yarl 1.9.2 pypi_0 pypi
zipp 3.15.0 pypi_0 pypi
zlib 1.2.13 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main

About license

Hi~:

Thanks for the great job! But we did not find an open-source license in the project. Is there any information about the license?

CUFFT_INTERNAL_ERROR when training the vocoder

CUDA_VISIBLE_DEVICES=0 python kantts/bin/train_hifigan.py --model_config speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/basemodel_16k/hifigan/config.yaml --resume_path speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/basemodel_16k/hifigan/ckpt/checkpoint_2000000.pth --root_dir newtest/training_stage/SSB0009_feats --stage_dir newtest/training_stage/SSB0009_hifigan_ckpt

cuFFT error: CUFFT_INTERNAL_ERROR
Traceback (most recent call last):
  File "kantts/bin/train_hifigan.py", line 171, in train
    trainer.train()
  File "/home/dufei/git/KAN-TTS/kantts/train/trainer.py", line 199, in train
    self.train_epoch()
  File "/home/dufei/git/KAN-TTS/kantts/train/trainer.py", line 207, in train_epoch
    self.train_step(batch)
  File "/home/dufei/git/KAN-TTS/kantts/train/trainer.py", line 509, in train_step
    mel_loss = self.criterion["mel_loss"](y_, y)
  File "/root/anaconda3/envs/maasKan1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/dufei/git/KAN-TTS/kantts/train/loss.py", line 307, in forward
    mel_hat = self.mel_spectrogram(y_hat)
  File "/root/anaconda3/envs/maasKan1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/dufei/git/KAN-TTS/kantts/utils/audio_torch.py", line 175, in forward
    x_stft = torch.stft(x, window=window, **self.stft_params)
  File "/root/anaconda3/envs/maasKan1/lib/python3.7/site-packages/torch/functional.py", line 633, in stft
    normalized, onesided, return_complex)
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR
2023-08-17:16:52:02, INFO [train_hifigan.py:179] Successfully saved checkpoint @ 1steps.

Some questions about training and inference

Hello, I have some questions.

  1. If I want to train a model on other datasets, say, the Nancy Corpus (Blizzard Challenge 2011 dataset), how should I prepare the data? Is there an example of the "general data" mentioned in the wiki?
  2. Is it possible to control the speed of the speech during inference?

Thank you!

pqmf

Hi, is there a config for multi-band (PQMF)?

Error during the voice feature extraction step

During feature extraction the program runs normally, with no errors or warnings, but the final output is missing the se folder. I am using tts-autolabel 1.1.7 and modelscope 1.8.1.

subprocess.CalledProcessError: Command '['git', 'rev-parse', 'HEAD']' returned non-zero exit status 128.

OS: centos7.9

Python/C++ Version: python3.9, gcc4.8.5

Package Version: pytorch==1.13.1, modelscope==1.5.2, kantts==1.0.0, torchaudio==0.13.1

Model:
speech_personal_sambert-hifigan_nsf_tts_zh-cn_pretrain_16k

Command:

from modelscope.metainfo import Trainers
from modelscope.trainers import build_trainer
from modelscope.utils.audio.audio_utils import TtsTrainType

pretrained_model_id = 'damo/speech_personal_sambert-hifigan_nsf_tts_zh-cn_pretrain_16k'

dataset_id = "./output_training_data/"
pretrain_work_dir = "./pretrain_work_dir/"

# Training info, used to specify which model(s) to train; this example shows training both the AM and the vocoder
# Currently supported: TtsTrainType.TRAIN_TYPE_SAMBERT, TtsTrainType.TRAIN_TYPE_VOC
# Training SAMBERT fine-tunes from the model's latest step
train_info = {
    TtsTrainType.TRAIN_TYPE_SAMBERT: {  # configure training of the AM (sambert) model
        'train_steps': 202,  # number of steps to train
        'save_interval_steps': 200,  # save a checkpoint every this many steps
        'log_interval': 10  # print a training log every this many steps
    }
}

# Configure training parameters: specify the dataset, a temporary work directory, and train_info
kwargs = dict(
    model=pretrained_model_id,  # the model to finetune
    model_revision="v1.0.5",
    work_dir=pretrain_work_dir,  # temporary work directory
    train_dataset=dataset_id,  # dataset id
    train_type=train_info  # training types and their parameters
)

trainer = build_trainer(Trainers.speech_kantts_trainer,
                        default_args=kwargs)

trainer.train()

ERROR:

(audio) [root@ecs-97b3-0001 /data/audio/kantts]# python train.py
2023-06-20 17:46:00,777 - modelscope - INFO - PyTorch version 1.13.1+cu116 Found.
2023-06-20 17:46:00,778 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-06-20 17:46:00,804 - modelscope - INFO - Loading done! Current index file version is 1.5.2, with md5 2b4346fea97faefdf1f85f3cdc38c819 and a total number of 860 components indexed
2023-06-20 17:46:02,103 - modelscope - INFO - Use user-specified model revision: v1.0.5
2023-06-20 17:46:02,871 - modelscope - INFO - Use user-specified model revision: v1.0.5
2023-06-20 17:46:04,294 - modelscope - INFO - Set workdir to ./pretrain_work_dir/
2023-06-20 17:46:04,555 - modelscope - INFO - load ./output_training_data/
2023-06-20 17:46:04,704 - modelscope - INFO - Use user-specified model revision: v1.0.5
2023-06-20 17:46:05,905 - modelscope - INFO - am_config=./pretrain_work_dir/orig_model/basemodel_16k/sambert/config.yaml voc_config=./pretrain_work_dir/orig_model/basemodel_16k/hifigan/config.yaml
2023-06-20 17:46:05,905 - modelscope - INFO - audio_config=./pretrain_work_dir/orig_model/basemodel_16k/audio_config_se_16k.yaml
2023-06-20 17:46:05,905 - modelscope - INFO - am_ckpts=OrderedDict([(2400000, './pretrain_work_dir/orig_model/basemodel_16k/sambert/ckpt/checkpoint_2400000.pth')])
2023-06-20 17:46:05,905 - modelscope - INFO - voc_ckpts=OrderedDict([(2400000, './pretrain_work_dir/orig_model/basemodel_16k/hifigan/ckpt/checkpoint_2400000.pth')])
2023-06-20 17:46:05,905 - modelscope - INFO - se_path=./pretrain_work_dir/orig_model/se.npy se_model_path=./pretrain_work_dir/orig_model/basemodel_16k/speaker_embedding/se.onnx
2023-06-20 17:46:05,905 - modelscope - INFO - mvn_path=./pretrain_work_dir/orig_model/mvn.npy
Load pinyin_en_mix_dict failed   (this line repeated 16 times)
text.cc: festival_Text_init
fatal: Not a git repository (or any parent up to mount point /data)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Traceback (most recent call last):
  File "/data/audio/kantts/train.py", line 33, in <module>
    trainer.train()
  File "/data/soft/anaconda3/envs/audio/lib/python3.9/site-packages/modelscope/trainers/audio/tts_trainer.py", line 229, in train
    self.prepare_data()
  File "/data/soft/anaconda3/envs/audio/lib/python3.9/site-packages/modelscope/trainers/audio/tts_trainer.py", line 205, in prepare_data
    self.audio_data_preprocessor(self.raw_dataset_path, self.data_dir,
  File "/data/soft/anaconda3/envs/audio/lib/python3.9/site-packages/modelscope/preprocessors/tts.py", line 36, in __call__
    self.do_data_process(data_dir, output_dir, audio_config_path,
  File "/data/soft/anaconda3/envs/audio/lib/python3.9/site-packages/modelscope/preprocessors/tts.py", line 56, in do_data_process
    process_data(datadir, outputdir, audio_config, speaker_name,
  File "/data/audio/kantts/kantts/preprocess/data_process.py", line 137, in process_data
    config["git_revision_hash"] = get_git_revision_hash()
  File "/data/audio/kantts/kantts/utils/log.py", line 26, in get_git_revision_hash
    return subprocess.check_output(["git", "rev-parse", "HEAD"]).decode("ascii").strip()
  File "/data/soft/anaconda3/envs/audio/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/data/soft/anaconda3/envs/audio/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['git', 'rev-parse', 'HEAD']' returned non-zero exit status 128.
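
For reference, the traceback shows that kantts records the current git revision during preprocessing by running git rev-parse HEAD, which fails with exit status 128 when the script runs outside a git working tree (see the "fatal: Not a git repository" line above). Running train.py from inside a cloned KAN-TTS checkout avoids this; alternatively, a defensive sketch of get_git_revision_hash in kantts/utils/log.py that falls back instead of crashing (an assumption, not the shipped fix):

import subprocess

def get_git_revision_hash():
    # Fall back to a placeholder when the working directory is not inside a
    # git repository, which is what exit status 128 from 'git rev-parse' means.
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"]).decode("ascii").strip()
    except subprocess.CalledProcessError:
        return "unknown"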

Inconsistent checkpoint-naming rules when fine-tuning sambert vs. hifigan; male-voice inference requires speaker F7

1. Following the tutorial at https://modelscope.cn/models/damo/speech_personal_sambert-hifigan_nsf_tts_zh-cn_pretrain_16k/summary, the fine-tuned sambert checkpoint names continue from the base model's step count, while hifigan's do not and start from 0; yet according to the logs, hifigan is also fine-tuned from the resume path.

The train_max_steps settings for fine-tuning sambert and hifigan at https://modelscope.cn/docs/sambert are likewise inconsistent: the former adds the base model's steps, the latter does not.

2. For the open-source male-voice models on ModelScope (aixiang, zhida, zhishuo, etc.), inference with text_to_wav.py must pass --speaker F7, otherwise the audio quality is very poor. Normally F7 is the female voice and M7 the male voice.

KeyError: 'test_male'

Original Traceback (most recent call last):
  File "/home/akira/anaconda3/envs/maas/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/akira/anaconda3/envs/maas/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/akira/anaconda3/envs/maas/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/akira/KAN-TTS-1/KAN-TTS/kantts/datasets/dataset.py", line 461, in __getitem__
    ling_data = self.ling_unit.encode_symbol_sequence(ling_txt)
  File "/home/akira/KAN-TTS-1/KAN-TTS/kantts/utils/ling_unit/ling_unit.py", line 226, in encode_symbol_sequence
    lfeat_symbol_separate[index].strip(), self._lfeat_type_list[index]
  File "/home/akira/KAN-TTS-1/KAN-TTS/kantts/utils/ling_unit/ling_unit.py", line 293, in encode_sub_unit
    sequence = self.encode_speaker_category(this_lfeat_symbol)
  File "/home/akira/KAN-TTS-1/KAN-TTS/kantts/utils/ling_unit/ling_unit.py", line 393, in encode_speaker_category
    sequence.append(self._speaker_to_id[this_speaker])
KeyError: 'test_male'
Traceback (most recent call last):
  File "kantts/bin/train_sambert.py", line 190, in train
    trainer.train()
  File "/home/akira/KAN-TTS-1/KAN-TTS/kantts/train/trainer.py", line 199, in train
    self.train_epoch()
  File "/home/akira/KAN-TTS-1/KAN-TTS/kantts/train/trainer.py", line 206, in train_epoch
    for batch in tqdm(self.train_loader):
  File "/home/akira/anaconda3/envs/maas/lib/python3.7/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/akira/anaconda3/envs/maas/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/akira/anaconda3/envs/maas/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/home/akira/anaconda3/envs/maas/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/home/akira/anaconda3/envs/maas/lib/python3.7/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/akira/anaconda3/envs/maas/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/akira/anaconda3/envs/maas/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/akira/anaconda3/envs/maas/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/akira/KAN-TTS-1/KAN-TTS/kantts/datasets/dataset.py", line 461, in __getitem__
    ling_data = self.ling_unit.encode_symbol_sequence(ling_txt)
  File "/home/akira/KAN-TTS-1/KAN-TTS/kantts/utils/ling_unit/ling_unit.py", line 226, in encode_symbol_sequence
    lfeat_symbol_separate[index].strip(), self._lfeat_type_list[index]
  File "/home/akira/KAN-TTS-1/KAN-TTS/kantts/utils/ling_unit/ling_unit.py", line 293, in encode_sub_unit
    sequence = self.encode_speaker_category(this_lfeat_symbol)
  File "/home/akira/KAN-TTS-1/KAN-TTS/kantts/utils/ling_unit/ling_unit.py", line 393, in encode_speaker_category
    sequence.append(self._speaker_to_id[this_speaker])
KeyError: 'test_male'

Has anyone run into this problem, and how can it be solved?
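
For reference, the exception is raised in encode_speaker_category when the speaker name taken from the training metadata ('test_male') is absent from the model's speaker table. A quick check, assuming the usual KAN-TTS layout where the speaker list is a comma-separated string under linguistic_unit in the audio config (the path and key layout below are assumptions):

import yaml

with open("audio_config.yaml") as f:  # hypothetical: the model's audio config
    config = yaml.safe_load(f)

speakers = config["linguistic_unit"]["speaker_list"].split(",")
print(speakers)  # 'test_male' must appear here, or the KeyError above is raised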

hifigan architecture design?

Hi, I'd like to ask about the hifigan architecture: besides the transpose_upsamples from the original hifigan structure, an extra nn.Upsample is stacked on top. What is the motivation for this? Is the benefit that the results are more stable?
It feels like stacking the two structures together increases the computational cost.

How is the ttsfrd package downloaded?

Whether I run pip install ttsfrd directly or go through the Tsinghua, Douban, or Aliyun mirrors, I always get:
ERROR: Could not find a version that satisfies the requirement ttsfrd (from versions: none)
ERROR: No matching distribution found for ttsfrd
(The "from versions: none" output indicates that no ttsfrd release is published on PyPI at all.)

Does kantts support multi-speaker models?

The kantts training guide is single-speaker; even with the multi-speaker AISHELL-3 data, speakers are trained one at a time.
The open-source multi-speaker models on ModelScope are internally a combination of several acoustic models rather than a single model.

Can a single kantts model support multiple speakers?
If not, what is the specific reason; what mechanism causes multiple speakers to interfere with each other?
If it can, how exactly should it be set up to achieve this?

Errors running WuuShanghai and Cantonese in KAN-TTS or ModelScope

python3.8 x86_64 Ubuntu server

Running WuuShanghai from the basemodel_16k directory in KAN-TTS:
Traceback (most recent call last):
  File "kantts/bin/text_to_wav.py", line 218, in <module>
    text_to_wav(
  File "kantts/bin/text_to_wav.py", line 157, in text_to_wav
    am_infer(symbols_file, am_ckpt, output_dir, se_file)
  File "/data2/caoyangang/open_project/KAN-TTS/kantts/bin/infer_sambert.py", line 217, in am_infer
    mel, mel_post, dur, f0, energy = am_synthesis(
  File "/data2/caoyangang/open_project/KAN-TTS/kantts/bin/infer_sambert.py", line 83, in am_synthesis
    inputs_ling = torch.stack(
RuntimeError: stack expects each tensor to be equal size, but got [16] at entry 0 and [78] at entry 1

Running WuuShanghai from the voices directory in KAN-TTS gives the same error as above.

However, running WuuShanghai through ModelScope works fine.

Running the Cantonese model from the basemodel_16k directory in KAN-TTS:
Traceback (most recent call last):
  File "kantts/bin/text_to_wav.py", line 218, in <module>
    text_to_wav(
  File "kantts/bin/text_to_wav.py", line 157, in text_to_wav
    am_infer(symbols_file, am_ckpt, output_dir, se_file)
  File "/data2/caoyangang/open_project/KAN-TTS/kantts/bin/infer_sambert.py", line 217, in am_infer
    mel, mel_post, dur, f0, energy = am_synthesis(
  File "/data2/caoyangang/open_project/KAN-TTS/kantts/bin/infer_sambert.py", line 83, in am_synthesis
    inputs_ling = torch.stack(
RuntimeError: stack expects each tensor to be equal size, but got [16] at entry 0 and [78] at entry 1

Running the Cantonese model from the voices directory in KAN-TTS:
Traceback (most recent call last):
  File "kantts/bin/text_to_wav.py", line 218, in <module>
    text_to_wav(
  File "kantts/bin/text_to_wav.py", line 157, in text_to_wav
    am_infer(symbols_file, am_ckpt, output_dir, se_file)
  File "/data2/caoyangang/open_project/KAN-TTS/kantts/bin/infer_sambert.py", line 201, in am_infer
    fsnet.load_state_dict(state_dict["model"], strict=False)
  File "/home/eeodev/anaconda3/envs/kan-tts/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for KanTtsSAMBERT:
    size mismatch for text_encoder.sy_emb.weight: copying a param with shape torch.Size([107, 512]) from checkpoint, the shape in current model is torch.Size([147, 512]).
    size mismatch for text_encoder.tone_emb.weight: copying a param with shape torch.Size([14, 512]) from checkpoint, the shape in current model is torch.Size([10, 512]).

Running the Cantonese model through ModelScope gives the same error as above.
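
For reference, size mismatches in sy_emb/tone_emb usually mean the symbol and tone inventories built from the resource files differ from those the checkpoint was trained with (e.g. a Mandarin resource paired with a Cantonese checkpoint). A quick way to inspect the checkpoint side (the path below is hypothetical; the state-dict keys come from the error above):

import torch

state = torch.load("checkpoint.pth", map_location="cpu")["model"]
print(state["text_encoder.sy_emb.weight"].shape)    # symbol embedding table
print(state["text_encoder.tone_emb.weight"].shape)  # tone embedding table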

bug?

Hello, in the kantts_sambert.py code, lines 967-975:
LFR_text_inputs = LR_text_outputs.contiguous().view(batch_size, -1, self.mel_decoder.r * text_hid.shape[-1])
LFR_emo_inputs = LR_emo_outputs.contiguous().view(batch_size, -1, self.mel_decoder.r * emo_hid.shape[-1])[:, :, : emo_hid.shape[-1]]
LFR_spk_inputs = LR_spk_outputs.contiguous().view(batch_size, -1, self.mel_decoder.r * spk_hid.shape[-1])[:, :, : spk_hid.shape[-1]]

If data.size() is [64, 100, 40] and r = 3,
then data.contiguous().view(64, -1, 3 * 40) raises an error, because the 100-frame time axis is not a multiple of r (64 × 100 × 40 elements cannot be reshaped to [64, -1, 120]).
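
A minimal reproduction of the constraint (shapes taken from the example above): view() only succeeds when the time axis is a multiple of r, so an LR output that is not length-padded fails exactly as described.

import torch

x = torch.randn(64, 100, 40)
r = 3
try:
    x.contiguous().view(64, -1, r * 40)
except RuntimeError as e:
    print(e)  # 64*100*40 elements cannot be viewed as [64, -1, 120]

# Padding the time axis up to a multiple of r makes the reshape valid,
# which is presumably what the surrounding pipeline guarantees for LR outputs.
pad = (-x.shape[1]) % r  # 2 extra frames: 100 -> 102
x = torch.nn.functional.pad(x, (0, 0, 0, pad))
print(x.contiguous().view(64, -1, r * 40).shape)  # torch.Size([64, 34, 120])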

Error during audio format conversion: TypeError: check_argument_types() missing 1 required positional argument: 'func'

2023-06-19 14:08:44,582 - modelscope - INFO - Use user-specified model revision: v1.0.5
2023-06-19:14:08:44, INFO [api.py:470] Use user-specified model revision: v1.0.5
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████| 1.02G/1.02G [00:33<00:00, 32.5MB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████| 6.27k/6.27k [00:00<00:00, 1.16MB/s]
2023-06-19:14:09:30, INFO [auto_label.py:91] ---  New folder /data/audio/ptts_spk0_autolabel/paragraph/prosody...  ---
2023-06-19:14:09:30, INFO [auto_label.py:92] ---  OK  ---
2023-06-19:14:09:30, INFO [auto_label.py:91] ---  New folder /data/audio/ptts_spk0_autolabel/sp_interval...  ---
2023-06-19:14:09:30, INFO [auto_label.py:92] ---  OK  ---
2023-06-19:14:09:30, INFO [auto_label.py:91] ---  New folder /data/audio/ptts_spk0_autolabel/wav...  ---
2023-06-19:14:09:30, INFO [auto_label.py:92] ---  OK  ---
2023-06-19:14:09:30, INFO [auto_label.py:91] ---  New folder /data/audio/ptts_spk0_autolabel/log...  ---
2023-06-19:14:09:30, INFO [auto_label.py:92] ---  OK  ---
2023-06-19:14:09:30, INFO [auto_label.py:301] 2023-06-19 14:09:30
2023-06-19:14:09:30, INFO [auto_label.py:355] wav_preprocess start...
---  new folder...  ---
---  OK  ---
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 128.04it/s]
2023-06-19:14:09:30, INFO [auto_label.py:367] wav cut by vad start...
Traceback (most recent call last):
  File "/data/audio/tran.py", line 8, in <module>
    ret, report = run_auto_label(input_wav = input_wav,
  File "/data/soft/anaconda3/envs/audio/lib/python3.9/site-packages/modelscope/tools/speech_tts_autolabel.py", line 78, in run_auto_label
    ret_code, report = auto_labeling.run()
  File "/data/soft/anaconda3/envs/audio/lib/python3.9/site-packages/tts_autolabel/auto_label.py", line 765, in run
    self.wav_cut_by_vad()
  File "/data/soft/anaconda3/envs/audio/lib/python3.9/site-packages/tts_autolabel/auto_label.py", line 371, in wav_cut_by_vad
    vad_cut(self.resample_wav_dir, self.cut_wav_dir, self.resource_dir)
  File "/data/soft/anaconda3/envs/audio/lib/python3.9/site-packages/tts_autolabel/audiocut/vad.py", line 55, in vad_cut
    vad_pipeline = Fsmn_vad(vad_model_dir)
  File "/data/soft/anaconda3/envs/audio/lib/python3.9/site-packages/tts_autolabel/audio2phone/funasr_onnx/vad_bin.py", line 62, in __init__
    self.frontend = WavFrontend(
  File "/data/soft/anaconda3/envs/audio/lib/python3.9/site-packages/tts_autolabel/audio2phone/funasr_onnx/utils/frontend.py", line 32, in __init__
    check_argument_types()
TypeError: check_argument_types() missing 1 required positional argument: 'func'

########################################################################################
The code used is:

# Import the run_auto_label tool; the first run downloads the required libraries
from modelscope.tools import run_auto_label
# Run autolabel for automatic annotation; about 4 minutes for 20 audio clips
import os

input_wav = '/mnt/workspace/Data/ptts_spk0_wav'  # wav audio path
work_dir = '/mnt/workspace/Data/ptts_spk0_autolabel'  # output path
os.makedirs(work_dir, exist_ok=True)

ret, report = run_auto_label(input_wav=input_wav,
                             work_dir=work_dir,
                             resource_revision='v1.0.5')
print(report)
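
A hedged note: this failure pattern is commonly tied to the installed typeguard version. In the typeguard 2.x API, check_argument_types() takes no required positional arguments, while later releases changed the signature, which matches the TypeError above; pinning typeguard to a 2.x release (e.g. typeguard==2.13.3) is the fix reported in similar issues, though not verified against this exact environment. Checking the installed version:

from importlib.metadata import version

# If this prints a 3.x/4.x version, the 2.x-style check_argument_types()
# call in funasr_onnx/utils/frontend.py will fail as shown above.
print(version("typeguard"))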

Could not load library libcudnn_cnn_infer.so.8.

(/media/lab-hp/B23AB5DD3AB59F33/condaenv/maas) lab-hp@labhp-HP:~/桌面/KAN-TTS$ python ./kantts/bin/text_to_wav.py --txt test.txt --output_dir res --res_zip speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/resource.zip --am_ckpt speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/basemodel_16k/sambert/ckpt/checkpoint_980000.pth --voc_ckpt speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/basemodel_16k/hifigan/ckpt/checkpoint_2000000.pth --speaker xiaoyu
2023-07-24:22:10:22, INFO [text_to_wav.py:97] Converting text to symbols...
Load pinyin_en_mix_dict failed   (this line repeated 16 times)
text.cc: festival_Text_init
2023-07-24:22:10:26, INFO [text_to_wav.py:109] AM is infering...
2023-07-24:22:10:29, INFO [infer_sambert.py:198] Loading checkpoint: speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/basemodel_16k/sambert/ckpt/checkpoint_980000.pth
2023-07-24:22:10:29, INFO [infer_sambert.py:210] Inference sentence: 0_0
Could not load library libcudnn_cnn_infer.so.8. Error: /media/lab-hp/B23AB5DD3AB59F33/condaenv/maas/bin/../lib/libnvrtc.so: undefined symbol: nvrtcGetCUBIN
Aborted (core dumped)

How can the "Load pinyin_en_mix_dict failed" messages during inference be avoided?

When running speech synthesis, the following messages frequently appear:

Load pinyin_en_mix_dict failed   (this line repeated 16 times)
text.cc: festival_Text_init

I found that this is caused by the ttsfrd package loading the zip resource file:

fe = ttsfrd.TtsFrontendEngine()
fe.initialize(resources_dir)

Although the final audio output is fine, it seems to slow down inference. How can this be resolved?
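
If the slowdown comes from re-initializing the frontend for every request, one mitigation is to construct the engine once and reuse it. A minimal sketch, assuming the set_lang_type/gen_tacotron_symbols calls used by KAN-TTS's text_to_wav (the paths and texts below are placeholders):

import ttsfrd

resources_dir = "./resource"  # hypothetical: path to the unzipped resource directory
texts = ["今天天气不错", "适合出去游玩"]

# Initialize once: the 'Load pinyin_en_mix_dict failed' lines are printed
# during initialize(), so constructing the engine per request repeats both
# the messages and the loading cost.
fe = ttsfrd.TtsFrontendEngine()
fe.initialize(resources_dir)
fe.set_lang_type('pinyin')

for text in texts:  # reuse the same engine for every sentence
    symbols = fe.gen_tacotron_symbols(text)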

train_sambert error

inputs_text_embedding + pitch_embeddings + energy_embeddings
RuntimeError: The size of tensor a (50) must match the size of tensor b (411) at non-singleton dimension 1
2023-07-03:11:07:19 INFO [trainer.py:903] torch.Size([32, 50, 4])
2023-07-03:11:07:19 INFO [trainer.py:904] torch.Size([32, 50])
2023-07-03:11:07:19 INFO [trainer.py:905] torch.Size([32, 50])
torch.Size([32, 50, 32])
torch.Size([32, 411, 32])
torch.Size([32, 411, 32])

A small bug when running text_to_wav.py

You need to copy the contents of speech_sambert-hifigan_tts_zhitian_emo_zh-cn_16k/voices/zhitian_emo/audio_config.yaml into speech_sambert-hifigan_tts_zhitian_emo_zh-cn_16k/voices/zhitian_emo/voc/config.yaml, otherwise an error is raised; see the sketch below.
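
A minimal sketch of the workaround described above, using the same paths as the report:

import shutil

voice = "speech_sambert-hifigan_tts_zhitian_emo_zh-cn_16k/voices/zhitian_emo"
# Overwrite the vocoder config with the voice-level audio config
# before running text_to_wav.py.
shutil.copyfile(f"{voice}/audio_config.yaml", f"{voice}/voc/config.yaml")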

How is sybert used, and is there a pretrained model?

Hi, this project is great! But there are a few things I don't quite understand:

  1. How exactly is sybert used, and what effect is it expected to achieve (no audio data is involved in its training, so I don't quite understand how sybert models pronunciation features)?
    kantts contains a training module for sybert, but no inference module and no module that applies it.

  2. Is there a pretrained sybert model?
    I tried loading languagedata_embedded.bin from the resource directory, but it doesn't seem to be a PyTorch model.

Need Python 3.9 & Python 3.10 support

pip install git+https://github.com/alibaba-damo-academy/KAN-TTS

INFO: pip is looking at multiple versions of kantts to determine which version is compatible with other requirements. This could take a while.
ERROR: Package 'kantts' requires a different Python: 3.9.16 not in '<3.9,>=3.7.0'

Thank you!
