shibing624 / parrots

Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engine. Chinese and English speech recognition, multi-speaker speech synthesis, multilingual support, high accuracy.

License: Apache License 2.0

Python 100.00%

speech-recognition tts parrot text-to-speech-python3 pinyin2hanzi chinese-speech-recognition chinese-speech-synthesis

parrots's Introduction

Online Demo

Parrots: ASR and TTS toolkit

Introduction

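Parrots is an Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) toolkit that supports Chinese, English, Japanese, and other languages.

parrots provides one-call access to speech recognition and speech synthesis models. It works out of the box and supports Chinese and English.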

Features

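ASR: a Chinese speech recognition (ASR) model based on distilwhisper, with support for Chinese, English, and other languages
TTS: a speech synthesis (TTS) model trained with GPT-SoVITS, with support for Chinese, English, Japanese, and other languages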

Install

pip install torch # or conda install pytorch
pip install -r requirements.txt
pip install parrots

or install the development version from source:

pip install torch # or conda install pytorch
git clone https://github.com/shibing624/parrots.git
cd parrots
python setup.py install

Demo

Official Demo: https://www.mulanai.com/product/tts/
HuggingFace Demo: https://huggingface.co/spaces/shibing624/parrots

Run examples/tts_gradio_demo.py to launch the demo locally:

python examples/tts_gradio_demo.py

Usage

ASR(Speech Recognition)

example: examples/demo_asr.py

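import os
import sys

# Make the repository root importable when running from the examples/ directory.
sys.path.append('..')
from parrots import SpeechRecognition

pwd_path = os.path.abspath(os.path.dirname(__file__))

if __name__ == '__main__':
    m = SpeechRecognition()
    # Transcribe the sample recording that sits next to this script.
    r = m.recognize_speech_from_file(os.path.join(pwd_path, 'tushuguan.wav'))
    print('[Info] Speech recognition result:', r)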

output:

{'text': '北京图书馆'}

TTS(Speech Synthesis)

example: examples/demo_tts.py

import sys
sys.path.append('..')
import parrots
from parrots import TextToSpeech
parrots_path = parrots.__path__[0]
sys.path.append(parrots_path)

m = TextToSpeech(
    speaker_model_path="shibing624/parrots-gpt-sovits-speaker-maimai",
    speaker_name="MaiMai",
)
m.predict(
    text="你好，欢迎来北京。welcome to the city.",
    text_language="auto",
    output_path="output_audio.wav"
)

output:

Save audio to output_audio.wav

Command-line mode (CLI)

ASR and TTS tasks can be run from the command line; see cli.py.

> parrots -h                                    

NAME
    parrots

SYNOPSIS
    parrots COMMAND

COMMANDS
    COMMAND is one of the following:

     asr
       Entry point of asr, recognize speech from file

     tts
       Entry point of tts, generate speech audio from text

Run:

pip install parrots -U
# asr example
parrots asr -h
parrots asr examples/tushuguan.wav

# tts example
parrots tts -h
parrots tts "你好，欢迎来北京。welcome to the city." output_audio.wav

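asr and tts are subcommands: asr performs speech recognition and tts performs speech synthesis. Both use the Chinese model by default.
For the usage of each subcommand, run it with -h, for example parrots asr -h.
In the example above, examples/tushuguan.wav is the audio_file_path argument of asr: the input audio file (required).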

Release Models

ASR

BELLE-2/Belle-distilwhisper-large-v2-zh

TTS

shibing624/parrots-gpt-sovits-speaker

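speaker name	name (Chinese)	character	voice description	language code	language
KuileBlanc	葵·勒布朗	lady	standard American female voice	en	English
LongShouRen	龙守仁	gentleman	standard American male voice	en	English
MaiMai	卖卖	singing female anchor	singing female streamer voice	zh	Chinese
XingTong	星瞳	singing ai girl	lively female voice	zh	Chinese
XuanShen	炫神	game male anchor	male game streamer voice	zh	Chinese
KusanagiNene	草薙寧々	loli	young schoolgirl voice	ja	Japanese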
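
When choosing a speaker_name for TextToSpeech, it can help to filter the released speakers by language. The snippet below is only a minimal sketch: the speaker names and language codes are copied from the table above, while the dictionary and the helper speakers_for are hypothetical and not part of the parrots API.

```python
# Speaker names and language codes copied from the release table above.
# This mapping and the helper below are illustrative, not part of parrots.
SPEAKER_LANGUAGES = {
    "KuileBlanc": "en",
    "LongShouRen": "en",
    "MaiMai": "zh",
    "XingTong": "zh",
    "XuanShen": "zh",
    "KusanagiNene": "ja",
}


def speakers_for(language):
    """Return the speaker names whose reference language is `language`, sorted by name."""
    return sorted(name for name, lang in SPEAKER_LANGUAGES.items() if lang == language)


print(speakers_for("zh"))
```

A name returned by this helper would then be passed as speaker_name, together with the matching speaker_model_path, as in the TTS usage example.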

shibing624/parrots-gpt-sovits-speaker-maimai

speaker name	name (Chinese)	character	voice description	language code	language
MaiMai	卖卖	singing female anchor	singing female streamer voice	zh	Chinese

Contact

Issues (suggestions):
Email: xuming: [email protected]
WeChat: add WeChat ID xuming624 to join the Python-NLP discussion group; use the note "name-company-NLP"

Citation

If you use parrots in your research, please cite it in the following format:

@misc{parrots,
  title={parrots: ASR and TTS Tool},
  author={Ming Xu},
  year={2024},
  howpublished={\url{https://github.com/shibing624/parrots}},
}

License

Licensed under the Apache License 2.0, which permits free commercial use. Please include a link to parrots and the license in your product documentation.

Contribute

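The code is still rough. Improvements are welcome; before submitting, please note the following two points:

Add corresponding unit tests in tests
Run all unit tests with python -m pytest and make sure they all pass

Then submit a PR.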

parrots's Issues

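How well does this library perform, and how many concurrent requests can it support? How would one feed audio in as a stream over WebSocket? Thanks!

As in the title. Thanks!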

AttributeError: Can't get attribute 'HParams' on <module 'utils' from 'C:\\..\\Python39\\lib\\site-packages\\utils\\__init__.py'>

I followed your steps and got the "Can't get attribute 'HParams'" error.
I also confirmed that examples/tts_gradio_demo.py includes sys.path.append(parrots_path).

python examples/tts_gradio_demo.py

2024-05-20 15:17:29.265 | DEBUG | parrots.tts:init:305 - Use device: cuda
C:\Users\88692\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
2024-05-20 15:17:33.210 | INFO | parrots.tts:init:319 - Load pretrained parrots speaker: shibing624/parrots-gpt-sovits-speaker-maimai
Fetching 6 files: 100%|█████████████████████████████| 6/6 [00:00<?, ?it/s]
2024-05-20 15:17:33.465 | DEBUG | parrots.tts:init:332 - Reference speaker config: {'reference_wav': 'ref.wav', 'speaker': 'MaiMai', 'character': 'singing female anchor', 'reference_language': 'zh', 'reference_prompt': '那我们，唠也唠了这么久了唠了有十几分钟了我们要不来唱唱，唱唱歌，想听什么，今天想听什么。'}, loaded from C:\Users\88692.cache\huggingface\hub\models--shibing624--parrots-gpt-sovits-speaker-maimai\snapshots\369f6de40db8590be8eb1627d7f55fbbdb4fa63b\MaiMai\config.json
Traceback (most recent call last):
  File "C:\Users\88692\Desktop\code\myself\parrots\test.py", line 8, in <module>
    m = TextToSpeech(
  File "C:\Users\88692\Desktop\code\myself\parrots\parrots\tts.py", line 342, in __init__
    sovits_dict = torch.load(sovits_model_path, map_location="cpu")
  File "C:\Users\88692\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\serialization.py", line 1026, in load
    return _load(opened_zipfile,
  File "C:\Users\88692\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\serialization.py", line 1438, in _load
    result = unpickler.load()
  File "C:\Users\88692\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\serialization.py", line 1431, in find_class
    return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'HParams' on <module 'utils' from 'C:\Users\88692\AppData\Local\Programs\Python\Python39\lib\site-packages\utils\__init__.py'>
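
The traceback above ends inside the unpickler's find_class, and the module it names is a utils package in site-packages rather than anything under parrots. A plausible reading, offered here as an assumption and not a confirmed diagnosis, is that the checkpoint stores a reference to utils.HParams by module name, so whichever module named utils is imported first decides whether the class can be found. The sketch below reproduces that mechanism with the standard pickle module only; demo_utils and both stand-in modules are hypothetical names chosen for illustration.

```python
import pickle
import sys
import types


def make_module(name, with_hparams):
    """Build an in-memory module, optionally exposing an HParams class."""
    module = types.ModuleType(name)
    if with_hparams:
        # The class records its defining module by name; pickle stores that name.
        module.HParams = type("HParams", (object,), {"__module__": name})
    return module


# 1. "Training time": the module that really defines HParams is importable.
sys.modules["demo_utils"] = make_module("demo_utils", with_hparams=True)
hparams = sys.modules["demo_utils"].HParams()
hparams.sampling_rate = 32000
payload = pickle.dumps(hparams)

# 2. "Load time": an unrelated module with the same name shadows the original.
sys.modules["demo_utils"] = make_module("demo_utils", with_hparams=False)
try:
    pickle.loads(payload)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

If this is what happens in the report, checking which file `import utils` resolves to in the failing environment (for example by printing the module's `__file__` or `__path__`) would confirm or rule it out.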

TypeError: _sanitize_parameters() got an unexpected keyword argument 'low_cpu_mem_usage'

Describe the Question

Please provide a clear and concise description of what the question is.

I hit this error when running the example. How can I fix it?

TypeError Traceback (most recent call last)
Cell In[1], line 5
1 from parrots import SpeechRecognition
4 if __name__ == '__main__':
----> 5 m = SpeechRecognition('/app/pretrained_models/Belle-distilwhisper-large-v2-zh', low_cpu_mem_usage=False)
6 r = m.recognize_speech_from_file('./output.wav')
7 print('[提示] 语音识别结果：', r)

File /app/project/ASR-TTS/parrots/asr.py:81, in SpeechRecognition.init(self, model_name_or_path, use_cuda, cuda_device, max_new_tokens, chunk_length_s, batch_size, torch_dtype, use_flash_attention_2, language, **kwargs)
78 self.model.to(self.device)
80 self.processor = AutoProcessor.from_pretrained(model_name_or_path)
---> 81 self.pipe = pipeline(
82 "automatic-speech-recognition",
83 model=self.model,
84 tokenizer=self.processor.tokenizer,
85 feature_extractor=self.processor.feature_extractor,
86 device=self.device,
87 torch_dtype=torch_dtype,
88 max_new_tokens=max_new_tokens,
89 batch_size=batch_size,
90 chunk_length_s=chunk_length_s,
91 **kwargs
92 )
93 if language == 'zh':
94 self.pipe.model.config.forced_decoder_ids = (
95 self.pipe.tokenizer.get_decoder_prompt_ids(
96 language=language,
97 task="transcribe"
98 )
99 )

File ~/miniconda3/envs/speech_recognition/lib/python3.9/site-packages/transformers/pipelines/init.py:1108, in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
1105 if device is not None:
1106 kwargs["device"] = device
-> 1108 return pipeline_class(model=model, framework=framework, task=task, **kwargs)

File ~/miniconda3/envs/speech_recognition/lib/python3.9/site-packages/transformers/pipelines/automatic_speech_recognition.py:220, in AutomaticSpeechRecognitionPipeline.init(self, model, feature_extractor, tokenizer, decoder, device, torch_dtype, **kwargs)
217 else:
218 self.type = "ctc"
--> 220 super().init(model, tokenizer, feature_extractor, device=device, torch_dtype=torch_dtype, **kwargs)

File ~/miniconda3/envs/speech_recognition/lib/python3.9/site-packages/transformers/pipelines/base.py:894, in Pipeline.init(self, model, tokenizer, feature_extractor, image_processor, modelcard, framework, task, args_parser, device, torch_dtype, binary_output, **kwargs)
892 self._batch_size = kwargs.pop("batch_size", None)
893 self._num_workers = kwargs.pop("num_workers", None)
--> 894 self._preprocess_params, self._forward_params, self._postprocess_params = self._sanitize_parameters(**kwargs)
896 # Pipelines calling generate: if the tokenizer has a pad token but the model doesn't, set it in the
897 # forward params so that generate is aware of the pad token.
898 if (
899 self.tokenizer is not None
900 and self.model.can_generate()
901 and self.tokenizer.pad_token_id is not None
902 and self.model.generation_config.pad_token_id is None
903 ):

TypeError: _sanitize_parameters() got an unexpected keyword argument 'low_cpu_mem_usage'
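
The frames above show SpeechRecognition.__init__ forwarding its **kwargs to pipeline(...), which in turn hands them to _sanitize_parameters, and that method does not accept low_cpu_mem_usage. Going by the traceback alone, the simplest workaround is to not pass low_cpu_mem_usage to SpeechRecognition. The sketch below illustrates the general failure mode and one defensive pattern in plain Python; build_pipeline and call_with_supported_kwargs are hypothetical stand-ins, not functions from parrots or transformers.

```python
import inspect


def build_pipeline(model, max_new_tokens=128, batch_size=16, chunk_length_s=15):
    """Hypothetical callee that accepts only the keywords it declares."""
    return {
        "model": model,
        "max_new_tokens": max_new_tokens,
        "batch_size": batch_size,
        "chunk_length_s": chunk_length_s,
    }


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, dropping keyword arguments that its signature does not declare."""
    params = inspect.signature(func).parameters
    accepts_any = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    if accepts_any:
        supported = dict(kwargs)
    else:
        supported = {k: v for k, v in kwargs.items() if k in params}
    dropped = sorted(set(kwargs) - set(supported))
    return func(*args, **supported), dropped


# Forwarding an unknown keyword blindly fails, as in the issue.
try:
    build_pipeline("demo-model", low_cpu_mem_usage=False)
    error = None
except TypeError as exc:
    error = str(exc)

# Filtering first keeps the supported options and reports what was dropped.
result, dropped = call_with_supported_kwargs(
    build_pipeline, "demo-model", batch_size=4, low_cpu_mem_usage=False
)
print(error)
print(result["batch_size"], dropped)
```

Silently dropping options can hide mistakes, so logging the dropped names is usually preferable to ignoring them.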

After installing parrots with pip, launching it from the command line hangs

Running the web page also hangs.

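Low speech-to-text recognition accuracy

Environment:
Windows 10 Pro

Problem:
After setting up the environment, I ran the demo with the sample in examples and with my own recordings:

Example:

With my own recordings the result is the same: only the first syllable is recognized and nothing after it.

Goal:
The current transcription accuracy is a problem. Can it be improved further?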

AttributeError

import parrots
text = parrots.speech_recognition_from_file('./16k.wav')
Traceback (most recent call last):
File "", line 1, in
AttributeError: module 'parrots' has no attribute 'speech_recognition_from_file'

How to solve this problem? Thanks.

Does this TTS require an online service (an internet connection)?

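Running the official example produces no audio output

I have been looking for a way to convert text to speech and found this project. First, thanks to the author for the work, but I ran into a problem when using it.

Test code: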

import sys

sys.path.append('..')
from parrots import TextToSpeech

if __name__ == '__main__':
    m = TextToSpeech()
    # say text
    m.speak('北京图书馆')

Output:

2023-03-08 21:39:16.605 | DEBUG    | parrots.tts:speak:66 - ['bei3', 'jing1', 'tu2', 'shu1', 'guan3']

However, no sound is played. On the same Windows machine, other text-to-speech projects do produce audio.

TensorFlow version?

Which TensorFlow version is required?

Cannot download

https://huggingface.co/spaces/shibing624/parrots cannot be opened.
I suggest setting up a download mirror on a cloud drive that is accessible in mainland China.

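The speech sounds too mechanical

From trying it out, the current text-to-speech output sounds too mechanical: words are emitted at essentially uniform time intervals. Have you considered using deep learning to make the intonation more natural and human-like?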

requirements.txt is missing keras

Which keras version should be used together with tensorflow==1.13.1?

How do I train my model? Didn't see the script to train my own model?

Architectural description of parrots and how to train in english or any other language

Keras version?

How can I fix this error? Is my Keras version too old, or is it something else?
Error message:
Traceback (most recent call last):
File "paddle_asr.py", line 25, in
test_parrots("/data/wav_ocr/2022103000000012/")
File "paddle_asr.py", line 22, in test_parrots
r = m.recognize_speech_from_file(input_path+wav)
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/parrots/asr.py", line 197, in recognize_speech_from_file
return self.recognize_speech(signal, fs)
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/parrots/asr.py", line 178, in recognize_speech
self.check_initialized()
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/parrots/asr.py", line 66, in check_initialized
self.initialize()
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/parrots/asr.py", line 53, in initialize
self._model.load_weights(self.model_path)
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1516, in load_weights
saving.load_weights_from_hdf5_group(f, self.layers)
File "/root/anaconda3/envs/noise_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/saving.py", line 772, in load_weights_from_hdf5_group
original_keras_version = f.attrs['keras_version'].decode('utf8')
AttributeError: 'str' object has no attribute 'decode'
A simple script I use for performance testing:
import os
import time

from parrots import SpeechRecognition, Pinyin2Hanzi

start_time = time.time()


def test_parrots(input_path):
    # Build both models once, then transcribe every .wav file in the directory.
    m = SpeechRecognition()
    n = Pinyin2Hanzi()
    for wav in os.listdir(input_path):
        if wav.endswith(".wav"):
            r = m.recognize_speech_from_file(input_path + wav)
            text = n.pinyin_2_hanzi(r)
            print("parrots-ocr-finished")


test_parrots("/data/wav_ocr/2022103000000012/")
end_time = time.time()
print(end_time - start_time)
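
The traceback in this issue fails on f.attrs['keras_version'].decode('utf8'), which means the attribute was already a str when the loader expected bytes. A frequently reported cause of this pattern is a newer h5py release returning str for HDF5 string attributes while an older TensorFlow/Keras still calls .decode on them; that diagnosis is an assumption here, since the issue does not list the installed h5py version. The helper below is a hypothetical sketch of type-tolerant decoding, not code from parrots, Keras, or TensorFlow.

```python
def as_text(value, encoding="utf8"):
    """Return `value` as str, decoding only when it is actually bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# Works for both attribute styles: bytes (older behavior) and str (newer behavior).
print(as_text(b"2.2.4"), as_text("2.2.4"))
```

Because the failing call lives inside the installed TensorFlow package, the practical fix in such cases is usually to align library versions rather than to patch the loader.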

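Question about releasing GPU memory

Hello. After m.predict() finishes inference, the GPU memory is not released right away. How can I free the GPU memory it occupies?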
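
One common approach in PyTorch, sketched here under the assumption that the model is no longer needed, is to drop every Python reference to it, run the garbage collector, and then ask PyTorch's caching allocator to return unused blocks with torch.cuda.empty_cache(). The function release_gpu_memory is a hypothetical helper, not part of parrots, and the import is guarded so the snippet also runs on machines without PyTorch or without a GPU.

```python
import gc

try:
    import torch
except ImportError:  # Allow the sketch to run where PyTorch is not installed.
    torch = None


def release_gpu_memory():
    """Collect garbage and return cached CUDA blocks to the driver.

    Returns the number of bytes PyTorch still reserves afterwards (0 without CUDA).
    """
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
        return torch.cuda.memory_reserved()
    return 0


# Typical use, assuming `m` is the TextToSpeech instance from the usage example:
#   del m
#   release_gpu_memory()
print(release_gpu_memory())
```

Memory held by tensors that are still referenced, including the model weights while m exists, cannot be freed this way; empty_cache only releases blocks that the allocator is caching but not using.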

Does distil-whisper support Chinese? Is the quality usable?

pretrained model

Great job! Thanks for sharing! Where is the pretrained model? Also, the syllables.zip file is not available. Looking forward to your reply!

Hello, could you provide the mapping.json file?

Hello, could you provide the mapping.json file? I would like to study it.

Can't get attribute 'HParams' on <module 'utils'

args: Namespace(speaker_model='shibing624/parrots-gpt-sovits-speaker-maimai', speaker_name='MaiMai', device='cpu', half=False, text='你好，欢迎来北京。welcome to the city.', lang='auto', output_path='output_audio.wav')
2024-03-17 01:11:46.818 | DEBUG | parrots.tts:init:302 - Use device: cpu
2024-03-17 01:11:49.862 | INFO | parrots.tts:init:316 - Load pretrained parrots speaker: shibing624/parrots-gpt-sovits-speaker-maimai
Fetching 6 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 62601.55it/s]
2024-03-17 01:11:50.537 | DEBUG | parrots.tts:init:329 - Reference speaker config: {'reference_wav': 'ref.wav', 'speaker': 'MaiMai', 'character': 'singing female anchor', 'reference_language': 'zh', 'reference_prompt': '那我们，唠也唠了这么久了唠了有十几分钟了我们要不来唱唱，唱唱歌，想听什么，今天想听什么。'}, loaded from /Users/kevinlinpr/.cache/huggingface/hub/models--shibing624--parrots-gpt-sovits-speaker-maimai/snapshots/369f6de40db8590be8eb1627d7f55fbbdb4fa63b/MaiMai/config.json
Traceback (most recent call last):
File "/Users/kevinlinpr/AI-Waifu-Vtuber/parrots_test.py", line 23, in
m = TextToSpeech(
^^^^^^^^^^^^^
File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/parrots/tts.py", line 339, in init
sovits_dict = torch.load(sovits_model_path, map_location="cpu")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/torch/serialization.py", line 1026, in load
return _load(opened_zipfile,
^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/torch/serialization.py", line 1438, in _load
result = unpickler.load()
^^^^^^^^^^^^^^^^
File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/torch/serialization.py", line 1431, in find_class
return super().find_class(mod_name, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'HParams' on <module 'utils' (<_frozen_importlib_external.NamespaceLoader object at 0x2d761c4d0>)>

Question about output quality when using the default model directly

Hello. When I load a model that I fine-tuned separately, the output is essentially the same as in GPT-SoVITS, and the quality is very good.
However, when I test the default GPT-SoVITS model, it sounds good in GPT-SoVITS itself,
but the same default model loaded through parrots sounds different. Does the code still need work somewhere?
Both sovits_model_path and gpt_model_path point to the default model files.
m = TextToSpeech(
    bert_model_path=pwd_path + "/models/gpts_pretrained_models/chinese-roberta-wwm-ext-large",
    hubert_model_path=pwd_path + "/models/gpts_pretrained_models/chinese-hubert-base",
    sovits_model_path=sovits_model_path,
    gpt_model_path=gpt_model_path,
    speaker_model_path="usermodels",
    speaker_name="{username}".format(username=username),
    device='cuda',
    half=True,
)

Download problem

Trying to resume download...
pytorch_model.bin: 19%|██████████ | 126M/651M [01:30<05:12, 1.68MB/s]
pytorch_model.bin: 26%|█████████████▋ | 168M/651M [01:33<22:34, 357kB/s]
Can this file be downloaded ahead of time, and if so, how?

keras import error during installation

ImportError: cannot import name 'Adam' from 'keras.optimizers'

module 'tensorflow' has no attribute 'get_default_graph'

"C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\python.exe" "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1.1\plugins\python-ce\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 57254 --file C:/Users/16413/Documents/GitHub/LostXmas/seq2seq/data/mining/SpeechRec/sr.py
pydev debugger: process 64336 is connecting

Connected to pydev debugger (build 201.7846.77)
2020-07-10 18:18:36.025768: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Using TensorFlow backend.
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
2020-07-10 18:18:42,676 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\pinyin2hanzi.py - DEBUG - Loaded: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\data\pinyin2hanzi\pinyin_hanzi_dict.txt, size: 1421
2020-07-10 18:18:42,676 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\pinyin2hanzi.py - DEBUG - Loaded: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\data\pinyin2hanzi\char_idx.txt, size: 5832
2020-07-10 18:18:43,380 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\pinyin2hanzi.py - DEBUG - Loaded: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\data\pinyin2hanzi\word_idx.txt, size: 568646
2020-07-10 18:18:43,630 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\pinyin2hanzi.py - DEBUG - Loaded: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\data\pinyin2hanzi\dic_pinyin.txt, size: 96117
Backend TkAgg is interactive backend. Turning interactive mode on.
2020-07-10 18:18:46.081700: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-07-10 18:18:47.311791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.335GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2020-07-10 18:18:47.312512: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-10 18:18:47.357397: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-10 18:18:47.396445: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-10 18:18:47.404674: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-10 18:18:47.411615: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-10 18:18:47.458602: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-10 18:18:47.689152: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-10 18:18:47.690095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-10 18:18:47.691017: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-07-10 18:18:47.701188: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x20fce644d50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-10 18:18:47.701708: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-10 18:18:47.702573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.335GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2020-07-10 18:18:47.703142: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-10 18:18:47.703421: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-10 18:18:47.703701: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-10 18:18:47.703979: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-10 18:18:47.704255: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-10 18:18:47.704539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-10 18:18:47.704827: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-10 18:18:47.705628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-10 18:18:48.633762: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-10 18:18:48.634075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-07-10 18:18:48.634244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-07-10 18:18:48.635184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4602 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-07-10 18:18:48.639308: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x20f87a4a760 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-10 18:18:48.639690: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5
2020-07-10 18:18:49,452 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py - DEBUG - Loading pinyin dict cost 0.016 seconds.
2020-07-10 18:18:49,514 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py - DEBUG - Loading model cost 0.063 seconds.
2020-07-10 18:18:49,514 - C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py - DEBUG - Speech recognition model has been built ok.
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1438, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/16413/Documents/GitHub/LostXmas/seq2seq/data/mining/SpeechRec/sr.py", line 4, in <module>
    text = parrots.recognize_speech_from_file('voice.wav')
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py", line 203, in recognize_speech_from_file
    return self.recognize_speech(signal, fs)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py", line 184, in recognize_speech
    self.check_initialized()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py", line 69, in check_initialized
    self.initialize()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\parrots\speech_recognition.py", line 64, in initialize
    self.graph = tf.get_default_graph()
AttributeError: module 'tensorflow' has no attribute 'get_default_graph'

Process finished with exit code 1
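
The final frame calls tf.get_default_graph(), a TensorFlow 1.x top-level function; in TensorFlow 2.x the equivalent is exposed as tf.compat.v1.get_default_graph(). The log above loads cudart64_101.dll, which suggests (as an assumption, since the version is not printed) that a 2.x build is installed. The sketch below shows a version-tolerant lookup; get_default_graph is a hypothetical helper, and the two SimpleNamespace objects are stand-ins so the snippet runs without TensorFlow installed.

```python
import types


def get_default_graph(tf_module):
    """Return the default graph via the TF1 name if present, else via compat.v1."""
    getter = getattr(tf_module, "get_default_graph", None)
    if getter is None:
        getter = tf_module.compat.v1.get_default_graph
    return getter()


# Stand-ins that mimic only the attribute layout of the two API generations.
fake_tf1 = types.SimpleNamespace(get_default_graph=lambda: "graph-from-tf1-api")
fake_tf2 = types.SimpleNamespace(
    compat=types.SimpleNamespace(
        v1=types.SimpleNamespace(get_default_graph=lambda: "graph-from-compat-v1")
    )
)

print(get_default_graph(fake_tf1))
print(get_default_graph(fake_tf2))
```

With a real installation, the call would be get_default_graph(tf) after import tensorflow as tf. Other TF1-only calls in the same code path may need similar treatment, so installing the TensorFlow version the package was written for is often the simpler route.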

Calling m = TextToSpeech(speaker_model_path='shibing624/parrots-gpt-sovits-speaker-maimai', speaker_name='MaiMai') raises an error

The very first call fails. My PyTorch version is 2.2.1+cu121; is that too new?

Cell In[3], line 1
----> 1 m = TextToSpeech(speaker_model_path='shibing624/parrots-gpt-sovits-speaker-maimai', speaker_name='MaiMai')

File e:\bomb\proj\python\BarkVoice\parrots\tts.py:342, in TextToSpeech.init(self, bert_model_path, hubert_model_path, sovits_model_path, gpt_model_path, speaker_model_path, speaker_name, device, half)
339 raise ValueError("sovits_model_path, gpt_model_path or speaker_model_path must be provided")
341 # SoVITS
--> 342 sovits_dict = torch.load(sovits_model_path, map_location="cpu")
343 hps = DictToAttrRecursive(sovits_dict["config"])
344 logger.debug(f"SoVITS config: {hps}")

File d:\CondaEnv\envs\normal\lib\site-packages\torch\serialization.py:1026, in load(f, map_location, pickle_module, weights_only, mmap, **pickle_load_args)
1024 except RuntimeError as e:
1025 raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
-> 1026 return _load(opened_zipfile,
1027 map_location,
1028 pickle_module,
1029 overall_storage=overall_storage,
1030 **pickle_load_args)
1031 if mmap:
1032 raise RuntimeError("mmap can only be used with files saved with "
1033 "`torch.save(_use_new_zipfile_serialization=True), "
1034 "please torch.save your checkpoint with this option in order to use mmap.")

File d:\CondaEnv\envs\normal\lib\site-packages\torch\serialization.py:1438, in _load(zip_file, map_location, pickle_module, pickle_file, overall_storage, **pickle_load_args)
1436 unpickler = UnpicklerWrapper(data_file, **pickle_load_args)
1437 unpickler.persistent_load = persistent_load
-> 1438 result = unpickler.load()
1440 torch._utils._validate_loaded_sparse_tensors()
1441 torch._C._log_api_usage_metadata(
1442 "torch.load.metadata", {"serialization_id": zip_file.serialization_id()}
1443 )

File d:\CondaEnv\envs\normal\lib\site-packages\torch\serialization.py:1431, in _load..UnpicklerWrapper.find_class(self, mod_name, name)
1429 pass
1430 mod_name = load_module_mapping.get(mod_name, mod_name)
-> 1431 return super().find_class(mod_name, name)

ModuleNotFoundError: No module named 'utils'

Another audio file input error

ValueError: could not broadcast input array from shape (91597,200,1) into shape (1600,200,1)
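
The shapes in this error suggest that the audio produced 91597 feature frames while the destination buffer holds at most 1600, so the input is far longer than the model's fixed input length. Treating 1600 as that maximum is an inference from the error message, not something the issue states. One generic way to handle this is to split the frames into windows no longer than the limit and recognize each window separately; chunk_bounds below is a hypothetical helper that only computes the window boundaries.

```python
def chunk_bounds(total_frames, max_frames):
    """Return (start, end) index pairs covering total_frames in windows of at most max_frames."""
    if max_frames <= 0:
        raise ValueError("max_frames must be positive")
    return [
        (start, min(start + max_frames, total_frames))
        for start in range(0, total_frames, max_frames)
    ]


# Frame counts taken from the error message above.
bounds = chunk_bounds(91597, 1600)
print(len(bounds), bounds[0], bounds[-1])
```

Cutting at fixed positions can split a word in half, so in practice the boundaries are often moved to nearby silences before recognition.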
