kedreamix / linly-talker

Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates technologies such as Whisper, Linly, Microsoft Speech Services, and the SadTalker talking-head generation system. 🌟🔬

Home Page: https://kedreamix.github.io/

License: MIT License

Python 94.37% Shell 0.06% Jupyter Notebook 5.57%

linly-talker's Introduction

Digital Human Intelligent Dialogue System - Linly-Talker — 'Interactive Dialogue with Your Virtual Self'


2023.12 Update 📆

  • Users can upload any image for the conversation.

2024.01 Update 📆📆

  • Exciting news! I've now incorporated both the powerful GeminiPro and Qwen large models into our conversational scene. Users can now upload images during the conversation, adding a whole new dimension to the interactions.
  • The deployment invocation method for FastAPI has been updated.
  • The advanced settings options for Microsoft TTS have been updated, increasing the variety of voice types. Additionally, video subtitles have been introduced to enhance visualization.
  • Updated the GPT multi-turn conversation system to establish contextual connections in dialogue, enhancing the interactivity and realism of the digital persona.

2024.02 Update 📆

  • Updated Gradio to the latest version 4.16.0, providing the interface with additional functionalities such as capturing images from the camera to create digital personas, among others.
  • ASR and THG have been updated. FunASR from Alibaba has been integrated into ASR, enhancing its speed significantly. Additionally, the THG section now incorporates the Wav2Lip model, while ER-NeRF is currently in preparation (Coming Soon).
  • I have incorporated the GPT-SoVITS model, which is a voice cloning method. By fine-tuning it with just one minute of a person's speech data, it can effectively clone their voice. The results are quite impressive and worth recommending.
  • I have integrated a web user interface (WebUI) that allows for better execution of Linly-Talker.


Introduction

Linly-Talker is an innovative digital human conversation system that integrates the latest artificial intelligence technologies, including Large Language Models (LLM) 🤖, Automatic Speech Recognition (ASR) 🎙️, Text-to-Speech (TTS) 🗣️, and voice cloning technology 🎤. This system offers an interactive web interface through the Gradio platform 🌐, allowing users to upload images 📷 and engage in personalized dialogues with AI 💬.

The core features of the system include:

  1. Multi-Model Integration: Linly-Talker combines large language models such as Linly, GeminiPro, and Qwen with speech and visual models such as Whisper and SadTalker to achieve high-quality dialogue and visual generation.
  2. Multi-Turn Conversational Ability: Through the multi-turn dialogue system powered by GPT models, Linly-Talker can understand and maintain contextually relevant and coherent conversations, significantly enhancing the authenticity of the interaction.
  3. Voice Cloning: Utilizing technologies like GPT-SoVITS, users can upload a one-minute voice sample for fine-tuning, and the system will clone the user's voice, enabling the digital human to converse in the user's voice.
  4. Real-Time Interaction: The system supports real-time speech recognition and video captioning, allowing users to communicate naturally with the digital human via voice.
  5. Visual Enhancement: With digital human generation technologies, Linly-Talker can create realistic digital human avatars, providing a more immersive experience.

The design philosophy of Linly-Talker is to create a new form of human-computer interaction that goes beyond simple Q&A. By integrating advanced technologies, it offers an intelligent digital human capable of understanding, responding to, and simulating human communication.

The system architecture of multimodal human–computer interaction.

You can watch the demo video here.

TO DO LIST

  • Completed the basic conversation system flow, capable of voice interactions.
  • Integrated the LLM large model, including the usage of Linly, Qwen, and GeminiPro.
  • Enabled the ability to upload any digital person's photo for conversation.
  • Integrated FastAPI invocation for Linly.
  • Utilized Microsoft TTS with advanced options, allowing customization of voice and tone parameters to enhance audio diversity.
  • Added subtitles to video generation for improved visualization.
  • GPT Multi-turn Dialogue System (Enhance the interactivity and realism of digital entities, bolstering their intelligence)
  • Optimized the Gradio interface by incorporating additional models such as Wav2Lip, FunASR, and others.
  • Voice Cloning Technology (Synthesize one's own voice using voice cloning to enhance the realism and interactive experience of digital entities)
  • Real-time Speech Recognition (Enable conversation and communication between humans and digital entities using voice)

🔆 The Linly-Talker project is ongoing - pull requests are welcome! If you have any suggestions regarding new model approaches, research, techniques, or if you discover any runtime errors, please feel free to edit and submit a pull request. You can also open an issue or contact me directly via email. 📩⭐ If you find this repository useful, please give it a star! 🤩

If you encounter any issues during deployment, please consult the Common Issues Summary section, where I have compiled a list of all potential problems. Additionally, a discussion group is available here, and I will provide regular updates. Thank you for your attention and use of Linly-Talker!

Example

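Text/Voice Dialogue | Digital Human Response
What is the most effective way to cope with stress? | example_answer1.mp4
How should I manage my time? | example_answer2.mp4
Write a review of a symphony concert, discussing the orchestra's performance and the audience's overall experience. | example_answer3.mp4
Translate into Chinese: Luck is a dividend of sweat. The more you sweat, the luckier you get. | example_answer4.mp4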

Setup Environment

To install the environment using Anaconda and PyTorch, follow the steps below:

conda create -n linly python=3.10
conda activate linly

# PyTorch Installation Method 1: Conda Installation (Recommended)
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

# PyTorch Installation Method 2: Pip Installation
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install -q ffmpeg # ffmpeg==4.2.2

pip install -r requirements_app.txt

If you want to use features such as voice cloning, you may need a newer version of PyTorch, which also enables a wider range of functionality. The commands below install the CUDA 11.8 build; choose the build that matches your driver.

conda create -n linly python=3.10  
conda activate linly

pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118

conda install -q ffmpeg # ffmpeg==4.2.2

pip install -r requirements_app.txt

# Install dependencies for voice cloning
pip install -r VITS/requirements_gptsovits.txt

Next, you need to install the corresponding models. You can download them using the following methods. Once downloaded, place the files in the specified folder structure (explained at the end of this document).

HuggingFace Download

If the download speed is too slow, consider using a mirror site. For more information, refer to Efficiently Obtain Hugging Face Models Using Mirror Sites.

# Download pre-trained models from Hugging Face
git lfs install
git clone https://huggingface.co/Kedreamix/Linly-Talker

ModelScope Download

# Download pre-trained models from ModelScope
# 1. Git method
git lfs install
git clone https://www.modelscope.cn/Kedreamix/Linly-Talker.git

# 2. Python code download
pip install modelscope

# Then run the following in Python (not in the shell):
from modelscope import snapshot_download
model_dir = snapshot_download('Kedreamix/Linly-Talker')

Move All Models to the Current Directory

If you downloaded from Baidu Netdisk, you can refer to the directory structure at the end of the document to move the models.

# Move all models to the current directory
# Checkpoints contain SadTalker and Wav2Lip
mv Linly-Talker/checkpoints/* ./checkpoints/

# Enhanced GFPGAN for SadTalker
# pip install gfpgan
# mv Linly-Talker/gfpgan ./

# Voice cloning models
mv Linly-Talker/GPT_SoVITS/pretrained_models/* ./GPT_SoVITS/pretrained_models/

# Qwen large model
mv Linly-Talker/Qwen ./

To simplify deployment and usage, a configs.py file has been added. You can modify some hyperparameters in this file for customization:

# Device Running Port
port = 7870

# API Running Port and IP
# Use 127.0.0.1 for local access only; use "0.0.0.0" to accept connections from other machines
ip = '127.0.0.1'
api_port = 7871

# Linly Model Path
# Choose one mode; if both assignments are left active, the later one takes effect
# mode = 'api'  # For 'api', Linly-api-fast.py must be run first
mode = 'offline'
model_path = 'Linly-AI/Chinese-LLaMA-2-7B-hf'

# SSL Certificate (required for microphone interaction)
# Preferably an absolute path
ssl_certfile = "./https_cert/cert.pem"
ssl_keyfile = "./https_cert/key.pem"

This file allows you to adjust parameters such as the device running port, API running port, Linly model path, and SSL certificate paths for ease of deployment and configuration.
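
As a rough illustration of how these settings might be consumed, the sketch below turns the values from configs.py into keyword arguments for Gradio's `launch()`. The helper name `build_launch_kwargs` is hypothetical and not part of the repository. `server_name`, `server_port`, `ssl_certfile`, and `ssl_keyfile` are Gradio launch options. Including the SSL entries only when both files exist is an assumption made here so that a missing certificate falls back to plain HTTP.

```python
from pathlib import Path


def build_launch_kwargs(ip, port, ssl_certfile=None, ssl_keyfile=None):
    """Hypothetical helper: map configs.py values to gradio launch() kwargs."""
    kwargs = {"server_name": ip, "server_port": int(port)}
    # Enable HTTPS only when both the certificate and the key are present.
    if ssl_certfile and ssl_keyfile:
        cert, key = Path(ssl_certfile), Path(ssl_keyfile)
        if cert.is_file() and key.is_file():
            # Absolute paths avoid surprises when the working directory changes.
            kwargs["ssl_certfile"] = str(cert.resolve())
            kwargs["ssl_keyfile"] = str(key.resolve())
    return kwargs
```

A launcher script could then call something like `demo.launch(**build_launch_kwargs(ip, port, ssl_certfile, ssl_keyfile))`, where `demo` is the Gradio app object.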

ASR - Speech Recognition

For detailed information about the usage and code implementation of Automatic Speech Recognition (ASR), please refer to ASR - Bridging the Gap with Digital Humans.

Whisper

To implement ASR (Automatic Speech Recognition) using OpenAI's Whisper, you can refer to the specific usage methods provided in the GitHub repository: https://github.com/openai/whisper

FunASR

Alibaba's FunASR delivers strong speech recognition and outperforms Whisper on Chinese. It can also produce results in real time, making it a great choice. You can try it through the FunASR file in the ASR folder. For more information, see https://github.com/alibaba-damo-academy/FunASR.

TTS - Edge TTS

For detailed information about the usage and code implementation of Text-to-Speech (TTS), please refer to TTS - Empowering Digital Humans with Natural Speech Interaction.

To use Microsoft Edge's online text-to-speech service from Python without needing Microsoft Edge or Windows or an API key, you can refer to the GitHub repository at https://github.com/rany2/edge-tts. It provides a Python module called "edge-tts" that allows you to utilize the service. You can find detailed installation instructions and usage examples in the repository's README file.
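
For orientation, here is a minimal sketch of calling the edge-tts module from Python. It relies on `edge_tts.Communicate` and its `save` coroutine as described in that repository's README. The function name `synthesize`, the default voice, and the output file name are assumptions chosen for illustration; running `edge-tts --list-voices` shows the voices that are actually available.

```python
import asyncio


async def synthesize(text, voice="zh-CN-XiaoxiaoNeural", out_path="answer.mp3"):
    """Synthesize `text` to an audio file with edge-tts and return the path."""
    import edge_tts  # third-party: pip install edge-tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)
    return out_path


# Example call (requires network access):
# asyncio.run(synthesize("Hello, I am a digital human.", voice="en-US-AriaNeural"))
```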

Voice Clone

For detailed information about the usage and code implementation of Voice Clone, please refer to Voice Clone - Stealing Your Voice Quietly During Conversations.

GPT-SoVITS (Recommended)

Thanks to the authors for their open-source contribution. I have found the GPT-SoVITS voice cloning model to be quite impressive. You can find the project at https://github.com/RVC-Boss/GPT-SoVITS.

XTTS

Coqui XTTS is a leading deep learning toolkit for Text-to-Speech (TTS) tasks, allowing for voice cloning and voice transfer to different languages using a 5-second or longer audio clip.

🐸 TTS is a library for advanced text-to-speech generation.

🚀 Over 1100 pre-trained models for various languages.

🛠️ Tools for training new models and fine-tuning existing models in any language.

📚 Utility programs for dataset analysis and management.

THG - Avatar

Detailed information about the usage and code implementation of digital human generation can be found in THG - Building Intelligent Digital Humans.

SadTalker

Digital persona generation can utilize SadTalker (CVPR 2023). For detailed information, please visit https://sadtalker.github.io.

Before usage, download the SadTalker model:

bash scripts/sadtalker_download_models.sh  

Baidu (百度云盘) (Password: linl)

If downloading from Baidu Cloud, remember to place it in the checkpoints folder. The model downloaded from Baidu Cloud is named sadtalker by default, but it should be renamed to checkpoints.

Wav2Lip

Digital persona generation can also utilize Wav2Lip (ACM Multimedia 2020). For detailed information, refer to https://github.com/Rudrabha/Wav2Lip.

Before usage, download the Wav2Lip model:

Model | Description | Link to the model
Wav2Lip | Highly accurate lip-sync | Link
Wav2Lip + GAN | Slightly inferior lip-sync, but better visual quality | Link
Expert Discriminator | Weights of the expert discriminator | Link
Visual Quality Discriminator | Weights of the visual disc trained in a GAN setup | Link

ER-NeRF (Coming Soon)

ER-NeRF (ICCV 2023) is a digital human built using the latest NeRF technology. It allows for the customization of digital characters and can reconstruct them using just a five-minute video of a person. For more details, please refer to https://github.com/Fictionarry/ER-NeRF.

Further updates will be provided regarding this.

LLM - Conversation

For detailed information about the usage and code implementation of Large Language Models (LLM), please refer to LLM - Empowering Digital Humans with Powerful Language Models.

Linly-AI

Linly-AI is a large language model developed by CVI at Shenzhen University. You can find more information in its GitHub repository: https://github.com/CVI-SZU/Linly

Download Linly models: https://huggingface.co/Linly-AI/Chinese-LLaMA-2-7B-hf

You can use git to download:

git lfs install
git clone https://huggingface.co/Linly-AI/Chinese-LLaMA-2-7B-hf

Alternatively, you can use the huggingface download tool huggingface-cli:

pip install -U huggingface_hub

# Set up mirror acceleration
# Linux
export HF_ENDPOINT="https://hf-mirror.com"
# Windows PowerShell
$env:HF_ENDPOINT="https://hf-mirror.com"

huggingface-cli download --resume-download Linly-AI/Chinese-LLaMA-2-7B-hf --local-dir Linly-AI/Chinese-LLaMA-2-7B-hf

Qwen

Qwen is an AI model developed by Alibaba Cloud. You can check out the GitHub repository for Qwen here: https://github.com/QwenLM/Qwen

If you want to quickly use Qwen, you can choose the 1.8B model, which has fewer parameters and can run smoothly even with limited GPU memory. Of course, this part can be replaced with other options.

You can download the Qwen 1.8B model from this link: https://huggingface.co/Qwen/Qwen-1_8B-Chat

You can use git to download:

git lfs install
git clone https://huggingface.co/Qwen/Qwen-1_8B-Chat

Alternatively, you can use the huggingface download tool huggingface-cli:

pip install -U huggingface_hub

# Set up mirror acceleration
# Linux
export HF_ENDPOINT="https://hf-mirror.com"
# Windows PowerShell
$env:HF_ENDPOINT="https://hf-mirror.com"

huggingface-cli download --resume-download Qwen/Qwen-1_8B-Chat --local-dir Qwen/Qwen-1_8B-Chat

Gemini-Pro

Gemini-Pro is an AI model developed by Google. To learn more about Gemini-Pro, you can visit their website: https://deepmind.google/technologies/gemini/

If you want to request an API key for Gemini-Pro, you can visit this link: https://makersuite.google.com/

LLM Model Selection

You can select the model you want to use in the app.py file.

# Uncomment and set up the model of your choice:

# llm = LLM(mode='offline').init_model('Linly', 'Linly-AI/Chinese-LLaMA-2-7B-hf')
# llm = LLM(mode='offline').init_model('Gemini', 'gemini-pro', api_key = "your api key")
# llm = LLM(mode='offline').init_model('Qwen', 'Qwen/Qwen-1_8B-Chat')

# Manual download with a specific path
llm = LLM(mode=mode).init_model('Qwen', model_path)

Optimizations

Some optimizations:

  • Use fixed input face images, extract features beforehand to avoid reading each time
  • Remove unnecessary libraries to reduce total time
  • Only save final video output, don't save intermediate results to improve performance
  • Use OpenCV to generate final video instead of mimwrite for faster runtime
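
The first optimization can be sketched as a small cache keyed by the image path. This is only an illustration of the idea, not the project's implementation: `extract_face_features` is a hypothetical placeholder for the expensive preprocessing step, and the call counter exists purely to show that the work happens once.

```python
from functools import lru_cache


def extract_face_features(image_path):
    """Placeholder for an expensive preprocessing step (hypothetical)."""
    extract_face_features.calls += 1
    return {"source": image_path}


extract_face_features.calls = 0


@lru_cache(maxsize=8)
def cached_features(image_path):
    # The first call per path does the work; later calls reuse the result.
    return extract_face_features(image_path)


cached_features("inputs/example.png")
cached_features("inputs/example.png")
print(extract_face_features.calls)  # prints 1
```

An in-memory cache like this is lost on restart; writing the extracted features to disk next to the image would be the persistent variant of the same idea.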

Gradio

Gradio is a Python library that provides an easy way to deploy machine learning models as interactive web apps.

For Linly-Talker, Gradio serves two main purposes:

  1. Visualization & Demo: Gradio provides a simple web GUI for the model, allowing users to see the results intuitively by uploading an image and entering text. This is an effective way to showcase the capabilities of the system.

  2. User Interaction: The Gradio GUI can serve as a frontend to allow end users to interact with Linly-Talker. Users can upload their own images and ask arbitrary questions or have conversations to get real-time responses. This provides a more natural speech interaction method.

Specifically, we create a Gradio Interface in app.py that takes image and text inputs, calls our function to generate the response video, and displays it in the GUI. This enables browser interaction without building a complex frontend.

In summary, Gradio provides visualization and user interaction interfaces for Linly-Talker, serving as effective means for showcasing system capabilities and enabling end users.
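
The wiring described above might look roughly like the following sketch. `respond` is a hypothetical stand-in for the real pipeline and simply returns a placeholder path; `gr.Interface`, `gr.Image`, `gr.Textbox`, and `gr.Video` are Gradio components, while the labels and title are illustrative choices rather than the ones used in app.py.

```python
def respond(image_path, question):
    """Hypothetical stand-in for the pipeline: returns the path of a video."""
    # A real implementation would run the LLM, TTS, and talking-head steps here.
    return "results/answer.mp4"


def build_demo():
    import gradio as gr  # third-party: pip install gradio

    return gr.Interface(
        fn=respond,
        inputs=[
            gr.Image(type="filepath", label="Avatar"),
            gr.Textbox(label="Question"),
        ],
        outputs=gr.Video(label="Response"),
        title="Linly-Talker (sketch)",
    )
```

Calling `build_demo().launch()` would start the web UI; the placeholder path must be replaced by a real generated video before the output component can display anything.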

Start WebUI

Previously, I had separated many versions, but it became cumbersome to run multiple versions. Therefore, I have added a WebUI feature to provide a single interface for a seamless experience. I will continue to update it in the future.

The current features available in the WebUI are as follows:

  • Text/Voice-based dialogue with virtual characters (fixed characters with male and female roles)
  • Dialogue with virtual characters using any image (you can upload any character image)
  • Multi-turn GPT dialogue (incorporating historical dialogue data to maintain context)
  • Voice cloning dialogue (based on GPT-SoVITS settings for voice cloning, including a built-in smoky voice that can be cloned based on the voice of the dialogue)

# WebUI
python webui.py

Three startup modes are currently available as standalone scripts; choose one based on your scenario.

The first mode involves fixed Q&A with a predefined character, eliminating preprocessing time.

python app.py

The first mode has recently been updated to include the Wav2Lip model for dialogue.

python appv2.py

The second mode allows for conversing with any uploaded image.

python app_img.py

The third mode builds upon the first one by incorporating a large language model for multi-turn GPT conversations.

python app_multi.py

Voice cloning has also been added, so you can freely switch between cloned voice models and their corresponding character images. The example here uses a deep, smoky voice and an image of a male.

python app_vits.py

Folder structure

The folder structure of the weight files is as follows:

  • Baidu (百度云盘): You can download the weights from here (Password: linl).
  • huggingface: You can access the weights at this link.
  • modelscope: The weights will be available soon at this link.

Linly-Talker/
├── checkpoints
│   ├── hub
│   │   └── checkpoints
│   │       └── s3fd-619a316812.pth
│   ├── lipsync_expert.pth
│   ├── mapping_00109-model.pth.tar
│   ├── mapping_00229-model.pth.tar
│   ├── SadTalker_V0.0.2_256.safetensors
│   ├── visual_quality_disc.pth
│   ├── wav2lip_gan.pth
│   └── wav2lip.pth
├── gfpgan
│   └── weights
│       ├── alignment_WFLW_4HG.pth
│       └── detection_Resnet50_Final.pth
├── GPT_SoVITS
│   └── pretrained_models
│       ├── chinese-hubert-base
│       │   ├── config.json
│       │   ├── preprocessor_config.json
│       │   └── pytorch_model.bin
│       ├── chinese-roberta-wwm-ext-large
│       │   ├── config.json
│       │   ├── pytorch_model.bin
│       │   └── tokenizer.json
│       ├── README.md
│       ├── s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt
│       ├── s2D488k.pth
│       ├── s2G488k.pth
│       └── speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
├── Qwen
│   └── Qwen-1_8B-Chat
│       ├── assets
│       │   ├── logo.jpg
│       │   ├── qwen_tokenizer.png
│       │   ├── react_showcase_001.png
│       │   ├── react_showcase_002.png
│       │   └── wechat.png
│       ├── cache_autogptq_cuda_256.cpp
│       ├── cache_autogptq_cuda_kernel_256.cu
│       ├── config.json
│       ├── configuration_qwen.py
│       ├── cpp_kernels.py
│       ├── examples
│       │   └── react_prompt.md
│       ├── generation_config.json
│       ├── LICENSE
│       ├── model-00001-of-00002.safetensors
│       ├── model-00002-of-00002.safetensors
│       ├── modeling_qwen.py
│       ├── model.safetensors.index.json
│       ├── NOTICE
│       ├── qwen_generation_utils.py
│       ├── qwen.tiktoken
│       ├── README.md
│       ├── tokenization_qwen.py
│       └── tokenizer_config.json
└── README.md



linly-talker's Issues

API

When will the API be opened up? (Waiting eagerly...)

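Found several problems during testing; please advise.

First of all, I think this project is very good, which is why I deployed it locally for testing. That deserves recognition!

Next, here is my system configuration:

  1. Lenovo P52 laptop, 64 GB RAM, dual GPUs: P3200 6 GB plus an external P40 24 GB
  2. Windows 11 x64, Python 3.10.13, CUDA 11.8, Torch 2.0.1
  3. Linly-AI-7B as the dialogue model

Based on the dependencies listed in requirements_app.txt, I first installed the ones missing from my environment:
gradio==3.38.0
edge-tts>=6.1.9
openai-whisper
zhconv
google-generativeai
transformers==4.32.0
For the packages already present in my environment, I followed pip install -r requirements_app.txt.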

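Problem 1: Running python app.py directly succeeds, but with a warning:
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "C:\Python\Python310\lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\Python\Python310\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.
According to what I found online, this network-connection issue is very common and does not affect usage. The cause appears to be that asyncio does not distinguish at runtime whether the platform is Windows, Linux, or something else, and calls asyncio.set_event_loop_policy() directly. It can be resolved by adding a check:
import asyncio
import platform

# Switch to the selector event loop on Windows only
if platform.system() == 'Windows':
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
After that, the error no longer appears.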

Problem 2: While models are loading at runtime, the following messages appear:
bin C:\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cuda118_nocublaslt.dll
[2024-01-25 17:08:13,225] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
NOTE: Redirects are currently not supported in Windows or MacOs.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 2/2 [00:37<00:00, 18.90s/it]
using safetensor as default
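However, I actually installed the Windows build of bitsandbytes. This is probably related to a call from some model-acceleration library and does not affect usage.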

Problem 3: When testing app_img.py, the final stage of video synthesis fails with the following error:
{'checkpoint': 'checkpoints\SadTalker_V0.0.2_256.safetensors', 'dir_of_BFM_fitting': 'src/config', 'audio2pose_yaml_path': 'src/config\auido2pose.yaml', 'audio2exp_yaml_path': 'src/config\auido2exp.yaml', 'pirender_yaml_path': 'src/config\facerender_pirender.yaml', 'pirender_checkpoint': 'checkpoints\epoch_00190_iteration_000400000_checkpoint.pt', 'use_safetensor': True, 'mappingnet_checkpoint': 'checkpoints\mapping_00229-model.pth.tar', 'facerender_yaml': 'src/config\facerender.yaml'}
temp\1822631dac470091cee138bad413911fac97da9e\image.png
landmark Det:: 100%|███████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.03it/s]
3DMM Extraction In Video:: 100%|███████████████████████████████████████████████████████| 1/1 [00:00<00:00, 14.77it/s]
audio2exp:: 100%|███████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 110.95it/s]
Face Renderer:: 100%|██████████████████████████████████████████████████████████████| 123/123 [00:34<00:00, 3.54it/s]
fps: 25 123
ffmpeg error
Traceback (most recent call last):
  File "C:\Python\Python310\lib\site-packages\gradio\routes.py", line 442, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1389, in process_api
    result = await self.call_function(
  File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1094, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "C:\Python\Python310\lib\site-packages\gradio\utils.py", line 703, in wrapper
    response = f(*args, **kwargs)
  File "D:\AITest\LinlyTalker\my_app_img.py", line 84, in text_response
    video = sad_talker.test2(source_image,
  File "D:\AITest\LinlyTalker\src\SadTalker.py", line 279, in test2
    return_path = self.animate_from_coeff.generate(data, save_dir, pic_path, crop_info, enhancer='gfpgan' if use_enhancer else None, preprocess=preprocess, img_size=size)
  File "D:\AITest\LinlyTalker\src\facerender\animate.py", line 272, in generate
    os.remove(path)
FileNotFoundError: [WinError 3] The system cannot find the path specified: './results/85200a0a-e6c9-4143-980f-a82b4a8dd3b5\temp_85200a0a-e6c9-4143-980f-a82b4a8dd3b5\first_frame_dir\image_85200a0a-e6c9-4143-980f-a82b4a8dd3b5\input\answer.mp4'


This may be related to the system path variable being passed in, but I could not find a fix. Please help analyze and resolve it.
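
The failing path above mixes `/` and `\` separators and embeds directory names inside a file name, which suggests that a name was derived by splitting a Windows path on `/`. This is not a confirmed diagnosis. The sketch below shows one possible way to build such names portably with pathlib and to delete temporary files without raising when they are missing. Both helper names and the output naming scheme are hypothetical.

```python
from pathlib import Path, PureWindowsPath


def output_video_path(save_dir, pic_path, audio_path):
    """Hypothetical helper: build '<save_dir>/<pic>_<audio>.mp4' portably."""
    # PureWindowsPath accepts both '\\' and '/' as separators, so the stem is
    # extracted correctly whichever style the caller passed in.
    pic_name = PureWindowsPath(pic_path).stem
    audio_name = PureWindowsPath(audio_path).stem
    return Path(save_dir) / f"{pic_name}_{audio_name}.mp4"


def remove_if_exists(path):
    # Unlike a bare os.remove, this does not raise when the file is already gone.
    Path(path).unlink(missing_ok=True)
```

For example, `output_video_path("results", r"temp\first_frame_dir\image.png", "input/answer.wav")` yields a path whose file name is `image_answer.mp4`.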

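Problem 4: When using app.py and app_multi.py, I wanted to replace the default avatar example.png with a different one, but changing the image path in the script had no effect. In the end I deleted the entire first_frame_dir directory under inputs, and running it produced the following error: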
Traceback (most recent call last):
  File "C:\Python\Python310\lib\site-packages\scipy\io\matlab\_mio.py", line 39, in _open_file
    return open(file_like, mode), True
FileNotFoundError: [Errno 2] No such file or directory: './inputs/first_frame_dir/example.mat'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python\Python310\lib\site-packages\gradio\routes.py", line 442, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1389, in process_api
    result = await self.call_function(
  File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1094, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "C:\Python\Python310\lib\site-packages\gradio\utils.py", line 703, in wrapper
    response = f(*args, **kwargs)
  File "D:\AITest\LinlyTalker\my_app_multi.py", line 148, in human_respone
    video_path = sad_talker.test(source_image,
  File "D:\AITest\LinlyTalker\src\SadTalker.py", line 153, in test
    batch = get_data(first_coeff_path, audio_path, self.device, ref_eyeblink_coeff_path=ref_eyeblink_coeff_path, still=still_mode,
  File "D:\AITest\LinlyTalker\src\generate_batch.py", line 82, in get_data
    source_semantics_dict = scio.loadmat(source_semantics_path)
  File "C:\Python\Python310\lib\site-packages\scipy\io\matlab\_mio.py", line 225, in loadmat
    with _open_file_context(file_name, appendmat) as f:
  File "C:\Python\Python310\lib\contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "C:\Python\Python310\lib\site-packages\scipy\io\matlab\_mio.py", line 17, in _open_file_context
    f, opened = _open_file(file_like, appendmat, mode)
  File "C:\Python\Python310\lib\site-packages\scipy\io\matlab\_mio.py", line 45, in _open_file
    return open(file_like, mode), True
FileNotFoundError: [Errno 2] No such file or directory: './inputs/first_frame_dir/example.mat'

It seems that something is hard-coded in this script. Please advise what to change so that the default avatar can be replaced. Thanks!

pip install -r VITS/requirements_gptsovits.txt fails

Following your instructions, I ran into a dependency problem while installing this.

Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 2
╰─> [63 lines of output]
Looking in indexes: https://pypi.mirrors.ustc.edu.cn/simple/
Ignoring oldest-supported-numpy: markers 'python_version < "3.9"' don't match your environment
ERROR: Exception:
Traceback (most recent call last):

gr.Error("无克隆环境或者无克隆模型权重,无法克隆声音", e)

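I trained GPT-SoVITS beforehand in a separate workspace, placed the trained weights in GPT_weights and SoVITS_weights, and then got the following error when running voice cloning:

/Linly-Talker/webui.py", line 114, in LLM_response
gr.Error("无克隆环境或者无克隆模型权重,无法克隆声音", e)
TypeError: Error.__init__

Does the voice cloning module in webui.py still need code changes?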
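
The TypeError suggests that `gr.Error` in the installed Gradio version does not accept the exception as a second positional argument. One possible workaround, sketched below under that assumption, is to fold the exception into the message string and pass a single argument. `format_error` is a hypothetical helper, and the file name in the example is made up for the demonstration.

```python
def format_error(message, exc):
    """Combine a user-facing message and the underlying exception into one string."""
    return f"{message} ({type(exc).__name__}: {exc})"


try:
    raise FileNotFoundError("GPT_weights/model.ckpt")
except FileNotFoundError as e:
    text = format_error("Voice cloning is unavailable", e)
    # In webui.py the combined string would be the only argument, for example:
    #     raise gr.Error(text)
    print(text)
```

Note that, as far as I know, `gr.Error` only shows a message in the UI when it is raised, so constructing it without `raise` has no visible effect.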

run app_img.py error!

config.py unchanged.
import gradio as gr
ValueError: Unknown scheme for proxy URL URL('socks://127.0.0.1:7890/')
Looking forward to your reply on resolving this issue.
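
This error usually means that a proxy environment variable uses the `socks://` scheme, which the HTTP client behind Gradio does not recognize. Assuming that is the cause here, one possible workaround is to rewrite such values to `socks5://` before importing gradio, as sketched below. The function name and the list of variables checked are choices made for this sketch.

```python
import os

PROXY_VARS = ("ALL_PROXY", "all_proxy", "HTTP_PROXY", "http_proxy",
              "HTTPS_PROXY", "https_proxy")


def normalize_socks_proxies(environ=os.environ):
    """Rewrite 'socks://' proxy URLs to 'socks5://' in the given mapping."""
    changed = []
    for name in PROXY_VARS:
        value = environ.get(name, "")
        if value.startswith("socks://"):
            environ[name] = "socks5://" + value[len("socks://"):]
            changed.append(name)
    return changed
```

Calling `normalize_socks_proxies()` at the top of the script, before `import gradio as gr`, applies the rewrite to the current process only. SOCKS support may additionally require installing the client's optional SOCKS dependency; unsetting the variable is the simpler alternative when no proxy is needed.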

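The README is too brief about GPT-SoVITS and XTTS

The configuration instructions for GPT-SoVITS and XTTS are too brief.
GPT-SoVITS also requires a number of extra packages to be downloaded, and nltk needs to be downloaded and configured.
XTTS also reports errors such as missing examples/female.wav and tts_models--multilingual--multi-dataset--xtts_v2/config.json.

Could the README be more detailed, or, as with SadTalker, list the models that are used and where to place them?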

pip install -r VITS/requirements_gptsovits.txt installation error

Following your instructions, I ran into a dependency problem while installing this.
pip install -r VITS/requirements_gptsovits.txt

Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 2
╰─> [63 lines of output]
Looking in indexes: https://pypi.mirrors.ustc.edu.cn/simple/
Ignoring oldest-supported-numpy: markers 'python_version < "3.9"' don't match your environment
ERROR: Exception:
Traceback (most recent call last):

Wrong path when saving the video

After Face Renderer:: reaches 100%, a path error is reported. Is this a configuration problem?

./results/b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\temp_b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\first_frame_dir\image_b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\input\answer.mp4: No such file or directory
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\linly\lib\shutil.py", line 791, in move
    os.rename(src, real_dst)
FileNotFoundError: [WinError 2] The system cannot find the file specified: '89cf9dcd-0120-4368-8106-ef56ecd5ed86.mp4' -> './results/b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\first_frame_dir\image_b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\input\answer.mp4'

Startup problem

An error occurs after startup and I have not been able to resolve it. What causes this, and how can it be fixed?

(linly) D:\Linly-Talker>python app.py
Traceback (most recent call last):
  File "D:\Linly-Talker\app.py", line 5, in <module>
    from LLM import LLM
  File "D:\Linly-Talker\LLM\__init__.py", line 1, in <module>
    from .Linly import Linly
  File "D:\Linly-Talker\LLM\Linly.py", line 2, in <module>
    import torch
  File "C:\ProgramData\Anaconda3\envs\linly\lib\site-packages\torch\__init__.py", line 130, in <module>
    raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\ProgramData\Anaconda3\envs\linly\lib\site-packages\torch\lib\c10_cuda.dll" or one of its dependencies.

Image

Could a more convenient prebuilt image be provided?

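Error at the LLM dialogue step: "Sorry, your request has encountered an error. Please try again."

Hello, when I uploaded a voice recording for dialogue in the webui, the following problem occurred after recognition finished and I submitted it for video generation.
The GPU is a 4090.

The error output is as follows: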
extern "C"
__launch_bounds__(512, 4)
__global__ void reduction_prod_kernel(ReduceJitOp r){
  r.run();
}
nvrtc: error: invalid value for --gpu-architecture (-arch)

对不起,你的请求出错了,请再次尝试。
Sorry, your request has encountered an error. Please try again.

Function predict run time: 3.0960586071014404 s
Function LLM_response run time: 3.160871982574463 s
audio2exp:: 100%|██████████| 19/19 [00:00<00:00, 212.45it/s]
Face Renderer:: 100%|██████████| 92/92 [00:18<00:00, 5.00it/s]
fps: 20 183
./results/temp_girl_answer.mp4
Function Talker_response run time: 22.409300565719604 s

My Qwen folder structure is as follows:
[image]

I hope you can help. Thanks!
