kedreamix / linly-talker

Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates various technologies like Whisper, Linly, Microsoft Speech Services, and SadTalker talking head generation system. 🌟🔬

Home Page: https://kedreamix.github.io/

License: MIT License

Python 94.37% Shell 0.06% Jupyter Notebook 5.57%

linly-talker's Introduction

Digital Human Intelligent Dialogue System - Linly-Talker — 'Interactive Dialogue with Your Virtual Self'

Linly-Talker WebUI


2023.12 Update 📆

Users can upload any images for the conversation

2024.01 Update 📆📆

Exciting news! I've now incorporated both the powerful GeminiPro and Qwen large models into our conversational scene. Users can now upload images during the conversation, adding a whole new dimension to the interactions.
The deployment invocation method for FastAPI has been updated.
The advanced settings options for Microsoft TTS have been updated, increasing the variety of voice types. Additionally, video subtitles have been introduced to enhance visualization.
Updated the GPT multi-turn conversation system to establish contextual connections in dialogue, enhancing the interactivity and realism of the digital persona.

2024.02 Update 📆

Updated Gradio to the latest version 4.16.0, providing the interface with additional functionalities such as capturing images from the camera to create digital personas, among others.
ASR and THG have been updated. FunASR from Alibaba has been integrated into ASR, enhancing its speed significantly. Additionally, the THG section now incorporates the Wav2Lip model, while ER-NeRF is currently in preparation (Coming Soon).
I have incorporated the GPT-SoVITS model, which is a voice cloning method. By fine-tuning it with just one minute of a person's speech data, it can effectively clone their voice. The results are quite impressive and worth recommending.
I have integrated a web user interface (WebUI) that allows for better execution of Linly-Talker.

Content

Digital Avatar Conversational System - Linly-Talker: "Digital Persona Interaction: Interact with Your Virtual Self"

Introduction

Linly-Talker is an innovative digital human conversation system that integrates the latest artificial intelligence technologies, including Large Language Models (LLM) 🤖, Automatic Speech Recognition (ASR) 🎙️, Text-to-Speech (TTS) 🗣️, and voice cloning technology 🎤. This system offers an interactive web interface through the Gradio platform 🌐, allowing users to upload images 📷 and engage in personalized dialogues with AI 💬.

The core features of the system include:

Multi-Model Integration: Linly-Talker combines large language models such as Linly, GeminiPro, and Qwen with the Whisper speech recognition model and the SadTalker talking-head generation model to achieve high-quality dialogue and visual generation.
Multi-Turn Conversational Ability: Through the multi-turn dialogue system powered by GPT models, Linly-Talker can understand and maintain contextually relevant and coherent conversations, significantly enhancing the authenticity of the interaction.
Voice Cloning: Utilizing technologies like GPT-SoVITS, users can upload a one-minute voice sample for fine-tuning, and the system will clone the user's voice, enabling the digital human to converse in the user's voice.
Real-Time Interaction: The system supports real-time speech recognition and video captioning, allowing users to communicate naturally with the digital human via voice.
Visual Enhancement: With digital human generation technologies, Linly-Talker can create realistic digital human avatars, providing a more immersive experience.

The design philosophy of Linly-Talker is to create a new form of human-computer interaction that goes beyond simple Q&A. By integrating advanced technologies, it offers an intelligent digital human capable of understanding, responding to, and simulating human communication.

You can watch the demo video here.

TO DO LIST

🔆 The Linly-Talker project is ongoing - pull requests are welcome! If you have any suggestions regarding new model approaches, research, techniques, or if you discover any runtime errors, please feel free to edit and submit a pull request. You can also open an issue or contact me directly via email. 📩⭐ If you find this repository useful, please give it a star! 🤩

If you encounter any issues during deployment, please consult the Common Issues Summary section, where I have compiled a list of all potential problems. Additionally, a discussion group is available here, and I will provide regular updates. Thank you for your attention and use of Linly-Talker!

Example

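Text/Voice Dialogue	Digital Human Answer
What is the most effective way to cope with stress?	example_answer1.mp4
How do I manage my time?	example_answer2.mp4
Write a review of a symphony concert, discussing the orchestra's performance and the audience's overall experience.	example_answer3.mp4
Translate into Chinese: Luck is a dividend of sweat. The more you sweat, the luckier you get.	example_answer4.mp4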

Setup Environment

To install the environment using Anaconda and PyTorch, follow the steps below:

conda create -n linly python=3.10
conda activate linly

# PyTorch Installation Method 1: Conda Installation (Recommended)
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

# PyTorch Installation Method 2: Pip Installation
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install -q ffmpeg # ffmpeg==4.2.2

pip install -r requirements_app.txt

If you want to use features such as voice cloning, you will likely need a newer version of PyTorch, which also enables more functionality. The commands below install the CUDA 11.8 build; choose the build that matches your driver.

conda create -n linly python=3.10  
conda activate linly

pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118

conda install -q ffmpeg # ffmpeg==4.2.2

pip install -r requirements_app.txt

# Install dependencies for voice cloning
pip install -r VITS/requirements_gptsovits.txt
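Before downloading the models, it can help to confirm that the environment looks as expected. The following is a minimal sketch and not part of the repository: the helper name `check_environment` and the default package list are assumptions chosen for illustration. It relies only on the standard library, so it still runs when PyTorch is missing.

```python
import importlib.util
import shutil


def check_environment(packages=("torch", "torchvision", "torchaudio", "gradio")):
    """Report which packages are importable and whether ffmpeg is on PATH.

    The default package list is an assumption for illustration; adjust it to
    match requirements_app.txt.
    """
    report = {}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    # shutil.which returns None when the executable cannot be found.
    report["ffmpeg"] = shutil.which("ffmpeg") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```

Anything reported as MISSING points back to the corresponding install command above.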

Next, you need to install the corresponding models. You can download them using the following methods. Once downloaded, place the files in the specified folder structure (explained at the end of this document).

HuggingFace Download

If the download speed is too slow, consider using a mirror site. For more information, refer to Efficiently Obtain Hugging Face Models Using Mirror Sites.

# Download pre-trained models from Hugging Face
git lfs install
git clone https://huggingface.co/Kedreamix/Linly-Talker

ModelScope Download

# Download pre-trained models from ModelScope
# 1. Git method
git lfs install
git clone https://www.modelscope.cn/Kedreamix/Linly-Talker.git

# 2. Python code download (run "pip install modelscope" in a shell first)
from modelscope import snapshot_download
model_dir = snapshot_download('Kedreamix/Linly-Talker')

Move All Models to the Current Directory

If you downloaded from Baidu Netdisk, you can refer to the directory structure at the end of the document to move the models.

# Move all models to the current directory
# Checkpoints contain SadTalker and Wav2Lip
mv Linly-Talker/checkpoints/* ./checkpoints/

# Enhanced GFPGAN for SadTalker
# pip install gfpgan
# mv Linly-Talker/gfpgan ./

# Voice cloning models
mv Linly-Talker/GPT_SoVITS/pretrained_models/* ./GPT_SoVITS/pretrained_models/

# Qwen large model
mv Linly-Talker/Qwen ./
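On systems without `mv` (for example, a plain Windows shell), the same moves can be scripted. This is a rough sketch under stated assumptions: the `MOVES` table and the `move_models` helper are hypothetical names that mirror the commands above, and hidden files are moved as well, which differs slightly from the shell glob.

```python
import shutil
from pathlib import Path

# (source, destination, move_contents): paths relative to the download and
# project roots, mirroring the mv commands above.
MOVES = [
    ("checkpoints", "checkpoints", True),
    ("GPT_SoVITS/pretrained_models", "GPT_SoVITS/pretrained_models", True),
    ("Qwen", "Qwen", False),
]


def move_models(download_root, project_root="."):
    """Move downloaded weights into the project tree; return the moved paths."""
    download_root, project_root = Path(download_root), Path(project_root)
    moved = []
    for src_rel, dst_rel, move_contents in MOVES:
        src = download_root / src_rel
        if not src.exists():
            continue  # skip anything that was not downloaded
        dst = project_root / dst_rel
        if move_contents:
            # Roughly `mv src/* dst/`: move each entry individually.
            dst.mkdir(parents=True, exist_ok=True)
            for item in sorted(src.iterdir()):
                shutil.move(str(item), str(dst / item.name))
                moved.append(dst / item.name)
        else:
            # Roughly `mv src ./`: move the directory as a whole.
            shutil.move(str(src), str(dst))
            moved.append(dst)
    return moved
```

Calling `move_models("Linly-Talker")` from the project root would then reproduce the layout described in the folder structure section.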

For convenient deployment and usage, a configs.py file has been added. You can modify some hyperparameters in this file for customization:

# Device Running Port
port = 7870

# API Running Port and IP
# 127.0.0.1 serves localhost only; use "0.0.0.0" to listen on all interfaces
ip = '127.0.0.1'
api_port = 7871

# Linly Model Path
# mode = 'api'  # For 'api', Linly-api-fast.py must be run first
mode = 'offline'
model_path = 'Linly-AI/Chinese-LLaMA-2-7B-hf'

# SSL Certificate (required for microphone interaction)
# Preferably an absolute path
ssl_certfile = "./https_cert/cert.pem"
ssl_keyfile = "./https_cert/key.pem"

This file allows you to adjust parameters such as the device running port, API running port, Linly model path, and SSL certificate paths for ease of deployment and configuration.
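To make the role of these values concrete, here is a minimal sketch of how they could be turned into server launch arguments. The helper `build_launch_kwargs` is hypothetical, and how app.py actually consumes configs.py is an assumption; the key names follow the parameters of Gradio's `launch()`.

```python
import os


def build_launch_kwargs(ip, port, ssl_certfile=None, ssl_keyfile=None):
    """Turn configs.py-style values into keyword arguments for a web server.

    The key names follow Gradio's launch() parameters; the way the project
    itself wires these values is an assumption here.
    """
    if not (0 < int(port) < 65536):
        raise ValueError(f"port out of range: {port}")
    kwargs = {"server_name": ip, "server_port": int(port)}
    # Enable HTTPS only when both files exist; browsers typically require a
    # secure context (HTTPS or localhost) before granting microphone access.
    if ssl_certfile and ssl_keyfile:
        if os.path.isfile(ssl_certfile) and os.path.isfile(ssl_keyfile):
            kwargs["ssl_certfile"] = os.path.abspath(ssl_certfile)
            kwargs["ssl_keyfile"] = os.path.abspath(ssl_keyfile)
    return kwargs
```

Resolving the certificate paths to absolute ones matches the "preferably an absolute path" note in the config comments.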

ASR - Speech Recognition

For detailed information about the usage and code implementation of Automatic Speech Recognition (ASR), please refer to ASR - Bridging the Gap with Digital Humans.

Whisper

To implement ASR (Automatic Speech Recognition) using OpenAI's Whisper, you can refer to the specific usage methods provided in the GitHub repository: https://github.com/openai/whisper

FunASR

Alibaba's FunASR delivers strong speech recognition and handles Chinese better than Whisper. It can also produce results in real time, which makes it a good choice. To try it, see the FunASR file in the ASR folder. For more information, refer to https://github.com/alibaba-damo-academy/FunASR.

TTS - Edge TTS

For detailed information about the usage and code implementation of Text-to-Speech (TTS), please refer to TTS - Empowering Digital Humans with Natural Speech Interaction.

To use Microsoft Edge's online text-to-speech service from Python without needing Microsoft Edge or Windows or an API key, you can refer to the GitHub repository at https://github.com/rany2/edge-tts. It provides a Python module called "edge-tts" that allows you to utilize the service. You can find detailed installation instructions and usage examples in the repository's README file.

Voice Clone

For detailed information about the usage and code implementation of Voice Clone, please refer to Voice Clone - Stealing Your Voice Quietly During Conversations.

GPT-SoVITS (Recommended)

Thanks to the GPT-SoVITS authors for their open-source contribution. I have found the GPT-SoVITS voice cloning model to be quite impressive. You can find the project at https://github.com/RVC-Boss/GPT-SoVITS.

XTTS

Coqui XTTS is a leading deep learning toolkit for Text-to-Speech (TTS) tasks, allowing for voice cloning and voice transfer to different languages using a 5-second or longer audio clip.

🐸 TTS is a library for advanced text-to-speech generation.

🚀 Over 1100 pre-trained models for various languages.

🛠️ Tools for training new models and fine-tuning existing models in any language.

📚 Utility programs for dataset analysis and management.

Experience XTTS online https://huggingface.co/spaces/coqui/xtts
Official GitHub repository: https://github.com/coqui-ai/TTS

THG - Avatar

Detailed information about the usage and code implementation of digital human generation can be found in THG - Building Intelligent Digital Humans.

SadTalker

Digital persona generation can utilize SadTalker (CVPR 2023). For detailed information, please visit https://sadtalker.github.io.

Before usage, download the SadTalker model:

bash scripts/sadtalker_download_models.sh

Baidu (百度云盘) (Password: linl)

If downloading from Baidu Cloud, remember to place it in the checkpoints folder. The model downloaded from Baidu Cloud is named sadtalker by default, but it should be renamed to checkpoints.

Wav2Lip

Digital persona generation can also utilize Wav2Lip (ACM Multimedia 2020). For detailed information, refer to https://github.com/Rudrabha/Wav2Lip.

Before usage, download the Wav2Lip model:

Model	Description	Link to the model
Wav2Lip	Highly accurate lip-sync	Link
Wav2Lip + GAN	Slightly inferior lip-sync, but better visual quality	Link
Expert Discriminator	Weights of the expert discriminator	Link
Visual Quality Discriminator	Weights of the visual disc trained in a GAN setup	Link

ER-NeRF (Coming Soon)

ER-NeRF (ICCV 2023) is a digital human generation method built on NeRF technology. It supports customized digital characters and can reconstruct a person from just a five-minute video. For more details, please refer to https://github.com/Fictionarry/ER-NeRF.

Further updates will be provided regarding this.

LLM - Conversation

For detailed information about the usage and code implementation of Large Language Models (LLM), please refer to LLM - Empowering Digital Humans with Powerful Language Models.

Linly-AI

Linly-AI is a large language model developed by CVI at Shenzhen University. You can find more information about Linly-AI in its GitHub repository: https://github.com/CVI-SZU/Linly

Download Linly models: https://huggingface.co/Linly-AI/Chinese-LLaMA-2-7B-hf

You can use git to download:

git lfs install
git clone https://huggingface.co/Linly-AI/Chinese-LLaMA-2-7B-hf

Alternatively, you can use the huggingface download tool huggingface-cli:

pip install -U huggingface_hub

# Set up mirror acceleration
# Linux
export HF_ENDPOINT="https://hf-mirror.com"
# Windows PowerShell
$env:HF_ENDPOINT="https://hf-mirror.com"

huggingface-cli download --resume-download Linly-AI/Chinese-LLaMA-2-7B-hf --local-dir Linly-AI/Chinese-LLaMA-2-7B-hf

Qwen

Qwen is an AI model developed by Alibaba Cloud. You can check out the GitHub repository for Qwen here: https://github.com/QwenLM/Qwen

If you want to quickly use Qwen, you can choose the 1.8B model, which has fewer parameters and can run smoothly even with limited GPU memory. Of course, this part can be replaced with other options.

You can download the Qwen 1.8B model from this link: https://huggingface.co/Qwen/Qwen-1_8B-Chat

You can use git to download:

git lfs install
git clone https://huggingface.co/Qwen/Qwen-1_8B-Chat

Alternatively, you can use the huggingface download tool huggingface-cli:

pip install -U huggingface_hub

# Set up mirror acceleration
# Linux
export HF_ENDPOINT="https://hf-mirror.com"
# Windows PowerShell
$env:HF_ENDPOINT="https://hf-mirror.com"

huggingface-cli download --resume-download Qwen/Qwen-1_8B-Chat --local-dir Qwen/Qwen-1_8B-Chat
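The download steps shown for Linly and Qwen differ only in the repository name, so they can be wrapped in one helper. This is a sketch under assumptions: `build_download` and `download` are hypothetical names, and the mirror is passed to the child process only instead of being exported in the shell.

```python
import os
import subprocess


def build_download(repo_id, mirror=None):
    """Return (command, environment) for fetching repo_id with huggingface-cli."""
    cmd = [
        "huggingface-cli", "download", "--resume-download",
        repo_id, "--local-dir", repo_id,
    ]
    env = dict(os.environ)
    if mirror:
        # Same effect as `export HF_ENDPOINT=...`, but scoped to the child process.
        env["HF_ENDPOINT"] = mirror
    return cmd, env


def download(repo_id, mirror=None):
    """Run the download; raises CalledProcessError if the CLI exits non-zero."""
    cmd, env = build_download(repo_id, mirror)
    subprocess.run(cmd, env=env, check=True)
```

For example, `download("Qwen/Qwen-1_8B-Chat", mirror="https://hf-mirror.com")` would run the same command as above on both Linux and Windows.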

Gemini-Pro

Gemini-Pro is an AI model developed by Google. To learn more about Gemini-Pro, you can visit their website: https://deepmind.google/technologies/gemini/

If you want to request an API key for Gemini-Pro, you can visit this link: https://makersuite.google.com/

LLM Model Selection

In the app.py file, tailor your model choice with ease.

# Uncomment and set up the model of your choice:

# llm = LLM(mode='offline').init_model('Linly', 'Linly-AI/Chinese-LLaMA-2-7B-hf')
# llm = LLM(mode='offline').init_model('Gemini', 'gemini-pro', api_key = "your api key")
# llm = LLM(mode='offline').init_model('Qwen', 'Qwen/Qwen-1_8B-Chat')

# Manual download with a specific path
llm = LLM(mode=mode).init_model('Qwen', model_path)

Optimizations

Some optimizations:

Use fixed input face images, extract features beforehand to avoid reading each time
Remove unnecessary libraries to reduce total time
Only save final video output, don't save intermediate results to improve performance
Use OpenCV to generate final video instead of mimwrite for faster runtime
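The first optimization in the list above, extracting features once for a fixed face image, amounts to a cache in front of the expensive preprocessing step. The sketch below illustrates the idea only; `FeatureCache` and the `extract` callable are hypothetical and do not reflect how the repository stores its precomputed features.

```python
import os


class FeatureCache:
    """Cache the result of an expensive per-image step.

    Entries are keyed by path, modification time, and size, so the image
    itself is not re-read on a cache hit.
    """

    def __init__(self, extract):
        self._extract = extract  # hypothetical callable: path -> features
        self._store = {}

    def get(self, image_path):
        st = os.stat(image_path)
        key = (os.path.abspath(image_path), st.st_mtime_ns, st.st_size)
        if key not in self._store:
            # Only the first request for a given image pays the extraction cost.
            self._store[key] = self._extract(image_path)
        return self._store[key]
```

Replacing the image file changes its modification time, so the next request recomputes the features rather than serving stale ones.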

Gradio

Gradio is a Python library that provides an easy way to deploy machine learning models as interactive web apps.

For Linly-Talker, Gradio serves two main purposes:

Visualization & Demo: Gradio provides a simple web GUI for the model, allowing users to see the results intuitively by uploading an image and entering text. This is an effective way to showcase the capabilities of the system.
User Interaction: The Gradio GUI can serve as a frontend to allow end users to interact with Linly-Talker. Users can upload their own images and ask arbitrary questions or have conversations to get real-time responses. This provides a more natural speech interaction method.

Specifically, we create a Gradio Interface in app.py that takes image and text inputs, calls our function to generate the response video, and displays it in the GUI. This enables browser interaction without needing to build a complex frontend.
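The shape of such an interface can be sketched as follows. This is not the code from app.py: `generate_response_video` is a placeholder for the real pipeline, and the labels and returned path are assumptions made for illustration.

```python
def generate_response_video(image_path, text):
    """Placeholder for the pipeline (LLM -> TTS -> talking head).

    Hypothetical stand-in: the real function would return the path of the
    rendered video.
    """
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    return "./results/answer.mp4"


def build_demo():
    import gradio as gr  # imported lazily so the sketch loads without Gradio

    return gr.Interface(
        fn=generate_response_video,
        inputs=[
            gr.Image(type="filepath", label="Portrait"),
            gr.Textbox(label="Question"),
        ],
        outputs=gr.Video(label="Answer"),
        title="Linly-Talker (sketch)",
    )
```

Calling `build_demo().launch(server_name="127.0.0.1", server_port=7870)` would then serve the page locally.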

In summary, Gradio provides visualization and user interaction interfaces for Linly-Talker, serving as effective means for showcasing system capabilities and enabling end users.

Start WebUI

Previously, I had separated many versions, but it became cumbersome to run multiple versions. Therefore, I have added a WebUI feature to provide a single interface for a seamless experience. I will continue to update it in the future.

The current features available in the WebUI are as follows:

Text/Voice-based dialogue with virtual characters (fixed characters with male and female roles)
Dialogue with virtual characters using any image (you can upload any character image)
Multi-turn GPT dialogue (incorporating historical dialogue data to maintain context)
Voice cloning dialogue (based on GPT-SoVITS settings for voice cloning, including a built-in smoky voice that can be cloned based on the voice of the dialogue)

# WebUI
python webui.py

There are three modes for the current startup, and you can choose a specific setting based on the scenario.

The first mode involves fixed Q&A with a predefined character, eliminating preprocessing time.

python app.py

The first mode has recently been updated to include the Wav2Lip model for dialogue.

python appv2.py

The second mode allows for conversing with any uploaded image.

python app_img.py

The third mode builds upon the first one by incorporating a large language model for multi-turn GPT conversations.

python app_multi.py
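Multi-turn dialogue comes down to carrying earlier question and answer pairs into each new prompt. The following sketch shows one way to do that; the `Conversation` class, the "Human:/Assistant:" prompt format, and the turn limit are assumptions and are not taken from app_multi.py.

```python
class Conversation:
    """Keep a bounded dialogue history and render it into a single prompt."""

    def __init__(self, system_prompt="", max_turns=5):
        self.system_prompt = system_prompt
        self.max_turns = max_turns
        self.history = []  # list of (question, answer) pairs

    def build_prompt(self, question):
        lines = [self.system_prompt] if self.system_prompt else []
        for q, a in self.history:
            lines.append(f"Human: {q}")
            lines.append(f"Assistant: {a}")
        lines.append(f"Human: {question}")
        lines.append("Assistant:")
        return "\n".join(lines)

    def record(self, question, answer):
        self.history.append((question, answer))
        # Drop the oldest turns so the prompt does not grow without bound.
        self.history = self.history[-self.max_turns:]
```

Each round would call `build_prompt`, send the result to the model, and then `record` the answer before the next question arrives.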

Voice cloning has now been added, so you can switch freely between cloned voice models and the corresponding person images. Here, I have chosen a deep, smoky voice and an image of a male.

python app_vits.py

Folder structure

The folder structure of the weight files is as follows:

Baidu (百度云盘): You can download the weights from here (Password: linl).
huggingface: You can access the weights at this link.
modelscope: The weights will be available soon at this link.

Linly-Talker/ 
├── checkpoints
│   ├── hub
│   │   └── checkpoints
│   │       └── s3fd-619a316812.pth
│   ├── lipsync_expert.pth
│   ├── mapping_00109-model.pth.tar
│   ├── mapping_00229-model.pth.tar
│   ├── SadTalker_V0.0.2_256.safetensors
│   ├── visual_quality_disc.pth
│   ├── wav2lip_gan.pth
│   └── wav2lip.pth
├── gfpgan
│   └── weights
│       ├── alignment_WFLW_4HG.pth
│       └── detection_Resnet50_Final.pth
├── GPT_SoVITS
│   └── pretrained_models
│       ├── chinese-hubert-base
│       │   ├── config.json
│       │   ├── preprocessor_config.json
│       │   └── pytorch_model.bin
│       ├── chinese-roberta-wwm-ext-large
│       │   ├── config.json
│       │   ├── pytorch_model.bin
│       │   └── tokenizer.json
│       ├── README.md
│       ├── s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt
│       ├── s2D488k.pth
│       ├── s2G488k.pth
│       └── speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
├── Qwen
│   └── Qwen-1_8B-Chat
│       ├── assets
│       │   ├── logo.jpg
│       │   ├── qwen_tokenizer.png
│       │   ├── react_showcase_001.png
│       │   ├── react_showcase_002.png
│       │   └── wechat.png
│       ├── cache_autogptq_cuda_256.cpp
│       ├── cache_autogptq_cuda_kernel_256.cu
│       ├── config.json
│       ├── configuration_qwen.py
│       ├── cpp_kernels.py
│       ├── examples
│       │   └── react_prompt.md
│       ├── generation_config.json
│       ├── LICENSE
│       ├── model-00001-of-00002.safetensors
│       ├── model-00002-of-00002.safetensors
│       ├── modeling_qwen.py
│       ├── model.safetensors.index.json
│       ├── NOTICE
│       ├── qwen_generation_utils.py
│       ├── qwen.tiktoken
│       ├── README.md
│       ├── tokenization_qwen.py
│       └── tokenizer_config.json
└── README.md



linly-talker's Issues

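The page just shows "connect to network" when it loads. What is going on?

After opening the web page, a message says that a login is required to connect to the network, and the browser then redirects to http://edge-http.microsoft.com/captiveportal/generate_204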
http://www.gstatic.com/generate_204

API

When will an API be made available? (Waiting eagerly...)

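Found several problems during testing; guidance on solving them would be appreciated.

First, I think this project is quite good, which is why I deployed it locally for testing. That deserves recognition!

Next, here is my system setup:

Lenovo P52 laptop, 64 GB RAM, dual GPUs: P3200 6 GB plus an external P40 24 GB
Windows 11 x64, Python 3.10.13, CUDA 11.8, Torch 2.0.1
Linly-AI-7B as the dialogue model
First, based on the dependencies listed in requirements_app.txt, I added the ones missing from my environment: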
gradio==3.38.0
edge-tts>=6.1.9
openai-whisper
zhconv
google-generativeai
transformers==4.32.0
For the ones already present in my environment, I followed pip install -r requirements_app.txt.

1. Running python app.py directly succeeds, but with a warning:
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "C:\Python\Python310\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "C:\Python\Python310\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
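ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.
After searching online, I found that this network-connection problem is common and does not affect usage. The cause is probably that the asyncio library, at runtime, does not distinguish whether the platform is Windows, Linux, or something else, and calls asyncio.set_event_loop_policy() directly. It can be resolved by adding a platform check:
import asyncio
import platform
if platform.system() == 'Windows':
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
After that, the error no longer appears.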

2. While the model is loading at runtime, the following messages appear:
bin C:\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cuda118_nocublaslt.dll
[2024-01-25 17:08:13,225] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
NOTE: Redirects are currently not supported in Windows or MacOs.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 2/2 [00:37<00:00, 18.90s/it]
using safetensor as default
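In fact, I installed a bitsandbytes build compiled for Windows. The messages are possibly related to a call from some model acceleration library, and they do not affect usage.

3. When testing app_img.py, the final stage of video synthesis fails with the following error: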
{'checkpoint': 'checkpoints\SadTalker_V0.0.2_256.safetensors', 'dir_of_BFM_fitting': 'src/config', 'audio2pose_yaml_path': 'src/config\auido2pose.yaml', 'audio2exp_yaml_path': 'src/config\auido2exp.yaml', 'pirender_yaml_path': 'src/config\facerender_pirender.yaml', 'pirender_checkpoint': 'checkpoints\epoch_00190_iteration_000400000_checkpoint.pt', 'use_safetensor': True, 'mappingnet_checkpoint': 'checkpoints\mapping_00229-model.pth.tar', 'facerender_yaml': 'src/config\facerender.yaml'}
temp\1822631dac470091cee138bad413911fac97da9e\image.png
landmark Det:: 100%|███████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.03it/s]
3DMM Extraction In Video:: 100%|███████████████████████████████████████████████████████| 1/1 [00:00<00:00, 14.77it/s]
audio2exp:: 100%|███████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 110.95it/s]
Face Renderer:: 100%|██████████████████████████████████████████████████████████████| 123/123 [00:34<00:00, 3.54it/s]
fps: 25 123
ffmpeg error
Traceback (most recent call last):
File "C:\Python\Python310\lib\site-packages\gradio\routes.py", line 442, in run_predict
output = await app.get_blocks().process_api(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1389, in process_api
result = await self.call_function(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1094, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Python\Python310\lib\site-packages\gradio\utils.py", line 703, in wrapper
response = f(*args, **kwargs)
File "D:\AITest\LinlyTalker\my_app_img.py", line 84, in text_response
video = sad_talker.test2(source_image,
File "D:\AITest\LinlyTalker\src\SadTalker.py", line 279, in test2
return_path = self.animate_from_coeff.generate(data, save_dir, pic_path, crop_info, enhancer='gfpgan' if use_enhancer else None, preprocess=preprocess, img_size=size)
File "D:\AITest\LinlyTalker\src\facerender\animate.py", line 272, in generate
os.remove(path)
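FileNotFoundError: [WinError 3] The system cannot find the path specified: './results/85200a0a-e6c9-4143-980f-a82b4a8dd3b5\temp_85200a0a-e6c9-4143-980f-a82b4a8dd3b5\first_frame_dir\image_85200a0a-e6c9-4143-980f-a82b4a8dd3b5\input\answer.mp4'

This may be related to the system path variable that the code passes along, but I could not find a fix. Please help analyze and resolve it.

4. When using app.py and app_multi.py, I wanted to replace the default avatar example.png with another one, but changing the image path in the script had no effect. In the end I deleted the entire first_frame_dir directory under inputs, and running it produced the following error: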
Traceback (most recent call last):
File "C:\Python\Python310\lib\site-packages\scipy\io\matlab_mio.py", line 39, in _open_file
return open(file_like, mode), True
FileNotFoundError: [Errno 2] No such file or directory: './inputs/first_frame_dir/example.mat'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Python\Python310\lib\site-packages\gradio\routes.py", line 442, in run_predict
output = await app.get_blocks().process_api(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1389, in process_api
result = await self.call_function(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1094, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Python\Python310\lib\site-packages\gradio\utils.py", line 703, in wrapper
response = f(*args, **kwargs)
File "D:\AITest\LinlyTalker\my_app_multi.py", line 148, in human_respone
video_path = sad_talker.test(source_image,
File "D:\AITest\LinlyTalker\src\SadTalker.py", line 153, in test
batch = get_data(first_coeff_path, audio_path, self.device, ref_eyeblink_coeff_path=ref_eyeblink_coeff_path, still=still_mode,
File "D:\AITest\LinlyTalker\src\generate_batch.py", line 82, in get_data
source_semantics_dict = scio.loadmat(source_semantics_path)
File "C:\Python\Python310\lib\site-packages\scipy\io\matlab_mio.py", line 225, in loadmat
with _open_file_context(file_name, appendmat) as f:
File "C:\Python\Python310\lib\contextlib.py", line 135, in enter
return next(self.gen)
File "C:\Python\Python310\lib\site-packages\scipy\io\matlab_mio.py", line 17, in _open_file_context
f, opened = _open_file(file_like, appendmat, mode)
File "C:\Python\Python310\lib\site-packages\scipy\io\matlab_mio.py", line 45, in _open_file
return open(file_like, mode), True
FileNotFoundError: [Errno 2] No such file or directory: './inputs/first_frame_dir/example.mat'

It seems that something is hard-coded in the script. Please advise what to change so that different default avatars can be used. Thank you!

support qwen models?

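When I run python webui.py and click submit to generate the video, I get "Connection errored out"

The terminal output is shown in the image below

Does anyone know what the cause is?

pip install -r VITS/requirements_gptsovits.txt reports an error

Following your instructions, a dependency problem occurred when installing this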

Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 2
╰─> [63 lines of output]
Looking in indexes: https://pypi.mirrors.ustc.edu.cn/simple/
Ignoring oldest-supported-numpy: markers 'python_version < "3.9"' don't match your environment
ERROR: Exception:
Traceback (most recent call last):

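gr.Error("无克隆环境或者无克隆模型权重，无法克隆声音", e) (message: "No cloning environment or no cloning model weights; cannot clone the voice")

I trained GPT-SoVITS beforehand in a separate workspace and placed the trained weights in GPT_weights and SoVITS_weights. Running voice cloning then produced the following error: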

/Linly-Talker/webui.py", line 114, in LLM_response
gr.Error("无克隆环境或者无克隆模型权重，无法克隆声音", e)
TypeError: Error.init

Does the voice cloning module perhaps still require code changes in webui.py?
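The traceback in this report is truncated, so the exact cause cannot be confirmed from it. One plausible reading, assuming the gr.Error constructor in the installed Gradio version expects a single message string, is that passing the exception e as a second positional argument triggers the TypeError. Under that assumption, the exception could be folded into the message first. The helper name `format_error_message` is hypothetical.

```python
def format_error_message(message, exc):
    """Fold an exception into one string for an error that takes a single message."""
    # Some exceptions carry no text; fall back to the class name alone.
    detail = f"{type(exc).__name__}: {exc}" if str(exc) else type(exc).__name__
    return f"{message} ({detail})"
```

In webui.py this might be used as `raise gr.Error(format_error_message("...", e))`. Note that Gradio errors are normally raised rather than merely constructed. This would only surface the underlying exception; missing weights or a missing cloning environment would still need to be fixed separately.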

How should this proxy URL be set?

Do I need to set up a separate proxy server?

run app_img.py error!

config.py unchanged.
import gradio as gr
ValueError: Unknown scheme for proxy URL URL('socks://127.0.0.1:7890/')
Looking forward to your reply on resolving this issue.

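How fast is the digital human driving module?

Thanks, it runs fine

The code ran successfully

Is there a voice wake-up implementation?

The README is too brief about GPT-SoVITS and XTTS

The configuration instructions for GPT-SoVITS and XTTS are too brief.
GPT-SoVITS also requires downloading a number of additional packages, and nltk needs to be downloaded and configured.
XTTS also reports errors such as missing examples/female.wav and tts_models--multilingual--multi-dataset--xtts_v2/config.json.

Could the README be more detailed, or, as with SadTalker, list the models that are called and where to put them?

python app_img.py reports an error every time it generates a video

Generating a video with python app_img.py reports this kind of error every time; app.py and app_multi.py work fine after startup.

Installing with pip install -r VITS/requirements_gptsovits.txt reports an error

Following your instructions, a dependency problem occurred when installing this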
pip install -r VITS/requirements_gptsovits.txt

Installing build dependencies ... error
error: subprocess-exited-with-error

Wrong path when saving the video

After Face Renderer:: 100%, a path error is reported. Is this a configuration problem?

./results/b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\temp_b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\first_frame_dir\image_b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\input\answer.mp4: No such file or directory
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\linly\lib\shutil.py", line 791, in move
os.rename(src, real_dst)
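FileNotFoundError: [WinError 2] The system cannot find the file specified: '89cf9dcd-0120-4368-8106-ef56ecd5ed86.mp4' -> './results/b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\first_frame_dir\image_b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\input\answer.mp4'

Could the author add the RVC project (a simplified SoVITS) to the voice cloning module?

SoVITS keeps throwing errors at runtime. Maybe RVC would be easier to get running?

A question about the digital human

How can I pass in a piece of text and have the digital human read it aloud?

The Colab link cannot find the startup file

Error when running python app.py

Hello, I have a question: this error is reported when running from the project root directory.

Startup problem

An error occurs after startup and I cannot resolve it. What is the cause, and how can it be fixed?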

(linly) D:\Linly-Talker>python app.py
Traceback (most recent call last):
File "D:\Linly-Talker\app.py", line 5, in <module>
from LLM import LLM
File "D:\Linly-Talker\LLM\__init__.py", line 1, in <module>
from .Linly import Linly
File "D:\Linly-Talker\LLM\Linly.py", line 2, in <module>
import torch
File "C:\ProgramData\Anaconda3\envs\linly\lib\site-packages\torch\__init__.py", line 130, in <module>
raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\ProgramData\Anaconda3\envs\linly\lib\site-packages\torch\lib\c10_cuda.dll" or one of its dependencies.

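Image

Would it be possible to provide a more convenient prebuilt image?

The error "i got an error when options preprocess --full without --still" still seems to exist

Original issue:
OpenTalker/SadTalker#268

I saw that the author replied it would be fixed soon, but the bug is still open.
Has anyone solved it?

Error in the LLM dialogue step: "Sorry, your request has encountered an error. Please try again."

Hello, when using the webui I uploaded a voice recording for the conversation. After recognition finished and I submitted the video generation, the following problem occurred.
The GPU is a 4090.

The error output is as follows: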
extern "C"
__launch_bounds__(512, 4)
__global__ void reduction_prod_kernel(ReduceJitOp r){
r.run();
}
nvrtc: error: invalid value for --gpu-architecture (-arch)

对不起，你的请求出错了，请再次尝试。
Sorry, your request has encountered an error. Please try again.

Function predict runtime: 3.0960586071014404 s
Function LLM_response runtime: 3.160871982574463 s
audio2exp:: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 212.45it/s]Face Renderer:: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 92/92 [00:18<00:00, 5.00it/s]fps: 20 183
./results/temp_girl_answer.mp4
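Function Talker_response runtime: 22.409300565719604 s

My Qwen folder structure is as follows:

I hope you can help. Thank you!

The group chat QR code has expired; please add me. Also, a question about the project

I would like to pass in the text myself and have the local digital human run TTS and Wav2Lip directly without calling a large model. Is that possible?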
