opentalker / video-retalking

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

Home Page: https://opentalker.github.io/video-retalking/

License: Apache License 2.0

Python 96.52% C++ 0.21% Cuda 1.39% Shell 0.54% Jupyter Notebook 1.35%
lip-synchronization talking-head-videos video-editing siggraph-asia-2022

video-retalking's People

Contributors

abhishek213-alb, chenxwh, eltociear, kunncheng, mohitd404, romitp4l, rudra-ji, sanyam-2026, steven12138, suravshresth, user-tony, vinthony

video-retalking's Issues

ValueError: cannot reshape array of size

Error message:
[Step 1] Using saved landmarks.
Traceback (most recent call last):
  File "inference.py", line 364, in <module>
    main()
  File "inference.py", line 88, in main
    lm = lm.reshape([len(full_frames), -1, 2])
ValueError: cannot reshape array of size 206040 into shape (6577,newaxis,2)
Input video dimensions:
[image]

I don't know the exact cause. Are there specific requirements on the video dimensions, or is the file size causing this?
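
One way to read the numbers in this error, as a sketch: the log says saved landmarks were reused, and 206040 values cannot be split into 6577 frames of (x, y) pairs. Assuming the cache stores 68 two-dimensional landmarks per frame (the usual count for 2D face-alignment models; this is an assumption, not something confirmed in the thread), the cached file would cover 1515 frames, which suggests it was produced from a different or shorter video. The helper name `cached_frame_count` is hypothetical.

```python
def cached_frame_count(total_values, points_per_face=68, coords_per_point=2):
    """Return how many frames a flat landmark cache covers.

    Returns None when the size does not divide evenly into whole frames.
    """
    per_frame = points_per_face * coords_per_point
    if total_values % per_frame != 0:
        return None
    return total_values // per_frame


cached = cached_frame_count(206040)    # array size reported in the ValueError
video_frames = 6577                    # len(full_frames) reported in the ValueError
print(cached, cached == video_frames)  # prints: 1515 False
```

If the two counts differ, removing the saved landmark file for that video so that Step 1 extracts landmarks again is a reasonable thing to try.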

output resolution problem

This is a great open-source project, but the output resolution is lower than that of the original video, and the output has jagged edges. Is there a way to deal with this? Thank you very much!

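Best practices for virtual human expressions and emotions

I would like the virtual human to change emotion during a conversation, for example speaking sadly, happily, or excitedly. Is it better to use different source videos that each carry the emotion, or is there another way to achieve this?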

Optimise inference time

Thanks for the great work. You are all making a great contribution to the community.
However, is there any way to reduce inference time? Thanks.

Regards

FileNotFoundError: [WinError 2]

It crashed when it reached the sixth step.
Traceback (most recent call last):
  File "inference.py", line 342, in <module>
    main()
  File "inference.py", line 271, in main
    subprocess.call(command, shell=platform.system() != 'Windows')
  File "E:\anaconda\envs\video_retalking\lib\subprocess.py", line 340, in call
    with Popen(*popenargs, **kwargs) as p:
  File "E:\anaconda\envs\video_retalking\lib\subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "E:\anaconda\envs\video_retalking\lib\subprocess.py", line 1311, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified.
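
On Windows, `[WinError 2]` raised from `CreateProcess` generally means the executable named in the command could not be found. A minimal pre-flight check could look like the sketch below, assuming the command at line 271 launches an external tool such as `ffmpeg` (an assumption; the traceback does not show the command string). `require_tool` is a hypothetical helper.

```python
import shutil


def require_tool(name):
    """Return the full path of an executable on PATH, or raise with a clear message."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(
            f"'{name}' was not found on PATH; install it or add its folder to PATH"
        )
    return path


# 'ffmpeg' is the assumed external tool; substitute whatever the command invokes.
try:
    print("found:", require_tool("ffmpeg"))
except FileNotFoundError as err:
    print(err)
```

Running this inside the same Anaconda environment shows whether the tool is visible to the Python process that runs `inference.py`.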

Install error with requirements.txt

I followed the instructions and got this error:

Installing collected packages: yapf, tensorboard-plugin-wit, pyasn1, ninja, lmdb, einops, dlib, addict, tensorboard-data-server, six, rsa, pyyaml, pyparsing, pyasn1-modules, protobuf, oauthlib, numpy, networkx, MarkupSafe, llvmlite, kiwisolver, importlib-resources, grpcio, future, fonttools, cycler, colorama, charset-normalizer, cachetools, absl-py, werkzeug, tqdm, tifffile, scipy, PyWavelets, python-dateutil, opencv-python, numba, markdown, kornia, imageio, google-auth, contourpy, scikit-image, resampy, requests-oauthlib, matplotlib, librosa, google-auth-oauthlib, filterpy, face-alignment, tb-nightly, facexlib, basicsr
Running setup.py install for dlib ... error
error: subprocess-exited-with-error

× Running setup.py install for dlib did not run successfully.
│ exit code: 1
╰─> [9 lines of output]
running install
C:\Users\chlyw\anaconda3\envs\video_retalking\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
running build_py
running build_ext

  ERROR: CMake must be installed to build dlib

  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> dlib

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5,) + inhomogeneous part.

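Hello, could you please help me resolve this error? I have spent 8 hours on it and am stuck at this point. I would really appreciate your help. Many thanks.
My environment is a local 64-bit Windows 10 PC with a GTX 960 (4 GB VRAM). Anaconda3, Python 3.8, and CUDA 11.1 are installed and configured in the environment variables.
Command entered in PowerShell:
python inference.py --face examples/face/1.mp4 --audio examples/audio/1.wav --outfile results/1_1.mp4

Output: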
C:\Python\Python38\lib\site-packages\setuptools\distutils_patch.py:25: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first.
warnings.warn(
[Info] Using cuda for inference.
[Step 0] Number of frames available for inference: 135
[Step 1] Landmarks Extraction in Video.
landmark Det:: 100%|█████████████████████████████████████████████████████████████████| 135/135 [00:25<00:00, 5.29it/s]
[Step 2] 3DMM Extraction In Video:: 0%| | 0/135 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "inference.py", line 342, in <module>
    main()
  File "inference.py", line 100, in main
    trans_params, im_idx, lm_idx, _ = align_img(frame, lm_idx, lm3d_std)
  File "C:\ai\video-retalking\third_part\face3d\util\preprocess.py", line 196, in align_img
    trans_params = np.array([w0, h0, s, t[0], t[1]])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5,) + inhomogeneous part.
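
The failing line builds one array from three scalars and two elements of `t`. If `t` has shape `(2, 1)`, then `t[0]` and `t[1]` are themselves one-element arrays, and NumPy 1.24 and later reject such ragged input instead of creating an object array. A minimal sketch of one workaround, assuming that shape for `t` (not confirmed in the thread), is to flatten `t` before packing. `pack_trans_params` is a hypothetical helper, not part of the repository.

```python
import numpy as np


def pack_trans_params(w0, h0, s, t):
    """Pack alignment parameters into a flat float array of length 5."""
    t = np.asarray(t, dtype=np.float64).reshape(-1)  # (2, 1) or (2,) -> (2,)
    return np.array([w0, h0, float(s), t[0], t[1]], dtype=np.float64)


t = np.array([[3.0], [4.0]])  # shape (2, 1), the assumed problem case
params = pack_trans_params(256, 256, 1.5, t)
print(params.shape)           # prints: (5,)
```

Installing a NumPy release older than 1.24 should also avoid the error, since those versions only warned about ragged input.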

It used to run fine, but now suddenly fails at Step 1

[Step 1] Landmarks Extraction in Video.
Traceback (most recent call last):
  File "inference.py", line 342, in <module>
    main()
  File "inference.py", line 78, in main
    kp_extractor = KeypointExtractor()
  File "C:\Users\Qiyun\video-retalking\third_part\face3d\extract_kp_videos.py", line 16, in __init__
    self.detector = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D)
  File "C:\Users\Qiyun\miniconda3\envs\video_retalking\lib\enum.py", line 384, in __getattr__
    raise AttributeError(name) from None
AttributeError: _2D
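
This `AttributeError` is consistent with a newer `face_alignment` release in which the enum member is named `LandmarksType.TWO_D` rather than `LandmarksType._2D`. A minimal sketch of a version-tolerant lookup follows. The stand-in enums exist only to make the example runnable without the library, and `resolve_2d_type` is a hypothetical helper.

```python
from enum import Enum


def resolve_2d_type(landmarks_type):
    """Return the 2D landmarks member under either its new or old name."""
    for name in ("TWO_D", "_2D"):
        if hasattr(landmarks_type, name):
            return getattr(landmarks_type, name)
    raise AttributeError("no 2D landmarks member found")


# Stand-ins for the library's enum in newer and older releases (illustration only).
class NewStyle(Enum):
    TWO_D = 1


class OldStyle(Enum):
    _2D = 1


print(resolve_2d_type(NewStyle).name, resolve_2d_type(OldStyle).name)  # prints: TWO_D _2D
```

In `extract_kp_videos.py` the call would then read `face_alignment.FaceAlignment(resolve_2d_type(face_alignment.LandmarksType))`. Installing a `face_alignment` release that still defines `_2D` is the other route.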

About model training

Thank you for contributing this video lip-sync model!
Could you share any information about the training code for LNet and ENet, so that I can train them on my own dataset?
Looking forward to your response!

Using cuda for inference. ^C

When running in Google Colab, I get:

/usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
[Info] Using cuda for inference.
^C

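About the quality of the E-Net results

Hello, I ran the demo and found some problems with the generated face. I saved the outputs of both L-Net and E-Net: the L-Net output looks normal, but the E-Net super-resolution result is rather poor and falls short of your example videos. Could you explain why? Thank you.
[image]

[image]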

Video-retalking program errors out on step 6

I am running this program on my Windows machine using Anaconda. I run the following command and get all the way to step 6 before erroring out:

(video_retalking) E:\video-retalking>python inference.py --face examples/face/1.mp4 --audio examples/audio/1.wav --outfile results/1_1.mp4

See the attached screenshot; the error is raised at inference.py, line 271. Please advise on what I can do to resolve this issue. Thanks.

video_retalking error screen shot.pdf

blurry inference result even if input video is 1080p

Hi, this is an awesome project and the lipsync is really good.

I’ve encountered some problems: if I use a high resolution video (1920*1080) as input, the output video is blurry on the whole (not just the face area) though the output resolution is also 1080p. It seems that the output video is scaled up from a low-res one.

Based on my understanding of the paper, the generated talking face is pasted back onto the original video. So I wonder if this global blurriness is normal…

I used the command from the readme for inference. Not sure if there are other options I missed.

python3 inference.py \
  --face examples/face/1.mp4 \
  --audio examples/audio/1.wav \
  --outfile results/1_1.mp4

Thank you for your great repo.

Traceback error

I run into an error when I try:
python inference.py \
  --face examples/face/1.mp4 \
  --audio examples/audio/1.wav \
  --outfile results/1_1.mp4

Traceback (most recent call last):
  File "inference.py", line 4, in <module>
    from PIL import Image
  File "C:\Users\username4\anaconda3\envs\video_retalking\lib\site-packages\PIL\Image.py", line 103, in <module>
    from . import _imaging as core
ImportError: DLL load failed while importing _imaging: The specified module could not be found.

Stuck at step 5

[Step 5] Reference Enhancement: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 109/109 [18:44<00:00, 10.32s/it]
landmark Det:: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 109/109 [00:28<00:00,  3.80it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 109/109 [00:00<00:00, 16176.46it/s] 
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 109/109 [00:00<00:00, 500.96it/s] 
 94%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋            | 102/109 [00:00<00:00, 503.14it/s] 
FaceDet::   0%|                                                                                                                                                                                                 | 0/28 [00:00<?, ?it/s] 

Error in the fifth step of the program: (GPEN)

Traceback (most recent call last):
  File "inference.py", line 342, in <module>
    main()
  File "inference.py", line 198, in main
    pred, _, _ = enhancer.process(img, img, face_enhance=True, possion_blending=False)
  File "C:\AI-3\third_part\GPEN\gpen_face_enhancer.py", line 116, in process
    mask_sharp = cv2.GaussianBlur(mask_sharp, (0,0), sigmaX=1, sigmaY=1, borderType = cv2.BORDER_DEFAULT)
UnboundLocalError: local variable 'mask_sharp' referenced before assignment

Any help or answers would be appreciated. Thank you.

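Problem when passing in an image

Hello, I have a problem when passing in an image: the head in the generated video has no animation, and no error is reported. What did I do wrong?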

Performance concern

I tried to produce a 3-minute video from example/3.mp4 and a 3-minute WAV audio file. Step 6 took 20 minutes on an RTX 4090. Is this performance normal? Note: CUDA is already enabled.

About the training code

Hi contributor,

This is fantastic work, and it is very exciting that the code has been released.

I would really like to train on my own dataset based on your excellent work. May I ask for help with the training process, or will the training code be released later?

Looking forward to your response.

I have a question about Identity aware enhancement network

I am also interested in talking head generation, and I read your SIGGRAPH 2022 paper with great interest.
I have a question about the 'identity aware enhancement network'.
[image]
I cannot understand this part of your paper. Does it mean that the high-resolution LRS2 images produced by the restoration network are fed into L-Net again? If so, why produce a low-resolution input for E-Net?

Image too big (error)

I'll leave the error log below. From what I understand, my image size is too big for face detection on my GPU; I only have a 4 GB GPU.
I am using the following command:
python inference.py --face examples/face/12.mp4 --audio examples/audio/3.wav --outfile results/1_6.mp4 --face_det_batch_size 2 --LNet_batch_size 2 --one_shot
I don't really want to resize my video because I want to preserve quality. Can I do something else, like increase the batch sizes or crop my video somehow? (If you reply with code, please add it to the command above if possible.)

FaceDet::   0%|                                                                                | 0/204 [00:01<?, ?it/s]
Recovering from OOM error; New batch size: 1                                                   | 0/204 [00:00<?, ?it/s]
FaceDet::   0%|                                                                                | 0/407 [00:00<?, ?it/s]
[Step 6] Lip Synthesis::   0%|                                                                 | 0/204 [01:36<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\leolo\video-retalking\utils\inference_utils.py", line 118, in face_detect
    predictions.extend(detector.get_detections_for_batch(np.array(images[i:i + batch_size])))
  File "C:\Users\leolo\video-retalking\third_part\face_detection\api.py", line 66, in get_detections_for_batch
    detected_faces = self.face_detector.detect_from_batch(images.copy())
  File "C:\Users\leolo\video-retalking\third_part\face_detection\detection\sfd\sfd_detector.py", line 42, in detect_from_batch
    bboxlists = batch_detect(self.face_detector, images, device=self.device)
  File "C:\Users\leolo\video-retalking\third_part\face_detection\detection\sfd\detect.py", line 69, in batch_detect
    olist = net(imgs)
  File "C:\Users\leolo\.conda\envs\video_retalking\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\leolo\video-retalking\third_part\face_detection\detection\sfd\net_s3fd.py", line 71, in forward
    h = F.relu(self.conv1_1(x))
  File "C:\Users\leolo\.conda\envs\video_retalking\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\leolo\.conda\envs\video_retalking\lib\site-packages\torch\nn\modules\conv.py", line 443, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\leolo\.conda\envs\video_retalking\lib\site-packages\torch\nn\modules\conv.py", line 439, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 960.00 MiB (GPU 0; 4.00 GiB total capacity; 1.96 GiB already allocated; 0 bytes free; 2.67 GiB reserved in total by PyTorch)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "inference.py", line 342, in <module>
    main()
  File "inference.py", line 211, in main
    for i, (img_batch, mel_batch, frames, coords, img_original, f_frames) in enumerate(tqdm(gen, desc='[Step 6] Lip Synthesis:', total=int(np.ceil(float(len(mel_chunks)) / args.LNet_batch_size)))):
  File "C:\Users\leolo\.conda\envs\video_retalking\lib\site-packages\tqdm\std.py", line 1178, in __iter__
    for obj in iterable:
  File "inference.py", line 292, in datagen
    face_det_results = face_detect(full_frames, args, jaw_correction=True)
  File "C:\Users\leolo\video-retalking\utils\inference_utils.py", line 121, in face_detect
    raise RuntimeError('Image too big to run face detection on GPU. Please use the --resize_factor argument')
RuntimeError: Image too big to run face detection on GPU. Please use the --resize_factor argument
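
The final error names the remedy the script expects: `--resize_factor`. Since the poster asked for it to be added to their own command, here is that command assembled in Python with the flag appended. This is a sketch only: the value 2 is an illustrative choice, and the assumption is that the flag takes an integer downscale factor, which the thread does not spell out. `build_command` is a hypothetical helper.

```python
import sys


def build_command(face, audio, outfile, resize_factor=2):
    """Assemble the inference command from the post, plus --resize_factor."""
    return [
        sys.executable, "inference.py",
        "--face", face,
        "--audio", audio,
        "--outfile", outfile,
        "--face_det_batch_size", "2",
        "--LNet_batch_size", "2",
        "--one_shot",
        "--resize_factor", str(resize_factor),  # assumed integer downscale factor
    ]


cmd = build_command("examples/face/12.mp4", "examples/audio/3.wav", "results/1_6.mp4")
print(" ".join(cmd[1:]))
```

The list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Downscaling trades away some input resolution, which is the cost the poster hoped to avoid. The log shows the detector had already fallen back to a batch size of 1, and increasing batch sizes would raise memory use rather than lower it.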

setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5,) + inhomogeneous part.

[Step 2] 3DMM Extraction In Video:: 0%| | 0/135 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "inference.py", line 342, in <module>
    main()
  File "inference.py", line 100, in main
    trans_params, im_idx, lm_idx, _ = align_img(frame, lm_idx, lm3d_std)
  File "G:\pythonProject\third_part\face3d\util\preprocess.py", line 196, in align_img
    trans_params = np.array([w0, h0, s, t[0], t[1]])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5,) + inhomogeneous part.
PS G:\pythonProject>

Odd GPEN Error - 'mask_sharp' referenced before assignment

When running inference.py the following error occurs when Step 5 begins

  File "/inference.py", line 342, in <module>
    main()
  File "/inference.py", line 198, in main
    pred, _, _ = enhancer.process(img, img, face_enhance=True, possion_blending=False)
  File "/third_part/GPEN/gpen_face_enhancer.py", line 116, in process
    mask_sharp = cv2.GaussianBlur(mask_sharp, (0,0), sigmaX=1, sigmaY=1, borderType = cv2.BORDER_DEFAULT)
UnboundLocalError: local variable 'mask_sharp' referenced before assignment

What's likely the cause?
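
A likely cause, offered as an assumption since the body of `process` is not shown here: `mask_sharp` is only assigned inside a loop or branch that runs once per detected face, so a frame with no detected face reaches line 116 with the name never bound. The pattern can be reproduced in isolation. The function names below are hypothetical and unrelated to the GPEN code.

```python
def last_mask_unguarded(faces):
    """Mimics the failure: 'mask' is bound only if the loop body runs."""
    for face in faces:
        mask = face * 2
    return mask  # raises UnboundLocalError when 'faces' is empty


def last_mask_guarded(faces):
    """Same logic with a default, so an empty input is handled explicitly."""
    mask = None
    for face in faces:
        mask = face * 2
    return mask


try:
    last_mask_unguarded([])
except UnboundLocalError as err:
    print("reproduced:", type(err).__name__)  # prints: reproduced: UnboundLocalError

print(last_mask_guarded([]))  # prints: None
```

If this is what happens, checking whether every input frame contains a detectable face would be the first thing to verify.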

Different Errors while trying to install

I followed the instructions on the main page.
To keep my first test simple, I put the video and WAV files in the root directory to shorten the paths:

python3 inference.py --face 1.mp4 --audio 1.wav --outfile 1_1.mp4

After installing CUDA and the requirements as explained,
I got a lot of different missing-module errors, so I tried to install the modules myself.
But I can't get rid of the last error I'm stuck on:

D:\Video_Retalking>python3 inference.py --face 1.mp4 --audio 1.wav --outfile 1_1.mp4
No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3'
C:\Users\Alon\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
Traceback (most recent call last):
  File "D:\Video_Retalking\inference.py", line 19, in <module>
    from utils.ffhq_preprocess import Croper
  File "D:\Video_Retalking\utils\ffhq_preprocess.py", line 31, in <module>
    import dlib
  File "C:\Users\Alon\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\dlib\__init__.py", line 19, in <module>
    from _dlib_pybind11 import *
ImportError: DLL load failed while importing _dlib_pybind11: The specified module could not be found.


I tried pip install, pip3 install, and even conda install, but nothing solved the problem.
I also tried reinstalling CUDA, but I still get the same error as above.

Any idea how to fix this and get Video-Retalking running locally under Anaconda?
Thanks in advance 🙏

The following error occurred when executing step six!

Traceback (most recent call last):
  File "D:\AIGC\videotalk\video-retalking\utils\inference_utils.py", line 118, in face_detect
    predictions.extend(detector.get_detections_for_batch(np.array(images[i:i + batch_size])))
  File "D:\AIGC\videotalk\video-retalking\third_part\face_detection\api.py", line 66, in get_detections_for_batch
    detected_faces = self.face_detector.detect_from_batch(images.copy())
  File "D:\AIGC\videotalk\video-retalking\third_part\face_detection\detection\sfd\sfd_detector.py", line 42, in detect_from_batch
    bboxlists = batch_detect(self.face_detector, images, device=self.device)
  File "D:\AIGC\videotalk\video-retalking\third_part\face_detection\detection\sfd\detect.py", line 69, in batch_detect
    olist = net(imgs)
  File "C:\Users\menka.conda\envs\video_retalking\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\AIGC\videotalk\video-retalking\third_part\face_detection\detection\sfd\net_s3fd.py", line 71, in forward
    h = F.relu(self.conv1_1(x))
  File "C:\Users\menka.conda\envs\video_retalking\lib\site-packages\torch\nn\functional.py", line 1298, in relu
    result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 508.00 MiB (GPU 0; 4.00 GiB total capacity; 2.69 GiB already allocated; 0 bytes free; 3.11 GiB reserved in total by PyTorch)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "inference.py", line 342, in <module>
    main()
  File "inference.py", line 211, in main
    for i, (img_batch, mel_batch, frames, coords, img_original, f_frames) in enumerate(tqdm(gen, desc='[Step 6] Lip Synthesis:', total=int(np.ceil(float(len(mel_chunks)) / args.LNet_batch_size)))):
  File "C:\Users\menka.conda\envs\video_retalking\lib\site-packages\tqdm\std.py", line 1178, in __iter__
    for obj in iterable:
  File "inference.py", line 292, in datagen
    face_det_results = face_detect(full_frames, args, jaw_correction=True)
  File "D:\AIGC\videotalk\video-retalking\utils\inference_utils.py", line 121, in face_detect
    raise RuntimeError('Image too big to run face detection on GPU. Please use the --resize_factor argument')
RuntimeError: Image too big to run face detection on GPU. Please use the --resize_factor argument

Colab notebook throwing error while running inference.py

Traceback (most recent call last):
  File "/content/video-retalking/inference.py", line 342, in <module>
    main()
  File "/content/video-retalking/inference.py", line 78, in main
    kp_extractor = KeypointExtractor()
  File "/content/video-retalking/third_part/face3d/extract_kp_videos.py", line 16, in __init__
    self.detector = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D)
  File "/usr/lib/python3.10/enum.py", line 437, in __getattr__
    raise AttributeError(name) from None
AttributeError: _2D
