
dpe's Introduction

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing

            Open In Colab


1 MAIS & NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, China   2 School of Artificial Intelligence, University of Chinese Academy of Sciences   3 Tencent AI Lab, Shenzhen, China

CVPR 2023


🔥 Demo

  • 🔥 Video editing: single source video & a driving video & a piece of audio. We transfer pose from the driving video and expression from the audio with the help of SadTalker.
Source video: full_s.mp4 → Result: dpe.mp4
  • 🔥 Video editing: single source image & a driving video & a piece of audio. We transfer pose from the driving video and expression from the audio with the help of SadTalker.

demo4_1.mp4
demo5_1.mp4

  • 🔥 Video editing: single source image & two driving videos. We transfer pose from the first video and expression from the second video. Some videos are selected from here.


📋 Changelog

  • 2023.07.21 Release code for one-shot driving.
  • 2023.05.26 Release code for training.
  • 2023.05.06 Support Enhancement.
  • 2023.05.05 Support Video editing.
  • 2023.04.30 Add some demos.
  • 2023.03.18 Support Pose driving, Expression driving, and Pose and Expression driving.
  • 2023.03.18 Upload the pre-trained model, which is fine-tuned for the expression generator.
  • 2023.03.03 Release the test code!
  • 2023.02.28 DPE has been accepted by CVPR 2023!

🚧 TODO

  • Test code for video driving.
  • Some demos.
  • Gradio/Colab Demo.
  • Training code for each component.
  • Test code for video editing.
  • Test code for one-shot driving.
  • Integrate audio driven methods for video editing.
  • Integrate GFPGAN for face enhancement.

🔮 Inference

Dependence Installation

git clone https://github.com/Carlyx/DPE
cd DPE 
conda create -n dpe python=3.8
source activate dpe
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
### install GFPGAN for the enhancer
pip install git+https://github.com/TencentARC/GFPGAN
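
Before downloading the models, it can be worth confirming that the CUDA build of PyTorch installed correctly. A minimal check (a hypothetical helper, not part of the repo):

# check_env.py -- hypothetical sanity check, not part of the repo
import torch
print(torch.__version__)           # expect 1.12.1+cu113
print(torch.cuda.is_available())   # should be True on a working CUDA setup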

Trained Models


Please download our pre-trained model and put it in ./checkpoints.

Model                 Description
checkpoints/dpe.pt    Pre-trained model (V1)
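
To sanity-check the download, the checkpoint can be loaded on CPU and its top-level keys listed. The key names inside the file are not documented here, so treat this as a minimal sketch:

# inspect_ckpt.py -- hypothetical helper for verifying the checkpoint download
import torch

ckpt = torch.load('./checkpoints/dpe.pt', map_location='cpu')
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # top-level entries, e.g. model/optimizer state dicts
else:
    print(type(ckpt))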

Expression driving

python run_demo.py --s_path ./data/s.mp4 \
    --d_path ./data/d.mp4 \
    --model_path ./checkpoints/dpe.pt \
    --face exp \
    --output_folder ./res

Pose driving

python run_demo.py --s_path ./data/s.mp4 \
    --d_path ./data/d.mp4 \
    --model_path ./checkpoints/dpe.pt \
    --face pose \
    --output_folder ./res

Expression and pose driving

Video driving:

python run_demo.py --s_path ./data/s.mp4 \
    --d_path ./data/d.mp4 \
    --model_path ./checkpoints/dpe.pt \
    --face both \
    --output_folder ./res

One-shot driving:

python run_demo_single.py --s_path ./data/s.jpg \
    --pose_path ./data/pose.mp4 \
    --exp_path ./data/exp.mp4 \
    --model_path ./checkpoints/dpe.pt \
    --face both \
    --output_folder ./res
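
The video-driving commands above differ only in the --face flag, so a small wrapper (hypothetical, not part of the repo) can render all three variants for side-by-side comparison:

# run_all_modes.py -- hypothetical convenience wrapper around run_demo.py
import subprocess

for face in ('exp', 'pose', 'both'):
    subprocess.run([
        'python', 'run_demo.py',
        '--s_path', './data/s.mp4',
        '--d_path', './data/d.mp4',
        '--model_path', './checkpoints/dpe.pt',
        '--face', face,
        '--output_folder', f'./res_{face}',   # one output folder per mode
    ], check=True)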

Crop full video

python crop_video.py

Video editing

Before video editing, you should run python crop_video.py to process the input full video. For the pre-trained segmentation model, you can download it from here and put it in ./checkpoints.

(Optional) You can run git clone https://github.com/TencentARC/GFPGAN, download the pre-trained enhancement model from here, and put it in ./checkpoints. Then pass --EN to improve the result; see the sketch after the command below.

python run_demo_paste.py --s_path <cropped source video> \
  --d_path <driving video> \
  --box_path <txt after running crop_video.py> \
  --model_path ./checkpoints/dpe.pt \
  --face exp \
  --output_folder ./res \
  --EN 
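
If you want to apply GFPGAN outside of the --EN flag, e.g. to enhance individual frames of an already rendered result, a minimal sketch using GFPGAN's public API looks like this. The checkpoint filename GFPGANv1.3.pth is an assumption; use whichever enhancement model you downloaded:

# enhance_frame.py -- standalone GFPGAN sketch; the --EN flag enables similar enhancement inside DPE
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path='./checkpoints/GFPGANv1.3.pth',  # assumed filename
    upscale=1, arch='clean', channel_multiplier=2, bg_upsampler=None)

img = cv2.imread('frame.png')  # BGR frame extracted from the result video
_, _, restored = restorer.enhance(img, has_aligned=False, paste_back=True)
cv2.imwrite('frame_enhanced.png', restored)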

Video editing for audio driving

  TODO

🔮 Training

  • Data preprocessing.

To train DPE, please follow video-preprocessing to download and pre-process the VoxCelebA dataset. We use lmdb to improve I/O efficiency. (Alternatively, you can rewrite the VoxDataset class in dataset.py to load .mp4 files directly; see the sketch below.)
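
A minimal sketch of such a dataset, assuming one .mp4 per clip and the usual source/driving frame-pair sampling; the actual VoxDataset interface in dataset.py may differ:

# dataset_mp4.py -- hypothetical replacement for the lmdb-backed VoxDataset
import glob, random
import cv2
import torch
from torch.utils.data import Dataset

class VoxMp4Dataset(Dataset):
    def __init__(self, data_root, size=256):
        self.paths = sorted(glob.glob(f'{data_root}/**/*.mp4', recursive=True))
        self.size = size

    def __len__(self):
        return len(self.paths)

    def _read_frame(self, cap, idx):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        _, frame = cap.read()
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame = cv2.resize(frame, (self.size, self.size))
        # HWC uint8 -> CHW float in [-1, 1]
        return torch.from_numpy(frame).permute(2, 0, 1).float() / 127.5 - 1.0

    def __getitem__(self, i):
        cap = cv2.VideoCapture(self.paths[i])
        n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # assumes each clip has >= 2 frames
        src_idx, drv_idx = random.sample(range(n), 2)  # two distinct frames of one clip
        src, drv = self._read_frame(cap, src_idx), self._read_frame(cap, drv_idx)
        cap.release()
        return src, drv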

  • Train DPE from scratch:
python train.py --data_root <DATA_PATH>
  • (Optional) If you want to accelerate convergence, you can download the pre-trained model of LIA and rename it to vox.pt.
python train.py --data_root <DATA_PATH> --resume_ckpt <model_path for vox.pt>

🛎 Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Pang_2023_CVPR,
    author    = {Pang, Youxin and Zhang, Yong and Quan, Weize and Fan, Yanbo and Cun, Xiaodong and Shan, Ying and Yan, Dong-Ming},
    title     = {DPE: Disentanglement of Pose and Expression for General Video Portrait Editing},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {427-436}
}

💗 Acknowledgements

Part of the code is adapted from LIA, PIRenderer, and STIT. We thank the authors for their contributions to the community.

🥂 Related Works

📢 Disclaimer

This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.

dpe's People

Contributors

carlyx


dpe's Issues

video is very blurry

Hi, I ran the command:

python run_demo_single.py --s_path ./data/s.jpg \
    --pose_path ./data/pose.mp4 \
    --exp_path ./data/exp.mp4 \
    --model_path ./checkpoints/dpe.pt \
    --face both \
    --output_folder ./res

The resulting video (edit.mp4) is very blurry. What could be the cause of this?

SadTalker + DPE produces poor results

I used a video generated by SadTalker as the source video and data/s.mp4 as the driving video for pose driving. Compared with the provided demo, the result is noticeably worse. Is there any intermediate processing I need to do?

The first video is the output of SadTalker; the second is the result of DPE pose driving.

506d87485268ab3249a5c3af6d8d86ed_512.chinese_poem2_enhanced.mp4

edit

training code issue

Sorry, I didn't see the expression loss (Eq. 10 in the paper) implemented in the training code.

face deformation

Hello,
Thanks for your great work. I tried to perform pose transfer with the command
python run_demo.py --s_path video.mp4 --d_path stable3.mp4 --model_path .\checkpoints\dpe.pt --face pose
but the result is quite weird (see attachment). Is there any setting to improve the result?

https://github.com/OpenTalker/DPE/assets/109195411/1fc0ee68-bcf0-4450-a78a-24c7a49d03d6

The video above is down for some reason; please see https://www.bilibili.com/video/BV1844y1F7v1/?vd_source=68fd0a3864408b733915dd2c8b2676f7 instead.

License clarification, please?

We are considering, and have prototyped, the use of DPE as part of a video editing pipeline in a potential commercial project, and are excited about its results. However, the license is not clear to us. The GitHub page says the project has an MIT license, as included in the source, yet a note in the README specifies research and non-commercial use only. Could you clarify whether commercial use is OK?

source video question

Hi, thanks for this great project.

How did you generate the first source video on the homepage? (this one)

Was it from SadTalker?

thanks

paste pose

Hi, I really appreciate your work; it's very interesting. However, when I tried demo_paste and examined its contents, I found that the --face argument was set to the default value of "exp". Since I wanted to transfer the pose from a driving video, I changed it to "pose", but the masking result wasn't accurate. Can you help me with this?

On line 327 of run_demo_paste.py I changed this:

output_dict = self.gen(img_source, img_target, 'exp')

into this:

output_dict = self.gen(img_source, img_target, 'pose')

and this is the result
vlcsnap-2023-06-14-11h41m18s336

Hello, I would like to request data

Dear author, I am a graduate student researching pose transfer. Your article is of great help to me, and I would like to reproduce your experiments as a basis for further research. The difficulty I have encountered is that I have not been able to obtain your data. Could you send me your VoxCeleb data and the processed version so that I can continue my research?


Gradio

When will you release the Gradio demo?

Different from LIA

In LIA, the output video contains the same proportion of body and head as the source image. In DPE, however, the proportion of body and head in the output video appears to be determined by the driving video, similar to a cropping process.

Is this caused by the pre-trained model? Can I control the cropping percentage?
