
dpe's Introduction

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing

            Open In Colab


1 MAIS & NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, China   2 School of Artificial Intelligence, University of Chinese Academy of Sciences   3 Tencent AI Lab, Shenzhen, China

CVPR 2023


🔥 Demo

  • 🔥 Video editing: single source video & a driving video & a piece of audio. We transfer pose from the driving video and expression from the audio with the help of SadTalker.
Source video: full_s.mp4 → Result: dpe.mp4
  • 🔥 Video editing: single source image & a driving video & a piece of audio. We transfer pose from the driving video and expression from the audio with the help of SadTalker.

demo4_1.mp4
demo5_1.mp4

  • 🔥 Video editing: single source image & two driving videos. We transfer pose from the first video and expression from the second video. Some videos are selected from here.


📋 Changelog

  • 2023.07.21 Release code for one-shot driving.
  • 2023.05.26 Release code for training.
  • 2023.05.06 Support Enhancement.
  • 2023.05.05 Support Video editing.
  • 2023.04.30 Add some demos.
  • 2023.03.18 Support Pose driving, Expression driving, and Pose and Expression driving.
  • 2023.03.18 Upload the pre-trained model, which is fine-tuned for the expression generator.
  • 2023.03.03 Release the test code!
  • 2023.02.28 DPE has been accepted by CVPR 2023!

🚧 TODO

  • Test code for video driving.
  • Some demos.
  • Gradio/Colab Demo.
  • Training code for each component.
  • Test code for video editing.
  • Test code for one-shot driving.
  • Integrate audio driven methods for video editing.
  • Integrate GFPGAN for face enhancement.

🔮 Inference

Dependence Installation

git clone https://github.com/Carlyx/DPE
cd DPE 
conda create -n dpe python=3.8
source activate dpe
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
### install GFPGAN for the enhancer
pip install git+https://github.com/TencentARC/GFPGAN
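
Before downloading the models, it can be worth confirming that the CUDA build of PyTorch installed correctly. A minimal check (a hypothetical helper, not part of the repo):

# check_env.py -- hypothetical sanity check, not part of the repo
import torch
print(torch.__version__)           # expect 1.12.1+cu113
print(torch.cuda.is_available())   # should be True on a working CUDA setup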

Trained Models


Please download our pre-trained model and put it in ./checkpoints.

Model                 Description
checkpoints/dpe.pt    Pre-trained model (V1)
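
To sanity-check the download, the checkpoint can be loaded on CPU and its top-level keys listed. The key names inside the file are not documented here, so treat this as a minimal sketch:

# inspect_ckpt.py -- hypothetical helper for verifying the checkpoint download
import torch

ckpt = torch.load('./checkpoints/dpe.pt', map_location='cpu')
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # top-level entries, e.g. model/optimizer state dicts
else:
    print(type(ckpt))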

Expression driving

python run_demo.py --s_path ./data/s.mp4 \
    --d_path ./data/d.mp4 \
    --model_path ./checkpoints/dpe.pt \
    --face exp \
    --output_folder ./res

Pose driving

python run_demo.py --s_path ./data/s.mp4 \
    --d_path ./data/d.mp4 \
    --model_path ./checkpoints/dpe.pt \
    --face pose \
    --output_folder ./res

Expression and pose driving

Video driving:

python run_demo.py --s_path ./data/s.mp4 \
    --d_path ./data/d.mp4 \
    --model_path ./checkpoints/dpe.pt \
    --face both \
    --output_folder ./res

One-shot driving:

python run_demo_single.py --s_path ./data/s.jpg \
    --pose_path ./data/pose.mp4 \
    --exp_path ./data/exp.mp4 \
    --model_path ./checkpoints/dpe.pt \
    --face both \
    --output_folder ./res
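
The video-driving commands above differ only in the --face flag, so a small wrapper (hypothetical, not part of the repo) can render all three variants for side-by-side comparison:

# run_all_modes.py -- hypothetical convenience wrapper around run_demo.py
import subprocess

for face in ('exp', 'pose', 'both'):
    subprocess.run([
        'python', 'run_demo.py',
        '--s_path', './data/s.mp4',
        '--d_path', './data/d.mp4',
        '--model_path', './checkpoints/dpe.pt',
        '--face', face,
        '--output_folder', f'./res_{face}',   # one output folder per mode
    ], check=True)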

Crop full video

python crop_video.py

Video editing

Before video editing, you should run python crop_video.py to process the input full video. For the pre-trained segmentation model, you can download it from here and put it in ./checkpoints.

(Optional) You can run git clone https://github.com/TencentARC/GFPGAN, download the pre-trained enhancement model from here, and put it in ./checkpoints. Then pass --EN to improve the result; see the sketch after the command below.

python run_demo_paste.py --s_path <cropped source video> \
  --d_path <driving video> \
  --box_path <txt after running crop_video.py> \
  --model_path ./checkpoints/dpe.pt \
  --face exp \
  --output_folder ./res \
  --EN 
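
If you want to apply GFPGAN outside of the --EN flag, e.g. to enhance individual frames of an already rendered result, a minimal sketch using GFPGAN's public API looks like this. The checkpoint filename GFPGANv1.3.pth is an assumption; use whichever enhancement model you downloaded:

# enhance_frame.py -- standalone GFPGAN sketch; the --EN flag enables similar enhancement inside DPE
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path='./checkpoints/GFPGANv1.3.pth',  # assumed filename
    upscale=1, arch='clean', channel_multiplier=2, bg_upsampler=None)

img = cv2.imread('frame.png')  # BGR frame extracted from the result video
_, _, restored = restorer.enhance(img, has_aligned=False, paste_back=True)
cv2.imwrite('frame_enhanced.png', restored)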

Video editing for audio driving

  TODO

🔮 Training

  • Data preprocessing.

To train DPE, please follow video-preprocessing to download and pre-process the VoxCelebA dataset. We use lmdb to improve I/O efficiency. (Alternatively, you can rewrite the VoxDataset class in dataset.py to load .mp4 files directly; see the sketch below.)
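
A minimal sketch of such a dataset, assuming one .mp4 per clip and the usual source/driving frame-pair sampling; the actual VoxDataset interface in dataset.py may differ:

# dataset_mp4.py -- hypothetical replacement for the lmdb-backed VoxDataset
import glob, random
import cv2
import torch
from torch.utils.data import Dataset

class VoxMp4Dataset(Dataset):
    def __init__(self, data_root, size=256):
        self.paths = sorted(glob.glob(f'{data_root}/**/*.mp4', recursive=True))
        self.size = size

    def __len__(self):
        return len(self.paths)

    def _read_frame(self, cap, idx):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        _, frame = cap.read()
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame = cv2.resize(frame, (self.size, self.size))
        # HWC uint8 -> CHW float in [-1, 1]
        return torch.from_numpy(frame).permute(2, 0, 1).float() / 127.5 - 1.0

    def __getitem__(self, i):
        cap = cv2.VideoCapture(self.paths[i])
        n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # assumes each clip has >= 2 frames
        src_idx, drv_idx = random.sample(range(n), 2)  # two distinct frames of one clip
        src, drv = self._read_frame(cap, src_idx), self._read_frame(cap, drv_idx)
        cap.release()
        return src, drv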

  • Train DPE from scratch:
python train.py --data_root <DATA_PATH>
  • (Optional) If you want to accelerate convergence, you can download the pre-trained model of LIA and rename it to vox.pt.
python train.py --data_root <DATA_PATH> --resume_ckpt <model_path for vox.pt>

🛎 Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Pang_2023_CVPR,
    author    = {Pang, Youxin and Zhang, Yong and Quan, Weize and Fan, Yanbo and Cun, Xiaodong and Shan, Ying and Yan, Dong-Ming},
    title     = {DPE: Disentanglement of Pose and Expression for General Video Portrait Editing},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {427-436}
}

💗 Acknowledgements

Part of the code is adapted from LIA, PIRenderer, and STIT. We thank the authors for their contributions to the community.

🥂 Related Works

📢 Disclaimer

This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.

dpe's People

Contributors

carlyx


dpe's Issues

video is very blurry

Hi, I ran the command:

python run_demo_single.py --s_path ./data/s.jpg \
    --pose_path ./data/pose.mp4 \
    --exp_path ./data/exp.mp4 \
    --model_path ./checkpoints/dpe.pt \
    --face both \
    --output_folder ./res

The resulting video (edit.mp4) is very blurry. What could be the cause of this?

SadTalker + DPE produces poor results

I used a video generated by SadTalker as the source video and data/s.mp4 as the driving video for pose driving. Compared with the provided demo, the result is noticeably worse. Is there any intermediate processing I need to do?

The first video is the output of SadTalker; the second is the result of DPE pose driving.

506d87485268ab3249a5c3af6d8d86ed_512.chinese_poem2_enhanced.mp4

edit

training code issue

Sorry, I didn't see the expression loss (Eq. 10 in the paper) implemented in the training code.

face deformation

Hello,
Thanks for your great work. I tried to perform pose transfer with the command
python run_demo.py --s_path video.mp4 --d_path stable3.mp4 --model_path .\checkpoints\dpe.pt --face pose
but the result is quite weird (see attachment). Is there any setting to improve the result?

https://github.com/OpenTalker/DPE/assets/109195411/1fc0ee68-bcf0-4450-a78a-24c7a49d03d6

The video above is down for some reason; please see https://www.bilibili.com/video/BV1844y1F7v1/?vd_source=68fd0a3864408b733915dd2c8b2676f7 instead.

License clarification, please?

We are considering, and have prototyped, the use of DPE as part of a video editing pipeline in a potential commercial project, and are excited about its results. However, the license is not clear to us. The GitHub page says the project has an MIT license, as included in the source, yet a note in the README specifies research and non-commercial use only. Could you clarify whether commercial use is OK?

source video question

Hi, thanks for this great project.

How did you generate the first source video on the homepage? (this one)

Was it from SadTalker?

thanks

paste pose

Hi, I really appreciate your work; it's very interesting. However, when I tried demo_paste and examined its contents, I found that the --face argument was set to the default value of "exp". Since I wanted to transfer the pose from a driving video, I changed it to "pose", but the masking result wasn't accurate. Can you help me with this?

On line 327 of run_demo_paste.py I changed this:

output_dict = self.gen(img_source, img_target, 'exp')

into this:

output_dict = self.gen(img_source, img_target, 'pose')

and this is the result
vlcsnap-2023-06-14-11h41m18s336

Hello, I would like to request data

Dear author, I am a graduate student researching pose transfer. Your article is of great help to me, and I would like to reproduce your experiments as a basis for further research. The difficulty I have encountered is that I have not been able to obtain your data. Could you send me your VoxCeleb data and the processed version so that I can continue my research?


Gradio

When will you release the Gradio demo?

Different from LIA

In LIA, the output video contains the same proportion of body and head as the source image. In DPE, however, the proportion of body and head in the output video appears to be determined by the driving video, similar to a cropping process.

Is this caused by the pre-trained model? Can I control the cropping percentage?
