TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

This is the official repository for our ECCV 2024 paper TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting.

Paper | Project | Video

Installation

Tested on Ubuntu 18.04, CUDA 11.3, PyTorch 1.12.1

git clone git@github.com:Fictionarry/TalkingGaussian.git --recursive

conda env create --file environment.yml
conda activate talking_gaussian
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install tensorflow-gpu==2.8.0

If you encounter installation problems with diff-gaussian-rasterization or gridencoder, please refer to gaussian-splatting and torch-ngp.
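Before debugging a failed extension build, it can help to confirm how far the active environment is from the tested one. The sketch below is not part of the repository: `parse_release` and `environment_report` are hypothetical helper names, and the only facts taken from this README are the tested versions (CUDA 11.3, PyTorch 1.12.1). A mismatch does not necessarily mean the build will fail.

```python
# Hypothetical helper, not part of TalkingGaussian.
TESTED_TORCH = (1, 12, 1)  # from "Tested on ... PyTorch 1.12.1"
TESTED_CUDA = (11, 3)      # from "Tested on ... CUDA 11.3"


def parse_release(version):
    """Turn '1.12.1+cu113' into (1, 12, 1); non-numeric parts are dropped."""
    release = version.split("+", 1)[0]
    return tuple(int(part) for part in release.split(".") if part.isdigit())


def environment_report(torch_version, cuda_version):
    """Return a list of differences from the tested setup (empty if none)."""
    notes = []
    if parse_release(torch_version) != TESTED_TORCH:
        notes.append(f"PyTorch {torch_version} differs from tested 1.12.1")
    if cuda_version is None:
        notes.append("this PyTorch build has no CUDA support")
    elif parse_release(cuda_version) != TESTED_CUDA:
        notes.append(f"CUDA {cuda_version} differs from tested 11.3")
    return notes


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        notes = environment_report(torch.__version__, torch.version.cuda)
        for note in notes or ["matches the tested setup"]:
            print(note)
```

Run it inside the activated talking_gaussian environment; it only reads version strings and does not touch the GPU.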

Preparation

  • Prepare the face-parsing model and the 3DMM model for head pose estimation.

    bash scripts/prepare.sh
  • Download 3DMM model from Basel Face Model 2009:

    # 1. copy 01_MorphableModel.mat to data_utils/face_tracking/3DMM/
    # 2. run the following
    cd data_utils/face_tracking
    python convert_BFM.py
  • Prepare the environment for EasyPortrait:

    # prepare mmcv
    conda activate talking_gaussian
    pip install -U openmim
    mim install mmcv-full==1.7.1
    
    # download model weight
    cd data_utils/easyportrait
    wget "https://n-ws-620xz-pd11.s3pd11.sbercloud.ru/b-ws-620xz-pd11-jux/easyportrait/experiments/models/fpn-fp-512.pth"

Usage

Important Notice

  • This code is provided for research purposes only. The author makes no warranties, express or implied, as to the accuracy, completeness, or fitness for a particular purpose of the code. Use this code at your own risk.

  • The author explicitly prohibits the use of this code for any malicious or illegal activities. By using this code, you agree to comply with all applicable laws and regulations, and you agree not to use it to harm others or to perform any actions that would be considered unethical or illegal.

  • The author will not be responsible for any damages, losses, or issues that arise from the use of this code.

  • Users are encouraged to use this code responsibly and ethically.

Video Dataset

Here we provide two video clips used in our experiments, which were captured from YouTube. Please respect the original content creators' rights and comply with YouTube’s copyright policies when using them.

The other videos we used can be found in GeneFace and AD-NeRF.

Pre-processing Training Video

  • Put training video under data/<ID>/<ID>.mp4.

    The video must be 25 FPS, and every frame must contain the talking person. The resolution should be about 512x512 and the duration about 1-5 min.

  • Run script to process the video.

    python data_utils/process.py data/<ID>/<ID>.mp4
  • Obtain Action Units

    Run FeatureExtraction in OpenFace, then rename the output CSV file and move it to data/<ID>/au.csv.

  • Generate tooth masks

    export PYTHONPATH=./data_utils/easyportrait 
    python ./data_utils/easyportrait/create_teeth_mask.py ./data/<ID>
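The video requirements listed above (25 FPS, about 512x512, about 1-5 min) can be checked before running the processing script. The following is a minimal sketch, not part of the repository: `check_video` is a hypothetical helper, and the 25% size tolerance is an assumption chosen here to interpret "about 512x512". It cannot verify that every frame contains the talking person.

```python
# Hypothetical pre-flight check, not part of TalkingGaussian.
def check_video(fps, width, height, duration_s, size_tolerance=0.25):
    """Return a list of problems with the clip metadata (empty if it looks fine).

    size_tolerance is an assumed reading of "about 512x512".
    """
    problems = []
    if abs(fps - 25.0) > 0.01:
        problems.append(f"frame rate is {fps}, expected 25 FPS")
    for name, value in (("width", width), ("height", height)):
        if abs(value - 512) > 512 * size_tolerance:
            problems.append(f"{name} is {value}, expected about 512")
    if not 60 <= duration_s <= 300:
        problems.append(f"duration is {duration_s:.0f}s, expected about 1-5 min")
    return problems


if __name__ == "__main__":
    # Metadata values are made up for illustration.
    for problem in check_video(fps=30, width=1920, height=1080, duration_s=20):
        print(problem)
```

The metadata values themselves have to come from a tool of your choice, for example a media prober; the sketch deliberately takes plain numbers so it stays independent of any particular tool.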

Audio Pre-process

In our paper, we use DeepSpeech features for evaluation.

  • DeepSpeech

    python data_utils/deepspeech_features/extract_ds_features.py --input data/<name>.wav # saved to data/<name>.npy
  • HuBERT

    As in ER-NeRF, HuBERT features are also supported. They are recommended when the audio is not in English.

    Specify --audio_extractor hubert when training and testing.

    python data_utils/hubert.py --wav data/<name>.wav # saved to data/<name>_hu.npy
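The two extractors write their features next to the input file under different names, as the comments in the commands above indicate. The sketch below only mirrors that naming so a script can locate the right .npy file. `feature_path` is a hypothetical helper, and the labels "deepspeech" and "hubert" are used here as keys of this helper only, not as confirmed command-line values.

```python
# Hypothetical helper, not part of TalkingGaussian.
from pathlib import Path


def feature_path(wav_path, extractor="deepspeech"):
    """Map data/<name>.wav to the feature file named in the README comments."""
    wav = Path(wav_path)
    if extractor == "deepspeech":
        return wav.with_suffix(".npy")              # data/<name>.npy
    if extractor == "hubert":
        return wav.with_name(wav.stem + "_hu.npy")  # data/<name>_hu.npy
    raise ValueError(f"unknown extractor: {extractor!r}")


if __name__ == "__main__":
    # "clip" is a placeholder name.
    print(feature_path("data/clip.wav"))
    print(feature_path("data/clip.wav", "hubert"))
```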
    

Train

# If resources are sufficient, parts of the training can run in parallel to speed it up. See the script.
bash scripts/train_xx.sh data/<ID> output/<project_name> <GPU_ID>

Test

# saved to output/<project_name>/test/ours_None/renders
python synthesize_fuse.py -S data/<ID> -M output/<project_name> --eval  

Inference with target audio

python synthesize_fuse.py -S data/<ID> -M output/<project_name> --use_train --audio <preprocessed_audio_feature>.npy
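The test and inference commands above differ only in a few flags, so a wrapper can assemble either one. This is a minimal sketch under that reading, not part of the repository: `synthesize_command` is a hypothetical helper, it uses only the flags that appear in this README (-S, -M, --eval, --use_train, --audio, --audio_extractor), and it builds the argument list without running anything.

```python
# Hypothetical helper, not part of TalkingGaussian.
import shlex


def synthesize_command(data_dir, model_dir, audio_feature=None, extractor=None):
    """Build the synthesize_fuse.py argument list for test or inference."""
    cmd = ["python", "synthesize_fuse.py", "-S", data_dir, "-M", model_dir]
    if audio_feature is None:
        cmd.append("--eval")  # test mode, as in the Test section
    else:
        cmd += ["--use_train", "--audio", audio_feature]  # target-audio inference
    if extractor is not None:
        cmd += ["--audio_extractor", extractor]  # e.g. hubert
    return cmd


if __name__ == "__main__":
    # Paths are placeholders for data/<ID> and output/<project_name>.
    print(shlex.join(synthesize_command("data/demo", "output/demo_run")))
    print(shlex.join(synthesize_command(
        "data/demo", "output/demo_run", "data/clip_hu.npy", extractor="hubert")))
```

The returned list can be passed to `subprocess.run` from the repository root once the paths point at real data.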

Citation

Consider citing as below if you find this repository helpful to your project:

@article{li2024talkinggaussian,
    title={TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting}, 
    author={Jiahe Li and Jiawei Zhang and Xiao Bai and Jin Zheng and Xin Ning and Jun Zhou and Lin Gu},
    journal={arXiv preprint arXiv:2404.15264},
    year={2024}
}

Acknowledgement

This code is built on gaussian-splatting with simple-knn and a modified diff-gaussian-rasterization. Parts of the code are from RAD-NeRF, DFRF, GeneFace, and AD-NeRF. The teeth mask comes from EasyPortrait. Thanks for these great projects!
