
LIA's Introduction

Latent Image Animator: Learning to Animate Images via Latent Space Navigation

Yaohui Wang, Di Yang, François Brémond, Antitza Dantcheva

This is the official PyTorch implementation of the ICLR 2022 paper "Latent Image Animator: Learning to Animate Images via Latent Space Navigation"


Abstract: Due to the remarkable progress of deep generative models, animating images has become increasingly efficient, whereas associated results have become increasingly realistic. Current animation approaches commonly exploit structure representation extracted from driving videos. Such structure representation is instrumental in transferring motion from driving videos to still images. However, such approaches fail in case the source image and driving video encompass large appearance variation. Moreover, the extraction of structure information requires additional modules that endow the animation model with increased complexity. Deviating from such models, we here introduce the Latent Image Animator (LIA), a self-supervised autoencoder that evades the need for structure representation. LIA is streamlined to animate images by linear navigation in the latent space. Specifically, motion in the generated video is constructed by linear displacement of codes in the latent space. Towards this, we learn a set of orthogonal motion directions simultaneously and use their linear combination to represent any displacement in the latent space. Extensive quantitative and qualitative analysis suggests that our model systematically and significantly outperforms state-of-the-art methods on the VoxCeleb, Taichi and TED-talk datasets w.r.t. generated quality.

BibTex

@inproceedings{wang2022latent,
  title={Latent Image Animator: Learning to Animate Images via Latent Space Navigation},
  author={Yaohui Wang and Di Yang and Francois Bremond and Antitza Dantcheva},
  booktitle={International Conference on Learning Representations},
  year={2022}
}

Requirements

  • Python 3.7
  • PyTorch 1.5+
  • tensorboard
  • moviepy
  • av
  • tqdm
  • lpips

1. Animation demo

Download the pre-trained checkpoints from here and put the models under ./checkpoints. We have provided several demo source images and driving videos in ./data. To obtain demos, run the following commands; generated results will be saved under ./res.

python run_demo.py --model vox --source_path ./data/vox/macron.png --driving_path ./data/vox/driving1.mp4 # using vox model
python run_demo.py --model taichi --source_path ./data/taichi/subject1.png --driving_path ./data/taichi/driving1.mp4 # using taichi model
python run_demo.py --model ted --source_path ./data/ted/subject1.png --driving_path ./data/ted/driving1.mp4 # using ted model

If you would like to use your own image and video, specify <SOURCE_PATH> (source image), <DRIVING_PATH> (driving video) and <DATASET>, then run

python run_demo.py --model <DATASET> --source_path <SOURCE_PATH> --driving_path <DRIVING_PATH>

2. Datasets

Please follow the instructions in FOMM and MRAA to download and preprocess the VoxCeleb, Taichi and TED datasets. Put the datasets under ./datasets and organize them as follows:

Vox (Taichi, Ted)

Video Dataset (vox, taichi, ted)
|-- train
    |-- video1
        |-- frame1.png
        |-- frame2.png
        |-- ...
    |-- video2
        |-- frame1.png
        |-- frame2.png
        |-- ...
    |-- ...
|-- test
    |-- video1
        |-- frame1.png
        |-- frame2.png
        |-- ...
    |-- video2
        |-- frame1.png
        |-- frame2.png
        |-- ...
    |-- ...
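
For reference, a frame-folder layout like the one above could be read with a small PyTorch dataset along the following lines. This is only an illustrative sketch; the class name and sampling logic are hypothetical and not part of the LIA codebase.

import os
from PIL import Image
from torch.utils.data import Dataset

class FrameFolderDataset(Dataset):
    # Hypothetical reader for the train/test frame-folder layout shown above.
    def __init__(self, root, split='train', transform=None):
        split_dir = os.path.join(root, split)
        self.video_dirs = sorted(os.path.join(split_dir, v) for v in os.listdir(split_dir))
        self.transform = transform

    def __len__(self):
        return len(self.video_dirs)

    def __getitem__(self, idx):
        # Return all frames of one video, sorted by filename.
        frame_paths = sorted(os.path.join(self.video_dirs[idx], f)
                             for f in os.listdir(self.video_dirs[idx])
                             if f.endswith('.png'))
        frames = [Image.open(p).convert('RGB') for p in frame_paths]
        if self.transform is not None:
            frames = [self.transform(f) for f in frames]
        return frames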

3. Training

By default, we use DistributedDataParallel on 8 V100 GPUs for all datasets. To train the network, run

python train.py --dataset <DATASET> --exp_path <EXP_PATH> --exp_name <EXP_NAME>

The dataset options are <DATASET>: {vox, taichi, ted}. Tensorboard logs and checkpoints will be saved in <EXP_PATH>/<EXP_NAME>/log and <EXP_PATH>/<EXP_NAME>/checkpoints respectively.
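
For example, a VoxCeleb run could look like this (the experiment path and name below are placeholders):

python train.py --dataset vox --exp_path ./exps --exp_name lia_vox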

To train from a checkpoint, run

python train.py --dataset <DATASET> --exp_path <EXP_PATH> --exp_name <EXP_NAME> --resume_ckpt <CHECKPOINT_PATH>

4. Evaluation

To obtain reconstruction and LPIPS results, put checkpoints under ./checkpoints and run

python evaluation.py --dataset <DATASET> --save_path <SAVE_PATH>

Generated videos will be saved under <SAVE_PATH>. For other evaluation metrics, we use the code from here.
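
For orientation, LPIPS between reconstructed and ground-truth frames can be computed with the lpips package listed in the requirements, roughly as sketched below; the exact protocol used by evaluation.py may differ.

import lpips
import torch

loss_fn = lpips.LPIPS(net='alex')  # AlexNet-based perceptual metric

def video_lpips(recon_frames, gt_frames):
    # recon_frames, gt_frames: [T, 3, H, W] tensors scaled to [-1, 1]
    with torch.no_grad():
        scores = [loss_fn(r.unsqueeze(0), g.unsqueeze(0)).item()
                  for r, g in zip(recon_frames, gt_frames)]
    return sum(scores) / len(scores)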

5. Linear manipulation

To obtain linear manipulation results of a single image, run

python linear_manipulation.py --model <DATASET> --img_path <IMAGE_PATH> --save_folder <RESULTS_PATH>

By default, results will be saved under ./res_manipulation.
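
Conceptually, the manipulation shifts a source latent code along the learned orthogonal motion directions. The sketch below only illustrates this idea; the variable names are illustrative and do not correspond to the actual code in linear_manipulation.py.

import torch

def navigate(z_source, directions, alphas):
    # z_source:   [1, dim] latent code of the source image
    # directions: [m, dim] learned orthogonal motion directions
    # alphas:     [m] magnitudes along each direction
    displacement = (alphas.unsqueeze(1) * directions).sum(dim=0, keepdim=True)
    return z_source + displacement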

Acknowledgement

Part of the code is adapted from FOMM and MRAA. We thank the authors for their contributions to the community.

LIA's People

Contributors

chenxwh, wyhsirius

LIA's Issues

Question about the motion networks

Hello,
I really appreciate your great work.

I have a question about the implementation of the encoder's motion network blocks:
https://github.com/wyhsirius/LIA/blob/main/networks/encoder.py#L252

Here, each EqualLinear block of the motion network does not have any activation function.
In this case, the motion network has no non-linearity, so mathematically it is equivalent to a single large linear layer.

Is this intended?
If so, did you observe any significant difference between using activation functions in the linear blocks and simply using a single linear layer?

Additionally, the stacked MLP blocks of StyleGAN2 do contain activation functions:
https://github.com/rosinality/stylegan2-pytorch/blob/master/model.py#L412
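
For illustration of the point above: without activation functions, stacked linear layers do collapse to a single linear map, as the small check below shows (the sizes are arbitrary, not LIA's actual configuration).

import torch
import torch.nn as nn

x = torch.randn(4, 512)
l1 = nn.Linear(512, 512, bias=False)
l2 = nn.Linear(512, 512, bias=False)
stacked = l2(l1(x))

# A single linear layer with the composed weight gives the same output.
merged = nn.Linear(512, 512, bias=False)
with torch.no_grad():
    merged.weight.copy_(l2.weight @ l1.weight)
assert torch.allclose(stacked, merged(x), atol=1e-4)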

About FID value

I evaluated your model using video FID, following the same implementation you mention in the paper. The FID of same-identity reconstruction is 6.8586, compared with 0.161 for cross-video generation. As we know, cross-video generation is more difficult than same-identity reconstruction, so its FID value should be the higher one.

I'm confused about the above results and would like to know the details of your FID evaluation process.

inference with output of 512x512

Great work, thank you for publishing this. I was wondering how to get the high-resolution video outputs (512x512) that you include on the project page. Do I just need to set the size parameter to 512, instead of the default of 256 here?

Why use h_start?

img_recon = self.gen(self.img_source, img_target, h_start)

Thanks for the great work! Could you please explain why we need to use h_start, and why you set h_start=None for the TED data?

Problem about the contour of head

Hi, this is great work on face animation, but I found some artifacts along the contour of the head; it seems to lack a harmonization operation at the edges and contour.
Did you notice this as well?
Is it a problem caused by the network?
If so, are there any methods to fix it?

Training problems

Hey @wyhsirius,
I was training the model on 4 GPUs. Have you encountered the following problems:

  1. When I train directly from scratch, I can use batch_size=32 without any problem.
    [screenshot]

  2. However, when I resume training with --resume_ckpt, I get the error shown below and can only use a very small batch size to avoid the out-of-memory problem:
    [screenshot]

I would appreciate it if you could share some suggestions to solve this problem.

Best,

Awesome but...

Awesome, but where is the code? FOMM is still the leader despite being three years old!
Regards

Training Time cost

Hey @wyhsirius,

Very impressive work. May I ask how long it took to train the proposed method on each dataset?
I want to do some related research, so it would help me greatly to know this.

Best regards

PyAV is not installed?

I am stuck with the error "ImportError: PyAV is not installed, and is necessary for the video operations in torchvision." I have installed PyAV with "pip install av" but the error persists. I uninstalled and reinstalled it, and I replaced all the files inside the LIA folder with the ones from the git repository, but I don't know what else to do. Can anyone help?
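
A possible cause in notebook environments is that torchvision was imported in the same session before av was installed, or that the installed av version does not match the torchvision build; restarting the runtime after pip install av is worth trying. A quick check from a fresh Python process (just a suggestion):

import av
import torchvision

# If "pip install av" was run in this same session, restart the runtime first
# so that torchvision re-detects PyAV before run_demo.py is launched.
print("PyAV version:", av.__version__)
print("torchvision version:", torchvision.__version__)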

error PyAV av.error.FileNotFoundError: [Errno 2] No such file or directory

first error

!python run_demo.py --model vox --source_path /content/LIA/data/vox/241.jpg --driving_path /content/LIA/data/vox/faceexp2.mp4 --save_folder  /content/LIA/res # using vox model
==> loading model
==> loading data
Traceback (most recent call last):
  File "run_demo.py", line 110, in <module>
    demo = Demo(args)
  File "run_demo.py", line 72, in __init__
    self.vid_target, self.fps = vid_preprocessing(args.driving_path)
  File "run_demo.py", line 31, in vid_preprocessing
    vid_dict = torchvision.io.read_video(vid_path, pts_unit='sec')
  File "/usr/local/lib/python3.7/dist-packages/torchvision/io/video.py", line 273, in read_video
    _check_av_available()
  File "/usr/local/lib/python3.7/dist-packages/torchvision/io/video.py", line 42, in _check_av_available
    raise av
ImportError: PyAV is not installed, and is necessary for the video operations in torchvision.
See https://github.com/mikeboers/PyAV#installation for instructions on how to
install PyAV on your system.
!pip install av
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting av
  Downloading av-9.2.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (28.2 MB)
     |████████████████████████████████| 28.2 MB 1.2 MB/s 
Installing collected packages: av
Successfully installed av-9.2.0

second error

!python run_demo.py --model vox --source_path /content/LIA/data/vox/241.jpg --driving_path /content/LIA/data/vox/faceexp2.mp4 --save_folder  /content/LIA/res # using vox model
==> loading model
==> loading data
==> running
  0% 0/1273 [00:00<?, ?it/s]/content/LIA/networks/styledecoder.py:439: UserWarning: torch.qr is deprecated in favor of torch.linalg.qr and will be removed in a future PyTorch release.
The boolean parameter 'some' has been replaced with a string parameter 'mode'.
Q, R = torch.qr(A, some)
should be replaced with
Q, R = torch.linalg.qr(A, 'reduced' if some else 'complete') (Triggered internally at  ../aten/src/ATen/native/BatchLinearAlgebra.cpp:1980.)
  Q, R = torch.qr(weight)  # get eignvector, orthogonal [n1, n2, n3, n4]
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:4194: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
  "Default grid_sample and affine_grid behavior has changed "
100% 1273/1273 [01:12<00:00, 17.61it/s]
Traceback (most recent call last):
  File "run_demo.py", line 111, in <module>
    demo.run()
  File "run_demo.py", line 93, in run
    save_video(vid_target_recon, self.save_path, self.fps)
  File "run_demo.py", line 44, in save_video
    torchvision.io.write_video(save_path, vid[0], fps=fps)
  File "/usr/local/lib/python3.7/dist-packages/torchvision/io/video.py", line 135, in write_video
    container.mux(packet)
  File "av/container/output.pyx", line 211, in av.container.output.OutputContainer.mux
  File "av/container/output.pyx", line 217, in av.container.output.OutputContainer.mux_one
  File "av/container/output.pyx", line 172, in av.container.output.OutputContainer.start_encoding
  File "av/error.pyx", line 336, in av.error.err_check
av.error.FileNotFoundError: [Errno 2] No such file or directory
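
The second traceback fails inside torchvision.io.write_video, which suggests the directory passed via --save_folder did not exist when the output file was opened; creating it beforehand is a likely fix. A minimal sketch, assuming the same path as in the command above:

import os

# Create the output folder passed via --save_folder before running the demo.
os.makedirs("/content/LIA/res", exist_ok=True)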

Offer to chip in for training VOX 512x512 !!

Hi all!
LIA is a really cool project and currently one of the best that gives quality animation. Great work!

My suggestion is the following.

  1. Find those willing to chip in for model training
  2. Find someone who can competently train the model, for example, on 8xV100 (with correct HQ datasets)
  3. Estimate the cost of training
    For example, the user @leg0m4n estimated the cost at about $1000 for the 256x256 model.

In the paper, the authors say they trained for approx. 6 days on 8xV100. The V100 being a predecessor of the 3090, and a 3090 on vast.ai costing around $2.5 per hour, I'd assume the training cost is around $1k.
Original post: #5 (comment)

  1. Investing: I am ready to invest $100 ^_^
  2. Select 3 trusted sponsors and create a multisig wallet.
    For example, I really liked this one: https://app.safe.global/
    Choose the BNB blockchain and USDT as the investment currency.
  3. Train the 512x512 vox model and share it with the participants.
  4. Bingo!

LIA 512 blurry output

I tried to run inference with the LIA 512 model (found on Google Drive), but the output is not that sharp; it looks only slightly sharper than the 256 model. Is that normal?
Do I have to change any parameter other than size?

Why do we need to repeat the latent variable 14 times?

Thanks for your excellent work. I have a question.

Why do we need to repeat the latent variable 14 times (latent = latent.reshape((latent.shape[0], -1)).unsqueeze(1).repeat(1, inject_index, 1))? Why not use the (1, 512) latent variable directly as the input for all subsequent networks? I think the repetition does not add new information here. Could you explain the reason?

Thanks.
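
For context, StyleGAN2-style decoders take one style input per synthesis layer, so a single code is tiled into one copy per layer; the repeat matches that per-layer interface rather than adding information. A shape-only sketch, with 14 standing in for inject_index:

import torch

latent = torch.randn(8, 512)   # [batch, dim]
inject_index = 14              # number of per-layer style inputs in the decoder
latent = latent.reshape((latent.shape[0], -1)).unsqueeze(1).repeat(1, inject_index, 1)
print(latent.shape)            # torch.Size([8, 14, 512])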

Training vox from scratch issues

Everything seems fine except the eye movement. It seems my model doesn't correctly capture eye movement from the driving videos. It has now run for 300k steps; should I wait for more steps? Are there any other parameter-setting tricks?

about architecture

Hi, thanks for the nice work. I have some questions about the architecture.

  1. In Fig. 8, the toRGB layer of shape 4 x 4, 512 is not used (refer to the red box in the image below), right? It is implemented as "self.to_rgb1 = ToRGB(...)" in styledecoder.py, but it is not used in the forward pass or the backward pass (this yields an unused-parameter error when training). I'm not sure this is the intended implementation. I think this layer is important because it is the starting layer and the only un-warped feature, which may help to preserve identity.

[screenshot]

  2. Did you use NoiseInjection in the synthesis network? Its parameters are assigned when initializing NoiseInjection, but it is not used in the model.

If I missed something, please feel free to tell me. Anyway, I am enjoying LIA. Thanks.
