wyhsirius / LIA
[ICLR 22] Latent Image Animator: Learning to Animate Images via Latent Space Navigation
Home Page: https://wyhsirius.github.io/LIA-project/
License: Other
Great work, thank you for publishing this. I was wondering how to generate the high-resolution (512x512) video outputs that you include on the project page. Do I just need to set the size parameter to 512 instead of the default of 256 here?
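For reference, a hedged example of the invocation I have in mind, assuming run_demo.py exposes a --size flag and a 512-trained checkpoint is loaded (the flag name and paths are my guesses; please check the argparse definitions in run_demo.py):

python run_demo.py --model vox --size 512 --source_path ./source.jpg --driving_path ./driving.mp4 --save_folder ./res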
Thanks for your excellent work. I have a question.
Why do we need to repeat the latent variable 14 times (latent = latent.reshape((latent.shape[0], -1)).unsqueeze(1).repeat(1, inject_index, 1))? Why not feed the (1, 512) latent directly into all subsequent networks? Repeating it adds no new information. Could you explain the reasoning here?
Thanks.
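To make the shapes concrete, a minimal sketch of what that line does (inject_index = 14 matches the number of style-injection points in a 256x256 StyleGAN2-style decoder; this is an illustration, not the repo's code):

import torch

latent = torch.randn(1, 512)  # a single latent code
inject_index = 14             # one style input per decoder layer
# broadcast the same code to every injection layer (the W+ interface):
w_plus = latent.reshape(latent.shape[0], -1).unsqueeze(1).repeat(1, inject_index, 1)
print(w_plus.shape)           # torch.Size([1, 14, 512])

The repeat indeed adds no information; it only matches the W+ interface, where each decoder layer could in principle receive a different 512-d code.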
Hey @wyhsirius,
thanks for the nice work. May I ask whether you have tried training the proposed LIA on TED at size 384 * 384?
Everything seems fine except the eye movement: my model doesn't correctly capture eye movement from the driving videos. It is now at 300k steps; should I wait for more steps? Are there any other parameter-setting tricks?
Hello,
I really appreciate your great work.
I have a question about the implementation of the encoder's motion network blocks:
https://github.com/wyhsirius/LIA/blob/main/networks/encoder.py#L252
Here, the EqualLinear blocks of the motion network have no activation function.
In that case, the motion network has no nonlinearity, so mathematically it is equivalent to a single (larger) linear layer.
Is this intended?
If so, have you observed any significant difference between adding an activation to each linear block and simply using a single linear layer?
For comparison, the stacked MLP blocks in StyleGAN2 do contain activation functions:
https://github.com/rosinality/stylegan2-pytorch/blob/master/model.py#L412
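To make the observation concrete, a quick self-contained check (generic PyTorch, not the repo's EqualLinear, which only adds a constant weight scaling and so is still linear) that stacked activation-free linear layers collapse into one:

import torch
import torch.nn as nn

torch.manual_seed(0)
f = nn.Sequential(nn.Linear(512, 512), nn.Linear(512, 512))  # no activation between layers
# compose the two affine maps into a single weight and bias
W = f[1].weight @ f[0].weight
b = f[1].weight @ f[0].bias + f[1].bias
x = torch.randn(4, 512)
print(torch.allclose(f(x), x @ W.T + b, atol=1e-5))  # True: equivalent to one linear layer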
I tried to run inference with the LIA 512 model (found on Google Drive), but the output is not that sharp; it looks only slightly sharper than the 256 model. Is that normal?
Do I have to change any parameter other than size?
I am stuck on this error: "ImportError: PyAV is not installed, and is necessary for the video operations in torchvision." I have installed PyAV with "pip install av", but the error persists. I uninstalled and reinstalled it, and I replaced all the files inside the LIA folder with the ones from the git repo, but I don't know what else to do. Can anyone help?
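One thing worth checking (a hedged suggestion, since the cause here isn't confirmed): make sure av was installed into the same interpreter that runs the demo; in notebooks it is easy to pip-install into one environment while executing in another, and a runtime restart is sometimes needed after installing:

import sys
print(sys.executable)   # which Python is actually running
import av               # should import cleanly if the install landed in this environment
print(av.__version__)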
Recently I have been trying to train LIA at a higher resolution (512*512). I have uploaded a checkpoint at https://huggingface.co/taocode/LIA_512, but the results seem to be very poor. Has anyone succeeded in training at a higher resolution based on this repo?
How can I control the intensity of the animation?
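In case it helps, a speculative sketch: the paper represents motion as a linear combination of learned directions with predicted magnitudes, so one plausible knob is to scale those magnitudes before decoding. The names below are illustrative, not the repo's actual API:

# hypothetical helper; `alpha` stands for the per-direction motion
# magnitudes predicted by the motion encoder (not a confirmed variable name)
def scale_motion(alpha, intensity=0.5):
    # intensity < 1 damps the animation, > 1 exaggerates it
    return alpha * intensity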
Hi, this is great work on face animation, but I found some artifacts along the contour of the head; it seems to lack a harmonization step around the edges and contours.
Have you noticed this as well?
Is it a problem caused by the network?
If so, are there any methods to fix it?
I evaluated your model with video FID, following the same implementation you mention in the paper. The FID for same-identity reconstruction comes out at 6.8586, versus the 0.161 you report for cross-video generation. Since cross-video generation is harder than same-identity reconstruction, its FID should be the higher of the two.
I'm confused by these results and would like to know the details of your FID evaluation process.
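For reference, this is the standard Fréchet distance I computed between Gaussians fitted to feature activations (a generic sketch of FID itself, not necessarily the authors' exact pipeline or feature extractor):

import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    # Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    # ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrtm(sigma1 @ sigma2))
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)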
First error:
!python run_demo.py --model vox --source_path /content/LIA/data/vox/241.jpg --driving_path /content/LIA/data/vox/faceexp2.mp4 --save_folder /content/LIA/res # using vox model
==> loading model
==> loading data
Traceback (most recent call last):
  File "run_demo.py", line 110, in <module>
    demo = Demo(args)
  File "run_demo.py", line 72, in __init__
    self.vid_target, self.fps = vid_preprocessing(args.driving_path)
  File "run_demo.py", line 31, in vid_preprocessing
    vid_dict = torchvision.io.read_video(vid_path, pts_unit='sec')
  File "/usr/local/lib/python3.7/dist-packages/torchvision/io/video.py", line 273, in read_video
    _check_av_available()
  File "/usr/local/lib/python3.7/dist-packages/torchvision/io/video.py", line 42, in _check_av_available
    raise av
ImportError: PyAV is not installed, and is necessary for the video operations in torchvision.
See https://github.com/mikeboers/PyAV#installation for instructions on how to
install PyAV on your system.
!pip install av
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting av
Downloading av-9.2.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (28.2 MB)
|████████████████████████████████| 28.2 MB 1.2 MB/s
Installing collected packages: av
Successfully installed av-9.2.0
Second error:
!python run_demo.py --model vox --source_path /content/LIA/data/vox/241.jpg --driving_path /content/LIA/data/vox/faceexp2.mp4 --save_folder /content/LIA/res # using vox model
==> loading model
==> loading data
==> running
0% 0/1273 [00:00<?, ?it/s]/content/LIA/networks/styledecoder.py:439: UserWarning: torch.qr is deprecated in favor of torch.linalg.qr and will be removed in a future PyTorch release.
The boolean parameter 'some' has been replaced with a string parameter 'mode'.
Q, R = torch.qr(A, some)
should be replaced with
Q, R = torch.linalg.qr(A, 'reduced' if some else 'complete') (Triggered internally at ../aten/src/ATen/native/BatchLinearAlgebra.cpp:1980.)
Q, R = torch.qr(weight) # get eignvector, orthogonal [n1, n2, n3, n4]
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:4194: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
"Default grid_sample and affine_grid behavior has changed "
100% 1273/1273 [01:12<00:00, 17.61it/s]
Traceback (most recent call last):
  File "run_demo.py", line 111, in <module>
    demo.run()
  File "run_demo.py", line 93, in run
    save_video(vid_target_recon, self.save_path, self.fps)
  File "run_demo.py", line 44, in save_video
    torchvision.io.write_video(save_path, vid[0], fps=fps)
  File "/usr/local/lib/python3.7/dist-packages/torchvision/io/video.py", line 135, in write_video
    container.mux(packet)
  File "av/container/output.pyx", line 211, in av.container.output.OutputContainer.mux
  File "av/container/output.pyx", line 217, in av.container.output.OutputContainer.mux_one
  File "av/container/output.pyx", line 172, in av.container.output.OutputContainer.start_encoding
  File "av/error.pyx", line 336, in av.error.err_check
av.error.FileNotFoundError: [Errno 2] No such file or directory
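The final av.error.FileNotFoundError most likely means the output directory passed as --save_folder does not exist, since torchvision.io.write_video will not create missing parent directories. A hedged workaround (using the save path from the command above) is to create it before running the demo:

import os
os.makedirs('/content/LIA/res', exist_ok=True)  # write_video needs the folder to exist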
Hey @wyhsirius,
Very impressive work. May I ask how long training the proposed method took on each dataset?
I want to do some related research, so knowing this would help me greatly.
Best regards
Awesome, but where is the code? FOMM is still the leader despite being 3 years old!
Regards
Hi all!
LIA is a really cool project and currently one of the best that gives quality animation. Great work!
My suggestion is the following.
In the paper, the authors say they trained for approximately 6 days on 8 V100s. The V100 being a predecessor of the 3090, and a 3090 on vast.ai costing around $2.5 per hour, I'd assume the training cost is around $1k.
Original post: #5 (comment)
Hi, I found that the activation in this sub-network is set to None. If it is None, self.fc amounts to a cascade of linear layers without activations, and is therefore equivalent to a single linear layer. Is that right?
Hey @wyhsirius,
I was training the model on 4 GPUs. Have you met the following problem?
When I train from scratch (step 0), I can use batch_size=32 without any problem. However, when I resume training with --resume_ckpt, I get the output below, and I have to use a very small batch size to avoid the out-of-memory problem.
I would appreciate it if you could share some suggestions for solving this.
Bests,
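A hedged guess at the cause, in case it helps: torch.load by default maps saved tensors back onto the GPU they were serialized from, which can leave a duplicate copy of the whole checkpoint in GPU memory on resume. Loading to CPU first is a common fix (a sketch only; the checkpoint key below is an assumption, not the repo's confirmed layout):

import torch

ckpt = torch.load(resume_path, map_location='cpu')  # avoid the GPU-side copy
model.load_state_dict(ckpt['model'])                # 'model' key is an assumption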
Hello,
When I try the demo here: https://replicate.com/wyhsirius/lia
It outputs this error:
mat1 and mat2 shapes cannot be multiplied (6656x6 and 512x512)
What should I do to fix this?
Line 90 in d120cb2
Thanks for the great work! Could you please explain why we need to use h_start, and why you set h_start=None for the TED data?
Hi, thanks for the nice work. I have some questions about the architecture.
If I missed something, please feel free to tell me. Anyway, I am enjoying LIA. Thanks.
Thanks for your great work.
Would you please share the 512x512 checkpoints? They would give better results.