
ip_lap's Issues

Slight facial jitter

Thank you very much for this excellent work. After testing, I found that the face in the generated video jitters slightly. After splitting the video into frames, I found the difference between two consecutive generated frames is too large: in one frame the chin is very long, as if there were a double chin or the chin region partially overlapped, which causes the facial jitter. I have already tried adjusting the face-smoothing value and shrinking the face detection box, but neither helps. Do you have any ideas?
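One mitigation worth trying on top of the built-in smoothing is an explicit temporal filter over the detected landmarks before they are handed to the renderer. A minimal sketch, assuming the landmarks come as per-frame (68, 2) arrays; the function name and alpha value are illustrative, not from IP_LAP:

    import numpy as np

    def smooth_landmarks(per_frame_landmarks, alpha=0.6):
        # Exponential moving average across frames; lower alpha means
        # stronger smoothing, at the cost of lagging fast mouth motion.
        smoothed = [np.asarray(per_frame_landmarks[0], dtype=np.float64)]
        for lm in per_frame_landmarks[1:]:
            smoothed.append(alpha * np.asarray(lm, dtype=np.float64)
                            + (1 - alpha) * smoothed[-1])
        return smoothed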

error loading trained checkpoint

After training the landmark and render models, running inference_single.py fails while loading the checkpoint: the state dict is missing some keys. The log is as follows:

landmark_generator_model loaded from : checkpoints/landmark_generation/Pro_landmarkT5_d512_fe1024_lay4_head4/landmarkT5_d512_fe1024_lay4_head4_epoch_2020_checkpoint_step000012120.pth
renderer loaded from : checkpoints/renderer/Pro_renderer_T1_ref_N3/renderer_T1_ref_N3_epoch_7000_checkpoint_step000042000.pth
Load checkpoint from: checkpoints/landmark_generation/Pro_landmarkT5_d512_fe1024_lay4_head4/landmarkT5_d512_fe1024_lay4_head4_epoch_2020_checkpoint_step000012120.pth
--local/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
--local/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG19_Weights.IMAGENET1K_V1. You can also use weights=VGG19_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)

Perceptual loss:
Mode: vgg19
Load checkpoint from: checkpoints/renderer/Pro_renderer_T1_ref_N3/renderer_T1_ref_N3_epoch_7000_checkpoint_step000042000.pth
Traceback (most recent call last):
File "IP_LAP/inference_single.py", line 194, in
renderer = load_model(model=Renderer(), path=renderer_checkpoint_path)
File "IP_LAP/inference_single.py", line 173, in load_model
model.load_state_dict(new_s)
File "local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Renderer:
Missing key(s) in state_dict: "flow_module.conv1.weight", "flow_module.conv1.bias", "flow_module.conv1_bn.weight", "flow_module.conv1_bn.bias", "flow_module.conv1_bn.running_mean", "flow_module.conv1_bn.running_var", "flow_module.conv2.weight", "flow_module.conv2.bias", "flow_module.conv2_bn.weight", "flow_module.conv2_bn.bias", "flow_module.conv2_bn.running_mean", "flow_module.conv2_bn.running_var", "flow_module.spade_layer_1.conv_1.weight", "flow_module.spade_layer_1.conv_1.bias", "flow_module.spade_layer_1.conv_2.weight", "flow_module.spade_layer_1.conv_2.bias", "flow_module.spade_layer_1.spade_layer_1.conv1.weight", "flow_module.spade_layer_1.spade_layer_1.conv1.bias", "flow_module.spade_layer_1.spade_layer_1.gamma.weight", "flow_module.spade_layer_1.spade_layer_1.gamma.bias", "flow_module.spade_layer_1.spade_layer_1.beta.weight", "flow_module.spade_layer_1.spade_layer_1.beta.bias", "flow_module.spade_layer_1.spade_layer_2.conv1.weight", "flow_module.spade_layer_1.spade_layer_2.conv1.bias", "flow_module.spade_layer_1.spade_layer_2.gamma.weight", "flow_module.spade_layer_1.spade_layer_2.gamma.bias", "flow_module.spade_layer_1.spade_layer_2.beta.weight", "flow_module.spade_layer_1.spade_layer_2.beta.bias", "flow_module.spade_layer_2.conv_1.weight", "flow_module.spade_layer_2.conv_1.bias", "flow_module.spade_layer_2.conv_2.weight", "flow_module.spade_layer_2.conv_2.bias", "flow_module.spade_layer_2.spade_layer_1.conv1.weight", "flow_module.spade_layer_2.spade_layer_1.conv1.bias", "flow_module.spade_layer_2.spade_layer_1.gamma.weight", "flow_module.spade_layer_2.spade_layer_1.gamma.bias", "flow_module.spade_layer_2.spade_layer_1.beta.weight", "flow_module.spade_layer_2.spade_layer_1.beta.bias", "flow_module.spade_layer_2.spade_layer_2.conv1.weight", "flow_module.spade_layer_2.spade_layer_2.conv1.bias", "flow_module.spade_layer_2.spade_layer_2.gamma.weight", "flow_module.spade_layer_2.spade_layer_2.gamma.bias", "flow_module.spade_layer_2.spade_layer_2.beta.weight", "flow_module.spade_layer_2.spade_layer_2.beta.bias", "flow_module.spade_layer_4.conv_1.weight", "flow_module.spade_layer_4.conv_1.bias", "flow_module.spade_layer_4.conv_2.weight", "flow_module.spade_layer_4.conv_2.bias", "flow_module.spade_layer_4.spade_layer_1.conv1.weight", "flow_module.spade_layer_4.spade_layer_1.conv1.bias", "flow_module.spade_layer_4.spade_layer_1.gamma.weight", "flow_module.spade_layer_4.spade_layer_1.gamma.bias", "flow_module.spade_layer_4.spade_layer_1.beta.weight", "flow_module.spade_layer_4.spade_layer_1.beta.bias", "flow_module.spade_layer_4.spade_layer_2.conv1.weight", "flow_module.spade_layer_4.spade_layer_2.conv1.bias", "flow_module.spade_layer_4.spade_layer_2.gamma.weight", "flow_module.spade_layer_4.spade_layer_2.gamma.bias", "flow_module.spade_layer_4.spade_layer_2.beta.weight", "flow_module.spade_layer_4.spade_layer_2.beta.bias", "flow_module.conv_4.weight", "flow_module.conv_4.bias", "flow_module.conv_5.0.weight", "flow_module.conv_5.0.bias", "flow_module.conv_5.2.weight", "flow_module.conv_5.2.bias".
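A missing-keys error of this shape often means either that the checkpoint's keys carry a prefix the model does not expect (for example a "module." prefix left over from DataParallel), or that the checkpoint was saved from a different revision of the Renderer code. A small diagnostic sketch, assuming the checkpoint stores its weights under a "state_dict" key (adjust if not); Renderer and renderer_checkpoint_path are as in the traceback above:

    import torch

    ckpt = torch.load(renderer_checkpoint_path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)
    # strip a possible DataParallel prefix
    state = {k.replace("module.", "", 1): v for k, v in state.items()}

    model = Renderer()
    missing = set(model.state_dict()) - set(state)
    unexpected = set(state) - set(model.state_dict())
    print("missing:", sorted(missing)[:5])
    print("unexpected:", sorted(unexpected)[:5])

If the unexpected keys mirror the missing ones modulo a prefix, remapping the names fixes the load; if the flow_module weights are genuinely absent, the checkpoint came from a different model definition, and loading with strict=False would only mask that.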

Segmentation fault

When training the video renderer on 8x V100 (16 GB) GPUs I get a "Segmentation fault", but when I change the batch size to 16 it trains normally. Is this expected? And if I want to make full use of my eight V100s, how should I modify the code?
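Segmentation faults that vary with batch size are often raised inside DataLoader worker processes rather than the model itself, so lowering num_workers is a cheap first experiment; for spreading work across all eight cards, the usual pattern is to scale the global batch with the visible device count. A sketch under those assumptions (the per-GPU batch and worker count are illustrative, not from the repo):

    import torch
    from torch.utils.data import DataLoader

    n_gpus = torch.cuda.device_count()      # 8 on this machine
    per_gpu_batch = 2                       # whatever fits in 16 GB
    loader = DataLoader(dataset,
                        batch_size=per_gpu_batch * n_gpus,  # DataParallel splits along dim 0
                        num_workers=4,      # reduce this if workers segfault
                        pin_memory=True)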

The mouth won't stop moving during silent audio

Thank you for open-sourcing such an excellent project!
I ran inference with my own video and audio and found a problem similar to wav2lip's: during silent segments where the audio stops, the subject's mouth still follows the original mouth shapes (the mouth may be closed, but the corners keep twitching along with the original mouth motion). Do you have any thoughts on solving this? If silent data like this were added during training, would it improve the problem?
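One workaround idea (not part of IP_LAP) is to detect near-silent stretches from the audio's RMS energy and freeze or close the predicted mouth landmarks there. A minimal sketch, assuming 16 kHz audio and 25 fps video; the threshold is illustrative:

    import numpy as np
    import librosa

    def silent_frame_mask(wav_path, fps=25, thresh_db=-40.0):
        y, sr = librosa.load(wav_path, sr=16000)
        hop = sr // fps                      # one analysis hop per video frame
        rms = librosa.feature.rms(y=y, frame_length=hop * 2, hop_length=hop)[0]
        db = librosa.amplitude_to_db(rms, ref=np.max)
        return db < thresh_db                # True where a frame counts as silent

Adding genuinely silent clips to the training data is also a plausible direction, since otherwise the model may never see the silence-to-closed-mouth mapping.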

The face cannot be detected when I infer my own input video.

Thanks for your great work!
But when I run inference_single.py with my own video, the code does not work. It shows:

Traceback (most recent call last):
  File "inference_single.py", line 509, in <module>
    full = merge_face_contour_only(original_background, T_input_frame[2], T_ori_face_coordinates[2][1], fa)  # (H,W,3)
  File "inference_single.py", line 145, in merge_face_contour_only
    preds = fa.get_landmarks(input_img)[0]  # 68x2
TypeError: 'NoneType' object is not subscriptable

How can I solve this problem? Thanks again!
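The traceback shows fa.get_landmarks() returning None (no face found) and the code immediately indexing into it. A defensive sketch around the failing call; the fallback message is illustrative:

    preds_all = fa.get_landmarks(input_img)
    if preds_all is None:
        raise RuntimeError("face_alignment found no face in this frame; "
                           "try cropping closer to the face or upscaling the input")
    preds = preds_all[0]  # 68x2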

video renderer speed is slow

Thank you for open-sourcing this.
I ran CUDA_VISIBLE_DEVICES=0 python inference_single.py
and got the result ./test_result/129result_N_25_Nl_15.mp4.
Everything works fine, except that inference is slow (only around 5.0 it/s on an RTX 4090).
Is there any suggestion for optimizing the speed?
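Whether this helps depends on where the time actually goes (model forward vs. pre/post-processing), but two standard inference-side levers are disabling autograd and running the renderer under fp16 autocast. A hedged sketch; renderer_inputs is a stand-in for the model's real arguments, not a name from the repo:

    import torch

    with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
        output = renderer(*renderer_inputs)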

Mandarin lip synchronization

Hello, thank you very much for open-sourcing this excellent work. When using your code for rendering, we found that lip synchronization for Mandarin is not very good, and there can be unnatural synthesis artifacts on the face. Do you have any suggestions for optimization? Thank you.

Video pre-processing step

I see the pre-generated crops of faces from individual frames are currently resized to N x N.

Will aligning the faces in the same video clip using the facial landmarks that are output during the landmark detection help make a better landmark generator?
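For reference, such an alignment step could be a per-frame similarity transform onto a canonical landmark template; a sketch below (the template, function name, and output size are illustrative, and whether this actually improves the landmark generator is exactly the open question):

    import cv2
    import numpy as np

    def align_face(img, landmarks, template, size=256):
        # landmarks, template: (68, 2) float arrays in pixel coordinates
        M, _ = cv2.estimateAffinePartial2D(np.asarray(landmarks, np.float32),
                                           np.asarray(template, np.float32))
        return cv2.warpAffine(img, M, (size, size))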

large content landmarks

Nice work!

Trying to train IP_LAP with custom data, I got the following result: the content landmarks are generally larger than the pose landmarks, so there is a mismatch. However, if I use the pretrained model, the size of the resulting content landmarks is correct.

training dataset: a 5-minute 480 x 640 video

[screenshot]

The talking head speaks at silence parts

Hi,

it's me again.

I have successfully trained a talking head using your repo.

However, I noticed some bad cases: when the audio is actually silent, the lips of the talking head still move and the mouth opens.

Any insights on this? Thanks.

Setup:

  • dataset used for training: 30 hours

why is the dataloader so slow?

Thanks for your work!
When I run the training code, it is very slow, and I found the time is mainly spent in the dataloader.
How can I solve this problem?
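The usual first-line fixes are on the DataLoader itself; the actual gain depends on whether the bottleneck is disk I/O, decoding, or Python-side preprocessing. A sketch of the common knobs (dataset and batch_size stand in for the script's real objects):

    from torch.utils.data import DataLoader

    loader = DataLoader(dataset,
                        batch_size=batch_size,
                        shuffle=True,
                        num_workers=8,            # parallel loading processes
                        pin_memory=True,          # faster host-to-GPU copies
                        persistent_workers=True,  # keep workers alive across epochs
                        prefetch_factor=2)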

AttributeError: _2D

Hello, I recently installed everything with no issues, but when I run the test on Windows Anaconda I get the following:

(iplip) C:\Users\leolo\IP_LAP>python inference_single.py
Traceback (most recent call last):
  File "inference_single.py", line 34, in <module>
    fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=False, device='cuda')
  File "C:\Users\leolo\anaconda3\envs\iplip\lib\enum.py", line 354, in __getattr__
    raise AttributeError(name) from None
AttributeError: _2D

All my versions seem correct according to the repo and the requirements.txt. The only thing I am doing differently is using
python inference_single.py
because when I use
CUDA_VISIBLE_DEVICES=0 python inference_single.py
I get the following error:

(iplip) C:\Users\leolo\IP_LAP>CUDA_VISIBLE_DEVICES=0 python inference_single.py
'CUDA_VISIBLE_DEVICES' is not recognized as an internal or external command,
operable program or batch file.
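Two separate things are going on here. The AttributeError comes from the face_alignment package renaming this enum member in version 1.4 (LandmarksType._2D became LandmarksType.TWO_D), so either pin face-alignment below 1.4 or branch on whichever name exists; a sketch:

    import face_alignment

    lm_type = getattr(face_alignment.LandmarksType, "_2D",
                      getattr(face_alignment.LandmarksType, "TWO_D", None))
    fa = face_alignment.FaceAlignment(lm_type, flip_input=False, device="cuda")

The second error is just cmd.exe syntax: the VAR=value prefix is a Unix shell form. On Windows, set the variable first with set CUDA_VISIBLE_DEVICES=0 and then run python inference_single.py.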

Progress bar stuck when training the video renderer

Thank you for your open-source contribution to the community! I ran into a problem: when training the video renderer, the process hangs right after the data finishes loading, as shown below. I only have one GPU and I am on Windows; the command I ran is: python train_video_renderer.py --sketch_root .\output\lrs2_sketch --face_img_root .\output\lrs2_face --audio_root .\output\lrs2_audio
[screenshot]
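On Windows, a hang right after dataset initialization is commonly the DataLoader spawning worker processes: multiprocessing there requires the script's entry point to be guarded, and num_workers=0 is a quick way to test that hypothesis. A sketch of the shape of the fix, assuming the hang is indeed worker-related (the loader arguments are illustrative):

    from torch.utils.data import DataLoader

    def main():
        loader = DataLoader(dataset, batch_size=batch_size,
                            num_workers=0)   # 0 = load in the main process
        ...  # training loop

    if __name__ == "__main__":   # required for multiprocessing spawn on Windows
        main()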

train, inference code and pretrained weights

Thank you for your hard work on this project! I'm excited to see the code and would love to know when it will be made available to the public. Do you have an estimated timeline for when this might happen?

fine-tuning question

I recently read your paper and found it a valuable contribution to the field. Thank you for your research.

I would like to know whether the provided pretrained models, trained on an English dataset, can be fine-tuned on a Chinese dataset.

Is this feasible and effective? Do you have any insights or advice? Are there any challenges or caveats when applying these models to Chinese? Also, if you have any suggestions on specific techniques or methods for fine-tuning on Chinese data, I would greatly appreciate your insights.

not detect face

When I run CUDA_VISIBLE_DEVICES=0 python inference_single.py, face detection works fine if the face is particularly large in the frame. However, when the face is relatively small, it throws a "not detect face" error from the following code:

inference_single.py line 254: with mp_face_mesh.FaceMesh(static_image_mode=False, max_num_faces=1, refine_landmarks=True, min_detection_confidence=0.5) as face_mesh:

I have even tried lowering min_detection_confidence to a very small value like 0.01, but it still cannot detect the face. How can I fix this? Thank you for creating such an amazing project. I really appreciate it.
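One workaround that sometimes helps when the face occupies a small fraction of the frame is to upscale the frame (or crop around a coarse face box) before handing it to FaceMesh. A sketch, with an illustrative scale factor; frame and face_mesh are the objects from the inference code quoted above:

    import cv2

    scale = 2.0
    big = cv2.resize(frame, None, fx=scale, fy=scale,
                     interpolation=cv2.INTER_CUBIC)
    results = face_mesh.process(cv2.cvtColor(big, cv2.COLOR_BGR2RGB))
    # FaceMesh landmarks are normalized to [0, 1], so they map back
    # to the original frame directly, without rescaling.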

the lip shakes heavily after training

Hello, I trained the landmark model on videos I collected from bilibili; running_L1_loss is 0.0066 and running_velocity_loss is 0.0048. But in the inference result, the lips shake heavily and mismatch with the upper face:

[video: 4result_N_25_Nl_15.mp4]

Could you give me some suggestions to fix that? Thanks very much!

has this been compared with LSP?

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation — I mean, the methods seem very similar. Also, what is the inference fps on a 3090 Ti?

Can you share the log from training the video_renderer, or give some guidance on the loss curve?

Hi,
Thanks for sharing code and your contribution in the field of talking head generation.
I am confused about the training loss of the video_renderer. I trained it on the LRS2 dataset for 24 epochs (33220 steps), but the running_gen_loss seems to change randomly. The logs are as follows:
[screenshot]
Is that normal, or did I do something wrong? Looking forward to your reply! Thanks a lot!

train video renderer

Hi, I want to train a model that only works for one specific person. Are there any requirements on the duration and diversity of the dataset when training the video renderer?

The lip has a little shaking

Thanks for the awesome work!
I have 2 questions:

  1. In the generated videos, the lips shake a little; do you have any suggestions to improve this?

  2. Can we manually edit the sketch during the silent parts to make the mouth close?

Problem caused by calling load_checkpoint

While training the landmark_generator, I wanted to resume from where a previous run was interrupted, and found that calling load_checkpoint() with reset_optimizer=False produces the error below.

Starting landmark_generator_training******************
Project_name: landmarks
Load checkpoint from: ./checkpoints/landmark_generation/Pro_landmarks/landmarks_epoch_1166_checkpoint_step000035000.pth
Load optimizer state from ./checkpoints/landmark_generation/Pro_landmarks/landmarks_epoch_1166_checkpoint_step000035000.pth
init dataset,filtering very short videos.....
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 49644/49644 [00:04<00:00, 11383.00it/s]
complete,with available vids: 49475

init dataset,filtering very short videos.....
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 11265.29it/s]
complete,with available vids: 9976

0%| | 0/30 [00:00<?, ?it/s]Saved checkpoint: ./checkpoints/landmark_generation/Pro_landmarks/landmarks_epoch1166_step000035000.pth
Evaluating model for 25 epochs
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [02:45<00:00, 6.62s/it]
eval_L1_loss 0.005300633320584894 global_step: 35000█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [02:45<00:00, 6.63s/it]
eval_velocity_loss 0.04097183309495449 global_step: 35000
0%| | 0/30 [02:56<?, ?it/s]
Traceback (most recent call last):
  File "train_landmarks_generator.py", line 341, in <module>
    optimizer.step()
  File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/optimizer.py", line 23, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/adam.py", line 252, in step
    found_inf=found_inf)
  File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/adam.py", line 316, in adam
    found_inf=found_inf)
  File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/adam.py", line 363, in _single_tensor_adam
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
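This particular error has a well-known cause: the model is moved to cuda:0, but the optimizer state tensors restored from the checkpoint stay on CPU. A common fix is to move the state to the model's device right after load_checkpoint(); a sketch, assuming a single cuda:0 setup as in the log:

    import torch

    for state in optimizer.state.values():
        for k, v in state.items():
            if torch.is_tensor(v):
                state[k] = v.cuda()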

If I instead pass reset_optimizer=True, training proceeds normally, but the resulting .pth files differ in size from the earlier ones by a dozen or so KB. Is that normal?

-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 02:20 landmarks_epoch_166_checkpoint_step000005000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 03:40 landmarks_epoch_333_checkpoint_step000010000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 05:00 landmarks_epoch_500_checkpoint_step000015000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 06:21 landmarks_epoch_666_checkpoint_step000020000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 07:41 landmarks_epoch_833_checkpoint_step000025000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167281157 1月 2 09:01 landmarks_epoch_1000_checkpoint_step000030000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167281157 1月 2 10:21 landmarks_epoch_1166_checkpoint_step000035000.pth

Calling load_checkpoint() with reset_optimizer=True:

-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 13:36 landmarks_epoch1199_step000036000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 13:54 landmarks_epoch1232_step000037000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 15:01 landmarks_epoch1266_step000038000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 15:19 landmarks_epoch1299_step000039000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 15:38 landmarks_epoch1332_step000040000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 15:56 landmarks_epoch1366_step000041000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 16:15 landmarks_epoch1399_step000042000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 16:33 landmarks_epoch1432_step000043000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 16:51 landmarks_epoch1466_step000044000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 17:10 landmarks_epoch1499_step000045000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 17:28 landmarks_epoch1532_step000046000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 17:47 landmarks_epoch1566_step000047000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 18:05 landmarks_epoch1599_step000048000.pth

Videos at 25fps tend to be choppy by default

I'm currently working on a project where I'm utilizing your method. I've observed that the method is configured to operate at 25 frames per second (fps), and I'm trying to understand the rationale behind this choice.

I have a collection of videos in my dataset that are recorded at 30fps, which is a standard frame rate for many recording devices, including smartphones.

However, when I downsample these videos (with ffmpeg) from 30fps to 25fps in order to match the method's operating rate, the resultant videos appear very choppy and lack smoothness. I have tried adding motion blur between the frames, but without much success.

Are there specific reasons why the method is set to work at 25fps instead of the more commonly used 30fps? Would it be possible to modify the method to operate at 30fps without significantly impacting its performance or the results?

Additionally, I would appreciate any suggestions on how to prevent the loss of smoothness when downsampling videos from 30fps to 25fps.
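On the choppiness specifically: a plain -r 25 conversion drops one frame in six, which reads as judder. One option is motion-compensated resampling with ffmpeg's minterpolate filter, which synthesizes in-between frames instead of dropping them (slow, and quality varies by content). A sketch invoked from Python, with illustrative file names; as for the 25 fps choice itself, the LRS2 training data is 25 fps, so the audio-to-frame windowing is presumably tied to that rate:

    import subprocess

    subprocess.run(["ffmpeg", "-i", "input_30fps.mp4",
                    "-vf", "minterpolate=fps=25",   # motion-compensated resampling
                    "-c:a", "copy",
                    "output_25fps.mp4"], check=True)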

Slow training speed on LRS2 dataset with 4x RTX 4090 GPUs (train_video_renderer.py)

I attempted to run train_video_renderer.py on the LRS2 dataset using four RTX 4090 GPUs, but the training speed is exceptionally slow. In a previous issue, I noticed the author suggested running approximately 300 epochs for optimal results, but the speed I'm seeing is much lower than expected. Does anyone have the same issue?
[screenshot]

Inquiries about training operations

I prepared the materials required for training and executed the training script, but the following errors occurred. I don't know how to solve them; please advise.

I have installed the environment I need for the program to run

When I execute the training code, something like this happens:
[screenshot]

I noticed that the program filtered out all my footage, so I changed min_len to 0:
[screenshot]

As you can see, it then throws an error:
[screenshot]

The ValueError: Caught ValueError in DataLoader worker process 0. goes away if num_workers is set to 0, but obviously that is not an appropriate fix.
[screenshot]

train landmark generator

Hi, when training the landmark generator, does it have to be stopped manually? From the look of this while loop, there is no break logic:

https://github.com/Weizhi-Zhong/IP_LAP/blob/main/train_landmarks_generator.py#L303

If so, at how many steps was the provided pretrained model stopped, and how long did it train? Thanks.
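For reference, if the loop is indeed unbounded, a stopping condition has to be added by hand. A minimal sketch of the shape of such a change (max_steps is a hypothetical hyperparameter, not in the repo):

    max_steps = 50000                   # hypothetical stopping point
    while global_step < max_steps:      # instead of an unbounded loop
        ...  # existing training body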
