
realistic-neural-talking-head-models's Issues

errors occur when running embedder_inference.py

Traceback (most recent call last):
File "Realistic/embedder_inference.py", line 27, in
frame_mark_video = frame_mark_video.transpose(2,4).to(device) #T,2,3,256,256
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 2)

Convert caffe model to PyTorch model

When I run the last command to convert the Caffe model to a PyTorch model, "mmtomodel -f pytorch -in Pytorch_VGGFACE_IR.py -iw Pytorch_VGGFACE_IR.npy -o Pytorch_VGGFACE.pth", I ran into the following problem:

Traceback (most recent call last):
File "/root/anaconda3/bin/mmtomodel", line 10, in
sys.exit(_main())
File "/root/anaconda3/lib/python3.7/site-packages/mmdnn/conversion/_script/dump_code.py", line 79, in _main
ret = dump_code(args.framework, args.inputNetwork, args.inputWeight, args.outputModel, args.dump_tag)
File "/root/anaconda3/lib/python3.7/site-packages/mmdnn/conversion/_script/dump_code.py", line 32, in dump_code
save_model(MainModel, network_filepath, weight_filepath, dump_filepath)
File "/root/anaconda3/lib/python3.7/site-packages/mmdnn/conversion/pytorch/saver.py", line 5, in save_model
model = MainModel.KitModel(weight_filepath)
File "Pytorch_VGGFACE_IR.py", line 27, in init
self.conv1_1 = self.__conv(2, name='conv1_1', in_channels=3, out_channels=64, kernel_size=(3, 3), stride=(1, 1), groups=1, bias=True)
File "Pytorch_VGGFACE_IR.py", line 129, in _conv
layer.state_dict()['bias'].copy
(torch.from_numpy(__weights_dict[name]['bias']))
RuntimeError: output with shape [64] doesn't match the broadcast shape [1, 1, 1, 64]

Any ideas to solve this problem?
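A workaround that has fixed similar MMdnn conversions (an assumption on my part, not a fix confirmed by the author) is to flatten the Caffe-style bias before copying it into the PyTorch parameter, e.g. by patching the generated Pytorch_VGGFACE_IR.py around line 129:

    import numpy as np
    import torch

    def copy_caffe_bias(layer, bias_array):
        # Hypothetical patch for the generated __conv helper: Caffe exports the
        # bias with shape (1, 1, 1, C) while the PyTorch Conv2d parameter is
        # (C,), so flatten it before copy_().
        flat = np.asarray(bias_array).reshape(-1)
        layer.state_dict()['bias'].copy_(torch.from_numpy(flat))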

How can I obtain my results from the pre-trained model?

How can I obtain my results from the pre-trained model? And what do the samples shown for each image mean? I'm confused about which one is the ground truth and which one is the output. Is there a landmark image included? Thank you very much!

Generating video for own image

I see that embedder_inference.py and finetuning_training.py both use a test image and a video of the person being faked. I would like to generate a fake video using only a single image of a person, without any video. How can I do that?

About results

Does your embedder work well? After training for 300+ epochs, I find that the embedder doesn't work at all and the results depend only on the input to the generator.

embedder_inference.py errors

Hello,

I have followed all of the steps you outlined in the README and I believe everything is set up properly. When I try to run embedder_inference.py I just get a bunch of errors, like so:

Error: Video corrupted or no landmarks visible
(the line above repeats 20 times)
/home/tomljr2/UF/project/dataset/video_extraction_conversion.py:118: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam figure.max_open_warning).
fig = plt.figure(figsize=(input.shape[1]/dpi, input.shape[0]/dpi), dpi = dpi)
Error: Video corrupted or no landmarks visible
(the line above repeats 12 more times)
Traceback (most recent call last):
File "embedder_inference.py", line 25, in
frame_mark_video = generate_cropped_landmarks(frame_mark_video, pad=50)
File "/home/tomljr2/UF/project/dataset/video_extraction_conversion.py", line 151, in generate_cropped_landmarks
frame_landmark_list.append(frame_landmark_list[i])
IndexError: list index out of range

Any idea what is causing this?

Also, if you could detail how to actually use the program once I've got it working, I would really appreciate it, since I didn't see much in the README.

Thanks!

problems in pretrained model

Thanks for your pretrained model on Google Drive. I downloaded it but failed to unzip it; the decompression software fails to extract it. Does anyone else have this problem?

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 2)

File "train.py", line 91, in
for i_batch, (f_lm, x, g_y, i) in enumerate(dataLoader, start=i_batch_current):
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in next
data = self._next_data()
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/media/shared/Realistic-Neural-Talking-Head-Models-master/dataset/dataset_class.py", line 42, in getitem
frame_mark = frame_mark.transpose(2,4).to(self.device) #K,2,3,224,224
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 2)

I downloaded the VoxCeleb2 set and tried to run train.py, but kept getting this error. I tried it on the test set as well and got the same error.

Also, when running webcam_inference, I only get a black screen for the fake image.
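For what it's worth, the IndexError on transpose(2,4) means the tensor never became the expected 5-D (K,2,H,W,3) stack. A minimal guard like the sketch below (illustrative only, not part of the repo) surfaces the real cause earlier, which is usually frames of mismatched sizes being collapsed into an object array:

    import torch

    def check_frame_stack(frame_mark):
        # Guard for the transpose in dataset_class.py: the stacked frames must
        # be 5-D (K, 2, H, W, 3) before .transpose(2, 4). Anything
        # lower-dimensional usually means np.array() produced an object array
        # upstream (e.g. frames of different sizes or failed landmark detection).
        if frame_mark.dim() != 5:
            raise ValueError(f"expected a 5-D K,2,H,W,3 tensor, got {tuple(frame_mark.shape)}")
        return frame_mark.transpose(2, 4)  # -> K, 2, 3, W, H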

Can't see any fake face

Hi.

I ran the webcam_inference.py file and got 3 windows: fake, me, and landmark.
But only "me" and "landmark" show anything; no fake face is generated.
How can I fix this?

Error when running the "finetuning_training.py"

Hi, thanks for open-sourcing your work.

I converted the Caffe models and ran embedder_inference without any problem.

But when I run finetuning_training.py, the following error occurs.

Traceback (most recent call last):
File "finetuning_training.py", line 99, in <module>
lossG = criterionG(x, x_hat, r_hat, D_res_list, D_hat_res_list)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/max/Realistic-Neural-Talking-Head-Models/loss/loss_generator.py", line 183, in forward
loss_cnt = self.LossCnt(x, x_hat)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/max/Realistic-Neural-Talking-Head-Models/loss/loss_generator.py", line 87, in forward
self.VGG19(x)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/max/.local/lib/python3.5/site-packages/torchvision/models/vgg.py", line 44, in forward
x = self.classifier(x)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 67, in forward
return F.linear(input, self.weight, self.bias)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 1352, in linear
ret = torch.addmm(torch.jit._unwrap_optional(bias), input, weight.t())
RuntimeError: size mismatch, m1: [1 x 32768], m2: [25088 x 4096] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:266

Do you have any idea what went wrong?
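The numbers in the error suggest the input reaching torchvision's VGG19 classifier is 256x256 (512*8*8 = 32768 flattened features), while the classifier's first Linear layer expects 224x224 inputs (512*7*7 = 25088). Assuming that is indeed the cause, one way around it is to resize before the forward pass:

    import torch.nn.functional as F
    from torchvision import models

    vgg19 = models.vgg19(pretrained=True).eval()

    def vgg19_forward_any_size(x):
        # Resize to the 224x224 resolution torchvision's VGG19 classifier was
        # built for, so the flattened feature size matches the 25088-wide Linear.
        x = F.interpolate(x, size=(224, 224), mode='bilinear', align_corners=False)
        return vgg19(x)

If the content loss only needs intermediate feature maps, running just the vgg19.features sub-module (which is size-agnostic) would avoid the classifier entirely.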

Generate Noises

Hi, thanks for your work on reproducing this paper.

During testing, I ran embedder_inference.py without fine-tuning and then ran video_inference.py,
but I got noise output from the models.
(attached: sample output image showing the noise)

Any suggestions? Thanks!

Questions about matching loss

Hello! I have two questions about the matching loss:

  • In my opinion, the matrix W and the pose-independent code are actually used in two ways: W is used as the projected embeddings of classes in a projection cGAN, while the other serves as the "identity code". However, I don't really understand what it means to make Wi closer to the identity code, or how this affects the fine-tuning process.

  • About the dataset used in fine-tuning: if we've meta-trained the network on a large dataset and want to fine-tune it with one personal image (one-shot), does that mean we only have a single training sample in the fine-tuning phase? How can we then train the network in a cGAN way? By consistently feeding this one sample to D as real data?

I would appreciate it if you could share your thoughts.

Instance normalization is used only in generator's downsampling

Looking at the current implementation in this repository, the discriminator has instance normalization, but this contradicts the architecture in the paper.

According to the paper,

We base our generator network G(yi(t), êi; ψ, P) on the image-to-image translation architecture proposed by Johnson et. al. [20], but replace downsampling and upsampling layers with residual blocks similarly to [6] (with batch normalization [16] replaced by instance normalization [38]).

In the embedder and the convolutional part of the discriminator V(xi(t), yi(t); θ), we use similar networks, which consist of residual downsampling blocks (same as the ones used in the generator, but without normalization layers).

Training on Multiple GPUs

Hello, thanks for the model and the explanations, this is very helpful for my research. I obtained very good results from the (fine-tuned) pre-trained model and I want to increase the quality by training the model on the whole VoxCeleb2 dataset. I have the dataset folder prepared, but I couldn't run the training on multiple GPUs. I have 2 GPUs and I want to use both of them, since one GPU cannot hold that much data and gives RuntimeError: CUDA out of memory.
How should I modify the code in order to take advantage of both GPUs on my system?
Any help would be appreciated.
Thank you
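The repository doesn't appear to ship multi-GPU support, but a minimal sketch using PyTorch's standard nn.DataParallel (not the author's method; the module names are only illustrative) would be:

    import torch.nn as nn

    def to_multi_gpu(*modules):
        # Wrap each module (e.g. the embedder, generator and discriminator built
        # in train.py) in nn.DataParallel so every batch is split across all
        # visible GPUs. Checkpoints then live under model.module.state_dict().
        return [nn.DataParallel(m).cuda() for m in modules]

Note that DataParallel splits along the batch dimension, so each per-GPU slice must still fit in memory; it will not help if a single sample already exhausts one GPU.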

donations?

How can we donate money to help fund development of this project?

thanks

What GPU did you use?

I tested the code with an NVIDIA TITAN RTX (24 GB GPU RAM), but an OOM error occurred.
What GPU did you use to train the network?

about training data samplers

As the original paper says, 'The generator G(yi(t), e^i; ψ, P) takes the landmark image yi(t) for a video frame not seen by the embedder and the predicted video embedding e^i, and outputs a synthesized video frame x^i(t).' However, in your dataloader, the image used by the generator also appears in the embedder's input?

numpy 1.16.1

I'm having an issue installing version 1.16.1. I'm getting an error stating that that version cannot be imported because it does not exist.

Is there no way to get the pickle working with 1.16.4?

Bilinear Down/Upsampling

Thanks a lot for the great efforts!

Although the Samsung paper doesn't mention the interpolation method, according to the StyleGAN paper they actually used bilinear interpolation for the up/downsampling operations. When training with your code, I see a strong checkerboard effect, especially in the early training stage. Have you considered using bilinear interpolation for the upsampling function?
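For reference, a bilinear-upsample-then-convolve block (a generic sketch, not the repo's existing upsampling block) looks roughly like this:

    import torch.nn as nn
    import torch.nn.functional as F

    class BilinearUpBlock(nn.Module):
        # Upsample with bilinear interpolation, then convolve; compared with
        # nearest-neighbour upsampling or transposed convolutions this tends
        # to produce fewer checkerboard artifacts.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

        def forward(self, x):
            x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
            return self.conv(x)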

Mismatch between the architecture of Generator.py and the "G_state_dict" in pretrained "model_weights.tar"

Hello, thanks for your wonderful reproduction first!
However, when I try to run "finetune_training.py" to finetune the model with the pretrained "model_weights.tar", some errors arise:

RuntimeError: Error(s) in loading state_dict for Generator:
Missing key(s) in state_dict: "conv2d.weight", "conv2d.bias".
Unexpected key(s) in state_dict: "resDown5.conv_l1.bias", "resDown5.conv_l1.weight_orig", "resDown5.conv_l1.weight_u", "resDown5.conv_l1.weight_v", "resDown5.conv_r1.bias", "resDown5.conv_r1.weight_orig", "resDown5.conv_r1.weight_u", "resDown5.conv_r1.weight_v", "resDown5.conv_r2.bias", "resDown5.conv_r2.weight_orig", "resDown5.conv_r2.weight_u", "resDown5.conv_r2.weight_v", "in5.weight", "in5.bias", "resDown6.conv_l1.bias", "resDown6.conv_l1.weight_orig", "resDown6.conv_l1.weight_u", "resDown6.conv_l1.weight_v", "resDown6.conv_r1.bias", "resDown6.conv_r1.weight_orig", "resDown6.conv_r1.weight_u", "resDown6.conv_r1.weight_v", "resDown6.conv_r2.bias", "resDown6.conv_r2.weight_orig", "resDown6.conv_r2.weight_u", "resDown6.conv_r2.weight_v", "in6.weight", "in6.bias", "resUp5.conv_l1.bias", "resUp5.conv_l1.weight_orig", "resUp5.conv_l1.weight_u", "resUp5.conv_l1.weight_v", "resUp5.conv_r1.bias", "resUp5.conv_r1.weight_orig", "resUp5.conv_r1.weight_u", "resUp5.conv_r1.weight_v", "resUp5.conv_r2.bias", "resUp5.conv_r2.weight_orig", "resUp5.conv_r2.weight_u", "resUp5.conv_r2.weight_v", "resUp6.conv_l1.bias", "resUp6.conv_l1.weight_orig", "resUp6.conv_l1.weight_u", "resUp6.conv_l1.weight_v", "resUp6.conv_r1.bias", "resUp6.conv_r1.weight_orig", "resUp6.conv_r1.weight_u", "resUp6.conv_r1.weight_v", "resUp6.conv_r2.bias", "resUp6.conv_r2.weight_orig", "resUp6.conv_r2.weight_u", "resUp6.conv_r2.weight_v".

I checked the code in network/model.py and found that there are only 4 ResUp blocks in the Generator, but the "G_state_dict" in "model_weights.tar" has 6 ResUp blocks. What's more, the "G_state_dict" in "model_weights.tar" lacks the keys "conv2d.weight" and "conv2d.bias".
Maybe you uploaded a mismatched pretrained checkpoint with a different generator architecture?

Thanks again for your nice reproduction, and I look forward to your reply.

How to use webcam inference for mp4

Hi,
I'm trying to use webcam_inference.py to run inference on an mp4 and get per-frame outputs (fake/me/landmarks), but all of the outputs come out completely black.
Process:
1. Download the pretrained model.
2. Run embedder_inference.py to get e_hat_images.tar.
3. Modify “cap = cv2.VideoCapture(0)” to “cap = cv2.VideoCapture(xxx.mp4)” in webcam_inference.py.
4. Run webcam_inference.py to get the outputs (fake/me/landmarks).

Any suggestions? Thanks.
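One thing worth ruling out (just a guess, not a confirmed cause): when reading from a file instead of a webcam, cv2.VideoCapture eventually returns empty frames at end-of-stream, and feeding those onward yields black output. A defensive read loop looks like:

    import cv2

    cap = cv2.VideoCapture("examples/fine_tuning/test_video.mp4")  # illustrative path
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret or frame is None:  # end of file or a decode failure
            break
        # ... run landmark extraction and the generator on `frame` here ...
    cap.release()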

Cannot decompress pretrained weights

The pretrained weights on Google Drive cannot be decompressed on macOS Mojave:

$ tar xvf model_weights.tar
tar: Unrecognized archive format
tar: Error exit delayed from previous errors.

$ tar --version
bsdtar 2.8.3 - libarchive 2.8.3

Some issues I met in this repo

Thanks for the excellent work! But I found some possible bugs in this repo:

  1. In preprocess.py, do you mean to use generate_crop_landmarks rather than generate_landmarks?
  2. Also, you may need to compute a bounding-box IoU, since the cropped faces may come from different people in the same video; it may be better not to simply use fd.preds[0] (see the sketch below).
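A minimal bounding-box IoU helper (illustrative, not existing repo code) that could be used to keep the detection overlapping the previous frame's face box:

    def bbox_iou(a, b):
        # Intersection-over-union of two boxes given as (x1, y1, x2, y2);
        # useful for picking the detection closest to the previous frame's
        # face instead of always taking fd.preds[0].
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        if inter == 0:
            return 0.0
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / float(area_a + area_b - inter)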

Error while loading the pretrained model in video_inference.py

Seems like a mismatch between the pretrained model and the provided network architecture. Can you please check?

Traceback (most recent call last):
File "/home/vishnu/Realistic-Neural-Talking-Head-Models/video_inference.py", line 35, in
G.load_state_dict(checkpoint['G_state_dict'])
File "/home/vishnu/miniconda3/envs/neural-talk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Generator:
Missing key(s) in state_dict: "conv2d.weight", "conv2d.bias".
Unexpected key(s) in state_dict: "resDown5.conv_l1.bias", "resDown5.conv_l1.weight_orig", "resDown5.conv_l1.weight_u", "resDown5.conv_l1.weight_v", "resDown5.conv_r1.bias", "resDown5.conv_r1.weight_orig", "resDown5.conv_r1.weight_u", "resDown5.conv_r1.weight_v", "resDown5.conv_r2.bias", "resDown5.conv_r2.weight_orig", "resDown5.conv_r2.weight_u", "resDown5.conv_r2.weight_v", "in5.weight", "in5.bias", "resDown6.conv_l1.bias", "resDown6.conv_l1.weight_orig", "resDown6.conv_l1.weight_u", "resDown6.conv_l1.weight_v", "resDown6.conv_r1.bias", "resDown6.conv_r1.weight_orig", "resDown6.conv_r1.weight_u", "resDown6.conv_r1.weight_v", "resDown6.conv_r2.bias", "resDown6.conv_r2.weight_orig", "resDown6.conv_r2.weight_u", "resDown6.conv_r2.weight_v", "in6.weight", "in6.bias", "resUp5.conv_l1.bias", "resUp5.conv_l1.weight_orig", "resUp5.conv_l1.weight_u", "resUp5.conv_l1.weight_v", "resUp5.conv_r1.bias", "resUp5.conv_r1.weight_orig", "resUp5.conv_r1.weight_u", "resUp5.conv_r1.weight_v", "resUp5.conv_r2.bias", "resUp5.conv_r2.weight_orig", "resUp5.conv_r2.weight_u", "resUp5.conv_r2.weight_v", "resUp6.conv_l1.bias", "resUp6.conv_l1.weight_orig", "resUp6.conv_l1.weight_u", "resUp6.conv_l1.weight_v", "resUp6.conv_r1.bias", "resUp6.conv_r1.weight_orig", "resUp6.conv_r1.weight_u", "resUp6.conv_r1.weight_v", "resUp6.conv_r2.bias", "resUp6.conv_r2.weight_orig", "resUp6.conv_r2.weight_u", "resUp6.conv_r2.weight_v".
size mismatch for p: copying a param with shape torch.Size([17158, 512]) from checkpoint, the shape in current model is torch.Size([13184, 512]).
size mismatch for psi: copying a param with shape torch.Size([17158, 1]) from checkpoint, the shape in current model is torch.Size([13184, 1]).
size mismatch for resUp1.conv_l1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp1.conv_l1.weight_orig: copying a param with shape torch.Size([512, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
size mismatch for resUp1.conv_l1.weight_u: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp1.conv_r1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp1.conv_r1.weight_orig: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
size mismatch for resUp1.conv_r1.weight_u: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp1.conv_r2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp1.conv_r2.weight_orig: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for resUp1.conv_r2.weight_u: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp1.conv_r2.weight_v: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([2304]).
size mismatch for resUp2.conv_l1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp2.conv_l1.weight_orig: copying a param with shape torch.Size([512, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1]).
size mismatch for resUp2.conv_l1.weight_u: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp2.conv_l1.weight_v: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp2.conv_r1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp2.conv_r1.weight_orig: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 3, 3]).
size mismatch for resUp2.conv_r1.weight_u: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp2.conv_r1.weight_v: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([2304]).
size mismatch for resUp2.conv_r2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp2.conv_r2.weight_orig: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
size mismatch for resUp2.conv_r2.weight_u: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp2.conv_r2.weight_v: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for resUp3.conv_l1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp3.conv_l1.weight_orig: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 128, 1, 1]).
size mismatch for resUp3.conv_l1.weight_u: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp3.conv_l1.weight_v: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp3.conv_r1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp3.conv_r1.weight_orig: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 128, 3, 3]).
size mismatch for resUp3.conv_r1.weight_u: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp3.conv_r1.weight_v: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for resUp3.conv_r2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp3.conv_r2.weight_orig: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for resUp3.conv_r2.weight_u: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp3.conv_r2.weight_v: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([576]).
size mismatch for resUp4.conv_l1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for resUp4.conv_l1.weight_orig: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 64, 1, 1]).
size mismatch for resUp4.conv_l1.weight_u: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for resUp4.conv_l1.weight_v: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp4.conv_r1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for resUp4.conv_r1.weight_orig: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 64, 3, 3]).
size mismatch for resUp4.conv_r1.weight_u: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for resUp4.conv_r1.weight_v: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([576]).
size mismatch for resUp4.conv_r2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for resUp4.conv_r2.weight_orig: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for resUp4.conv_r2.weight_u: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for resUp4.conv_r2.weight_v: copying a param with shape torch.Size([1152]) from checkpoint, the shape in current model is torch.Size([288]).

Generator architecture doesn't match the one described in paper

First, thanks a lot for the sharing of your implementation.

But I notice an architecture mismatch between your code and the original paper:

out = self.resUp1(out, e_psi[:, self.slice_idx[5]:self.slice_idx[6], :])
out = self.resUp2(out, e_psi[:, self.slice_idx[6]:self.slice_idx[7], :])
out = self.self_att_Up(out)
out = self.resUp3(out, e_psi[:, self.slice_idx[7]:self.slice_idx[8], :])
out = self.resUp4(out, e_psi[:, self.slice_idx[8]:self.slice_idx[9], :])
out = adaIN(out,
e_psi[:,
self.slice_idx[9]:(self.slice_idx[10]+self.slice_idx[9])//2,
:],
e_psi[:,
(self.slice_idx[10]+self.slice_idx[9])//2:self.slice_idx[10],
:]
)

(attached: generator architecture figure from the paper)

You use AdaIN in the decoder part of the generator, while the original paper uses IN.

Was this change made for better results?

About generating images

Training on the VoxCeleb2 test set with the provided source code and weights, the generated images are not good and the image details are badly lost. How long did you train to get the results shown in the examples you provided?

Error when loading images in embedder_inference.py

I'm running the code on a Colab notebook with the provided test video and images and I get this error. It also happens when I provide my own images. Help?

Traceback (most recent call last):
File "embedder_inference.py", line 30, in
frame_mark_images = select_images_frames(path_to_images)
File "/content/Realistic-Neural-Talking-Head-Models/dataset/video_extraction_conversion.py", line 101, in select_images_frames
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(4.1.2) /io/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed)

caffe-cuda can't be installed on Ubuntu16

hi Vincent, thanks for your great contribution.

I'm using Ubuntu 16 and can't run "sudo apt install caffe-cuda".

The reason seems to be that caffe-cuda is only available on Ubuntu > 17.04.

Could you please tell me whether there is another way to install it, instead of "sudo apt install caffe-cuda"?

thank you very much!

Question about the input of the model.

Hi, thank you for your effort in reimplementing the algorithm. But I want to know why you feed the model RGB images (0-255) directly instead of preprocessing with transform.ToTensor()?
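For context, this is what ToTensor-style preprocessing would amount to (a sketch of standard torchvision behaviour, not a claim about what the pretrained weights expect):

    import torch

    def to_unit_tensor(frame):
        # Convert an HxWx3 uint8 numpy image (0-255) into a 3xHxW float tensor
        # in [0, 1], which is what torchvision.transforms.ToTensor() does.
        return torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0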

Preliminary landmarks extraction

Hello!
Many thanks for your work.

One question: are you planning to add preprocessing of the landmarks?
As far as I understand, landmark extraction currently happens during training, which increases the time per batch (4.5 seconds for me). It would be more efficient to extract the landmarks once and then just load them during training.
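A rough sketch of the kind of one-off preprocessing pass being suggested (the function names are illustrative, not existing repo code):

    import os
    import numpy as np

    def cache_landmarks(video_dir, out_dir, extract_fn):
        # Run the (slow) landmark extractor once per video and save the
        # result, so the DataLoader can np.load the cached landmarks during
        # training instead of re-detecting them for every batch.
        os.makedirs(out_dir, exist_ok=True)
        for name in os.listdir(video_dir):
            landmarks = extract_fn(os.path.join(video_dir, name))
            np.save(os.path.join(out_dir, name + ".npy"), np.asarray(landmarks))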

Cannot reproduce the results

Hello! I have also tried to reproduce this paper myself. However, with a very similar network architecture and AdaIN settings, I can only produce low-resolution faces on a very fuzzy background, and training for more epochs does not improve the results. I am impressed by your example images, but I cannot reproduce them with your GitHub code. After carefully reading all the code, I think there may be a mistake in https://github.com/vincent-thevenin/Realistic-Neural-Talking-Head-Models/blob/master/loss/loss_generator.py#L30

    def vgg_x_hook(module, input, output):
        vgg_x_features.append(output.data)
    def vgg_xhat_hook(module, input, output):
        vgg_xhat_features.append(output.data)

The output.data operation seems to be cutting off the gradient flow, making the content loss useless.
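If that diagnosis is right, the fix would simply be to append the hook's output tensor itself so the autograd graph is preserved (a sketch, assuming the hooks are otherwise used as in loss_generator.py):

    vgg_x_features, vgg_xhat_features = [], []

    def vgg_x_hook(module, input, output):
        vgg_x_features.append(output)    # keep grad_fn instead of detaching via .data

    def vgg_xhat_hook(module, input, output):
        vgg_xhat_features.append(output)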

I would also like to ask whether there are any other hidden tricks in training, for example the Adam momentum or the depth of the generator. Thank you!

weird results on pretrained model

Thanks for your work!
I have tested the examples on the pretrained model.
I ran embedder_inference.py first, then finetuning_training.py, and finally video_inference.py.
However, the generated results are noise rather than a face. Could you help me understand why this happens?

finetuning training problem

Thank you for open-sourcing your project.
I have a problem when I run finetuning_training.py:

Traceback (most recent call last):
File "finetuning_training.py", line 99, in
lossG = criterionG(x, x_hat, r_hat, D_res_list, D_hat_res_list)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/nd/workspace/workspace_lsh/face/headmodel/loss/loss_generator.py", line 157, in forward
loss_cnt = self.LossCnt(x, x_hat)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/nd/workspace/workspace_lsh/face/headmodel/loss/loss_generator.py", line 65, in forward
self.VGG19(x)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torchvision/models/vgg.py", line 39, in forward
x = self.classifier(x)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 92, in forward
return F.linear(input, self.weight, self.bias)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/functional.py", line 1406, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [1 x 32768], m2: [25088 x 4096] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:268

embedder_inference.py crash

Hi, when I run embedder_inference.py it always crashes on line 27:
frame_mark_video = torch.from_numpy(np.array(frame_mark_video)).type(dtype = torch.float) #T,2,256,256,3

Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1741, in <module>
main()
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/_pydev_imps/pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/danieltian/Desktop/Code/github_try/Realistic-Neural-Talking-Head-Models/embedder_inference.py", line 29, in <module>
temp1 = torch.from_numpy(temp)
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, and uint8.

It seems that the frame_mark_video value returned from generate_cropped_landmarks is a list of tuples, which cannot be converted directly to a float numpy array. Could you please fix this problem?
Thanks in advance.
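Until the script is fixed, one workaround (a sketch, assuming each element really is an (image, landmark) pair of equally sized uint8 arrays) is to stack the tuples explicitly rather than relying on np.array:

    import numpy as np
    import torch

    def frames_to_tensor(frame_mark_video):
        # Stack a list of (image, landmark_image) pairs into a T,2,H,W,3 float
        # tensor. np.stack fails loudly if any frame has a different shape,
        # instead of silently producing an object array that torch.from_numpy
        # rejects.
        arr = np.stack([np.stack(pair, axis=0) for pair in frame_mark_video])
        return torch.from_numpy(arr).float()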

Question about embedder_inference.py

Hi.

I'm a PyTorch newbie.
I ran embedder_inference.py with a new test image and your mp4 file.
But the code saves only 2 tar files. How do I check the resulting mp4 from this code?

Thanks

webcam_inference.py fake image is black

Looking at the code for webcam_inference, the fake image is black for me. I see that the source for out1, which comes from path_to_embedding (the e_hat_video.tar), has been commented out.

Is that image or video sourced from the examples folder when embedder_inference is run? It seems like it is able to access the examples and save the tar files when I run it, but they end up being just 3kb in size.

Some guidance on what does what would be appreciated.

About the preprocessed dataset

Thank you for the great project. However, the dataset is very big and difficult to process; could you upload the preprocessed dataset to Google Drive? Thank you again.

sincerely yours,
Martin

about requirements.txt

Can you provide a requirements.txt?

I'm trying to replicate this project, but the environment isn't specified, and I'm worried about potential bugs caused by version mismatches.

Maybe you can use pipreqs to generate requirements.txt
pip install pipreqs
pipreqs ./

embedder_inference.py default code error

Hi,

first, thank you for your work, it is awesome. When I am trying to generate the e_hats for default testing with the webcam, I got the following error:

Traceback (most recent call last):
  File "/media/alejandro/DATA_SSD/PhD/py_workspace/Realistic-Neural-Talking-Head-Models/embedder_inference.py", line 26, in <module>
    frame_mark_video = torch.from_numpy(np.array(frame_mark_video)).type(dtype = torch.float) #T,2,256,256,3
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: double, float, float16, int64, int32, and uint8.

I use torch 1.0.0 and 1.2.0 and numpy 1.16.4. I have the path to the image and to the video like this:

"""Hyperparameters and config"""
device = torch.device("cuda:0")
cpu = torch.device("cpu")
path_to_e_hat_video = '/media/alejandro/DATA_SSD/PhD/py_workspace/Realistic-Neural-Talking-Head-Models/snapshots/test/e_hat_video.tar'
path_to_e_hat_images = '/media/alejandro/DATA_SSD/PhD/py_workspace/Realistic-Neural-Talking-Head-Models/snapshots/test/e_hat_images.tar'
path_to_chkpt = '/media/alejandro/DATA_SSD/PhD/py_workspace/Realistic-Neural-Talking-Head-Models/snapshots/pretrained/model_weights.tar'
path_to_video = '/media/alejandro/DATA_SSD/PhD/py_workspace/Realistic-Neural-Talking-Head-Models/examples/fine_tuning/test_video.mp4'
path_to_images = '/media/alejandro/DATA_SSD/PhD/py_workspace/Realistic-Neural-Talking-Head-Models/examples/fine_tuning/test_images'
T = 32

Is there anything I am missing?

How would you increase the quality of the output?

Probably a somewhat silly question since I'm pretty new to machine learning, but how would you increase the quality of the output video? Is there any way to increase the resolution and/or frame rate?

converted vgg face

Hi! Thank you for sharing your great work!
Is there any possibility of sharing the converted VGG face model? Sorry, but I'm having some trouble setting up the environment.
