
realistic-neural-talking-head-models's Introduction

Realistic-Neural-Talking-Head-Models

My implementation of Few-Shot Adversarial Learning of Realistic Neural Talking Head Models (Egor Zakharov et al.). https://arxiv.org/abs/1905.08233

[Sample results: generated (fake) frames shown alongside the corresponding real frames.]

Inference after 5 epochs of training on the smaller test dataset. I stopped early due to a lack of compute resources (the authors trained for 75 epochs with the fine-tuning method and 150 epochs with the feed-forward method on the full dataset).


Prerequisites

1. Loading and converting the Caffe VGGFace model to PyTorch for the content loss:

Follow these instructions to install the VGGFace model from the paper (https://arxiv.org/pdf/1703.07332.pdf):

$ wget http://www.robots.ox.ac.uk/~vgg/software/vgg_face/src/vgg_face_caffe.tar.gz
$ tar xvzf vgg_face_caffe.tar.gz
$ sudo apt install caffe-cuda
$ pip install mmdnn

Convert Caffe to IR (Intermediate Representation)

$ mmtoir -f caffe -n vgg_face_caffe/VGG_FACE_deploy.prototxt -w vgg_face_caffe/VGG_FACE.caffemodel -o VGGFACE_IR

If you run into a pickle error, uninstall numpy and reinstall numpy version 1.16.1.

IR to Pytorch code and weights

$ mmtocode -f pytorch -n VGGFACE_IR.pb --IRWeightPath VGGFACE_IR.npy --dstModelPath Pytorch_VGGFACE_IR.py -dw Pytorch_VGGFACE_IR.npy

Pytorch code and weights to Pytorch model

$ mmtomodel -f pytorch -in Pytorch_VGGFACE_IR.py -iw Pytorch_VGGFACE_IR.npy -o Pytorch_VGGFACE.pth

At this point you will have several intermediate files in your directory. To save space you can delete everything except Pytorch_VGGFACE_IR.py and Pytorch_VGGFACE.pth.
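
Once converted, the model can be loaded in PyTorch as a frozen feature extractor for the content loss. Below is a minimal, hedged sketch assuming the default mmdnn output names and that the serialized model was pickled against a module named "MainModel" (the loss code in this repo may load it differently):

    # Hedged sketch: load the mmdnn-converted VGGFace model as a frozen feature extractor.
    # Keeping Pytorch_VGGFACE_IR.py importable is required because the .pth is a full
    # pickled model, not a state_dict.
    import importlib.util
    import sys
    import torch

    spec = importlib.util.spec_from_file_location("MainModel", "Pytorch_VGGFACE_IR.py")
    main_model = importlib.util.module_from_spec(spec)
    sys.modules["MainModel"] = main_model   # unpickling looks the generated class up here
    spec.loader.exec_module(main_model)

    vggface = torch.load("Pytorch_VGGFACE.pth")
    vggface.eval()
    for p in vggface.parameters():
        p.requires_grad_(False)  # frozen: only used to extract features for the content loss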

2. Libraries

  • face-alignment
  • torch
  • numpy
  • cv2 (opencv-python)
  • matplotlib
  • tqdm

3. VoxCeleb2 dataset

The VoxCeleb2 dataset ships as zip archives of videos (very heavy: 270 GB for the dev set and 8 GB for the test set): http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html

4. Optional: my pretrained weights

Available at https://drive.google.com/open?id=1vdFz4sh23hC_KIQGJjwbTfUdPG-aYor8

How to use:

  • modify the paths in the params folder to reflect your setup
  • preprocess.py: preprocesses the data for faster training/inference and a lighter dataset
  • train.py: initializes and trains the network, or continues training from a saved checkpoint
  • embedder_inference.py: (requires a trained model) runs the embedder on videos or images of a person and saves the embedding vector in a tar file
  • finetuning_training.py: (requires a trained model and an embedding vector) fine-tunes a trained model
  • webcam_inference.py: (requires a trained model and an embedding vector) runs inference using the identity from the embedding vector and webcam input
  • video_inference.py: same as webcam_inference.py but driven by a video; change the video path at the start of the file

Architecture

I followed the architecture guidelines from the paper, together with additional details provided by M. Zakharov.

The VoxCeleb2 images are brought from 224x224 to 256x256 by zero-padding, so that spatial dimensions are not rounded when passing through the downsampling layers.
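
As a rough illustration (a sketch, not the repo's actual code), the padding can be done with torch.nn.functional.pad:

    # Hedged sketch of the 224x224 -> 256x256 zero-padding described above
    # (a symmetric 16-pixel pad on each side).
    import torch
    import torch.nn.functional as F

    x = torch.rand(1, 3, 224, 224)                  # a VoxCeleb2 frame as an NCHW tensor
    x_padded = F.pad(x, (16, 16, 16, 16), value=0)  # left, right, top, bottom
    print(x_padded.shape)                           # torch.Size([1, 3, 256, 256])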

The residual blocks follow Large Scale GAN Training for High Fidelity Natural Image Synthesis (BigGAN; Andrew Brock, Jeff Donahue, Karen Simonyan).

Embedder

The embedder uses 6 downsampling residual blocks with no normalisation, with a self-attention layer added in the middle. The output of the last residual block is reduced to a vector of size 512 via max-pooling over the spatial dimensions.
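
For illustration only (tensor shapes here are assumptions, not taken from the repo), the pooling step looks like this:

    # Hedged sketch: reduce the last residual block's output to a 512-dim embedding
    # by max-pooling over the spatial dimensions.
    import torch
    import torch.nn.functional as F

    features = torch.rand(1, 512, 4, 4)                   # output of the last downsampling block
    e = F.adaptive_max_pool2d(features, 1).view(-1, 512)  # -> (1, 512) embedding vector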

Generator

The downsampling part of the generator uses the same architecture as the embedder, with instance normalization added to each block, following the paper.

The constant-dimension residual part uses 5 blocks with adaptive instance normalization (AdaIN). Unlike the AdaIN paper (Xun Huang et al.), where the learnable alpha and beta parameters of instance normalisation are replaced by the mean and variance of the input style, here the adaptive parameters (mean and variance) are taken from psi, with psi = P*e, where P is the projection matrix and e is the embedding vector computed by the embedder.

(P is of size 2*(512*2*5 + 512*2 + 512*2 + 512+256 + 256+128 + 128+64 + 64+3) x 512 = 17158 x 512.)
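
A minimal sketch of this projection-plus-AdaIN step, with illustrative dimensions and slicing rather than the repo's exact layout:

    # Hedged sketch: psi = P @ e, and slices of psi supply the per-channel
    # mean/std used by adaptive instance normalization.
    import torch

    def adain(feature, mean_style, std_style, eps=1e-5):
        # Normalize the content feature, then apply the style statistics taken from psi.
        b, c, h, w = feature.shape
        mu = feature.view(b, c, -1).mean(dim=2).view(b, c, 1, 1)
        sigma = feature.view(b, c, -1).std(dim=2).view(b, c, 1, 1) + eps
        return std_style.view(b, c, 1, 1) * (feature - mu) / sigma + mean_style.view(b, c, 1, 1)

    e = torch.rand(1, 512, 1)             # embedding from the embedder
    P = torch.rand(17158, 512)            # learned projection matrix
    psi = P @ e                           # (1, 17158, 1) adaptive parameters
    feature = torch.rand(1, 512, 16, 16)  # some intermediate generator activation
    mean_style, std_style = psi[:, :512, 0], psi[:, 512:1024, 0]  # one illustrative slice
    out = adain(feature, mean_style, std_style)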

The residual part is followed by 6 upsampling residual blocks. The final output is a tensor of dimensions 3x224x224; the image is rescaled by applying a sigmoid and multiplying by 255. Each upsampling block contains two AdaIN layers (they replace the normalisation layers from the BigGAN paper).

Self-attention layers are added in both the downsampling and upsampling parts of the generator.

Discriminator

The discriminator uses the same architecture as the embedder.

realistic-neural-talking-head-models's People

Contributors

cclauss, nwatab, vincent-thevenin

realistic-neural-talking-head-models's Issues

How can I obtain my results from the pre-trained model?

How can I obtain results from the pre-trained model? And what do the samples in each image mean? I'm confused about which one is the ground truth and which one is the output. Is there a landmark image as well? Thank you very much!

problems in pretrained model

Thanks for the pretrained model on Google Drive. I downloaded it, but my decompression software failed to extract it. Has anyone else run into this problem?

Error while loading the pretrained model in video_inference.py

There seems to be a mismatch between the pretrained model and the network architecture provided. Can you please check?

Traceback (most recent call last):
File "/home/vishnu/Realistic-Neural-Talking-Head-Models/video_inference.py", line 35, in
G.load_state_dict(checkpoint['G_state_dict'])
File "/home/vishnu/miniconda3/envs/neural-talk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Generator:
Missing key(s) in state_dict: "conv2d.weight", "conv2d.bias".
Unexpected key(s) in state_dict: "resDown5.conv_l1.bias", "resDown5.conv_l1.weight_orig", "resDown5.conv_l1.weight_u", "resDown5.conv_l1.weight_v", "resDown5.conv_r1.bias", "resDown5.conv_r1.weight_orig", "resDown5.conv_r1.weight_u", "resDown5.conv_r1.weight_v", "resDown5.conv_r2.bias", "resDown5.conv_r2.weight_orig", "resDown5.conv_r2.weight_u", "resDown5.conv_r2.weight_v", "in5.weight", "in5.bias", "resDown6.conv_l1.bias", "resDown6.conv_l1.weight_orig", "resDown6.conv_l1.weight_u", "resDown6.conv_l1.weight_v", "resDown6.conv_r1.bias", "resDown6.conv_r1.weight_orig", "resDown6.conv_r1.weight_u", "resDown6.conv_r1.weight_v", "resDown6.conv_r2.bias", "resDown6.conv_r2.weight_orig", "resDown6.conv_r2.weight_u", "resDown6.conv_r2.weight_v", "in6.weight", "in6.bias", "resUp5.conv_l1.bias", "resUp5.conv_l1.weight_orig", "resUp5.conv_l1.weight_u", "resUp5.conv_l1.weight_v", "resUp5.conv_r1.bias", "resUp5.conv_r1.weight_orig", "resUp5.conv_r1.weight_u", "resUp5.conv_r1.weight_v", "resUp5.conv_r2.bias", "resUp5.conv_r2.weight_orig", "resUp5.conv_r2.weight_u", "resUp5.conv_r2.weight_v", "resUp6.conv_l1.bias", "resUp6.conv_l1.weight_orig", "resUp6.conv_l1.weight_u", "resUp6.conv_l1.weight_v", "resUp6.conv_r1.bias", "resUp6.conv_r1.weight_orig", "resUp6.conv_r1.weight_u", "resUp6.conv_r1.weight_v", "resUp6.conv_r2.bias", "resUp6.conv_r2.weight_orig", "resUp6.conv_r2.weight_u", "resUp6.conv_r2.weight_v".
size mismatch for p: copying a param with shape torch.Size([17158, 512]) from checkpoint, the shape in current model is torch.Size([13184, 512]).
size mismatch for psi: copying a param with shape torch.Size([17158, 1]) from checkpoint, the shape in current model is torch.Size([13184, 1]).
size mismatch for resUp1.conv_l1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp1.conv_l1.weight_orig: copying a param with shape torch.Size([512, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
size mismatch for resUp1.conv_l1.weight_u: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp1.conv_r1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp1.conv_r1.weight_orig: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
size mismatch for resUp1.conv_r1.weight_u: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp1.conv_r2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp1.conv_r2.weight_orig: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for resUp1.conv_r2.weight_u: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp1.conv_r2.weight_v: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([2304]).
size mismatch for resUp2.conv_l1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp2.conv_l1.weight_orig: copying a param with shape torch.Size([512, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1]).
size mismatch for resUp2.conv_l1.weight_u: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp2.conv_l1.weight_v: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for resUp2.conv_r1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp2.conv_r1.weight_orig: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 3, 3]).
size mismatch for resUp2.conv_r1.weight_u: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp2.conv_r1.weight_v: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([2304]).
size mismatch for resUp2.conv_r2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp2.conv_r2.weight_orig: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
size mismatch for resUp2.conv_r2.weight_u: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp2.conv_r2.weight_v: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for resUp3.conv_l1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp3.conv_l1.weight_orig: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 128, 1, 1]).
size mismatch for resUp3.conv_l1.weight_u: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp3.conv_l1.weight_v: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for resUp3.conv_r1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp3.conv_r1.weight_orig: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 128, 3, 3]).
size mismatch for resUp3.conv_r1.weight_u: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp3.conv_r1.weight_v: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for resUp3.conv_r2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp3.conv_r2.weight_orig: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for resUp3.conv_r2.weight_u: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp3.conv_r2.weight_v: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([576]).
size mismatch for resUp4.conv_l1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for resUp4.conv_l1.weight_orig: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 64, 1, 1]).
size mismatch for resUp4.conv_l1.weight_u: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for resUp4.conv_l1.weight_v: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for resUp4.conv_r1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for resUp4.conv_r1.weight_orig: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 64, 3, 3]).
size mismatch for resUp4.conv_r1.weight_u: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for resUp4.conv_r1.weight_v: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([576]).
size mismatch for resUp4.conv_r2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for resUp4.conv_r2.weight_orig: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for resUp4.conv_r2.weight_u: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for resUp4.conv_r2.weight_v: copying a param with shape torch.Size([1152]) from checkpoint, the shape in current model is torch.Size([288]).

embedder_inference.py crash

Hi, when I run embedder_inference.py it always crashes on line 27:
frame_mark_video = torch.from_numpy(np.array(frame_mark_video)).type(dtype = torch.float) #T,2,256,256,3

Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1741, in
main()
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/_pydev_imps/pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/danieltian/Desktop/Code/github_try/Realistic-Neural-Talking-Head-Models/embedder_inference.py", line 29, in
temp1 = torch.from_numpy(temp)
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, and uint8.

It seems that the value frame_mark_video returned from generate_cropped_landmarks is a list of tuples, which cannot be converted to a numpy array of floats directly. Could you please fix this problem?
Thanks in advance.
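
A hedged illustration of the failure mode and one possible workaround (shapes and variable names are illustrative, not the repo's actual code): numpy cannot build a float array from a ragged list of (frame, landmark) tuples, so it falls back to dtype=object, which torch.from_numpy rejects. Stacking arrays that already share a shape and dtype avoids the error.

    import numpy as np
    import torch

    frames = [np.zeros((256, 256, 3), dtype=np.uint8) for _ in range(4)]
    landmarks = [np.zeros((256, 256, 3), dtype=np.uint8) for _ in range(4)]

    # Pair each frame with its landmark image explicitly, then stack into T,2,256,256,3.
    pairs = [np.stack([f.astype(np.float32), l.astype(np.float32)])
             for f, l in zip(frames, landmarks)]
    frame_mark_video = torch.from_numpy(np.stack(pairs))  # shape: (4, 2, 256, 256, 3)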

Training on Multiple GPUs

Hello, thanks for the model and the explanations, this is very helpful for my research. I obtained very good results from the (fine-tuned) pre-trained model and I want to increase the quality by training the model on the whole VoxCeleb2 dataset. I have the dataset folder prepared, but I couldn't run the training on multiple GPUs. I have 2 GPUs and I want to use both of them, since one GPU cannot hold that much data and raises RuntimeError: CUDA out of memory.
How should I modify the code in order to take advantage of both GPUs on my system?
Any help would be appreciated.
Thank you
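
A hedged sketch of one common approach, wrapping the networks in torch.nn.DataParallel so batches are split across both GPUs (module names are illustrative; the repo's train.py is not confirmed to support this out of the box):

    import torch
    import torch.nn as nn

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    G = nn.Linear(10, 10)  # stand-in for the Generator defined in network/model.py
    if torch.cuda.device_count() > 1:
        G = nn.DataParallel(G, device_ids=[0, 1])  # splits each batch across both GPUs
    G = G.to(device)
    # Note: a DataParallel-wrapped model exposes its weights via G.module.state_dict(),
    # so checkpoint saving/loading code may need a small adjustment.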

webcam_inference.py fake image is black

Looking at the code for webcam_inference, the fake image is black for me. I see that the source for out1, which comes from path_to_embedding (the e_hat_video.tar), has been commented out.

Is that image or video sourced from the examples folder when embedder_inference is run? It seems like it is able to access the examples and save the tar files when I run it, but they end up being just 3kb in size.

Some guidance on what does what would be appreciated.

errors occur when running embedder_inference.py

Traceback (most recent call last):
File "Realistic/embedder_inference.py", line 27, in
frame_mark_video = frame_mark_video.transpose(2,4).to(device) #T,2,3,256,256
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 2)

about requirements.txt

Can you provide a requirements.txt?

I'm trying to replicate this project, but there is no environment specification, and I'm worried about potential bugs caused by version mismatches.

Maybe you can use pipreqs to generate requirements.txt
pip install pipreqs
pipreqs ./

Some issue i met on this repo

Thanks for the excellent work! But I found some possible bugs in this repo:

  1. In preprocess.py, do you mean to use generate_cropped_landmarks rather than generate_landmarks?
  2. Also, you may need to calculate the IOU between crops, since crops in the same video may come from different people; it may be better not to simply use fd.preds[0] (see the sketch below).
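
A hedged sketch of the IOU-based selection suggested in point 2 (box format and helper names are assumptions, not the repo's code): keep the detection whose bounding box overlaps most with the previous frame's box instead of always taking the first one.

    def iou(box_a, box_b):
        # Boxes as (x1, y1, x2, y2); returns intersection over union.
        xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, xb - xa) * max(0, yb - ya)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / float(area_a + area_b - inter) if inter else 0.0

    def pick_consistent_detection(detections, prev_box):
        # detections: list of boxes for the current frame; prev_box: box kept last frame.
        return max(detections, key=lambda box: iou(box, prev_box))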

How to use webcam inference for mp4

Hi,
I tried to use webcam_inference.py to run inference on an mp4 and get outputs for every frame (fake/me/landmarks), but all the outputs came out completely black.
Process:
1. Download the pretrained model.
2. Run embedder_inference.py to get e_hat_images.tar.
3. Modify "cap = cv2.VideoCapture(0)" to "cap = cv2.VideoCapture(xxx.mp4)" in webcam_inference.py.
4. Run webcam_inference.py to get the outputs (fake/me/landmarks).

Any suggestions? Thanks.

embedder_inference.py errors

Hello,

I have followed all of the steps you outlined in the README and I believe everything is set up properly. When I try to run embedder_inference.py I just get a bunch of errors, like so:

Error: Video corrupted or no landmarks visible
(the line above is repeated roughly 20 times)
/home/tomljr2/UF/project/dataset/video_extraction_conversion.py:118: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam figure.max_open_warning).
fig = plt.figure(figsize=(input.shape[1]/dpi, input.shape[0]/dpi), dpi = dpi)
Error: Video corrupted or no landmarks visible
(repeated several more times)
Traceback (most recent call last):
File "embedder_inference.py", line 25, in
frame_mark_video = generate_cropped_landmarks(frame_mark_video, pad=50)
File "/home/tomljr2/UF/project/dataset/video_extraction_conversion.py", line 151, in generate_cropped_landmarks
frame_landmark_list.append(frame_landmark_list[i])
IndexError: list index out of range

Any idea what is causing this?

Also if you could detail the flow of how you actually use the program once I've got it working, I would really appreciate it because I didn't see much in the README.

Thanks!

Can't see any fake face

Hi.

I ran the webcam_inference.py file and got 3 windows: fake, me, and landmark.
But only the "me" and "landmark" windows show anything; no fake face is produced.
How can I fix this?

Bilinear Down/Upsampling

Thanks a lot for the great efforts!

Although the Samsung paper doesn't mention the interpolation method, according to the StyleGAN paper they actually use bilinear interpolation for the up/downsampling operations. When I train with your code I see a strong checkerboard effect, especially in the early training stages. Have you considered using bilinear interpolation for the upsampling function?
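
For reference, a minimal sketch of bilinear up/downsampling with torch.nn.functional.interpolate (not the repo's actual block code, just the operation the issue suggests substituting):

    import torch
    import torch.nn.functional as F

    x = torch.rand(1, 256, 28, 28)
    x_up = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
    x_down = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)
    print(x_up.shape, x_down.shape)  # (1, 256, 56, 56) (1, 256, 14, 14)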

Generator architecture doesn't match the one described in paper

First, thanks a lot for the sharing of your implementation.

But I notice there is an architecture mismatch between your code and the original paper:

out = self.resUp1(out, e_psi[:, self.slice_idx[5]:self.slice_idx[6], :])
out = self.resUp2(out, e_psi[:, self.slice_idx[6]:self.slice_idx[7], :])
out = self.self_att_Up(out)
out = self.resUp3(out, e_psi[:, self.slice_idx[7]:self.slice_idx[8], :])
out = self.resUp4(out, e_psi[:, self.slice_idx[8]:self.slice_idx[9], :])
out = adaIN(out,
e_psi[:,
self.slice_idx[9]:(self.slice_idx[10]+self.slice_idx[9])//2,
:],
e_psi[:,
(self.slice_idx[10]+self.slice_idx[9])//2:self.slice_idx[10],
:]
)


You use AdaIN in the decoder part of the generator, while in the original paper they use IN.

Is this change made for better result?

numpy 1.16.1

I'm having an issue installing version 1.16.1. I'm getting an error stating that that version cannot be imported because it does not exist.

Is there no way to get the pickle working with 1.16.4?

Question about embedder_inference.py

Hi.

I'm a newbie of PyTorch.
I ran embedder_inference.py with a new test image and your mp4 file.
But the code only saves 2 tar files. How can I check the resulting mp4 from this code?

Thanks

donations?

How can we donate money to help fund development of this project?

thanks

Generate Noises

Hi, thanks for your work on reproducing this paper.

During testing I ran embedder_inference.py without fine-tuning, then ran video_inference.py, but I got noise as the output of the model.

Any suggestions? Thanks!

Cannot decompress pretrained weights

Pretrained weights on Google Drive cannot be decompressed on MacOS Mojave

$ tar xvf model_weights.tar
tar: Unrecognized archive format
tar: Error exit delayed from previous errors.

$ tar --version
bsdtar 2.8.3 - libarchive 2.8.3

Question about the input of the model.

Hi, thank you for your effort in reimplementing the algorithm. But I want to know why you feed the model RGB images (0-255) directly instead of preprocessing them with transforms.ToTensor()?

Preliminary landmarks extraction

Hello!
Many thanks for your work.

One question. Are you planning to add preprocessing of landmarks?
As far as I understand, landmark extraction currently happens during training, which increases the time per batch (4.5 seconds for me). It would be more efficient to extract the landmarks once and then just load them during training.
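
A hedged sketch of such a one-off preprocessing pass (paths, helper names, and the face-alignment API usage are assumptions; newer face-alignment releases rename LandmarksType._2D):

    import numpy as np
    import face_alignment

    fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, device='cuda')

    def cache_landmarks(frames, out_path):
        # frames: list of HxWx3 uint8 RGB images from one video
        # (handling of frames where no face is found is omitted in this sketch).
        landmarks = [fa.get_landmarks(frame)[0] for frame in frames]  # (68, 2) per frame
        np.save(out_path, np.stack(landmarks))  # load with np.load during training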

Convert caffe model to PyTorch model

When I run the last command to convert the Caffe model to a PyTorch model, "mmtomodel -f pytorch -in Pytorch_VGGFACE_IR.py -iw Pytorch_VGGFACE_IR.npy -o Pytorch_VGGFACE.pth", I hit the following problem:

Traceback (most recent call last):
File "/root/anaconda3/bin/mmtomodel", line 10, in <module>
sys.exit(_main())
File "/root/anaconda3/lib/python3.7/site-packages/mmdnn/conversion/_script/dump_code.py", line 79, in _main
ret = dump_code(args.framework, args.inputNetwork, args.inputWeight, args.outputModel, args.dump_tag)
File "/root/anaconda3/lib/python3.7/site-packages/mmdnn/conversion/_script/dump_code.py", line 32, in dump_code
save_model(MainModel, network_filepath, weight_filepath, dump_filepath)
File "/root/anaconda3/lib/python3.7/site-packages/mmdnn/conversion/pytorch/saver.py", line 5, in save_model
model = MainModel.KitModel(weight_filepath)
File "Pytorch_VGGFACE_IR.py", line 27, in __init__
self.conv1_1 = self.__conv(2, name='conv1_1', in_channels=3, out_channels=64, kernel_size=(3, 3), stride=(1, 1), groups=1, bias=True)
File "Pytorch_VGGFACE_IR.py", line 129, in __conv
layer.state_dict()['bias'].copy_(torch.from_numpy(__weights_dict[name]['bias']))
RuntimeError: output with shape [64] doesn't match the broadcast shape [1, 1, 1, 64]

Any ideas to solve this problem?
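
A common workaround (an assumption based on the traceback, not a verified fix for this repo) is to flatten the Caffe bias inside the generated Pytorch_VGGFACE_IR.py before it is copied into the PyTorch layer, since it arrives with shape (1, 1, 1, 64) while the layer expects (64,):

    # Hedged sketch of the shape fix implied by the error above; the edit would go
    # in the generated Pytorch_VGGFACE_IR.py __conv helper (names per the traceback).
    import numpy as np
    import torch

    bias_from_caffe = np.zeros((1, 1, 1, 64), dtype=np.float32)   # as stored in the weights dict
    layer = torch.nn.Conv2d(3, 64, kernel_size=3, bias=True)
    layer.state_dict()['bias'].copy_(torch.from_numpy(np.squeeze(bias_from_caffe)))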

Questions about matching loss

Hello! I have two questions about the matching loss:

  • In my opinion, the matrix W and the pose-agnostic code are actually used in two ways: W provides the projected class embeddings for the projection cGAN, while the other serves as the "identity code". However, I don't really understand what it means to make Wi closer to the identity code, and how it will affect the fine-tuning process.

  • The dataset used in fine-tuning: if we have meta-trained the network on a large dataset and we would like to fine-tune it with one personal image (one-shot), does that mean we only have one training sample in the fine-tuning phase? Then how can we train the network in the cGAN way? By consistently feeding this one sample to D as real data?

I would appreciate it if you could share your opinions with me.

Cannot reproduce the results

Hello! I have also tried to reproduce this paper. However, with a very similar network architecture and AdaIN settings, I can only obtain low-resolution faces on a very fuzzy background, and training for more epochs does not improve performance. I am shocked by your example images, but they cannot be reproduced with your GitHub code. After carefully reading all the code, I found what may be a mistake in https://github.com/vincent-thevenin/Realistic-Neural-Talking-Head-Models/blob/master/loss/loss_generator.py#L30

   def vgg_x_hook(module, input, output):
       vgg_x_features.append(output.data)
   def vgg_xhat_hook(module, input, output):
      vgg_xhat_features.append(output.data)

The output.data operation seems to be cutting off the gradient flow, making the content loss useless.

I would also like to ask whether there are any other hidden tricks in training, for example the Adam momentum or the depth of the generator. Thank you!
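
A hedged sketch of the fix the issue implies: keep the hooked activations attached to the graph (at least for x_hat) instead of storing output.data, so the content loss can still backpropagate into the generator. Names mirror the snippet quoted above.

    vgg_x_features, vgg_xhat_features = [], []

    def vgg_x_hook(module, input, output):
        vgg_x_features.append(output)     # x is real data; detaching here would also be fine

    def vgg_xhat_hook(module, input, output):
        vgg_xhat_features.append(output)  # must NOT use output.data, or the loss has no gradient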

Error when loading images in embedder_inference.py

I'm running the code on a Colab notebook with the provided test video and images and I get this error. It also happens when I provide my own images. Help?

Traceback (most recent call last):
File "embedder_inference.py", line 30, in
frame_mark_images = select_images_frames(path_to_images)
File "/content/Realistic-Neural-Talking-Head-Models/dataset/video_extraction_conversion.py", line 101, in select_images_frames
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(4.1.2) /io/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed)

How would you increase the quality of the output?

Probably a kinda stupid question since I'm pretty new to machine learning, but how would you increase the quality of the output video? Is there any way to increase the resolution and/or frame rate?

Error when running the "finetuning_training.py"

Hi, thanks for opening your work.

I converted the Caffe models and ran embedder_inference.py without any problem.

But when I run the "finetuning_training.py", the following error happens.

Traceback (most recent call last):
File "finetuning_training.py", line 99, in <module>
lossG = criterionG(x, x_hat, r_hat, D_res_list, D_hat_res_list)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/max/Realistic-Neural-Talking-Head-Models/loss/loss_generator.py", line 183, in forward
loss_cnt = self.LossCnt(x, x_hat)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/max/Realistic-Neural-Talking-Head-Models/loss/loss_generator.py", line 87, in forward
self.VGG19(x)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/max/.local/lib/python3.5/site-packages/torchvision/models/vgg.py", line 44, in forward
x = self.classifier(x)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 67, in forward
return F.linear(input, self.weight, self.bias)
File "/home/max/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 1352, in linear
ret = torch.addmm(torch.jit._unwrap_optional(bias), input, weight.t())
RuntimeError: size mismatch, m1: [1 x 32768], m2: [25088 x 4096] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:266

Do you have any idea what went wrong?

Instance normalization is used only in generator's downsampling

Looking at the current implementation in this repository, the discriminator has instance normalization, but this goes against the architecture described in the paper.

According to the paper,

We base our generator network G(yi(t), êi; ψ, P) on the image-to-image translation architecture proposed by Johnson et. al. [20], but replace downsampling and upsampling layers with residual blocks similarly to [6] (with batch normalization [16] replaced by instance normalization [38]).

For the convolutional part of the discriminator V(xi(t), yi(t); θ), we use similar networks, which consist of residual downsampling blocks (same as the ones used in the generator, but without normalization layers).

About results

Does your embedder work well? After training for 300+ epochs, I find that the embedder does not work at all and the results only depend on the generator's input.

embedder_inference.py default code error

Hi,

First, thank you for your work, it is awesome. When I try to generate the e_hats for the default webcam test, I get the following error:

Traceback (most recent call last):
  File "/media/alejandro/DATA_SSD/PhD/py_workspace/Realistic-Neural-Talking-Head-Models/embedder_inference.py", line 26, in <module>
    frame_mark_video = torch.from_numpy(np.array(frame_mark_video)).type(dtype = torch.float) #T,2,256,256,3
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: double, float, float16, int64, int32, and uint8.

I tried torch 1.0.0 and 1.2.0, with numpy 1.16.4. I have the paths to the images and the video set like this:

"""Hyperparameters and config"""
device = torch.device("cuda:0")
cpu = torch.device("cpu")
path_to_e_hat_video = '/media/alejandro/DATA_SSD/PhD/py_workspace/Realistic-Neural-Talking-Head-Models/snapshots/test/e_hat_video.tar'
path_to_e_hat_images = '/media/alejandro/DATA_SSD/PhD/py_workspace/Realistic-Neural-Talking-Head-Models/snapshots/test/e_hat_images.tar'
path_to_chkpt = '/media/alejandro/DATA_SSD/PhD/py_workspace/Realistic-Neural-Talking-Head-Models/snapshots/pretrained/model_weights.tar'
path_to_video = '/media/alejandro/DATA_SSD/PhD/py_workspace/Realistic-Neural-Talking-Head-Models/examples/fine_tuning/test_video.mp4'
path_to_images = '/media/alejandro/DATA_SSD/PhD/py_workspace/Realistic-Neural-Talking-Head-Models/examples/fine_tuning/test_images'
T = 32

Is there anything I am missing?

About the preprocessed dataset

Thank you for this great project. However, the dataset is so big that it is difficult to process; could you upload the preprocessed dataset to Google Drive? Thank you again.

sincerely yours,
Martin

weird results on pretrained model

Thanks for your work!
I have tested the examples with the pretrained model.
I ran embedder_inference.py first, then finetuning_training.py, then video_inference.py.
However, the generated results are noise rather than a face. Could you help me understand why this happened?

about training data samplers

As the original paper says, "The generator G(yi(t), êi; ψ, P) takes the landmark image yi(t) for the video frame not seen by the embedder, the predicted video embedding êi, and outputs a synthesized video frame x̂i(t)." However, in your dataloader, the image used by the generator also appears in the embedder's input?

What GPU did you use?

I have tested the code with an NVIDIA TITAN RTX (24 GB of GPU RAM), but OOM occurred.
What GPU did you use to train the network?

caffe-cuda can't be installed on Ubuntu16

hi Vincent, thanks for your great contribution.

I'm using Ubuntu 16 and can't run "sudo apt install caffe-cuda".

The reason seems to be that caffe-cuda is only packaged for Ubuntu 17.04 and later.

Could you please tell me if there is another way to install Caffe, to replace "sudo apt install caffe-cuda"?

thank you very much!

Mismatch between the architecture of Generator.py and the "G_state_dict" in pretrained "model_weights.tar"

Hello, thanks for your wonderful reproduction first!
However, when I try to run "finetuning_training.py" to fine-tune the model with the pretrained "model_weights.tar", the following error arises:

RuntimeError: Error(s) in loading state_dict for Generator:
Missing key(s) in state_dict: "conv2d.weight", "conv2d.bias".
Unexpected key(s) in state_dict: "resDown5.conv_l1.bias", "resDown5.conv_l1.weight_orig", "resDown5.conv_l1.weight_u", "resDown5.conv_l1.weight_v", "resDown5.conv_r1.bias", "resDown5.conv_r1.weight_orig", "resDown5.conv_r1.weight_u", "resDown5.conv_r1.weight_v", "resDown5.conv_r2.bias", "resDown5.conv_r2.weight_orig", "resDown5.conv_r2.weight_u", "resDown5.conv_r2.weight_v", "in5.weight", "in5.bias", "resDown6.conv_l1.bias", "resDown6.conv_l1.weight_orig", "resDown6.conv_l1.weight_u", "resDown6.conv_l1.weight_v", "resDown6.conv_r1.bias", "resDown6.conv_r1.weight_orig", "resDown6.conv_r1.weight_u", "resDown6.conv_r1.weight_v", "resDown6.conv_r2.bias", "resDown6.conv_r2.weight_orig", "resDown6.conv_r2.weight_u", "resDown6.conv_r2.weight_v", "in6.weight", "in6.bias", "resUp5.conv_l1.bias", "resUp5.conv_l1.weight_orig", "resUp5.conv_l1.weight_u", "resUp5.conv_l1.weight_v", "resUp5.conv_r1.bias", "resUp5.conv_r1.weight_orig", "resUp5.conv_r1.weight_u", "resUp5.conv_r1.weight_v", "resUp5.conv_r2.bias", "resUp5.conv_r2.weight_orig", "resUp5.conv_r2.weight_u", "resUp5.conv_r2.weight_v", "resUp6.conv_l1.bias", "resUp6.conv_l1.weight_orig", "resUp6.conv_l1.weight_u", "resUp6.conv_l1.weight_v", "resUp6.conv_r1.bias", "resUp6.conv_r1.weight_orig", "resUp6.conv_r1.weight_u", "resUp6.conv_r1.weight_v", "resUp6.conv_r2.bias", "resUp6.conv_r2.weight_orig", "resUp6.conv_r2.weight_u", "resUp6.conv_r2.weight_v".

I checked the code in network/model.py and found that there are only 4 ResUp blocks in the Generator, but the "G_state_dict" in "model_weights.tar" has 6 ResUp blocks. What's more, the "G_state_dict" in "model_weights.tar" lacks the keys "conv2d.weight" and "conv2d.bias".
Maybe you have uploaded a mismatched pretrained checkpoint with a different generator architecture?

Thanks again for your nice reproduction; I look forward to your reply.

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 2)

File "train.py", line 91, in
for i_batch, (f_lm, x, g_y, i) in enumerate(dataLoader, start=i_batch_current):
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in next
data = self._next_data()
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/media/shared/Realistic-Neural-Talking-Head-Models-master/dataset/dataset_class.py", line 42, in getitem
frame_mark = frame_mark.transpose(2,4).to(self.device) #K,2,3,224,224
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 2)

I downloaded the VoxCeleb2 set and tried to run train.py, but kept getting this error. I tried it with the test set as well and got the same error.

Also, when running webcam_inference.py, I get only a black screen for the fake image.

converted vgg face

Hi! Thank you for sharing your great work first!
Is there any possibility of sharing the converted VGG face model? Sorry, but I ran into some trouble setting up the environment.

finetuning training problem

Thank you for your open project.
I have a problem when I run finetuning_training.py:

Traceback (most recent call last):
File "finetuning_training.py", line 99, in
lossG = criterionG(x, x_hat, r_hat, D_res_list, D_hat_res_list)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/nd/workspace/workspace_lsh/face/headmodel/loss/loss_generator.py", line 157, in forward
loss_cnt = self.LossCnt(x, x_hat)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/nd/workspace/workspace_lsh/face/headmodel/loss/loss_generator.py", line 65, in forward
self.VGG19(x)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torchvision/models/vgg.py", line 39, in forward
x = self.classifier(x)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 92, in forward
return F.linear(input, self.weight, self.bias)
File "/home/nd/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/nn/functional.py", line 1406, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [1 x 32768], m2: [25088 x 4096] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:268

Generating video for own image

I see that embedder_inference.py and finetuning_training.py both use a test image and a video of the person being faked. I would like to generate a fake video from only my own image of a person; I have no video. How can I do it?

About generating images

Using the provided source code and weights with the VoxCeleb2 test set for training, the generated images are not good and image details are seriously lost. How long did you train to get the results shown in the example you provided?
