aliaksandrsiarohin / monkey-net

Animating Arbitrary Objects via Deep Motion Transfer

Python 91.76% Shell 0.20% HTML 8.04%

monkey-net's Issues

ValueError: cannot reshape array of size 10752 into shape (64,64,3)

I am trying to test on faces to get a demo.gif result from a 64x64 input image, but I get this error:

File "demo.py", line 62, in <module>
source_image = VideoToTensor()(read_video(opt.source_image, opt.image_shape + (3,)))['video'][:, :1]
File "/content/monkey-net/frames_dataset.py", line 28, in read_video
video_array = video_array.reshape((-1,) + image_shape)
ValueError: cannot reshape array of size 10752 into shape (64,64,3)

using this command
!python demo.py --config config/nemo.yaml --driving_video sup-mat/driving.png --source_image source2.png --checkpoint /content/nemo-ckp.pth.tar --image_shape 64,64
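A note on the numbers in this error: one 64x64 RGB frame holds 64 * 64 * 3 = 12288 values, so an array of 10752 values cannot be split into whole frames of that shape. The sketch below uses a hypothetical helper (not part of the repository) and assumes the image on disk is 64 pixels high. It lists the widths that would match the reported size for common channel counts, which suggests the source image is not 64 pixels wide or is not plain RGB.

```python
def diagnose_reshape(total_size, frame_shape):
    """Explain why an array of `total_size` values cannot be split into frames."""
    height, width, channels = frame_shape
    frame_size = height * width * channels
    report = {
        "frame_size": frame_size,
        "fits": total_size % frame_size == 0,
        "candidate_widths": {},
    }
    # Try common channel counts (grayscale, RGB, RGBA) at the requested height.
    # Assumption: the image really is `height` pixels high.
    for c in (1, 3, 4):
        if total_size % (height * c) == 0:
            report["candidate_widths"][c] = total_size // (height * c)
    return report


report = diagnose_reshape(10752, (64, 64, 3))
print(report)
```

If the file turns out to be, say, 56 pixels wide or to carry an alpha channel, resizing it to 64x64 and converting it to RGB before running demo.py may resolve the error.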

What does each column of the output image mean when training on the shape dataset?

Thanks for sharing the awesome code!
However, I am confused about the output images when training on the shape dataset.
I know that the first two columns are the source and the driving frames. What about the others?

Confirm the command on the readme file

I just read through the README and tried to get it working.
I executed the command in the doc:
python --config config/moving-gif.yaml --driving_video sup-mat/driving_video.gif --source_image sup-mat/source_image.gif --checkpoint path/to/checkpoint

But it doesn't work.
The command is missing the Python script name, and I cannot find driving_video.gif or source_image.gif in sup-mat.
I noticed that the network accepts stacked '.png' files and that there are two PNG images under sup-mat.
I just want to confirm the correct command for running the checkpoint. Should it be:

python demo.py --config config/moving-gif.yaml --driving_video driving.png --source_image source.png --checkpoint moving-gif-ckp.pth.tar

If it's correct, I'm happy to raise a PR to fix it :)

expected keypoint coordinates

Hello,
What is the intuition behind squashing the grid coordinates into [-1, 1] before calculating the expectation and covariance of the heatmaps?
The paper says the grid coordinates take values within the H*W coordinates, but in the code those values are reduced to [-1, 1]. I am referring to the make_coordinate_grid method under /modules/util.py. Please let me know if I am missing something.

Thanks in advance!
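One way to see that the two conventions carry the same information: the expectation is linear, so a mean computed on a [-1, 1] grid maps back to the pixel-space mean through the same affine transform. The sketch below is a toy 1-D illustration that assumes a normalization of the form x = 2 * i / (n - 1) - 1; whether make_coordinate_grid uses exactly this form is an assumption, and the helper names are hypothetical.

```python
def normalized_grid(n):
    """Coordinates of n pixels mapped linearly onto [-1, 1] (assumed convention)."""
    return [2.0 * i / (n - 1) - 1.0 for i in range(n)]


def expectation(weights, coords):
    """Weighted mean of coords; weights are assumed to sum to 1."""
    return sum(w * c for w, c in zip(weights, coords))


n = 5
heatmap = [0.0, 0.1, 0.6, 0.3, 0.0]           # toy 1-D heatmap, sums to 1
mean_pixels = expectation(heatmap, range(n))
mean_normalized = expectation(heatmap, normalized_grid(n))

# Mapping the normalized mean back to pixels recovers the pixel-space mean.
mean_back = (mean_normalized + 1.0) / 2.0 * (n - 1)
print(mean_pixels, mean_back)
```

Normalized coordinates are independent of the image resolution, and PyTorch's grid_sample also expects sampling grids in [-1, 1], which may be part of the motivation; that reading is a guess rather than something the source states.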

training strategy

I used the moving-gif-128 dataset and the moving-gif.yaml you provide to train a model on my GPU, but my model performs much worse on video generation than the model you provided. What training strategy did you adopt to make the model work so well?

RuntimeError: CUDA error: out of memory

I ran demo.py with the command you provided:
python demo.py --config config/moving-gif.yaml --driving_video sup-mat/driving.png --source_image sup-mat/source.png --checkpoint moving-gif-ckp.pth.tar
But it says "out of memory". Is my GPU memory insufficient? How much memory is required to run it?

Extract keypoints

Hi, thanks for this work! I wanted to ask how one could extract the keypoints from the KeyPointDetector using the mean and variance outputs and map them on the source image.
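If the detector's mean output is expressed in normalized [-1, 1] coordinates, placing the keypoints on the source image comes down to an affine rescale to pixel indices. The sketch below assumes each keypoint is ordered (x, y) and that -1 and 1 correspond to the first and last pixel; both are assumptions to verify against the code, and kp_to_pixels is a hypothetical helper.

```python
def kp_to_pixels(keypoints, height, width):
    """Map (x, y) keypoints from [-1, 1] to (column, row) pixel positions."""
    pixels = []
    for x, y in keypoints:
        col = (x + 1.0) / 2.0 * (width - 1)
        row = (y + 1.0) / 2.0 * (height - 1)
        pixels.append((round(col), round(row)))
    return pixels


# Toy values standing in for the detector's mean output for one frame.
means = [(-1.0, -1.0), (0.0, 0.0), (1.0, 0.5)]
pixels = kp_to_pixels(means, height=64, width=64)
print(pixels)
```

The resulting (column, row) pairs can then be drawn on the image with any plotting or drawing library.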

CUDA runs out of memory with the vox 256x256 dataset checkpoint. Please suggest edits.

What do the images in the Shape folder mean?

How do I create images in the shape folder when training with my own data? What do the images in the Shape folder mean?

Log file - More lines in log file than expected

Hi,
I trained MonkeyNet for 200 epochs, but when I checked the log file it had 312 lines.
Can someone explain why that is?
Thanks in advance :-)

Unable to extract moving-gif-ckp.pth.tar

I have downloaded the checkpoint from the link in the readme (https://yadi.sk/d/BX-hwuPEVm6iNw) but am unable to extract it:

tar -xvf moving-gif-ckp.pth.tar
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
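A likely explanation: the `.pth.tar` suffix is a common naming convention for files written by PyTorch's torch.save, and such files are not tar archives. The tracebacks elsewhere on this page show demo.py passing the checkpoint path to Logger.load_cpk and then indexing checkpoint['generator'], which is consistent with that. The sketch below uses a stand-in file with a made-up name to show how to check whether a file is a real tar archive before trying to extract it.

```python
import os
import tarfile
import tempfile


def looks_like_tar(path):
    """Return True only if the file is a real tar archive."""
    return tarfile.is_tarfile(path)


# Stand-in file: arbitrary bytes saved under a ".pth.tar" name.
tmp_dir = tempfile.mkdtemp()
fake_checkpoint = os.path.join(tmp_dir, "example-ckp.pth.tar")
with open(fake_checkpoint, "wb") as f:
    f.write(b"not a tar archive, just serialized bytes" * 100)

is_tar = looks_like_tar(fake_checkpoint)
print(is_tar)

# If this prints False for the downloaded file, try loading it directly
# instead of extracting it (requires PyTorch; shown as a comment only):
#   checkpoint = torch.load("moving-gif-ckp.pth.tar", map_location="cpu")
#   print(checkpoint.keys())
```

In practice this would mean passing the downloaded file unchanged to the --checkpoint option rather than unpacking it first.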

How to make motion transfer demo working for frame by frame prediction?

Sometimes my video is too large to fit in the GPU and I would like to predict frame by frame.

I changed lines 64-70 of demo.py to the code below but the output video looks static. I checked the values between consecutive frames and they do have subtle differences. Please kindly advise how to modify the code to predict frame by frame.

driving_video = torch.from_numpy(driving_video).unsqueeze(0)
source_image = driving_video[:, :, 0].unsqueeze(2)
out_video_batch = []
for frame_idx in range(driving_video.shape[2]):
    driving_frame = driving_video[:, :, frame_idx, :, :].unsqueeze(2)
    out = transfer_one(generator, kp_detector, source_image, driving_frame, config['transfer_params'])
    out_video_batch.append(torch.squeeze(out['video_prediction']).permute(1, 2, 0).data.cpu().numpy())
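One possible cause, offered as an assumption about transfer_one rather than something confirmed from the source: if the transfer applies the driving motion relative to the first frame of the clip it receives, then a one-frame clip always has zero relative motion and the output stays static. The toy below uses plain numbers in place of keypoint tensors, and relative_transfer is a hypothetical stand-in, not the repository's function.

```python
def relative_transfer(kp_source, kp_driving_clip):
    """Move source keypoints by each driving frame's offset from the clip's first frame."""
    reference = kp_driving_clip[0]
    return [kp_source + (kp - reference) for kp in kp_driving_clip]


kp_source = 0.2                       # toy 1-D keypoint of the source image
kp_driving = [0.0, 0.1, 0.3, 0.6]     # toy 1-D keypoint over four driving frames

# Whole clip at once: offsets are measured against frame 0.
full_clip = relative_transfer(kp_source, kp_driving)

# One frame per call: every frame is its own reference, so nothing moves.
per_frame = [relative_transfer(kp_source, [kp])[0] for kp in kp_driving]

# Per-frame calls that keep frame 0 as the reference reproduce the motion.
with_reference = [relative_transfer(kp_source, [kp_driving[0], kp])[-1]
                  for kp in kp_driving]
print(full_clip, per_frame, with_reference)
```

If this is what happens, the loop above would need to keep the first driving frame (or its keypoints) as a fixed reference for every call; whether transfer_one can be used that way without modification is not confirmed here.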

Question about generator training

Hi,
I have a question. In

monkey-net/train.py

Line 48 in 0c8aa4e

 losses = generator_loss(discriminator_maps_generated=discriminator_maps_generated, 

for the generator training, when computing the generator loss, it seems the discriminator parameters are not frozen,
while for the discriminator training, in

monkey-net/train.py

Line 70 in 0c8aa4e

 discriminator_maps_generated = self.discriminator(generated['video_prediction'].detach(), **kp_dict) 

are the parameters of the generator frozen?
@AliaksandrSiarohin

Questions about kp2gaussian and gaussian2kp

Hi,
Thanks for your interesting work! I have some questions about the kp2gaussian and gaussian2kp in keypoint_detector.py

Are these functions invertible? I used a pretrained hourglass network to extract a heatmap, and after feeding it through gaussian2kp and kp2gaussian the output becomes weird.

To calculate the mean of the heatmap, why is "sum" applied to it?
mean = (heatmap * grid).sum(dim=(3, 4))
If I want to use a pretrained landmark detector, for example a facial landmark detector, how should I modify the code?
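On the "sum" question: when the heatmap is non-negative and sums to 1 (for example after a softmax), the sum of heatmap * grid is the expected position under the heatmap. A raw heatmap from another network that is not normalized this way would give a scaled, less meaningful value, which could be one reason the output looks odd. The toy 1-D sketch below also illustrates why a round trip through a mean cannot be an inverse in general: two very different heatmaps can share the same mean. All names are hypothetical and the grid is an assumed [-1, 1] grid.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_and_var(weights, grid):
    """Expectation and variance of grid positions under the weights."""
    mean = sum(w * g for w, g in zip(weights, grid))
    var = sum(w * (g - mean) ** 2 for w, g in zip(weights, grid))
    return mean, var


grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
one_peak = softmax([0.0, 0.0, 5.0, 0.0, 0.0])    # single bump in the middle
two_peaks = softmax([5.0, 0.0, 0.0, 0.0, 5.0])   # bumps at both ends

mean_one, var_one = mean_and_var(one_peak, grid)
mean_two, var_two = mean_and_var(two_peaks, grid)
print(mean_one, mean_two, var_one, var_two)
```

Both heatmaps have a mean near the center and differ only in variance, so a Gaussian rebuilt from mean and variance would not reproduce the two-peak shape.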

The checkpoint link is invalid

Training on custom data using pretrained weights

Firstly, great work on maintaining this repo.
I was trying to use the vox pretrained weights to train on my custom data using vox-full.yaml. But the training terminates. It shows 0/20 its and then exits.
I ran the following command (custom.yaml is based on vox-full.yaml):
CUDA_VISIBLE_DEVICES=0 python run.py --config config/custom.yaml --checkpoint log/custom/vox-cpk.pth.tar
Also, thank you

Pretrained checkpoint

Thank you very much for your impressive work.
I am interested in your work and want to use the model as a baseline for my face reenactment network.
However, extracting the downloaded model fails with an error (perhaps the link is invalid).
Can you provide me with the pre-trained model directly?
I will of course cite your paper in my work.
Thanks.

About pretrained models

Hi there @AliaksandrSiarohin , thanks for your wonderful work. Could you share your pre-trained models?
I only find the moving-gif model.

assigning zero weights to hourglass decoder while predicting mask

Hello,

In the dense motion module, when the predictions are run through the hourglass to calculate the mask, the weights of the hourglass decoder are set to 0 and the bias is then initialized to a particular value.

Can you please let me know the reason for this as I am trying to correlate this with the original paper?
modules -> dense_motion_module.py
self.hourglass.decoder.conv.weight.data.zero_()
self.hourglass.decoder.conv.bias.data.copy_(torch.tensor(bias_init, dtype=torch.float))
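One effect of this initialization can be shown without the network: with all weights at zero, the layer's output equals its bias whatever the input is, so the initial mask is fully determined by bias_init. The sketch below is a toy per-position linear layer with hypothetical bias values. The softmax step is an assumption about how the mask is normalized, and the intent (starting training from a predictable mask) is a guess rather than the authors' stated reason.

```python
import math


def conv1x1(inputs, weights, bias):
    """Per-position linear layer: one output channel per (weight row, bias) pair."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


bias_init = [5.0, 0.0, 0.0]                 # hypothetical values, not the repository's
zero_weights = [[0.0, 0.0] for _ in bias_init]

# Two very different feature vectors give the same output at initialization.
out_a = conv1x1([3.0, -7.0], zero_weights, bias_init)
out_b = conv1x1([100.0, 0.5], zero_weights, bias_init)
initial_mask = softmax(out_a)
print(out_a, out_b, initial_mask)
```

With these toy values the first channel dominates the mask at the start of training, and the weights are then free to move away from zero as gradients arrive.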

transfer params

Can the transfer parameters use any format other than .gif? I want to produce multiple stacked PNG outputs. Where should the source and driving stacked images be placed, given that you mention using a CSV like taichi.csv?

Could you provide the download link for the Taichi dataset? Thanks!

RuntimeError: Error(s) in loading state_dict for MotionTransferGenerator

When I run demo.py, I get the following problem. Can you help me? Thanks.
Traceback (most recent call last):
File "demo.py", line 52, in <module>
Logger.load_cpk(opt.checkpoint, generator=generator, kp_detector=kp_detector)
File "/root/dance/monkey-net/logger.py", line 54, in load_cpk
generator.load_state_dict(checkpoint['generator'])
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for MotionTransferGenerator:
Unexpected key(s) in state_dict: "appearance_encoder.down_blocks.0.norm.num_batches_tracked", "appearance_encoder.down_blocks.1.norm.num_batches_tracked", "appearance_encoder.down_blocks.2.norm.num_batches_tracked", "appearance_encoder.down_blocks.3.norm.num_batches_tracked", "appearance_encoder.down_blocks.4.norm.num_batches_tracked", "appearance_encoder.down_blocks.5.norm.num_batches_tracked", "dense_motion_module.group_blocks.0.norm.num_batches_tracked", "dense_motion_module.group_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.4.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.0.norm.num_batches_tracked", "video_decoder.up_blocks.1.norm.num_batches_tracked", "video_decoder.up_blocks.2.norm.num_batches_tracked", "video_decoder.up_blocks.3.norm.num_batches_tracked", "video_decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.5.norm.num_batches_tracked", "refinement_module.r0.norm1.num_batches_tracked", "refinement_module.r0.norm2.num_batches_tracked", "refinement_module.r1.norm1.num_batches_tracked", "refinement_module.r1.norm2.num_batches_tracked", "refinement_module.r2.norm1.num_batches_tracked", "refinement_module.r2.norm2.num_batches_tracked", "refinement_module.r3.norm1.num_batches_tracked", 
"refinement_module.r3.norm2.num_batches_tracked".
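Every unexpected key in this error ends in num_batches_tracked, a buffer that newer PyTorch releases (0.4.1 onward, as far as I know) store for batch-norm layers. That pattern usually means the checkpoint was saved with a newer PyTorch than the one loading it. Upgrading PyTorch is one option. Another is to drop those entries before loading, sketched below on a toy dictionary; drop_unexpected_keys is a hypothetical helper and the values stand in for tensors.

```python
def drop_unexpected_keys(state_dict, suffix="num_batches_tracked"):
    """Return a copy of state_dict without the entries ending in `suffix`."""
    return {k: v for k, v in state_dict.items() if not k.endswith(suffix)}


# Toy stand-in for checkpoint['generator']; real values would be tensors.
state_dict = {
    "appearance_encoder.down_blocks.0.norm.weight": [1.0],
    "appearance_encoder.down_blocks.0.norm.running_mean": [0.0],
    "appearance_encoder.down_blocks.0.norm.num_batches_tracked": 42,
}
filtered = drop_unexpected_keys(state_dict)
print(sorted(filtered))

# With PyTorch available, the filtered dict could then be passed on, e.g.:
#   generator.load_state_dict(drop_unexpected_keys(checkpoint['generator']))
```

Passing strict=False to load_state_dict is a shorter alternative, though it also silences any other key mismatch, so the explicit filter is the more conservative choice.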

Add Arxiv paper link to readme

I see you have mentioned your Arxiv paper in the readme, but can you also add a hyperlink to it? The link is https://arxiv.org/abs/1812.08861

Mask Embedding

First of all, thank you for open-sourcing this amazing code.

I have a question about the mask embedding. From the code, I see that it needs the difference of the Gaussian keypoint heatmaps and the deformed source image, which are concatenated.

What is the intent behind this movement encoding? Is there a reference for it?
After the Same block and Hourglass prediction, how can the movement encoding act as a mask?

Thank you!

Can you provide a pretrained Taichi model?

Can you provide a pretrained Taichi model? Thanks!

keypoint predictions

Hello,
I am currently generating keypoints on a different face dataset and they look a bit off (they do not focus on the parts that are supposed to move). The keypoints also vary from run to run.
I have tried different normalizing constants when applying softmax to the heatmaps, but the keypoints still do not focus. The overall U-net architecture is pretty much the same.

Can you please provide any suggestions to improve this?

Thank you.

Unclear on how to run motion transfer

I am trying to transfer the facial expressions in one gif (driving_video.gif) to another photo (source_image.jpg), using the pretrained checkpoint provided. In the readme the following command is given (missing a file - presumably this should be python demo.py):

python --config config/moving-gif.yaml --driving_video sup-mat/driving_video.gif --source_image sup-mat/source_image.gif --checkpoint path/to/checkpoint

I've tried with the following (I have just put driving_video.gif, source_image.jpg and moving-gif-ckp.pth.tar in the root of the project folder):

python demo.py --config config/moving-gif.yaml --driving_video driving_video.gif --source_image source_image.jpg --checkpoint moving-gif-ckp.pth.tar

This results in the following:

Traceback (most recent call last):
  File "demo.py", line 52, in <module>
    Logger.load_cpk(opt.checkpoint, generator=generator, kp_detector=kp_detector)
  File "/home/paperspace/monkey/monkey/logger.py", line 54, in load_cpk
    generator.load_state_dict(checkpoint['generator'])
  File "/home/paperspace/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for MotionTransferGenerator:
        Unexpected key(s) in state_dict: "appearance_encoder.down_blocks.0.norm.num_batches_tracked", "appearance_encoder.down_blocks.1.norm.num_batches_tracked", "appearance_encoder.down_blocks.2.norm.num_batches_tracked", "appearance_encoder.down_blocks.3.norm.num_batches_tracked", "appearance_encoder.down_blocks.4.norm.num_batches_tracked", "appearance_encoder.down_blocks.5.norm.num_batches_tracked", "dense_motion_module.group_blocks.0.norm.num_batches_tracked", "dense_motion_module.group_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.4.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.0.norm.num_batches_tracked", "video_decoder.up_blocks.1.norm.num_batches_tracked", "video_decoder.up_blocks.2.norm.num_batches_tracked", "video_decoder.up_blocks.3.norm.num_batches_tracked", "video_decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.5.norm.num_batches_tracked", "refinement_module.r0.norm1.num_batches_tracked", "refinement_module.r0.norm2.num_batches_tracked", "refinement_module.r1.norm1.num_batches_tracked", "refinement_module.r1.norm2.num_batches_tracked", "refinement_module.r2.norm1.num_batches_tracked", "refinement_module.r2.norm2.num_batches_tracked", "refinement_module.r3.norm1.num_batches_tracked", 
"refinement_module.r3.norm2.num_batches_tracked".

Transferring motion to a bird

Hello, I am trying to transfer the motion of one bird to another. Because I lack data, I am running inference with the moving-gif weights provided with the code. After adding a white background to all frames, I tried to reconstruct the video, giving the first frame of the driving video as the source image. I observed that, apart from the generator not being able to generate convincing images, the keypoints were sometimes estimated incorrectly. Does the keypoint detector depend in some way on the kind of object (mostly quadrupeds in moving-gif) the network was trained on?

Suggestions for better numerical stability

Hi,

Thanks for open-sourcing the code, it benefits my research a lot!

I found two small problems in the code that may cause numerical instability:

s1 - s2 is not guaranteed to be non-negative before the sqrt:

monkey-net/modules/util.py

Line 254 in 7c116b6

norm = torch.sqrt((s1 - s2) / 2)
The singular value can sometimes become too small, causing the output to explode to inf. Adding an epsilon to the denominator can help to stabilize it:

monkey-net/modules/keypoint_detector.py

Line 65 in 7c116b6

var = torch.max(min_norm, sg) * var / sg
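Both suggestions can be sketched on scalars: clamp the sqrt argument at zero, and add a small epsilon to the denominator. The code below is a plain-Python illustration, not a patch for the repository; EPS and the function names are hypothetical, and in tensor code the same ideas would typically be expressed with a clamp and an added epsilon.

```python
import math

EPS = 1e-8  # hypothetical tolerance; tune for the data type in use


def safe_sqrt_half_diff(s1, s2):
    """sqrt((s1 - s2) / 2) with the argument clamped at zero."""
    return math.sqrt(max(s1 - s2, 0.0) / 2.0)


def safe_rescale(var, sg, min_norm):
    """max(min_norm, sg) * var / sg with an epsilon guarding the division."""
    return max(min_norm, sg) * var / (sg + EPS)


# s1 should be >= s2, but a tiny rounding error makes the difference negative.
s1, s2 = 1.0, 1.0 + 1e-12
norm = safe_sqrt_half_diff(s1, s2)

# A vanishing singular value no longer breaks the division. (Plain Python
# would raise ZeroDivisionError here; tensor code would produce inf.)
scaled = safe_rescale(var=1.0, sg=0.0, min_norm=0.01)
print(norm, scaled)
```

The clamp changes the result only when the argument is negative, and the epsilon only matters when sg is close to zero, so well-conditioned inputs are left essentially untouched.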

CSV file for 'actions' is not found

I completed the training and reconstruction steps. When I run the motion transfer step, I get an error that the CSV file is not found. Can you share the CSV file? Or is there a way to create it?

(monkeynet1) pc@monster:~/Desktop/monkey-net-master$ CUDA_VISIBLE_DEVICES=0 python run.py --config config/actions.yaml --mode transfer --checkpoint log/first/00000020-checkpoint.pth.tar 
run.py:35: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
Use predefined train-test split.
Transfer...
Traceback (most recent call last):
  File "run.py", line 78, in <module>
    transfer(config, generator, kp_detector, opt.checkpoint, log_dir, dataset)
  File "/home/pc/Desktop/monkey-net-master/transfer.py", line 87, in transfer
    dataset = PairedDataset(initial_dataset=dataset, number_of_pairs=transfer_params['num_pairs'])
  File "/home/pc/Desktop/monkey-net-master/frames_dataset.py", line 111, in __init__
    pairs = pd.read_csv(pairs_list)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'data/actions.csv' does not exist: b'data/actions.csv'
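The traceback shows PairedDataset reading a pairs list with pd.read_csv, so one workaround may be to generate that file yourself. The sketch below writes every ordered pair of distinct videos to a CSV. The column names source and driving are an assumption that should be checked against PairedDataset in frames_dataset.py, and the video file names are placeholders.

```python
import csv
import itertools
import os
import tempfile


def write_pairs_csv(path, video_names, max_pairs=None):
    """Write ordered (source, driving) pairs of distinct videos to a CSV file."""
    pairs = list(itertools.permutations(video_names, 2))
    if max_pairs is not None:
        pairs = pairs[:max_pairs]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "driving"])   # assumed column names
        writer.writerows(pairs)
    return pairs


# Hypothetical test-set file names; replace with the names in your dataset.
videos = ["video_a.png", "video_b.png", "video_c.png"]
csv_path = os.path.join(tempfile.mkdtemp(), "actions.csv")
pairs = write_pairs_csv(csv_path, videos)
print(len(pairs), csv_path)
```

For the run above, the file would need to be saved as data/actions.csv, the path named in the error message.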

What's the difference between vox-full.yaml and vox.yaml?

I want to train your model on the VoxCeleb dataset.
Which of the two configurations should I use?

Also, could you provide a pretrained VoxCeleb model?

Thank you

How do I reduce the batch size in demo.py?

Strange results using pretrained model

I am running motion transfer with the following command, using the pretrained checkpoint in the readme (keeping everything in moving-gif.yaml the same):

python demo.py --config config/moving-gif.yaml --driving_video driver.gif --source_image source.png --checkpoint moving-gif-ckp.pth.tar

The driving gif is as follows:

Source image:

This results in the following:

align source image with the first frame of driver video

Do you have a good strategy for aligning the source image with the first frame of the driving video?

Sizes larger than 64x64 do not work for the nemo model

Hello, I got the pre-trained nemo model from https://yadi.sk/d/EX7N9fuIuE4FNg, but I have two problems:

When I use an image size of 64x64, with test/213_deliberate_smile_1.png from the nemo dataset as the driving image and the first five frames of test/505_spontaneous_smile_4.png from the nemo dataset as the source image, the nemo model works very well. But when I use image sizes of 128x128, 256x256, or 512x512 (by resizing) for the same driving and source images, the result.gif is bad.
When I use test/213_deliberate_smile_1.png from the nemo dataset as the driving image and a 64x64 test.gif made from an ordinary frontal face image as the source image, the result.gif is bad.

Can anyone give some advice on how to fix the above problems? @AliaksandrSiarohin
Thank you very much~

Error running code on 2 GPUs

Use predefined train-test split.
Transfer...
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)

0it [00:00, ?it/s]
Traceback (most recent call last):
File "run.py", line 80, in <module>
transfer(config, generator, kp_detector, opt.checkpoint, log_dir, dataset)
File "/home/kushagra/monkey-net/transfer.py", line 112, in transfer
out = transfer_one(generator, kp_detector, source_image, driving_video, transfer_params)
File "/home/kushagra/monkey-net/transfer.py", line 68, in transfer_one
kp_driving = cat_dict([kp_detector(driving_video[:, :, i:(i + 1)]) for i in range(d)], dim=1)
File "/home/kushagra/monkey-net/transfer.py", line 68, in <listcomp>
kp_driving = cat_dict([kp_detector(driving_video[:, :, i:(i + 1)]) for i in range(d)], dim=1)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 122, in forward
replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
File "/home/kushagra/monkey-net/sync_batchnorm/replicate.py", line 65, in replicate
modules = super(DataParallelWithCallback, self).replicate(module, device_ids)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 127, in replicate
return replicate(module, device_ids)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
param_copies = Broadcast.apply(devices, *params)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 19, in forward
outputs = comm.broadcast_coalesced(inputs, ctx.target_gpus)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/cuda/comm.py", line 40, in broadcast_coalesced
return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]

I understand that I need to put all the input tensors on device 0, but I am not sure exactly how to do that. I tried some of the approaches from
https://discuss.pytorch.org/t/how-to-solve-the-problem-of-runtimeerror-all-tensors-must-be-on-devices-0/15198/5
but they did not work.

I also moved all the models to device 1 (for example, generator.to(opt.device_ids[1])), in the hope that it would free up space for tensors on device 0 (otherwise I get a CUDA out of memory error).

I am running the model on two RTX 2080 GPUs with CUDA 10.
