
monkey-net's Issues

ValueError: cannot reshape array of size 10752 into shape (64,64,3)

I am trying to test on faces to get a demo.gif result with a 64x64 input image, but I get this error:

File "demo.py", line 62, in
source_image = VideoToTensor()(read_video(opt.source_image, opt.image_shape + (3,)))['video'][:, :1]
File "/content/monkey-net/frames_dataset.py", line 28, in read_video
video_array = video_array.reshape((-1,) + image_shape)
ValueError: cannot reshape array of size 10752 into shape (64,64,3)

using this command
!python demo.py --config config/nemo.yaml --driving_video sup-mat/driving.png --source_image source2.png --checkpoint /content/nemo-ckp.pth.tar --image_shape 64,64
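A note on the numbers: 10752 = 64 * 56 * 3, so the image on disk is 64x56 rather than 64x64, which is why read_video cannot reshape it to (64, 64, 3). A minimal pre-processing sketch (assuming source2.png is an RGB or RGBA image; the output file name is just illustrative):

    import imageio
    from skimage import img_as_ubyte
    from skimage.transform import resize

    img = imageio.imread('source2.png')[..., :3]       # keep RGB, drop alpha if present
    img = resize(img, (64, 64), anti_aliasing=True)    # float image in [0, 1]
    imageio.imwrite('source2_64x64.png', img_as_ubyte(img))

Then rerun demo.py with --source_image source2_64x64.png.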

Confirm the command on the readme file

Hi

I just read through the readme doc and tried to make it work.
I executed the command in the doc:
python --config config/moving-gif.yaml --driving_video sup-mat/driving_video.gif --source_image sup-mat/source_image.gif --checkpoint path/to/checkpoint

But it doesn't work.
The python file name is missing from the command, and I cannot find driving_video.gif or source_image.gif in sup-mat.
I noticed that the network accepts stacked '.png' files, and there are two png images under sup-mat.
I just want to confirm the correct command for running the checkpoint. Should it be:

python demo.py --config config/moving-gif.yaml --driving_video driving.png --source_image source.png --checkpoint moving-gif-ckp.pth.tar

If that's correct, I'm happy to raise a PR to fix it :)

expected keypoint coordinates

Hello,
What is the intuition behind squeezing the grid coordinates into [-1, 1] before calculating the expectation and covariance of the heatmaps?
In the paper it is mentioned that the grid coordinates take values within the H*W pixel range, but in the code those values are rescaled to [-1, 1] (see the make_coordinate_grid method in modules/util.py). Please let me know if I am missing something.
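For context, a rough sketch of the normalization being asked about (not the repository code verbatim): mapping pixel indices onto [-1, 1] makes the expected keypoint location independent of the heatmap resolution and matches the [-1, 1] coordinate convention of torch.nn.functional.grid_sample.

    import torch

    def make_coordinate_grid_sketch(spatial_size, dtype=torch.float32):
        """Return an (H, W, 2) grid of (x, y) coordinates rescaled to [-1, 1]."""
        h, w = spatial_size
        x = torch.arange(w, dtype=dtype)
        y = torch.arange(h, dtype=dtype)
        x = 2 * (x / (w - 1)) - 1    # pixel index 0 -> -1, index W-1 -> +1
        y = 2 * (y / (h - 1)) - 1
        xx = x.view(1, -1).repeat(h, 1)
        yy = y.view(-1, 1).repeat(1, w)
        return torch.stack([xx, yy], dim=-1)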

Thanks in advance!

training strategy

I used the moving-gif-128 dataset and the moving-gif.yaml you provide to train a model on my GPU, but my model performs much worse on video generation than the model you provided. What training strategy did you adopt to make the model work so well?

RuntimeError: CUDA error: out of memory

I ran demo.py using the command you provided:
python demo.py --config config/moving-gif.yaml --driving_video sup-mat/driving.png --source_image sup-mat/source.png --checkpoint moving-gif-ckp.pth.tar
But it says "out of memory". Is my GPU memory insufficient? How much memory is required to run it?

Extract keypoints

Hi, thanks for this work! I wanted to ask how one could extract the keypoints from the KeyPointDetector using the mean and variance outputs and map them on the source image.
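Not an authoritative answer, but a minimal sketch of what I would try, assuming the detector returns a dict whose 'mean' entry has shape (batch, frames, num_kp, 2) with (x, y) coordinates in [-1, 1] (please check the actual shapes in your run; kp_detector and source_image are the variables from demo.py):

    import matplotlib.pyplot as plt

    kp = kp_detector(source_image)                    # source_image: (B, C, T, H, W) tensor
    mean = kp['mean'][0, 0].detach().cpu().numpy()    # assumed (num_kp, 2), values in [-1, 1]

    h, w = source_image.shape[-2:]
    x_px = (mean[:, 0] + 1) / 2 * (w - 1)             # map [-1, 1] back to pixel coordinates
    y_px = (mean[:, 1] + 1) / 2 * (h - 1)

    frame = source_image[0, :, 0].permute(1, 2, 0).cpu().numpy()
    plt.imshow(frame)
    plt.scatter(x_px, y_px, c='red', s=12)
    plt.savefig('keypoints_on_source.png')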

How to make the motion transfer demo work for frame-by-frame prediction?

Sometimes my video is too large to fit in the GPU and I would like to predict frame by frame.

I changed lines 64-70 of demo.py to the code below, but the output video looks static. I checked the values between consecutive frames and they do have subtle differences. Please kindly advise how to modify the code to predict frame by frame.

driving_video = torch.from_numpy(driving_video).unsqueeze(0)
source_image = driving_video[:, :, 0].unsqueeze(2)
out_video_batch = []
for frame_idx in range(driving_video.shape[2]):
    driving_frame = driving_video[:, :, frame_idx, :, :].unsqueeze(2)
    out = transfer_one(generator, kp_detector, source_image, driving_frame, config['transfer_params'])
    out_video_batch.append(torch.squeeze(out['video_prediction']).permute(1, 2, 0).data.cpu().numpy())

Question about generator training

Hi,
I have a question. In

losses = generator_loss(discriminator_maps_generated=discriminator_maps_generated,

for generator training, when computing the generator loss, it seems the discriminator parameters are not frozen. For discriminator training, on the other hand, in

discriminator_maps_generated = self.discriminator(generated['video_prediction'].detach(), **kp_dict)

are the generator's parameters frozen (via the detach)?
@AliaksandrSiarohin
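For what it's worth, a generic PyTorch sketch (not the repository code) of what .detach() does here: it blocks gradients from reaching the generator during the discriminator step, while during the generator step the discriminator does receive gradients but is typically left unchanged simply because only the generator's optimizer is stepped.

    import torch

    x = torch.randn(1, requires_grad=True)   # stand-in for a generator parameter
    g = x * 2                                 # stand-in for the generator output

    loss_d = (g.detach() * 3).sum()           # discriminator-style loss on the detached output
    loss_d.backward()
    print(x.grad)                             # None: no gradient reached the "generator"

    loss_g = (g * 3).sum()                    # generator-style loss on the attached output
    loss_g.backward()
    print(x.grad)                             # tensor([6.]): gradients flow back to the generator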

Questions about kp2gaussian and gaussian2kp

Hi,
Thanks for your interesting work! I have some questions about the kp2gaussian and gaussian2kp in keypoint_detector.py

  1. Are these functions invertible? I used a pretrained hourglass network to extract a heatmap, but after feeding it through gaussian2kp and then kp2gaussian, the output becomes weird.

[attached screenshot of the distorted output]

  2. To calculate the mean of the heatmap, why is a "sum" applied to it? (See the sketch after this list.)
    mean = (heatmap * grid).sum(dim=(3, 4))

  3. If I want to use a pretrained landmark detector, for example a facial landmark detector, how should I modify the code?
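Regarding question 2, a small self-contained sketch of why a sum is used: if the heatmap is normalized (e.g. by a softmax) so that it sums to 1 over the spatial grid, then (heatmap * grid).sum(...) is exactly the expected keypoint location, i.e. a soft-argmax:

    import torch

    h, w = 5, 5
    heatmap = torch.softmax(torch.randn(h * w), dim=0).view(h, w)   # sums to 1 over the grid
    ys = torch.linspace(-1, 1, h).view(h, 1).expand(h, w)
    xs = torch.linspace(-1, 1, w).view(1, w).expand(h, w)
    grid = torch.stack([xs, ys], dim=-1)                            # (h, w, 2), coordinates in [-1, 1]
    mean = (heatmap.unsqueeze(-1) * grid).sum(dim=(0, 1))           # expected (x, y) location, shape (2,)
    print(mean)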

Training on custom data using pretrained weights

Firstly, great work on maintaining this repo.
I was trying to use the vox pretrained weights to train on my custom data using vox-full.yaml, but the training terminates: it shows 0/20 its and then exits.
I ran the following command (custom.yaml is based on vox-full.yaml):
CUDA_VISIBLE_DEVICES=0 python run.py --config config/custom.yaml --checkpoint log/custom/vox-cpk.pth.tar
Also, thank you

Pretrained checkpoint

Thank you very much for your impressive work.
I am interested in your work and want to use the model as a baseline for my face reenactment network.
However, the downloaded model gives an error when extracting the files (maybe the link is invalid).
Could you provide the pre-trained model directly?
I will of course cite your paper in my work.
Thanks.

assigning zero weights to hourglass decoder while predicting mask

Hello,

In the dense motion module, while running the predictions through the hourglass to calculate the mask, the hourglass decoder's weights are set to 0 and its bias is initialized to a particular value.

Can you please let me know the reason for this, as I am trying to correlate it with the original paper?
modules -> dense_motion_module.py
self.hourglass.decoder.conv.weight.data.zero_()
self.hourglass.decoder.conv.bias.data.copy_(torch.tensor(bias_init, dtype=torch.float))
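Not the authors' answer, but one way to see the effect with a generic stand-in conv and a hypothetical bias_init: with zero weights the conv output equals its bias at every spatial location regardless of the input, so the predicted mask starts from a fixed prior and the weights then learn a data-dependent correction during training.

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(8, 11, kernel_size=7, padding=3)   # stand-in for the decoder's final conv
    bias_init = torch.tensor([2.0] + [0.0] * 10)         # hypothetical: favour one channel at the start
    conv.weight.data.zero_()
    conv.bias.data.copy_(bias_init)

    x = torch.randn(1, 8, 64, 64)
    out = conv(x)                        # every spatial location equals bias_init, independent of x
    mask = torch.softmax(out, dim=1)     # the initial mask is therefore a fixed prior
    print(mask[0, :, 0, 0])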

transfer params

Can the transfer parameters use any format other than .gif? I wanted to produce multiple stacked png outputs. Where should the files be placed (i.e. the source and driving stacked images), given that you mention using a csv like taichi.csv?

RuntimeError: Error(s) in loading state_dict for MotionTransferGenerator

When running demo.py, I get the following problem. Can you help me? Thanks.
Traceback (most recent call last):
File "demo.py", line 52, in
Logger.load_cpk(opt.checkpoint, generator=generator, kp_detector=kp_detector)
File "/root/dance/monkey-net/logger.py", line 54, in load_cpk
generator.load_state_dict(checkpoint['generator'])
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for MotionTransferGenerator:
Unexpected key(s) in state_dict: "appearance_encoder.down_blocks.0.norm.num_batches_tracked", "appearance_encoder.down_blocks.1.norm.num_batches_tracked", "appearance_encoder.down_blocks.2.norm.num_batches_tracked", "appearance_encoder.down_blocks.3.norm.num_batches_tracked", "appearance_encoder.down_blocks.4.norm.num_batches_tracked", "appearance_encoder.down_blocks.5.norm.num_batches_tracked", "dense_motion_module.group_blocks.0.norm.num_batches_tracked", "dense_motion_module.group_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.4.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.0.norm.num_batches_tracked", "video_decoder.up_blocks.1.norm.num_batches_tracked", "video_decoder.up_blocks.2.norm.num_batches_tracked", "video_decoder.up_blocks.3.norm.num_batches_tracked", "video_decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.5.norm.num_batches_tracked", "refinement_module.r0.norm1.num_batches_tracked", "refinement_module.r0.norm2.num_batches_tracked", "refinement_module.r1.norm1.num_batches_tracked", "refinement_module.r1.norm2.num_batches_tracked", "refinement_module.r2.norm1.num_batches_tracked", "refinement_module.r2.norm2.num_batches_tracked", "refinement_module.r3.norm1.num_batches_tracked", "refinement_module.r3.norm2.num_batches_tracked".
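For what it's worth, this usually indicates a PyTorch version mismatch: the num_batches_tracked buffers were added to BatchNorm in PyTorch 0.4.1, so a checkpoint saved with a newer version contains keys that an older installation does not expect. The cleanest fix is to match the PyTorch version used to create the checkpoint; a hedged workaround sketch follows (the checkpoint path is illustrative, generator is the model constructed in demo.py, and kp_detector would need the same treatment):

    import torch

    checkpoint = torch.load('moving-gif-ckp.pth.tar', map_location='cpu')
    generator_state = {k: v for k, v in checkpoint['generator'].items()
                       if not k.endswith('num_batches_tracked')}
    generator.load_state_dict(generator_state)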

Mask Embedding

First of all, thank you for open-sourcing this amazing code.

I have a question about the mask embedding. For the mask embedding, I've checked that the difference of the Gaussian keypoint heatmaps and the deformed source image are needed (concatenated together).

  1. What is the intent behind this movement encoding? Is there any reference for it?

  2. After the Same block and the hourglass prediction, how can the movement encoding act as a mask?

Thank you!

keypoint predictions

Hello,
I am currently generating keypoints on a different face dataset and they look a bit off (they do not focus on the parts that are supposed to move). They also produce different keypoints on each run.
I have tried different normalizing constants while applying softmax to the heatmaps, but the keypoints still don't focus. The overall U-net architecture is pretty much the same.

Can you please provide any suggestion to improve this?

Thank you.

Unclear on how to run motion transfer

I am trying to transfer the facial expressions in one gif (driving_video.gif) to another photo (source_image.jpg), using the pretrained checkpoint provided. The readme gives the following command (it is missing the script name; presumably it should start with python demo.py):

python --config config/moving-gif.yaml --driving_video sup-mat/driving_video.gif --source_image sup-mat/source_image.gif --checkpoint path/to/checkpoint

I've tried with the following (I have just put driving_video.gif, source_image.jpg and moving-gif-ckp.pth.tar in the root of the project folder):

python demo.py --config config/moving-gif.yaml --driving_video driving_video.gif --source_image source_image.jpg --checkpoint moving-gif-ckp.pth.tar

This results in the following:

Traceback (most recent call last):
  File "demo.py", line 52, in <module>
    Logger.load_cpk(opt.checkpoint, generator=generator, kp_detector=kp_detector)
  File "/home/paperspace/monkey/monkey/logger.py", line 54, in load_cpk
    generator.load_state_dict(checkpoint['generator'])
  File "/home/paperspace/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for MotionTransferGenerator:
        Unexpected key(s) in state_dict: "appearance_encoder.down_blocks.0.norm.num_batches_tracked", "appearance_encoder.down_blocks.1.norm.num_batches_tracked", "appearance_encoder.down_blocks.2.norm.num_batches_tracked", "appearance_encoder.down_blocks.3.norm.num_batches_tracked", "appearance_encoder.down_blocks.4.norm.num_batches_tracked", "appearance_encoder.down_blocks.5.norm.num_batches_tracked", "dense_motion_module.group_blocks.0.norm.num_batches_tracked", "dense_motion_module.group_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.4.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.0.norm.num_batches_tracked", "video_decoder.up_blocks.1.norm.num_batches_tracked", "video_decoder.up_blocks.2.norm.num_batches_tracked", "video_decoder.up_blocks.3.norm.num_batches_tracked", "video_decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.5.norm.num_batches_tracked", "refinement_module.r0.norm1.num_batches_tracked", "refinement_module.r0.norm2.num_batches_tracked", "refinement_module.r1.norm1.num_batches_tracked", "refinement_module.r1.norm2.num_batches_tracked", "refinement_module.r2.norm1.num_batches_tracked", "refinement_module.r2.norm2.num_batches_tracked", "refinement_module.r3.norm1.num_batches_tracked", "refinement_module.r3.norm2.num_batches_tracked".

Transferring motion to a bird

Hello, I am trying to transfer the motion of one bird to another. However, due to lack of data, I am trying to run inference using the moving-gif weights provided with the code. After adding a white background to all frames, I tried to reconstruct the video, giving the first frame of the driving video as the source image. I observed that, apart from the generator not being able to generate convincing images, the keypoints were sometimes estimated incorrectly. Does the keypoint detector depend in some way on the kind of object (mostly quadrupeds in moving-gif) the network was trained on?

Suggestions for better numerical stability

Hi,

Thanks for open-sourcing the code, it really benefits my research a lot!

I found two small problems in the code that may cause numerical instability:

  1. s1 - s2 is not guaranteed to be non-negative before the sqrt:

    norm = torch.sqrt((s1 - s2) / 2)

  2. The singular value can sometimes become too small, causing the output to explode to inf. Adding an epsilon to the denominator helps stabilize it:

    var = torch.max(min_norm, sg) * var / sg
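A sketch of the two guards suggested above, with dummy values standing in for s1, s2, sg, var and min_norm from the quoted snippets:

    import torch

    eps = 1e-6
    s1, s2 = torch.tensor(1.0), torch.tensor(1.0 + 1e-8)    # s1 - s2 slightly negative due to rounding
    sg = torch.tensor(1e-12)                                 # near-zero singular value
    var, min_norm = torch.eye(2), torch.tensor(1e-3)

    norm = torch.sqrt(torch.clamp((s1 - s2) / 2, min=0.0))  # clamp avoids NaN from sqrt of a negative number
    var = torch.max(min_norm, sg) * var / (sg + eps)         # eps keeps the division from blowing up to inf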

CSV file for 'actions' is not found

I completed the train and reconstruction steps. When I run the motion transfer step, I get an error that the csv file is not found. Can you share the csv file? Or is there a way to create it?

(monkeynet1) pc@monster:~/Desktop/monkey-net-master$ CUDA_VISIBLE_DEVICES=0 python run.py --config config/actions.yaml --mode transfer --checkpoint log/first/00000020-checkpoint.pth.tar 
run.py:35: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
Use predefined train-test split.
Transfer...
Traceback (most recent call last):
  File "run.py", line 78, in <module>
    transfer(config, generator, kp_detector, opt.checkpoint, log_dir, dataset)
  File "/home/pc/Desktop/monkey-net-master/transfer.py", line 87, in transfer
    dataset = PairedDataset(initial_dataset=dataset, number_of_pairs=transfer_params['num_pairs'])
  File "/home/pc/Desktop/monkey-net-master/frames_dataset.py", line 111, in __init__
    pairs = pd.read_csv(pairs_list)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'data/actions.csv' does not exist: b'data/actions.csv'
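I am not certain of the expected format, but the error just means data/actions.csv (the pairs_list read by PairedDataset) does not exist. A heavily hedged sketch of creating one by hand; the column names 'source' and 'driving' and the file names are assumptions, so please check how PairedDataset in frames_dataset.py indexes the dataframe and adjust accordingly:

    import pandas as pd

    pairs = pd.DataFrame({
        'source':  ['clip_a.png', 'clip_b.png'],    # hypothetical file names from the actions dataset folder
        'driving': ['clip_c.png', 'clip_d.png'],
    })
    pairs.to_csv('data/actions.csv', index=False)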

Strange results using pretrained model

I am running motion transfer with the following command, using the pretrained checkpoint in the readme (keeping everything in moving-gif.yaml the same):

python demo.py --config config/moving-gif.yaml --driving_video driver.gif --source_image source.png --checkpoint moving-gif-ckp.pth.tar

The driving gif is as follows:

[driving gif attachment]

Source image:

[source image attachment]

This results in the following:

[output gif attachment]

Image sizes larger than 64x64 do not work with the nemo model

Hello, I got the pre-trained nemo model from https://yadi.sk/d/EX7N9fuIuE4FNg, but I run into two problems:

  1. When I use an image size of 64x64, with test/213_deliberate_smile_1.png from the nemo dataset as the driving image and the first five frames of test/505_spontaneous_smile_4.png as the source, the nemo model works very well. But when I try image sizes of 128x128, 256x256, or 512x512 (using resize) with the same driving and source images, the result.gif is bad.

  2. When I use the driving image test/213_deliberate_smile_1.png from the nemo dataset and, as the source, a 64x64 test.gif made from a common frontal face image, the result.gif is bad.

Can anyone give some advice on how to fix the above problems? @AliaksandrSiarohin
Thank you very much~

Error running code on 2 GPUs

Use predefined train-test split.
Transfer...
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)

0it [00:00, ?it/s]
Traceback (most recent call last):
File "run.py", line 80, in
transfer(config, generator, kp_detector, opt.checkpoint, log_dir, dataset)
File "/home/kushagra/monkey-net/transfer.py", line 112, in transfer
out = transfer_one(generator, kp_detector, source_image, driving_video, transfer_params)
File "/home/kushagra/monkey-net/transfer.py", line 68, in transfer_one
kp_driving = cat_dict([kp_detector(driving_video[:, :, i:(i + 1)]) for i in range(d)], dim=1)
File "/home/kushagra/monkey-net/transfer.py", line 68, in
kp_driving = cat_dict([kp_detector(driving_video[:, :, i:(i + 1)]) for i in range(d)], dim=1)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 122, in forward
replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
File "/home/kushagra/monkey-net/sync_batchnorm/replicate.py", line 65, in replicate
modules = super(DataParallelWithCallback, self).replicate(module, device_ids)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 127, in replicate
return replicate(module, device_ids)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
param_copies = Broadcast.apply(devices, *params)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 19, in forward
outputs = comm.broadcast_coalesced(inputs, ctx.target_gpus)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/cuda/comm.py", line 40, in broadcast_coalesced
return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]

I understand that I need to put all the input tensors on device 0, but I'm not sure exactly how to do that. I tried some of the approaches from
https://discuss.pytorch.org/t/how-to-solve-the-problem-of-runtimeerror-all-tensors-must-be-on-devices-0/15198/5
but that did not work.

I also put all the models on device 1 (e.g. generator.to(opt.device_ids[1])), in the hope that it would free up space for the tensors on device 0 (otherwise I get a CUDA out of memory error).

Running the model on 2 RTX 2080 with CUDA 10
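Not a definitive answer, but the broadcast in the traceback happens while DataParallel replicates the module's parameters, so the models as well as the inputs need to start on device_ids[0]; moving the models to device_ids[1] is likely what triggers the error. A sketch, assuming opt.device_ids == [0, 1] and reusing the variable names that appear in run.py/transfer.py:

    import torch

    device = torch.device('cuda:{}'.format(opt.device_ids[0]))
    generator.to(device)                            # keep the replicated modules on device_ids[0]
    kp_detector.to(device)
    source_image = source_image.to(device)          # and feed the inputs from the same device
    driving_video = driving_video.to(device)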
