monkey-net's Introduction

Animating Arbitrary Objects via Deep Motion Transfer

This repository contains the source code for the CVPR oral paper Animating Arbitrary Objects via Deep Motion Transfer by Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci and Nicu Sebe. We call the proposed deep framework Monkey-Net, as it enables motion transfer by considering MOviNg KEYpoints. Check also the project website.

New version of the method can be found here.

Examples of motion transfer

The videos on the left show the driving videos. The first row on the right of each dataset shows the source images. The bottom row contains the animated sequences, with motion transferred from the driving video and the object taken from the source image. We trained a separate network for each task. Note that for each task the background and the object appearance remain consistent in each generated video.

NEMO Face Dataset

Screenshot

Taichi Dataset

Screenshot

BAIR Robot Dataset

Screenshot

MGIF Dataset

Screenshot

Training and testing

Our framework can be used in several modes. In the motion transfer mode, a static image will be animated using a driving video. In the image-to-video translation mode, given a static image, the framework will predict future frames.

Installation

We support Python 3. To install the dependencies, run:

pip install -r requirements.txt

YAML configs

There are several configuration files (config/dataset_name.yaml), one for each dataset. See config/actions.yaml for a description of each parameter.
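
As a quick sanity check, a config can be loaded and inspected from Python the same way run.py does (a minimal sketch; the Loader argument is only needed on newer PyYAML versions, and the parameter groups such as dataset_params and train_params are the ones referenced later in this readme):

import yaml

# Load a dataset config the same way run.py does.
with open('config/actions.yaml') as f:
    config = yaml.load(f, Loader=yaml.FullLoader)

# Print the top-level parameter groups (e.g. dataset_params, train_params) and their values.
for section, params in config.items():
    print(section, params)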

Motion Transfer Demo

To run a demo, download a checkpoint and run the following command:

python demo.py --config  config/moving-gif.yaml --driving_video sup-mat/driving.png --source_image sup-mat/source.png --checkpoint path/to/checkpoint

The result will be stored in demo.gif.

Training

To train a model on a specific dataset, run:

CUDA_VISIBLE_DEVICES=0 python run.py --config config/dataset_name.yaml

The code will create a folder in the log directory (each run creates a new time-stamped directory). Checkpoints will be saved to this folder. To check the loss values during training, see log.txt. You can also check training-data reconstructions in the train-vis subfolder.

Reconstruction

To evaluate the reconstruction performance run:

CUDA_VISIBLE_DEVICES=0 python run.py --config config/dataset_name.yaml --mode reconstruction --checkpoint path/to/checkpoint

You will need to specify the path to the checkpoint. The reconstruction subfolder will be created in the checkpoint folder. You can find the generated videos there, along with loss-less versions in '.png' format in the png subfolder.

Motion transfer

In order to perform motion transfer run:

CUDA_VISIBLE_DEVICES=0 python run.py --config config/dataset_name.yaml --mode transfer --checkpoint path/to/checkpoint

You will need to specify the path to the checkpoint. The transfer subfolder will be created in the same folder as the checkpoint. You can find the generated video there and its loss-less version in the png subfolder.

There are two different ways of performing transfer: using absolute keypoint locations or using relative keypoint locations.

  1. Absolute Transfer: the transfer is performed using the absolute keypoint positions of the driving video and the appearance of the source image. In this way there are no specific requirements for the driving video or the source appearance. However, this usually leads to poor performance, since irrelevant details such as shape are transferred. Check the transfer parameters in shapes.yaml to enable this mode.

  2. Relative Transfer: from the driving video we first estimate the relative movement of each keypoint, then add this movement to the absolute keypoint positions in the source image. These keypoints, along with the source image, are used for transfer. This usually leads to better performance, but it requires that the object in the first frame of the driving video and in the source image have the same pose (see the sketch below).
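
The difference between the two modes can be summarised with a small sketch (illustrative NumPy code, not the actual transfer.py API; the keypoint values are made up):

import numpy as np

def absolute_kp(kp_driving):
    # Absolute mode: driving keypoints are used as-is, so the driving
    # object's shape leaks into the generated video.
    return kp_driving

def relative_kp(kp_source, kp_driving, kp_driving_initial):
    # Relative mode: only the displacement of each keypoint with respect to
    # the first driving frame is added on top of the source keypoints.
    return kp_source + (kp_driving - kp_driving_initial)

kp_source = np.array([[0.1, -0.2]])           # keypoint in the source image
kp_driving_initial = np.array([[0.5, 0.5]])   # same keypoint, first driving frame
kp_driving = np.array([[0.6, 0.4]])           # same keypoint, current driving frame
print(relative_kp(kp_source, kp_driving, kp_driving_initial))  # [[ 0.2 -0.3]]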

The approximately aligned pairs of videos are given in the data folder (e.g. data/taichi.csv).

Image-to-video translation

In order to perform image-to-video translation run:

CUDA_VISIBLE_DEVICES=0 python run.py --config config/dataset_name.yaml --mode prediction --checkpoint path/to/checkpoint

The following steps will be performed:

  • Estimate the keypoints from the training set
  • Train an RNN to predict the keypoints
  • Run the predictor for each video in the dataset, starting from the first frame. Again, the prediction subfolder will be created in the same folder as the checkpoint. You can find the generated videos there and in the png subfolder. A rough sketch of the keypoint-prediction idea follows this list.
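
A rough sketch of the keypoint-prediction idea (an illustrative PyTorch module, not the actual predictor in this repository; the number of keypoints and the hidden size are made-up values):

import torch
import torch.nn as nn

class KPPredictor(nn.Module):
    # Given a sequence of past keypoint coordinates, predict the coordinates
    # for the next time step, one prediction per input frame.
    def __init__(self, num_kp=10, hidden_size=256):
        super().__init__()
        self.rnn = nn.GRU(input_size=2 * num_kp, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2 * num_kp)

    def forward(self, kp_seq):
        b, t, k, _ = kp_seq.shape                  # (batch, time, num_kp, 2)
        out, _ = self.rnn(kp_seq.view(b, t, k * 2))
        return self.head(out).view(b, t, k, 2)

kp_seq = torch.randn(1, 16, 10, 2)                 # 16 frames of 10 keypoints
print(KPPredictor()(kp_seq).shape)                 # torch.Size([1, 16, 10, 2])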

Datasets

  1. Shapes. This dataset is saved along with the repository. Training takes about 1 hour.

  2. Actions. This dataset is also saved along with the repository. Training takes about 4 hours.

  3. Nemo. The preprocessed version of this dataset can be downloaded. Training takes about 6 hours.

  4. Taichi. We used the same data as MoCoGAN. Training takes about 15 hours.

  5. Bair. The preprocessed version of this dataset can be downloaded. Training takes about 4 hours.

  6. MGif. The preprocessed version of this dataset can be downloaded. Check for details on this dataset. Training takes about 8 hours on 2 GPUs.

  7. Vox. The dataset can be downloaded and preprocessed using a script: cd data; ./get_vox.sh.

Training on your own dataset

  1. Resize all the videos to the same size, e.g. 128x128. The videos can be in '.gif' or '.mp4' format, but we recommend converting them to stacked '.png' (see data/shapes), because this format is lossless.

  2. Create a folder data/dataset_name with 2 subfolders, train and test, and put the training videos in train and the testing videos in test.

  3. Create a config config/dataset_name.yaml (it is better to start from one of the existing configs: for 64x64 videos, config/nemo.yaml; for 128x128, config/moving-gif.yaml; for 256x256, config/vox.yaml). In dataset_params, specify the root directory: root_dir: data/dataset_name. Also adjust the number of epochs in train_params. A quick sanity check of the expected layout is sketched below this list.
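
A quick sanity check of the expected layout (a minimal sketch; data/dataset_name is a placeholder for your own dataset folder):

import os

root_dir = 'data/dataset_name'
for split in ('train', 'test'):
    folder = os.path.join(root_dir, split)
    # Count videos in '.gif', '.mp4' or stacked '.png' format.
    videos = [f for f in os.listdir(folder) if f.endswith(('.gif', '.mp4', '.png'))]
    print(split, len(videos), 'videos')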

Additional notes

Citation:

@InProceedings{Siarohin_2019_CVPR,
  author={Siarohin, Aliaksandr and Lathuilière, Stéphane and Tulyakov, Sergey and Ricci, Elisa and Sebe, Nicu},
  title={Animating Arbitrary Objects via Deep Motion Transfer},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2019}
}

monkey-net's People

Contributors

aliaksandrsiarohin, aywi, d-li14, mlaradji, sergeytulyakov, vacancy


monkey-net's Issues

Extract keypoints

Hi, thanks for this work! I wanted to ask how one could extract the keypoints from the KeyPointDetector using the mean and variance outputs and map them onto the source image.

expected keypoint coordinates

Hello,
What is the intuition behind squashing the grid coordinates into [-1, 1] before calculating the expectation and covariance of the heatmaps?
In the paper it is mentioned that the grid coordinates take values within the H*W image coordinates, but in the code those values are reduced to [-1, 1] (referring to the make_coordinate_grid method under modules/util.py). Please let me know if I am missing something.

Thanks in advance!
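
For reference, the expectation the question refers to can be reproduced in a few lines (a simplified sketch, not the exact make_coordinate_grid / gaussian2kp code): because the heatmap is passed through a softmax and sums to one, multiplying it by the coordinate grid and summing gives the expected keypoint location, and keeping the grid in [-1, 1] makes that expectation independent of the heatmap resolution (it is also the coordinate convention used by grid_sample for warping).

import torch

def make_grid(h, w):
    # Coordinate grid normalized to [-1, 1], shape (h, w, 2).
    y = torch.linspace(-1, 1, h).view(-1, 1).repeat(1, w)
    x = torch.linspace(-1, 1, w).view(1, -1).repeat(h, 1)
    return torch.stack([x, y], dim=-1)

heatmap = torch.softmax(torch.randn(1, 10, 64 * 64), dim=-1).view(1, 10, 64, 64, 1)
grid = make_grid(64, 64).view(1, 1, 64, 64, 2)
mean = (heatmap * grid).sum(dim=(2, 3))  # expected keypoint locations, shape (1, 10, 2)
print(mean.shape)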

Transferring motion to a bird

Hello, I am trying to transfer the motion of one bird to another. However, due to lack of data, I am trying to run inference using the moving-gif weights provided with the code. After adding a white background to all frames, I tried to reconstruct the video, giving the first frame of the driving video as the source image. I observed that, apart from the generator not being able to produce convincing images, the keypoints were sometimes estimated incorrectly. Does the keypoint detector depend in some way on the kind of object (mostly quadrupeds in moving-gif) the network was trained on?

Mask Embedding

First of all, thank you for open-sourcing this amazing code.

I have a question about the mask embedding. For the mask embedding, I've seen that the difference of the Gaussian keypoint heatmaps and the deformed source image need to be concatenated.

  1. What is your intent behind this movement encoding? Is there a reference for it?

  2. Through the Same block and the Hourglass prediction, how can the movement encoding act as a mask?

Thank you!

Error running code on 2 GPUs

Use predefined train-test split.
Transfer...
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)

0it [00:00, ?it/s]
Traceback (most recent call last):
File "run.py", line 80, in
transfer(config, generator, kp_detector, opt.checkpoint, log_dir, dataset)
File "/home/kushagra/monkey-net/transfer.py", line 112, in transfer
out = transfer_one(generator, kp_detector, source_image, driving_video, transfer_params)
File "/home/kushagra/monkey-net/transfer.py", line 68, in transfer_one
kp_driving = cat_dict([kp_detector(driving_video[:, :, i:(i + 1)]) for i in range(d)], dim=1)
File "/home/kushagra/monkey-net/transfer.py", line 68, in
kp_driving = cat_dict([kp_detector(driving_video[:, :, i:(i + 1)]) for i in range(d)], dim=1)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 122, in forward
replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
File "/home/kushagra/monkey-net/sync_batchnorm/replicate.py", line 65, in replicate
modules = super(DataParallelWithCallback, self).replicate(module, device_ids)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 127, in replicate
return replicate(module, device_ids)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
param_copies = Broadcast.apply(devices, *params)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 19, in forward
outputs = comm.broadcast_coalesced(inputs, ctx.target_gpus)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/cuda/comm.py", line 40, in broadcast_coalesced
return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]

I understand that I need to put all the input tensors on device 0, but I am not sure exactly how to do that. I tried some approaches from
https://discuss.pytorch.org/t/how-to-solve-the-problem-of-runtimeerror-all-tensors-must-be-on-devices-0/15198/5
but they did not work.

I also put all the models on device 1 (e.g. generator.to(opt.device_ids[1])), in the hope that this would free up space for tensors on device 0 (otherwise I get a CUDA out-of-memory error).

I am running the model on 2 RTX 2080 GPUs with CUDA 10.
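
A common cause of this error with DataParallel is that the inputs or the parameters do not start on device_ids[0]. A generic PyTorch sketch of the expected device placement (this uses plain nn.DataParallel with a dummy module, not the sync_batchnorm wrapper from this repository):

import torch
import torch.nn as nn

device = torch.device('cuda:0')

# Parameters must live on device_ids[0]; DataParallel replicates and
# scatters from there, so do not move the model to cuda:1.
model = nn.DataParallel(nn.Conv2d(3, 3, 3, padding=1), device_ids=[0, 1]).to(device)

# Inputs must also start on device_ids[0].
batch = torch.randn(4, 3, 64, 64).to(device)
print(model(batch).shape)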

transfer params

Can the transfer parameters use any format other than .gif? I wanted to produce multiple stacked png outputs. Where should the files be placed (i.e. the source and driving stacked images), given that you have mentioned using a csv like taichi.csv?

Strange results using pretrained model

I am running motion transfer with the following command, using the pretrained checkpoint in the readme (keeping everything in moving-gif.yaml the same):

python demo.py --config config/moving-gif.yaml --driving_video driver.gif --source_image source.png --checkpoint moving-gif-ckp.pth.tar

The driving gif is as follows:

driver

Source image:

source

This results in the following:

demo

assigning zero weights to hourglass decoder while predicting mask

Hello,

In the dense motion module, while running the predictions through the hourglass to calculate the mask, the hourglass decoder's weights are set to 0 and the bias is initialized to a particular value.

Can you please explain the reason for this, as I am trying to correlate it with the original paper?
modules/dense_motion_module.py:
self.hourglass.decoder.conv.weight.data.zero_()
self.hourglass.decoder.conv.bias.data.copy_(torch.tensor(bias_init, dtype=torch.float))

Question about generator training

Hi,
I have a question about the following line:

losses = generator_loss(discriminator_maps_generated=discriminator_maps_generated,

For the generator training, when computing the generator loss, it seems the discriminator parameters are not frozen,
while for the discriminator training,
discriminator_maps_generated = self.discriminator(generated['video_prediction'].detach(), **kp_dict)
the parameters of the generator are frozen via detach. Is that intended?
@AliaksandrSiarohin
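
For context, the usual pattern this question is about looks roughly like the sketch below (generic GAN training code, not this repository's): during the generator step the discriminator's parameters are not frozen, but only the generator's optimizer applies an update, while in the discriminator step .detach() stops gradients from reaching the generator.

import torch
import torch.nn as nn

g, d = nn.Linear(8, 8), nn.Linear(8, 1)
opt_g = torch.optim.Adam(g.parameters())
opt_d = torch.optim.Adam(d.parameters())
real = torch.randn(4, 8)

# Generator step: D participates in the graph, but opt_g only updates G.
fake = g(torch.randn(4, 8))
loss_g = -d(fake).mean()
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# Discriminator step: detach() blocks gradients from flowing back into G.
loss_d = d(fake.detach()).mean() - d(real).mean()
opt_d.zero_grad(); loss_d.backward(); opt_d.step()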

training strategy

I used the moving-gif-128 dataset and the moving-gif.yaml you provide to train a model on my GPU, but my model performs much worse on video generation than the model you provided. What training strategy did you adopt to make the model work so well?

Training on custom data using pretrained weights

Firstly, great work on maintaining this repo.
I was trying to use the vox pretrained weights to train on my custom data using vox-full.yaml, but the training terminates: it shows 0/20 iterations and then exits.
I ran the following command (custom.yaml is based on vox-full.yaml):
CUDA_VISIBLE_DEVICES=0 python run.py --config config/custom.yaml --checkpoint log/custom/vox-cpk.pth.tar
Also, thank you

RuntimeError: Error(s) in loading state_dict for MotionTransferGenerator

When I run demo.py, I have the following problem. Can you help me? Thanks.
Traceback (most recent call last):
File "demo.py", line 52, in
Logger.load_cpk(opt.checkpoint, generator=generator, kp_detector=kp_detector)
File "/root/dance/monkey-net/logger.py", line 54, in load_cpk
generator.load_state_dict(checkpoint['generator'])
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for MotionTransferGenerator:
Unexpected key(s) in state_dict: "appearance_encoder.down_blocks.0.norm.num_batches_tracked", "appearance_encoder.down_blocks.1.norm.num_batches_tracked", "appearance_encoder.down_blocks.2.norm.num_batches_tracked", "appearance_encoder.down_blocks.3.norm.num_batches_tracked", "appearance_encoder.down_blocks.4.norm.num_batches_tracked", "appearance_encoder.down_blocks.5.norm.num_batches_tracked", "dense_motion_module.group_blocks.0.norm.num_batches_tracked", "dense_motion_module.group_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.4.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.0.norm.num_batches_tracked", "video_decoder.up_blocks.1.norm.num_batches_tracked", "video_decoder.up_blocks.2.norm.num_batches_tracked", "video_decoder.up_blocks.3.norm.num_batches_tracked", "video_decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.5.norm.num_batches_tracked", "refinement_module.r0.norm1.num_batches_tracked", "refinement_module.r0.norm2.num_batches_tracked", "refinement_module.r1.norm1.num_batches_tracked", "refinement_module.r1.norm2.num_batches_tracked", "refinement_module.r2.norm1.num_batches_tracked", "refinement_module.r2.norm2.num_batches_tracked", "refinement_module.r3.norm1.num_batches_tracked", "refinement_module.r3.norm2.num_batches_tracked".
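
These unexpected num_batches_tracked keys come from a PyTorch version mismatch between the environment that saved the checkpoint and the one loading it. A common workaround (a generic sketch, not an official fix from the authors; it assumes generator has already been built as in demo.py) is to drop the offending keys, or to load with strict=False:

import torch

checkpoint = torch.load('path/to/checkpoint', map_location='cpu')
# Remove the batch-norm bookkeeping keys that the older model does not expect.
state_dict = {k: v for k, v in checkpoint['generator'].items()
              if not k.endswith('num_batches_tracked')}
generator.load_state_dict(state_dict)
# Alternative: generator.load_state_dict(checkpoint['generator'], strict=False)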

Sizes larger than 64x64 do not work for the nemo model

Hello, I downloaded the pre-trained nemo model from https://yadi.sk/d/EX7N9fuIuE4FNg, but I ran into two problems:

  1. When I use an image size of 64x64, with test/213_deliberate_smile_1.png of the nemo dataset as the driving video and the first five frames of test/505_spontaneous_smile_4.png as the source image, the nemo model works very well. But when I try image sizes of 128x128, 256x256 and 512x512 (using resize) with the same driving and source images, the result.gif is bad.

  2. When I use the driving video test/213_deliberate_smile_1.png of the nemo dataset with a 64x64 test.gif made from a common frontal face image as the source, the result.gif is bad.

Can anyone give some advice on how to fix the above problems? @AliaksandrSiarohin
Thank you very much~

keypoint predictions

Hello,
I am currently generating keypoints on a different face dataset and they look a bit off (they do not focus on the parts that are supposed to move). The keypoints also vary from run to run.
I have tried different normalizing constants while applying softmax to the heatmaps, but the detector still does not focus. The overall U-Net architecture is pretty much the same.

Can you please provide any suggestion to improve this?

Thank you.

Pretrained checkpoint

Thank you very much for your impressive work.
I am interested in your work and want to use the model as a baseline for my face reenactment network.
However, the downloaded model gives an error (maybe it is an invalid link) when extracting the files.
Could you provide me with the pre-trained model directly?
For sure I will cite your paper in my work.
Thanks.

How to make the motion transfer demo work for frame-by-frame prediction?

Sometimes my video is too large to fit in GPU memory, and I would like to predict frame by frame.

I changed lines 64-70 of demo.py to the code below, but the output video looks static. I checked the values between consecutive frames, and they do have subtle differences. Please kindly advise how to modify the code to predict frame by frame.

driving_video = torch.from_numpy(driving_video).unsqueeze(0)
source_image = driving_video[:, :, 0].unsqueeze(2)
out_video_batch = []
for frame_idx in range(driving_video.shape[2]):
    driving_frame = driving_video[:, :, frame_idx, :, :].unsqueeze(2)
    out = transfer_one(generator, kp_detector, source_image, driving_frame, config['transfer_params'])
    out_video_batch.append(torch.squeeze(out['video_prediction']).permute(1, 2, 0).data.cpu().numpy())

Unclear on how to run motion transfer

I am trying to transfer the facial expressions in one gif (driving_video.gif) to another photo (source_image.jpg), using the pretrained checkpoint provided. In the readme the following command is given (missing a file - presumably this should be python demo.py):

python --config config/moving-gif.yaml --driving_video sup-mat/driving_video.gif --source_image sup-mat/source_image.gif --checkpoint path/to/checkpoint

I've tried with the following (I have just put driving_video.gif, source_image.jpg and moving-gif-ckp.pth.tar in the root of the project folder):

python demo.py --config config/moving-gif.yaml --driving_video driving_video.gif --source_image source_image.jpg --checkpoint moving-gif-ckp.pth.tar

This results in the following:

Traceback (most recent call last):
  File "demo.py", line 52, in <module>
    Logger.load_cpk(opt.checkpoint, generator=generator, kp_detector=kp_detector)
  File "/home/paperspace/monkey/monkey/logger.py", line 54, in load_cpk
    generator.load_state_dict(checkpoint['generator'])
  File "/home/paperspace/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for MotionTransferGenerator:
        Unexpected key(s) in state_dict: "appearance_encoder.down_blocks.0.norm.num_batches_tracked", "appearance_encoder.down_blocks.1.norm.num_batches_tracked", "appearance_encoder.down_blocks.2.norm.num_batches_tracked", "appearance_encoder.down_blocks.3.norm.num_batches_tracked", "appearance_encoder.down_blocks.4.norm.num_batches_tracked", "appearance_encoder.down_blocks.5.norm.num_batches_tracked", "dense_motion_module.group_blocks.0.norm.num_batches_tracked", "dense_motion_module.group_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.4.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.0.norm.num_batches_tracked", "video_decoder.up_blocks.1.norm.num_batches_tracked", "video_decoder.up_blocks.2.norm.num_batches_tracked", "video_decoder.up_blocks.3.norm.num_batches_tracked", "video_decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.5.norm.num_batches_tracked", "refinement_module.r0.norm1.num_batches_tracked", "refinement_module.r0.norm2.num_batches_tracked", "refinement_module.r1.norm1.num_batches_tracked", "refinement_module.r1.norm2.num_batches_tracked", "refinement_module.r2.norm1.num_batches_tracked", "refinement_module.r2.norm2.num_batches_tracked", "refinement_module.r3.norm1.num_batches_tracked", "refinement_module.r3.norm2.num_batches_tracked".

RuntimeError: CUDA error: out of memory

I run demo.py using the code you provided:
python demo.py --config config/moving-gif.yaml --driving_video sup-mat/driving.png --source_image sup-mat/source.png --checkpoint moving-gif-ckp.pth.tar
But it says "out of memory". Is my GPU memory insufficient? How much memory is required to run it?

Confirm the command on the readme file

Hi

I just read through the readme doc and tried to make it work.
I executed the command from the doc:
python --config config/moving-gif.yaml --driving_video sup-mat/driving_video.gif --source_image sup-mat/source_image.gif --checkpoint path/to/checkpoint

But it doesn't work.
The command is missing the python file, and I cannot find driving_video.gif or source_image.gif in sup-mat.
I noticed that the network accepts stacked '.png' and there are two png images under sup-mat.
I just want to confirm the correct command for running the checkpoint. Should it be:

python demo.py --config config/moving-gif.yaml --driving_video driving.png --source_image source.png --checkpoint moving-gif-ckp.pth.tar

If it's correct, I'm happy to raise a PR to fix it :)

CSV file for 'actions' is not found

I completed the train and reconstruction steps. When running the motion transfer step, I get an error that the csv file is not found. Can you share the csv file? Or is there a way to create it?

(monkeynet1) pc@monster:~/Desktop/monkey-net-master$ CUDA_VISIBLE_DEVICES=0 python run.py --config config/actions.yaml --mode transfer --checkpoint log/first/00000020-checkpoint.pth.tar 
run.py:35: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
Use predefined train-test split.
Transfer...
Traceback (most recent call last):
  File "run.py", line 78, in <module>
    transfer(config, generator, kp_detector, opt.checkpoint, log_dir, dataset)
  File "/home/pc/Desktop/monkey-net-master/transfer.py", line 87, in transfer
    dataset = PairedDataset(initial_dataset=dataset, number_of_pairs=transfer_params['num_pairs'])
  File "/home/pc/Desktop/monkey-net-master/frames_dataset.py", line 111, in __init__
    pairs = pd.read_csv(pairs_list)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'data/actions.csv' does not exist: b'data/actions.csv'

Questions about kp2gaussian and gaussian2kp

Hi,
Thanks for your interesting work! I have some questions about the kp2gaussian and gaussian2kp in keypoint_detector.py

  1. Are these functions invertible? I used a pretrained hourglass network to extract a heatmap, but after feeding it to gaussian2kp and kp2gaussian, the output becomes weird.


  2. To calculate the mean of the heatmap, why is the "sum" function applied?
    mean = (heatmap * grid).sum(dim=(3, 4))

  3. If I want to use a pretrained landmark detector, for example a facial landmark detector, how should I modify the code?

Suggestions for better numerical stability

Hi,

Thanks for open-sourcing the code, it really benefits my research a lot!

I found two small problems in the code that may cause numerical instability:

  1. s1 - s2 is not guaranteed to be positive before the sqrt:

    norm = torch.sqrt((s1 - s2) / 2)

  2. The singular value can sometimes become too small, causing the output to explode to inf. Adding an epsilon to the denominator can help stabilize it:

    var = torch.max(min_norm, sg) * var / sg
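
Both suggestions can be written as small guards (a sketch of the proposed fixes with made-up values; min_norm and eps are illustrative constants):

import torch

eps = 1e-8
min_norm = torch.tensor(1e-3)
s1, s2 = torch.tensor(2.0), torch.tensor(2.5)   # s1 - s2 can be slightly negative
sg = torch.tensor(1e-12)                        # near-zero singular value
var = torch.tensor(1.0)

# 1. Clamp before sqrt so a small negative difference does not produce NaN.
norm = torch.sqrt(torch.clamp(s1 - s2, min=0.0) / 2)

# 2. Add an epsilon to the denominator so the rescaling cannot explode to inf.
var = torch.max(min_norm, sg) * var / (sg + eps)

print(norm.item(), var.item())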

ValueError: cannot reshape array of size 10752 into shape (64,64,3)

I am trying to test on faces to get a demo.gif result from a 64x64 input image, but I get this error:

File "demo.py", line 62, in
source_image = VideoToTensor()(read_video(opt.source_image, opt.image_shape + (3,)))['video'][:, :1]
File "/content/monkey-net/frames_dataset.py", line 28, in read_video
video_array = video_array.reshape((-1,) + image_shape)
ValueError: cannot reshape array of size 10752 into shape (64,64,3)

using this command
!python demo.py --config config/nemo.yaml --driving_video sup-mat/driving.png --source_image source2.png --checkpoint /content/nemo-ckp.pth.tar --image_shape 64,64
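
The error suggests the source image is not actually 64x64 (10752 = 56 x 64 x 3), so read_video cannot reshape it. A minimal sketch for resizing the image before running demo.py (the file names match the command above; imageio and scikit-image are assumed to be available):

import imageio
import numpy as np
from skimage.transform import resize

img = imageio.imread('source2.png')
# Resize to the 64x64 shape requested with --image_shape, keeping the 0-255 range.
img = resize(img, (64, 64), preserve_range=True).astype(np.uint8)
imageio.imsave('source2_64x64.png', img)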
