aliaksandrsiarohin / monkey-net
Animating Arbitrary Objects via Deep Motion Transfer
I tried to test on faces to get a demo.gif result from a 64x64 input image, but I get this error:
File "demo.py", line 62, in <module>
source_image = VideoToTensor()(read_video(opt.source_image, opt.image_shape + (3,)))['video'][:, :1]
File "/content/monkey-net/frames_dataset.py", line 28, in read_video
video_array = video_array.reshape((-1,) + image_shape)
ValueError: cannot reshape array of size 10752 into shape (64,64,3)
I am using this command:
!python demo.py --config config/nemo.yaml --driving_video sup-mat/driving.png --source_image source2.png --checkpoint /content/nemo-ckp.pth.tar --image_shape 64,64
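The size in the error message is informative: 10752 = 56 × 64 × 3, while one 64x64 RGB frame holds 12288 values, so the loaded array cannot be split into whole 64x64x3 frames. Below is a minimal sketch of that arithmetic check in pure Python. `frames_in_stack` is a hypothetical helper, not part of the repository, and the 56x64 reading is only one factorization consistent with the number.

```python
def frames_in_stack(total_values, image_shape, channels=3):
    """Return how many frames of `image_shape` fit in `total_values`, or None.

    Hypothetical helper: it mirrors the divisibility requirement of
    reshape((-1,) + image_shape + (channels,)).
    """
    height, width = image_shape
    per_frame = height * width * channels
    if total_values % per_frame != 0:
        return None  # reshape would raise ValueError, as in the traceback
    return total_values // per_frame


# The failing case from the traceback: 10752 values, target frames of 64x64x3.
print(frames_in_stack(10752, (64, 64)))            # None
# One factorization consistent with 10752: a single 56x64 RGB image.
print(frames_in_stack(56 * 64 * 3, (56, 64)))      # 1
# A stack of five 64x64 RGB frames would reshape cleanly.
print(frames_in_stack(5 * 64 * 64 * 3, (64, 64)))  # 5
```

If this is the cause, resizing source2.png to exactly 64x64 before running demo.py should avoid the error.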
Hi
I just read through the README and tried to make it work.
I executed the command in the doc:
python --config config/moving-gif.yaml --driving_video sup-mat/driving_video.gif --source_image sup-mat/source_image.gif --checkpoint path/to/checkpoint
But it doesn't work.
The command is missing the Python file, and I cannot find driving_video.gif or source_image.gif in sup-mat.
I noticed that the network accepts stacked '.png' files and that there are two png images under sup-mat.
I just want to confirm the correct command for running the checkpoint. Should it be:
python demo.py --config config/moving-gif.yaml --driving_video driving.png --source_image source.png --checkpoint moving-gif-ckp.pth.tar
If it's correct, I'm happy to raise a PR to fix it :)
Hello,
What is the intuition behind squashing the grid coordinates into [-1, 1] before calculating the expectation and covariance of the heatmaps?
In the paper it is mentioned that the grid coordinates take values in the H*W pixel range, but in the code those values are rescaled to [-1, 1]. I am referring to the make_coordinate_grid method in /modules/util.py. Please let me know if I am missing something.
Thanks in advance!
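One way to see that the rescaling does not change what is being computed: mapping pixel indices to [-1, 1] is an affine transform, so the expectation taken on the rescaled grid is just the rescaled pixel-space expectation (and the covariance changes only by a constant factor). PyTorch's `torch.nn.functional.grid_sample` also expects sampling coordinates in [-1, 1], which may be a practical reason for the convention; that is a guess, not something stated in the source. The sketch below is pure Python, and the mapping `2 * i / (size - 1) - 1` is an assumed convention rather than a quote of `make_coordinate_grid`.

```python
def normalize(index, size):
    """Map a pixel index in [0, size - 1] to [-1, 1] (assumed convention)."""
    return 2.0 * index / (size - 1) - 1.0


def expectation(weights, coords):
    """Expected coordinate under weights that sum to 1."""
    return sum(w * c for w, c in zip(weights, coords))


size = 5
weights = [0.1, 0.2, 0.4, 0.2, 0.1]           # a 1-D "heatmap", sums to 1
pixel_coords = list(range(size))               # 0, 1, 2, 3, 4
unit_coords = [normalize(i, size) for i in pixel_coords]

mean_pixels = expectation(weights, pixel_coords)
mean_unit = expectation(weights, unit_coords)

# Normalizing the pixel-space mean gives the mean computed on the [-1, 1] grid.
assert abs(normalize(mean_pixels, size) - mean_unit) < 1e-12
```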
I used the moving-gif-128 dataset and the moving-gif.yaml you provide to train a model on my GPU, but my model performs much worse on video generation than the model you provided. What training strategy did you adopt to make the model work so well?
I run demo.py using the code you provided:
python demo.py --config config/moving-gif.yaml --driving_video sup-mat/driving.png --source_image sup-mat/source.png --checkpoint moving-gif-ckp.pth.tar
But it says "out of memory". Is my GPU memory insufficient? How much memory is required to run it?
Hi, thanks for this work! I wanted to ask how one could extract the keypoints from the KeyPointDetector using the mean and variance outputs and map them on the source image.
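A minimal sketch of the coordinate conversion such a mapping needs, assuming the detector's mean is an (x, y) pair in [-1, 1] (as another question on this page describes for the grid), with -1 at the first pixel and +1 at the last. `kp_to_pixels` is a hypothetical helper, not a function from the repository.

```python
def kp_to_pixels(keypoints, height, width):
    """Convert (x, y) keypoints from [-1, 1] to pixel coordinates.

    Assumes -1 maps to the first pixel and +1 to the last pixel along
    each axis; adjust if the grid convention differs.
    """
    pixels = []
    for x, y in keypoints:
        col = (x + 1.0) / 2.0 * (width - 1)
        row = (y + 1.0) / 2.0 * (height - 1)
        pixels.append((col, row))
    return pixels


# Corners and center of a 64x64 image.
print(kp_to_pixels([(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)], 64, 64))
# [(0.0, 0.0), (31.5, 31.5), (63.0, 63.0)]
```

The resulting (column, row) pairs can then be drawn over the source image with any plotting library.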
CUDA runs out of memory with the vox 256x256 dataset checkpoint. Please suggest changes.
How do I create images in the shape folder when training with my own data? What do the images in the Shape folder mean?
Hi,
I trained MonkeyNet for 200 epochs, but when I checked the log file it had 312 lines.
Can someone explain why?
Thanks in advance :-)
I have downloaded the checkpoint from the link in the readme (https://yadi.sk/d/BX-hwuPEVm6iNw) but am unable to extract it:
tar -xvf moving-gif-ckp.pth.tar
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
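The `.pth.tar` suffix is a common naming convention for PyTorch checkpoints and does not imply a tar archive. The tracebacks on this page show the file being passed as-is to `Logger.load_cpk(opt.checkpoint, ...)`, so it likely does not need extracting at all. The sketch below uses only the standard library to check what container a file really is; `describe_checkpoint` and the temporary file are illustrative, not part of the repository.

```python
import os
import tarfile
import tempfile
import zipfile


def describe_checkpoint(path):
    """Report the container format of a file, regardless of its suffix."""
    if tarfile.is_tarfile(path):
        return "tar archive"
    if zipfile.is_zipfile(path):
        # Recent torch.save versions write a zip-based container.
        return "zip container"
    return "other (for example a raw pickle written by older torch.save)"


# Demonstration on a stand-in file that has a .tar suffix but no tar content.
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "example-ckp.pth.tar")
    with open(fake, "wb") as f:
        f.write(b"\x80\x02not really a tar file")
    kind = describe_checkpoint(fake)

print(kind)
```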
Sometimes my video is too large to fit in the GPU and I would like to predict frame by frame.
I changed lines 64-70 of demo.py to the code below, but the output video looks static. I checked the values between consecutive frames and they do have subtle differences. Please advise how to modify the code to predict frame by frame.
driving_video = torch.from_numpy(driving_video).unsqueeze(0)
source_image = driving_video[:, :, 0].unsqueeze(2)
out_video_batch = []
for frame_idx in range(driving_video.shape[2]):
    driving_frame = driving_video[:, :, frame_idx, :, :].unsqueeze(2)
    out = transfer_one(generator, kp_detector, source_image, driving_frame, config['transfer_params'])
    out_video_batch.append(torch.squeeze(out['video_prediction']).permute(1, 2, 0).data.cpu().numpy())
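One possible explanation, which rests on an assumption about `transfer_one` that the source does not confirm: if motion is measured as keypoint displacement relative to the first frame of whatever driving clip is passed in, then passing one frame at a time makes every frame its own reference and the displacement is always zero. The toy sketch below illustrates only that effect; the keypoint values and the `relative_motion` helper are hypothetical.

```python
def relative_motion(driving_kp, reference_kp):
    """Displacement of each keypoint from a reference frame."""
    return [(x - rx, y - ry) for (x, y), (rx, ry) in zip(driving_kp, reference_kp)]


# Toy keypoints for three driving frames, one keypoint per frame.
frames = [[(0.0, 0.0)], [(0.1, 0.0)], [(0.25, 0.5)]]

# Frame by frame with each frame as its own reference: motion is always zero.
per_frame = [relative_motion(kp, kp) for kp in frames]

# Frame by frame with the first driving frame kept as a fixed reference.
fixed_ref = [relative_motion(kp, frames[0]) for kp in frames]

print(per_frame)  # [[(0.0, 0.0)], [(0.0, 0.0)], [(0.0, 0.0)]]
print(fixed_ref)  # [[(0.0, 0.0)], [(0.1, 0.0)], [(0.25, 0.5)]]
```

If this is the cause, computing the keypoints of the first driving frame once and reusing them as the reference for every later frame would be the thing to try.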
hi:
I have a question about the following lines:
Line 48 in 0c8aa4e
Line 70 in 0c8aa4e
Hi,
Thanks for your interesting work! I have some questions about kp2gaussian and gaussian2kp in keypoint_detector.py.
To calculate the mean of the heatmap, why is the "sum" function applied to it?
mean = (heatmap * grid).sum(dim=(3, 4))
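A likely reading, assuming the heatmap has already been normalized to sum to 1 over the spatial grid (for example by a softmax): the weighted sum of grid coordinates is then exactly the expected coordinate, so no further division is needed. The sketch below shows this on a 3x3 grid in pure Python; `spatial_softmax` and `heatmap_mean` are illustrative names, not the repository's functions.

```python
import math


def spatial_softmax(scores):
    """Normalize a 2-D grid of scores so that all entries sum to 1."""
    peak = max(s for row in scores for s in row)   # subtract the max for stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def heatmap_mean(heatmap, xs, ys):
    """Expected (x, y) under the heatmap: a weighted sum over the grid."""
    cells = [(i, j) for i in range(len(ys)) for j in range(len(xs))]
    mean_x = sum(heatmap[i][j] * xs[j] for i, j in cells)
    mean_y = sum(heatmap[i][j] * ys[i] for i, j in cells)
    return mean_x, mean_y


coords = [-1.0, 0.0, 1.0]
scores = [[0.0, 0.0, 0.0],
          [0.0, 0.0, 20.0],   # strong response at x = 1, y = 0
          [0.0, 0.0, 0.0]]
heatmap = spatial_softmax(scores)
mean_x, mean_y = heatmap_mean(heatmap, coords, coords)
print(mean_x, mean_y)  # close to (1, 0), the location of the peak
```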
If I want to use a pretrained landmark detector, for example a facial landmark detector, how should I modify the code?
The link of checkpoint is invalid
Firstly, great work on maintaining this repo.
I was trying to use the vox pretrained weights to train on my custom data using vox-full.yaml, but the training terminates. It shows 0/20 iterations and then exits.
I ran the following command (custom.yaml is based on vox-full.yaml):
CUDA_VISIBLE_DEVICES=0 python run.py --config config/custom.yaml --checkpoint log/custom/vox-cpk.pth.tar
Also, thank you
Thank you very much for your impressive work.
I am interested in your work and want to use the model as a baseline for my face reenactment network.
However, the downloaded model gives an error when I extract the files (maybe the link is invalid).
Can you provide the pre-trained model directly?
I will certainly cite your paper in my work.
Thanks.
Hi there @AliaksandrSiarohin , thanks for your wonderful work. Could you share your pre-trained models?
I only find the moving-gif model.
Hello,
In the dense motion module, when the predictions are run through the hourglass to calculate the mask, the weights of the hourglass decoder's convolution are set to 0 and the bias is then initialized to a particular value.
Can you please let me know the reason for this? I am trying to relate it to the original paper.
modules/dense_motion_module.py:
self.hourglass.decoder.conv.weight.data.zero_()
self.hourglass.decoder.conv.bias.data.copy_(torch.tensor(bias_init, dtype=torch.float))
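The effect of these two lines can be read off directly, even though the source does not state the authors' motivation: with all weights zero, the layer outputs its bias at every position regardless of the input, so at the start of training the prediction is a constant determined by `bias_init`. The sketch below shows this with a 1x1 convolution written in pure Python; the layer shape and the bias values are illustrative only.

```python
def conv1x1(pixels, weight, bias):
    """Apply a 1x1 convolution to a list of per-pixel feature vectors.

    weight[o][i] maps input channel i to output channel o.
    """
    out = []
    for features in pixels:
        out.append([
            sum(w * f for w, f in zip(weight[o], features)) + bias[o]
            for o in range(len(bias))
        ])
    return out


pixels = [[0.3, -1.2], [5.0, 2.0], [-7.5, 0.1]]     # three pixels, two channels
zero_weight = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # three output channels
bias_init = [5.0, 0.0, 0.0]                         # illustrative values only

output = conv1x1(pixels, zero_weight, bias_init)
# Every pixel gets exactly the bias, whatever its input features were.
assert all(row == bias_init for row in output)
```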
Can the transfer step use a format other than .gif? I want to produce multiple stacked png outputs. Where should the files (i.e. the source and driving stacked images) be placed, given that you mention using a csv like taichi.csv?
When I run demo.py I get the following error. Can you help me? Thanks.
Traceback (most recent call last):
File "demo.py", line 52, in <module>
Logger.load_cpk(opt.checkpoint, generator=generator, kp_detector=kp_detector)
File "/root/dance/monkey-net/logger.py", line 54, in load_cpk
generator.load_state_dict(checkpoint['generator'])
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for MotionTransferGenerator:
Unexpected key(s) in state_dict: "appearance_encoder.down_blocks.0.norm.num_batches_tracked", "appearance_encoder.down_blocks.1.norm.num_batches_tracked", "appearance_encoder.down_blocks.2.norm.num_batches_tracked", "appearance_encoder.down_blocks.3.norm.num_batches_tracked", "appearance_encoder.down_blocks.4.norm.num_batches_tracked", "appearance_encoder.down_blocks.5.norm.num_batches_tracked", "dense_motion_module.group_blocks.0.norm.num_batches_tracked", "dense_motion_module.group_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.4.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.0.norm.num_batches_tracked", "video_decoder.up_blocks.1.norm.num_batches_tracked", "video_decoder.up_blocks.2.norm.num_batches_tracked", "video_decoder.up_blocks.3.norm.num_batches_tracked", "video_decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.5.norm.num_batches_tracked", "refinement_module.r0.norm1.num_batches_tracked", "refinement_module.r0.norm2.num_batches_tracked", "refinement_module.r1.norm1.num_batches_tracked", "refinement_module.r1.norm2.num_batches_tracked", "refinement_module.r2.norm1.num_batches_tracked", "refinement_module.r2.norm2.num_batches_tracked", "refinement_module.r3.norm1.num_batches_tracked", 
"refinement_module.r3.norm2.num_batches_tracked".
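The `num_batches_tracked` entries are buffers that BatchNorm layers gained in PyTorch 0.4.1, so this error typically appears when a checkpoint saved with a newer PyTorch is loaded by an older one. Upgrading PyTorch is one option; another is to drop those keys before loading. The sketch below shows the filtering step on a plain dictionary standing in for `checkpoint['generator']`. `strip_num_batches_tracked` is a hypothetical helper, not part of the repository.

```python
def strip_num_batches_tracked(state_dict):
    """Return a copy of state_dict without BatchNorm's num_batches_tracked buffers."""
    return {
        key: value
        for key, value in state_dict.items()
        if not key.endswith("num_batches_tracked")
    }


# Stand-in for checkpoint['generator']; real values would be tensors.
state = {
    "appearance_encoder.down_blocks.0.norm.weight": "tensor",
    "appearance_encoder.down_blocks.0.norm.num_batches_tracked": "tensor",
    "refinement_module.r3.norm2.num_batches_tracked": "tensor",
}
cleaned = strip_num_batches_tracked(state)
print(sorted(cleaned))  # ['appearance_encoder.down_blocks.0.norm.weight']
```

The filtered dictionary would then be passed to `generator.load_state_dict(...)`. Calling `load_state_dict(..., strict=False)` is another route, at the cost of also hiding genuine key mismatches.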
I see you have mentioned your Arxiv paper in the readme, but can you also add a hyperlink to it? The link is https://arxiv.org/abs/1812.08861
First of all, thank you for open-sourcing this amazing code.
I have a question about the mask embedding. I checked that it needs the difference of the Gaussian keypoint heatmaps and the deformed source image (concatenated).
What is the intent behind this movement encoding? Is there a reference for it?
After the Same block and the Hourglass prediction, how can the movement encoding act as a mask?
Thank you!
Can you provide the Taichi pretrained model? Thanks!
Hello,
I am currently generating keypoints on a different face dataset and they look a bit off (they do not focus on the supposedly moving parts). They also generate varying keypoints at each run.
I have tried different normalizing constants when applying softmax to the heatmaps, but the keypoints still don't seem to focus. The overall U-net architecture is pretty much the same.
Can you please provide any suggestions to improve this?
Thank you.
I am trying to transfer the facial expressions in one gif (driving_video.gif) to another photo (source_image.jpg), using the pretrained checkpoint provided. In the readme the following command is given (missing a file - presumably this should be python demo.py):
python --config config/moving-gif.yaml --driving_video sup-mat/driving_video.gif --source_image sup-mat/source_image.gif --checkpoint path/to/checkpoint
I've tried with the following (I have just put driving_video.gif, source_image.jpg and moving-gif-ckp.pth.tar in the root of the project folder):
python demo.py --config config/moving-gif.yaml --driving_video driving_video.gif --source_image source_image.jpg --checkpoint moving-gif-ckp.pth.tar
This results in the following:
Traceback (most recent call last):
File "demo.py", line 52, in <module>
Logger.load_cpk(opt.checkpoint, generator=generator, kp_detector=kp_detector)
File "/home/paperspace/monkey/monkey/logger.py", line 54, in load_cpk
generator.load_state_dict(checkpoint['generator'])
File "/home/paperspace/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for MotionTransferGenerator:
Unexpected key(s) in state_dict: "appearance_encoder.down_blocks.0.norm.num_batches_tracked", "appearance_encoder.down_blocks.1.norm.num_batches_tracked", "appearance_encoder.down_blocks.2.norm.num_batches_tracked", "appearance_encoder.down_blocks.3.norm.num_batches_tracked", "appearance_encoder.down_blocks.4.norm.num_batches_tracked", "appearance_encoder.down_blocks.5.norm.num_batches_tracked", "dense_motion_module.group_blocks.0.norm.num_batches_tracked", "dense_motion_module.group_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.encoder.down_blocks.4.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.0.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.1.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.2.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.3.norm.num_batches_tracked", "dense_motion_module.hourglass.decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.0.norm.num_batches_tracked", "video_decoder.up_blocks.1.norm.num_batches_tracked", "video_decoder.up_blocks.2.norm.num_batches_tracked", "video_decoder.up_blocks.3.norm.num_batches_tracked", "video_decoder.up_blocks.4.norm.num_batches_tracked", "video_decoder.up_blocks.5.norm.num_batches_tracked", "refinement_module.r0.norm1.num_batches_tracked", "refinement_module.r0.norm2.num_batches_tracked", "refinement_module.r1.norm1.num_batches_tracked", "refinement_module.r1.norm2.num_batches_tracked", "refinement_module.r2.norm1.num_batches_tracked", "refinement_module.r2.norm2.num_batches_tracked", "refinement_module.r3.norm1.num_batches_tracked", 
"refinement_module.r3.norm2.num_batches_tracked".
Hello, I am trying to transfer the motion of one bird to another. However, due to a lack of data, I am running inference with the moving-gif weights provided with the code. After adding a white background to all frames, I tried to reconstruct the video, giving the first frame of the driving video as the source image. I observed that, apart from the generator not being able to generate convincing images, the keypoints were sometimes estimated incorrectly. Does the keypoint detector depend on the kind of object (mostly quadrupeds in moving-gif) the network was trained on?
Hi,
Thanks for open-sourcing the code, really benefits my research a lot!
I found two small problems in the code that may cause numerical instability:
The code does not guarantee that s1 - s2 is positive before the sqrt:
Line 254 in 7c116b6
The singular value can sometimes become too small, causing the output to explode to inf. Adding an epsilon to the denominator can help to stabilize it:
monkey-net/modules/keypoint_detector.py
Line 65 in 7c116b6
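The two guards described above follow a generic pattern, sketched here in pure Python. This is not the repository's code: the helper names and the epsilon value are assumptions chosen for illustration.

```python
import math

EPS = 1e-8  # illustrative value; tune for the data type in use


def safe_sqrt(value):
    """Square root that clamps small negative inputs (rounding noise) to zero."""
    return math.sqrt(max(value, 0.0))


def safe_divide(numerator, denominator, eps=EPS):
    """Division with an epsilon added to a non-negative denominator."""
    return numerator / (denominator + eps)


diff = -1e-12  # should be zero mathematically, but rounding made it negative
print(safe_sqrt(diff))          # 0.0 instead of a ValueError (or NaN in tensor code)
print(safe_divide(1.0, 0.0))    # large but finite instead of a division error
```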
I completed the training and reconstruction steps. When I run the motion transfer step, I get an error saying the csv file is not found. Can you share the csv file, or is there a way to create it?
(monkeynet1) pc@monster:~/Desktop/monkey-net-master$ CUDA_VISIBLE_DEVICES=0 python run.py --config config/actions.yaml --mode transfer --checkpoint log/first/00000020-checkpoint.pth.tar
run.py:35: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
Use predefined train-test split.
Transfer...
Traceback (most recent call last):
File "run.py", line 78, in <module>
transfer(config, generator, kp_detector, opt.checkpoint, log_dir, dataset)
File "/home/pc/Desktop/monkey-net-master/transfer.py", line 87, in transfer
dataset = PairedDataset(initial_dataset=dataset, number_of_pairs=transfer_params['num_pairs'])
File "/home/pc/Desktop/monkey-net-master/frames_dataset.py", line 111, in __init__
pairs = pd.read_csv(pairs_list)
File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/pc/anaconda3/envs/monkeynet1/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'data/actions.csv' does not exist: b'data/actions.csv'
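The traceback shows `PairedDataset` reading `data/actions.csv` with `pd.read_csv`, so a pairs file has to exist at that path. The sketch below generates one with the standard library. The column names `source` and `driving` and the video file names are assumptions that should be checked against frames_dataset.py before use.

```python
import csv
import itertools
import random


def make_pairs(video_names, num_pairs, seed=0):
    """Pick up to num_pairs ordered (source, driving) pairs of distinct videos."""
    all_pairs = list(itertools.permutations(video_names, 2))
    random.Random(seed).shuffle(all_pairs)
    return all_pairs[:num_pairs]


def write_pairs_csv(path, pairs):
    """Write the pairs with a header row; the column names are assumed."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "driving"])
        writer.writerows(pairs)


videos = ["a.png", "b.png", "c.png"]   # hypothetical test-set file names
pairs = make_pairs(videos, num_pairs=4)
# write_pairs_csv("data/actions.csv", pairs)  # path taken from the error message
```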
I want to train your model with the VoxCeleb dataset.
Which of the provided configurations should I use?
Also, could you provide a VoxCeleb pretrained model?
Thank you
How do I reduce the batch size in demo.py?
I am running motion transfer with the following command, using the pretrained checkpoint in the readme (keeping everything in moving-gif.yaml the same):
python demo.py --config config/moving-gif.yaml --driving_video driver.gif --source_image source.png --checkpoint moving-gif-ckp.pth.tar
The driving gif is as follows:
Source image:
This results in the following:
Do you have a good strategy for aligning the source image with the first frame of the driving video?
Hello, I got the pre-trained nemo model from https://yadi.sk/d/EX7N9fuIuE4FNg, but I have two problems:
When I use an image size of 64x64, with test/213_deliberate_smile_1.png from the nemo dataset as the driving image and the first five frames of test/505_spontaneous_smile_4.png from the nemo dataset as the source image, the nemo model works very well. But when I try image sizes of 128x128, 256x256 or 512x512 (using resize) for the same driving and source images, the result.gif is bad.
When I use test/213_deliberate_smile_1.png from the nemo dataset as the driving image and a 64x64 test.gif made from an ordinary frontal face image as the source image, the result.gif is bad.
Can anyone give some advice on fixing these problems? @AliaksandrSiarohin
Thank you very much~
Use predefined train-test split.
Transfer...
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
0it [00:00, ?it/s]
Traceback (most recent call last):
File "run.py", line 80, in <module>
transfer(config, generator, kp_detector, opt.checkpoint, log_dir, dataset)
File "/home/kushagra/monkey-net/transfer.py", line 112, in transfer
out = transfer_one(generator, kp_detector, source_image, driving_video, transfer_params)
File "/home/kushagra/monkey-net/transfer.py", line 68, in transfer_one
kp_driving = cat_dict([kp_detector(driving_video[:, :, i:(i + 1)]) for i in range(d)], dim=1)
File "/home/kushagra/monkey-net/transfer.py", line 68, in <listcomp>
kp_driving = cat_dict([kp_detector(driving_video[:, :, i:(i + 1)]) for i in range(d)], dim=1)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 122, in forward
replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
File "/home/kushagra/monkey-net/sync_batchnorm/replicate.py", line 65, in replicate
modules = super(DataParallelWithCallback, self).replicate(module, device_ids)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 127, in replicate
return replicate(module, device_ids)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
param_copies = Broadcast.apply(devices, *params)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 19, in forward
outputs = comm.broadcast_coalesced(inputs, ctx.target_gpus)
File "/home/kushagra/.local/lib/python3.6/site-packages/torch/cuda/comm.py", line 40, in broadcast_coalesced
return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]
I understand that I need to put all the input tensors on device 0, but I am not sure exactly how to do that. I tried some of the approaches from
https://discuss.pytorch.org/t/how-to-solve-the-problem-of-runtimeerror-all-tensors-must-be-on-devices-0/15198/5
but they did not work.
I also moved all the models to device 1 (e.g. generator.to(opt.device_ids[1])) in the hope that it would free up space for tensors on device 0 (otherwise I get a CUDA out of memory error).
I am running the model on two RTX 2080 GPUs with CUDA 10.
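`torch.nn.DataParallel` requires the wrapped module's parameters to live on `device_ids[0]`, so moving the models to device 1 while the device list still starts at 0 is consistent with this error. One hedged workaround is to restrict which GPUs the process sees through `CUDA_VISIBLE_DEVICES` (the variable already used in the run commands on this page), so that the chosen physical GPU appears as device 0. The sketch below only builds such an environment; `env_with_visible_gpus` is a hypothetical helper and the base environment is illustrative.

```python
import os


def env_with_visible_gpus(gpu_ids, base_env=None):
    """Return a copy of the environment exposing only the given physical GPUs.

    Inside the launched process the listed GPUs are renumbered from 0.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


env = env_with_visible_gpus([1], base_env={"PATH": "/usr/bin"})
print(env["CUDA_VISIBLE_DEVICES"])  # 1
```

The same effect is obtained by prefixing the shell command with `CUDA_VISIBLE_DEVICES=1`. Whether a single GPU then has enough memory for the transfer is a separate question.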