- First of all, the test/****.png files are actually videos: the frames are stacked together for simpler I/O.
- What do you mean by resize? Did you resize some of your images to 64x64, or resize test/***.png to 128x128? If you want to use the model at a higher resolution, it should be trained on a higher-resolution dataset. For example, 256x256 models trained on the nemo dataset can be found here, and 256x256 models trained on VoxCeleb here.
- What is the "common face image"? Please post your images and your results.
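A minimal NumPy sketch of that layout, assuming the frames of each test .png are tiled left to right as equally sized frames (which is what the np.moveaxis/reshape code from frames_dataset.py quoted below implies). unstack_frames and stack_frames are hypothetical helper names, not functions from the repository.

```python
import numpy as np


def unstack_frames(strip: np.ndarray, frame_shape: tuple) -> np.ndarray:
    """Split a (H, num_frames * W, C) strip into a (num_frames, H, W, C) video.

    Assumes the frames are tiled left to right and all have frame_shape = (H, W, C).
    """
    h, w, c = frame_shape
    if strip.shape[0] != h or strip.shape[2] != c or strip.shape[1] % w != 0:
        raise ValueError(f"shape {strip.shape} is not a row of {frame_shape} frames")
    num_frames = strip.shape[1] // w
    # (H, N*W, C) -> (H, N, W, C) -> (N, H, W, C)
    return strip.reshape(h, num_frames, w, c).transpose(1, 0, 2, 3)


def stack_frames(video: np.ndarray) -> np.ndarray:
    """Inverse of unstack_frames: (N, H, W, C) -> (H, N * W, C)."""
    n, h, w, c = video.shape
    return video.transpose(1, 0, 2, 3).reshape(h, n * w, c)
```

For square frames this yields the same array as the np.moveaxis, reshape, np.moveaxis sequence in frames_dataset.py. It only reorders axes, so no pixel values change; any resizing is a separate step.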
from monkey-net.
@AliaksandrSiarohin Thanks for your reply.
- I use resize. For example, for the driving image test/213_deliberate_smile_1.png, I modified some code in frames_dataset.py as follows:
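# the .png holds the frames side by side: (H, num_frames * W, C) -> (num_frames, H, W, C)
video_array = np.moveaxis(image, 1, 0)
video_array = video_array.reshape((-1,) + image_shape)
video_array = np.moveaxis(video_array, 1, 2)
# added line: upsample every frame to 256x256
video_array = np.array([resize(frame, (256, 256)) for frame in video_array])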
Of course, I apply the same 256x256 resize to the source image, the first five frames of test/505_spontaneous_smile_4.png. The resulting image is blurred for both the 128x128 and the 256x256 resize.
- Thanks for your 256x256 pre-trained models, but how should I modify the configuration in config/nemo.yaml? I get this error:
Traceback (most recent call last):
...
RuntimeError: Error(s) in loading state_dict for MotionTransferGenerator:
Unexpected key(s) in state_dict: "appearance_encoder.down_blocks.5.conv.weight", "appearance_encoder.down_blocks.5.conv.bias", "appearance_encoder.down_blocks.5.norm.weight", "appearance_encoder.down_blocks.5.norm.bias", "appearance_encoder.down_blocks.5.norm.running_mean", "appearance_encoder.down_blocks.5.norm.running_var", "appearance_encoder.down_blocks.5.norm.num_batches_tracked", "appearance_encoder.down_blocks.6.conv.weight", "appearance_encoder.down_blocks.6.conv.bias", "appearance_encoder.down_blocks.6.norm.weight", "appearance_encoder.down_blocks.6.norm.bias", "appearance_encoder.down_blocks.6.norm.running_mean", "appearance_encoder.down_blocks.6.norm.running_var", "appearance_encoder.down_blocks.6.norm.num_batches_tracked", "video_decoder.up_blocks.5.conv.weight", "video_decoder.up_blocks.5.conv.bias", "video_decoder.up_blocks.5.norm.weight", "video_decoder.up_blocks.5.norm.bias", "video_decoder.up_blocks.5.norm.running_mean", "video_decoder.up_blocks.5.norm.running_var", "video_decoder.up_blocks.5.norm.num_batches_tracked", "video_decoder.up_blocks.6.conv.weight", "video_decoder.up_blocks.6.conv.bias", "video_decoder.up_blocks.6.norm.weight", "video_decoder.up_blocks.6.norm.bias", "video_decoder.up_blocks.6.norm.running_mean", "video_decoder.up_blocks.6.norm.running_var", "video_decoder.up_blocks.6.norm.num_batches_tracked".
size mismatch for appearance_encoder.down_blocks.4.conv.weight: copying a param with shape torch.Size([1024, 512, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 3, 3]).
size mismatch for appearance_encoder.down_blocks.4.conv.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for appearance_encoder.down_blocks.4.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
...
- I uploaded 4 test images here. Each xxx_result.gif corresponds to xxx.gif, which is the source image for the nemo-ckp.pth.tar model; in all cases I use test/213_deliberate_smile_1.png as the driving image.
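PyTorch's load_state_dict (strict by default) compares parameter names and tensor shapes between the checkpoint and the freshly built model, and the error above lists unexpected keys and size mismatches. To see every difference at once before editing the config, a small comparison of the two shape maps can help. This is a sketch: compare_state_dict_shapes is a hypothetical helper, not part of monkey-net or PyTorch. In practice each input would be built as {k: tuple(v.shape) for k, v in state_dict.items()}, once for model.state_dict() and once for the generator weights loaded with torch.load(path, map_location="cpu"); which entry of the checkpoint holds the generator weights depends on how it was saved.

```python
def compare_state_dict_shapes(checkpoint_shapes, model_shapes):
    """Group differences the way a strict load_state_dict reports them.

    Both arguments map parameter/buffer names to shape tuples.
    """
    ckpt_keys, model_keys = set(checkpoint_shapes), set(model_shapes)
    unexpected = sorted(ckpt_keys - model_keys)  # in the checkpoint, not in the model
    missing = sorted(model_keys - ckpt_keys)     # in the model, not in the checkpoint
    size_mismatch = sorted(
        (k, tuple(checkpoint_shapes[k]), tuple(model_shapes[k]))
        for k in ckpt_keys & model_keys
        if tuple(checkpoint_shapes[k]) != tuple(model_shapes[k])
    )
    return {"unexpected": unexpected, "missing": missing, "size_mismatch": size_mismatch}
```

When all three lists come back empty, the checkpoint and the model agree on every name and shape, which is what a strict load requires.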
- Yes, the keypoints are learned at a resolution of 64x64, and I doubt they generalize to higher resolutions. You need another model, or a model trained at a different resolution.
- The model params should be the same as in vox.yaml:
model_params:
  common_params:
    num_kp: 10
    kp_variance: 'matrix'
    num_channels: 3
  kp_detector_params:
    temperature: 0.1
    block_expansion: 32
    max_features: 1024
    scale_factor: 0.25
    num_blocks: 5
    clip_variance: 0.001
  generator_params:
    interpolation_mode: 'trilinear'
    block_expansion: 32
    max_features: 1024
    num_blocks: 7
    num_refinement_blocks: 4
    dense_motion_params:
      block_expansion: 32
      max_features: 1024
      num_blocks: 5
      use_mask: True
      use_correction: True
      scale_factor: 0.25
      mask_embedding_params:
        use_heatmap: True
        use_deformed_source_image: True
        heatmap_type: 'difference'
        norm_const: 100
      num_group_blocks: 2
    kp_embedding_params:
      scale_factor: 0.25
      use_heatmap: True
      norm_const: 100
      heatmap_type: 'difference'
  discriminator_params:
    kp_embedding_params:
      norm_const: 100
    block_expansion: 32
    max_features: 256
    num_blocks: 4
- Most likely nemo is too small to generalize to arbitrary faces. Try the model trained on VoxCeleb.
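The size mismatches in the traceback earlier in this thread can be reproduced from the generator's block_expansion, max_features and num_blocks. A minimal sketch, assuming each appearance-encoder down block i outputs min(max_features, block_expansion * 2 ** (i + 1)) channels, takes the previous block's output as input, and uses a (1, 3, 3) kernel. That rule is an assumption, not read from the monkey-net source, but it reproduces the shapes in the traceback. The default in_features=3 (from num_channels: 3) is also an assumption and only affects block 0. encoder_channels and conv_weight_shape are hypothetical helpers.

```python
def encoder_channels(block_expansion, max_features, num_blocks, in_features=3):
    """(in_channels, out_channels) of each down block under the assumed rule.

    Block i outputs min(max_features, block_expansion * 2 ** (i + 1)) channels
    and takes the previous block's output (in_features for block 0).
    """
    channels = []
    for i in range(num_blocks):
        cin = in_features if i == 0 else min(max_features, block_expansion * 2 ** i)
        cout = min(max_features, block_expansion * 2 ** (i + 1))
        channels.append((cin, cout))
    return channels


def conv_weight_shape(cin, cout, kernel=(1, 3, 3)):
    """Shape of a Conv3d weight tensor: (out_channels, in_channels, kD, kH, kW)."""
    return (cout, cin) + tuple(kernel)
```

With the vox.yaml generator values (32, 1024, 7), block 4 becomes a 512 to 1024 convolution, giving the checkpoint's (1024, 512, 1, 3, 3), and blocks 5 and 6 exist, matching the unexpected down_blocks.5 and down_blocks.6 keys. The model built from the current config has only blocks 0 to 4 and a (512, 512, 1, 3, 3) weight at block 4, which is consistent with num_blocks: 5 and max_features: 512; those two values are inferred from the traceback, not read from config/nemo.yaml.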