radekd91 / inferno

🔥🔥🔥 Set the world of 3D faces on fire with INFERNO 🔥🔥🔥

License: Other

Python 98.85% Shell 0.97% Dockerfile 0.19%

inferno's Issues

Errors with running demo

Could not import SPECTRE. Make sure you pull the repository with submodules to enable SPECTRE.
Traceback (most recent call last):
File "/mnt/workspace/inferno/inferno/models/temporal/external/SpectrePreprocessor.py", line 16, in <module>
from spectre.src.spectre import SPECTRE
ModuleNotFoundError: No module named 'spectre.src'

Could not import EmoSwinModule. SWIN models will not be available. Make sure you pull the repository with submodules to enable SWIN.
Could not import EmoSwinModule. SWIN models will not be available. Make sure you pull the repository with submodules to enable SWIN.
SWIN not found, will not be able to use SWIN models
Looking for checkpoint in '/mnt/workspace/inferno/assets/TalkingHead/models/EMOTE_v2/checkpoints'
Found 1 checkpoints

/mnt/workspace/inferno/assets/TalkingHead/models/EMOTE_v2/checkpoints/last.ckpt
Selecting checkpoint '/mnt/workspace/inferno/assets/TalkingHead/models/EMOTE_v2/checkpoints/last.ckpt'
Loading checkpoint '/mnt/workspace/inferno/assets/TalkingHead/models/EMOTE_v2/checkpoints/last.ckpt'
Some weights of the model checkpoint at ../../../face/wav2vec2-base-960h were not used when initializing Wav2Vec2ModelResampled: ['lm_head.weight', 'lm_head.bias']
This IS expected if you are initializing Wav2Vec2ModelResampled from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing Wav2Vec2ModelResampled from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ModelResampled were not initialized from the model checkpoint at ../../../face/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/home/pai/envs/work38/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
Looking for checkpoint in '/mnt/workspace/inferno/assets/MotionPrior/models/FLINTv2/checkpoints'
Found 1 checkpoints
/mnt/workspace/inferno/assets/MotionPrior/models/FLINTv2/checkpoints/model-epoch=0758-val/loss_total=0.113977119327.ckpt
Selecting checkpoint '/mnt/workspace/inferno/assets/MotionPrior/models/FLINTv2/checkpoints/model-epoch=0758-val/loss_total=0.113977119327.ckpt'
creating the FLAME Decoder
/mnt/workspace/inferno/inferno/models/DecaFLAME.py:93: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
torch.tensor(lmk_embeddings['dynamic_lmk_faces_idx'], dtype=torch.long))
/mnt/workspace/inferno/inferno/models/DecaFLAME.py:95: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
torch.tensor(lmk_embeddings['dynamic_lmk_bary_coords'], dtype=self.dtype))
creating the FLAME Decoder
/home/pai/envs/work38/lib/python3.8/site-packages/pytorch_lightning/core/saving.py:209: UserWarning: Found keys that are not in the model state dict but in the checkpoint: ['renderer.render.dense_faces', 'renderer.render.faces', 'renderer.render.raw_uvcoords', 'renderer.render.uvcoords', 'renderer.render.uvfaces', 'renderer.render.face_uvcoords', 'renderer.render.face_colors', 'renderer.render.constant_factor']
rank_zero_warn(
unable to load materials from: template.mtl
0%| | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
File "demos/demo_eval_talking_head_on_audio.py", line 172, in <module>
main()
File "demos/demo_eval_talking_head_on_audio.py", line 153, in main
eval_talking_head_on_audio(
File "demos/demo_eval_talking_head_on_audio.py", line 80, in eval_talking_head_on_audio
run_evalutation(talking_head,
File "/mnt/workspace/inferno/inferno_apps/TalkingHead/evaluation/evaluation_functions.py", line 373, in run_evalutation
batch = talking_head(batch)
File "/home/pai/envs/work38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/inferno/inferno_apps/TalkingHead/evaluation/TalkingHeadWrapper.py", line 120, in forward
sample = self.talking_head_model(sample)
File "/home/pai/envs/work38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/inferno/inferno/models/talkinghead/TalkingHeadBase.py", line 526, in forward
sample = self.forward_audio(sample, train=train, desired_output_length=desired_output_length, **kwargs)
File "/mnt/workspace/inferno/inferno/models/talkinghead/TalkingHeadBase.py", line 234, in forward_audio
return self.audio_model(sample, train=train, desired_output_length=desired_output_length, **kwargs)
File "/home/pai/envs/work38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/inferno/inferno/models/temporal/AudioEncoders.py", line 236, in forward
return self._forward(sample, train=train, desired_output_length=desired_output_length)
File "/mnt/workspace/inferno/inferno/models/temporal/AudioEncoders.py", line 176, in _forward
proc = self.input_processor(raw_audio, sampling_rate=sample["samplerate"][0], return_tensors="pt")
File "/home/pai/envs/work38/lib/python3.8/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py", line 117, in __call__
return self.current_processor(*args, **kwargs)
File "/home/pai/envs/work38/lib/python3.8/site-packages/transformers/models/wav2vec2/feature_extraction_wav2vec2.py", line 179, in __call__
raw_speech = np.asarray(raw_speech, dtype=np.float32)
File "/home/pai/envs/work38/lib/python3.8/site-packages/torch/_tensor.py", line 645, in __array__
return self.numpy().astype(dtype, copy=False)
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
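
The traceback ends inside the Wav2Vec2 feature extractor, which calls np.asarray on the raw audio; NumPy cannot read a tensor that still lives on the GPU. One possible workaround, sketched here, is to copy the audio to host memory before it reaches the processor. The helper name to_host is hypothetical and this is not a confirmed fix from the maintainers. It relies only on the standard Tensor.detach() and Tensor.cpu() methods and is duck-typed so the snippet itself runs without PyTorch.

```python
def to_host(audio):
    """Return `audio` in a form that np.asarray can consume.

    Tensor-like objects (anything exposing .detach() and .cpu(), as
    torch.Tensor does) are detached and copied to host memory.
    Everything else, such as lists or NumPy arrays, is returned unchanged.
    """
    if hasattr(audio, "detach") and hasattr(audio, "cpu"):
        return audio.detach().cpu()
    return audio


# Hypothetical call site, mirroring the AudioEncoders.py frame in the traceback:
# proc = self.input_processor(to_host(raw_audio),
#                             sampling_rate=sample["samplerate"][0],
#                             return_tensors="pt")
```

The processor output would then need to be moved back to the model's device before the forward pass.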

the link is broken

The link https://download.is.tue.mpg.de/emote/EMOTE_test_example_data.zip is broken.

Issue with Training EMOTE - Losses Not Converging in Second Stage

Hi there,

First of all, thank you for your incredible work on EMOTE!
I've been experimenting with training EMOTE and encountered some issues during the second stage. Here's a summary of the problem:

Issue Description:
In the first stage, which uses only the vertex-level loss, everything worked smoothly: the loss decreased as expected and converged to a stable value. However, in the second stage, which adds the disentangle loss and the lip reading loss, the vertex-level, lip reading, and disentangle losses all behave erratically. Instead of decreasing, they oscillate.

My Question:
I'm wondering if you, or anyone else using EMOTE, have encountered similar issues during the second stage of training.

I may have made a mistake when implementing a custom renderer with pytorch3d, but I'm not sure whether that is the cause of the issue.

Thanks in advance for any insights or assistance you can provide!

about FLINT

Great work!
If I only want to use FLINT, how do I do that?

Looking forward to the release of FaceReconstruction model

Marvelous project! Looking forward to the release of FaceReconstruction model.

Segmentation fault (core dumped) error

I installed it according to the README, but this error always occurs when running the talking head module. I tested it on two servers and got the same error on both.

Question about the video emotion recognition

Hi, thanks for releasing the code! I want to use the video emotion recognition network, and I have a question about the TransformerEncoder module it uses. It seems that the newly computed encoded_feature overwrites the encoded_feature previously calculated using the alibi mask. This does not match the description in the paper.

I also wanted to ask: what sequence length T do you usually use?
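
For readers unfamiliar with the term, ALiBi adds a distance-proportional penalty to the attention logits rather than using positional embeddings. The sketch below builds such a bias in plain Python. It is not the repository's TransformerEncoder code; the function names are hypothetical, and the symmetric |i - j| form is an assumption suited to a bidirectional encoder (the original ALiBi formulation is causal).

```python
def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8 * (h + 1) / num_heads), the schedule used by
    ALiBi when the number of heads is a power of two."""
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Symmetric bias matrix: entry (i, j) is -slope * |i - j|.

    The matrix is meant to be added to the raw attention scores of one head
    before the softmax, so that distant frames receive less attention.
    """
    return [[-slope * abs(i - j) for j in range(seq_len)]
            for i in range(seq_len)]


slopes = alibi_slopes(8)
bias_head0 = alibi_bias(seq_len=4, slope=slopes[0])
```

If a later assignment replaces the biased result with one computed without the bias, the penalty has no effect on the output, which is the concern raised in this issue.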

Error when running demo?

I followed all the instructions to set up the environment, and I also ran the (optional) submodules script at the start of the instructions. Whenever I try to run the demo, I get the following:

Could not import SPECTRE. Make sure you pull the repository with submodules to enable SPECTRE.
Traceback (most recent call last):
File "/home/ubuntu/inferno/inferno/models/temporal/external/SpectrePreprocessor.py", line 16, in <module>
from spectre.src.spectre import SPECTRE
ModuleNotFoundError: No module named 'spectre'

Could not import EmoSwinModule. SWIN models will not be available. Make sure you pull the repository with submodules to enable SWIN.
Could not import EmoSwinModule. SWIN models will not be available. Make sure you pull the repository with submodules to enable SWIN.
SWIN not found, will not be able to use SWIN models
Traceback (most recent call last):
File "demos/demo_eval_talking_head_on_audio.py", line 21, in <module>
from inferno_apps.TalkingHead.evaluation.evaluation_functions import *
File "/home/ubuntu/inferno/inferno_apps/TalkingHead/evaluation/evaluation_functions.py", line 35, in <module>
from psbody.mesh import Mesh
File "/home/ubuntu/miniconda3/envs/work38/lib/python3.8/site-packages/psbody/mesh/__init__.py", line 10, in <module>
from .meshviewer import MeshViewer, MeshViewers
File "/home/ubuntu/miniconda3/envs/work38/lib/python3.8/site-packages/psbody/mesh/meshviewer.py", line 49, in <module>
from OpenGL import GL, GLU, GLUT
File "/home/ubuntu/miniconda3/envs/work38/lib/python3.8/site-packages/OpenGL/GLUT/__init__.py", line 5, in <module>
from OpenGL.GLUT.fonts import *
File "/home/ubuntu/miniconda3/envs/work38/lib/python3.8/site-packages/OpenGL/GLUT/fonts.py", line 20, in <module>
p = platform.getGLUTFontPointer( name )
p = platform.getGLUTFontPointer( name )
File "/home/ubuntu/miniconda3/envs/work38/lib/python3.8/site-packages/OpenGL/platform/baseplatform.py", line 350, in getGLUTFontPointer
raise NotImplementedError(
NotImplementedError: Platform does not define a GLUT font retrieval function

Any suggestions where I may be going wrong?
Thank you!
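
The final error is raised by PyOpenGL's base platform class, which suggests that the platform backend PyOpenGL selected does not provide GLUT fonts. PyOpenGL reads the PYOPENGL_PLATFORM environment variable when choosing a backend, so a first diagnostic step could be to check what it is set to. The snippet below is only a sketch: the function name is hypothetical, and whether a given backend (for example egl) lacks GLUT font lookup depends on the PyOpenGL version, so treat that as an assumption to verify rather than a diagnosis.

```python
import os


def describe_opengl_platform(environ=None):
    """Report which backend PyOpenGL has been asked to use, if any."""
    environ = os.environ if environ is None else environ
    platform = environ.get("PYOPENGL_PLATFORM")
    if platform is None:
        return "PYOPENGL_PLATFORM is not set; PyOpenGL will pick a default backend."
    return f"PYOPENGL_PLATFORM={platform}"


# Run this in the same shell and conda environment as the demo.
print(describe_opengl_platform())
```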

Question about landmarks and EMICA's reconstruction from processed MEAD dataset

I downloaded the processed MEAD dataset using the download_processed_mead.sh script you provided. However, the landmarks in processed/landmarks_original/.../landmarks.pkl do not seem to align with EMICA's reconstruction after FLAME LBS and orthographic projection using the predicted 'cam', 'shape', 'exp' and 'pose'. Could you kindly explain what the input images to EMICA are, and how they are warped?

The first image shows the processed 478 landmarks (from processed/landmarks_original/.../landmarks.pkl) drawn on the image warped using landmarks_original.pkl. The second image shows the 2D MediaPipe landmarks projected onto the same image using the reconstruction from processed/reconstructions/.../shape_pose_cam.hdf5.
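
For reference, the projection step described above can be sketched as follows. This assumes the DECA-style weak-perspective convention, where cam = (s, tx, ty), the y axis is flipped, and the result is mapped to pixel coordinates of the square crop fed to the network. Whether EMICA's released reconstructions follow exactly this convention is an assumption, and the function name is hypothetical.

```python
def orth_project(vertices, cam, image_size):
    """Project 3D points to pixel coordinates of the network's input crop.

    vertices:   iterable of (x, y, z) points in model space
    cam:        (s, tx, ty) scale and 2D translation
    image_size: side length in pixels of the square input crop
    """
    s, tx, ty = cam
    pixels = []
    for x, y, _z in vertices:
        u = s * (x + tx)       # normalized coordinates in [-1, 1]
        v = -s * (y + ty)      # flip y because image rows grow downwards
        pixels.append(((u + 1) * image_size / 2, (v + 1) * image_size / 2))
    return pixels


# A point at the model origin lands in the centre of a 224 x 224 crop.
center = orth_project([(0.0, 0.0, 0.0)], cam=(1.0, 0.0, 0.0), image_size=224)
```

Under this convention the projected points live in the coordinates of the cropped input image. If the stored landmarks are in a different frame, one set has to be mapped through the crop transform before the two can be compared.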

Problem about downloading processed data

Hello, and first of all thanks for your nice work! However, when I ran wget https://download.is.tue.mpg.de/emote/mead_25fps/processed/metadata.pkl -O metadata.pkl to download metadata.pkl, the file could not be found. Could you help me with this?

about MEAD data process

Thank you for this great work!

I downloaded part of the MEAD dataset and followed the data processing steps in the README: I modified the input and output file paths and set detect landmark to true, but I got this error.

I traced the problem to VideoFaceDetectionDataset: the entries of self.index_for_frame_map are all 0, so detection_in_frame_index is always 0, which causes an error when reading the next frame. But I don't know how to fix it.
Is there something wrong with my steps? More details are below.