
inferno's People

Contributors

radekd91, timobolkart, zielon

inferno's Issues

Errors with running demo

Could not import SPECTRE. Make sure you pull the repository with submodules to enable SPECTRE.
Traceback (most recent call last):
File "/mnt/workspace/inferno/inferno/models/temporal/external/SpectrePreprocessor.py", line 16, in <module>
from spectre.src.spectre import SPECTRE
ModuleNotFoundError: No module named 'spectre.src'

Could not import EmoSwinModule. SWIN models will not be available. Make sure you pull the repository with submodules to enable SWIN.
Could not import EmoSwinModule. SWIN models will not be available. Make sure you pull the repository with submodules to enable SWIN.
SWIN not found, will not be able to use SWIN models
Looking for checkpoint in '/mnt/workspace/inferno/assets/TalkingHead/models/EMOTE_v2/checkpoints'
Found 1 checkpoints

  • /mnt/workspace/inferno/assets/TalkingHead/models/EMOTE_v2/checkpoints/last.ckpt
    Selecting checkpoint '/mnt/workspace/inferno/assets/TalkingHead/models/EMOTE_v2/checkpoints/last.ckpt'
    Loading checkpoint '/mnt/workspace/inferno/assets/TalkingHead/models/EMOTE_v2/checkpoints/last.ckpt'
    Some weights of the model checkpoint at ../../../face/wav2vec2-base-960h were not used when initializing Wav2Vec2ModelResampled: ['lm_head.weight', 'lm_head.bias']
  • This IS expected if you are initializing Wav2Vec2ModelResampled from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Wav2Vec2ModelResampled from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of Wav2Vec2ModelResampled were not initialized from the model checkpoint at ../../../face/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    /home/pai/envs/work38/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
    To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:467.)
    return torch.floor_divide(self, other)
    Looking for checkpoint in '/mnt/workspace/inferno/assets/MotionPrior/models/FLINTv2/checkpoints'
    Found 1 checkpoints
  • /mnt/workspace/inferno/assets/MotionPrior/models/FLINTv2/checkpoints/model-epoch=0758-val/loss_total=0.113977119327.ckpt
    Selecting checkpoint '/mnt/workspace/inferno/assets/MotionPrior/models/FLINTv2/checkpoints/model-epoch=0758-val/loss_total=0.113977119327.ckpt'
    creating the FLAME Decoder
    /mnt/workspace/inferno/inferno/models/DecaFLAME.py:93: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
    torch.tensor(lmk_embeddings['dynamic_lmk_faces_idx'], dtype=torch.long))
    /mnt/workspace/inferno/inferno/models/DecaFLAME.py:95: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
    torch.tensor(lmk_embeddings['dynamic_lmk_bary_coords'], dtype=self.dtype))
    creating the FLAME Decoder
    /home/pai/envs/work38/lib/python3.8/site-packages/pytorch_lightning/core/saving.py:209: UserWarning: Found keys that are not in the model state dict but in the checkpoint: ['renderer.render.dense_faces', 'renderer.render.faces', 'renderer.render.raw_uvcoords', 'renderer.render.uvcoords', 'renderer.render.uvfaces', 'renderer.render.face_uvcoords', 'renderer.render.face_colors', 'renderer.render.constant_factor']
    rank_zero_warn(
    unable to load materials from: template.mtl
    0%| | 0/8 [00:00<?, ?it/s]
    Traceback (most recent call last):
    File "demos/demo_eval_talking_head_on_audio.py", line 172, in <module>
    main()
    File "demos/demo_eval_talking_head_on_audio.py", line 153, in main
    eval_talking_head_on_audio(
    File "demos/demo_eval_talking_head_on_audio.py", line 80, in eval_talking_head_on_audio
    run_evalutation(talking_head,
    File "/mnt/workspace/inferno/inferno_apps/TalkingHead/evaluation/evaluation_functions.py", line 373, in run_evalutation
    batch = talking_head(batch)
    File "/home/pai/envs/work38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
    File "/mnt/workspace/inferno/inferno_apps/TalkingHead/evaluation/TalkingHeadWrapper.py", line 120, in forward
    sample = self.talking_head_model(sample)
    File "/home/pai/envs/work38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
    File "/mnt/workspace/inferno/inferno/models/talkinghead/TalkingHeadBase.py", line 526, in forward
    sample = self.forward_audio(sample, train=train, desired_output_length=desired_output_length, **kwargs)
    File "/mnt/workspace/inferno/inferno/models/talkinghead/TalkingHeadBase.py", line 234, in forward_audio
    return self.audio_model(sample, train=train, desired_output_length=desired_output_length, **kwargs)
    File "/home/pai/envs/work38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
    File "/mnt/workspace/inferno/inferno/models/temporal/AudioEncoders.py", line 236, in forward
    return self._forward(sample, train=train, desired_output_length=desired_output_length)
    File "/mnt/workspace/inferno/inferno/models/temporal/AudioEncoders.py", line 176, in _forward
    proc = self.input_processor(raw_audio, sampling_rate=sample["samplerate"][0], return_tensors="pt")
    File "/home/pai/envs/work38/lib/python3.8/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py", line 117, in __call__
    return self.current_processor(*args, **kwargs)
    File "/home/pai/envs/work38/lib/python3.8/site-packages/transformers/models/wav2vec2/feature_extraction_wav2vec2.py", line 179, in __call__
    raw_speech = np.asarray(raw_speech, dtype=np.float32)
    File "/home/pai/envs/work38/lib/python3.8/site-packages/torch/_tensor.py", line 645, in __array__
    return self.numpy().astype(dtype, copy=False)
    TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Issue with Training EMOTE - Losses Not Converging in Second Stage

Hi there,

First of all, thank you for your incredible work on EMOTE!
I've been experimenting with training EMOTE and encountered some issues during the second stage. Here's a summary of the problem:

Issue Description:
In the first stage, which involves only the vertex-level loss, everything seemed to work smoothly. The loss values decreased as expected and converged to stable values. However, when I moved on to the second stage, which adds both the disentanglement loss and the lip reading loss, the vertex-level, lip reading, and disentanglement losses started behaving erratically. They do not decrease steadily; instead, they oscillate.

My Question:
I'm wondering if you, or anyone else using EMOTE, have encountered similar issues during the second stage of training.

Maybe I made a mistake when implementing a custom renderer with pytorch3d. I'm not sure, hence this issue.

Thanks in advance for any insights or assistance you can provide!

about FLINT

Great work!
If I just want to use FLINT, how do I use it?

Segmentation fault (core dumped) error

I installed it following the README, but this error always occurs when running the talking head module. I tested it on two servers and got the same error on both.
image

Question about the video emotion recognition

Hi, thanks for releasing the code! I want to use the video emotion recognition network, and I have a question about the TransformerEncoder module it uses. It seems that the newly computed encoded_feature overwrites the encoded_feature previously computed with the alibi mask. This does not match the description in the paper.

I also wanted to ask: what sequence length T do you usually use?

Error when running demo?

I followed all the instructions to set up the environment, and I also ran the (optional) submodules script at the start of the instructions. Whenever I try to run the demo, I get the following:

Could not import SPECTRE. Make sure you pull the repository with submodules to enable SPECTRE.
Traceback (most recent call last):
File "/home/ubuntu/inferno/inferno/models/temporal/external/SpectrePreprocessor.py", line 16, in <module>
from spectre.src.spectre import SPECTRE
ModuleNotFoundError: No module named 'spectre'

Could not import EmoSwinModule. SWIN models will not be available. Make sure you pull the repository with submodules to enable SWIN.
Could not import EmoSwinModule. SWIN models will not be available. Make sure you pull the repository with submodules to enable SWIN.
SWIN not found, will not be able to use SWIN models
Traceback (most recent call last):
File "demos/demo_eval_talking_head_on_audio.py", line 21, in <module>
from inferno_apps.TalkingHead.evaluation.evaluation_functions import *
File "/home/ubuntu/inferno/inferno_apps/TalkingHead/evaluation/evaluation_functions.py", line 35, in <module>
from psbody.mesh import Mesh
File "/home/ubuntu/miniconda3/envs/work38/lib/python3.8/site-packages/psbody/mesh/__init__.py", line 10, in <module>
from .meshviewer import MeshViewer, MeshViewers
File "/home/ubuntu/miniconda3/envs/work38/lib/python3.8/site-packages/psbody/mesh/meshviewer.py", line 49, in <module>
from OpenGL import GL, GLU, GLUT
File "/home/ubuntu/miniconda3/envs/work38/lib/python3.8/site-packages/OpenGL/GLUT/__init__.py", line 5, in <module>
from OpenGL.GLUT.fonts import *
File "/home/ubuntu/miniconda3/envs/work38/lib/python3.8/site-packages/OpenGL/GLUT/fonts.py", line 20, in <module>
p = platform.getGLUTFontPointer( name )
File "/home/ubuntu/miniconda3/envs/work38/lib/python3.8/site-packages/OpenGL/platform/baseplatform.py", line 350, in getGLUTFontPointer
raise NotImplementedError(
NotImplementedError: Platform does not define a GLUT font retrieval function

Any suggestions where I may be going wrong?
Thank you!
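
The last frames of the traceback show PyOpenGL reaching the base-class getGLUTFontPointer, which suggests that the platform backend it selected does not provide GLUT font lookup. PyOpenGL consults the PYOPENGL_PLATFORM environment variable when choosing a backend, so one possible first step is to check what the environment looks like on the machine. The sketch below only reports a few relevant variables; the helper name gl_environment and the list of variables are assumptions made for illustration and are not exhaustive.

```python
import os

# Environment variables that are relevant to how PyOpenGL is set up.
# Assumption: illustrative list, not a complete description of PyOpenGL's logic.
GL_ENV_VARS = ("PYOPENGL_PLATFORM", "XDG_SESSION_TYPE", "DISPLAY")


def gl_environment(environ=None):
    """Return the current values (or None) of the variables listed above."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in GL_ENV_VARS}


if __name__ == "__main__":
    for name, value in gl_environment().items():
        print(f"{name} = {value!r}")
```

If PYOPENGL_PLATFORM is set to a headless backend, or no display is available, that may explain the error. Which setting is appropriate depends on the machine, so this is a diagnostic aid, not a confirmed fix.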

Question about landmarks and EMICA's reconstruction from processed MEAD dataset

test_mediapipe_warp
test_mediapipe_rec_render
I downloaded the processed MEAD dataset using the download_processed_mead.sh script you provided. However, the landmarks in processed/landmarks_original/.../landmarks.pkl do not seem to align with EMICA's reconstruction after FLAME LBS and orthographic projection using the predicted 'cam', 'shape', 'exp' and 'pose'. Could you kindly explain what the input images to EMICA are and how they are warped?

The first image shows the processed 478 landmarks (from processed/landmarks_original/.../landmarks.pkl) drawn on the image warped using landmarks_original.pkl. The second image shows the 2D MediaPipe landmarks projected onto the same image using the reconstruction from processed/reconstructions/.../shape_pose_cam.hdf5.
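
For readers comparing the two overlays, the projection step described above can be sketched as follows. This is a minimal illustration that assumes a DECA-style weak-perspective camera, cam = (s, tx, ty), applied as s * (xy + t), followed by a flip of the y axis and a mapping from [-1, 1] to pixel coordinates. The function name and these conventions are assumptions; the repository's actual convention may differ, and a mismatch in either the flip or the crop used for the image would produce exactly this kind of misalignment.

```python
def orthographic_project(points_3d, cam, image_size):
    """Project 3D points to pixel coordinates with a weak-perspective camera.

    points_3d  : iterable of (x, y, z) tuples in model space
    cam        : (s, tx, ty) scale and 2D translation (assumed convention)
    image_size : side length of the square image in pixels
    """
    s, tx, ty = cam
    pixels = []
    for x, y, _z in points_3d:
        # Orthographic projection: depth is dropped, then scale and shift.
        x_ndc = s * (x + tx)
        y_ndc = s * (y + ty)
        # Assumption: normalized y points up while image rows grow downward.
        u = (x_ndc + 1.0) * 0.5 * image_size
        v = (1.0 - y_ndc) * 0.5 * image_size
        pixels.append((u, v))
    return pixels


if __name__ == "__main__":
    landmarks = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.1)]
    print(orthographic_project(landmarks, cam=(2.0, 0.0, 0.0), image_size=224))
```

The resulting pixel coordinates are only comparable with landmarks.pkl if both refer to the same image crop.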

about MEAD data process

Thank you for this great work!

I downloaded part of the MEAD dataset and followed the README's data processing steps in order: I modified the input and output file paths and set detect landmark to true. However, I got this error.
image

image
image

I traced this to VideoFaceDetectionDataset: the entries of self.index_for_frame_map are all 0, so detection_in_frame_index is always 0, which causes an error when reading the next frame. I don't know how to fix it.
Is there something wrong with my steps? More details are below.
image

EMOTE: issues about spectre and SWIN

image
I encountered these errors while running demo_eval_talking_head_on_audio.py, even though I have already pulled the submodules and tried many times.
How should I solve this?
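
One way to narrow this down is to check whether Python can locate the submodule packages at all, and where it finds them. The sketch below uses importlib.util.find_spec for that. The helper name probe_modules is hypothetical, and the module names passed to it are taken from the import lines in the logs above. A package reported as a namespace package is a directory without an __init__.py, which may indicate a submodule directory that exists but was never populated.

```python
import importlib.util


def probe_modules(names):
    """Map each dotted module name to where Python would load it from.

    The value is a file path, a "namespace package: ..." note, or None
    when the module (or one of its parent packages) cannot be found.
    """
    report = {}
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package is missing or is not a package.
            spec = None
        if spec is None:
            report[name] = None
        elif spec.origin not in (None, "namespace"):
            report[name] = spec.origin
        else:
            locations = spec.submodule_search_locations or []
            report[name] = "namespace package: " + ", ".join(locations)
    return report


if __name__ == "__main__":
    for name, where in probe_modules(["spectre", "spectre.src"]).items():
        print(f"{name}: {where}")
```

If spectre is found but spectre.src is not, the spectre on the path is probably not the populated submodule; if neither is found, the submodule directory is likely not on the Python path.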

How to use

Is there detailed documentation with usage instructions?

EMOTE stage 1 - val loss not decreasing

Hi @radekd91, thanks again for releasing the code for EMOTE!

I am training EMOTE stage 1, and I noticed that while the training loss is converging, the validation loss goes up - is this something you noticed in your runs?

image
image
