
codetalker's People

Contributors

doubiiu


codetalker's Issues

[Question] Have you tried to use blendshapes with this network?

Hello, have you tried to use blendshapes with this network? I trained it under the teacher-forcing strategy, but inference was very problematic; if I also use teacher-forcing-style guidance at inference time, the result is very good. But inference like that is pointless. Do you have any suggestions?

Different training results

I completed both stages of training on the vocaset dataset, but when I feed in test audio, the output is poor: apart from the first few frames, the face barely moves afterwards.

Add Texture

Do you know how to get a talking result with texture when I reconstruct a new face with the FLAME template and obtain its material file (.mtl)? The material file references the depth image, normal image, and texture image.

WARNING:root:NaN or Inf found in input tensor.

sh scripts/train.sh CodeTalker_s1 config/vocaset/stage1.yaml vocaset s1
sh scripts/train.sh CodeTalker_s2 config/vocaset/stage2.yaml vocaset s2

When training on vocaset, NaN appears in both the first and second stages.

[2023-04-21 18:27:21,120 INFO train_vq.py line 189 19610]=>Epoch: [1/200][60/314] Data: 0.027 (0.038) Batch: 0.076 (0.141) Remain: 02:27:53 Loss: 0.1405 
[2023-04-21 18:27:22,283 INFO train_vq.py line 189 19610]=>Epoch: [1/200][70/314] Data: 0.028 (0.037) Batch: 0.077 (0.138) Remain: 02:24:02 Loss: 0.1339 
[2023-04-21 18:27:23,436 INFO train_vq.py line 189 19610]=>Epoch: [1/200][80/314] Data: 0.025 (0.036) Batch: 0.070 (0.135) Remain: 02:21:09 Loss: 0.1392 
[2023-04-21 18:27:24,593 INFO train_vq.py line 189 19610]=>Epoch: [1/200][90/314] Data: 0.027 (0.035) Batch: 0.144 (0.133) Remain: 02:18:51 Loss: 0.1353 
[2023-04-21 18:27:25,681 INFO train_vq.py line 189 19610]=>Epoch: [1/200][100/314] Data: 0.025 (0.034) Batch: 0.068 (0.130) Remain: 02:16:20 Loss: 0.1325 
[2023-04-21 18:27:26,705 INFO train_vq.py line 189 19610]=>Epoch: [1/200][110/314] Data: 0.027 (0.033) Batch: 0.072 (0.128) Remain: 02:13:38 Loss: 0.1300 
[2023-04-21 18:27:27,809 INFO train_vq.py line 189 19610]=>Epoch: [1/200][120/314] Data: 0.027 (0.033) Batch: 0.075 (0.126) Remain: 02:12:05 Loss: 0.1290 
[2023-04-21 18:27:28,815 INFO train_vq.py line 189 19610]=>Epoch: [1/200][130/314] Data: 0.027 (0.032) Batch: 0.139 (0.124) Remain: 02:10:00 Loss: nan 
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
[2023-04-21 18:27:29,784 INFO train_vq.py line 189 19610]=>Epoch: [1/200][140/314] Data: 0.026 (0.032) Batch: 0.072 (0.122) Remain: 02:07:55 Loss: nan 
INFO:main-logger:Epoch: [1/200][140/314] Data: 0.026 (0.032) Batch: 0.072 (0.122) Remain: 02:07:55 Loss: nan 
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
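
Not an official fix, but a common way to localise this kind of NaN is to enable autograd anomaly detection and clip gradients during training. A minimal sketch with placeholder names (model, train_loader, optimizer, and compute_loss are stand-ins, not the repo's variables):

```python
import torch

torch.autograd.set_detect_anomaly(True)  # reports the backward op that first produces NaN/Inf

for step, batch in enumerate(train_loader):            # placeholder data loader
    loss = compute_loss(model(batch))                   # placeholder forward/loss
    optimizer.zero_grad()
    loss.backward()
    # Clipping often prevents the exploding gradients that turn weights into NaN.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    if not torch.isfinite(loss):
        print(f"non-finite loss at step {step}")
        break
```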

The lve.txt and fdd.txt files for VOCA were not found

Hello, when I was looking at the evaluation code, I only found these files for BIWI, such as the vertex index files lve.txt and fdd.txt. Could you provide these two files for the VOCA dataset? I would be very grateful.
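
For context, a hedged sketch of how such an index file is typically consumed when computing a lip-vertex-error style metric. The file format and the exact error definition here follow the FaceFormer-style evaluation and are assumptions, not the repo's verified code:

```python
import numpy as np

# Assumption: lve.txt lists one lip-region vertex index per line.
lip_idx = np.loadtxt("lve.txt", dtype=int)

def lip_vertex_error(pred: np.ndarray, gt: np.ndarray) -> float:
    # pred, gt: (T, V, 3) vertex sequences over T frames.
    per_vertex = np.sum((pred[:, lip_idx] - gt[:, lip_idx]) ** 2, axis=-1)  # squared L2, (T, L)
    return float(per_vertex.max(axis=1).mean())  # max over lip vertices, mean over frames
```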

How to prepare training data?

Data Preparation

Place your vertices data (.npy files) and audio data (.wav files) in <dataset_dir>/vertices_npy and <dataset_dir>/wav folders, respectively.

Save the templates of all subjects to a templates.pkl file and put it in <dataset_dir>, as is done for the BIWI and vocaset datasets. Export an arbitrary template to .ply format and put it in <dataset_dir>/.

Question about the Data Preparation steps in "Play with Your Own Data"
.npy
templates.pkl
.ply
How are these files prepared?
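
A minimal sketch of one way to prepare these files, assuming (as in the BIWI/vocaset setups) that templates.pkl is a pickled dict mapping subject names to (V, 3) neutral-face vertex arrays. The paths, subject names, and the use of trimesh for the .ply export are all placeholders:

```python
import pickle
import numpy as np
import trimesh  # hypothetical choice of mesh library for the .ply export

# Neutral-face vertices per subject, each an array of shape (V, 3); paths are placeholders.
templates = {
    "subject01": np.load("raw/subject01_neutral.npy"),
    "subject02": np.load("raw/subject02_neutral.npy"),
}

with open("<dataset_dir>/templates.pkl", "wb") as f:
    pickle.dump(templates, f)

# Export one arbitrary template as the reference .ply; faces come from your mesh topology.
faces = np.load("raw/topology_faces.npy")
mesh = trimesh.Trimesh(vertices=templates["subject01"], faces=faces, process=False)
mesh.export("<dataset_dir>/template.ply")
```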

Where to get the pre-trained models (model.pth.tar)?

Make sure the paths of pre-trained models are correct, i.e., vqvae_pretrained_path and wav2vec2model_path in config/<vocaset|BIWI>/stage2.yaml.

cat config/vocaset/stage2.yaml
vqvae_pretrained_path: RUN/vocaset/CodeTalker_s1/model/model.pth.tar
wav2vec2model_path: facebook/wav2vec2-base-960h

Where do I get RUN/vocaset/CodeTalker_s1/model/model.pth.tar and facebook/wav2vec2-base-960h?
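
RUN/vocaset/CodeTalker_s1/model/model.pth.tar should be the checkpoint written by running stage-1 training (sh scripts/train.sh CodeTalker_s1 config/vocaset/stage1.yaml vocaset s1), with RUN/ being the training output directory. facebook/wav2vec2-base-960h is not a local path but a model ID on the Hugging Face Hub; transformers downloads and caches it automatically on first use, for example:

```python
from transformers import Wav2Vec2Model, Wav2Vec2Processor

# "facebook/wav2vec2-base-960h" is a Hugging Face Hub model ID; the weights are
# downloaded and cached locally the first time from_pretrained is called (needs internet).
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
wav2vec = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
```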

Vocaset-test lip sync error comparison results

Hello, and thanks for your work. I was wondering about the lip sync error comparison for the VOCASET-test data. I saw it reported for BIWI but couldn't find one for VOCASET in the paper. Please let me know if I'm missing something.

The tensor output by self.vertice_mapping in the stage-1 TransformerEncoder is all NaN

Generally, when training reaches the second epoch, the outputs become all NaN. When I then check the bias and weight of the linear layer, they are all NaN as well.

self.encoder.vertice_mapping[0]
Linear(in_features=15069, out_features=1024, bias=True)
self.encoder.vertice_mapping[0].bias
Parameter containing:
tensor([nan, nan, nan,  ..., nan, nan, nan], device='cuda:0',
       requires_grad=True)
self.encoder.vertice_mapping[0].weight
Parameter containing:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       requires_grad=True)
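
A quick, hedged way to confirm which module's parameters go non-finite first is to scan the named parameters after each optimizer step (generic helper, not part of the repo's code):

```python
import torch

def first_nonfinite_parameter(model: torch.nn.Module):
    """Return the name of the first parameter containing NaN/Inf, or None if all are finite."""
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            return name
    return None
```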

BIWI dataset

Could anyone who has downloaded the BIWI dataset share it?

Applied to a 3D model of the human body as a whole

First of all, thanks to the author for open-sourcing the code. I am currently confused about the output video: I would like to apply the method to a full-body 3D human model. What approach should I use to do that? Could you provide an answer?

Colab notebook doesn't work

Hello,
I'm trying to run the Colab online demo, but I get several different errors at runtime:
1)
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 1.11.0 which is incompatible.
torchtext 0.15.2 requires torch==2.0.1, but you have torch 1.11.0 which is incompatible.
```

2)
```
ERROR: Cannot install pyglet==1.5.27, pyopengl==3.1.5, pyrender==0.1, pyrender==0.1.1, pyrender==0.1.10, pyrender==0.1.11, pyrender==0.1.12, pyrender==0.1.13, pyrender==0.1.14, pyrender==0.1.15, pyrender==0.1.16, pyrender==0.1.17, pyrender==0.1.18, pyrender==0.1.2, pyrender==0.1.20, pyrender==0.1.21, pyrender==0.1.22, pyrender==0.1.23, pyrender==0.1.24, pyrender==0.1.25, pyrender==0.1.26, pyrender==0.1.27, pyrender==0.1.28, pyrender==0.1.29, pyrender==0.1.3, pyrender==0.1.30, pyrender==0.1.31, pyrender==0.1.32, pyrender==0.1.33, pyrender==0.1.34, pyrender==0.1.35, pyrender==0.1.36, pyrender==0.1.39, pyrender==0.1.4, pyrender==0.1.40, pyrender==0.1.41, pyrender==0.1.42, pyrender==0.1.43, pyrender==0.1.44, pyrender==0.1.45, pyrender==0.1.5, pyrender==0.1.6, pyrender==0.1.7, pyrender==0.1.8 and pyrender==0.1.9 because these package versions have conflicting dependencies.
```

3)
```
Building wheels for collected packages: tokenizers, sacremoses
  error: subprocess-exited-with-error

  × Building wheel for tokenizers (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  Building wheel for tokenizers (pyproject.toml) ... error
  ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:5 https://ppa.launchpadcontent.net/c2d4u.team/c2d4u4.0+/ubuntu jammy InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:7 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
16 packages can be upgraded. Run 'apt list --upgradable' to see them.
```

4)
```
Traceback (most recent call last):
  File "/content/CodeTalker/main/demo.py", line 9, in <module>
    from transformers import Wav2Vec2Processor
ModuleNotFoundError: No module named 'transformers'
```

Bug about multiprocessing distributed

In order to train with 4 GPUs at the same time, I set the multiprocessing distributed option to True and set train_gpu to [0,1,2,3], but I got the following error:

[2023-04-18 15:43:41,052 INFO train_pred.py line 71 682101]=>=> creating model ...
Traceback (most recent call last):                                                                                                                
  File "/root/miniconda3/envs/py3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main                                                     
    return _run_code(code, main_globals, None,                                                                                                    
  File "/root/miniconda3/envs/py3.9/lib/python3.9/runpy.py", line 87, in _run_code                                                                
    exec(code, run_globals)                                                                                                                       
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/debugpy/__main__.py", line 39, in <module>                                        
    cli.main()                                                                                                                                    
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/debugpy/server/cli.py", line 430, in main                                         
    run()                                                                                                                                         
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/debugpy/server/cli.py", line 284, in run_file                                     
    runpy.run_path(target, run_name="__main__")                                                                                                   
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path   
    return _run_module_code(code, init_globals, run_name,                                                                                         
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module
_code                                                                                                                                             
    _run_code(code, mod_globals, init_globals,
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "main/train_pred.py", line 259, in <module>
    main()
  File "main/train_pred.py", line 45, in main
    mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/root/autodl-tmp/hzt/code/CodeTalker/main/train_pred.py", line 120, in main_worker
    loss_train, motion_loss_train, reg_loss_train = train(train_loader, model, loss_fn, optimizer, epoch, cfg)
  File "/root/autodl-tmp/hzt/code/CodeTalker/main/train_pred.py", line 162, in train
    model.autoencoder.eval()
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'autoencoder'
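
This is the usual symptom of accessing a submodule attribute through the DistributedDataParallel wrapper rather than through the underlying model. A hedged sketch of the common workaround (not necessarily the authors' intended fix), where model is the network built in train_pred.py:

```python
# DistributedDataParallel exposes the wrapped network as .module, so submodule
# attributes such as autoencoder must be reached through it when DDP is enabled.
net = model.module if hasattr(model, "module") else model  # works with and without DDP
net.autoencoder.eval()
```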

Expressions for vocaset

Hello,

Thanks for releasing this amazing work. I was curious whether the vocaset pre-trained models produce diverse upper-face expressions as well, since the original dataset does not contain them. If yes, could you comment on how you trained for that?

Once again, great work! Thank you!

Can you animate any FLAME face model, as VOCA does?

Can I use a FLAME face model outside the vocaset (training, validation, and test sets) as a template to generate facial animation? How do I set that up, and can I do it with a pre-trained model, or do I need to retrain?

About The Code

Hello, I have been learning about 3D speech-driven animation recently, and your paper inspired me a lot. I wonder when you will release the code so that I can learn more details. Thank you.

Is it necessary to comment out the self.padding_mode != 'zeros' check?

Is it necessary to comment out the self.padding_mode != 'zeros' check?
It doesn't report an error without any modification. Will it affect model accuracy? Thanks.

IMPORTANT: Please make sure to modify the site-packages/torch/nn/modules/conv.py file by commenting out the self.padding_mode != 'zeros' line to allow for replicated padding for ConvTranspose1d as shown https://github.com/NVIDIA/tacotron2/issues/182.
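
For reference, the guard being referred to sits at the top of ConvTranspose1d.forward in PyTorch's conv.py and looks roughly like the following (paraphrased from the PyTorch source; check your installed version before editing it):

```python
# site-packages/torch/nn/modules/conv.py, ConvTranspose1d.forward (approximate):
if self.padding_mode != 'zeros':
    raise ValueError('Only `zeros` padding mode is supported for ConvTranspose1d')
```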

[Question] Why train with a teacher-forcing scheme?

Hi, "the autoregressive model is trained in a teacher-forcing scheme" is mentioned in your article, but why? In previous related work, FaceFormer pointed out that using this strategy leads to poor results. Could you share your opinion on this?

More demos?

Can you provide some generated video demos? I don't see any link to generated videos in your paper.

biwi_stage2.pth.tar shows significant loss on the training dataset

I used the pretrained model (biwi_stage2.pth.tar) for fine-tuning and found that the loss was significant on both the training and validation sets, which seems unreasonable. Was biwi_stage2.pth.tar trained on the test set, or are there additional techniques beyond what is in the code? Could you help explain? Many thanks.

AttributeError: 'tuple' object has no attribute 'transpose'

When running !sh scripts/demo.sh vocaset:

Some weights of Wav2Vec2Model were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
=> loading checkpoint 'vocaset/vocaset_stage2.pth.tar'
=> loaded checkpoint 'vocaset/vocaset_stage2.pth.tar'
Generating facial animation for demo/wav/man.wav...
2023-08-01 13:21:18.492516: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/content/CodeTalker/main/demo.py", line 219, in <module>
    main()
  File "/content/CodeTalker/main/demo.py", line 129, in main
    test(model, cfg.demo_wav_path, save_folder, condition, subject)
  File "/content/CodeTalker/main/demo.py", line 167, in test
    prediction = model.predict(audio_feature, template, one_hot)
  File "/content/CodeTalker/models/stage2.py", line 115, in predict
    hidden_states = self.audio_encoder(audio, self.dataset).last_hidden_state
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/CodeTalker/models/lib/wav2vec.py", line 132, in forward
    encoder_outputs = self.encoder(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 788, in forward
    position_embeddings = self.pos_conv_embed(hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 397, in forward
    hidden_states = hidden_states.transpose(1, 2)
AttributeError: 'tuple' object has no attribute 'transpose'
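
This usually points to a transformers release newer than the one the repository's wav2vec wrapper (models/lib/wav2vec.py) was written against, since that wrapper calls into Hugging Face internals that have changed across versions. A quick sanity check, not an official fix, is to compare the installed versions against those pinned in the repo's environment files:

```python
import torch
import transformers

# If these differ from the versions pinned by the repository, recreate the
# environment with the pinned versions before rerunning scripts/demo.sh.
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
```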

perplexity

During training, the perplexity keeps rising. What does the perplexity mean?
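
For context, in VQ-VAE-style models the perplexity logged during training measures how uniformly the codebook is used: it is the exponential of the entropy of the code-usage distribution, so 1 means a single code is used and the codebook size means perfectly uniform usage; rising perplexity therefore generally indicates more codes becoming active. A hedged sketch of the standard computation (not necessarily the exact code in this repo):

```python
import torch
import torch.nn.functional as F

def codebook_perplexity(encoding_indices: torch.Tensor, num_codes: int) -> torch.Tensor:
    """Perplexity of codebook usage for a batch of quantised token indices."""
    one_hot = F.one_hot(encoding_indices.flatten(), num_codes).float()
    avg_probs = one_hot.mean(dim=0)  # empirical usage distribution over the codebook
    return torch.exp(-torch.sum(avg_probs * torch.log(avg_probs + 1e-10)))
```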

Error when training stage2

Hi, thanks for sharing your great work.
When training stage 2 by executing sh scripts/train.sh CodeTalker_s2 config/vocaset/stage2.yaml vocaset s2, I get a cuDNN error: CUDNN_STATUS_NOT_INITIALIZED. The error occurs when passing the audio into the audio encoder (wav2vec), at F.conv1d.
I have followed the provided environment setup, and the error still occurs.
Can you help me solve the problem?

Why use a teacher-forcing scheme?

The teacher-forcing scheme was shown to be worse than the autoregressive scheme in several papers, such as FaceFormer and FaceXHuBERT. Why use it here?

Question with training stage 2

Hi, I tried to train stage 2 on my own dataset, but the loss only oscillates and doesn't go down.
Should I reduce the learning rate (currently 1e-4)?
Or should I re-tune the weighting between loss_motion and loss_reg?
How much should the loss go down?
Do you have any tips for stage 2 training?
Thank you!

Evaluation score between my retraining and CodeTalker paper is not the same.

The first-stage model was reused and the second-stage model was retrained, but the final scores were inconsistent with those in the paper.

Could you help explain? Thanks.

Retraining result:

Frame Number: 3879
Lip Vertex Error: 5.2776e-04
FDD: 4.4944e-05

Paper result: (screenshot of the paper's reported numbers)

Online demo

Hello, your work is excellent! May I ask whether the online demo you mentioned is on Google Colab? Another question: can this be tested using an .obj file from FFHQ? I look forward to your reply.

ImportError

Hello, could you tell me how to resolve this error? Thank you for your reply!
(screenshot of the error attached)

An error occurred while processing long audio using the provided pretrained model.

audio duration: 23s
error:
File "CodeTalker/main/demo.py", line 187, in test
prediction = model.predict(audio_feature, template, one_hot)
File "CodeTalker/models/stage2.py", line 133, in predict
feat_out = self.transformer_decoder(vertice_input, hidden_states, tgt_mask=tgt_mask, memory_mask=memory_mask)
File "/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/functional.py", line 5016, in multi_head_attention_forward
raise RuntimeError(f"The shape of the 3D attn_mask is {attn_mask.shape}, but should be {correct_3d_size}.")
RuntimeError: The shape of the 3D attn_mask is torch.Size([4, 600, 600]), but should be (4, 601, 601).
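
A mismatch like this means the 3D attention masks handed to the decoder are one frame smaller than the sequence it actually receives for this audio length. A hypothetical sketch of building a head-expanded causal mask directly from the runtime frame count (invented helper, not the repo's API):

```python
import torch

def causal_attn_mask(num_frames: int, num_heads: int) -> torch.Tensor:
    # Additive 3D mask of shape (num_heads, T, T), as nn.MultiheadAttention expects
    # for a batch of one: zeros on/below the diagonal, -inf above it (future hidden).
    mask = torch.triu(torch.full((num_frames, num_frames), float("-inf")), diagonal=1)
    return mask.unsqueeze(0).repeat(num_heads, 1, 1)
```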

Some questions about reproduction

Thanks for sharing this great work! I want to follow it and am trying to reproduce all the experimental results. Could you provide more details about Fig. 4 in the paper? I have successfully generated videos using the provided scripts, but I don't know how to export a single frame with or without a background color. Also, how did you generate the heat map (mean & std) in the figure?
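
On the heat-map part of the question: per-vertex mean and standard deviation of the generated motion are typically computed as in the sketch below and then mapped onto the mesh with a colormap (this is the usual recipe, inferred rather than taken from the authors' plotting script):

```python
import numpy as np

def motion_heatmap_stats(vertices: np.ndarray, template: np.ndarray):
    """Per-vertex mean and std of displacement from the neutral template.

    vertices: (T, V, 3) animated sequence; template: (V, 3) neutral face.
    """
    displacement = np.linalg.norm(vertices - template[None], axis=-1)  # (T, V)
    return displacement.mean(axis=0), displacement.std(axis=0)         # each (V,)
```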

Which epoch's model was used for evaluation?

After testing, I found that the model overfits as training progresses, and the last epoch (epoch 100) may not be the best. Which epoch's checkpoint was used for the reported evaluation?

Could you help explain? Thanks.

Question about training error in stage 1

Traceback (most recent call last):
  File "/media/E/3DTalk/CodeTalker-main/main/train_pred.py", line 242, in <module>
    main()
  File "/media/E/3DTalk/CodeTalker-main/main/train_pred.py", line 48, in main
    main_worker(args.train_gpu, args.ngpus_per_node, args)
  File "/media/E/3DTalk/CodeTalker-main/main/train_pred.py", line 109, in main_worker
    loss_train, motion_loss_train, reg_loss_train = train(train_loader, model, loss_fn, optimizer, epoch, cfg)
  File "/media/E/3DTalk/CodeTalker-main/main/train_pred.py", line 148, in train
    model.autoencoder.eval()
  File "/home/anaconda3/envs/3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'VQAutoEncoder' object has no attribute 'autoencoder'

Could you tell me where the "autoencoder" attribute is supposed to come from?

Fail to reproduce on MEAD dataset

Hi, what nice work!

I am currently attempting to reproduce this work on the MEAD dataset. Stage 1 has gone smoothly; however, I am encountering an issue in Stage 2: after 20 epochs of training, I am not observing any movement in the output, and it remains static.

Do you have any ideas?

Many thanks!

Real-time use?

Hello!
The work is impressive! I wonder whether it would be feasible to use it with TTS generated in real time to produce realistic facial animation on a 3D face model in Unity.
