
rad-nerf's Introduction

RAD-NeRF: Real-time Neural Talking Portrait Synthesis

This repository contains a PyTorch re-implementation of the paper: Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition.

Colab notebook demonstration: Open In Colab

A GUI for easy visualization:

obama.mp4

Install

Tested on Ubuntu 22.04, PyTorch 1.12, and CUDA 11.6.

git clone https://github.com/ashawkey/RAD-NeRF.git
cd RAD-NeRF

Install dependencies

# on Ubuntu, portaudio is needed for pyaudio to work.
sudo apt install portaudio19-dev

pip install -r requirements.txt

Build extension (optional)

By default, we use PyTorch's `load` to build the extensions just-in-time at runtime. However, this can be inconvenient, so we also provide a `setup.py` for each extension so they can be pre-built:

# install all extension modules
bash scripts/install_ext.sh
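
For reference, the runtime path builds each extension just-in-time with torch.utils.cpp_extension.load. Below is a minimal sketch of what that looks like; the module and source names are illustrative, not the repo's actual ones (the real extensions live in freqencoder, shencoder, gridencoder and raymarching):

# minimal JIT-build sketch (illustrative names, not the repo's actual sources)
from torch.utils.cpp_extension import load

_backend = load(
    name='my_cuda_ext',                           # hypothetical module name
    sources=['src/my_ext.cpp', 'src/my_ext.cu'],  # hypothetical source files
    extra_cuda_cflags=['-O3'],
    verbose=True,                                 # print the ninja build log
)
# the first compile is slow; the result is cached, so later imports are fast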

Data pre-processing

Preparation:

## install pytorch3d
pip install "git+https://github.com/facebookresearch/pytorch3d.git"

## prepare face-parsing model
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_parsing/79999_iter.pth?raw=true -O data_utils/face_parsing/79999_iter.pth

## prepare basel face model
# 1. download `01_MorphableModel.mat` from https://faces.dmi.unibas.ch/bfm/main.php?nav=1-2&id=downloads and put it under `data_utils/face_tracking/3DMM/`
# 2. download other necessary files from AD-NeRF's repository:
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/exp_info.npy?raw=true -O data_utils/face_tracking/3DMM/exp_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/keys_info.npy?raw=true -O data_utils/face_tracking/3DMM/keys_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/sub_mesh.obj?raw=true -O data_utils/face_tracking/3DMM/sub_mesh.obj
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/topology_info.npy?raw=true -O data_utils/face_tracking/3DMM/topology_info.npy
# 3. run convert_BFM.py
cd data_utils/face_tracking
python convert_BFM.py
cd ../..

## prepare ASR model
# if you want to use DeepSpeech as in AD-NeRF, you should install tensorflow 1.15 manually.
# otherwise, we also support Wav2Vec features extracted in PyTorch.
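
Optionally, before moving on, you can quickly check that the face-tracking assets downloaded above are in place. A minimal sketch (not part of the repo):

# sanity-check the 3DMM assets prepared above (a sketch, not part of the repo)
import os

tdmm_dir = 'data_utils/face_tracking/3DMM'
for name in ['01_MorphableModel.mat', 'exp_info.npy', 'keys_info.npy',
             'sub_mesh.obj', 'topology_info.npy']:
    path = os.path.join(tdmm_dir, name)
    print(path, 'OK' if os.path.exists(path) else 'MISSING')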

Pre-processing Custom Training Video

  • Put training video under data/<ID>/<ID>.mp4.

    The video must be 25 FPS, with every frame containing the talking person. The resolution should be about 512x512, and the duration about 1-5 minutes.

    # an example training video from AD-NeRF
    mkdir -p data/obama
    wget https://github.com/YudongGuo/AD-NeRF/blob/master/dataset/vids/Obama.mp4?raw=true -O data/obama/obama.mp4
  • Run the script (may take hours depending on the video length)

    # run all steps
    python data_utils/process.py data/<ID>/<ID>.mp4
    
    # if you want to run a specific step 
    python data_utils/process.py data/<ID>/<ID>.mp4 --task 1 # extract audio wave
  • File structure after finishing all steps (a small sanity-check sketch follows this tree):

    ./data/<ID>
    ├──<ID>.mp4 # original video
    ├──ori_imgs # original images from video
    │  ├──0.jpg
    │  ├──0.lms # 2D landmarks
    │  ├──...
    ├──gt_imgs # ground truth images (static background)
    │  ├──0.jpg
    │  ├──...
    ├──parsing # semantic segmentation
    │  ├──0.png
    │  ├──...
    ├──torso_imgs # inpainted torso images
    │  ├──0.png
    │  ├──...
    ├──aud.wav # original audio 
    ├──aud_eo.npy # audio features (wav2vec)
    ├──aud.npy # audio features (deepspeech)
    ├──bc.jpg # default background
    ├──track_params.pt # raw head tracking results
    ├──transforms_train.json # head poses (train split)
    ├──transforms_val.json # head poses (test split)
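
A small sanity-check sketch for the outputs above (not part of the repo); the audio feature arrays follow the (N, 44, 16) layout for the wav2vec path, as the logs further down this page also show:

# sanity-check pre-processing outputs for one subject (a sketch, not part of the repo)
import os
import numpy as np

base = 'data/obama'  # replace with your data/<ID>
for d in ['ori_imgs', 'gt_imgs', 'parsing', 'torso_imgs']:
    path = os.path.join(base, d)
    count = len(os.listdir(path)) if os.path.isdir(path) else 0
    print(f'{d}: {count} files')

feat = np.load(os.path.join(base, 'aud_eo.npy'))  # wav2vec features
print('aud_eo.npy shape:', feat.shape)            # e.g. (N, 44, 16)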

Usage

Quick Start

We provide some pretrained models here for quick testing on arbitrary audio.

  • Download a pretrained model. For example, we download obama_eo.pth to ./pretrained/obama_eo.pth

  • Download a pose sequence file. For example, we download obama.json to ./data/obama.json

  • Prepare your audio as <name>.wav, and extract audio features.

    # if model is `<ID>_eo.pth`, it uses wav2vec features
    python nerf/asr.py --wav data/<name>.wav --save_feats # save to data/<name>_eo.npy
    
    # if model is `<ID>.pth`, it uses deepspeech features 
    python data_utils/deepspeech_features/extract_ds_features.py --input data/<name>.wav # save to data/<name>.npy

    You can download pre-processed audio features too. For example, we download intro_eo.npy to ./data/intro_eo.npy.

  • Run inference: it takes about 2 GB of GPU memory to run inference at 40 FPS (measured on a V100). A small pre-flight check is sketched after these steps.

    # save video to trial_obama/results/*.mp4
    # if the model is `<ID>.pth`, append `--asr_model deepspeech` and use `--aud intro.npy` instead.
    python test.py --pose data/obama.json --ckpt pretrained/obama_eo.pth --aud data/intro_eo.npy --workspace trial_obama/ -O --torso
    
    # provide a background image (default is white)
    python test.py --pose data/obama.json --ckpt pretrained/obama_eo.pth --aud data/intro_eo.npy --workspace trial_obama/ -O --torso --bg_img data/bg.jpg
    
    # test with GUI
    python test.py --pose data/obama.json --ckpt pretrained/obama_eo.pth --aud data/intro_eo.npy --workspace trial_obama/ -O --torso --bg_img data/bg.jpg --gui
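
Before launching test.py, a quick pre-flight check of the pose and audio inputs can catch path or length mistakes. A minimal sketch, assuming the pose file is a NeRF-style transforms JSON with a "frames" list (as the transforms_*.json files above are):

# pre-flight check for test.py inputs (a sketch; assumes a NeRF-style 'frames' list)
import json
import numpy as np

with open('data/obama.json') as f:
    pose = json.load(f)
aud = np.load('data/intro_eo.npy')

print('pose frames:', len(pose['frames']))
print('audio feature frames:', aud.shape[0])  # e.g. 588 for intro_eo.npy in the log further below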

Detailed Usage

The first run will take some time to compile the CUDA extensions.

# train (head)
# by default, we load data from disk on the fly.
# we can also preload all data to CPU/GPU for faster training, but this is very memory-hungry for large datasets.
# `--preload 0`: load from disk (default, slower).
# `--preload 1`: load to CPU, requires ~70G CPU memory (slightly slower)
# `--preload 2`: load to GPU, requires ~24G GPU memory (fast)
python main.py data/obama/ --workspace trial_obama/ -O --iters 200000

# train (finetune lips for another 50000 steps, run after the above command!)
python main.py data/obama/ --workspace trial_obama/ -O --iters 250000 --finetune_lips

# train (torso)
# <head>.pth should be the latest checkpoint in trial_obama
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --head_ckpt <head>.pth --iters 200000

# test on the test split
python main.py data/obama/ --workspace trial_obama/ -O --test # use head checkpoint, will load GT torso
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test

# test with GUI
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --gui

# test with GUI (load speech recognition model for real-time application)
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --gui --asr

# test with specific audio & pose sequence
# --test_train: use train split for testing
# --data_range: use this range's pose & eye sequence (if shorter than the audio, it is automatically mirrored and repeated; see the sketch below)
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --test_train --data_range 0 100 --aud data/intro_eo.npy
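
The mirror-and-repeat behaviour of --data_range can be pictured with a small index sketch (illustrative only, not the repo's actual implementation): the pose/eye sequence is traversed forwards, then backwards, and so on until the audio is covered.

# illustrative "mirror and repeat" indexing (not the repo's actual code)
def pingpong_indices(num_poses, num_audio_frames):
    # one period walks 0..N-1 and back down to 1, e.g. 0 1 2 3 4 3 2 1 for N=5
    period = list(range(num_poses)) + list(range(num_poses - 2, 0, -1))
    return [period[i % len(period)] for i in range(num_audio_frames)]

print(pingpong_indices(5, 12))  # [0, 1, 2, 3, 4, 3, 2, 1, 0, 1, 2, 3]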

Check the scripts directory for more provided examples.

Acknowledgement

  • The data pre-processing part is adapted from AD-NeRF.
  • The NeRF framework is based on torch-ngp.
  • The GUI is developed with DearPyGui.

Citation

@article{tang2022radnerf,
  title={Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition},
  author={Tang, Jiaxiang and Wang, Kaisiyuan and Zhou, Hang and Chen, Xiaokang and He, Dongliang and Hu, Tianshu and Liu, Jingtuo and Zeng, Gang and Wang, Jingdong},
  journal={arXiv preprint arXiv:2211.12368},
  year={2022}
}

rad-nerf's People

Contributors

ashawkey, erjanmx, fondoger


rad-nerf's Issues

The synthesized mouth shape jitters rather frequently

Hi, thank you very much for open-sourcing this excellent work!

Following the README, I trained trial_obama_eo from scratch and then tested it with intro_eo.npy, but I ran into some problems:

  • In the generated results, the speaker's mouth shape shows small but frequent fluctuations, which looks a bit twitchy. Could this be caused by aud_att_net not being used correctly?
  • In a few frames of the video, the mouth region has holes, somewhat like mesh clipping in CG. Could this be caused by the regularization on alpha?

Looking forward to your reply!

ngp_ep0035.mp4

Real-time execution

Generating the video in Google Colab takes one minute, even though the method is supposed to be real-time.
Why is this the case?

'NoneType' object is not subscriptable

Thanks for this great work!
After I ran python data_utils/process.py data/obama/obama.mp4, I faced this error:

ImportError: cannot import name 'PILLOW_VERSION' from 'PIL' (/usr/local/lib/python3.7/site-packages/PIL/__init__.py)
[INFO] ===== extracted semantics =====
[INFO] ===== extract background image from data/obama/ori_imgs =====
  0% 0/400 [00:00<?, ?it/s][ WARN:[email protected]] global loadsave.cpp:244 findDecoder imread_('data/obama/parsing/5583.png'): can't open/read file: check file path/integrity
  0% 0/400 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "data_utils/process.py", line 385, in <module>
    extract_background(base_dir, ori_imgs_dir)
  File "data_utils/process.py", line 81, in extract_background
    bg = (parse_img[..., 0] == 255) & (parse_img[..., 1] == 255) & (parse_img[..., 2] == 255)
TypeError: 'NoneType' object is not subscriptable

Is there a solution to this problem?

Task 7 fails to run

# The following appeared when running task 7:
##(Radnerf) D:\RAD-NeRF>python data_utils/process.py data/1/1.mp4 --task 7
[INFO] ===== extract face landmarks from data/1\ori_imgs =====
Traceback (most recent call last):
File "D:\RAD-NeRF\data_utils\process.py", line 393, in
extract_landmarks(ori_imgs_dir)
File "D:\RAD-NeRF\data_utils\process.py", line 50, in extract_landmarks
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=False)
File "C:\ProgramData\Anaconda3\envs\Radnerf\lib\site-packages\face_alignment\api.py", line 84, in init
self.face_alignment_net = torch.jit.load(
File "C:\ProgramData\Anaconda3\envs\Radnerf\lib\site-packages\torch\jit_serialization.py", line 162, in load
cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError: Unrecognized data format
Could you tell me what the problem is?

How can I fine-tune the head pose?

All the models I train have a head that sways slightly back and forth, which looks very unnatural and strange. Is there a way to fine-tune this?

Incorrect results when running test.py

First of all, thanks for this excellent project!

After installing the required libraries following the steps, I ran:

python test.py --pose data/obama.json --ckpt pretrained/obama_eo.pth --aud data/intro_eo.npy --workspace trial_obama/ -O --torso

Environment:
Ubuntu 22.04.1 LTS
Python 3.8.2
Pytorch 1.13.1
RTX4090 Driver Version: 520.56.06 CUDA Version: 11.8

The output is as follows:
2023-03-10 16:37:13.718974: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-10 16:37:13.797059: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-03-10 16:37:14.042138: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /root/miniconda3/envs/RAD-NeRF/lib/python3.8/site-packages/cv2/../../lib64:
2023-03-10 16:37:14.042172: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /root/miniconda3/envs/RAD-NeRF/lib/python3.8/site-packages/cv2/../../lib64:
2023-03-10 16:37:14.042175: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Namespace(H=450, O=True, W=450, amb_dim=2, asr=False, asr_model='cpierse/wav2vec2-large-xlsr-53-esperanto', asr_play=False, asr_save_feats=False, asr_wav='', att=2, aud='data/intro_eo.npy', bg_img='white', bound=1, ckpt='pretrained/obama_eo.pth', color_space='srgb', cuda_ray=True, data_range=[0, -1], density_thresh=10, density_thresh_torso=0.01, dt_gamma=0.00390625, emb=False, exp_eye=True, fbg=False, finetune_lips=False, fix_eye=-1, fovy=21.24, fp16=True, fps=50, gui=False, head_ckpt='', ind_dim=4, ind_dim_torso=8, ind_num=10000, l=10, lambda_amb=0.1, m=50, max_ray_batch=4096, max_spp=1, max_steps=16, min_near=0.05, num_rays=65536, num_steps=16, offset=[0, 0, 0], part=False, part2=False, patch_size=1, pose='data/obama.json', r=10, radius=3.35, scale=4, seed=0, smooth_eye=True, smooth_lips=True, smooth_path=True, smooth_path_window=7, test=True, test_train=False, torso=True, torso_shrink=0.8, train_camera=False, update_extra_interval=16, upsample_steps=0, workspace='trial_obama/')
[INFO] Trainer: ngp | 2023-03-10_16-37-14 | cuda | fp16 | trial_obama/
[INFO] #parameters: 4231701
[INFO] Loading pretrained/obama_eo.pth ...
[INFO] loaded model.
[WARN] missing keys: ['density_grid']
[INFO] load at epoch 28, global step 203616
[INFO] load 7272 frames.
[INFO] load data/intro_eo.npy aud_features: torch.Size([588, 44, 16])
Loading data: 100%|█████████████████████████████████████████████████████████████| 7272/7272 [00:00<00:00, 212625.94it/s][INFO] eye_area: 0.25 - 0.25
==> Start Test, save results to trial_obama/results
99% 581/588 [00:06<00:00, 100.65it/s][swscaler @ 0x6b64300] Warning: data is not aligned! This can lead to a speed loss==> Finished Test.
100% 588/588 [00:07<00:00, 74.71it/s]

(screenshot of the result attached)

Could you please take a look? I'm not sure what I did wrong.
Thanks!

About side faces

Hi, I trained with my own data, which contains some side-face frames, and found some odd artifacts in the results folder.
When the person shows a frontal face, the artifacts do not appear, but when the person turns their head, they do.
img_v2_9db5c9ce-eb68-4c28-9fd2-136fa236a97g
img_v2_e74f33d6-ad88-467e-846d-78085d4d032g
img_v2_4c736732-f790-4ab2-9f6e-32a1b783203g
img_v2_860c6d79-8996-45e5-b58d-9399eecb732g

Is there something wrong with my dataset? I could not find anything wrong with the frames extracted from the video. What should I do next?

Blinking?

Thanks for this great repo.

I tried the Colab, but the resulting video doesn't blink. Is that controllable from the pose file? I see there's control over the eyes in the GUI, but I'm not sure how to do it at inference time.

Thanks!

What's new in Grid Encoder?

Thanks for your brilliant work!

I'm interested in the modifications in the latest commit that update the grid encoder, which seem to replace the tiled grid with a hash grid. Can you briefly explain the motivation and how you achieved it?

Sincerely.

Training result: eyes don't blink

ngp_ep0028

The image above is the result after completing normal training and fine-tuning. In addition, following other issues, we also trained with the --fix_eye 0.25 option, but the results are as shown above.

The same thing happens when I train not only on the Obama video but also on other videos: the resulting eyes don't blink.
Besides the --fix_eye 0.25 approach, how can I train the model so that the eyes blink?

How can I control the head motion?

In my generated videos, the head shakes quite a lot (possibly because the person in the training footage already moves their head). Is there a parameter to control the amplitude of the head motion?

Bad video quality after training on my own video

Dear ashawkey

Thanks for your great project.
I followed the process in the README exactly; the original video is 4 minutes in total (25 fps).
I trained 200000 iterations for the head plus an additional 50000 iterations to fine-tune the lips (250000 iterations in total), but the synthesized video I got looks like this. Do you have any suggestions? How can I get synthesized video quality similar to the demo Obama video you provided? Thanks a lot!

final.mp4

Performance without audio branch?

Hello, thank you for the great work!!

I'd like to test on my own data without audio. What would happen if I train the model without the audio branch (setting enc_a to None)?

'Segmentation fault' when using the GUI

Environment: WSL2, Ubuntu 18.04, torch 1.12, CUDA 11.6
Problem description: DearPyGui works fine via VcXsrv, but when I use this project's GUI, a segmentation fault occurs.
After narrowing it down, importing any one of the four extension libraries (freqencoder, shencoder, gridencoder, raymarching) triggers the segmentation fault. Running the non-GUI version directly works fine.

Training time is too long

Hi,
I found that the forward pass of a training step takes 0.07 seconds, but the backward pass takes 15 seconds, so the total training time would be several days.
Is there something wrong with my training setup?

Error during training

(Radnerf) D:\RAD-NeRF>python main.py data/obama/ --workspace trial_obama/ -O --iters 200000
Traceback (most recent call last):
File "D:\RAD-NeRF\raymarching\raymarching.py", line 10, in
import _raymarching_face as _backend
ModuleNotFoundError: No module named '_raymarching_face'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\utils\cpp_extension.py", line 1808, in _run_ninja_build
subprocess.run(
File "D:\LeStoreDownload\anaconda\envs\Radnerf\lib\subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "D:\RAD-NeRF\main.py", line 131, in
from nerf.network import NeRFNetwork
File "D:\RAD-NeRF\nerf\network.py", line 7, in
from .renderer import NeRFRenderer
File "D:\RAD-NeRF\nerf\renderer.py", line 10, in
import raymarching
File "D:\RAD-NeRF\raymarching_init_.py", line 1, in
from .raymarching import *
File "D:\RAD-NeRF\raymarching\raymarching.py", line 12, in
from .backend import backend
File "D:\RAD-NeRF\raymarching\backend.py", line 31, in
backend = load(name='raymarching_face',
File "D:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\utils\cpp_extension.py", line 1202, in load
return jit_compile(
File "D:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\utils\cpp_extension.py", line 1425, in jit_compile
write_ninja_file_and_build_library(
File "D:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\utils\cpp_extension.py", line 1537, in write_ninja_file_and_build_library
run_ninja_build(
File "D:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\utils\cpp_extension.py", line 1824, in run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'raymarching_face': [1/2] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc --generate-dependencies-with-compile --dependency-output raymarching.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=raymarching_face -DTORCH_API_INCLUDE_EXTENSION_H -ID:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\include -ID:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\include\torch\csrc\api\include -ID:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\include\TH -ID:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -ID:\LeStoreDownload\anaconda\envs\Radnerf\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS
-D__CUDA_NO_BFLOAT16_CONVERSIONS
-D__CUDA_NO_HALF2_OPERATORS
--expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -O3 -std=c++14 -U__CUDA_NO_HALF_OPERATORS
-U__CUDA_NO_HALF_CONVERSIONS_ -U__CUDA_NO_HALF2_OPERATORS__ -c D:\RAD-NeRF\raymarching\src\raymarching.cu -o raymarching.cuda.o
FAILED: raymarching.cuda.o
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc --generate-dependencies-with-compile --dependency-output raymarching.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=raymarching_face -DTORCH_API_INCLUDE_EXTENSION_H -ID:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\include -ID:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\include\torch\csrc\api\include -ID:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\include\TH -ID:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -ID:\LeStoreDownload\anaconda\envs\Radnerf\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS_ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -O3 -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -c D:\RAD-NeRF\raymarching\src\raymarching.cu -o raymarching.cuda.o
cl: command-line warning D9025: overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl: command-line warning D9025: overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl: command-line warning D9025: overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
raymarching.cu
D:/LeStoreDownload/anaconda/envs/Radnerf/lib/site-packages/torch/include\c10/macros/Macros.h(143): warning C4067: unexpected tokens following preprocessor directive - expected a newline
cl: command-line warning D9025: overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl: command-line warning D9025: overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl: command-line warning D9025: overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
raymarching.cu
D:/LeStoreDownload/anaconda/envs/Radnerf/lib/site-packages/torch/include\c10/macros/Macros.h(143): warning C4067: unexpected tokens following preprocessor directive - expected a newline
D:/LeStoreDownload/anaconda/envs/Radnerf/lib/site-packages/torch/include\c10/core/SymInt.h(84): warning #68-D: integer conversion resulted in a change of sign

D:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\include\pybind11\cast.h(1429): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here

D:\LeStoreDownload\anaconda\envs\Radnerf\lib\site-packages\torch\include\pybind11\cast.h(1503): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(181): warning #177-D: variable "rdx" was declared but never referenced
detected during instantiation of "void kernel_sph_from_ray(const scalar_t *, const scalar_t *, float, uint32_t, scalar_t *) [with scalar_t=double]"
(205): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(181): warning #177-D: variable "rdy" was declared but never referenced
detected during instantiation of "void kernel_sph_from_ray(const scalar_t *, const scalar_t *, float, uint32_t, scalar_t *) [with scalar_t=double]"
(205): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(181): warning #177-D: variable "rdz" was declared but never referenced
detected during instantiation of "void kernel_sph_from_ray(const scalar_t *, const scalar_t *, float, uint32_t, scalar_t *) [with scalar_t=double]"
(205): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(181): warning #177-D: variable "rdx" was declared but never referenced
detected during instantiation of "void kernel_sph_from_ray(const scalar_t *, const scalar_t *, float, uint32_t, scalar_t *) [with scalar_t=float]"
(205): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(181): warning #177-D: variable "rdy" was declared but never referenced
detected during instantiation of "void kernel_sph_from_ray(const scalar_t *, const scalar_t *, float, uint32_t, scalar_t *) [with scalar_t=float]"
(205): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(181): warning #177-D: variable "rdz" was declared but never referenced
detected during instantiation of "void kernel_sph_from_ray(const scalar_t *, const scalar_t *, float, uint32_t, scalar_t *) [with scalar_t=float]"
(205): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(181): warning #177-D: variable "rdx" was declared but never referenced
detected during instantiation of "void kernel_sph_from_ray(const scalar_t *, const scalar_t *, float, uint32_t, scalar_t *) [with scalar_t=c10::Half]"
(205): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(181): warning #177-D: variable "rdy" was declared but never referenced
detected during instantiation of "void kernel_sph_from_ray(const scalar_t *, const scalar_t *, float, uint32_t, scalar_t *) [with scalar_t=c10::Half]"
(205): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(181): warning #177-D: variable "rdz" was declared but never referenced
detected during instantiation of "void kernel_sph_from_ray(const scalar_t *, const scalar_t *, float, uint32_t, scalar_t *) [with scalar_t=c10::Half]"
(205): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(553): warning #177-D: variable "index" was declared but never referenced
detected during instantiation of "void kernel_march_rays_train_backward(const scalar_t *, const scalar_t *, const int *, const scalar_t *, uint32_t, uint32_t, scalar_t *, scalar_t *) [with scalar_t=double]"
(589): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(553): warning #177-D: variable "index" was declared but never referenced
detected during instantiation of "void kernel_march_rays_train_backward(const scalar_t *, const scalar_t *, const int *, const scalar_t *, uint32_t, uint32_t, scalar_t *, scalar_t *) [with scalar_t=float]"
(589): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(553): warning #177-D: variable "index" was declared but never referenced
detected during instantiation of "void kernel_march_rays_train_backward(const scalar_t *, const scalar_t *, const int *, const scalar_t *, uint32_t, uint32_t, scalar_t *, scalar_t *) [with scalar_t=c10::Half]"
(589): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(864): warning #177-D: variable "near" was declared but never referenced
detected during instantiation of "void kernel_march_rays(uint32_t, uint32_t, const int *, const scalar_t *, const scalar_t *, const scalar_t *, float, float, uint32_t, uint32_t, uint32_t, const uint8_t *, const scalar_t *, const scalar_t *, scalar_t *, scalar_t *, scalar_t *, const scalar_t *) [with scalar_t=double]"
(935): here

D:\RAD-NeRF\raymarching\src\raymarching.cu(864): warning #177-D: variable "near" was declared but never referenced
detected during instantiation of "void kernel_march_rays(uint32_t, uint32_t, const int *, const scalar_t *, const scalar_t *, const scalar_t *, float, float, uint32_t, uint32_t, uint32_t, const uint8_t *, const scalar_t *, const scalar_t *, scalar_t *, scalar_t *, scalar_t *, const scalar_t *) [with scalar_t=float]"
(935): here

2 errors detected in the compilation of "D:/RAD-NeRF/raymarching/src/raymarching.cu".
raymarching.cu
ninja: build stopped: subcommand failed.

How to generate .pth file?

./data/<ID>
├──<ID>.mp4 # original video
├──ori_imgs # original images from video
│ ├──0.jpg
│ ├──0.lms # 2D landmarks
│ ├──...
├──gt_imgs # ground truth images (static background)
│ ├──0.jpg
│ ├──...
├──parsing # semantic segmentation
│ ├──0.png
│ ├──...
├──torso_imgs # inpainted torso images
│ ├──0.png
│ ├──...
├──aud.wav # original audio
├──aud_eo.npy # audio features (wav2vec)
├──aud.npy # audio features (deepspeech)
├──bc.jpg # default background
├──track_params.pt # raw head tracking results
├──transforms_train.json # head poses (train split)
├──transforms_val.json # head poses (test split)

obama_eo.pth and obama.json not found after training

What is the emb mode?

In main.py there is a parameter for an emb mode. What exactly does it do, and in what scenarios should it be used? Thanks for the advice.

Type Error

I get a type error unless I remove the last parameter 'interpolation'.
(screenshot attached)

Please have a look at this, thanks.

Running the code in stream mode

How can we stream audio with asr.py, so that the audio does not need to be provided as a file?
For example, so that my words can be repeated in real time by the talking head.

Thanks a lot @ashawkey

bash scripts/install_ext.sh in Windows

Hi @ashawkey
Is it possible to install these libraries on Windows?

pip install ./freqencoder
pip install ./shencoder
pip install ./gridencoder
pip install ./raymarching

Thanks in advance for your help!

Error Building Extensions

RAD-NeRF$ pip install ./freqencoder
Processing ./freqencoder
Preparing metadata (setup.py) ... done
Building wheels for collected packages: freqencoder
Building wheel for freqencoder (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [69 lines of output]
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
running bdist_wheel
running build
running build_ext
/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/torch/utils/cpp_extension.py:820: UserWarning: There are no g++ version bounds defined for CUDA version 11.6
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building '_freqencoder' extension
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37/home
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37/home/subzero
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37/home/subzero/Documents
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37/home/subzero/Documents/RAD-NeRF
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37/home/subzero/Documents/RAD-NeRF/freqencoder
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37/home/subzero/Documents/RAD-NeRF/freqencoder/src
Traceback (most recent call last):
File "", line 36, in
File "", line 34, in
File "/home/subzero/Documents/RAD-NeRF/freqencoder/setup.py", line 49, in
'build_ext': BuildExtension,
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/init.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/wheel/bdist_wheel.py", line 325, in run
self.run_command("build")
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/command/build.py", line 132, in run
self.run_command(cmd_name)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 765, in build_extensions
build_ext.build_extensions(self)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 468, in build_extensions
self._build_extensions_serial()
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 494, in _build_extensions_serial
self.build_extension(ext)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 556, in build_extension
depends=ext.depends,
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 581, in unix_wrap_ninja_compile
cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 480, in unix_cuda_flags
cflags + _get_cuda_arch_flags(cflags))
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1694, in _get_cuda_arch_flags
arch_list[-1] += '+PTX'
IndexError: list index out of range
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for freqencoder
Running setup.py clean for freqencoder
Failed to build freqencoder
Installing collected packages: freqencoder
Running setup.py install for freqencoder ... error
error: subprocess-exited-with-error

× Running setup.py install for freqencoder did not run successfully.
│ exit code: 1
╰─> [73 lines of output]
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
running install
/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
setuptools.SetuptoolsDeprecationWarning,
running build
running build_ext
/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/torch/utils/cpp_extension.py:820: UserWarning: There are no g++ version bounds defined for CUDA version 11.6
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building '_freqencoder' extension
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37/home
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37/home/subzero
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37/home/subzero/Documents
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37/home/subzero/Documents/RAD-NeRF
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37/home/subzero/Documents/RAD-NeRF/freqencoder
creating /home/subzero/Documents/RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-37/home/subzero/Documents/RAD-NeRF/freqencoder/src
Traceback (most recent call last):
File "", line 36, in
File "", line 34, in
File "/home/subzero/Documents/RAD-NeRF/freqencoder/setup.py", line 49, in
'build_ext': BuildExtension,
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/init.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/command/install.py", line 68, in run
return orig.install.run(self)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/command/install.py", line 698, in run
self.run_command('build')
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/command/build.py", line 132, in run
self.run_command(cmd_name)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 765, in build_extensions
build_ext.build_extensions(self)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 468, in build_extensions
self._build_extensions_serial()
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 494, in _build_extensions_serial
self.build_extension(ext)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 556, in build_extension
depends=ext.depends,
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 581, in unix_wrap_ninja_compile
cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 480, in unix_cuda_flags
cflags + _get_cuda_arch_flags(cflags))
File "/home/subzero/anaconda3/envs/rad-nerf/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1694, in _get_cuda_arch_flags
arch_list[-1] += '+PTX'
IndexError: list index out of range
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> freqencoder

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Real time - random audio

Hi,
I trained the model, then:

python test.py --pose data/obama.json --ckpt pretrained/obama_eo.pth --aud data/intro_eo.npy --workspace trial_obama/ -O --torso

With new, unseen audio, the video was not predicting the lip sync correctly.
Best,


How do I get the json files?

I want to train on a video to make it into a talking portrait. How do I get the files needed for training?

For example, the Obama video.

  1. transforms_train(val).json
  2. track_params.pt
  3. anything else that is needed

What other resources do I need to train on custom videos? There are too many obstacles when trying to do this through the scripts.
Thank you.

using another audio feature extraction

During testing, I plan to use a different audio feature extractor whose output has a different shape, (x, 16, 80), but it is incompatible with the convolution model.

RuntimeError: Given groups=1, weight of size [32, 44, 3], expected input[8, 80, 16] to have 44 channels, but got 80 channels instead

I changed self.audio_in_dim in ./nerf/network.py, but it does not resolve the problem!

if 'esperanto' in self.opt.asr_model:
    self.audio_in_dim = 44

Could you guide me on which part I need to change?
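
A minimal sketch of the kind of change in question, based only on the snippet above (a guess; the model name below is hypothetical, and other layers that consume the features may also need adjusting):

# a guess, based on the snippet above: make audio_in_dim match the new feature channels (80)
if 'esperanto' in self.opt.asr_model:
    self.audio_in_dim = 44
elif 'my_custom_asr' in self.opt.asr_model:  # hypothetical name for the new extractor
    self.audio_in_dim = 80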

Given input size: (192x2x2). Calculated output size: (192x0x0). Output size is too small

Hello, thanks for your nice work. When I run the code on my video, the following problem appears.

~/MyCode/RAD_NeRF$ python main.py data/person_video1_25fps_512x512/ --workspace person_video1/ -O --iters 250000 --finetune_lips
Namespace(H=450, O=True, W=450, amb_dim=2, asr=False, asr_model='cpierse/wav2vec2-large-xlsr-53-esperanto', asr_play=False, asr_save_feats=False, asr_wav='', att=2, aud='', bg_img='', bound=1, ckpt='latest', color_space='srgb', cuda_ray=True, data_range=[0, -1], density_thresh=10, density_thresh_torso=0.01, dt_gamma=0.00390625, emb=False, exp_eye=True, fbg=False, finetune_lips=True, fix_eye=-1, fovy=21.24, fp16=True, fps=50, gui=False, head_ckpt='', ind_dim=4, ind_dim_torso=8, ind_num=10000, iters=250000, l=10, lambda_amb=0.1, lr=0.005, lr_net=0.0005, m=50, max_ray_batch=4096, max_spp=1, max_steps=16, min_near=0.05, num_rays=65536, num_steps=16, offset=[0, 0, 0], part=False, part2=False, patch_size=1, path='data/person_video1_25fps_512x512/', preload=0, r=10, radius=3.35, scale=4, seed=0, smooth_eye=False, smooth_lips=False, smooth_path=False, smooth_path_window=7, test=False, test_train=False, torso=False, torso_shrink=0.8, train_camera=False, update_extra_interval=1000000000.0, upsample_steps=0, workspace='person_video1/')
[INFO] load 783 train frames.
[INFO] load  aud_features: torch.Size([861, 44, 16])
Loading train data: 100%|███████████████████████████████████████████████| 783/783 [00:00<00:00, 2328.66it/s]
[INFO] eye_area: 0.02593994140625 - 0.06561279296875
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/home/anaconda3/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/anaconda3/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: /home/anaconda3/lib/python3.8/site-packages/lpips/weights/v0.1/alex.pth
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
Loading model from: /home/anaconda3/lib/python3.8/site-packages/lpips/weights/v0.1/alex.pth
[INFO] Trainer: ngp | 2023-02-23_12-56-18 | cuda | fp16 | person_video1/
[INFO] #parameters: 3024277
[INFO] Loading latest checkpoint ...
[INFO] Latest checkpoint is person_video1/checkpoints/ngp_ep0256.pth
[INFO] loaded model.
[INFO] load at epoch 256, global step 200448
[INFO] loaded optimizer.
[INFO] loaded scheduler.
[INFO] loaded scaler.
[INFO] load 79 val frames.
[INFO] load  aud_features: torch.Size([861, 44, 16])
Loading val data: 100%|███████████████████████████████████████████████████| 79/79 [00:00<00:00, 2280.58it/s]
[INFO] eye_area: 0.0255584716796875 - 0.0614166259765625
[INFO] max_epoch = 320
==> Start Training Epoch 257, lr=0.000050 ...
loss=0.0001 (0.0004), lr=0.000045: :   0% 1/783 [00:01<13:32,  1.04s/it]Traceback (most recent call last):
  File "main.py", line 253, in <module>
    trainer.train(train_loader, valid_loader, max_epoch)
  File "/home/MyCode/ashawkeyRAD_NeRF/nerf/utils.py", line 906, in train
    self.train_one_epoch(train_loader)
  File "/home/MyCode/ashawkeyRAD_NeRF/nerf/utils.py", line 1169, in train_one_epoch
    preds, truths, loss = self.train_step(data)
  File "/home/MyCode/ashawkeyRAD_NeRF/nerf/utils.py", line 766, in train_step
    loss = loss + 0.01 * self.criterion_lpips(pred_rgb, rgb)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/anaconda3/lib/python3.8/site-packages/lpips/lpips.py", line 119, in forward
    outs0, outs1 = self.net.forward(in0_input), self.net.forward(in1_input)
  File "/home/anaconda3/lib/python3.8/site-packages/lpips/pretrained_networks.py", line 85, in forward
    h = self.slice3(h)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/modules/pooling.py", line 166, in forward
    return F.max_pool2d(input, self.kernel_size, self.stride,
  File "/home/anaconda3/lib/python3.8/site-packages/torch/_jit_internal.py", line 485, in fn
    return if_false(*args, **kwargs)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 782, in _max_pool2d
    return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
 
RuntimeError: Given input size: (192x2x2). Calculated output size: (192x0x0). Output size is too small
loss=0.0001 (0.0004), lr=0.000045: :   0% 2/783 [00:01<07:52,  1.65it/s]
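
The crash comes from the LPIPS term: with --finetune_lips the perceptual loss is evaluated on the rendered lip patch, and a patch below roughly 31x31 px collapses to a 2x2 feature map before AlexNet's third max-pool, which is exactly the (192x2x2) in the message. A hedged, standalone workaround (not the repository's code) is to upsample undersized patches before the loss; the tensor layout is assumed to be NCHW:

```python
import torch.nn.functional as F

def lpips_safe(criterion_lpips, pred_rgb, rgb, min_size=32):
    """Upsample patches that are too small for LPIPS's AlexNet backbone,
    then evaluate the perceptual loss. Inputs are assumed NCHW."""
    h, w = pred_rgb.shape[-2:]
    if min(h, w) < min_size:
        size = (max(h, min_size), max(w, min_size))
        pred_rgb = F.interpolate(pred_rgb, size=size, mode='bilinear', align_corners=False)
        rgb = F.interpolate(rgb, size=size, mode='bilinear', align_corners=False)
    return criterion_lpips(pred_rgb, rgb)
```

If the face occupies only a small part of the frame, the detected lip rectangle will be tiny, which is another likely cause; cropping the video so the face is larger avoids the problem without touching the code.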

Support for Chinese speech models

The Chinese speech model we use has up to 3903 feature dimensions. Can the code be modified to support Chinese speech? If so, how should it be modified?

File "C:\Users\Administrator\AppData\Roaming\Python\Python39\site-packages\torch\nn\functional.py", line 782, in _max_pool2d return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode) RuntimeError: Given input size: (192x2x2). Calculated output size: (192x0x0). Output size is too small

I am running with -O --iters 250000 --finetune_lips, and the error reported is: RuntimeError: Given input size: (192x2x2). Calculated output size: (192x0x0). Output size is too small. Have you encountered it before?

Question on the sampling steps used in RAD-NeRF

I notice that the parameter "max_steps" is set to 16 in RAD-NeRF, which makes it feasible to sample more rays per batch during training, while in the original torch-ngp it is much larger.
My question is whether "max_steps" can naturally be smaller in the face-rendering setting, or whether other improvements in the code make this possible.
Looking forward to your reply!
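
For intuition, per-iteration memory and compute scale roughly with num_rays x max_steps, so capping the steps is what makes the very large ray batch in the log above affordable. A back-of-the-envelope sketch; the torch-ngp figures are its approximate generic defaults, quoted from memory:

```python
# Rough per-iteration sample budgets (number of network queries).
rad_nerf  = 65536 * 16    # num_rays=65536, max_steps=16 from the Namespace above
torch_ngp = 4096 * 1024   # approximate torch-ngp defaults for generic scenes

print(f"RAD-NeRF : ~{rad_nerf / 1e6:.1f}M samples/iter")   # ~1.0M
print(f"torch-ngp: ~{torch_ngp / 1e6:.1f}M samples/iter")  # ~4.2M
```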

docker environment issue

Wonderful work! I am trying to reproduce it, but I ran into problems building the basic Docker image for the environment. Could you provide the Dockerfile used to set it up?
Thanks a lot!

Torso is moving

Is it possible to stabilize the output somehow?
The torso is still moving randomly (not a lot, but I thought this paper had dealt with the unnecessary movement of the torso).

Can I remove the torso and instead composite a static torso image taken from a single stable frame?
Or is there any image stabilization method you would recommend?

Generated video does not blink

I found that when the source footage contains relatively few blinks, the generated video does not blink at all. Is there a way to fine-tune this?

Stitching the RAD-NeRF output back onto the rest of the body

RAD-NeRF produces a very good upper-body synthesis, but when I composite the output back onto the lower body frame by frame, the seam no longer lines up and there are subtle misalignments.

Do you have any suggestions about this? Would it be possible to add control parameters during generation so that regions farther from the mouth change less relative to the input frame, making the seam with the lower body look more natural?

How do I synthesize a video with audio?

After running test inference with the command below and generating a video, the video has no sound.
python test.py --pose data/obama.json --ckpt pretrained/obama_eo.pth --aud data/intro_eo.npy --workspace trial_obama/ -O --torso
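
A common workaround (not something the repository does for you) is to mux the driving audio back into the rendered video with ffmpeg; the file names below are placeholders for the silent render produced by test.py and the wav the audio features were extracted from.

```python
import subprocess

silent_video = "result_silent.mp4"   # placeholder: the video written by test.py
audio        = "driving_audio.wav"   # placeholder: the original speech wav
output       = "result_with_audio.mp4"

# Copy the video stream unchanged, encode the audio as AAC, and stop at the
# shorter of the two streams.
subprocess.run([
    "ffmpeg", "-y",
    "-i", silent_video,
    "-i", audio,
    "-c:v", "copy",
    "-c:a", "aac",
    "-shortest",
    output,
], check=True)
```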

bash scripts/install_ext.sh get error

Following the Install steps, when I invoke bash scripts/install_ext.sh,
I get the following error messages:

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing ./freqencoder
Preparing metadata (setup.py) ... done
Building wheels for collected packages: freqencoder
Building wheel for freqencoder (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [22 lines of output]
running bdist_wheel
running build
running build_ext
/root/miniconda3/envs/radnerf/lib/python3.9/site-packages/torch/utils/cpp_extension.py:813: UserWarning: The detected CUDA version (11.5) has a minor version mismatch with the version that was used to compile PyTorch (11.6). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
/root/miniconda3/envs/radnerf/lib/python3.9/site-packages/torch/utils/cpp_extension.py:820: UserWarning: There are no g++ version bounds defined for CUDA version 11.5
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building '_freqencoder' extension
creating /RAD-NeRF/freqencoder/build
creating /RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-39
creating /RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-39/RAD-NeRF
creating /RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-39/RAD-NeRF/freqencoder
creating /RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-39/RAD-NeRF/freqencoder/src
Emitting ninja build file /RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-39/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.11.1.git.kitware.jobserver-1
creating build/lib.linux-x86_64-cpython-39
g++ -pthread -B /root/miniconda3/envs/radnerf/compiler_compat -shared -Wl,-rpath,/root/miniconda3/envs/radnerf/lib -Wl,-rpath-link,/root/miniconda3/envs/radnerf/lib -L/root/miniconda3/envs/radnerf/lib -L/root/miniconda3/envs/radnerf/lib -Wl,-rpath,/root/miniconda3/envs/radnerf/lib -Wl,-rpath-link,/root/miniconda3/envs/radnerf/lib -L/root/miniconda3/envs/radnerf/lib /RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-39/RAD-NeRF/freqencoder/src/bindings.o /RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-39/RAD-NeRF/freqencoder/src/freqencoder.o -L/root/miniconda3/envs/radnerf/lib/python3.9/site-packages/torch/lib -L/usr/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-cpython-39/_freqencoder.cpython-39-x86_64-linux-gnu.so
g++: error: /RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-39/RAD-NeRF/freqencoder/src/bindings.o: No such file or directory
g++: error: /RAD-NeRF/freqencoder/build/temp.linux-x86_64-cpython-39/RAD-NeRF/freqencoder/src/freqencoder.o: No such file or directory
error: command '/usr/bin/g++' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for freqencoder
Running setup.py clean for freqencoder
Failed to build freqencoder


packages in environment at /root/miniconda3/envs/radnerf:

Name Version Build Channel

_libgcc_mutex 0.1 main defaults
_openmp_mutex 5.1 1_gnu defaults
blas 1.0 mkl defaults
brotlipy 0.7.0 py39h27cfd23_1003 defaults
bzip2 1.0.8 h7b6447c_0 defaults
ca-certificates 2023.01.10 h06a4308_0 defaults
certifi 2022.12.7 py39h06a4308_0 defaults
cffi 1.15.1 py39h5eee18b_3 defaults
charset-normalizer 2.0.4 pyhd3eb1b0_0 defaults
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
configargparse 1.5.3 pypi_0 pypi
contourpy 1.0.7 pypi_0 pypi
cryptography 39.0.1 py39h9ce1e76_0 defaults
cuda 11.6.1 0 nvidia
cuda-cccl 11.6.55 hf6102b2_0 nvidia
cuda-command-line-tools 11.6.2 0 nvidia
cuda-compiler 11.6.2 0 nvidia
cuda-cudart 11.6.55 he381448_0 nvidia
cuda-cudart-dev 11.6.55 h42ad0f4_0 nvidia
cuda-cuobjdump 11.6.124 h2eeebcb_0 nvidia
cuda-cupti 11.6.124 h86345e5_0 nvidia
cuda-cuxxfilt 11.6.124 hecbf4f6_0 nvidia
cuda-driver-dev 11.6.55 0 nvidia
cuda-gdb 12.1.55 0 nvidia
cuda-libraries 11.6.2 0 nvidia
cuda-libraries-dev 11.6.1 0 nvidia
cuda-memcheck 11.8.86 0 nvidia
cuda-nsight 12.1.55 0 nvidia
cuda-nsight-compute 12.1.0 0 nvidia
cuda-nvcc 11.6.124 hbba6d2d_0 nvidia
cuda-nvdisasm 12.1.55 0 nvidia
cuda-nvml-dev 11.6.55 haa9ef22_0 nvidia
cuda-nvprof 12.1.55 0 nvidia
cuda-nvprune 11.6.124 he22ec0a_0 nvidia
cuda-nvrtc 11.6.124 h020bade_0 nvidia
cuda-nvrtc-dev 11.6.124 h249d397_0 nvidia
cuda-nvtx 11.6.124 h0630a44_0 nvidia
cuda-nvvp 12.1.55 0 nvidia
cuda-runtime 11.6.2 0 nvidia
cuda-samples 11.6.101 h8efea70_0 nvidia
cuda-sanitizer-api 12.1.55 0 nvidia
cuda-toolkit 11.6.1 0 nvidia
cuda-tools 11.6.1 0 nvidia
cuda-visual-tools 11.6.1 0 nvidia
cudatoolkit 11.6.0 habf752d_9 nvidia
cycler 0.11.0 pypi_0 pypi
dearpygui 1.9.0 pypi_0 pypi
einops 0.6.0 pypi_0 pypi
face-alignment 1.3.5 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.10.2 pypi_0 pypi
flit-core 3.8.0 py39h06a4308_0 defaults
fonttools 4.39.2 pypi_0 pypi
freetype 2.12.1 h4a9f257_0 defaults
fvcore 0.1.5.post20221221 pyhd8ed1ab_0 conda-forge
gds-tools 1.6.0.25 0 nvidia
giflib 5.2.1 h5eee18b_3 defaults
gmp 6.2.1 h295c915_3 defaults
gnutls 3.6.15 he1e5248_0 defaults
huggingface-hub 0.13.3 pypi_0 pypi
idna 3.4 py39h06a4308_0 defaults
imageio 2.26.1 pypi_0 pypi
imageio-ffmpeg 0.4.8 pypi_0 pypi
importlib-resources 5.12.0 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561 defaults
iopath 0.1.9 py39 iopath
joblib 1.2.0 pypi_0 pypi
jpeg 9e h5eee18b_1 defaults
kiwisolver 1.4.4 pypi_0 pypi
lame 3.100 h7b6447c_0 defaults
lazy-loader 0.2 pypi_0 pypi
lcms2 2.12 h3be6417_0 defaults
ld_impl_linux-64 2.38 h1181459_1 defaults
lerc 3.0 h295c915_0 defaults
libclang 15.0.6.1 pypi_0 pypi
libcublas 11.10.3.66 0 nvidia
libcublas-dev 11.9.2.110 h5c901ab_0 nvidia
libcufft 10.7.2.124 h4fbf590_0 nvidia
libcufft-dev 10.7.1.112 ha5ce4c0_0 nvidia
libcufile 1.6.0.25 0 nvidia
libcufile-dev 1.6.0.25 0 nvidia
libcurand 10.3.2.56 0 nvidia
libcurand-dev 10.3.2.56 0 nvidia
libcusolver 11.4.0.1 0 nvidia
libcusparse 11.7.4.91 0 nvidia
libcusparse-dev 11.7.2.124 hbbe9722_0 nvidia
libdeflate 1.17 h5eee18b_0 defaults
libffi 3.4.2 h6a678d5_6 defaults
libgcc-ng 11.2.0 h1234567_1 defaults
libgomp 11.2.0 h1234567_1 defaults
libiconv 1.16 h7f8727e_2 defaults
libidn2 2.3.2 h7f8727e_0 defaults
libnpp 11.7.4.75 0 nvidia
libnpp-dev 11.6.3.124 h3c42840_0 nvidia
libnvjpeg 11.8.0.2 0 nvidia
libnvjpeg-dev 11.6.2.124 hb5906b9_0 nvidia
libpng 1.6.39 h5eee18b_0 defaults
libstdcxx-ng 11.2.0 h1234567_1 defaults
libtasn1 4.16.0 h27cfd23_0 defaults
libtiff 4.5.0 h6a678d5_2 defaults
libunistring 0.9.10 h27cfd23_0 defaults
libwebp 1.2.4 h11a3e52_1 defaults
libwebp-base 1.2.4 h5eee18b_1 defaults
llvmlite 0.39.1 pypi_0 pypi
lpips 0.1.4 pypi_0 pypi
lz4-c 1.9.4 h6a678d5_0 defaults
markdown-it-py 2.2.0 pypi_0 pypi
matplotlib 3.7.1 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
mkl 2021.4.0 h06a4308_640 defaults
mkl-service 2.4.0 py39h7f8727e_0 defaults
mkl_fft 1.3.1 py39hd3c417c_0 defaults
mkl_random 1.2.2 py39h51133e4_0 defaults
ncurses 6.4 h6a678d5_0 defaults
nettle 3.7.3 hbbd107a_1 defaults
networkx 3.0 pypi_0 pypi
ninja 1.11.1 pypi_0 pypi
nsight-compute 2023.1.0.15 0 nvidia
numba 0.56.4 pypi_0 pypi
numpy 1.23.5 py39h14f4228_0 defaults
numpy-base 1.23.5 py39h31eccc5_0 defaults
nvidiacub 1.10.0 0 bottler
opencv-python 4.7.0.72 pypi_0 pypi
openh264 2.1.1 h4ff587b_0 defaults
openssl 1.1.1t h7f8727e_0 defaults
packaging 23.0 pypi_0 pypi
pandas 1.5.3 pypi_0 pypi
pillow 9.4.0 py39h6a678d5_0 defaults
pip 23.0.1 py39h06a4308_0 defaults
portalocker 2.7.0 py39hf3d152e_0 conda-forge
protobuf 3.20.3 pypi_0 pypi
pyaudio 0.2.13 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0 defaults
pygments 2.14.0 pypi_0 pypi
pymcubes 0.1.4 pypi_0 pypi
pyopenssl 23.0.0 py39h06a4308_0 defaults
pyparsing 3.0.9 pypi_0 pypi
pysocks 1.7.1 py39h06a4308_0 defaults
python 3.9.16 h7a1cb2a_2 defaults
python-dateutil 2.8.2 pypi_0 pypi
python-speech-features 0.6 pypi_0 pypi
python_abi 3.9 2_cp39 conda-forge
pytorch 1.12.0 py3.9_cuda11.6_cudnn8.3.2_0 pytorch
pytorch-cuda 11.6 h867d48c_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pytorch3d 0.7.2 dev_0
pytz 2022.7.1 pypi_0 pypi
pywavelets 1.4.1 pypi_0 pypi
pyyaml 6.0 py39hb9d737c_4 conda-forge
readline 8.2 h5eee18b_0 defaults
regex 2023.3.22 pypi_0 pypi
requests 2.28.1 py39h06a4308_1 defaults
resampy 0.4.2 pypi_0 pypi
rich 13.3.2 pypi_0 pypi
scikit-image 0.20.0 pypi_0 pypi
scikit-learn 1.2.2 pypi_0 pypi
scipy 1.9.1 pypi_0 pypi
setuptools 65.6.3 py39h06a4308_0 defaults
six 1.16.0 pyhd3eb1b0_1 defaults
soundfile 0.12.1 pypi_0 pypi
sqlite 3.41.1 h5eee18b_0 defaults
tabulate 0.9.0 pyhd8ed1ab_1 conda-forge
tensorboardx 2.6 pypi_0 pypi
termcolor 2.2.0 pyhd8ed1ab_0 conda-forge
threadpoolctl 3.1.0 pypi_0 pypi
tifffile 2023.3.21 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 defaults
tokenizers 0.13.2 pypi_0 pypi
torch-ema 0.3 pypi_0 pypi
torchvision 0.13.0 py39_cu116 pytorch
tqdm 4.65.0 pyhd8ed1ab_1 conda-forge
transformers 4.27.2 pypi_0 pypi
trimesh 3.21.0 pypi_0 pypi
typing_extensions 4.4.0 py39h06a4308_0 defaults
tzdata 2022g h04d1e81_0 defaults
urllib3 1.26.14 py39h06a4308_0 defaults
wheel 0.38.4 py39h06a4308_0 defaults
xz 5.2.10 h5eee18b_1 defaults
yacs 0.1.8 pyhd8ed1ab_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
zipp 3.15.0 pypi_0 pypi
zlib 1.2.13 h5eee18b_0 defaults
zstd 1.5.2 ha4553b6_0 defaults

I have tried many PyTorch+CUDA versions and it is still the same; it seems to be a compilation path problem, but I don't know how to fix it.
The same environment works for the GeneFace project (which borrowed your code).

No such file or directory: track_params.pt

Output #0, image2, to 'data/obama/ori_imgs/%d.jpg':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    date            : 2021/06/24 23:54:51
    encoder         : Lavf57.83.100
    Stream #0:0(und): Video: mjpeg, yuvj420p(pc), 450x450 [SAR 1:1 DAR 1:1], q=1-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      encoder         : Lavc57.107.100 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
frame= 8000 fps=407 q=1.0 Lsize=N/A time=00:05:20.00 bitrate=N/A speed=16.3x    
video:264215kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[INFO] ===== extracted images =====
[INFO] ===== extract semantics from data/obama/ori_imgs to data/obama/parsing =====
[INFO] loading model...
Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /root/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth
100% 44.7M/44.7M [00:00<00:00, 48.9MB/s]
100% 8000/8000 [09:48<00:00, 13.61it/s]
[INFO] ===== extracted semantics =====
[INFO] ===== extract background image from data/obama/ori_imgs =====
100% 400/400 [02:35<00:00,  2.57it/s]
[INFO] ===== extracted background image =====
[INFO] ===== extract torso and gt images for data/obama =====
100% 8000/8000 [05:32<00:00, 24.05it/s]
[INFO] ===== extracted torso and gt images =====
[INFO] ===== extract face landmarks from data/obama/ori_imgs =====
/usr/local/lib/python3.7/site-packages/skimage/io/manage_plugins.py:23: UserWarning: Your installed pillow version is < 8.1.2. Several security issues (CVE-2021-27921, CVE-2021-25290, CVE-2021-25291, CVE-2021-25293, and more) have been fixed in pillow 8.1.2 or higher. We recommend to upgrade this library.
  from .collection import imread_collection_wrapper
Downloading: "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" to /root/.cache/torch/hub/checkpoints/s3fd-619a316812.pth
100% 85.7M/85.7M [00:08<00:00, 10.8MB/s]
Downloading: "https://www.adrianbulat.com/downloads/python-fan/2DFAN4-cd938726ad.zip" to /root/.cache/torch/hub/checkpoints/2DFAN4-cd938726ad.zip
100% 91.9M/91.9M [00:08<00:00, 11.4MB/s]
100% 8000/8000 [07:55<00:00, 16.83it/s]
[INFO] ===== extracted face landmarks =====
[INFO] ===== perform face tracking =====
Traceback (most recent call last):
  File "data_utils/face_tracking/face_tracker.py", line 42, in <module>
    os.path.join(dir_path, "3DMM"), id_dim, exp_dim, tex_dim, point_num
  File "/content/RAD-NeRF/data_utils/face_tracking/facemodel.py", line 16, in __init__
    os.path.join(modelpath, "3DMM_info.npy"), allow_pickle=True
  File "/usr/local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 417, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/content/RAD-NeRF/data_utils/face_tracking/3DMM/3DMM_info.npy'
[INFO] ===== finished face tracking =====
[INFO] ===== save transforms =====
Traceback (most recent call last):
  File "data_utils/process.py", line 401, in <module>
    save_transforms(base_dir, ori_imgs_dir)
  File "data_utils/process.py", line 270, in save_transforms
    params_dict = torch.load(os.path.join(base_dir, 'track_params.pt'))
  File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 581, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'data/obama/track_params.pt'

Hi @ashawkey, is this problem due to an incompatible dependency version?
I'd really appreciate your help.

ChatGPT with RAD-NeRF

I'm trying to drive the model with the text that ChatGPT replies with, but the call chain is long:

text->TTS->wav->ASR->npy(logits/text)->RAD-NeRF

Such conversions seem inefficient and redundant.

Is there a better way to make this simpler and more efficient?

text->[???]->RAD-NeRF

For example, is it possible to use the mel-spectrogram output of the TTS acoustic model to train or test the model?
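
The middle steps can at least be collapsed into one in-memory pass: once the TTS has produced a waveform, the ASR features can be computed directly with the same Wav2Vec2 checkpoint that appears in the training logs above. A rough, hedged sketch of that step only; the wav path is a placeholder, and it does not reproduce the repository's exact windowing into (frames, 44, 16):

```python
import soundfile as sf
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

MODEL = "cpierse/wav2vec2-large-xlsr-53-esperanto"  # same checkpoint as in the log
processor = Wav2Vec2Processor.from_pretrained(MODEL)
model = Wav2Vec2ForCTC.from_pretrained(MODEL).eval()

# "tts_output.wav" is a placeholder; 16 kHz mono audio is assumed.
wav, sr = sf.read("tts_output.wav")
inputs = processor(wav, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits  # (1, T, 44) per-frame logits
```

Driving RAD-NeRF directly from the TTS mel-spectrogram would likely require retraining the audio encoder, since the released checkpoints were trained on these ASR features.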

Function of Individual codes

I see that the network has an input c, which refers to the individual codes. However, I can't find the individual codes explained in your paper. Can you explain the meaning of this input? Thank you!
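
For what it's worth, the ind_num/ind_dim options in the training log above (10000 x 4) suggest the individual code c is a per-frame learnable latent in the NeRF-W / AD-NeRF style, looked up by training-frame index. A minimal sketch of that pattern, offered as an interpretation rather than the repository's exact code:

```python
import torch
import torch.nn as nn

# One learnable 4-dim code per training frame (ind_num=10000, ind_dim=4 above).
individual_codes = nn.Embedding(10000, 4)

frame_ids = torch.tensor([0, 1, 2])   # which training frames a batch comes from
c = individual_codes(frame_ids)       # (3, 4), concatenated to the MLP conditioning
```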

TypeError: 'NoneType' object is not callable

Hello, when I run the "Pre-processing Custom Training Video" pipeline and run the specific step 8, using the script data_utils/face_tracking/face_tracker.py to generate track_params.pt, I encounter this error:
(error screenshot attached in the original issue)

thank you.

Basel Face Model 2009?

Any reason why we're not using the 2019 version or the 2017 one?

The link in your README takes us to the 2009 page.
