yzmblog / monohuman
MonoHuman: Animatable Human Neural Field from Monocular Video (CVPR 2023)
Hello @Yzmblog,
Thank you for the awesome work.
Since you are using HumanNeRF, I have some questions about processing the Human3.6M dataset.
I have the entire dataset. I tried generating the SMPL parameters with ROMP (and also tried their processed file from Google Drive), but for some reason the rendering is terrible.
Maybe I am using the wrong camera values. ROMP does not provide intrinsic or extrinsic values, so I am using:
Extrinsic: np.eye(4)
Intrinsic: fx, fy = 443.4 (the value given in their config.py); for cx and cy, I am using the H36M values.
Could you tell me how to process the H36M files for 3D reconstruction? I would love to use the "3D GT" provided by the Human3.6M dataset instead of processing the videos with OpenPose (if that is possible and accurate).
Please let me know; a friend of mine suggested this repo for this work.
Thank you once again.
Hello, when I run the training script I encounter:
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
It seems that some tensors are on the GPU while others are on the CPU. How can I deal with this?
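Different people move differently. In a traditional rendering pipeline, this can be handled by setting different rigging parameters for each person. How can this difference be achieved here?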
Hello, my MonoHuman training failed at iteration 50000. I started training again, but hit the same error at iteration 50000:
  File "/home/yejr/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/serialization.py", line 423, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/home/yejr/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/serialization.py", line 650, in _save
    zip_file.write_record(name, storage.data_ptr(), num_bytes)
RuntimeError: [enforce fail at inline_container.cc:445] . PytorchStreamWriter failed writing file data/5: file write failed
Can you tell me how to fix it?
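A `file write failed` error from `torch.save` is often a sign that the target disk is full or a quota has been reached, although the log above does not confirm this. Below is a minimal sketch, using only the standard library, of checking free space before writing a checkpoint. The helper name `has_free_space` and the 2 GiB threshold are illustrative assumptions, not part of MonoHuman:

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the filesystem containing `path` has at least `required_bytes` free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (all in bytes)
    return usage.free >= required_bytes


# Hypothetical threshold: require 2 GiB free before attempting to save.
required = 2 * 1024 ** 3
if has_free_space(".", required):
    print("enough space to write a checkpoint")
else:
    print("low disk space: free some space or change the output directory")
```

Running the check against the experiment directory right before the failing iteration would show whether disk space is the likely culprit.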
Hi @Yzmblog,
Thanks for the excellent work. It would be one of the most important baselines for our project.
I have some questions about the data you use to evaluate the novel view and novel pose synthesis. For the novel pose evaluation, from section C of the supplemental materials, you mentioned that you sample frames from all cameras at the rate of 30 in Set B and the number of picked frames is 184. I am a bit confused about this part.
Taking 393 which is the longest sequence as an example,
N_setB = 658 * 0.2 = 131.6,
N_sample = N_setB * 23 / 30 = 100.9
How, then, did you get 184 frames for evaluation?
Could you let me know how to get the same frames you used to compute the scores in Tables A2 and A3?
Thanks again!
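The arithmetic in the question can be reproduced with a short script. It only restates the questioner's own formula (20% of the frames go to Set B, 23 cameras, one frame kept out of every 30); whether the authors used this formula is not confirmed in the thread. The last line inverts the same formula to show how many Set B frames per camera would be needed to reach 184:

```python
total_frames = 658   # frames per camera in sequence 393, as quoted in the question
set_b_ratio = 0.2    # fraction of the sequence assigned to Set B
num_cameras = 23     # number of cameras used in the question's estimate
sampling_rate = 30   # keep one frame out of every 30

n_set_b = total_frames * set_b_ratio
n_sample = n_set_b * num_cameras / sampling_rate
print(round(n_set_b, 1), round(n_sample, 1))

# Invert the formula: Set B frames per camera that would yield 184 samples.
reported = 184
implied_set_b = reported * sampling_rate / num_cameras
print(implied_set_b)
```

Under this formula the estimate is about 100.9 frames, while 184 frames would correspond to 240 Set B frames per camera, so the gap cannot be explained by rounding alone.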
I interrupted the network training. Can I resume training from the latest checkpoint?
Thank you for your excellent work. I have the following questions:
Looking forward to your reply. Thanks!
I ran into this problem and can't find a solution.
When I run this command: python train.py --cfg configs/monohuman/zju_mocap/xxx/xxx.yaml resume False
the following error appears:
********** Init Trainer ***********
/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/cuda/__init__.py:104: UserWarning:
NVIDIA GeForce RTX 3090 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3090 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Save checkpoint to experiments/monohuman/zju_mocap/p386/suject_386/init.tar ...
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /home/zzq/lhe/monohuman/MonoHuman/third_parties/lpips/weights/v0.1/vgg.pth
Load Progress Dataset ...
[Dataset Path] /home/zzq/lhe/monohuman/MonoHuman/dataset/zju_mocap/386
test--movement set--
-- Total Frames: 14
[Dataset Path] /home/zzq/lhe/monohuman/MonoHuman/dataset/zju_mocap/386
test--movement set--
-- Total Frames: 432
Traceback (most recent call last):
  File "train.py", line 37, in <module>
    main()
  File "train.py", line 31, in main
    train_dataloader=train_loader)
  File "core/train/trainers/monohuman/trainer.py", line 177, in train
    net_output = self.network(**data)
  File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "core/nets/monohuman/network.py", line 556, in forward
    featmaps, _ = self.feature_extractor(src_imgs.permute(0, 3, 1, 2))
  File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "core/nets/monohuman/feature_extract/feature_extractor.py", line 247, in forward
    x = self.relu(self.bn1(self.conv1(x)))
  File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/instancenorm.py", line 57, in forward
    self.training or not self.track_running_stats, self.momentum, self.eps)
  File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/functional.py", line 2080, in instance_norm
    use_input_stats, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
I suspect this is a CUDA and PyTorch version problem.
My environment uses CUDA 11.7 and PyTorch 1.7.1.
Which CUDA and PyTorch versions did you use?
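The warning in the log lists the architectures the installed PyTorch binary was built for, and `sm_86` is not among them. As a rough sketch of that check: a binary built for `sm_XY` is generally expected to run on a device with the same major version and an equal or higher minor version. The helper names below are hypothetical, and the sketch deliberately ignores PTX-based forward compatibility (the `compute_XX` entries). In a live session the two inputs would typically come from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`.

```python
def parse_sm(arch):
    """Turn 'sm_75' into (7, 5); return None for entries such as 'compute_37'."""
    if not arch.startswith("sm_"):
        return None
    digits = arch[3:]
    if not digits.isdigit() or len(digits) < 2:
        return None
    return int(digits[:-1]), int(digits[-1])


def has_compatible_binary(device_capability, arch_list):
    """True if some sm_XY entry has the device's major version and a minor version <= the device's."""
    dev_major, dev_minor = device_capability
    for arch in arch_list:
        sm = parse_sm(arch)
        if sm is not None and sm[0] == dev_major and sm[1] <= dev_minor:
            return True
    return False


# Architecture list copied from the warning in the log above.
supported = "sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37".split()
print(has_compatible_binary((8, 6), supported))  # RTX 3090 Ti reports capability (8, 6)
print(has_compatible_binary((7, 5), supported))
```

For the list in the log this prints `False` for the RTX 3090 Ti and `True` for an `sm_75` device, which is consistent with the suspicion that the PyTorch build, rather than the training code, is the problem.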
I trained MonoHuman on a single 16 GB P100, but it fails with the error "CUDA out of memory".
How much GPU memory is needed for training? Thanks.
Hi, I noticed that the config files for different datasets have different index_a and index_b values. How did you choose the specific index_a and index_b?
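Hello, regarding the Forward Correspondence Search Module: during training, for every frame of the monocular video, do you search for a keyframe and then use the image features of these two frames to guide the NeRF rendering?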
Dear authors,
Could you show a specific example of the preprocessing for in-the-wild videos, e.g., how to generate an accurate human mask and how to get the camera matrix?
I saw in the paper that RVM is used for mask generation. However, the mask we obtained has values between 0 and 1. Is a hard mask required for this task, or can we just use the matting results?
Thanks
Hi!
Thanks for the excellent work. When I use
```
python run.py \
    --type text \
    --cfg configs/monohuman/zju_mocap/394/394.yaml \
    text.pose_path path/to/pose_sequence/backflip.npy
```
a problem occurs. I want to ask why I need to add this line of code: `joints -= pelvis_point[None, ]`
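For readers unfamiliar with the quoted line: subtracting `pelvis_point[None, ]` from `joints` shifts every joint by the same offset, so the skeleton becomes expressed relative to the pelvis. The thread does not say why MonoHuman needs this, so the following is only a plain-Python sketch of the operation itself. It assumes the pelvis is joint 0, as in the SMPL joint ordering, and `center_on_root` is a hypothetical helper name:

```python
def center_on_root(joints, root_index=0):
    """Subtract the root joint's position from every joint (root-relative coordinates)."""
    root = joints[root_index]
    return [[c - r for c, r in zip(joint, root)] for joint in joints]


# Two toy joints: the root at (1, 2, 3) and one child joint.
joints = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
centered = center_on_root(joints)
print(centered)
```

After centering, the root sits at the origin and the remaining joints keep their offsets from it, so the pose is unchanged and only the global translation is removed.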
Hello @Yzmblog,
Thanks for your previous help with "perspective camera matrix".
This is a big request, so I hope you can take a look.
I now want to plot the 3D SMPL joints on the output images. I used the "extrinsic" matrix from the function at
MonoHuman/core/data/monohuman/freeview.py
Line 277 in 6429fdb
I tried to plot the joints, but the skeleton comes out upside down, so I think something is wrong with the matrix I am using. Could you verify it? It would be really helpful.
Note: Even just getting the joints on the image, without the skeleton, would be more than enough. If you have any files or functions of your own for this, feel free to let me know.
import pickle

import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch

# One color per joint index, plus the SMPL kinematic tree as (parent, child) pairs.
skeleton_tree = {
    'color': [
        'k', 'r', 'r', 'r', 'b', 'b', 'b', 'k', 'r', 'r', 'r', 'b', 'b', 'b',
        'y', 'y', 'y', 'y', 'b', 'b', 'b', 'r', 'r', 'r'
    ],
    'smpl_tree': [[ 0, 1 ],[ 0, 2 ],[ 0, 3 ],[ 1, 4 ],[ 2, 5 ],[ 3, 6 ],[ 4, 7 ],[ 5, 8 ],[ 6, 9 ],[ 7, 10],
                  [ 8, 11],[ 9, 12],[ 9, 13],[ 9, 14],[12, 15],[13, 16],[14, 17],[16, 18],[17, 19],[18, 20],[19, 21],[20, 22],[21, 23]]
}
# Load the per-frame mesh info and take the 24 joints of frame 120.
with open('/home/ndip/humannerf/dataset/zju_mocap/390/mesh_infos.pkl', 'rb') as f:
    data = pickle.load(f, encoding='latin1')
frame120_data = data['frame_000120']
joints = frame120_data['joints'].astype(np.float32)
joints = joints.reshape(24, 3)
joints3d = joints[:, [0, 2, 1]]  # note: this swaps the y and z axes

# Extrinsic (4x4) and intrinsic (3x3) camera matrices.
E = np.array([[-0.21479376, 0.97530604, -0.05139818, -0.58367276],
              [-0.28440348, -0.01211552, 0.95862813, 1.02107571],
              [ 0.93433307, 0.22052516, 0.27998276, 2.72222895],
              [ 0. , 0. , 0. , 1. ]])
K = np.array([[537.1407 , 0. , 271.4171 ],
              [ 0. , 537.7115, 242.44179],
              [ 0. , 0. , 1. ]])
R = E[:3, :3]
t = E[:3, 3]
P = np.matmul(K, np.hstack((R, t.reshape(-1, 1))))  # 3x4 projection matrix

# Project the joints and divide by depth to get pixel coordinates.
N_poses = joints3d.shape[0]
homogeneous_coords = np.concatenate((joints3d, np.ones((N_poses, 1))), axis=1)
points_new = np.matmul(P, homogeneous_coords.T).T
points_new /= points_new[:, 2:]  # Normalize the homogeneous coordinates
points_new = points_new[:, :2]
def plotSkel2D(pts,
               config=skeleton_tree,
               ax=None,
               linewidth=2,
               alpha=1,
               max_range=1,
               imgshape=None,
               thres=0.1):
    # Accept (nJ, 2/3) or (nP, nJ, 2/3) and normalize to (nP, nJ, 2/3).
    if len(pts.shape) == 2:
        pts = pts[None, :, :]  # (nP, nJ, 2/3)
    elif len(pts.shape) == 3:
        pass
    else:
        raise RuntimeError('The dimension of the points is wrong!')
    if torch.is_tensor(pts):
        pts = pts.detach().cpu().numpy()
    if pts.shape[2] == 3 or pts.shape[2] == 2:
        pts = pts.transpose((0, 2, 1))
    # pts : bn, 2/3, NumOfPoints or (2/3, N)
    if ax is None:
        fig = plt.figure(figsize=[5, 5])
        ax = fig.add_subplot(111)
    if 'color' in config.keys():
        colors = config['color']
    else:
        colors = ['b' for _ in range(len(config['smpl_tree']))]

    def inrange(imgshape, pts):
        # Reject points closer than 5 pixels to the image border.
        if pts[0] < 5 or \
           pts[0] > imgshape[1] - 5 or \
           pts[1] < 5 or \
           pts[1] > imgshape[0] - 5:
            return False
        else:
            return True

    for nP in range(pts.shape[0]):
        for idx, (i, j) in enumerate(config['smpl_tree']):
            if pts.shape[1] == 3:  # with confidence
                if np.min(pts[nP][2][[i, j]]) < thres:
                    continue
                lw = linewidth * 2 * np.min(pts[nP][2][[i, j]])
            else:
                lw = linewidth
            if imgshape is not None:
                if inrange(imgshape, pts[nP, :, i]) and \
                   inrange(imgshape, pts[nP, :, j]):
                    pass
                else:
                    continue
            ax.plot([pts[nP][0][i], pts[nP][0][j]],
                    [pts[nP][1][i], pts[nP][1][j]],
                    lw=lw,
                    color=colors[idx],
                    alpha=alpha)
        # if pts.shape[1] > 2:
        ax.scatter(pts[nP][0], pts[nP][1], c='r')
    if False:
        ax.axis('equal')
        plt.xlabel('x')
        plt.ylabel('y')
    else:
        ax.axis('off')
    return ax
Lastly:
def vis_skeleton_single_image(image_path, keypoints):
    img = cv2.imread(image_path)
    kpts2d = np.array(keypoints)
    _, ax = plt.subplots(1, 1)
    ax.imshow(img[..., ::-1])  # cv2 loads BGR; reverse channels to RGB for matplotlib
    H, W = img.shape[:2]
    # plotSkel2D(kpts2d, ax = ax)
    plotSkel2D(kpts2d, skeleton_tree, ax=ax, linewidth=2, alpha=1, max_range=1, thres=0.5)
    plt.show()

img_path = "/home/ndip/humannerf/experiments/human_nerf/zju_mocap/p390/adventure/latest/freeview_120/000000.png"
vis_skeleton_single_image(img_path, points_new)
I am unsure why it is flipped, because when I used only the plotSkel2D function, I got this:
Could the issue be in the transformation of the points, or in the plotting? Please help.
Thank you @Yzmblog
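When debugging a flipped projection, it can help to project a single point by hand. The sketch below assumes an OpenCV-style pinhole camera (x right, y down, z forward, zero skew) and a world-to-camera extrinsic matrix; whether the matrix returned by freeview.py follows that convention is not confirmed in this thread. `project_point` is a hypothetical helper written in plain Python so that it runs without dependencies:

```python
def project_point(K, E, X):
    """Project a 3D world point X to pixels with intrinsics K (3x3) and extrinsics E (4x4)."""
    # Camera-space coordinates: Xc = R @ X + t
    Xc = [sum(E[r][c] * X[c] for c in range(3)) + E[r][3] for r in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide, then apply focal lengths and principal point (zero skew assumed).
    u = K[0][0] * Xc[0] / Xc[2] + K[0][2]
    v = K[1][1] * Xc[1] / Xc[2] + K[1][2]
    return u, v


K = [[537.1407, 0.0, 271.4171],
     [0.0, 537.7115, 242.44179],
     [0.0, 0.0, 1.0]]
# Identity extrinsics for the test: world and camera frames coincide.
E_identity = [[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]]

center = project_point(K, E_identity, [0.0, 0.0, 2.0])
lower = project_point(K, E_identity, [0.0, 0.5, 2.0])
print(center, lower)
```

A point on the optical axis lands on the principal point, and a point with larger camera-space y gets a larger pixel row, which `imshow` draws further down the image by default. If the data's up axis maps to positive camera y, for example after swapping or negating axes, the skeleton will appear upside down even though the projection math is correct.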
Hi, thank you for sharing great work.
It seems that the function run_eval in run.py does not run. The error occurs at line 247, since ray_mask is a 1-D tensor and pred_img_norm is a 2-D tensor. Also, SSIM and LPIPS cannot be evaluated from the rendering result made by ray_img. Can you tell me exactly how you evaluated the renderings?
Hello! MonoHuman is excellent work.
I have a question about the implementation of the observation bank. From the code snippet, I find you use manually predefined two frames of index a and b as front and back. In section 3.3 of your paper, you said, "Then, we find the k pairs with the closest pose from these two sets." In my opinion, MonoHuman should match all poses in the videos and then compare the texture map's completeness in the k pairs, which seems different from the implementation in the code.
Am I misunderstanding something? Also, could you share the script for choosing the keyframes (i.e., index_a and index_b)?