yzmblog / monohuman

MonoHuman: Animatable Human Neural Field from Monocular Video (CVPR 2023)

Python 99.45% Shell 0.55%

monohuman's Issues

Annotation files for Human3.6m dataset

Hello @Yzmblog,

Thank you for the awesome work.

Since you are using HumanNeRF, I have some questions about processing the Human3.6M dataset.

I have the entire dataset. I tried generating the SMPL parameters with ROMP (and also tried their processed file from Google Drive), but for some reason the rendering is terrible.

Maybe I am using the wrong camera values. ROMP does not provide intrinsic or extrinsic values, so I am using:
Extrinsic: np.eye(4)
Intrinsic: fx = fy = 443.4 (the value given in their config.py); for cx and cy, I am using the H36M values.

Could you tell me how to process the H36M files for 3D reconstruction? I would love to use the "3D GT" provided by the Human3.6M dataset instead of processing the videos with OpenPose (if that's possible and accurate).

Kindly let me know; one of my friends suggested this repo to me for this work.

Thank you once again.

errors in training

Hello, when I run the training script I encounter:
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
It seems that some tensors are on the GPU while others are on the CPU. How can I deal with this?
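
A minimal sketch of the usual remedy for this class of error, assuming the failing line indexes a CPU tensor with GPU indices. The names `gather_rows`, `table`, and `indices` are hypothetical and not taken from the MonoHuman code; the actual line that needs the change has to be located from the full traceback.

```python
import torch

def gather_rows(table: torch.Tensor, indices: torch.Tensor) -> torch.Tensor:
    """Index `table` with `indices` after moving the indices to the table's device."""
    # Indexing fails when the indices live on the GPU but the indexed tensor is on the CPU.
    return table[indices.to(table.device)]

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
table = torch.arange(12.0).reshape(4, 3)        # stays on the CPU
indices = torch.tensor([0, 2], device=device)   # may live on the GPU
rows = gather_rows(table, indices)
print(rows.device, rows.tolist())
```

Moving the indices (rather than the indexed tensor) keeps the result on the device the rest of that code path presumably expects; the opposite choice may be the right one depending on where the tensor is used next.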

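Question about differences in motion between people

Different people move differently. In a traditional rendering pipeline, this can be handled by giving each character different rigging parameters. How can such differences be achieved here?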

error in training

Hello, MonoHuman raised an error when training reached 50000 iterations. I restarted training, but hit the same error again at iteration 50000:
File "/home/yejr/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/serialization.py", line 423, in save
_save(obj, opened_zipfile, pickle_module, pickle_protocol)
File "/home/yejr/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/serialization.py", line 650, in _save
zip_file.write_record(name, storage.data_ptr(), num_bytes)
RuntimeError: [enforce fail at inline_container.cc:445] . PytorchStreamWriter failed writing file data/5: file write failed

Can you tell me how to fix it?
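
A "file write failed" error while saving a checkpoint is often, though not always, a sign that the target disk or quota is full. The following is a small diagnostic sketch under that assumption; `free_gib` and `ckpt_dir` are hypothetical names, and the directory should be replaced with wherever the experiment writes its checkpoints.

```python
import os
import shutil

def free_gib(path: str) -> float:
    """Return free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30

ckpt_dir = "."  # hypothetical: replace with the experiment's checkpoint directory
print(f"{free_gib(ckpt_dir):.1f} GiB free under {os.path.abspath(ckpt_dir)}")
```

If the free space is smaller than the size of one checkpoint, freeing disk space or pointing the output directory at another volume would be the first thing to try.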

Question about evaluation

Hi @Yzmblog,

Thanks for the excellent work. It would be one of the most important baselines for our project.

I have some questions about the data you use to evaluate the novel view and novel pose synthesis. For the novel pose evaluation, from section C of the supplemental materials, you mentioned that you sample frames from all cameras at the rate of 30 in Set B and the number of picked frames is 184. I am a bit confused about this part.

Taking 393 which is the longest sequence as an example,
N_setB = 658 * 0.2 = 131.6,
N_sample = N_setB * 23 / 30 = 100.9
Then how could you get 184 frames for evaluation?

Could you let me know how I can get the same frames you used to calculate the scores in Tables A2 and A3?

Thanks again!
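
The arithmetic in the question can be reproduced as follows. The values are the ones quoted in the post (658 frames, a 0.2 split for Set B, 23 cameras, a sampling rate of 30); the variable names are only illustrative.

```python
total_frames = 658   # frames in sequence 393, as quoted in the post
set_b_ratio = 0.2    # fraction of the sequence assigned to Set B
num_cameras = 23     # number of cameras used in the post's calculation
sample_rate = 30     # keep one frame out of every 30

n_set_b = total_frames * set_b_ratio
n_sample = n_set_b * num_cameras / sample_rate
print(round(n_set_b, 1), round(n_sample, 1))
```

This gives roughly 101 frames, which is the gap to the reported 184 that the question asks about.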

Train from the latest checkpoint

I interrupted training partway through. Can I resume from the latest checkpoint?

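Some questions about MonoHuman training

Thank you for your excellent work. I have the following questions:

Why does training for 400,000 iterations with your released code take 5 days on an A100 GPU, rather than the 3 days stated in the paper? Is there an error in the released config files? I noticed that the chunk values all seem to be set quite small.
The model I trained gives mediocre results, not as good as those shown in the paper. Is there an error in the released config files?
Why is the iteration count in 387.yaml set to 800,000 while all the others use 400,000?
Roughly when will the DDP code and the pretrained models be released?
How can I test on in-the-wild videos? Is the pipeline the same as HumanNeRF's?

Looking forward to your reply. Thank you!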

cannot import name 'PRIO_PGRP' from 'os'

I ran into this problem and can't find a solution.
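
One thing worth checking: the Python documentation lists `os.PRIO_PGRP` as available on Unix only, so importing it fails on Windows. A quick check, as a sketch; which file in the repository imports this name is not stated in the issue.

```python
import os
import sys

# os.PRIO_PGRP is documented as Unix-only, so it is missing on Windows builds of Python.
has_prio_pgrp = hasattr(os, "PRIO_PGRP")
print(f"platform={sys.platform}, os.PRIO_PGRP available: {has_prio_pgrp}")
```

If this prints `False`, the import cannot succeed on that platform; if the imported name turns out to be unused in the offending file, removing that import line is a plausible workaround.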

Question about CUDA and PyTorch versions

When I run this command:
python train.py --cfg configs/monohuman/zju_mocap/xxx/xxx.yaml resume False
I get the following error output:
********** Init Trainer ***********
/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/cuda/__init__.py:104: UserWarning:
NVIDIA GeForce RTX 3090 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3090 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Save checkpoint to experiments/monohuman/zju_mocap/p386/suject_386/init.tar ...
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /home/zzq/lhe/monohuman/MonoHuman/third_parties/lpips/weights/v0.1/vgg.pth
Load Progress Dataset ...
[Dataset Path] /home/zzq/lhe/monohuman/MonoHuman/dataset/zju_mocap/386
test--movement set--
-- Total Frames: 14

[Dataset Path] /home/zzq/lhe/monohuman/MonoHuman/dataset/zju_mocap/386
test--movement set--
-- Total Frames: 432
Traceback (most recent call last):
File "train.py", line 37, in <module>
main()
File "train.py", line 31, in main
train_dataloader=train_loader)
File "core/train/trainers/monohuman/trainer.py", line 177, in train
net_output = self.network(**data)
File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "core/nets/monohuman/network.py", line 556, in forward
featmaps, _ = self.feature_extractor(src_imgs.permute(0, 3, 1, 2))
File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "core/nets/monohuman/feature_extract/feature_extractor.py", line 247, in forward
x = self.relu(self.bn1(self.conv1(x)))
File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/instancenorm.py", line 57, in forward
self.training or not self.track_running_stats, self.momentum, self.eps)
File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/functional.py", line 2080, in instance_norm
use_input_stats, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
I suspect this is a CUDA/PyTorch version problem.
My environment uses CUDA 11.7 and PyTorch 1.7.1.
Which CUDA and PyTorch versions did you use?
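
The warning in the log already names the mismatch: the GPU reports `sm_86`, while the installed PyTorch build lists architectures only up to `sm_75`. Below is a simplified sketch of that comparison using the values copied from the log. The helper `is_arch_supported` is hypothetical, and PyTorch's own compatibility check is more nuanced than an exact string match.

```python
def is_arch_supported(capability: tuple, arch_list: list) -> bool:
    """Return True if a (major, minor) compute capability has a matching sm_XY entry."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list

# Values copied from the warning in the log; in a live session they could be read from
# torch.cuda.get_device_capability() and torch.cuda.get_arch_list().
supported = "sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37".split()
print(is_arch_supported((8, 6), supported))  # the RTX 3090 Ti reports sm_86
print(is_arch_supported((7, 5), supported))
```

When the first check fails, the usual remedy is to install a PyTorch build compiled against a CUDA version that includes the GPU's architecture, following the link given in the warning.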

How much GPU memory is needed for training?

I am training MonoHuman on a single 16 GB P100, but it fails with a "CUDA out of memory" error.
How much GPU memory is needed for training? Thanks.

How to get the index_a and index_b in config file?

Hi, I noticed that the config file for each dataset has a different index_a and index_b. How did you choose these specific values?

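On understanding the Forward Correspondence Search Module

Hello. Regarding the Forward Correspondence Search Module: during training, is a keyframe searched for every frame of the monocular video, with the image features of those two frames then used to guide the NeRF rendering?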

About videos in the wild

Dear authors,

Could you show a specific example of the preprocessing for in-the-wild videos, e.g., how to generate an accurate human mask and how to get the camera matrix?
The paper mentions that RVM is used for mask generation. However, the mask we obtained has values between 0 and 1. Is a hard mask required for this task, or can we use the matting results directly?

Thanks
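
If a hard mask does turn out to be needed, a soft matte can be binarized with a simple threshold. This is a sketch only: the 0.5 threshold and the 0/255 output convention are assumptions, not values confirmed by the authors, and `binarize_alpha` is a hypothetical helper.

```python
import numpy as np

def binarize_alpha(alpha: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Convert a soft alpha matte in [0, 1] to a hard 0/255 uint8 mask."""
    return (alpha >= threshold).astype(np.uint8) * 255

alpha = np.array([[0.02, 0.40], [0.60, 0.98]], dtype=np.float32)
hard_mask = binarize_alpha(alpha)
print(hard_mask)
```

A lower threshold keeps more of the soft boundary (hair, motion blur) inside the mask, at the cost of including more background pixels.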

about joints -= pelvis_point[None, ]

Hi!
Thanks for the excellent work. When I run

  python run.py \
      --type text \
      --cfg configs/monohuman/zju_mocap/394/394.yaml \
      text.pose_path path/to/pose_sequence/backflip.npy

a problem occurs. I would like to ask why this line of code needs to be added: `joints -= pelvis_point[None, ]`
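
For context, here is what that line does mechanically: it subtracts the pelvis position from every joint, so the skeleton becomes root-centered. The sketch below uses made-up joint values and assumes joint 0 is the pelvis, as in the SMPL joint ordering; why the repository needs this step for text-driven poses is not answered in the issue.

```python
import numpy as np

# Made-up joints of shape (24, 3); index 0 is assumed to be the pelvis (SMPL root).
joints = np.arange(72, dtype=np.float32).reshape(24, 3)
pelvis_point = joints[0].copy()

# pelvis_point[None, ] has shape (1, 3) and broadcasts over all 24 joints.
joints -= pelvis_point[None, ]
print(joints[0], joints[1])
```

After the subtraction the pelvis sits at the origin and all other joints are expressed relative to it.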

Help Regarding Skeleton Plotting on Image with transformation

Hello @Yzmblog,

Thanks for your previous help with "perspective camera matrix".

This is a big request, so I hope you can take a look.
I now want to plot the 3D SMPL joints on the output images. I used the extrinsic matrix returned by the call at

MonoHuman/core/data/monohuman/freeview.py

Line 277 in 6429fdb

K, E = self.get_freeview_camera(

When I plot it, the skeleton appears upside down, so I think something is wrong with the matrix I am using. Could you verify it? It would be really helpful.

Note: Even getting just the joints on the image, without the skeleton, would be more than enough. If you have any files or functions of your own for this, feel free to let me know.

Data used

  import pickle

  import cv2
  import matplotlib.pyplot as plt
  import numpy as np
  import torch

  skeleton_tree = {
      'color': [
          'k', 'r', 'r', 'r', 'b', 'b', 'b', 'k', 'r', 'r', 'r', 'b', 'b', 'b',
          'y', 'y', 'y', 'y', 'b', 'b', 'b', 'r', 'r', 'r'
      ],
      'smpl_tree': [[ 0, 1 ],[ 0, 2 ],[ 0, 3 ],[ 1, 4 ],[ 2, 5 ],[ 3, 6 ],[ 4, 7 ],[ 5, 8 ],[ 6, 9 ],[ 7, 10],
          [ 8, 11],[ 9, 12],[ 9, 13],[ 9, 14],[12, 15],[13, 16],[14, 17],[16, 18],[17, 19],[18, 20],[19, 21],[20, 22],[21, 23]]
  }
  with open('/home/ndip/humannerf/dataset/zju_mocap/390/mesh_infos.pkl','rb') as f:
      data = pickle.load(f, encoding='latin1')
      frame120_data= data['frame_000120']
  joints = frame120_data['joints'].astype(np.float32)
  joints = joints.reshape(24,3)
  joints3d = joints[:, [0,2,1]]

  E = np.array([[-0.21479376,  0.97530604, -0.05139818, -0.58367276],
         [-0.28440348, -0.01211552,  0.95862813,  1.02107571],
         [ 0.93433307,  0.22052516,  0.27998276,  2.72222895],
         [ 0.        ,  0.        ,  0.        ,  1.        ]])
  
  K  = np.array([[537.1407 ,   0.     , 271.4171 ],
         [  0.     , 537.7115, 242.44179],
         [  0.     ,   0.     ,   1.     ]])
  
  R = E[:3, :3]
  t = E[:3, 3]

Transformation from 3d to 2d

  P = np.matmul(K, np.hstack((R, t.reshape(-1, 1))))
  N_poses = joints3d.shape[0]
  homogeneous_coords = np.concatenate((joints3d, np.ones((N_poses, 1))), axis=1)
  points_new = np.matmul(P, homogeneous_coords.T).T
  points_new /= points_new[:, 2:]  # Normalize the homogeneous coordinates
  points_new = points_new[:, :2]

For plotting the skeleton, I used the following function

  def plotSkel2D(pts,
                 config=skeleton_tree,
                 ax=None,
                 linewidth=2,
                 alpha=1,
                 max_range=1,
                 imgshape=None,
                 thres=0.1):
      if len(pts.shape) == 2:
          pts = pts[None, :, :]  #(nP, nJ, 2/3)
      elif len(pts.shape) == 3:
          pass
      else:
          raise RuntimeError('The dimension of the points is wrong!')
      if torch.is_tensor(pts):
          pts = pts.detach().cpu().numpy()
      if pts.shape[2] == 3 or pts.shape[2] == 2:
          pts = pts.transpose((0, 2, 1))
      # pts : bn, 2/3, NumOfPoints or (2/3, N)
      if ax is None:
          fig = plt.figure(figsize=[5, 5])
          ax = fig.add_subplot(111)
      if 'color' in config.keys():
          colors = config['color']
      else:
          colors = ['b' for _ in range(len(config['smpl_tree']))]
  
      def inrange(imgshape, pts):
          if pts[0] < 5 or \
             pts[0] > imgshape[1] - 5 or \
             pts[1] < 5 or \
             pts[1] > imgshape[0] - 5:
              return False
          else:
              return True
  
      for nP in range(pts.shape[0]):
          for idx, (i, j) in enumerate(config['smpl_tree']):
              if pts.shape[1] == 3:  # with confidence
                  if np.min(pts[nP][2][[i, j]]) < thres:
                      continue
                  lw = linewidth * 2 * np.min(pts[nP][2][[i, j]])
              else:
                  lw = linewidth
              if imgshape is not None:
                  if inrange(imgshape, pts[nP, :, i]) and \
                      inrange(imgshape, pts[nP, :, j]):
                      pass
                  else:
                      continue
              ax.plot([pts[nP][0][i], pts[nP][0][j]],
                      [pts[nP][1][i], pts[nP][1][j]],
                      lw=lw,
                      color=colors[idx],
                      alpha=alpha)
          # if pts.shape[1] > 2:
          ax.scatter(pts[nP][0], pts[nP][1], c='r')
      if False:
          ax.axis('equal')
          plt.xlabel('x')
          plt.ylabel('y')
      else:
          ax.axis('off')
      return ax

Lastly, to draw the skeleton on the rendered image:

  def vis_skeleton_single_image(image_path, keypoints):
      img = cv2.imread(image_path)
      kpts2d = np.array(keypoints)
  
      _, ax = plt.subplots(1, 1)
      ax.imshow(img[..., ::-1])
      H, W = img.shape[:2]
  #    plotSkel2D(kpts2d, ax = ax)
      plotSkel2D(kpts2d, skeleton_tree, ax = ax,  linewidth = 2, alpha = 1, max_range = 1, thres = 0.5 )
      plt.show()
  img_path = "/home/ndip/humannerf/experiments/human_nerf/zju_mocap/p390/adventure/latest/freeview_120/000000.png"
  vis_skeleton_single_image(img_path, points_new)

Result is this:

I am unsure why it flipped, because when I used just the plotSkel2D function, I got this

Could the issue be in the transformation of the points or in the plotting? Kindly help.
Thank you @Yzmblog
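
For comparison, here is a minimal pinhole-projection sketch with a synthetic camera, assuming `E` is a world-to-camera matrix and that the joints are passed in the same world frame the camera was defined in. The helper `project_points` and the camera values are made up for illustration. Two things that may be worth checking against it, without any claim that either is the actual cause: whether the axis swap `joints[:, [0,2,1]]` is applied before multiplying by `E`, and whether all projected depths are positive.

```python
import numpy as np

def project_points(points_world: np.ndarray, K: np.ndarray, E: np.ndarray):
    """Project (N, 3) world points to (N, 2) pixels with a world-to-camera extrinsic E."""
    R, t = E[:3, :3], E[:3, 3]
    points_cam = points_world @ R.T + t          # world -> camera coordinates
    depth = points_cam[:, 2]
    pixels_h = points_cam @ K.T                  # apply the intrinsics
    pixels = pixels_h[:, :2] / pixels_h[:, 2:3]  # perspective divide
    return pixels, depth

# Made-up intrinsics and a camera placed 3 units back along z.
K = np.array([[500.0, 0.0, 256.0], [0.0, 500.0, 256.0], [0.0, 0.0, 1.0]])
E = np.eye(4)
E[2, 3] = 3.0
points = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # origin and a point at y = +1
pixels, depth = project_points(points, K, E)
print(pixels, depth)
```

In this convention the point with larger camera-space y lands lower in the image, because image rows grow downward. A negative depth would mean the point is behind the camera, which mirrors its projection through the image center.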

Error in the Evaluation Code

Hi, thank you for sharing this great work.

It seems that the function run_eval in run.py does not run. The error occurs at line 247, since ray_mask is a 1-D tensor and pred_img_norm is a 2-D tensor. Also, SSIM and LPIPS cannot be evaluated from the rendering result built from ray_img. Can you tell me exactly how you evaluated the renderings?
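
One common way to make per-ray predictions usable for image-level metrics such as SSIM and LPIPS is to scatter them back into a full image using the flat ray mask. The sketch below illustrates that idea with NumPy; `rays_to_image` and its arguments are hypothetical and are not the repository's `run_eval` code or the authors' evaluation protocol.

```python
import numpy as np

def rays_to_image(ray_rgb: np.ndarray, ray_mask: np.ndarray, height: int, width: int,
                  bg_color: float = 0.0) -> np.ndarray:
    """Place per-ray RGB values back into an (H, W, 3) image using a flat boolean mask."""
    image = np.full((height * width, 3), bg_color, dtype=ray_rgb.dtype)
    image[ray_mask] = ray_rgb  # ray_mask has H*W entries; ray_rgb has ray_mask.sum() rows
    return image.reshape(height, width, 3)

ray_mask = np.array([False, True, True, False])         # 2x2 image, two foreground rays
ray_rgb = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # colors of those two rays
image = rays_to_image(ray_rgb, ray_mask, height=2, width=2)
print(image.shape)
```

The same indexing pattern works with PyTorch tensors, provided the mask is boolean and lives on the same device as the image buffer.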

Question about observation bank

@Yzmblog

Hello! MonoHuman is excellent work.

I have a question about the implementation of the observation bank. From the code, I see that you use two manually predefined frames, index_a and index_b, as the front and back views. In section 3.3 of your paper, you said, "Then, we find the k pairs with the closest pose from these two sets." In my understanding, MonoHuman should match all poses in the video and then compare the completeness of the texture maps among the k pairs, which seems different from the implementation in the code.

Am I misunderstanding something? Also, could you share the script used to choose the keyframes (i.e. index_a and index_b)?
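
To make the question concrete, here is one possible reading of "find the k pairs with the closest pose" as a sketch. It assumes poses are flat vectors compared by L2 distance, which the paper quote does not specify, and it leaves out the texture-map completeness comparison entirely. `closest_pose_pairs` is a hypothetical helper, not the authors' selection script.

```python
import numpy as np

def closest_pose_pairs(poses_a: np.ndarray, poses_b: np.ndarray, k: int):
    """Return the k (index_a, index_b) pairs with the smallest L2 pose distance."""
    # Pairwise distances between every pose in set A and every pose in set B.
    dists = np.linalg.norm(poses_a[:, None, :] - poses_b[None, :, :], axis=-1)
    flat_order = np.argsort(dists, axis=None)[:k]
    rows, cols = np.unravel_index(flat_order, dists.shape)
    return [(int(i), int(j)) for i, j in zip(rows, cols)]

poses_front = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])  # toy 2-D "poses"
poses_back = np.array([[0.1, 0.0], [4.0, 4.0]])
pairs = closest_pose_pairs(poses_front, poses_back, k=2)
print(pairs)
```

With real SMPL poses, a rotation-aware distance would probably be a better choice than plain L2 on axis-angle vectors.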