davrempe / humor
Code for ICCV 2021 paper "HuMoR: 3D Human Motion Model for Robust Pose Estimation"
License: MIT License
Hi Davis, thanks for sharing your excellent work with us.
I followed the instructions in README.md and ran python humor/fitting/run_fitting.py @./configs/fit_imapper.cfg
in order to fit the whole iMapper dataset. At first the test-time optimization went well, but after a long time the process terminated and reported an error. I noticed that the loss was abnormally large (it soared at the 30th iteration of Stage 3, and the process died a while later). I repeated the optimization several times, but no run survived. The total run time varied: some lasted 3 h, some 6 h.
I would appreciate any advice you could give. Looking forward to your reply.
Could you share the loss curves,
e.g. kl_loss / contacts_loss / regr_trans_loss?
Traceback (most recent call last):
  File "humor/fitting/viz_fitting_rgb.py", line 467, in <module>
    main(args)
  File "humor/fitting/viz_fitting_rgb.py", line 342, in main
    img_extn=IM_EXTN)
  File "/data/amogh/tcs_related/humor/humor/fitting/../viz/utils.py", line 182, in viz_smpl_seq
    default_cam_rot=cam_rot)
  File "/data/amogh/tcs_related/humor/humor/fitting/../viz/mesh_viewer.py", line 108, in __init__
    self.viewer = pyrender.OffscreenRenderer(*self.figsize, point_size=2.75)
  File "/data/groot/miniconda3/envs/humor_env/lib/python3.7/site-packages/pyrender/offscreen.py", line 31, in __init__
    self._create()
  File "/data/groot/miniconda3/envs/humor_env/lib/python3.7/site-packages/pyrender/offscreen.py", line 149, in _create
    self._platform.init_context()
  File "/data/groot/miniconda3/envs/humor_env/lib/python3.7/site-packages/pyrender/platforms/pyglet_platform.py", line 52, in init_context
    width=1, height=1)
  File "/data/groot/miniconda3/envs/humor_env/lib/python3.7/site-packages/pyglet/window/xlib/__init__.py", line 173, in __init__
    super(XlibWindow, self).__init__(*args, **kwargs)
  File "/data/groot/miniconda3/envs/humor_env/lib/python3.7/site-packages/pyglet/window/__init__.py", line 606, in __init__
    context = config.create_context(gl.current_context)
  File "/data/groot/miniconda3/envs/humor_env/lib/python3.7/site-packages/pyglet/gl/xlib.py", line 204, in create_context
    return XlibContextARB(self, share)
  File "/data/groot/miniconda3/envs/humor_env/lib/python3.7/site-packages/pyglet/gl/xlib.py", line 314, in __init__
    super(XlibContext13, self).__init__(config, share)
  File "/data/groot/miniconda3/envs/humor_env/lib/python3.7/site-packages/pyglet/gl/xlib.py", line 218, in __init__
    raise gl.ContextException('Could not create GL context')
pyglet.gl.ContextException: Could not create GL context
Thx
Cannot connect to download.cs.stanford.edu/orion/humor/checkpoints.zip
Hi davrempe,
Thanks for generously sharing this wonderful work. I have tested the model on my own video and the results are excellent, except for the long run time.
I wonder if I can get the absolute 3D coordinates of the keypoints from the output? Thanks!
I want to run the test-time optimization faster on my machine. Can we change batch-size to 32 and reduce num-iters to 3 8 7 in configs/fit_imapper.cfg?
I'm wondering if it's possible to use other pretrained 17 points (COCO format) 2D detection methods instead?
Hi Davis,
May I ask how to run the denoising experiment? I have not found a cfg file for it.
Thanks a lot
Hi Davis,
Thanks for this great work. I tried running this on a video sampled at 2fps and noticed a reduction in the quality of the reconstruction. Is this expected? Do you have suggestions on improving the results at low fps? Here is a sample result at 30fps vs 2fps
Thank you
Your code fails when running python humor/fitting/run_fitting.py @./configs/fit_rgb_demo_use_split.cfg.
First there is UnboundLocalError: local variable 'body_model_path' referenced before assignment; after I fixed this bug, another appeared: "IndexError: list index out of range".
Hi, thanks for your wonderful work. With an external camera-motion estimate and the reprojected 3D poses as input, can the test-time motion optimization be applied directly to this dynamic-camera scenario? If it is possible, could you please give me some advice? Thanks, and I hope for your reply!
Very high-quality HMR! I am really interested in it.
Any ideas on cutting down the optimization time, which is quite long on my PC?
Hello, thank you for this great work!
I have a question about the reproducibility of the results on the i3DB dataset: when I use your code, the final global joint error is 33.5cm and I have 34.3 for Vposer-t (which are respectively equal to 28.15 and 31.59 in the paper).
Am I missing something or were your testing settings any different from those in the code?
By the way, is it expected to see only a 1 cm gap between VPoser-t and HuMoR, since the CVAE prior seems to be crucial for good predictions?
Thank you!
Lines 25 to 35 in b86c2d9
For example, AMASSFitDataset
is not supported. There might be an elegant way to do this by inspecting the namespaces of all the files in the library that this function should support, but to be honest a hardcoded dictionary might be better, because it will at least fail in a predictable and easy-to-debug way.
Hello, thank you for this great work!
In Stage 3, I found that the motion prior and init motion prior losses can be negative. As they are log-likelihoods, is this normal?
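For what it's worth, a negative value is mathematically plausible here: continuous densities can exceed 1, so a log-likelihood can be positive and a loss defined as its negative can dip below zero. A quick check with a narrow Gaussian evaluated at its mean:

```python
import math

# log-density of N(mu, sigma^2) at x = mu is -0.5 * log(2*pi*sigma^2);
# for sigma = 0.1 the density is ~3.99 > 1, so the log-likelihood is
# positive and the negative log-likelihood (the loss) is negative.
sigma = 0.1
log_pdf_at_mean = -0.5 * math.log(2 * math.pi * sigma ** 2)
print(log_pdf_at_mean > 0)  # True, so -log_pdf_at_mean < 0
```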
Hello, and thank you for your contribution. I would like to know what causes the foot sinking and calf crossing in my results, and whether the model needs to be retrained to eliminate them.
As the title says, building OpenPose on Ubuntu is hard for me.
Is it possible to use lightweight-openpose instead?
Hi, Davis,
You really did a wonderful job for both your paper and your codebase! Thanks for sharing the code, which is well-organized and well-annotated. I love it and I have learned a lot.
I have some trouble understanding the transformations between coordinate systems, e.g. compute_world2aligned_mat and compute_cam2prior. Could you explain a little about the default setup of each coordinate system and why we need to transform?
Besides, what do "prior frame" and "canonical frame" mean? Do they both refer to reconstructing motions in the world coordinate system? If not, what is the relationship or connection between the prior frame and the canonical frame?
Looking forward to your reply.
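Not the repo's actual math, but a generic sketch of the kind of canonical alignment such functions perform may help frame the question: rotate about the up (z) axis so the body's ground-plane heading maps onto a fixed forward direction (+y here; the function name and axis conventions are illustrative assumptions, not those of compute_world2aligned_mat):

```python
import numpy as np

def heading_alignment_mat(forward_xy):
    """Rotation about the z (up) axis that maps the given ground-plane
    heading onto +y. A generic canonicalization sketch only."""
    fx, fy = forward_xy / np.linalg.norm(forward_xy)
    theta = np.arctan2(fx, fy)            # angle from current heading to +y
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

R = heading_alignment_mat(np.array([1.0, 0.0]))  # heading along +x
print(R @ np.array([1.0, 0.0, 0.0]))             # the +x heading is rotated onto +y
```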
Just opened this issue to post a solution I've found for loading the data in Blender.
For more information on how to use it, go to https://www.patreon.com/posts/90121986
but it basically amounts to changing the path to the .npz file and the path to the SMPL .fbx file on lines 17 and 18.
Hi @davrempe, nice work! As mentioned in the readme, the current scripts can return a smooth sequence. I want to optimize the body and hand pose together; could you give me some advice on the optimization pipeline?
At first, I want to congratulate the creators on their excellent work! I find it very useful and interesting!
I managed to run Humor on RGB videos and images and I wonder what are the steps for the creation of an .fbx file of the generated animated mesh (e.g. to use it in Blender or Unity environments).
I would appreciate any help/ideas.
Thank you in advance!
Hello, thank you for your great work.
Could you provide a Colab notebook demo? It would be helpful for everyone and would make this repo easier to use.
Thank you
Your work is really marvellous!
The performance of HuMoR is really good, and I want to use it as part of the preprocessing in my work to predict the SMPL parameters. But the output format differs from SMPL: it has extra parameters named "floor_plane" and "contacts".
Can I convert the results to the SMPL format, which has only four kinds of parameters, i.e. [betas, transl, global_orient, body_pose]?
Looking forward to your reply!
Best Wishes!
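A minimal sketch of the kind of conversion being asked about, assuming the loaded HuMoR result dict uses keys like trans, root_orient, pose_body, and betas (these key names and shapes are assumptions to check against your own .npz output):

```python
import numpy as np

def to_smpl_params(res):
    """Keep only the four standard SMPL parameter groups, dropping extras
    such as floor_plane and contacts. Input key names are assumed."""
    return {
        'transl': res['trans'],               # (T, 3) root translation
        'global_orient': res['root_orient'],  # (T, 3) axis-angle root rotation
        'body_pose': res['pose_body'],        # (T, 63) axis-angle body pose
        'betas': res['betas'],                # shape coefficients
    }

# e.g. res = dict(np.load('results.npz')); np.savez('smpl.npz', **to_smpl_params(res))
dummy = {'trans': np.zeros((10, 3)), 'root_orient': np.zeros((10, 3)),
         'pose_body': np.zeros((10, 63)), 'betas': np.zeros(16),
         'floor_plane': np.zeros(3), 'contacts': np.zeros((10, 9))}
print(sorted(to_smpl_params(dummy)))  # ['betas', 'body_pose', 'global_orient', 'transl']
```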
Hi Davis Rempe, I have been following your work since your last paper, "Contact and Human Dynamics from Monocular Video". Thanks for your great contributions to the human estimation area.
I ran the demo code and met two problems. The first is about the OpenGL version, which caused pyrender to fail; I solved this by editing the OpenGL version.
The second problem is about the "xcb" library. After the optimization process, I want to visualize the final results, and the Ubuntu system reports this error:
I searched on the Internet but could not find a solution.
I have been trying on both Windows and Linux to download all the requirements and compile to make it work, but every time something goes wrong. It seems like it needs updating. It's a pity, because I would really like to use it.
Hi Davis,
I found that run_fitting.py
only stitches the last batch's results here, as the seq_names
are read from the current batch data here. So if I split the whole video into 10 subsequences but set batch_size to 2, the first 8 subsequences will be discarded
in the final result.
I believe a global variable outside the data batch loop is needed to cache all the cur_res_out_paths.
-Jimei
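The suggested fix can be sketched like this (fit_batch, the loop, and the path strings are stand-ins, not the repo's actual names):

```python
def fit_batch(batch):
    # Stand-in for the per-batch optimization; returns one result path
    # per subsequence in the batch.
    return [f"results/seq_{i:03d}" for i in batch]

all_res_out_paths = []                # lives OUTSIDE the batch loop
for batch in [[0, 1], [2, 3], [4]]:   # e.g. 5 subsequences, batch_size 2
    cur_res_out_paths = fit_batch(batch)
    all_res_out_paths.extend(cur_res_out_paths)  # cache every batch's paths

print(len(all_res_out_paths))  # 5 -- all subsequences, not just the last batch
```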
SMPL_JOINTS is defined as
SMPL_JOINTS = {'hips' : 0, 'leftUpLeg' : 1, 'rightUpLeg' : 2, 'spine' : 3, 'leftLeg' : 4, 'rightLeg' : 5, 'spine1' : 6, 'leftFoot' : 7, 'rightFoot' : 8, 'spine2' : 9, 'leftToeBase' : 10, 'rightToeBase' : 11, 'neck' : 12, 'leftShoulder' : 13, 'rightShoulder' : 14, 'head' : 15, 'leftArm' : 16, 'rightArm' : 17, 'leftForeArm' : 18, 'rightForeArm' : 19, 'leftHand' : 20, 'rightHand' : 21}
and then SMPL_PARENTS is defined as
SMPL_PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 12, 12, 13, 14, 16, 17, 18, 19]
According to these definitions, it seems that the parent of 'leftShoulder' is 'leftToeBase' and the parent of 'rightShoulder' is 'leftShoulder'. That seems too strange; or am I making a mistake?
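One way to check is to index SMPL_PARENTS by each joint's own index from SMPL_JOINTS; with the definitions quoted above (this is just indexing, not the repo's code), both shoulders come out with 'neck' (index 12) as parent:

```python
# Definitions copied from the issue above.
SMPL_JOINTS = {'hips': 0, 'leftUpLeg': 1, 'rightUpLeg': 2, 'spine': 3,
               'leftLeg': 4, 'rightLeg': 5, 'spine1': 6, 'leftFoot': 7,
               'rightFoot': 8, 'spine2': 9, 'leftToeBase': 10,
               'rightToeBase': 11, 'neck': 12, 'leftShoulder': 13,
               'rightShoulder': 14, 'head': 15, 'leftArm': 16,
               'rightArm': 17, 'leftForeArm': 18, 'rightForeArm': 19,
               'leftHand': 20, 'rightHand': 21}
SMPL_PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 12, 12, 13, 14,
                16, 17, 18, 19]

idx2name = {v: k for k, v in SMPL_JOINTS.items()}
for name in ('leftShoulder', 'rightShoulder'):
    parent = idx2name[SMPL_PARENTS[SMPL_JOINTS[name]]]
    print(f"{name} -> {parent}")  # both shoulders report 'neck'
```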
Hi, I am following your data-processing pipeline for the AMASS dataset. I saw that out_fps = 30 by default. If I want the data sampled at 60 fps instead, can I just change out_fps = 60 in the code below? I am new to this and not sure whether there will be storage issues, etc.
Thanks!
humor/humor/scripts/process_amass_data.py
Line 26 in fc6ef84
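For context, the usual simple approach when the source frame rate is an integer multiple of the target is stride-based resampling; a hedged sketch (resample_poses is illustrative, not the repo's script), with the caveat that 60 fps output will take roughly twice the storage of 30 fps:

```python
import numpy as np

def resample_poses(poses, src_fps, out_fps):
    """Downsample a (T, D) pose array by striding; assumes src_fps is an
    integer multiple of out_fps (common for AMASS mocap at 60/120 fps)."""
    assert src_fps % out_fps == 0, "simple striding needs an integer fps ratio"
    stride = src_fps // out_fps
    return poses[::stride]

poses = np.zeros((120, 156))                 # e.g. 1 s of 120 fps AMASS data
print(resample_poses(poses, 120, 60).shape)  # (60, 156)
```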
Adds a flag to run_fitting
that allows passing in pre-detected OpenPose .json
files to be used, rather than running OpenPose directly before fitting.
Supports only the OpenPose format resulting from the --write_json
flag with the BODY_25
skeleton and unnormalized 2D keypoints. Note that fitting only uses the body_keypoints_2d
of the first detected person in the file.
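For reference, a minimal sketch of reading one such --write_json frame file; in OpenPose's own JSON, the BODY_25 body keypoints of each person are stored flat under pose_keypoints_2d as (x, y, confidence) triples (the helper name here is illustrative):

```python
import numpy as np

def body25_keypoints(frame):
    """Return a (25, 3) [x, y, conf] array for the first detected person in
    one OpenPose --write_json frame dict; all zeros if nobody was detected."""
    if not frame.get('people'):
        return np.zeros((25, 3))
    flat = frame['people'][0]['pose_keypoints_2d']
    return np.asarray(flat, dtype=float).reshape(25, 3)

# usage: frame = json.load(open('video_000000000000_keypoints.json'))
demo = {'people': [{'pose_keypoints_2d': [0.0] * 75}]}
print(body25_keypoints(demo).shape)  # (25, 3)
```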
What are sequences based on terrain interaction?
Hi Davis,
Thanks for this great work! After doing some tests on videos with motions like waving hands, it seems that HuMoR over-smooths the motions, so the range of motion becomes smaller. Is it possible to deal with this by changing the config file?
I ran the demo...
Traceback (most recent call last):
  File "humor/fitting/run_fitting.py", line 439, in <module>
    main(args, config_file)
  File "humor/fitting/run_fitting.py", line 175, in main
    video_name=vid_name
  File "/home/junwei/zjw/humor/humor/fitting/../datasets/rgb_dataset.py", line 59, in __init__
    self.data_dict, self.seq_intervals = self.load_data()
  File "/home/junwei/zjw/humor/humor/fitting/../datasets/rgb_dataset.py", line 146, in load_data
    joint2d_data = np.stack(keyp_frames, axis=0) # T x J x 3 (x,y,conf)
  File "<__array_function__ internals>", line 6, in stack
  File "/home/junwei/anaconda3/envs/humor_env/lib/python3.7/site-packages/numpy/core/shape_base.py", line 422, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack
What happened?
When I build an RGBVideoDataset from a video, I find that the design of overlap_len seems wrong.
Assume seq_len is 5 and overlap_len is 2. The first subsequence's seq_interval should be (0, 5) and the second's should be (3, 8); only then is the overlap length 2, i.e. the overlapping part is frames 3 and 4.
But your code actually yields (0, 5) and (2, 7), so the overlap length is 3: the overlapping part is frames 2, 3, and 4.
3 is not equal to 2!
So, are you sure the variable overlap_len stands for the length of the overlap?
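The behavior being argued for can be sketched as a window generator where consecutive windows share exactly overlap_len frames, i.e. the step is seq_len - overlap_len (a sketch of the reporter's expectation, not the repo's code):

```python
def make_intervals(num_frames, seq_len, overlap_len):
    """Sliding windows where consecutive windows share exactly
    overlap_len frames: the step between starts is seq_len - overlap_len."""
    step = seq_len - overlap_len
    return [(s, s + seq_len) for s in range(0, num_frames - seq_len + 1, step)]

print(make_intervals(10, 5, 2))  # [(0, 5), (3, 8)]: overlap is frames 3 and 4
```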
If we train the model with detached gradients between neighbors, or with teacher forcing, can I use data from different motion sequences? Under such circumstances, would it be better to use randomly sampled pairs from different sequences (B x D) rather than feeding in a sequence (B x T x D)?
Hi Davis,
I was wondering how I can process the AMASS data (for example, HumanEva) to fit the HuMoR model to the 3D keypoints?
Thanks
This is a great project, thanks for your open source, I have two questions
What is the issue this line is referring to
humor/humor/fitting/run_fitting.py
Line 276 in b86c2d9
Thanks a ton!
Hi Davis,
Thanks for this great work. I have tested the model with the demo video and the performance is great. However, when I test it with a video captured by a mobile phone, the mesh seems to jitter strangely across frames, and the pose and shape are also strange. I have checked that the 2D points produced by OpenPose are correct. The video I used and the result are shown below. I am wondering if this is caused by the camera intrinsics, since I use the default intrinsics, or if something goes wrong when I use ./configs/fit_rgb_demo_use_split.cfg during the fitting process.
I tried to run the demo with python humor/fitting/run_fitting.py @./configs/fit_rgb_demo_no_split.cfg, but a memory error appeared:
F0303 11:12:23.966413 8246 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0) out of memory
However, OpenPose itself ran perfectly.
Here is my GPU description and memory:
description: VGA compatible controller
product: TU104 [GeForce RTX 2070 SUPER]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:156 memory:b3000000-b3ffffff memory:a0000000-afffffff memory:b0000000-b1ffffff ioport:4000(size=128) memory:b4000000-b407ffff
Dedicated video memory: 8192 MB
Total available memory: 8192 MB
Currently available dedicated video memory: 7664 MB
Hello and thank you for open sourcing your research. This is really good!
I'd like to know if you already have a script to retarget the output onto the SMPL .fbx template?
If such a script does not exist, do you have documentation describing the mapping between the output and the SMPL skeleton?
Thank you
Hi Davis,
I found that offscreen rendering doesn't work with the required installation. My workaround is to 1) reinstall pyrender with OSMesa and 2) add os.environ["PYOPENGL_PLATFORM"] = "osmesa"
to viz_fitting_rgb.py.
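The env-var part of the workaround looks like this; the key point is that it must run before pyrender is imported anywhere (and it requires a pyrender installation built with OSMesa support):

```python
import os

# Force the OSMesa software backend for offscreen rendering.
# This must be set before any `import pyrender` executes.
os.environ["PYOPENGL_PLATFORM"] = "osmesa"
```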
Then there is a related issue. Even though it is set to "RGBA" by default, my renderer only generates the color_img
with three channels, which leads to an error when computing the valid_mask
here, and then in the composition here.
My fix in mesh_viewer.py
is as follows:
input_img = self.cur_bg_img
if color_img.shape[2] == 4:
    output_img = (color_img[:, :, :-1] * color_img[:, :, 3:] +
                  (1.0 - color_img[:, :, 3:]) * input_img)
else:
    valid_mask = (depth_img > 0)[:, :, np.newaxis]
    output_img = (color_img * valid_mask + (1 - valid_mask) * input_img)
-Jimei
Hi, before the question: thanks for the great work!
While running the fitting on RGB videos (test-time optimization), I get the following error:
======= iter 69 =======
Optimized sequence 0 in 461.702792 s
Saving final results...
Traceback (most recent call last):
  File "humor/fitting/run_fitting.py", line 439, in <module>
    main(args, config_file)
  File "humor/fitting/run_fitting.py", line 433, in main
    body_model_path, num_betas, use_joints2d)
  File "/home/humor/humor/fitting/../fitting/fitting_utils.py", line 434, in save_rgb_stitched_result
    concat_cam_res[k] = torch.cat([concat_cam_res[k], cur_stage3_res[k][seq_overlaps[res_idx]:]], dim=0)
IndexError: list index out of range
I used the base command that you provided:
python humor/fitting/run_fitting.py @./configs/fit_rgb_demo_no_split.cfg
I can't figure out why; can you help?
Thanks