Comments (14)
@dhecloud We predict (u, v, d) directly. _transform_pose
is used to convert normalized (u, v, d) coordinates in the cropped image into (u, v, d) in full-image coordinates.
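For illustration only (hypothetical names and crop-box parameters, not the actual _transform_pose implementation), a conversion from crop coordinates back to full-image coordinates could look like:

```python
import numpy as np

# Hypothetical sketch: map (u, v) predicted inside a 96x96 crop back to
# full-image pixel coordinates, given the crop's bounding box in the full
# image. All names here are illustrative, not from the repo.
def crop_to_full(joints_uvd, xstart, ystart, xend, yend, crop_size=96):
    out = joints_uvd.copy()
    out[:, 0] = out[:, 0] / crop_size * (xend - xstart) + xstart
    out[:, 1] = out[:, 1] / crop_size * (yend - ystart) + ystart
    return out  # d stays in depth units; only u, v are rescaled
```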
from region-ensemble-network.
@dhecloud Hi, your understanding is right. The augmentations are all 2D transformations, so they can easily be applied to the cropped 2D image just like in other 2D image tasks. Note that the labels should also be changed according to the transformation.
@guohengkai Thanks for your reply.
I have some confusion regarding the MSRA dataset. In your code the bin files are loaded into a numpy array of shape (240, 320), which I assume to be the y and x coordinates of the image, with a single channel for depth.
However, in joints.txt the ground-truth (x, y, z) values are negative for x and y. For example, x for P0/5/00000_depth.bin is -0.747919 and y is -51.5306. Shouldn't the ground truth be >= 0, or am I interpreting this wrongly?
@dhecloud I think this is because the hands in some images of the MSRA dataset are out of the FOV. You can view the image to check. I'm not completely sure, though.
@dhecloud In guohengkai's code, the labels are in the format (u, v, d), where u and v are pixel coordinates in the image. For example, u \in [0, 319] and v \in [0, 239] for the MSRA dataset. You should first convert the ground-truth labels from (x, y, z) to (u, v, d).
@guohengkai @xinghaochen Oh, I see. That clarifies a lot, as I thought xyz and uvd were interchangeable.
One more question. In your training, do you:
- predict (u,v,d) directly,
- or predict (x,y,z) then convert to (u,v,d) when drawing the pose?
It seems like you did the second one then used _transform_pose
to change to (u,v,d). Just wanted to clarify.
Thanks a lot for your help!!
@xinghaochen @guohengkai Thanks a lot for your help!! Time to start training. :)
@xinghaochen Hi, could I check whether my way of converting to uvd is correct? I got the formula from your other repo that collects all the hand pose research.
def world2pixel(x):
    # project world coordinates (x, y, z) in-place to pixel coordinates (u, v, d)
    fx, fy, ux, uy = 241.42, 241.42, 160, 120
    x[:, 0] = x[:, 0] * fx / x[:, 2] + ux
    x[:, 1] = x[:, 1] * fy / x[:, 2] + uy
    return x
When I tried visualizing the ground truth with OpenCV, it gives me weird thumb joints.
Thank you so much for your help!!
EDIT: On closer inspection, it seems world2pixel is giving me horizontally flipped coordinates, i.e. if I flip the depth image and then draw (u, v, d), it works well.
Is this normal, or did I miss some step in the conversion to (u, v, d)?
@dhecloud Hi, it seems the sample images you posted are from the MSRA dataset. As far as I can remember, the coordinate system of the MSRA pose annotations is a bit different from the traditional one. I first multiply y and z by -1 and then convert xyz to uvd, which matches your situation of horizontally flipped coordinates.
It's OK to transform the coordinates as long as they correspond to the depth image.
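As a sketch of that fix (my own reading of the comment above, using the MSRA intrinsics quoted earlier in this thread; not the authors' exact code), negate y and z before projecting with the pinhole model:

```python
import numpy as np

# Assumed MSRA intrinsics, as quoted in this thread.
FX, FY, UX, UY = 241.42, 241.42, 160, 120

def msra_world2pixel(joints_xyz):
    # Illustrative: negate y and z to match the conventional camera frame,
    # then project (x, y, z) to pixel coordinates (u, v, d).
    joints = joints_xyz.astype(np.float64).copy()
    joints[:, 1] *= -1  # MSRA's y axis points the opposite way
    joints[:, 2] *= -1  # and so does its z axis
    uvd = np.empty_like(joints)
    uvd[:, 0] = joints[:, 0] * FX / joints[:, 2] + UX
    uvd[:, 1] = joints[:, 1] * FY / joints[:, 2] + UY
    uvd[:, 2] = joints[:, 2]
    return uvd
```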
@xinghaochen Hi, thanks for your help. Just a minor follow-up: what probability did you use for transforming the input depth? I set mine to a 60% chance of a random translate/rotate/scale, but the training does not seem to be going well.
from region-ensemble-network.
@dhecloud Hi, we set the probability to 100%, that is, each sample will go through random translation, rotation and scaling before being fed into the network for training.
How is the performance without data augmentation? You may first make sure training without augmentation works well and then add the augmentation, so you can tell whether the problem comes from the augmentation or not. If it does, what ranges did you use for the random translation/rotation/scaling?
@xinghaochen Without augmentation, the smooth-L1 loss goes down to around 6-10 after 150 epochs. What was your loss at the end of training? I will probably redo all the training.
I tried to follow the parameters in your paper: random horizontal and vertical translation in [-10, 10] pixels, and rotation in [-180, 180] degrees. For scaling, I did not draw uniformly between 0.9 and 1.1, but instead randomly picked one of [0.9, 0.96, 1, 1.04, 1.1].
Could I clarify something? I read in this issue that you did the augmentation after cropping to 96x96, and I did it the same way. For example, if I translate the 96x96 input to the right by 10 pixels, I add 10 to the u coordinate of each corresponding joint. Is this right? Does translating the resized 96x96 depth image 10 pixels to the right correspond to the joints moving 10 pixels to the right too?
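One way to sketch this (illustrative helper names, not the repo's code): build a single 2x3 affine matrix for the translate/rotate/scale about the crop centre, warp the depth image with it (e.g. via cv2.warpAffine), and push the joint (u, v) labels through the same matrix, so a 10-pixel shift of the image shifts every joint by 10 pixels too. The matrix below follows the cv2.getRotationMatrix2D convention.

```python
import numpy as np

# Hypothetical sketch of the label update for a 2D augmentation on the crop.
def make_affine(tx, ty, angle_deg, scale, size=96):
    # 2x3 rotation+scale matrix about the crop centre, plus a translation.
    cx = cy = (size - 1) / 2.0
    a = scale * np.cos(np.radians(angle_deg))
    b = scale * np.sin(np.radians(angle_deg))
    return np.array([[a, b, (1 - a) * cx - b * cy + tx],
                     [-b, a, b * cx + (1 - a) * cy + ty]])

def transform_joints(joints_uvd, M):
    # apply the SAME affine to the (u, v) labels that warped the image
    out = joints_uvd.copy()
    uv1 = np.hstack([out[:, :2], np.ones((len(out), 1))])
    out[:, :2] = uv1 @ M.T  # d is left unchanged by a 2D transform
    return out
```

With a pure 10-pixel horizontal translation (make_affine(10, 0, 0, 1)), every joint's u increases by exactly 10, which is the behaviour the question describes.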
@dhecloud Have you normalized the joints according to the cropping? If so, then what you have done is right.
@guohengkai Hi, no, I don't think so. All I did was convert xyz to uvd for the joints. This might be the reason. The process is the same as in _crop_image, right?
Edit: this is my code for normalizing the joints:
import cv2
import numpy as np

def _normalize_joints(joints, center, is_debug=False):
    # map joint (u, v) coordinates from the full image into the 96x96 crop
    _fx, _fy, _ux, _uy = 241.42, 241.42, 160, 120
    _cube_size = 150
    _input_size = 96
    xstart = center[0] - _cube_size / center[2] * _fx
    xend = center[0] + _cube_size / center[2] * _fx
    ystart = center[1] - _cube_size / center[2] * _fy
    yend = center[1] + _cube_size / center[2] * _fy
    src = [(xstart, ystart), (xstart, yend), (xend, ystart)]
    dst = [(0, 0), (0, _input_size - 1), (_input_size - 1, 0)]
    trans = cv2.getAffineTransform(np.array(src, dtype=np.float32),
                                   np.array(dst, dtype=np.float32))
    joints = get_translated_points(joints.reshape(21, 3), trans)
    return joints

def get_translated_points(joints, M):
    # get the new (u, v) coordinates after applying the 2x3 affine matrix M
    for i in range(len(joints)):
        x = joints[i][0]
        y = joints[i][1]
        joints[i][0] = M[0, 0] * x + M[0, 1] * y + M[0, 2]
        joints[i][1] = M[1, 0] * x + M[1, 1] * y + M[1, 2]
    return joints
I drew it on the 96x96 crop and it looks fine. Just a question, though: the d in (u, v, d) is largely untouched; d is usually in the 200-300 range while u and v are in 0-96. Is this difference fine? The u and v coordinates are the most important when predicting.
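For what it's worth, one common convention in hand-pose pipelines (an assumption on my part, not necessarily what guohengkai's code does) is to normalize d as well, so all three label components sit in a comparable range for the smooth-L1 loss:

```python
import numpy as np

# Assumed convention, not the repo's code: scale u, v by the crop size and
# express d relative to the crop centre depth and cube size.
INPUT_SIZE, CUBE_SIZE = 96, 150

def normalize_labels(joints_uvd, center_d):
    out = joints_uvd.astype(np.float64).copy()
    out[:, :2] /= INPUT_SIZE                         # u, v -> [0, 1]
    out[:, 2] = (out[:, 2] - center_d) / CUBE_SIZE   # d -> roughly [-1, 1]
    return out
```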