Marker

Implement and play with marker detection.

The techniques are from the book "Augmented Reality: Principles and Practice" and from online resources (referenced in code comments).


PC Version

Spiderman! 😨😁😝

(demo GIF: markerPC2)


Marker Detection

The marker I'm using looks like this:

(marker image)

Detailed Steps
  1. Convert the image to grayscale in a GPU compute shader
    Dot product of the image color with (0.299, 0.587, 0.114)
    Optionally, a simple blur is also applied to stabilize the image

  2. Convert to a binary image by thresholding in a GPU compute shader
    The threshold value comes from the top mipmap level (an automatic average), or can be configured manually
    (a CPU sketch of steps 1-2 follows this list)

  3. Trace a single closed contour on the CPU
    Using Theo Pavlidis' algorithm
    Thresholds are applied to filter out small and non-rectangular contours

  4. Fit a quadrilateral to the closed contour and get its 4 corners
    Following the algorithm described in Chapter 4 of the book
    Similar to the Ramer–Douglas–Peucker algorithm used by OpenCV's approxPolyDP, but simpler

  5. Determine the marker orientation by sampling near the corners
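
Steps 1 and 2 run in a GPU compute shader in this repo; the following is only a minimal CPU sketch of the same per-pixel math (the ImageRGBA struct and function name are made up for illustration):

#include <cstdint>
#include <vector>

// Hypothetical 8-bit RGBA image, row-major (4 bytes per pixel).
struct ImageRGBA {
    int width = 0, height = 0;
    std::vector<uint8_t> data;
};

// Steps 1-2: luminance = dot(color, (0.299, 0.587, 0.114)), then threshold to binary.
std::vector<uint8_t> binarize(const ImageRGBA& img, float threshold /* 0..255 */) {
    std::vector<uint8_t> out(img.width * img.height);
    for (int i = 0; i < img.width * img.height; ++i) {
        const uint8_t* px = &img.data[4 * i];
        float gray = 0.299f * px[0] + 0.587f * px[1] + 0.114f * px[2];
        out[i] = gray > threshold ? 255 : 0;
    }
    return out;
}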

Pose Estimation

Starting from here, you will need to calibrate your webcam to obtain its intrinsic matrix and distortion coefficients. I use OpenCV for this step since it is convenient. Details can be found in OpenCV/camera.py.
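
The calibration helper in this repo is the Python script OpenCV/camera.py; purely as an illustration of the same step with OpenCV's C++ API (the checkerboard size, square size, and calib/*.jpg path below are made-up examples), it looks roughly like this:

#include <iostream>
#include <vector>
#include <opencv2/opencv.hpp>

int main() {
    cv::Size boardSize(9, 6);   // inner corners of the checkerboard (example)
    float squareSize = 0.025f;  // side length of one square in meters (example)

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;
    cv::Size imageSize;

    std::vector<cv::String> files;
    cv::glob("calib/*.jpg", files);  // hypothetical folder of calibration photos
    for (const auto& f : files) {
        cv::Mat img = cv::imread(f, cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        imageSize = img.size();

        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(img, boardSize, corners)) continue;
        cv::cornerSubPix(img, corners, cv::Size(11, 11), cv::Size(-1, -1),
                         cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.001));

        std::vector<cv::Point3f> obj;
        for (int y = 0; y < boardSize.height; ++y)
            for (int x = 0; x < boardSize.width; ++x)
                obj.emplace_back(x * squareSize, y * squareSize, 0.0f);

        imagePoints.push_back(corners);
        objectPoints.push_back(obj);
    }
    if (imagePoints.empty()) return 1;

    cv::Mat K, distCoeffs;  // intrinsic matrix and distortion coefficients
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     K, distCoeffs, rvecs, tvecs);
    std::cout << "RMS reprojection error: " << rms << "\nK =\n" << K
              << "\ndist = " << distCoeffs << std::endl;
    return 0;
}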

Some Math

Say you have the 3x3 camera matrix K, and the 4 detected marker corners on the camera image are p (in homogeneous coordinates).
The corresponding corners in virtual world space are q (usually something like [1,1,0,1]).
What we are looking for is the 3x4 transformation matrix M (rotation R and translation t) such that:

p = K M q
M = [R t]
  = [r1 r2 r3 t]

The fact that all corners lie on a plane allows us to eliminate the Z axis and use homography estimation:

p = H q'

For example, if q is [qx,qy,0,1], then q' is [qx,qy,1].
Since p and H q' point in the same direction (they are equal up to scale), their cross product is zero:

p x (H q') = 0

Expanding the left side gives the following linear system:

[
   q1.x, q1.y, 1.0f, 0.0f, 0.0f, 0.0f, -p1.x*q1.x, -p1.x*q1.y, -p1.x,
   0.0f, 0.0f, 0.0f, q1.x, q1.y, 1.0f, -p1.y*q1.x, -p1.y*q1.y, -p1.y,
   .....
   q4.x, q4.y, 1.0f, 0.0f, 0.0f, 0.0f, -p4.x*q4.x, -p4.x*q4.y, -p4.x,
   0.0f, 0.0f, 0.0f, q4.x, q4.y, 1.0f, -p4.y*q4.x, -p4.y*q4.y, -p4.y,
] * 
[
   h1,h2,h3,h4,h5,h6,h7,h8,h9
] = 0

H = 
[
   h1 h2 h3
   h4 h5 h6
   h7 h8 h9
]

The matrix A on the left is 8x9 and h is a vector of size 9.
Run SVD on A to get A = U D V^T; h is the right singular vector associated with the smallest singular value, i.e. the last column of V.
Reshape h into the 3x3 matrix H.
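
A minimal sketch of assembling A and solving for H (Eigen is used here purely for illustration; the repo has its own matrix code):

#include <array>
#include <Eigen/Dense>

// p: detected corners in pixel coordinates, q: marker-plane corners (qx, qy).
Eigen::Matrix3d estimateHomography(const std::array<Eigen::Vector2d, 4>& p,
                                   const std::array<Eigen::Vector2d, 4>& q) {
    Eigen::Matrix<double, 8, 9> A;
    for (int i = 0; i < 4; ++i) {
        double qx = q[i].x(), qy = q[i].y();
        double px = p[i].x(), py = p[i].y();
        A.row(2 * i)     << qx, qy, 1, 0, 0, 0, -px * qx, -px * qy, -px;
        A.row(2 * i + 1) << 0, 0, 0, qx, qy, 1, -py * qx, -py * qy, -py;
    }
    // h is the right singular vector with the smallest singular value (last column of V).
    Eigen::JacobiSVD<Eigen::Matrix<double, 8, 9>> svd(A, Eigen::ComputeFullV);
    Eigen::Matrix<double, 9, 1> h = svd.matrixV().col(8);
    Eigen::Matrix3d H;
    H << h(0), h(1), h(2),
         h(3), h(4), h(5),
         h(6), h(7), h(8);
    return H;
}

With H reconstructed, R and t are extracted from it as follows: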

HK = K^-1 H
n = ( norm(HK.col0) + norm(HK.col1) ) / 2
t = HK.col2 / n
r1 = normalize( HK.col0 )
r2 = normalize( HK.col1 )
r3 = normalize( r1 x r2 )
R = [r1 r2 r3]
M = [R t]
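
Continuing the sketch above (same illustrative Eigen code, not the repo's actual implementation), the decomposition looks like this:

void decomposeHomography(const Eigen::Matrix3d& H, const Eigen::Matrix3d& K,
                         Eigen::Matrix3d& R, Eigen::Vector3d& t) {
    Eigen::Matrix3d HK = K.inverse() * H;
    double n = (HK.col(0).norm() + HK.col(1).norm()) / 2.0;
    t = HK.col(2) / n;
    Eigen::Vector3d r1 = HK.col(0).normalized();
    Eigen::Vector3d r2 = HK.col(1).normalized();
    Eigen::Vector3d r3 = r1.cross(r2).normalized();
    R.col(0) = r1;
    R.col(1) = r2;
    R.col(2) = r3;
    // M = [R | t] is the 3x4 pose. R is only approximately orthonormal here,
    // which is one reason the refinement step below helps.
}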

Note that there are many ways to do this; here I illustrate a simple one I found online.
Also, you may need to undistort p first using the camera distortion coefficients (details can be found in the OpenCV source code of the function undistortPoints).

Finally, for any new point b in 3D world space, its position on screen is computed from K M b (followed by the perspective divide).
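
For instance (same illustrative Eigen types as above):

// Project a world-space point b = [x, y, z, 1] to pixel coordinates.
Eigen::Vector2d project(const Eigen::Matrix3d& K,
                        const Eigen::Matrix<double, 3, 4>& M,
                        const Eigen::Vector4d& b) {
    Eigen::Vector3d s = K * (M * b);  // homogeneous screen position
    return s.hnormalized();           // divide by the last component
}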

Further details of my implementation can be found in PC/src/markerpose.cpp.

Pose Refinement

However, with only homography estimation the result may have a large pixel error, since the SVD solution only approximately satisfies the equations above once the detected corners are noisy.

As suggested by the book, I implemented the Levenberg-Marquardt algorithm to refine the matrix M.

Some More Math

Let's say the objective is to minimize:

sum ( p - K M q )^2

This is equivalent to:

sum ( kp - M q )^2
kp = K^-1 p

where M is the variable to update.
Suppose an update d to M moves the estimate closer to the local minimum; the linearized objective becomes:

sum ( kp - M q - J d )^2

where J is the Jacobian of M q with respect to the entries of M.
Taking the derivative with respect to d and setting it to zero gives the normal equations:

( J^T J ) d = J^T ( kp - M q )

Levenberg's version:

( J^T J + lambda diag( J^T J ) ) d = J^T ( kp - M q )

where lambda is a damping factor adjusted at each iteration.
Finally, iteratively solve for d and update M until the maximum number of iterations is reached or the error is small enough.
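
A minimal sketch of this loop (Eigen again, purely for illustration; the actual code in PC/src/markerLM.cpp may be organized differently). Because every marker corner has q.z = 0, only the 3x3 part P = [r1 r2 t] of M is constrained by the corners, so the sketch refines those 9 entries against q' = [qx, qy, 1] and kp = K^-1 p:

#include <array>
#include <Eigen/Dense>

Eigen::Matrix3d refinePose(Eigen::Matrix3d P,                         // initial [r1 r2 t]
                           const std::array<Eigen::Vector3d, 4>& kp,  // K^-1 * [px, py, 1]
                           const std::array<Eigen::Vector3d, 4>& qp,  // [qx, qy, 1]
                           int maxIters = 20, double lambda = 1e-3) {
    auto residual = [&](const Eigen::Matrix3d& Pi) {
        Eigen::Matrix<double, 12, 1> r;
        for (int i = 0; i < 4; ++i) r.segment<3>(3 * i) = kp[i] - Pi * qp[i];
        return r;
    };

    // Jacobian of P*q' with respect to the 9 entries of P (row-major).
    // Since P*q' is linear in P, J stays constant across iterations.
    Eigen::Matrix<double, 12, 9> J = Eigen::Matrix<double, 12, 9>::Zero();
    for (int i = 0; i < 4; ++i)
        for (int row = 0; row < 3; ++row)
            J.block<1, 3>(3 * i + row, 3 * row) = qp[i].transpose();

    double err = residual(P).squaredNorm();
    for (int it = 0; it < maxIters && err > 1e-12; ++it) {
        Eigen::Matrix<double, 12, 1> r = residual(P);
        Eigen::Matrix<double, 9, 9> JtJ = J.transpose() * J;
        Eigen::Matrix<double, 9, 9> A = JtJ;
        A.diagonal() += lambda * JtJ.diagonal();                   // Levenberg damping
        Eigen::Matrix<double, 9, 1> d = A.ldlt().solve(J.transpose() * r);

        // Reshape d into a 3x3 update and accept it only if the error drops.
        Eigen::Matrix3d Pnew = P;
        for (int row = 0; row < 3; ++row) Pnew.row(row) += d.segment<3>(3 * row).transpose();
        double newErr = residual(Pnew).squaredNorm();
        if (newErr < err) { P = Pnew; err = newErr; lambda *= 0.1; }  // accept, trust more
        else              { lambda *= 10.0; }                         // reject, damp more
    }
    return P;  // r3 = normalize(r1 x r2) can be recomputed from the refined columns
}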

Further details can be found in PC/src/markerLM.cpp.

Some Debugging

The above does not work perfectly in my implementation.
When the Y-axis value (in OpenGL coordinates) increases, the projected points drift toward the lower left (at 45 degrees), independent of the camera rotation.
After a long time of debugging I still can't figure out why, but a quick fix is the following:

// project the point with the estimated pose (K M)
vec3 screenPos = poseM * pos;
// empirical fix: shift XY by half of Z
screenPos.xy -= 0.5 * screenPos.z;

After projecting a point, shift its XY values by half of its Z value, and the result looks correct.
The bug is most likely caused by the transformation matrix M, which traces back to the homography approximation and decomposition. There is a lot more to learn before I understand why.
