Marker

Implement and play with marker detection.

The techniques are from the book "Augmented Reality: Principles and Practice" and from online resources (referenced in code comments).


PC Version

Spiderman! 😨😁😝

(demo GIF: markerPC2)


Marker Detection

The marker I'm using looks like this:

(marker image)

Detailed Steps
  1. Convert the image to grayscale in a GPU compute shader
    Dot product of the image color with (0.299, 0.587, 0.114)
    Optionally, a simple blur is also applied to stabilize the image

  2. Convert to a binary image by thresholding in a GPU compute shader
    The threshold value comes from the top mipmap level (an automatic average), or can be configured manually
    (a CPU sketch of steps 1-2 follows this list)

  3. Trace a single closed contour on the CPU
    Using Theo Pavlidis' algorithm
    Thresholds are applied to filter out small and non-rectangular contours

  4. Fit a quadrilateral to the closed contour and get its 4 corners
    Following the algorithm described in Chapter 4 of the book
    Similar to the Ramer–Douglas–Peucker algorithm used by OpenCV's approxPolyDP, but simpler

  5. Determine the marker orientation by sampling near the corners
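
Steps 1 and 2 run in a GPU compute shader in this repo; the following is only a minimal CPU sketch of the same per-pixel math (the ImageRGBA struct and function name are made up for illustration):

#include <cstdint>
#include <vector>

// Hypothetical 8-bit RGBA image, row-major (4 bytes per pixel).
struct ImageRGBA {
    int width = 0, height = 0;
    std::vector<uint8_t> data;
};

// Steps 1-2: luminance = dot(color, (0.299, 0.587, 0.114)), then threshold to binary.
std::vector<uint8_t> binarize(const ImageRGBA& img, float threshold /* 0..255 */) {
    std::vector<uint8_t> out(img.width * img.height);
    for (int i = 0; i < img.width * img.height; ++i) {
        const uint8_t* px = &img.data[4 * i];
        float gray = 0.299f * px[0] + 0.587f * px[1] + 0.114f * px[2];
        out[i] = gray > threshold ? 255 : 0;
    }
    return out;
}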

Pose Estimation

Starting from here, you will need to calibrate your webcam to obtain its intrinsic matrix and distortion coefficients. I use OpenCV for this step since it is convenient. Details can be found in OpenCV/camera.py.
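
The calibration helper in this repo is the Python script OpenCV/camera.py; purely as an illustration of the same step with OpenCV's C++ API (the checkerboard size, square size, and calib/*.jpg path below are made-up examples), it looks roughly like this:

#include <iostream>
#include <vector>
#include <opencv2/opencv.hpp>

int main() {
    cv::Size boardSize(9, 6);   // inner corners of the checkerboard (example)
    float squareSize = 0.025f;  // side length of one square in meters (example)

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;
    cv::Size imageSize;

    std::vector<cv::String> files;
    cv::glob("calib/*.jpg", files);  // hypothetical folder of calibration photos
    for (const auto& f : files) {
        cv::Mat img = cv::imread(f, cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        imageSize = img.size();

        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(img, boardSize, corners)) continue;
        cv::cornerSubPix(img, corners, cv::Size(11, 11), cv::Size(-1, -1),
                         cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.001));

        std::vector<cv::Point3f> obj;
        for (int y = 0; y < boardSize.height; ++y)
            for (int x = 0; x < boardSize.width; ++x)
                obj.emplace_back(x * squareSize, y * squareSize, 0.0f);

        imagePoints.push_back(corners);
        objectPoints.push_back(obj);
    }
    if (imagePoints.empty()) return 1;

    cv::Mat K, distCoeffs;  // intrinsic matrix and distortion coefficients
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     K, distCoeffs, rvecs, tvecs);
    std::cout << "RMS reprojection error: " << rms << "\nK =\n" << K
              << "\ndist = " << distCoeffs << std::endl;
    return 0;
}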

Some Math

Say you have the 3x3 camera matrix K, and the 4 detected marker corners on the camera image are p (in homogeneous coordinates).
The corresponding corners in virtual world space are q (usually something like [1,1,0,1]).
What we are looking for is the 3x4 transformation matrix M (rotation R and translation t) such that:

p = K M q
M = [R t]
  = [r1 r2 r3 t]

The fact that all corners lie on a plane allows us to eliminate the Z axis and use homography estimation:

p = H q'

For example, if q is [qx,qy,0,1], then q' is [qx,qy,1].
Since p and H q' point in the same direction (they are equal up to scale), their cross product is zero:

p x (H q') = 0

Expanding the left side gives the following linear system:

[
   q1.x, q1.y, 1.0f, 0.0f, 0.0f, 0.0f, -p1.x*q1.x, -p1.x*q1.y, -p1.x,
   0.0f, 0.0f, 0.0f, q1.x, q1.y, 1.0f, -p1.y*q1.x, -p1.y*q1.y, -p1.y,
   .....
   q4.x, q4.y, 1.0f, 0.0f, 0.0f, 0.0f, -p4.x*q4.x, -p4.x*q4.y, -p4.x,
   0.0f, 0.0f, 0.0f, q4.x, q4.y, 1.0f, -p4.y*q4.x, -p4.y*q4.y, -p4.y,
] * 
[
   h1,h2,h3,h4,h5,h6,h7,h8,h9
] = 0

H = 
[
   h1 h2 h3
   h4 h5 h6
   h7 h8 h9
]

The matrix A on the left is 8x9 and h is a vector of size 9.
Run SVD on A to get A = U D V^T; h is the right singular vector associated with the smallest singular value, i.e. the last column of V.
Reshape h into the 3x3 matrix H.
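
A minimal sketch of assembling A and solving for H (Eigen is used here purely for illustration; the repo has its own matrix code):

#include <array>
#include <Eigen/Dense>

// p: detected corners in pixel coordinates, q: marker-plane corners (qx, qy).
Eigen::Matrix3d estimateHomography(const std::array<Eigen::Vector2d, 4>& p,
                                   const std::array<Eigen::Vector2d, 4>& q) {
    Eigen::Matrix<double, 8, 9> A;
    for (int i = 0; i < 4; ++i) {
        double qx = q[i].x(), qy = q[i].y();
        double px = p[i].x(), py = p[i].y();
        A.row(2 * i)     << qx, qy, 1, 0, 0, 0, -px * qx, -px * qy, -px;
        A.row(2 * i + 1) << 0, 0, 0, qx, qy, 1, -py * qx, -py * qy, -py;
    }
    // h is the right singular vector with the smallest singular value (last column of V).
    Eigen::JacobiSVD<Eigen::Matrix<double, 8, 9>> svd(A, Eigen::ComputeFullV);
    Eigen::Matrix<double, 9, 1> h = svd.matrixV().col(8);
    Eigen::Matrix3d H;
    H << h(0), h(1), h(2),
         h(3), h(4), h(5),
         h(6), h(7), h(8);
    return H;
}

With H reconstructed, R and t are extracted from it as follows: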

HK = K^-1 H
n = ( norm(HK.col0) + norm(HK.col1) ) / 2
t = HK.col2 / n
r1 = normalize( HK.col0 )
r2 = normalize( HK.col1 )
r3 = normalize( r1 x r2 )
R = [r1 r2 r3]
M = [R t]
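
Continuing the sketch above (same illustrative Eigen code, not the repo's actual implementation), the decomposition looks like this:

void decomposeHomography(const Eigen::Matrix3d& H, const Eigen::Matrix3d& K,
                         Eigen::Matrix3d& R, Eigen::Vector3d& t) {
    Eigen::Matrix3d HK = K.inverse() * H;
    double n = (HK.col(0).norm() + HK.col(1).norm()) / 2.0;
    t = HK.col(2) / n;
    Eigen::Vector3d r1 = HK.col(0).normalized();
    Eigen::Vector3d r2 = HK.col(1).normalized();
    Eigen::Vector3d r3 = r1.cross(r2).normalized();
    R.col(0) = r1;
    R.col(1) = r2;
    R.col(2) = r3;
    // M = [R | t] is the 3x4 pose. R is only approximately orthonormal here,
    // which is one reason the refinement step below helps.
}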

Note that there are many ways to do this; here I illustrate a simple one I found online.
Also, you may need to undistort p first using the camera distortion coefficients (details can be found in the OpenCV source code of the function undistortPoints).

Finally, for any new point b in 3D world space, its position on screen is computed from K M b (followed by the perspective divide).
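
For instance (same illustrative Eigen types as above):

// Project a world-space point b = [x, y, z, 1] to pixel coordinates.
Eigen::Vector2d project(const Eigen::Matrix3d& K,
                        const Eigen::Matrix<double, 3, 4>& M,
                        const Eigen::Vector4d& b) {
    Eigen::Vector3d s = K * (M * b);  // homogeneous screen position
    return s.hnormalized();           // divide by the last component
}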

Further details of my implementation can be found in PC/src/markerpose.cpp.

Pose Refinement

However, with only homography estimation the result may have a large pixel error, since the SVD solution only approximately satisfies the equations above once the detected corners are noisy.

As suggested by the book, I implemented the Levenberg-Marquardt algorithm to refine the matrix M.

Some More Math

Let's say the objective is to minimize:

sum ( p - K M q )^2

This is equivalent to:

sum ( kp - M q )^2
kp = K^-1 p

where M is the variable to update.
Suppose an update d to M moves the estimate closer to the local minimum; the linearized objective becomes:

sum ( kp - M q - J d )^2

where J is the Jacobian of M q with respect to the entries of M.
Taking the derivative with respect to d and setting it to zero gives the normal equations:

( J^T J ) d = J^T ( kp - M q )

Levenberg's version:

( J^T J + lambda diag( J^T J ) ) d = J^T ( kp - M q )

where lambda is a damping factor adjusted at each iteration.
Finally, iteratively solve for d and update M until the maximum number of iterations is reached or the error is small enough.
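
A minimal sketch of this loop (Eigen again, purely for illustration; the actual code in PC/src/markerLM.cpp may be organized differently). Because every marker corner has q.z = 0, only the 3x3 part P = [r1 r2 t] of M is constrained by the corners, so the sketch refines those 9 entries against q' = [qx, qy, 1] and kp = K^-1 p:

#include <array>
#include <Eigen/Dense>

Eigen::Matrix3d refinePose(Eigen::Matrix3d P,                         // initial [r1 r2 t]
                           const std::array<Eigen::Vector3d, 4>& kp,  // K^-1 * [px, py, 1]
                           const std::array<Eigen::Vector3d, 4>& qp,  // [qx, qy, 1]
                           int maxIters = 20, double lambda = 1e-3) {
    auto residual = [&](const Eigen::Matrix3d& Pi) {
        Eigen::Matrix<double, 12, 1> r;
        for (int i = 0; i < 4; ++i) r.segment<3>(3 * i) = kp[i] - Pi * qp[i];
        return r;
    };

    // Jacobian of P*q' with respect to the 9 entries of P (row-major).
    // Since P*q' is linear in P, J stays constant across iterations.
    Eigen::Matrix<double, 12, 9> J = Eigen::Matrix<double, 12, 9>::Zero();
    for (int i = 0; i < 4; ++i)
        for (int row = 0; row < 3; ++row)
            J.block<1, 3>(3 * i + row, 3 * row) = qp[i].transpose();

    double err = residual(P).squaredNorm();
    for (int it = 0; it < maxIters && err > 1e-12; ++it) {
        Eigen::Matrix<double, 12, 1> r = residual(P);
        Eigen::Matrix<double, 9, 9> JtJ = J.transpose() * J;
        Eigen::Matrix<double, 9, 9> A = JtJ;
        A.diagonal() += lambda * JtJ.diagonal();                   // Levenberg damping
        Eigen::Matrix<double, 9, 1> d = A.ldlt().solve(J.transpose() * r);

        // Reshape d into a 3x3 update and accept it only if the error drops.
        Eigen::Matrix3d Pnew = P;
        for (int row = 0; row < 3; ++row) Pnew.row(row) += d.segment<3>(3 * row).transpose();
        double newErr = residual(Pnew).squaredNorm();
        if (newErr < err) { P = Pnew; err = newErr; lambda *= 0.1; }  // accept, trust more
        else              { lambda *= 10.0; }                         // reject, damp more
    }
    return P;  // r3 = normalize(r1 x r2) can be recomputed from the refined columns
}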

Further details can be found in PC/src/markerLM.cpp.

Some Debugging

The above does not work perfectly in my implementation.
When the Y-axis value (in OpenGL coordinates) increases, the projected points drift toward the lower left (at 45 degrees), independent of the camera rotation.
After a long time of debugging I still can't figure out why, but a quick fix is the following:

// project the point with the estimated pose (K M)
vec3 screenPos = poseM * pos;
// empirical fix: shift XY by half of Z
screenPos.xy -= 0.5 * screenPos.z;

After projecting a point, shift its XY values by half of its Z value, and the result looks correct.
The bug is most likely caused by the transformation matrix M, which traces back to the homography approximation and decomposition. There is a lot more to learn before I understand why.
