I'm very intrigued by your paper and this method, but when I try to run your code on my own images, I get incorrect results. I'm using synthetic images rendered in Blender. The scene and stereo camera rig look as follows.
When I run `example.py` using my own images, intrinsic matrices `A1`, `A2`, and extrinsic matrices `RT1`, `RT2`, I get this incorrect result:
When I switch the images, such that `img1` becomes the image taken by the right camera and `img2` becomes the image taken by the left camera (leaving the matrices as before!), I get this result, where the images appear to have been "straightened out", yet they still don't line up w.r.t. their y-values.
Moreover, this does not align with your example usage, where `img1` is the left image and `img2` the right image:
```python
img1 = cv2.imread("img/left.png")   # Left image
img2 = cv2.imread("img/right.png")  # Right image
```
I'm rather confused by this, and I'm wondering whether my extrinsic matrices follow a different convention than the one you intend. However, in your paper you state:
> Call $\mathbf{o}_1 = -\mathbf{R}_1^{-1} \cdot \mathbf{t}_1$ the position of the optical center in the WCS.
You also write:

> Let $\mathbf{A}_1 \in \mathbb{R}^{3 \times 3}$ be the intrinsic matrix of the left camera, with $\mathbf{R}_1 \in \mathbb{R}^{3 \times 3}$ and $\mathbf{t}_1 \in \mathbb{R}^{3}$ its rotation matrix and translation vector, respectively, describing the position of the first Camera Coordinate System (CCS) with respect to World Coordinate System (WCS)
which, to me, seems to characterise $\mathbf{t}_1$ as the position of the first camera's optical center expressed in world coordinates; that appears to be in conflict with "the position of the optical center in the WCS" actually being $\mathbf{o}_1 = -\mathbf{R}_1^{-1} \cdot \mathbf{t}_1$. In my understanding, $\mathbf{t}_1$ describes the world origin expressed in the coordinate system of the left camera, i.e. camera 1. Am I mistaken there?
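To make the convention I'm assuming explicit, here is a minimal numpy sketch (all names are mine, not from your code): under the world-to-camera mapping $\mathbf{x}_c = \mathbf{R}_1 \mathbf{x}_w + \mathbf{t}_1$, the optical centre is the world point that maps to the camera origin, $\mathbf{0} = \mathbf{R}_1 \mathbf{o}_1 + \mathbf{t}_1$, which gives exactly your $\mathbf{o}_1 = -\mathbf{R}_1^{-1} \cdot \mathbf{t}_1$:

```python
import numpy as np

# Sketch of the world-to-camera convention I'm assuming:
#   x_cam = R1 @ x_world + t1
# The optical centre o1 is the world point that maps to the camera origin:
#   0 = R1 @ o1 + t1  =>  o1 = -R1^{-1} @ t1  (= -R1.T @ t1, R1 orthonormal)
rng = np.random.default_rng(0)
o1 = rng.standard_normal(3)                        # arbitrary camera centre (world coords)
R1, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # arbitrary orthonormal matrix
t1 = -R1 @ o1                                      # translation under this convention

assert np.allclose(-R1.T @ t1, o1)  # recovers the centre, matching your formula
```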
In any case, if I do the sanity check and compute

```python
R1 = RT1[:3, :3]  # rotation part of the extrinsics
t1 = RT1[:3, 3]   # translation part
o1 = -R1.T @ t1   # optical centre expressed in world coordinates
```
I do indeed get that `o1` matches the world coordinates of the camera centre (`bpy.data.objects.get("cam_L").matrix_world.translation`). In case it's relevant, the extrinsic matrices have been extracted via:
```python
# Get cameras
cam_L = bpy.data.objects.get("cam_L")
cam_R = bpy.data.objects.get("cam_R")

# Obtain extrinsic matrices (world -> camera-local)
RT1 = np.array(cam_L.matrix_world.inverted())
RT2 = np.array(cam_R.matrix_world.inverted())
```
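One hypothesis I haven't been able to rule out: Blender's camera looks down its local -Z axis with +Y up, whereas OpenCV-style extrinsics assume the camera looks down +Z with +Y pointing down. If your code expects the OpenCV convention, I would guess the Blender matrices need an axis flip first. Here is a sketch of what I mean, building on the snippet above (the `FLIP` matrix and the `*_cv` names are mine, untested against your code):

```python
import numpy as np

# Hypothetical conversion from Blender's camera frame (x right, y up,
# z backward) to OpenCV's (x right, y down, z forward): negate the
# camera-local Y and Z axes.
FLIP = np.diag([1.0, -1.0, -1.0, 1.0])

RT1_cv = FLIP @ np.array(cam_L.matrix_world.inverted())
RT2_cv = FLIP @ np.array(cam_R.matrix_world.inverted())
```

Is this the kind of convention mismatch you had in mind, or does your method consume Blender-style extrinsics directly?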
What might I be doing wrong here? I'd appreciate any advice or guidance. Thank you! :)