Efficient solutions to the absolute pose of cameras with unknown focal length and radial distortion by decomposition to planar and non-planar cases

全文

(1)IPSJ Transactions on Computer Vision and Applications Vol.4 78–86 (May 2012) [DOI: 10.2197/ipsjtcva.4.78]. Research Paper. Efficient solutions to the absolute pose of cameras with unknown focal length and radial distortion by decomposition to planar and non-planar cases Martin Bujnak1,a). Zuzana Kukelova2. Tomas Pajdla2. Received: July 31, 2011, Accepted: January 4, 2012, Released: May 30, 2012. Abstract: In this paper we present new efficient solutions to the absolute pose problems for cameras with unknown focal length and unknown focal length and radial distortion from four 2D-to-3D point correspondences. We propose to solve these problems separately for non-planar and for planar scenes. By decomposing the problems into these two situations we obtain simpler and more efficient solvers than the previously known general solvers. We demonstrate in synthetic and real experiments significant speedup of our solvers. Especially our new solvers for absolute pose problem for camera with unknown focal length and radial distortion are about 40 × (non-planar) and 160 × (planar) faster than the general solver. Moreover, we show that our specific solvers can be joined into new general solvers, based on performing either planar or non-planar solver according to the scene structure or performing both solvers simultaneously and selecting the better result. Such joined solvers give comparable or even better results than the existing general solvers for planar as well as non-planar scenes. Keywords: minimal solvers, camera absolute pose, focal length, radial distortion. 1. Introduction The Perspective-n-Point (PnP) problem, i.e., the problem of determining the absolute position and orientation of a camera given its intrinsic parameters and a set of n 2D-to-3D point correspondences, is one of the most important problems in computer vision with a broad range of applications in structure from motion [1], [15], [24], recognition [18], [19], video-based rendering [5], robotics, augmented reality and other computer vision problems [14]. One of the oldest papers considering this problem dates back to 1841 [13]. Recently a huge number of solutions to the calibrated PnP problems for three and more than three points have been published [4], [11], [20], [21], [22], [27], [28]. The minimal number of points needed to estimate the camera position and orientation is three for a fully calibrated camera and five and half for a fully uncalibrated, camera. The linear solution to the problem of estimating absolute position and orientation together with five inner calibration parameters of a fully uncalibrated camera from six 2D-3D point correspondences is known as Direct Linear Transform (DLT) [2], [23]. Modern digital cameras have square pixels and the principal point close to the center of the image [14]. Therefore, for most of applications this prior knowledge can be used and four out of the five internal calibration parameters can be safely set to a fixed 1 2. a). Bzovicka 24, 851–07, Bratislava, Slovakia Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University, Karlovo namesti 13, 121–35 Prague 2, Czech Republic [email protected]. c 2012 Information Processing Society of Japan . value (the skew to 0, the pixel aspect ratio to 1 and the principal point to the center of the image). Adopting these calibration constraints has several advantages. First, the minimal number of points needed to solve the absolute pose of a camera is reduced. Secondly, since fewer parameters are estimated, the results are more stable. In this paper we use this prior calibration knowledge and provide efficient solutions to two problems of estimating the absolute pose of a camera with unknown focal length from images of four known 3D points. In the first problem we assume the standard pinhole camera model [14] and in the second problem image points affected by some amount of radial distortion. Both solutions are non-iterative and based on Gröbner basis methods [10] for solving systems of polynomial equations. The problem of estimating absolute pose of a camera together with its focal length for image points without radial distortion was firstly solved by Abidi and Chandra [3]. In this paper authors formulated the problem using areas of triangular subdivisions of a planar quadrangle and arrived to a closed form solution which works only for planar scenes. The first solution to this focal length problem, which works for non-planar scenes, was presented by Triggs [26]. This solution uses calibration constraints arising from using the image of the dual absolute quadric and solves resulting polynomial equations using multivariate resultants [10]. The solution works for non-planar scenes but fails for the planar ones. In this paper authors also proposed a solution which handles both planar and non-planar points and is based on eigendecomposition of multiplication matrices, however this solution is numerically unstable. 78.

(2) IPSJ Transactions on Computer Vision and Applications Vol.4 78–86 (May 2012). and not practical. The paper [26] also provided a solution to the problem of estimating absolute pose of a camera with unknown focal length and unknown principal point from five 2D-to-3D correspondences. A solution working for both planar and non-planar scenes was proposed only recently [6]. This solution is based on Euclidean rigidity constraint and results in a system of four polynomial equations in four unknowns which are solved using the Gröbner basis method [10]. The problem of estimating absolute pose with unknown focal length from four 2D-to-3D correspondences is not minimal and one additional calibration parameter can be handled in this problem. In Ref. [16] authors included the radial distortion to the problem and proposed a method for solving absolute pose problem for a camera with radial distortion and unknown focal length from four point correspondences based on Gröbner bases. In this paper authors show that in many real applications the consideration of radial distortion brings a significant improvement. The presented solution uses quaternions to parametrize rotations and one parameter division model for the radial distortion [12] and results in five equations in five unknowns. These equations are quite complex and therefore the Gröbner basis method results in relatively large solver (1,134 × 720 matrix) which runs about 70 ms. Therefore, the proposed solver is not really practical in real-time applications. In this paper we solve the both mentioned absolute pose problems, the non-minimal case for the camera with unknown focal length and the minimal case for the camera with unknown focal length and radial distortion. This paper extends the work [7] where only the minimal solution to the absolute pose problem for the camera with unknown focal length and radial distortion was presented. Moreover, in this paper we propose a new general solver, which executes both the planar and the non-planar solver simultaneously and selects the better solution. This solver is more precise then the existing general solver [16] and is also several times faster than this general solver. Both our solutions are based on the parameterization of the camera projection matrix with three respectively four unknowns and on using constraints arising from the properties of the rotation matrix. This parameterization and used constraints work only for non-planar scenes. However, we use part of this parameterization and similar constraints to derive even much simpler solvers for planar cases of both problems. By decomposing the general problem to the non-planar and the planar case we obtain much simpler systems of polynomial equations and therefore also much simpler and more practical solvers for real applications comparing to general solvers [6], [16]. The most significant improvement is in speedup, especially in the case of the minimal absolute pose problem for camera with unknown focal length and radial distortion, where our new solvers are about 40 × (non-planar) and 160 × (planar) faster than the general solver presented in Ref. [16]. All our solutions are based on the Gröbner basis method for solving systems of polynomial equations [10]. Our new solution to the non-planar case of the radial distortion problem requires to perform Gauss-Jordan (G-J) elimination of significantly smaller. c 2012 Information Processing Society of Japan . matrix of size 136 × 152 than Ref. [16] and eigenvalue computation of a 16 × 16 matrix. The planar solver is even simpler and requires G-J elimination of only 12 × 18 matrix. Moreover, the proposed solvers return less solutions, 16 and 6 compared to 24 in Ref. [16], and their MATLAB implementation run-times are about 1 ms which is important for real-time applications and RANSAC. Also in the case of the absolute pose problem for camera with unknown focal length we obtain smaller solvers. The non-planar solver performs G-J elimination of a 23 × 31 matrix and returns 8 solutions, comparing to the elimination of about three times larger matrix with 10 solutions in case of the general solver [6]. In fact our formulation for this case is almost equivalent to the formulation presented in the paper [26]. The difference is only in the technique used to solve the resulting system of polynomial equations. In Ref. [26] the less efficient resultant method was used to solve this system of polynomial equations and results in SVD of 80 × 56 matrix. Our planar solver to this focal length problem is even simpler and involves just a homography calculation and focal length extraction which is linear and we provide simple closed form solution for this case. We show in experiments that our new specialized solvers can be joined to general solvers, based on performing either planar or non-planar solver according to the scene structure or performing both solvers simultaneously and selecting the better result. Such joined solvers give comparable or better results than the existing general solvers [6], [16] for all scenes, including the near-planar ones and are faster than these general solvers. Next we provide new formulations and solutions to both absolute pose problems for both non-planar and planar scenes. We compare these solutions with existing general solvers [6], [16]. By evaluating our solutions on synthetic and real data we show that our solutions are stable and efficient and that our solvers work well in real situations.. 2. Problem Formulation Let us assume the standard pinhole camera model [14]. In this model the image projection ui of a 3D reference point Xi can be written as λi ui = P Xi ,. (1). where P is a 3×4 projection matrix, λi is an unknown scalar value and points ui = [ui , vi , 1] and Xi xi , yi , zi , 1 are represented by their homogeneous coordinates. The projection matrix P can be written as P = K [R | t], (2) 3 is a 3 × 3 rotation matrix, t = t x , ty , tz where R = ri j i, j=1 contains the information about camera position and K is the calibration matrix of the camera. As described in Section 1 we assume that the only unknown parameter from the calibration matrix K is the focal length. There fore, the calibration matrix K has the form diag f, f, 1 . Since the projection matrix is given only up to scale we can equivalently write K = diag [1, 1, w], for w = 1/ f . Using these assumptions the projection Eq. (1) can be written. 79.

(3) IPSJ Transactions on Computer Vision and Applications Vol.4 78–86 (May 2012). as. ⎡ ⎢⎢⎢ r11 ⎢⎢ λi ui = ⎢⎢⎢⎢ r21 ⎢⎣ wr31. r12 r22 wr32. r13 r23 wr33. tx ty wtz. ⎤ ⎥⎥⎥ ⎥⎥⎥ ⎥⎥⎥ Xi . ⎥⎦. (3). In our problems we assume two different situations. In the first situation we assume the linear pinhole model without radial distortion in which the image points are represented by the homogeneous coordinates of the form ui = [ui , vi , 1] .. (4). In the second problem we assume that the image points are affected by some amount of radial distortion. Here we model the radial distortion by the one-parameter division model proposed by Fitzgibbon [12]. This model is given by formula

(4). (5) pu ∼ pd / 1 + krd2 , where k is the distortion parameter, pu = [uu , vu , 1] and pd = [ud , vd , 1] , are the corresponding undistorted and distorted image points, and rd is the radius of pd w.r.t. the distortion center. We assume that the distortion center is in the center of the image. Therefore rd2 = u2d + v2d and since scale is not important for homogeneous coordinates, we can express undistorted point as

(5) . (6) ui = ui , vi , 1 + k u2i + v2i We can eliminate the scalar values λi from the projection Eq. (3) by multiplying it with the skew symmetric matrix [ui ]× . Since [ui ]× ui = 0 we obtain the matrix equation ⎤ ⎡ ⎢⎢⎢ 0 −1 − k ri2 vi ⎥⎥⎥ ⎥⎥ ⎢⎢⎢ ⎢⎢⎢ 1 + k ri2 0 −ui ⎥⎥⎥⎥ ⎥⎦ ⎢⎣ −vi ui 0 ⎡ ⎤ ⎡ ⎤ ⎢ xi ⎥⎥ ⎥⎥ ⎢⎢⎢ r11 r12 r13 t x ⎥⎥⎥ ⎢⎢⎢⎢⎢ ⎢⎢⎢ ⎥⎥ ⎢ yi ⎥⎥⎥⎥ ⎥⎥⎥ = 0 ⎢⎢⎢ r21 (7) r22 r23 ty ⎥⎥⎥⎥ ⎢⎢⎢⎢⎢ ⎣⎢ ⎦⎥ ⎢ zi ⎥⎥⎥⎥ wr31 wr32 wr33 wtz ⎢⎢⎣ ⎦ 1 for Xi = xi , yi , zi , 1 . This matrix equation results in three polynomial equations from which only two are linearly independent. This is caused by the fact that the skew symmetric matrix [ui ]× has rank two. In the case of the image points not affected by the radial distortion, i.e., when k = 0, the projection Eq. (7) gives us for each point correspondence two linear homogeneous equations in 12 elements of the projection matrix P. For N 2D-to-3D point correspondences these equations can be written as Mp = 0, where M is a 2N × 12 coefficient matrix and p is the vector consisting of 12 elements of the projection matrix P. Therefore, the projection matrix can be written as a linear combination of the 12 − 2N null space basis vectors Pi of the matrix M P=. 12−2N . αi Pi ,. (8). i=1. where αi are unknown parameters from which one can be set to 1. In this way the projection matrix P can be parameterized using 11 − 2N unknowns. This parameterization was for example used. c 2012 Information Processing Society of Japan . in Ref. [26] for solving absolute pose problem for camera with unknown focal length and works only for non-planar scenes. Unfortunately this parameterization cannot be used in the case of image points affected by the radial distortion Eq. (6). Therefore, we will here provide two different parameterizations of the projection matrix P which are applicable also to image points affected by the radial distortion. Both parameterizations are very similar, the first one works for non-planar scenes and the second for the planar ones. 2.1. Absolute Pose for a Camera with Unknown Focal Length and Radial Distortion for Non-planar Scene Let us denote the elements of the projection matrix P as pi j , where pi j is the element from the ith row and jth column of the matrix P. The equation corresponding to the third row of the skew symmetric matrix on the left in Eq. (7) can be then written as −vi (p11 xi + p12 yi + p13 zi + p14 ) +ui (p21 xi + p22 yi + p23 zi + p24 ) = 0.. (9). This is a homogeneous linear equation in eight unknowns p11 , p12 , p13 , p14 , p21 , p22 , p23 and p24 . Since we have four 2Dto-3D point correspondences we have four such equations. These four equations can be rewritten in the matrix form M v = 0,. (10). where M is a 4 × 8 coefficient matrix and v = p11 , p12 , p13 , p14 , p21 , p22 , p23 , p24 is a 8 × 1 vector of unknowns. Therefore we can write our eight unknowns in v as a linear combination of the four null space basis vectors ni of the matrix M (10) v=. 4 . αi ni ,. (11). i=1. where αi are new unknowns from which one can be set to one, e.g., α4 = 1. In this way we obtain a parametrization of the first two rows of the projection matrix P with three unknowns α1 , α2 and α3 . To parametrize the third row of the projection matrix P we use one from the remaining two equations from the projection Eq. (7). When ui = 0 we use the equation corresponding to the first row of Eq. (7) and when vi = 0 the equation corresponding to the second row. In all remaining situations, which are most common, we can select the equation corresponding to ui or vi with larger absolute value. Let us assume that this is equation corresponding to the second row of Eq. (7). This equation has the form.

(6) 1 + k ri2 (p11 xi + p12 yi + p13 zi + p14 ) −ui (p31 xi + p32 yi + p33 zi + p34 ) = 0.. (12). Equation (12) contains elements p31 , p32 , p33 and p34 from the third row of the projection matrix and elements p11 , p12 , p13 and p14 from the first row of P which are already parametrized with α1 , α2 and α3 . We again have four equations of the form Eq. (12). Using Eq. (11) we can rewrite these equations as A p31 , p32 , p33 , p34 = B [α1 , α2 , α3 , k α1 , k α2 , k α3 , k, 1] , (13). 80.

(7) IPSJ Transactions on Computer Vision and Applications Vol.4 78–86 (May 2012). where A and B are coefficient matrices, A of size 4 × 4 and B of size 4 × 8. If the matrix A has full rank, i.e., points X1 , X2 , X3 and X4 are not coplanar, we can write p31 , p32 , p33 , p34 = A−1 B [α1 , α2 , α3 , k α1 , k α2 , k α3 , k, 1] . (14) This gives us a parametrization of the third row of the projection matrix P with four unknowns, α1 , α2 , α3 and k. Together with the parametrization of the first two rows Eq. (11) we obtain a parametrization of the whole projection matrix P with these four unknowns α1 , α2 , α3 and k. With this parameterization of the projection matrix P in hand we can now solve the absolute pose problem for the camera with unknown focal length and radial distortion. To solve this problem we use constraints that the three rows of the 3 × 3 submatrix of the projection matrix P are perpendicular and that the first two rows of this submatrix have the same norm. These constraints results from the fact that the 3 × 3 submatrix of the projection matrix P has the form K R, where R is a rotation matrix. In this way we obtain four equations in four unknowns α1 , α2 , α3 , k (two from them quadratic and two cubic) p11 p21 + p12 p22 + p13 p23 = 0,. (15). p31 p11 + p32 p12 + p33 p13 = 0,. (16). p31 p21 + p32 p22 + p33 p23 = 0,. (17). p211 + p212 + p213 − p221 − p222 − p223 = 0.. (18). To solve these four polynomial equations in four unknowns we use the Gröbner basis method [10]. This method was recently used to solve several minimal computer vision problems [6], [16], [25] and the automatic generator of the Gröbner basis solvers is available online [17]. For more details about this Gröbner basis method for solving systems of polynomial equations see for example [8], [10], [17]. Using this automatic generator we have obtained solver for our equations consisting of one G-J elimination of a 136 × 152 matrix and the eigenvalue computation of a 16 × 16 matrix. This solver gives us 16 solutions for α1 , α2 , α3 and k from which we can create the projection matrix P using Eqs. (11) and (14). Finally we can use the constraint that the squared norm of the first row of the 3 × 3 submatrix of the projection matrix P multiplied by w2 is equal to the squared norm of the third row of this submatrix w2 p211 + w2 p212 + w2 p213 − p231 − p232 − p233 = 0.. (19). This is a quadratic equation in w = 1/ f from which the positive root give us a solution for the focal length f . 2.2. Absolute Pose for a Camera with Unknown Focal Length and Radial Distortion for Planar Scene In the planar case, i.e., when all four 3D points are on a plane, we can’t directly use the parametrization presented in the Section 2.1 since it leads to a degenerated system. However, we can use a similar parametrization. Without loss of generality let us assume that all four 3D points. c 2012 Information Processing Society of Japan . Xi have the third coordinate zi = 0. In this case Eq. (9) corresponding to the third row of the matrix Eq. (7) can be written as −vi (p11 xi + p12 yi + p14 ) + ui (p21 xi + p22 yi + p24 ) = 0. (20) This is a homogeneous linear equation in only six unknowns p11 , p12 , p14 , p21 , p22 and p24 . Since we have four 2D-to-3D point correspondences we have four such equations which can be again rewritten in the matrix form M v = 0, where M is a 4 × 6 coefficient matrix and v = p11 , p12 , p14 , p21 , p22 , p24 is a 6 × 1 vector of unknowns. Therefore, in this case we can write our unknowns in v as a linear combination of the two null space basis vectors n1 and n2 of the matrix M v = β1 n1 + n2 ,. (21). where β1 is a new unknown. Using Eq. (21) we obtain a parametrization of the first two rows of the matrix P (without the third column) with one unknown β1 . To parametrize the third row we again use one from the remaining two equations from the projection Eq. (7). Let’s again consider the equation corresponding to the second row of the projection Eq. (7). In this planar case has this equation the form.

(8) 1 + k ri2 (p11 xi + p12 yi + p14 ) − ui (p31 xi + p32 yi + p34 ) = 0. (22) This equation contains elements from the first row of P which are already parametrized with β1 and three elements p31 , p32 and p33 from the third row which we want to parametrize. We again have four equations of the form Eq. (22). However, we will now use only three of them, i.e., equations corresponding to the first three 2D-to-3D point correspondences. Using Eq. (21) we can rewrite these three equations as (23) C p31 , p32 , p34 = D β1 , k β1 , k, 1 , where C and D are coefficient matrices, C of size 3 × 3 and B of size 3 × 4. If the matrix C has full rank, i.e., points X1 , X2 and X3 are not collinear, we can write p31 , p32 , p34 = C−1 D β1 , k β1 , k, 1 . (24) In this way we obtain parametrization of the third row (without the third column) of the projection matrix P with two unknowns, β1 and k. Together with Eq. (21) we have parametrized the first, second and fourth column of the projection matrix P with β1 and k. In this case we can not use the constraints Eqs. (15)–(18) on the rows of the projection matrix. It is because we do not have information about the third column of P. However, we can use constraints that the columns of the rotation matrix are perpendicular and of the same norm. These constraints in this case gives us two equations of degree four in three unknowns β1 , k and w = 1/ f w p11 w p12 + w p21 w p22 + p31 p32 = 0, w. 2. p211. +w. 2. p221. +. p231. −w. 2. p212. −w. 2. p222. (25) −. p232. = 0.. (26). 81.

(9) IPSJ Transactions on Computer Vision and Applications Vol.4 78–86 (May 2012). Moreover, we have one more equation of the form Eq. (22), for the fourth 2D-3D point correspondence, which was not used in Eq. (23). This equation has the form.

(10) 1 + k r42 (p11 x4 + p12 y4 + p14 ) −u4 (p31 x4 + p32 y4 + p34 ) = 0. (27). and after using parametrization Eqs. (21) and (24) of the unknowns p11 , p12 , p14 , p31 , p32 and p34 it results in one quadratic equation in two unknowns β1 and k. Equation (27) together with Eqs. (25) and (26) give us three equations in three unknowns β1 , k and w = 1/ f which can be again solved using Gröbner basis method [10] and automatic generator [17]. In this case the resulting solver results in one G-J elimination of relatively small 12 × 18 matrix and gives up to 6 real solutions to β1 , k and f . The third column of the projection matrix P can be finally easily obtained from its structure and the properties that the columns of the rotation matrix are perpendicular and of the same norm. 2.3. Absolute Pose for a Camera with Unknown Focal Length for Non-planar Scene In the case of the image points not affected by the radial distortion, i.e., image points with homogeneous coordinates ui = [ui , vi , 1] the parameterization of the projection matrix P presented in Section 2.1 results in parametrization of P with only three unknowns α1 , α2 and α3 . It is because in this case in Eq. (12) the radial distortion parameter k = 0 and therefore the elements p31 , p32 , p33 and p34 from the third row of the projection matrix can be directly parametrized with α1 , α2 and α3 as the remaining two rows of P. In fact this parametrization of P is almost equivalent to the direct parametrization of the whole projection matrix P using four null space basis vectors and setting α4 = 1 in Eq. (8). Using constraints Eqs. (15)–(18), i.e., that three rows of the 3 × 3 submatrix of the projection matrix P are perpendicular and that the first two rows of this submatrix have the same norm, we in this case obtain four quadratic equations in three unknowns α1 , α2 and α3 . Since this focal length problem is not minimal we have one more equation than unknowns. Therefore, we can select three from these four Eqs. (15)–(18) and use the fourth one to select the best root among 8 roots of these three equations. We can again solve these equations using Gröbner basis method [10] and the automatic generator [17]. This method results in the solver consisting of G-J elimination of one 23 × 31 matrix and computation of eigenvalues of an 8 × 8 matrix. Using this solver we obtain up to eight solutions for α1 , α2 and α3 . Finally using constraint Eq. (19) we obtain up to 8 solutions for the focal length f . Note that this solver is again smaller and give less solutions than the solver for the general case presented in Ref. [6]. 2.4. Absolute Pose for a Camera with Unknown Focal Length for Planar Scene Solution to the planar case of the absolute pose problem for a camera with unknown focal length is very simple and can be. c 2012 Information Processing Society of Japan . formulated as a closed-form solution. We again assume that all four 3D points Xi have the third coordinate zi = 0. In this case k = 0 and therefore the projection Eq. (7) results in eight linearly independent homogeneous equations in nine unknowns, the elements of the first, the second and the fourth column of the projection matrix P. Such equations can be easily solved and these columns of the projection matrix computed. Note that in fact we compute homography H between 2D points and 3D points in the plane [14]. Solutions to the focal length and for the third column of the projection matrix P can be then easily obtain from the structure of the projection matrix and the properties that the columns of the rotation matrix are perpendicular and of the same norm i.e., ⎡ ⎤ ⎡ ⎤ ⎢⎢⎢ h11 h12 h13 ⎥⎥⎥ ⎢⎢⎢ r11 r12 t1 ⎥⎥⎥ ⎢⎢⎢ ⎥⎥⎥ ⎢⎢⎢ ⎥⎥ (28) H = ⎢⎢⎢ h21 h22 h23 ⎥⎥⎥ = ⎢⎢⎢ r21 r22 t2 ⎥⎥⎥⎥ . ⎢⎣ ⎥⎦ ⎢⎣ ⎥⎦ h31 h32 h33 f r31 f r32 t3 Finally, f = ((h11 h12 + h21 h22 )/(h31 h32 ))1/2 and the third column of rotation matrix r3∗ = r1∗ × r2∗ .. 3. Experiments In this section we compare our new solutions (non-planar and planar) presented in Sections 2.1, 2.2, 2.3 and 2.4 with general solutions to the two presented problems proposed in Refs. [16] and [6]. We compare these solutions on synthetically generated scenes and show that all solvers return comparable results. Then we study the performance of new specialized solvers on near-planar scenes and show that planar and non-planar solvers can be for both problems easily joined into general solvers, which give comparable or better results then the existing general solvers [6], [16] for all scenes, including near-planar ones. Finally we show the performance of new joined general solvers on real datasets and compare them with the general solvers from Refs. [16] and [6]. 3.1 Synthetic Datasets In the following synthetic experiments we use synthetically generated ground-truth 3D scenes. These scenes were generated using 3D points randomly distributed on a plane or in a 3D cube depending on the testing configuration. Each 3D point was projected by a camera with random feasible orientation and position and random or fixed focal length. For experiments with radially distorted camera we distorted image points using the division model Eq. (5) to generate noiseless distorted points. Finally, Gaussian noise with standard deviation σ was added to the distorted image points assuming a 1,000 × 1,000 pixel image. 3.1.1 Numerical Stability In the first experiment we have studied the behavior of all presented solvers on noise free data to check their numerical stability. In this experiment 1,500 random scenes and feasible camera poses were generated. The radial distortion parameter was randomly drawn from the interval k ∈ [−0.45, 0] and the focal length from the interval f ∈ [0.5, 2.5]. Figure 1 shows results of our new non-planar solvers on nonplanar scenes (Top) and of the planar solvers on planar ones (Bot-. 82.

(11) IPSJ Transactions on Computer Vision and Applications Vol.4 78–86 (May 2012). Fig. 1. Log10 relative error of the focal length f and Log10 absolute error of the radial distortion parameter k obtained by selecting the real root closest to the ground truth value for the non-planar solver (Top) and the planar solver (Bottom). Right column shows the results for a camera with unknown focal length, left and middle column for radially distorted camera.. Fig. 2 Error of rotation (Top left), translation (Top middle), focal length estimates (Bottom left) and radial distortion estimates (Bottom middle) in the presence of noise for the p4p+f+k non-planar solver (Red) and the general solver from Ref. [16] (Blue). The right column shows results for the focal length estimates of our new planar (Bottom right) and non-planar (Top right) p4p+f solvers (Red), compared to Ref. [6] (Blue).. tom). In both cases we compare our solvers (Red) with the general solvers from Refs. [16] and [6] (Blue). The left and the middle plots show the results for the camera with unknown radial distortion and unknown focal length. The plots show the log10 relative error of the focal length f obtained by selecting the real root closest to the ground truth value in the left and the log10 absolute error of the radial distortion parameter in the middle. The right plots show the log10 relative focal length errors for the camera without radial distortion and with unknown focal length. As it can be seen all new solvers are a little bit numerically more stable that the general algorithms from Refs. [16] and [6]. In the case of the radial distortion problem, a small difference is also in the number of results with error greater than 10−5 . In our new solutions such results occur in about 1% of cases while in the general solver from Ref. [16] in about 4.5%. This difference in the number of “failures” will be also visible in the near-planar and real experiments. Note that the general solution from Ref. [16] uses techniques for improving numerical stability of Gröbner basis solvers based on changing basis and QR decomposition [8] while our solutions use the standard Gröbner basis method [10] without these improvements. We therefore believe that such techniques can further improve numerical stability of our solvers. It is partially vis-. c 2012 Information Processing Society of Japan . ible also from Fig. 1 where the results denoted as non-planar-best and planar-best (Dashed black) correspond to the most precise results obtained by our solvers by permuting the input points. This permutation of input data is in some sense similar to the permutation of columns of a coefficient matrix in the Gröbner basis solver and therefore it is also similar to the changing of basis used in Ref. [16]. However, these techniques for improving numerical stability [8] are somewhat expensive and, as it will be shown in real experiments, we do not have to use them in our solvers. 3.1.2 Noise Test In the next experiment we have tested behavior of our nonplanar and planar solvers in the presence of noise added to image points. We again compare all presented specialized solvers with the general solvers from Refs. [16] and [6]. For each noise level, from 0.0 to 2 pixels, 2,000 estimates for random scenes and camera positions, focal length fgt = 1.5 and radial distortion kgt = −0.2, were made. We did not apply the radial distortion for the second problem with the unknown focal length. Results in Fig. 2 are represented by the Matlab boxplot function which shows values 25% to 75% quantile as a box with horizontal line at median. The crosses show data beyond 1.5 times the interquartile range. In this case the rotation error (Top left) was. 83.

(12) IPSJ Transactions on Computer Vision and Applications Vol.4 78–86 (May 2012). Fig. 3. Results of experiment on the near-planar scene for focal and radial problem (Left) and only focal length problem (Right).. measured as the rotation angle in the angle-axis representation of the relative rotation RR−1 gt and the translation error (Top middle) as the angle between ground-truth and estimated translation vector. For the radial distortion problem our specialized (p4p+f+k) solvers solve the same algebraical problem and have similar numerical stability as the general solver from Ref. [16]. Thus these solvers should behave similarly also in the presence of noise too. This is visible also from Fig. 2 which shows results for our nonplanar solver (Red) and the general solver from Ref. [16] (Blue). Similar results were obtained also for our planar solver and therefore we are not showing them here. Different results were obtained for the problem with unknown focal length only. This is because this problem itself is overconstrained and each solver uses slightly different approach to find a solution. This is visible in Fig. 2, were the new (p4p+f) solvers give a little bit worse results for larger noise levels than the general solver [6]. However, our real experiments show that this noise sensitivity does not cause any problems in practice. 3.1.3 Computational Complexity The most significant improvement of our new specialized solvers over the general solver from Ref. [16] is in speedup, since our solvers are about 40 × (non-planar) and 160 × (planar) faster than the general solver [16]. This is caused by the fact that the new solvers results in much simpler systems of polynomial equations and therefore also in much simpler and practical solvers. While the general solver [16] requires to perform LU decomposition of a 1,134 × 720 matrix, QR decomposition of a 56 × 56 matrix and eigenvalue computations of a 24 × 24 matrix, our non-planar solver requires only one G-J elimination of a 136 × 152 matrix and eigenvalue computations of a 16 × 16 matrix. The planar solver is even simpler and requires one G-J elimination of only 12 × 18 matrix. Moreover, our two solvers return less solutions, 16 and 6 compared to 24 in Ref. [16]. This speed improvement is visible also for the absolute pose problem with an unknown focal length. The new solver for planar scenes is about 40 × faster comparing to the general solver [6] since it calculates just a homography between 3D and 2D correspondences and extracts the focal length in a closed form. The solver for non-planar scenes performs the G-J elimination of a 26 × 34 matrix and solves 8 × 8 eigenvalue problem comparing to the G-J elimination of a 78 × 88 matrix and the eigenvalue problem of a 10 × 10 matrix in Ref. [6]. The new solver is in this case about 3 × faster than Ref. [6]. All these facts are important in RANSAC and real applications in which the general solver from Ref. [16] was impractical due to. c 2012 Information Processing Society of Japan . its speed. 3.1.4 Near Planar Test In this experiment we have studied the behavior of new nonplanar and planar solvers and the general solvers [6], [16] on planar, general and non-planar scenes. We have focused primarily to near-planar scenes in order to show how to build fast joined general solver by composing our new specialized solvers. For this purpose we created a synthetic scene where we could control scene planarity by a scalar value a. Given planarity a we have constructed the scene as follows: Assume that we have a synthetic scene generated as described above in Section 3.1. Now let’s denote by ρ a plane created from the first three non-collinear 3D points and by s a normalizing scale. We calculate the scale s as the distance of the furthest point from these three points from their center of gravity CG. Then, we randomly generate the fourth point at the distance s a from the plane ρ and such that it is not further than s from the center of gravity CG. Note that for planarity a = 0 we get four points on the plane and for a = 1 we obtain a well defined non-planar four-tuple of 3D points. We did not contaminate the image points corresponding to these four 3D points by noise. We added noise with deviation of 0.5 pixels only to the remaining image points. This way we created a scene with one uncontaminated four-tuple of 2D-to-3D correspondences for which we can control planarity by a scalar value a. Next, for each given planarity value a we created a scene and calculated camera pose from the four-tuple of correspondences using the planar, the non-planar and the general solvers [6], [16]. Note that this four-tuple is not affected by noise and hence the only deviation from the ground truth solutions comes from the numerical instability of the solvers itself. To evaluate the impact of the instability on the solution we used the estimated camera, the focal length and in the case of the radial distortion problem also the estimated radial distortion to project all 3D points to the image plane. Then we measured how many points were projected closer than one pixel to its corresponding 2D image — we call them inliers. Figure 3 shows results for all examined solvers i.e., the focal length and radial distortion problem on the left and the focal length problem on the right with plots for the planar (Red), the non-planar (Cyan) and the general (Blue) solver together with the results obtained from the non-planar solver obtained by permuting the input points (Magenta). The result for permuted inputs for the focal length problem is omitted since it would be overlapped by the results of the non-planar solver. Green plot shows results for the solver which executes both planar and non-planar solvers. 84.

(13) IPSJ Transactions on Computer Vision and Applications Vol.4 78–86 (May 2012). Fig. 4. Example of an input image (Left), RANSAC sampling history for the image without (Middle) and with (Right) strong radial distortion.. and select the better solution. In this experiment we created 100 random scenes for each given planarity value a and evaluated all algorithms. Mean value is displayed. Interesting points in this Fig. 3 are the intersections of the planar and the general solvers and also the general and the nonplanar solvers. Ideally one would combine planar, general and non-planar solvers to gain maximal precision at maximal speed. For the radial distortion problem, it is suitable to use the planar solver when the planarity is smaller than 10−4.2 and the nonplanar solver when the planarity is greater than 10−2.8 . The general solver from Ref. [16] could be used in the interval a ∈ 10−4.2 , 10−2.8 . However executing both the planar and the non-planar solver simultaneously and select the better result, Fig. 3 (Green), performs surprisingly more reliably than the general solver [16]. Also its computation time is much smaller, comparing to the general solver. For the focal length problem, we get a much smaller gap, and practically one can use the non-planar solver immediately once the scene is not completely planar. For our experiments we used the threshold a = 10−7 since the planar solver is stable and much faster for near-planar scenes comparing to the non-planar solver. Interesting fact is, that in this case there is not a gap where we need to use the general solver to gain a better precision. 3.2 Real Data For the real data experiment we created a simple scene with two dominant planar structures (Fig. 4 left). Our intention was to show the behavior of our new joined planar/non-planar solvers in a real scene with sampling on the plane, near the plane and off the plane points. We used the new joined planar and non-planar solvers with splitting planarity threshold a = 10−3.2 for the radial distortion problem and a = 10−7 for the focal problem. First, we captured around 20 photos with the cell phone and the digital camera to get images with different distortions. Then we used Photo Tourism-like [24] pipeline to create the 3D reconstruction, 2D-3D correspondences and to get the ground truth reference for the camera poses. Note that 2D correspondences are positions of detected feature points and not ideal projections of 3D points and hence there is a natural noise. Since all our 2D-3D correspondences coming from the reconstruction pipeline are inliers we randomly modify 50% of 2D measurements to get 50% of outliers. We plugged all examined solvers, the calibrated P3P solver [11] (Cyan), the general P4P+f solver for the camera with unknown focal length [6] (Magenta), the general P4P+f+k solver. c 2012 Information Processing Society of Japan . from Ref. [16] (Blue) and our new joined P4P+f+k general solver (Red) and the new joined P4P+f general solver (green) to the locally optimized RANSAC estimator [9]. Then, we calculated the camera pose of each camera using given 2D-3D correspondences. Note, that we did not calibrate radial distortion before calling P3P and P4P+f solvers. Since many of images have strong radial distortion one cannot expect good results without correcting the distortion. On the other hand these results show that radial distortion solvers are useful in practice. Figure 4 (Right) shows difference in RANSAC convergence when using solvers with and without the radial distortion estimation and Fig. 4 (Middle) results for not so distorted image. Notice that the new solver for the pose problem with unknown focal length outperforms existing general solver for P4P+f problem [6]. This is because the new solver does not produce strictly orthogonal rotation matrix and the skew and anisotropic scale in the matrix allows fitting to more image data. On the other hand one need to use orthogonalization method to get proper rotation matrix after all inliers are found. This is not a problem since main purpose of RANSAC is to find a good set of inliers and usually a different solver for fitting to all good data is used once all inliers are found.. 4. Conclusion We have proposed new efficient solvers to the absolute pose problem for the camera with unknown focal length and unknown focal length plus radial distortion from four 2D-to-3D point correspondences. The presented solvers are obtained by joining two specialized solvers, one for non-planar scenes and one for planar ones. By decomposing the problem into these two cases we obtain simpler and more efficient solvers than previously known general solvers [6], [16]. We have demonstrated in synthetic and real experiments that new solvers give Eq. (1) comparable or better results for planar as well as non-planar scenes, and Eq. (2) are significantly faster than general solver from Ref. [16] and from Ref. [6]. Moreover, even executing both the planar and the non-planar solver simultaneously and selecting the better result is faster and numerically more stable comparing to the general solver [16] for all scenes including near-planar scenes. Matlab source codes of the presented solvers are available online at http://cmp.felk.cvut.cz/minimal/. Acknowledgments This work has been supported by EC project PS-SME-AT-844 De-Montes and CTU SGS12/191/ OHK3/3T/13 grant.. 85.

(14) IPSJ Transactions on Computer Vision and Applications Vol.4 78–86 (May 2012). References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28]. 2D3.: Boujou, available from www.2d3.com. Aziz, Y.A. and Karara, H.: Direct linear transformation from comparator to object space coordinates in close-range photogrammetry, ASP Symp. Close-Range Photogrammetry, Urbana, Illinois, p.118 (1971). Abidi, M.A. and Chandra, T.: A new efficient and direct solution for pose estimation using quadrangular targets: Algorithm and evaluation, IEEE PAMI, Vol.17, No.5 (1995). Ameller, M.A., Quan, M. and Triggs, L.: Camera pose revisited – new linear algorithms, ECCV 2000 (2000). Ballan, L. et al: Unstructured Video-Based Rendering: Interactive Exploration of Casually Captured Videos, Proc. SIGGRAPH (2010). Bujnak, M., Kukelova, Z. and Pajdla, T.: A general solution to the P4P problem for camera with unknown focal length, CVPR 2008 (2008). Bujnak, M., Kukelova, Z. and Pajdla, T.: New efficient solution to the absolute pose problem for camera with unknown focal length and radial distortion, ACCV 2010, Queenstown, NZ, November 8-12 (2010). Byröd, M., Josephson, K. and Åström, K.: Fast and Stable Polynomial Equation Solving and Its Application to Computer Vision, IJCV (2009). Chum, O., Matas, J. and Kittler, J.: Locally optimized RANSAC, Proc. DAGM, pp.236–243 (2003). Cox, D., Little, J. and O’Shea, D.: Using Algebraic Geometry, Second edition, Vol.185, Springer-Verlag (2005). Fischler, M.A. and Bolles, R.C.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM (1981). Fitzgibbon, A.: Simultaneous linear estimation of multiple view geometry and lens distortion, CVPR 2001, pp.125–132 (2001). Grunert, J.A.: Das pothenot’sche problem, in erweiterter gestalt; nebst bemerkungen u¨ ber seine anwendung in der geodäsie, Archiv der Mathematik und Physik (1841). Hartley, R. and Zisserman, A.: Multiple View Geometry in Computer Vision, Cambridge University Press (2003). Irschara, A., Zach, C., Frahm, J.M. and Bischof, H.: From SfM Point Clouds to Fast Location Recognition, CVPR 2009, pp.2599– 2606 (2009). Josephson, K., Byröd, M. and Aström, K.: Pose Estimation with Radial Distortion and Unknown Focal Length, Proc. CVPR (2009). Kukelova, Z., Bujnak, M. and Pajdla, T.: Automatic Generator of Minimal Problem Solvers, ECCV 2008, Marseille, October 12-18 (2008). Leibe, B., Cornelis, N., Cornelis, K. and Gool, L.J.V.: Dynamic 3d scene analysis from a moving vehicle, CVPR (2007). Leibe, B., Schindler, K. and Gool, L.J.V.: Coupled detection and trajectory estimation for multi-object, ICCV (2007). Moreno-Noguer, F., Lepetit, V. and Fua, P.: Accurate non-iterative o(n) solution to the pnp problems, ICCV (2007). Quan, L. and Lan, Z.D.: Linear n-point camera pose determination, IEEE PAMI, Vol.21, No.8, pp.774–780 (1999). Reid, G., Tang, J. and Zhi, L.: A complete symbolic-numeric linear method for camera pose determination, ISSAC 2003, pp.215–223, ACM, NY, USA (2003). Slama, C.: Manual of Photogrammetry, American Society of Photogrammetry and Remote Sensing, Falls Church, Virginia, USA (1980). Snavely, N., Seitz, S.M. and Szeliski, R.: Photo tourism: Exploring photo collections in 3D, ACM Trans. Graphics (SIGGRAPH Proceedings) (2006). Stewénius, H., Engels, C. and Nistér, D.: Recent developments on direct relative orientation, ISPRS Journal of Photogrammetry and Remote Sensing (2006). Triggs, B.: Camera pose and calibration from 4 or 5 known 3d points, Proc. 7th Int. Conf. Computer Vision, IEEE Computer Society Press (1999). Wu, Y. and Hu, Z.: Pnp problem revisited, Journal of Mathematical Imaging and Vision, Vol.24, No.1, pp.131–141 (2006). Zhi, L. and Tang, J.: A complete linear 4-point algorithm for camera pose determination, AMSS, Academia Sinica, Vol.21 (Dec. 2002).. c 2012 Information Processing Society of Japan . Martin Bujnak is a Ph.D. student at Center for Machine Perception by Czech Technical University in Prague. He received his master degree in computer graphics and parallel processing at the Faculty of Mathematics, Physics and Informatics, Comenius university in Bratislava in 2005. In his research he focuses on camera calibration, pose and structure from motion problems.. Zuzana Kukelova is a Ph.D. student at Center for Machine Perception by Czech Technical University in Prague. She received her master degree in mathematics at the Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava in 2005. She is interested in algebraic geometry and computer vision, where she focuses on minimal problems and methods for solving systems of polynomial equations.. Tomas Pajdla received his M.Sc. and Ph.D. degrees from the Czech Technical University in Prague. He works in geometry, algebra of computer vision and robotics with emphasis on non-classical cameras, 3D reconstruction and industrial vision. He contributed to introducing epipolar geometry of panoramic and noncentral cameras, generalized epipolar geometries and to developing solvers for minimal problems in structure from motion.. (Communicated by Reinhard Klette). 86.

(15)