

Vision-Based Pose Estimation From Points With Unknown Correspondences

Haoyin Zhou, Tao Zhang, Senior Member, IEEE, and Weining Lu

Abstract— Pose estimation from points with unknown correspondences remains a difficult problem in the field of computer vision. To solve this problem, the SoftSI algorithm is proposed, which simultaneously obtains the pose and the correspondences. The SoftSI algorithm is based on the combination of the proposed PnP algorithm (the SI algorithm) and two singular value decomposition (SVD)-based shape description theorems. The other main contributions of this paper are: 1) two SVD-based shape description theorems are proposed; 2) by analyzing the calculation process of the SI algorithm, a method to avoid pose ambiguity is proposed; and 3) an acceleration method to quickly eliminate bad initial values for the SoftSI algorithm is proposed. The simulation results show that the SI algorithm is accurate, while the SoftSI algorithm is fast, robust to noise, and has a large convergence radius.

Index Terms— Pose estimation, correspondence determination, numerical optimization.

I. INTRODUCTION

CAMERA pose estimation from a known object and a single image of it is a fundamental problem in the field of computer vision. When the correspondences between image points and object points are known, this problem is the Perspective-n-Point (PnP) problem, which has been studied for many years and for which many effective approaches have been put forward [1]–[6]. Pose estimation is the step that follows feature point extraction. Although there are many feature point extraction approaches, such as SIFT [32] and SURF [33], similar points are difficult to distinguish, and the correspondences between image points and object points are often unknown. For example, many of the features presented in Fig. 1 can be used as navigation marks, while it is often difficult to discriminate between different points. Figure 1(a) shows an artificial mark designed for visual navigation [8]. The correspondences of its corners can be determined from the geometry of the six rectangles.

Manuscript received April 7, 2013; revised December 13, 2013 and March 5, 2014; accepted June 4, 2014. Date of publication June 9, 2014; date of current version July 1, 2014. This work was supported by the National Basic Research Program of China (973 Program) under Grant 2010CB731800. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Olivier Bernard. The authors are with the Department of Automation, School of Information Science and Technology, Tsinghua University, Beijing 100084, China (e-mail: [email protected]; [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2014.2329765

Fig. 1. The correspondence determination problem is difficult when points have similar features. (a) A navigation mark. (b) The mark seen at night if its corners are replaced with LEDs. (c)-(f) Some objects with similar feature points (blue points).

If the corners are replaced with LEDs for visual navigation at night, the camera will capture images like Fig. 1(b), from which the correspondences are difficult to obtain. Natural objects also often have many similar feature points that are difficult to discriminate, as shown in Fig. 1(c)-(f). Until now, designing a method that can be generally applied to estimate the camera pose when the correspondences are unknown has remained a difficult problem, known as the simultaneous pose and correspondence problem [7]. Besides the pose and correspondence determination, this problem also takes occlusion and clutter into account [7], [16]–[29]. So far, the most effective method for solving this problem is the SoftPOSIT method [7]. However, SoftPOSIT is still quite slow because it needs to randomly try hundreds of different initial poses. Obviously, the consideration of occlusion and clutter makes this problem complicated. However, in many applications, occlusion and clutter can be avoided by pre-processing steps: object modeling can be used to deal with occlusion, and object recognition can help to eliminate clutter. They are briefly introduced below.

1) Object Modeling: Mostly, occlusion is self-occlusion, which can be avoided by establishing suitable object models. As in PnP applications, a suitable object model should consist of points that can be seen at the same time. For example, the back surface of the door shown in Fig. 1(c) is often occluded by the door itself, but the front surface can usually be seen, and it is enough to model the door with only the feature points of its front surface.

2) Object Recognition: Object recognition can help to eliminate points that do not belong to the object and thus to avoid clutter.



It is a necessary step before pose determination because objects with similar shapes may cause confusion. For example, the cube is a common shape, and different cube objects may look the same from different camera poses. Therefore, this paper aims to propose a fast pose estimation algorithm from non-corresponding points without occlusion and clutter, called the SoftSI algorithm. For this purpose, a novel PnP method, called the SI algorithm, is proposed first.

A. Related Works on PnP Methods

The Perspective-n-Point (PnP) problem (n is the number of points and n ≥ 3) was proposed by Fischler [1] and has been studied for many years. Given n object points and their image points, the PnP problem aims to recover the rotation R ∈ SO(3) and translation T ∈ R^3 of the camera with respect to the object from one image. The correspondences between image points and object points are assumed to be known in the PnP problem. PnP methods include linear methods and nonlinear methods. Traditional linear methods, such as DLT [9], are fast but inaccurate. Traditional nonlinear methods transform the problem into an optimization problem and solve it with the Levenberg-Marquardt (LM) method [10], which requires proper initial values. Many methods have been put forward so far, and they can be roughly classified into two types. (1) Iterative methods: these alternately calculate the camera pose from the depths of the points and the depths of the points from the camera pose; the pose approaches the real value during the iteration, and these methods are robust to the initial value [2], [4]–[6]. (2) Non-iterative methods: equations are formulated between the object points and their images in which the depths of the points are eliminated, and the camera pose is estimated from these equations [3], [11]–[15]. The PnP method proposed in this paper belongs to the first type; it is based on the shape of 3D objects and is called the shape-based iteration (SI) algorithm. It is the foundation of the proposed pose estimation method from non-corresponding points (SoftSI).

B. Related Works on Simultaneous Pose and Correspondence Methods

When the correspondences are unknown, the problem is known as the simultaneous pose and correspondence problem [7]. Traditional methods for solving it are based on the RANSAC method [1]: they randomly hypothesize a small set of correspondences and use a PnP method to calculate the pose [16], and the search is terminated when heuristic criteria are satisfied [17], [18]. RANSAC-style methods are slow because of their high time complexity. Other methods are based on the fact that some correspondences rarely exist [19]–[23] and should be removed from the search space: they initially consider all possible correspondences, then cluster the poses in the six-dimensional pose space and extract the high-probability clusters, in which these methods assume the correct result lies, thereby reducing the search space.


Another way to reduce the search space is based on image features [24]–[26]. These methods first learn image features in different poses and store the features and poses as prior information. At run time, feature comparison is applied and the associated pose is obtained.

Some optimization-based methods that differ from the hypothesize-and-test methods have been proposed as well. These methods formulate and try to minimize cost functions [7], [27], [28]. Among them, the best one is the SoftPOSIT method [7], which has an O(MN^2) time complexity, where M is the number of object points and N is the number of image points. SoftPOSIT stands out among the simultaneous pose and correspondence approaches because of its accuracy and speed. SoftPOSIT tries different initial poses and succeeds when the initial pose is close to the real pose. The correspondence technique used in this paper is inspired by the effective correspondence technique [34] also used in SoftPOSIT.

Besides, another kind of method is based on the Kalman filter [29]. These methods assume that prior pose information is available. Several initial pose guesses are defined with an associated uncertainty, and high-probability points are used to verify the pose. The calculation speed is comparable to that of SoftPOSIT.

The consideration of occlusion and clutter makes the simultaneous pose and correspondence problem complicated. As mentioned in subsection A, in many cases occlusion and clutter can be eliminated, but the correspondences between image points and object points remain difficult to obtain. Hence, this paper focuses on this problem: pose determination from non-corresponding points without occlusion and clutter. The proposed algorithm is called the SoftSI algorithm.

C. Outline

The rest of the paper is organized as follows. Section II discusses the camera model and two point-shape description theorems that are useful in correspondence determination. Section III proposes the PnP method called the shape-based iteration (SI) algorithm and proves its convergence. Section IV explains the camera pose estimation method from non-corresponding points (SoftSI) in detail. Simulation and experimental results are presented in Section V.

II. POINTS SHAPE DESCRIPTION AND PROBLEM STATEMENT

A. Pinhole Camera Model

Without taking distortion into account, the perspective projection equations are employed to describe the pinhole camera model,

u = f (y^c / x^c),  v = f (z^c / x^c)   (1)

where f is the camera focal length, (u, v)^T is the image frame coordinate, and (x^c, y^c, z^c)^T is the camera frame coordinate. According to (1),

X = λx   (2)



where x = (f, u, v)^T is the homogeneous coordinate, X = (x^c, y^c, z^c)^T is the camera coordinate, and λ = x^c / f is the depth. Equation (2) indicates that an object point lies on the line of sight of its image point.

The pose estimation problem aims to find the rotation and translation between the reference frame A and the camera frame B. The coordinates of point P_i in frame A and frame B are X_i^A and X_i^B. The relationship between X_i^A and X_i^B is

X_i^B = R^T X_i^A + T   for i = 1, 2, …, N,   (3)

where R is the 3 × 3 rotation matrix and T is the 3 × 1 translation vector between frame A and frame B. Therefore,

X_i^B − X_j^B = R^T (X_i^A − X_j^A)   for i, j = 1, 2, …, N and i ≠ j.   (4)

According to the following (5), two matrices W_A and W_B are constructed from the coordinates of the points in frame A and frame B. They satisfy W_A R = W_B.

W_A = [ (X_1^A − X_0^A)^T ; … ; (X_N^A − X_0^A)^T ]_{N×3}   and   W_B = [ (X_1^B − X_0^B)^T ; … ; (X_N^B − X_0^B)^T ]_{N×3}   (5)

where X_0^A is the centroid of {X_1^A, …, X_N^A} and X_0^B is the centroid of {X_1^B, …, X_N^B}. The matrices W_A and W_B are translation invariant; in other words, they are only affected by rotation. Hence, if W_A and W_B are both known, then according to (5), R can be obtained by finding the rotation matrix R that minimizes ||W_A R − W_B|| [30]. Unfortunately, the depth λ_i cannot be obtained directly with a single camera, so according to (2) and (5), X_i^B and W_B are unknown.

B. Points Shape Description

To solve the pose estimation problem, two shape description theorems are proposed based on the SVD decomposition,

W_A = U_A S_A V_A^T,  W_B = U_B S_B V_B^T   (6)

When W_A and W_B are N × 3 matrices, only the first three columns of U_A and U_B are useful.

Theorem 1: If W_A R = W_B and R ∈ SO(3), then there exist U_B, S_B and V_B^T such that U_A = U_B, S_A = S_B, and V_A^T R = V_B^T.

Theorem 2: If W_A R = M W_B, R ∈ SO(3), and M is a 0-1 correspondence matrix [34], then there exist U_B, S_B and V_B^T such that U_A = M U_B, S_A = S_B, and V_A^T R = V_B^T.

The proofs of the two theorems are given in Appendix I. The two shape description theorems describe the shape of a set of 3D points in two frames (frame A and frame B), assuming the two frames contain the same points. Theorem 1 suggests that the shape of a point set is described by the composition U S, while the rotation is carried by V. Since W_A and W_B are translation invariant, U_A S_A and U_B S_B are both rotation invariant and translation invariant. Theorem 2 suggests that U_A S_A = M U_B S_B. Hence, when the correspondences between frame A and frame B are unknown, the shape of a point set can be described by M U S.
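To make Theorem 1 concrete, the following NumPy sketch (our own illustrative code; the paper's implementation is in MATLAB, and all names here are ours) builds W_A and W_B for the same point set expressed in two frames related by a rotation and translation, and checks that the descriptor U S is unchanged. Because individual singular vectors carry a sign ambiguity, the invariance is checked through the Gram matrix W W^T = (U S)(U S)^T.

```python
import numpy as np

rng = np.random.default_rng(0)
P_A = rng.uniform(-1.0, 1.0, size=(8, 3))       # object points in frame A (rows are X_i^A)

# Arbitrary rotation R (Rodrigues formula) and translation T between the frames,
# applied as in Eq. (3): X^B = R^T X^A + T, i.e. rows transform as X^T -> X^T R + T^T.
axis = rng.normal(size=3)
axis /= np.linalg.norm(axis)
K = np.array([[0.0, -axis[2], axis[1]],
              [axis[2], 0.0, -axis[0]],
              [-axis[1], axis[0], 0.0]])
theta = 0.7
R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
T = rng.normal(size=3)
P_B = P_A @ R + T                                # the same points expressed in frame B

def centered(P):
    """Rows (X_i - X_0)^T with X_0 the centroid, i.e. the W matrix of Eq. (5)."""
    return P - P.mean(axis=0)

W_A, W_B = centered(P_A), centered(P_B)
print(np.allclose(W_A @ R, W_B))                 # W_A R = W_B, as in Section II-A

U_A, S_A, VT_A = np.linalg.svd(W_A, full_matrices=False)
U_B, S_B, VT_B = np.linalg.svd(W_B, full_matrices=False)
print(np.allclose(S_A, S_B))                     # singular values agree (Theorem 1)
# U S is rotation and translation invariant; compare via the Gram matrix to
# sidestep the sign ambiguity of individual singular vectors.
print(np.allclose(W_A @ W_A.T, W_B @ W_B.T))
```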

Fig. 2. 2D space and 1D camera. The algorithm converges to the correct pose in (a) and to the wrong pose in (b). (c) shows the algorithm with a coplanar object, which corresponds to a collinear object in 2D space.

C. Problem Statement

This paper aims to obtain the camera pose and the correspondences from a single image, given a set of 3D object points and their image points with unknown correspondences. The proposed algorithm is called the SoftSI algorithm, and it is based on the proposed PnP algorithm (the SI algorithm) and Theorem 2.

III. POSE ESTIMATION FROM CORRESPONDING POINTS

In this section, a novel PnP algorithm called the shape-based iteration (SI) algorithm is proposed. Subsection A gives an intuitive explanation of the SI algorithm; Subsection B introduces the calculation process; Subsection C proves the global convergence of SI. In Subsection D, the convergence direction of SI is discussed in order to avoid pose ambiguity [6].

A. Algorithm Intuition

To provide an intuition, we consider an example where the dimensions of the space and the camera are reduced to two and one, respectively (see Fig. 2). As shown in Fig. 2, a 1D camera obtains images in 2D space. Object points P_1, P_2, P_3 project onto the imaging line L, giving their images p_1, p_2, p_3. P_0 is the object center point and p_0 is the image center point. It is a reasonable assumption that P_0 lies on the line of sight of p_0 (see Appendix III). The object points lie on the lines of sight of their image points. If the depths λ of all object points are known, the camera pose is determined.

Fig. 2(a) demonstrates how the SI algorithm approaches the correct pose. First, a fixed depth of the center point P_0 (e.g. λ_0 = 1) is set. Then a point set C_1 is constructed with a rotation R and a scale factor μ; hence, C_1 has the same shape as the object but a different scale. Next, all points of C_1 are projected onto their corresponding lines of sight, giving the projection set B_1. Then the object-shape point set C_2 that is closest to B_1 is obtained. As shown in Fig. 2(a), the rotation of C_2 is closer to the real rotation, and the iteration process (C_1 → B_1 → C_2 → B_2 → …) finally converges to the correct camera pose.
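The key operation in this intuition is replacing every point of the current estimate by its closest point on the line of sight of the matching image point. A minimal sketch of that step follows (our code and naming, not the authors'; the homogeneous vectors x_j = (f, u_j, v_j)^T follow Eq. (2)):

```python
import numpy as np

def project_to_lines_of_sight(C, x):
    """C: (N, 3) current estimated points in the camera frame.
       x: (N, 3) homogeneous image vectors (f, u, v), one per point.
       Returns B, the orthogonal projections of the rows of C onto the rays
       through the camera center spanned by the rows of x."""
    d = x / np.linalg.norm(x, axis=1, keepdims=True)   # unit ray directions
    depths = np.sum(C * d, axis=1)                     # scalar projection onto each ray
    return depths[:, None] * d
```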


Algorithm 1 The SI Algorithm


Algorithm 2 The SI Algorithm Correction Steps for Coplanar 3D Object

Algorithm 3 The SI Algorithm Correction Steps for Non-Coplanar 3D Objects

B. Algorithm Process

According to the intuitive explanation above, the PnP algorithm is proposed as shown at the top of the page (Algorithm 1). Steps 3-7 find the W_C(i+1) closest to W_Bi; the proof is given in Appendix I. The norm ||·|| used in Step 4 denotes the Frobenius norm.

C. Proof of the Global Convergence of the SI Algorithm

Since B_{i+1} is the projection of C_{i+1} onto the lines of sight, the line segment B_{i+1}(j)C_{i+1}(j) is perpendicular to the line of sight of point j, and the line segment B_{i+1}(j)B_i(j) lies on the line of sight of point j. Hence, the three points B_{i+1}(j), C_{i+1}(j) and B_i(j) are the vertices of a right triangle. Therefore,

||C_{i+1}(j)B_{i+1}(j)||^2 = ||C_{i+1}(j)B_i(j)||^2 − ||B_i(j)B_{i+1}(j)||^2,

so that

||C_{i+1} − B_{i+1}||^2 = Σ_j ||C_{i+1}(j)B_{i+1}(j)||^2 = Σ_j ||C_{i+1}(j)B_i(j)||^2 − Σ_j ||B_i(j)B_{i+1}(j)||^2 = ||C_{i+1} − B_i||^2 − ||B_i − B_{i+1}||^2.

Since C_{i+1} is the object-shape matrix closest to B_i,

||C_{i+1} − B_{i+1}||^2 ≤ ||C_i − B_i||^2 − ||B_i − B_{i+1}||^2.

Therefore, the distance between C_i and B_i becomes smaller and smaller until B_i = B_{i+1}.

D. Convergence Direction Correction

Global convergence does not guarantee that the SI algorithm always converges to the correct pose. In this section, methods for correcting the convergence direction of the SI algorithm are proposed for coplanar objects (Algorithm 2) and non-coplanar objects (Algorithm 3).

When the convergence direction is wrong, the final result is still stable, without oscillation. According to the proof of the convergence of SI in Subsection C,

||C_{i+1} − B_{i+1}||^2 ≤ ||C_i − B_i||^2 − ||B_i − B_{i+1}||^2.   (7)

According to (7), the algorithm keeps calculating until B_i = B_{i+1}. Because C_{i+1} = f(B_{i+1}) = f(B_i) = C_i, where f(·) denotes a function, the algorithm finally reaches a stable result without oscillation.

For a coplanar object, according to [6], PnP methods suffer from pose ambiguity, which means there may be two poses that make the object-space error function [4] close to zero. Only one of them is correct, and usually the correct one has a smaller object-space error. In the SI algorithm, the points of C_i are projected onto the lines of sight to obtain B_i, and then C_{i+1} is obtained from B_i. The lines perpendicular to the lines of sight are denoted as dividing lines, as shown in Fig. 2(c). The rotation variation between C_{i+1} and C_i is caused by all points together, and no single point can make the estimated object cross its dividing line by itself. If all points are on the wrong sides of their dividing lines, like C_i in Fig. 2(c), the algorithm will mostly converge to the wrong pose, which also makes the object-space error function small. As shown in Fig. 2(c), mirroring C_i with respect to the mirror plane that is perpendicular to the line of sight of P_0 and continuing the calculation makes the algorithm finally obtain the correct pose. The experimental results of [6] also show that the correct and wrong poses are almost mirror images of each other with respect to this mirror plane. Hence, for a coplanar object, when one pose is obtained by SI, the other pose can be obtained by mirroring the estimated object with the mirror plane (Algorithm 2) and continuing the calculation; the one that has a smaller ||W_Ci − W_Bi|| is correct.

For a non-coplanar 3D object, the object-space error function will also be small if the correct pose is mirrored with the



mirror plane. However, the handedness of the estimated object and that of the real object will differ. Because the proposed SI algorithm does not take this handedness difference into account, SI might approach the wrong pose. An example is shown in Fig. 2(b). Suppose B_i is an intermediate result of the SI iteration. SI obtains its closest object-shape point set C_{i+1}, and then the rotation of C_{i+2} is obtained. The rotation of C_{i+2} moves farther away from the real rotation. Apparently, the convergence direction of the algorithm is incorrect, which means the algorithm cannot finally find the correct pose. As shown in Fig. 2(b), C_{i+1} has a different handedness from the object {P_1, P_2, P_3}. Hence, a handedness difference between the real object and the estimated object indicates that the convergence direction of the algorithm is incorrect. The situation is similar for non-coplanar 3D objects in 3D space. Based on the analysis above, the SI algorithm adds correction steps (Algorithm 3) between Step 5 and Step 6 for non-coplanar 3D objects. The condition "trace(|RV|) > TH" indicates that B_i and A have a similar shape. Algorithm 3 prevents the SI algorithm from approaching the wrong pose and changes the convergence direction, so that the SI algorithm continues calculating and finds the correct result.
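Algorithm 1 appears only as a figure in the original paper, so the following NumPy sketch reconstructs the SI iteration from the description in Subsections III-A and III-B; the closest-object-shape fit (written here as a scaled orthogonal Procrustes solution), the initialization, the function names and the stopping test are our assumptions, and the convergence-direction corrections of Algorithms 2 and 3 are omitted.

```python
import numpy as np

def fit_object_shape(W_A, B):
    """Closest object-shape point set to B: a rotated, scaled copy of the object
       shape W_A, centred on the centroid of B (scaled orthogonal Procrustes)."""
    W_B = B - B.mean(axis=0)
    U, S, Vt = np.linalg.svd(W_A.T @ W_B)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt                                   # rotation kept in SO(3)
    mu = (S * np.diag(D)).sum() / np.sum(W_A ** 2)   # least-squares scale
    return mu * (W_A @ R) + B.mean(axis=0), R, mu

def si_pose(P_A, x, R0, mu0=1.0, iters=200, tol=1e-10):
    """P_A: (N, 3) object points; x: (N, 3) image vectors (f, u, v) with known
       correspondences; R0: initial rotation guess. Returns R, mu and the
       estimated camera-frame points."""
    W_A = P_A - P_A.mean(axis=0)
    d = x / np.linalg.norm(x, axis=1, keepdims=True)      # unit lines of sight
    d0 = x.mean(axis=0) / np.linalg.norm(x.mean(axis=0))  # centre line of sight
    # Object centroid placed at unit distance along the centre line of sight
    # (our choice for the fixed initial depth of the centre point).
    C = mu0 * (W_A @ R0) + d0
    R, mu = R0, mu0
    for _ in range(iters):
        B = np.sum(C * d, axis=1)[:, None] * d       # project C onto lines of sight
        C_new, R, mu = fit_object_shape(W_A, B)      # closest object-shape point set
        if np.linalg.norm(C_new - C) < tol:
            C = C_new
            break
        C = C_new
    return R, mu, C
```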

Algorithm 4 The M_ini Determination Algorithm

IV. POSE ESTIMATION FROM NON-CORRESPONDING POINTS

The correspondence determination problem, i.e., finding the set of image points that are the images of a given set of object points, is usually difficult when the points look alike. In most cases, different methods are required to obtain the correspondences for different patterns, which is inconvenient. In this section, a novel method that simultaneously obtains the camera pose and the correspondences is proposed, called the SoftSI algorithm.

For correspondence determination, the correspondence matrix M_{N×N} is introduced. In the final result of SoftSI, M_{N×N} is a 0-1 matrix, and every row and column of M has exactly one '1' element. M(j, k) = 1 means that the jth image point corresponds to the kth object point.

Sometimes there are multiple solutions to the pose determination problem when the correspondences are unknown. For example, when the object points are evenly distributed on a sphere surface, an image point could match any object point, and for each matching a corresponding pose can be determined. Therefore, with a random initial pose, the proposed SoftSI algorithm cannot always obtain the correct result. Moreover, with N points and a 6-DoF pose, there are N + 6 variables and the problem has many local minima. For these reasons, the SoftSI algorithm proposed in this section aims to obtain the correct pose and correspondences from an initial pose that is not far away from the real pose. Subsection A discusses the initial value selection. Theorem 2 is applied to obtain the correspondence matrix M, as discussed in Subsection B. Subsection C provides a method to quickly decide whether an initial value is good or not.

A. Initial Value Selection

A 'good' initial rotation is necessary for the SoftSI algorithm (Algorithm 5) to reach the correct solution. For the initial value selection, one could randomly hypothesize three correspondences and calculate the pose using the SI algorithm. A reasonable assumption is that the center of the object points corresponds to the center of the image points [4], so only two other correspondences are required. The RANSAC method is suitable for this hypothesize-and-test procedure; with N points, its time complexity is O(N^4), because two object points and two image points must be selected. However, hypothesizing and testing correspondences takes much time. Therefore, SoftSI selects its initial value by hypothesizing the rotation directly. Given an initial rotation R_ini, a method for obtaining the initial correspondence matrix M_ini (Algorithm 4) is proposed based on the weak perspective projection model [4]: the object points are projected onto the image with the weak perspective projection model to obtain the matrix E, and E is then compared with the image matrix F.

B. Algorithm Design

Theorem 1 indicates that U_B S_B is both rotation invariant and translation invariant. In the SI algorithm, ||μ U_A S_A − U_B S_B|| becomes smaller and smaller. Similarly, with non-corresponding points, and according to Theorem 2, the SoftSI algorithm proposed in this section aims to make

||μ M U_A S_A − U_B S_B||   (8)

smaller and smaller. According to (8), the differences between the rows of μ U_A S_A and U_B S_B indicate the correspondence relationship M. To obtain M, the following cost function is defined:

M(j, k) = M_ini(j, k) exp(−c_o (|μ U_A S_A(j) − U_B S_B(k)| − ξ))   (9)

where ξ is a convergence rate threshold and c_o is a positive coefficient. The combination of exp(−c_o (|μ U_A S_A(j) − U_B S_B(k)| − ξ)) and M_ini means that the algorithm combines the 2D image information and the 3D object information. According to (9), the M_ini part ensures that the algorithm searches for the rotation only in the neighborhood of the initial rotation R_ini. This keeps the algorithm from searching a large area and gives it more opportunities to succeed when the initial rotation is near the real rotation. Since each row and each column of M should sum to 1, a technique due to Sinkhorn is applied to normalize the square correspondence matrix [31].
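A sketch of the correspondence update of Eq. (9) followed by Sinkhorn row/column normalization [31] is given below (our NumPy code; the values of c_o, ξ and the iteration count are illustrative, not the authors' settings):

```python
import numpy as np

def update_correspondences(M_ini, US_A, US_B, mu, co=5.0, xi=0.01, sinkhorn_iters=30):
    """M_ini: (N, N) 0-1 prior correspondence matrix from Algorithm 4.
       US_A, US_B: (N, 3) shape descriptors U_A S_A and U_B S_B.
       Returns an (approximately) doubly stochastic correspondence matrix M."""
    # |mu * U_A S_A(j) - U_B S_B(k)| for every pair (j, k), as in Eq. (9)
    diff = np.linalg.norm(mu * US_A[:, None, :] - US_B[None, :, :], axis=2)
    M = M_ini * np.exp(-co * (diff - xi))
    # Sinkhorn normalization: alternately scale rows and columns to sum to 1
    for _ in range(sinkhorn_iters):
        M = M / (M.sum(axis=1, keepdims=True) + 1e-12)
        M = M / (M.sum(axis=0, keepdims=True) + 1e-12)
    return M
```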



As the estimated rotation approaches the real rotation, M_ini should be updated whenever the estimated rotation becomes better than the initial rotation; ||μ M U_A S_A − U_B S_B|| can be used to decide which pose is better. Based on the analysis above, the pose estimation algorithm from non-corresponding points (SoftSI) is proposed.

Algorithm 5 The SoftSI Algorithm

C. Initial Value Selection Acceleration

In this section, a quick and convenient method for deciding whether an initial rotation is 'good' or 'bad' is proposed. Since 'bad' initial rotations can be eliminated quickly, the proposed method greatly accelerates the initial rotation search.

According to (10) below, when the rotation R is given, a translation T_j can be obtained from each point j = 1, 2, …, N. As R approaches the real rotation, the differences between the T_j become smaller and smaller; when R equals the real rotation, all T_j should be the same. According to (11), the standard deviation of the translations, σ_T, is therefore suitable for indicating whether the current rotation R is close to the real rotation. A large σ_T indicates that the current rotation is 'bad'.

T_j = (λ_j / μ) x_j − R X_j^A   for j = 1, 2, 3, …, N   (10)

σ_T = standard deviation of {T_j, j = 1, 2, 3, …, N}   (11)
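A sketch of the filter of Eqs. (10)-(11) follows (our code and naming; how the per-point depths λ_j and the scale μ are obtained inside the first SoftSI iteration is not repeated here, so they are passed in as arguments, and collapsing the per-component standard deviations to one scalar is our choice):

```python
import numpy as np

def translation_spread(R, mu, lambdas, x, P_A):
    """Eqs. (10)-(11): per-point translations T_j = (lambda_j / mu) x_j - R X_j^A
       and their spread sigma_T, used to reject 'bad' initial rotations.
       R: (3, 3) current rotation, mu: scale, lambdas: (N,) per-point depths,
       x: (N, 3) homogeneous image vectors, P_A: (N, 3) object points."""
    T = (lambdas / mu)[:, None] * x - P_A @ R.T   # rows are the T_j^T
    sigma_T = np.linalg.norm(T.std(axis=0))       # one scalar from the per-axis spreads
    return T, sigma_T

# Usage: after the first iteration with a trial initial rotation, a large
# sigma_T1 = translation_spread(...)[1] marks that rotation as 'bad'.
```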

The value of σ_T after the first iteration is denoted by σ_T1, which indicates whether the initial rotation is 'good' or 'bad'. The SoftSI algorithm tries to find a 'good' initial rotation: for each trial, σ_T1 is obtained after the first iteration, and the initial rotation is eliminated if σ_T1 is large. Hence, bad initial rotations can be eliminated quickly. To make the initial rotation selection even faster, Steps 6 and 7 of the SoftSI algorithm (Algorithm 5) are replaced with the step 'M = M_ini'.

An experiment is designed to find the relationship between the initial rotation bias, σ_T1 and the success of SoftSI. Figure 3(a)-(c) shows a typical example of this relationship. As shown in Fig. 3(a), σ_T1 is small when the initial rotation bias is small, and the SoftSI algorithm succeeds. Even when the initial rotation bias is not so small (≈ +90 degrees), σ_T1 is also small and the algorithm succeeds. We find that σ_T1 determines the success of the SoftSI algorithm. Figure 3(d) supports this finding: when the initial bias is between 84 and 103 degrees, σ_T1 falls to a low level and the SoftSI algorithm succeeds.

V. SIMULATIONS AND EXPERIMENTS

In this section, the performance of the proposed algorithms is tested by simulations and experiments. All code is implemented in Matlab script.

A. Simulations of Accuracy

First, simulations are carried out to demonstrate the accuracy of SI. The results of SoftSI and SI are the same when SoftSI succeeds. In the simulations, the object points are uniformly distributed in a box of [-1,1] × [-1,1] × [-1,1] meters; the camera is set at

20 meters away from the box. The pose is randomly generated. Before the SI algorithm is used, checks are made to ensure that the object points are in front of the camera. Different levels of noise are added to the image to test the accuracy of the algorithm. The comparison of the accuracy of the four PnP methods, SI, OI [4], POSIT [2] and EPnP [3], is shown in Fig. 4. OI and POSIT are two widely used iterative PnP methods, and EPnP is the state-of-the-art non-iterative PnP method. As shown in


Fig. 3. The relationship between the initial rotation bias, σT1 and the success of SoftSI. The red line indicates that the algorithm succeeds with that initial rotation. (a)-(d) show that σT1 is small when the initial rotation bias is small and SoftSI succeeds. (d) shows that σT1 determines the success of SoftSI.

Fig. 4. The comparison of the accuracy of SI, OI, POSIT and EPnP. Each point in the plot represents 1,000 trials.

Fig. 4, OI, POSIT and SI have similar accuracy, and they are all more accurate than EPnP.

B. Simulations of the Convergence of SoftSI

The proposed SoftSI algorithm has a large convergence radius; a large convergence radius means that the 'good' initial rotations are distributed over a large area around the real rotation. A large convergence radius allows a large initial rotation search step, which reduces the number of trials and makes the SoftSI algorithm fast. The convergence radius of the proposed SoftSI algorithm is verified by Monte Carlo simulations. The camera focal length is 1,000 pixels. The object points and the camera pose are randomly generated with the same parameters as in Subsection A. The Monte Carlo simulations are characterized by four main parameters: the number of points


Fig. 5. The success rate of SoftSI with different parameters (the number of points N, search step size S, image noise σ and the object depth d). Each point represents 1,000 trials. (a) N = 10, c_o = 5, c_o_ini = 3; (b) N = 20, c_o = 8, c_o_ini = 4; (c) N = 30, c_o = 10, c_o_ini = 5; (d) N = 40, c_o = 12, c_o_ini = 6.

N, the initial rotation search step size S, the standard deviation of the image noise σ, and the ratio d = (the distance from the object to the camera) / (the size of the object along the depth direction). Simulations are performed with N ∈ {10, 20, 30, 40}, S ∈ {30, 40, 50, 60, 70, 80, 90, 100} degrees, σ ∈ {0, 1.5, 3} pixels, and d ∈ {10, 5, 2.5}. The noise added to the image is conservative. The SoftSI algorithm is considered successful when over 90% of the points are correctly matched and the final σ_T is small. The correspondence success rate cannot be 100%, especially when there is noise; the most extreme example is when the positions of two image points are exchanged because of noise. The most common mismatch is that the SoftSI algorithm cannot distinguish image points that are very close to each other. However, the final σ_T indicates whether or not all points agree with the result; if the final σ_T is small, the correspondence and pose determination can be considered successful. The simulation results of the correspondence success rate with different parameters are shown in Fig. 5. Each point in a plot represents 1,000 trials. For each trial, N points and the camera pose are randomly generated, and the initial rotation is selected by searching the 3D rotation space with search step S. The image noise is generated with standard deviation σ. As shown in Fig. 5, with a small search step (
