
Improved Stereo Matching in Scattering Media by Incorporating a Backscatter Cue

Shahriar Negahdaripour, Fellow, IEEE, and Amin Sarafraz

Abstract— In scattering media, as underwater or in haze and fog in the atmosphere, image contrast deteriorates significantly due to backscatter. This adversely affects the performance of many computer vision techniques developed for clear open-air conditions, including stereo matching, when they are applied to images acquired in these environments. Since the strength of the scattering depends on the distance to the scene points, the scattering field embodies range information that can be exploited for 3-D reconstruction. In this paper, we present an integrated solution for 3-D structure from stereovision that incorporates the visual cues from both disparity and scattering. The method applies to images of scenes illuminated by artificial sources and natural lighting, and its performance improves with the discrepancy between the backscatter fields in the two views. Neither source calibration nor knowledge of the medium's optical properties is required. Instead, backscatter fields at infinity, i.e., stereo images taken with no target in the field of view, are directly employed in the estimation process. Results from experiments with synthetic and real data demonstrate the key advantages of our method.

Index Terms— Stereo vision, scene reconstruction, dense correspondence, scattering media.

I. INTRODUCTION

Feature matching is the primary challenge in stereo vision. The state-of-the-art techniques, based on a wide range of approaches, are targeted primarily at images recorded in non-scattering media [16], [28]. Significant difficulties can arise when these methods are applied to images recorded in scattering environments, e.g., underwater and in haze and fog [9], [24], [26]. In such domains, the scene radiance generating the signal component of the image undergoes attenuation due to medium absorption and scattering as light reflecting from the object travels to the camera. Additionally, the attenuated signal is corrupted by backscatter or sidescatter generated by the suspended particles when these are illuminated by an artificial source from a position near the camera or by natural lighting. The amount of backscatter/sidescatter increases with the camera-to-scene distance, leading to a spatially-varying contrast decay across the image.

Manuscript received August 17, 2013; revised May 16, 2014 and September 9, 2014; accepted September 9, 2014. Date of publication September 17, 2014; date of current version November 20, 2014. The work of A. Sarafraz as a Ph.D. student at the Civil, Architectural and Environmental Engineering Department, University of Miami, was supported by the U.S.–Israel Binational Science Foundation under Grant 2006384. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Anthony Vetro. S. Negahdaripour is with the Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL 33146 USA (e-mail: [email protected]). A. Sarafraz is with the Center for Computational Science, University of Miami, Coral Gables, FL 33146 USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2014.2358882

Fig. 1. Water turbidity can adversely impact stereo matching accuracy; top: left image of a stereo pair and ground truth disparity map; middle: synthesized stereo pair in water with corresponding backscatter fields; bottom: validity maps, depicting points where disparity is computed accurately (white), by NSSD-based method applied to original stereo pair (Poster images) and to synthesized turbid-water images, respectively.

Among scattering environments, underwater is one of, if not the most, challenging domain for stereo imaging, due to a number of factors, including but not limited to: 1) high turbidity and attenuation rates in a large percentage of environments (e.g., many lakes and bodies of still water, and near shore); 2) poor (or non-existent) natural lighting below depths of tens of meters (or, roughly, 100 [m]), requiring artificial sources that generally generate significant backscatter under the constrained camera-to-source separation of submersible platforms; 3) significant sidescatter under natural lighting in shallow waters. In the remainder, "backscatter" will be used generally for the additive scatter field generated by suspended particles within the water column.

TABLE I
INHERENT OPTICAL PROPERTIES OF 3 WATER TYPES, TAKEN FROM [19]

Underwater 3-D reconstruction by stereovision has been explored over the last dozen years [1], [6], [15], [17], [20], [21], [32]. Some of these works employ open-air dense matching techniques based on relaxing the brightness constancy assumption; e.g., [20]. Others have applied feature-based methods to images acquired under good visibility, and thus negligible backscatter [6], [15], [21], [32]. Single-image dehazing offers a possible approach for image contrast enhancement (and scene radiance recovery) [4], [8], [14], [33]. However, most of these methods are targeted at scenarios where signal deterioration is primarily due to medium attenuation; that is, they do not work well with images that are severely corrupted by backscatter.

To demonstrate the impact of backscatter on stereo matching, we utilize the Poster images from the Middlebury data set [28], [29]. Fig. 1 depicts the left view and the true disparity map (top). Given in the next row are the synthesized turbid-water stereo images of the Poster scene at roughly one meter distance. These are constructed from the backscatter fields, given in the following row, using the model presented in Appendix A, with the optical parameters of turbid harbor waters; see Table I, taken from [19]. Finally, we give the solutions of the standard Normalized Sum-Squared Difference (NSSD) method, applied to both the original and the synthesized pairs. Setting an error threshold of 1 [pix], the binary validity maps (differentiating between correct and incorrect disparity values) reveal the adverse impact of backscatter on the estimation performance. This motivates devising an effective solution for underwater 3D scene reconstruction in the presence of backscatter.

Image enhancement by removing the backscatter component has been demonstrated by utilizing two images acquired with different polarization filter settings from a single viewpoint [34]; a by-product is an up-to-scale rough range map. Knowledge of the optical properties of the medium has been applied to construct descattered images, in order to apply traditional stereo algorithms [24]. Alternatively, recursive range recovery and image enhancement by descattering has been demonstrated by employing both polarization and stereo cues [25]. Recently, we reported experimental results comparing earlier techniques to the method presented here [26], which we now cover in technical detail.

The primary contribution of this paper is a stereo matching method, the key step in 3D scene reconstruction, that simultaneously exploits the range cues from binocular stereo and backscatter in a unified estimation process. Improved performance is realized mainly in regions where the lack of surface texture is overcome by exploiting the range cue from backscatter.

Fig. 2. A typical underwater stereo imaging setup, where an artificial source is used for scene illumination.

An analogy can be drawn with terrestrial 3D reconstruction methods that integrate depth information from binocular stereo within textured regions with the local surface normal from the shading cue within untextured regions; see [2], [5], [10]. A key novelty is to estimate and exploit the backscatter by representing it as a range-dependent scaling of the same field at infinity, namely, stereo images acquired in the medium with no target in the field of view (FoV). The scaling is determined through the optimization process for estimating the disparity field. Other significant aspects of our method are: 1) it does not require knowledge of the medium optical properties for backscatter estimation; 2) backscatter/sidescatter induced by both natural and artificial lighting can be treated; 3) it requires no source calibration. Among various scattering media, underwater deployment is a natural application, where the stereo baseline can be relatively short due to the range limitation (from restricted visibility).

Stereo reconstruction methods dominantly utilize some type of minimization formulation that combines a data discrepancy term with some form of regularization to impose disparity/depth smoothness within uniform regions. Variations among methods often arise in how discontinuities, occlusions, illumination variations, etc., are handled. In this work, disparity computation by the traditional NSSD applied within local support windows is augmented with two new error terms that penalize the discrepancy between the left and right backscatter fields and range-dependent fractions of these fields at infinity. Now, NSSD-based methods generally fail within low-SNR regions with no (or rather weak) texture, potentially corrupted by various noise sources. In scattering media, the ambiguity within some of these regions is resolved by exploiting the range cue in the backscatter field. Naturally, this may not work in all regions, and thus feature matching remains ambiguous, e.g., where noise sources dominate a weak range cue in the backscatter. Our results from experiments with both synthetic and various real data quantify the improved accuracy with respect to traditional approaches that rely solely on the existence of texture for matching.

II. STEREO MODELING IN SCATTERING MEDIA

Consider two calibrated cameras viewing the scene (Fig. 2). For simplicity, but without loss of generality, we assume the traditional setup with parallel optical axes, baseline vector $\mathbf{D} = (D, 0, 0)$, and hence epipolar lines that are parallel to the x axis.
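Under this rectified geometry, a scene point at range Z has disparity d = fD/Z (f: focal length in pixels); we recall this standard relation only to make explicit that the disparity and the range-dependent backscatter introduced below encode the same underlying range.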


The image coordinates of a scene point P in the left and right images are denoted $\mathbf{x}^i_{obj}$, $i = L, R$, where $\mathbf{x} = (x, y)$. In the remainder, superscript $i$ refers to the same quantity in the left and right views. We model each image $I^i$ as the sum of attenuated signal $S^i$ and backscatter $B^i$ components [25]:

$$I^i(\mathbf{x}^i_{obj}) = S^i(\mathbf{x}^i_{obj}) + B^i(\mathbf{x}^i_{obj}). \tag{1}$$

The disparity $d(\mathbf{x}^L_{obj})$ pairs up corresponding features $\{\mathbf{x}^L_{obj}, \mathbf{x}^R_{obj}\}$:

$$\mathbf{x}^R_{obj} = \mathbf{x}^L_{obj} + (d, 0). \tag{2}$$

For a 3D scene point P at infinity, where the disparity is zero ($\mathbf{x}^L_{obj} = \mathbf{x}^R_{obj}$), the signal components of the two views tend to zero, and the stereo pair is comprised solely of the backscatter at infinity:

$$I^i(\mathbf{x}^i_{obj})\big|_{P \to \infty} = B^i_\infty(\mathbf{x}^i_{obj}). \tag{3}$$

For each pair of matching features in the two views, we define the following total fields for the signal and backscatter components:

$$S(\mathbf{x}^L_{obj}, \mathbf{x}^R_{obj}) = S^L(\mathbf{x}^L_{obj}) + S^R(\mathbf{x}^R_{obj}); \tag{4}$$

$$B(\mathbf{x}^L_{obj}, \mathbf{x}^R_{obj}) = B^L(\mathbf{x}^L_{obj}) + B^R(\mathbf{x}^R_{obj}). \tag{5}$$

We further define

$$\rho(\mathbf{x}^L_{obj}, \mathbf{x}^R_{obj}) = \frac{\left|B^L(\mathbf{x}^L_{obj}) - B^R(\mathbf{x}^R_{obj})\right|}{B^L(\mathbf{x}^L_{obj}) + B^R(\mathbf{x}^R_{obj})}. \tag{6}$$

We emphasize that the quantities on the left-hand side of (4) to (6) are defined for a pair of matching points in the left and right views. Most scene points viewed in both cameras have approximately equal light propagation paths to the two cameras, and thus (roughly) the same attenuated left and right signals $S^i$. It is primarily the close-range points in the periphery of one camera's FoV that have very different distances from the two cameras. However, such points are not typically imaged in the farther camera. Therefore, we may apply the brightness constancy model for two corresponding left and right features:

$$S^L(\mathbf{x}^L_{obj}) = S^R(\mathbf{x}^R_{obj}). \tag{7}$$

For brevity, we omit the arguments $\mathbf{x}^L_{obj}$ and $\mathbf{x}^R_{obj}$ from the equations in the remainder. Using (1) and (7), we have $I^L - I^R = B^L - B^R$; it then readily follows from (5) and (6) that

$$|I^L - I^R| = \rho B; \tag{8}$$

$$I^L + I^R = B + S. \tag{9}$$

Let us define $s^i$ and $s$ as the ratios of the left/right and total backscatter fields to the same fields at infinity, respectively:

$$s^i = \frac{B^i}{B^i_\infty}, \qquad s = \frac{B}{B_\infty}. \tag{10}$$

Substituting (10) into (5) yields

$$B_\infty = \left(\frac{s^L}{s}\right) B^L_\infty + \left(\frac{s^R}{s}\right) B^R_\infty, \tag{11}$$

where

$$\frac{s^i}{s} = \frac{B^i / B^i_\infty}{B / B_\infty} = \left(\frac{B^i}{B}\right)\left(\frac{B_\infty}{B^i_\infty}\right) = k^i\, \frac{B_\infty}{B^i_\infty}. \tag{12}$$

Here, $k^i$ establishes the decomposition of the total backscatter field $B$ into the component $B^i$. In the remainder, we apply these equations in estimating the various fields that are employed in our method. We utilize $\hat{(\,\cdot\,)}$ to denote the estimate of $(\,\cdot\,)$ (e.g., $\hat{\rho}$ represents the estimate of $\rho$).

III. ESTIMATION OF BACKSCATTER AND SIGNAL COMPONENTS

We conclude from (3) that the left/right backscatter fields at infinity can be estimated from a stereo pair recorded with the scene at infinity (no object in the FoV). In Appendix A, we assess the variations in various ratios that relate the backscatter at each range to the backscatter at infinity along the same projection ray. Here, we adopt a light transport model commonly employed in ocean optics, atmospheric imaging, computer graphics, and computer vision, e.g., to synthesize the volumetric scattering phenomenon [3], [18]. As in many applications, a sufficiently accurate approximation of the backscatter field can be obtained based on the single scattering assumption; that is, light scatters only once within the medium before it reaches the camera. Moreover, we model the medium as homogeneous in order to integrate the smooth, monotonically-varying single scattering equations analytically, as in some earlier related work, e.g., [31]. The behavior is analyzed for a typical imaging geometry and the optical parameters of 3 water types, given in Table I. These are taken from [19], based on the most carefully made and widely cited scattering measurements from [23]. The data were obtained in the turbid waters of San Diego Harbor (California), the near-shore coastal waters of San Pedro Channel (California), and the clear waters in the Tongue of the Ocean, Bahama Islands.

We demonstrate that 1) while $s$ depends on range, its sensitivity to the viewing direction is negligible, particularly as scattering becomes more significant (see Fig. 11); 2) the decomposition $B^i = k^i B$ of the total backscatter $B$ into the left/right fields can be approximated by the ratio of the left/right backscatter at infinity to the total backscatter at infinity, particularly for longer ranges and (or) higher turbidity:

$$\hat{k}^i \approx \frac{\hat{B}^i_\infty}{\hat{B}_\infty} = \frac{\hat{B}^i_\infty}{\hat{B}^L_\infty + \hat{B}^R_\infty} \le 1. \tag{13}$$

Substitution into (12) yields the key result

$$\frac{\hat{s}^i}{\hat{s}} \approx 1 \;\longrightarrow\; \hat{s}^L \approx \hat{s}^R \approx \hat{s}, \tag{14}$$

which we take to be an equality in our method. In Appendix B, we show that

$$\hat{\rho} = \frac{\left|\hat{B}^L_\infty - \hat{B}^R_\infty\right|}{\hat{B}^L_\infty + \hat{B}^R_\infty} = \frac{\left|I^L_\infty - I^R_\infty\right|}{I^L_\infty + I^R_\infty}. \tag{15}$$


Fig. 3. (a) An incorrect backscatter ratio for the wrong range and correspondence yields erroneous left/right backscatter and attenuated signal components, leading to incorrect 3D reconstruction. (b) Illustration of backscatter estimation for a small patch. For each disparity level (horizontal axis), we estimate the left/right backscatter fields $\hat{B}^L(\mathbf{x})$ and $\hat{B}^R(\mathbf{x}+d)$ from the corresponding backscatter at infinity fields $B^L_\infty(\mathbf{x})$ and $B^R_\infty(\mathbf{x}+d)$ ($B^L_\infty$ and $\hat{B}^L(\mathbf{x})$ shown in the top 2 rows). The error (vertical axis) is minimum where the estimated backscatter and backscatter at infinity fields are similar, related by a roughly constant scaling $s^i$ within the support window $\mathbf{x}^L_{obj}$.

Fig. 4. (a) Poster stereo images from the Middlebury data set, used to generate the synthesized turbid-water stereo pairs in (b); (c) ground truth disparity. Estimated fields are shown as binary validity maps: good (white) and bad (black) estimates with the error threshold set at 1 pixel; the result from our method applied to the turbid-water images (f) is compared to the estimates (d, e) from applying NSSD to the ideal and turbid-water images in (a, b), respectively.

TABLE II
DISPARITY ERROR THRESHOLD [PIX] IN DIFFERENT EXPERIMENTS

Consequently, we can readily estimate $B$ from (8):

$$\hat{B} = \left|I^L - I^R\right| / \hat{\rho}. \tag{16}$$

Where there is no (or negligible) difference between the left and right backscatter fields, $\rho$ is equal (or close) to zero. In such cases, $\hat{B}$ in (16) is overestimated. To avoid this, we set a lower bound $\hat{\rho}_o$ and determine $\hat{B}$ as follows:

$$\hat{B} = \frac{\left|I^L - I^R\right|}{\max(\hat{\rho}, \hat{\rho}_o)}. \tag{17}$$

We have used the value $\hat{\rho}_o = 0.1$ in all of our experiments. Calculating the ratio $\hat{s}$ from (10), and setting the left/right scales $\hat{s}^i$ equal to it according to (14), we apply it to the backscatter at infinity $\hat{B}^i_\infty$ to determine the left/right fields $\hat{B}^i$. Consequently, we obtain the left/right signals:

$$\hat{S}^i = I^i - \hat{B}^i. \tag{18}$$

This establishes the foundation for descattering the stereo pair, although that is not the immediate scope of this work.
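The chain (15)-(18) lends itself to a compact implementation. The following is a minimal NumPy sketch of our reading of these equations, not the authors' code; the function name and the assumption that all inputs are registered patches of equal size are ours:

```python
import numpy as np

def estimate_components(IL, IR, BinfL, BinfR, rho_o=0.1):
    """Estimate backscatter and signal patches for one hypothesized match.

    IL, IR       -- left patch and its hypothesized right match (same shape)
    BinfL, BinfR -- backscatter-at-infinity patches along the same pixel rays
    rho_o        -- lower bound on rho, Eq. (17); 0.1 as used in the paper
    """
    rho = np.abs(BinfL - BinfR) / (BinfL + BinfR)  # Eq. (15), from infinity fields
    B = np.abs(IL - IR) / np.maximum(rho, rho_o)   # total backscatter, Eq. (17)
    s = B / (BinfL + BinfR)                        # per-pixel scale s = B / B_inf, Eq. (10)
    BL, BR = s * BinfL, s * BinfR                  # left/right fields via s^L ~ s^R ~ s, Eq. (14)
    SL, SR = IL - BL, IR - BR                      # attenuated signals, Eq. (18)
    return BL, BR, SL, SR, s
```

For a correct match, the scale s should be roughly constant over the support window (Fig. 3(b)); for a wrong one it is not, which is exactly what the backscatter error term in Section IV penalizes.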

Referring to Fig. 3(a), when a wrong match is assumed, we generally obtain inaccurate estimates of $\hat{\rho}$ and $\hat{B}$ from (15) and (17), and inaccurate scales $\hat{s}^i$ and signal components $\hat{S}^i$ from (10) and (18), respectively. This is demonstrated through an example next, and it plays a key role in the development of our optimization method.

For a left feature $\mathbf{x}^L_{obj}$, we seek the match $\mathbf{x}^R_{obj}$ along the corresponding epipolar line of the right view. For each disparity in the feasible range $d_{min}$ to $d_{max}$, we take a small local patch centered at the hypothesized match $\mathbf{x}^i_{obj}$. As described above, we determine $\hat{B}^L$ within $\mathbf{x}^L_{obj}$, which is given in the second row of Fig. 3(b). This can be compared to $\hat{B}^L_\infty$ within the same region (repeated in the first row


Fig. 5. Results for 4 other synthetic turbid-water stereo pairs, constructed from Middlebury data set. Top to bottom, various rows depict (a) left view; (b) ground truth disparity map; (c) contrast maps depicting local texture content based on variance of intensity within local support window of each pixel. Binary validity maps are given for NSSD-based method applied to (d) turbid-water images and (e) signal components, and (f) our method applied to turbid-water stereo images (disparity error threshold set to 1 [pix]).

of Fig. 3(b) to simplify visualization). The estimated $\hat{B}^L$ is smooth and resembles the backscatter at infinity only for the correct disparity. Here, the estimated backscatter is a fraction $\hat{s}^L \le 1$ of $\hat{B}^L_\infty$ only for the correct disparity, as given by (10).

IV. DISPARITY ESTIMATION

The various quantities in the previous section are defined for each left-right correspondence. However, how do we establish these correspondences by making use of the range cues in both the backscatter fields and the signal components? Our method employs a minimization formulation comprising the following error function:

$$E_T = E_S + E_B, \tag{19}$$

where the two terms $E_S$ and $E_B$ penalize the deviations of the estimated left/right signals and backscatter fields


Fig. 6. Normalized intensities with unity maximum value; the x-axis in (a-d) covers 3 gradient magnitude thresholds Tg used for labeling pixels as textured/untextured, for (a, c) synthetic turbid-water and (b, d) original images. The y-axis gives the percentage of pixels with correct disparity in (a, b) textured and (c, d) untextured regions. Fixing Tg at roughly 50% of the maximum gradient, percentage estimation accuracies are given for (e) textured and (f) untextured pixels, as well as (g) over the entire image.

from the associated models, respectively. In defining these error terms, a number of frameworks exist, e.g., weighted support windows, hierarchical techniques, and occlusion-handling methods. (The merits and trade-offs of these and other state-of-the-art stereo techniques have been

assessed in various surveys [7], [27], [28], [30].) Among these, the weighted support-window approach with the winner-takes-all strategy is simple to implement, and yet effectively demonstrates the inherent advantages of our method.


Fig. 7. Indoor water tank experiment. Top to bottom in column (a): stereo images recorded in clear water, and in scattering medium (generated by adding low-fat milk), with corresponding left/right backscatter at infinity fields. Two right columns: (b) scene regions with varying texture content and backscatter components; (c) ground truth disparity; (d) contrast map; (e, f) validity maps for NSSD-based method applied to turbid-water and clear-water data, respectively; (g) our method applied to turbid-water data.

For the signal term, we use the commonly adopted NSSD measure:

$$E_S(d) = \mathrm{NSSD}(\hat{S}^L, \hat{S}^R). \tag{20}$$

Building on the relationship between the left/right and total backscatter fields in (10), a suitable error function $E_B$ consists of some norm of the discrepancy between the estimated backscatter $\hat{B}^i$ and the predicted value as a fraction $\hat{s}$ of the left/right backscatter at infinity $\hat{B}^i_\infty$:

$$E_B = \sum_{i} \sum_{\mathbf{x}^i_{obj}} \left|\hat{B}^i - \hat{s}\, \hat{B}^i_\infty\right|; \qquad \hat{s} = \frac{\hat{B}}{\hat{B}_\infty}. \tag{21}$$

The scale $\hat{s}$ is computed as the average over the support windows $\mathbf{x}^i_{obj}$ of the two views, centered at $\mathbf{x}^L_{obj}$ and its assumed match $\mathbf{x}^R_{obj}$. A window size of 21 [pix] × 21 [pix] has been set in all of our experiments. The optimal solution is taken as the disparity giving the minimum total error:

$$\hat{d} = \arg\min_{d}(E_T). \tag{22}$$
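Before turning to the implementation summary below, a compact sketch of this search may help. This is a sketch under our reading of (19)-(22), not the authors' code: the NSSD normalization shown is one common variant (its exact form is not spelled out here), estimate_components is the illustrative helper from Section III, the two error terms are summed with equal weight per (19), and image-boundary handling is omitted:

```python
import numpy as np

def disparity_wta(IL, IR, BinfL, BinfR, d_min, d_max, x, y, w=10):
    """Disparity for the left pixel (x, y) by minimizing E_T = E_S + E_B.

    All inputs are 2D float arrays; w = 10 gives the 21 x 21 support window.
    """
    best_d, best_err = d_min, np.inf
    pl = np.s_[y - w:y + w + 1, x - w:x + w + 1]              # left support window
    for d in range(d_min, d_max + 1):
        pr = np.s_[y - w:y + w + 1, x + d - w:x + d + w + 1]  # hypothesized right window
        BL, BR, SL, SR, s = estimate_components(IL[pl], IR[pr], BinfL[pl], BinfR[pr])
        # E_S, Eq. (20): NSSD of the signal estimates (one common normalization)
        nL, nR = SL - SL.mean(), SR - SR.mean()
        E_S = np.sum((nL - nR) ** 2) / (np.sqrt(np.sum(nL**2) * np.sum(nR**2)) + 1e-12)
        # E_B, Eq. (21): deviation of per-pixel fields from the window-averaged scale
        s_bar = s.mean()
        E_B = np.sum(np.abs(BL - s_bar * BinfL[pl])) + np.sum(np.abs(BR - s_bar * BinfR[pr]))
        if E_S + E_B < best_err:                              # E_T, Eqs. (19) and (22)
            best_err, best_d = E_S + E_B, d
    return best_d
```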

A. Implementation

The objective is to assess performance based on matching accuracy, rather than computational efficiency. Thus, we identify the solution through a winner-takes-all strategy by computing the error function in (19) over the disparity range of interest. We summarize the computational steps:

for d = d_min : d_max do
  - calculate the backscatter at infinity fields for the corresponding pairs $\{\mathbf{x}^L_{obj}, \mathbf{x}^R_{obj}\}$ from (3), and the total $\hat{B}_\infty$ from (5);
  - estimate $\hat{\rho}$ using (15);
  - compute the total backscatter using (17) and the scale $\hat{s}$ from (10) by averaging the solution over the support windows, and consequently set the left/right scales equal to it according to (14);
  - calculate the left/right backscatter fields using (10), and the signal components from (18);
  - evaluate the error terms for the signal and backscatter components, using (20) and (21).
end for

The solution is taken as the match giving the lowest error.

V. EXPERIMENTS

We employ both synthetic data, with perfect ground truth, and 3 real data sets to compare the results from our technique and the traditional NSSD-based solution, applied to both the backscatter-corrupted stereo pairs and the descattered images (signal components). Results are given as binary validity maps: good (white) and bad (black) estimates based on a prescribed error threshold Td. The threshold Td is varied in different experiments, solely to account for ground truth uncertainty (known perfectly for the synthetic data, but with varying precision in the experiments with real data). These results are also summarized based on the percentage of points with errors not exceeding Td; see Table II.


Fig. 8. Different regions of the water tank scene in (a) clear and (b) turbid waters. (c) Contrast maps highlight the texture content of each region. Binary validity maps of these local regions for (d) the NSSD-based method and (e) our method. For each region, we give the percentage improvement in disparity map estimation from applying our method.

A. Synthetic Images

We have generated synthetic images of a dominantly scattering medium (where absorption is negligible and attenuation is primarily due to scattering), based on the imaging conditions of the water tank experiment. We have set an average scene distance of 70 [cm], where the scene range is computed from the ground truth disparity. Here, the source is centered between the two parallel cameras with a 17 [cm] baseline, at (8, −4, 0) [cm] w.r.t. the left-camera coordinate frame (see Fig. 2). A non-uniform Gaussian distribution for the illumination field is assumed.

The scene reflectance maps comprise sample images from the Middlebury stereo database, containing objects with different shapes and textures. Turbid harbor conditions, with scattering coefficient b = 1.82 [m−1] and attenuation coefficient c = 2.19 [m−1] (one attenuation length of 0.46 [m]) from Table I, were utilized in generating the falloff (attenuation) and backscatter fields. For each view, the product of the falloff function with the scene reflectance map gives the signal component, which is then added to the backscatter field to construct the synthesized turbid-water image. We have also applied a small amount of additive random noise, with a variance of one gray level.

Fig. 4(a) depicts the Poster images from the Middlebury database, with the corresponding ground-truth disparity in (c). The validity maps in (d, e) are based on the solutions from the NSSD-based method applied to both the original images in (a) and the synthetic "turbid-harbor" images in (b).


Fig. 10. (a) Imaging geometry of single scattering for backscatter modeling; (b) varying the scene point position along the optical ray of the right camera, the backscatter fields $\{B^L, B^R\}$ depend on the range to the two cameras. Both $B^i$ and the ratios $s^i = B^i/B^i_\infty$ ($i = L, R$) are equal for a point M at equal distance from both cameras. As the point moves away along the optical ray $u^R$, $B^R$ and the ratio $s^R$ increase monotonically, while $B^L$ and the ratio $s^L$ decrease monotonically. The scale discrepancy can be quantified for a point S as we move away from M.

Fig. 9. Results for images recorded under turbidity in an outdoor pool and a marina. (a) enhanced left images; (b) "estimated" ground truth disparity maps; (c) row by row: validity maps from applying NSSD to the raw data, to images descattered by the haze removal method in [14], and from our method. Percentages are for pixels with disparity error equal to or below 5 [pix]. (d) left backscatter fields at infinity.

These can be compared with the map in (f), obtained from applying our method to the synthesized stereo pairs. The disparity map in (e) is erroneous mainly over regions with weak texture, whereas our method computes the disparity accurately over most of these regions; its performance is comparable to that of the NSSD-based method applied to the original images.

Fig. 5 summarizes similar results for four other samples from the Middlebury database (Venus, Sawtooth, Teddy, and Cones). The so-called contrast fields in row (c) represent the local texture content, measured by the standard deviation of intensities within the 21 [pix] × 21 [pix] window. The key observations are: 1) the performance of the NSSD-based method is directly tied to the texture content, and is comparable when applied to the original stereo pairs or to their signal components; 2) our method offers notable improvement, approaching the ground truth. Exceptions arise primarily near discontinuity boundaries, where averaging over the 21 × 21 support window blurs the disparity and thus yields errors larger than the 1-pixel threshold. Since the validity maps explicitly show the correct and incorrect estimates, the actual disparity maps are omitted, mainly due to lack of space.

Fig. 6 offers a more detailed performance assessment. We have binned the results into two groups of so-called (a, b) textured and (c, d) untextured regions, according to the average gradient within the support region of each pixel, relative to a gradient threshold Tg. We label a pixel as textured/untextured if the average gradient within its support window is above/below Tg. By selecting and comparing 3 values, Tg = 0.05, 0.1, and 0.15, we have attempted to remove any bias in the accuracy assessment of each group. Increasing the threshold leaves fewer points within the textured bin, maintaining mainly points with very strong surface markings. Conversely, lowering the threshold places fewer points in the untextured bin. The labeling based on gradient magnitudes has been carried out both for the original Middlebury database images in (a, c) and for the synthetic turbid-water images in (b, d).

According to Fig. 6(a, b), the performance of the competing methods converges within textured regions for increasing threshold Tg; thus, our method offers little advantage in these regions. (Note that the number of class pixels decreases as we raise the threshold, and that the gradient magnitude is smaller for the synthesized images than for the original Middlebury images.) For untextured areas, the improvement offered by our method increases as the threshold Tg is lowered, because the bin is then occupied only by pixels with very weak or no texture; see Fig. 6(c, d).


Fig. 11. Backscatter simulation results for 3 different water types. (a-c) Variations in the scale s, the ratio of backscatter to backscatter at infinity, over the FoV, parametrized by viewing angles π/4 ≤ γ ≤ 3π/4 and ranges 0.15 to 2.0 [m] (verifying range dependency but insensitivity to viewing direction); (d) mean and one standard deviation (vertical bars) of the ratio s over the FoV. Referring to Fig. 10(b): (e) verifying the approximate equality of the scales $s^i = B^i/B^i_\infty$ and $s = B/B_\infty$; (f) percentage discrepancy between s and $s^i$; (g) comparing the ratio $k^i = B^i/B$ with the approximation based on (13).

This emphasizes the role of the range cue from backscatter within the untextured areas. In (e-g), we have fixed the threshold at roughly 50% of the maximum gradient, while comparing the accuracy within (e) textured and (f) untextured regions, as well as (g) over the entire image.

To summarize, the disparity estimation performance deteriorates as backscatter becomes more dominant (with increasing scene distance and within regions of low contrast), as expected. Our method offers enhanced accuracy by making use of two complementary visual cues, namely the signal and backscatter components of the image, which are dominant within the textured and untextured areas, respectively.

B. Real Images

The real data consist of stereo images of known targets in an indoor water tank, an outdoor pool, and a marina. For the water tank data, we have recorded images both in clear water (for reference) and in a turbid condition created by the addition of highly-scattering low-fat milk (see Fig. 7). Both the outdoor pool and the marina images exhibit a high level of turbidity as well; see row one of Fig. 9.

The indoor water tank scene comprises three objects, namely a planar mat, a cylindrical object, and a custom-made chart with various texture patterns that are often used to demonstrate/assess the performance of various image processing methods, e.g., the B&W cameraman image, alphabet letters of different sizes and separations, bar codes of different thicknesses and spacings, etc.


The mat, with stronger texture, is placed closest to the stereo setup at about 70 [cm], with the cylinder at a distance of roughly 125 [cm]. Placed at a slanted orientation, the chart covers distances of 1-1.5 [m] from the cameras. Underwater non-uniform illumination with artificial sources was simulated by turning off the lights of the tank room (which has black walls), utilizing a spot light 2-3 [m] above the water surface, and another underwater light positioned midway between the two cameras. For ground truth, we have determined the disparity map from the images recorded in clear water, exploiting the known scene geometry (target shapes and configurations). For example, we have applied a linearly varying disparity model to the planar targets. Trusting the ground truth disparity based on images of targets with known sizes and distances in rather clear water, we have set the disparity error threshold at 1 [pix].

Finally, we present the results of experiments with data from an outdoor pool and a marina, collected in turbid waters under natural lighting; see Fig. 9. Here again, the ground truth disparity maps are determined by utilizing knowledge of the various target shapes, but the distances are known less accurately. For example, the marina data set contains three planar targets, namely the chart from the indoor water tank experiment, with two mats on either side. Utilizing manually matched features, a linear disparity model for planar targets has been applied to each of the 3 regions; however, high turbidity deteriorates the precision in localizing features and, thus, the ground truth disparity. Similarly, the ground truth disparity map in Fig. 9(b) has been determined for the pool scene, comprising 3 cylindrical objects on the flat, featureless pool bottom. Accordingly, the error threshold has been set to 5 pixels, due to the ground truth uncertainty.

Again, our method performs better, although the percentage improvements listed below the validity maps are inferior relative to the earlier experiments. This is primarily due to the weaker depth cue from backscatter, i.e., very similar and uniform left and right infinity fields, as depicted in Fig. 9(d). The corresponding right fields are visually indistinguishable and are thus not given here. One key deviation is the accuracy degradation in the application of the NSSD method to the descattered marina images (compared to the turbid-water data). A likely explanation is that corresponding local regions in the stereo pairs are unevenly enhanced when a haze removal method (developed for uniform illumination conditions) is applied independently to the left and right images, consequently exaggerating the differences between them locally.

VI. CONCLUSION

We have proposed a new method to estimate dense stereo disparity from images acquired in scattering media, modeling each of the left and right images as a combination of backscatter and attenuated signal. Constraints are derived by expressing the left/right backscatter values of matching features as range-dependent fractions of the backscatter at infinity values along the two viewing rays. The disparity computation is formulated as an optimization problem, seeking the range value that yields the most consistent decomposition of the stereo pair into the signal and backscatter components according to the model.


Results from experiments with both synthetic and real data distinctly demonstrate improved performance over earlier methods that simply attempt to enhance image contrast through descattering. The advantage is most significant in regions where the signal is dominated by the backscatter component, namely, areas with low/no texture. The new paradigm, exploiting the backscatter fields directly in the disparity estimation, can be readily incorporated within most existing stereo matching techniques. Potential future work involves a number of algorithmic enhancements, including the treatment of occlusion or of image regions where the proposed decomposition may be ill-conditioned.

APPENDIX A

We employ an analytical backscatter model to study the variations of various backscatter ratios over the field of view. Based on this, we establish some key constraints that are employed in our optimization method. These allow us to readily estimate certain intermediate fields and to compute the error terms in our formulation, associated with the left and right backscatter fields. We utilize the inherent optical parameters from Table I, which cover a wide range of ocean waters. Referring to Fig. 10(a), the backscatter field arising from single scattering by the suspended particles can be expressed as an integral along the viewing direction (e.g., see [22]):

$$B(b, \gamma, D_{sv}, D_{vp}) = \int_0^{D_{vp}} \frac{I_o\, e^{-c(l+u)}}{l^2}\, b\, \beta(\alpha)\, du, \tag{23}$$

where the various terms are defined as follows:
s, v, p: source, viewer, and surface point subscripts
b: medium's scattering coefficient
c: medium's attenuation coefficient
α: scattering angle
γ: angle between source and viewer ray
I_o: radiant intensity of the point source
D_sv: source-to-viewer distance
D_vp: viewer-to-surface point distance
l: source-to-scattering particle distance
u: distance along the viewer ray
β(α): volume scattering phase function

The phase function β(α) represents the angular distribution of the scattered photons (relative to the source direction). Various phase-function models have been proposed in ocean optics, astronomy, and atmospheric imaging studies; see [11]-[13]. A key parameter 0 ≤ |g| ≤ 1 characterizing the shape of the distribution is computed from the average cosine over all scattering directions:

$$g = 2\pi \int_0^\pi \beta(\alpha) \cos\alpha\, \sin\alpha\, d\alpha. \tag{24}$$

For example, |g| ≈ 1 where β(α) is peaked around the illumination direction, and |g| ≈ 0 where β(α) is symmetric. A simple and widely used analytical approximation is the


Henyey-Greenstein phase function [13]:

$$\beta(\alpha) = \frac{1}{4\pi} \frac{1 - g^2}{(1 + g^2 - 2g\cos\alpha)^{1.5}}. \tag{25}$$

Utilizing this model with various shape parameters 0 ≤ g ≤ 1, we have found insignificant variations in the results presented here. For simplicity, but without loss of generality, we set g = 0, which yields the normalized isotropic phase function β(α) = 1/4π. Also, while the backscatter integral can be determined directly by discrete approximation, we make use of an efficient solution by factorization into two components, proposed for computer graphics applications [31]:

$$B = A_o \left[ f\!\left(A_1,\ \frac{\pi}{4} + \frac{1}{2}\tan^{-1}\!\frac{D_{vp}/D_{sv} - \cos\gamma}{\sin\gamma}\right) - f\!\left(A_1,\ \frac{\gamma}{2}\right) \right], \tag{26}$$

where

$$f(u, \theta) = \int_0^{\theta} e^{-u\tan\zeta}\, d\zeta; \qquad A_o(D_{sv}, \gamma, b) = \frac{b\, I_o\, e^{-b D_{sv} \cos\gamma}}{2\pi D_{sv} \sin\gamma}; \qquad A_1(D_{sv}, \gamma) = b\, D_{sv} \sin\gamma. \tag{27}$$

The smooth 2D function f(u, θ) is independent of the medium optical properties. The medium-dependent term $A_o$ is useful in determining the backscatter field, and the backscatter at infinity:

$$B_\infty = A_o \left[ f\!\left(A_1, \frac{\pi}{2}\right) - f\!\left(A_1, \frac{\gamma}{2}\right) \right]. \tag{28}$$

However, it is canceled out in computing $s = B/B_\infty$:

$$s = \frac{f\!\left(A_1,\ \frac{\pi}{4} + \frac{1}{2}\tan^{-1}\!\frac{D_{vp}/D_{sv} - \cos\gamma}{\sin\gamma}\right) - f\!\left(A_1, \frac{\gamma}{2}\right)}{f\!\left(A_1, \frac{\pi}{2}\right) - f\!\left(A_1, \frac{\gamma}{2}\right)}. \tag{29}$$

We assume a point source with unit radiant intensity ($I_o = 1$), and set the source-camera separation $D_{sv} = 0.1$ [m] by centering the source between the two cameras with a 0.2 [m] baseline. (This setup, generating significant backscatter due to the large overlapping illumination and viewing volumes, represents a candidate scenario for the application of our method.) Repeated simulations have confirmed very little variation over a range of source-camera separations. Referring to Fig. 10, we have determined the scale from (29) for viewing angles π/4 ≤ γ ≤ 3π/4 [rad] and target ranges 0.15 ≤ $D_{vp}$ ≤ 2 [m]. Fig. 11(a-c) gives the scale s for the three water types, varying in the ranges 0.64-1 (clear ocean), 0.68-1 (coastal ocean), and 0.82-1 (turbid harbor). Fig. 11(d) gives the average scale over all viewing angles γ, and one standard deviation at various ranges. The largest variation, under clear ocean conditions at close range, is of little interest due to the insignificant impact of backscatter. Variations over the field of view become negligible with increased turbidity and (or) target range, which is of most interest.
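The scale in (29) is straightforward to evaluate numerically. Below is a minimal sketch, assuming SciPy's quad for the 1D integral in (27); the parameter values are the illustrative ones from the text (b = 1.82 [1/m] for turbid harbor water, D_sv = 0.1 [m]):

```python
import numpy as np
from scipy.integrate import quad

def f(u, theta):
    # f(u, theta) = integral_0^theta exp(-u tan(zeta)) d(zeta), Eq. (27);
    # the integrand decays rapidly as zeta approaches pi/2.
    return quad(lambda z: np.exp(-u * np.tan(z)), 0.0, theta)[0]

def scale_s(b, gamma, D_sv, D_vp):
    # Ratio s = B / B_inf, Eq. (29); the medium-dependent factor A_o cancels.
    A1 = b * D_sv * np.sin(gamma)
    theta = np.pi / 4 + 0.5 * np.arctan((D_vp / D_sv - np.cos(gamma)) / np.sin(gamma))
    return (f(A1, theta) - f(A1, gamma / 2)) / (f(A1, np.pi / 2) - f(A1, gamma / 2))

# s approaches 1 monotonically with target range, as in Fig. 11(a-d):
for D_vp in (0.15, 0.5, 1.0, 2.0):
    print(D_vp, scale_s(b=1.82, gamma=np.pi / 2, D_sv=0.1, D_vp=D_vp))
```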

We conclude that the backscatter at each point may be adequately represented as a range-dependent constant scaling of the backscatter at infinity along the same projection ray, independent of the viewing direction.

Referring to Fig. 10(b), we next consider the same stereo and source geometry to analyze the left/right and total backscatter scales, $s^i = B^i/B^i_\infty$ and $s = B/B_\infty$, respectively. While these are equal for points equidistant from the two cameras, the deviation for other points is to be assessed. To explore this, we take a selected projection ray, say $u^R$ in the right camera (fixing the viewing angle $\gamma^R$). Based on the earlier analysis, the scale variations are largest for close-range points. Thus, assuming cameras with a 30 [deg] half-angle FoV (after accounting for the reduced viewing angle in water), we fix $\gamma^R$ = 60 [deg] (with point M at a distance of 0.2 [m] from both cameras). Varying the position of the scene point along the right-camera projection ray from 0.2 to 2 [m], we determine the projection ray $u^L$, viewing angle $\gamma^L$, etc., for the left view.

Fig. 11(e) depicts the various scales ($s^R$: dashed, $s^L$: dash-dotted, s: solid) as functions of range from the right camera. These are given in different colors for the three water conditions. To highlight the difference, we have depicted in (f) the percentage discrepancies $(s^L - s)/s$ and $(s^R - s)/s$. For clear water, the peak value of less than 3% corresponds roughly to where the scene point is viewed frontally in the left camera. As turbidity increases, the peak value of the percentage discrepancy decreases and moves closer to the equidistant point M. For distant points, the discrepancy approaches zero as these scales approach 1.

Fig. 11(g) depicts the ratios $k^i = B^i/B$ and $k^i_\infty = B^i_\infty/B_\infty$ in order to: 1) differentiate these from the scales $s^i = B^i/B^i_\infty$ and $s = B/B_\infty$; 2) establish how the total backscatter field is decomposed into the left and right components; and, most importantly, 3) verify that the decomposition of the total backscatter into the left and right fields at finite ranges can be approximated by the same ratios at infinity:

$$k^i = B^i/B \approx B^i_\infty/B_\infty. \tag{30}$$

The approximation approaches equality with increasing turbidity and (or) range.

APPENDIX B

We utilize the last result (that the range-varying ratio of the left/right backscatter field $B^i$ to the total backscatter $B$ can be approximated by the ratio of the corresponding fields at infinity) in (6) to derive

$$\rho = \frac{\left|B^L - B^R\right|}{B^L + B^R} \approx \frac{\left|(B^L_\infty/B_\infty) - (B^R_\infty/B_\infty)\right|}{(B^L_\infty/B_\infty) + (B^R_\infty/B_\infty)} = \frac{\left|k^L - k^R\right|}{k^L + k^R}. \tag{31}$$

Again, this approaches equality with increasing turbidity and range:

$$\rho = \frac{\left|B^L_\infty - B^R_\infty\right|}{B^L_\infty + B^R_\infty} = \frac{\left|I^L_\infty - I^R_\infty\right|}{I^L_\infty + I^R_\infty}. \tag{32}$$


Thus, ρ can be estimated from the left and right images acquired with no target in the FoV.

ACKNOWLEDGMENT

The authors are grateful to Professor Yoav Schechner from the Technion Israel Institute of Technology, and the anonymous reviewers for their invaluable technical and editorial comments that enabled us to significantly improve the original manuscript.

REFERENCES

[1] C. Beall, B. J. Lawrence, V. Ila, and F. Dellaert, "3D reconstruction of underwater structures," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 2010, pp. 4418-4423.
[2] A. Blake, A. Zisserman, and G. Knowles, "Surface descriptions from stereo and shading," Image Vis. Comput., vol. 3, no. 4, pp. 183-191, 1985.
[3] J. F. Blinn, "Light reflection functions for simulation of clouds and dusty surfaces," in Proc. ACM 9th Annu. Conf. Comput. Graph. Interact. Techn. (SIGGRAPH), 1982, pp. 21-29.
[4] N. Carlevaris-Bianco, A. Mohan, and R. M. Eustice, "Initial results in underwater single image dehazing," in Proc. IEEE OCEANS Conf., Sep. 2010, pp. 1-8.
[5] J. E. Cryer, P.-S. Tsai, and M. Shah, "Integration of shape from shading and stereo," Pattern Recognit., vol. 28, no. 7, pp. 1033-1043, Jul. 1995.
[6] G. Dudek et al., "AQUA: An amphibious autonomous robot," Computer, vol. 40, no. 1, pp. 46-53, Jan. 2007.
[7] C. R. Dyer, "Volumetric scene reconstruction from multiple views," in Foundations of Image Understanding, L. S. Davis, Ed. Norwell, MA, USA: Kluwer, 2001, pp. 469-489.
[8] R. Fattal, "Single image dehazing," ACM Trans. Graph., vol. 27, no. 3, pp. 1-9, 2008, Art. ID 72.
[9] G. L. Foresti, "Visual inspection of sea bottom structures by an autonomous underwater vehicle," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 5, pp. 691-705, Oct. 2001.
[10] W. E. L. Grimson, "Binocular shading and visual surface reconstruction," MIT AI Laboratory, Cambridge, MA, USA, Tech. Rep. AI-Memo-697, 1982.
[11] G. R. Fournier and J. L. Forand, "Analytic phase function for ocean water," Proc. SPIE, Ocean Opt. XII, vol. 2258, pp. 194-201, Oct. 1994.
[12] G. R. Fournier and M. Jonasz, "Computer based underwater imaging analysis," Proc. SPIE, Airborne Water Underwater Imag., vol. 3761, pp. 62-77, Oct. 1999.
[13] L. G. Henyey and J. L. Greenstein, "Diffuse radiation in the galaxy," Astrophys. J., vol. 93, no. 1, pp. 70-83, 1941.
[14] K. He, J. Sun, and X. Tang, "Single image haze removal using dark channel prior," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2009, pp. 1956-1963.
[15] M. Johnson-Roberson, S. Kumar, O. Pizarro, and S. Williams, "Stereoscopic imaging for coral segmentation and classification," in Proc. MTS/IEEE OCEANS Conf., Sep. 2006, pp. 1-6.
[16] L. Nalpantidis, G. C. Sirakoulis, and A. Gasteratos, "Review of stereo vision algorithms: From software to hardware," Int. J. Optomechatron., vol. 2, no. 4, pp. 435-462, 2008.
[17] A. Leone, C. Distante, and G. Diraco, "A stereo vision framework for 3D underwater mosaicking," in Stereo Vision, A. Bhatti, Ed. I-TECH Education Publishing, 2008.
[18] B. L. McGlammery, "A computer model for underwater camera systems," Proc. SPIE, Ocean Opt. VI, vol. 208, pp. 221-231, Mar. 1979.
[19] C. D. Mobley, Light and Water: Radiative Transfer in Natural Waters. New York, NY, USA: Academic, 1994.
[20] S. Negahdaripour and P. Firoozfam, "An ROV stereovision system for ship-hull inspection," IEEE J. Ocean. Eng., vol. 31, no. 3, pp. 551-564, Jul. 2006.
[21] S. Negahdaripour and H. Madjidi, "Stereovision imaging on submersible platforms for 3D mapping of benthic habitats and sea-floor structures," IEEE J. Ocean. Eng., vol. 28, no. 4, pp. 625-650, Oct. 2003.
[22] T. Nishita, Y. Miyawaki, and E. Nakamae, "A shading model for atmospheric scattering considering luminous intensity distribution of light sources," in Proc. 14th Annu. Conf. Comput. Graph. Interact. Techn. (SIGGRAPH), 1987, pp. 303-310.
[23] T. J. Petzold, "Volume scattering functions for selected ocean waters," Scripps Institution of Oceanography, San Diego, CA, USA, Tech. Rep. SIO 72-78, 1972.
[24] J. P. Queiroz-Neto, R. Carceroni, W. Barros, and M. Campos, "Underwater stereo," in Proc. 7th Brazilian Symp. Comput. Graph. Image Process., Oct. 2004, pp. 170-177.
[25] A. Sarafraz, S. Negahdaripour, and Y. Y. Schechner, "Enhancing images in scattering media utilizing stereovision and polarization," in Proc. IEEE Workshop Appl. Comput. Vis. (WACV), Dec. 2009, pp. 1-8.
[26] A. Sarafraz, S. Negahdaripour, and Y. Schechner, "Comparison of different methods for solving correspondence problem in underwater stereo images," in Proc. IEEE/MTS OCEANS Conf., Sep. 2010, pp. 1-7.
[27] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, "A comparison and evaluation of multi-view stereo reconstruction algorithms," in Proc. IEEE Comput. Soc. Conf. CVPR, Jun. 2006, pp. 519-528.
[28] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," Int. J. Comput. Vis., vol. 47, nos. 1-3, pp. 7-42, 2001.
[29] D. Scharstein and R. Szeliski, "High-accuracy stereo depth maps using structured light," in Proc. IEEE Comput. Soc. Conf. CVPR, Madison, WI, USA, Jun. 2003, pp. I-195-I-202.
[30] G. Slabaugh, B. Culbertson, T. Malzbender, and R. Shafer, "A survey of methods for volumetric scene reconstruction from photographs," in Proc. Eurograph. Conf. Volume Graph., 2001, pp. 81-101.
[31] B. Sun, R. Ramamoorthi, S. G. Narasimhan, and S. K. Nayar, "A practical analytic single scattering model for real time rendering," ACM Trans. Graph., vol. 24, no. 3, pp. 1040-1049, Jul. 2005.
[32] Y. Swirski, Y. Y. Schechner, B. Herzberg, and S. Negahdaripour, "CauStereo: Range from light in nature," Appl. Opt., vol. 50, no. 28, pp. F89-F101, 2011.
[33] R. T. Tan, "Visibility in bad weather from a single image," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2008, pp. 1-8.
[34] T. Treibitz and Y. Y. Schechner, "Active polarization descattering," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 3, pp. 385-399, Mar. 2009.

Shahriar Negahdaripour received the B.S., M.S., and Ph.D. degrees in mechanical engineering from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 1979, 1980, and 1987, respectively. He was a recipient of the 1989 NSF Engineering Initiation Award, the 2009 DoD SERDP Project of the Year Award for the application of ROV-based video technology to complement coral reef resource mapping and monitoring, the 2003 Siemens Corp’s Best Paper Award at the IEEE International Conference on Advanced Video and Signal-Based Surveillance, and the 2007 Best Paper Award at the IEEE Workshop on Beyond Multiview Geometry, held in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). He served as the General Co-Chair of the IEEE CVPR’91 and the 1995 International Symposium on Computer Vision. His research interests include underwater vision, and applications of computer vision in sonar imaging.

Amin Sarafraz received the B.Sc. degree in surveying and geomatics engineering from the Amirkabir University of Technology, Tehran, Iran, in 2003, the M.Sc. degree in photogrammetry from the University of Tehran, Tehran, in 2006, and the Ph.D. degree in civil engineering from the University of Miami, Coral Gables, FL, USA, in 2012. His graduate work involved the applications of computer vision in civil engineering, including underwater imaging, image enhancement in scattering media, 3D reconstruction, and nondestructive testing. His main research interests are photogrammetry, image processing, computer vision, and pattern recognition. He is currently with the Center for Computational Science, University of Miami.
