Skin Research and Technology 2015; 21: 319–326 Printed in Singapore  All rights reserved doi: 10.1111/srt.12195

© 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd Skin Research and Technology

Multimodal and time-lapse skin registration

S. Madan(1), K. J. Dana(1) and G. O. Cula(2)

(1) Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA and (2) Consumer and Personal Product Division, Johnson & Johnson, Piscataway, NJ, USA

Background/purpose: Computational skin analysis is revolutionizing modern dermatology. Patterns extracted from image sequences enable algorithmic evaluation. Stacking multiple images to analyze pattern variation implicitly assumes that the images are aligned per-pixel. However, breathing and involuntary motion of the patient cause significant misalignment. Alignment algorithms designed for multimodal and time-lapse skin images can solve this problem. Sequences from multimodal imaging capture unique appearance features in each modality. Time-lapse image sequences capture skin appearance change over time.

Methods: Multimodal skin images have been acquired under five different modalities: three in reflectance mode (visible, parallel-polarized, and cross-polarized) and two in fluorescence mode (UVA and blue light excitation). For time-lapse imagery, 39 images of acne lesions over a 3-month period have been collected. The method detects micro-level features like pores, wrinkles, and other skin texture markings in the acquired images. Images are automatically registered to subpixel accuracy.

Results: The proposed registration approach precisely aligns multimodal and time-lapse images. Subsurface recovery from multimodal images has misregistration artefacts that can be eliminated using this approach. Registered time-lapse imaging captures the evolution of the appearance of skin regions with time.

Conclusion: Misalignment in skin imaging has significant impact on any quantitative or qualitative image evaluation. Micro-level features can be used to obtain highly accurate registration. Multimodal images can be organized with maximal overlap for successful registration. The resulting point-to-point alignment improves the quality of skin image analysis.

Key words: multimodal registration – time-lapse registration – micro-level features – surface component recovery – appearance tracking

Accepted for publication 20 September 2014

Computational skin analysis, including pattern recognition and change detection, is creating a foundation for modern quantitative dermatology. Pattern analysis can utilize multiple imaging modalities (multimodal) or imaging over time (time-lapse). Skin imaging modalities include blue fluorescence, ultraviolet fluorescence, visible light (unpolarized), and crossed/parallel polarization. For each of these five modalities, a distinct set of features is captured in the image. Stacking the images provides a five-dimensional appearance vector, but only if the images are aligned per-pixel. Because the images are high resolution (less than 1 mm per pixel), even breathing and involuntary motions of the patient cause significant misalignment. Alignment algorithms are based on common features, so registering images from modalities with complementary instead of common features is a challenge. We present a method of high-resolution multimodal skin registration that uses micro-features for alignment and a maximal overlap ordered sequence for multimodal alignment. Our results demonstrate highly accurate subpixel alignment in both multimodal and time-lapse image sequences. Interest in multimodal skin imaging is supported by observations of porphyrin in fluorescence modalities. In addition, polarized light imaging offers the key advantage that surface structures can be separated from subsurface structures (1–3). Time-lapse imaging captures the appearance of skin during the different stages of a pathology, and is used by dermatologists (4–6) to track changes in skin appearance and the evolution of disease over time. Existing methods of face registration, e.g., active appearance models (7, 8) and the Lucas–Kanade registration algorithm (9), typically rely on macro-features like the eyes, nose, and chin boundary to register face images. Accurate alignment of high-resolution skin regions that



do not have these structures is a challenge. In our approach, micro-level features like pores, wrinkles, and other skin texture markings are used to register high-resolution multimodal face images acquired in dermatology clinical studies. We show that these micro-features are sufficient to achieve high-accuracy registration. In particular, we register: (a) multimodal images acquired under polarized, fluorescence, and visible light within a one-second time interval, and (b) a sequence of time-lapse skin images with acne lesions acquired over a 3-month clinical study. We have developed a publicly available software toolbox to precisely register skin images using micro-level features (link provided in final publication).

Overview of registration techniques
Detailed surveys of registration methods can be found in Zitova and Flusser (10) and Szeliski (11). Feature-based methods generally extract features and match points to estimate a transformation between the two images (12–14). Our approach is also feature based, but we use micro-features to achieve subpixel, pore-level alignment. Mutual information-based registration methods (15–17) register images by maximizing the mutual information between them; this approach assumes that the intensities in the two images being registered are statistically dependent. Deformable model-based methods have the ability to handle nonrigid motion. For example, in Rueckert et al. (18) the authors use cubic B-splines to register breast MRI images, and in Mattes et al. (19) the authors use cubic B-splines to register PET and CT chest images. Active appearance models (20–22) are deformable models used in computer vision for registering and tracking faces. However, active appearance models rely on macro-level features like the lips, face boundary, nose, and eyes instead of micro-level features.

Materials and Methods

Imaging modalities
We have acquired skin images under five different modalities. The reflectance modes are visible light unpolarized (VL), parallel-polarized (PL), and cross-polarized (CP). The fluorescence modes are UVA and blue light excitation (BL). The different modalities capture different information about skin. The intensity I_v measured by the VL image sensor consists of two components, the surface reflectance and the subsurface reflectance (23). The surface component I_s is the part of the incident light reflected off the skin surface, and the subsurface component I_d is the part of the incident light traveling through the stratum corneum and the epidermis before it exits the skin. The VL sensor measures the sum of the surface and the subsurface components,

I_v = I_s + I_d.    (1)

Polarized images are acquired by placing a linear polarizer in front of the light source and the camera (24–26). For the PL mode, the polarizer in front of the sensor is parallel to the polarizer in front of the light source. In this mode, the sensor measures the surface component plus half the subsurface component, so the intensity I_p measured by the sensor is

I_p = I_s + (1/2) I_d.    (2)

The PL image enhances the surface component and surface features like raised borders of lesions, pore structure, and wrinkles (27). When the polarizer in front of the sensor is perpendicular to the polarizer in front of the light source for the CP image, the entire surface component is blocked, and the sensor measures only half the subsurface component. The intensity I_x measured by the sensor is

I_x = (1/2) I_d.    (3)

The CP image brings out subsurface skin features like color variation due to melanin and erythema (28). Fluorescence images are obtained with either ultraviolet (UVA) excitation at 360–400 nm or blue light excitation at 400–460 nm. Under UVA excitation (29, 30), collagen cross-links fluoresce, and photo-damaged regions appear as dark spots produced by absorbance of epidermal pigmentation. Collagen cross-links increase with exposure and age. Under blue light excitation, elastin cross-links fluoresce. Blue light excitation enables detection of both melanin pigmentation and superficial vasculature, and sebum-producing pores appear as yellow-white spots.
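As a sanity check on the relations above, the three modality intensities can be simulated and the component-recovery identity I_p − I_x = I_s verified numerically. This is a toy sketch with synthetic values, not the authors' code:

```python
import numpy as np

# Synthetic surface (I_s) and subsurface (I_d) components for a toy
# 4x4 "skin patch"; values are illustrative only.
rng = np.random.default_rng(0)
I_s = rng.uniform(0.1, 0.3, (4, 4))   # surface component
I_d = rng.uniform(0.2, 0.6, (4, 4))   # subsurface component

# Modality intensities per Eqs (1)-(3):
I_v = I_s + I_d          # visible, unpolarized (Eq. 1)
I_p = I_s + 0.5 * I_d    # parallel-polarized (Eq. 2)
I_x = 0.5 * I_d          # cross-polarized (Eq. 3)

# Subtracting the cross-polarized image from the parallel-polarized
# image isolates the surface component (used later for surface recovery).
assert np.allclose(I_p - I_x, I_s)

# The subsurface component is twice the cross-polarized intensity.
assert np.allclose(2 * I_x, I_d)
```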

Imaging setup
We acquire multimodal and time-lapse images using custom-built equipment, as shown in


Fig. 1. Imaging equipment to acquire multimodal images.

Fig. 1. The imaging equipment consists of a light source, a sensor, and polarizer filters placed in front of the light source and the sensor. In each imaging session, multimodal images of the subject are acquired within a 1 s time interval. During acquisition, the operation of the sensor and the orientation of the polarizers are synchronized using a computer. Figure 2 shows an example image from each modality. The eye region and all face images in the paper have been blacked out to preserve the identity of the subject. Figure 2a–c shows the VL,

PL, and CP images. Figure 2d and e show the UVA and BL images. Although a chinrest is used and the images are obtained in rapid succession, misregistration of the high-resolution images is evident. For time-lapse imaging, we have imaged a subject with acne lesions at 39 different time points over a 3-month period. Figure 3 shows example images from the time-lapse set. The time-lapse images are misregistered, which makes it difficult to track the evolution of acne lesions over time or to perform quantitative processing on them. Precise point-to-point registration of the temporal images allows visualization of the evolution of acne lesions over time.

Multimodal registration method
We use micro-level features to register multimodal images acquired in a single imaging session during a clinical study. Because we register a small face patch, the surface can be approximated as planar. Figure 4 shows image patches from multimodal images that are used for registration


Fig. 2. Multimodal skin images. Reflectance images: unpolarized (a), parallel-polarized (b), and cross-polarized (c). Fluorescence images: UVA (d) and blue light (e) excitation.



Fig. 3. Sample images of a subject with acne lesions. The subject was imaged at 39 different time points over a 3-month period. Three time samples are shown.

and do not contain macro-structures like the eyes, nose, and mouth. In traditional computer vision algorithms, such regions are treated as featureless. However, the patch images are abundant in micro-level features like pores, wrinkles, and skin texture, and we register the patch images by extracting and matching these micro-level features. By applying a local intensity transform for contrast enhancement, micro-features can be detected with the scale-invariant feature transform (SIFT) (31) interest point detector. We increase the contrast of the patch images by stretching the gray-scale intensities such that 1% of the intensities are saturated. Features are clearly brought out in the contrast-enhanced patch images.

Fig. 4. Multimodal alignment overview. The key innovation is a procedure to use micro-features for alignment. By ordering the sequence of multimodal images as illustrated, alignment pairs have maximal overlap in features. A concatenation of the alignment parameters enables alignment of the entire set.

We use a common computer vision workflow for matching SIFT feature points to estimate a transformation between the patch images for alignment. The transformation can be modeled as quadratic, affine, or a homography, i.e., a 3 × 3 invertible matrix. According to the theory of multi-view geometry, when a planar surface is imaged from two different viewpoints (32), the

transformation between the two images can be modeled using a homography. By assuming planar patches, the transformation between patch points is a homography. In particular, there exists a matrix H such that

s [x_1; 1] = H [x_2; 1],    (4)

where x_1 = [x_1, y_1]^T is the pixel location of a point in one image, x_2 = [x_2, y_2]^T the pixel location of the corresponding point in the other image, and s a scale factor. With the homogeneous point x~_2 = [x_2, y_2, 1]^T, Eq. (4) can be rewritten as

[ x~_2^T   0^T      -x_1 x~_2^T ] [h_1]
[ 0^T      x~_2^T   -y_1 x~_2^T ] [h_2]  =  0,    (5)
                                  [h_3]

where h_1, h_2, and h_3 are the first, second, and third rows of the homography matrix H. Each point correspondence gives two constraints on the homography matrix. A total of n point correspondences gives 2n constraints, which can be written as

A h = 0,    (6)

where h = [h_1^T, h_2^T, h_3^T]^T and A is a 2n × 9 matrix representing the constraints from the point correspondences. Estimating the homography by directly solving Eq. (6) in the presence of outliers results in an incorrect estimate. In order to estimate the homography, outliers are removed using the RANSAC (33) algorithm. In a typical example, approximately 100 feature points are retained from an original set of 1500. Once all the outliers are removed, the homography in Eq. (6) is re-estimated. The estimate is further refined using the Levenberg–Marquardt optimization algorithm (34). The final homography is used to warp the input patch image onto the reference patch image using bilinear interpolation.

For registering multimodal images, pairs of images are registered, and the homographies are then concatenated to register all five modes to a single coordinate frame, as shown in Fig. 4. The pairs for registration are chosen according to a maximal overlap ordering: we have determined by observation that a specific sequence of image pairs enables registration because each pair shares a sufficiently large set of common features. Registration of the entire multimodal set is accomplished in a pairwise manner by aligning: (a) PL and VL images, (b) VL and CP images, (c) UVA fluorescence and CP images, and (d) blue-excited fluorescence and UVA fluorescence images (BL and UVA).
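The estimation pipeline of Eqs (4)–(6) followed by RANSAC can be sketched in a few dozen lines of NumPy. This is our own minimal illustration (the function names are ours, and the Levenberg–Marquardt refinement step is omitted), not the authors' toolbox:

```python
import numpy as np

def homography_dlt(x1, x2):
    """Estimate H with s*[x1;1] = H*[x2;1] from n >= 4 correspondences.

    x1, x2: (n, 2) arrays of pixel locations. Builds the 2n x 9 constraint
    matrix A of Eqs (5)-(6) and solves A h = 0 via SVD.
    """
    n = x1.shape[0]
    xt2 = np.hstack([x2, np.ones((n, 1))])        # homogeneous x2~
    A = np.zeros((2 * n, 9))
    A[0::2, 0:3] = xt2
    A[0::2, 6:9] = -x1[:, 0:1] * xt2
    A[1::2, 3:6] = xt2
    A[1::2, 6:9] = -x1[:, 1:2] * xt2
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)                   # h = last right singular vector

def apply_h(H, x):
    """Map (n, 2) points through H, with the homogeneous divide."""
    xh = np.hstack([x, np.ones((x.shape[0], 1))]) @ H.T
    return xh[:, :2] / xh[:, 2:3]

def ransac_homography(x1, x2, iters=500, thresh=2.0, seed=0):
    """RANSAC (33) outlier rejection, then re-estimation on the inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(x1.shape[0], dtype=bool)
    for _ in range(iters):
        idx = rng.choice(x1.shape[0], 4, replace=False)
        H = homography_dlt(x1[idx], x2[idx])
        err = np.linalg.norm(apply_h(H, x2) - x1, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return homography_dlt(x1[best_inliers], x2[best_inliers]), best_inliers
```

In the paper's pipeline the inlier estimate would be refined further with Levenberg–Marquardt and then used to warp the input patch onto the reference patch with bilinear interpolation.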

Results

Multimodal registration
Figure 4 shows images of a 1000 × 850 skin patch acquired under the PL, VL, CP, UVA excitation, and blue excitation modalities. The skin appearance is different under each modality. The following pairs are registered: VL and PL, VL and CP, UVA and CP, and BL and UVA. Figure 4 shows the pixel locations of corresponding feature points in the misregistered pairs plotted together as white and black points. Note that the pixel locations do not overlap, indicating misregistration. The root mean square (RMS) values of the difference between the pixel locations in the misregistered images are 12.20, 9.54, 24.55, and 14.39, respectively. The images being registered are very high resolution; therefore, a small change in the location of a scene point results in a large difference between the pixel corresponding to the new scene location and the pixel corresponding to the original scene location. Figure 4 (bottom row) shows the pixel locations of the corresponding feature points in the registered images. Note that the pixel locations overlap, indicating precise registration. The RMS values of the difference between the pixel locations in the registered images are 0.80, 0.91, 1.46, and 0.90, respectively.
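RMS figures like those above can be computed directly from matched feature locations. A small sketch (the point values are illustrative, not the paper's data):

```python
import numpy as np

def rms_misalignment(pts_a, pts_b):
    """Root mean square distance between corresponding pixel locations.

    pts_a, pts_b: (n, 2) arrays of matched feature coordinates in the
    two images being compared.
    """
    d = np.linalg.norm(pts_a - pts_b, axis=1)
    return float(np.sqrt(np.mean(d ** 2)))

# Illustrative example: every point displaced by 3 pixels in x gives RMS 3.
pts = np.array([[10.0, 20.0], [30.0, 40.0], [55.0, 5.0]])
print(rms_misalignment(pts, pts + [3.0, 0.0]))  # → 3.0
```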

Time-lapse registration
The time-lapse set, consisting of 39 misregistered images of forehead acne lesions, is registered using the micro-level feature-based registration approach. In the first step, the images are globally registered using the Lucas–Kanade algorithm. Figure 5 shows the input images and the registration results; alignment with micro-features then removes the residual misregistration between the globally registered images. The RMS value of the difference between the pixel locations is 3.198 for the globally registered images, and




Fig. 5. (a) Two of the 39 skin images with acne lesions acquired in the 3-month clinical study. (b) Pixel locations of corresponding feature points in the input images, plotted in black and white for the two images; the feature points are clearly misaligned. (c) Pixel locations of corresponding feature points in the globally registered images. (d) Pixel locations of corresponding feature points in the final precisely registered forehead regions.


Fig. 6. Surface component recovery from images that are parallel-polarized (a) and cross-polarized (b). The results using unregistered input images (c, e) and registered input images (d, f).



0.861 for the final micro-level feature-based registration approach. Videos showing the entire set of 39 misregistered and registered time-lapse images are available; a link will be provided in the final version.
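For the global step, a translation-only Lucas–Kanade alignment can be written directly in NumPy. This is a simplified sketch of the idea (the algorithm in (9) supports richer warp models), not the code used in the study:

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample img at float coordinates (y, x) with bilinear interpolation."""
    y0 = np.clip(np.floor(y).astype(int), 0, img.shape[0] - 2)
    x0 = np.clip(np.floor(x).astype(int), 0, img.shape[1] - 2)
    dy, dx = y - y0, x - x0
    return (img[y0, x0] * (1 - dy) * (1 - dx) + img[y0, x0 + 1] * (1 - dy) * dx
            + img[y0 + 1, x0] * dy * (1 - dx) + img[y0 + 1, x0 + 1] * dy * dx)

def lk_translation(template, image, iters=50):
    """Translation-only Lucas-Kanade: find t so image(x + t) ~ template(x)."""
    h, w = template.shape
    yy, xx = np.mgrid[1:h - 1, 1:w - 1].astype(float)
    t = np.zeros(2)
    for _ in range(iters):
        warped = bilinear_sample(image, yy + t[0], xx + t[1])
        # Central-difference image gradients at the warped positions.
        gy = (bilinear_sample(image, yy + t[0] + 1, xx + t[1])
              - bilinear_sample(image, yy + t[0] - 1, xx + t[1])) / 2.0
        gx = (bilinear_sample(image, yy + t[0], xx + t[1] + 1)
              - bilinear_sample(image, yy + t[0], xx + t[1] - 1)) / 2.0
        err = template[1:h - 1, 1:w - 1] - warped
        # Solve the 2x2 normal equations for the translation update.
        A = np.array([[np.sum(gy * gy), np.sum(gy * gx)],
                      [np.sum(gy * gx), np.sum(gx * gx)]])
        b = np.array([np.sum(gy * err), np.sum(gx * err)])
        step = np.linalg.solve(A, b)
        t += step
        if np.linalg.norm(step) < 1e-4:
            break
    return t

# Demo: recover a known sub-pixel shift of a smooth synthetic image.
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w].astype(float)
image = np.sin(yy / 6.0) + np.cos(xx / 5.0)
template = np.sin((yy + 1.3) / 6.0) + np.cos((xx - 0.8) / 5.0)
t = lk_translation(template, image)
# t is approximately (1.3, -0.8)
```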

Recovering surface reflectance
The surface reflectance component can be obtained by subtracting the CP image from the PL image, provided they are registered. Figure 6a and b shows a pair of polarized images (parallel and crossed) acquired in succession. Figure 6c shows the surface component recovered using the unregistered input images, and Fig. 6d shows the surface component recovered using the registered images. Misregistration artefacts are clearly seen in Fig. 6c in the form of dark spots throughout the image, as well as around the bottom tip of the nose region. The close-up

References

1. Bae EJ, Seo SH, Kye YC, Ahn HH. A quantitative assessment of the human skin surface using polarized light digital photography and its dermatologic significance. Skin Res Technol 2010; 16: 270–274.
2. Muccini A, Kollias N, Phillips SB, Anderson RR, Sober AJ, Stiller MJ, Drake LA. Polarized light photography in the evaluation of photoaging. J Am Acad Dermatol 1995; 33: 765–769.
3. Kollias N, Stamatas GN. Optical non-invasive approaches for diagnosis of skin diseases. J Invest Dermatol 2002; 7: 64–75.
4. Hongcharu W, Taylor CR, Chang Y, Aghassi D, Suthamjariya K, Anderson RR. Topical ALA-photodynamic therapy for the treatment of acne vulgaris. J Invest Dermatol 2000; 115: 183–192.
5. Tanzi EL, Alster TS. Comparison of a 1450-nm diode laser and a 1320-nm Nd:YAG laser in the treatment of atrophic facial scars: a prospective clinical and histologic study. Dermatol Surg 2004; 30: 152–157.
6. Rizova E, Kligman A. New photographic techniques for clinical evaluation of acne. J Eur Acad Dermatol Venereol 2001; 15: 13–18.
7. Cootes T, Edwards G, Taylor C. Active appearance models. IEEE Trans Pattern Anal Mach Intell 2001; 23: 681–685.

view of the nose region within the rectangular boundary is shown in Fig. 6e (unregistered input images) and Fig. 6f (registered input images). The surface component can be accurately recovered by registering the input polarized images up to the resolution of pore-level features using the micro-level feature-based registration algorithm before subtraction.
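The artefact in Fig. 6 can be reproduced in miniature: subtracting a shifted (misregistered) PL image leaves a nonzero residual, while registered inputs recover the surface component exactly. A synthetic sketch under the model of Eqs (1)–(3), not clinical data:

```python
import numpy as np

# Synthetic demonstration: subtracting PL and CP images that are misaligned
# leaves artefacts, while registered inputs recover the surface component.
rng = np.random.default_rng(0)
I_s = rng.uniform(0.1, 0.4, (64, 64))     # surface component with fine texture
I_d = np.full((64, 64), 0.5)              # smooth subsurface component
I_p = I_s + 0.5 * I_d                     # parallel-polarized image (Eq. 2)
I_x = 0.5 * I_d                           # cross-polarized image (Eq. 3)

# Registered inputs: the subtraction recovers the surface component exactly.
assert np.allclose(I_p - I_x, I_s)

# Misregistered inputs: shift the PL image by one pixel before subtracting.
I_p_shifted = np.roll(I_p, 1, axis=1)
artefact = (I_p_shifted - I_x) - I_s      # residual error from misalignment
print(np.abs(artefact).mean())            # noticeably nonzero
```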

Discussion The results show effective skin registration using a computational approach designed for generic computer vision tasks. The innovative contribution is the identification of a pairwise approach for registering multimodal images that have no common feature set. In addition, we have shown that micro-features such as pores in reflectance images and porphyrin in fluorescence images have sufficient visual structure for feature-based registration.

8. Matthews I, Baker S. Active appearance models revisited. Int J Comput Vis 2004; 60: 135–164.
9. Baker S, Matthews I. Lucas–Kanade 20 years on: a unifying framework. Int J Comput Vis 2004; 56: 221–255.
10. Zitova B, Flusser J. Image registration methods: a survey. Image Vis Comput 2003; 21: 977–1000.
11. Szeliski R. Image alignment and stitching: a tutorial. Found Trends Comput Graph Vis 2006; 2: 1–104.
12. Capel D, Zisserman A. Automated mosaicing with super-resolution zoom. IEEE Conference on Computer Vision and Pattern Recognition, California, 1998, 885–891.
13. Ni D, Qu Y, Yang X, Chui YP, Wong TT, Ho SS, Heng PA. Volumetric ultrasound panorama based on 3D SIFT. International Conference on Medical Image Computing and Computer-Assisted Intervention, New York, 2008, 52–60.
14. Yang G, Stewart CV, Sofka M, Tsai CL. Registration of challenging image pairs: initialization, estimation, and decision. IEEE Trans Pattern Anal Mach Intell 2007; 29: 1973–1989.
15. Pluim JPW, Maintz JBA, Viergever MA. Mutual information based registration of medical images: a survey. IEEE Trans Med Imaging 2003; 22: 986–1004.
16. Maes F, Collignon A, Vandermeulen D, Marchal G, Suetens P. Multimodality image registration by maximization of mutual information. IEEE Trans Med Imaging 1997; 16: 187–198.
17. Butz T, Thiran JP. Affine registration with feature space mutual information. International Conference on Medical Image Computing and Computer-Assisted Intervention, Utrecht, 2001, 549–556.
18. Rueckert D, Sonoda LI, Hayes C, Hill DLG, Leach MO, Hawkes DJ. Nonrigid registration using free-form deformations: application to breast MR images. IEEE Trans Med Imaging 1999; 18: 712–721.
19. Mattes D, Haynor DR, Vesselle H, Lewellen TK, Eubank W. PET-CT image registration in the chest using free-form deformations. IEEE Trans Med Imaging 2003; 22: 120–128.
20. Saragih JM, Lucey S, Cohn JF. Deformable model fitting by regularized landmark mean-shift. Int J Comput Vis 2011; 91: 200–215.
21. Butz T, Thiran JP. Multi-view face alignment using direct appearance models. International Conference on Automatic Face and Gesture Recognition, Washington, DC, 2002, 324–329.
22. Xiao J, Moriyama T, Kanade T, Cohn J. Robust full-motion recovery of head by dynamic templates and re-registration techniques. Int J Imag Syst Technol 2003; 13: 85–94.
23. Nayar S, Fang X, Boult T. Removal of specularities using color and

polarization. IEEE Conference on Computer Vision and Pattern Recognition, New York, 1993, 583–590.
24. Anderson R. Polarized light examination and photography of the skin. Arch Dermatol 1991; 127: 1000–1005.
25. Studinski RCN, Vitkin IA. Methodology for examining polarized light interactions with tissues and tissue-like media in the exact backscattering direction. J Biomed Opt 2000; 5: 330–337.
26. Jacques SL, Ramella-Roman JC, Lee K. Imaging superficial tissues with polarized light. Lasers Surg Med 2000; 26: 119–129.
27. Kollias N. Polarized light photography of human skin. In: Bioengineering of the skin. New York: CRC Press, 1997; 95–104.


28. Kollias N, Stamatas GN. Optical non-invasive approaches for diagnosis of skin diseases. J Invest Dermatol 2002; 7: 64–75.
29. Bae Y, Nelson JS, Jung B. Multimodal facial color imaging modality for objective analysis of skin lesions. J Biomed Opt 2008; 13: 064007.
30. Han B, Jung B, Nelson JS, Choi EH. Analysis of facial sebum distribution using a digital fluorescent imaging system. J Biomed Opt 2007; 12: 014006.
31. Lowe DG. Distinctive image features from scale invariant keypoints. Int J Comput Vis 2004; 60: 91–100.
32. Hartley R, Zisserman A. Multiple view geometry in computer vision. Cambridge: Cambridge University Press, 2000.

33. Fischler MA, Bolles RC. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 1981; 24: 381–395.
34. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical recipes in C++. Cambridge: Cambridge University Press, 2002.

Address:
Siddharth Madan
Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA
Tel: +1 848 445 5253
Fax: +1 732 445 2820
e-mail: [email protected]
