This article was downloaded by: [Dicle University] On: 07 November 2014, At: 11:38 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Ergonomics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/terg20

Child body shape measurement using depth cameras and a statistical body shape model

Byoung-Keon Park (a), Julie C. Lumeng (b), Carey N. Lumeng (b), Sheila M. Ebert (a) & Matthew P. Reed (a)

(a) Biosciences Group, University of Michigan Transportation Research Institute, Ann Arbor, MI, USA
(b) University of Michigan Medical School, Ann Arbor, MI, USA

Published online: 17 Oct 2014.

To cite this article: Byoung-Keon Park, Julie C. Lumeng, Carey N. Lumeng, Sheila M. Ebert & Matthew P. Reed (2014): Child body shape measurement using depth cameras and a statistical body shape model, Ergonomics, DOI: 10.1080/00140139.2014.965754 To link to this article: http://dx.doi.org/10.1080/00140139.2014.965754


Ergonomics, 2014 http://dx.doi.org/10.1080/00140139.2014.965754

Child body shape measurement using depth cameras and a statistical body shape model

Byoung-Keon Park (a), Julie C. Lumeng (b), Carey N. Lumeng (b), Sheila M. Ebert (a) and Matthew P. Reed (a)*

(a) Biosciences Group, University of Michigan Transportation Research Institute, Ann Arbor, MI, USA; (b) University of Michigan Medical School, Ann Arbor, MI, USA


(Received 6 May 2014; accepted 9 September 2014)

We present a new method for rapidly measuring child body shapes from noisy, incomplete data captured from low-cost depth cameras. This method fits the data using a statistical body shape model (SBSM) to find a complete avatar in the realistic body shape space. The method also predicts a set of standard anthropometric data for a specific subject without measuring dimensions directly from the fitted model. Since the SBSM was developed using principal component (PC) analysis, we formulate an optimisation problem to fit the model in which the degrees of freedom are defined in PC-score space. The mean unsigned distance between the model fitted to depth-camera data and the high-resolution laser scan data was 9.4 mm with a standard deviation (SD) of 5.1 mm. For the torso, the mean distance was 2.9 mm (SD 1.4 mm). The correlations between standard anthropometric dimensions predicted by the SBSM and manually measured dimensions exceeded 0.9.

Practitioner Summary: Rapid and robust body shape measurement is beneficial for tracking child body shapes and anthropometric changes. A custom avatar generated by rapidly fitting a statistical body shape model to noisy scan data showed the potential for good accuracy in measuring child body shape.

Keywords: child body shape measurement; statistical body shape model; depth camera; surface fitting; anthropometry

1. Introduction

Custom avatars accounting for realistic individual information of a specific subject such as body shape, anthropometric data and joint locations are very beneficial in numerous ergonomic applications (Bandouch, Engstler, and Beetz 2008; Gragg et al. 2013; Reed and Parkinson 2008; Jung, Kwon, and You 2009; Schmidt et al. 2014). Whole-body scanners are now widely used for three-dimensional surface anthropometry (Allen, Curless, and Popović 2003; Lu and Wang 2008; Stančić, Musić, and Zanchi 2013; Yu and Xu 2010). High-resolution (millimetre-scale) scans can be obtained using a range of technologies, including camera-based laser systems, structured light and texture-based reconstruction (Bradley et al. 2010; Daanen and Ter Haar 2013; Geng 2011). Most such systems used for whole-body measurement use multiple sensors to capture a large percentage of the surface of a standing subject in at most a few seconds.

Advances in depth-camera technology overcoming the drawbacks of conventional laser and structured-light systems have been of particular interest in the body scanning area. Tong et al. (2012) used three low-price depth cameras to scan the whole body of a subject and then deformed a template model to reconstruct personalised avatars from the data in a few minutes. Weiss, Hirshberg, and Black (2011) used the Shape Completion and Animation of People (SCAPE) model (Anguelov et al. 2005) to obtain a custom avatar by fitting data from multiple poses measured with a single depth camera. Yu and Xu (2010) presented a sub-pixel, dense stereo matching algorithm to generate realistically captured smooth and natural whole body shapes using a portable stereo vision system including four projectors and eight depth cameras.
However, common issues still remain in these low-cost scanning systems, such as relatively long computational time to reconstruct a complete avatar from the noisy scan data, and the unreliability of reconstructed avatars inherited from the spatial distortions of captured depth data.

The objective of this work is to demonstrate and evaluate a system to measure child body shapes using scan data from two low-cost depth cameras. A portable scanning station was constructed with Microsoft Kinect™ for Windows sensors purchased in 2013 (the first commercial version of the hardware and software). Custom software applies unique calibration and image registration algorithms to each camera’s data and builds an avatar by fitting a statistical body shape model (SBSM) to the scan data. The output is a watertight avatar with associated standard anthropometric data of a specific child.

*Corresponding author. Email: [email protected]
© 2014 Taylor & Francis


2. Method

2.1 Scanning hardware and software

Figure 1(a) shows the portable scanning system equipped with two Kinect sensors as depth cameras on pedestals behind and in front of the standing subject. The scanning station has a simple structure including a platform, two reference poles with handholds and two stands for the cameras. Handholds attached to the poles help to stabilise the subjects and the poles are used for registering the multiple scan images of the subject after the scan. The entire apparatus can be folded and stored in the platform. Figure 1(b) shows a subject in the system, with one sensor recording data from the front and one from the back of the subject.

The major design challenge associated with the scanning station is due to the limitations of the Kinect sensor. For example, the vertical field of view of the sensor is limited to 43°, so the sensor needs to be placed relatively far from the subject to capture the whole body. However, the depth accuracy and precision of scan data are degraded as the distance increases (Andersen et al. 2012). Hence, the current system uses the integrated motor to drive the sensor to three angles with respect to horizontal, so that the whole body can be scanned while the sensors are located near the subject.

We developed a software application based on the Microsoft Kinect SDK to control the sensors and to obtain three-dimensional point cloud data. To correct the radial and depth distortions caused by capturing images through an optical lens, the sensors were first calibrated using Kinect camera parameters (Smisek, Jancosek, and Pajdla 2013). Adjustments for depth distortion were made by reference to calibration scans of vertical planes in a range of locations throughout the scanning volume. This calibration was performed once for each sensor.
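The trade-off between field of view and sensor distance can be illustrated with a short calculation. The sketch below is not part of the scanning software: it assumes an ideal pinhole camera aimed horizontally at the subject's mid-height, and the function name `min_sensor_distance` is hypothetical. It estimates how far a single fixed view with a 43° vertical field of view would have to be from the subject to span the full standing height.

```python
import math

def min_sensor_distance(subject_height_m: float, vertical_fov_deg: float = 43.0) -> float:
    """Distance at which one fixed view spans the full subject height.

    Assumes an ideal pinhole camera aimed horizontally at mid-height
    (an illustrative simplification, not the actual station geometry).
    """
    half_angle = math.radians(vertical_fov_deg / 2.0)
    return subject_height_m / (2.0 * math.tan(half_angle))

# Tallest subject reported in the study: 1724 mm.
distance = min_sensor_distance(1.724)
print(f"{distance:.2f} m")
```

Under these assumptions a single view needs a stand-off of roughly two metres for the tallest subject, which is consistent with the motivation for tilting the sensor to several angles so that it can sit closer.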
To reduce the time-varying noise and holes in the scan, a total of 10 images of each scene were captured and the depth data at each corresponding pixel were averaged (neglecting zero data). Figure 2(a) depicts a comparison of the images before and after applying the averaging algorithm. Calibrated and denoised images were stored from three angles with each of the sensors, so that a total of six images were obtained. The prescribed dimensions of the scanning station such as the distance between the sensors and the angles between the scenes were used to roughly locate the images at appropriate positions and orientations. The geometric features of the reference poles were also used to align the images in this step. The iterative closest point (ICP) method was finally applied to each overlapped region between the images to complete the image alignment, and the final whole body data were obtained by merging the aligned data (Figure 2(b)).

2.2 Rapid fitting SBSM to scan

For this study, we employed an SBSM based on laser scan data gathered from children (Reed 2012). Standing scans of 140 children, with stature range about 100–160 cm and body mass index (BMI) 12–27 kg/m², were obtained using a Vitus XXL scanner (Human Solutions). The SBSM was developed using principal component analysis (PCA) technique based on an

Figure 1. Kinect scanning station design: (a) station set up for scanning, with two reference poles at the sides and a platform; (b) a subject in the system, with one sensor recording data from the front and one from the back of the subject.


Figure 2. Post-processing steps to obtain whole-body data of a subject: (a) noise and hole removal step, showing input depth data (left) and averaged depth data neglecting zero-valued points (right); (b) image registration step, showing raw captured images (left) and aligned images (right).
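The averaging step described for Figure 2(a) can be sketched as follows. This is a minimal illustration with NumPy, not the authors' implementation; it assumes that a depth value of zero marks a missing reading, and the function name `average_depth_frames` and the synthetic frames are made up for the example.

```python
import numpy as np

def average_depth_frames(frames: np.ndarray) -> np.ndarray:
    """Average a stack of depth frames per pixel, ignoring zero (invalid) readings.

    frames: array of shape (n_frames, height, width); 0 marks a missing depth.
    Pixels that are zero in every frame stay zero (the hole remains).
    """
    frames = np.asarray(frames, dtype=float)
    valid = frames > 0
    counts = valid.sum(axis=0)
    sums = np.where(valid, frames, 0.0).sum(axis=0)
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

# Synthetic example: 10 noisy frames of a flat surface at 1500 mm with random dropouts.
rng = np.random.default_rng(0)
frames = 1500.0 + rng.normal(0.0, 10.0, size=(10, 4, 5))
frames[rng.random(frames.shape) < 0.2] = 0.0   # random holes
frames[:, 0, 0] = 0.0                          # one pixel never observed
averaged = average_depth_frames(frames)
```

Dividing by the per-pixel count of valid readings, rather than by the number of frames, keeps occasional dropouts from biasing the averaged depth towards the sensor.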

adaptation of the Allen, Curless, and Popović (2003) method presented by Reed and Parkinson (2008). PCA is a widely used tool to express data on an orthogonal basis that can be more readily analysed, and to achieve data compression (Jolliffe 2002). A set of anthropometric data such as stature, weight, BMI and waist circumference was included in the SBSM as well as the coordinates of body shape points. For the current work, we retained the first 40 principal components (PCs), which account for over 99% of the variance in the mesh coordinates.

One core property of PCA is that each PC score affects each data point linearly: each increment in a PC score moves each vertex by a fixed vector. Thus, by inverting this relationship between PC scores and vertices, we can find a set of PC scores that displaces the model vertices by a desired amount. Based on this concept, we compute a PC-sensitivity matrix and use the Moore–Penrose pseudoinverse to find a set of PC scores meeting the desired criteria. The overall procedure of the fitting algorithm is depicted in Figure 3.

The first step is to build a sensitivity matrix $M$ of the first $n$ PC scores on every sampled vertex of the SBSM. To obtain $M$ we proceed as follows. Let $P = \{p_1, p_2, \ldots, p_n\}$ and $V = \{v_{1x}, v_{1y}, v_{1z}, v_{2x}, \ldots, v_{mz}\}$ be a set of the first $n$ PC scores and a set of $m$ sampled vertex positions, respectively. We first initialise $P$ with zero values. A set of initial positions $V_0$ of the sampled vertices can be obtained by applying this initialised $P$ to the SBSM. We then calculate $V_1$ by applying $P_1$, which has the value of 1 in the first position, $P_1 = \{1, 0, \ldots, 0\}$. The set of displacements $V_1 - V_0$ is stored in the first column of $M$. In the same manner, the second set of displacements $V_2 - V_0$ is stored in the next column using $P_2 = \{0, 1, 0, \ldots, 0\}$. By repeating this to the last PC score in $P$, we obtain the $3m \times n$ matrix $M$.
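The construction of the sensitivity matrix can be sketched in a few lines. The example below is illustrative only: a random linear shape model (`toy_model`) stands in for the SBSM, which is not available here, and each displacement set is stored as a column so that $M$ has shape $3m \times n$ and multiplies a vector of PC scores directly.

```python
import numpy as np

def build_sensitivity_matrix(model, n_scores: int) -> np.ndarray:
    """Column j holds the vertex displacement caused by a unit step in PC score j."""
    v0 = model(np.zeros(n_scores))          # vertex coordinates at zero scores
    columns = []
    for j in range(n_scores):
        p = np.zeros(n_scores)
        p[j] = 1.0                          # unit score in position j only
        columns.append(model(p) - v0)
    return np.column_stack(columns)         # shape (3m, n)

# Hypothetical stand-in for the SBSM: a random linear shape model.
rng = np.random.default_rng(1)
m_vertices, n_scores = 50, 5
mean_shape = rng.normal(size=3 * m_vertices)
basis = rng.normal(size=(3 * m_vertices, n_scores))

def toy_model(scores: np.ndarray) -> np.ndarray:
    """Flattened vertex coordinates (v1x, v1y, v1z, v2x, ...) for given PC scores."""
    return mean_shape + basis @ scores

M = build_sensitivity_matrix(toy_model, n_scores)
scores = rng.normal(size=n_scores)
displacement = toy_model(scores) - toy_model(np.zeros(n_scores))
```

Because the toy model is linear in the scores, `M @ scores` reproduces `displacement` exactly, which is the property the fitting method relies on.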
Thus the set of vertex displacements $V_i$ produced by given PC scores $P_i$ is $V_i = M P_i$.

The next step is to find a set of desired displacement vectors $D$ to fit the SBSM to target scan data. We first align the model position and orientation to the scan. To align the model, we use synthetic landmarks such as the top-most point on the head, acromion, hips and ankle points (Figure 4). Each landmark is found automatically from local geometric features (Lu and Wang 2008; Zhong and Xu 2006). We then match the height difference between the model and the data by adjusting the first PC score, which is closely related to stature in this data-set. To determine $D$ for the current $i$th iteration, a kd-tree for the Kinect scan data is built to find the set of nearest points $V_i^*$ in the scan data to the vertices $V_i$. Once $V_i^*$ is found, $D_i$ is simply defined as $V_i^* - V_i$ (Figure 5). Since our problem is a standard linear least-squares problem and the PC scores are independent, we use the pseudoinverse matrix $M^+$ to obtain $P_{opt}$, where $P_{opt}$ is the minimum-norm set of PC scores yielding the desired set of vertex displacements. Let $M = U S V^T$ be the singular value decomposition of $M$. The pseudoinverse of $M$ is defined as the matrix $M^+ = V S^+ U^T$,

Figure 3. Schematic of fitting process.

where $S^+$ is the matrix with the inverses of the non-zero elements of $S$ (leaving the zero elements unchanged). Then, the minimum-norm solution to the least-squares problem is given by $P_{opt} = M^+ D_i$. The final step is to apply $P_{opt}$ to the SBSM and iterate the previous steps as shown in Figure 3 until the desired stopping conditions are met. We set two kinds of stopping conditions: the sum of distances and the maximum number of iterations. Note that the sum of distances between the newly morphed model and the target points is given by $\sum |M P_{opt} - V_i^*|^2$. As a consequence, the coordinates and a set of standard anthropometric data of the optimum body shape can be obtained from the SBSM with $P_{opt}$.

3. Results

Thirty-five children with a wide range of body sizes who were not included in the SBSM database were scanned in both the Kinect system and the VITUS XXL high-resolution laser scanner (approximately 300 k points). The laser scans were taken

Figure 4. Synthetic landmarks over the body shape to initially align SBSM to data.

Figure 5. Sampled SBSM vertices and vectors to the nearest target points.
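The fitting iteration of Section 2.2 (nearest target points, desired displacements, pseudoinverse update) can be sketched as below. This is a hedged illustration rather than the authors' code: a brute-force nearest-neighbour search stands in for the kd-tree, a small grid-based linear model stands in for the SBSM, the PC scores are accumulated across iterations (one possible reading of the loop in Figure 3), and all names are hypothetical.

```python
import numpy as np

def nearest_points(query: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Brute-force nearest neighbour; a kd-tree would replace this for real scans."""
    d2 = ((query[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    return target[d2.argmin(axis=1)]

def fit_pc_scores(mean_shape, M, scan, n_iter=10, tol=1e-10):
    """Iteratively update PC scores so the model vertices approach the scan.

    mean_shape: (m, 3) vertices at zero PC scores.
    M: (3m, n) sensitivity matrix (displacement per unit PC score).
    scan: (k, 3) target point cloud.
    """
    M_pinv = np.linalg.pinv(M)               # SVD-based Moore-Penrose pseudoinverse
    scores = np.zeros(M.shape[1])
    for _ in range(n_iter):                  # stop 1: maximum number of iterations
        vertices = mean_shape + (M @ scores).reshape(-1, 3)
        targets = nearest_points(vertices, scan)
        D = (targets - vertices).ravel()     # desired displacements V* - V
        if np.sum(D ** 2) < tol:             # stop 2: sum of squared distances
            break
        scores = scores + M_pinv @ D         # minimum-norm least-squares update
    return scores

# Toy model: a 4 x 4 x 4 grid of vertices deformed by three small random modes.
rng = np.random.default_rng(3)
grid = np.stack(np.meshgrid(*[np.arange(4.0)] * 3, indexing="ij"), axis=-1)
mean_shape = grid.reshape(-1, 3)
M = 0.05 * rng.normal(size=(3 * len(mean_shape), 3))
true_scores = np.array([1.0, -0.5, 0.8])
scan = mean_shape + (M @ true_scores).reshape(-1, 3)
recovered = fit_pc_scores(mean_shape, M, scan)
```

In this toy case the deformations are small relative to the vertex spacing, so the nearest-point correspondences are correct from the first iteration and the scores are recovered almost immediately; with real scan data the correspondences change between iterations, which is why the procedure is repeated.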

immediately after the Kinect scans, but the subject was required to walk about 3 m between the two scanners. Some difference in the postures was expected as a result. The subjects were aged 3–12 years, with body weight of 12.7–65.6 kg and stature of 879–1724 mm. They wore tight-fitting swimwear and a swim cap. During the scan, the subjects were asked to stand still in a specific posture with the arms abducted from the torso about 30°, and the legs slightly spread. The scanning was repeated three times with each scanning system. The estimated scanning and post-processing times of the new system with two Kinect sensors were about 9 s and 3 s, respectively, on a typical laptop computer with an Intel® Core™ i5 2.5 GHz CPU and 8 GB of DDR3 RAM.

Figure 6(a) compares sample scan data from both the laser and Kinect systems. Although the side parts hidden from the view of the sensors were not scanned, the SBSM fitting automatically fills holes and smooths the data. Figure 6(b) shows the

Figure 6. (a) Sample scans of a specific subject: the left side is the scan from a Vitus laser scanner, the centre image is the Kinect scan result and the right side is the overlapped image of the two. (b) Example scans of a subject: a Kinect scan, the SBSM fitted to the Kinect scan and the laser scan data, respectively.


results of fitting the Kinect scan from one subject alongside the laser scan of the same subject. The most apparent visual difference with the laser scan is that the fitted figure is smoother and the idiosyncrasies of the face are eliminated, effectively producing an anonymised result.

The avatars obtained by fitting the Kinect data were compared with the Vitus laser scans of the same subjects. The mean modelling time was 1.7 s after 10 iterations on the same computer used for the scanning system. The averaged mean-square error between the fitted SBSM and the scanned data was 69.7 ± 25.6. In Figure 7, avatars fitted to sampled Kinect scans were compared to Vitus scans directly (left) and to Vitus-fitted avatars (centre). The right side (grey image) is the Vitus scan of the same subject. The disparities are coded with the standard cold-to-hot colour mapping that corresponds to 0–50 mm. The error is the mean distance between the fitted SBSM vertices and the nearest points in the comparison data. The disparities of the two types of comparisons averaged 9.4 mm (SD 5.1 mm) for ‘Kinect-fitted vs Vitus scans’, and 8.7 mm (SD 4.6 mm) for ‘Kinect-fitted vs Vitus-fitted’. Since almost all of the maximum disparities occurred in extremity regions due to the differences of the subject postures between the scans, only torso parts were compared to remove most

Figure 7. Sampled fitting results on child subjects. ID represents gender-age-stature (mm) of each subject. Left side: Kinect-fitted model versus Vitus laser scan; centre: Kinect-fitted versus Vitus-fitted models, and right side: Vitus scan data. Comparison results are coloured according to the absolute distances in mm (blue for 0 mm – red for 50 mm; colours are shown in the online version of this article). In the mean error rows, absolute distances on average are stated (numbers in the brackets are the mean errors of ‘Kinect-fitted vs Vitus-fitted’).
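The disparity measure used in these comparisons (mean distance from each fitted vertex to the nearest point in the comparison data) can be sketched as follows. The example is a minimal illustration on made-up coordinates in millimetres, not study data; it uses a brute-force search for clarity, and the function name `vertex_disparities` is hypothetical.

```python
import numpy as np

def vertex_disparities(model_vertices: np.ndarray, reference_points: np.ndarray) -> np.ndarray:
    """Unsigned distance from each model vertex to its nearest reference point."""
    diff = model_vertices[:, None, :] - reference_points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

# Made-up example: three reference points and a model offset by 3, 4 and 5 mm.
reference = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0], [0.0, 100.0, 0.0]])
model = reference + np.array([[3.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 5.0]])
distances = vertex_disparities(model, reference)
mean_error_mm = distances.mean()
```

A per-subject mean computed this way could then be averaged across subjects to give summary values of the kind reported above; whether the reported SDs are taken across subjects is an assumption of this note.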


posture effects in Figure 8. After ICP alignment on the torso vertices, the mean error was 2.9 mm (SD 1.4 mm) for ‘Kinect-fitted vs Vitus scans’, and 2.5 mm (SD 1.0 mm) for ‘Kinect-fitted vs Vitus-fitted’.

Standard anthropometric values predicted by the SBSM were compared with manually measured data. Figure 9 compares measured and predicted values for four of the standard variables, demonstrating the correlation and consistency of this approach. Fitted trend lines showed good correlations across the variables, with a mean slope of 0.95. Stature was predicted most accurately: its predictions showed an R² of 0.96 and a root-mean-square error (RMSE) of 22.4 mm, while the statures predicted from the laser scans had an R² of 0.98 and an RMSE of 17.8 mm. The other predictors also showed good correlations with traditional measurement data. The mean R² across the predictors was 0.92, and the RMSEs were 2.73 kg for body weight, 0.89 kg/m² for BMI and 25.3 mm for waist circumference.
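The agreement statistics quoted above (trend-line slope, R² and RMSE) can be computed as in the sketch below. The numbers are made-up statures used only to show the calculation, and R² is taken here as the squared Pearson correlation between measured and predicted values, which is an assumption about how the reported values were derived.

```python
import numpy as np

def agreement_stats(measured, predicted) -> dict:
    """Trend-line slope and intercept, squared correlation and RMSE of predictions."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(measured, predicted, 1)   # least-squares line
    r = np.corrcoef(measured, predicted)[0, 1]
    rmse = np.sqrt(np.mean((predicted - measured) ** 2))
    return {"slope": slope, "intercept": intercept, "r2": r ** 2, "rmse": rmse}

# Made-up statures in mm, for illustration only (not study data).
measured = [1000.0, 1200.0, 1400.0, 1600.0]
predicted = [1010.0, 1190.0, 1410.0, 1590.0]
stats = agreement_stats(measured, predicted)
```

Note that RMSE is computed against the measured values directly, so it reflects both bias and scatter, whereas R² and the slope describe how well the predictions track the measurements.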


4. Discussion

This paper demonstrated a rapid method for generating a smooth, watertight, realistic avatar from depth-camera data to measure child body shapes. The method has similarities to the Weiss, Hirshberg, and Black (2011) approach, but by

Figure 8. Sampled fitting results on child subjects. Only torso parts (blue shaded; colours are shown in the online version of this article) are compared between Kinect-fitted models, Vitus-fitted models and Vitus data.


Figure 9. Comparison between actual anthropometric measurements of subjects and predicted data from fitted SBSMs to scans (blue: Kinect scans and red: Vitus XXL laser scans; colours are shown in the online version of this article). Each dot represents a subject and the error-bars represent RMS error to the actual data.

imposing minimal constraints on subject posture and using two sensors we avoid many of the challenges faced in the earlier work. Our demonstration also specifically targets measurements of children.

The evaluation results on our approach demonstrate the potential for good accuracy without having complete surface data. Although various techniques have been proposed to generate a custom avatar by fitting a template model in order to overcome the drawbacks of scan data, most approaches fill across holes and reduce noise without ensuring a realistic body shape. In contrast, our method, similar to the SCAPE method used by Weiss, Hirshberg, and Black (2011), generates a custom avatar using an SBSM that guarantees realistic body shape in areas with missing data. We used only two cameras and require only that our surface data not be biased, e.g. depth points systematically closer to or farther from the sensor than the actual surface. The approach is robust to noise and local errors because the SBSM does a global fit. This is analogous to spatial filtering of noise, but the filtering is based on actual human shapes rather than a simple measure of local noise.

An important limitation of the current system, which is shared by all similar statistical methods, is that errors will be larger for subjects with unusual body shapes, e.g. the avatar of a subject with severe scoliosis cannot be accurately obtained, since those geometric properties were not included in our SBSM. We also have yet to deal thoroughly with various postures. The solution to both is to incorporate data from a wider range of postures and from people with a wider range of body shapes.

Funding

This research was funded in part by the MCubed program at the University of Michigan. The child SBSM was developed with funding from the National Highway Traffic Safety Administration.

References

Allen, B., B. Curless, and Z. Popović. 2003. “The Space of Human Body Shapes: Reconstruction and Parameterization from Range Scans.” ACM Transactions on Graphics 22: 587–594.

Andersen, M. R., T. Jensen, P. Lisouski, A. K. Mortensen, M. K. Hansen, T. Gregersen, and P. Ahrendt. 2012. “Kinect Depth Sensor Evaluation for Computer Vision Applications.” Electrical and Computer Engineering – Technical report ECE-TR-6. Department of Engineering, Aarhus University, Denmark, 37 pp.

Anguelov, D., P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis. 2005. “Scape: Shape Completion and Animation of People.” ACM Transactions on Graphics 24: 408–416.

Bandouch, J., F. Engstler, and M. Beetz. 2008. “Accurate Human Motion Capture Using an Ergonomics-Based Anthropometric Human Model.” Articulated Motion and Deformable Objects, 248–258. Berlin Heidelberg: Springer.

Bradley, D., W. Heidrich, T. Popa, and A. Sheffer. 2010. “High Resolution Passive Facial Performance Capture.” ACM Transactions on Graphics 29 (4): 1–10.

Daanen, H. A. M., and F. B. Ter Haar. 2013. “3D Whole Body Scanners Revisited.” Displays 34 (4): 270–275.

Geng, J. 2011. “Structured-Light 3D Surface Imaging: A Tutorial.” Advances in Optics and Photonics 3 (2): 128–160.

Gragg, J., J. J. Yang, A. Cloutier, and E. Pena Pitarch. 2013. “Effect of Human Link Length Determination on Posture Reconstruction.” Applied Ergonomics 44 (1): 93–100.

Jolliffe, I. 2002. Principal Component Analysis. 2nd ed. Springer Series in Statistics. New York: Springer Verlag.

Jung, K., O. Kwon, and H. You. 2009. “Development of a Digital Human Model Generation Method for Ergonomic Design in Virtual Environment.” International Journal of Industrial Ergonomics 39 (5): 744–748.

Lu, J. M., and M. J. J. Wang. 2008. “Automated Anthropometric Data Collection Using 3D Whole Body Scanners.” Expert Systems with Applications 35 (1): 407–414.

Reed, M. P. 2012. “A Pilot Study of Three-Dimensional Child Anthropometry for Vehicle Safety Analysis.” Proceedings of the 2012 Human Factors and Ergonomics Society Annual Meeting. Santa Monica, CA: HFES.

Reed, M. P., and M. B. Parkinson. 2008 (January). “Modeling Variability in Torso Shape for Chair and Seat Design.” ASME 2008 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, 561–569. American Society of Mechanical Engineers.

Schmidt, S., M. Amereller, M. Franz, R. Kaiser, and A. Schwirtz. 2014. “A Literature Review on Optimum and Preferred Joint Angles in Automotive Sitting Posture.” Applied Ergonomics 45 (2): 247–260.

Smisek, J., M. Jancosek, and T. Pajdla. 2013. “3D with Kinect.” Consumer Depth Cameras for Computer Vision, 3–25. London: Springer.

Stančić, I., J. Musić, and V. Zanchi. 2013. “Improved Structured Light 3D Scanner with Application to Anthropometric Parameter Estimation.” Measurement 46 (1): 716–726.

Tong, J., J. Zhou, L. Liu, Z. Pan, and H. Yan. 2012. “Scanning 3D Full Human Bodies Using Kinects.” Visualization and Computer Graphics, IEEE Transactions 18 (4): 643–650.

Weiss, A., D. Hirshberg, and M. J. Black. 2011. “Home 3D Body Scans from Noisy Image and Range Data.” 2011 IEEE International Conference on Computer Vision (ICCV) 1951–1958.

Yu, W., and B. Xu. 2010. “A Portable Stereo Vision System for Whole Body Surface Imaging.” Image and Vision Computing 28 (4): 605–613.

Zhong, Y., and B. Xu. 2006. “Automatic Segmenting and Measurement on Scanned Human Body.” International Journal of Clothing Science and Technology 18 (1): 19–30.
