Home

Search

Collections

Journals

About

Contact us

My IOPscience

Signal recovery in imaging photoplethysmography

This content has been downloaded from IOPscience. Please scroll down to see the full text. 2013 Physiol. Meas. 34 1499 (http://iopscience.iop.org/0967-3334/34/11/1499) View the table of contents for this issue, or go to the journal homepage for more

Download details: IP Address: 165.124.124.16 This content was downloaded on 30/08/2014 at 18:34

Please note that terms and conditions apply.

IOP PUBLISHING

PHYSIOLOGICAL MEASUREMENT

Physiol. Meas. 34 (2013) 1499–1511

doi:10.1088/0967-3334/34/11/1499

Signal recovery in imaging photoplethysmography Benjamin D Holton 1,2 , Kavan Mannapperuma 1,3 , Peter J Lesniewski 1,4 and John C Thomas 1,5 1 School of Electrical and Information Engineering, University of South Australia, Mawson Lakes, SA 5095, Australia 2 Santos Pty Ltd, 60 Flinders St, Adelaide, SA 5000, Australia 3 Power Systems Design Pty Ltd, 16 Williams Circuit, Pooraka, SA 5095, Australia 4 School of Engineering, University of South Australia, SA 5095, Australia 5 Group Scientific Pty Ltd, Innovation House, First Ave, Technology Park, Mawson Lakes, SA 5095, Australia

E-mail: [email protected]

Received 7 March 2013, accepted for publication 11 September 2013 Published 22 October 2013 Online at stacks.iop.org/PM/34/1499 Abstract Imaging photoplethysmography is an emerging technique for the extraction of biometric information from people using video recordings. The focus is on extracting the cardiac heart rate of the subject by analysing the luminance of the colour video signal and identifying periodic components. Advanced signal processing is needed to recover the information required. In this paper, independent component analysis (ICA), principal component analysis, autoand cross-correlation are investigated and compared with respect to their effectiveness in extracting the relevant information from video recordings. Results obtained are compared with those recorded by a modern commercial finger pulse oximeter. It is found that ICA produces the most consistent results. Keywords: imaging photoplethysmography, biometric information, independant component analysis, principle component analysis, correlation, cardiac pulse, heart rate (Some figures may appear in colour only in the online journal) 1. Introduction Noninvasive extraction of biometric information, especially remote monitoring of cardiac pulse or heart rate (HR) by non-contact methods, is attractive due to its fundamental role in the diagnosis of health and well being. In this paper, we investigate the performance of different methods of analysis of photoplethysmography (PPG) data. PPG, introduced by Alrick Hertzman in 1938, is the basic technique used in many commercial medical devices for measuring the HR (Allen 2007). PPG senses the temporal variation of light transmitted by and reflected from the human body due to the change in volume of blood vessels under the skin and oxygenation levels of the blood that occurs during 0967-3334/13/111499+13$33.00

© 2013 Institute of Physics and Engineering in Medicine

Printed in the UK & the USA

1499

1500

B D Holton et al

respiration (Allen 2007, Zijlstra and Buursma 1997, Prahl 1999, Shi et al 2010, Asare and Spigulis 2013). Pulsatile blood flow modulates the light signal due to the minute volume changes, and the different absorption spectra of oxy- and deoxy-haemoglobin, as shown in figure 1, facilitate the measurement of the modulation. The basis for PPG is well established. However, it has further potential for determining biometric information especially with advanced signal processing techniques. Conventional contact PPG (CPPG) has been greatly improved by progress in the measurement of cardiovascular blood volume pulse and saturation of peripheral oxygen, SPO2 (Prahl 1999, Shelley 2007). The relatively high quality signal provided by CPPG with the sensor located on a finger, wrist or in the ear is central to the operation of the common pulse oximeter. It also provides the basis for monitoring individual waveforms and the time variability of the HR (Spigulis et al 2002, Millasseau et al 2006). This further allows for the estimation of blood pressure, pulse volume, arterial oxygenation (Jeong et al 2006, Andruschenko et al 2010, Song et al 2009) and respiratory rate (Madhav et al 2011, Nitzan et al 2008) as well as indicators of physical or mental stress (Yoo and Lee 2011). Imaging PPG (IPPG) is a form of non-contact PPG (NCPPG), based on the analysis of video images, and is an emerging field with potential for new applications. The signals are relatively weak and noisy and require extensive conditioning and analysis. Garbey et al (2007) first demonstrated that the human HR can be determined at a distance by analysing a video of exposed skin taken with a thermal camera and by identifying the HR as the corresponding maximum in the power spectrum of pixels along the thermally elevated area. The ability to extract the HR using special CMOS cameras with infrared and red lighting has been demonstrated (Hu et al 2008). Other authors have found that the HR and the respiration rate (RR) can be determined using simply a common colour camera and ambient illumination with full time–frequency analysis of the green colour intensity signal averaged over pixel clusters (Verkruysse et al 2008). This concept was recently extended by Poh et al (2011). They used a standard web camera and analysed a weighted sum of all three colour video signals, with the weights optimized by independent component analysis (ICA). Again, the HR was found from the maximum in the power spectrum but of pixels averaged over the whole face. In fact, that may be seen as a logical step after the successful application of ICA to recovering the periodic electro-encephalogram (EEG) waveform from several electrode signals, as has been done throughout the past decade (Walter-Williams and Li 2011). This approach was investigated further by Lewandowska et al (2011), who claimed that the use of principal component analysis (PCA) can be as effective as ICA. More recently, Kwon et al (2012) reproduced Poh et al’s approach in a standalone smartphone application. Using the ECG signal as a reference, they made comparisons for ten subjects. Interestingly, the error using the JADE ICA output was no better than that using the raw Green signal. Poh et al (2010) used their laboratory setup also for the analysis of the RR and HR variability (HRV) which they identified indirectly from the profile of the power spectrum. 2. Methodology We investigate different methods of data analysis for IPPG and its use to extract the HR of subjects. We consider ICA, principal component analysis (PCA), direct frequency analysis, auto-correlation and cross-correlation. The situation from the measurement perspective is demanding. This is because in IPPG, modulation of the intensity signal by the HR is very small, comparable to the quantization step of the 8-bit data in each red, green and blue (RGB) channel and it also produces significant

Signal recovery in imaging photoplethysmography

1501

Figure 1. Absorption spectra for oxy- and deoxy-haemoglobin (Prahl 1999).

quantization noise. It is essential to average the intensity data from the N pixels representing the face to produce an effect similar to an N-fold reduction of the quantization interval. The HR signal must be further enhanced to improve its recovery. It is possible at this stage to simply compute the power spectrum of one or two RGB channels, perhaps with some filtering, and find the unknown HR as the frequency of a single peak appearing in the range of expected HR. Unfortunately, this approach is unreliable and gives spurious results and large errors. Linear mixing of data from the three colour RGB channels can enhance the periodic HR component contained in the video signal. From that perspective, ICA appeared to be a useful algorithm capable of generating suitable mixing weights. Indeed, this was the approach leading to effective EEG monitoring (Walter-Williams and Li 2011). In theory, certain conditions should be satisfied for ICA or PCA to work as intended. These conditions, which include the statistical volume of data, their stationarity and independence, are rarely satisfied or tend to be unrealistic in real cases. In our case, an equally fundamental constraint is that both methods analyse the statistics of the data disregarding their order, i.e. time dependence or periodicity. There is an unfortunate contradiction in that signals are to be characterized due to being periodic but their timeline is not considered by these algorithms. The performance would be improved if the periodicity criteria were part of the algorithm too. Unfortunately, existing useful statistical algorithms either require prior knowledge of the period itself (Scholz 2007) or analyse a single temporal signal (Davies and James 2007). 2.1. Independent component analysis ICA is a type of blind source separation (BSS) in the sense that it computes a linear sum W of available data sets y (RGB colour channels in our case) with such weights w as to maximize one independent source at a time. In an ideal case, where the number of mixed data sets y is the same as the number of perfectly statistically independent source components x, it gives full unmixing of such sources. The data sources x must not have a Gaussian distribution and there may only be linear mixes A of these unknown sources. The mixed data sets have the form y(t ) = Ax(t ).

(1)

The original independent components are found from x(t ) = Wy(t ).

(2)

1502

B D Holton et al

Here, A is the unknown mixing matrix and its inverse, W , is the unmixing matrix found by the ICA. A number of ICA algorithms are available. The main difference between them is in the criteria employed to identify the unique character of the individual unmixed components. The most popular criteria are: • Fourth-order moments (kurtosis), as used in FastICA, developed by Hyvarinen and Oja (Hyvarinen 1999) using a fast fixed-point iterative algorithm to find the projections maximizing the non-Gaussianity of the signals. • Fourth-order cumulant tensors of diagonalized eigenmatrices, as used in JADE (also known as ICA by tensorial methods) developed by Cardoso (1999). JADE finds points where offdiagonals of these matrices become zero or as close to zero as possible. • Estimated Kullback–Leibler divergence paired with entropy, as used in RADICAL (robust, accurate, direct ICA algorithm) developed by Learned-Miller (2003). We point out that mixing the RGB signals is an important step offering much more than just mixing colours. This is because mathematical mixing includes subtraction which is not used in colour processing. However, where only three data sets (RGB) are available and where the ‘source components’ are nonlinearly distorted (e.g. by quantization or a moving boundary) and are not stationary, the simple procedure of linear de-mixing to find pure source components is the main limitation. So, in spite of the sophisticated criteria used to fine-adjust the weights of the de-mixing matrix, the difference in the end result is disproportionately small. 2.2. Principal component analysis PCA is a popular feature extraction and data projection method which reduces dimensionality by removing irrelevant or redundant variables and projecting the data with respect to the relevant variables. These are the variables which contribute the largest variance of the data or, in other words, that cause the greatest change in the projected data. As a multivariate statistical technique, PCA also offers BSS. It analyses the available data set Y with respect to several possibly inter-correlated variables and extracts orthogonal data components CX which contribute to the variance of the data. The CX are ranked according to their contribution to the variance. The one with the largest contribution is the principal component(s) (Abdi and Williams 2010). The PCA data model is the transform: Y = U T X.

(3a)

Here, Y is the vector containing the experimental data (RGB traces) and X is an unknown source with a size given by the length of the time series multiplied by 3 for the RGB components in our case. U is an orthonormal matrix, the columns of which are the principal components. They are determined recursively to correspond with the direction of maximum data variance. These component vectors (orthogonal to each other and ordered starting at the highest variance in relation to X) are the eigenvectors of the covariance matrix (Vicente et al 2007) 1 (3b) CX = XX T . N In this study, we used the inbuilt MATLAB PCA function princomp to carry out the analysis. PCA should efficiently find the composition of the RGB signals that gives the most variation in the measured signal. However, the small HR signal component may be masked by a larger noise signal which is not likely to be statistically independent, especially in the case of small data sets. Note that it has been observed that ICA and PCA are equivalent in the

Signal recovery in imaging photoplethysmography

1503

sense that they may lead to the same results in some applications (Lewandowska et al 2011, Vicente et al 2007). 2.3. Auto-correlation and cross-correlation Auto-correlation and cross-correlation are classical methods for determination of repetitive patterns in data or signals. Cross-correlation identifies similarity between two different timeshifted signals while auto-correlation assesses similarity of a signal with a time-shifted version of itself. Auto-correlation (Barlow and Berry 2010) α is given by  (x1(t ) − x¯1 )(x1(t−T ) − x¯1 ) , (4) α= σx2 t where x1(t) is the signal, x¯1 is its mean and σ x is the standard deviation of the signal. Cross-correlation is represented by  (x(t ) − x)(y ¯ (t−T ) − y) ¯ , γ = σ σ x y t

(5)

where x(t) and y(t) are the signals (with x¯ and y¯ being their respective means and σ x and σ y their standard deviations). In this study, the MATLAB command xcorr was used to perform both the cross-correlation and auto-correlation of the signals. Unlike the statistical methods described above, correlation techniques do not separate mixed data sources or signals. Rather, they highlight common features in different signals or periodicity in a single signal. From previous observations (Zijlstra and Buursma 1997, Prahl 1999, Poh et al 2011), we anticipated that colours such as red and green should relate to the period of the HR but that each might have a different phase related to the period of the HR itself. That suggested that cross-correlation may prove effective. 2.4. General limiting constraints The success of one or the other analysis method depends strongly on the signal condition which is determined by noise level and stationarity of the HR. It is difficult to compare these methods directly as they are really complementary techniques for data analysis. In addition to noise and the unsteady HR (modulated by breathing), there are other unwanted components in video signals, including variation in the illumination including the ripple caused by the AC mains power and boundary (template) movements or motion artefacts (facial movements, blinking, smiling, flashing teeth, etc). Note that the latter category generates large pulses of luminance in all colours. These make data analysis less reliable and can be removed by clipping. The role of either ICA or PCA is to mix suitable RGB signals to best match the experimental outputs in a blind fashion, i.e. based on the different statistical properties of each component. Unfortunately, the position (index) of the desired output in the output matrix of the ICA varies due to the iterative character of the algorithm. If PCA is used, then the desired output would not appear as the first or even the second component if there are other stronger signal components. Therefore, selection of the best (i.e. the most relevant) output component is required. For reliable data analysis, it is important to minimize unwanted components in the RGB signals where possible. Linear low-pass (LP) filtering can remove most noise and an averaging window can notch out beat frequencies caused by AC lighting. Other artefacts

1504

B D Holton et al

such as facial/eye/lip movements producing large spikes or the slower light variation are attenuated to a lesser degree by a high-pass (HP) filter or preferably by detrending, which serves the same purpose although in a nonlinear fashion. Filters require time equal to their impulse response (IR) to start producing valid outputs. That time is an additional delay which extends the required length of the video record to be analysed. The IR is a multiple of the reciprocal of the filter cutoff frequency and this is why detrending is preferred to an HP linear filter and also why a finite IR (FIR) filter should be used rather than infinite IR filters. The last stage of the analysis is to obtain the power spectrum of the preconditioned signal with resolution sufficient to meet the required accuracy of the HR determination. In a typical situation with a video frame rate of 15 frames per second (fps) and for an HR up to 2 Hz (120 bpm), the sampling rate is still well above the Nyquist criteria and allows for efficient attenuation of the 100 Hz light modulation and noise by LP filtering while a notch filter (rectangular window) removes beat frequency components of 5 and 10 Hz. The number of frames analysed (i.e. the length of the signal, T) must be sufficient to give the accuracy of a typical pulse oximeter, in our case:  f = ± 0.05 Hz. This leads to the minimum duration of the signal to be analysed, T = 1/ f of 20 s or 300 frames. Windowing a signal with a duration T is equivalent to convolving its spectrum with sinc(π f T). Thus, windowing the signal adds linear distortion. This is minimized by using a smooth windowing of the original signal, usually at the expense of extending its length. The trade-off here is that any extension of the minimum length of the analysed signal occurs at the expense of reducing the ability to monitor the detected HRV directly using repeated measurements. Evaluation of the measurement results is not straightforward. Similar limitations apply to published investigations of HR measurement using NCPPG. Also, the HR of real people is not constant and the oximeters which are commonly used as a reference have their own tolerance and a finite response or averaging time. We used a fingertip oximeter with an 8 s averaging time and HR accuracy of ± 3 bpm. We evaluated our measurement results by comparing our measured values with those from the pulse oximeter using Pearson’s correlation test using a large data set. 3. Experimental setup 3.1. Instrumentation and organization of data A webcam (Logitech C910 HD Pro Webcam) was used to record facial videos. Videos were recorded at 15 fps in 24-bit RGB colour (8 bits per channel) at 640 × 480 pixel resolution. The videos were recorded using two methods. One was through the MATLAB image acquisition toolbox where the videos were saved as uncompressed AVI files. The second was through the inbuilt Logitech interface and the videos were saved using lossless compression in WMV format. These latter videos were subsequently converted to AVI container format using a video converter (Freemake Video Converter or Avangate Suite, AVS4 you) prior to analysis. Human trials approval was obtained from the University of South Australia Ethics Committee (approval number 0000026093) and 18 subjects (M/F, 16/2) ranging in age from 22–62 years participated. In total, 31 separate videos were stored for analysis. The subjects included Caucasian, African and Asian heritage. The participants’ HRs were monitored using a commercially available pulse oximeter (Rossmax Finger Pulse Oximeter SB220 attached to their left pointer finger. A 30 s period was allowed to lapse prior to each recording to allow the cardiac rate to settle and to produce repeated oximeter readings to within 3 bpm. An oximeter reading was taken every 30 s during video recording. The participants were seated approximately 60 cm from the camera and were

Signal recovery in imaging photoplethysmography

1505

(a)

(b)

(c)

(d)

Figure 2. Recovery of signals from video recordings. (a) Face in the image is imported to MATLAB, (b) cropped manually and then split into its RGB components. (c) Frames are then averaged to create the raw signals and (d) the signals are conditioned in preparation for data analysis.

asked to minimize head movement during recording. A black backdrop was placed behind them to minimize reflection and provide a constant background scene. The videos were recorded for a total of 90 s each. Videos were captured at different times of the day with the main source of lighting coming from room light (fluorescent light banks) and daylight. 3.2. Signal recovery from video recordings A custom MATLAB program was written to import and process the video images. The videos were processed in 30 s blocks and a total of 63 samples were processed. Figure 2 outlines the process used to generate the RGB time series signals. These signals were then analysed to determine the HR. The procedure was as follows. (1) The videos were imported into MATLAB and for each frame the image was cropped so that only the face was left and the image was decomposed into the three separate RGB colour components (figure 2(b)).

1506

B D Holton et al Table 1. Unmixing matrix for the data in figures 3–5.

R

G

B

25.7 −2.0 −4.2

−16.6 14.6 −3.9

−6.8 1.1 14.1

C1 C2 C3

(2) The average pixel value for each colour in each frame was then found and stored to create a time series of the RGB signals (figure 2(c)). These three time series signals are the raw data which are analysed to determine the HR. (3) The signals were put through a 32-point LP FIR filter (4 Hz cutoff frequency), and normalized to their mean value using the formula  (yi(t ) − μi ) . (6) γi(t ) = σi t Here, i = 1, 2, 3 represents each of the three components, μi is the mean and σ i is the standard deviation of the component signal yi(t). The signals were detrended using a prior smoothness approach (Tarvainen et al 2002). (4) The signals were passed through a six-point moving average filter that uses a rectangular window. As can be seen in figure 2(d), the three signals contain distinctly periodic components and this periodicity is stronger in the green trace. Moreover, this time variation is on the timescale expected for an HR signal. (5) The three components of the conditioned RGB signals were input to the ICA and PCA analysis and the outputs were three basis components. These were passed through a fast Fourier transform (FFT) to obtain their power spectra. Component 2 of the ICA/PCA analysis was usually found to contain the strongest HR signal and the largest peak in component 2 was used to estimate the HR. Typical weights in the JADE ICA unmixing matrix are shown in table 1. The component C1 receives the majority of the red signal (with negative green and blue signals), component C2 has mainly green signal (with some negative red and positive blue) while component C3 is mainly the blue signal. An example of a power spectrum of the original RGB signals and the unmixed components is shown in figure 3. Auto-correlation analysis was performed on each of the RGB components and crosscorrelation analysis was performed between each pair of colours. The largest peak in these correlation signals was assigned to the HR signal and its lag time was used to estimate the HR. Auto-correlation plots for good quality RGB signals presented in figure 4 indicate that both red and green signals have their auto-correlation peaks which correspond well with the HR even without the use of the ICA. Similarly, the cross-correlation of the red and green signal has its first two peaks separated by the frequency equal to the HR. 4. Results The HR estimates were extracted from 63 video segments using the different methods and were compared with the values determined from the pulse oximeter using correlation scatter plots for each type of data analysis as shown in figure 5. It can be seen from the fifth and ninth plots of figure 5 that the HR estimate from the auto-correlation and FFT analysis of the green signal correlates strongly with the value measured with the finger pulse oximeter. This indicates that the HR signal is well represented in the green channel. In fact, this can

Signal recovery in imaging photoplethysmography

1507

Figure 3. Power spectrum of detrended signals processed by JADE ICA (black) compared with the spectrum of the RGB signals (colour plots); template 320 × 330 pixels. Used 512-point fast Fourier transform (FFT) over a good quality part of the video, signal 6F (with the sampling rate of 15 fps, frequency unit is 15/512 ∼ = 0.03 Hz). Note: the vertical line represents the oximeter reference reading, HR = 64 bpm.

Figure 4. Auto-correlation (unbiased) RGB signals (colour, solid lines) and ICA unmixed components (colour, dotted) 320 × 330 (6 F). Lag/delay unit 1/fps = 1/15 s. Oximeter recorded HR 64 bpm.

be anticipated from figure 2 which clearly shows that the green signal contains the strongest periodic component of the RGB signals. Also, as anticipated, there is almost no difference between the results obtained either through the auto-correlation or the power spectrum (through the FFT) since both operations are equivalent, they are just done in the time and frequency domains, respectively. The sixth through eighth plots of figure 5 show that cross-correlation between the signals in all combinations produces greater variation in the HR estimate and poorer agreement with HR measured using the oximeter. As can be seen from the fourth plot of figure 5, the results from the PCA were poor, and the examination of the power spectra showed that the cardiac peak often occurred in the other components and was not evident in component 2. The first three plots of figure 5 show that

1508

B D Holton et al

Figure 5. Scatter plots relating HR results between a webcam and finger pulse oximeter. Results were obtained using the FFT on signals obtained from JADE ICA, FastICA, RADICAL ICA and PCA, auto-correlation of green trace, cross-correlation of red and green trace, cross-correlation of red and blue trace, cross-correlation of green and blue trace and FFT on green trace directly, as shown over each plot. Pearson’s correlation coefficient, r, is shown for each case. Note: p

Signal recovery in imaging photoplethysmography.

Imaging photoplethysmography is an emerging technique for the extraction of biometric information from people using video recordings. The focus is on ...
1MB Sizes 0 Downloads 0 Views