Home

Search

Collections

Journals

About

Contact us

My IOPscience

Evaluating predictions of critical oxygen desaturation events

This content has been downloaded from IOPscience. Please scroll down to see the full text. 2014 Physiol. Meas. 35 639 (http://iopscience.iop.org/0967-3334/35/4/639) View the table of contents for this issue, or go to the journal homepage for more

Download details: IP Address: 134.129.120.3 This content was downloaded on 22/05/2015 at 12:19

Please note that terms and conditions apply.

Institute of Physics and Engineering in Medicine Physiol. Meas. 35 (2014) 639–655

Physiological Measurement

doi:10.1088/0967-3334/35/4/639

Evaluating predictions of critical oxygen desaturation events Hisham ElMoaqet 1 , Dawn M Tilbury 1 and Satya Krishna Ramachandran 2 1

Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109, USA 2 Department of Anesthesiology, Medical School, University of Michigan, Ann Arbor, MI 48109, USA E-mail: [email protected], [email protected] and [email protected] Received 9 September 2013, revised 12 February 2014 Accepted for publication 17 February 2014 Published 12 March 2014 Abstract

This paper presents a new approach for evaluating predictions of oxygen saturation levels in blood (SpO2 ). A performance metric based on a threshold is proposed to evaluate SpO2 predictions based on whether or not they are able to capture critical desaturations in the SpO2 time series of patients. We use linear auto-regressive models built using historical SpO2 data to predict critical desaturation events with the proposed metric. In 20 s prediction intervals, 88%–94% of the critical events were captured with positive predictive values (PPVs) between 90% and 99%. Increasing the prediction horizon to 60 s, 46%– 71% of the critical events were detected with PPVs between 81% and 97%. In both prediction horizons, more than 97% of the non-critical events were correctly classified. The overall classification capabilities for the developed predictive models were also investigated. The area under ROC curves for 60 s predictions from the developed models are between 0.86 and 0.98. Furthermore, we investigate the effect of including pulse rate (PR) dynamics in the models and predictions. We show no improvement in the percentage of the predicted critical desaturations if PR dynamics are incorporated into the SpO2 predictive models (p-value = 0.814). We also show that including the PR dynamics does not improve the earliest time at which critical SpO2 levels are predicted (p-value = 0.986). Our results indicate oxygen in blood is an effective input to the PR rather than vice versa. We demonstrate that the combination of predictive models with frequent pulse oximetry measurements can be used as a warning

0967-3334/14/040639+17$33.00

© 2014 Institute of Physics and Engineering in Medicine Printed in the UK

639

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

of critical oxygen desaturations that may have adverse effects on the health of patients. Keywords: pulse oximetry, system identification, auto-regressive models, prediction, performance metrics (Some figures may appear in colour only in the online journal) 1. Introduction Recent developments in biomedical monitoring systems and computing technologies opened new opportunities to record and save longer time histories of medical signals. This has significantly increased the importance of continuous physiological monitoring in preventing respiratory complications and death, especially in the postoperative period. Current smart alarm-threshold-based monitoring systems have been shown to reduce the need for intensive care unit admission and complications (Taenzer et al 2010). However, there are several concerns associated with using threshold-based monitoring alarms. For example, a small difference in a threshold may increase the alarm rate by a factor of 10 (Watkinson and Tarassenko 2012). Accordingly, traditional threshold-alarm-based monitors may result in excess false positive alarms (if alarm limits are set too high) and harm (if alarm limits are set too low) (Bliss and Dunn 2000). Recognizing this, these systems have not been shown to reproducibly improve outcomes in hospitalized patients (Watkinson et al 2006). On the other hand, there have been recently a significant increase in alarm related adverse outcomes (Joint Commission 2013). There has been also some interest in combining multiple channels of physiological data for better overall awareness of the patient’s state (Tarassenko et al 2006), which showed better state detection results than a single channel, but not yet better patient outcomes. In this paper, we move from monitoring to prediction. We consider time-domain models, built from individual patient data, to make predictions about near-term future behavior of patients’ physiological state. We believe that if adverse events can be predicted far enough in advance, there may be an opportunity for clinical intervention to prevent such events. One important application of the problem considered in this study can be seen in sleep apnea. Typical desaturation events in such cases occur within 20–60 s of upper airway obstruction. The ability to accurately forecast these events would potentially lead to the development of innovative methods that modify this upper airway obstruction or provide supplemental oxygen intermittently to patients suffering from these events. Indeed, this would be of a great value for cases in which efforts to breathe are diminished or absent for short time intervals; primarily in central apnea patients (American Sleep Disorders Association 1997). Unlike all current home therapy modalities which are essentially reactive, we anticipate that future iterations of this concept will result in development of a prophylactic intervention. In this study, we use linear regression models for prediction. In addition to evaluating the predictive models against a standard engineering-style metric (root mean square error (RMSE)), we classify the prediction at each time step based on whether or not a critical (desaturation) event is correctly or incorrectly predicted. We consider two different data streams: oxygen saturation levels in blood (SpO2 ) and pulse rate (PR). Our predictions are based on models that we develop using past SpO2 data for the same patient. Future work may incorporate other signals into the prediction algorithm, and evaluate predictive capabilities of these types of models for different physiological signals. The main contribution of this paper is the method that we have proposed for evaluating predictive models, based on whether or not a desaturation event is correctly predicted. In 640

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

addition, we demonstrate that given the two signals SpO2 and PR, SpO2 is better considered as an input with PR as an output (instead of vice versa), and that the addition of PR to the models does not improve the prediction performance. The paper is organized as follows. After a brief summary of prior work in time-domain modeling and evaluation metrics for physiological systems in section 2, we discuss our data set and the pre-processing that we have done in section 3. Section 4 presents the models that we use, and section 5 discusses how our proposed evaluation metric compares to the standard RMSE metric. Results and discussion are found in section 6, and the paper concludes with some avenues for future work in section 7. 2. Background and problem statement 2.1. Pulse oximetry monitoring

Pulse oximetry monitoring (POM) devices are frequently used in general care units to record both non-invasive oxygen saturation levels in blood (SpO2 ) and PR at 1–2 s intervals. These devices have been suggested as cost effective, non-invasive techniques to reduce the risk of hypoxemic injury in highly risk populations (Eichhorn 2003) and to reduce costly critical unit admissions (McFadden et al 2007, Taenzer et al 2010). The human body requires and regulates a very precise and specific balance of oxygen in the blood to ensure sufficient oxygen delivery to tissues, primarily through binding with hemoglobin. The oxy-hemoglobin binding is characterized by cooperativity, allowing for more oxygen to be extracted by tissues which are suffering from low oxygen levels. This cooperativity between oxygen and hemoglobin molecules results in a sigmoid-shaped dissociation curve (Mason et al 2010). At an SpO2 of 90% (which corresponds typically to a partial pressure of oxygen in the blood pO2 of 50 mm Hg), the curve starts becoming steeper, such that a small decrease in the partial pressure of oxygen results in a large decrease in the oxygen content of the blood (West 2012). This causes sudden severe reduction in SpO2 to be potentially life-threatening (Stradling and Crosby 1991, Rauscher et al 1991, Kripke et al 1997, Mooe et al 1996, Epstein and Dorlac 1998). Therefore, a threshold of SpO2  89% has been identified as a significant marker of the need for nocturnal oxygen treatment (Centers for Medicare and Medicaid Services 1993). An oxygen desaturation index of 4% (ODI4 ) is clinically defined as the number of episodes per hour of oxygen desaturations that are greater than or equal to 4% from the base line of the signal (Berry et al 2012). The SpO2 signals of concerned patients can be classified according to the frequency of occurrence of the desaturation events. It has been shown that patients with an ODI4  5 have a significantly higher rate of postoperative complications than those with ODI4 < 5 and that the complication rate also increased with increasing ODI4 values (Hwang et al 2008). Notwithstanding the clinical importance of POM data, there have been few studies on identifying predictive models from these data. 2.2. Prediction of physiological signals

One well-studied physiological signal is blood glucose. Diabetes has been a challenging topic for feedback control approaches for many years through building different types of models from continuous glucose monitoring (CGM) data. Several data driven techniques have been applied for the prediction and control of glucose concentrations. For example, a radial-basis function neural network is proposed to predict and control glucose levels (Trajanoski et al 1998). Also, a Kalman filter is employed to adjust the parameters of a first-principles model 641

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

for the prediction and control of blood glucose (Dua et al 2006). Data-driven auto-regressive (AR) models have been used to build dynamic models. A first-order AR model, AR-1, is used after pre-processing (smoothing) the raw CGM data to remove high-frequency noise (Sparacino et al 2007). A tenth-order AR model, AR-10, is used to model CGM signals (Reifman et al 2007). The AR coefficients were identified via regularized least squares using raw (unsmoothed) signals. More recently, a 30th-order model, AR-30, is proposed to predict near future glucose concentrations from smoothed data (Gani et al 2009). One of the important questions that arises with physiological signals is the causality relationship between the variables and which signals affect which others significantly. The availability of multi-channel biomedical devices allows synchronized physiological signals to be recorded and facilitates investigating the interactive relationship between these variables. For the SpO2 and PR signals collected by POM systems, clinical observations showed that episodic drops in blood oxygen levels have direct impact on the autonomic nervous system control of the PR (Somers et al 1995). Previous studies have shown transient but reproducible increases in PR with episodic drops in blood oxygen, as is seen where patients suffering from sleep apnea are exposed to years of such changes, resulting in faster baseline PRs (Vries et al 2008, Schulte-Frohlinde et al 2002, Griffin et al 2005, Williams and Galerneau 2003, Tsuji et al 1994, Norris et al 2008, Riordan et al 2009). However, although both signals are recorded with the same device system, we have not found any previous mathematical studies on the relationship between the dynamics of these signals. There are various ways to construct dynamic models from time series. Yet, the performance of these models in forecasting is crucial for the ability to utilize them practically. RMSE (a standard performance metric) has been used to evaluate predictions of physiological systems (Sparacino et al 2007, Reifman et al 2007, Gani et al 2009). Nevertheless, this performance metric is not capable of distinguishing errors due to phase difference from errors due to magnitude difference. Also, we are more interested in predicting critical patterns in physiological signals rather than exactly matching them. One example of an evaluation metric used for physiological signals is the Clarke-error grid (Clarke-EGA) (Clarke 2005) that is currently used to assess the accuracy of time-series modeling methodologies developed from CGM data to predict glucose levels (Gani et al 2010). The grid maps actual and predicted glucose concentrations into five zones, where regions A and B are clinically acceptable, region C may result in unnecessary corrections, region D could lead to incorrect treatments, and region E represent erroneous treatment. To our knowledge, there is no similar evaluation metric that can be used with SpO2 signals. 2.3. Problem statement

In this paper, we propose an evaluation metric for predicting oxygen saturation levels in blood inspired by the Clarke-EGA used in diabetes applications. The proposed metric is based on evaluating the ability of the predictive models to capture critical desaturation events in the SpO2 series of patients. Following the work in Gani et al (2009), we use regularized POM signals for building AR models to characterize the dynamics of SpO2 . We show, through the proposed metric, that these models can be used to predict critical events in the near future to assist in speeding up medical response. Moreover, we investigate the possibility of improving SpO2 prediction capability through the inclusion of the PR dynamics as an external input in the predictive models. We use a data-driven approach to investigate the interaction between SpO2 and PR and we demonstrate that including PR dynamics in the SpO2 predictive models does not improve the capability of predicting critical events. Part of this work has been presented previously at a conference (ElMoaqet et al 2013b). 642

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

3. Data set This study was approved by the Institutional Review Board (IRB) at the University of Michigan (IRB#HUM00069035). The data used for this study were collected as part of a quality improvement study (IRB#HUM00027189) looking at the reliability and nurse response times to a postoperative oximetry-based paging alert system (Voepel-Lewis et al 2013). The study was conducted on 119 postoperative adults following orthopedic surgery over a three month period in 2009. All patients were placed on postoperative POM (MASIMO RAD-8, Irvine CA) on arrival to the patient care unit per Institutional policy. Immediately after termination of the patients’ monitoring period, device ASCII data consisting of SpO2 and PR were downloaded using PROFOX Oximetry Software (version PO Standard; Escondido, CA). The ODI4 values for each patient were also computed by the PROFOX oximetry software. The data obtained are discrete-time signals sampled each 2 s and quantized for a resolution of ±1% for SpO2 and ±1 bpm for PR. ODI4 values range from 0.1 to 45.9. 3.1. Missing measurements in raw time series

The raw signals (SpO2 and PR) for each patient include some instances where the measurements are of zero amplitude. Zeros that extend for a duration no more than six sampling steps (12 s) were considered anomalous sensor measurements and were replaced with the most recent non-zero amplitude, similar to the zero-order-hold principle (Astrom and Wittenmark 1997). Any zeros that last for more than six sampling steps break the continuity of the time series and this requires the corresponding SpO2 -PR signal to be partitioned to smaller pieces that can be modeled and analyzed separately. Furthermore, there are some instances where the PR signals are of amplitude less than 30 bpm which is clinically impossible. These false measurements were treated in the same way as the zero measurements. For this study, the analysis was based on 15 000 points of the data sets of each patient. N = 7500 consecutive points (i.e., 4.17 h) without data gaps of the second type were used for constructing models and the other 7500 points for evaluating predictions. Of the 119 patients, only 62 possess continuous 7500 points for modeling and not all of them have sufficiently many major events to reflect the results of our prediction algorithms. Hence, we have selected ten representative patients with ODI4 ranging from 2.2 to 45.9. 3.2. Raw signals smoothing

We considered two smoothing methods for denoising the signals and reducing the quantization effects (Sparacino et al 2007, 2010, Gani et al 2009). The simple moving average filter averages a number of points from the input signal to produce each point in the output smoothed signal (Sparacino et al 2010)

3.2.1. Moving average filters.

y˜i =

M−1 1  yi−n M n=0

(1)

where y˜ is the smooth signal, y is the raw data signal, and M is the number of points used in the moving average. The exponentially weighted moving average filter (EWMA) filter applies weighting factors which decrease exponentially. The weighting for each older datum point never reaches zero. The EWMA can be calculated recursively as y˜1 = y1

(2) 643

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

Patient No. 12: Raw Data: Black, SMA: Red, EWMA: Blue., Reg−Data: Magenta 100

SpO2

95

90

85

Raw Reg SMA EWMA

80 0

1

2

3 Time (minutes)

4

5

6

Figure 1. Raw data and smoothed data for patient no. 12.

y˜i = αyi + (1 − α)y˜i−1 ,

i>1

(3)

where y˜ is the smooth signal, y is the raw data signal, and α is a constant smoothing factor between 0 and 1. A higher α discounts older observations faster. Regularization has been proposed previously to smooth CGM signals (Gani et al 2009). In this paper, we used Tikhnov regularization, which yields smoothed signals y˜ by computing y˜ = Ud w, where Ud denotes the integral operator and w denotes estimates of the signals’ first derivatives. This method does not introduce lag on the smooth signal relative to the original raw signal. To estimate the signal’s derivatives w, we minimized the functional f (w), given by

3.2.2. Regularization.

f (w) = y − Ud w2 + λ2d Ld w2

(4)

where y denotes the N × 1 vector of the raw signal, Ud denotes the N × N integral operator, w represents the N × 1 vector of first order differences (the rate of change of the signal), λd represents the data regularization parameter, and Ld denotes a well-conditioned matrix chosen to impose smoothness constraints on the derivative of the POM signals. For a chosen Ld , the quality of smoothing is determined by λd . When λd = 0, no regularization is performed ˜ increasingly resulting in the original raw data y. As λd increases, the solution w (and hence y) satisfies the imposed smoothness constraint, resulting at the same time in larger deviations from the raw data. Figure 1 shows smoothed SpO2 signals for 6 min data of patient no. 12. For the SMA filter shown in dashed red line, M = 5, for EWMA filter shown in dotted blue line α = 0.35, and for the regularized signal shown in solid magenta line λd = 20. 4. Time-series modeling 4.1. Auto-regressive models

In AR modeling, the predicted signal yˆk at time k (k = n + 1, . . . , N, where N denotes the total number of data samples available for modeling) is inferred as a linear combination of previously observed signals yk−i 644

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

yˆk =

n 

θi yk−i

(5)

i=1

where θ denotes the vector of AR coefficients to be determined, and n denotes the model order. The AR models are identified by finding the coefficients θ that best describe the dependences in the entire time series y. These coefficients are obtained based on the least squares fit (Ljung 1987). In our recent work (ElMoaqet et al 2013b), we showed that identifying AR models from the raw SpO2 data resulted in poor prediction results due to the noise in these signals as well as quantization effects. Moreover, we showed that smoothing via regularization improves prediction results significantly compared to moving average smoothing. Therefore, in this paper dyamic models are identified from the regularized SpO2 and PR signals. In our other work, we investigated AR models with moving average and artificial neural networks (ElMoaqet et al 2013a). No significant improvements over simple AR models were found. 4.2. Auto-regressive models with exogenous input (ARX)

To study the interactive relationship between the oxygen saturation in blood (SpO2 ) and the PR, we assume linear system models as approximations to the full nonlinear model around an operating condition. Therefore, we can use standard system identification techniques to fit linear auto-regressive models with exogenous input (ARX) between these physiological variables. For an ARX(n,m), the predicted signal yˆk at time k (k = max(n, m) + 1, . . . , N, where N denotes the total number of data samples available for modeling) is inferred as a linear combination of previously observed output yk−i and previously observed inputs uk−i n m   yˆk + ai yk−i = bi uk−i (6) i=1

i=0

where n and m denote the order of output and input polynomials. The ARX models are identified by finding the coefficients ai and bi that best describe the dynamics of the output time series y. These coefficients are estimated from input/output data using least squares fit principle (Ljung 1987). To investigate the interaction between SpO2 and PR signals, we have built some dynamic models in which SpO2 is an output to the PR input, and other dynamic models in which PR is an input to the SpO2 output. The significance of the coefficients for the two modeling choices was investigated to analyze the interactive relationship between these physiological variables. 5. Multi-step ahead SpO2 prediction 5.1. Linear prediction

Future values of a discrete-time signal characterized by a linear identified time-series model are forecasted k steps into the future using historical data such that the predicted output yˆN+k of a time series y can be seen as function of previous available measurements of this time series yˆN+k = f (yN , yN−1 , . . . , y0 ). k is called the prediction horizon, and corresponds to predicting output at time kTs , where Ts is the sampling time of the discrete signal (2 s for our data). For example, the one-step ahead predictor of an AR model can be expressed by the following equation: m  yˆN+1 = θi yN−i+1 (7) i=1

where the same equation is used with ARX models but with the inclusion of the input terms. 645

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

Table 1. Prediction grid for proposed metric.

Pred.  89% Pred. > 89%

Ref.  89%

Ref. > 89%

A C

B D

5.2. Prediction evaluation metrics

Linear dynamic models are fitted using least square error principle and it is a common practice in AR modeling to assess the prediction capability by calculating the RMSE between the predicted and reference measurements of the time series. RMSE can be expressed as follows:   n 1 (yˆi − y˜i )2 (8) RMSE(y, ˆ y) ˜ = n i=1

5.2.1. Root mean square error.

where yˆ is the predicted signal, y˜ is the reference smoothed data signal, and n is the number of predicted output measurements. We define the best fit model as the model that provides the lowest mean square error on the validation data set. We are interested to evaluate dynamic systems models based on their ability to capture critical signal patterns rather than their ability to exactly match reference signals. Accordingly, we define our performance metric as the ability of the dynamic systems models to predict critical desaturation events. The threshold metric is chosen as 89% due to the clinical significance (Centers for Medicare and Medicaid Services 1993). Mathematically, consider a time series in the interval [ti , t f ] with arbitrary t ∈ [ti , t f ]. The AR predictor (equation (7)) for a prediction over horizon k gives a vector of length k as follows:

5.2.2. SpO2 prediction grid.

PV (t ) = [yˆt+1 yˆt+2 . . . yˆt+k ]

(9)

while the vector of smoothed reference measurements at the same interval can be expressed as follows: RV (t ) = [y˜t+1 y˜t+2 . . . y˜t+k ].

(10)

The prediction grid used for assessing the quality of the prediction process is shown in table 1, where points in regions A and D on the main diagonal represent points of good prediction, points in region C off-diagonal represent the ones at which the model fails to detect critical events, and points in region B off-diagonal represent false prediction points. The different prediction grid areas can be illustrated in figure 2. The proposed grid can be considered as a statistical binary classification function (Altman and Bland 1994a, 1994b) with sensitivity, specificity, positive predictive value, negative predictive value, and accuracy used as statistical measures to evaluate its performance. Thus, we assign regions A, B, C, D as the true positive, false positive, false negative, and true negative areas respectively. Mathematically, this can be expressed as follows: prediction sensitivity (TPR) =

|A| × 100% |A| + |C|

646

(11)

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

Region A, Pred89

100 90 80 −20

0

0

20 Time(sec)

90 80 −20

40

0

20 Time(sec)

40

60

Figure 2. Examples of prediction grid regions, blue: actual, green-dotted: prediction.

prediction specificity (TNR) =

|D| × 100% |B| + |D|

positive predictive value (PPV) =

negative predictive value (NPV) =

accuracy (ACC) =

|A| × 100% |A| + |B| |D| × 100% |C| + |D|

|A| + |D| × 100% |A| + |B| + |C| + |D|

(12)

(13)

(14)

(15)

where |A|, |B|, |C|, and |D| are the number of points in regions A, B, C, and D. 6. Results and discussion 6.1. AR predictive models for S pO 2

We have developed software codes to build AR models for SpO2 signals of different patients and our results show that the dynamics of SpO2 signals can be captured with an AR-10 model (ElMoaqet et al 2013b). Using a higher model order AR-30 did not significantly improve RMSE results (p-value = 0.991). We present here an example for patient no. 12; a patient who experienced a very high frequency of desaturation events (ODI4 = 45.9). Figure 3 shows the performance of the 20 s ahead prediction for the AR-10 model of the SpO2 time series of this patient. Moreover, the 20 s ahead RMSE of AR-10 model for this patient is 0.921. Table 2 shows RMSE values for other patients’ models.

6.1.1 Performance of AR models.

6.1.2. Prediction horizon. Table 2 also shows the RMSE performance for 60 s ahead prediction for ten patients using AR-10 models. It indicates that increasing the prediction 647

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

100

RegD (blue−solid) & 20−s ahead prediction from AR−10 (magenta−dashed)

SpO

2

95

90

85 Reg. Data 20−s Pred. 89% Threshold 80 0

10

20

30

Time (min)

40

50

60

Figure 3. 20 s ahead AR-10 prediction results using smoothing via regularization, patient

no. 12. Table 2. 20 and 60 s ahead RMSE% of AR-10.

Horizon (s) P#

ODI4

k = 20

k = 60

10 12 18 37 59 80 89 118 125 136

5.3 45.9 9.2 2.2 13.4 18.6 21.8 20.4 21.9 34.6

0.280 0.921 0.404 0.473 0.388 0.587 0.953 0.572 0.738 0.878

0.958 2.440 1.662 2.125 1.057 2.186 4.025 1.520 1.844 2.596

Table 3. 20 s ahead prediction grid, patient no. 12.

Pred.  89% Pred. > 89%

Ref.  89%

Ref. > 89%

712 96

40 6642

horizon k from 20 to 60 s increases RMSE values for the recommended AR-10 but this does not convey any information on the ability of predicting critical desaturation events over this horizon. For example, figure 4 shows the 60 s ahead predicted signals for three patients. As it can be clearly seen in this figure, these models are still able to capture critical desaturations in this interval. When applying the proposed metric on the AR-10 model identified for patient no. 12, the prediction grid results for 20 and 60 s ahead in tables 3 and 4 respectively show that the specificity (TNR) for both prediction intervals are superior (99.4%, and 97.4% respectively) 648

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

Regularized Data (blue dots) & 60−s ahead prediction from AR−10 (green solid) 100 100

95

95

SpO2

SpO2

95

SpO2

100

90

90

85 0

90

85 1 2 3 4 Time (minutes)

5

85

0

1 2 3 4 Time(minutes)

5

0

1 2 3 4 Time(minutes)

5

Figure 4. AR-10 model performance in 60 s prediction intervals for three patients.

Table 4. 60 s ahead prediction grid, patient no. 12.

Pred.  89% Pred. > 89%

Ref.  89%

Ref. > 89

759 577

126 6008

Table 5. 20 s ahead prediction grid results for ten patients, AR-10.

P#

|A|

|B|

|C|

|D|

TPR

TNR

PPV

NPV

ACC

10 12 18 37 59 80 89 118 125 136

0 712 124 339 232 81 71 38 1181 130

0 40 1 20 12 9 0 2 84 13

0 96 10 22 21 6 7 5 99 12

7490 6642 7355 7109 7225 7349 7412 7445 6126 7335

NA 88.1% 92.5% 93.9% 91.7% 93.1% 91% 88.4 % 92.3% 91.5%

100% 99.4% 99.9% 99.7% 99.8% 99.9% 100% 100% 98.6% 99.8%

NA 94.7% 99.2% 94.4% 95.1% 90.0% 100% 95.0% 93.4% 90.9%

100% 98.6% 99.9% 99.7% 99.7% 99.9% 99.9% 99.9% 98.4% 99.8%

100% 98.2% 99.9% 99.4% 99.6% 99.8% 99.9% 99.9% 97.6% 99.7%

indicating that the model was able to identify successfully almost all of areas out of the dangerous zones in both prediction intervals. Also, the 60 s ahead prediction grid shows a relatively high PPV) of 85.8% which reflects a high likelihood that the predicted desaturation events will be actual ones in both prediction intervals. On the other hand, the prediction grid results for 20 s ahead prediction shows a high sensitivity (TPR = 88.1%) but for a prediction interval of 60 s it decreased to 56.7% indicating that the predictive model is still able to capture more than 50% of the critical events over the longer horizon. Tables 5 and 6 show prediction grid results with TPR, TNR, PPV, NPV, and ACC values for 20 and 60 s ahead predictions respectively for ten patients. As shown in these tables, the values of TNR for both prediction intervals are close to 100% for all patients. Also, in general the PPV results are high in both intervals. In both prediction horizons, the AR model for patient no. 10 who did not experience any critical desaturation events did not predict any false critical ones. 649

H ElMoaqet et al

True Positive Rate (TPR)

Physiol. Meas. 35 (2014) 639

P#12− ROC Curve for 60s Predictions, AR−10 1 97% 95% 96% 94% 0.8 91% 0.6

89% 88%

0.4

87% 84%

0.2

0 0

Uniform Test SpO2 Cut−off thresholds % 0.2

0.4

0.6

0.8

False Positive Rate (FPR)

1

Figure 5. ROC curve for 60 s predictions of AR-10, patient no. 12.

Table 6. 60 s ahead prediction grid results for ten patients, AR-10.

P#

|A|

|B|

|C|

|D|

TPR

TNR

PPV

NPV

ACC

10 12 18 37 59 80 89 118 125 136

0 759 126 359 244 70 72 38 1213 143

0 126 5 17 7 13 14 9 82 14

0 577 89 174 136 80 66 45 503 163

7470 6008 7251 6920 7083 7307 7318 7378 5672 7150

NA 56.8% 58.6% 67.4% 64.2% 46.7% 52.2% 45.8% 70.7% 46.7%

100% 97.9% 99.9% 99.8% 99.9% 99.8% 99.8% 99.9% 98.6% 99.8%

NA 85.8% 96.2% 95.5% 97.2% 84.3% 83.7% 80.9% 93.7% 91.1%

100% 91.2% 98.8% 97.5% 98.1% 98.9% 99.1% 99.4% 91.9% 97.8%

100% 90.6% 98.7% 97.4% 98.1% 98.8% 98.8% 99.3% 92.2% 97.6%

The overall performance of the identified AR-10 models over prediction intervals can be seen through testing their classification performance when varying the cut-off threshold from the lowest to the highest SpO2 values in the patients’ time series. This can be illustrated using methods of receiver operating characteristics (ROC) (Zweig and Campbell 1993). For example, figure 5 shows very good classification for the 60 s ahead predictions obtained from the AR-10 model of patient no. 12. This can be best seen through the area under ROC curve (AUROC = 0.914) for this patient. Although the threshold of SpO2  89% is chosen for clinical significance, the AUROC expresses the overall classification performance of 60 s ahead predictions obtained from patients’ models. Table 7 shows AUROC for ten patients reflecting an excellent performance of predictive models. 6.2. Effect of pulse rate

We have developed software codes to investigate the interactive relationship between SpO2 and PR measurements. The regularized SpO2 and PR were used to construct these models. 650

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

i

Coefficient Value

Coefficient Value

Best Fit PR Output Coefficients (a ) 20 0 −20 0

5

10

15

Best Fit SpO2 Output Coefficients (a ) i

20 0 −20 0

i

Coefficient Value

Coefficient Value

Best Fit SpO2 Input Coefficients (b ) 2 0 −2 5 10 Coefficient Number (Time Lag)

5

10

15

Best Fit PR Input Coefficients (b i) 2 0 −2

15

5 10 Coefficient Number (Time Lag)

15

Figure 6. Distribution of best fit ARX model coefficients for two modeling choices for SpO2 and PR.

Table 7. AUROC for 60 s predictions of AR-10.

P#

AUROC

10 12 18 37 59 80 89 118 125 136

0.863 0.914 0.969 0.962 0.978 0.824 0.918 0.933 0.982 0.943

The software codes identify dynamic systems models with different orders (n, m) using 7500 points of the two time series and the models are validated on 7500 points of the time series that are different from the ones used for modeling. For efficient computation, the software codes select the model order that yields the lowest RMSE using candidate model orders n and m ranging from 1 to 15. Best fit modeling results are shown in figure 6. The distribution of the ARX model coefficients (n, m) shows that the effect of SpO2 historical measurements on the PR output values is larger than the effect of the historical PR values on the SpO2 measurements. The effect of PR as an input affecting SpO2 dynamics can be analyzed through the number (m) and magnitude (bi ) of the input coefficients of the PR measurements. The PR input has a very short memory effect on SpO2 (max time lag m = 3) and the coefficients have negligible values (max |bi | = 0.000 03). Although the model order could be freely selected, the low values of the coefficients (bi ) represent low dependences between PR and SpO2 measurements in a model that assumes PR as an input to SpO2 . Including the PR data in predicting SpO2 in linear models is not effective. In contrast, it can be seen easily that SpO2 is an effective input 651

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

Figure 7. Duration between AR-10 (left) and ARX-(10,10) (right) predictions and adverse events.

to PR. Therefore, according to this analysis, these two physiological variables might be best characterized by the SpO2 measurements being an input (cause) and the PR values as an output (effect). This agrees with previous studies that indicated PR changes lag behind the changes in the SpO2 (Somers et al 1995). However, our goal in this study is to assess whether better prediction of critical events can be achieved by incorporating PR dynamics in the SpO2 models. To compare SpO2 prediction results with and without the input from PR dynamics, we considered an ARX(10,10) and compared prediction results for this model with the previously obtained prediction results with the AR-10 model. Thus, we tested these models over 60 s prediction intervals with the proposed SpO2 prediction grid as shown in table 8. Comparing these results to the previously obtained ones with AR-10 models in table 6, it can be noticed that adding the PR dynamics did not improve the capability of capturing the critical events in the SpO2 time series of patients (p-value = 0.814). 6.3. Predictive capability based on duration between prediction and adverse events occurrence

Another way to look at the predictive capability of SpO2 models can be seen through evaluating how early an adverse event can be forecasted. Of course, if an adverse event can be predicted sufficiently far in advance, there might be a stronger opportunity for clinical intervention to prevent such events. In contrast, if the predictive model did not capture the critical event or if it predicted the event just before its occurrence, this might not be able to provide enough time for medical intervention in general care units. Figure 7 shows the histograms of the duration between the time AR-10 and ARX-(10,10) models predict critical events and the actual time of occurrence of these events. The red histogram of AR-10 predictions shows that 46 critical events (56.2% of the total number of events) were able to be predicted 10–60 ahead of time using this type of model. However, looking at same histogram but for ARX-(10,10), no improvement in detecting these adverse events can be noticed due to the inclusion of the PR dynamics in the models (p-value 0.986). 652

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

Table 8. 60 s ahead prediction grid results for ARX(10,10) models

P#

|A|

|B|

|C|

|D|

TPR

TNR

PPV

NPV

ACC

10 12 18 37 59 80 89 118 136 125

0 672 125 358 233 65 72 36 134 1226

0 2 4 17 20 7 6 28 9 65

0 664 90 175 147 85 66 47 172 490

7470 6133 7252 6920 7070 7313 7326 7359 7155 5689

NA 50.3% 58.1% 67.2% 61.3% 43.3% 52.2% 43.4% 43.8% 71.4%

100% 99.9% 99.9% 99.8% 99.7% 99.9% 99.9% 99.6% 99.9% 98.9%

NA 99.7% 96.9% 95.5% 92.1% 90.3% 92.3% 56.3% 93.7% 95%

100% 90.2% 98.8% 97.5% 98% 98.9% 99.1% 99.4% 97.7% 92.1%

100% 91.1% 98.7% 97.4% 97.8% 99.8% 99% 99% 97.6% 92.6%

7. Conclusion and future work This paper presents a threshold performance metric for predictions of physiological variables through evaluating the ability of the predictive models to capture regions of critical events. While the models used in this paper are standard linear models, they provided the insight needed to show the importance of introducing prediction evaluation metrics that are more suitable for physiological systems. The proposed performance metric is based on the ability of the model to predict critical desaturation events in the SpO2 time series of patients. Although this metric is currently designed to be used with SpO2 signals, it can be applied to other physiological signals using the same principle. Furthermore, the study shows that the inclusion of pulse rate dynamics in the SpO2 predictive models does not improve the capability of predicting critical SpO2 events. Through analyzing the significance of coefficients in the input–output models between SpO2 and PR, it was shown that oxygen in blood is an effective input to the PR rather than vice versa. The proposed performance metric is designed to evaluate the ability of dynamic models to predict critical events by classifying predictions into the four regions of the SpO2 prediction grid. Future work may include classifying the desaturations according to their severity. Also, we plan to consider analyzing the identification of these signals in the frequency domain in order to search for any relationship that might exist between the severity of a patient’s health status with the natural modes present in their physiological signals.

References Altman D G and Bland J M 1994a Diagnostic tests: 1. Sensitivity and specificity Br. Med. J. 308 1552 Altman D G and Bland J M 1994b Diagnostic tests: 2. Predictive values Br. Med. J. 309 102 American Sleep Disorders Association 1997 International Classification of Sleep Disorders, Revised: Diagnostic and Coding Manual (Rochester, MN: American Sleep Disorders Association) Astrom K and Wittenmark B 1997 Computer Controlled Systems: Theory and Design (Upper Saddle River, NJ: Prentice-Hall) Berry R B et al 2012 Rules for scoring respiratory events in sleep: update of the 2007 AASM manual for the scoring of sleep and associated events. J. Clin. Sleep Med. 8 597–619 Bliss J P and Dunn M C 2000 Behavioural implications of alarm mistrust as a function of task workload Ergonomics 43 1283–300 Centers for Medicare and Medicaid Services 1993 National Coverage Determination (NCD) for Home Use of Oxygen (Baltimore, MD: Centers for Medicare and Medicaid Services) Clarke W L 2005 The original Clarke error grid analysis (EGA) Diabetes Technol. Therapeutics 7 776–9 653

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

Dua P, Doyle F J and Pistikopoulos E N 2006 Model-based blood glucose control for type 1 diabetes via parametric programming IEEE Trans. Biomed. Eng. 53 1478–91 Eichhorn J 2003 Recognizing and preventing hypoxemic injury risk on the general care floor J. Healthc. Risk Manag. 23 217–22 ElMoaqet H, Tilbury D and Ramachandran S K 2013a Linear and non-linear models for blood oxygenation prediction: a comparison Technical Report 2013-DT-001 (Ann Arbor, MI: University of Michigan—Mechanical Engineering Department) ElMoaqet H, Tilbury D M and Ramachandran S K 2013b Predicting oxygen saturation levels in blood using autoregressive models: a threshold metric for evaluating predictive models Proc. of American Control Conf. pp 734–9 Epstein L and Dorlac G 1998 Cost-effectiveness analysis of nocturnal oximetry as a method of screening for sleep apneahypopnea syndrome Chest 113 97–103 Gani A, Gribok A, Lu Y, Ward W, Vigersky R A and Reifman J 2010 Universal glucose models for predicting subcutaneous glucose concentration in humans IEEE Trans. Inform. Technol. Biomed. 14 157–65 Gani A, Gribok A, Rajaraman S, Ward W and Reifman J 2009 Predicting subcutaneous glucose concentration in humans: data-driven glucose modeling IEEE Trans. Biomed. Eng. 56 246–54 Griffin M, Lake D and Moorman J 2005 Heart rate characteristics and laboratory tests in neonatal sepsis Pediatrics 115 937–41 Hwang D, Shakir N, Limann B, Sison C, Kalra S, Shulman L, Souza-Ade C and Greenberg H 2008 Association of sleep-disordered breathing with postoperative complications Chest 133 1128–34 Joint Commission 2013 Sentinel Event Alert 50: Medical Device Alarm Safety in Hospitals (Oak Brook, IL: Joint Commission) Kripke D F, Ancoli-Israel S, Klauber M R, Wingard D L, Mason W J and Mullaney D J 1997 Prevalence of sleep-disordered breathing in ages 40–64 years: a population based survey Sleep 20 65–76 Ljung L 1987 System Identification: Theory for the User (Upper Saddle River, NJ: Prentice-Hall) Mason R, Broaddus V, Martin T, King T, Schraufnagel E, Murray J and Nadel J 2010 Murray and Nadel’s Textbook of Respiratory Medicine (Philadelphia, PA: Saunders) McFadden C J, Gutierrez L T, Leveque J A and Andersen M D 2007 CPOM: alleviating the demand for ICU beds J. Nursing Manag. 27 48E–H Mooe T, Rabben T, Wiklund U, Franklin K and Eriksson P 1996 Sleep-disordered breathing in women: occurrence and association with coronary artery disease Am. J. Med. 101 251–6 Norris P, Stein P and Morris J 2008 Reduced heart rate multiscale entropy predicts death in critical illness: a study of physiologic complexity in 285 trauma patients J. Crit. Care 23 399–405 Rauscher J, Popp W and Zwick H 1991 Computerized detection of respiratory events during sleep from rapid increases in oxyhemoglobin saturation Lung 169 335–42 Reifman J, Rajaraman S, Gribok A and Ward W 2007 Predictive monitoring for improved management of glucose levels Diabetes Sci. Technol. 1 478–86 Riordan W, Norris P, Jenkins J and Morris J 2009 Early loss of heart rate complexity predicts mortality regardless of mechanism, anatomic location, or severity of injury in 2178 trauma patients J. Surgical Res. 156 283–9 Schulte-Frohlinde V, Ashkenazy Y, Goldberger A, Ivanov P, Costa M, Morley-Davies A, Stanley H and Glass L 2002 Complex patterns of abnormal heartbeats Phys. Rev. E: Stat. Nonlinear Soft Matter Phys. 66 031901 Somers V, Dyken M E, Clary M P and Abboud F M 1995 Sympathetic neural mechanisms in obstructive sleep apnea J. Clin. Investigation 96 1897–904 Sparacino G, Facchinetti A and Cobelli C 2010 Smart continuous glucose monitoring sensors: on-line signal processing issues Sensors 10 6751–72 Sparacino G, Zanderigo F, Corazza S, Maran A, Facchinetti A and Cobelli C 2007 Glucose concentration can be predicted ahead in time from continuous glucose monitoring sensor time-series IEEE Trans. Biomed. Eng. 54 931–7 Stradling J and Crosby J 1991 Predictors and prevalence of obstructive sleep apnea and snoring in 1001 middle aged men Thorax 46 85–90 Taenzer A H, Pyke J, McGrath S and Blike G 2010 Impact of pulse oximetry surveillance on rescue events and intensive care unit transfers: a before-and-after concurrence study Anesthesiology 112 282–7 Tarassenko L, Hann A and Young D 2006 Integrated monitoring and analysis for early warning of patient deterioration Anaesthesia 97 64–8 654

H ElMoaqet et al

Physiol. Meas. 35 (2014) 639

Trajanoski Z, Regittnig W and Wach P 1998 Simulation studies on neural predictive control of glucose using the subcutaneous route Comput. Methods Programs Biomed. 56 133–9 Tsuji H, Venditti F, Manders E, Evans J, Larson M, Feldman C and Levy D 1994 Reduced heart rate variability and mortality risk in an elderly cohort. The Framingham Heart Study. Circulation 90 878–83 Voepel-Lewis T, Parker M L, Burke C N, Hemberg J, Perlin L, Kai S and Ramachandran S K 2013 Pulse oximetry desaturation alarms on a general postoperative adult unit: a prospective observational study of nurse response time Int. J. Nurs. Stud. 50 1351–8 Vries E D, Ramratthan M, Smorenburg S, Gouma D and Boremeester M 2008 The incidence and nature of in-hospital adverse events: a systematic review Qual. Saf. Health Care 17 216–23 Watkinson P, Barber V, Price J, Hann A, Tarassenko L and Young J 2006 A randomised controlled trial of the effect of continuous electronic physiological monitoring on the adverse event rate in high risk medical and surgical patients Anaesthesia 61 1031–139 Watkinson P and Tarassenko L 2012 Current and emerging approaches to address failure-to-rescue Anesthesiology 116 1158–9 West J 2012 Respiratory Physiology: The Essentials (Baltimore, MD: Williams & Wilkins) Williams K and Galerneau F 2003 Intrapartum fetal heart rate patterns in the prediction of neonatal acidemia Am. J. Obstetrics Gynecol 188 820–3 Zweig M H and Campbell G 1993 Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine Clin. Chem. 39 561–77

655

Evaluating predictions of critical oxygen desaturation events.

This paper presents a new approach for evaluating predictions of oxygen saturation levels in blood ( SpO2). A performance metric based on a threshold ...
879KB Sizes 2 Downloads 3 Views