Influences on the performance of hospital clinical event monitoring.

Ira J. Haimowitz
MIT Laboratory for Computer Science
545 Technology Square, room 414
Cambridge, MA 02139
Net: [email protected]

Isaac S. Kohane
Children's Hospital, Harvard Medical School
300 Longwood Avenue
Boston, MA 02115
Net: [email protected]

Abstract¹

The implementation of real-time clinical monitors in large hospital information systems places heavy performance demands on these systems. Meeting these demands requires not only methodologies to augment the performance of individual monitors but also an understanding of how patient population and monitor characteristics might influence overall system performance. We have built a multi-monitor simulator to study these influences on performance. In doing so, we have focused on the impact of a variety of techniques to cache a subset of a large number of monitors in primary memory.

¹ The work reported here has been supported (in part) by NIH grant R01 LM 04493, NICHHD 5T32 HD07277-9, and by a U.S. O.N.R. Graduate Fellowship. Please send correspondence to Ira J. Haimowitz at the above address. 0195-4210/91/$5.00 © 1992 AMIA, Inc.

1 Introduction

Typically, intelligent computer programs for medical diagnosis or therapy have been tested on small populations of patients. In contrast, studies of the performance of such systems deployed on a large scale to follow the care of thousands of patients have been scarce. The emergence of relatively low-cost, viable commercial technologies to maintain on-line patient data has inspired several efforts to implement some form of intelligent monitoring. A better understanding of the real-time performance of these monitors has therefore become pressing.

At Boston Children's Hospital, the recent completion of the core of the Virtual Data-Base (VDB) (1) has provided a rich environment for the development of Clinical Event Monitors (CEM). The VDB includes the patient data of many departmental applications such as clinical laboratory results, financial transactions, radiological reports, procedure reports, discharge summaries, and a rapidly increasing fraction of clinical documentation. This information is distributed to over 3000 providers over a high-speed network. A small but growing fraction of these providers interact with this system through bitmapped Clinician Workstations (CWS) (2). In a hospital the size of Children's, performance issues are of critical importance. There are at least 24000 distinct clinical events per day reported over the computer network, and as the use of the CWS grows, so will this number.

It was not obvious that current technology would allow real-time, or at least near-real-time, generation of CEM reports. Even with the multi-megabyte main memories of current computers, we were particularly concerned about the possible degradation of performance if the memory management of hundreds of concurrent monitors were left to the native virtual memory system.

Some of these concerns have been addressed by the growing literature on real-time artificial intelligence (AI) systems. Among the solutions are faster matching algorithms for rule-based systems (e.g. TREAT (3)), caching of database transactions between AI systems and databases (4), and intelligent pre-processing of real-time data (5). Real-time AI systems have also sought to take advantage of the increased availability of hardware implementing parallel computation. Miller (6) describes an intelligent cardiovascular monitor composed of hierarchically arranged concurrent processes which generate increasingly abstracted data interpretations. Blackboard systems such as Guardian (7) have explicit control mechanisms to determine the scheduling, execution and distribution of computational resources, such as parallel processors, to the various "Knowledge Sources." Ishida's work (8) exploits knowledge of the finer-grained parallelism of distributed production systems to automate the decomposition of overloaded "agents" into concurrently executing production systems.

We have investigated the problems that arise when concurrent monitors have memory requirements exceeding the system's resources. These include the influences that various strategies for caching the knowledge-based systems can have on aggregate performance. We wrote a computer program, called the multi-monitor simulator (MMS), that simulates a hospital monitoring environment. The MMS allows one to change key parameters related to problem size and memory allocation and to observe the effect on overall monitoring performance. One can vary the number of patients, the monitors ordered per patient (MOP, expressed as a percentage of the total number of monitors), and the mechanisms for maintaining a fast cache of some often-used monitors. Performance criteria include the real time for a standard polling of events and the hit ratio, the percentage of monitor runs for which the monitor is already in fast cache memory.

2 Tools & Structure

The MMS runs in Hypercard, connecting via Hyperbridge to a shell for object-oriented rule-based systems, Nexpert. Tests were run on an Apple Macintosh IIcx. The simulator contains seventy monitors that evaluate patient electrolyte and blood gas laboratory results, each implemented as a knowledge base in the expert system shell. Fifty of these monitors perform a two-point trend analysis on a patient's two most recent values of a particular electrolyte, ten each for Na+, K+, Ca++, Cl-, and HCO3-. The rule sets for these fifty monitors are mutually disjoint. Twenty additional monitors analyze a patient's most recent pair of HCO3- and PCO2 values and identify possible acid-base disorders. They use an algorithm from Patil's ABEL program (9), linking possible disorders to regions of the acid-base nomogram. A separate list records which monitors are ordered for each patient.

The MMS also includes a set of clinical event tables. Each event consists of a patient id, a clinical parameter, a value for that parameter, and the time the hospital laboratory posted that event. The events vary randomly among all patients and among the six laboratory parameters above (the five electrolytes and PCO2). Each event table contains an average of 15 events per patient.

Three data structures are updated throughout the simulation. One is the set of patient records, implemented as Nexpert objects and containing fifty-two slots for laboratory data and their interpretation. Another is the monitor cache, a collection of clinical monitors stored in primary memory for rapid access. The third is the table of simulation results. The MMS user can examine the changing state of the controller on the control panel, shown in figure 1. The user may set any of the control parameters at the left.
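The event record and the two-point trend analysis described above can be illustrated with a minimal Python sketch. It is not the Nexpert implementation: the class and function names, the trend labels, and the `tolerance` dead band are all hypothetical, introduced only to show the shape of the data a monitor consumes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClinicalEvent:
    """One row of a clinical event table, with the four fields named in the text."""
    patient_id: str
    parameter: str   # e.g. "K+", "HCO3-", "PCO2"
    value: float
    posted_at: str   # time the hospital laboratory posted the event


def two_point_trend(previous: Optional[float], latest: float,
                    tolerance: float = 0.0) -> Optional[str]:
    """Classify the change between a patient's two most recent values.

    Returns None when only one value is available, i.e. when the data are
    not yet sufficient for the monitor to reach a conclusion.
    `tolerance` is a hypothetical dead band, not a value from the paper.
    """
    if previous is None:
        return None
    delta = latest - previous
    if delta > tolerance:
        return "rising"
    if delta < -tolerance:
        return "falling"
    return "stable"
```

A trend monitor in this sketch would be called once per new event for its parameter, with the previous value read from the patient record.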

FIGURE 1. Multi-monitor simulator control panel.


3 Basic Experimental Design

All simulations consist of a sequential polling of a specified number of clinical events. For each event the controller searches the list of monitors ordered for that patient. For each such monitor, the controller checks whether the available data are sufficient for that monitor to reach any conclusions. If so, the controller checks the monitor cache; if the monitor is not loaded into the cache, the controller loads it, possibly replacing some other monitor according to an experimentally controlled cache replacement strategy. Once in the cache, the monitor evaluates the patient's status and writes that status into the patient record. We ran three simulation trials for each of the experiments described below.

3.1 Test 1: Varying Patient Size

The first experiment examined the effect of a varying number of patients on the efficiency of an otherwise fixed monitor setting. We ran three trials each with 10, 20, and 50 patients, keeping constant a MOP of ten percent of the seventy monitors, a cache size of twenty monitors, and a cache replacement strategy of replacing the monitor ordered by the fewest patients (see section 3.3).

3.2 Test 2: Varying Cache Size

Our second experiment examined the effect on efficiency of varying the monitor cache's size. We ran three trials each with sizes of 15, 20, and 25 monitors, keeping constant a sample of fifty patients, a MOP of ten percent, and a cache replacement strategy of replacing the least ordered monitor.

3.3 Test 3: Varying Cache Replacement Strategy

We experimented with three strategies for choosing which monitor to remove when a new monitor must be loaded into a full cache. One strategy is removing the monitor used least recently in the simulation (which we shall call "LRU"), a technique sometimes used for both data and instruction caches in computer architecture design. A second strategy is removing the monitor in the cache that is ordered for the fewest patients ("Least Ordered"). This method is justified by reasoning that a monitor that is ordered more often is more likely to be run for a random datum, and is therefore worth keeping in the cache longer. A third strategy is removing the monitor that has been used the least up to that point ("Least Used"). This method attempts to gauge patterns in a particular polling of events. For example, if monitors for acid-base disorders have been used frequently, the event table has thus far contained a high percentage of HCO3- and PCO2 values. This bias may continue in the near future, justifying the retention of the often-used acid-base monitors. We ran three trials with each cache replacement strategy, keeping constant a sample of fifty patients, a MOP of ten percent, and a cache size of twenty monitors.

4 Results

4.1 Test 1: Varying Patient Size

By increasing the number of patients we also increase proportionally the number of events in a standard polling, since we keep a constant fifteen events per patient. The graphs in figure 2 therefore show different numbers of events for different numbers of patients.² Note that the average total number of runs is approximately proportional to the number of patients. Note also that each of the three curves appears concave upward for a little over one-third of its duration, and then levels off to an almost constant rate of runs per event. This rate is slightly more than one for all three numbers of patients.

For the average load totals also, the slope becomes relatively constant after an initial upward concavity. Unlike the number of runs, the number of loads increases somewhat more than linearly in the number of patients. This non-linearity of the average number of loads is explained in part by the average hit ratio results. For each plot, the average hit ratio rises sharply, due largely to the filling of the cache, until leveling off for the remainder of the simulation. Remarkably, the point of stabilization appears approximately the same, near 120 events, independent of the number of patients. For ten patients the hit ratio stabilizes near 57%; for twenty, near 48%; and for fifty patients, near 42%.

4.2 Test 2: Varying Cache Size

The larger the cache size, the more likely a given needed monitor is present, and therefore the hit ratio rises. Figure 3 shows that an increase of five in cache size can have a dramatic effect on the hit ratio. The hit ratio for fifteen monitors stabilizes near 35%, for twenty monitors near 42%, and for twenty-five monitors near 50%.
4.3 Test 3: Varying Cache Replacement Strategy

Figure 4 shows distinctive hit ratio plots for the three strategies: LRU, Least Ordered, and Least Used. The graphs are identical until the twenty-element cache is filled, after which they diverge and eventually stabilize: Least Used near 45%, Least Ordered near 42%, and LRU near 38%. The Least Used strategy often resulted in an entire monitor cache filled with acid-base monitors, because these monitors are triggered by either of two event parameters (HCO3- and PCO2), compared to one parameter for the trend monitors. In an event stream generated independently with respect to parameters, the acid-base monitors were in fact used much more frequently.

² These data, as well as the average real times for a number of events, were kept in all experiments. Space limitations restrict us to showing only the most illustrative graphs.

FIGURE 2. Results with varying numbers of patients. [Plot: average number of loads versus number of events.]
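The polling loop of section 3 and the three replacement strategies of section 3.3 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Hypercard/Nexpert MMS: every name is hypothetical, each patient orders a uniformly random set of monitors, a trend monitor is taken to need two values of its parameter, and an acid-base monitor is taken to need one value each of HCO3- and PCO2. The sketch is not expected to reproduce the hit ratios reported above.

```python
import random
from collections import Counter

ELECTROLYTES = ["Na+", "K+", "Ca++", "Cl-", "HCO3-"]
PARAMETERS = ELECTROLYTES + ["PCO2"]

# 50 two-point trend monitors (10 per electrolyte) and 20 acid-base monitors.
MONITORS = [("trend", e) for e in ELECTROLYTES for _ in range(10)]
MONITORS += [("acid-base", None)] * 20


def can_run(kind, watched, event_param, seen):
    """True if this event triggers the monitor and enough data are available."""
    if kind == "trend":
        return event_param == watched and seen[watched] >= 2
    return (event_param in ("HCO3-", "PCO2")
            and seen["HCO3-"] >= 1 and seen["PCO2"] >= 1)


def simulate(n_patients=50, mop=0.10, cache_size=20, strategy="least_used",
             events_per_patient=15, seed=0):
    """Poll a random event stream; return (runs, loads, hit_ratio)."""
    rng = random.Random(seed)
    per_patient = round(mop * len(MONITORS))
    orders = [rng.sample(range(len(MONITORS)), per_patient)
              for _ in range(n_patients)]
    order_count = Counter(m for ordered in orders for m in ordered)
    seen = [Counter() for _ in range(n_patients)]  # values posted per parameter
    use_count, last_used = Counter(), {}
    cache, runs, loads = set(), 0, 0
    # Eviction score per strategy: the cached monitor with the lowest score goes.
    score = {"lru": lambda c: last_used[c],
             "least_ordered": lambda c: order_count[c],
             "least_used": lambda c: use_count[c]}[strategy]

    for tick in range(n_patients * events_per_patient):
        patient = rng.randrange(n_patients)
        param = rng.choice(PARAMETERS)
        seen[patient][param] += 1
        for m in orders[patient]:
            kind, watched = MONITORS[m]
            if not can_run(kind, watched, param, seen[patient]):
                continue
            runs += 1
            if m not in cache:           # miss: load, evicting if the cache is full
                loads += 1
                if len(cache) >= cache_size:
                    cache.remove(min(sorted(cache), key=score))
                cache.add(m)
            use_count[m] += 1
            last_used[m] = tick
    hit_ratio = (runs - loads) / runs if runs else 0.0
    return runs, loads, hit_ratio
```

Calling `simulate(strategy=s)` for each of `"lru"`, `"least_ordered"`, and `"least_used"` with the same seed replays the same event stream, so the number of runs is identical and only the loads and hit ratio differ between strategies.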

5 Discussion

We shall first discuss those influences on monitor efficiency that are independent of the hardware platform or expert system shell one may use in a multi-monitor system. We then discuss results that may have depended significantly on our particular implementation. Formal proofs of some results are omitted, but can be obtained by contacting the authors.

5.1 Implementation Independent Conclusions

For the derivations below we define a multi-monitor system (like the MMS) to be in steady state if all subsequent events trigger the maximal number of monitor runs.


FIGURE 3. Average hit ratios for varying cache sizes. [Plot: average hit ratio versus number of events for caches of 15, 20, and 25 monitors.]

FIGURE 4. Hit ratios for varying strategies. [Plot: average hit ratio versus number of events for LRU, Least Ordered, and Least Used.]

5.1.1 Number of Runs per Event at Steady State

Given knowledge about the required inputs of the various monitors and the patient ordering of monitors, we can approximate the expected number of runs per event for a clinical multi-monitor system at steady state. Consider some event E_k at steady state of the MMS, and let R_k be the number of monitor runs generated by E_k. We wish to calculate E(R_k). Now, let p_k be the parameter of event E_k; p_k thus may be Na+, K+, etc. The six possibilities for p_k form a partition over which we can compute this expected value:

E(R_k) = \sum_{a} P(p_k = p_a) \, E(R_k \mid p_k = p_a)    (EQ 1)

where p_a ranges over the six possible parameters. In the MMS the events are generated randomly with respect to these parameters. Thus the probability in the above equation is always one-sixth, and we can rewrite the expected value as:

E(R_k) = \frac{1}{6} \sum_{a} E(R_k \mid p_k = p_a)    (EQ 2)

To calculate the expected values inside the sum, we must consider how many monitors of each type a randomly chosen patient has ordered. In these experiments the MOP was ten percent throughout. Recall that there are seventy monitors total: ten two-point trend monitors for each of five parameters and twenty acid-base monitors. Therefore, the expected makeup of a patient's monitor set is one of each kind of two-point trend monitor and two acid-base monitors. Now, returning to (EQ 2) above, if p_k is K+, Na+, Ca++, or Cl-, the expected number of monitor runs is one, for the patient has on average ordered one trend monitor for that parameter. If p_k is HCO3-, the expected number of monitor runs is three, for the patient has on average ordered one HCO3- trend monitor and two acid-base monitors. Finally, if p_k is PCO2, the expected number of monitor runs is two, for the two acid-base monitors ordered on average. Substituting in (EQ 2) yields:

E(R_k) = \frac{1}{6} (1 + 1 + 1 + 1 + 3 + 2) = \frac{9}{6} = \frac{3}{2}    (EQ 3)

This approximates the slope of any of the number-of-runs graphs in the results section.

5.1.2 Number of Loads per Event at Steady State

By assuming a constant hit ratio r at steady state, we can estimate the expected number of loads, E(L_k), at steady state. The monitors loaded for a given event are precisely those that run but are not in the cache. Since (1 - r) is the probability that a randomly chosen monitor will not be in the cache, the expected number of loads is given by:

E(L_k) = \frac{3}{2} (1 - r)    (EQ 4)

If we use the final, relatively constant hit ratio from a graph in section 4, we can approximate the steady-state slope of the corresponding number-of-loads graph.

5.1.3 Numbers of Patients and Hit Ratios

Let us define the monitor ordering distribution as a function mapping each monitor to the number of patients ordering that monitor. This clinically significant function is useful for analyzing hit ratios under varying numbers of patients. In this test (section 3.1) our cache replacement strategy for all numbers of patients was Least Ordered. Indeed, because events in the MMS were randomly generated, those monitors ordered by the most patients tended to be used the most. However, the larger the number of patients, the more uniform the expected monitor ordering distribution. The more uniform this distribution, the less effective it is to hold the most ordered monitors in the cache, for the selection of a monitor to remove approaches random choice. Consequently the hit ratios decreased with increasing numbers of patients.

5.1.4 Cache Replacement Strategies and Hit Ratios

We aim to explain why in our results the long-term hit ratio for Least Recently Used was lower than that of Least Ordered, which in turn was lower than that of Least Used.
Because our event stream is random in the patient id and in the clinical parameters, one cannot predict the next time that a recently run monitor will be run again. Thus in the MMS an LRU replacement strategy is effectively random replacement.

If, as in the MMS, events are generated randomly in both patients and parameters, it is reasonable to assume that the monitors most frequently ordered are the most likely to be used in the near future. Thus a Least Ordered strategy works fairly well. In clinical practice, this may not always be so: all patients may get a monitor that generates a warning for a rare but lethal event such as malignant hyperthermia, but that monitor may be used extremely rarely. One should therefore consider the nature of highly ordered monitors when considering a Least Ordered strategy.

The Least Used strategy was the most effective in the MMS largely because it was most predictably tied to monitor usage. In particular, it exploited the fact that the acid-base monitors were used much more frequently in the long run. This was because they run on either of two parametric inputs, as opposed to one parameter for the two-point trend monitors. This knowledge was not explicitly encoded but was "learned" by the simulator as the simulation wore on. More generally, a Least Used strategy can be very valuable during event streams where the set of monitors used with high frequency remains nearly constant throughout the stream.

5.2 Implementation Dependent Conclusions

Our main implementation-specific results relate to the real time for an MMS simulation. Two key results characterize the average real time per event (RTE). One is that as cache size increases, RTE is swayed by two opposing influences. An increased hit ratio reduces the average time spent on loads, but a larger total rule set in the expert system shell increases monitor running time. A second result is that RTE is monotonically increasing in the number of events processed. A separate experiment has shown this to be due to the increasing number of loads, and that increasing numbers of runs have no effect.
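The steady-state estimates of sections 5.1.1 and 5.1.2 (EQ 2 to EQ 4) can be checked with a short calculation. The Python sketch below is only an arithmetic aid; the function names are hypothetical, and the per-parameter run counts are the expected values stated in section 5.1.1 for a MOP of ten percent.

```python
from fractions import Fraction

# Expected monitor runs given the event's parameter (section 5.1.1, MOP = 10%).
EXPECTED_RUNS = {"Na+": 1, "K+": 1, "Ca++": 1, "Cl-": 1, "HCO3-": 3, "PCO2": 2}


def expected_runs_per_event(runs_given_param=EXPECTED_RUNS):
    """EQ 2: average the conditional expectations over equally likely parameters."""
    p = Fraction(1, len(runs_given_param))
    return sum(p * n for n in runs_given_param.values())


def expected_loads_per_event(hit_ratio, runs_per_event=None):
    """EQ 4: each run requires a load with probability (1 - r).

    Pass hit_ratio as a string such as "0.42" to keep the arithmetic exact.
    """
    if runs_per_event is None:
        runs_per_event = expected_runs_per_event()
    return runs_per_event * (1 - Fraction(hit_ratio))
```

Applying EQ 4 to the stabilized hit ratios of section 4.3 gives estimated steady-state slopes of 0.825 loads per event for Least Used (r near 0.45), 0.87 for Least Ordered (r near 0.42), and 0.93 for LRU (r near 0.38). These are estimates from the formula, not measured values.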
6 Summary / Future Work

As we implement a clinical event monitoring system at Children's Hospital we have had to focus on real-time performance in a data-intensive environment. We have described one approach for assessing this performance, and have discovered both implementation-dependent and implementation-independent influences. Another, more theoretical approach may lie in queueing theory (10). Our experimental design approximates an M/M/1 queueing system; as noted above, our process time distribution is not strictly Poisson but is monotonically increasing. Taking an empirical approach has been valuable for at least two reasons: it has uncovered implementation-dependent influences, and it has exposed us to the process of implementing a monitoring system of dimensions plausible for a moderately sized hospital. As we continue to evaluate the effect of our architectural choices on performance, we intend to adopt broader quantitative performance measures of speed, responsiveness, timeliness and graceful adaptation, such as those described by Dodhiawala (11), to guide our efforts.

7 References

1. Margulies D., McCallie D.P., Elkowitz A., Ribitsky R. An integrated hospital information system at Children's Hospital. SCAMC 1990: 699-703.
2. Kohane I.S., McCallie D.P. A dynamically reconfigurable clinician's workstation with transparent access to remote and local databases. AMIA 1990.
3. Miranker D.P. TREAT: A better match algorithm for AI production systems. AAAI 1987: 42-47.
4. McKay D.P., Finin T.W., O'Hare A. The intelligent database interface: integrating AI and database systems. AAAI 1990: 677-684.
5. Washington R., Hayes-Roth B. Input data management in real-time AI systems. IJCAI XI: 250-255.
6. Miller P.L., Gelernter D. Machine-independent model-based tools for parallel computation in biomedicine. SCAMC 1990: 262-265.
7. Hayes-Roth B., Washington R., Hewett M., Seiver A. Intelligent monitoring and control. IJCAI 1989: 243-249.
8. Ishida T., Yokoo M., Gasser L. An organizational approach to adaptive production systems. AAAI 1990: 53-58.
9. Patil R.S., Szolovits P., Schwartz W.B. Modeling knowledge of the patient in acid-base and electrolyte disorders. In: Szolovits P., ed. Artificial Intelligence in Medicine. Westview Press, Inc., 1982: 191-225.
10. Kleinrock L. Queueing Systems, Volumes I and II. John Wiley and Sons, 1976.
11. Dodhiawala R., Sridharan N.S., Raulefs P., Pickering C. Real-time AI systems: A definition and an architecture. IJCAI 1989: 256-261.

Acknowledgements

We wish to thank those who suggested and critiqued many of the ideas presented here, particularly David Margulies, Peter Szolovits, David McCallie, and Long Nguyen.

Monitoring clinical activity and performance: how can hospital episode statistics be made fit for purpose?

Dopaminergic modulation of performance monitoring in Parkinson's disease: An event-related potential study.

Clinical leadership and hospital performance: assessing the evidence base.

Event familiarity influences memory detection using the aIAT.

Rule monitoring ability predicts event-based prospective memory performance in individuals with TBI.

The design of a rule-based clinical event monitor in a multi-vendor hospital computing environment.

Event-related potentials associated with performance monitoring in non-human primates.

homeSound: Real-Time Audio Event Detection Based on High Performance Computing for Behaviour and Surveillance Remote Monitoring.

Monitoring Whooping Crane Abundance Using Aerial Surveys: Influences on Detectability.

Leveraging Food and Drug Administration Adverse Event Reports for the Automated Monitoring of Electronic Health Records in a Pediatric Hospital.

Prescription-event monitoring: methodology and recent progress.

Time effects on event-related brain potentials and vigilance performance.

Circadian influences on clinical values in man.

The effect of information technology on hospital performance.

The Effects of Acute Dopamine Precursor Depletion on the Cognitive Control Functions of Performance Monitoring and Conflict Processing: An Event-Related Potential (ERP) Study.

Cognitive fatigue influences students' performance on standardized tests.

Cycling on a Bike Desk Positively Influences Cognitive Performance.

A conceptual framework for Taiwan's hospital clinical performance indicators.

Association between hospital post-resuscitative performance and clinical outcomes after out-of-hospital cardiac arrest.

Compliance assessed by the Medication Event Monitoring System.

Impact of hemodynamic monitoring on clinical outcomes.

Parental monitoring of children's media consumption: the long-term influences on body mass index in children.

The clinical impacts of apparent embolic event and the predictors of in-hospital mortality in patients with infective endocarditis.

Impact of feedback on three phases of performance monitoring.