Evaluation of

an

SQL Model of the HELP Patient Database

Stanley M. Huff, M.D., Cheryl L. Berthelsen, M.S., and T. Allan Pryor, Ph.D. Department of Medical Informatics, LDS Hospital/University of Utah Salt Lake City, Utah 84143 Andrew S. Dudley, M.B.A. Intermountain Health Care, Medical Information Systems Salt Lake City, Utah 84111 Abstract

is possible that the use of generalized database tools for patient data will result in increased costs, decreased functionality, and decreased performance because of the unique needs and characteristics of the clinical environment. As a consequence of the above issues we implemented and evaluated the performance characteristics of a patient database that was constructed using Tandem's SQL (Structured Query Language) database tools. The design of the database and the results of our evaluation are described below.

We tested a new model of the HELP patient database that makes use of relational tables to store patient data and provides access to data using SQL (Structured Query Language). The SQL database required more storage space and had many more physical records than the HELP database, but it wasfaster and more efficient in storing data than the standard HELP utilities. The HELP utilities used disk space more efficiently and were faster than the SQL tools when retrieving data for typical clinical reports. However, the SQL model provides networking capabilities, general report writing tools, detailed user documentation, and an abilityfor creating secondary indexes that offset its poorer performance.

Methods In the first part of this section we briefly describe the current HELP system architecture after which we discuss the design of the SQL tables used in our studies. Finally, we outline two separate experiments that compared the SQL database to the current HELP database.

Introduction The HELP [1,2,3] system is a patient care centered hospital information system that is based on knowledge driven tools for aiding clinicians in the care of their patients. A central part of the HELP system is the unique hierarchical database that is used to represent a patient's clinical information as coded data. The current database architecture has proven to be flexible and extensible in meeting the needs of clinical researchers while at the same time supporting the data entry and reporting demands of patient care providers. However, recent advances in commercially available relational database management systems (DBMS) [4] have prompted us to consider the use of relational tools in the HELP system. The potential advantages of using commercially available software as opposed to internally developed programs are many: decreased cost of software maintenance, standard ad hoc query tools, increased performance, enhanced documentation, decreased application development time, hardware independence, and simplified networking. But in a domain like clinical medicine it is possible that these potential advantages will not be realized due to the special attributes of clinical data. Specifically, patient information varies more widely in content, structure, and quantity than is true for most other types of computer stored data like that used in banking, sales, or travel scheduling. It 0195-4210/91/$5.00 C) 1992 AMIA, Inc.

HELP Database Architecture

The current HELP system database architecture is hierarchical and optimized for patient centered data entry and retrieval [3]. In practice this means that patient data blocks are keyed by a patient identifying number followed by a code (called the "data class"). The data class indicates the type of data that is contained in the block. Typical data classes are hematology, chemistry, pharmacy, pulmonary, and medical history. The header portion of a physical disk record also contains data fields that indicate the earliest and latest times that data was added to the block, and a table that shows the subtypes of information (called the "field code") present in the patient's data. For the hematology data class the field code table would indicate whether the data block contained complete blood count (CBC) data, coagulation studies, and/or urinalysis results. To minimize access time to a given data class of data, many logical data records (called "strings") are stored on a single physical block, with each data block being 2034 bytes in length. This means that many CBC's can be retrieved in a single disk read. The data strings are variable in length to accommodate growth in the types and quantities of data that can be stored, and the strings are organized in reverse chronological order so that the most recent data can

386

Patient Number

Data Class

Early Time

I

I

Field Code Index

Late Time

Ip

Figure 1. A HELP patient data record showing multiple packed data strings in a single physical record. The patient number and data class are the key to the record.

be retrieved the fastest. An example of a data block is shown in figure 1.

SQL Database Architecture In designing the SQL tables used in our studies we assumed that patient oriented access was paramount, that data is commonly retrieved in related groups, and that the most recent data is of the most interest. A related constraint was that the SQL implementation had to accommodate existing decision support tools to provide upward compatibility with existing clinical and decision support applications. The record structure used for the SQL patient database is shown in figure 2. The first column in the table is the patient identifier, followed by the data class of the data contained in the record. The data classes in these records are the same data classes used in the standard HELP database. The "type" field indicates the structure of the data that is contained in the "packed data" or packed string portion of the record. The field code represents the subclass of data that is contained in the record and corresponds to the same categories as contained in the field code table of the standard HELP

Patient Number i

Data Type Class

I23456783

013 1 4

database blocks. The "string time" represents the time that a record of data was obtained. The packed data portion of the record contains a variable length data string that is identical in form to the data strings contained in the data blocks of the standard HELP datab. It is apparent from the presence of the variable length packed data string present in the SQL table that this design is not a normalized relational representation. Packed strings were retained in the design to support hierarchical inferencing, which is a key feature of the expert system capabilities of HELP. It should also be clear that changing from the standard HELP structure where one physical block contains many packed strings to the SQL structure where each row contains a single string is a major architectural change. Thus, the studies described below should be examined with the knowledge that the SQL database will need to perform many more logical and physical reads to access the same amount of data as can be obtained by a single logical read in the standard HELP database. This discrepancy in the number of logical reads required to access a similar amount of data is dictated by the granularity difference in a block versus record oriented database architecture.

String Time

Field Code

I

iT

0/91 11:19 1-- packed string --l

U

Figure 2. A row of data in the SQL patient data table. The patient number, data class, type, and field code are the key to the record.

387

Experiment 1 We designed the first experiment to compare the data storage performance of the two database schemes. A test HELP patient data file and an SQL table were created on the same physical disk, but initially no patient data was placed in either file. We created two different data storage programs, one based on HELP tools and the other based on embedded SQL routines. Both programs were designed to read "script" files that contained a list of specifications of data to be stored in the patient database. Each entry in the script file contained a patient number, the data class, type, field code, and length of the data to be stored. Each process read an entry in the script file and then stored the approprate data in its respective database. To simulate a loaded clinical environment, we created a control process that progressively started new storage processes until system performance degraded to a predefined level. The control process monitored each of the storage processes and when the average time to store a string in the database exceeded 10 seconds, the experiment was terminated. We evaluated each of the database schemes separately on an otherwise quiescent CPU.

storage processes before the average time to store a single record exceeded 10 seconds. The standard HELP database was able to support 34. The time required to store a record in the database was less for the SQL database than it was for the HELP database. This is graphically displayed in figure 3 where the time to store 50 records in the database has been plotted as a function of the number of active storage processes. The SQL process also used less CPU time on average, 28 milliseconds for SQL versus 57 milliseconds for HELP.

500 400 w

300

q

I-O

GD 200 ._0

100 0

Experiment 2

0

We designed the second experiment to compare the retrieval performance of the two database schemes in circumstances as similar to the clinical environment as possible. Two databases were again created on the same physical disk. The test HELP file was loaded by copying all data for 30,844 patients from the HELP patient database. The SQL table was loaded by a program that read each physical block in the test HELP file, separated the packed strings into individual records, and stored each packed string as a row in the SQL table using the format shown in figure 2. To simulate a multi-user clinical situation we again used a control process that created 10 simultaneously working report processes. Each report process was directed by instructions in a script file, but in this case the entries in the script file contained a patient number and the name of a report frame to execute. Thus, the data retrieval processes were executing standard user reports that required not only data retrieval but data formatting and output. We evaluated each database scheme by running the ten reporting processes for 10 minutes while we measured frame execution time and the overall number of frames processed. Again, measurements of CPU usage, disk traffic, and cache memory usage were monitored during the trials. Results

Experiment 1

The SQL database was able to support 46 concurrent

10 20 30 40 Number of Processes

50

Figure 3. The average length of time to store 50 records as an increasing number of storing processes are created. Experiment 2 As part of the experiment, we monitored the sizes of the two databases during the loading process. The HELP database for 30,844 patients was 156 megabytes. When this data was loaded into the SQL database the table was found to be 254 megabytes in size, a 63% increase. The HELP database contained 2,730,929 data strings (an average of 88.5 strings per patient) which became individual records in the SQL database after loading. The number of strings per patient represents an average for the entire database that contains a large number of outpatients as well as inpatients. The performance of the two database models for retrieval and display of data was correlated with the complexity of the frames used. Frames that contained a single build (a single query against the database) were much faster than frames that

required multiple builds. The Complete Blood Count (CBC) frame is typical of many patient care reports used in the HELP system. All the data for a row in a report is found in a single packed data string, and the report is generated in reverse chronological order (i.e., most recent data first). The results of the study for the CBC frame are shown in figure 4. The HELP system is more efficient at generating this

388

report regardless of the number of strings found in the database, but it shows the greatest efficiency when more than twenty strings are required. The evaluation of the two systems for frames with multiple builds (where two or more data requests are made against the database) was difficult. The measurements were less uniform and the performance was not tightly coupled to the number of strings retrieved for the report. The vital signs report (containing heart rate, blood pressure, temperature, and respiration data) is a typical multiple build frame. It gathers data from at least three different types of strings in the database. The results of the study for the vital signs frame are shown in figure 5. 300

u

@ 200

of data in the report.

There is poor correlation between the number of strings of data appearing on the report and the CPU time required to generate the report. While both databases perform better for single build frames than for multiple build frames, HELP is much less effected. The HELP database is substantially faster for all string counts. The SQL trial also shows sharp irregularities in CPU time for reports with very little difference in string count. This may be partly explained by the fact that SQL was found to compete with the application program for CPU memory and its report processes were swapped out of main memory more frequently than were the HELP processes. Finally, the total number of frames that were executed in the 10 minute trial period is shown in figure 6. The HELP database was able to produce 1630 single build reports while the SQL system produced 1355, a decrease of 17% in total throughput versus HELP. For multiple build reports both systems had slower throughput, but the SQL architecture was effected more, producing only 68% of the reports produced by the HELP system.

100

2000

Q

0

.

n

0

n

0

10 20 30 Number of Strings

e

40

Y

Figure 4. The average length of CPU time required to execute a Complete Blood Count (CBC) frame based on the number of strings of data in the report.

. U

1000-

o

SQL HELP . oo

1200

W

~

>

Go

en

0vf -

1000 E

u}

Single Build Multiple Build Type of Frame

800

Figure 6. Total frames executed in a 10 minute interval with ten report processes running.

600

* 400

Discussion D 200

Q5

The results of our studies indicate that the SQL structure we used was more efficient for storage of data than the standard HELP database tools. This is not unexpected since the HELP architecture requires a read, update, and save of a patient data block while SQL is logically inserting a single row in a table. However, previous studies [5] have shown that in the HELP system data is typically retrieved 5

0

0

10 20 Number of Strings

Figure 5. The average length of CPU time required to execute a Vital Signs frame based on the number of strings

389

to 20 times for each time it is stored. This means that it is very difficult to improve system performance if only storage

speed is increased. In terms of retrieval speed the HELP system's block oriented structure out performed SQL, in speed, throughput, and resource use. However, the difference is actually less than might have been expected when the difference in the number of disk reads is considered. Because HELP does block oriented reads it does 1 disk request for every 4 to 7 disk requests from SQL, and only 1 disk read for every 1.5 disk reads by SQL [6]. Apparently Tandem's SQL makes up part of the difference in coding efficiency or in heavier use of main memory. This increased memory use, however, may also account for the poor performance of the SQL model in the multiple build frames because of increased memory page faults. Another fact was that the SQL database was roughly 50% larger to contain the same amount of data. This increase is partially explained by the redundancy inherent in repeating information like the patient number, data class, and field code in each data record in the SQL database, while it occurs only once in the header portion of the HELP data blocks. A further explanation is that Tandem's implementation of the SQL tables involves automatically maintained record pointers based on the primary key of the file. This is also true of the Enscribe files used by HELP, but since the number of records in the SQL database is 5 to 7 times greater than the number in the HELP database, this internal index requires proportionately more space. The declining cost of disk and optical storage make this a relatively unimportant issue. Having discussed the down side of SQL in our experiments, lets now look on the positive side. First, the SQL database was compelled to return data as if it was the HELP database for reasons of compatibility. While this was an appropriate test of what SQL's immediate impact would be in the existing environment, in the long haul HELP system architecture would no doubt evolve to take advantage of SQL's unique features. Specifically, frames could be modified so that data is retrieved in ways that are most efficient for SQL. Another possible enhancement would be the incorporation of secondary keys on data strings that would further enhance the flexibility and speed of the database. Beyond actual performance issues, there are a number of possible advantages to SQL use. The most notable is that SQL comes with vendor supported query tools, a report writer, and a host of networking software that integrates LAN technology with the mainframe. While the SQL structure that we tested takes advantage of one level of normalization (i.e., one data string per record), to enjoy all the benefits of SQL will require a truly relational implementation. This is not as trivial as it may seem, since there are significant problems in structuring relational tables

390

to accommodate some types of data in the patient record, particularly medical history, physical exam, procedure notes, and radiologic findings. An area of concern is also the difficulty of temporal queries in SQL databases. In the HELP tools there are statements that make it easy to find the last white blood count before a patient's temperature spike, but this type of query in SQL is cumbersome. Other aspects of the interface between HELP's knowledgebase and the patient database are equally awkward. Nevertheless, we are continuing to research ways that a fully relational implementation of the HELP patient database could be achieved. Finally, even though our SQL implementation was found to be slower than the HELP database, it is likely that the HELP system will involve some use of SQL database tools in the near future. The advantages of a hardware independent architecture that can be integrated with intelligent LAN workstations probably outweighs the performance impacts. However, this evolution will require care that key features of HELP's knowledge driven architecture are not trampled in a rush to jump on the standardization band wagon.

Acknowledgment We are grateful to Virginia Mary Smith for her work in the first implementations of the SQL database on the Tandem system at LDS Hospital.

References [1] Warner HR. Computer assisted medical decisionmaking. Academic Press Inc., New York, New York, 1979. [2] Pryor TA, Gardner RM, Clayton PD, Warner HR. The HELP system. J Med Systems 1983; 7:87-102. [3] Pryor TA. The HELP medical record system. M.D. Computing 1988; 5:22-33. [4] Pong, M. An Overview of NonStop SQL Release 2. Tandem Systems Review, 1990; 6:4-11. [5] AL-Hashimi, MSA. A preformatted report generating system: Optimized real-time clinical information retrieval for the HELP system. Master's thesis 1989; Department of Medical Informatics, University of Utah. [6] Smith, VM. Time study of the Tandem Structured Query Language. Master's thesis 1989; Department of Medical Infonnatics, University of Utah.

Evaluation of an SQL model of the HELP patient database.

We tested a new model of the HELP patient database that makes use of relational tables to store patient data and provides access to data using SQL (St...
820KB Sizes 0 Downloads 0 Views