COMPUTERS AND BIOMEDICAL RESEARCH 10, 413-421 (1977)

A Generalized System for Collaborative On-Line Data Collection*

MARCIA B. LEAVITT AND ROBERT C. LEINBACH

From the Harvard Office of Information Technology and Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114

Received October 8, 1976

A generalized file system has been developed to handle the data collection and retrieval needs of physicians working collaboratively in the treatment and study of ischemic heart disease. The system consists of a structure for files of data stored on a direct access device and a series of reentrant system subroutines which access these files. Each file, including the system files, is associated with a file description, which provides the storage and display characteristics of that file to the generalized routines. The routines also form the foundation for various application programs, including one that allows a user to construct a variable length clinical record from the data generated by the cardiac catheterization laboratory.

* Supported in part by U.S. Public Health Service Grant No. 1 P17 HL17667-01 from the National Heart and Lung Institute.

Copyright © 1977 by Academic Press, Inc. All rights of reproduction in any form reserved. Printed in Great Britain. ISSN 0010-4809

The MIRU (Myocardial Infarction Research Unit) computer group maintains a computer facility and provides technical assistance to physicians engaged in the collaborative study and treatment of ischemic heart disease. The development of a data collection system, general enough to handle a variety of applications while ensuring uniformity in the data collected from different sources and for different purposes, was an early project objective.

Warner has enumerated desirable characteristics of a computerized medical information system, some of which are applicable to an environment pursuing collaborative studies. Data entry with such a computerized system should be at least as efficient as with the corresponding manual methods, and quality checks should be performed to reduce spurious data. The data storage should be structured to allow for rapid retrieval, and yet be flexible enough to absorb additions and modifications (1). Furthermore, in the MIRU installation, the file system would have to share the facilities of a small computer with other real-time processes.

Aaron has outlined the advantages of utilizing formatted file information systems. A description of the permissible contents of a record is separated from the record itself, and kept in what Aaron calls a data description table. Additions and modifications to the data structure can be effected by altering this table rather than revising the programs that access it. Applications can be developed that employ different files in the same analysis by referencing the data through its data description table. Editing information can also be incorporated in the data description table and thereby facilitate some quality control at data entry time (2).

Aaron also discusses the merits of developing an information system as a series of modules that can be implemented separately. These can then be assembled into packages tailored to the needs of specific applications, and additional modules can be implemented as system expansion requires (2). If these routines are coded reentrantly, permitting simultaneous use by more than one process, the task of serving the file access needs of several applications in limited core space is eased considerably.

A variety of on-line data collection systems have been developed that employ a selection of these concepts. Greenes et al. have designed a high-level language (MUMPS) which manipulates a hierarchical data structure of variable length records (3), and Karpinski and Bleich have used MUMPS to implement an information storage and retrieval package that maintains small formatted files on a moderate size computer (4). Starmer et al. have developed a data acquisition and analysis system, in a limited core environment, that structures the files to optimize research applications (5). Davis et al., with the advantages of a much larger computer system, have addressed the problem of the computer-stored medical record, utilizing formatted, tree-structured records (6). Also using large computers, Wehl et al. (7), O’Kane and Hildebrandt (8), and Stamen and Wallace (9) have developed table-driven data acquisition and analysis systems. Both the Wehl and O’Kane systems emphasize a modular implementation that supports the development of various independent application programs.
Stamen and Wallace have implemented a data management and analysis tool (designed to be used by behavioral scientists, but with general applicability) incorporating this particularly useful feature: the system files have the same structure as the user files and are also associated with a data description. As a result, the system files can be manipulated in the same fashion as the user files.

Each of the systems described above presents a solution to the general problem of on-line collection and retrieval of medical data, and together they also illustrate the utility of several tried and proven implementation techniques. A selection of these techniques, with additional features suited to the needs and limitations of the MIRU environment, was employed in the development of a generalized file system designed for collaborative on-line data collection.

FEDS (Formatted file Editing and Display System) is a structure for files of data stored on a direct access device and a series of routines that access these files. Each file is associated with a file description, called a format, which provides the storage and display characteristics of that file to the generalized routines. These routines allow personnel to enter, examine, and alter the data stored in the files, guided by a form displayed on an on-line CRT. The routines also form the foundation for various special purpose application programs.


IMPLEMENTATION

FEDS has been implemented on a XEROX SIGMA 3 computer with a 48K memory (K = 1024 words x 16 bits/word) that also supports other real-time processes and batch mode applications in background. Its files are stored on a 50-megabyte TELEFILE disk, which it shares with other program and data files. The design of FEDS, however, is general enough to be transported to any small computer which supports a large direct access device.

[Figure 1 labels: FILE HEADER; Key 1, Key 2; First Rec. for Key 1, First Rec. for Key 2; Last Rec. for Key 1, Last Rec. for Key 2; Free; Chain End; Prev Rec.]

FIG. 1. The structure of a FEDS file.

The FEDS files are organized in the following fashion. The body of each file consists of a series of fixed length, fixed field records which can also be accessed randomly. Each record is associated with a key, which is in most cases the patient identification (ID). Records with the same key are chained together. At the head of each file is a sequential list of all the record keys in that file, and the beginning and end of the record chains for each key. The header also contains the location of a free record chain, from which new records are taken as needed, and to which records deleted from the other chains are added (see Fig. 1). All files, both system and user, are organized in this way.

There are three system files: the format file, the page description file, and the master file directory (see Fig. 2). The format file contains the formats of all the files in the system (including itself, as shown in Fig. 2). A format specifies the structure of the records in a file, and also describes the form as it appears on the CRT. Each record in the format file describes the storage and display characteristics of one field. The keys in the format file identify a record format. Therefore, instead of associating records for a patient, the chains in the format file associate the fields of one format, i.e., all the fields that make up one record in the file which the format describes.

[Figure 2 labels: MASTER FILE DIRECTORY; FORMAT FILE (Format for Master File, Format for Format File, Format for Page Desc. File, Format for Pressure File); PAGE DESC. FILE; PRESSURE FILE.]

FIG. 2. The relationship between system files and a user file.

Each field description specifies the type of data contained in the field, and FEDS supports a variety of data types including integer, real, and text. It also provides for two-dimensional displays, such as in the cardiac catheterization pressure form illustrated in Fig. 3, by allowing a complex data type, which references a range of other field descriptions, for fields which contain unlike subfields. (In the example shown, ART and AO are complex fields, each referencing S/D(A/V), MEAN, SM/DM, and SATN% as subfields.)

[Figure 3: CRT image of the PRESSURES form for patient 1234567 DOE, JOHN; condition REST, HR 60. Rows: ART, AO, LV/LVED, PC, PA, RV, RA. Columns: S/D(A/V), MEAN, SM/DM, SATN%. COMMENT: No angina during the study.]

FIG. 3. The PRESSURES form.

If a record is sufficiently large, it may not be possible to display all the fields on the CRT at once. Also, it is sometimes desirable to view a selection from several records, such as those describing the same patient at different times, in vertical columns side by side, so that trends over time are more readily visible. The page description records provide the necessary mapping, specifying the number of records, and the portion of the one or more records, that will contribute to one screen display (see Fig. 2). Each chain of page description records describes the one or more screen records needed to display the data in an associated file.

The master file directory provides accessing information for all the files in the system, including itself, and its keys are file IDs. Each record specifies a file's physical location, record length, and the IDs of the file's associated format and page description.

As data are collected for a particular file, experience often shows some fields to be of limited significance, and suggests that other fields, initially omitted, might be of value. In order to allow modifications to the record format and yet ensure that the data collected using out-of-date forms can still be utilized, FEDS maintains different versions, or generations, of a file. Each generation shares a file ID with other generations of the same file, but each is associated with its own record in the master file directory, and therefore its own format and page descriptions. Since the records in the master file directory which describe different generations of the same form have the same key, they are chained together, enabling easy movement back and forth between them.

A group of modules performs all the basic functions associated with files constructed in this manner. These include

1. preparing a file for reading or writing;
2. locating an ID in a directory;
3. adding or deleting an ID in a directory;
4. locating the next or the previous or a specified record in a record chain;
5. inserting a record into, or deleting a record from, a chain.
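The file organization and the module functions listed above can be pictured with a small in-memory model. The following is only a sketch under stated assumptions: the class name `FedsFile`, the method names, the `NIL` marker, and the use of Python lists in place of disk records are all hypothetical, and nothing here reproduces the actual SIGMA 3 implementation. It shows a header mapping each key to the first and last record of its chain, a free chain that supplies new records and receives deleted ones, and the locate, add, delete, and insert operations.

```python
NIL = -1  # hypothetical marker for "end of chain"


class FedsFile:
    """Toy in-memory stand-in for one FEDS file; all names are illustrative."""

    def __init__(self, n_slots):
        self.directory = {}               # header: key -> [first slot, last slot]
        self.data = [None] * n_slots      # record bodies
        self.prev = [NIL] * n_slots       # backward links within a chain
        # Initially every slot sits on the free chain: 0 -> 1 -> ... -> NIL.
        self.next = [i + 1 for i in range(n_slots)]
        if n_slots:
            self.next[-1] = NIL
        self.free = 0 if n_slots else NIL

    def locate_id(self, key):
        """Return [first, last] for a key, or None if the key is absent."""
        return self.directory.get(key)

    def add_id(self, key):
        """Add a key with an empty chain (no effect if already present)."""
        return self.directory.setdefault(key, [NIL, NIL])

    def chain(self, key):
        """Yield the slots of a key's chain from first to last."""
        entry = self.locate_id(key)
        slot = entry[0] if entry else NIL
        while slot != NIL:
            yield slot
            slot = self.next[slot]

    def insert_record(self, key, data):
        """Take a slot from the free chain and append it to the key's chain."""
        if self.free == NIL:
            raise RuntimeError("no free records left")
        slot, self.free = self.free, self.next[self.free]
        entry = self.add_id(key)
        last = entry[1]
        self.data[slot], self.prev[slot], self.next[slot] = data, last, NIL
        if last == NIL:
            entry[0] = slot
        else:
            self.next[last] = slot
        entry[1] = slot
        return slot

    def delete_record(self, key, slot):
        """Unlink a slot from the key's chain and return it to the free chain."""
        entry = self.directory[key]
        before, after = self.prev[slot], self.next[slot]
        if before == NIL:
            entry[0] = after
        else:
            self.next[before] = after
        if after == NIL:
            entry[1] = before
        else:
            self.prev[after] = before
        self.data[slot], self.prev[slot] = None, NIL
        self.next[slot], self.free = self.free, slot

    def delete_id(self, key):
        """Free every record of a key, then drop the key from the header."""
        for slot in list(self.chain(key)):
            self.delete_record(key, slot)
        self.directory.pop(key, None)
```

Because deleted slots go back to the head of the free chain, the next insertion reuses the most recently freed record, so the file never needs compaction in this model.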

These routines are coded reentrantly, and can be shared by the different processes accessing the file system at the same time. They also prevent any conflict which might arise when two or more processes are updating the same file simultaneously, and are constructed to facilitate orderly recovery should a system crash occur. Since the system and user files are identically structured, a module which operates on one set works equally well on the other. For example, the routine which opens a file uses the LOCATE-ID module to find the file ID in the header of the master file directory.

Aaron mentions, as a limitation to modularizing a system, the difficulty of establishing linkages between modules which serve for all calls (2). To overcome these difficulties, FEDS uses a 15-word file-profile table which contains most of the information about a file that is needed as arguments to the modules. This table reflects the status of a file: the location of the buffer to be used for I/O, the key currently being manipulated, and the location of the current record in the chain for that key. The use of this table also facilitates the nesting of reentrant routines, since most of the actions that are performed on the file are recorded in the table. Any errors detected by a module during its operation are placed in this table and passed back up through the nest of subroutine calls to the original caller, which decides what action to take.

Once the appropriate user record has been located in or added to the user file, and the associated format and page description records have been fetched from the system files, control is passed to a set of display routines. Working from an array of field descriptions, these routines display an empty form (just the labels) for a new record, or a form and its data for already existing records. The person editing the record controls the movement of a cursor on the CRT, and can enter or update any of the data fields.
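The role of the file-profile table, and of the generation chains in the master file directory, can be illustrated with a short sketch. Everything named here is an assumption made for illustration: the text gives the table's size (15 words) and some of its contents but not its layout, so the fields of `FileProfile`, the `DirectoryEntry` record, the error strings, and the convention that the last entry in a chain is the newest generation are hypothetical. The point shown is that each module receives the profile as its argument, records any error in it, and returns it to its caller rather than acting on the error itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirectoryEntry:
    """One master-file-directory record, i.e. one generation of a file."""
    location: int          # hypothetical disk address
    record_length: int
    format_id: str
    page_desc_id: str


@dataclass
class FileProfile:
    """Status of one open file (illustrative layout, not the real 15 words)."""
    file_id: str
    buffer: bytearray                     # buffer to be used for I/O
    current_key: Optional[str] = None     # key currently being manipulated
    current_record: int = -1              # position in that key's chain
    entry: Optional[DirectoryEntry] = None
    error: Optional[str] = None           # set by whichever module fails


def locate_id(directory, profile):
    """Low-level module: note a missing ID in the profile instead of raising."""
    chain = directory.get(profile.file_id)
    if chain is None:
        profile.error = "ID not found"
    return chain


def open_file(directory, profile, generation=-1):
    """Prepare a file for access; by assumption -1 selects the newest generation."""
    chain = locate_id(directory, profile)
    if profile.error is not None:
        return profile                    # pass the error up to the caller
    if not -len(chain) <= generation < len(chain):
        profile.error = "no such generation"
        return profile
    profile.entry = chain[generation]
    return profile
```

Since all per-file state lives in the profile passed by the caller, the functions themselves hold no state between calls, which is the property that lets several processes share one copy of the code.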
Under the direction of the field description for each field, the display routines convert the data to a form suitable for internal storage, and place them in their assigned position in the record. Data are checked for validity: each entry must be appropriate to the data type (for example, no letters are allowed where a numeric is expected) and, if numeric, must lie within the bounds specified in the corresponding field description. The user is informed immediately of any obviously spurious data. The display routines have also been designed to accommodate devices other than the CRT for data entry and display. The same forms that appear on the screen can be constructed on a printer page.
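The conversion and validity checks just described are driven entirely by the field description, which a brief sketch can make concrete. The names `FieldDescription` and `convert`, the message wording, and the heart-rate bounds used in the usage note are assumptions introduced for illustration; the text states only that the data type and, for numerics, the bounds come from the field description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldDescription:
    """Illustrative subset of what a format record might hold for one field."""
    label: str
    kind: str                       # "integer", "real", or "text"
    low: Optional[float] = None     # inclusive bounds, numerics only
    high: Optional[float] = None


def convert(fd, raw):
    """Return (value, None) on success or (None, message) on failure."""
    raw = raw.strip()
    if fd.kind == "text":
        return raw, None
    try:
        value = int(raw) if fd.kind == "integer" else float(raw)
    except ValueError:
        return None, f"{fd.label}: expected {fd.kind}, got {raw!r}"
    too_low = fd.low is not None and value < fd.low
    too_high = fd.high is not None and value > fd.high
    if too_low or too_high:
        return None, f"{fd.label}: {value} outside [{fd.low}, {fd.high}]"
    return value, None
```

For example, with a hypothetical description `FieldDescription("HR", "integer", 20, 250)`, the entry `"60"` is accepted, while `"6O"` (letter O) fails the type check and `"600"` fails the bounds check. Changing the bounds means editing the field description, not this routine.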

SOME FEDS APPLICATIONS

CATHLAB, the first application built on a FEDS foundation, handles data generated by the cardiac catheterization laboratory and allows the user to construct a variable length clinical record, called a study, which presents the data for a single catheterization. The CATHLAB program presents the user with a list of possible forms, a list of conditions associated with some of the forms, and a set of commands that operate on the study (Fig. 4). The user creates a new study with the appropriate command and selects the various forms he wishes to include. He is given an empty form after each selection and proceeds to fill in the data. When the necessary data have been entered, he can issue the CALCULATE command, thereby deriving the dependent variables, and creating an additional form containing the calculated data. Existing forms can be reviewed, and modifications to the data made. When the study has been completed, the PRINT command initiates the creation of a hard copy equivalent to the image on the screen, and locks the study, preventing any further updates.

[Figure 4: CRT image of the command page for patient 1234567 DOE, JOHN, with a COMMAND= prompt and three lists.
FORMS: PROCEDURE, COMPLICATIONS, TECHNIQUE, PRESS SHORT, PRESS LONG, FLOWS, VALVES, CALCULATIONS, RESULTS, SPECIAL STUDS, P WAVE DESC.
CONDITIONS: REST, EXERCISE, REST AFTER EXERCISE, REST AFTER COR ANGIO, REST AFTER LV GRAM, REST AFTER COR + LVG, ATRIAL PACING, VENTRICULAR PACING, REST AFTER IABP, REST IMMEDLY P PACE.
COMMANDS: NEW STUDY (NS), LATEST STUDY (LS), DELETE FORM (DF), PREVIOUS STUDY (PS), DELETE STUDY (DS), CALCULATE (CA), PRINT (PR), EXIT (EX).]

FIG. 4. The CATHLAB command page.

Each of the forms is associated with a FEDS file. Another FEDS file contains one record per patient study and specifies the range of dates during which the study was constructed, the forms that have been included in the study, and a flag indicating whether the study is locked. An additional FEDS file lists the IDs of the patients whose studies will be printed at the end of the day.

Starmer has pointed out that the structure of a file is the most critical feature of a retrieval system. A file organization which facilitates the multiple queries typical of a research application might impose too rigid a structure on the varying amounts of information associated with clinical records (5). By allowing the selection of forms to vary for each patient, CATHLAB has achieved some of the advantages of varying length, tree-structured records, while retaining the fixed length, fixed field records most conducive to efficient retrieval.
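One way to see how a study can vary in content while its index record stays fixed in length is to represent the included forms as a bit mask over a fixed list of form names. This is an assumption made purely for illustration: the text says the study record holds a date range, the forms included, and a lock flag, but does not say how the forms are encoded. The class name `StudyRecord`, its methods, and the dates in the usage note are hypothetical; the form names are a subset of those shown in Fig. 4.

```python
from dataclasses import dataclass
from datetime import date

# A subset of the form names from Fig. 4; the bit position is the list index.
FORMS = ["PROCEDURE", "COMPLICATIONS", "TECHNIQUE", "PRESS SHORT",
         "PRESS LONG", "FLOWS", "VALVES", "CALCULATIONS", "RESULTS"]


@dataclass
class StudyRecord:
    """Fixed-size index record for one patient study (illustrative only)."""
    patient_id: str
    first_date: date
    last_date: date
    form_mask: int = 0      # hypothetical encoding: one bit per form
    locked: bool = False

    def include(self, form, today):
        """Add a form to the study, unless the study has been locked."""
        if self.locked:
            raise PermissionError("study is locked")
        self.form_mask |= 1 << FORMS.index(form)
        self.last_date = max(self.last_date, today)

    def forms(self):
        """List the included forms in their fixed display order."""
        return [name for i, name in enumerate(FORMS)
                if (self.form_mask >> i) & 1]

    def print_and_lock(self):
        """Stand-in for PRINT: report the forms, then refuse further updates."""
        self.locked = True
        return self.forms()
```

A study opened on a hypothetical date with `StudyRecord("1234567", date(1976, 10, 8), date(1976, 10, 8))` can take any selection of forms, yet the record occupies the same space whether one form or all nine are included.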


A general FEDS editor, which allows a user to edit any FEDS file, has also been implemented. The user accesses the data from an on-line terminal by specifying a file ID and key. He can then step through the one or more records that share the specified key, and then through the one or more pages which display the record fields. He can search for a particular record by specifying a field label and the value contained in that field of the desired record, and for a particular page by specifying its ordinal number. Existing records can be updated or deleted, and new records can be added to the file. Since the system files are just like any other FEDS file, they too can be edited on line. The display characteristics of existing files can be altered, and new formats and page descriptions can be created at the terminal.

Two applications are utilizing the FEDS editor. One handles patient information collected during the early hours of myocardial infarction in order to determine its prognostic importance. The other application uses the FEDS editor to record data generated when a cardiac pacemaker is implanted. Identifying data for each pacemaker patient and for each type of pacemaker device are also maintained. These files are periodically scanned to determine if a patient should be recalled for pacemaker replacement.

RESULTS

The FEDS modules require less than 2.5K of 16-bit words, and the display routines occupy another 2K. Since all the modules are coded reentrantly, and can thereby be shared by several processes accessing the file system, they further promote efficient use of the available core.

The modular nature of the system has facilitated the task of tailoring a major application to the needs of the medical personnel that use it, and also the development of a FEDS editor that handles more general applications. Any data collected in the FEDS files can be readily utilized by any application program which uses the FEDS modules for retrieval.

Since all the modules use the formats to reference the data files, the task of modifying the data collection forms has been considerably eased. Over the year that FEDS has been in use, several dozen form changes have been suggested, and none has necessitated reprogramming efforts. Because FEDS maintains the previous generations of a file, it is still possible to retrieve data collected with the now out-of-date forms. The formats have also facilitated computerized checking for spurious data. The user is informed if he enters grossly erroneous data and he can correct it immediately.

The use of the CRT screen to display an entire form at once closely approximates traditional manual methods. While the task of entering the data on a computerized form is comparable to the task of filling out a paper form by hand, considerable quality control, exercised at the point of data collection, greatly eases the maintenance of accurate records. Retrieving the data collected for a particular patient for review is far easier using the computer system. The FEDS formats can simulate a variety of paper forms on the CRT, since they support narrative data, table displays, multipaged forms, and multirecord displays.

CATHLAB, the first major application program, utilizes FEDS to collect data generated by two cardiac catheterization laboratories. A variety of forms are filled out by the medical personnel for four to eight patients a day. It takes 1 to 6 seconds, depending on the number of fields and other activity on the system, to fetch and display a specified form. Part of the hard copy printed at the end of the day is structured for use in the patient record. A continually growing data base, now encompassing 800 patients, is available to various research protocols.

The FEDS system was designed to operate as a research tool as well as a clinical aid. Although the files are structured to allow efficient retrieval by patient ID, the fixed field, fixed length records that characterize the FEDS files are suitable to the multiple probes typical of research applications. Several specific file search programs have been written, and a general retrieval package, which creates file subsets in background under the direction of Boolean expressions, has been developed.

ACKNOWLEDGMENTS

The authors would like to thank Simon V. Rosenthal for his significant implementation contributions and John B. Newell, Project Leader, for his encouragement and valuable suggestions throughout the development of FEDS.

REFERENCES

1. WARNER, H. R., AND MORGAN, J. D. High-density medical data management by computer. Comput. Biomed. Res. 3, 464 (1970).
2. AARON, J. D. Information systems in perspective. Comput. Surv. 1, 213 (1969).
3. GREENES, R. A., PAPPALARDO, A. N., MARBLE, C. W., AND BARNETT, G. O. Design and implementation of a clinical data management system. Comput. Biomed. Res. 2, 469 (1969).
4. KARPINSKI, R. H. S., AND BLEICH, H. L. Misar: A miniature information storage and retrieval system. Comput. Biomed. Res. 4, 655 (1971).
5. STARMER, C. F., ROSATI, R. A., AND SIMON, S. B. Interactive acquisition and analysis of discrete data. Comput. Biomed. Res. 5, 505 (1972).
6. DAVIS, L. S., COLLEN, M. F., RUBIN, L., AND VAN BRUNT, E. E. Computer-stored medical record. Comput. Biomed. Res. 1, 452 (1968).
7. WEHL, S., FRIES, J., WIEDERHOLD, G., AND GERMANO, F. A modular self-describing clinical databank system. Comput. Biomed. Res. 8, 279 (1975).
8. O’KANE, K. C., AND HILDEBRANDT, R. J. “An Integrated Health Care Information Processing and Retrieval System.” Proceedings of the AFIPS National Computer Conference, Vol. 93. 1974.
9. STAMEN, J. P., AND WALLACE, R. M. “Janus: A Data Management and Analysis System for the Behavioral Sciences.” Proc. ACM Annual Conference, Vol. 273. 1973.
