Interactive Query Workstation: standardizing access to computer-based medical resources.

Computer Methods and Programs in Biomedicine, 35 (1991) 293-299


© 1991 Elsevier Science Publishers B.V. All rights reserved 0169-2607/91/$03.50

COMMET 01207

Christopher Cimino, G. Octo Barnett, Laurie Hassan, Dyan Ryan Blewett and Judith L. Piggins

Laboratory of Computer Science - Massachusetts General Hospital, Boston, MA, U.S.A.

Methods of using multiple computer-based medical resources efficiently have previously required either that the user manage the choice of resource and terms, or specialized programming. Standardized descriptions of what resources can do and how they may be accessed would allow the creation of an interface for multiple resources. This interface would assist a user in formulating queries, accessing the resources and managing the results. This paper describes a working prototype, the Interactive Query Workstation (IQW). The IQW allows users to query multiple resources: a medical knowledge base (DXplain), a clinical database (COSTAR/MQL), a bibliographic database (MEDLINE), a cancer database (PDQ), and a drug interaction database (PDR). Descriptions of each resource were developed to allow IQW to access these resources. The descriptions are composed of information on how data are sent to and received from a resource, information on the types of query to which a resource can respond, and information on what types of information are needed to execute a query. These components form the basis of a standard description of resources.

Keywords: Unified medical language system; Database access; User interface

1. Introduction

It is often stated that the amount of medical information available to the physician is increasing at an almost exponential rate [1]. The amount of published material pertaining to a particular specialty is almost unmanageable, not only because of its volume but also because it may appear in a wide range of resources unknown to the specialist. In order to manage this information, a number of computer-based medical resources are being developed. The need to learn to access these resources adds another information burden to the physician's load.

Correspondence: C. Cimino, Laboratory of Computer Science, 50 Staniford Street - Room 530, Boston, MA 02114, U.S.A.

As more computer-based medical resources become available, it becomes more difficult for occasional users to be aware of all the options available. For example, a user with a specific question may not be aware what resources are available through his institution. He may not be aware which resource is most appropriate to answer his question. It is likely he will not know how to formulate his question to utilize the resource in the most efficient way. The potential exists for questions raised about information acquired from one resource to be answered by another resource. This potential is lost unless the user is familiar with all the resources available. A computer-based environment would recover this potential if it could provide information about what resources were available, access those resources, and help the user formulate appropriate queries.


2. Background

Previous research in this field has concentrated on interfaces that remove the need for the user to know about a particular aspect of the resource being used. No attempt is made to inform or guide the user in the development of a query.

The CONIT program was developed at MIT in 1981 for formulating queries for multiple databases [2]. The user did not need to have any knowledge about the syntax of any of the databases being searched. Fairly sophisticated queries could be formed and a wide variety of databases searched. The primary disadvantage of CONIT was that, in order to capture the power of some of the database queries, the CONIT query language was very complex.

The CANSEARCH program was developed at Huddersfield (U.K.) in 1986, in part to address this problem. It is an attempt to incorporate domain knowledge to assist a user in formulating a query [3]. The program uses a rule-based system to develop a query about cancer topics, which is then executed on the MEDLINE database. In comparison to CONIT, this system is very easy to use. Its disadvantages are that extending it to other domains requires incorporating new domain knowledge, and that a query is refined through multiple menus. Selecting from a large number of menus can become tedious, especially if the domain is expanded.

To create an expandable system it is necessary for the program to rely on information from outside sources. One approach would be to modify each resource to provide a hard-wired link to a common user interface. This would require a developer to have access to the program code for every resource that is to be linked. It has the additional disadvantage that linking new resources would require a great deal of work. Many proprietary resources could not be linked at all.

An alternative approach would depend on a standardized description for each resource. A database of these resource descriptions could provide textual information for the user and machine-readable information for a computer program to access a resource and execute well-formed queries. Two additional components that could be useful are a description of the vocabularies used by each resource and a description of the types of information dealt with by each resource. The resource descriptions, the vocabulary descriptions and the information-type descriptions correspond to the Unified Medical Language System's Information Sources Map, Metathesaurus and Semantic Network, respectively.

2.1. Unified medical language system (UMLS)

The National Library of Medicine's UMLS project is "designed to facilitate the retrieval and integration of information from many machine-readable information sources" [4]. In order to guide users to appropriate information resources, an Information Sources Map will be developed which "will contain information about the scope, location, vocabulary, syntax rules, and access conditions of publicly available machine-readable biomedical information resources" [4]. The current emphasis is on building a machine-readable knowledge source that will encompass terminology from a variety of controlled vocabularies. This knowledge source is called the UMLS Metathesaurus (META). Each term in META will have information about the vocabularies in which the term appears, as well as semantic information about the term [5]. The UMLS Semantic Network is a description of relationships that can occur between the various semantic types used in META. "The purpose of the semantic types and the associated semantic network is to provide a consistent categorization of all concepts represented in the Metathesaurus and to elucidate the permissible relationships between and among these concepts" [6].

2.2. Direct programmed links

As a first approach to linking multiple resources, direct hard-wired links were developed for two applications, a medical knowledge base (DXplain) and a medical literature search program (RAMM) [7]. These were chosen because the source code for both was available and the linking of these two would give a non-trivial result not available from other computer-based resources. The user first entered a case into DXplain and obtained a differential diagnosis. A disease in the differential was then selected as the basis of a literature search.

Several disadvantages to such hand-coded, specific links were immediately obvious. Each new link between resources required extensive programming. Any change or update in the resource programs would require revision of the links. The links could be created or modified only by someone who had an understanding of the data structures and functions of the resources and access to the source code.

This exercise provided information about generalizations that could be made in describing computer-based resources. There are two distinct processes involved in both DXplain and RAMM. Terms appropriate to the particular resource were selected. Then the terms were processed by the application to produce a result. Both vocabularies include entry terms (terms recognized as being equivalent to a term in the controlled vocabulary). Both vocabularies use modifiers. Both vocabularies use a tree structure to link terms that are more or less specific.
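The two vocabulary features described above, entry terms and a tree of more and less specific terms, can be illustrated with a minimal Python sketch. It is not the DXplain or RAMM data structure: the `Vocabulary` class, its field names, and the sample terms are assumptions made only for illustration, and modifiers are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Vocabulary:
    """Toy controlled vocabulary; all names and terms are illustrative."""
    # entry term (lowercase) -> equivalent controlled term
    entry_terms: dict = field(default_factory=dict)
    # controlled term -> its less specific (parent) term, or None at the root
    parent: dict = field(default_factory=dict)

    def resolve(self, text):
        """Map free text to a controlled term, or None if unrecognized."""
        key = text.strip().lower()
        if key in self.entry_terms:
            return self.entry_terms[key]
        for term in self.parent:
            if term.lower() == key:
                return term
        return None

    def broader(self, term):
        """Return the chain of progressively less specific terms."""
        chain = []
        while self.parent.get(term) is not None:
            term = self.parent[term]
            chain.append(term)
        return chain


# Hypothetical sample content, not taken from any real vocabulary.
vocab = Vocabulary(
    entry_terms={"bowel cancer": "Colon Cancer"},
    parent={"Colon Cancer": "Neoplasms", "Neoplasms": None},
)
```

Term selection then reduces to `vocab.resolve(...)`, and the tree lets a search be widened with `vocab.broader(...)` when the specific term returns too little.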

2.3. Partial programmed links

The next exercise consisted of developing links based on what was learned about term selection and term processing. For this exercise, the links for DXplain and RAMM were rewritten so that the interactions could be controlled from a uniform user interface. A link to a clinical database (COSTAR/MQL) was also developed. A subset of 100 patient records, with identification information removed, was used.

Each link still required specialized programming for each resource but provided a uniform interaction with the user interface. The interface program could send messages to a link that would provide information about the resource. This information could then be used to access the resource. All the links were required to provide responses to four system-defined messages. These provided information about the interactions the link was capable of, what data were required to perform them, and what results would be returned. The links were required to respond to at least one other predefined message to provide access to the application's controlled vocabulary. For example, the BEST message provided the best match of a string of text to the resource's vocabulary.

This exercise demonstrated some interesting capabilities. The system only needed information about how to send messages to a resource link. The system could then automatically generate a user interface based on information acquired from sending the definition messages described above. This isolation of the user interface from the resources allowed modifications to be made in a resource and the potential for new resources to be added without affecting the user interface.

In this revised system, the user was able to select which resources were used. The system was able to provide the user limited descriptions of the resources to aid selection. The user only needed to be familiar with the IQW user interface in order to use multiple resources. However, aside from this static information, no guidance was provided to the user concerning which of the available queries might be appropriate.

Queries could be chained; information derived from a patient record could be used to acquire a differential diagnosis, which could then be used as the basis of a literature search. While chaining of queries increased the potential usefulness of multiple resources, two new issues arose. The system did not display which terms from one controlled vocabulary might be used in a query of a different resource's controlled vocabulary. In addition, all the information returned from a query was available for new queries, resulting in sometimes overwhelming quantities of information from which the user could select.

The system still had all the disadvantages of requiring source code for creating links. The user still needed to understand the purpose of all the available resources, although he no longer needed to know how to formulate queries for each resource. The user could easily be overwhelmed by the number of possible queries and the amount of accumulated results. The next step in development was to create an interface that provides more active guidance in selecting queries, and organizes the results of queries.
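The message-based isolation described in this section can be sketched in Python. The sketch is hypothetical: the text names only the BEST message, so the four definition-message names used here (DESCRIBE, INTERACTIONS, INPUTS, OUTPUTS), the `ResourceLink` class, and the word-overlap matching rule are all assumptions rather than the original implementation.

```python
class ResourceLink:
    """Hypothetical link that hides one resource behind named messages."""

    def __init__(self, name, description, vocabulary, interactions):
        self.name = name
        self.description = description
        self.vocabulary = vocabulary      # controlled terms of the resource
        self.interactions = interactions  # interaction -> (inputs, outputs)

    def send(self, message, *args):
        """Dispatch a message by name, as the interface program would."""
        handler = getattr(self, "_msg_" + message.lower(), None)
        if handler is None:
            raise ValueError(f"{self.name} does not understand {message}")
        return handler(*args)

    # Four definition messages; their names are assumptions.
    def _msg_describe(self):
        return self.description

    def _msg_interactions(self):
        return sorted(self.interactions)

    def _msg_inputs(self, interaction):
        return self.interactions[interaction][0]

    def _msg_outputs(self, interaction):
        return self.interactions[interaction][1]

    # Vocabulary message named in the text; the matching rule is assumed.
    def _msg_best(self, text):
        """Return the term sharing the most words with text, else None."""
        words = set(text.lower().split())
        best_term, best_score = None, 0
        for term in self.vocabulary:
            score = len(words & set(term.lower().split()))
            if score > best_score:
                best_term, best_score = term, score
        return best_term


def build_menu(links):
    """Derive menu entries purely from the links' definition messages."""
    return [(link.name, interaction)
            for link in links
            for interaction in link.send("INTERACTIONS")]


# Illustrative link content; the interaction name and slots are made up.
dxplain = ResourceLink(
    "DXplain", "Medical knowledge base",
    vocabulary=["Colon Cancer", "Aphthous Stomatitis"],
    interactions={"differential diagnosis": (["finding"], ["disease"])},
)
```

Because `build_menu` relies only on `send`, a new link can be added, or an existing one modified, without touching the interface code, which is the property the exercise demonstrated.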


2.4. Design considerations

The primary goals in this system are to (1) allow easy addition of new resources, (2) maintain a uniform interface, (3) retrieve information from the resources, (4) provide the user information about the resources, and (5) allow the user to enter and execute a query. In this prototype, efficient performance was not considered a primary goal. Nor was any attempt made to minimize hardware requirements.

3. System description

3.1. Kappa

The use of messages to derive characteristics about resources and the desire to provide a uniform user interface for a variety of resources suggested that an object-oriented environment would be appropriate for development. Maintenance of separate applications and the user interface is simplified by using an object layer between the user interface and the application. The small number of reasonable queries relative to the large amount of possible starting information suggested that query guidance would be best performed by a rule-based system. Kappa, which is an object-oriented, rule-based system, was chosen for development of the IQW. Kappa requires Microsoft® Windows version 3 and at least 1 megabyte of random access memory.

Kappa allows the creation of objects which have single inheritance (anything defined for a parent object is accessible to the child, and children can have only one parent). Each object can have slots (variable values) and methods (functions) associated with it. The environment also allows the creation of goals to be achieved as well as rules for filling in slots. If a goal requires a slot that has not been filled, the appropriate rule is invoked. Backward chaining of rules to satisfy a specific goal or forward chaining to accomplish any goal can be invoked. Kappa also provides functions for creating a customized user interface, and easily allows new functions to be added in the form of C code.

3.2. Bridge 386

Bridge 386 is a commercial product that enhances Windows' interprocess communication utilities. It provides additional capabilities useful to the IQW. Bridge allows a Windows program to run and control another program. IQW can start and stop another application and use it as a resource. Bridge allows IQW to send commands which the resource application receives as if the commands were entered at the keyboard. Currently, in order for the application to be of use to IQW, it must be capable of writing the results of a query to a file. IQW can then read the result file for further processing.

3.3. Hardware

The software currently runs on a Hewlett Packard Vectra RS-25C, which is an MS-DOS compatible 386-based machine. While only 1 megabyte of RAM is needed in theory, running Windows, Bridge and Kappa efficiently with other resources requires at least 4 megabytes of RAM. An EGA or VGA monitor and a Hayes compatible modem are required.

[Figure 1: IQW Query Selection screen. Panels: Patient Record (John Doe, age 43, sex M, ID AA0099; problems Obesity, Aphthous Stomatitis, Colon Cancer, Colonoscopy), Concepts (Obesity, Aphthous Stomatitis, Colon Cancer, Colonoscopy, Nicorette Gum), Queries (Finding Info, Disease Info, Cancer Staging, Citations, Side Effects), Applications (DXplain, PDQ, MEDLINE, PDRS).]

Fig. 1. In a typical session, the user might start from a patient record. The Interactive Query Workstation (IQW) identifies terms contained in the record and presents them to the user. In this example, the user has selected Nicorette Gum. IQW eliminates queries which are not applicable to this term ('Disease Info' and 'Finding Info') and allows the user to select from those remaining. When the user selects 'Side Effects', IQW checks whether any other information is needed. In this case there is not, so it proceeds to send the query to the PDRS program, collect the results and display them to the user.

3.4. Resource descriptions

For each resource there are three types of object that describe the resource: APPLICATION objects, QUERY objects, and TYPE objects. An APPLICATION object has slots and methods that allow data to be sent to and received from a resource. For example, the DXplain APPLICATION object contains a method for calling a communications program, names of appropriate script files, and the name of the file where results will be stored.

A QUERY object describes a template for a query. This includes a user-readable description of a query, the name of the APPLICATION object that can handle the query, methods for formatting a valid query in the syntax of the resource, and a list of slots that need to be filled by the user in order to process the query. Continuing the example, the disease information QUERY object would have a pointer to the DXplain APPLICATION object. Each slot has a slot TYPE which points to a TYPE object. The disease information QUERY object has only one slot, which points to the disease name TYPE object.

TYPE objects carry information about types of term. Any terms acquired during a session are stored in the appropriate TYPE object. Queries that need to fill a slot of that TYPE may use these terms or use the TYPE object's method to acquire a new term from the user. The TYPE object allows validation to ensure that user entries are of the appropriate format.

[Figure 2: Query Execution diagram. Labels: Concepts, Queries, Applications, Bridge, DOS File, Information, Concepts.]

Fig. 2. Currently IQW passes queries to applications through Bridge 386. The 'Results' of these queries must be available to IQW in a file. Information collected from the applications can be parsed into terms. If semantic type information is available (or the user can provide semantic type information) these terms can become concepts. These concepts can then form the basis of a new query.

3.5. Status report

In one form of a typical session the user first looks up a patient record. The patient record is then parsed to extract terms. The user can choose one or more of these terms as the basis of a query. The extraction and typing of terms are currently based on a vocabulary incorporated into the IQW. Eventually, this vocabulary will be replaced by the UMLS Metathesaurus.

From all possible queries, applicable queries are selected based on the slot TYPES needed for each query and the chosen term TYPES. If no query can make use of all the chosen terms, then queries are selected that can make use of any of the chosen terms. If more than one query is applicable, the user selects from the reduced list and that query is made active. If only one query is applicable, that query is the active query. If there are still unfilled slots in the active query, the user is asked to fill them. The query is then executed and the results displayed.

The user can also select a query from the list of all queries. Once the user selects this query, the slot filling proceeds in the same manner as above. The user can review results of any query performed in a session and process the results to extract terms for a new query. Text of queries can be copied to the Windows clipboard and pasted into another application such as a word processor. Text from the Windows clipboard can also be parsed to extract terms for use in a new query.
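The QUERY and TYPE descriptions of Section 3.4 and the selection procedure above can be sketched in Python rather than Kappa. This is a minimal illustration under stated assumptions: the `Term` and `Query` classes, the TYPE names `disease` and `drug`, and the slot assignments of the two sample queries are hypothetical; only the query descriptions and application names are taken from the text and Fig. 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    text: str
    type_name: str      # name of the TYPE object the term is stored in


@dataclass(frozen=True)
class Query:
    description: str    # user-readable description
    application: str    # name of the APPLICATION object that handles it
    slot_types: tuple   # TYPE names of the slots that must be filled


def applicable_queries(queries, chosen_terms):
    """Select queries by comparing slot TYPES with chosen term TYPES.

    Queries that can use all chosen terms are preferred; if there are
    none, fall back to queries that can use any of the chosen terms.
    """
    chosen = {term.type_name for term in chosen_terms}
    uses_all = [q for q in queries if chosen <= set(q.slot_types)]
    if uses_all:
        return uses_all
    return [q for q in queries if chosen & set(q.slot_types)]


def unfilled_slots(query, chosen_terms):
    """Slot TYPES the user must still supply before execution."""
    chosen = {term.type_name for term in chosen_terms}
    return [slot for slot in query.slot_types if slot not in chosen]


# Illustrative catalogue; the slot TYPES are assumptions.
QUERIES = [
    Query("Disease Info", "DXplain", ("disease",)),
    Query("Side Effects", "PDRS", ("drug",)),
]
```

With these definitions, choosing a single drug term leaves only 'Side Effects' applicable and no slots unfilled, mirroring the session in Fig. 1. Choosing both a drug and a disease term triggers the fallback, so both queries are offered. With no terms chosen, every query is returned, which corresponds to selecting from the list of all queries.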

4. Future plans

Now that META-1 is available, it will be used to replace the test vocabulary currently used. The META-1 semantic types will be used as TYPE objects where appropriate. Some TYPE objects, such as "patient ID", though not META-1 semantic types, will be retained. Currently there are different TYPE objects for each of the resources: for example, DXplain disease, MeSH disease, and COSTAR problem. META-1 will contain explicit source vocabulary information so, in the above example, these TYPES would be collapsed into one TYPE 'disease'.

The current system can only create queries de novo or based on terms found in the result of a single previous query, i.e., there is no easy way to use terms from the results of several different queries. This limitation results from the number of different TYPES from which a query will be created; when the number is large, there are too many possible queries to be useful to the user. Making use of Kappa's rules algorithms might limit the possibilities and allow the system to take a more active role in query selection. For example, a physician may retrieve a patient record and then perform a literature search for treatment of one of the patient's problems. If the literature search results contain references to therapeutic drugs, the system may suggest the user search for drug interactions between the drugs listed in the literature search and those in the patient record.

The UMLS Semantic Network will be tested as a tool for query selection. If terms have been acquired which are typed in META-1 as 'Bacterium' (e.g., Pneumococcus) and no terms have been typed as 'Pharmacologic Substance' (e.g., an antibiotic), the system might derive that "Proper treatment of a specific bacterial infection?" would be an appropriate query.

The system will collect information about queries executed. This information will be useful for improving future versions. In addition to information about common queries, common sequences of queries will be useful in refining the query selection algorithms. Information about an individual user could be used to personalize the search strategies for that user. For example, some users might prefer a few relevant pieces of information while others might desire all relevant pieces of information. Collection of this information would be done in an unobtrusive way.

New resources will be added. A textual database would be the next logical choice of resource to be added. A resource such as Scientific American MEDICINE's CONSULT would be an example of both a new type of database (textual) and a new type of resource (CD-ROM). Existing resources might be replaced with network-available counterparts; for example, DXplain through Telenet. This would result in a smaller program and smaller hardware requirements at the expense of speed and external costs (connect time charges). Because of the amount of memory swapping that occurs when Bridge controls DOS applications, it is unlikely that the memory requirements will drop below 2 megabytes.

While the resources currently used did not require alteration of the application code, significant program-specific information was needed in the APPLICATION objects to allow interaction. Further generalization of the system would require a more extensive resource description language. This would include a query description general enough to be applicable to all resources, a scripting language for describing interactions with a resource, and a method of translating a query description into a script. Currently, results are handled as free text, but a standardized description of the results would improve processing of results.

These planned improvements correspond to concepts suggested for an intelligent guidance of user queries [1,9]. These include (1) providing a consistent user interface for several applications, (2) standard data definitions to improve interpretation of results and allow easy transfer of data between applications, (3) profiling of users to permit individualized interaction, (4) obtaining feedback to guide system development, (5) using the context of the user's interaction to guide query selection, and (6) using methods of ranking and presenting results that are based on the user's interests. In addition, the IQW should keep the user informed of IQW abilities and application abilities, especially when these diverge.

Equally important to development will be feedback from practising clinicians. The IQW will be placed in an out-patient environment within the next year. This will provide information about whether it is a useful tool and what further steps can be made to improve it. It may also reveal new insights into the information needs of physicians.
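The semantic-type example in this section (terms typed 'Bacterium' present, none typed 'Pharmacologic Substance') can be expressed as a simple presence/absence rule. The Python sketch below is only an illustration: it does not reproduce Kappa's rule system or the UMLS Semantic Network, and the rule format and the `suggest_queries` function are assumptions.

```python
def suggest_queries(term_types, rules):
    """Return the suggestions of every rule that fires.

    term_types maps each acquired term to its semantic type. A rule fires
    when all of its 'present' types occur among the acquired terms and
    none of its 'absent' types do.
    """
    present = set(term_types.values())
    return [
        rule["suggest"]
        for rule in rules
        if set(rule["present"]) <= present
        and not (set(rule["absent"]) & present)
    ]


# Hypothetical rule table holding the single example from the text.
RULES = [
    {
        "present": ["Bacterium"],
        "absent": ["Pharmacologic Substance"],
        "suggest": "Proper treatment of a specific bacterial infection?",
    },
]
```

For instance, `suggest_queries({"Pneumococcus": "Bacterium"}, RULES)` yields the treatment query, while adding a term typed 'Pharmacologic Substance' suppresses it.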


Acknowledgements

This work was supported in part by NLM contract [N01-LM-8-3513] and in part by an educational grant from Hewlett Packard Corporation. C.C. is supported by an NLM training grant [2T15-LM07037-04]. DXplain, MQL and COSTAR are trademarks of Massachusetts General Hospital. MEDLINE is a trademark of the National Library of Medicine. Kappa is a trademark of Intellicorp, Inc. Bridge is a trademark of SoftBridge, Inc. PDR® Drug Interactions and Side Effects Diskettes is a trademark of Medical Economics Company, Inc. Scientific American MEDICINE CONSULT is a trademark of Online Computer Systems, Inc.

References

[1] R.A. Greenes and E.H. Shortliffe, Medical Informatics: An Emerging Academic Discipline and Institutional Priority, J. Am. Med. Assoc. 263(8) (1990) 1114-1120.
[2] R.S. Marcus and J.F. Reintjes, A translating computer interface for end-user operation of heterogeneous retrieval systems, I: Design, J. Am. Soc. Inform. Sci. 32(4) (1981) 287-303.
[3] A.S. Pollitt, An expert systems approach to document retrieval: A summary of the CANSEARCH research project, in: Technical Report Series, Research Report 86/6 (Huddersfield Polytechnic, U.K., 1986).
[4] B.L. Humphreys and D.A. Lindberg, Building the Unified Medical Language System, in: Proceedings of the Thirteenth Annual Symposium on Computer Applications in Medical Care, ed. L.C. Kingsland, pp. 475-480 (IEEE Computer Society Press, Washington, DC, 1989).
[5] M. Tuttle, D. Shertz, M. Erlbaum, N. Olson and S. Nelson, Implementing Meta-1: The First Version of the UMLS Metathesaurus, in: Proceedings of the Thirteenth Annual Symposium on Computer Applications in Medical Care, ed. L.C. Kingsland, pp. 483-487 (IEEE Computer Society Press, Washington, DC, 1989).
[6] A.T. McCray, The UMLS Semantic Network, in: Proceedings of the Thirteenth Annual Symposium on Computer Applications in Medical Care, ed. L.C. Kingsland, pp. 503-507 (IEEE Computer Society Press, Washington, DC, 1989).
[7] H.J. Lowe, G.O. Barnett, J. Scott, R. Eccles, E. Foster and J. Piggins, Remote Access MicroMeSH: A Microcomputer System for Searching the MEDLINE Database, in: Proceedings of the Twelfth Annual Symposium on Computer Applications in Medical Care, ed. R.A. Greenes, pp. 535-539 (IEEE Computer Society Press, Washington, DC, 1988).
[8] T. Barsalou, An Object-Based Architecture for Biomedical Expert Database Systems, in: Proceedings of the Twelfth Annual Symposium on Computer Applications in Medical Care, ed. R.A. Greenes, pp. 572-578 (IEEE Computer Society Press, Washington, DC, 1987).
[9] W.W. Stead, IAIMS: An Opportunity for National Collaboration, Integr. Acad. Inform. Man. Syst. Newsl. 2(3) (1989) 1-2 (Duke University Medical Center).
