An Expert System for Simulation of Coronary Heart Disease Risk Factor Interventions

Zhangqing Zhuo, Eugene Ackerman and Lael Gatewood

Micropopulation Simulation Resource, Health Computer Sciences, University of Minnesota, Minneapolis, MN 55455

Abstract

The feasibility of using an expert system to support intervention studies within CRISPERS was investigated. A prototype expert system named CRISPERT was designed to accept user inputs, adjust the values to CRISPERS requirements, start a sequence of simulations, and analyze and interpret the results. The rule-based system was implemented using the expert system development language OPS5 combined with FORTRAN, as well as SAS procedures and DEC VMS system service routines. Results of initial tests suggest that using an expert system as an interface between users and CRISPERS is a viable approach. The development of CRISPERT facilitates the usability of CRISPERS for intervention studies of coronary heart disease.

Introduction

CRISPERS is a generalized micropopulation simulation system for studies of coronary heart disease (CHD) in which Monte Carlo techniques are used to make decisions. The acronym CRISPERS stands for Chronic Disease Risk Intervention Simulation Programs for Epidemiological Research Studies. It has been developed within the National Micropopulation Simulation Resource [1]. One purpose of CRISPERS is to simulate the effects of modifying cardiac risk factors on CHD morbidity and mortality. Such modifications are difficult to carry out on human populations because of the cost, the time involved, and the potentially harmful side effects of the changes. Computer simulation can be used prior to clinical studies to investigate potential impacts and synergistic effects. Various studies of intervention strategies against coronary heart disease have utilized computer simulation [2][3].

At present, simulation with CRISPERS is an interactive process in which users design the simulation, select the input parameters, run the simulation, and analyze the results. To simulate an intervention strategy within CRISPERS, initial values and desired reports must be specified during the interactive command processing. After the desired simulations have been generated, the results must be interpreted in a meaningful way. The knowledge used in performing these tasks involves the behavior of the simulation system, the nature of the conceptual model, the configuration of the software and the stochastic simulation process. Therefore, it is necessary to have expertise in a number of different fields such as statistics, modeling, computer programming, and simulation techniques to use CRISPERS correctly and intelligently. This limits the application of CRISPERS for clinical research, since most potential users do not have all of these skills.

One of the common and important activities in simulation is the interaction between the simulation system and its user. A simulation system may be of the highest quality, but if the user and the system cannot communicate effectively and easily with each other this system will be of no use. The CRISPERS enhancement currently most in demand is an interface to help clinical investigators evaluate new methods and strategies for risk-factor intervention. This interface would be especially useful for designing clinical trials, planning health services, and developing prevention strategies.

An expert system is a knowledge-based program that uses heuristic strategies developed by humans to solve specific classes of problems [4]. Applications of expert system technology in the area of simulation have been recognized in recent years. O'Keefe presented a number of ways in which expert systems technology could be used to enhance the area of modeling and simulation [5]. One option pertains to "interface" design. Following a dialogue with users, an interface such as CRISPERT generates the instructions or code needed to use the simulation system and then interprets and explains the results.

0195-4210/91/$5.00 © 1992 AMIA, Inc.

System Design

The design philosophy of CRISPERT stems from its purpose as an interface and from its role in the research environment. The ideal interface would follow a paradigm in which the user provides information about the study, defines the goal and lets the computer find the solution. In its initial stage, CRISPERT can help to investigate more realistic intervention strategies, give more support to the novice, and assist with extracting information from simulation results.

Domain Knowledge

In order to help use CRISPERS in an intervention study, the knowledge in CRISPERT should cover several different topics. They include knowledge about intervention studies, statistical methodologies useful for analysis, awareness of the CRISPERS keywords used in intervention experiments, and knowledge regarding how the simulated and real systems behave. Moreover, both knowledge about the system control structures and knowledge about interactions with system users are required to be in the knowledge base.

A generic intervention framework that can be based on population, risk group or member characteristics has been implemented in CRISPERS. Users can specify desired intervention strategies by setting specific parameters. These values provide intervention patterns, depicting times and values at which to initiate, modify, and terminate risk factor changes. By specifying other parameters, either population-based or member-based intervention can be selected for each risk factor. Three possible methods are available to increase, decrease or preset each risk factor by a particular amount or percentage.

Users particularly need support in interpreting the statistical comparisons used. In CRISPERS, events are generated by Monte Carlo techniques in which a pseudo-random number is used to determine the outcome of a decision rule. The simulation results are highly dependent on the initialization of the random number generator. A single simulation may produce misleading outcomes. Accordingly, each simulation is repeated many times to get a stable distribution for each outcome. Also, to investigate the effects of intervention strategies, it is desirable to use the same random seed to avoid undesirable variation due to the simulation system itself. One indication of the effects of an intervention strategy is the comparison of the frequency of occurrence for each event category between the baseline simulation and intervention experiments.
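The intervention methods and the shared-seed replication described above can be illustrated with a toy Monte Carlo model. This is only a sketch under stated assumptions: the function names, the per-member event probability, and the treatment of "preset" as setting the value directly are all hypothetical and are not taken from CRISPERS.

```python
import random

def apply_intervention(value, method, amount, percent=False):
    """Return a risk-factor value after one intervention step.

    Assumption: "preset" sets the value to `amount` directly;
    "increase" and "decrease" change it by an amount or a percentage.
    """
    if method == "preset":
        return amount
    change = value * amount / 100.0 if percent else amount
    if method == "increase":
        return value + change
    if method == "decrease":
        return value - change
    raise ValueError(f"unknown method: {method}")

def simulate_events(risk, members, seed):
    """Toy Monte Carlo run: count members whose random draw falls below risk."""
    rng = random.Random(seed)
    return sum(rng.random() < risk for _ in range(members))

def replicate(risk, members, seeds):
    """Repeat the simulation once per seed to obtain a distribution of outcomes."""
    return [simulate_events(risk, members, seed) for seed in seeds]

# Hypothetical study: a 20% reduction in a per-member event probability.
seeds = range(100)
baseline_risk = 0.10
intervention_risk = apply_intervention(baseline_risk, "decrease", 20, percent=True)

# The same seeds are reused so that both arms see the same random draws.
baseline_runs = replicate(baseline_risk, 1000, seeds)
intervention_runs = replicate(intervention_risk, 1000, seeds)
```

Because each intervention replicate reuses the seed of its baseline counterpart, the two arms differ only through the changed risk value, which is the motivation the text gives for using the same random seed.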
Within CRISPERT, three different statistical methods can be employed to explore significant differences. First, the distributions of simulation results can be represented using quartiles, where the second quartile is the median. The difference between the third and first quartile can be used to estimate the standard deviation. If the first quartile of the baseline simulation is less than the third quartile of the simulated intervention, the effect of the strategy is probably not significant.
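The quartile comparison can be sketched as follows. This is an illustration, not CRISPERT's FORTRAN code: the function names are hypothetical, the quartile convention is the default of Python's `statistics.quantiles`, and the divisor 1.349 for converting an interquartile range to a standard deviation assumes roughly normal outcomes.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3); Q2 is the median."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q1, q2, q3

def sd_from_quartiles(values):
    """Estimate the standard deviation from Q3 - Q1 (normality assumed)."""
    q1, _, q3 = quartiles(values)
    return (q3 - q1) / 1.349

def quartiles_overlap(baseline, intervention):
    """True when baseline Q1 lies below intervention Q3.

    Following the rule in the text, an overlap suggests the effect of
    the strategy is probably not significant.
    """
    base_q1, _, _ = quartiles(baseline)
    _, _, int_q3 = quartiles(intervention)
    return base_q1 < int_q3

# Hypothetical event counts from replicate runs.
baseline = [50, 52, 54, 56, 58, 60, 62, 64]
intervention = [30, 32, 34, 36, 38, 40, 42, 44]
overlap = quartiles_overlap(baseline, intervention)
```

The rule as stated presumes an intervention that is expected to lower the event count; for an outcome expected to rise, the comparison would be mirrored.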


A second method compares the averages of many replicate simulations. Since these are Monte Carlo simulations, the replicates will give a distribution for each outcome from which one can compute a sample variance. The latter can be used to estimate both a standard deviation and a standard error of the mean. In comparing the averages of the replicates, the relevant figure is the estimate of the standard error of the mean. The difference is statistically significant only if the two averages differ by more than twice the standard error. A third method looks at the absolute difference of the means from the baseline simulation and the intervention runs. For example, if the difference is greater than one person per thousand, it may indicate that the intervention strategy is effective. Otherwise, there is no practical effect from the intervention strategy even though the difference might be statistically significant.

CRISPERT has four important functions: acquiring inputs from users, controlling the simulation system CRISPERS, performing statistical analysis, and interpreting the results. The system control algorithm directs the flow of these actions. During acquisition of inputs from users, the system control orders the questions based on knowledge about running CRISPERS. A menu-driven control technique is used to make it easier for the user to interact with the system. User interaction knowledge refers to data acquisition and outcome explanation. Information required from the user includes data file names, duration of the study, and the values of various parameters used in the intervention strategies. Statements used to question users are more understandable than CRISPERS's concise keywords. The user's answers are checked and only the relevant ones are accepted. "Help" features are incorporated throughout the acquisition process. Explanations depend on the interpretation of statistics, as well as on knowledge of the model and its features.
Interpretations of the effects of an intervention strategy are given for each of the three different tests of significance. Explanations about the conclusions are available at the user's request. Descriptive statistics are displayed when the user asks for an explanation. There is an opportunity to review comparison graphs at the terminal or to print copies of the graphs.
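The second and third comparison methods in this section can be sketched in a few lines. The names below are hypothetical, and one detail is an assumption rather than something the text specifies: the two standard errors are combined as the square root of the sum of their squares before applying the "twice the standard error" rule.

```python
import math
import statistics

def mean_and_sem(values):
    """Return the sample mean and the standard error of the mean."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem

def differs_by_two_se(baseline, intervention):
    """True when the means differ by more than twice the combined standard error."""
    mean_b, sem_b = mean_and_sem(baseline)
    mean_i, sem_i = mean_and_sem(intervention)
    se_diff = math.sqrt(sem_b ** 2 + sem_i ** 2)  # assumed way of combining
    return abs(mean_b - mean_i) > 2 * se_diff

def practically_different(baseline, intervention, population, per_thousand=1.0):
    """True when the mean difference exceeds `per_thousand` persons per 1000."""
    diff = abs(statistics.fmean(baseline) - statistics.fmean(intervention))
    return diff / population * 1000 > per_thousand

# Hypothetical event counts from replicate runs.
baseline = [50, 52, 54, 56, 58, 60, 62, 64]
intervention = [30, 32, 34, 36, 38, 40, 42, 44]
statistically = differs_by_two_se(baseline, intervention)
practically = practically_different(baseline, intervention, population=10000)
```

Keeping the two checks separate mirrors the point made in the text: a difference can pass the statistical test and still be too small to matter in practice.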

Knowledge Representation

The process of structuring knowledge about a problem in a way that facilitates problem solving is called knowledge representation [6]. Numerous representational schemes have been proposed, each of them attempting to determine how knowledge should be represented by identifying the essential components of knowledge. Three popular schemes are semantic-network, frame-based and rule-based representations. They are widely used in expert system development [6]; each has its own strengths and weaknesses.

In the rule-based representation, knowledge is expressed as modular units in the form of if-then, or condition-action, or situation-action rules. They are the simplest form of knowledge representation to understand and to use. Inferential knowledge is natural to express as rules. A system can incorporate practical human knowledge in conditional rules, then solve complex problems by selecting relevant rules and combining the results in appropriate ways. However, rules are not perfect. They lack variation and are unstructured. Their format is inadequate or inconvenient for representing many types of knowledge, or for modeling the structure of a system. Nonetheless, the rule-based representation scheme is the best approach for an application when the domain knowledge has been derived from an expert's experiences in problem solving over a number of years [6]. Thus, a rule-based system was chosen for this research.

The inference engine mechanisms used in the rule-based system are generally either data-driven or goal-driven. With data-driven ones, the problem solver begins with the given facts of the problem, and rules are applied whenever their left-hand side conditions are satisfied. To use this strategy, problems must have sufficient data in their initial state so that a solution can be constructed. In contrast with data-driven control, goal-driven problems are those for which a set of solutions is known and the current case is classified as one of the known solutions. A goal-driven control strategy only allows consideration of rules that are applicable to particular goals. The preferred control strategy is determined by the properties of the problem itself. A data-driven control strategy seems more appropriate for CRISPERT because the initial state of the problem contains many facts that must be synthesized into a solution.
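Data-driven control can be illustrated with a minimal forward-chaining loop. This is a simplified sketch: the rule and fact names are hypothetical, and rules fire in listed order, whereas OPS5 runs a recognize-act cycle that picks one rule instantiation per cycle using a conflict resolution strategy.

```python
def forward_chain(initial_facts, rules):
    """Fire rules whose conditions hold until no new fact can be added.

    Each rule is (name, set_of_condition_facts, concluded_fact).
    Returns the final fact set and the names of the rules fired, in order.
    """
    facts = set(initial_facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            # A rule applies when all its left-hand side conditions are present.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                fired.append(name)
                changed = True
    return facts, fired

# Hypothetical rules loosely modelled on the phases of a study.
rules = [
    ("request-simulation", {"study-is-new", "inputs-complete"}, "simulation-requested"),
    ("request-analysis", {"simulation-requested", "outputs-available"}, "analysis-requested"),
    ("request-explanation", {"analysis-requested"}, "explanation-requested"),
]

facts, fired = forward_chain({"study-is-new", "inputs-complete", "outputs-available"}, rules)
```

Starting from the given facts and adding conclusions until nothing changes is what distinguishes this from a goal-driven strategy, which would start from `explanation-requested` and work backward to the facts that support it.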

Development Environment

The development environment refers to the combination of the software tools and the hardware with which CRISPERT was developed. Both technical factors and non-technical factors were considered. Generally, the hardware considerations include integration with the existing environment, cost, performance, and available software. The selection criteria for software include cost, available features, development/delivery environment and problem fit. The most viable choice for this project involved using the available resources at the time of implementation of the demonstration prototype CRISPERT.

The Resource runs CRISPERS and other epidemiological models on several DEC VAXstation 3100 workstations connected with several other VAXes in a Local Area VAXcluster. The VAXstation 3100 is a low-cost desktop system offering all the advantages of DEC's powerful architecture. The system supports up to 32 megabytes of memory. It can run either a VMS or an ULTRIX operating system. The system supports DECwindows, so users can work within a multi-window environment. Since each workstation is a member of the cluster, it can share storage, printers, applications, and other system resources while having dedicated processor time. The availability of this computer resource made it possible to develop CRISPERT in a DEC VAX/VMS environment.

Expert system technologies tend to be different from conventional programs in their extensive use of heuristics, symbolic manipulation and inferencing mechanisms to represent knowledge. An expert system could be constructed in conventional programming languages such as FORTRAN, but the lack of symbolic representational and reasoning structures would make such programming unnecessarily complex or infeasible. The spectrum of specialized tools for expert system development ranges from very high-level programming to low-level support facilities. Different tools may be needed for different phases of a project, from prototyping development to production.

Only one expert system development tool, OPS5, is available on the Micropopulation Simulation Resource's VAX computer system. OPS5, standing for Official Production System - Version 5, is a rule-based language descended from earlier OPS languages designed for AI and cognitive psychology applications [7]. As a narrowly focused expert system building tool, OPS5 is suitable for developing rule-based, data-driven systems. Several successful applications, including XCON to configure computer systems [8], have been developed with OPS5. Because of its ready availability on the Micropopulation Simulation Resource's computer system, OPS5 was chosen for the initial implementation of CRISPERT.

OPS5 does not possess sufficient numerical capability for a statistical analysis environment and cannot call VMS system services and run-time library routines directly. However, OPS5 provides support for calling routines written in other languages. This makes it possible for the capabilities and strengths of other programming languages to be available to the OPS5 programmer. VAX/VMS FORTRAN, which is a superset of FORTRAN 77, was chosen for the routines that perform statistical calculations, call system services and run-time library routines and communicate with CRISPERS.

Implementation

The process of expert system implementation involves encoding knowledge. Ongoing maintenance and modification of the system is the key to development of intelligent high-performance programs. Modularity is an issue in any software development. CRISPERT has been implemented as a modular structure in order to enable ongoing maintenance. In the CRISPERT implementation, modularization also supports the desire to make the structure of the system reflect the structure of the domain knowledge.

The CRISPERT system consists of four major phases. Figure 1 shows a schematic of the prototype CRISPERT architecture. The Controller directs the flow of control among the phases; Acquisition obtains information from the user; Analysis performs statistical analysis on simulated outcomes; and Explanation reports the results from the statistical analysis and interprets them to the user from the viewpoints of modeling and statistics.

Figure 1. Schematic Architecture of CRISPERT (solid lines indicate control pathways; broken lines represent data flow)

The CRISPERT Start routine sets the conflict resolution strategy, initializes working memory and invokes the Controller. CRISPERT allows the user to start a new intervention study or to analyze results from previous studies, whichever is desired. When the user decides to start a new study, the Controller invokes the Acquisition phase to let the user describe information about the study, while an input file containing the user's requirements is created. CRISPERT then automatically starts CRISPERS to simulate the desired intervention strategy. After the requested number of simulation runs have been completed, the Analysis phase is invoked to statistically analyze the simulation outcomes. Finally, the Explanation phase is executed.

To automate the execution of CRISPERS using the information from the input data file, CRISPERT creates a VMS command file. The VMS system service routines then submit the command file and simulations for execution. CRISPERS writes the simulation outcomes to files. When the user requests the analysis of simulation results, CRISPERT retrieves the data from these files. Therefore, CRISPERS and CRISPERT are interconnected through sharing of data files.

The architecture was implemented by 17 programs written in VAX OPS5 and 13 subroutines written in VAX/VMS FORTRAN. The declarations of working memory elements are isolated in one module. The FORTRAN declarations define labeled common blocks. The primary functions of the OPS5 programs are control, acquisition and explanation. The FORTRAN subroutines perform statistical calculations, call VAX system services, and handle interprocess communications. SAS routines provide graphics for comparison and presentation. Currently, CRISPERT has 236 production rules. They store the domain knowledge and program control information. Other information is extracted from files created by CRISPERS during simulations and then transferred into working memory elements.
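The flow of control among the phases, and the coupling of the interface to the simulator through shared files, can be sketched as below. Everything here is a hypothetical stand-in: the file names, the JSON format, and the trivial `run_simulator` function are assumptions for illustration, whereas the actual system uses OPS5 rules, FORTRAN subroutines and a VMS command file.

```python
import json
import tempfile
from pathlib import Path

def acquisition(workdir, answers):
    """Write the user's study description to an input file."""
    input_file = workdir / "study_input.json"
    input_file.write_text(json.dumps(answers))
    return input_file

def run_simulator(input_file, output_file):
    """Stand-in for the simulator: read the input file, write an outcomes file."""
    params = json.loads(input_file.read_text())
    events = params["baseline_events"] - params["reduction"]
    output_file.write_text(json.dumps([events] * params["runs"]))

def analysis(output_file):
    """Read the outcomes file and return the mean number of events per run."""
    outcomes = json.loads(output_file.read_text())
    return sum(outcomes) / len(outcomes)

def explanation(mean_events):
    """Turn the analysis result into a message for the user."""
    return f"Mean simulated events per run: {mean_events:.1f}"

def controller(answers):
    """Direct the phases in order; the phases communicate only through files."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        input_file = acquisition(workdir, answers)
        output_file = workdir / "outcomes.json"
        run_simulator(input_file, output_file)
        return explanation(analysis(output_file))

report = controller({"baseline_events": 57, "reduction": 20, "runs": 5})
```

Exchanging data only through files keeps the interface and the simulator independent of each other, which matches the description of the two programs as interconnected through shared data files.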

Discussion

CRISPERT systematizes simulation of public health intervention strategies with CRISPERS. It assumes that users have some background in stochastic simulation techniques, but does not require the users to know the sequence of keywords processed, the kind of information required by each keyword, or the techniques of statistical comparisons between studies. This makes it possible to use CRISPERS with a minimum of knowledge in various quantitative disciplines. Therefore, the development of CRISPERT facilitates usability of CRISPERS in studies of coronary heart disease. This suggests that using an expert system as a user interface is a viable approach.

The basic approach taken was to develop CRISPERT incrementally. Thus the functional capabilities of CRISPERT were gradually increased. The primary advantage is that the increases in functional capability are easier to test. Each functional increment can be verified immediately, rather than trying to do the entire validation at the end. This also decreases the time needed for incorporating corrections in the system. In essence, the approach is that of a prototype that is continuously and rapidly changing over the entire development time.

Maintainability is one of the chief concerns in developing a software system. Most maintenance processes for expert systems involve upgrading the knowledge of the system and then changing the implementation to reflect the changed knowledge. CRISPERT has been implemented in such a way that the modularity of the system follows the structure of the domain knowledge, so that modification or addition of one knowledge entity would not affect others.

The development tool OPS5 and the conventional programming language FORTRAN were used to implement CRISPERT. Their use was motivated by their ready availability at the Micropopulation Simulation Resource. At the beginning, from the standpoint of economic and efficiency concerns, it was a reasonable choice. However, OPS5 has several disadvantages that affected development of CRISPERT. One of them is that OPS5 does not support reasoning under uncertainty. In the current version of CRISPERT, exact reasoning is used; it deals with exact facts and comes to exact conclusions. More advanced versions will need to incorporate uncertainty. Another main disadvantage is that OPS5 does not have an explanation facility built into it.

A number of test cases were run on CRISPERT; they indicated that the explanations and interpretations produced by CRISPERT agree with those of human experts. The test cases also indicated that the control capabilities of the system were adequate and the control of the system flowed as intended. These observations confirmed that the rule-based representation scheme with the data-driven control strategy was suitable. CRISPERT was demonstrated to Resource personnel and external users who had some background in stochastic simulation. All felt that CRISPERT was useful for intervention studies with CRISPERS. CRISPERT was also demonstrated at a recent simulation workshop. The leader and participants agreed that CRISPERT was a useful pedagogic tool.

The current investigation makes a convincing argument that expert systems technology can be applied effectively to help users simulate intervention strategies with CRISPERS. Since the preliminary implementation of CRISPERT solves only a portion of the problem, the next step is extending the capability of CRISPERT. This requires that the knowledge base be increased and updated to meet the added requirements. In addition, other expert system tools may be needed for continuing development of this system.


Acknowledgements

This research formed part of the doctoral dissertation of the first author. It was supported in part by NIH Grant P41-RR01632. The editorial efforts of Jan Marie Lundgren are appreciated.

References

[1] Z. Zhuo, L. C. Gatewood, and E. Ackerman, "A Monte Carlo simulation program for coronary heart disease," in Proceedings of the Fourteenth Annual Symposium on Computer Applications in Medical Care (SCAMC) (R. A. Miller, ed.), (Los Alamitos CA), pp. 303-307, IEEE Computer Society Press, 1990.

[2] T. E. Kottke, L. Gatewood, S.-C. Wu, and H.-A. Park, "Preventing heart disease: Is treating the high risk sufficient?," J. Clin. Epidemiol., vol. 41, pp. 1083-1093, 1988.

[3] J. Tsevat, M. C. Weinstein, L. W. Williams, A. N. Tosteson, and L. Goldman, "Expected gains in life expectancy from various coronary heart disease risk factor modifications," Circulation, vol. 83, pp. 1194-1201, 1991.

[4] R. G. Bowerman and D. E. Glover, Putting Expert Systems into Practice. New York, NY: Van Nostrand Reinhold Company, 1989.

[5] R. M. O'Keefe, "Simulation and expert system - A taxonomy and some examples," Simulation, vol. 46(1), pp. 10-16, 1988.

[6] D. A. Waterman, A Guide to Expert Systems. Reading, MA: Addison-Wesley, 1986.

[7] T. Cooper and N. Wogrin, Rule-based Programming with OPS5. San Mateo, CA: Morgan Kaufmann Publishers, Inc., 1988.

[8] V. E. Barker and D. E. O'Connor, "Expert systems for configuration at Digital: XCON and beyond," Commun. ACM, vol. 32(3), pp. 298-310, 1989.
