Stochastic Simulation Algorithms for Query Networks

Steve B. Cousins, William Chen, Mark E. Frisse, Charles N. Mead

0195-4210/91/$5.00 © 1992 AMIA, Inc.

Abstract

One of the barriers to using belief networks for medical information retrieval is the computational cost of reasoning as the networks become large. Stochastic simulation algorithms allow one to compute approximations of probability values in a reasonable amount of time. We previously examined the performance of five stochastic simulation algorithms applied to four simple belief networks and found that the Self-Importance algorithm performed well. In this paper, we examine how the same five algorithms perform when applied to a belief network derived from the cardiovascular subtree of the Medical Subject Headings (MeSH). Both the Likelihood Weighting and Self-Importance algorithms perform well when applied to the MeSH-derived network, suggesting that stochastic simulation algorithms may provide reasonable performance in medical information retrieval settings.

Introduction

A belief network is a knowledge representation and inference technique formalized as a directed graph in which nodes represent probabilistic assertions and edges represent conditional probabilities between nodes [1]. Query networks are belief networks in which nodes represent index terms or concepts and edges represent the degree of belief in the relevance of a concept represented by a distal node given the relevance or irrelevance of the concept represented by a proximal node. We have been exploring the use of query networks as user models and domain models in medicine, and have created query networks from the terms in MeSH as well as by hand. In this paper, we concentrate on inference with query networks; for more details on using query networks for information retrieval, see [2,3]. The computational complexity of updating query networks threatens the practicality of query network-based information retrieval; exact calculation is computationally intractable in the general case [4].
In some domains, approximation algorithms have proved sufficiently powerful to allow for the routine use of belief networks, and we hypothesize that similar approximation algorithms may be applicable to information
retrieval problems. To test this hypothesis, we implemented a collection of previously-defined belief network stochastic simulation algorithms; we call our software CABeN (Collection of Algorithms for Belief Networks). CABeN implements an Equiprobable Sampling algorithm, the Likelihood Weighting algorithm (also called the Basic algorithm), the Self-Importance algorithm [5], Pearl's Markov simulation algorithm [1], and Chavez's BNRAS algorithm [6,7]. CABeN consists of a library of simulation algorithms, a character-based user interface, and several sample programs. It is written in ANSI C and currently runs under MS-DOS and SunOS (UNIX). Each algorithm can be run with or without a Markov blanket scoring option [8]. (When Markov blanket scoring is used, node probabilities are calculated on the basis of a node's "local neighborhood" rather than solely on the basis of a node's parents.) This paper presents a simplified description of the simulation algorithms. A more rigorous treatment can be found in a paper by Shachter and Peot [5].
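As a concrete illustration of Markov blanket scoring, consider a boolean chain A → B → C: B's Markov blanket is {A, C}, so B's local distribution depends only on p(B|A) and p(C|B). The sketch below computes p(B | A, C) up to normalization; the probability values and function names are illustrative assumptions, not CABeN's actual interface.

```c
/* Sketch of Markov blanket scoring for the middle node B of a boolean
 * chain A -> B -> C.  B's Markov blanket is {A, C}; conditioned on both
 * neighbors, p(B | A, C) is proportional to p(B | A) * p(C | B).
 * The probability values and function names are illustrative assumptions,
 * not CABeN's actual interface. */

static double p_b_given_a(int a) { return a ? 0.80 : 0.05; }  /* p(B=TRUE | A=a) */
static double p_c_given_b(int b) { return b ? 0.80 : 0.05; }  /* p(C=TRUE | B=b) */

/* Probability that B is TRUE given the observed states of A and C. */
double markov_blanket_score_b(int a, int c)
{
    double lc_if_b1 = c ? p_c_given_b(1) : 1.0 - p_c_given_b(1); /* p(C=c | B=TRUE)  */
    double lc_if_b0 = c ? p_c_given_b(0) : 1.0 - p_c_given_b(0); /* p(C=c | B=FALSE) */
    double w_true   = p_b_given_a(a) * lc_if_b1;
    double w_false  = (1.0 - p_b_given_a(a)) * lc_if_b0;
    return w_true / (w_true + w_false);                          /* normalize */
}
```

Scoring from the blanket lets a simulation trial use information from a node's children as well as its parents, rather than from the parents alone.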
Belief Networks

Belief networks are based on the firm theoretical foundation of probability theory. Inference in belief networks is performed by propagating the effects of evidence; after evidence propagation, the posterior marginal probability of node X (herein simplified to "probability of X") corresponds to p(X | E1 = e1, E2 = e2, ..., En = en), where Ei = ei indicates that evidence node Ei is in state ei. An example of belief network inference using a boolean network is shown in Figure 1A. The values within the nodes in Figure 1A indicate the initial probabilities of the variables represented by the nodes. Because the nodes are boolean, the numbers represent the conditional probability that the node state is TRUE. At initialization, evidence has not yet been provided to the network, and node posterior marginal probabilities are calculated from the prior probability of the "root" node (A) and the conditional probabilities between nodes. Evidence can be provided to a network in several ways. The most common method assigns state values to nodes (TRUE or FALSE in the boolean case). For example, if node C is fixed to state TRUE, the probabilities of the other nodes are recalculated with this new information (Figure 1B). As a result, the probabilities assigned to direct descendant and ancestor nodes have increased, while the values assigned to
[Figure 1 image; recoverable values: p(A) = 0.8, p(B|A) = 0.7, p(B|¬A) = 0.2, p(C|B) = 0.5; panels (A)-(D).]
Figure 1: A simple belief network (A) with its associated conditional probabilities and posterior marginal probabilities (on nodes), (B) with node C set to TRUE, (C) with both A and C fixed, and (D) with evidence given to C as an odds-likelihood ratio.

nodes "more remote" to C have changed only minimally. If the value of node A is now set to FALSE, the most prominent effects of propagation are seen in nodes B and D (Figure 1C). Any node in a network may have "evidence" nodes associated with it. Updating of node values occurs when evidence is provided to a node's "evidence" nodes. Evidence can be propagated by means of odds-likelihood ratios. Figure 1D represents the case where an "evidence" node C' is related to node C by a positive odds-likelihood ratio of 2:1. Note that the probability values for immediate neighbors of node C in the case where evidence node C' is set to TRUE are not as high as they are in the case where the value of node C itself is set to TRUE (Figure 1B).

Belief Network Simulation Algorithms

In networks in which the number of nodes is small and few nodes have multiple parents, it is possible to use algorithms which calculate the exact value of each node's updated probability [9,1]. Unfortunately, most realistic applications require networks in which there are large numbers of nodes and many nodes have large numbers of parent nodes. Often, these networks can only be solved by using approximation algorithms. To demonstrate how an approximation algorithm works, consider a very simple network with 3 boolean nodes, A, B, and C (Figure 2). This network has 2^3 = 8 states. Joint probability values for each node can be calculated as shown in Table 1. The probability value for any variable in any state can be calculated by summing over all rows in which the variable is assigned to the state. (Each row represents the value associated with a particular state assignment for every node in the network.)
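The summing-over-rows procedure can be sketched directly in C for this 3-node network, using the Figure 2 probabilities; this is a toy illustration of the exhaustive calculation, not CABeN's implementation.

```c
/* Exhaustive enumeration of the 2^3 = 8 joint states of the 3-node
 * network of Figure 2 (p(A)=0.5, p(B|A)=0.8, p(B|~A)=0.05,
 * p(C|B)=0.8, p(C|~B)=0.05).  A toy illustration of the full
 * state-table computation, not CABeN's implementation. */

/* Return p if state is TRUE, 1-p otherwise. */
static double cond(double p_true, int state)
{
    return state ? p_true : 1.0 - p_true;
}

/* Joint probability of one row of the state table. */
static double joint(int a, int b, int c)
{
    return cond(0.5, a) * cond(a ? 0.80 : 0.05, b) * cond(b ? 0.80 : 0.05, c);
}

/* p(A = TRUE): sum the four rows in which A is TRUE. */
double marginal_a_true(void)
{
    double sum = 0.0;
    for (int b = 0; b < 2; b++)
        for (int c = 0; c < 2; c++)
            sum += joint(1, b, c);
    return sum;   /* 0.32 + 0.08 + 0.005 + 0.095 = 0.5 */
}

/* p(A = TRUE | C = FALSE): restrict to rows with C FALSE, then normalize. */
double posterior_a_given_not_c(void)
{
    double num = 0.0, den = 0.0;
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++) {
            double p = joint(a, b, 0);
            den += p;
            if (a)
                num += p;
        }
    return num / den;   /* 0.175 / 0.63125, about 0.277 */
}
```

The loops visit every row of the state table, which is exactly the cost that grows exponentially with the number of nodes.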
For example, p(A = TRUE) is the sum of the first four rows, or 0.32 + 0.08 + 0.005 + 0.095 = 0.5. Evidence affects a variable's value by restricting the cases considered to those in which the evidence node is in the desired state
[Figure 2 image; conditional probabilities: p(A) = 0.5, p(B|A) = 0.8, p(B|¬A) = 0.05, p(C|B) = 0.8, p(C|¬B) = 0.05.]
Figure 2: A trivial 3-node belief network.

and then normalizing the resulting value. For example, if C is set to FALSE, we consider only the 4 rows where C is FALSE. In this case, the calculated value for A would be (0.08 + 0.095) / (0.08 + 0.095 + 0.005 + 0.45125) = 0.175 / 0.63125 ≈ 0.27723. Unfortunately, the sizes of belief network state tables grow exponentially as a function of the number of nodes in the network, so these exhaustive calculations become intractable for all but the simplest graphs. Simulation algorithms select a subset of states, calculate the value for the row associated with that subset of states, and use the calculated values to estimate the values that would be obtained if all rows were selected and calculated. The five approximation algorithms considered in this paper differ from one another primarily in the method employed to select rows. The simplest algorithm, Equiprobable Sampling, chooses rows at random. Each state subset (row) has an equal chance of selection. Although this selection method has the advantage of simplicity, it does not select rows (node states) efficiently and often produces inaccurate overall node probability value estimations. The assignment of states in the Basic (Likelihood Weighting) algorithm is weighted by the calculated probability value of a node. This strategy requires that the nodes be considered in graph order. For example, if the value of node A (Figure 2) has been set to FALSE, node B would be set TRUE 1 in 20 times: p(B|¬A) = 0.05; states where node B is set to TRUE
Table 1: Exhaustive list of joint probabilities for a trivial belief network.

A B C   Expression                Value
T T T   p(A) p(B|A) p(C|B)        0.32
T T F   p(A) p(B|A) p(¬C|B)       0.08
T F T   p(A) p(¬B|A) p(C|¬B)      0.005
T F F   p(A) p(¬B|A) p(¬C|¬B)     0.095
F T T   p(¬A) p(B|¬A) p(C|B)      0.02
F T F   p(¬A) p(B|¬A) p(¬C|B)     0.005
F F T   p(¬A) p(¬B|¬A) p(C|¬B)    0.02375
F F F   p(¬A) p(¬B|¬A) p(¬C|¬B)   0.45125
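The likelihood-weighting row selection described above can be sketched for this trivial network. The code below is an illustrative sampler, not CABeN's: it samples A and B in graph order and weights each trial by the probability of the evidence C = FALSE given the sampled value of B, estimating p(A = TRUE | C = FALSE).

```c
#include <stdlib.h>

/* Likelihood-weighting sketch for the Figure 2 chain A -> B -> C,
 * estimating p(A = TRUE | C = FALSE).  Non-evidence nodes are sampled
 * in graph order; each trial is weighted by the likelihood of the
 * evidence given its sampled parent.  An illustrative sampler, not
 * CABeN's code. */

static const double P_A      = 0.5;
static const double P_B_A[2] = { 0.05, 0.80 };  /* p(B=T | A=F), p(B=T | A=T) */
static const double P_C_B[2] = { 0.05, 0.80 };  /* p(C=T | B=F), p(C=T | B=T) */

/* Return 1 with probability p. */
static int flip(double p)
{
    return ((double)rand() / RAND_MAX) < p;
}

/* Estimate p(A = TRUE | C = FALSE) from n weighted trials. */
double lw_estimate(int n)
{
    double w_sum = 0.0, w_a_true = 0.0;
    for (int i = 0; i < n; i++) {
        int a = flip(P_A);                /* sample A from its prior      */
        int b = flip(P_B_A[a]);           /* sample B given the sampled A */
        double w = 1.0 - P_C_B[b];        /* weight: p(C = FALSE | B = b) */
        w_sum += w;
        if (a)
            w_a_true += w;
    }
    return w_a_true / w_sum;              /* exact answer is about 0.277  */
}
```

With a few hundred thousand trials the weighted estimate settles near the exact value of about 0.277 computed from the state table, while never enumerating all eight rows explicitly.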
[Figure image; panels labeled (A) FlatGraph and (B) (label illegible).]