IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. BME-25, NO. 4, JULY 1978


Stochastic Models for Multistage Cell Classification Systems

JAMES L. CAMBIER, MEMBER, IEEE, AND LEON L. WHEELESS, JR., SENIOR MEMBER, IEEE

Abstract-Probabilistic models for multistage cell classification systems are described. A simple finite Markov chain models classification events which occur as a cell passes through the system. The state space consists of various identities assigned to the cell, including true cell type and identities assigned by classifiers. Effects of throughput rate, data buffer capacity, and classifier processing rate on system performance are predicted by another model composed of a network of single server queues. Markov and queue models are interrelated in that classification events at one processor (modeled by the Markov chain) govern arrival rates of other processors. In turn, the queue model predicts the probability that a cell will be missed due to finite data buffer capacity. The miss event is modeled by the Markov chain as a possible classification outcome. Application of the models is illustrated for a multistage gynecologic flow prescreening system with slit-scan processing in the first stage and two-dimensional image processing in the second. Results predict system sensitivity as a function of first stage false alarm rate and abnormal cell occurrence rate.

Manuscript received March 22, 1977; revised September 13, 1977. This work was supported by the National Cancer Institute under Contract NO1-CB-33862. The authors are with the Cytopathology Automation Division, Department of Pathology, University of Rochester Medical Center, Rochester, NY 14642.

INTRODUCTION

RESEARCH efforts in automated prescreening for gynecologic cancer and its precursors have primarily been devoted to single stage cell classification systems employing a single feature set extracted from all cells processed. These include high resolution slide-based systems [1]-[4], zero resolution flow systems [5]-[8], and medium resolution flow systems [9]. Results of a recent study on a slit-scan flow system [10], [11] indicate that a combination of low resolution first stage and two-dimensional higher resolution second stage processing may be necessary to achieve desired system sensitivity to abnormal cells while maintaining acceptable false negative rates. A multistage slide-based system has been proposed by Poulsen and Marshall [4].

Design of a multistage prescreening system requires consideration of many factors. Error rate performance of individual classifiers must be weighed against cost and processing time. Processing time and data buffer capacity govern maximum cell throughput rates which individual classifiers can tolerate.

This paper presents probabilistic models which may be applied to multistage classification systems. The models may be used to predict overall system performance as a function of error and throughput rates of individual classifiers and the decision rules governing passage of cells through the system.

Applicability of the models is very general. They may be used to compare various configurations of classifiers and study effects of specimen composition or other factors on overall system performance. Model structure is presented for a multistage flow prescreening system suggested by Wheeless et al. [11]. Performance of such a system is examined for various assumptions concerning specimen composition and classifier error rates.

MODEL STRUCTURE

The multistage classification system gives rise to two separate models. A simple finite Markov chain models classification events which occur as a cell passes through the system. True cell identity is reflected by transitions from an initial state; subsequent states represent tentative cell identities assigned by various classifier stages, and final classification is represented by a terminal state. Classification error rates, conditioned on true cell identity, define Markov transition probabilities. Overall system error rates are calculated as multistep transition probabilities.

Dynamics of system operation are represented by a second model composed of an array of finite single server queues. Cells entering the system arrive according to a probabilistic process; each classifier extracts data from arriving cells, placing it in a buffer having known capacity. Processing time required to extract features from incoming data and classify each cell is described probabilistically, and the classification decision determines whether or not the cell must be processed by a subsequent stage. Data buffer capacity assumes an important role in this model, since a cell arriving at a classifier whose data buffer is full cannot be processed. The cell data, as a result, are usually lost. This miss event comprises one possible result of an attempted classification, and it may be incorporated into the Markov chain model.

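
The multistep calculation can be illustrated with a minimal Python sketch. The five-state chain below is a hypothetical single-classifier example, not the chain used for the prescreening system, and the probabilities (prior of 0.01, sensitivity 0.9, false alarm rate 0.05) and function names are assumptions chosen only for illustration. The overall alarm rate is read off as a two-step transition probability from the initial state.

```python
# Hypothetical chain, for illustration only.
# States: 0 = cell enters, 1 = truly abnormal, 2 = truly normal,
#         3 = classified suspicious (terminal), 4 = classified normal (terminal).
STATES = 5


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, steps):
    """Return the `steps`-step transition matrix (p multiplied by itself)."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, p)
    return result


def build_chain(prior_abnormal, sensitivity, false_alarm):
    """Build the one-step transition matrix of the hypothetical chain."""
    p = [[0.0] * STATES for _ in range(STATES)]
    p[0][1] = prior_abnormal          # a priori probability of an abnormal cell
    p[0][2] = 1.0 - prior_abnormal    # a priori probability of a normal cell
    p[1][3] = sensitivity             # abnormal cell classified suspicious
    p[1][4] = 1.0 - sensitivity       # abnormal cell classified normal
    p[2][3] = false_alarm             # normal cell classified suspicious
    p[2][4] = 1.0 - false_alarm       # normal cell classified normal
    p[3][3] = 1.0                     # terminal states return to themselves
    p[4][4] = 1.0
    return p


chain = build_chain(prior_abnormal=0.01, sensitivity=0.9, false_alarm=0.05)
two_step = mat_pow(chain, 2)
overall_alarm_rate = two_step[0][3]   # start -> classified suspicious in 2 steps
print(overall_alarm_rate)
```

Under these assumed values the two-step probability from state 0 to state 3 is 0.01 x 0.9 + 0.99 x 0.05 = 0.0585, the sum over true cell types of the prior times the conditional alarm probability.
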
Analytic solutions for ergodic queue length distributions [13] are difficult if not impossible to obtain when arrival and service processes are non-Markovian. Hence, modeling studies for cell classification systems were performed using computer simulation to obtain queue size distributions and missed cell probabilities.

MODEL FOR TWO STAGE FLUORESCENCE-BASED FLOW PRESCREENING SYSTEM

Application of modeling techniques will be illustrated for the multistage prescreening system shown in Figure 1. Cells stained with acridine orange, a fluorescent dye, flow through a thin wall of laser excitation illumination in the first stage. One dimensional ("slit-scan") fluorescence contours are collected and transferred to the first stage data buffer Q1, which has capacity K1. For cells uniformly distributed in a specimen volume sufficient to provide for low cell coincidence rates at the measurement region, it is reasonable to assume that criteria for a Poisson arrival process [12] are met. Cell arrivals at the first stage are considered Poisson with rate λ. Time required for first stage feature extraction and classification is service time ST1. ST1 is a random variable with mean 1/μ1, where μ1 is first stage service rate. An empirical first stage service time distribution obtained from analysis of approximately 700 slit-scan contours was used (Figure 2). Time required to collect slit-scan contours and transfer them to the data buffer is negligible, since samples are stored in computer memory as fast as they are collected.

0018-9294/78/0700-0368$00.75 © 1978 IEEE

Figure 1 Queue model for multistage prescreening system. [Diagram: flow arrivals at rate λ enter the first stage (detector, buffer Q1, processor); alarms at rate pλ pass to the second stage (camera Q2 with service time ST2, buffer Q3, processor with service time ST3).]

Slit-scan contours are processed on a first-in, first-out basis and cells classified suspicious in the first stage are referred to the second stage for additional processing. Cells classified suspicious are termed alarms; they may be false alarms (normal cells erroneously classified suspicious) or true alarms (abnormal cells correctly classified suspicious). Variable p will be called the first stage alarm rate. Since any randomly chosen cell has probability p of producing a first stage alarm, second stage arrivals constitute a Poisson process with rate pλ.

The second stage detector utilizes a very short laser illumination pulse to capture the cell image on a high sensitivity vidicon camera tube. The digital image is transferred to a computer for second stage processing. Time required to transfer image data from camera to computer is not negligible; hence the camera itself is considered a single server queue (Q2) with capacity one and constant service time ST2 (equal to 1/μ2). Cells arriving for second stage processing when the camera queue is not empty are missed.
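
The camera queue just described can be sketched in a few lines of Python. This is only an illustration, not the simulation program used in the study: it assumes the alarm stream is exactly Poisson, as stated above, and ignores any effect of the first stage queue on alarm timing. The function names are hypothetical. The numerical values in the example (0.04 arrivals per millisecond, a camera service time of 100 ms) are those quoted in the Results section of the paper, and the alarm rate of 0.1 is chosen for illustration. For a capacity-one queue with Poisson arrivals, the simulated miss fraction can be checked against the standard single-server loss formula a/(1 + a), where a is the alarm rate multiplied by the service time.

```python
import random


def simulate_camera_queue(alarm_rate, service_time, n_alarms, seed=0):
    """Fraction of alarms missed by a capacity-one queue with constant service time.

    alarm_rate   -- Poisson rate of first stage alarms (alarms per unit time)
    service_time -- constant camera service time (same time unit)
    n_alarms     -- number of alarms to simulate
    """
    rng = random.Random(seed)
    clock = 0.0          # arrival time of the current alarm
    busy_until = 0.0     # time at which the camera becomes free again
    missed = 0
    for _ in range(n_alarms):
        clock += rng.expovariate(alarm_rate)   # exponential gaps give Poisson arrivals
        if clock < busy_until:
            missed += 1                        # camera occupied: the cell is missed
        else:
            busy_until = clock + service_time  # camera accepts the cell
    return missed / n_alarms


def loss_probability(alarm_rate, service_time):
    """Single-server loss formula a / (1 + a), with offered load a."""
    offered_load = alarm_rate * service_time
    return offered_load / (1.0 + offered_load)


# Example: p = 0.1 (assumed), arrival rate 0.04 per ms, camera service time 100 ms.
p_alarm = 0.1
arrival_rate = 0.04
camera_time = 100.0
simulated = simulate_camera_queue(p_alarm * arrival_rate, camera_time, 200_000, seed=1)
predicted = loss_probability(p_alarm * arrival_rate, camera_time)
print(simulated, predicted)
```

The closed form is useful mainly as a sanity check on a simulation; it does not cover the second stage processor queue, whose arrivals are the cells that the camera accepts.
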
Second stage image processing time ST3 is assumed to have a negative exponential distribution with rate μ3, so the mean of ST3 is 1/μ3. Q3 has capacity K3.

The Markov chain model for the system is presented in Figure 3. Each state represents an identity assigned to the cell, and a transition from state i to state j occurs with probability Pij. Transitions from state 0 represent a priori cell type; thus P01 is the probability that a randomly chosen cell is abnormal. States 3-9 are identities assigned by classifiers; P25, for example, is the probability that a normal cell is classified suspicious in the first stage. True cell types are abnormal and normal, with the latter category including possible false alarm causes [11] such as binucleate cells, cellular and noncellular artifact, improperly oriented cells, etc. Cells are classified, in the first stage, as normal or suspicious, with all suspicious cells constituting first stage alarms. These cells may be classified normal or suspicious or be missed in the second stage.

Figure 2 Empirical service time distribution for processing of slit-scan contours. ST is in milliseconds. [Horizontal axis: ST, 0 to 20.]

Figure 3 Markov chain for multistage prescreening system. [Diagram: true cell types (abnormal, normal), first stage output classes (normal, suspicious), and second stage output classes (missed, suspicious, normal).]

RESULTS AND DISCUSSION

Simulation studies conducted with the model were intended to demonstrate effects of camera and second stage processing times and first stage alarm rate p on miss rates at queues Q2 and Q3. M2 is the miss rate at Q2, the camera queue, and M3 is the miss rate at Q3, the second stage processor queue. Mean first stage service time was set at 10 units (μ1 = 0.1) and arrival rate λ at 0.04, for a first stage traffic intensity ρ = λ/μ1 equal to 0.4. First stage queue size K1 was set equal to 6; this resulted in no significant number of misses at this stage. A more practical interpretation of these rates may be realized by considering time to be measured in milliseconds; mean ST1 is then 10 ms and system throughput rate is 40 cells/s. Typical values for μ2 are 0.01 and for μ3, 0.001 and 0.002. Corresponding mean service times for Q2 and Q3 are 1/μ2 (100 ms) and 1/μ3 (1 and 0.5 s), respectively.

Values assumed for service rates μ1, μ2, and μ3 and arrival rate λ are typical of a prototype instrument used to establish feasibility of the slit-scan flow system approach [9]. Tenfold throughput rate increase is anticipated in a final instrument design, requiring dedicated digital and analog-digital hybrid processors to effect higher service rates. Simulations of increased throughput rates reported elsewhere [14] indicated
that a tenfold speed increase may produce unacceptable missed cell rates unless second stage detection and processing times can be decreased proportionately, or first stage false alarm rate reduced to below 0.1 percent, or a combination thereof.

The queue simulation program was run for μ1 = 0.1, μ2 = 0.01, μ3 = 0.001 and 0.002, and K3 = 6 with results depicted in Figure 4. Miss rates were higher for the camera stage (M2) than the second stage image processor (M3) for μ3 = 0.002. First stage alarm rate p, as stated earlier, is the proportion of arriving cells which produce first stage alarms. Thus, referring to Figure 3,

$$p = P_{01} P_{14} + P_{02} P_{25}. \tag{1}$$

For each value of p, the simulation was allowed to run continuously until at least 150 alarms had occurred. Corresponding sample sizes ranged from 20,000 for p = 0.1 to 32,000 for p = 0.005. This assured reasonably small statistical variation in miss rates, which were calculated as

$$P(M_1) = M_1/\mathrm{TA}, \qquad P(M_2) = M_2/\mathrm{TAL}, \qquad P(M_3) = M_3/(\mathrm{TAL} - M_2) \tag{2}$$

where

TA = total arrivals
TAL = total first stage alarms.

These rates were used to calculate P47 and P57 (Figure 3):

$$P_{47} = P_{57} = P(M_2) + P(M_3)\,(1 - P(M_2)).$$

Miss rates were identical for normal and abnormal cells since the miss event depends only on queue state in the second and third stages.

Figure 4 Miss rates M2 and M3 versus first stage alarm rate p for multistage prescreening system. Ninety percent confidence intervals are indicated by vertical bars.

Figure 4 illustrates dependence of miss rates on alarm rate. Effects of missed cell rates on overall system operation are more clearly illustrated by plots of expected second stage true and false alarm rates as a function of first stage false alarm rate P25. P25 is an important first stage performance measure, since it indicates the degree to which the first stage "enriches" the cell population; that is, increases the expected proportion of abnormal cells in the population input to the second stage. From the Markov chain:

$$\text{True alarm rate} = P_{TA} = P_{01} P_{14} P_{48} \tag{3}$$

$$\text{False alarm rate} = P_{FA} = P_{02} P_{25} P_{58} \tag{4}$$

As miss rates P47 and P57 increase, P48, P49, P58 and P59 must decrease since cells are less likely to reach the second stage processor. This decrease may be formulated as follows. Assume that v and w are second stage true alarm and false alarm rates, respectively. Then for non-zero miss rates

P48 = P[abnormal cell is not missed and is classified suspicious in second stage]
    = P[abnormal cell is classified suspicious in second stage] P[it is not missed],

since the two events are independent, and

$$P_{48} = v(1 - P_{47}). \tag{5a}$$

Likewise

$$P_{49} = (1 - v)(1 - P_{47}) \tag{5b}$$

$$P_{58} = w(1 - P_{57}) \tag{5c}$$

$$P_{59} = (1 - w)(1 - P_{57}). \tag{5d}$$

Combining Equations 5a-d with 3 and 4:

$$P_{TA} = P_{01} P_{14}\, v\,(1 - P_{47}) \tag{6}$$

$$P_{FA} = P_{02} P_{25}\, w\,(1 - P_{57}). \tag{7}$$

Figures 5-8 illustrate PTA and PFA as a function of P25 for various P01. Values assumed for Markov transition probabilities were:

P14 = 1.0
w = 0.1
v = 0.9.

Miss rates P47 and P57 were calculated from M2 and M3 plotted in Figure 4 for μ2 = 0.01, μ3 = 0.002. Second stage classifier rates w and v equal to 0.1 and 0.9 represent a worst-case condition, since these values are based on typical performance expected of binucleate cell recognition techniques reported elsewhere [14]. Most other first stage false alarms should be easier to recognize than those from binucleate cells, which occur rarely compared to other false alarm causes. Error rates expected of a practical first stage slit-scan processor include near zero missed positive cells [10]; hence P14 is set to 1.0.

Figures 5-8 reflect decreases in true alarm rate as first stage false alarm rate P25 increases. These are due to increased "loading" of the second stage processor by additional first stage alarms, and consequent increased miss rates, which reduce the number of true alarms reaching the second stage. This reduction is not highly significant, especially in relation to the wide variation of false alarm rate PFA observed for the same P25 range. Overall system performance depends on

Figure 5 True alarm (PTA) and false alarm (PFA) rates versus first stage false alarm rate (P25) for abnormal cell occurrence rate (P01) equal to 10^-4 (0.01 percent).

Figure 6 True alarm (PTA) and false alarm (PFA) rates versus first stage false alarm rate (P25) for abnormal cell occurrence rate (P01) equal to 10^-3 (0.1 percent).

Figure 7 True alarm (PTA) and false alarm (PFA) rates versus first stage false alarm rate (P25) for abnormal cell occurrence rate (P01) equal to 10^-2 (1 percent).

Figure 8 True alarm (PTA) and false alarm (PFA) rates versus first stage false alarm rate (P25) for abnormal cell occurrence rate (P01) equal to 10^-1 (10 percent).
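
Single points on curves of the kind captioned above can be computed directly from Equations 1, 6, and 7. The Python sketch below is illustrative only. It assumes P02 = 1 - P01 (every cell is either abnormal or normal) and uses the values P14 = 1.0, v = 0.9, w = 0.1 stated in the text. The miss probabilities passed in the example (0.2 at the camera, 0.05 at the second stage processor) are hypothetical placeholders, since the values read from Figure 4 are not reproduced here, and the function names are assumptions.

```python
def combined_miss(p_m2, p_m3):
    """P47 = P57: probability a first stage alarm is missed at Q2 or, if not, at Q3."""
    return p_m2 + p_m3 * (1.0 - p_m2)


def first_stage_alarm_rate(p01, p25, p14=1.0):
    """Equation 1, with P02 taken as 1 - P01 (assumption: two true cell types)."""
    return p01 * p14 + (1.0 - p01) * p25


def alarm_rates(p01, p25, p_miss, p14=1.0, v=0.9, w=0.1):
    """Equations 6 and 7: overall true alarm and false alarm rates per cell."""
    p02 = 1.0 - p01
    p_ta = p01 * p14 * v * (1.0 - p_miss)   # abnormal, alarmed, not missed, confirmed
    p_fa = p02 * p25 * w * (1.0 - p_miss)   # normal, alarmed, not missed, confirmed
    return p_ta, p_fa


# Hypothetical miss probabilities, for illustration only.
p_miss = combined_miss(p_m2=0.2, p_m3=0.05)
p = first_stage_alarm_rate(p01=1e-3, p25=0.05)
p_ta, p_fa = alarm_rates(p01=1e-3, p25=0.05, p_miss=p_miss)
print(p_miss, p, p_ta, p_fa)
```

In a fuller calculation the miss probability would itself be a function of the first stage alarm rate p, obtained from the queue simulation, so that increasing P25 lowers PTA through the factor (1 - P47).
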

relative numbers of alarms expected from normal and abnormal specimens. Single cell false alarm and true alarm rates may be used to predict screening performance expressed as specimen false positive and false negative rates. Given a single cell false alarm rate PFA and true alarm rate PTA, it is assumed that all normal specimens have alarm rate PFA with classification of each cell an independent trial with probability of success PFA. Similarly, all abnormal specimens have alarm rate (PFA + PTA) with classification of normal and abnormal cells constituting independent trials with success probabilities PFA and PTA, respectively. The number of alarms expected from a normal specimen containing n cells may then be regarded as the total number of successes in n independent trials of an event with probability PFA. The number of alarms from an abnormal specimen is based on events with probability (PFA + PTA).

Figure 9 Specimen false positive versus false negative rate for first stage false alarm rate (P25) equal to 0.01 (1 percent) and abnormal cell occurrence rate (P01) equal to 0.0001 (0.01 percent), 0.0003 (0.03 percent), 0.0005 (0.05 percent), and 0.0007 (0.07 percent). [Horizontal axis: P(false negative).]

Figure 10 Specimen false positive versus false negative rate for first stage false alarm rate (P25) equal to 0.10 (10 percent) and abnormal cell occurrence rate (P01) equal to 0.0001 (0.01 percent), 0.001 (0.1 percent), 0.002 (0.2 percent), and 0.003 (0.3 percent). [Horizontal axis: P(false negative).]

The DeMoivre-Laplace limit theorem [15] then states that probabilities of events defined in terms of Sn, the number of alarms, approach in the limit as n → ∞ corresponding probabilities calculated from a normal N(0, 1) distribution where z is the reduced random variable

$$z = \frac{S_n - np}{\sqrt{npq}}.$$
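As a rough numerical illustration of the reduced variable z = (Sn - np)/sqrt(npq), the Python sketch below evaluates the normal approximation to the probability that the number of alarms reaches a given count, with p the single-trial success probability and q = 1 - p. The decision rule (calling a specimen positive when alarms reach a threshold), the specimen size, the probability, and the threshold are all assumptions made for the example; this section does not state the rule used to generate the published curves. No continuity correction is applied.

```python
import math


def reduced_variable(s_n, n, p):
    """z = (S_n - n p) / sqrt(n p q), with q = 1 - p."""
    q = 1.0 - p
    return (s_n - n * p) / math.sqrt(n * p * q)


def normal_cdf(z):
    """Cumulative distribution function of the N(0, 1) distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_alarms_at_least(threshold, n, p):
    """Normal approximation to P[S_n >= threshold]."""
    return 1.0 - normal_cdf(reduced_variable(threshold, n, p))


# Hypothetical specimen: 10,000 cells, single cell alarm probability 0.004,
# specimen called positive if 50 or more alarms occur (assumed rule).
n_cells = 10_000
p_alarm = 0.004
threshold = 50
print(prob_alarms_at_least(threshold, n_cells, p_alarm))
```

Applied with p = PFA this gives an approximate specimen false positive rate for a normal specimen; applied with p = PFA + PTA, one minus the result gives an approximate false negative rate for an abnormal specimen, under the same assumed threshold rule.
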

Here p is the probability of success of a single independent trial, and q = (1 - p). Screening performance predicted by the model for various proportions of abnormal cells present (P01) is plotted in Figures 9 and 10 for first stage false alarm rates P25 = 0.01 and P25 = 0.10, respectively.

Stated requirements for an automated prescreening instrument typically include ability to detect 1 abnormal cell in 10^3 normal cells (P01 = 0.001) while maintaining approximately twenty percent false positive and less than 0.1 percent false negative specimens. Results indicate that for P01 values greater than 0.001 (0.1 percent abnormal cells in the specimen) acceptable error rates can be achieved if P25 < 0.01. For P25 = 0.1, P01 must be approximately 0.002 or more for comparable performance.

Very low abnormal cell incidence rates, such as P01 = 0.0001 in Figures 9 and 10, result in poor screening performance for model parameters assumed thus far. Improved system sensitivity may be achieved with second stage false alarm rate w equal to 0.01 instead of 0.1. As stated previously, w = 0.1 represents a worst-case condition, and w = 0.01 is probably realistic when false alarm causes other than binucleate cells are considered. Figure 11 illustrates screening performance for various first stage false alarm rates (P25) when P01 = 0.0001 and w = 0.01. Acceptable performance is predicted for P25 equal to 0.005 (0.5 percent) or less. Curves for P25 equal to 0.01 or less were calculated from the Poisson approxi-
