American Journal of Epidemiology
Copyright © 1992 by The Johns Hopkins University School of Hygiene and Public Health. All rights reserved.

Vol. 136, No. 2 Printed in U.S.A.

Issues in Human Immunodeficiency Virus (HIV) Screening Programs

Unlike test sensitivity and specificity, the false positive and negative predictive values (the probabilities of mislabeling an individual being tested) depend heavily on the prevalence of human immunodeficiency virus (HIV) infection as well as on the quality of the test kit. A consequence of this dependence is that the false positive predictive value can reach a high magnitude, such as 0.9; that is, 90% of the positive tests are false. This raises many important issues pertaining to the current practice of HIV screening, such as how to control these misclassification errors, how to interpret test results, and how to estimate prevalence using test results. These issues are examined in detail here by considering the factors that dictate the quality of a screening program. Some real data examples are used to illustrate the importance of this consideration in designing programs to achieve the desired goals. The rationale behind the common two-step sequential protocol in HIV screening is examined to point out its limitations under practical situations. Finally, the use of entropy in evaluating the informativeness of a screening program is discussed. Am J Epidemiol 1992;136:244-55.

bias (epidemiology); HIV; information theory; predictive value of tests; sensitivity and specificity (epidemiology)

Xin M. Tu,1 Eugene Litvak,2 and Marcello Pagano1

The spread of the human immunodeficiency virus (HIV) has been devastating; it is estimated that already over a million persons in the United States have been infected with the virus (1, 2). While progress has been made in treating the deadly disease, a vaccine does not seem to be forthcoming in the near future. Serious steps have been taken by many state legislatures to reduce the number of new infections in order to control the epidemic of the acquired immunodeficiency syndrome (AIDS) (3, 4). An effective procedure for curbing the further spread of the virus is to screen individuals for identification of HIV infectious status. Since HIV is not detected directly through a screening test and the infectious status of an individual will probably not be known until some AIDS-defining symptoms are observed, labeling an individual as either infected or uninfected based on the test results always carries a certain amount of risk, which is characterized by the false positive and negative predictive values (the proportions of misclassified individuals with positive and negative test readings, respectively). What is probably unique to HIV screening is that the false positive predictive value can reach a high magnitude, especially when testing individuals from a low-risk population. For example, this false "alarming" probability can reach 0.9 when screening some low-risk populations; that is, 90 percent of the positive tests are in fact false. Such a high probability may not cause much concern in certain situations, such as testing blood samples for blood banking (even though a large number of blood samples may be falsely declared contaminated and wasted), whereas falsely labeling individuals applying for marriage licenses, pregnant women, health care workers, and patients admitted to the hospital as carrying the virus is certainly irresponsible and can have an enormous psychological and social impact on the individuals (5, 6). Another important consequence of the high false positive predictive value is that it may be impossible to estimate the prevalence for low-risk populations in the United States from screening programs. So, it is extremely important to formulate the goals of testing, to carefully design the screening programs to achieve the desired goals, and to interpret as well as to use the test results correctly. Formulating the goals of testing often depends on the application context, which varies from situation to situation. To design an appropriate screening program for a particular purpose and to correctly use the test results, however, requires an understanding of the characteristics that define the quality of a screening program. In this paper, we characterize the factors that determine and influence the false positive and negative predictive values, examine the quality of some screening programs that screen individuals from low-risk populations for HIV infection, and discuss an important consequence of the high false positive predictive value for estimating HIV prevalence. We also propose the use of entropy for determining the informativeness of a screening program.

Received for publication July 15, 1991, and in final form March 2, 1992. Abbreviations: AIDS, acquired immunodeficiency syndrome; ELISA, enzyme-linked immunosorbent assay; HIV, human immunodeficiency virus. 1 Department of Biostatistics, Harvard School of Public Health, Boston, MA. 2 Department of Health Policy and Management, Harvard School of Public Health, Boston, MA. Reprint requests to Dr. Marcello Pagano, Department of Biostatistics, Harvard School of Public Health, 677 Huntington Ave., Boston, MA 02115. This study was supported in part by grants from the National Institutes of Health (NIAID RO1-AI28076, T32-AI07358, and N01-AI-95030). The authors thank Dr. Paul Cleary, Dr. Marc Lallemant, and Dr. Rachel Royce for helpful discussions during the development of the paper.

QUALITY OF HIV SCREENING PROGRAMS

Before discussing the characteristics that define the quality of a screening program, we must first clarify the notion of the quality of a test kit and the quality of a test program. The quality of a test kit is measured by its precision in detecting the presence or absence of a disease, which is quantitatively described by the test sensitivity and specificity. The sensitivity refers to the probability of successfully detecting a disease, while the specificity refers to the probability of consistently rejecting the presence of a disease, when the test kit is repeatedly applied to groups of individuals with and without the disease, respectively, under similar conditions (7). Both the sensitivity and specificity can be written in probability notation as Se = Pr(T+ | D) and Sp = Pr(T− | Dc), where D and Dc denote the presence and absence of the disease; T+ and T− are the positive and negative test readings, respectively; and Pr(A | B) is the conditional probability of event A given event B. So, the sensitivity gives an estimate of the proportion of correct test readings when a test kit is applied to a group of individuals with the disease, and the specificity gives an estimate of the proportion of correct readings when the test kit is applied to a group of individuals without the disease; the complements of these quantities describe the false readings that arise from random errors under repeated use of the test kit. Note that these quantities do not depend on the prevalence of a disease, since both are conditional on either the presence or absence of the disease. A related notion in measuring the quality of a test program is the probability of correctly identifying an individual's disease status when a chosen test kit is applied to a population with a certain prevalence. Depending on the outcome of a test reading, the probability of correctly classifying an individual is given by Pr(D | T+) if the individual tests positive, or Pr(Dc | T−) if the individual tests negative. With these two classification probabilities, which are usually referred to as the positive and negative predictive values, abbreviated PPV and NPV, respectively (7), we can calculate the probabilities of misclassification associated with the two possible outcomes of a test, namely, the false positive and negative predictive values (abbreviated FPPV and


FNPV, respectively) by

FPPV = Pr(Dc | T+) = 1 − Pr(D | T+) = 1 − PPV,
FNPV = Pr(D | T−) = 1 − Pr(Dc | T−) = 1 − NPV.

The false positive predictive value gives an estimate of the proportion of misclassified individuals with positive tests, and the false negative predictive value, an estimate of the proportion of misclassified individuals with negative tests, when a test kit is applied to a population of a certain disease prevalence. Note that given the specifications of a test kit and the prevalence of a disease, these probabilities of misclassification can be calculated beforehand without actually testing the individuals (see below). This feature allows one to foresee the outcome of a test program and therefore select an optimal design for a given situation. For this reason, we use the term probability rather than proportion in reference to these misclassification errors below, since the calculation of the latter often implies the availability of the actual test results. Since a positive test is possible for an individual with or without the disease, the false positive and negative predictive values will depend on the prevalence of the disease as well as the sensitivity and specificity of the test kit. For example, if every individual in the population is infected, the false positive predictive value will be zero, while at the other extreme, if nobody is infected, this value will always be one, unless a perfect test kit is used. Although conceptually simple, this key difference between the quality of a test kit and that of a test program, unfortunately, is not always distinguished (5).

In addition to the random errors due to the limited reliability of a test kit, errors in HIV testing can also arise from several other sources. One such source, which characterizes HIV screening, is related to the design of the tests. The enzyme-linked immunosorbent assay test and the Western blot test are both designed to detect serum antibody produced by the immune system in response to protein components of the virus (8). In other words, neither of these tests directly detects HIV. As a consequence, an uninfected individual may test positive because of the presence of the antibody, or vice versa. This could arise, for example, when antibody-positive individuals are not infectious (9, 10), when infected individuals become too ill (in the late stage of AIDS) to produce antibody, or when infected individuals are tested during the latency period between infection and formation of antibody (11, 12). This type of error, which is difficult to quantify in a general context, may be minimized under a specific application by examining the medical history or symptoms of the individuals to be tested before the actual test, or, if one has the time and resources, by a culture. In this paper, however, we confine ourselves to the source of random errors as characterized by the test sensitivity and specificity. Given the sensitivity (Se) and specificity (Sp) of a test kit and the prevalence p = Pr(D) of HIV infection, the false positive and negative predictive values can be calculated using Bayes' formula (7, 13):

FPPV = 1 − PPV = (1 − p)(1 − Sp) / [(1 − p)(1 − Sp) + p Se],
FNPV = 1 − NPV = p(1 − Se) / [p(1 − Se) + (1 − p) Sp].

These probabilities are seen to depend on the prevalence of HIV infection, as mentioned earlier, as well as on the specification of the test kit in use. Note that for sequential (or serial) testing, where multiple tests are performed and each is contingent upon a positive result of the previous test (7), the above formulas are still valid upon substitution of the appropriate sensitivity and specificity. For a k-step sequential testing (a sequential testing involving k tests), under the assumption of independence and equal sensitivity and specificity among the tests
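As a numerical check on Bayes' formula above, the predictive values can be computed directly from Se, Sp, and p. The sketch below (our illustration in Python; the function name is ours, not part of any cited software) uses figures of the same order as the examples discussed in this paper.

```python
def predictive_values(se, sp, p):
    """Return (PPV, NPV, FPPV, FNPV) for a test with sensitivity `se` and
    specificity `sp`, applied to a population with prevalence `p`."""
    pr_pos = se * p + (1 - sp) * (1 - p)   # Pr(T+), the marginal positive rate
    ppv = se * p / pr_pos                  # Pr(D | T+)
    npv = sp * (1 - p) / (1 - pr_pos)      # Pr(Dc | T-)
    return ppv, npv, 1 - ppv, 1 - npv

# A low-risk population (p = 0.0001) with a good kit (Sp = 0.998):
ppv, npv, fppv, fnpv = predictive_values(se=0.975, sp=0.998, p=0.0001)
# fppv exceeds 0.9: most positive readings are false, as the text notes.
```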


(7), the sensitivity and specificity for the sequential testing, denoted Se(k) and Sp(k), respectively, can be determined by

Sp(k) = 1 − (1 − Sp)^k > Sp,

and

Se(k) = Pr(T1+, ..., Tk+ | D) = Se^k < Se.
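Under this independence assumption, the trade-off is easy to verify numerically: a k-step protocol raises specificity at the expense of sensitivity. The short sketch below (ours, with an illustrative single-test sensitivity of 0.975) reproduces two specificities quoted in the text: a single-test specificity of 0.95 gives roughly 0.998 for two steps and roughly 0.99999 for four.

```python
def sequential_specs(se, sp, k):
    """Sensitivity and specificity of a k-step sequential protocol in which
    an individual is labeled positive only if all k independent tests
    (each with the same se and sp) read positive."""
    se_k = se ** k              # all k tests must detect the infection
    sp_k = 1 - (1 - sp) ** k    # an uninfected person is labeled positive
                                # only if all k tests err
    return se_k, sp_k

se2, sp2 = sequential_specs(se=0.975, sp=0.95, k=2)   # sp2 = 0.9975
se4, sp4 = sequential_specs(se=0.975, sp=0.95, k=4)   # sp4 = 0.99999375
```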

Thus, the sensitivity will decrease while the specificity will increase under sequential testing. To explore the dependence on these parameters, we plotted the misclassification errors in figures 1 and 2 as functions of prevalence for several levels of sensitivity and specificity. Note that these different levels of sensitivity and specificity reflect the wide range of test specifications reported in some recent studies evaluating the quality of various HIV screening programs (14-16). The study in reference 16 reported an average specificity of 0.87 based on a survey of 30 national reference laboratories, while the study in reference 14 reported a specificity of 0.99999 under a sequential testing involving four tests, or a specificity of 0.95 for a single test, under the assumption of independence and equal sensitivity and specificity among the tests. Since a common protocol in many HIV screening programs is to perform a Western blot confirmatory test on individuals who test positive by enzyme-linked immunosorbent assay, we focus below on a sequential testing situation with two independent tests of equal sensitivity and specificity. In this case, assuming a specificity of 0.95 for a single test, a specificity of 0.998 can be achieved by this two-step sequential testing strategy. The qualitative difference between the two types of misclassification errors, as seen from the plots, is that the false positive predictive value decreases while the false negative predictive value increases as a function of prevalence. A disturbing observation from figure 1 is that the false positive predictive value can come close to one when prevalence is small. For example, for a prevalence of 0.0001, which is typical among low-risk populations (6), this value is 0.9 even with a specificity of 0.998. A practical implication of this high value in screening for HIV infection is that individuals who test positive will have to be subjected to other possible means of screening or be followed

FIGURE 1. The false positive predictive value (FPPV) as a function of prevalence, for a fixed value of sensitivity (Se = 0.975) and different levels of specificity (Sp = 0.85, 0.95, 0.99, 0.998).


FIGURE 2. The false negative predictive value (FNPV) as a function of prevalence, for a fixed value of specificity (Sp = 0.99) and different levels of sensitivity (Se = 0.90, 0.95, 0.99, 0.998).

for visible AIDS-defining symptoms, since labeling an individual with an error of 90 percent is certainly not acceptable under any circumstances. Also, as shown in figure 1, the false positive predictive value is very sensitive to specificity and is virtually independent of sensitivity when the latter varies in the interval between 0.96 and 0.99 (not shown). So, one possible way to control this error under the limited precision of a test kit is to increase the specificity by using a k-step sequential testing strategy. This method can be helpful in general. For example, we can reduce this probability from 0.12 to less than 0.01 by increasing the specificity from 0.985 to 0.998 when prevalence is about 0.1. However, such a reduction may not always be sufficient, especially in screening individuals from low-risk populations. For example, if the prevalence in the above example is decreased to 0.001, the false positive predictive value will change from 0.9 to 0.5. With a false positive predictive value of 0.9, that is, when 90 percent of the positive results are false, we would probably label an individual who tests positive as uninfected. However, with a false positive predictive value of 0.5, that is, when half of the positive tests are false, we would have to toss a fair coin to decide on the infectious status of the individual. This phenomenon leads to the notion of informativeness of a test, which is further discussed in a later section.

A protocol routinely implemented in many screening programs is two-step sequential testing: apply a secondary test, such as a Western blot, to individuals with a positive test to reduce the false positive predictive value. When screening individuals from low-risk populations, however, this false predictive value can still reach a high magnitude. For example, for a prevalence of 0.0001, this value can reach 0.9 even with a specificity of 0.998. However, this protocol may still be useful under some circumstances (see the discussion in a later section).

So far, we have focused on the false positive predictive value. The false negative predictive value, as seen from figure 2, is relatively small and stable compared with the false positive predictive value for the purpose of HIV screening (not for blood banking), where prevalence is currently less than 0.3 even for the high-risk populations. For example, with a test sensitivity of 0.99, this misclassification error is 0.000001 when prevalence is 0.0001, and it is 0.004 when prevalence is 0.3. As shown in figure 2, this error may be reduced by increasing the test sensitivity, even though the effect may not be dramatic, especially for small prevalence. The magnitude of this misclassification error becomes virtually independent of test specificity when the latter varies between 0.95 and 0.998 (not shown in the figure).

ESTIMATING PREVALENCE

High false positive predictive values also raise important questions as to how to correctly estimate prevalence using test results from screening programs. A caveat is that estimating prevalence by simply dividing the number of positive tests by the sample size can lead to severe bias. As a matter of fact, it may be impossible to reliably estimate prevalence using screening programs, especially for low-risk populations. To see this, let n+ denote the number of individuals who test positive when screening tests are applied to a population of size n. Then, an unbiased estimate of prevalence with minimal variance maximizes the following likelihood function:

L = f(p)^n+ [1 − f(p)]^(n − n+),

where

f(p) = Pr(T+) = Pr(T+ | D)p + [1 − Pr(T− | Dc)](1 − p)
     = Se p + (1 − Sp)(1 − p)
     = (1 − Sp) + [Se + Sp − 1]p,

and Se and Sp denote the sensitivity and specificity, respectively. The solution of p to this maximization, denoted p̂, is easily obtained (17), yielding

p̂ = [(n+/n) − (1 − Sp)] / [Se − (1 − Sp)].   (1)

It follows from equation 1 that p is estimable, or a solution between 0 and 1 exists, only if

(n+/n) − (1 − Sp) > 0.   (2)

(We assume that the denominator in equation 1 is positive.) So, the test specificity is seen to play a critical role in the estimation of prevalence, and the solution for p exists unconditionally only if Sp = 1. Since the expected value E(n+/n) = f(p) ≥ 1 − Sp, where the equality holds only when p = 0, the only possible source of violation of the estimable condition (2) is the sampling variation in n+. So, the probability Pr[(n+/n) > (1 − Sp)] provides a means to characterize the estimable condition under the sampling variation. To calculate this probability, we note that n+ follows a binomial distribution Bi(n, f(p)), with n and f(p) being the sample size and the probability of success. For a large sample size n, the calculation of this binomial probability may be facilitated by using either the normal approximation, if f(p) is not close to 0, or the Poisson approximation

n+ ~ Poi[λ = n f(p)],

with λ being the mean of the Poisson distribution (13). This probability for a sample size of 10,000 is plotted in figure 3 as a function of prevalence. It is clear from the figure that this probability is very sensitive to specificity. When prevalence is about 0.0001, even with a specificity of 0.99999, we only have Pr[(n+/n) > (1 − Sp)] ≈ 0.6.

FIGURE 3. The probability Pr[(n+/n) > (1 − Sp)] characterizing the estimable condition, for a sample size of 10,000 with a test sensitivity of 0.97 and different levels of specificity (Sp = 0.85, 0.95, 0.998, 0.99999).

So, if one uses results from screening programs, estimation of prevalence for low-risk populations can be very problematic. This case is similar to measuring an object with an inaccurate measuring device when the measurement errors exceed what is being measured. It is also seen from equation 1 that p̂ ≈ n+/n only when 1 − Sp is negligible compared with n+/n. Estimating p by n+/n can lead to severe bias if n+/n and 1 − Sp are close in magnitude. Shown in table 1 are some hypothetical examples comparing p and n+/n.

TABLE 1. Comparison of prevalence p and its estimate n+/n* for a test with sensitivity of 0.98 and specificity of 0.998

p        n+/n
0.0001   0.0021
0.0005   0.0025
0.001    0.003
0.005    0.0069
0.010    0.012
0.050    0.051
0.1      0.1

* n+, number of positive tests; n, number of subjects tested.

Note that when n+/n is 0.0021, which is very close to 1 − Sp, n+/n overestimates the prevalence by almost 20 times. As a real data example, the New York State Health Department's Newborn Screening Program reported that 82 of 108,562 mothers residing in upstate rural areas, who had given birth between November 1987 and March 1990, tested positive, a ratio of n+/n = 0.08 percent (18, 19). Note that these data exclude the equivocal results, which amount to about 0.03 percent. To estimate the prevalence in this population group, the specificity would have to reach 0.9992 for the estimable condition

(2) to hold, which, as indicated in the previous section, can hardly be accomplished using a two-step sequential testing. So, it is quite likely that most of these positive tests are a result of the high false positive predictive value.

INFORMATIVENESS OF SCREENING TESTS

The goal of screening in most cases is to find the subjects under testing that meet certain criteria. In this section, we confine ourselves to such situations and discuss how to determine the informativeness of a test. If the objective of a screening test is to find subjects that meet certain criteria, such as HIV infectious status, then a test is useful or informative only when such a test improves one's knowledge about each subject tested regarding the criteria. In general, a test will always provide some additional information, but such information may be irrelevant or even have an opposite effect on the desired outcome. For example, as shown above, an HIV screening test may make it more difficult to label an individual who tests positive while the test is still informative overall, since the test provides additional information regarding those who test negative. So, conditional on the criterion of HIV infection, the test is considered noninformative. On the other hand, if the goal or criterion is to find subjects that are not infected, as in testing blood samples for blood banking, for example, the test is informative, since it reduces our uncertainty about the samples that are not infected.

Note that a noninformative test may still be desirable in some situations. For example, without any test, we would probably label a randomly selected individual from a population with a prevalence of 0.0001 as uninfected. The false positive and negative predictive values associated with this labeling procedure are 0 and 0.0001, respectively. If we apply a test with a sensitivity of 0.99 and a specificity of 0.998 and label an individual with a positive test result as infected, the false positive predictive value would rise to 0.9, though the false negative predictive value would fall to 0.000001. So if the goal in this situation is to find individuals who are infected, then the test is certainly not informative, since it is hard to imagine labeling an individual as infected with such a high misclassification error. In practical situations, however, such a test may still be desirable. For example, if the test is applied to a population of 10,000, the number of individuals with positive test results can be expected to be 10,000 Pr(T+) ≈ 20. Even though labeling these individuals presents difficulty, following these people for visible AIDS-related symptoms may not be expensive. Furthermore, an individual from the subpopulation with negative test results is much less likely to be infected because of the reduced false negative predictive value. However, the usefulness of noninformative tests is very limited, and they should be avoided in general.

The uncertainty about the infectious status of an individual before and after a test T is described by the probabilities p = Pr(D) and PPV = Pr(D | T+), respectively. If everybody in a given population is infected or free of the virus, then p = 1 or p = 0, respectively, or simply δ(D) = |Pr(D) − Pr(Dc)| = |2p − 1| = 1. We are most uncertain about an individual's infectious status when p = Pr(D) = 0.5, or δ(D) = 0. In this case, a test T will be informative only if PPV = Pr(D | T+) ≠ 0.5, or, equivalently, δ(D | T+) = |Pr(D | T+) − Pr(Dc | T+)| = |2 PPV − 1| > 0. More generally, a test T is informative only when δ(D | T+) > δ(D).

Note that we discussed the notion of informativeness above purely in a context where no decision making is involved. In practice, even for an informative test, decisions regarding the use of such a test still depend on the situation at hand. For example, suppose that PPV = Pr(D | T+) = 0.7 after a test is applied to a population with a prevalence of 0.5. Even though the test is informative in classifying subjects who test positive, such a test is undoubtedly useless for labeling HIV infection because of the high misclassification error of 0.3 (FPPV = 1 − PPV). However, the study of this general principle will help practitioners to be aware of this problem and avoid carrying out tests that are noninformative.

To determine the informativeness of a test, we can use the quantities δ(D) and δ(D | T), as above, where T refers to the outcome of a test, namely, T+ or T−. However, a more established measure is the entropy, which has been widely used in characterizing the level of uncertainty in the physical sciences (20, 21). The entropy before a test T, denoted H(D), is defined by

H(D) = −Pr(Dc) log Pr(Dc) − Pr(D) log Pr(D).

Similarly, the entropy after the test T, conditional on the test outcome, is given by

H(D | T) = −Pr(Dc | T) log Pr(Dc | T) − Pr(D | T) log Pr(D | T).

Clearly, H(D) ≥ 0 and H(D | T) ≥ 0, since the log terms above are nonpositive. In terms of entropy, a test T is informative for classifying subjects only if H(D) − H(D | T+) > 0 for positive tests or H(D) − H(D | T−) > 0 for negative tests. Plotted in figures 4 and 5 are the quantities H(D) − H(D | T+), δ(D | T+) − δ(D), H(Dc) − H(Dc | T−), and δ(Dc | T−) − δ(Dc).

FIGURE 4. Differences H(D) − H(D | T+) (dashed line) and δ(D | T+) − δ(D) (solid line) as functions of prevalence, for a test kit with sensitivity of 0.975 and specificity of 0.99.

FIGURE 5. Differences H(Dc) − H(Dc | T−) (dashed line) and δ(Dc | T−) − δ(Dc) (solid line) as functions of prevalence, for a test kit with sensitivity of 0.975 and specificity of 0.99.
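The entropy comparison can be sketched numerically. The snippet below (our illustration in Python; the function names are ours) computes the entropy drop for a positive reading from a hypothetical kit, showing that at low prevalence a positive result from an imperfect kit can leave one more uncertain than before the test.

```python
import math

def entropy(q):
    """Binary entropy (natural log) of a probability q."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log(q) - (1 - q) * math.log(1 - q)

def informativeness_positive(se, sp, p):
    """Entropy drop H(D) - H(D | T+) after a positive reading; a negative
    value means a positive result leaves us MORE uncertain than before."""
    ppv = se * p / (se * p + (1 - sp) * (1 - p))   # Pr(D | T+) by Bayes' formula
    return entropy(p) - entropy(ppv)

# Low-risk population: a positive result is uninformative in this sense,
# while at p = 0.5 the same kit is highly informative.
low_risk = informativeness_positive(se=0.975, sp=0.99, p=0.0001)   # < 0
balanced = informativeness_positive(se=0.975, sp=0.99, p=0.5)      # > 0
```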

Issues in HIV Screening

Note that both H(D) - HWJT)

and

tially reduced by increasing the specificity of both tests. DISCUSSION

In this paper, we have discussed the dependence of the false positive predictive value on the prevalence of HIV as well as the factors that determine and therefore can be controlled to minimize this misclassification error in the context of a single and sequential testing situations. Most practitioners may be aware of this dependence. However, the seriousness of the problem caused by the high false positive predictive value does not seem to have been adequately addressed. Since prevalence plays a critical role in reducing classification errors, especially the false positive predictive value, it is important to incorporate any covariate information about prevalence into classification. For example, the status of intravenous drug use would help reduce the classification errors because of the difference in prevalence between drug users and non-drug users. Another effective strategy is to use the group sequential testing strategy to further increase the specificity (E. Litvak, X. M. Tu, and

Difference in entropy 10—1

0J6-

-Of

Sp=0.85 Sp=0.95 Sp=0.99 Sp=0.99B

-08 —

-L0—I OJD

OX

OA

0.6

0.8

Prevalence FIGURE 6. Difference in entropy H(D) - H{D 17+) for a fixed value of sensitivity (Se = 0.975) and different levels of specificity (Sp).

Downloaded from https://academic.oup.com/aje/article-abstract/136/2/244/131610 by Lancaster University user on 12 January 2019

5(D | T+) - b{D) as well as H{D) - H(D \ T) and 5{D \ T~) — 8{D) change sign at the same prevalence values, but the curves corresponding to the entropy are smooth whereas the others are not. The two plots clearly indicate the region of prevalence where the test is informative for a given test outcome. Since the current prevalence of HIV in the United States is smaller than 0.3 even among high-risk groups (6), the test is always informative for labeling uninfected individuals. Plotted in figure 6 are the H(D) H(D | T*) for four levels of specificity. As shown in the figure, we can substantially reduce the size of the region where the test is uninformative by increasing the test specificity. With the limited precision of a test kit, one way to achieve this is to use the sequential testing strategy as discussed earlier. Plotted in figure 7 are the H(D\ 7,+) H(D\ T,+, T2+) for three different combinations of sensitivity and specificity with two test kits in a two-step sequential testing situation (T\ followed by T2). As seen from the figure, the size of the region where the test is uninformative depends on the sensitivity and specificity of both tests and is substan-

253

254

Tu et al.

Difference in enfropy to-|

08-

0.0

FIGURE 7. Difference in entropy H(D I V ) - H(D | T,+, T2+) for different combinations of sensitivity and specificity with two test kits in a two-step sequential testing (T, followed by T2). The specifications of the kits for the first and the second tests are denoted by Se1 and Sp1, and Se2 and Sp2, respectively.

M. Pagano, Harvard School of Public Health, unpublished manuscript). This group sequential testing strategy also offers an advantage by simultaneously increasing the test sensitivity. Such a testing strategy not only dramatically increases both the predictive values but also leads to tremendous cost savings. An important consequence of low prevalence and a high false positive predictive value is that screening programs that use the common protocol of two-step sequential testing may be inappropriate for estimating prevalence, especially for low-risk populations. Severe bias can arise if one estimates the prevalence by simply dividing the number of positives by the total sample size. One possible solution is to design protocols using the group sequential testing procedures (E. Litvak, X. M. Tu, and M. Pagano, Harvard School of Public Health, unpublished manuscript), which reduce the false predictive values about 40 times on the average even for a prevalence of 0.0001 under the same precision of a single test. A common misconception about testing is that a test will always provide some additional information. Our analysis in the

previous section is intended to clarify this misconception and to emphasize the importance of having a clear and specific goal before implementing any test. However, despite this general principle, which should always be kept in mind when designing a test program, a noninformative test may still be necessary for HIV screening in some situations.

REFERENCES

1. Karon JM, Dondero TJ Jr. HIV prevalence estimates and AIDS case projections for the United States: report based on a workshop. MMWR 1990;39:1-31.
2. Angell M. A dual approach to the AIDS epidemic. N Engl J Med 1991;324:1498-1500.
3. Thomas C. A synopsis of state AIDS-related legislation: January to July 1987. Washington, DC: Intergovernmental Health Policy Project, George Washington University, 1987.
4. Cleary PD, Barry MJ, Mayer KH, et al. Compulsory premarital screening for the human immunodeficiency virus. JAMA 1987;258:1757-62.
5. Meyer KB, Pauker SG. Screening for HIV: can we afford the false positive rate? N Engl J Med 1987;317:238-41.
6. Barry MJ, Cleary PD, Fineberg HV. Screening for HIV infection: risks, benefits, and the burden of proof. Law Med Health Care 1986;14:259-67.
7. Fletcher RH, Fletcher SW, Wagner EH. Clinical epidemiology. 2nd ed. Baltimore, MD: Williams & Wilkins, 1988.
8. Essex M, Allan J, Kanki P, et al. Antigens of human T-lymphotropic virus type III/lymphadenopathy-associated virus. Ann Intern Med 1985;103:700-3.
9. Burger H, Weiser B, Robinson WS, et al. Transient antibody to lymphadenopathy-associated virus/human T-lymphotropic virus type III and T-lymphocyte abnormalities in the wife of a man who developed the acquired immunodeficiency syndrome. Ann Intern Med 1985;103:545-7.
10. Burger H, Weiser B, Robinson WS, et al. Transmission of lymphadenopathy-associated virus/human T-lymphotropic virus type III in sexual partners: seropositivity does not predict infectivity in all cases. Am J Med 1986;81:5-10.
11. Horsburgh CR, Jason J, Longini IM, et al. Duration of human immunodeficiency virus infection before detection of antibody. Lancet 1989;2:637-9.
12. McNeil JG, Brundage JF, Wann ZF, et al. Direct measurement of human immunodeficiency virus seroconversions in a serially tested population of young adults in the United States Army, October 1985 to October 1987. N Engl J Med 1989;320:1581-5.
13. DeGroot MH. Probability and statistics. 2nd ed. Reading, MA: Addison-Wesley, 1987.
14. Burke DS, Brundage JF, Redfield RR, et al. Measurement of the false positive rate in a screening program for human immunodeficiency virus infections. N Engl J Med 1988;319:961-4.
15. Ozanne G, Fauvel M. Performance and reliability of five commercial enzyme-linked immunosorbent assay kits in screening for anti-human immunodeficiency virus antibody in high-risk subjects. J Clin Microbiol 1988;26:1496-1500.
16. Jeremy JS, Snell E, Supran M, et al. World Health Organization quality assessment program on HIV testing. AIDS 1990;4:803-6.
17. Levy PS, Kass EH. A three-population model for sequential screening for bacteriuria. Am J Epidemiol 1970;91:148-54.
18. Novick LF. New York State HIV seroprevalence project: goals, windows, and policy considerations. Am J Public Health 1991;81(suppl):11-14.
19. Novick LF, Glebatis DM, Stricof RL, et al. New York State HIV seroprevalence study: methods and results. Am J Public Health 1991;81(suppl):15-21.
20. Harris B. Entropy. In: Kotz S, Johnson NL, eds. Encyclopedia of statistical sciences. Vol 2. New York: John Wiley & Sons, 1982:512-16.
21. Shannon CE, Weaver W. The mathematical theory of communication. Champaign, IL: University of Illinois Press, 1959.
