ORIGINAL ARTICLE

Estimating Community Prevalence of Ocular Chlamydia trachomatis Infection using Pooled Polymerase Chain Reaction Testing

Kathryn J. Ray1, Zhaoxia Zhou1, Vicky Cevallos1, Stephanie Chin1, Wayne Enanoria1,2, Fengchen Lui1,3, Thomas M. Lietman1,4,5,6, and Travis C. Porco1,4,5

1F.I. Proctor Foundation, University of California, San Francisco, CA, USA
2Division of Epidemiology, University of California Berkeley, School of Public Health, San Francisco, CA, USA
3School of Information Science and Technology, Donghua University, Shanghai, China
4Department of Ophthalmology, University of California, San Francisco, CA, USA
5Department of Epidemiology & Biostatistics, University of California, San Francisco, CA, USA
6Institute for Global Health, University of California, San Francisco, CA, USA

ABSTRACT

Purpose: Trachoma is the leading cause of blindness from infection worldwide. Treatment programs require accurate estimates of Chlamydia trachomatis infection prevalence to guide decision making. Clinical examination is by far the most common way to monitor activity, but may yield overestimates of infection prevalence. Laboratory testing of individual specimens, such as by polymerase chain reaction (PCR), is highly sensitive and specific, but prohibitively expensive. Here we use simulations to demonstrate that pooled PCR results may be used to estimate the infection prevalence of an entire community, yielding substantial cost savings if pool size is chosen correctly.

Methods: Community infection prevalence was estimated using maximum likelihood estimation with data collected from a previously described study. Simulations for communities were performed to determine the accuracy of prevalence estimation using pooled results. The root mean squared error was then used to determine an acceptable inaccuracy in estimates, allowing for a pooling strategy to be formed.

Results: Results from simulations and empirical data suggest optimum pooling strategies to estimate community infection prevalence while keeping the root mean squared error of the estimate below 2%. The reduction in PCR testing, which permits cost savings, is shown to be between 5 and 80% given a community infection prevalence below 60%.

Conclusions: Pooling specimens for PCR testing often provides enough data to accurately estimate infection prevalence at the community level.

Keywords: Chlamydial infection, polymerase chain reaction, pooling of data, trachoma prevalence

Received 26 March 2012; Revised 7 August 2013; Accepted 16 August 2013; Published online 24 February 2014

Correspondence: Travis Porco, F.I. Proctor Foundation, Department of Epidemiology and Biostatistics and Department of Ophthalmology, UCSF, San Francisco, CA 94143-0412, USA. Fax: +1 415 476 0527. E-mail: [email protected]

INTRODUCTION

Trachoma, an ocular disease which results from repeated Chlamydia trachomatis infection, remains one of the major causes of blindness in the world today.1 A global elimination program is underway based on mass administration of azithromycin, an antibiotic which treats the infection with high efficacy.2,3 Polymerase chain reaction (PCR) tests are the gold standard for detecting infection, yet performing such tests on individuals can be prohibitively expensive when conducting large studies.4 Thus, when infection prevalence drops to low levels in particular areas, control programs often rely on clinical signs alone to decide whether or not to continue antibiotic treatments. Unfortunately, such clinical signs are not a reliable indicator of infection, and the resulting crude estimates of the prevalence may not be adequate for program guidance.5–9 Performing PCR tests on pooled specimens is less expensive than testing individual specimens and allows for more accurate prevalence estimates than the use of clinical signs.10

In the context of infectious disease, pooling specimens has two uses: (1) prevalence estimation, and (2) case identification.11 Unpooling and re-testing the individual specimens of positive pools is not necessary here because everyone in the community will be given antibiotic treatment. The pooled results are used explicitly to estimate the prevalence of infection in a community.

To make effective use of specimen pooling, guidance is needed regarding how many specimens should comprise a pool. Large pools consisting of many specimens are less expensive, but may provide less information. Moreover, it is not always possible for all pools to be formed from the same number of specimens. In this report, we examine a general method of estimation applicable to arbitrary pooling, with no requirement of equal pool sizes. When using pooled PCR results to estimate community prevalence, overestimation often occurs when all pools test positive. Here, we compute the probability of all pools testing positive for differing levels of prevalence, pool sizes, and total number of specimens. We form pooling strategies by determining the optimal pool size based on projected prevalence for community sizes of 50–200, as well as when pools must be divided into smaller subsets. Finally, we estimate the reduction in PCR tests that pooling strategies can provide, which in turn provides cost savings.

MATERIALS AND METHODS

In a previously described study, Trachoma Elimination Follow-up (TEF), specimens were collected according to a pre-specified plan from communities enrolled in mass antibiotic treatment programs.12,13 The clinical trial was described in depth previously and registered with clinicaltrials.gov (NCT00221364). Simulations were based on data from the collected specimens; specimen collection was performed according to the guidelines of the Declaration of Helsinki. Ethical review and approval for this study were obtained from the Committee for Human Research of the University of California, San Francisco, USA, Emory University Institutional Review Board, and the Ethiopian Science and Technology Commission. Briefly, 1–5-year-old children, those most likely to harbor infection, were monitored for ocular chlamydial infection at baseline, and 2, 6, 12, 18, and 24 months after baseline. Ocular specimens were obtained by swabbing each child across the tarsal conjunctiva of the right upper eyelid. To estimate the prevalence of infection, specimens from each child in a community were taken, randomized in order, and then pooled into groups of 5 with the remainder forming a separate, smaller, pool. In each case, these pools were tested for the presence of chlamydial DNA using Amplicor PCR (Roche Diagnostics, Branchburg, NJ, USA). Quality assurance of pooling was conducted by randomly choosing 24 positive pools and 36 negative pools out of those analyzed. All specimens from these selected pools were then retested individually.

© 2014 Informa Healthcare USA, Inc.

PCR results, positive or negative, along with the size of each pool (the number of specimens comprising the pool) were used to estimate the community infection prevalence. When a negative result was obtained for a given pool, it was assumed that all specimens comprising the pool were negative; however, when a pool yielded a positive result, the number of positive specimens was only known to be one or more.

For a set of pools of equal size, the community prevalence can be estimated based on the observation that when sampling from an infinite population with prevalence $p$, the probability $q$ of a positive pool of size $s$ is $q = 1 - (1 - p)^s$.14 Given an observed relative frequency $\hat{q}$ of positive pools, the prevalence can be estimated from the equation $\hat{p} = 1 - (1 - \hat{q})^{1/s}$.15

More generally, for a set of $n$ PCR pools of sizes $s_1, s_2, \ldots, s_n$ with results $Y_1, Y_2, \ldots, Y_n$ (where $Y_i = 1$ if pool $i$ is positive and $Y_i = 0$ if pool $i$ is negative), let $P$ denote the number of positive (infected) specimens, $N$ denote the number of negative specimens, and $x_i$ denote the number of positive specimens in pool $i$. The probability $P(Y_1 = y_1, \ldots, Y_n = y_n)$ of obtaining the pooled results is the likelihood

$$P(Y_1 = y_1, \ldots, Y_n = y_n) = \sum_{x_1 = y_1}^{s_1 y_1} \sum_{x_2 = y_2}^{s_2 y_2} \cdots \sum_{x_n = y_n}^{s_n y_n} \frac{U(x_1, \ldots, x_n)}{\binom{P+N}{s_1, \ldots, s_n}} \qquad (1)$$

where

$$U(x_1, \ldots, x_n) = \binom{N}{s_1 - x_1, \ldots, s_n - x_n} \binom{P}{x_1, \ldots, x_n},$$

and the multinomial coefficient $\binom{P}{x_1, \ldots, x_n}$ is defined to be zero unless all $x_i$ are integers such that $\sum_{i=1}^{n} x_i = P$ (and similarly for the other multinomial coefficients). Given observed data $Y_i$, the value $\hat{P}$ of $P$ which maximizes the likelihood (1) yields an estimate of the prevalence fraction in the community, $\hat{P}/(P + N)$.

We determined the bias, $(\hat{P} - P)$, of the estimated number $\hat{P}$ of infected individuals, using simulations
for communities of size 49–200 and pools of size 2, 3, and 5 (typical pool sizes used in previous studies;12,13,16,17 for convenience, we did not consider pools of size 4). Specifically, we assumed a true number of positive infections, and simulated the possible assignments of the positives over a given set of pool sizes. This yielded simulated pooled results $Y_1, Y_2, \ldots, Y_n$. Computations compare the true (known) $P$ to $\hat{P}$, which permits the bias $(\hat{P} - P)$ to be estimated, i.e. the difference between the average simulated estimated number of infections and the true number of infections. Finally, we computed the root mean squared error

$$\frac{1}{c} \sqrt{\sum_{j=1}^{c} \pi_j \, (J_{\mathrm{true}} - J)^2},$$

where $c$ represents the community size, $\pi_j$ represents the probability of estimating $j$ infections, $J_{\mathrm{true}}$ represents the true number of infections in the community, and $J$ represents the estimated number of infections. This measurement reflects not only the bias, but also the variance in the estimator. The root mean squared error was used to determine an acceptable inaccuracy in the estimate which occurs due to estimator bias. We set the threshold for the root mean squared error at 2%, a value which corresponds to one infection in a typical population size of 50 children. A pooling strategy was then suggested based on the projected prevalence range and size of a community.

Additional simulations yielded the percentage of positive pools, given the true prevalence of infection. This information provided guidelines for finding the largest pool size such that the expected mean squared error was within acceptable limits. Since PCR costs can vary among different laboratories, we estimated overall PCR test reductions dependent on the initial projection of infection prevalence. The percent savings can be calculated based on the reduced number of PCR tests needed once the specimens are pooled.
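The likelihood (1) and the bias and root mean squared error simulations described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used for this study (the analyses here were run in Mathematica). It relies on the algebraic identity that each term of (1) reduces to $\prod_i \binom{s_i}{x_i} / \binom{P+N}{P}$, so the sum can be accumulated pool by pool. The function names `make_pools`, `likelihood_profile`, `mle_positives`, and `simulate`, the default of 1000 simulations, and the tie-breaking rule (smallest maximizing $P$) are all hypothetical choices made for the example.

```python
import random
from math import comb, sqrt

def make_pools(n_specimens, pool_size):
    """Full pools of pool_size plus, if needed, one smaller remainder pool."""
    full, rest = divmod(n_specimens, pool_size)
    return [pool_size] * full + ([rest] if rest else [])

def likelihood_profile(pool_sizes, results):
    """Return L where L[P] is the probability of the observed pool results
    when exactly P of the sum(pool_sizes) specimens are positive."""
    total = sum(pool_sizes)
    # ways[k]: number of ways to place k positives so that each positive
    # pool holds at least one and each negative pool holds none.
    ways = [1]
    for s, y in zip(pool_sizes, results):
        if y:
            new = [0] * (len(ways) + s)
            for k, w in enumerate(ways):
                for x in range(1, s + 1):
                    new[k + x] += w * comb(s, x)
            ways = new
    ways += [0] * (total + 1 - len(ways))
    return [ways[P] / comb(total, P) for P in range(total + 1)]

def mle_positives(pool_sizes, results):
    """Smallest P that maximizes the likelihood (assumed tie-breaking rule)."""
    profile = likelihood_profile(pool_sizes, results)
    return profile.index(max(profile))

def simulate(n_specimens, pool_size, true_positives, n_sims=1000, seed=0):
    """Monte Carlo bias (in infections) and RMSE (fraction of community size)."""
    rng = random.Random(seed)
    pools = make_pools(n_specimens, pool_size)
    status = [1] * true_positives + [0] * (n_specimens - true_positives)
    errors = []
    for _ in range(n_sims):
        rng.shuffle(status)  # random assignment of specimens to pools
        results, start = [], 0
        for s in pools:
            results.append(int(any(status[start:start + s])))
            start += s
        errors.append(mle_positives(pools, results) - true_positives)
    bias = sum(errors) / n_sims
    rmse = sqrt(sum(e * e for e in errors) / n_sims) / n_specimens
    return bias, rmse

if __name__ == "__main__":
    # 49 specimens in pools of 5 (nine 5-pools and one 4-pool), 5 true infections
    print(simulate(49, 5, 5))
```

When every pool tests positive, the likelihood equals 1 at $P = P + N$ (all specimens infected), so the maximum sits at the top of the range; this is the overestimation that motivates limiting pool size.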
We conducted simulation analysis of prevalence estimation through pooling using the software program Mathematica (Wolfram Research, Champaign, IL, USA). Community infection prevalence was estimated using maximum likelihood estimation. We focused on simulations for communities ranging in size from 50 to 200 persons and pool sizes of 2, 3, or 5. The root mean squared error was then used to determine an acceptable inaccuracy in estimates, allowing for a pooling strategy to be formed based on different projected prevalences and community sizes. Further simulations were run to obtain the fraction of pooled results testing positive for specific pool sizes and community sizes. Finally, we varied the true infection levels in the model in order to observe how often all pools tested positive.
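The last two kinds of simulation (the fraction of pools testing positive, and how often all pools test positive) need no estimator and can be sketched with a simple shuffle. The Python below is an illustrative sketch, not the Mathematica code used here; `pool_results` and `simulate_pools` are hypothetical names, and the default of 10,000 simulations is an assumption taken from the figure axis labels.

```python
import random

def pool_results(status, pool_size):
    """Split specimens, in the order given, into pools of pool_size (the last
    pool may be smaller); a pool is positive if any member is positive."""
    return [int(any(status[i:i + pool_size]))
            for i in range(0, len(status), pool_size)]

def simulate_pools(n_specimens, pool_size, true_positives,
                   n_sims=10_000, seed=0):
    """Return (mean fraction of pools testing positive,
    proportion of simulations in which all pools test positive)."""
    rng = random.Random(seed)
    status = [1] * true_positives + [0] * (n_specimens - true_positives)
    fraction_sum, all_positive = 0.0, 0
    for _ in range(n_sims):
        rng.shuffle(status)  # randomize specimen order before pooling
        results = pool_results(status, pool_size)
        fraction_sum += sum(results) / len(results)
        all_positive += all(results)
    return fraction_sum / n_sims, all_positive / n_sims

if __name__ == "__main__":
    # 49 specimens, 10 true infections, pools of 5
    print(simulate_pools(49, 5, 10))
```

Sweeping `true_positives` from 0 to `n_specimens` for each pool size gives curves of the kind plotted against true prevalence in Figure 1.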

RESULTS

For quality assurance, 60 5-pools were PCR tested and then un-pooled and tested individually. Pooled results proved to be 100% sensitive compared to individual testing. Each of the 24 positive 5-pools (120 specimens in total) contained at least one individual specimen which tested positive. Each of the 180 specimens in the 36 negative pools was negative when tested individually.

To estimate the bias, $(\hat{P} - P)$, of the estimated number $\hat{P}$ of infected individuals, we considered specimen sizes of 24, 49, 99, 149 and 199, chosen so as to require unequal pool sizes. The bias, defined as the difference between the expected value of the estimated number of infections and the true number of infections, was computed using repeated simulations. We plotted the simulated bias and root mean squared error for 1, 2, 3, ..., n infections for a community of size 49, with the majority of pools being of size 2, 3, or 5, but including a remainder (Figure 1a and b). Figure 1(b) shows that for a community of size 49, pools of size 5 are acceptable for infection prevalence <20%, pools of size 3 are acceptable for the prevalence range 20–40%, and pools of size 2 for the prevalence range 40–60%. Acceptable sizes are based on keeping the root mean squared error <2%. Pooling size recommendations for other community sizes are summarized in Figure 2.

The results from additional simulations yielded the percentage of positive pools, given the true prevalence of infection (Figure 1c). For any given prevalence, the fraction of pools testing positive is higher if the pools are bigger. When the fraction of pools testing positive is too high, precise estimation of community prevalence is not possible. Thus, in practice, if initially choosing a particular pool size results in too high a fraction of pools testing positive, retesting specimens with smaller pools should be considered.
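As a rough check on this pattern, the infinite-population expression from the Methods, $q = 1 - (1 - p)^s$, can be evaluated at a fixed prevalence. The worked values below are an illustrative approximation only (the simulations reported here sample from finite communities), taking $p = 0.20$ as an assumed example:

$$q_{s=2} = 1 - 0.8^2 = 0.36, \qquad q_{s=3} = 1 - 0.8^3 = 0.488, \qquad q_{s=5} = 1 - 0.8^5 \approx 0.672.$$

Under this approximation, at 20% prevalence about a third of 2-pools but about two thirds of 5-pools would be expected to test positive.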
The dashed line in Figure 1(c) shows conditions when a particular pool size is no longer a good strategy for estimating community prevalence. Figure 1(d) illustrates the sigmoid increase in the probability that all pools will test positive as the true prevalence of infection increases, shown for community sizes of 50 and 200. Pooling strategies are viable provided the probability that all pools test positive is small. For example, 50 specimens, 25 of which are true infections, grouped into pools of 2, yield a probability of all pools testing positive of $3 \times 10^{-5}$. Simulations for 2-pooling indicate the chance of finding all pools positive is very small if the true infection prevalence is below 60% (Figure 1d). The recommendations summarized in Figure 3 are all based on simulations and can be derived from results shown in Figure 1(c).

[Figure 1: four panels plotted against true prevalence of infection (0–100%): bias of the estimator (mean of 10K simulations) and error of the estimator (RMSE of 10K simulations, with the 2% RMSE bound) for 5-, 3-, and 2-pools in a community of size 49; fraction of pools testing positive (mean of 10K simulations) for 5-, 3-, and 2-pools and individual testing in a community of size 49; and proportion of 10K simulations in which all pools test positive for pool sizes 5, 3, and 2 with 50 and 200 specimens.]

FIGURE 1. (a) Simulation results of pooled testing for trachoma infection showing estimator bias $(\hat{P} - P)$ on average given the true prevalence of infection for a community size of 49 people. (b) Simulation results of pooled trachoma testing showing the root mean square error (RMSE) given the true prevalence of infection for a community size of 49 people. (c) Results of pooled trachoma testing showing the proportion of simulations in which all pooled specimens tested positive given the true prevalence of infection for a community size of 50 people. The dashed lines indicate conditions when a particular pool size choice will no longer yield accurate estimates of community prevalence. (d) Sigmoid increase in the probability that all pools will test positive as the true prevalence of infection increases, in pooled trachoma testing.

Table 1 displays the percent reduction in the number of PCR tests needed with use of pooling strategies versus performing individual PCR tests for 2000 specimens (approximately the number needed for the TEF study). The fewer tests performed, the greater the cost reduction achieved. The reduction in tests is greatest when the prevalence is <20%, which permits accurate estimation using pools of size 5. The loss in accuracy for all strategies presented in Table 1 is <2% root mean squared error of the estimate. This table assumes infection prevalence is below 60%. If communities have infection prevalence higher than 60%, pooling to obtain estimates is not recommended.
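The Scenario A figures in Table 1 (where the initial projection of prevalence is accurate) follow from simple arithmetic: one test per pool, compared with one test per specimen. The short Python sketch below illustrates that arithmetic, assuming all 2000 specimens are pooled as a single batch; `pooled_tests` and `percent_reduction` are hypothetical helper names introduced for the example.

```python
def pooled_tests(n_specimens, pool_size):
    """Number of PCR tests when specimens are pooled, counting a final
    partial pool as one test (ceiling division)."""
    return -(-n_specimens // pool_size)

def percent_reduction(n_specimens, n_tests):
    """Percent reduction in tests relative to testing each specimen individually."""
    return 100 * (1 - n_tests / n_specimens)

if __name__ == "__main__":
    for size in (5, 3, 2, 1):
        tests = pooled_tests(2000, size)
        print(size, tests, round(percent_reduction(2000, tests)))
```

For pool sizes 5, 3, 2, and 1 this gives 400, 667, 1000, and 2000 tests, i.e. reductions of 80, 67, 50, and 0%, matching the Scenario A columns of Table 1.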

DISCUSSION

While the gold standard for assessing infection in a community is individual-level PCR testing, in practice such methods are prohibitively expensive to use in trachoma control programs. The use of pooled estimates has been effective in such settings as the TEF study and the Trachoma Amelioration in Northern Amhara clinical trial, where pooled estimates were used as primary outcomes.12,13,16,17,19 Unlike other pooling PCR protocols, we do not require unpooling and retesting specimens at the individual level. In this paper we provided recommendations for pooling that are applicable in a broader setting, and showed that the imprecision introduced by pooling can be kept within acceptable limits by simply choosing a small enough pool size.

Figure 1(d) demonstrates that as the infection prevalence increases, the probability that all pools will be positive increases. The sharp rise in the sigmoid curve away from the x-axis marks the circumstances in which a particular pooling strategy is no longer viable. The maximum likelihood method we employ is applicable for arbitrary (including unequal) pool sizes, and can be straightforwardly computed using Monte Carlo simulation for the population sizes that were used in previous studies.

It is extremely important that a proper randomization technique is used for specimens prior to grouping into pools for PCR testing, because samples collected in sequence are more likely to cluster positive values due to subjects being in the same family or neighborhood. It should be noted that if investigators are selecting specimens from a random sample of children from the community, there will be additional variability in the prevalence estimates due to sampling error. Here we aimed to collect specimens from all children aged 1–5 years.

In practice, while the crude estimates available from clinical signs may be inadequate to reliably estimate prevalence, they have been adequate for determining pool sizes. For example, the prevalence of trachomatous clinical activity is relatively inexpensive to obtain and often available in trachoma programs. While clinical activity is not reliable on an individual level, it can provide guidance for crude estimates of prevalence in a community. Reanalysis with smaller pools is possible when analysis of the first pooled set suggests that the initial pooling size is too large given the prevalence. Moreover, when the prevalence is small, as during the later stages of an elimination program, pools of size 5 will be adequate, leading to substantial cost savings when compared to individual testing. Note that pools larger than 5 cannot be recommended until it is demonstrated that reduced sensitivity of the test due to dilution of the individual specimens is not a major issue.18

Pooling is most cost-effective when prevalence is low, which is precisely the setting in which it is needed to determine whether or not to continue mass administration of antibiotics. The cost savings with PCR pooling are estimated to be between 5 and 80% depending on pool size and the cost of individual PCR tests. Observing clinical activity (trachomatous follicles or trachomatous inflammation) is even less expensive; unfortunately, it is not nearly as reliable an indication of active trachoma, because of delayed clearing of follicles.10,20 As clinical activity may remain misleadingly high, judicious use of pooled PCR testing in selected resource-limited settings may permit less frequent treatment.

[Figure 2: projected prevalence of infection (0–100%) plotted against community size (50–200), with regions marked for 5-pools, 3-pools, 2-pools, and individual testing.]

FIGURE 2. Optimum testing pool sizes based on projected prevalence of trachoma infection for a community size of 50–200 people.

[Figure 3: flowchart.]
What is the projected infection prevalence?
- ≤20%: Pool specimens into groups of 5 and perform pooled PCR tests. What percent of pools tested positive?
  - <70%: Use pooled specimen results to estimate community prevalence
  - 70–90%: Retest specimens using pools of size 3
  - ≥90%: Retest specimens individually
- 21–40%: Pool specimens into groups of 3 and perform PCR tests. What percent of pools tested positive?
  - <80%: Use pooled specimen results to estimate community prevalence
  - 80–90%: Retest specimens using pools of size 2
  - >90%: Retest specimens individually
- 41–60%: Pool specimens into groups of 2 and perform PCR tests. What percent of pools tested positive?
  - ≤90%: Use pooled specimen results to estimate community prevalence
  - >90%: Retest specimens individually
- >60%: Test specimens individually

FIGURE 3. Simple approximate guide for pooling strategies based on simulation results of pooled polymerase chain reaction (PCR) trachoma testing.

TABLE 1. Estimated reduction in polymerase chain reaction (PCR) tests for trachoma using recommended pooling strategy* for 2000 specimens and communities sized 50–200 people.

Scenario A: Initial projection of infection prevalence is accurate.
Scenario B: Additional retesting using smaller pools is needed*.

| Initial projection of infection prevalence, % | Recommended size of pool, n | Scenario A: Total PCR tests, n | Scenario A: Reduction compared to individual PCR testing, % | Scenario B: Total PCR tests, n | Scenario B: Reduction compared to individual PCR testing, % |
| ≤20 | 5 | 400 | 80 | 1067 | 47 |
| 21–40 | 3 | 667 | 67 | 1667 | 17 |
| 41–60 | 2 | 1000 | 50 | 1900 | 5 |
| >60 | 1 | 2000 | 0 | 2000 | 0 |

*See Figure 3
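The guide in Figure 3 can also be written as two small lookup functions. The Python below is a sketch of one reading of the flowchart, not a validated decision tool: `initial_pool_size` and `next_step` are hypothetical names, prevalence and the fraction of positive pools are passed as fractions between 0 and 1, the integer bands of the figure (for example 21–40%) are treated as continuous ranges, and a boundary value is assigned to whichever branch the figure lists for it.

```python
def initial_pool_size(projected_prevalence):
    """Pool size suggested for a projected prevalence (fraction, 0 to 1).
    A return value of 1 means testing specimens individually."""
    if projected_prevalence <= 0.20:
        return 5
    if projected_prevalence <= 0.40:
        return 3
    if projected_prevalence <= 0.60:
        return 2
    return 1

def next_step(pool_size, fraction_positive):
    """Pool size to use next, given the fraction of pools that tested positive.
    Returning the same pool_size means the pooled results can be used to
    estimate community prevalence; 1 means retesting specimens individually."""
    if pool_size == 5:
        if fraction_positive < 0.70:
            return 5
        return 3 if fraction_positive < 0.90 else 1
    if pool_size == 3:
        if fraction_positive < 0.80:
            return 3
        return 2 if fraction_positive <= 0.90 else 1
    if pool_size == 2:
        return 2 if fraction_positive <= 0.90 else 1
    raise ValueError("pool_size must be 2, 3, or 5")

if __name__ == "__main__":
    size = initial_pool_size(0.15)   # projected prevalence of 15%
    print(size, next_step(size, 0.75))
```

In the usage line, a projected prevalence of 15% suggests pools of 5; if 75% of those pools then test positive, the guide points to retesting with pools of 3.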

DECLARATION OF INTEREST

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper. The National Eye Institute of the National Institutes of Health was the primary supporter of this analysis (TEF study, NCT00322972).

REFERENCES

1. Resnikoff S, Pascolini D, Etya’ale D, et al. Global data on visual impairment in the year 2002. Bull World Health Organ 2004;82:844–851.
2. Bailey RL, Arullendran P, Whittle HC, Mabey DC. Randomised controlled trial of single-dose azithromycin in treatment of trachoma. Lancet 1993;342(8869):453–456.
3. WHO. Report of the first meeting of the W.H.O. Alliance for Global Elimination of Trachoma. Geneva: World Health Organization; 1997.
4. Bailey RL, Hampton TJ, Hayes LJ, et al. Polymerase chain reaction for the detection of ocular chlamydial infection in trachoma-endemic communities. J Infect Dis 1994;170:709–712.
5. Miller K, Schmidt G, Melese M, et al. How reliable is the clinical exam in detecting ocular chlamydial infection? Ophthalmic Epidemiol 2004;11:255–262.
6. Thein J, Zhao P, Liu H, et al. Does clinical diagnosis indicate ocular chlamydial infection in areas with a low prevalence of trachoma? Ophthalmic Epidemiol 2002;9:263–269.
7. Michel CE, Solomon AW, Magbanua JP, et al. Field evaluation of a rapid point-of-care assay for targeting antibiotic treatment for trachoma control: a comparative study. Lancet 2006;367(9522):1585–1590.
8. Keenan JD, Lakew T, Alemayehu W, et al. Slow resolution of clinically active trachoma following successful mass antibiotic treatments. Arch Ophthalmol 2011;129:512–513.
9. Wright HR, Taylor HR. Clinical examination and laboratory tests for estimation of trachoma prevalence in a remote setting: what are they really telling us? Lancet Infect Dis 2005;5:313–320.
10. Bird M, Dawson CR, Schachter JS, et al. Does the diagnosis of trachoma adequately identify ocular chlamydial infection in trachoma-endemic areas? J Infect Dis 2003;187:1669–1673.
11. Kim H-Y, Hudgens MG, Dreyfuss JM, et al. Comparison of Group Testing Algorithms for Case Identifications in the Presence of Test Error. Biometrics 2007;63:1152–1163.
12. Chidambaram JD, Alemayehu W, Melese M, et al. Effect of a single mass antibiotic distribution on the prevalence of infectious trachoma. JAMA 2006;295:1142–1146.
13. Lakew T, House J, Hong KC, et al. Reduction and return of infectious trachoma in severely affected communities in Ethiopia. PLoS Negl Trop Dis 2009;3:e376.
14. Dorfman R. The detection of defective members of large populations. Ann Math Stat 1943;14:436–440.
15. Hepworth G, Watson R. Debiased estimation of proportions in group testing. J Roy Stat Soc 2009;58:105–121.
16. House JI, Ayele B, Porco TC, et al. Assessment of herd protection against trachoma due to repeated mass antibiotic distributions: a cluster-randomised trial. Lancet 2009;373(9669):1111–1118.
17. Stoller NE, Gebre T, Ayele B, et al. Efficacy of latrine promotion on emergence of infection with ocular Chlamydia trachomatis after mass antibiotic treatment: a cluster-randomized trial. Int Health 2011;3:75–84.
18. Diamant J, Benis R, Schachter J, et al. Pooling of Chlamydia laboratory tests to determine the prevalence of ocular Chlamydia trachomatis infection. Ophthalmic Epidemiol 2001;8:109–117.
19. Melese M, Alemayehu W, Lakew T, et al. Comparison of annual and biannual mass antibiotic administration for elimination of infectious trachoma. JAMA 2008;299:778–784.
20. Keenan JD, Lakew T, Alemayehu W, et al. Clinical activity and polymerase chain reaction evidence of chlamydial infection after repeated mass antibiotic treatments for trachoma. Am J Trop Med Hyg 2010;83:482–487.

Copyright of Ophthalmic Epidemiology is the property of Taylor & Francis Ltd and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.