Journal of Medical Microbiology (2014), 63, 433–440

DOI 10.1099/jmm.0.060616-0

Comparative analysis of microbiome between accurately identified 16S rDNA and quantified bacteria in simulated samples

Haiyin Wang,1† Pengcheng Du,1† Juan Li,1† Yuanyuan Zhang,1 Wen Zhang,1 Na Han,1 Patrick C. Y. Woo2,3 and Chen Chen1

1 National Institute for Communicable Disease Control and Prevention, Center for Disease Control and Prevention, State Key Laboratory for Infectious Disease Prevention and Control, Beijing 102206, PR China

2 State Key Laboratory of Emerging Infectious Diseases, Research Centre of Infection and Immunology, Department of Microbiology, University of Hong Kong, Hong Kong, PR China

3 Carol Yu Centre for Infection, University of Hong Kong, PR China

Correspondence: Chen Chen, [email protected]

Received 10 April 2013 Accepted 27 November 2013

Although 16S rRNA gene (rDNA) sequencing is the gold standard for categorizing bacteria or characterizing microbial communities, its clinical utility is limited by bias in metagenomic studies, in either the experiments or the data analyses. To evaluate the efficiency of current metagenomic methods, we sequenced seven simulated samples of ten bacterial species mixed at different concentrations. The V3 region of 16S rDNA was targeted and used to determine the distribution of bacterial species. The number of target sequences in individual simulated samples was in the range 1–1000 to provide a better reflection of natural microbial communities. However, for a given bacterial species present in the same proportion but at different concentrations, the observed percentage of 16S rDNAs was similar, except at very low concentrations that cannot be detected by real-time PCR. These results confirmed that the comparative microbiome in a sample characterized by 16S rDNA sequencing is sufficient to detect not only potential infectious pathogens, but also the relative proportion of 16S rDNA in the sample.

†These authors contributed equally to this work.

Abbreviations: 16S rDNA, 16S rRNA gene; OTU, operational taxonomic unit; RAP, ratio between accurate proportions; RMP, ratio between measured proportions; SBN, standardized bacterial number.

060616 © 2014 SGM. Printed in Great Britain

INTRODUCTION

Micro-organisms are ubiquitous in the environment and the human body, and the study of community structures and dynamics is a research hotspot (Huttenhower et al., 2012). More than 99 % of micro-organisms in the environment are non-culturable (Kaeberlein et al., 2002) and the proportion of each species in a sample changes during culture as the culturable components of the sample grow. Generally, the best way to overcome the problems presented by non-culturable organisms is to detect genetic markers directly in a sample. The gene encoding 16S rRNA (16S rDNA) is the gold standard for the identification and classification of bacteria because of its high conservation within species, variation between species and stable rate of evolution (Fox et al., 1977); it is also the preferred marker in microbial ecology (Ward et al., 1990). Sequencing of 16S rDNA facilitates the discovery of novel bacteria in clinical microbiology laboratories (Woo et al., 2008).

Next-generation sequencing technology (von Bubnoff, 2008; Mardis, 2008) enables us to sequence culturable and non-culturable microbes directly from samples, providing powerful insight into the microbial world (Wooley et al., 2010). The 454 pyrosequencing, Solexa/Illumina and Ion Torrent approaches are technologically mature and widely used in 16S rDNA sequencing (Mardis, 2008). Target regions are PCR-amplified using universal primers and sequenced on high-throughput instruments. Most bioinformatics workflows start by trimming read data and then calculate multiple-alignment distances to cluster sequences into operational taxonomic units (OTUs), which are used to represent microbial diversity and are mapped to a 16S rDNA database for taxonomic identification (Lozupone et al., 2006, 2011; Cole et al., 2009; Sun et al., 2009; Schloss & Handelsman, 2006; Griffen et al., 2011; Giongo et al., 2010). These approaches have been used to study the microbial communities of the human gut (Peterson et al., 2008; Eckburg et al., 2005), respiratory tract (Zhao et al., 2012; Charlson et al., 2010), sea water (Gilbert et al., 2012) and soil (Roesch et al., 2007).

However, many factors cause the results to diverge from the true bacterial composition and it has thus far been impossible to correct for these divergences. First, it is difficult to obtain DNA of sufficient quality, and extraction methods differ in the efficiency with which they extract DNA from mixed samples, particularly those from the environment (Luna et al., 2006; Salonen et al., 2010; Terrat et al., 2012). In addition, universal primer hybridization efficiency varies with template type, concentration and distribution (Hong et al., 2009). Next-generation sequencing is sufficiently random that low-frequency amplicons are less likely to yield sequences (Schloss et al., 2011), and different 16S rDNA variable regions and sequencing methods also influence the results (Claesson et al., 2010). The path from DNA extraction to 16S rDNA sequencing is a black box in which efficiencies are unknown. These errors and their indeterminacy reduce the efficiency of present metagenomic methods and limit their advanced application.

To evaluate the efficiency of current metagenomic sequencing approaches in detecting microbial composition, we designed serial experiments with simulated samples. Ten bacterial species were mixed in pure water at varying concentrations and proportions to simulate real samples. The results showed meaningful associations between measured and actual sample compositions, which may be used to evaluate and optimize methods and data analysis. Deviations between the measured and actual compositions were surprisingly large. Even so, our results suggest that the approach is reliable and valuable for microbial community research, although it requires further validation.

METHODS

Bacterial strains. The ten bacterial species used in this study were chosen to represent a wide range of microbial and molecular properties. They included Gram-positive and Gram-negative bacteria, normal commensals of the gastrointestinal tract and pathogens. Genome sizes ranged from 2.53 to 5.28 Mb, G+C content ranged from 33 to 61.5 mol% and the copy number of rDNA operons ranged from five to ten (Table 1). To simulate the conditions in real samples more accurately, we included Enterobacter aerogenes and Salmonella typhimurium, which have identical sequences in the targeted 16S rDNA region.

Table 1. Bacterial species used in this study
ND, No data available; −, Gram-negative; +, Gram-positive.

Species                    Gram   Genome   Genome   No. of   16S      16S      V3       V3
                           stain  length   G+C      rRNA     length   G+C      length   G+C
                                  (Mb)     (mol%)   copies   (bp)     (mol%)   (bp)     (mol%)
Yersinia enterocolitica    −      4.68     47.3     7        1489     54.13    204      54.9
Citrobacter freundii       −      4.9      52.3     7        1446     54.91    204      54.9
Enterobacter sakazakii     −      4.53     56.8     7        1452     55.3     204      54.9
Aeromonas hydrophila       −      4.74     61.5     10       1452     55.51    204      54.9
Enterobacter aerogenes     −      5.28     54.8     8        1441     55.1     204      55.88
Serratia marcescens        −      5.02     59.7     7        1450     54.62    204      54.9
Salmonella typhimurium     −      4.88     52.2     7        1447     55.08    204      55.88
Plesiomonas shigelloides   −      ND       ND       ND       1446     53.94    204      56.37
Staphylococcus aureus      +      2.94     33       5        1563     50.99    204      51.96

Sample preparation. All bacteria were grown on blood agar plates at 37 °C to obtain single bacterial colonies, which were then cultured in trypticase soy broth at 37 °C for 24 h before DNA extraction. Genomic DNA was extracted with a Promega kit. DNA concentrations were measured on a NanoDrop system and the 16S rDNA copy number per microlitre was calculated as follows:

\[
\text{copies}\ \mu\text{l}^{-1} = \frac{X\ \text{ng}\ \mu\text{l}^{-1}}{L \times 324.5\ \text{g mol}^{-1} / \left(6.02 \times 10^{23}\ \text{mol}^{-1}\right)} \times N
\]

where X ng µl⁻¹ is the DNA mass concentration; L, genome size; 324.5 g mol⁻¹, the average molecular mass of a dNTP; 6.02 × 10²³ mol⁻¹, the Avogadro constant; and N, the 16S rDNA copy number in the genome of each species.

DNA was mixed in pure water to produce various concentrations and proportions of 16S rDNA copy number (Tables 1 and 2). All strains were mixed together at the same concentrations to compose samples S1 to S5, containing 10³, 10⁴, 10⁵, 10⁶ and 10⁷ c.f.u. ml⁻¹, respectively. Samples S6 and S7 are graded mixtures of species (Table 2); S7 contains the same species and proportions as S6, but at tenfold higher concentrations.

Library construction and sequencing. The 16S rDNA hypervariable region 3 (V3, length ~200 bp) was amplified with a universal primer set (forward, 5′-CCAGACTCCTACGGGAGGCAG-3′; reverse, 5′-CGTATTACCGCGGCTGCTG-3′). All samples were amplified under the same conditions: 25 cycles of denaturation at 94 °C, annealing at 51 °C and extension at 72 °C, each for 45 s. After separation in a 2 % agarose gel and retrieval with a Qiagen Gel Extraction kit, amplicons were linked to sequencing barcodes and libraries were constructed. All libraries were pooled for sequencing on the Illumina Genome Analyzer IIx (GAIIx) to obtain 100 bp paired-end reads.

Composition analyses of metagenomic data. Bacterial compositions were analysed by directly mapping reads to full-length 16S rDNA sequences instead of clustering reads into OTUs. We did this to avoid errors that arise in bioinformatics workflows (Barriuso et al., 2011; Hamady & Knight, 2009), which were not considered in this study. First, sequencing data were trimmed with a Perl script to remove low-quality reads with more than two bases below Q13; the remaining reads were then mapped to accurate reference sequences using BLAST with default parameters. Read pairs in which both the forward and the reverse read exactly matched the reference were considered complete matches. The paired-end read numbers for each species were obtained and standardized to one million. To account for multiple copies of the 16S rRNA gene, the read number of each species was divided by its 16S rDNA copy number to obtain the standardized bacterial number (SBN). Finally, the proportion of each species was calculated as its SBN divided by the total SBN, giving the bacterial composition of each sample. By calculating relative ratios for each species, the variation between measured and actual compositions was quantified, and two-tailed paired t-tests were carried out between relative ratios in SPSS to evaluate the significance of the variation. Multiple alignments were performed in MUSCLE and the phylogenetic tree was reconstructed using the neighbour-joining method. Ratios between accurate proportions (RAPs) were calculated as the accurate proportions in S6 and S7 divided by those in S1 to S5. Ratios between measured proportions (RMPs) were calculated as the measured proportions in S6 and S7 divided by those in S1 to S5.
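The two calculations described in the Methods, the 16S rDNA copy number per microlitre and the SBN-based composition, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' script: the function names are hypothetical, the ng-to-g conversion is an assumption added so that the units cancel, "standardized to one million" is assumed to mean scaling by the total number of mapped read pairs, and the example inputs are made-up numbers chosen only to make the arithmetic easy to follow.

```python
AVOGADRO = 6.02e23     # mol^-1, value quoted in the Methods
MASS_PER_NT = 324.5    # g mol^-1, per-nucleotide mass quoted in the Methods


def copies_per_ul(dna_ng_per_ul, genome_size_bp, rrna_copies):
    """16S rDNA copies per microlitre from a DNA mass concentration."""
    # Mass of one genome in grams, following the formula in the Methods.
    genome_mass_g = genome_size_bp * MASS_PER_NT / AVOGADRO
    # Assumption: the measured concentration is converted from ng to g here.
    genomes_per_ul = dna_ng_per_ul * 1e-9 / genome_mass_g
    return genomes_per_ul * rrna_copies


def composition(read_pairs, rrna_copies):
    """Proportion of each species, computed via the SBN."""
    total = sum(read_pairs.values())
    sbn = {}
    for species, n in read_pairs.items():
        per_million = n / total * 1_000_000              # standardize to one million
        sbn[species] = per_million / rrna_copies[species]  # correct for gene copies
    sbn_total = sum(sbn.values())
    return {species: value / sbn_total for species, value in sbn.items()}


# Hypothetical example: species A has 7 gene copies, species B has 3.
example_reads = {"A": 700, "B": 300}
example_copies = {"A": 7, "B": 3}
example_props = composition(example_reads, example_copies)
print(example_props)  # both species end up at a proportion of 0.5
print(f"{copies_per_ul(1.0, 5_000_000, 7):.3e} copies per microlitre")
```

In the toy example, species A yields more reads only because it carries more gene copies, so the copy-number correction brings both species to equal proportions.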

RESULTS

Actual sample compositions and sequencing results

Ten bacterial species, namely Lactococcus lactis, Yersinia enterocolitica, Citrobacter freundii, Enterobacter sakazakii, Aeromonas hydrophila, E. aerogenes, Serratia marcescens, Staphylococcus aureus, Salmonella typhimurium and Plesiomonas shigelloides, were selected and mixed at varying concentrations to produce samples S1 to S7 (Table 2). In samples S1 to S5, each species was present at the same 16S rDNA concentration; samples S6 and S7 were graded mixtures. Total 16S rDNA copies ranged from 10³ to 10⁹ copies ml⁻¹ across all samples. These samples were used to evaluate the accuracy, specificity and sensitivity of metagenomic analysis.

In our metagenomic study, we used the 16S rDNA V3 region to generate seven libraries for sequencing. The experiment yielded 6.5 million Illumina paired-end 100 bp reads. In all libraries, 62.72–99.65 % of paired-end reads were mapped to the references. Reads of eight species were unique (Table 3) and the bacterial numbers of each species were calculated (Table 4). The sequenced regions of E. aerogenes and Salmonella typhimurium are identical; therefore, their reads are indistinguishable (Fig. 1). Primer hybridization sites in each species were compared and only one substitution, at the fourth base of the forward primer (G to A) in L. lactis, was found (Fig. 1), indicating that the primers were suitable for our purpose.

Table 2. Bacterial solutions in samples
Values refer to log10 c.f.u.

Species              S1   S2   S3   S4   S5   S6   S7
L. lactis            2    3    4    5    6    3    4
Y. enterocolitica    2    3    4    5    6    3    4
C. freundii          2    3    4    5    6    4    5
E. sakazakii         2    3    4    5    6    4    5
A. hydrophila        2    3    4    5    6    5    6
E. aerogenes         2    3    4    5    6    5    6
S. marcescens        2    3    4    5    6    6    7
S. aureus            2    3    4    5    6    6    7
S. typhimurium       2    3    4    5    6    7    8
P. shigelloides      2    3    4    5    6    7    8

Table 3. Number of paired reads of each species in all samples

                     S1       S2       S3        S4       S5       S6       S7
Raw reads            757709   920390   1195777   854751   770140   832984   1164704
L. lactis            18420    82458    240860    191074   213250   213      138
Y. enterocolitica    2099     895      261       24       24       8        4
C. freundii          4407     18748    44480     20543    22621    48       54
E. sakazakii         488671   257913   149740    54849    54525    1414     1437
A. hydrophila        644      1441     4893      2457     2421     67       30
E. aerogenes         0        0        0         0        0        0        0
S. marcescens        10642    47160    92428     71023    61881    532      360
S. aureus            4088     2191     588       16       3        1        0
S. typhimurium       478      1973     3097      1665     1503     1746     1762
P. shigelloides      57012    316452   555379    428506   340112   824068   1157001

Comparison of measured and actual compositions

Measured composition in all samples. The results for S1 to S5 showed that even if the initial composition of the mixture is the same, measurements vary widely. In these samples, the SBNs of each species were distributed over a wide range, although all species were present in equal concentrations. The species with the highest SBN was E. sakazakii in S1 and P. shigelloides in S2 to S5 (Fig. 2a). Two groups were observed: the SBNs of Y. enterocolitica, C. freundii, A. hydrophila and Staphylococcus aureus were stable in the first five samples and were defined as the stable group (S group); the other five species, including L. lactis, E. sakazakii, Serratia marcescens and P. shigelloides, were classified as the variable group (V group), which exhibited remarkably variable SBNs, and the total proportion of these five species was more than 95 %. The proportion of most species was clearly different between S1 and the other samples; in S2, the divergences became smaller; and in S3 to S5, the varying species proportions were indistinguishable (Fig. 2b).

In the gradient mixed samples, S6 and S7, the measured compositions differed from the actual values. The actual initial proportions of Staphylococcus aureus and P. shigelloides were the same at 45 %. Even so, the measured proportions of Staphylococcus aureus were lowest in the V group, while P. shigelloides composed more than 99.5 % (Fig. 3a). The measured bacterial compositions of S6 and S7 were the same, although the concentrations of each species were tenfold greater in S7 than in S6. The similar results obtained for S1 to S5 suggest that some species, such as Staphylococcus aureus, are difficult to detect accurately with this approach. Since reads of E. aerogenes and Salmonella typhimurium cannot be distinguished and reads from the other eight species compose more than 99.7 % of the total, these two species had a negligible impact on the results for other species and were not included in the analyses of S1 to S7.

Quantitative comparison in equally proportioned samples

To evaluate the stability of measured compositions in the equally proportioned samples S1 to S5, we examined the rank order of the species by proportion. For example, E. sakazakii appeared to be the most prevalent species in S1; it was the second-most prevalent in S2, third in S3, and fourth in S4 and S5 (Fig. 2c). Similar results were observed for Staphylococcus aureus. Y. enterocolitica and A. hydrophila changed little, ranking penultimate, antepenultimate or last, and their order was constant except in S1. The relative order of the remaining four species was unchanged. Therefore, of the eight species considered in our analysis, six remained stable as measured by the metagenomic approach. The exceptions were E. sakazakii and Staphylococcus aureus.
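The rank-order comparison described for S1 to S5 can be reproduced directly from the SBN values. The sketch below is illustrative only: the function name `rank_orders` is hypothetical, and the numbers are the S1 to S5 SBNs of the eight analysed species as printed in Table 4.

```python
SAMPLES = ["S1", "S2", "S3", "S4", "S5"]

# SBN values for S1..S5 as printed in Table 4 (eight analysed species).
SBN = {
    "L. lactis":         [32548, 127668, 283163, 329973, 441451],
    "Y. enterocolitica": [3709, 1386, 307, 41, 50],
    "C. freundii":       [7787, 29027, 52292, 35477, 46828],
    "E. sakazakii":      [863465, 399322, 176039, 94721, 112873],
    "A. hydrophila":     [1138, 2231, 5752, 4243, 5012],
    "S. marcescens":     [18804, 73017, 108661, 122652, 128101],
    "S. aureus":         [7223, 3392, 691, 28, 6],
    "P. shigelloides":   [100738, 489956, 652922, 740004, 704069],
}


def rank_orders(sbn, samples):
    """Return {species: [rank in each sample]}, where rank 1 is the largest SBN."""
    ranks = {species: [] for species in sbn}
    for j in range(len(samples)):
        ordered = sorted(sbn, key=lambda sp: sbn[sp][j], reverse=True)
        for position, species in enumerate(ordered, start=1):
            ranks[species].append(position)
    return ranks


ranks = rank_orders(SBN, SAMPLES)
for species, per_sample in ranks.items():
    print(f"{species:18s} {per_sample}")
```

With these inputs, E. sakazakii moves from rank 1 in S1 to rank 4 in S4 and S5, which matches the drift described in the text.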

Table 4. Standardized paired reads for each species in all samples
Values represent the SBN of each species (standardized to one million).

Species              S1       S2       S3       S4       S5       S6       S7
L. lactis            32548    127668   283163   329973   441451   257      119
Y. enterocolitica    3709     1386     307      41       50       10       3
C. freundii          7787     29027    52292    35477    46828    58       47
E. sakazakii         863465   399322   176039   94721    112873   1708     1238
A. hydrophila        1138     2231     5752     4243     5012     81       26
E. aerogenes         0        0        0        0        0        0        0
S. marcescens        18804    73017    108661   122652   128101   643      310
S. aureus            7223     3392     691      28       6        1        0
S. typhimurium       845      3055     3641     2875     3111     2109     1518
P. shigelloides      100738   489956   652922   740004   704069   995400   996861

Fig. 1. Alignments of 16S rDNA V3 regions in ten bacterial species. Sequenced regions are marked in blue, blanks with green and primers in red; the reads for E. aerogenes and Salmonella typhimurium (species names in orange) are identical.

Relative ratio analysis also showed that the measured compositions were stable. For each species, link-relative ratios of proportion were calculated by dividing the proportion in each sample by the proportion in the sample of next-lower concentration, giving four groups of ratios: S2/S1, S3/S2, S4/S3 and S5/S4. For these sample pairs, the mean relative ratios of all species were 2.26±1.60, 1.13±0.75, 0.67±0.40 and 0.97±0.31, respectively (Fig. 2d). As the total concentration increased, the mean ratios approached 1 and the ranges narrowed. Two-tailed paired t-tests were performed between these groups of relative ratios; P-values were 0.046 between S2/S1 and S3/S2, 0.060 between S3/S2 and S4/S3, and 0.067 between S4/S3 and S5/S4. Therefore, the measured compositions of S2 to S5 were stable, whereas that of the lowest-concentration sample, S1, was not.

Quantitative comparison between sample groups

A comparison of the equally proportioned samples S1 to S5 and the gradient samples S6 and S7 revealed a variable pattern and indicated that the actual proportion significantly altered the results. Relative ratios of each species in S6 and S7 versus those in S1 to S5 were calculated (Table 5). Except for Y. enterocolitica and Staphylococcus aureus, the relationship between the RMP and RAP was nearly linear (Fig. 3b). RMP approached 1 as RAP approached 1, revealing that the measured differences for the same species between samples are more accurate when the real proportions do not differ. However, if the proportional differences are more than 100-fold, the measured differences may be too large to be accepted. In addition, the RMP of Y. enterocolitica and Staphylococcus aureus ranged widely, while the reads for these species were few. Thus, if too few reads are obtained, the results will not be reliable.
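The link-relative ratios and the RAP/RMP comparison described above can be sketched from the tabulated values. This is an illustrative reconstruction, not the authors' analysis: the helper names are hypothetical, measured proportions are assumed to be each SBN divided by the total SBN of the sample (Table 4), actual proportions are derived from the log10 c.f.u. values of Table 2, and the two indistinguishable species are excluded from the ratios as in the text.

```python
SAMPLES = ["S1", "S2", "S3", "S4", "S5", "S6", "S7"]

# SBN values as printed in Table 4 (columns S1..S7).
SBN = {
    "L. lactis":         [32548, 127668, 283163, 329973, 441451, 257, 119],
    "Y. enterocolitica": [3709, 1386, 307, 41, 50, 10, 3],
    "C. freundii":       [7787, 29027, 52292, 35477, 46828, 58, 47],
    "E. sakazakii":      [863465, 399322, 176039, 94721, 112873, 1708, 1238],
    "A. hydrophila":     [1138, 2231, 5752, 4243, 5012, 81, 26],
    "E. aerogenes":      [0, 0, 0, 0, 0, 0, 0],
    "S. marcescens":     [18804, 73017, 108661, 122652, 128101, 643, 310],
    "S. aureus":         [7223, 3392, 691, 28, 6, 1, 0],
    "S. typhimurium":    [845, 3055, 3641, 2875, 3111, 2109, 1518],
    "P. shigelloides":   [100738, 489956, 652922, 740004, 704069, 995400, 996861],
}

# log10 c.f.u. of each species in S1 and S6 (Table 2), in the same species order.
LOG10_CFU = {
    "S1": dict(zip(SBN, [2] * 10)),
    "S6": dict(zip(SBN, [3, 3, 4, 4, 5, 5, 6, 6, 7, 7])),
}
EXCLUDED = {"E. aerogenes", "S. typhimurium"}  # reads cannot be told apart


def proportions(values):
    """Scale a {species: amount} mapping so that it sums to 1."""
    total = sum(values.values())
    return {species: v / total for species, v in values.items()}


def measured(sample):
    j = SAMPLES.index(sample)
    return proportions({species: row[j] for species, row in SBN.items()})


def actual(sample):
    return proportions({sp: 10 ** e for sp, e in LOG10_CFU[sample].items()})


def ratios(numerator, denominator):
    """Per-species ratio of two compositions, analysed species only."""
    return {sp: numerator[sp] / denominator[sp]
            for sp in numerator if sp not in EXCLUDED}


link_s2_s1 = ratios(measured("S2"), measured("S1"))   # link-relative ratio S2/S1
mean_link = sum(link_s2_s1.values()) / len(link_s2_s1)
rap = ratios(actual("S6"), actual("S1"))              # ratio of accurate proportions
rmp = ratios(measured("S6"), measured("S1"))          # ratio of measured proportions

print(f"mean link-relative ratio S2/S1: {mean_link:.2f}")
for species in rap:
    print(f"{species:18s} RAP={rap[species]:.5f} RMP={rmp[species]:.4f}")
```

Under these assumptions the mean S2/S1 link-relative ratio comes out close to the 2.26 quoted in the text. The RMP values depend on how the measured proportions are normalized, so they need not coincide exactly with Table 5.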

DISCUSSION

Our results revealed that bacterial concentrations do not significantly affect measured compositions if they are above a certain level. In S1 to S5, total bacterial concentrations ranged from 10³ to 10⁷ c.f.u. ml⁻¹ and all species were present in equal proportions. For each species, the final proportions in S2 to S5 were distributed over a narrow range and the order was roughly stable. Stability was significant, particularly in S4 and S5; thus, in our study, concentration was not the key factor if the concentration of a species was more than 10³ c.f.u. ml⁻¹. This can be explained by the high efficiency and sensitivity of the PCR, in which a concentration of more than 10³ c.f.u. ml⁻¹ is sufficient for efficient and stable amplification. In contrast, if the concentration of a species was lower than 10³ c.f.u. ml⁻¹, the measured composition for more species was significantly affected. Therefore, it may be necessary to quantify total DNA in samples and set a suitable cut-off value to ensure that the concentrations of most species are over the threshold, and to perform sample enrichment if they are not.

The impact of initial proportion was significant for some species. The results for the gradient samples S6 and S7 were interesting. Since all species concentrations were more than 10³ c.f.u. ml⁻¹ in both samples, the measured composition was expected, and verified, to be stable. The analyses of relative ratios suggested that even if the initial concentration and proportion are different, the detection efficiency for the same species is stable, provided that the proportional differences are not too large and the obtained reads are abundant. Therefore, first, the absolute measured proportion of each species in a sample is not comparable because of the large divergence between species. Second, the stability of results for a single species suggests that proportional differences of the same species between samples could be used to measure changes in bacterial communities. Third, the quality of sequencing and species characteristics must be evaluated carefully when using real samples.

To avoid excessive confounding factors, we used normal enteric bacteria, and the variation in primer sites was small. However, in real samples, especially those from the environment, there are many non-culturable species and the diversity in the conserved regions of 16S rDNA is unknown, but predicted to be larger than what we have observed (Wang & Qian, 2009). Primer-binding efficiency is difficult to evaluate because there are few methods for studying non-culturable species and primers are based on known species. Several researchers have suggested that primers have a remarkable impact on bacterial community studies and that primers with low sensitivity may lose half the diversity in a sample (Wooley et al., 2010). Primer evaluation has been addressed in a number of studies and tools are being developed to help researchers design primers that are more efficient. In this study, we chose the V3 region and a widely used primer pair of confirmed high efficiency (Yang et al., 2009) to simplify the experiment and focus our efforts on the total efficiency of the workflow. Furthermore, the variation in primer-binding sites was small, so the primers should not have had a vital impact on the results of the study.

Many factors could affect the detection efficiency and we did not evaluate these factors one by one. Since the DNA extraction kit is highly efficient, the primers are suitable and randomness in the metagenomic method does not produce spurious deviations, the secondary DNA structures in each species may be contributing variable factors, especially in the regions of 16S rDNA. Many secondary structures of 16S rDNA have been reported, such as inverted repeat sequences and hairpins, which differ between species (Nelson et al., 2010). These structures may influence primer binding and extension efficiency and could explain the variable detection efficiency in all samples.

Real samples are far more complex than the simulated ones used in this study. They contain many more species of bacteria, diverse chemical components, and cells or DNA from other organisms. We used two bacterial species that cannot be distinguished to simulate such a mixture, but the reads for these species were few. Other impact factors should be addressed in future studies. Our results showed a significant deviation between measured and real compositions, but confirmed the stability of detection using metagenomic methods. Thus, the absolute proportions may not be accurate, but the changes between samples could be used to study variation in the microbiome of a single environment over time.

Fig. 2. Distribution of SBNs and proportions in S1 to S5. (a) SBN of all species; (b) bacterial compositions calculated using SBNs; (c) orders of species sorted by their proportions in the five mixed samples. The order from left to right represents the proportions from high to low; (d) link-relative ratio. The x-axis represents the sample pairs. The numbers on the dashed line are the relative ratios of all species in a sample; the numbers above or below are maxima and minima.

Fig. 3. Proportions of each species in S6 and S7. (a) Distribution of proportions. From top to bottom, three groups of circles represent the true composition of S6 and S7, the measured composition of S6, and the measured composition of S7; (b) relationship between the true and measured proportions (x-axis, RAP (log10); y-axis, RMP (log10)). AP, accurate proportion; MP, measured proportion.

Table 5. RAPs and RMPs using proportions in S6 and S7 divided by those in S1 to S5

Species            RAP      RMP
                            S6/S1   S6/S2   S6/S3   S6/S4   S6/S5   S7/S1   S7/S2   S7/S3   S7/S4   S7/S5
L. lactis          0.00045  0.0079  0.0020  0.0009  0.0008  0.0006  0.0037  0.0009  0.0004  0.0004  0.0003
Y. enterocolitica  0.00045  0.0026  0.0070  0.0315  0.2332  0.0006  0.0009  0.0025  0.0112  0.0832  0.0694
C. freundii        0.0045   0.0074  0.0020  0.0011  0.0016  0.0012  0.0060  0.0016  0.0009  0.0013  0.0010
E. sakazakii       0.0045   0.0020  0.0043  0.0097  0.0180  0.0151  0.0014  0.0031  0.0070  0.0131  0.0110
A. hydrophila      0.045    0.0711  0.0363  0.0141  0.0191  0.0161  0.0227  0.0116  0.0045  0.0061  0.0052
S. marcescens      0.45     0.0342  0.0088  0.0059  0.0052  0.0050  0.0165  0.0042  0.0029  0.0024  0.0024
S. aureus          0.45     0.0002  0.0004  0.0017  0.0437  0.1945  0.0000  0.0000  0.0000  0.0000  0.0000
P. shigelloides    4.5      9.8811  2.0316  1.5245  1.3451  1.4138  9.8956  2.0346  1.5268  1.3471  1.4159


ACKNOWLEDGEMENTS

This work was supported by the Priority Project on Infectious Disease Control and Prevention from the Ministry of Science and Technology and the Ministry of Health, P.R. China (2011ZX10004-001), and a special grant for Prevention and Treatment of Infectious Diseases (2012ZX10004215). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors have no competing interests to declare.

REFERENCES

Barriuso, J., Valverde, J. R. & Mellado, R. P. (2011). Estimation of bacterial diversity using next generation sequencing of 16S rDNA: a comparison of different workflows. BMC Bioinformatics 12, 473.

Charlson, E. S., Chen, J., Custers-Allen, R., Bittinger, K., Li, H., Sinha, R., Hwang, J., Bushman, F. D. & Collman, R. G. (2010). Disordered microbial communities in the upper respiratory tract of cigarette smokers. PLoS ONE 5, e15216.

Claesson, M. J., Wang, Q., O'Sullivan, O., Greene-Diniz, R., Cole, J. R., Ross, R. P. & O'Toole, P. W. (2010). Comparison of two next-generation sequencing technologies for resolving highly complex microbiota composition using tandem variable 16S rRNA gene regions. Nucleic Acids Res 38, e200.

Cole, J. R., Wang, Q., Cardenas, E., Fish, J., Chai, B., Farris, R. J., Kulam-Syed-Mohideen, A. S., McGarrell, D. M., Marsh, T. & other authors (2009). The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37 (Database issue), D141–D145.

Eckburg, P. B., Bik, E. M., Bernstein, C. N., Purdom, E., Dethlefsen, L., Sargent, M., Gill, S. R., Nelson, K. E. & Relman, D. A. (2005). Diversity of the human intestinal microbial flora. Science 308, 1635–1638.

Fox, G. E., Magrum, L. J., Balch, W. E., Wolfe, R. S. & Woese, C. R. (1977). Classification of methanogenic bacteria by 16S ribosomal RNA characterization. Proc Natl Acad Sci U S A 74, 4537–4541.

Gilbert, J. A., Steele, J. A., Caporaso, J. G., Steinbrück, L., Reeder, J., Temperton, B., Huse, S., McHardy, A. C., Knight, R. & other authors (2012). Defining seasonal marine microbial community dynamics. ISME J 6, 298–308.

Giongo, A., Crabb, D. B., Davis-Richardson, A. G., Chauliac, D., Mobberley, J. M., Gano, K. A., Mukherjee, N., Casella, G., Roesch, L. F. & other authors (2010). PANGEA: pipeline for analysis of next generation amplicons. ISME J 4, 852–861.

Griffen, A. L., Beall, C. J., Firestone, N. D., Gross, E. L., Difranco, J. M., Hardman, J. H., Vriesendorp, B., Faust, R. A., Janies, D. A. & Leys, E. J. (2011). CORE: a phylogenetically-curated 16S rDNA database of the core oral microbiome. PLoS ONE 6, e19051.

Hamady, M. & Knight, R. (2009). Microbial community profiling for human microbiome projects: tools, techniques, and challenges. Genome Res 19, 1141–1152.

Hong, S., Bunge, J., Leslin, C., Jeon, S. & Epstein, S. S. (2009). Polymerase chain reaction primers miss half of rRNA microbial diversity. ISME J 3, 1365–1373.

Huttenhower, C., Gevers, D., Knight, R., Abubucker, S., Badger, J. H., Chinwalla, A. T., Creasy, H. H., Earl, A. M., FitzGerald, M. G. & other authors (2012). Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214.

Kaeberlein, T., Lewis, K. & Epstein, S. S. (2002). Isolating "uncultivable" microorganisms in pure culture in a simulated natural environment. Science 296, 1127–1129.

Lozupone, C., Hamady, M. & Knight, R. (2006). UniFrac–an online tool for comparing microbial community diversity in a phylogenetic context. BMC Bioinformatics 7, 371.

Lozupone, C., Lladser, M. E., Knights, D., Stombaugh, J. & Knight, R. (2011). UniFrac: an effective distance metric for microbial community comparison. ISME J 5, 169–172.

Luna, G. M., Dell'Anno, A. & Danovaro, R. (2006). DNA extraction procedure: a critical issue for bacterial diversity assessment in marine sediments. Environ Microbiol 8, 308–320.

Mardis, E. R. (2008). Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 9, 387–402.

Nelson, K. E., Weinstock, G. M., Highlander, S. K., Worley, K. C., Creasy, H. H., Wortman, J. R., Rusch, D. B., Mitreva, M., Sodergren, E. & other authors (2010). A catalog of reference genomes from the human microbiome. Science 328, 994–999.

Peterson, D. A., Frank, D. N., Pace, N. R. & Gordon, J. I. (2008). Metagenomic approaches for defining the pathogenesis of inflammatory bowel diseases. Cell Host Microbe 3, 417–427.

Roesch, L. F., Fulthorpe, R. R., Riva, A., Casella, G., Hadwin, A. K., Kent, A. D., Daroub, S. H., Camargo, F. A., Farmerie, W. G. & Triplett, E. W. (2007). Pyrosequencing enumerates and contrasts soil microbial diversity. ISME J 1, 283–290.

Salonen, A., Nikkilä, J., Jalanka-Tuovinen, J., Immonen, O., Rajilić-Stojanović, M., Kekkonen, R. A., Palva, A. & de Vos, W. M. (2010). Comparative analysis of fecal DNA extraction methods with phylogenetic microarray: effective recovery of bacterial and archaeal DNA using mechanical cell lysis. J Microbiol Methods 81, 127–134.

Schloss, P. D. & Handelsman, J. (2006). Introducing SONS, a tool for operational taxonomic unit-based comparisons of microbial community memberships and structures. Appl Environ Microbiol 72, 6773–6779.

Schloss, P. D., Gevers, D. & Westcott, S. L. (2011). Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS ONE 6, e27310.

Sun, Y., Cai, Y., Liu, L., Yu, F., Farrell, M. L., McKendree, W. & Farmerie, W. (2009). ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences. Nucleic Acids Res 37, e76.

Terrat, S., Christen, R., Dequiedt, S., Lelièvre, M., Nowak, V., Regnier, T., Bachar, D., Plassart, P., Wincker, P. & other authors (2012). Molecular biomass and MetaTaxogenomic assessment of soil microbial communities as influenced by soil DNA extraction procedure. Microb Biotechnol 5, 135–141.

von Bubnoff, A. (2008). Next-generation sequencing: the race is on. Cell 132, 721–723.

Wang, Y. & Qian, P. Y. (2009). Conservative fragments in bacterial 16S rRNA genes and primer design for 16S ribosomal DNA amplicons in metagenomic studies. PLoS ONE 4, e7401.

Ward, D. M., Weller, R. & Bateson, M. M. (1990). 16S rRNA sequences reveal numerous uncultured microorganisms in a natural community. Nature 345, 63–65.

Woo, P. C. Y., Lau, S. K. P., Teng, J. L. L., Tse, H. & Yuen, K. Y. (2008). Then and now: use of 16S rDNA gene sequencing for bacterial identification and discovery of novel bacteria in clinical microbiology laboratories. Clin Microbiol Infect 14, 908–934.

Wooley, J. C., Godzik, A. & Friedberg, I. (2010). A primer on metagenomics. PLOS Comput Biol 6, e1000667.

Yang, S., Ramachandran, P., Rothman, R., Hsieh, Y. H., Hardick, A., Won, H., Kecojevic, A., Jackman, J. & Gaydos, C. (2009). Rapid identification of biothreat and other clinically relevant bacterial species by use of universal PCR coupled with high-resolution melting analysis. J Clin Microbiol 47, 2252–2255.

Zhao, J., Schloss, P. D., Kalikin, L. M., Carmody, L. A., Foster, B. K., Petrosino, J. F., Cavalcoli, J. D., VanDevanter, D. R., Murray, S. & other authors (2012). Decade-long bacterial community dynamics in cystic fibrosis airways. Proc Natl Acad Sci U S A 109, 5809–5814.
