Accepted Article

Received Date: 05-Jul-2014
Revised Date: 21-Mar-2015
Accepted Date: 04-Apr-2015
Article type: Resource Article

Accelerating plant DNA barcode reference library construction using herbarium specimens: improved experimental techniques

Chao Xu1†, Wenpan Dong1†, Shuo Shi1,2, Tao Cheng1, Changhao Li1,3, Yanlei Liu1, Ping Wu1, Hongkun Wu1, Peng Gao1, and Shiliang Zhou1*

1 State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
2 College of Life Science, Hebei Normal University, Shijiazhuang 050024, China
3 University of Chinese Academy of Sciences, Beijing 100049, China

† These authors contributed equally to this work.
* Corresponding Author: Shiliang Zhou, tel +86 6283 6503, fax +86 6259 0843; email [email protected]
Running Title: DNA reconstruction for barcoding

This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1111/1755-0998.12413 This article is protected by copyright. All rights reserved.

Abstract
A well-covered reference library is crucial for successful identification of species by DNA barcoding. The biggest difficulty in building such a reference library is the lack of material from the organisms. Herbarium collections are potentially an enormous source of material. In this study, we demonstrate that it is feasible to build such reference libraries using reconstructed (self-primed PCR amplified) DNA from herbarium specimens.

We used 179 rosaceous specimens to test the effects of DNA reconstruction, 420 randomly sampled specimens to estimate the usable percentage, and another 223 specimens of true cherries (Cerasus, Rosaceae) to test how well the usable specimens cover the species. The barcodes rbcLb (the central four sevenths of the rbcL gene) and matK were each amplified in two halves and sequenced on a Roche GS 454 FLX+. DNA from the herbarium specimens was typically shorter than 300 bp. DNA reconstruction enabled the amplification of fragments of 400-500 bp without introducing any sequence errors. About one third of the specimens in the national herbarium of China (PE) proved usable after DNA reconstruction. The specimens in PE cover all Chinese true cherry species and 91.5% of the vascular species listed in Flora of China. It is therefore feasible to build well-covered reference libraries for DNA barcoding of vascular species in China.

As exemplified in this study, DNA reconstruction and DNA-labeled next generation sequencing can accelerate the construction of local reference libraries. By putting the local reference libraries together, a global library for DNA barcoding becomes closer to reality.

Key words: DNA barcoding; DNA reconstruction; herbarium specimens; next generation sequencing

Introduction
DNA barcoding was initially proposed to identify species with a short standard DNA sequence (Hebert et al. 2003). The technique has been applied wherever correct species identification is necessary, such as in community ecology (Kress et al. 2009), biodiversity assessment (Taberlet et al. 2012; von Crautlein et al. 2011), biological conservation (Krishnamurthy & Francis 2012), the identification of invasive plants (Armstrong & Ball 2005), medicinal and food security (Muellner et al. 2011), and diet analysis (Valentini et al. 2009). The US Food and Drug Administration (FDA) has already approved the application of DNA barcoding, a safe method, as a regulatory tool to identify fish samples destined for the American dining table (http://www.fda.gov/food/foodscienceresearch). For the purpose of traditional Chinese medicine security, the Chinese Pharmacopoeia has also accepted DNA barcoding as a technique for identifying traditional Chinese medicines (Techen et al. 2014). However, the extensive application of DNA barcoding is not an easy task. The unanimously recognized obstacle is the lack of a reliable reference library, which was, and for the next few years will remain, the main barrier to DNA barcoding (Borisenko et al. 2009; Cho et al. 2013; Ekrem et al. 2007; Taylor & Harris 2012). There are approximately 314,400 known plant species (Mora et al. 2011), but only approximately one tenth of the species have sequences of the core barcodes, matK or rbcL (CBOL Plant Working Group 2009), deposited in GenBank or BOLD, and only 3.0% to 4.7% of the species are represented by two or more sequences (Table S1, Supporting information). There is still a long way to go before matK or rbcL reference libraries with reasonable coverage can be constructed. We should find a

Unfortunately, herbarium specimen DNA is not safely and instantly preserved, leading to a number of complications. First, DNA molecules may have been modified by blocking lesions, leading to the generation of erroneous sequence information (Hansen et al. 2006; Hansen et al. 2001; Heyn et al. 2010). Second, extracts of herbarium specimens usually carry complex PCR inhibitors that hinder PCR amplification, leading to a lower success rate (Kontanis & Reed 2006). Third, bacteria, fungi or other plants might have been introduced during specimen preparation and storage, which leads to DNA contamination (Särkinen et al. 2012). Fourth, herbarium specimen DNA is typically highly fragmented due to autolysis, bacterial degradation, or spontaneous depurination. These biochemical effects may reduce the number of amplifiable template molecules (Staats et al. 2011), which strongly limits the ability of PCR to amplify longer fragments. Generally, the amplifiable length of herbarium DNA is strongly dependent on the age and state of specimen preservation. Fragments shorter than 300 bp are often recommended as amplification targets for herbarium DNA in order to maximize outcomes (Drabkova et al. 2002; Särkinen et al. 2012). The protocols for DNA extraction and PCR need to be optimized for generating massive sequence data from herbarium materials. Reagents such as trehalose, bovine serum albumin (BSA), and polysorbate-20 (Tween-20) have been added to PCR mixtures (Samarakoon et al. 2013; Savolainen et al. 1995). Next generation sequencing (NGS) has been employed to circumvent cloning steps, or even PCR, by sequencing whole genomes (Besnard et al. 2014; Staats et al. 2013). This study focuses on using degraded DNA templates on a large scale through DNA reconstruction [self-primed PCR amplification; Golenberg et al. (1996)] and on estimating the feasibility of building local DNA barcoding reference libraries from local herbarium specimens.
The specific aims of this study are (1) to determine how long DNA templates can be reconstructed and (2) to evaluate the feasibility of building Chinese plant DNA barcode reference libraries using the herbarium specimens housed in the National Herbarium of the Institute of Botany (PE), Beijing, China.

Materials and Methods
Specimen sampling
All materials were taken from specimens housed in the National Herbarium of the Institute of Botany, Chinese Academy of Sciences (PE), the largest herbarium in China. Three sampling strategies were implemented in this study. The first was to sample a specific taxon to test how well DNA reconstruction performs. Rosaceae was chosen because of our familiarity with this family. A total of 179 specimens of 168 species belonging to 49 genera (Table S2, Supporting information) were sampled for this purpose. The second was to sample seed plants randomly to test how well DNA is generally preserved in herbarium specimens. A total of 420 samples representing 149 species belonging to 113 genera of 39 families were randomly selected (Table S3, Supporting information). The third was to sample a specific taxon more thoroughly to give an example of building a local reference library for that taxon. The genus Cerasus Mill. (Rosaceae) was chosen because the genus is under taxonomic revision. A total of 223 specimens of Cerasus were sampled for this purpose (Table S4, Supporting information). At least five sheets from different localities were selected for each species if more than five specimens were available.

DNA extraction
Total genomic DNA was extracted using the mCTAB method (Li et al. 2013), in which the incubation time at 65 °C was extended to 120 min. DNA was re-suspended in 40 μL TE buffer. DNA extracts were visualized on 1% agarose gels and quantified on a Nanodrop 2000c Spectrophotometer (Thermo Fisher Scientific Inc.). DNA quality was estimated by calculating OD260/280 ratios. A ratio between 1.7 and 2.0 was considered acceptable.
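As a minimal sketch of the purity criterion above, the following Python function computes the OD260/280 ratio from two absorbance readings and reports whether it falls in the accepted 1.7 to 2.0 window. The function name, the argument names and the inclusive treatment of the bounds are illustrative assumptions, not part of the original protocol.

```python
def od_ratio_acceptable(a260: float, a280: float,
                        low: float = 1.7, high: float = 2.0) -> bool:
    """Return True if the A260/A280 ratio lies within [low, high].

    The default bounds are the 1.7-2.0 window stated in the text; treating
    them as inclusive is an assumption made for this sketch.
    """
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return low <= a260 / a280 <= high
```

For example, readings of 0.90 (A260) and 0.50 (A280) give a ratio of 1.8, which would pass the check.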

Testing the effects of DNA reconstruction
Golenberg et al. (1996) demonstrated that it was possible to reconstruct degraded DNA by polymerization on self-primed DNA fragments. Overlapping DNA fragments of degraded DNA will themselves act as primers for double stranded DNA extension under PCR conditions (Fig. 1), yielding longer DNA products that may amplify fragments spanning two or more short templates.
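The self-priming idea can be illustrated with a toy string model. The sketch below is not the authors' procedure and not a biochemical simulation: it assumes a single strand orientation, exact matches within the overlap, and a hypothetical `min_overlap` parameter, and it removes the two templates from the pool when they merge, which real PCR does not do.

```python
from typing import List, Optional


def extend(left: str, right: str, min_overlap: int = 8) -> Optional[str]:
    """Extend `left` with `right` if a suffix of `left` equals a prefix of `right`.

    The longest overlap of at least `min_overlap` bases is used.
    Returns None when no such overlap exists.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


def reconstruct(fragments: List[str], cycles: int = 20,
                min_overlap: int = 8) -> List[str]:
    """Merge one overlapping fragment pair per cycle (toy model).

    Stops early when no pair of fragments can be merged any further.
    """
    pool = list(fragments)
    for _ in range(cycles):
        merged = False
        for i in range(len(pool)):
            for j in range(len(pool)):
                if i == j:
                    continue
                product = extend(pool[i], pool[j], min_overlap)
                # Keep only products that are longer than both templates.
                if product is not None and len(product) > max(len(pool[i]), len(pool[j])):
                    pool = [p for n, p in enumerate(pool) if n not in (i, j)]
                    pool.append(product)
                    merged = True
                    break
            if merged:
                break
        if not merged:
            break
    return pool
```

Given three 12-base fragments that overlap by six bases each, two cycles are enough for this model to rebuild the 24-base sequence they came from, which mirrors how overlapping short templates can yield a product spanning several of them.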

Reconstruction PCR was carried out in 20 μL reaction mixtures. Each PCR mixture contained 2.0 μL 10× buffer, 2.0 μL dNTPs (2 mmol/L), 16.0 μL total DNA, and 0.2 μL Taq polymerase (2.5 U/μL). The reactions started with a presoak at 94 °C for 3 min, followed by 20 cycles of 94 °C for 30 s, 50 °C for 30 s and 72 °C for 1 min, and ended with a final extension at 72 °C for 10 min.

The effects of DNA reconstruction were assessed by comparing the PCR success rates obtained with reconstructed and original DNA. The central four sevenths of the rbcL gene, rbcLb (Dong et al. 2014a), was used as a marker. Primers specific to Rosaceae were designed (Table S5, Supporting information) based on the rbcLb dataset using the software FastPCR (Kalendar et al. 2009). Six primer combinations were used to amplify products ranging from 100 to 600 bp at intervals of approximately 100 bp (Fig. S1, Supporting information). The PCR products were visualized on 1% agarose gels containing ethidium bromide under an ultraviolet transilluminator.

DNA reconstruction error detection
Fifteen samples were chosen to test whether the reconstruction PCR might introduce sequence errors (Table S6, Supporting information). Fragments of a partial rbcL region were amplified using the primer pair RrbcL-bF/RrbcLb-R5 (Table S5, Supporting information) from both reconstructed and non-reconstructed DNA templates. The fragments were sequenced on an ABI Prism 3730xl (Applied Biosystems, Foster City, U.S.A.) following the manufacturer’s protocols.

PCR amplification with DNA-labeled primers
We used 2×643 DNA-labeled primers to amplify matK and rbcLb of all 643 samples from the second and the third sampling strategies (420 random samples and 223 samples of Cerasus). The core sequences of the primers for matK and rbcLb were the same as those used in Yu et al. (2011) and Dong et al. (2014a). Internal primers were used in combination with the DNA-labeled primers to amplify two short fragments of each marker (rbcLb-1, rbcLb-2, matK-1 and matK-2) instead of one long fragment from the degraded templates (Table S7, Supporting information). PCR reactions were assembled in a 10 μL volume containing 1 μL reconstructed DNA, 1 μL 10× Taq buffer, 1 μL 2 mmol/L dNTP, 0.5 μL 5 μmol/L primers, 5.9 μL ddH2O, and 0.25 U Taq polymerase. The amplification program was 94 °C for 3 min, 35 cycles of 94 °C for 30 s, 52 °C for 30 s and 72 °C for 30 s, and a final extension at 72 °C for 10 min. All amplification products were checked on 1% agarose gels.
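For planning purposes, the hold times of the two thermal programs described in this section can be summed as follows. This is only back-of-the-envelope arithmetic: the function name is hypothetical, and the ramping time between temperatures, which depends on the thermocycler, is ignored.

```python
from typing import Sequence


def program_seconds(initial_s: int, cycles: int,
                    step_seconds: Sequence[int], final_s: int) -> int:
    """Total hold time of a PCR program in seconds, ignoring ramping."""
    return initial_s + cycles * sum(step_seconds) + final_s


# Reconstruction PCR: 94 C 3 min; 20 x (30 s, 30 s, 60 s); 72 C 10 min.
reconstruction_s = program_seconds(3 * 60, 20, (30, 30, 60), 10 * 60)

# DNA-labeled PCR: 94 C 3 min; 35 x (30 s, 30 s, 30 s); 72 C 10 min.
labeled_pcr_s = program_seconds(3 * 60, 35, (30, 30, 30), 10 * 60)

print(f"reconstruction: {reconstruction_s / 60:.1f} min")
print(f"labeled PCR:    {labeled_pcr_s / 60:.1f} min")
```

Under these assumptions, the reconstruction step adds a little under one hour of cycling before the labeled amplification, which itself runs for slightly more than an hour.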

Next generation sequencing of the DNA-labeled samples
The PCR products of matK and rbcLb from the different samples were all mixed together to construct a DNA library, which was sequenced by Majorbio Company in Shanghai, China, on a Roche 454 sequencer using a standard GS FLX Titanium Sequencing Kit XL+ (454 Life Sciences, Branford, CT, USA).

Data analysis
The reads from the Roche 454 sequencer were sorted into genes according to the PCR primers. The reads of the same gene were further sorted into samples according to the DNA labels. Consensus sequences were created from all reads (>10×, QV > 20) of the same gene from the same sample under a criterion of >97% sequence similarity.
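The sorting and consensus steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used in the study: each read is assumed to be laid out as label, then primer, then insert; labels have a fixed length; matching is exact; and the consensus is a simple majority vote over reads truncated to a common length. The quality filter (QV > 20) and the >97% similarity criterion are omitted, and all function and parameter names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def demultiplex(reads: List[str], primers: Dict[str, str],
                labels: Dict[str, str], label_len: int
                ) -> Dict[Tuple[str, str], List[str]]:
    """Sort reads into {(gene, sample): [insert, ...]}.

    Assumes the layout label + primer + insert (an assumption of this sketch).
    Reads with an unknown primer or an unknown label are discarded.
    """
    bins: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for read in reads:
        label, rest = read[:label_len], read[label_len:]
        # Step 1: assign the read to a gene according to its PCR primer.
        gene = next((g for g, p in primers.items() if rest.startswith(p)), None)
        if gene is None:
            continue
        # Step 2: assign the read to a sample according to its DNA label.
        sample = labels.get(label)
        if sample is None:
            continue
        bins[(gene, sample)].append(rest[len(primers[gene]):])
    return dict(bins)


def consensus(inserts: List[str], min_depth: int = 10) -> Optional[str]:
    """Majority-rule consensus; None if fewer than `min_depth` reads."""
    if len(inserts) < min_depth:
        return None
    length = min(len(s) for s in inserts)
    return "".join(
        Counter(s[i] for s in inserts).most_common(1)[0][0]
        for i in range(length)
    )
```

A real analysis would align the reads and cluster them by similarity before calling a consensus; the depth threshold here only mirrors the coverage requirement mentioned in the text.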

The output sequences were aligned with Clustal X ver 2.0 (Larkin et al. 2007). UPGMA trees were built with PAUP 4.0 (Swofford 2002) to check reliability of sequences. The BLAST method was used as well to verify the sequences.

Results
Quality of DNA from herbarium specimens in PE
The DNA from the herbarium specimens in PE was highly fragmented; for most specimens (~70%), the fragments observed on agarose gels were shorter than 300 bp (Fig. S2, Supporting information). There was a negative correlation between the PCR amplification success rates and the PCR amplicon sizes (Fig. 2). The PCR success rates were 82.7% for 100 bp, 71.5% for 200 bp, 51.4% for 300 bp, 27.9% for 400 bp, 29.1% for 500 bp, and only 18.4% for 600 bp. It appeared that slightly more than half of the specimens were suitable for recovering gene fragments of 300 bp.
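The strength of the relationship between amplicon size and success rate can be checked directly from the six rates reported above. The sketch below computes a plain Pearson correlation coefficient; it is a rough summary of six data points rather than a statistical test. The implied specimen counts assume that all 179 Rosaceae specimens were tested at every amplicon size, which is an assumption based on the sampling design, not a figure given in this paragraph.

```python
from math import sqrt
from typing import Sequence

sizes_bp = [100, 200, 300, 400, 500, 600]
success_pct = [82.7, 71.5, 51.4, 27.9, 29.1, 18.4]  # rates reported in the text


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(sizes_bp, success_pct)

N_SPECIMENS = 179  # assumption: every Rosaceae specimen tested at each size
implied_counts = [round(p / 100 * N_SPECIMENS) for p in success_pct]

print(f"Pearson r = {r:.2f}")
print(dict(zip(sizes_bp, implied_counts)))
```

The coefficient is strongly negative, in line with the trend described for Fig. 2, even though the 400 bp and 500 bp rates are nearly equal.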

DNA purity (OD260/280 ratio) was an important factor for PCR success. Fig. 3 shows that when the PCR products were successfully amplified, the OD260/280 ratios of their DNA templates were mainly distributed between 1.70 and 1.90. In contrast, DNA templates with an OD260/280 ratio below 1.7 failed to yield the targeted products, which was particularly obvious in the amplification of short fragments.

Effects of DNA reconstruction
After reconstruction, the OD260/280 ratios of DNA from the herbarium specimens were closer to 1.8 (Fig. S3, Supporting information). The PCR success rates increased significantly for PCR products from 200 bp to 500 bp (Fig. 4). The PCR success rate rose from 51.4% to 78.2% for 300 bp and from 27.9% to 58.7% for 400 bp after reconstruction. No remarkable improvements were observed for fragments longer than 500 bp (the increase was still 8.3% for 500 bp and 7.9% for 600 bp), indicating an upper limit on the length of DNA recoverable from herbarium specimens.

Sequence errors after DNA reconstruction
The same rbcL fragment was Sanger sequenced for the same 15 samples from both kinds of template, giving 15 sequences from reconstructed templates and 15 sequences from non-reconstructed templates. The fragments were ~424 bp long. Not a single base pair difference was observed in the 15 comparisons, indicating that DNA reconstruction did not introduce any errors into the reconstructed DNA.

Sequence recoverability of herbarium specimens
For the 420 randomly sampled herbarium specimens, the PCR amplification success rates were 37.4%, 29.3%, 19.3%, and 23.1% for rbcLb-1, rbcLb-2, matK-1 and matK-2, respectively, and the corresponding sequencing success rates were 28.1%, 31.4%, 11.0%, and 27.4% (Table 1). When the same fragments of the 223 specimens of Cerasus were amplified using the Rosaceae-specific primers, the PCR success rates were much better for both matK-1 and matK-2, and the sequencing success rates were much higher for all four fragments (Table 1).

Adding the two sets of samples together, the total number of samples was 643. The sequencing success rates were 30.0%, 33.6%, 21.8%, and 31.1% for the same four fragments (Table 1). A small fraction (4.8%) of successfully amplified fragments failed to be sequenced, and another small fraction (4.5%) of samples without visible bands were successfully sequenced. Of the 643 samples, 42.0% had at least one fragment sequenced, but only 15.2% had all four fragments sequenced.

There was a noticeable negative correlation between the age of specimens and sequence recovery (Fig. 5; Fig. S4, Supporting information). Specimens collected in the last 40 years had higher amplification and sequencing success rates than older specimens. Specimens collected before the 1970s had very low sequencing success rates (12.4%). Sequencing success depended on PCR success rather than the age of specimens.

Discussion
Herbarium specimens are a cheap and reliable source of materials for DNA barcoding
Materials are the biggest concern for DNA barcoding because investment in time, money and reliable identification has been limited. Under these circumstances, herbarium specimens are a cheap and reliable source of materials. First, herbarium specimens are vouchers of DNA that can be traced back to the localities where they were collected. The collection information, identification information and specimen pictures have been digitized, filed and made available to the public via the internet, for example through the Chinese Virtual Herbarium (CVH, http://www.cvh.org.cn/).

Second, herbarium specimens cover most of the known plant species. Herbarium collections are not only essential reference materials for taxonomists but also evidence of species occurrence in certain areas. It was estimated that c. 350 million specimens had been collected in the last 400 years, housed in 3,400 herbaria all over the world. Those collections represent most of the plant species on Earth. Moreover, extinct species can only be found in herbaria, and herbarium materials are more accessible than living individuals that are strictly protected by law.

Third, the identifications of herbarium specimens are mostly reliable because the specimens have been subjected to several taxonomic studies. It is becoming increasingly difficult to find expert taxonomists to identify voucher specimens for DNA barcoding (Elliott & Jonathan Davies 2014). Correct identification of reference specimens is crucial in reference library construction for DNA barcoding, as lay end users are unlikely to detect mistakes. It is worth pointing out that herbarium specimens are not free from problems. Misidentifications and nomenclatural divergence due to contradictory taxonomic opinions are common. Only type specimens are free from nomenclatural problems, because names are permanently attached to them. Type specimens have served as references for Linnaean binomial names since 1753, and most type specimens are still stored in herbaria (Bebber et al. 2010). Some type specimens have been used in DNA barcoding or in phylogenetic studies (Stuckas & Fritz 2011; Wallander & Albert 2000). The value of genetic information from type specimens lies more in solving species problems than in enriching DNA barcoding reference libraries. However, if the type specimens were (genome) DNA barcoded, some researchers would use the genetic information instead of examining the type specimens themselves, which would protect the type specimens from possible damage by reducing handling.

Although there are some examples of DNA sequences being obtained successfully from herbarium specimens, such as rbcL of 369 bp from 109-year-old Cardiopteris specimens (Savolainen et al. 1995), rpl16 of 823 bp, rps16 of 700 bp and rrn16-trnI of 800 bp from a more than 200-year-old specimen collected by Adam Afzelius, a student of Carl Linnaeus (Ames & Spooner 2008), and trnV-ndhC fragments of ca. 440 bp from potato leaves collected in early 17th century (Ames & Spooner 2008), some optimization of experimental techniques, or even new techniques, would be necessary for effective extraction of genetic information from ancient herbarium specimens.

Optimizing experimental techniques
The existing experimental techniques are not without deficiencies in extracting DNA information from herbarium specimens. Low sequence recovery is a common issue. PCR amplification success using degraded DNA from herbarium specimens as templates depends on the quality and length of the templates and the length of the focal regions. Special attention has been paid to the methods for DNA extraction (Drabkova et al. 2002; Särkinen et al. 2012). The modified CTAB methods perform best in terms of both the quantity and the quality of the resulting DNA (Särkinen et al. 2012; Yang et al. 1997). Another way to increase PCR efficiency is to add reagents to the PCR mix that reduce the effects of PCR inhibitors (Samarakoon et al. 2013).

Considering the short length (< 300 bp, Fig. 2) of highly degraded DNA, strategies using mini-barcodes (Boyer et al. 2012; Dong et al. 2014b; Hajibabaei et al. 2006; Shokralla et al. 2011) and metabarcodes (Giguet-Covex et al. 2014; Taberlet et al. 2012) have been proposed. Other chloroplast regions such as the intron of trnL (UAA) and its P6 loop (Taberlet et al. 2007) have also been suggested. Until genome barcodes become affordable in practice, conventional barcodes remain what we are striving for. This study is the first attempt to reconstruct longer DNA fragments suitable for massive sample sequencing on 454 or MiSeq platforms. The reconstructed DNA fragments can be adjusted to approximately 400-500 bp, about half the length of a conventional barcode, which can be handled by 454 without assembly or by MiSeq 300PE with a single assembly step.

Dividing a barcode into two halves for PCR amplification doubles the labor and sequencing expenses. The enormous amount of material available from specimens means that highly efficient and low-cost methods are needed. The Sanger sequencing method is labor-intensive and relatively costly compared to NGS platforms. It took approximately six months for five people to sequence the COI barcodes of 41,650 specimens using the Sanger sequencing method (Hebert et al. 2013). NGS technologies could be adopted to meet the requirement of sequencing on a massive scale, which will revolutionize the construction of DNA barcode reference libraries in a short period of time. By using DNA-labeled PCR primers, we exemplified the application of Roche 454 in sequencing all four fragments of 643 specimen samples simultaneously. DNA contamination, a big problem for Sanger sequencing, is not an issue for NGS because almost all of the alleles can be sequenced and sorted out by phylogenetic analyses or identified by BLAST. Barcode sequences can also be extracted from genome-scale sequence data of very old herbarium specimens (Besnard et al. 2013; Besnard et al. 2014; Staats et al. 2013). Such initiatives demonstrate the feasibility of exploring the genetic information held in treasured herbarium specimens such as type specimens. From the perspective of current DNA barcoding, gathering a few barcodes from all organisms matters more than acquiring all the DNA data from a few individuals. However, with the rapid development of sequencing technology and the reduction of costs, it is conceivable that all living creatures will have their genome barcodes in the near future (Li et al. 2015).

Constructing local DNA barcoding reference libraries using herbarium specimens: an example of Chinese true cherries
There are 35 true cherry species listed in Flora of China (Li et al. 2003). Of these, 34 species were sampled in this study (Table S4, Supporting information). The one species we missed, Cerasus discoidea Yu & Li, has four sheets of specimens in PE. With an upper limit of five samples for each sampled species, one species is represented by one sample, two species by two samples, five species by three samples, eight species by four samples, and eighteen species by five (or more, due to misidentifications) samples. On average, each species was represented by four samples, taken from different counties where possible. Reference libraries with such coverage are expected to support reliable identification of true cherry species.
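The sampling distribution given above can be tallied with a few lines of Python as a consistency check. The dictionary below simply restates the counts from the text; because the eighteen species in the last class may have more than five samples, the computed total is a lower bound on the number of samples.

```python
# Number of species represented by k samples, as stated in the text.
species_by_sample_count = {1: 1, 2: 2, 3: 5, 4: 8, 5: 18}

n_species = sum(species_by_sample_count.values())
min_samples = sum(k * n for k, n in species_by_sample_count.items())
mean_samples = min_samples / n_species  # lower bound on the mean

print(f"{n_species} species, at least {min_samples} samples, "
      f"mean of at least {mean_samples:.1f} samples per species")
```

The tally recovers the 34 sampled species and a mean slightly above four samples per species, consistent with the average quoted in the text.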

By extension, there are nearly 2.6 million specimens housed in the national herbarium (PE), the largest herbarium in China. Of the 1.76 million digitized specimens, 1.50 million were collected from native plants in China. Among the 1.50 million native specimens, 128,788 sheets of 2,410 species are ferns and lycophytes, 24,026 sheets of 267 species are gymnosperms, and 1,349,495 sheets of 26,029 species are angiosperms. There are 31,362 vascular species in China, so the coverage is 91.5% of all vascular species. Over 78.2% of the vascular species have more than 3 specimens. Moreover, there are 11,220 type specimens of 6,772 names, comprising holotypes, lectotypes, neotypes and isotypes, deposited in PE.
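The coverage figure follows directly from the counts quoted in this paragraph, as the short calculation below shows. All numbers are taken from the text; only the variable names are introduced here for illustration.

```python
# Sheets and species of native vascular plants among the digitized specimens in PE.
sheets = {"ferns and lycophytes": 128_788, "gymnosperms": 24_026, "angiosperms": 1_349_495}
species = {"ferns and lycophytes": 2_410, "gymnosperms": 267, "angiosperms": 26_029}

TOTAL_VASCULAR_SPECIES = 31_362  # vascular species in China, as stated in the text

total_sheets = sum(sheets.values())
covered_species = sum(species.values())
coverage_pct = 100 * covered_species / TOTAL_VASCULAR_SPECIES

# Specimens not yet digitized, in millions (2.6 million held, 1.76 million digitized).
undigitized_million = 2.6 - 1.76

print(f"{total_sheets:,} sheets, {covered_species:,} species, "
      f"coverage {coverage_pct:.1f}%, undigitized {undigitized_million:.2f} million")
```

The three groups add up to about 1.50 million sheets and 28,706 species, which gives the 91.5% coverage reported above.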

According to Table 1, the DNA of specimens in PE is not as well preserved as in other herbaria such as the National Museum Wales (NMW) (de Vere et al. 2012). Even though only one third of the specimens were usable, at least 78.2% of the Chinese vascular species could be covered. With the addition of the remaining 0.84 million specimens, it is very likely that well-covered DNA barcoding reference libraries of the Chinese vascular species can be built using the specimen materials housed in PE.

Conclusion
Although the DNA from herbarium specimens is usually highly degraded, it is still possible to amplify half the length of the conventional plant barcodes after DNA reconstruction. By integrating DNA reconstruction, DNA-labeled NGS and other new approaches, as exemplified in this study, it is very likely that well-covered reference libraries for DNA barcoding of local plants can be built in a short period of time. The use of herbarium specimens will certainly accelerate the construction of global reference libraries that support species identification worldwide by DNA barcoding.

Acknowledgement
The authors thank the many other members of Shiliang Zhou’s lab for suggestions and comments. Mr. Baijian Bao provided information on the digitized specimens in PE. This study was supported by grants from the Ministry of Science and Technology of China (2011FY120200, 2012AA021602 and 2012BAC01B05) and the National Natural Science Foundation of China (31270239) to Shiliang Zhou.

References
Ames M, Spooner DM (2008) DNA from herbarium specimens settles a controversy about origins of the European potato. American Journal of Botany 95, 252-257.
Armstrong KF, Ball SL (2005) DNA barcodes for biosecurity: invasive species identification. Philosophical Transactions of the Royal Society B-Biological Sciences 360, 1813-1823.
Bebber DP, Carine MA, Wood JR, et al. (2010) Herbaria are a major frontier for species discovery. Proceedings of the National Academy of Sciences of the United States of America 107, 22169-22171.
Besnard G, Christin PA, Male PJ, et al. (2013) Phylogenomics and taxonomy of Lecomtelleae (Poaceae), an isolated panicoid lineage from Madagascar. Annals of Botany 112, 1057-1066.
Besnard G, Christin PA, Male PJ, et al. (2014) From museums to genomics: old herbarium specimens shed light on a C3 to C4 transition. Journal of Experimental Botany 65, 6711-6721.
Borisenko AV, Sones JE, Hebert PD (2009) The front-end logistics of DNA barcoding: challenges and prospects. Molecular Ecology Resources 9 Suppl s1, 27-34.
Boyer S, Brown SD, Collins RA, et al. (2012) Sliding window analyses for optimal selection of mini-barcodes, and application to 454-pyrosequencing for specimen identification from degraded DNA. PLOS ONE 7, e38215.
CBOL Plant Working Group (2009) A DNA barcode for land plants. Proceedings of the National Academy of Sciences of the United States of America 106, 12794-12797.
Cho SY, Suh KI, Bae YJ (2013) DNA barcode library and its efficacy for identifying food-associated insect pests in Korea. Entomological Research 43, 253-261.
de Vere N, Rich TCG, Ford CR, et al. (2012) DNA barcoding the native flowering plants and conifers of Wales. PLOS ONE 7, e37945.
Dong W, Cheng T, Li C, et al. (2014a) Discriminating plants using the DNA barcode rbcLb: an appraisal based on a large dataset. Molecular Ecology Resources 14, 336-343.
Dong W, Liu H, Xu C, et al. (2014b) A chloroplast genomic strategy for designing taxon specific DNA mini-barcodes: A case study on ginsengs. BMC Genetics 15, 138.
Drabkova L, Kirschner J, Vlcek C (2002) Comparison of seven DNA extraction and amplification protocols in historical herbarium specimens of Juncaceae. Plant Molecular Biology Reporter 20, 161-175.
Ekrem T, Willassen E, Stur E (2007) A comprehensive DNA sequence library is essential for identification with DNA barcodes. Molecular Phylogenetics and Evolution 43, 530-542.
Elliott TL, Jonathan Davies T (2014) Challenges to barcoding an entire flora. Molecular Ecology Resources 14, 883-891.
Giguet-Covex C, Pansu J, Arnaud F, et al. (2014) Long livestock farming history and human landscape shaping revealed by lake sediment DNA. Nature Communications 5, 3211.
Golenberg EM, Bickel A, Weihs P (1996) Effect of highly fragmented DNA on PCR. Nucleic Acids Research 24, 5026-5033.
Hajibabaei M, Smith MA, Janzen DH, et al. (2006) A minimalist barcode can identify a specimen whose DNA is degraded. Molecular Ecology Notes 6, 959-964.
Hansen AJ, Mitchell DL, Wiuf C, et al. (2006) Crosslinks rather than strand breaks determine access to ancient DNA sequences from frozen sediments. Genetics 173, 1175-1179.
Hansen AJ, Willerslev E, Wiuf C, Mourier T, Arctander P (2001) Statistical evidence for miscoding lesions in ancient DNA templates. Molecular Biology and Evolution 18, 262-265.
Hebert PDN, Cywinska A, Ball SL, DeWaard JR (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society of London Series B-Biological Sciences 270, 313-321.
Hebert PDN, Dewaard JR, Zakharov EV, et al. (2013) A DNA 'barcode blitz': rapid digitization and sequencing of a natural history collection. PLOS ONE 8, e68535.
Heyn P, Stenzel U, Briggs AW, et al. (2010) Road blocks on paleogenomes-polymerase extension profiling reveals the frequency of blocking lesions in ancient DNA. Nucleic Acids Research 38, e161.
Kalendar R, Lee D, Schulman AH (2009) FastPCR software for PCR primer and probe design and repeat search. Genes, Genomes and Genomics 3, 1-14.
Kontanis EJ, Reed FA (2006) Evaluation of real-time PCR amplification efficiencies to detect PCR inhibitors. Journal of Forensic Sciences 51, 795-804.
Kress WJ, Erickson DL, Jones FA, et al. (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proceedings of the National Academy of Sciences of the United States of America 106, 18621-18626.
Krishnamurthy PK, Francis RA (2012) A critical review on the utility of DNA barcoding in biodiversity conservation. Biodiversity and Conservation 21, 1901-1919.
Larkin MA, Blackshields G, Brown NP, et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947-2948.
Li C, Jiang S, Bartholomew B (2003) Flora of China. Vol. 9 (Pittosporaceae through Connaraceae). Science Press; Missouri Botanical Garden Press, Beijing; St. Louis.
Li J, Wang S, Jing Y, Wang L, Zhou S (2013) A modified CTAB protocol for plant DNA extraction. Chinese Bulletin of Botany 48, 72-78.
Li X, Yang Y, Henry RJ, et al. (2015) Plant DNA barcoding: from gene to genome. Biological Reviews 90, 157-166.
Mora C, Tittensor DP, Adl S, Simpson AG, Worm B (2011) How many species are there on earth and in the ocean? PLoS Biology 9, e1001127.
Muellner AN, Schaefer H, Lahaye R (2011) Evaluation of candidate DNA barcoding loci for economically important timber species of the mahogany family (Meliaceae). Molecular Ecology Resources 11, 450-460.
Saarela JM, Sokoloff PC, Gillespie LJ, Consaul LL, Bull RD (2013) DNA barcoding the Canadian Arctic flora: core plastid barcodes (rbcL + matK) for 490 vascular plant species. PLOS ONE 8, e77982.
Samarakoon T, Wang SY, Alford MH (2013) Enhancing PCR amplification of DNA from recalcitrant plant specimens using a trehalose-based additive. Applications in Plant Sciences 1, 1200236.
Savolainen V, Cuénoud P, Spichiger R, et al. (1995) The use of herbarium specimens in DNA phylogenetics - evaluation and improvement. Plant Systematics and Evolution 197, 87-98.
Shokralla S, Zhou X, Janzen DH, et al. (2011) Pyrosequencing for mini-barcoding of fresh and old museum specimens. PLOS ONE 6, e21252.
Staats M, Cuenca A, Richardson JE, et al. (2011) DNA damage in plant herbarium tissue. PLOS ONE 6, e28448.
Staats M, Erkens RH, van de Vossenberg B, et al. (2013) Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens. PLOS ONE 8, e69189.
Stuckas H, Fritz U (2011) Identity of Pelodiscus sinensis revealed by DNA sequences of an approximately 180-year-old type specimen and a taxonomic reappraisal of Pelodiscus species (Testudines: Trionychidae). Journal of Zoological Systematics and Evolutionary Research 49, 335-339.
Swofford D (2002) PAUP*. Phylogenetic analysis using parsimony (and other methods). Version 4.0 Beta, Sunderland, Massachusetts, Sinauer.
Särkinen T, Staats M, Richardson JE, Cowan RS, Bakker FT (2012) How to open the treasure chest? Optimising DNA extraction from herbarium specimens. PLOS ONE 7, e43808.
Taberlet P, Coissac E, Hajibabaei M, Rieseberg LH (2012) Environmental DNA.
Molecular Ecology 21, 1789-1793. Taberlet P, Coissac E, Pompanon F, et al. (2007) Power and limitations of the chloroplast trnL (UAA)

This article is protected by copyright. All rights reserved.

Accepted Article

intron for plant DNA barcoding. Nucleic Acids Research 35, e14. Taylor HR, Harris WE (2012) An emergent science on the brink of irrelevance: a review of the past 8 years of DNA barcoding. Molecular Ecology Resources 12, 377-388. Techen N, Parveen I, Pan ZQ, Khan IA (2014) DNA barcoding of medicinal plant material for identification. Current Opinion in Biotechnology 25, 103-110. Valentini A, Miquel C, Nawaz MA, et al. (2009) New perspectives in diet analysis based on DNA barcoding and parallel pyrosequencing: the trnL approach. Molecular Ecology Resources 9, 51-60. von Crautlein M, Korpelainen H, Pietilainen M, Rikkinen J (2011) DNA barcoding: a tool for improved taxon identification and detection of species diversity. Biodiversity and Conservation 20, 373-389. Wallander E, Albert VA (2000) Phylogeny and classification of Oleaceae based on rps16 and trnL-F sequence data. American Journal of Botany 87, 1827-1841. Yang H, Golenberg E, Shoshani J (1997) Proboscidean DNA from museum and fossil specimens: An assessment of ancient DNA extraction and amplification techniques. Biochemical Genetics 35, 165-179. Yu J, Xue JH, Zhou SL (2011) New universal matK primers for DNA barcoding angiosperms. Journal of Systematics and Evolution 49, 176-181.

Data accessibility
Herbarium specimen information is available in the online Supporting Information (Tables S2-S4). The assembled sequences have been deposited in GenBank under accession numbers KP768823 to KP769130 (rbcL) and KP769131 to KP769413 (matK). The sequence alignments are also available from Dryad (doi:10.5061/dryad.s7282).

Author Contributions
SZ, WD, and CX designed the study; CX, SS, YL, PW, HW, PG, and WD performed the laboratory work; CX, WD, TC, SS, and CL analyzed the data; WD, CX, and SZ wrote the manuscript. All authors read and approved the final manuscript.

Figure legends
Fig. 1 Reconstruction of a longer fragment from two (A) and three (B) shorter fragments. (A) Fragment 3 is reconstructed from fragments 1 and 2. (B) The gap between fragments 4 and 5 is bridged by fragment 6, yielding fragment 9.
Fig. 2 Estimation of the length of degraded DNA from 179 Rosaceae specimens using PCR success rates.
Fig. 3 Effects of DNA purity on PCR success for different amplicon sizes. Purity is measured by OD260/280. White boxplots: successful PCR; black boxplots: failed PCR.
Fig. 4 Effects of reconstructing 179 degraded DNA samples on PCR success for different amplicon sizes, with non-reconstructed DNA as controls.
Fig. 5 Association between sequencing success rates and the ages of 643 specimens stored in PE.

Tables
Table 1. PCR amplification and sequencing success rates of two matK fragments and two rbcLb fragments from 420 random samples and 223 samples of Cerasus in PE.

Supporting information
Additional supporting information may be found in the online version of this article.
Fig. S1 A strategy for designing primers to amplify rbcLb fragments of different sizes.
Fig. S2 Gel profiles of degraded DNA from herbarium specimens. The numbers at the top follow the same sequential order of samples as in Table S2.
Fig. S3 Comparisons of DNA purity (OD260/280) between the original and reconstructed DNA.
Fig. S4 Association between sequencing success rates and the ages of 643 specimens stored in PE according to different fragments.

Table S1. The number of rbcL and matK sequences deposited in GenBank and BOLD prior to 20 April 2014.
Table S2. Sample information of 179 rosaceous specimens used to test the effects of DNA reconstruction.
Table S3. Sample information of 420 randomly selected specimens used in this study.
Table S4. Sample information of 223 samples of Cerasus (Rosaceae) used in this study.
Table S5. Rosaceae-specific primers for amplifying different sizes of rbcLb fragments for testing the lengths of degraded DNA from herbarium specimens and the effects of degraded DNA reconstruction.
Table S6. Sample information of 15 specimens used to detect DNA reconstruction errors.
Table S7. Universal primers used to amplify rbcLb and matK of 643 herbarium specimens.

Table 1. PCR amplification and sequencing success rates of two matK fragments and two rbcLb fragments from 420 random samples and 223 samples of Cerasus (Rosaceae) in PE.

Sample                  Marker   PCR length  PCR success  Sequencing   PCR detected and    PCR detected but      PCR undetected but
                                 (bp)        (%)          success (%)  sequence recovered  sequence unrecovered  sequence recovered
                                                                       (%)                 (%)                   (%)
420 random samples      rbcLb-1  426         37.4         28.1         26.2                10.0                  2.1
                        rbcLb-2  460         29.3         31.4         24.1                5.5                   7.4
                        matK-1   386         19.3         11.0         10.0                9.1                   0.7
                        matK-2   429         23.1         27.4         21.2                1.2                   5.5
                        Average  425         27.4         24.5         20.2                6.4                   3.8
223 samples of Cerasus  rbcLb-1  426         35.0         33.6         30.5                4.5                   3.1
                        rbcLb-2  460         33.6         37.7         31.8                1.8                   5.4
                        matK-1   386         34.5         42.2         34.5                0.0                   7.6
                        matK-2   429         31.8         38.1         31.8                0.0                   6.3
                        Average  425         33.6         37.7         32.3                1.4                   5.8
Total 643 samples       rbcLb-1  426         36.6         30.0         27.7                8.1                   2.5
                        rbcLb-2  460         30.8         33.6         26.8                4.2                   6.7
                        matK-1   386         24.6         21.8         18.5                5.9                   3.1
                        matK-2   429         26.1         31.1         24.9                0.8                   5.8
                        Average  425         29.5         29.1         24.5                4.8                   4.5
