CRITICAL REVIEW

Next-Generation Sequencing: A Review of Technologies and Tools for Wound Microbiome Research Brendan P. Hodkinson and Elizabeth A. Grice* Department of Dermatology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania.

Significance: The colonization of wounds by specific microbes or communities of microbes may delay healing and/or lead to infection-related complication. Studies of wound-associated microbial communities (microbiomes) to date have primarily relied upon culture-based methods, which are known to have extreme biases and are not reliable for the characterization of microbiomes. Biofilms are very resistant to culture and are therefore especially difficult to study with techniques that remain standard in clinical settings. Recent Advances: Culture-independent approaches employing next-generation DNA sequencing have provided researchers and clinicians a window into woundassociated microbiomes that could not be achieved before and has begun to transform our view of wound-associated biodiversity. Within the past decade, many platforms have arisen for performing this type of sequencing, with various types of applications for microbiome research being possible on each. Critical Issues: Wound care incorporating knowledge of microbiomes gained from next-generation sequencing could guide clinical management and treatments. The purpose of this review is to outline the current platforms, their applications, and the steps necessary to undertake microbiome studies using next-generation sequencing. Future Directions: As DNA sequencing technology progresses, platforms will continue to produce longer reads and more reads per run at lower costs. A major future challenge is to implement these technologies in clinical settings for more precise and rapid identification of wound bioburden.

SCOPE AND SIGNIFICANCE Humans are known to host diverse, complex communities of microorganisms that include bacteria, archaea, microeukaryotes, and viruses. A breach in the epithelial barrier is a port of entry for microorganisms, and all wounds are contaminated to some degree by these typically commensal microbes along with others from the environment. Contamination can lead to colonization, infection (which can be recurrent), delayed healing, and potentially amputation. Next-generation sequencing provides a window into wound-associated mi-

ADVANCES IN WOUND CARE, VOLUME 00, NUMBER 00 Copyright ª 2014 by Mary Ann Liebert, Inc.

Elizabeth A. Grice, PhD Submitted for publication March 27, 2014. Accepted in revised form June 23, 2014. *Correspondence: Department of Dermatology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104 (e-mail: [email protected]).

crobial communities (microbiomes) with a reasonable cost and timeframe. The utility of these sequencing-based techniques over culture-based techniques in a wound setting has been reviewed elsewhere.1–4 In this review, we outline the current technologies and highlight some of their applications with regard to wound microbiome research.

TRANSLATIONAL RELEVANCE Research into wound microbiomes to date has relied heavily on culturebased methods, which have dominated the field for decades, even

DOI: 10.1089/wound.2014.0542

j

1

HODKINSON AND GRICE

2

though these methods are known to introduce major biases.2 Until very recently, culture-free methods for studying microbial communities relied on imprecise fingerprinting techniques or molecular cloning followed by Sanger sequencing. While Sanger sequencing can provide an accurate picture of community composition, generating datasets large enough to allow community-wide comparisons (e.g., those designed to discern microbiomebased biomarkers) has often been time and cost prohibitive. With the advent of high-throughput next-generation sequencing, characterizing numerous microbial communities has become feasible and cost effective.

10000

PacBio 1000 Sanger 454 IonTorrent

Read 100 Length

SOLiD 10

1 1

CLINICAL RELEVANCE The communities of microbes associated with wounds can potentially cause recurrent infection and/or delayed healing, and may profoundly affect the local and systemic immune response in patients.3,5 Biofilms, which commonly form on orthopedic hardware and may form on chronic wounds, are very resistant to culture and are therefore especially difficult to study with the culturebased techniques that remain standard in clinical settings. The future of wound care may incorporate knowledge of microbiomes gained from nextgeneration sequencing, to more precisely identify colonizing/infecting microbiota, and to guide management and treatment. DISCUSSION What are the different next-generation sequencing platforms? In the following headings, we introduce the five major platform types that have been used for microbiome studies (Table 1 and Fig. 1). This should provide a comprehensive overview of the technologies to orient those attempting to navigate the literature or design new studies. Although there are additional next-generation sequencing platforms, these are not covered in detail here because they are not currently known to be in use for microbiome research.

Illumina

10

100 1000 10000 105

106

107

108

109

1010

Reads / Run

Figure 1. Sequencing space based on read length (in bases) and number of reads per run. Points represent official platform/chemistry combination releases and are color-coded based on the platform family. To see this illustration in color, the reader is referred to the web version of this article at www.liebertpub.com/wound

454 (Roche) GS FLX( + ) 454 Life Sciences (a Roche company) brought the first next-generation sequencing technologies to market, with the overall approach being introduced in 2005.6 The 454 family of platforms has been used ever since for a great variety of applications, and its long reads have made it especially appealing for studies of microbiomes, since longer reads can generally be identified with greater accuracy and precision. The overall approach for 454 is pyrosequencing based. The sequencing preparation begins with lengths of DNA (e.g., amplicons or nebulized genomic/metagenomic DNA) that have specific adapters on either end, created by using PCR primers with adapter sequences or by ligation; these are fixed to tiny beads (ideally, one bead will have one DNA fragment) that are suspended in a water-in-oil emulsion. An emulsion PCR step is then performed to make multiple copies of each DNA fragment, resulting in a set of beads in which each one contains many cloned copies of the same

Table 1. Summary of the five major next-generation sequencing platform families Platform Family 454 Illumina SOLiD Ion Torrent PacBio

Clonal Amplification

Chemistry

Highest Average Read Length

Emulsion PCR Bridge amplification Emulsion PCR Emulsion PCR N/A (single molecule)

Pyrosequencing (seq-by-synthesis) Reversible dye terminator (seq-by-synthesis) Oligonucleotide 8-mer chained ligation (seq-by-ligation) Proton detection (seq-by-synthesis) Phospholinked fluorescent nucleotides (seq-by-synthesis)

700 bp (paired-end sequencing available) 300 bp (overlapping paired-end sequencing available) 75 bp (paired-end sequencing available) 400 bp (bidirectional sequencing available) 8,500 bp

The average read length is given for the platform/chemistry combination in each family that provides the longest reads.

NEXT-GEN SEQUENCING IN WOUND MICROBIOME STUDIES

DNA fragment. A fiber-optic chip filled with a field of microwells, known as a PicoTiterPlate, is then washed with the emulsion, allowing a single bead to drop into each well. The wells are also filled with a set of enzymes for the sequencing process (e.g., DNA polymerase, ATP sulfurylase, and luciferase). At this point, sequencing-by-synthesis can begin, with the addition of bases triggering pyrophosphate release, which produces flashes of light that are recorded to infer the sequence of the DNA fragments in each well as each base type (A, C, G, T) is added. Currently, the most advanced chemistry/platform combination in this family (GS FLX + System with the GS FLX Titanium Sequencing Kit XL + ) can produce *1 million reads per run with reads up to 1,000 bases in length (mode read length: 700 bases). Paired-end sequencing is available, which produces pairs of reads, each of which begins at one end of a given DNA fragment. Samples can be multiplexed as long as the library is prepared in such a way that different molecular barcodes are found between the adapter sequences and the sequences of interest derived from the samples. With this approach, the indexing barcodes will appear at the beginning of each sequence (‘‘in-line’’), allowing each sequence to be assigned to a sample bioinformatically. One shortcoming of the 454 approach is that it frequently misidentifies the length of homopolymers (stretches of nucleotides in which all bases are identical). Additionally, this technology is often considered to be cost ineffective when compared with other next-generation sequencing technologies because, given a limited budget, one can produce many more sequences with Illumina, SOLiD, or Ion Torrent. However, for some applications that require longer read lengths, it remains the most costeffective platform. Although the technology broke new ground when it was introduced, 454 Life Sciences will no longer support the platform after 2016. Illumina (Solexa) GA/HiSeq/MiSeq/NextSeq Illumina produces the most widely used family of platforms. The technology was introduced in 2006 (www.illumina.com/technology/solexa_technology .ilmn) and was quickly embraced by many researchers because a larger amount of data could be generated in a more cost-effective manner. Over the years, read lengths have increased so that many of those who initially would have only used 454 have switched over to Illumina platforms due to the cost effectiveness of the technology.7–9 Although it is a sequencing-by-synthesis method whose release followed quickly on the heels of 454, the Illumina approach differs notably from 454 in two major ways: (1) it uses a flow cell with a field of

3

oligos attached, instead of a chip containing individual microwells with beads, and (2) it does not involve pyrosequencing, but rather reversible dye terminators. The dye-termination approach resembles the ‘‘traditional’’ Sanger sequencing.10 It is different from Sanger, however, in that the dye terminators are reversible, so they are removed after each imaging cycle to make way for the next reversible dye-terminated nucleotide.11 Sequencing preparation begins with lengths of DNA that have specific adapters on either end being washed over a flow cell filled with specific oligonucleotides that hybridize to the ends of the fragments. Each fragment is then replicated to make a cluster of identical fragments. Reversible dye-terminator nucleotides are then washed over the flow cell and given time to attach; the excess nucleotides are washed away, the flow cell is imaged, and the terminators are reversed so that the process can repeat and nucleotides can continue to be added in subsequent cycles. Currently, the longest reads produced on an Illumina platform can be found on the MiSeq, which can produce paired-end reads that are 300 bases in length each. The platform with the greatest output overall is the HiSeq 2500, producing 4 billion fragments in a paired-end fashion with 125 bases for each read in a single run. Illumina has recently released the HiSeq X Ten, which is an array of 10 HiSeq machines sold as a unit, for higher throughput than ever before. Another recent release is the NextSeq 500, which is being marketed as the first high-throughput desktop sequencer. Multiplexing for Illumina sequencing is typically handled differently from the ‘‘in-line’’ barcoding approach pioneered by 454, although this option is available. Illumina sequencing often involves a separate indexing read, which requires a separate indexing primer. An additional indexing read can be run using the adapters found on the lawn of the flow cell, making it possible to employ ‘‘dual indexing’’ for a greater degree of sample multiplexing. Applied Biosystems SOLiD This type of sequencing was introduced in 200712 and has not reached the same level of popularity as the 454 and Illumina platforms for microbiome research. Although it does not provide the read lengths achievable through either of the previous platforms, and is not as high throughput as the Illumina HiSeq, its utility has been demonstrated for microbiome applications.13 The SOLiD process begins with an emulsion PCR step akin to the one used by 454, but the sequencing itself is entirely different from the previously

4

HODKINSON AND GRICE

described systems. Sequencing involves a multiround, staggered, dibase incorporation system. DNA ligase is used for incorporation, making it a ‘‘sequencing-by-ligation’’ approach, as opposed to the ‘‘sequencing-by-synthesis’’ approaches mentioned previously. Mardis14 provides a thorough overview of the complex sequencing and decoding processes involved with using this system. The SOLiD 5500xl W Genetic Analyzer produces up to 3 billion reads per run with reads that are 75 bases long. Paired-end sequencing is available, but with the second read in the pair being only 35 bases long. Multiplexing of samples is possible through a system akin to the one used by Illumina, with a separate indexing run; while standard in-line molecular barcode sequencing would be possible, the short reads make this inadvisable. Although it can generate large numbers of sequences in a run, the persistent short read length has greatly limited its utility. Ion Torrent personal genome machine/proton By 2010, 454 had carved out a niche of providing longer reads, while Illumina and SOLiD had demonstrated the ability to provide massive numbers of sequences all in one shot. At this point, each company began to produce platforms that would cater toward a new type of customer; the researcher who could benefit from next-generation technologies but does not require datasets of the magnitude possible through the standard platforms. Within a short period of time, the 454 GS Junior, the Illumina MiSeq, and the SOLiD FlowChip were all released and geared toward those wanting something more scaled down. Ion Torrent entered the market in 2010 with the personal genome machine (PGM), claiming to be the first company to truly bring nextgeneration sequencing to the masses by making it feasible and affordable for smaller laboratories. The Ion Torrent system begins in a manner similar to 454, with a plate of microwells containing beads to which DNA fragments are attached. It differs from all of the other systems, however, in the manner in which base incorporation is detected. When a base is added to a growing DNA strand, a proton is released, which slightly alters the surrounding pH. Microdetectors sensitive to pH are associated with the wells on the plate, which is itself a semiconductor chip, and they record when these changes occur. As the different bases (A, C, G, T) are washed sequentially through, additions are recorded, allowing the sequence from each well to be inferred. The Ion Proton platform currently produces the highest output, with up to 50 million reads per run

that have read lengths of *200 bases, while the PGM (which has an output that is about an order of magnitude lower as far as read count) has the longest reads at *400 bases. One interesting feature, however, is that fragments longer than those that can be fully sequenced through this system are currently removed through a size-selection step, making it impossible to sequence the ends of longer fragments. Bidirectional sequencing is available, but ‘‘pairing’’ the reads themselves does not seem to be reliable with this technology in its current state.15 Multiplexing is possible through the standard in-line molecular barcode sequencing. Like 454, Ion Torrent is also susceptible to homopolymer-related errors. The Ion Torrent approach can be quite effective for generating microbiome data,16,17 although the strict size selection imposed and the lack of reliable mate-pairing for bidirectional reads hinder this technology from being more widely adopted by microbiome researchers. PacBio RS Pacific Biosciences (PacBio) uses a single-molecule, real-time sequencing approach. Although Helicos BioSciences produced the first single-molecule sequencing platform, PacBio has had much greater commercial success and currently leads the way for single-molecule sequencing. When the PacBio technology was first released, there was a great deal of concern regarding the high error rates in base calls. However, the company has since incorporated circular consensus sequencing (CCS) into their system, which has greatly reduced error rates by allowing fragments to be sequenced repeatedly and thereby checked for errors. The PacBio sequencing system involves no amplification step, setting it apart from the other major next-generation sequencing systems. The sequencing is performed on a chip containing many zero-mode waveguide (ZMW) detectors. DNA polymerases are attached to the ZMW detectors and phospholinked dye-labeled nucleotide incorporation is imaged in real time as DNA strands are synthesized. PacBio’s RS II C2 XL currently offers both the greatest read lengths (averaging around 4,600 bases) and the highest number of reads per run (about 47,000). The typical ‘‘paired-end’’ approach is not used with PacBio, since reads are typically long enough that fragments, through CCS, can be covered multiple times without having to sequence from each end independently. Multiplexing with PacBio does not involve an independent read, but rather follows the standard ‘‘in-line’’ barcoding model.

NEXT-GEN SEQUENCING IN WOUND MICROBIOME STUDIES

What are the major microbiome applications of next-generation sequencing? The various sequence-based, culture-independent microbiome studies typically have many elements in common, and a similar workflow is necessary for each (Fig. 2). However, the particular questions being addressed will guide the experimental design and the methodology for generating, processing, and interpreting data. The main approaches used for examining and characterizing microbiomes are outlined in the following headings. Amplicon-based profiling Methods that employ the sequencing of amplicon populations allow one to construct detailed community profiles of microbiota samples based on the relative abundances of the taxa that they contain. The diverse sequences from a single gene found in each of the organisms can serve as proxies for the taxa that they represent. Downstream analyses

Microbiome Sequencing Workflow Sample Collection [e.g., wound swabs]

DNA/RNA Extraction

Library Preparation [e.g., amplicons, DNA fragments, or cDNA]

Clonal Amplification [not necessary with PacBio]

Sequencing

Sequence Pre-Processing

Sequence Assembly [necessary for most, but not all, applications]

Community Characterization

[e.g., α-diversity, b-diversity, taxonomy, function]

Statistics & Data Visualization Figure 2. Standard sequencing workflow for microbiome research.

5

of sequence libraries can be performed to discern whether there are correlations between certain factors and (1) particular taxa or (2) shifts in overall community structure. Bacterial community profiling (16S amplicon sequencing). The best-studied part of the human microbiome is the bacterial portion. Bacteria make up the majority of the organisms on and in the human body and well-established procedures and workflows are in place for their study. By far the most popular genomic region for studying bacterial diversity is the gene encoding the RNA for the ribosomal small subunit (SSU), typically known as ‘‘16S.’’ This gene is ideal for a number of reasons: (1) it is present in all bacteria, (2) it contains stretches within it that are nearly universal in sequence throughout all bacteria, and (3) it contains hypervariable regions that are widely divergent between different taxa. The pattern of extremely conserved regions interspersed with hypervariable regions makes it possible to target the gene with primers and also uses it to identify taxa with some level of precision.18 Primers that will universally anneal to the bacterial 16S region are used to PCR amplify the diverse fragments of the gene found in the different organisms of a given DNA sample. In this way, a population of 16S amplicons is produced that reflects the composition of the community of organisms in a given sample. Fungal community profiling (ITS, LSU, and SSU amplicon sequencing). For surveys of fungal diversity, there is less of a universal consensus on the gene of interest. The three most commonly used loci are all ribosomal, and are known as the internal transcribed spacer (ITS), large subunit (LSU), and SSU regions. It is noteworthy that ‘‘SSU’’ is the more general term for the gene that is called ‘‘16S’’ in bacteria, although it is typically called ‘‘18S’’ in eukaryotes since it has a larger molecular weight. Of the three most commonly used ribosomal amplicons, ITS is the most effective locus for providing specieslevel identifications. In fact, this locus is now commonly used as the fungal ‘‘species barcode’’ region because it nearly always contains a sufficient level of variation for species differentiation.19 The LSU and SSU loci are more conserved, and are therefore quite effective for phylogenetically based microbiome analyses.20,21 Shotgun sequencing methods While the amplicon sequencing methods described previously work well for broad characterization and comparison of communities, they contain inherent biases that come from the use of

6

HODKINSON AND GRICE

specific primers and multiple cycles of amplification. Shotgun methods allow profiling of the whole community (including viruses, archaea, and microeukaryotes) based on fragments from throughout the genomes/transcriptomes of the diverse organisms contained therein. Perhaps most importantly, this type of approach can give direct information regarding function. In certain sample types, obtaining a large enough quantity of DNA can be difficult, and should be taken into consideration when deciding which sample preparation and sequencing protocols are to be used. Metagenomics (shotgun DNA sequencing). Shotgun metagenomic sequencing makes it possible to examine both the taxonomic composition and the functional genetic potential of a community. Since there are no markers that work across all of life (including viruses), this approach is currently the only way to profile whole microbial communities. Metagenomics generally allows for more accurate determination of the relative abundances of different organisms, since it typically involves little or no DNA amplification, which can introduce biases. Most often metagenomic shotgun sequencing is used to understand the functional potential of communities, which is typically inferred by querying sequence reads against databases, such as KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway database22 and/or COG (Clusters of Orthologous Groups of proteins) database functional categories.23 An example of the information gleaned from metagenomic shotgun sequencing lies in a study of the gut metagenome and its association with obesity. Obese mice were observed to have an increased ratio of Firmicutes to Bacteroidetes, and this compositional shift translated into changes in the metabolic potential of the gut microbiota, where the obese mouse gut metagenome was enriched with genes for energy harvest.24 One weakness of metagenomic shotgun sequencing is that analysis often involves comparison between different parts of different genomes, making accurate classification (functional and/or taxonomic) a crucial step. Classification can be unreliable, though, since the number of whole-genome reference sequences currently available is limited. Traditional microbial whole-genome sequencing relies on the ability to culture the microorganism, which can be difficult as most microorganisms do not thrive under standard culture conditions or in isolation. However, developing technologies, such as coculturing single cells in gel microdroplets,25 are promising solutions to this obstacle. Advances in single-cell genomics and culturing technologies

in addition to high-throughput sequencing advances should help grow reference genome databases. Finally, for samples where metagenomic shotgun sequencing may not be feasible, prediction of metagenome functional content is possible with tools such as PICRUSt, which relies on marker gene data and reference genomes to infer composite metagenomes.26 Metatranscriptomics (RNA transcript sequencing). With metatranscriptomic sequencing, the full range of actively transcribed genes can be examined in any context. This approach makes it possible to take a snapshot of the activity happening at the molecular level in the organisms found in a sample. The transcripts can be genes from both the host and the members of the microbiome, so one can begin to examine host–microbe interactions and determine not only how the microbiome itself behaves, but also how the host may react to the members of the microbiome. Researchers have even been able to show collaboration between the host and associated microbes for performing specific functions that are critical for host survival.27 The problems of classification mentioned earlier regarding metagenomics hold here as well when examining transcription in the diverse assemblage of organisms in the microbiome. What bioinformatics tools and skills are needed for analysis? A number of open-source software packages integrate the analysis steps for next-generation microbiome sequence data. The two main programs used are QIIME28 and mothur,29 both of which provide automated scripts/commands for performing complex steps while remaining customizable to many different types of datasets and experimental designs. The CloVR30 and MG-RAST31 programs provide an even greater deal of automation (the former actually including QIIME and mothur commands), but offer less customizability. Many of these packages provide tutorials and documentation on their websites that are useful in orienting the user to the different workflows and processing and analysis steps that are available. Sequence preprocessing Typically, the initial output of next-generation sequencing is formatted in a way that is specific to the platform. Mothur and QIIME take most file types produced by sequencing platforms (e.g., .sff files produced by the 454 platform) and can perform the majority of the preprocessing steps described later from there. If necessary, files produced by the sequencing platform can be converted to FASTQ

NEXT-GEN SEQUENCING IN WOUND MICROBIOME STUDIES

format using either software produced by the platform’s manufacturer or custom scripts. Then, one must consider not only basic file format (e.g., FASTQ, FASTA, etc.), but also the arrangement of the files (e.g., whether barcodes are contained in a separate file or are in-line at the beginning of each sequence) and the way in which data are encoded in the definition lines. Some of the programs that are useful for this, in addition to mothur and QIIME, are fastx-toolkit32 and TagCleaner.33 However, these tools may fall short of providing complete solutions for sequence processing, and custom processing scripts may need to be written (using, e.g., BASH, Python, and/or Perl). Removal of low-quality sequences is imperative in microbiome studies, as variation introduced by error will inflate diversity estimations and suggest the presence of novel organisms. Sequences suspected to contain raw sequencing errors should be discarded, and different parameters have been described to aid in detection of these sequences.34 Sequencing of a mock community, made up of known microorganisms in known quantities, in parallel with experimental samples can provide an estimate of error rate. Additionally, chimeras that are produced during PCR amplification steps should be identified and removed from the dataset using tools designed for this purpose, such as ChimerSlayer or UCHIME.35,36 Assembly For shotgun-style methods, one important step in the preparation of the dataset is assembling the reads into longer stretches of DNA (i.e., contigs and/or scaffolds) based on the consensus of overlapping sequence reads. When assembling multiple genomes from samples with many different organisms (as is typically the case for microbiome studies that employ shotgun DNA sequencing), specialized assembly algorithms are required so that false/chimeric assemblies are minimized. Some programs that are geared toward assembly from metagenomic data are MetaVelvet,37 IDBA-UD,38 MetaPar,39 and MetAMOS.40 Assembly is a challenge for heterogeneous genomes when microorganisms are present in low abundance and thus only incomplete coverage can be achieved. These challenges are compounded by the fact that reference genomes are not available for most microorganisms. For amplicon-based methods, assembly is often necessary when a paired-end approach has been used. To join overlapping pairs of sequences, specialized programs, such as PANDAseq41 and PEAR,42 have been written, but this task can also be performed within QIIME and mothur.

7

Characterization To make biological sense out of the sequence data generated through next-generation technologies, one can begin by determining the withinsample (alpha-) diversity, the between-sample (beta-) diversity, the taxonomic composition, and the functional composition of the communities being studied. QIIME and mothur are ideal for determining diversity metrics and assigning taxonomy to amplicon sequences. MG-RAST allows a bigpicture look at both the taxonomic and functional composition of a dataset, but with limited customizability. Two programs that allow more detailed and customizable functional assessments of shotgun data are MEGAN43 and BLAST2GO.44 Pros and cons of different approaches to calculating operational taxonomic units, assigning taxonomy, inferring phylogeny, and calculating diversity metrics are extensively described in other reviews.45,46 Statistics and visualization After the broad characterization of microbial communities, the next steps are (1) to test for correlations/associations between aspects of the microbiome and various factors and (2) to visualize the results. The programs QIIME, mothur, and MGRAST provide some tools for statistical analysis and visualization. More advanced analyses and visualization can be performed in R47; other software packages that provide the ability to perform basic statistical analysis and data visualization are Matlab, SAS, SPSS, and Stata. While R is typically considered to have a steep learning curve, a strong background in programming is not necessarily required. For those with programming experience, Python can prove quite useful (especially in conjunction with SciPy and NumPy) for this purpose.

SUMMARY In the past decade, next-generation sequencing has enabled researchers to answer questions that were previously intractable. The market potential of this technology has spawned numerous platforms in a relatively short period of time, and new platforms are constantly being developed. As technology progresses, a major goal will be to fill in the sequencing space with platforms that can produce longer reads and more reads per run (i.e., add to the upper right portion of Fig. 1). The area of microbiome research has benefitted greatly from the advent of next-generation sequencing, and is one discipline that has grown by leaps and bounds in recent years as a result. A variety of computational tools and software packages have been

8

HODKINSON AND GRICE

developed to deal with data from nextgeneration sequencing platforms. Studies that utilize culture-independent nextgeneration sequencing approaches are beginning to provide valuable insight into the composition, diversity, and dynamics of wound bioburden, and its relationship to impaired healing and development of infection-related complication. A major challenge in the future will be bringing this technology to the clinic as a potential diagnostic and/or prognostic tool.

ACKNOWLEDGMENTS AND FUNDING SOURCES The authors thank the members of EG laboratory for discussions of next-generation sequencing platforms and the associated bioinformatics. B.P.H. was funded by an NIH/NIAMS T32 Grant Award to the University of Pennsylvania Department of Dermatology (AR007465).

TAKE-HOME MESSAGES  Next-generation sequencing has made it cost and time effective to fully characterize wound-associated microbial communities.  There are currently five major next-generation sequencing platform families used in microbiome studies, all of which have strengths and weaknesses that must be weighed when designing an experiment.  Amplicon-based methods are effective for characterizing and comparing the overall taxonomic/phylogenetic composition of bacterial and fungal communities.  Shotgun sequencing methods are effective for characterizing communities of microorganisms, including viruses, archaea, and nonfungal eukaryotes, and allow one to investigate (a) functional potential of the organisms, by examining genomic DNA (metagenomics), or (b) functions being performed in the cells, by looking at RNA transcripts (metatranscriptomics).  Bioinformatics associated with next-generation sequencing can typically be divided into the following major categories (listed in order): sequence preprocessing, sequence assembly, community characterization, hypothesis testing (within a statistical framework), and data visualization.

AUTHOR DISCLOSURE AND GHOSTWRITING The authors do not declare any conflicts of interest. The authors do not declare any ghostwriter contributions. ABOUT THE AUTHORS Brendan P. Hodkinson, PhD, is a Post-Doctoral Research Fellow in the Department of Dermatology

at the University of Pennsylvania Perelman School of Medicine and is a member of EAG laboratory. He received his PhD in 2011 from Duke University where his studies focused on the use of next-generation sequencing for microbiome research. Elizabeth A. Grice, PhD, is an Assistant Professor of Dermatology at the University of Pennsylvania Perelman School of Medicine. She received her doctorate in Human Genetics in 2006 from Johns Hopkins University, and performed post-doctoral work at the National Institutes of Health. Her research interests focus on genomic and metagenomic analyses of cutaneous host–microbe interactions.

REFERENCES 1. Martin JM, Zenilman JM, Lazarus GS. Molecular microbiology: new dimensions for cutaneous biology and wound healing. J Invest Dermatol 2010; 130:38–48.

betic wound microbiota correlates with prolonged skin defense response. Proc Natl Acad Sci U S A 2010;107:14799–14804.

2. Scales BS, Huffnagle GB. The microbiome in wound repair and tissue fibrosis. J Pathol 2013; 229:323–331.

6. Margulies M, Egholm M, Altman WE, et al. Genome sequencing in microfabricated highdensity picolitre reactors. Nature 2005;437:376– 380.

3. Grice EA, Segre JA. Interaction of the microbiome with the innate immune response in chronic wounds. Adv Exp Med Biol 2012;946: 55–68.

7. Lange V, Bohme I, Hofmann J, et al. Costefficient high-throughput HLA typing by MiSeq amplicon sequencing. BMC Genomics 2014; 15:63.

4. Misic AM, Gardner SE, Grice EA. The wound microbiome: modern approaches to examining the role of microorganisms in impaired chronic wound healing. Adv Wound Care 2014;1:502–510.

8. Fadrosh DW, Ma B, Gajer P, et al. An improved dual-indexing approach for multiplexed 16S rRNA gene sequencing on the Illumina MiSeq platform. Microbiome 2014;2:6.

5. Grice EA, Snitkin ES, Yockey LJ, Bermudez DM, Liechty KW, Segre JA. Longitudinal shift in dia-

9. Caporaso JG, Lauber CL, Walters WA, et al. Ultrahigh-throughput microbial community analysis on

the Illumina HiSeq and MiSeq platforms. ISME J 2012;6:1621–1624. 10. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 1977;74:5463–5467. 11. Bentley DR, Balasubramanian S, Swerdlow HP, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008; 456:53–59. 12. Peckham HE, McLaughlin SF, Ni JN, et al. SOLiDTM Sequencing and 2-Base Encoding. San Diego, CA: American Society of Human Genetics, 2007. 13. Mitra S, Forster-Fromme K, Damms-Machado A, et al. Analysis of the intestinal microbiota using SOLiD 16S rRNA gene sequencing and SOLiD shotgun sequencing. BMC Genomics 2013;14 Suppl 5:S16.

NEXT-GEN SEQUENCING IN WOUND MICROBIOME STUDIES

14. Mardis ER. Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 2008; 9:387–402. 15. Ho A. Next-Generation Sequencing: Acquisition, Analysis, and Assembly. Albuquerque, NM: University of New Mexico, 2013. 16. Milani C, Hevia A, Foroni E, et al. Assessing the fecal microbiota: an optimized ion torrent 16S rRNA gene-based analysis protocol. PLoS One 2013;8:e68739. 17. Yergeau E, Lawrence JR, Sanschagrin S, Waiser MJ, Korber DR, Greer CW. Next-generation sequencing of microbial communities in the Athabasca River and its tributaries in relation to oil sands mining activities. Appl Environ Microbiol 2012;78:7626–7637. 18. Lane DJ, Pace B, Olsen GJ, Stahl DA, Sogin ML, Pace NR. Rapid determination of 16S ribosomal RNA sequences for phylogenetic analyses. Proc Natl Acad Sci U S A 1985;82:6955–6959. 19. Seifert KA. Progress towards DNA barcoding of fungi. Mol Ecol Resour 2009;9 Suppl s1:83–89. 20. Dollive S, Peterfreund GL, Sherrill-Mix S, et al. A tool kit for quantifying eukaryotic rRNA gene sequences from human microbiome samples. Genome Biol 2012;13:R60. 21. Hodkinson BP, Lendemer JC. Next-generation sequencing reveals sterile crustose lichen phylogeny. Mycosphere 2013;4:1028–1039. 22. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res 2004;32:D277–D280. 23. Tatusov RL, Fedorova ND, Jackson JD, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003;4:41. 24. Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 2006;444:1027–1031. 25. Fitzsimons MS, Novotny M, Lo CC, et al. Nearly finished genomes produced using gel microdroplet culturing reveal substantial intraspecies genomic diversity within the human microbiome. Genome Res 2013;23:878–888. 26. Langille MG, Zaneveld J, Caporaso JG, et al. Predictive functional profiling of microbial com-

munities using 16S rRNA marker gene sequences. Nat Biotechnol 2013;31:814–821. 27. Tartar A, Wheeler MM, Zhou X, Coy MR, Boucias DG, Scharf ME. Parallel metatranscriptome analyses of host and symbiont gene expression in the gut of the termite Reticulitermes flavipes. Biotechnol Biofuels 2009;2:25. 28. Caporaso JG, Kuczynski J, Stombaugh J, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods 2010;7:335–336. 29. Schloss PD, Westcott SL, Ryabin T, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 2009;75:7537–7541. 30. Angiuoli SV, Matalka M, Gussman A, et al. CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing. BMC Bioinformatics 2011;12:356. 31. Meyer F, Paarmann D, D’Souza M, et al. The metagenomics RAST server—a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 2008;9:386. 32. Pearson WR, Wood T, Zhang Z, Miller W. Comparison of DNA sequences with protein sequences. Genomics 1997;46:24–36. 33. Schmieder R, Lim YW, Rohwer F, Edwards R. TagCleaner: identification and removal of tag sequences from genomic and metagenomic datasets. BMC Bioinformatics 2010;11:341. 34. Schloss PD, Gevers D, Westcott SL. Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS One 2011;6:e27310. 35. Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 2011;27:2194– 2200. 36. Haas BJ, Gevers D, Earl AM, et al. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res 2011;21:494–504. 37. Namiki T, Hachiya T, Tanaka H, Sakakibara Y. MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res 2012;40:e155.

38. Peng Y, Leung HC, Yiu SM, Chin FY. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 2012;28:1420–1428. 39. Kim M, Ligo JG, Emad A, Farnoud F, Milenkovic O, Veeravalli VV. MetaPar: Metagenomic Sequence Assembly via Iterative Reclassification. In: arXiv 2013. http://arxiv.org/abs/1311.3932 40. Treangen TJ, Koren S, Sommer DD, et al. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. Genome Biol 2013;14:R2. 41. Masella AP, Bartram AK, Truszkowski JM, Brown DG, Neufeld JD. PANDAseq: paired-end assembler for illumina sequences. BMC Bioinformatics 2012;13:31. 42. Zhang J, Kobert K, Flouri TX, Stamatakis A. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 2014;30:614–620. 43. Huson DH, Auch AF, Qi J, Schuster SC. MEGAN analysis of metagenomic data. Genome Res 2007; 17:377–386. 44. Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 2005; 21:3674–3676. 45. Kuczynski J, Lauber CL, Walters WA, et al. Experimental and analytical tools for studying the human microbiome. Nat Rev Genet 2012;13:47–58. 46. Hamady M, Knight R. Microbial community profiling for human microbiome projects: tools, techniques, and challenges. Genome Res 2009;19:1141–1152. 47. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2011.

Abbreviations and Acronyms CCS ¼ circular consensus sequencing COG ¼ Clusters of Orthologous Groups of proteins ITS ¼ internal transcribed spacer KEGG ¼ Kyoto Encyclopedia of Genes and Genomes LSU ¼ large subunit PGM ¼ personal genome machine SSU ¼ small subunit ZMW ¼ zero-mode waveguide

Glossary Amplicon: Lengths of DNA that have been amplified using primers via polymerase chain reaction. Barcode: Nucleotides in various combinations added to the ends of primers during a PCR so that the amplicon can be identified following sequencing. Metagenomics: Shotgun sequencing of heterogeneous microbes directly from their environment. The DNA is randomly sheared, sequenced, and assembled to reconstruct consensus sequences from the microbial genomes. Metagenomics provides information on the organisms present and the genomic coding potential. Microbiome: The totality of microbial genomes and/or microorganisms in a given environment.

9

Pyrosequencing: A method of sequencing that relies on the detection of pyrophosphate release when nucleotides are incorporated during a sequencing reaction. Sanger sequencing: Developed by Sanger et al. in 1977, a method of DNA sequencing based on DNA replication that incorporates chain-terminating dideoxyribonucleotides (ddNTPs). The ddNTPs are generally radioactively or fluorescently labeled for detection. Taxon: A taxonomic unit, or a group of organisms inferred to be phylogenetically related that are classified together. For example, phylum, order, genus, or species comprise different hierarchical levels of taxonomic rank to which a taxon can be assigned.

Next-Generation Sequencing: A Review of Technologies and Tools for Wound Microbiome Research.

Significance: The colonization of wounds by specific microbes or communities of microbes may delay healing and/or lead to infection-related complicati...
290KB Sizes 0 Downloads 10 Views