Journal of Human Evolution 79 (2015) 21–34


Ancient human genomics: the methodology behind reconstructing evolutionary pathways

Stephanie Marciniak a, *, Jennifer Klunk a, b, Alison Devault a, c, Jacob Enk a, b, c, Hendrik N. Poinar a, b, d, *

a McMaster Ancient DNA Centre, Department of Anthropology, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4L9, Canada
b Department of Biology, McMaster University, Hamilton, ON, Canada
c MYcroarray, Ann Arbor, MI, USA
d Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON, Canada

Article info

Article history: Received 15 May 2014; accepted 7 November 2014; available online 15 January 2015

Abstract

High-throughput sequencing (HTS) has radically altered approaches to human evolutionary research. Recent contributions highlight that HTS is able to reach depths of the human lineage previously thought to be impossible. In this paper, we outline the methodological advances afforded by recent developments in DNA recovery, data output, scalability, speed, and resolution of the current sequencing technology. We review and critically evaluate the ‘DNA pipeline’ for ancient samples: from DNA extraction, to constructing immortalized sequence libraries, to enrichment strategies (e.g., polymerase chain reaction [PCR] and hybridization capture), and finally, to bioinformatic analyses of sequence data. We argue that continued evaluations and improvements to this process are essential to ensure sequence data validity. Also, we highlight the role of contamination and authentication in ancient DNA-HTS, which is particularly relevant to ancient human genomics, since sequencing the genomes of hominins such as Homo erectus and Homo heidelbergensis may soon be within the realm of possibility. © 2014 Elsevier Ltd. All rights reserved.

Keywords: Ancient DNA; High-throughput sequencing; Human evolution

Introduction

Beginning with the first Neandertal mitochondrial DNA sequences reported (Krings et al., 1997, 1999), the study of ancient human DNA has recently culminated in the sequencing of entire archaic hominin genomes for familiar species such as the Neandertal (Green et al., 2010) and novel ‘species’ like the Denisovans (Reich et al., 2010; Krause et al., 2010a). This has been made possible with the advent of high-throughput sequencing (HTS) technology, revolutionizing the reconstruction of human evolution and demographic history. Previously, it was argued that the retrieval of DNA from specimens older than 100,000 years would be difficult or impossible (Pääbo and Wilson, 1991; Lindahl, 1993; Krings et al., 1997). However, shifts in sequencing technology and bioinformatic approaches to ancient DNA sequence analysis demonstrate the feasibility of delving deep into the human lineage, as realized with a 400,000 year old mitochondrial genome retrieved from an unknown hominin that shared a common ancestor with the Denisovan mitochondrial lineage (Meyer et al., 2014).

The insights gleaned from the Neandertal and Denisovan genomes illuminate how HTS technology can be used to impact our understandings of human origins and complement the paleontological and archaeological records. Here we review advances in ancient DNA methods that have facilitated these contributions and discuss the challenges of retrieving high quality DNA from hominin fossil remains. We also explore how genomic data have been used to address hypotheses of human evolution, as well as the types of research questions best suited to the diverse sequencing strategies. Finally, we consider where further technological advancements in DNA retrieval and sequencing might take human evolutionary research in the future.

* Corresponding authors: S. Marciniak, H.N. Poinar. http://dx.doi.org/10.1016/j.jhevol.2014.11.003 0047-2484/© 2014 Elsevier Ltd. All rights reserved.

Ancient DNA: degradation and contamination

DNA in ancient samples is typically a combination of endogenous and contaminant sequences degraded into short fragments, often averaging 40–60 base pairs (bp), and of lower quantity than the DNA typical of modern biological samples (Pääbo, 1989;


O'Rourke et al., 2000; Hofreiter et al., 2001; Millar et al., 2008; Stoneking and Krause, 2011; Fulton, 2012). It is critical to understand the impact of post-mortem DNA damage and contamination (e.g., pre- or post-laboratory) on retrieving and authenticating ancient DNA, as such factors may limit genomic analysis. A number of reviews have already examined the impact of ancient DNA degradation on obtaining authentic sequence data (e.g., O'Rourke et al., 2000; Hofreiter et al., 2001; Pääbo et al., 2004).

DNA preservation in ancient samples can vary dramatically, as post-mortem degradation is an idiosyncratic process in which often very little or no endogenous DNA survives and, if it does, preservation is highly variable (Hofreiter et al., 2001; Pääbo et al., 2004; Willerslev and Cooper, 2005; Rohland and Hofreiter, 2007a; Stoneking and Krause, 2011). For example, of 21 bones screened for Neandertal-specific mitochondrial DNA by polymerase chain reaction (PCR) in Green et al. (2010), only three were selected for additional analysis, while seven of 15 Neandertal bones contained amplifiable mtDNA in Krings et al. (2000). This variability of DNA survival is due not only to rapidly occurring molecular damage (of unpredictable rates in complex systems) but also to the physical environment, where the geochemistry of a site (e.g., soil pH, organic and inorganic organisms) facilitates or inhibits the subsequent retrieval of highly fragmented endogenous DNA. Broadly, the expectation is for the surviving fraction of endogenous DNA to constitute less than 1% and not exceed 5% of a sample, with 95%–99% representing contamination (e.g., environmental, postmortem processes; Burbano et al., 2010; Green et al., 2010; Reich et al., 2010; Stoneking and Krause, 2011; Meyer et al., 2012; Carpenter et al., 2013; Fig. 1).
Notable exceptions include permafrost remains, where well-preserved genomic data of a greater time-depth is retrievable from diverse specimens, including plants (e.g., Willerslev et al., 2003), mammals (e.g., mammoth [Hoss et al., 1994]; bison [Shapiro et al., 2004]; horse [Orlando et al., 2013]), microbes (e.g., Bellemain et al., 2013), or viruses (e.g., Legendre et al., 2014). A theoretical limit of DNA preservation has been estimated between 100,000 and 1,000,000 years (Pääbo and Wilson, 1991; Lindahl, 1993; Willerslev and Cooper, 2005; Fulton, 2012), but the retrieval of ancient DNA is not temporally-bound, as a specimen's age is not linearly correlated to the amount of surviving DNA, which is characteristic of the inherent variability of DNA degradation and percentages of surviving DNA across specimens and spatio-temporal contexts (Tuross, 1994; O'Rourke et al., 2000; Pääbo et al., 2004; Millar et al., 2008; Allentoft et al., 2012) (Table 1).

Figure 1. Non-target DNA (approximately 95%) comprises the majority of surviving DNA in ancient samples, whereas the desired or targeted endogenous DNA is only a fraction (approximately 0–5%) of the overall constituents.

For example, depositional environments, whether the same or different sites, affect the fraction of preserved endogenous DNA in an unpredictable manner, such as a Denisovan phalanx containing 70% endogenous DNA recovered from a cave (southern Siberia) by Reich et al. (2010), in comparison to the 0.01%–0.03% of endogenous DNA recovered from an early modern human at the Tianyuan Cave site (northern China) by Fu et al. (2013a). Additionally, intrasite variability or differential tissue preservation may also exist, such as 0.17% endogenous DNA characterizing the Denisovan tooth compared with 70% endogenous DNA of the phalanx (Reich et al., 2010). The advantage of HTS technology in light of unpredictable preservation is maximizing detection of those short surviving endogenous DNA fragments from the overwhelming pool of nonspecific sequences.

The characteristics of DNA in ancient samples, primarily the low quality and quantity of endogenous DNA with variable preservation, are the result of dynamic post-mortem molecular degradation processes due to strand breaks, baseless sites, miscoding lesions, and cross-links, to name a few (Pääbo, 1989; Lindahl, 1993; Hoss et al., 1996; Hofreiter et al., 2001; Pääbo et al., 2004; Willerslev and Cooper, 2005; Fulton, 2012). The cellular and biomolecular processes that support DNA integrity during life cease functioning after death, causing cellular degradation by endogenous nucleases and proteases with associated infiltrations of exogenous bacteria, fungi, or other organisms that further digest and non-specifically fragment the DNA (Hofreiter et al., 2001; Pääbo et al., 2004; Molak and Ho, 2011).
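To make the consequence of such low and variable endogenous content concrete, the following minimal sketch (not taken from any of the studies cited) estimates how the endogenous fraction scales the shotgun sequencing effort needed for a given depth of coverage. The function name, the 3.1 Gb genome size, and the 50 bp mean fragment length are illustrative assumptions, and the calculation ignores duplicates and mapping losses, so it should be read as a lower bound.

```python
import math


def reads_needed(target_coverage, endogenous_fraction,
                 mean_fragment_bp=50, genome_size_bp=3_100_000_000):
    """Estimate the total shotgun reads required to reach a target coverage.

    Assumes every endogenous read maps uniquely and that there are no
    duplicate reads (hypothetical simplifications), so this is a lower bound.
    """
    if not 0 < endogenous_fraction <= 1:
        raise ValueError("endogenous_fraction must be in (0, 1]")
    # Reads that must come from the target genome itself.
    endogenous_reads = target_coverage * genome_size_bp / mean_fragment_bp
    # Scale up by the share of the library that is non-target DNA.
    return math.ceil(endogenous_reads / endogenous_fraction)


# Endogenous fractions reported in the text: Denisovan phalanx (70%) versus
# the Tianyuan early modern human (0.01%-0.03%; 0.02% is used here).
for label, fraction in [("phalanx, 70%", 0.70), ("Tianyuan, 0.02%", 0.0002)]:
    print(label, f"{reads_needed(1.0, fraction):.2e} reads for 1x coverage")
```

Under these assumptions the two specimens differ by a factor of 3,500 in endogenous fraction, and the estimated sequencing effort differs by the same factor, which is why enrichment rather than deeper shotgun sequencing is usually chosen for samples at the low end of the range.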
Molecular DNA damage is broadly categorized as: 1) shortening lesions that reduce the size of DNA, which prevent extension of the polymerase during PCR, such as strand breaks, condensation, cross-links, and oxidative damage; and 2) miscoding lesions due to hydrolytic damage, particularly deamination, that produce incorrect sequence reads, such as C to T (replacement of cytosine with uracil) and G to A (opposite strand) transitions (Hoss et al., 1996; O'Rourke et al., 2000; Hofreiter et al., 2001; Pääbo et al., 2004; Molak and Ho, 2011; Fulton, 2012; Table 2). Proposed resolutions to counteract the effects of such damage include the use of N-phenylacyl thiazolium bromide (PTB) to break Maillard products, uracil-DNA-glycosylase to remove deaminated cytosines (uracils), and overlapping amplifications of short PCR fragments alongside multiple extractions to overcome contaminating effects of damaged nucleotides (Pääbo et al., 2004; Stoneking and Krause, 2011; Fulton, 2012). However, debate exists on how useful these approaches are to modify post-mortem damage (e.g., Hofreiter et al., 2001; Malmstrom et al., 2005; Rohland and Hofreiter, 2007b).

Aside from molecular damage to ancient DNA, exogenous DNA contamination of samples may also occur due to cross-contamination between samples or experiments, laboratory equipment or reagents, and/or through laboratory personnel, including post-excavation specimen handling (Pääbo et al., 2004; Brown and Brown, 2011; Fulton, 2012; Barta et al., 2013). Cross-contamination in the ancient DNA laboratory from previous PCR preparations (not PCR amplifications, which are restricted to the modern clean rooms) or related DNA work, such as preparing samples for extraction, library preparation, or indexing, is a critical issue that has not been fully addressed.
In this situation, DNA-containing molecules remain airborne and/or adhere to lab surfaces and equipment, which transfer contaminants to subsequent experiments (Brown and Brown, 2011; Fulton, 2012; Barta et al., 2013). This type of contamination is often undetectable, as negative controls employed in the experiment design may be unaffected, and is only revealed when sequence data are obtained that are contrary to the expectations within a genomic study (e.g., not phylogenetically sound; Handt et al., 1996; Brown and Brown, 2011; Fulton, 2012).

Table 1. Pre-sequencing assessments of contamination versus post-sequencing endogenous DNA recovery in selected studies

Homo species

Age (years ago)

Genomic target

Estimates of sample contamination (%) and methods to identify or counteract its effect Pre-sequencing

Endogenous DNA

Reference

Post-sequencing

400,000

mtDNA

Shallow shotgun, molecule count, indexing adaptors

C-T mismatch, FLD

0.1–8.4%

Meyer et al., 2014

Neandertal

38,000

mtDNA, nuclear

15–99%: Amino acid racemisation, Neandertal assay (HVR)

C-T mismatch, damage pattern

Green et al., 2006

38,000

mtDNA

0–0.9%: Neandertal assay (HVRI)

Green et al., 2008

38,000–70,000

mtDNA

Copy number, indexed adaptors

18.2–40.2%

Briggs et al., 2009a

49,000

mtDNA

99.8%: Neandertal mtDNA array

Post-enrichment: 37%

Burbano et al., 2010

38,000

Nuclear

5.3 Gb of Neandertal DNA

Green et al., 2010

50,000

mtDNA

Neandertal mtDNA assay, indexed adaptors UDG/EndoVIII, indexed adaptors

70%

Prüfer et al., 2014

50,000–30,000

Indexed adaptors

50,000–30,000

Nuclear

Indexed adaptors

0.17% mtDNA (tooth) 70% nuclear (phalanx) Pre-enrichment: 0.047% Post-enrichment: 2.6% 70%

Reich et al., 2010

50,000–30,000

mtDNA, nuclear mtDNA

0.5%: Neandertal versus human substitutions, C-T mismatch 0.2–1.4%: Neandertal substitutions, FLD, C-T mismatch X chrom., allele counts, mtDNA substitutions 0.5–0.7%: C-T mismatch; mtDNA substitutions, Y chrom., nuclear Less than 1%: mtDNA, nuclear, Y chrom., FLD, C-T mismatch, genome coverage 0–1%: mtDNA, Y chrom., nuclear

75% in El Sidron (Spain), 5% and 99% in Vindija Cave (Croatia) specimens 8,341 mtDNA fragments

Meyer et al., 2012

30,000

mtDNA

5–100%: Neandertal assays

30%

Krause et al., 2010b

40,000

mtDNA, nuclear Genome

Shotgun, UDG/EndoVIII, indexed adaptors Indexed adaptors, Phusion and Taq polymerases Shotgun, indexed adaptors, end repair

0.01–0.03%

Fu et al., 2013a

Post-sequencing: 79% genome coverage Pre-capture: 0.04–6.2% Post-capture: 1.6–59.2%

Rasmussen et al., 2010 Carpenter et al., 2013

Denisovan

Early modern human

Palaeo-Eskimo

4,000

Homo sapiens sapiens

1500 BC to 1500 AD

mtDNA

UDG/EndoVIII, indexed adaptors

Shotgun 2nd extract, mtDNA substitutions, C-T mismatch, damage pattern, FLD 0.5%: mtDNA, nuclear, Y chrom., genome coverage, base error rates 0–1%: FLD, C-T mismatch, damage pattern, mtDNA substitutions 0.1–3.1%: C-T mismatch, damage pattern, genome coverage, mtDNA substitutions SNPs, mtDNA, Y chrom., European alleles Damage pattern, FLD, mtDNA substitutions

Krause et al., 2010a


Unknown hominin


Table 2. Molecular damage and potential resolutions

Molecular damage | Cause | Effect | Resolution
Shortening lesions
Strand breaks | Biological (e.g., microorganisms), chemical (e.g., post-mortem cellular processes) | Depurination reduces size and amount of DNA | Short overlapping PCRs
Cross-links | DNA–DNA and DNA-protein reactions (e.g., Maillard products) | Block PCR replication | PTB (N-phenylacyl thiazolium bromide) breaks cross-links inconsistently (see Rohland and Hofreiter, 2007b)
Oxidation | Hydantoin derivatives from pyrimidines (cytosine, thymine) | Block strand elongation | Short overlapping PCRs, polymerases (e.g., Beta polymerase), cloning
Miscoding lesions | Hydrolytic damage deaminates amino groups: a) cytosine-uracil; b) guanine-xanthine; c) 5-methylcytosine-thymine; d) adenine-hypoxanthine | Misincorporation of bases, still amplifiable | Multiple extractions, independent PCRs, special polymerases; e.g., uracil-DNA-glycosylase, base excision repair (BER)

Post-excavation specimen handling is similarly problematic due to the assumption that authentic ancient sequences exhibit characteristic signatures of damage, such as nucleotide misincorporations, whereas modern sequences are undamaged (Sampietro et al., 2006; Green et al., 2009; Krause et al., 2010b; Stoneking and Krause, 2011). However, exogenous DNA from pre-laboratory specimen handling can co-extract with endogenous DNA and display damage (miscoding lesions) characteristic of ancient DNA samples, which creates obstacles for authentication (Malmstrom et al., 2005; Sampietro et al., 2006; Stoneking and Krause, 2011). For example, Malmstrom et al. (2005) show that post-excavation handling of 29 ancient dog specimens resulted in an overabundance of contaminating modern human DNA compared with endogenous DNA. A lack of data on the rate of nucleotide misincorporations over time and in various geological settings complicates authentication of ancient DNA signatures (Sampietro et al., 2006; Adler et al., 2011; Stoneking and Krause, 2011).

A paradigm shift from PCR to HTS technology

Prior to HTS technology, PCR was the most effective strategy to recover degraded, low quantity DNA for sequencing (Knapp and Hofreiter, 2010; Rizzi et al., 2012). This method targets and amplifies specific overlapping sequences framed within larger ‘preserved’ fragments to generate a single, larger contiguous sequence (Millar et al., 2008; Knapp and Hofreiter, 2010; Rizzi et al., 2012). The resultant PCR product is directly sequenced or cloned into a vector and multiple clones sequenced. The primary limitation of PCR-based approaches is the selective amplification of the fewer longer fragments, while the abundant short fragments are untargeted. This restriction directly impacts the sequence products retrieved in ancient hominin specimens, where, for example, Neandertal-specific variants of the mitochondrial hypervariable regions were limited to 120–360 bp segments (Krings et al., 1997; Ovchinnikov et al., 2000; Lalueza-Fox et al., 2005; Orlando et al., 2006). Although a multiplex PCR strategy (introduced by Krause et al., 2006) reconstructs longer DNA sequences, such as a complete Neandertal mtDNA genome assembled from 8,341 mtDNA fragments (Green et al., 2008), PCR-based approaches ultimately reduce the complexity of sequence data due to the bias of fragment sizes targeted (Knapp and Hofreiter, 2010; Molak and Ho, 2011).

Despite the lack of comprehensive genomic data, PCR-based approaches demonstrated exciting preliminary advances in understanding human origins and our relations to other hominin species. For example, Neandertal-specific variants in the mitochondrial DNA (hypervariable regions I and II) helped stir the debate surrounding human origins, providing seemingly specific evidence for the Out of Africa replacement model of modern human origins (Krings et al., 1997, 1999; Ovchinnikov et al., 2000; Caramelli et al., 2003; Serre et al., 2004) as well as shedding light on the genetic diversity of Neandertals (Krings et al., 2000; Schmitz et al., 2002; Lalueza-Fox et al., 2005; Orlando et al., 2006). Genetic data also correlated with uniquely human traits (phenotypes) in the context of recent human evolution, such as the similarity between Neandertals and modern humans in the FOXP2 gene associated with language acquisition (Krause et al., 2007), the varying pigmentation of hair and/or skin of Neandertals determined from the melanocortin 1 receptor (Lalueza-Fox et al., 2007) and the O allele antiquity from the ABO blood group predating the divergence of Neandertals and modern humans (Lalueza-Fox et al., 2008). Yet, to reconstruct an entire archaic hominin genome, improvements in sequencing technology and/or retrieving ancient DNA were required, since PCR (classical or multiplex) is an impractical and costly approach. Hypothetically, without such progress, Green et al. (2006) posited that generating one-fold coverage of a Neandertal genome would require 20 g of bone with 6,000 runs on the pyrosequencing GS20 454 platform, a very costly exercise.

High-throughput sequencing and the ancient hominin genome

Previous reviews chronicle the impact, ongoing challenges, and future of HTS technology on the retrieval of authentic ancient DNA (e.g., Pääbo et al., 2004; Millar et al., 2008; Kircher and Kelso, 2010; Knapp and Hofreiter, 2010). One of the major advances introduced by HTS technology is the ability to sequence millions of DNA molecules in parallel, thereby increasing the amount of sequence data generated and reducing the cost of sequencing. Alongside modest improvements in retrieving ancient DNA (e.g., a modified silica protocol applied by Meyer et al.
(2014) recovered a 400,000 year old hominin mitochondrial genome), HTS overcomes the limitations of throughput, scalability, speed, and resolution (Table 3). The success of HTS, coupled with the capture of selected genomic regions prior to sequencing (termed ‘targeted enrichment’), is illustrated by Green et al.'s (2010) reconstruction of a Neandertal genome consisting of more than four billion nucleotides, which importantly established that 1%–4% of the Neandertal nuclear DNA is shared among modern day Europeans and Asians, further adding complexity to the Out of Africa model of human origins. High-throughput sequencing is significant to ancient DNA research as it sequences the entire length of DNA fragments placed into libraries (e.g., 30 bp to 150 bp) rather than select regions, which enhances detection of the short endogenous molecules from a background of non-target sequences (Millar et al., 2008; Knapp


Table 3. Application of ancient DNA detection strategies within select human evolutionary studies

Strategy

Homo species

PCR, cloning sequencing

Neandertal

40,000 43,000 100,000 38,000 (Vi-80 specimen, Croatia) 38,000 50,000 50,000–30,000 50,000–30,000

Palaeo-Eskimo

4,000

Diploid genome

Neandertal

49,000

Nuclear

Unknown hominin

400,000

mtDNA genome

Early modern human

40,000

mtDNA, nuclear (chrom. 21)

Homo sapiens sapiens Neandertal Denisovan Early modern human Neandertal

1500 BC to 1500 AD

Human genome

38,000–70,000 50,000–30,000 30,000 38,000

Neandertal

Shotgun

Neandertal Denisovan

Microarray and biotinylated baits (in-solution)

Whole genome in-solution (WISC) Primer extension capture

Restriction enzymes (bacterial DNA)

Genomic target mtDNA (HVRI) mtDNA (HVRII) mtDNA (HVRI, HVRII) mtDNA (HVRI) mtDNA (HVRI) mtDNA (HVRI) mtDNA (HVR), nuclear mtDNA genome mtDNA Nuclear genome mtDNA, nuclear

Multiplex PCR, 454 sequencing

Enrichment Microarray

Age (years) 30,000–100,000 30,000–100,000 42,000

Read length and/or coverage

Reference

379 bp 340 bp 357 bp HVRI, 288 bp HVRII

Krings et al., 1997 Krings et al., 1999 Krings et al., 2000

357 bp 47 bp 123 bp 2,705 bp unique mtDNA, 0.04% autosomal 16,565 bp, 34.9x coverage mtDNA genome, 52x coverage 1.86 Gb, 31x coverage mtDNA (tooth): 15,094 sequences, 58x coverage Nuclear (phalanx): 5.2 Gb, 1.9x coverage 2.4 billion bp, 20x coverage

Schmitz et al., 2002 Lalueza-Fox et al., 2005 Orlando et al., 2006 Green et al., 2006

14,000 protein-coding sites, 4.8x coverage (190,000x enrichment) 16,302 bp, 31.6x coverage

Burbano et al., 2010

Green et al., 2008 Prüfer et al., 2014 Meyer et al., 2012 Reich et al., 2010

Rasmussen et al., 2010

Meyer et al., 2014

Fu et al., 2013a

mtDNA genomes mtDNA mtDNA

mtDNA genome (16,566 bp), 35.6x coverage nuclear: 19.9 Mbp, 1.75x coverage 50,723 SNPs 0.3–19.9x mtDNA coverage 15.565 bp, 18–56x coverage 16,569 bp genome, 156x coverage Genome (16,566 bp) 31.2x coverage

Briggs et al., 2009a Krause et al., 2010a Krause et al., 2010b

Nuclear genome

5.3 Gb, 1.3x coverage

Green et al., 2010

and Hofreiter, 2010; Prüfer et al., 2010; Krause et al., 2010a) (Fig. 2). Simultaneously, this strategy reduces the risk of co-amplifying the longer, better preserved fragments typical of contaminant DNA (Krause et al., 2010a; Stoneking and Krause, 2011). Secondarily, HTS approaches reduce but do not eliminate sample destruction, as only small amounts of sample (less than 50 mg) are required, compared with earlier approaches requiring often up to a gram or more (Millar et al., 2008; Reich et al., 2010; Stoneking and Krause, 2011). Thirdly, HTS is scalable, which enables a range of research questions, from whole-genome sequencing to targeting specific genomic regions. Within ancient DNA studies, the flexibility of tailoring a sequencing strategy to the quality of specimens is critical

Carpenter et al., 2013

to ensure sequence authenticity. For example, in Fu et al. (2013a), low amounts of endogenous DNA (0.01%–0.03%) in a 40,000 year old anatomically modern human fossil precluded sequencing the entire genome; however, the mitochondrial genome and nuclear DNA (Chromosome 21) were retrieved via targeted enrichment, revealing genetic continuity with present-day mainland Asia. Ultimately, the parallelization of DNA sequencing reduces the cost of sequencing per sample, leading to widespread accessibility of this now affordable technology. Currently, the cost is approximately $0.07 to $10.00 per megabase depending on the platform (e.g., SOLiD, Illumina, Helicos, 454 GS FLX) compared with $500 per megabase with earlier Sanger sequencing technology (Kircher and

Figure 2. Different sequencing strategies are capable of characterizing different fractions of ancient DNA samples in terms of endogenous DNA content and fragment length.


Kelso, 2010; Liu et al., 2012). Assembling ancient genomes is becoming increasingly practical and commonplace, as shown by the mitochondrial genomes of Neandertals (Green et al., 2008; Briggs et al., 2009a; Prüfer et al., 2014), Denisovans (Reich et al., 2010; Krause et al., 2010a), an unknown hominin (Meyer et al., 2014), and early modern humans (Krause et al., 2010b; Fu et al., 2013a), as well as high coverage nuclear genomes of a Neandertal (Green et al., 2010), Denisovan (Reich et al., 2010; Meyer et al., 2012), and Palaeo-Eskimo (Rasmussen et al., 2010). The comprehensive output of HTS data provides an increased time-depth and resolution to analyze hypotheses critical to revealing human evolutionary history, such as the genetic relationship between Neandertals, Denisovans, and modern humans (e.g., Reich et al., 2010; Krause et al., 2010a,b; Meyer et al., 2014; Prüfer et al., 2014), the timing and routes of hominin migrations (e.g., Green et al., 2010; Reich et al., 2010), and the origins of anatomically modern humans (e.g., Green et al., 2010; Krause et al., 2010b; Fu et al., 2013a; Meyer et al., 2014).

Extraction protocols: DNA release versus DNA yield

Ancient DNA extraction techniques seek to maximize the amount of DNA recovered while minimizing the co-extraction of inhibitors (Rohland and Hofreiter, 2007b), further sample degradation, and the inevitable loss of DNA during downstream purifications. Yet, protocols for extracting ancient DNA have undergone only modest changes due to a lack of research on how to best access DNA through enzymatic extractions from a range of ancient material that includes hair, bones, teeth, and soft tissue.
Underlying the challenge of accessing viable DNA is the uncertainty of where and how DNA localizes in such ancient material, which raises questions such as: 1) Within bone, is DNA yielded from cellular components such as osteoblasts, osteocytes, and osteoclasts or from its organic (collagen) and inorganic (hydroxyapatite mineral) components?; 2) Within hair, is the quality and quantity of DNA from keratinous cells in the root or shaft complicated by intra-individual variability?; 3) Within soft tissue, do environmental factors or funerary preparations that enable tissue preservation affect the post-mortem sequestration of DNA?; 4) Within teeth, is DNA differentially sequestered in the cementum, pulp cavity, dentine, or enamel? As bones and teeth persist for hundreds of thousands of years, the following discussion on extraction strategies will focus on these hard tissues, due to their prominence in human evolutionary studies. The reader is directed to the following examples of protocols for extracting DNA from hair (Amory et al., 2007; Gilbert et al., 2007, 2008; Bengtsson et al., 2012) and soft tissue (Khairat et al., 2013; Matheson et al., 2014). The main extraction strategies are phenol-chloroform (Hagelberg et al., 1989; Hagelberg and Clegg, 1991; Loreille et al., 2007) and silica methods (O'Rourke et al., 2000; Rohland and Hofreiter, 2007a; Dabney et al., 2013) based on digesting and solubilizing DNA molecules bound to the bone or dental matrix. The silica method facilitates the adsorption of DNA to silica particles by using chaotropic salts, such as guanidinium isothiocyanate (GuSCN) or guanidine hydrochloride (GuHCl), to denature proteins and minimize the co-extraction of PCR inhibitors (Rohland and Hofreiter, 2007a; Dabney et al., 2013). 
In contrast, the phenol-chloroform method relies on the difference in polarity between nucleic acids and proteins to sequester the nucleic acids in the aqueous phase by using repeated separation of hydrophobic and aqueous phases (Hagelberg and Clegg, 1991; Rohland and Hofreiter, 2007a,b). Although no single procedure is standardized, all methods result in the loss of short fragments (less than 40 bp) through rounds of extraction and purification (e.g., sample concentration and elution with low-yield silica-membrane spin columns). However, Dabney

et al. (2013) recently proposed a modified silica method with changes to the binding buffer composition and using silica spin columns instead of silica-suspension to maximize the recovery of DNA fragments under 40 bp, as only 22% of 35 bp fragments are typically recovered (in comparison to 72% of 150 bp fragments). This method was successfully applied by Meyer et al. (2014) in the recovery of a 400,000 year old ancient hominin mitochondrial genome. Despite such ongoing progress, there remains an uncalculated loss of the very small DNA fragments (less than 30 bp) that cannot be overcome with established extraction and purification methods.

The efficacy of extraction protocols in maximizing the yield of endogenous specimen DNA requires further research. Recently, Barta et al. (2014) suggested that the low amount of surviving endogenous DNA in ancient material may be partly due to the extraction procedures themselves, where 45%–99% of DNA can be lost in the process. The post-mortem decay of DNA is a dual process that not only exposes molecules to degradation through release into cellular spaces but may also sequester molecules within physiological spaces (Lindahl, 1993; Collins et al., 1995; Gilbert et al., 2005; Turner-Walker, 2008; Campos et al., 2012). The sequestration and protection processes that unpredictably preserve DNA over hundreds of thousands of years also complicate its extractability from bone, and this influences the quality of genomic data retrieved (Geigl, 2002; Campos et al., 2012). Extraction strategies rely on presuming uniform DNA distribution in the organic and inorganic bone fractions where demineralization releases half of the surviving DNA molecules, with the rest released through digestion of the organic matrix (Collins et al., 2002; Schwarz et al., 2009; Campos et al., 2012).
However, it is not known to what extent extraction strategies successfully demineralize the mineral content and digest the protein matrix to release preserved DNA in a specimen, while the variable success rates in extracting ancient DNA suggest a correlation to where and how it is preserved in bone, rather than DNA quality itself (Campos et al., 2012; Sosa et al., 2013). Consequently, the reliance is on downstream processes, such as library preparation, to repair and recover sequence information from the short fragments of DNA to maximize genomic coverage.

Library preparation: double- or single-stranded?

The main sequencing platforms used in ancient DNA studies are the Illumina HiSeq, MiSeq, and NextSeq, which have almost replaced other platforms in the last two years. However, new sequencing technology is developing rapidly. The next generation of sequencing platforms will involve sequencing individual molecules using nanopore technology (Schneider and Deeker, 2012), but its efficacy for the short average fragment lengths typical of ancient DNA is unclear. Current high-throughput sequencing platforms (Solexa-Illumina, 454-Roche, SOLiD-Applied Biosystems) generally follow the same principle of end-repair and ligation to universal adaptors (short DNA sequences) for constructing a sequencing library (Briggs and Heyn, 2012), but there are also platform-specific requirements prior to proceeding with sample preparation. With Illumina sequencing technology, adaptors act as priming sites to generate clusters of DNA on the flow cell of the sequencing platform, while also immortalizing the entire ancient DNA library through PCR amplification. Indexes are then added to individual libraries to act like a barcode. Indexing both ends of each library molecule allows multiple libraries to be pooled in a single sequencing run and identified afterwards, and any ‘jumping’ artefacts are easily discerned and removed (Kircher et al., 2012).
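The logic of double indexing can be sketched as follows. This is an illustrative toy rather than the procedure of Kircher et al. (2012): the index sequences, library names, and the `assign_reads` helper are hypothetical, and exact index matching is assumed (no tolerance for sequencing errors in the index reads).

```python
from collections import Counter

# Hypothetical (P7 index, P5 index) pairs assigned to each pooled library.
EXPECTED_PAIRS = {
    ("ACGTACG", "TTGCATA"): "library_A",
    ("GATCGGA", "CCATGAC"): "library_B",
}


def assign_reads(reads):
    """Count reads per library from their (p7_index, p5_index) pairs.

    A pair whose two indexes are each known but belong to different
    libraries is counted as a 'jumping' artefact; any other unexpected
    pair is counted as unknown.
    """
    known_p7 = {p7 for p7, _ in EXPECTED_PAIRS}
    known_p5 = {p5 for _, p5 in EXPECTED_PAIRS}
    counts = Counter()
    for p7, p5 in reads:
        if (p7, p5) in EXPECTED_PAIRS:
            counts[EXPECTED_PAIRS[(p7, p5)]] += 1
        elif p7 in known_p7 and p5 in known_p5:
            counts["jumping_artefact"] += 1
        else:
            counts["unknown"] += 1
    return counts


reads = [
    ("ACGTACG", "TTGCATA"),  # expected pair for library_A
    ("GATCGGA", "CCATGAC"),  # expected pair for library_B
    ("ACGTACG", "CCATGAC"),  # indexes from two different libraries
    ("NNNNNNN", "TTGCATA"),  # unreadable first index
]
print(assign_reads(reads))
```

In this sketch the third read is flagged only because both indexes are checked; with a single index per library it would have been assigned to library_A without any warning.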
The most commonly used library preparation method is a double-stranded protocol (Fig. 3). This method creates blunt ends from double-stranded DNA molecules because the fragmented DNA molecules often have overhangs (single-stranded breaks at different locations) on either strand (Briggs et al., 2007). These single-stranded breaks are frequently adjacent to deaminated cytosine residues on the 5′ end, which suggests that deamination is an outcome of ancient DNA fragmentation (strand nicks, gaps; Briggs et al., 2007). As a result, two types of substitutions produce miscoding lesions in the ancient DNA template, C to T and G to A, which are prevalent near the 5′ and 3′ ends of DNA fragments, respectively (Briggs et al., 2007; Reich et al., 2010; Krause et al., 2010a). During template preparation, the blunt ends are created from DNA with overhangs by using two enzymes simultaneously: T4 DNA polymerase to remove 3′ overhangs and fill in 5′ overhangs, while T4 polynucleotide kinase generates 5′-phosphorylated ends, which is necessary for subsequent ligation as the blunt end treated DNA initially lacks the 5′-phosphate group (Briggs et al., 2007; Reich et al., 2010). The process continues by ligating adaptors to the blunt ends using the enzyme T4 DNA ligase and then filling in the adaptors with Bst polymerase (Meyer and Kircher, 2010). The adaptors act as priming sites for subsequent indexing amplification (Meyer and Kircher, 2010). However, this method can only create libraries from double-stranded molecules. In poorly preserved ancient samples, DNA often contains single-stranded nicks or end modifications, such as abasic sites or modified base residues, that inhibit adaptor ligation in double-stranded library preparation (Gansauge and Meyer, 2013). Thus, double-stranded library preparation misses many molecules in these samples. Using a method to recover single-stranded DNA, as outlined by Gansauge and Meyer (2013), can allow for the recovery of more DNA from poorly preserved samples.
Single-stranded library preparation begins with the dephosphorylation of both ends of the DNA molecules (Fig. 3). This prevents circularization of the single-stranded molecules with 5′ phosphate groups by the ligase (Gansauge and Meyer, 2013). The next step involves heat denaturation of the molecules so that all molecules can be prepared into libraries, regardless of the initial double- or single-stranded state. The DNA templates are ligated to a biotinylated adaptor, as biotin enables the immobilization and capture of nucleic acids on streptavidin-coated magnetic beads due to the high-binding affinity of biotin and streptavidin. Library preparation proceeds with the annealing of a complementary adaptor primer and extension of the DNA insert. Adaptors are then ligated to the non-biotinylated end of the molecule through the protocol used for double-stranded library preparation. Finally, the non-biotinylated strand is denatured from the bead and can be amplified and indexed like a double-stranded library. An advantage of this method is that each of the reaction steps occurs while the DNA is bound to the streptavidin-coated bead. Therefore, this method circumvents the loss of DNA associated with purification steps, such as silica spin columns or carboxylated beads. The efficiency of DNA recovery of this method is anywhere between 30% and 70% (Gansauge and Meyer, 2013). Meyer et al. (2012) successfully use this protocol to recover autosomal regions (1.86 Gbp) of the human nuclear genome from Denisovan samples, resulting in an average of 30-fold coverage, compared with 1.9-fold coverage of 5.2 Gbp recovered using a double-stranded library preparation method (Reich et al., 2010). The same extract used to generate the libraries for both preparation methods shows a six-fold and 22-fold increase in recovered molecules using the single-stranded method instead of the double-stranded method (Meyer et al., 2012).
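Fold-coverage figures such as those above rest on simple arithmetic: average coverage is the number of endogenous bases sequenced divided by the size of the target. The Python sketch below makes that relationship explicit. It is an illustration under stated assumptions only: the mean read length (60 bp) and the two endogenous fractions (70% and 1%) are hypothetical values, duplicates and unmappable reads are ignored, and the function names are not taken from any published pipeline. Only the 1.86 Gbp target size and the 30-fold figure come from the text.

```python
import math


def mean_coverage(n_reads, mean_read_length, endogenous_fraction, target_size):
    """Expected average fold coverage of a target.

    Assumes every endogenous read maps uniquely to the target and that
    reads are spread evenly; both are simplifications.
    """
    return n_reads * mean_read_length * endogenous_fraction / target_size


def reads_needed(target_coverage, mean_read_length, endogenous_fraction, target_size):
    """Number of reads required to reach a given average coverage."""
    return math.ceil(
        target_coverage * target_size / (mean_read_length * endogenous_fraction)
    )


TARGET_SIZE = 1_860_000_000  # 1.86 Gbp of autosomal sequence, as in the text
READ_LENGTH = 60             # hypothetical mean length of mapped fragments

# Hypothetical libraries: one rich and one poor in endogenous DNA.
for fraction in (0.70, 0.01):
    n = reads_needed(30, READ_LENGTH, fraction, TARGET_SIZE)
    print(f"endogenous fraction {fraction:.2f}: {n:,} reads for 30-fold coverage")
```

Because the required number of reads scales with the inverse of the endogenous fraction, a library that yields more endogenous molecules from the same extract lowers the sequencing effort in direct proportion.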
High-throughput sequencing strategies

Once DNA extracts are converted to sequencing libraries, they are sequenced directly or enriched prior to sequencing (Fig. 4). In the following section, we review the types of high-throughput sequencing strategies used to address ancient human genomics research questions.

Shotgun sequencing

Shotgun sequencing refers to the direct sequencing of a DNA library. This method is the simplest strategy and provides information about the ‘complete’ extractable DNA content of a sample (known as the metagenome), as well as the distribution of fragment lengths and potentially the level of modern human DNA contamination from the environment and/or the laboratory. For example, Meyer et al. (2012) determined that less than 0.5% of autosomal human sequences were due to modern contamination and 70.0% of the total DNA content was endogenous Denisovan DNA. Some samples with a high endogenous DNA content may enable genome assembly from shotgun sequence data alone, using a reference sequence or even via de novo assembly (Knapp and Hofreiter, 2010). In this manner, shotgun sequencing recovered a 20-fold average coverage of 79% of the diploid genome from the hair of a 4,000-year-old Palaeo-Eskimo individual, which contained 80% endogenous human DNA (Rasmussen et al., 2010). It is possible to use this method on even older specimens, exemplified by the recovery of a 31-fold average coverage of autosomal regions of the human genome in a sample from a Denisovan individual (Meyer et al., 2012). However, this sequencing strategy is not practical for recovering entire genes or genomes if the endogenous DNA content is not high, as it would require very deep sequencing (e.g., obtaining meaningful whole genome coverage greater than 30x), which is currently cost-prohibitive.

Primer extension capture

The earliest form of in-solution targeted enrichment (hybridization-based capture) applied in human evolutionary studies was primer extension capture (PEC). This technique uses biotinylated primers as hybridization targets for the DNA in libraries.
The primer itself is extended via a single round of PCR to form a double-stranded association between the primer and target. The hybrid primer/target is captured on streptavidin-coated magnetic beads, then the background DNA is washed away with final elution and amplification of the target sequences. This method recovered the mitochondrial genome sequences from five Neandertal individuals (Briggs et al., 2009a) and one Denisovan specimen (Krause et al., 2010a). While cost-effective and easy when it first appeared, PEC is becoming less commonly used in favor of hybridization-based capture methods because the use of primers instead of densely tiled probes limits the number and variety of targets that are captured.

Figure 3. Single- versus double-stranded library preparation process.

Figure 4. High-throughput sequencing strategies, including shotgun (direct sequencing) and targeted enrichment prior to sequencing (array- and in-solution based).

Hybridization-based capture

Since most ancient DNA extracts typically contain a low proportion of endogenous molecules, targeted capture strategies are often employed prior to sequencing (Fig. 4). Hybridization capture allows genomic regions of interest to be sequenced more economically by removing background DNA and thus increasing the sequencing ‘space’ that the target DNA occupies in a library. Unlike targeted PCR (another enrichment strategy), hybridization-based capture is highly scalable to targets of different lengths, whether that is the control region of the mitochondrial genome, the complete mitochondrial genome, select nuclear genes, or nearly complete nuclear genomes. Similar to PEC, hybridization-based capture uses oligonucleotide probes or ‘baits’ that closely match specific sequences of interest. These are allowed to hybridize to targets in a complex sequencing library and then the resulting hybrids are sequestered, with non-target or non-specifically-bound library molecules washed away. The technique dramatically increases the target proportion relative to exogenous or non-target endogenous background DNA (e.g., reviews by Mamanova et al., 2010; Horn, 2012). However, hybridization capture has its drawbacks. First, background DNA exclusion is very rarely complete due to a combination of complex library sequence composition and the kinetics of washing procedures. Second, since most hybridization capture probes are produced in silico using a reference sequence, it is rare that novel regions can be reconstructed using enriched sequencing data, especially with the very short DNA fragments typical of ancient DNA. While this is typically less of a concern for studies of ancient humans, it can be detrimental to research on bacterial and viral pathogens that have gained and lost large genomic regions during even their recent evolution.
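To make the notion of probes ‘produced in silico using a reference sequence’ concrete, the following Python sketch tiles overlapping baits across a reference. It is a simplified illustration, not the design procedure of any study cited here: the bait length (80 bp), the step (20 bp), the toy reference, and the `tile_baits` function are all assumptions chosen for the example. It also shows why such designs are reference-bound: every bait is a substring of the reference, so regions absent from the reference have no bait at all.

```python
def tile_baits(reference, bait_length=80, step=20):
    """Return (start, sequence) baits tiled across a reference string.

    Consecutive baits start `step` bases apart, so each position is
    covered by roughly bait_length / step baits (the tiling density).
    A final bait is anchored at the end of the reference so that the
    last bases are not left uncovered.
    """
    if bait_length <= 0 or step <= 0:
        raise ValueError("bait_length and step must be positive")
    if len(reference) <= bait_length:
        return [(0, reference)]
    last_start = len(reference) - bait_length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, reference[s:s + bait_length]) for s in starts]


# Toy 200 bp reference; a real design would start from, for example,
# a mitochondrial reference genome.
reference = "ACGT" * 50
baits = tile_baits(reference, bait_length=80, step=20)
print(len(baits), "baits; first starts at", baits[0][0], "last at", baits[-1][0])
```

A smaller step raises the tiling density and the number of baits, which is one reason the cost per unique probe matters when targets grow from a mitochondrial genome to large nuclear regions.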

Hybridization capture techniques fall into three categories: solid-support (e.g., microarray-based), in-suspension (e.g., baits bound to insoluble magnetic beads), and in-solution (e.g., with biotin-labeled DNA or RNA oligonucleotides). Solid-support or array-based enrichment is a hybridization capture-based method that provides a cost-effective approach for obtaining genome-size targets in a single capture experiment, since the hundreds of thousands to millions of probes immobilized on a single array offer sufficient sequence diversity to span large genomic regions with high tiling density (Burbano et al., 2010). The technique was successfully used to capture 14,000 protein-coding positions from Neandertal specimens (Burbano et al., 2010) and was also recently used to capture the genomes of ancient Yersinia pestis and Vibrio cholerae (Bos et al., 2011; Devault et al., 2014; Wagner et al., 2014). Within ancient human DNA studies, in-suspension bead-bait constructs and in-solution biotinylated RNA or DNA probes are used to capture genomic regions of interest (e.g., Carpenter et al., 2013; Fu et al., 2013a; Meyer et al., 2014). Commercially-synthesized custom probe sets can have a higher cost per unique probe than the equivalent probe design on a custom array, but they offer high scalability and experimental flexibility. Therefore, in-solution capture is the most cost-effective strategy for large sample sets targeting large numbers of nuclear loci, exomes, or mitochondrial, bacterial, or viral genomes (Gnirke et al., 2009). Additionally, recent advancements in building baits directly from genomic DNA provide an affordable option for researchers interested in whole-genome capture (Maricic et al., 2010; Carpenter et al., 2013; Enk et al., 2014). In theory, these methods can use genomic DNA from any source, even a related organism, for the capture of DNA from an extinct relative; for example, modern elephant genomic DNA was used to capture mammoth DNA (Enk et al., 2014). This approach also removes some of the barriers to reconstructing novel genomic sequences inherent to reference-guided in silico bait design.

Authentication of ancient DNA results in the genomic era

As library preparation methods target all DNA sequences within a sample, both endogenous and contaminating molecules will be present in any given library. Contamination of ancient samples with modern DNA is almost inevitable due to exposure during pre-laboratory processes such as excavation and museum handling as well as laboratory reagents, and therefore the effectiveness of previously suggested laboratory practices to ensure authenticity of hominin ancient DNA results has been questioned (Garcia-Garcera et al., 2011). As such, methods of distinguishing endogenous from contaminant DNA, especially in the case of human remains, are necessary when studying ancient specimens.
Despite extensive knowledge on sources of contamination and measures of prevention, there is no standardized methodological approach to overcome the confounding effects within ancient samples in the HTS (or genomics) era. It remains critical to understand the nature of DNA decay to comprehensively use the predictive and authentication aspects of contamination patterns in ancient specimens (Allentoft et al., 2012). Accordingly, authentication protocols have been proposed that aim to not only prevent endogenous and exogenous sources of contamination but facilitate its detection (e.g., Cooper and Poinar, 2000; Willerslev and Cooper, 2005), including separate laboratory facilities for modern and ancient specimen processing, independent replication of genomic results, patterns of damage in ancient samples (e.g., cytosine deamination), and appropriate molecular behavior (e.g., fragment length distribution). It is not the expectation that all ancient DNA studies comply with such criteria, as the onus is on the researcher to critically analyze the project design and incorporate relevant criteria to evaluate the scientific validity of the sequence data (Gilbert et al., 2005; Winters et al., 2011; Fulton, 2012). Yet, it is debatable to what extent such guidelines are reliable, primarily due to their introduction before HTS and its unique challenges, such as differentiating contaminant DNA from endogenous sequences and the inability to quantify contamination after targeted enrichment of samples. However, indirect measures are applied to detect contamination and support claims of specimen authenticity, such as the unique damage pattern of ancient DNA, fragment length distributions, high genome coverage, and bioinformatic approaches. 
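Of the indirect measures listed above, the damage pattern is the most readily quantified. The Python sketch below tabulates the frequency of C to T mismatches by distance from the 5′ end of each read. It is a toy illustration and does not reproduce any published tool: the input is assumed to be gap-free pairs of reference segment and read, each written 5′ to 3′ in read orientation, and the example alignments and the `c_to_t_by_position` function are hypothetical.

```python
def c_to_t_by_position(alignments, max_pos=5):
    """Frequency of C-to-T mismatches at each of the first `max_pos`
    positions from the 5' end of the reads.

    alignments: iterable of (reference_segment, read) string pairs of
    equal length, without gaps. Returns a list whose i-th entry is the
    fraction of reference cytosines at position i that are read as T,
    or None where no reference cytosine was observed.
    """
    ref_c = [0] * max_pos
    c_to_t = [0] * max_pos
    for ref, read in alignments:
        for i in range(min(max_pos, len(ref), len(read))):
            if ref[i] == "C":
                ref_c[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / ref_c[i] if ref_c[i] else None for i in range(max_pos)]


# Hypothetical alignments: (reference segment, read).
alignments = [
    ("CACGT", "TACGT"),  # C to T at position 0
    ("CCGTA", "TCGTA"),  # C to T at position 0, intact C at position 1
    ("CAGCT", "CAGCT"),  # no mismatch
    ("GCATC", "GTATC"),  # C to T at position 1
]
frequencies = c_to_t_by_position(alignments)
print(frequencies)
```

In authentic ancient libraries the frequency is expected to be highest at the first position and to fall away towards the interior of the fragment; a flat, low profile is what present-day contaminant DNA would be expected to show. The complementary G to A profile at the 3′ end can be tabulated in the same way on reversed sequences.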
Patterns of nucleotide misincorporations

Since ancient DNA is damaged, the patterns and prevalence of nucleotide misincorporations can support specimen authenticity (Pääbo, 1989; Höss et al., 1996; Hofreiter et al., 2001; Stiller et al., 2006; Briggs et al., 2009a). However, prior to HTS, the cloning of amplified PCR products or amplicons rather than direct sequencing provided the most economical means of demonstrating specimen authentication (e.g., the ratio of endogenous to exogenous DNA) (Cooper and Poinar, 2000; Winters et al., 2011), particularly within human evolutionary studies (e.g., Krings et al., 1999; Ovchinnikov et al., 2000; Schmitz et al., 2002; Caramelli et al., 2003; Serre et al., 2004). One very common damage pattern in ancient DNA samples is the deamination of cytosine residues (leading to C to T and G to A transitions), primarily near the 5′ and 3′ ends of fragments due to the frequency of single-stranded overhangs and higher rate of deamination in single-stranded DNA as compared with double-stranded DNA (Lindahl, 1993; Gilbert et al., 2003; Brotherton et al., 2007; Briggs et al., 2009b). While this damage pattern is quantifiable in HTS data, it can exist in any specimen with degraded DNA and therefore requires differentiation from exogenous contaminating DNA. In the PCR era, when amplicons often began from single molecules, deamination could result in undetected sequence errors, whereas HTS-produced data now enable detection of these substitutions by comparison to a reference sequence to quantitatively infer contamination and authenticity. For example, Meyer et al. (2014) demonstrated that the rate of C to T substitutions in contaminated (modern DNA-containing) mitochondrial enriched libraries was less than 5%, while well-preserved (endogenous DNA-containing) libraries exhibited higher rates of deamination as expected in ancient specimens. As cytosine deamination influences the incorporation of potentially erroneous bases, such damage is removed using the enzymes Uracil DNA Glycosylase (UDG) and Endonuclease VIII (Endo VIII) during library preparation. Uracil-DNA-Glycosylase catalyzes the removal of uracil leaving an abasic site, which is cleaved by Endo VIII at both strands (Briggs et al., 2009b).
This creates shorter, yet more bioinformatically useful fragments for downstream processing. The downside is that the retrieved DNA sequences can no longer be used to authenticate the samples, and therefore libraries built both ways from a single extract are more useful for addressing issues of authenticity.

Fragment length distribution

Similarly, fragment length distributions have also been applied to authenticate a specimen's DNA (e.g., a predominance of short fragments, reflecting degradation, rather than preserved long fragments) (Fig. 2). Yet, variability exists in different fossils and DNA extracts, with the potential for similarities between the ancient and contaminant sequences, which ultimately precludes the utility of this pattern to absolutely establish authenticity (Green et al., 2009; Krause et al., 2010b). For example, Krause et al. (2010b) analyzed previously studied contaminated Neandertal libraries, demonstrating that the fragment length distribution of contaminating DNA molecules can be similar to that of the endogenous DNA, suggesting that fragment sizes alone are not sufficient to establish authenticity.

Genome coverage and bioinformatic approaches

With current HTS technology, it is increasingly possible to achieve high genome coverage, which is necessary to overcome the influence of contamination in samples (Millar et al., 2008). For example, Prüfer et al. (2014) obtained 52-fold coverage of the mitochondrial genome of an Altai Neandertal while Meyer et al. (2014) achieved 31.6-fold coverage of a mitochondrial genome from a 400,000-year-old unknown hominin. Other methods of quality control include the bioinformatic estimation of contamination (Ginolhac et al., 2011; Fu et al., 2013b; Skoglund et al., 2014) through alignments of retrieved sequences to a reference (e.g., mtDNA, Y and/or X chromosomes, or nuclear) as well as sequencing mtDNA to ensure a single individual is represented (e.g., Green et al., 2008; Burbano et al., 2010; Reich et al., 2010; Krause et al., 2010a,b; Meyer et al., 2012; Fu et al., 2013a; Prüfer et al., 2014). The use of such reference sequences in human evolutionary studies quantifies the ratio of endogenous (e.g., Neandertal, Denisovan, or other ancient human DNA) to present-day human DNA from the frequency of base substitutions. For example, Green et al. (2010) use the mtDNA genome of Neandertal specimens to identify variable positions from a worldwide database of 311 contemporary human mtDNAs. Reich et al. (2010) use nucleotide positions within the Denisovan specimen nuclear DNA to identify ancestral sequence variants that are derived among present-day humans to estimate sample contamination. Similarly, Green et al. (2010) and Reich et al. (2010) estimate modern male contamination in female hominin bones (Neandertal and Denisovan, respectively) using the human Y chromosome reference. Recently, Skoglund et al. (2014) designed a computational approach that also isolates presumably endogenous DNA based on a certain threshold of damage, while PALEOMIX created by Schubert et al. (2014) automates the genomic analysis pipeline. Aside from these bioinformatic measures, deep sequencing of specimens is also applied to demonstrate the retrieved DNA is from a single individual. For example, in Fu et al. (2013a) the alignment of 0.1%–7.7% of 4,423,607 total unique DNA fragments to the human mtDNA reference indicates consensus among four independent libraries, showing that a single individual is represented. Generally, the challenge with using modern human reference sequences is the assumption that the entire scale of human variation is represented within such comparisons of sequence positions.
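The principle behind such reference-based estimates can be sketched in a few lines of Python. The example is a deliberately simplified assumption, not the method of any study cited above: it supposes that a set of diagnostic positions has already been defined at which the ancient specimen differs from present-day humans, that each overlapping read is an independent observation, and that damage and sequencing error are negligible. The read counts and the `contamination_estimate` function are hypothetical; the interval is a standard Wilson score interval for a binomial proportion.

```python
import math


def contamination_estimate(n_modern, n_total, z=1.96):
    """Point estimate and Wilson score interval for the proportion of
    reads carrying the present-day human allele at diagnostic positions.

    n_modern: reads matching the present-day human state.
    n_total:  all reads overlapping the diagnostic positions.
    z:        normal quantile (1.96 for an approximate 95% interval).
    """
    if n_total <= 0 or not 0 <= n_modern <= n_total:
        raise ValueError("need 0 <= n_modern <= n_total and n_total > 0")
    p = n_modern / n_total
    denom = 1 + z ** 2 / n_total
    centre = (p + z ** 2 / (2 * n_total)) / denom
    half = z * math.sqrt(p * (1 - p) / n_total + z ** 2 / (4 * n_total ** 2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: 3 of 600 reads carry the present-day allele.
estimate, lower, upper = contamination_estimate(3, 600)
print(f"contamination {estimate:.3%} (interval {lower:.3%} to {upper:.3%})")
```

The width of the interval depends on the number of reads overlapping diagnostic positions, which is one practical reason why high coverage is a prerequisite for meaningful contamination estimates.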
Currently, the best methods for authenticating ancient DNA results are the analysis of damage patterns, sequencing of blanks (which is not routinely performed), and bioinformatic estimation of contamination. Ultimately, the challenge with inferring contamination from sequence data, regardless of the method applied, is that modern contamination cannot be completely excluded, since the rate and mode of post-mortem damage remains poorly understood.

Consensus calling

The challenge with the data generated from HTS technology was the requirement for new algorithms and software to map the millions of short reads to a reference genome (Li et al., 2008). Since short reads are typical of ancient DNA, issues of accurately aligning these efficiently to a reference genome and handling ambiguities (e.g., mutations or sequencing errors) were critical in generating a consensus alignment (Li et al., 2008). Yet, the power of HTS technology enables the detection of rare sequence variants due to higher consensus accuracy (greater than 99%) and reduces the probability of stochastic errors (Schadt et al., 2010). The quality of DNA sequences is characterized by the likelihood that a nucleotide at a given position is incorrect; for example, a Phred quality score of 20 corresponds to a one in 100 chance of an incorrect base call, or an accuracy of 99% (Ewing and Green, 1998; Ewing et al., 1998). Accordingly, the availability of open-source bioinformatics tools (e.g., Burrows-Wheeler Aligner by Li and Durbin, 2009) enables alignments and generates a consensus based on posterior probability; however, the assumption is that a single consensus exists for the multiple sequences while the algorithms or parameters of such software (e.g., percent identity, gap size) influence the thresholds used in aggregating multiple sequences to a consensus (Lee, 2003). For an ancient hominin mtDNA genome, Meyer et al. (2014) initially used strict filtering parameters with five or more consensus sequences for 15,181 positions and an average coverage of 21.8 with at least 80% agreement. However, nucleotide positions where the specimen differs from the capture probes and mtDNA sequence would be excluded, so by including sequences with C to T substitutions, lowering the consensus to three sequences and decreasing the agreement per position (greater than 67%), the mtDNA consensus consisted of 16,302 positions with average coverage of 31.6 (Meyer et al., 2014). Yet, ultimately, there is no agreement on how strict such parameters should be within ancient DNA research; for example, the minimum length of fragments included in mapping and alignments is variable.

A related issue within human evolutionary studies is excluding present-day modern human contamination from endogenous hominin DNA, particularly early modern human species (existing up to 10,000 years ago). For example, despite adherence to authentication criteria, a resultant mtDNA sequence from a Cro-Magnon specimen was determined as inauthentic due to correspondence with modern European haplotypes (by Caramelli et al., 2003). Accordingly, this raises the issue of discarding results if modern-type DNA from a hominin specimen yields such a sequence similarity, with a subsequent focus on rare sequence variants in order to exclude contamination and support authenticity (Caramelli et al., 2003; Pääbo et al., 2004; Serre et al., 2004). Although Krause et al. (2010b) argue that the advantages of library sequencing (e.g., targeting endogenous short DNA fragments, high coverage, and DNA damage patterns) enable identification of authentic datasets, the approach is not universally reliable to exclude present-day human contamination.

Ancient human genomics and high-throughput sequencing

Exploring the story of human origins requires a multidisciplinary approach from the physical, biological, and social sciences, where ancient DNA is but one component in the paleoanthropological toolkit.
The breadth of research in ancient human genomics demonstrates the complexity of reconstructing the population and genetic history of past populations (as outlined in Table 3), where the questions arising from such genomic data outpace the answers. For example, improvements in ancient DNA technology add further complexity to ideas about modern human origins within the parameters of the multiregional and recent African origin models. Recent reviews (e.g., Stoneking and Krause, 2011; Shapiro and Hofreiter, 2014; Veeramah and Hammer, 2014) use ancient hominin sequence data to draw inferences about critical events in human evolutionary history, such as the origins of anatomically modern humans and the timing and routes of migrations out of Africa. Improvements in the ancient DNA pipeline, from extraction to sequencing, significantly increase the capacity to comprehensively use ancient human genomes and contribute to asking new questions about our evolutionary history.

Going forward: the future of ancient DNA

Methodological strategies for maximizing the retrieval, enrichment, and sequencing of short DNA fragments are continuing to dramatically alter how ancient DNA research proceeds in the area of human evolution. The optimization of extraction protocols will eventually lead to the retrieval of DNA fragments at the lower limit of the length distribution (under 30 bp) and of computational practicality, with associated improvements in the alignment of such short DNA sequences to reference genomes. Most critically, the time-depth to which ancient DNA strategies are capable of reaching has significant applications to the study of the hominin lineage. It is no longer impossible to obtain authentic DNA sequences from 100,000-year-old specimens, and going back further in time to a 400,000-year-old hominin (Meyer et al., 2014) demonstrates the possibilities of exploring DNA survival in hominin species that have yet to be sequenced, such as Homo erectus (1,800,000 to 143,000 years ago) and Homo heidelbergensis (700,000 to 200,000 years ago). Older hominids such as Australopithecus afarensis (3,850,000 to 2,950,000 years ago), Australopithecus africanus (3,330,000 to 2,100,000 years ago), or Homo habilis (2,400,000 to 1,400,000 years ago) may currently be beyond the reach of ancient DNA techniques. However, an intriguing aspect of current research is identifying functional genomic elements by examining archaic hominin genomes (e.g., Abi-Rached et al., 2010; Mendez et al., 2013), although more remains to be explored. For example, Burbano et al. (2012) targeted via array enrichment the fast-evolving genomic regions in the human lineage (human accelerated regions or HARs) in an Iberian Neandertal and Denisovan, leading to the identification of shared and unique (modern human) substitutions.

Despite such progress, there are limitations within ancient DNA research. Issues of contamination and authenticity present an ongoing, though not insurmountable, challenge. The focus on well-preserved and/or multicopy genetic regions restricts the information yielded from human genomic studies. Further understanding of the mechanisms of DNA preservation and post-mortem degradation may lead to novel methods of recovery and enable analyses of heavily contaminated samples, rather than excluding such specimens from research. The complexity of DNA survival, as related to damage patterns typical of ancient DNA, requires further elucidation in terms of the universality of this parameter to authentic specimens as well as its diversity among spatio-temporal contexts. The unique spectra of damage to ancient endogenous DNA may inform new extraction methods that increase DNA recovery by targeting it specifically within the molecular environment, as well as novel library preparation techniques, such as the previously discussed single-stranded library preparation.
Ancient DNA research provides a lens through which the pathway of human evolution is elucidated. With ongoing developments in the retrieval and analysis of minute fragments of ancient DNA, we will gain new insights into our origins as well.

Acknowledgments

We thank all members of the McMaster Ancient DNA Centre, McMaster University, the Michael G. DeGroote Institute of Infectious Disease Research, Social Sciences and Humanities Research Council (SSHRC) of Canada, and the Canada Research Chair program. The authors gratefully acknowledge the comments provided by the anonymous reviewers that helped to improve this manuscript.

References

Abi-Rached, L., Moesta, A.K., Rajalingam, R., Guethlein, L.A., Parham, P., 2010. Human-specific evolution and adaptation led to major qualitative differences in the variable receptors of human and chimpanzee natural killer cells. PLoS Genet. 6, e1001192.
Adler, C.J., Haak, W., Donlon, D., Cooper, A., 2011. Survival and recovery of DNA from ancient teeth and bones. J. Archaeol. Sci. 38, 956–964.
Allentoft, M.E., Collins, M., Harker, D., Haile, J., Oskam, C.L., Hale, M.L., Campos, P.F., Samaniego, J.A., Gilbert, M.T.P., Willerslev, E., Zhang, G., Scofield, R.P., Holdaway, R.N., Bunce, M., 2012. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc. R. Soc. B 279, 4724–4733.
Amory, S., Keyser, C., Crubezy, E., Ludes, B., 2007. STR typing of ancient DNA extracted from hair shafts of Siberian mummies. Forensic Sci. Int. 166, 218–229.
Barta, J.L., Monroe, C., Kemp, B.M., 2013. Further evaluation of the efficacy of contamination removal from bone surfaces. Forensic Sci. Int. 231, 340–348.
Barta, J.L., Monroe, C., Teisberg, J.E., Winters, M., Flanigan, K., Kemp, B.M., 2014. One of the key characteristics of ancient DNA, low copy number, may be a product of its extraction. J. Archaeol. Sci. 46, 281–289.
Bellemain, E., Davey, M.L., Kauserud, H., Epp, L.S., Boessenkool, S., Coissac, E., Geml, J., Edwards, M., Willerslev, E., Gussarova, G., Taberlet, P., Brochmann, C., 2013. Fungal palaeodiversity revealed using high-throughput metabarcoding of ancient DNA from arctic permafrost. Environ. Microb. 15, 1176–1189.

Bengtsson, C.F., Olsen, M.E., Brandt, L.O., Bertelsen, M.F., Willerslev, E., Tobin, D.J., Wilson, A.S., Gilbert, M.T.P., 2012. DNA from keratinous tissue. Part I: hair and nail. Ann. Anat. 194, 17–25.
Bos, K.I., Schuenemann, V.J., Golding, G.B., Burbano, H.A., Waglechner, N., Coombes, B.K., McPhee, J.B., DeWitte, S.N., Meyer, M., Schmedes, S., Wood, J., Earn, D.J.D., Herring, D.A., Bauer, P., Poinar, H.N., Krause, J., 2011. A draft genome sequence of Yersinia pestis from victims of the Black Death. Nature 478, 506–510.
Briggs, A.W., Heyn, P., 2012. Preparation of next-generation sequencing libraries from damaged DNA. In: Shapiro, B., Hofreiter, M. (Eds.), Ancient DNA Methods and Protocols. Humana Press, New York, pp. 143–154.
Briggs, A.W., Stenzel, U., Johnson, P.L.F., Green, R.E., Kelso, J., Prüfer, K., Meyer, M., Krause, J., Ronan, M.T., Lachmann, M., Pääbo, S., 2007. Patterns of damage in genomic DNA sequences from a Neandertal. Proc. Natl. Acad. Sci. USA 104, 14616–14621.
Briggs, A.W., Good, J.M., Green, R.E., Krause, J., Maricic, T., Stenzel, U., Lalueza-Fox, C., Rudan, P., Brajkovic, D., Kucan, Z., Gusic, I., Schmitz, R., Doronichev, V.B., Golovanova, L.V., de la Rasilla, M., Fortea, J., Rosas, A., Pääbo, S., 2009a. Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science 325, 318–321.
Briggs, A.W., Stenzel, U., Meyer, M., Krause, J., Kircher, M., Pääbo, S., 2009b. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 38 (6), e87.
Brotherton, P., Endicott, P., Sanchez, J.J., Beaumont, M., Barnett, R., Austin, J., Cooper, A., 2007. Novel high-resolution characterization of ancient DNA reveals C > U-type base modification events as the sole cause of post mortem miscoding lesions. Nucleic Acids Res. 35, 5717–5728.
Brown, T., Brown, K., 2011. Biomolecular Archaeology: An Introduction. Wiley-Blackwell, Chichester.
Burbano, H.A., Hodges, E., Green, R.E., Briggs, A.W., Krause, J., Meyer, M., Good, J.M., Maricic, T., Johnson, P.L.F., Xuan, Z., Rooks, M., Bhattacharjee, A., Brizuela, L., Albert, F.W., de la Rasilla, M., Fortea, J., Rosas, A., Lachmann, M., Hannon, G.J., Pääbo, S., 2010. Targeted investigation of the Neandertal genome by array-based sequence capture. Science 328, 723–725.
Burbano, H.A., Green, R.E., Maricic, T., Lalueza-Fox, C., de la Rasilla, M., Rosas, A., Kelso, J., Pollard, K.S., Lachmann, M., Pääbo, S., 2012. Analysis of human accelerated DNA regions using archaic hominin genomes. PLoS One 7 (3), e32877.
Campos, P.F., Craig, O.E., Turner-Walker, G., Peacock, E., Willerslev, E., Gilbert, M.T.P., 2012. DNA in ancient bone – where is it located and how should we find it? Ann. Anat. 194, 7–16.
Caramelli, D., Lalueza-Fox, C., Vernesi, C., Lari, M., Casoli, A., Mallegni, F., Chiarelli, B., Dupanloup, I., Bertranpetit, J., Barbujani, G., Bertorelle, G., 2003. Evidence for a genetic discontinuity between Neandertals and 24,000-year-old anatomically modern Europeans. Proc. Natl. Acad. Sci. USA 100, 6593–6597.
Carpenter, M.L., Buenrostro, J.D., Valdiosera, C., Schroeder, H., Allentoft, M.E., Sikora, M., Rasmussen, M., Gravel, S., Guillen, S., Nekhrizov, G., Leshtakov, K., Dimitrova, D., Theodossiev, N., Pettener, D., Luiselli, D., Sandoval, K., Moreno-Estrada, A., Li, Y., Wang, J., Gilbert, M.T.P., Willerslev, E., Greenleaf, W.J., Bustamante, C.D., 2013. Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. Am. J. Hum. Genet. 93, 1–13.
Collins, M.J., Riley, M., Child, A.M., Turner-Walker, G., 1995. A basic mathematical simulation of the chemical degradation of ancient collagen. J. Archaeol. Sci. 22, 175–183.
Collins, M.J., Nielsen-Marsh, C.M., Hiller, J., Smith, C.I., Roberts, J.P., Prigodich, R.V., Wess, T.J., Csapo, J., Millard, A.R., Turner-Walker, G., 2002. The survival of organic matter in bone: a review. Archaeometry 44, 383–394.
Cooper, A., Poinar, H.N., 2000. Ancient DNA: do it right or not at all. Science 289, 1139.
Dabney, J., Knapp, M., Glocke, I., Gansauge, M.T., Weihmann, A., Nickel, B., Valdiosera, C., Garcia, N., Pääbo, S., Arsuaga, J.L., Meyer, M., 2013. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl. Acad. Sci. USA 110, 15758–15763.
Devault, A.M., Golding, G.B., Waglechner, N., Enk, J.M., Kuch, M., Tien, J.H., Shi, M., Fisman, D.N., Dhody, A.N., Forrest, S., Bos, K.I., Earn, D.J.D., Holmes, E.C., Poinar, H.N., 2014. Second-pandemic strain of Vibrio cholerae from the Philadelphia cholera outbreak of 1849. N. Engl. J. Med. 370, 334–340.
Enk, J., Devault, A., Kuch, M., Murgha, Y., Rouillard, J.M., Poinar, H., 2014. Ancient whole genome enrichment using baits built from modern DNA. Mol. Biol. Evol. 31, 1292–1294.
Ewing, B., Hillier, L., Wendl, M.C., Green, P., 1998. Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res. 8, 175–185.
Ewing, B., Green, P., 1998. Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Res. 8, 186–194.
Fu, Q., Meyer, M., Gao, X., Stenzel, U., Burbano, H.A., Kelso, J., Pääbo, S., 2013a. DNA analysis of an early modern human from Tianyuan Cave, China. Proc. Natl. Acad. Sci. USA 110, 2223–2227.
Fu, Q., Mittnik, A., Johnson, P.L.F., Bos, K., Lari, M., Bollongino, R., Sun, C., Giemsch, L., Schmitz, R., Burger, J., Ronchitelli, A.M., Martini, F., Cremonesi, R.G., Svoboda, J., Bauer, P., Caramelli, D., Castellano, S., Reich, D., Pääbo, S., Krause, J., 2013b. A revised timescale for human evolution based on ancient mitochondrial genomes. Curr. Biol. 23 (7), 553–559.
Fulton, T.L., 2012. Setting up an ancient DNA laboratory. In: Shapiro, B., Hofreiter, M. (Eds.), Ancient DNA Methods and Protocols. Humana Press, New York, pp. 1–11.

S. Marciniak et al. / Journal of Human Evolution 79 (2015) 21e34 Gansauge, M., Meyer, M., 2013. Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat. Protoc. 8 (4), 737e748. Garcia-Garcera, M., Gigli, E., Sanchez-Quinto, F., Ramirez, O., Calafell, F., Civit, S., Lalueza-Fox, C., 2011. Fragmentation of contaminant and endogenous DNA in ancient samples determined by shotgun sequencing; prospects for human palaeogenomics. PLoS One 6 (8), e24161. Geigl, E.M., 2002. On the circumstances surrounding the preservation and analysis of very old DNA. Archaeometry 44, 337e342. Gilbert, M.T.P., Willerslev, E., Hansen, A.J., Barnes, I., Rudbeck, L., Lynnerup, N., Cooper, A., 2003. Distribution patterns of post-mortem damage in human mitochondrial DNA. Am. J. Hum. Genet. 72, 32e47. Gilbert, M.T.P., Bandelt, H.J., Hofreiter, M., Barnes, I., 2005. Assessing ancient DNA studies. Trends Ecol. Evol. 20, 541e544. Gilbert, M.T.P., Tomsho, L.P., Rendulic, S., Packard, M., Drautz, D.I., Sher, A., Tikhonov, A., Dalen, L., Kuznetsova, T., Kosinstev, P., Campos, P.F., Higham, T., Collins, M.J., Wilson, A.S., Shidlovskiy, F., Buigues, B., Ericson, P.G.P., Germonpre, M., Gotherstrom, A., Iacumin, P., Nikolaev, V., Nowak-Kemp, M., Willerslev, E., Knight, J.R., Irzyk, G., Perbost, C.S., Fredrikson, K.M., Harkins, T.T., Sheridan, S., Miller, W., Shuster, S.C., 2007. Whole-genome shotgun sequencing of mitochondria from ancient hair shafts. Science 317, 1927e1930. Gilbert, M.T.P., Kivisild, T., Gronnow, B., Andersen, P.K., Metspalu, E., Reidla, M., Tamm, E., Axelsson, E., Gotherstrom, A., Campos, P.F., Rasmussen, M., Metspalu, M., Higham, T.F.G., Schwenninger, J.L., Nathan, R., De Hoog, C.J., Koch, A., Moller, L.N., Andreasen, C., Meldgaard, M., Villems, R., Bendixen, C., Willerslev, E., 2008. Palaeo-Eskimo mtDNA genome reveals matrilineal discontinuity in Greenland. Science 27, 1787e1789. 
Ginolhac, A., Rasmussen, M., Gilbert, M.T.P., Willerslev, E., Orlando, L., 2011. mapDamage: testing for damage patterns in ancient DNA sequences. Bioinformatics 27 (15), 2153e2155. Gnirke, A., Melnikov, A., Maguire, J., Rogov, P., LeProust, E.M., Brockman, W., Fennell, T., Giannoukos, G., Fisher, S., Russ, C., Gabriel, S., Jaffe, D.B., Lander, E.S., Nusbaum, C., 2009. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat. Biotechnol. 27 (2), 182e189. Green, R.E., Krause, J., Ptak, S.E., Briggs, A.W., Ronan, M.T., Simons, J.F., Du, L., €bo, S., 2006. Analysis of one million Egholm, M., Rothberg, J.M., Paunovic, M., P€ aa base pairs of Neandertal DNA. Nature 444, 330e336. Green, R.E., Malaspinas, A.S., Krause, J., Briggs, A.W., Johnson, P.L.F., Uhler, C., Meyer, M., Good, J.M., Maricic, T., Stenzel, U., Prufer, K., Siebauer, M., Burbano, H.A., Ronan, M., Rothberg, J.M., Egholm, M., Rudan, P., Brajkovic, D., €bo, S., Kucan, Z., Gusic, I., Wikstrom, M., Laakkonen, L., Kelso, J., Slatkin, M., P€ aa 2008. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing. Cell 134, 416e426. Green, R.E., Briggs, A.W., Krause, J., Prüfer, K., Burbano, H.A., Siebauer, M., €bo, S., 2009. The Neandertal genome and ancient DNA Lachmann, M., P€ aa authenticity. Eur. Mol. Biol. 28, 2494e2502. 
Green, R.E., Krause, J., Briggs, A.W., Maricic, T., Stenzel, U., Kircher, M., Patterson, N., Li, H., Zhai, W., Fritz, M.H.Y., Hansen, N.F., Durand, E.Y., Malaspinas, A.S., Jensen, J.D., Marques-Bonet, T., Alkan, C., Prüfer, K., Meyer, M., Burbano, H.A., €ber, B., Ho € ffner, B., Good, J.M., Schultz, R., Aximu-Petri, A., Butthof, A., Ho Siegemund, M., Weihmann, A., Nusbaum, C., Lander, E.S., Russ, C., Novod, N., Affourtit, J., Egholm, M., Verna, C., Rudan, P., Brajkovic, D., Kucan, Z., Gusic, I., Doronichev, V.B., Golovanova, L.V., Lalueza-Fox, C., de la Rasilla, M., Fortea, J., Rosas, A., Schmitz, R.W., Johnson, P.L.F., Eichler, E.E., Falush, D., Birney, E., €a €bo, S., Mullikin, J.C., Slatkin, M., Nielsen, R., Kelso, J., Lachmann, M., Reich, D., Pa 2010. A draft sequence of the Neandertal genome. Science 328, 710e722. Hagelberg, E., Clegg, J.B., 1991. Isolation and characterization of DNA from archaeological bone. Proc. R. Soc. B 244, 45e50. Hagelberg, E., Sykes, B., Hedges, R., 1989. Ancient bone DNA amplified. Nature 342, 485. €bo, S., 1996. The retrieval of ancient human Handt, O., Krings, M., Ward, R.H., P€ aa DNA sequences. Am. J. Hum. Genet. 59, 368e376. €€ Hofreiter, M., Serre, D., Poinar, H.N., Kuch, M., Pa abo, S., 2001. Ancient DNA. Nat. Rev. Genet. 2, 353e359. Horn, S., 2012. Target enrichment via DNA hybridization capture. In: Shapiro, B., Hofreiter, M. (Eds.), Ancient DNA Methods and Protocols. Humana Press, New York, pp. 177e195. €bo, S., Vereshchagin, N.K., 1994. Mammoth DNA sequences. Nature 370, Hoss, M., P€ aa 333. €bo, S., 1996. DNA damage and Hoss, M., Jaruga, P., Zastawny, T.H., Dizdaroglu, M., P€ aa DNA sequence retrieval from ancient tissues. Nucleic Acids Res. 24, 1304e1307. Khairat, R., Ball, M., Hsieh Chang, C.C., Bianucci, R., Nerlich, A.G., Trautmann, M., Ismail, S., Shanab, G.M.L., Karim, A.M., Gad, Y.Z., Pusch, C.M., 2013. First insights into the metagenome of Egyptian mummies using next-generation sequencing. J. Appl. Genet. 
54, 309e325. Kircher, M., Kelso, J., 2010. High-throughput DNA sequencing e concepts and limitations. Bioessays 32, 524e536. Kircher, M., Sawyer, S., Meyer, M., 2012. Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. 40, e3. Knapp, M., Hofreiter, M., 2010. Next generation sequencing of ancient DNA: requirements, strategies, and perspectives. Genes 1, 227e243. Krause, J., Dear, P.H., Pollack, J.L., Slatkin, M., Spriggs, H., Barnes, I., Lister, A.M., €a €bo, S., Hofreiter, M., 2006. Multiplex amplification of the Ebersberger, I., Pa mammoth mitochondrial genome and the evolution of Elephantidae. Nature 439, 724e727.

33

Krause, J., Lalueza-Fox, C., Orlando, L., Enard, W., Green, R.E., Burbano, H.A., Hublin, J.-J., Hanni, C., Fortea, J., de la Rasilla, M., Bertranpetit, J., Rosas, A., €a €bo, S., 2007. The derived FOXP2 variant of modern humans was shared with Pa Neandertals. Curr. Biol. 17, 1908e1912. €€ Krause, J., Fu, Q., Good, J.M., Viola, B., Shunkov, M.V., Derevianko, A.P., Pa abo, S., 2010a. The complete mitochondrial DNA genome of an unknown hominin from southern Siberia. Nature 464, 894e897. €bo, S., Krause, J., Briggs, A.W., Kircher, M., Maricic, T., Zwyns, N., Derevianko, A.P., P€ aa 2010b. A complete mtDNA genome of an early modern human from Kostenki, Russia. Curr. Biol. 20, 231e236. €bo, S., 1997. Krings, M., Stone, A., Schmitz, R.W., Krainitzki, H., Stoneking, M., P€ aa Neandertal DNA sequences and the origin of modern humans. Cell 90, 19e30. €bo, S., 1999. DNA sequence of Krings, M., Geisert, H., Schmitz, R.W., Krainitzki, H., P€ aa the mitochondrial hypervariable region II from the Neandertal type specimen. Proc. Natl. Acad. Sci. USA 96, 5581e5585. Krings, M., Capelli, C., Tschentscher, F., Geisert, H., Meyer, S., von Haeseler, A., €€ Grossschmidt, K., Possnert, G., Paunovic, M., Pa abo, S., 2000. A view of Neandertal genetic diversity. Nat. Genet. 26, 144e146. Lalueza-Fox, C., Sampietro, M.L., Caramelli, D., Puder, Y., Lari, M., Calafell, F., Martinez-Maza, C., Bastir, M., Fortea, J., de la Rasilla, M., Bertranpetit, J., Rosas, A., 2005. Neandertal evolutionary genetics: mitochondrial DNA data from the Iberian Peninsula. Mol. Biol. Evol. 22, 1077e1081. Lalueza-Fox, C., Rompler, H., Caramelli, D., Staubert, C., Catalano, G., Hughes, D., Rohland, N., Pilli, E., Longo, L., Condemi, S., de la Rasilla, M., Fortea, J., Rosas, A., Stoneking, M., Schoneberg, T., Bertranpetit, J., Hofreiter, M., 2007. A melanocortin 1 receptor allele suggests varying pigmentation among Neandertals. Science 318, 1453e1455. 
Lalueza-Fox, C., Gigli, E., de la Rasilla, M., Fortea, J., Rosas, A., Bertranpetit, J., Krause, J., 2008. Genetic characterization of the ABO blood group in Neandertals. BMC Evol. Biol. 8, 342e347. Lee, C., 2003. Generating consensus sequences from partial order multiple alignment graphs. Bioinformatics 19 (8), 999e1008. Legendre, M., Bartoli, J., Shmakova, L., Jeudy, S., Labadie, K., Adrait, A., Lescot, M., Poirot, O., Bertaux, L., Bruley, C., Coute, Y., Rivkina, E., Abergel, C., Claverie, J.M., 2014. Thirty-thousand-year-old distant relative of giant icosahedral DNA viruses with a pandoravirus morphology. Proc. Natl. Acad. Sci. USA 111, 4274e4279. Li, H., Durbin, R., 2009. Fast and accurate short read alignment with BurrowsWheeler Transform. Bioinformatics 25, 1754e1760. Li, H., Ruan, J., Durbin, R., 2008. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851e1858. Lindahl, T., 1993. Instability and decay of the primary structure of DNA. Nature 362, 709e715. Liu, L., Li, Y., Li, S., Hu, N., He, Y., Pong, R., Lin, D., Lu, L., Law, M., 2012. Comparison of next-generation sequencing systems. J. Biomed. Biotech. 2012, 1e11. Loreille, O.M., Diegoli, T.M., Irwin, J.A., Coble, M.D., Parsons, T.J., 2007. High efficiency DNA extraction from bone by total demineralization. Forensic Sci. Int. 1, 191e195. Malmstrom, H., Stora, J., Dalen, L., Holmlund, G., Gotherstrom, A., 2005. Extensive human DNA contamination in extracts from ancient dog bones and teeth. Mol. Biol. Evol. 22, 2040e2047. Mamanova, L., Coffey, A.J., Scott, C.E., Kozarewa, I., Turner, E.H., Kumar, A., Howard, E., Shendure, J., Turner, D.J., 2010. Target-enrichment strategies for next-generation sequencing. Nat. Methods 7, 111e118. €a €bo, S., 2010. Multiplexed DNA capture of mitochondrial Maricic, T., Whitten, M., Pa genomes using PCR products. PLoS One 5, e14004. Matheson, C.D., David, R., Spigelman, M., Donoghue, H.D., 2014. 
Molecular confirmation of Schistosoma and family relationship in two ancient Egyptian mummies. Yearb. Mummy Stud. 2, 39e47. Mendez, F.L., Watkins, J.C., Hammer, M.F., 2013. Neandertal origin of genetic variation at the cluster of OAS immunity genes. Mol. Biol. Evol. 30, 798e801. Meyer, M., Kircher, M., 2010. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. vol. 5 (6) http://dx.doi.org/10.1101/pdb.prot5448. Meyer, M., Kircher, M., Gansauge, M.T., Li, H., Racimo, F., Mallick, S., Schraiber, J.G., Jay, F., Prüfer, K., de Filippo, C., Sudmant, P.H., Alkan, C., Fu, Q., Do, R., Rohland, N., Tandon, A., Siebauer, M., Green, R.E., Bryc, K., Briggs, A.W., Stenzel, U., Dabney, J., Shendure, J., Kitzman, J., Hammer, M.F., Shunkov, M.V., Derevianko, A.P., Patterson, N., Andres, A.M., Eichler, E.E., Slatkin, M., Reich, D., €a €bo, S., 2012. A high coverage genome sequence from an archaic Kelso, J., Pa Denisovan individual. Science 338, 222e226. Meyer, M., Fu, Q., Aximu-Petri, A., Glocke, I., Nickel, B., Arsuaga, J.L., Martinez, I., €bo, S., 2014. Gracia, A., Bermudez de Castro, J.M., Carbonell, E., P€ aa A mitochondrial genome sequence of a hominin from Sima de los Huesos. Nature 505, 403e406. Millar, C.D., Huynen, L., Subramanian, S., Mohandesan, E., Lambert, D.M., 2008. New developments in ancient genomics. Trends Ecol. Evol. 23, 386e393. Molak, M., Ho, S.Y.W., 2011. Evaluating the impact of post-mortem damage in ancient DNA: a theoretical approach. J. Mol. Evol. 73, 244e255. O'Rourke, D.H., Hayes, M.G., Carlyle, S.W., 2000. Ancient DNA studies in physical anthropology. A. Rev. Anthropol. 29, 217e242. Orlando, L., Darlu, P., Toussaint, M., Bonjean, D., Otte, M., Hanni, C., 2006. Revisiting Neandertal diversity with a 100,000 year old mtDNA sequence. Curr. Biol. 16, R400eR402.

34

S. Marciniak et al. / Journal of Human Evolution 79 (2015) 21e34

Orlando, L., Ginolhac, A., Zhang, G., Froese, D., Albrechtsen, A., Stiller, M., Schubert, M., Cappellini, E., Petersen, B., Moltke, I., Johnson, P.L.F., Fumagalli, M., Vilstrup, J.T., Raghavan, M., Korneliussen, T., Malaspinas, A.S., Vogt, J., Szklarcyzk, D., Kelstrup, C.D., Vinther, J., Dolocan, A., Stenderup, J., Velazquez, A.M.V., Cahill, J., Rasmussen, M., Wang, X., Min, J., Zazula, G.D., Seguin-Orlando, A., Mortensen, C., Magnussen, K., Thompson, J.F., Weinstock, J., Gregersen, K., Rø, K.H., Eisenmann, V., Rubin, C.J., Miller, D.C., Antczak, D.F., Bertelsen, M., Brunak, S., Al-Rasheid, K.A.S., Ryder, O., Andersson, L., Mundy, J., Krogh, A., Gilbert, M.T.P., Kjær, K., Sicheritz-Ponten, T., Jensen, L.J., Olsen, J.V., Hofreiter, M., Nielsen, R., Shapiro, B., Wang, J., Willerslev, E., 2013. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74e78. Ovchinnikov, I.V., Gotherstrom, A., Romanova, G.P., Kharitonov, V.M., Liden, K., Goodwin, W., 2000. Molecular analysis of Neandertal DNA from the northern Caucasus. Nature 404, 490e493. €bo, S., 1989. Ancient DNA: extraction, characterization, molecular cloning, and P€ aa enzymatic amplification. Proc. Natl. Acad. Sci. USA 86, 1939e1943. €bo, S., Wilson, A.C., 1991. Miocene DNA sequences e a dream come true? Curr. P€ aa Biol. 1, 45e46. €bo, S., Poinar, H., Serre, D., Jaenicke-Despres, V., Hebler, J., Rohland, N., Kuch, M., P€ aa Krause, J., Vigilant, L., Hofreiter, M., 2004. Genetic analyses from ancient DNA. A. Rev. Genet. 38, 645e679. €a €bo, S., Kelso, J., Green, R.E., 2010. CompuPrüfer, K., Stenzel, U., Hofreiter, M., Pa tational challenges in the analysis of ancient DNA. Genome Biol. 11, R47. 
Prüfer, K., Racimo, F., Patterson, N., Jay, F., Sankararaman, S., Sawyer, S., Heinze, A., Renaud, G., Sudmant, P.H., de Filippo, C., Li, H., Mallick, S., Dannemann, M., Fu, Q., Kircher, M., Kuhlwilm, M., Lachmann, M., Meyer, M., Ongyerth, M., Siebauer, M., Theunert, C., Tandon, A., Moorjani, P., Pickrell, J., Mullikin, J.C., Vohr, S.H., Green, R.E., Hellmann, I., Johnson, P.L.F., Blanche, H., Cann, H., Kitzman, J.O., Shendure, J., Eichler, E.E., Lein, E.S., Bakken, T.E., Golovanova, L.V., Doronichev, V.B., Shunkov, M.V., Derevianko, A.P., Viola, B., Slatkin, M., Reich, D., €a €bo, S., 2014. The complete genome sequence of a Neandertal from Kelson, J., Pa the Altai Mountains. Nature 505, 43e49. Rasmussen, M., Li, Y., Lindgreen, S., Pedersen, J.S., Albrechtsen, A., Moltke, I., Metspalu, M., Metspalu, E., Kivisild, T., Gupta, R., Bertalan, M., Nielsen, K., Gilbert, M.T.P., Wang, Y., Raghavan, M., Campos, P.F., Kamp, H.M., Wilson, A.S., Gledhill, A., Tridico, S., Bunce, M., Lorenzen, M.E.D., Binladen, J., Guo, X., Zhao, J., Zhang, X., Zhang, H., Li, Z., Chen, M., Orlando, L., Kristiansen, K., Bak, M., Tommerup, N., Bendixen, C., Pierre, T.L., Grønnow, B., Meldgaard, M., Andreasen, C., Fedorova, S.A., Osipova, L.P., Higham, T.F.G., Bronk Ramsey, C., n, T., Hansen, T.V., Nielsen, F.C., Crawford, M.H., Brunak, S., Sicheritz-Ponte Villems, R., Nielsen, R., Krogh, A., Wang, J., Willerslev, E., 2010. Ancient human genome sequence of an extinct Palaeo-Indian Eskimo. Nature 463, 757e762. Reich, D., Green, R.E., Kircher, M., Krause, J., Patterson, N., Durand, E.Y., Viola, B., Briggs, A.W., Stenzel, U., Johnson, P.L.F., Maricic, T., Good, J.M., MarquesBonet, T., Alkan, C., Fu, Q., Mallick, S., Li, H., Meyer, M., Eichler, E.E., Stoneking, M., Richards, M., Talamo, S., Shunkov, M.V., Derevianko, A.P., €bo, S., 2010. Genetic history of an archaic Hublin, J.-J., Kelso, J., Slatkin, M., P€ aa hominin group from Denisova Cave in Siberia. Nature 468, 1053e1060. 
Rizzi, E., Lari, M., Gigli, E., De Bellis, G., Caramelli, D., 2012. Ancient DNA studies: new perspectives on old samples. Genet. Sel. Evol. 44, 21e40. Rohland, N., Hofreiter, M., 2007a. Ancient DNA extraction from bones and teeth. Nat. Protoc. 2, 1756e1762. Rohland, N., Hofreiter, M., 2007b. Comparison and optimization of ancient DNA extraction. Biotechniques 42, 343e352. Sampietro, M.L., Gilbert, M.T.P., Lao, O., Caramelli, D., Lari, M., Bertranpetit, J., Lalueza-Fox, C., 2006. Tracking down human contamination in ancient human teeth. Mol. Biol. Evol. 23, 1801e1807. Schadt, E.E., Turner, S., Kasarskis, A., 2010. A window into third-generation sequencing. Hum. Mol. Genet. 19 (R2), R227eR240.

€a €bo, S., Schmitz, R.W., Serre, D., Bonanii, G., Feine, S., Hillgruber, F., Krainitzki, H., Pa Smith, F.H., 2002. The Neandertal type specimen revisited: interdisciplinary investigations of skeletal remains from the Neander Valley Germany. Proc. Natl. Acad. Sci. USA 99, 13342e13347. Schneider, G.F., Deeker, C., 2012. DNA Sequencing with nanopores. Nat. Biotechnol. 30 (4), 326e328. Schubert, M., Ermini, L., Der Sarkissian, C., Jonsson, H., Ginolhac, A., Schaefer, R., Martin, M.D., Fernandez, R., Kircher, M., McCue, M., Willerslev, E., Orlando, L., 2014. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat. Protoc. 9, 1056e1082. Schwarz, C., Debruyne, R., Kuch, M., McNally, E., Schwarcz, H., Aubrey, A.D., Bada, J., Poinar, H., 2009. New insights from old bones: DNA preservation and degradation in permafrost preserved mammoth remains. Nucleic Acids Res. 37, 3215e3229. Serre, D., Langaney, A., Chech, M., Teschler-Nicola, M., Paunovic, M., Mennecier, P., €bo, S., 2004. No evidence of Neandertal mtDNA Hofrieter, M., Possnert, G., P€ aa contribution to early modern humans. PLoS Biol. 2, 0313e0317. Shapiro, B., Hofreiter, M., 2014. A paleogenomic perspective on evolution and gene function: new insights from ancient DNA. Science 343, 1e7. Shapiro, B., Drummond, A.J., Rambaut, A., Wilson, M.C., Matheus, P.E., Sher, A.V., Pybus, O.G., Gilbert, M.T.P., Barnes, I., Binladen, J., Willerslev, E., Hansen, A.J., Baryshnikov, G.F., Burns, J.A., Davydov, S., Driver, J.C., Froese, D.G., Harington, C.R., Keddie, G., Kosintsev, P., Kunz, M.L., Martin, L.D., Stephenson, R.O., Storer, J., Tedford, R., Zimov, S., Cooper, A., 2004. Rise and fall of the Beringian steppe bison. Science 306, 1561e1565. €a €bo, S., Krause, J., Skoglund, P., Northoff, B.H., Shunkov, M.V., Derevianko, A.P., Pa Jakobsson, M., 2014. Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proc. Natl. Acad. 
Sci. USA 111, 2229e2234. Sosa, C., Vispe, E., Nunez, C., Baeta, M., Casalod, Y., Bolea, M., Hedges, R.E.M., Martinez-Jarreta, B., 2013. Association between ancient bone preservation and DNA yield: a multidisciplinary approach. Am. J. Phys. Anthropol. 151, 102e109. Stiller, M., Green, R.E., Ronan, M., Simons, J.F., Du, L., He, W., Egholm, M., Rothberg, J.M., Keates, S.G., Ovodov, N.D., Antipina, E.E., Baryshnikov, G.F., Kuzmin, Y.V., Vasilevski, A.A., Wuenschell, G.E., Termini, J., Hofreiter, M., Jae€bo, S., 2006. Patterns of nucleotide misincorporations nicke-Despres, V., P€ aa during enzymatic amplification and direct large-scale sequencing of ancient DNA. Proc. Natl. Acad. Sci. USA 103, 13578e13584. Stoneking, M., Krause, J., 2011. Learning about human population history from ancient and modern genomes. Nat. Rev. Genet. 12, 603e614. Turner-Walker, G., 2008. The chemical and microbial degradation of bones and teeth. In: Pinhasi, R., Mays, S. (Eds.), Advances in Human Paleopathology. John Wiley & Sons, Ltd., Chichester, pp. 3e29. Tuross, N., 1994. The biochemistry of ancient DNA in bone. Experientia 50, 530e535. Veeramah, K.R., Hammer, M.F., 2014. The impact of whole-genome sequencing on the reconstruction of human population history. Nat. Rev. Genet. 15, 149e162. Wagner, D.M., Klunk, J., Harbeck, M., Devault, A., Waglechner, N., Sahl, J.W., Enk, J., Birdsell, D.N., Kuch, M., Lumibao, C., Poinar, D., Pearson, T., Fourment, M., Golding, B., Riehm, J.M., Earn, D.J.D., DeWitte, S., Rouillard, J.-M., Grupe, G., Wiechmann, I., Bliska, J.B., Keim, P.S., Scholz, H.C., Holmes, E.C., Poinar, H., 2014. Yersinia pestis and the Plague of Justinian 541-543 AD: a genomic analysis. Lancet Infect. Dis. 14, 319e326. Willerslev, E., Cooper, A., 2005. Ancient DNA. Proc. R. Soc. B 272, 3e16. Willerslev, E., Hansen, A.J., Binladen, J., Brand, T.B., Gilbert, M.T.P., Shapiro, B., Bunce, M., Wiuf, C., Gilichinsky, D.A., Cooper, A., 2003. 
Diverse plant and animal genetic records from Holocene and Pleistocene sediments. Science 300, 791e795. Winters, M., Barta, J.L., Monroe, C., Kemp, B.M., 2011. To clone or not to clone: method analysis for retrieving consensus sequences in ancient DNA samples. PLoS One 6, e21247.
