
ARTICLE Genetic structure of the four wild tomato species in the Solanum peruvianum s.l. species complex Genome Downloaded from www.nrcresearchpress.com by UNR MATHEWSON IGT KNWLDG on 09/03/14 For personal use only.

Joanne A. Labate, Larry D. Robertson, Susan R. Strickler, and Lukas A. Mueller

Abstract: The most diverse wild tomato species, Solanum peruvianum sensu lato (s.l.), has been reclassified into four separate species: Solanum peruvianum sensu stricto (s.s.), Solanum corneliomuelleri, Solanum huaylasense, and Solanum arcanum. However, reproductive barriers among these species are incomplete, which can lead to discrepancies regarding the genetic identity of germplasm. We used genotyping by sequencing (GBS) of S. peruvianum s.l., Solanum neorickii, and Solanum chmielewskii to develop tens of thousands of mapped single nucleotide polymorphisms (SNPs) and to analyze genetic relationships within and among species. The data set was condensed to 14 043 SNPs with no missing data across 46 sampled plants. Origins of accessions were mapped using geographic information systems (GIS). Isolation by distance, pairwise genetic distances, and number of clusters were estimated using population genetics approaches. Isolation by distance was strongly supported, especially between interspecific pairs. The Eriopersicon (S. peruvianum s.s., S. corneliomuelleri, S. huaylasense) and Arcanum (S. arcanum, S. neorickii, S. chmielewskii) species groups were genetically distinct, except for S. huaylasense, which showed an average membership proportion of 50% in each group. Solanum peruvianum and S. corneliomuelleri were not significantly differentiated from each other. Many thousands of SNP markers were identified that could potentially be used to distinguish pairs of species, including S. peruvianum versus S. corneliomuelleri, if they are verified on larger numbers of samples. Diagnostic markers will be valuable for delimiting morphologically similar and interfertile species in germplasm management. For approximately 12% of the SNPs, a genome-wide test based on differentiation among the species of S. peruvianum s.l. rejected the null hypothesis of selective neutrality. These SNPs are candidates for more comprehensive studies of microevolutionary processes within this species complex. 
Key words: genotyping by sequencing, single nucleotide polymorphism, geographic information systems, selection, population genetics.

Introduction Tomato (Solanum lycopersicum L.) ranks fourth among crops in global net production value, after rice, wheat, and soybean, and its wild relatives constitute valuable resources for continued international food security (Vincent et al. 2013). All 12 wild tomato species are interfertile with the crop species, albeit to varying degrees (Rick 1979a). The most diverse wild species, Solanum peruvianum sensu lato (s.l.), was delineated into more than 30 races by Rick (1963, 1986) based on geographical distribution, morphology, and interfertility tests. Conserved germplasm accessions of these races have served as a valuable source of new traits, such as disease resistance, since the early 20th century (Rick 1982, 1991). Solanum

Received 6 January 2014. Accepted 4 May 2014. Corresponding Editor: L. Lukens. J.A. Labate and L.D. Robertson. Plant Genetic Resources Unit, US Department of Agriculture, Agricultural Research Service, 630 W. North Street, Geneva, NY 14456, USA. S.R. Strickler and L.A. Mueller. Boyce Thompson Institute for Plant Research, 533 Tower Road, Ithaca, NY 14853, USA. Corresponding author: Joanne A. Labate (e-mail: [email protected]). Genome 57: 169–180 (2014) dx.doi.org/10.1139/gen-2014-0003

Published at www.nrcresearchpress.com/gen on 5 May 2014.


peruvianum s.l. was recently reclassified into four species of Solanum: S. peruvianum sensu stricto (s.s.), Solanum corneliomuelleri, Solanum huaylasense, and Solanum arcanum (Peralta et al. 2005, 2008). At the Plant Genetic Resources Unit (PGRU) in Geneva, NY, USA, approximately 300 accessions of S. peruvianum s.l. are conserved and publicly available for crop improvement. These have been reclassified according to the taxonomic treatment of Peralta et al. (2008) (http://www.ars-grin.gov/cgi-bin/npgs/html/splist.pl?28572, accessed 29 October 2013). Obtaining viable and fertile interspecific hybrids from S. lycopersicum × S. peruvianum s.l. has proven difficult and requires specialized tools and techniques such as genetic bridges and embryo rescue (Taylor and Al-Kummer 1982; Poysa 1990; Sacks et al. 1997; Kharkongar et al. 2013). However, the chromosome number is identical for all tomato species (2n = 24) and chromosome structure has not substantially diverged among them; rather, genic changes are responsible for fertility barriers (Rick 1979b). High-resolution electron microscopy has shown that structural differences between chromosomes of various tomato species are associated with heterochromatic regions; these differences are therefore expected to have little impact on the colinearity of linkage maps in euchromatin (Anderson et al. 2010). Some of the earliest applications of molecular genetic markers to crop improvement were in tomato (Rick and Fobes 1974; Tanksley and Rick 1980). Although S. lycopersicum is relatively low in diversity, the high synteny of chromosomes and high transferability of molecular markers among all tomato species have allowed the development of relatively large numbers of markers, interspecific maps, and introgressed germplasm lines compared with other crops (Foolad and Panthee 2012). Various types of markers have been applied to study the diversity of S. peruvianum s.l. 
Isozymes were the first markers to clearly demonstrate a distinction between the northern races (now S. arcanum) and the remainder of the races (Rick 1986). Other types of markers, such as cpDNA (Palmer and Zamir 1982), mtDNA (McClean and Hanson 1986), restriction fragment length polymorphism (RFLP) (Miller and Tanksley 1990), amplified fragment length polymorphism (AFLP) (Spooner et al. 2005; Nakazato et al. 2012), DNA sequences (Peralta and Spooner 2001; Merino Méndez 2009; Rodriguez et al. 2009; Nakazato et al. 2010), or combinations of these (Zuriaga et al. 2009), have been used to analyze the species complex, often within taxonomic frameworks but also within ecological or adaptive ones (Nakazato et al. 2010, 2012; Xia et al. 2010; Tellier et al. 2011). A 760 Mb assembled whole genome sequence of S. lycopersicum was published in 2012 as the outcome of an international effort initiated in 2003 (Hill 2012; The Tomato Genome Consortium 2012). Coding regions were confirmed to consist mostly of low-copy DNA compared with other flowering plants (The Tomato Genome Consortium 2012). Heterochromatic regions are mainly distributed within or around centromeres, chromomeres, and telomeres, while euchromatin is located on the distal ends of chromosomes (The Tomato Genome Consortium 2012). Next-generation sequencing (NGS) based marker methods, such as restriction site associated DNA (RAD) (Miller et al. 2007; Baird et al. 2008) and genotyping by sequencing (GBS) (Elshire et al. 2011), are enhanced by the availability of an assembled whole genome sequence, because many markers can be mapped and allelism can be distinguished from paralogy. In this project, GBS was used to discover large numbers of mapped SNP markers and to apply them to a population genetics analysis of S. peruvianum, S. corneliomuelleri, S. huaylasense, and S. arcanum germplasm conserved at PGRU and the C.M. Rick Tomato Genetics Resource Center (TGRC). 
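The point that a reference genome lets allelism be distinguished from paralogy can be illustrated with a minimal sketch that classifies aligned GBS tags as unique, multi-mapping, or unaligned. It assumes SAM-format alignment records in which FLAG bit 0x4 marks an unaligned record and, following the BWA convention, a mapping quality (MAPQ) of 0 marks a tag that aligned equally well to several locations (e.g., paralogs). The function names, the `min_mapq` threshold, and the example records are hypothetical; this is not the filter implemented in the TASSEL pipeline.

```python
from collections import Counter
from typing import Iterable

UNMAPPED_FLAG = 0x4  # SAM FLAG bit set when a record did not align


def classify_tag(sam_line: str, min_mapq: int = 1) -> str:
    """Classify one SAM alignment record as 'unaligned', 'multi', or 'unique'.

    Assumption: following BWA's convention, a tag whose best hits are equally
    good at several genomic locations (e.g., paralogs) receives MAPQ 0.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & UNMAPPED_FLAG:
        return "unaligned"
    if mapq < min_mapq:
        return "multi"
    return "unique"


def count_tag_classes(sam_lines: Iterable[str]) -> Counter:
    """Tally tag classes, skipping SAM header lines (those starting with '@')."""
    return Counter(classify_tag(line) for line in sam_lines if not line.startswith("@"))


if __name__ == "__main__":
    records = [
        "@HD\tVN:1.0",
        "tag1\t0\tchr1\t1000\t37\t64M\t*\t0\t0\t*\t*",  # single best hit
        "tag2\t16\tchr5\t2000\t0\t64M\t*\t0\t0\t*\t*",  # equally good hit elsewhere
        "tag3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",           # did not align
    ]
    counts = count_tag_classes(records)
    # Only 'unique' tags would be kept for SNP calling; the rest are discarded.
    print(counts["unique"], counts["multi"], counts["unaligned"])  # prints 1 1 1
```

Keeping only uniquely placed tags means that two alleles observed at one position can be treated as variants of a single locus rather than as reads from duplicated copies.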
GBS involves high-throughput sequencing of the terminal ends of restriction fragments in a multiplexed format (Elshire et al. 2011). The method can generate hundreds of thousands of SNP markers at very low cost. Briefly, genomic complexity is reduced by restriction enzyme (RE) digestion. Each digested DNA sample is ligated to one of 96 unique barcoded

Genome Vol. 57, 2014

adapters that include primer binding sites. This allows 96 samples to be multiplexed for subsequent PCR amplification and sequencing. Approximately 200 million 64-bp reads are generated per sequencing lane, which corresponds to hundreds of thousands of unique sequences (sequence tags) per sample, depending on the crop. Tags are analyzed through a bioinformatics pipeline consisting of modules that are implemented through Perl scripts and applied as appropriate to the scientific question (Glaubitz et al. 2014). Sequence tags are aligned against each other and against a reference genome, if one is available. The primary objectives of this study were to (i) develop tens of thousands of mapped SNP markers in six wild tomato species (S. peruvianum, S. corneliomuelleri, S. huaylasense, S. arcanum, Solanum neorickii, and Solanum chmielewskii) using GBS and the S. lycopersicum whole genome sequence as a reference; (ii) verify the geographic origins of wild tomato accessions using passport data and geographic information systems (GIS) tools; (iii) analyze population structure, genetic relationships, and isolation by distance among S. peruvianum s.l. accessions; (iv) estimate genetic differentiation among S. peruvianum s.l. species; and (v) analyze their allele frequencies to discover species-specific, potentially diagnostic markers and to test the null hypothesis of selective neutrality of SNP loci.
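The barcode-based multiplexing described above can be illustrated with a minimal demultiplexing sketch: each read is assigned to the sample whose barcode it starts with, the barcode is removed, and the remainder is trimmed to a fixed 64-bp tag. The barcodes and sample names are hypothetical, and the requirement that the barcode be followed by the PstI cut-site remnant `TGCAG` is an assumption made for illustration; this is not the TASSEL GBS pipeline's own code.

```python
from typing import Dict, Optional, Tuple

CUT_SITE_REMNANT = "TGCAG"  # assumed PstI overhang remnant that begins each genomic tag
TAG_LENGTH = 64             # tag length used in the text


def demultiplex(read: str, barcodes: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Assign a read to a sample by its barcode prefix and return (sample, tag).

    `barcodes` maps barcode sequence -> sample name (hypothetical values).
    A read is accepted only if the barcode is immediately followed by the
    cut-site remnant and enough sequence remains for a full-length tag.
    """
    for barcode, sample in barcodes.items():
        if read.startswith(barcode + CUT_SITE_REMNANT):
            tag = read[len(barcode):len(barcode) + TAG_LENGTH]
            if len(tag) == TAG_LENGTH:
                return sample, tag
    return None


if __name__ == "__main__":
    barcodes = {"CTCC": "sample_01", "TGCATGA": "sample_02"}  # hypothetical barcodes
    read = "CTCC" + "TGCAG" + "A" * 70
    result = demultiplex(read, barcodes)
    print(result[0], len(result[1]))  # prints sample_01 64
```

Because barcodes differ in length, matching on the barcode plus the cut-site remnant, rather than on a fixed-width prefix, keeps samples separable when they share a lane.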

Materials and methods Plant materials In 2009, 120 accessions of S. peruvianum, S. corneliomuelleri, S. huaylasense, and S. arcanum and two accessions each of S. chmielewskii and S. neorickii were grown in the field in Geneva, NY, for curatorial purposes, and species identities were verified using taxonomic keys (Peralta et al. 2005, 2008). Accessions with identification numbers 1, 8, 19, 24, 27–34, and 41 (Table 1) were received as seed from TGRC between 2009 and 2012. In 2013, accessions 7, 11, 12, and 44 (Table 1) were grown in the field to re-check species identities. The 46 accessions of wild tomato assembled for this study (Table 1) represented six species: S. peruvianum (12 accessions), S. corneliomuelleri (12 accessions), S. huaylasense (11 accessions), S. arcanum (7 accessions), S. chmielewskii (2 accessions), and S. neorickii (2 accessions). The first three belong to the Eriopersicon species group, and the last three are members of the Arcanum species group (Peralta et al. 2008). In 2012, DNA was extracted from leaf tissue of one seedling from each accession using a Promega Wizard Genomic DNA isolation kit (Promega Corp., Madison, Wis.). To check genomic DNA quality, 100 ng of each sample, as well as 10 randomly chosen samples digested with EcoRI according to the manufacturer's protocol (New England Biolabs, Ipswich, Mass.), were visualized on 1% agarose gels. Spectrophotometry was used to further verify DNA quality, requiring absorbance ratios at 260:280 nm and 260:230 nm that exceeded 1.7. For each sample, a minimum of 1 μg of genomic DNA at 100 ng/mL, as quantified by the Quant-iT PicoGreen dsDNA assay (Invitrogen, Life Technologies, Carlsbad, Calif.), was submitted to the Institute for Genomic Diversity (IGD), Cornell University (CU) (http://www.igd.cornell.edu, accessed 29 October 2013) for GBS library optimization and preparation. 
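The DNA acceptance criteria described above can be expressed as a small check. The thresholds (both absorbance ratios above 1.7 and at least 1 μg of genomic DNA) come from the text; the record layout, field names, and example values are hypothetical.

```python
from dataclasses import dataclass

MIN_ABSORBANCE_RATIO = 1.7  # both A260/A280 and A260/A230 had to exceed this value
MIN_TOTAL_NG = 1000.0       # at least 1 ug of genomic DNA per sample


@dataclass
class DnaSample:
    """Hypothetical QC record for one extracted DNA sample."""
    name: str
    a260: float
    a280: float
    a230: float
    total_ng: float  # total DNA mass, e.g. from a fluorometric quantification

    def passes_qc(self) -> bool:
        """True if both purity ratios exceed 1.7 and enough DNA is available."""
        ratio_280 = self.a260 / self.a280
        ratio_230 = self.a260 / self.a230
        return (
            ratio_280 > MIN_ABSORBANCE_RATIO
            and ratio_230 > MIN_ABSORBANCE_RATIO
            and self.total_ng >= MIN_TOTAL_NG
        )


if __name__ == "__main__":
    samples = [
        DnaSample("acc_01", a260=1.80, a280=0.95, a230=0.90, total_ng=1500.0),
        DnaSample("acc_02", a260=1.20, a280=0.80, a230=0.85, total_ng=2000.0),
    ]
    for s in samples:
        print(s.name, s.passes_qc())
```

Checking both ratios separately reflects their different purposes: A260/A280 flags protein carry-over, and A260/A230 flags contaminants such as salts or carbohydrates.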
Genomic DNA was digested with the methylation-sensitive restriction enzyme PstI (New England Biolabs, Ipswich, Mass.), ligated to one of 96 unique barcoded adapters, and pooled in 96-plex format for PCR amplification and high-throughput sequencing on an Illumina HiSeq 2000 at the CU Life Sciences Core Laboratories Center (Grills et al. 2011). Each of the 46 DNAs was represented twice in the 96-plex reaction, except for samples 34 and 35 (Table 1), which were represented four and three times, respectively, to fill the 96-well plate (95 samples plus one blank). High-throughput sequencing Published by NRC Research Press

Labate et al.


Table 1. Geographical origins of 46 accessions of six wild tomato species that were assayed with genotyping by sequencing (GBS) in this study.


Latitude and longitude are given in decimal degrees.

Accession ID  No.  Species  Country  Latitude    Longitude
G 32591       1    arc      Peru     −6.505833   −78.808889
PI 127828     2    arc      Peru     −7.271365   −78.517779
PI 127829     3    arc      Peru     na          na
PI 251312     4    arc      Peru     −7.991389   −79.103611
PI 251314     5    arc      Peru     −7.225000   −78.883333
PI 266376     6    arc      Peru     −7.248808   −78.650195
PI 365952     7    hua      Peru     −9.546944   −77.930278
PI 379016     8    arc      Peru     −6.515000   −79.150000
PI 365968     9    neo      Peru     −13.066667  −72.666667
PI 365969     10   neo      Peru     −13.850000  −72.975000
PI 379029     11   chm      Peru     −12.950000  −74.016667
PI 390673     12   chm      Peru     −10.416667  −76.133333
PI 126434     13   cor      Peru     −11.819267  −76.318151
PI 251301     14   cor      Peru     −11.500556  −76.651111
PI 251302     15   cor      Peru     −11.516667  −76.683333
PI 365939     16   cor      Peru     −11.483333  −77.083333
PI 365940     17   cor      Peru     −11.460000  −76.900000
PI 365941     18   cor      Peru     −11.641667  −76.958333
PI 365942     19   cor      Peru     −12.116667  −76.466670
PI 365949     20   cor      Peru     −12.635833  −75.978333
PI 365954     21   cor      Peru     −11.858333  −76.650000
PI 365955     22   cor      Peru     −11.783333  −76.633333
PI 365938     23   cor      Peru     −11.475000  −77.108333
NSL 109280    24   cor      Peru     −15.600000  −73.770000
PI 129144     25   hua      Peru     na          na
PI 365953     26   hua      Peru     −10.158333  −77.425000
NSL 196641    27   hua      Peru     −9.533333   −77.966667
NSL 116972    28   hua      Peru     −10.133333  −77.391667
NSL 109303    29   hua      Peru     −8.829722   −77.828889
NSL 116769    30   hua      Peru     −8.814167   −77.853333
NSL 164209    31   hua      Peru     −8.689167   −77.972222
NSL 203221    32   hua      Peru     −8.816667   −77.866667
NSL 236564    33   hua      Peru     −8.866667   −77.900000
NSL 236565    34   hua      Peru     −8.866667   −77.900000
PI 126435     35   per      Peru     −11.819267  −76.318151
PI 126935     36   per      Peru     −7.985766   −78.689980
PI 127832     37   per      Peru     −16.399021  −71.542248
PI 128651     38   per      Chile    −18.526735  −70.166358
PI 129149     39   per      Ecuador  −2.206812   −80.032192
PI 251306     40   per      Peru     −9.940000   −78.210000
PI 251309     41   per      Peru     −15.850000  −74.233333
PI 270435     42   per      Mexico   na          na
PI 365948     43   per      Peru     −15.783333  −74.391667
PI 379018     44   per      Peru     −10.800000  −77.066667
PI 390681     45   per      Peru     −12.044090  −77.064186
PI 390685     46   per      Peru     −10.685864  −76.262459

Locality as originally documented (in order of No. 1–46): 23 km north of Chota; Collection site: Tunel Chotano Along the road between Cajamarca and San Juan Along the road between San Juan and Magdalena Collected in Cerro Campana, near Trujillo Collected along the Rio Jequetepeque, 64 km east of Pan-American Highway Collected near Magdalena, between Chilete and pass, road to Cajamarca, Cajamarca Dept Collected 4 km east of Pariacato and 38.7 miles on Casma Rd. to Huaras, Ancash Province Chauna, Cajamarca Collected about 7.5 km south of Curahuasi, Apurimac Province on road to Cuzco Collected from Yaca, 38.8 km west of Abancay on Puquio road, along Rio Pachachaca, Apurimac Province From Tambo, Ayacucho Province Collected from km 368 along road Lima to Huanuco From the roadside about 5 miles east of Matucana Collected 9 km west of Canta Collected 12 km west of Canta 37 km from Chancay on road to Pacaybamba, Lima Province, north of road South of Pacaybamba 0.5 km North of Trapiche 9 km on road to Quilca, Lima Province Cruz de Laya, 77 km from Lima, Lima Province Collected from Capillucas, 26 km northeast of Catahuasi on ImperialYauyos road, Lima Province Collected 4 km east of San Josede Palla, Lima Province. On road from Chosica to Huinco West of Puebla of San Jeronimo on road between Chosica and Huinco, Lima 34 km from Chancay on road to Pacaybamba, Lima Province, on right of road 21 km from Chaparra, 64 km from Pan-American Highway From east of Moquegua 92 km east of Pativilea on road to Conochas, Ancash Province North side of road from Casma to Huaraz, 28.1 miles east of Casma Found on steep banks both sides of the road, starting at 15.2 miles west of Chiquian Jcn and 63.8 miles east of Pativilca on Pativilca-Huaraz Found above the hydroelectric plant, Rio Santa, 4.5 km above Huallanca Found near the hydroelectric plant at Huallanca, Rio Santa 30 km west of Haullanca, Rio Santa Along the Rio Santa Huaylas Huaylas From the roadside about 5 miles east of Matucana Near Samne From Sacabaya, in a creek bed west of Arequipa From the Azapa Valley, east to southeast of Arica From about 15 km west of Guayaquil Collected in Culebras Collected in Chala na Rocky Banks 16.9 km west of Chala on Cerretera PanAmerican towards Nazca, Arequipa Cuujul, Lima Universidad Nacional Agraria, La Molina. Originally from Lima Universidad Nacional Agraria, La Molina. Originally from Cerro de Pasco

Note: arc, Solanum arcanum; hua, Solanum huaylasense; neo, Solanum neorickii; chm, Solanum chmielewskii; cor, Solanum corneliomuelleri; per, Solanum peruvianum; na, not available.

was performed twice on the 96-plex sample in two independent runs (flow cells C and D; Supplementary Table S1). Geographic information systems Geographical origins of wild tomato accessions were retrieved from the Germplasm Resources Information Network (GRIN) and TGRC online databases. Latitude and longitude coordinates in decimal degrees were verified using Google Earth version 7.1.1.1888 (earth.google.com, accessed 29 October 2013) based on passport data and comparisons with referenced landmarks such as city or highway names. When coordinates were not available in either database, they were inferred from narrative descriptions of

Supplementary data are available with the article through the journal Web site at http://nrcresearchpress.com/doi/suppl/10.1139/gen-2014-0003.


Fig. 1. Distribution of 10 175 mapped SNP markers along wild tomato chromosome 1 in comparison with protein-coding positions and repetitive DNA. Chromosomes 2 through 12 are shown in Supplementary Fig. S1.

original collection sites and mapping in Google Earth. All original collection locations were mapped using GIS, except for accessions 3, 25, and 42, for which the documented collection sites were imprecise (Table 1). Genotyping by sequencing pipeline Raw data were analyzed via remote connection to the CU Computational Biology Service Unit (CBSU) Computational Biology Application Suite for High Performance Computing (BioHPC) Lab (http://biohpc.org/implementations_cbsu.aspx, accessed 29 October 2013) (Bukowski et al. 2010) using the TASSEL 3.0 GBS pipeline (Bradbury et al. 2007; Glaubitz et al. 2014). High-quality reads were organized into sequence tags that were aligned to each other and to the S. lycopersicum whole genome assembly build 2.40 (The Tomato Genome Consortium 2012) using the Burrows–Wheeler Aligner (BWA) (Li and Durbin 2009). The original 190 sequenced samples (95 per flow cell) were reduced to 46 by merging samples with identical names across flow cells, lanes, or barcodes. All steps in the GBS pipeline and the key arguments used are summarized in Supplementary Table S1. The resulting HapMap (*.hmp) files for each chromosome were analyzed using the graphical version of TASSEL 4.0 (build June 27, 2013), launched from the Web site http://www.maizegenetics.net (accessed 29 October 2013). The initial set of 71 972 mapped SNPs, which included 184 that mapped to unassembled regions designated as chromosome 0, was filtered to create a matrix of 14 043 SNP markers with 0% missing data across all 46 samples; filter parameters were minimum count = 46, minimum frequency = 0, and maximum frequency = 1.0. For analyses, this set of 14 043 SNPs was collapsed with the Perl script AlleleCoder (Baldo et al. 2011) into 4695 (six species) or 4559 (four species) unique variables, one for each set of SNPs in perfect linkage disequilibrium (LD) with each other; these variables have no specific genomic location. All raw data are available through the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) (Sayers et al. 2011; Barrett et al. 2012) as study SRP034922, BioProject PRJNA230524, and BioSample numbers SAMN02436034–SAMN02436079. Data analyses The genomic locations of the 71 788 SNPs that mapped to chromosomes 1 through 12 were grouped into 1 Mb bins (Supplementary file Labate-71788-GBS-SNPs_v2.1mb.bin) using a Perl script (Supplementary file pileup2boxes.pl). These were graphed using an R script (Supplementary file labate_snp.R) based on the 12 S. lycopersicum chromosomes and their annotations (ITAG2.3) to illustrate genome coverage and comparisons with protein-coding positions (Supplementary file ITAG2.3_genes.1mb.bin) and repetitive DNA (Supplementary file ITAG2.3_repeats.1mb.bin). STRUCTURE (Pritchard et al. 2000), PLINK (Purcell et al. 2007), and principal components analysis (PCA) using R were run on the CU BioHPC web computing resources site (Bukowski et al. 2010). STRUCTURE parameters were 100 000 burn-in iterations followed by 200 000 iterations. DISTRUCT (Rosenberg 2004) was used to graph STRUCTURE results. Output from two to four runs for K = 1 to K = 8 clusters was used as input for


Fig. 2. Genetic structure of six wild tomato species for K = 2 clusters based on 14 043 SNPs. The vertical axis shows each individual's membership coefficient (Q-matrix value) in each cluster. The digits on the horizontal axis identify individuals as numbered in Table 1. The two clusters were associated with species that are members of the Arcanum species group versus the Eriopersicon species group, with the exception of Solanum huaylasense, which showed an average admixture proportion of 50% within each group.

STRUCTURE HARVESTER version 0.6.93 (Earl and von Holdt 2012) to find the maximum rate of change in the log probability of the data between successive K values, ΔK (Evanno et al. 2005). A graphical visualization of genetic relationships among the 46 accessions was generated using PCA in conjunction with AlleleCoder (Baldo et al. 2011). Genepop version 4.2 (Raymond and Rousset 1995; Rousset 2008) was used for Fisher's exact tests of population (i.e., species) differentiation and to estimate pairwise genetic distances (FST) between species. FST is defined as $F_{ST} = (Q_2 - Q_3)/(1 - Q_3)$, where $Q_2$ is the probability of identity in state of pairs of genes between individuals within subpopulations, and $Q_3$ is the probability of identity in state of pairs of genes between subpopulations (Rousset 2008). The program Lositan estimates the expected distribution of FST versus He (expected heterozygosity) under a neutral island model of population structure (Antao et al. 2008 and references therein). Lositan was applied to detect FST outliers, i.e., FST values that are high or low relative to neutral expectations, in order to test the null hypothesis of selective neutrality of SNP loci (Beaumont and Nichols 1996). Program options were mean neutral FST, force mean FST, false discovery rate = 0.10, and 100 000 simulations. The alpha level used to detect outliers was adjusted downward using Bonferroni's correction for multiple tests. Allele frequencies within S. peruvianum, S. corneliomuelleri, S. huaylasense, and S. arcanum were computed using PLINK (Purcell et al. 2007). The Python script plink2treemix.py was used to convert the PLINK stratified allele frequencies file into TreeMix format (http://code.google.com/p/treemix/, accessed 29 October 2013). TreeMix version 1.1 (Pickrell and Pritchard 2012) was used to graph potential interspecific migration events and to apply the four-population test for evidence of interspecific admixture. 
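Two quantities named in this paragraph can be sketched directly. `evanno_delta_k` applies the second-order rate-of-change statistic to replicate ln P(D) values for each K, using the mean over runs and the sample standard deviation; this is our reading of Evanno et al. (2005), and STRUCTURE HARVESTER's implementation may differ in details. `fst_identity` is a naive plug-in version of $(Q_2 - Q_3)/(1 - Q_3)$ for a single biallelic SNP with genotypes coded as 0, 1, or 2 copies of one allele. It follows the identity-in-state definition above but is not Genepop's estimator, which applies sample-size corrections. All function names and example values are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev
from typing import Dict, List, Sequence


def evanno_delta_k(lnp: Dict[int, Sequence[float]]) -> Dict[int, float]:
    """Delta K from replicate ln P(D) values for each K (after Evanno et al. 2005).

    Delta K = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)); it is
    defined only where K-1 and K+1 were also run and L(K) has >= 2 replicates.
    """
    delta = {}
    for k in sorted(lnp):
        if k - 1 in lnp and k + 1 in lnp and len(lnp[k]) >= 2:
            sd = stdev(lnp[k])  # sample standard deviation across replicate runs
            if sd > 0:
                second_diff = mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1])
                delta[k] = abs(second_diff) / sd
    return delta


def _identical_pairs(g1: int, g2: int) -> int:
    """Identical-in-state gene pairs (out of 4) between two diploid genotypes.

    g1 and g2 count copies (0, 1, or 2) of the same allele.
    """
    return g1 * g2 + (2 - g1) * (2 - g2)


def fst_identity(populations: List[List[int]]) -> float:
    """Naive plug-in F_ST = (Q2 - Q3) / (1 - Q3) for one biallelic SNP.

    Q2: identity of genes from different individuals within a population,
        averaged without weighting over populations (each needs >= 2 individuals).
    Q3: identity of genes from individuals in different populations,
        averaged over population pairs. Undefined if Q3 == 1 (monomorphic SNP).
    """
    q2 = mean(
        sum(_identical_pairs(a, b) for a, b in combinations(pop, 2))
        / (4 * len(pop) * (len(pop) - 1) / 2)
        for pop in populations
    )
    q3 = mean(
        sum(_identical_pairs(a, b) for a in p1 for b in p2) / (4 * len(p1) * len(p2))
        for p1, p2 in combinations(populations, 2)
    )
    return (q2 - q3) / (1 - q3)


if __name__ == "__main__":
    runs = {1: [-100.0, -100.5], 2: [-50.0, -51.0], 3: [-48.0, -49.0], 4: [-47.0, -46.0]}
    dk = evanno_delta_k(runs)
    best_k = max(dk, key=dk.get)  # K with the largest Delta K
    print(best_k, fst_identity([[2, 2], [1, 1]]))  # prints 2 0.5
```

ΔK is undefined at the smallest and largest K tested, which is why STRUCTURE runs must bracket the candidate K values.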
SPAGeDi version 1.4 (Hardy and Vekemans 2002) was used to test isolation-by-distance using

the genetic distance parameter âr (Rousset 2000) and ln(distance in km), which the program estimated from latitude and longitude coordinates. Rousset's âr uses an individual within a population as the unit of analysis, rather than a subpopulation within a total population (Sumner et al. 2001). Resampling Procedures version 1.3 (Howell 2001) was used for randomization tests of the correlation between two variables, genetic and geographic distance (Efron and Tibshirani 1993), to estimate P-values independently for 180 intraspecific and 559 interspecific pairs of distances.
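The isolation-by-distance analysis combines three steps that can be sketched briefly: a geographic distance computed from decimal-degree coordinates, its natural logarithm, and a randomization test of the correlation between genetic and geographic distances. This minimal sketch does not reproduce the internals of SPAGeDi or Resampling Procedures. The haversine formula and the 6371 km Earth radius are assumptions (SPAGeDi's own distance calculation may differ). The coordinates in the usage example are rounded from a few Table 1 accessions, and the genetic-distance values are made up.

```python
import math
import random
from itertools import combinations
from statistics import mean
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def randomization_test(x: Sequence[float], y: Sequence[float],
                       n_perm: int = 9999, seed: int = 1) -> Tuple[float, float]:
    """Two-sided randomization test of r: shuffle y relative to x n_perm times."""
    rng = random.Random(seed)
    r_obs = pearson_r(x, y)
    shuffled: List[float] = list(y)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= abs(r_obs):
            n_extreme += 1
    # The observed arrangement is counted as one of the permutations.
    return r_obs, (n_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    coords = [(-8.83, -77.83), (-9.53, -77.97), (-10.13, -77.39), (-11.82, -76.32), (-13.07, -72.67)]
    genetic = [0.02, 0.05, 0.08, 0.12, 0.20, 0.07, 0.10, 0.15, 0.04, 0.09]  # made-up values
    xs, ys = [], []
    for ((la1, lo1), (la2, lo2)), g in zip(combinations(coords, 2), genetic):
        d = great_circle_km(la1, lo1, la2, lo2)
        if d > 0:  # ln(0) is undefined, so pairs sampled at the same site are skipped
            xs.append(math.log(d))
            ys.append(g)
    r, p = randomization_test(xs, ys, n_perm=999)
    print(f"r = {r:.3f}, P = {p:.3f}")
```

Because pairwise distances that share an individual are not independent, a Mantel-type test, which permutes individuals (the rows and columns of one distance matrix together), is a common alternative. Shuffling the paired values, as here, only illustrates a randomization test of correlation.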

Results In closely examining grow-outs of S. peruvianum, S. corneliomuelleri, S. huaylasense, and S. arcanum accessions in 2009, it was difficult in some instances to confirm that an accession matched one of the four wild tomato species using the dichotomous keys (Peralta et al. 2005, 2008). After consultation with expert Solanaceae taxonomists (D. Spooner, USDA-ARS, Madison, WI; S. Knapp, The Natural History Museum, London; I. Peralta, National University of Cuyo, Mendoza, Argentina; personal communications, 2009), the following general distinctions were sought in applying the dichotomous keys: S. peruvianum has bifurcate inflorescences, short soft white pubescence, and curved anthers; S. corneliomuelleri has furcate or simple inflorescences, long glandular pubescence, and very curved anthers; S. arcanum usually has simple inflorescences, short pubescence, and straight anthers; and S. huaylasense has bifurcate inflorescences, very little pubescence, straight (or sometimes curved) anthers, and long peduncles. In 2013, species verification of four accessions grown in the field in Geneva, NY, identified PI 379029 and PI 390673 (samples 11 and 12 in Table 1) as S. chmielewskii, PI 365952 (sample 7 in Table 1) as S. huaylasense, and PI 379018


Fig. 3. Principal components analysis of the six wild tomato species Solanum neorickii, Solanum arcanum, Solanum chmielewskii, Solanum corneliomuelleri, Solanum huaylasense, and Solanum peruvianum for 46 accessions × 4695 unique variables. The numerals represent identification numbers in Table 1. Arcanum species group was distinct from Eriopersicon species group, with the exception of sample 4. Solanum huaylasense, S. corneliomuelleri, and S. peruvianum were weakly separated from each other with intergrading overlap. Proportion of variance explained equaled 0.19, 0.09, and 0.05 for PC1, PC2, and PC3, respectively.

(sample 44 in Table 1) as apparently segregating for S. peruvianum and S. corneliomuelleri. For GBS, our goal was to sample 12 morphologically verified accessions per species, but too few accessions of S. arcanum and S. huaylasense were available. Therefore, we sampled the eight available accessions of S. arcanum, two accessions each of S. chmielewskii and S. neorickii, which are both closely related to S. arcanum (Peralta et al. 2008), and 10 accessions of S. huaylasense. In 2013, S. arcanum accession LA1364 (PI 365952, sample 7 in Table 1) was reclassified as S. huaylasense by TGRC based on geographic distribution and morphology (http://tgrc.ucdavis.edu, accessed 29 October 2013). However, we note that GBS results for this sample (see below) showed evidence of genetic admixture with the Arcanum species group, although the 2013 grow-out morphology was typical of S. huaylasense. In addition, S. chmielewskii accession LA1306 (PI 390673, sample 12 in Table 1) had previously been reported to group with S. arcanum based on 61 morphological characters (Fig. 15 in Peralta et al. 2008). Genotyping by sequencing and genotypes Between 106 × 10⁶ and more than 107 × 10⁶ good, barcoded reads were generated by the two independent 96-plex high-throughput sequencing runs (Supplementary Table S1). Reads were aligned with each other into 862 715 tags, of which 561 557 (65.1%) were aligned to unique positions in the tomato genome. All tags that did not align or that aligned to multiple genomic locations were discarded (Supplementary Table S1). The 561 557 tags were used to identify 71 972 SNPs by filtering for high-quality polymorphisms, as described below. Based on 561 557 aligned tags of 64 bp each (about 35.9 Mb in total) and a genome size of 900 Mb, approximately 4% of the genome was covered by the tags. Divergence between S. lycopersicum and S. peruvianum s.l. averages 4% at synonymous sites (Nesbitt and Tanksley 2002) and one to two SNPs per 64 nucleotides over all sites (Labate and Robertson 2012), levels that are within the tolerance of the BWA algorithm (Li and Durbin 2009). The 33.3% of GBS tags that did not align were either missing from the reference genome or more diverged from S. lycopersicum than average. Pipeline parameters were set so that SNP differences within a sample were scored as heterozygous, and SNPs at the same genomic location but with different pairs of alleles, i.e., tri- or tetra-allelic sites, were eliminated. To filter out poorly performing DNA samples, a minimum taxon coverage of 0.10 required at least 10% of SNP genotypes to be scored in a sample; none of the 46 samples was eliminated at this stringency. Similarly, a minimum site coverage of 0.10 required at least 10% of genotypes to be scored for a SNP; no sites were eliminated at that stringency. The final marker counts were 184 SNPs


Fig. 4. Geographical origins of 43 accessions of the six wild tomato species Solanum neorickii, Solanum arcanum, Solanum chmielewskii, Solanum corneliomuelleri, Solanum huaylasense, and Solanum peruvianum. The numerals represent identification numbers in Table 1. Map data: Google, Mapcity.

[Map of western South America (Ecuador and Peru labelled) showing collection sites numbered as in Table 1, with a legend for S. arcanum, S. neorickii, S. chmielewskii, S. corneliomuelleri, S. huaylasense, and S. peruvianum.]

that mapped to unassembled DNA chromosome 0, and between 4062 and 10 175 SNPs per chromosome for chromosomes 1 through 12 for a total of 71 972 mapped, filtered SNPs in 46 samples. This represented an average of 97 SNPs/Mb among the six wild tomato species (Supplementary Table S1). These data are available in hapmap format at Sol Genomics Network (SGN) (Bombarely et al. 2011). The mapped SNPs were largely associated with coding regions and sparse in repetitive DNA regions across all 12 chromosomes as would be expected for the method of genome complexity reduction (Elshire et al. 2011) (Fig. 1; Supplementary Fig. S1). By filtering out sites with missing data in at least one sample, 14 043 SNPs remained; this was used as the primary data set. Of these, the number of heterozygous sites per genotype ranged from 33 to 1572. Five samples were outliers for unusually low numbers of heterozygous sites. These were S. neorickii sample 10 (PI 365969) with 33 sites, S. neorickii sample 9 (PI 365968) with 38 sites, S. arcanum sample 1 (G 32591 synonymous to LA2157, a naturally self-fertilizing accession) with 56 sites, and two accessions of the facultatively autogamous species S. chmielewskii sample 12 (PI 390673) with 70 sites and sample 11 (PI 379029 synonymous to LA1306) with 79 sites. The next smallest number of heterozygous sites was 267 (greater than 3-fold higher) in S. arcanum sample 3 (PI 127829), and the most highly heterozygous sample was S. peruvianum sample 45 (PI 390681) with 1572 sites. Genetic structure and differentiation between species Results from STRUCTURE runs showed the optimal number of clusters to be K = 2 based on the highest ⌬K (Evanno et al. 2005)

(Supplementary Table S2). Genomic proportions of each of the 46 samples showed that the two clusters represented the Arcanum species group versus the Eriopersicon species group, with the exception that S. huaylasense genomic proportions were evenly divided between the two groups (Fig. 2). Solanum arcanum sample 4 (PI 251312) showed evidence of admixture with the Eriopersicon species group and 2009 field notes classified it as S. arcanum with some uncertainty. Sample 7 (PI 365952) was originally classified as S. arcanum, reclassified as S. huaylasense by TGRC, and appeared to be genetically admixed (Fig. 2). Solanum neorickii and S. chmielewskii accessions showed 100% membership proportions in the Arcanum species group, the 12 S. corneliomuelleri accessions were 99.9%–100% in the Eriopersicon species group, and S. peruvianum was mostly in the Eriopersicon species group with a small fraction of the Arcanum species group in some samples (Fig. 2). To perform PCA, the 14 043 SNP set was collapsed into 4695 unique variables, i.e., SNPs in perfect LD among the six species, to decrease computational demands. Solanum arcanum, S. neorickii, and S. chmielewskii were separated from S. huaylasense, S. corneliomuelleri, and S. peruvianum by PC1 and PC2 axes (Fig. 3). The PC2 axis separated S. neorickii and S. chmielewskii from S. arcanum. Solanum peruvianum, S. corneliomuelleri, and S. huaylasense showed intergradation with each other along the PC1 axis, with S. peruvianum and S. corneliomuelleri overlapping most extensively and some accessions of S. huaylasense clustered distinctly together. To explore the role of geographical origins in species differentiation, genetic distances between pairs of S. arcanum, S. huaylasense, S. corneliomuelleri, and S. peruvianum samples were correlated Published by NRC Research Press

176

Genome Vol. 57, 2014

Genome Downloaded from www.nrcresearchpress.com by UNR MATHEWSON IGT KNWLDG on 09/03/14 For personal use only.

Fig. 5. Spatial genetic structure of 39 mapped wild tomato accessions showed evidence of isolation by distance. Pairwise genetic distance between 739 pairs of individuals versus natural logarithm of geographical distance between original collection sites for the four wild tomato species Solanum arcanum, Solanum huaylasense, Solanum corneliomuelleri, and Solanum peruvianum based on 39 accessions × 4559 unique variables matrix. Permutation test of 1000 random replications gave slope = 0.0009, standard deviation = 0.0132, P = 0.000.

with ln-transformed geographic distance (km) between original collection sites in Ecuador and Peru (Table 1; Fig. 4). Latitude and longitude coordinates were identical for two pairs of samples (13, 35 and 33, 34). This gave 739 independent pairs of samples that consisted of 180 intraspecific pairs and 559 interspecific pairs. Support for isolation by distance was highly significant; 100% of the 1000 random permutations of the data showed that the predicted correlation was less than the observed correlation. The pattern of isolation by distance showed a steeper slope for inter-versus intraspecific pairs (Fig. 5). Therefore, the pairs were tested separately in 10 000 random replications each, which demonstrated that intraspecific pairs showed less isolation by distance (r = 0.2152, P = 0.003) than interspecific pairs (r = 0.3515, P = 0.000), although both were highly significant (Table 2). Based on 14 043 SNPs, the smallest genetic distance between species was FST = 0.0277 for S. peruvianum versus S. corneliomuelleri, and the largest was FST = 0.2285 for S. peruvianum versus S. arcanum (Table 3). Using Fisher’s exact test, the 4559 unique variables data set showed that all species pairs were highly significantly differentiated from each other, except S. peruvianum versus S. corneliomuelleri (Table 3). Graphical comparisons in allele frequencies for the 14 043 SNPs further illustrated the differentiation between S. arcanum and members of the Eriopersicon species group, as well as the high similarity between S. peruvianum and S. corneliomuelleri (Fig. 6). There were only 16 fixed differences between species pairs, i.e., an allele was found at 100% versus 0% frequency in a given comparison. These fixed differences involved fixation in S. arcanum versus the other three species (Fig. 6; Supplementary Table S3). 
However, between 1606 and 5886 alleles were present (frequency greater than 0%) in one species but absent (frequency = 0%) in a second species across all 12 reciprocal pairwise comparisons (Supplementary Table S3). These corresponded to an average of approximately 8–11 SNPs/Mb between species pairs (Supplementary Table S3).

Table 2. Randomization test of correlation of genetic distance with geographical distance for pairs of accessions of four wild tomato species based on 10 000 random replications.

Comparison            No. of pairs   r        P
Intraspecific pairs   180            0.2152   0.003
Interspecific pairs   559            0.3515   <0.001

Note: Genetic distance, âr; geographical distance, ln(distance, km).

Table 3. Pairwise FST genetic distances and χ² tests of differentiation (Fisher's exact test) between four wild tomato species based on 14 043 SNPs.

Taxon   arc                                    cor                                       hua
cor     0.2144 (χ² = inf, df = 7490, P = HS)
hua     0.1810 (χ² = inf, df = 6700, P = HS)   0.0901 (χ² = inf, df = 7896, P = HS)
per     0.2285 (χ² = inf, df = 7496, P = HS)   0.0277 (χ² = 6644, df = 7980, P = 1.00)   0.0909 (χ² = inf, df = 8136, P = HS)

Note: arc, Solanum arcanum; cor, Solanum corneliomuelleri; hua, Solanum huaylasense; per, Solanum peruvianum; inf, infinity; df, degrees of freedom; HS, highly significant. Fisher's exact tests were performed using the 4559 unique variable data set, which collapsed multiple SNPs in perfect linkage disequilibrium into a single observation.

Fig. 6. Pairwise comparisons of minor-allele frequencies for 14 043 SNPs among four wild tomato species (six species pairs). The intensity of each data point is proportional to the number of SNPs with that pair of frequencies. Species pairs (graphs a–f) are presented left to right, top to bottom, in order of decreasing genetic distance. Some alleles were fixed in Solanum arcanum but absent in the three other species (a–c); allele frequencies in Solanum corneliomuelleri and Solanum peruvianum were most similar (f).

For tests of selective neutrality, we used the 14 043 SNP data set from the S. peruvianum s.l. species. The significance threshold for each test was adjusted from P = 0.05 to P = 3.7 × 10⁻⁶ to control the type I error rate across 14 043 tests (Beaumont and Balding 2004). This yielded 895 SNPs with FST significantly greater than, and 789 SNPs with FST significantly less than, the simulated mean neutral FST value (Supplementary file Outlier.loci). The former are candidates for positive selection and the latter are candidates for balancing selection (Beaumont and Nichols 1996; Antao et al. 2008). When plotted at their chromosomal positions, these outlier SNPs were spread throughout the genome, with clusters in distal regions and sparse coverage in proximal regions (Supplementary file Outlier.loci).

The four-population test considered three possible simple evolutionary trees: {{S. arcanum, S. corneliomuelleri}; {S. huaylasense, S. peruvianum}}, {{S. arcanum, S. huaylasense}; {S. peruvianum, S. corneliomuelleri}}, and {{S. arcanum, S. peruvianum}; {S. huaylasense, S. corneliomuelleri}}. For each tree, it tested whether allele frequency differences within the first species pair were uncorrelated with those within the second species pair; if the tree is correct, these differences should be uncorrelated (Reich et al. 2009). The 14 043 SNP matrix was analyzed in blocks of 1, 20, 100, and 1000 SNPs, with similar results (not shown). The Z-score was highly significant for the {{S. arcanum, S. corneliomuelleri}; {S. huaylasense, S. peruvianum}} and {{S. arcanum, S. peruvianum}; {S. huaylasense, S. corneliomuelleri}} trees (P < 0.001) and weakly significant for the {{S. arcanum, S. huaylasense}; {S. peruvianum, S. corneliomuelleri}} tree (P < 0.10) (Table 4). This implied gene flow between branches of the trees and rejected a simple four-population tree structure.
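The outlier figures above can be checked with a few lines of arithmetic. A plain Bonferroni division of 0.05 by the 14 043 tests gives about 3.6 × 10⁻⁶. This is close to, but not exactly, the reported 3.7 × 10⁻⁶, so the precise adjustment applied in the study may have differed, and this sketch is illustrative only. The function names are hypothetical. Expressed as fractions of the tested SNPs, the 895 high-FST and 789 low-FST outliers come to roughly 6% in each direction and about 12% overall.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level keeping the family-wise error rate at alpha."""
    return alpha / n_tests


def outlier_fractions(n_high: int, n_low: int, n_tested: int) -> dict:
    """Fractions of tested SNPs with FST above (high) or below (low) neutral expectation."""
    return {
        "high": n_high / n_tested,   # candidates for positive selection
        "low": n_low / n_tested,     # candidates for balancing selection
        "total": (n_high + n_low) / n_tested,
    }


N_SNPS = 14_043
threshold = bonferroni_alpha(0.05, N_SNPS)
fractions = outlier_fractions(895, 789, N_SNPS)

print(f"Bonferroni per-test alpha: {threshold:.2e}")
for label, value in fractions.items():
    print(f"{label:>5}: {value:.1%}")
```

Running it prints a per-test alpha of 3.56e-06 and fractions of 6.4%, 5.6%, and 12.0%.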

Table 4. Four-population test (f4 statistic) of four wild tomato species based on 14 043 SNPs in 140 blocks of 100 SNPs.

Taxa               f4            SE            Z-score
arc,cor; hua,per   0.00556622    0.000363013   15.3334***
arc,hua; per,cor   0.000241079   0.000155076   1.55459*
arc,per; hua,cor   0.00580729    0.000384464   15.1049***

Note: arc, Solanum arcanum; cor, Solanum corneliomuelleri; hua, Solanum huaylasense; per, Solanum peruvianum. * and ***, significant at P < 0.10 and P < 0.001, respectively.
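The following is a minimal sketch of how an f4 statistic and its Z-score can be computed. Following Reich et al. (2009), f4 for taxa (A, B; C, D) is taken here as the mean over SNPs of (pA − pB)(pC − pD). The standard error is estimated with a delete-one-block jackknife over consecutive SNP blocks, in the spirit of the 140 blocks of 100 SNPs in Table 4. This is not the software used in the study. The simulated allele frequencies, the function names, and the choice to drop SNPs that do not fill a final block are all assumptions of this illustration. Because Z is simply f4/SE, the sketch also reproduces a Z-score in Table 4 from its f4 and SE columns.

```python
import math
import random


def block_jackknife_f4(pa, pb, pc, pd, block_size):
    """Return (f4, SE, Z) for taxa (A, B; C, D) with a delete-one-block jackknife.

    f4 is the mean over SNPs of (pA - pB) * (pC - pD). SNPs are split into
    consecutive blocks of `block_size`; trailing SNPs that do not fill a whole
    block are dropped (a simplifying assumption of this sketch).
    """
    terms = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    n_blocks = len(terms) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    terms = terms[: n_blocks * block_size]
    block_sums = [sum(terms[k * block_size:(k + 1) * block_size]) for k in range(n_blocks)]
    total = sum(block_sums)
    estimate = total / len(terms)
    # f4 recomputed with each block left out in turn
    loo = [(total - s) / (len(terms) - block_size) for s in block_sums]
    loo_mean = sum(loo) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((x - loo_mean) ** 2 for x in loo)
    se = math.sqrt(variance)
    return estimate, se, estimate / se


def simulate(n_snps, shared_sd, drift_sd=0.02, seed=1):
    """Toy allele frequencies for populations A, B, C, D (hypothetical model).

    Each population drifts independently from a shared ancestral frequency.
    A and C also share an extra drift component of size `shared_sd`, a crude
    stand-in for gene flow between branches of the tree ((A, B), (C, D)).
    """
    rng = random.Random(seed)
    pa, pb, pc, pd = [], [], [], []
    for _ in range(n_snps):
        anc = rng.uniform(0.2, 0.8)
        shared = rng.gauss(0.0, shared_sd)
        freqs = [anc + rng.gauss(0.0, drift_sd) for _ in range(4)]
        freqs[0] += shared
        freqs[2] += shared
        for pop, p in zip((pa, pb, pc, pd), freqs):
            pop.append(min(1.0, max(0.0, p)))  # keep frequencies in [0, 1]
    return pa, pb, pc, pd


null = simulate(14_000, shared_sd=0.0)
mixed = simulate(14_000, shared_sd=0.05)
_, _, z_null = block_jackknife_f4(*null, block_size=100)
_, _, z_mixed = block_jackknife_f4(*mixed, block_size=100)
print(f"Z, tree-like data:       {z_null:.2f}")
print(f"Z, shared drift A and C: {z_mixed:.2f}")

# Table 4: each Z-score is f4 / SE, e.g. 0.00556622 / 0.000363013
print(round(0.00556622 / 0.000363013, 2))  # 15.33
```

On tree-like toy data, Z stays near zero. When A and C share drift, Z becomes very large, which mirrors how a significant Z-score is read as evidence against the proposed tree.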

The TreeMix graph shows the maximum likelihood (ML) tree generated using blocks of 100 SNPs; the ML tree is also reported in Newick format (Supplementary Fig. S2). Solanum chmielewskii was defined as the root, and three potential migration events were allowed. The tree represents the topology of relationships among taxa based on covariance patterns in genome-wide allele frequencies (Pickrell and Pritchard 2012). The drift parameter plotted on the x axis represents genetic differentiation in allele frequencies relative to a common ancestral population (Maloney et al. 2011). Results showed a narrow range of drift parameters for S. arcanum, S. corneliomuelleri, S. huaylasense, and S. peruvianum (0.03 to 0.04) and evidence of migration from the Eriopersicon–Arcanum divide into S. neorickii (migration weight = 0.5; Supplementary Fig. S2). The two additional allowed migration events, inferred as S. neorickii to S. corneliomuelleri and S. arcanum to S. peruvianum, were improbable because their weights were essentially zero.

Discussion

The development of tens of thousands of mapped SNP markers in wild tomato species was highly successful using GBS. SNP density was lower in heterochromatin than in euchromatin. This is counterintuitive, but it probably reflects the fact that, with this methodology, fewer reads aligned in repetitive regions. The estimated genome size of S. lycopersicum (1.88–2.07 pg/2C) is slightly smaller than that of S. peruvianum (2.27 pg/2C) (Arumuganathan and Earle 1991), so additional markers would likely have been identified had one of the wild species genomes been available as a reference. Relative levels of heterozygosity of the sampled plants were consistent with expectations for the mating systems of the different species. Proportions of heterozygous SNP sites ranged from 0.2% in the highly autogamous S. neorickii to 11.2% in the typically self-incompatible, allogamous S. peruvianum (Spooner et al. 2005). These markers are now available for other applications such as mapping morphological differences between species, marker-assisted breeding, germplasm characterization, taxonomy, and DNA fingerprinting (Shirasawa and Hirakawa 2013). Neutrality tests to detect FST outliers identified similar fractions of SNP loci (about 6% each) with signals of balancing and of positive selection. These fractions are likely overestimates because nearly two-thirds of the tested SNPs were in perfect LD with the remaining one-third. Of particular interest will be the SNP loci associated with positive selection. The loci containing these SNPs may be direct targets of adaptive divergence within the S. peruvianum s.l. species complex, or tightly linked to loci involved in it. Several strategies are available for follow-up studies seeking supporting evidence of adaptive divergence among species at the candidate loci (reviewed by Storz 2005).
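Collapsing SNPs in perfect LD into "unique variables" (as done for the PCA and the Fisher's exact tests) can be sketched as grouping SNPs whose genotype vectors are identical across all samples. This sketch makes several assumptions. Genotypes are coded as 0, 1, or 2 copies of one allele, with no missing data, which holds for the 14 043-SNP set. Two SNPs are treated as redundant only when their vectors match exactly or after swapping which allele is counted. The study's actual criterion may differ, and the SNP names and genotypes are hypothetical. The 4695 unique variables retained from 14 043 SNPs are about one-third of the total. This is consistent with the statement that nearly two-thirds of the tested SNPs were in perfect LD with the rest.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]


def canonical(genotypes: Sequence[int]) -> Pattern:
    """Key that ignores which allele is counted: g and 2 - g map to the same key."""
    direct = tuple(genotypes)
    flipped = tuple(2 - g for g in genotypes)
    return min(direct, flipped)


def collapse_redundant_snps(snps: Dict[str, Sequence[int]]) -> Dict[Pattern, List[str]]:
    """Group SNP IDs whose genotype vectors are identical up to allele coding.

    Each group becomes one 'unique variable'; insertion order is preserved.
    """
    groups: Dict[Pattern, List[str]] = {}
    for snp_id, genotypes in snps.items():
        groups.setdefault(canonical(genotypes), []).append(snp_id)
    return groups


# Hypothetical genotypes (0, 1, 2 copies of the counted allele) for 4 samples.
snps = {
    "snp1": [0, 1, 2, 2],
    "snp2": [2, 1, 0, 0],  # snp1 with the other allele counted
    "snp3": [0, 1, 2, 2],  # identical to snp1
    "snp4": [0, 0, 1, 2],
    "snp5": [1, 1, 1, 0],
}
groups = collapse_redundant_snps(snps)
print(f"{len(snps)} SNPs -> {len(groups)} unique variables")
for key, members in groups.items():
    print(key, members)

# Share of SNPs retained as unique variables in the PCA set: 4695 of 14 043
print(f"{4695 / 14043:.1%}")
```

Keying each group on a canonical pattern keeps the grouping to a single pass over the SNPs. A stricter r² = 1 criterion, which would also merge other linear relationships between genotype vectors, would need a different comparison.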
Although most of the accessions in our study were originally collected in Peru, they encompassed a broad north-to-south distribution from Ecuador to Chile. The single S. peruvianum accession from Ecuador (sample 39 in Table 1) was unusual: its collection site lay far north of the species' native range of central Peru to northern Chile (Peralta et al. 2008). The accession was originally collected in 1938 (http://www.ars-grin.gov/cgi-bin/npgs/acc/display.pl?1133671, accessed 29 October 2013). The source population likely arose via recent migration, because the accession did not appear to be diverged from S. peruvianum either genotypically or phenotypically. Two factors preclude drawing phylogenetic inferences from the current study: extensive genetic variation within populations and species (Rick 1983; Arunyawat et al. 2007), and the use of S. lycopersicum as a reference genome, which biases marker discovery towards conserved regions.

Morphological differences among species are not always clearly evident because of environmental influences; thus, it is desirable to identify species-diagnostic alleles. Although extensive intergradation was evident among species genotypes, the isolation-by-distance results supported the concept of separate species. Isolation by distance has been observed within wild tomato species (Nakazato et al. 2012 and references therein), but this is the first study to compare isolation by distance within versus between closely related tomato species. Our study supported the high genetic similarity between S. peruvianum and S. corneliomuelleri (Rodriguez et al. 2009; Zuriaga et al. 2009) but also identified thousands of potentially diagnostic SNP alleles between the two species (Supplementary Table S3). These are candidate SNPs for testing on larger samples of germplasm using techniques such as real-time PCR or microarray assays. Solanum peruvianum and S. corneliomuelleri are thought to have diverged very recently and show extensive niche overlap, although their distributions differ for certain climatic variables (Nakazato et al. 2010).

The strong genetic divergence between the Arcanum and Eriopersicon species groups was expected (Rick 1986; Peralta and Spooner 2001, 2005; Peralta et al. 2008) and can explain the rejection of the null hypothesis in the four-population test (Reich et al. 2009). The stratification of S. huaylasense into 50% Arcanum and 50% Eriopersicon genome proportions was not predicted. No geographical pattern was evident to explain this, and the observation requires further investigation. A previous molecular study showed five S. huaylasense accessions forming a tight cluster apart from S. arcanum, S. peruvianum, and S. corneliomuelleri samples (Zuriaga et al. 2009). However, the same study found evidence of affinity of S. huaylasense to the Arcanum species group. It also identified a possible S. peruvianum × S. arcanum hybrid, accession LA1364, that had been classified as S. huaylasense. TreeMix is used to identify new relationships among groups in terms of splits and mixtures (migration), with mixture weights correlated with admixture proportions (Pickrell and Pritchard 2012). Our results supported known relationships among the six species (Peralta et al. 2008) and suggested migration from the ancestral S. peruvianum s.l. complex into S. neorickii. Solanum neorickii is hypothesized to have evolved directly from its sister species S. chmielewskii (Rick et al. 1976), and it would be interesting to explore the potential role played by S. peruvianum s.l. in this speciation event. TreeMix inferred no substantial admixture among S. arcanum, S. corneliomuelleri, S. huaylasense, and S. peruvianum. This supported the hypothesis that their shared polymorphisms were a product of incomplete lineage sorting rather than gene flow among remotely isolated populations (Nakazato et al. 2012).
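The within- versus between-species comparison of isolation by distance reported in the Results can be sketched as a Mantel-style permutation test. The approach correlates pairwise genetic distance with ln-transformed geographic distance over a chosen subset of pairs, then rebuilds that correlation after randomly reassigning samples to collection sites. Several elements are assumptions of this illustration: the toy transect data, the function names, and the choice to permute all samples jointly even when only intra- or interspecific pairs are scored. The study's own tests may construct the null distribution differently, and computing the âr genetic distance itself is not shown.

```python
import math
import random
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ibd_permutation_test(gen, ln_geo, pairs, n_perm=999, seed=0):
    """Mantel-style test of genetic vs ln(geographic) distance over `pairs`.

    The null distribution reassigns samples to collection sites at random
    (permuting rows and columns of the geographic matrix together) and
    recomputes r on the same pairs. Returns (observed r, one-sided P).
    """
    rng = random.Random(seed)
    g = [gen[i][j] for i, j in pairs]
    r_obs = pearson_r(g, [ln_geo[i][j] for i, j in pairs])
    site = list(range(len(ln_geo)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(site)
        perm_geo = [ln_geo[site[i]][site[j]] for i, j in pairs]
        if pearson_r(g, perm_geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


def toy_transect(n_per_species=10, seed=1):
    """Hypothetical data: two species scattered along a 500-km transect.

    Genetic distance grows with ln(km), plus a constant between-species
    offset and a little noise; the values are invented for illustration.
    """
    rng = random.Random(seed)
    n = 2 * n_per_species
    species = [0] * n_per_species + [1] * n_per_species
    x = [rng.uniform(0.0, 500.0) for _ in range(n)]
    ln_geo = [[0.0] * n for _ in range(n)]  # diagonal is never used
    gen = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = math.log(abs(x[i] - x[j]))
        offset = 0.05 if species[i] != species[j] else 0.0
        ln_geo[i][j] = ln_geo[j][i] = d
        gen[i][j] = gen[j][i] = 0.02 * d + offset + rng.gauss(0.0, 0.005)
    return species, gen, ln_geo


species, gen, ln_geo = toy_transect()
all_pairs = list(combinations(range(len(species)), 2))
subsets = {
    "intraspecific": [(i, j) for i, j in all_pairs if species[i] == species[j]],
    "interspecific": [(i, j) for i, j in all_pairs if species[i] != species[j]],
}
results = {}
for label, pairs in subsets.items():
    r, p = ibd_permutation_test(gen, ln_geo, pairs)
    results[label] = (len(pairs), r, p)
    print(f"{label}: {len(pairs)} pairs, r = {r:.3f}, P = {p:.3f}")
```

Because P is computed as (hits + 1)/(n_perm + 1), the smallest attainable value with 999 permutations is 0.001. This is why a result in which every permutation falls below the observed correlation is best reported as P < 0.001 rather than P = 0.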

Acknowledgements

We thank Susan M. Sheffer, William Garman, Jonathan Spencer, and Paul Kisly for excellent technical support; Aureliano Bombarely for providing an R script; and the C.M. Rick Tomato Genetics Resource Center for providing seed samples. USDA is an equal opportunity provider and employer. The use of trade, firm, or corporation names in this publication is for the information and convenience of the reader. Such use does not constitute an official endorsement or approval by the United States Department of Agriculture or the Agricultural Research Service of any product or service to the exclusion of others that may be suitable. This research was supported by CRIS Project No. 1910-21000-024-00D. Part of this work was carried out using the resources of the Computational Biology Service Unit from Cornell University, which is partially funded by Microsoft Corporation.

References

Anderson, L.K., Covey, P.A., Larsen, L.R., Bedinger, P., and Stack, S.M. 2010. Structural differences in chromosomes distinguish species in the tomato clade. Cytogenet. Genome Res. 129: 24–34. doi:10.1159/000313850.
Antao, T., Lopes, A., Lopes, R.J., Beja-Pereira, A., and Luikart, G. 2008. LOSITAN: a workbench to detect molecular adaptation based on a FST-outlier method. BMC Bioinform. 9(1): 323. doi:10.1186/1471-2105-9-323.
Arumuganathan, K., and Earle, E.D. 1991. Nuclear DNA content of some important plant species. Plant Mol. Biol. Rep. 9: 208–218. doi:10.1007/BF02672069.
Arunyawat, U., Stephan, W., and Städler, T. 2007. Using multilocus sequence data to assess population structure, natural selection, and linkage disequilibrium in wild tomatoes. Mol. Biol. Evol. 24: 2310–2322. doi:10.1093/molbev/msm162. PMID:17675653.
Baird, N.A., Etter, P.D., Atwood, T.S., Currey, M.C., Shiver, A.L., Lewis, Z.A., et al. 2008. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE, 3: e3376. doi:10.1371/journal.pone.0003376. PMID:18852878.
Baldo, A.M., Francis, D.M., Caramante, M., Robertson, L.D., and Labate, J.A. 2011. AlleleCoder: a PERL script for coding co-dominant polymorphism data for PCA analysis. Plant Genet. Resour. 9: 528–530. doi:10.1017/S1479262111000839.
Barrett, T., Clark, K., Gevorgyan, R., Gorelenkov, V., Gribov, E., Karsch-Mizrachi, I., et al. 2012. BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata. Nucleic Acids Res. 40(D1): D57–D63. doi:10.1093/nar/gkr1163. PMID:22139929.
Beaumont, M.A., and Balding, D.J. 2004. Identifying adaptive genetic divergence among populations from genome scans. Mol. Ecol. 13: 969–980. doi:10.1111/j.1365-294X.2004.02125.x. PMID:15012769.
Beaumont, M.A., and Nichols, R.A. 1996. Evaluating loci for use in the genetic analysis of population structure. Proc. R. Soc. B Biol. Sci. 263: 1619–1626. doi:10.1098/rspb.1996.0237.
Bombarely, A., Menda, N., Tecle, I.Y., Buels, R.M., Strickler, S., Fischer-York, T., et al. 2011. The Sol Genomics Network (solgenomics.net): growing tomatoes using Perl. Nucleic Acids Res. 39(s1): D1149–D1155. doi:10.1093/nar/gkq866.
Bradbury, P.J., Zhang, Z., Kroon, D.E., Casstevens, T.M., Ramdoss, Y., and Buckler, E.S. 2007. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics, 23: 2633–2635. doi:10.1093/bioinformatics/btm308. PMID:17586829.
Bukowski, R., Sun, Q., Howard, M., and Pillardy, J. 2010. BioHPC: computational biology application suite for high performance computing. J. Biomol. Tech. 21(3 Suppl.): S23.
Earl, D.A., and von Holdt, B.M. 2012. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv. Genet. Resour. 4: 359–361. doi:10.1007/s12686-011-9548-7.
Efron, B., and Tibshirani, R. 1993. An introduction to the bootstrap. Chapman and Hall, New York.
Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S., and Mitchell, S.E. 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE, 6: e19379. doi:10.1371/journal.pone.0019379. PMID:21573248.
Evanno, G., Regnaut, S., and Goudet, J. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecol. 14: 2611–2620. doi:10.1111/j.1365-294X.2005.02553.x. PMID:15969739.
Foolad, M., and Panthee, D. 2012. Marker-assisted selection in tomato breeding. Crit. Rev. Plant Sci. 31: 93–123. doi:10.1080/07352689.2011.616057.
Glaubitz, J.C., Casstevens, T.M., Lu, F., Harriman, J., Elshire, R.J., Sun, Q., and Buckler, E.S. 2014. TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS ONE, 9: e90346. doi:10.1371/journal.pone.0090346. PMID:24587335.
Grills, G., Schweitzer, P., Sun, Q., Pillardy, J., Wang, W., Stelick, T., et al. 2011. Integrated core facility support and optimization of next generation sequencing technologies. J. Biomol. Tech. 22(Suppl.): S32.
Hardy, O.J., and Vekemans, X. 2002. SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol. Ecol. Notes, 2: 618–620. doi:10.1046/j.1471-8286.2002.00305.x.
Hill, R. 2012. Tomato genome sequence bears fruit. Nature News, doi:10.1038/nature.2012.10751.
Howell, D. 2001. Resampling procedures, ver. 1.3 [online]. Available from http://www.uvm.edu/~dhowell/StatPages/Resampling/Resampling.html [accessed 29 October 2013].
Kharkongar, H.P., Khanna, V.K., Tyagi, W., Rai, M., and Meetei, N.T. 2013. Wide hybridization and embryo-rescue for crop improvement in Solanum. Agrotechnology, S11: 004.
Labate, J.A., and Robertson, L.D. 2012. Evidence of cryptic introgression in tomato (Solanum lycopersicum L.) based on wild tomato species alleles. BMC Plant Biol. 12: 133. doi:10.1186/1471-2229-12-133. PMID:22871151.
Li, H., and Durbin, R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25: 1754–1760. doi:10.1093/bioinformatics/btp324. PMID:19451168.
Maloney, P.E., Vogler, D.R., Eckert, A.J., Jensen, C.E., and Neale, D.B. 2011. Population biology of sugar pine (Pinus lambertiana Dougl.) with reference to historical disturbances in the Lake Tahoe basin: implications for restoration. For. Ecol. Manage. 262: 770–779. doi:10.1016/j.foreco.2011.05.011.
McClean, P.E., and Hanson, M.R. 1986. Mitochondrial DNA sequence divergence among Lycopersicon and related Solanum species. Genetics, 112: 649–667. PMID:17246320.
Merino Méndez, C.G. 2009. Population genetic approaches to speciation of wild tomatoes with special reference to Solanum habrochaites and S. arcanum. Ph.D. dissertation, Ludwig-Maximilians-Universität, München. pp. 119.
Miller, J.C., and Tanksley, S.D. 1990. RFLP analysis of phylogenetic relationships and genetic variation in the genus Lycopersicon. Theor. Appl. Genet. 80: 437–448. doi:10.1007/BF00226743. PMID:24221000.
Miller, M.R., Dunham, J.P., Amores, A., Cresko, W.A., and Johnson, E.A. 2007. Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Res. 17: 240–248. doi:10.1101/gr.5681207. PMID:17189378.
Nakazato, T., Warren, D.L., and Moyle, L.C. 2010. Ecological and geographic modes of species divergence in wild tomatoes. Am. J. Bot. 97: 680–693. doi:10.3732/ajb.0900216. PMID:21622430.
Nakazato, T., Franklin, R.A., Kirk, B.C., and Housworth, E.A. 2012. Population structure, demographic history, and evolutionary patterns of a green-fruited tomato, Solanum peruvianum (Solanaceae), revealed by spatial genetics analyses. Am. J. Bot. 99: 1207–1216. doi:10.3732/ajb.1100210. PMID:22711557.
Nesbitt, T.C., and Tanksley, S.D. 2002. Comparative sequencing in the genus Lycopersicon: implications for the evolution of fruit size in the domestication of cultivated tomatoes. Genetics, 162: 365–379. PMID:12242247.
Palmer, J.D., and Zamir, D. 1982. Chloroplast DNA evolution and phylogenetic relationships in Lycopersicon. Proc. Natl. Acad. Sci. U.S.A. 79: 5006–5010. doi:10.1073/pnas.79.16.5006. PMID:16593219.
Peralta, I.E., and Spooner, D.M. 2001. Granule-bound starch synthase (GBSSI) gene phylogeny of wild tomatoes (Solanum L. section Lycopersicon [Mill] Wettst. subsection Lycopersicon). Am. J. Bot. 88: 1888–1902. doi:10.2307/3558365. PMID:21669622.
Peralta, I.E., and Spooner, D.M. 2005. Morphological characterization and relationships of wild tomatoes (Solanum L. section Lycopersicon). Monogr. Syst. Bot. Mo. Bot. Gard. 104: 227–257.
Peralta, I.E., Knapp, S., and Spooner, D.M. 2005. New species of wild tomatoes (Solanum section Lycopersicon: Solanaceae) from northern Peru. Syst. Bot. 30: 424–434. doi:10.1600/0363644054223657.
Peralta, I.E., Knapp, S., and Spooner, D. 2008. The taxonomy of tomatoes: a revision of wild tomatoes (Solanum section Lycopersicon) and their outgroup relatives (Solanum sections Juglandifolium and Lycopersicoides). Syst. Bot. Monogr. 84: 1–186.
Pickrell, J.K., and Pritchard, J.K. 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8: e1002967. doi:10.1371/journal.pgen.1002967. PMID:23166502.
Poysa, V. 1990. The development of bridge lines for interspecific gene transfer between Lycopersicon esculentum and L. peruvianum. Theor. Appl. Genet. 79: 187–192. doi:10.1007/BF00225950. PMID:24226217.
Pritchard, J.K., Stephens, M., and Donnelly, P. 2000. Inference of population structure using multilocus genotype data. Genetics, 155: 945–959. PMID:10835412.
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A.R., Bender, D., et al. 2007. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am. J. Hum. Genet. 81: 559–575. doi:10.1086/519795. PMID:17701901.
Raymond, M., and Rousset, F. 1995. GENEPOP (version 1.2): population genetics software for exact tests and ecumenicism. J. Hered. 86: 248–249.
Reich, D., Thangaraj, K., Patterson, N., Price, A.L., and Singh, L. 2009. Reconstructing Indian population history. Nature, 461: 489–494. doi:10.1038/nature08365. PMID:19779445.
Rick, C.M. 1963. Barriers to interbreeding in Lycopersicon peruvianum. Evolution, 17: 216–232. doi:10.2307/2406467.
Rick, C.M. 1979a. Biosystematic studies in Lycopersicon and closely related species of Solanum. In The biology and taxonomy of the Solanaceae. Edited by J.G. Hawkes, R.N. Lester, and A.D. Skelding. Linnean Soc. Symp. Ser. No. 7, Academic Press, NY. pp. 667–678.
Rick, C.M. 1979b. Evolution of interspecific barriers in Lycopersicon. In Proceedings of the conference broadening the genetic base of crops. Wageningen, the Netherlands. pp. 283–286.
Rick, C.M. 1982. The potential of exotic germplasm for tomato improvement. In Plant improvement and somatic cell genetics. Edited by I.K. Vasil, W.R. Scowcroft, and K.J. Frey. Academic Press, NY. pp. 1–28.
Rick, C.M. 1983. Tomato (Lycopersicon). In Isozymes in plant genetics and breeding, Part B. Edited by S.D. Tanksley and T.J. Orton. Elsevier Science, Amsterdam. pp. 147–165.
Rick, C.M. 1986. Reproductive isolation in the Lycopersicon peruvianum complex. In Solanaceae, biology and systematics. Edited by W.G. D'Arcy. Columbia University Press, NY. pp. 477–495.
Rick, C.M. 1991. Tomato resources of South America reveal many genetic treasures. Diversity, 7: 54–56.
Rick, C.M., and Fobes, J.F. 1974. Association of an allozyme with nematode resistance. Tomato Genet. Coop. Rep. 24: 25.
Rick, C.M., Kesicki, E., Fobes, J.F., and Holle, M. 1976. Genetic and biosystematic studies on two new sibling species of Lycopersicon from interandean Peru. Theor. Appl. Genet. 47: 55–68. PMID:24414383.
Rodriguez, F., Wu, F., Ané, C., Tanksley, S., and Spooner, D.M. 2009. Do potatoes and tomatoes have a single evolutionary history, and what proportion of the genome supports this history? BMC Evol. Biol. 9: 191. doi:10.1186/1471-2148-9-191. PMID:19664206.
Rosenberg, N.A. 2004. DISTRUCT: a program for the graphical display of population structure. Mol. Ecol. Notes, 4: 137–138. doi:10.1046/j.1471-8286.2003.00566.x.
Rousset, F. 2000. Genetic differentiation between individuals. J. Evol. Biol. 13: 58–62. doi:10.1046/j.1420-9101.2000.00137.x.
Rousset, F. 2008. genepop'007: a complete re-implementation of the genepop software for Windows and Linux. Mol. Ecol. Resour. 8: 103–106. doi:10.1111/j.1471-8286.2007.01931.x. PMID:21585727.
Sacks, E.J., Gerhardt, L.M., Graham, E.B., Jacobs, J., Thorrup, T.A., and St. Clair, D.A. 1997. Variation among 41 genotypes of tomato (Lycopersicon esculentum Mill.) for crossability to L. peruvianum (L.) Mill. Ann. Bot. 80: 469–477.
Sayers, E.W., Barrett, T., Benson, D.A., Bolton, E., Bryant, S.H., Canese, K., et al. 2011. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 39(Suppl. 1): D38–D51. doi:10.1093/nar/gkn741.
Shirasawa, K., and Hirakawa, H. 2013. DNA marker applications to molecular genetics and genomics in tomato. Breed. Sci. 63: 21–30. doi:10.1270/jsbbs.63.21. PMID:23641178.
Spooner, D.M., Peralta, I.E., and Knapp, S. 2005. Comparison of AFLPs with other markers for phylogenetic inference in wild tomatoes [Solanum L section Lycopersicon (Mill.) Wettst. subsection Lycopersicon]. Taxon, 54: 43–61.
Storz, J.F. 2005. Using genome scans of DNA polymorphism to infer adaptive population divergence. Mol. Ecol. 14: 671–688. doi:10.1111/j.1365-294X.2005.02437.x. PMID:15723660.
Sumner, J., Rousset, F., Estoup, A., and Moritz, C. 2001. 'Neighbourhood' size, dispersal and density estimates in the prickly forest skink (Gnypetoscincus queenslandiae) using individual genetic and demographic methods. Mol. Ecol. 10: 1917–1927. doi:10.1046/j.0962-1083.2001.01337.x. PMID:11555236.
Tanksley, S.D., and Rick, C.M. 1980. Isozymic gene linkage map of the tomato: applications in genetics and breeding. Theor. Appl. Genet. 58: 161–170. doi:10.1007/BF00279708. PMID:24301284.
Taylor, I.B., and Al-Kummer, M.K. 1982. The formation of complex hybrids between Lycopersicon esculentum and L. peruvianum, and their potential use in promoting interspecific gene transfer. Theor. Appl. Genet. 61: 59–63. doi:10.1007/BF00261511.
Tellier, A., Fischer, I., Merino, C., Xia, H., Camus-Kulandaivelu, L., Städler, T., and Stephan, W. 2011. Fitness effects of derived deleterious mutations in four closely related wild tomato species with spatial structure. Heredity, 107: 189–199. doi:10.1038/hdy.2010.175. PMID:21245893.
The Tomato Genome Consortium. 2012. The tomato genome sequence provides insights into fleshy fruit evolution. Nature, 485: 635–641. doi:10.1038/nature11119. PMID:22660326.
Vincent, H., Wiersema, J., Kell, S., Fielder, H., Dobbie, S., Castañeda-Álvarez, N.P., et al. 2013. A prioritized crop wild relative inventory to help underpin global food security. Biol. Conserv. 167: 265–275. doi:10.1016/j.biocon.2013.08.011.
Xia, H., Camus-Kulandaivelu, L., Stephan, W., Tellier, A., and Zhang, Z. 2010. Nucleotide diversity patterns of local adaptation at drought-related candidate genes in wild tomatoes. Mol. Ecol. 19: 4144–4154. doi:10.1111/j.1365-294X.2010.04762.x.
Zuriaga, E., Blanca, J., and Nuez, F. 2009. Classification and phylogenetic relationships in Solanum section Lycopersicon based on AFLP and two nuclear gene sequences. Genet. Resour. Crop Evol. 56: 663–678. doi:10.1007/s10722-008-9392-0.
