Mol Genet Genomics DOI 10.1007/s00438-014-0827-3

Original Paper

Linkage disequilibrium and population‑structure analysis among Capsicum annuum L. cultivars for use in association mapping

Padma Nimmakayala · Venkata L. Abburi · Lavanya Abburi · Suresh Babu Alaparthi · Robert Cantrell · Minkyu Park · Doil Choi · Gerald Hankins · Sridhar Malkaram · Umesh K. Reddy

Received: 13 November 2013 / Accepted: 11 February 2014 © Springer-Verlag Berlin Heidelberg 2014

Abstract  Knowledge of population structure and linkage disequilibrium among the worldwide collections of peppers currently classified as hot, mild, sweet and ornamental types is indispensable for applying association mapping and genomic selection to improve pepper. The current study aimed to resolve the genetic diversity and relatedness of Capsicum annuum germplasm by use of simple sequence repeat (SSR) loci across all chromosomes in samples collected in 2011 and 2012. The physical distance covered by the entire set of SSRs used was 2,265.9 Mb of the 3.48-Gb hot-pepper genome. The model-based program STRUCTURE was used to infer five clusters, which were further confirmed by classical molecular-genetic diversity analysis. Mean heterozygosity of various loci was estimated to be 0.15. Linkage disequilibrium (LD) was used to identify 17 LD blocks across various chromosomes with sizes from 0.154 kb to 126.28 Mb. CAMS-142 of chromosome 1 was significantly associated with both capsaicin (CA) and dihydrocapsaicin (DCA) levels. Further, CAMS-142 was located in an LD block of 98.18 Mb. CAMS-142 amplified bands of 244, 268, 283 and 326 bp. Alleles 268 and 283 bp had positive effects on both CA and DCA levels, with an average R² of 12.15 % (CA) and 12.3 % (DCA). Eight markers from seven different chromosomes were significantly associated with fruit weight, contributing an average effect of 15 %. CAMS-199, HpmsE082 and CAMS-190 are the three major quantitative trait loci located on chromosomes 8, 9, and 10, respectively, and were associated with fruit weight in samples from both years of the study. This research demonstrates the effectiveness of using genome-wide SSR-based markers to assess features of LD and genetic diversity within C. annuum.

Keywords  SSRs · Population structure · Association mapping · Capsaicin · Fruit weight

P. Nimmakayala, V. L. Abburi and U. K. Reddy contributed equally to this work.

Communicated by S. Hohmann.

Electronic supplementary material  The online version of this article (doi:10.1007/s00438-014-0827-3) contains supplementary material, which is available to authorized users.

P. Nimmakayala · V. L. Abburi · L. Abburi · S. B. Alaparthi · R. Cantrell · G. Hankins · S. Malkaram · U. K. Reddy (*)
Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, WV 25112‑1000, USA
e-mail: [email protected]

M. Park · D. Choi
Department of Plant Science, Plant Genomics and Breeding Institute, College of Agriculture and Life Sciences, Seoul National University, Seoul 151‑321, Republic of Korea

Introduction
Capsicum annuum L. is a highland Mexican pepper type that has spread widely across the world and is grown as sweet and hot pepper (Hernández-Verdugo et al. 2001). The wild progenitor is an erect plant with small red-colored fruits that are soft-fleshed and pungent (Hill et al. 2013). Christopher Columbus encountered Capsicum fruit during his travels to the Americas and named them “peppers” because of their spicy hot taste (Powis et al. 2013). Pickersgill (1997) noted that selection for different goals within domesticated C. annuum types might have fragmented the genetic diversity into narrow pools. Despite the great economic and cultural importance of C. annuum, the population structure of worldwide collections is little known (Aguilar-Meléndez et al. 2009). Genome-wide association studies

and genomic selection have been used in several crops with high-throughput single-nucleotide polymorphisms (SNPs); however, simple sequence repeat markers (SSRs) are comparatively superior in resolving population structure. In comparing analyses based on 89 SSRs (primarily dinucleotide repeats) with those based on 847 SNPs in individuals from the same 259 inbred maize lines, Hamblin et al. (2007) concluded that the SSRs performed better at clustering germplasm into populations than did the 847 SNPs or 554 SNP haplotypes. The authors also compared heterozygosity levels and allele frequencies of SNPs and SSRs, which have implications for their use in assessing relatedness and evaluating genetic diversity. With recent advances in sequencing technologies, microsatellites can be selected at regular intervals covering the entire length of a genome. Genome-wide SSR markers have been useful in genetic-diversity studies of diverse plant genotypes (Sonah et al. 2011; Arumugasundaram et al. 2011; Reyes-Valdés et al. 2013; Liu et al. 2013; Hao et al. 2011). The current study aimed to resolve the molecular diversity and population structure of C. annuum collections by using genome-wide SSR markers in pepper and to evaluate their use in an association-mapping study. Examining linkage disequilibrium (LD) distribution across the C. annuum genome can help in understanding the consequences of selection throughout its breeding history. Cultivars of C. annuum diverged with breeding methods meant to suit various cooking methods, food traditions and wide-ranging ecologies across the world. Understanding genome divergence and polymorphism related to fruit traits is a way to study genomic footprints of adaptation. An examination of population structure, diversity at the molecular level, LD distribution across various chromosomes and quantitative trait loci (QTL) estimated by using model-based association mapping can provide insights into the evolution of various traits among C. annuum cultivars.
Despite several QTL studies of pepper (Ben Chaim et al. 2001, 2003; Paran 2003; Yarnes et al. 2012), the current research is unique in that it involved association mapping to identify markers for various fruit-related traits. Our association-mapping study aimed to identify genome-wide effects on fruit weight (FW) and capsaicinoids in C. annuum populations by using a set of microsatellites previously mapped to various chromosomes (Minamiyama et al. 2006; Yi et al. 2006; Sugita et al. 2013), whose positions we validated on the whole-genome sequence draft. Association mapping has been found effective for mining new markers and has been used with all major crops, including maize, rice, barley, tomato, wheat, sorghum, sugarcane, soybean, grape, and melon (Tomason et al. 2013; Zhang et al. 2011). For estimating population structure (Q) and kinship (K), the multiallelic and codominant microsatellites can be useful because they are selectively neutral (Zhu et al. 2008). As compared with other marker systems such as SNPs, SSRs are relatively new

alleles and show higher rates of mutation (Matsuoka et al. 2002). The current study aimed to resolve the population structure and relatedness of C. annuum germplasm with genome-wide SSRs and to estimate LD across the linkage groups for association mapping of FW and capsaicinoids.

Materials and methods
For molecular-diversity analysis, we included 96 accessions of C. annuum from various countries representing a wide geographical area (Table S1). These selfed accessions were grown in three replications during two seasons (2011 and 2012), adopting a row-to-plant spacing of 100 × 30 cm. We grew ten plants per accession for each replication. We collected tissue from five plants for FW (mg) and harvested ten fruits per replication at the breaker stage, which were oven-dried at 75 °C and ground into a fine powder for estimating capsaicin (CA) and dihydrocapsaicin (DCA). Ground samples (100 mg) were mixed thoroughly with 2 mL pure acetonitrile on a vortex. The mixture was incubated at 50 °C for 1 h, followed by 1-h sonication, before centrifugation at 10,000 rpm for 15 min. The supernatant was filtered through a Phenomenex 0.2-μm PTFE membrane filter (Torrance, CA, USA) before analysis. The sample was then transferred to high-performance liquid chromatography (HPLC) vials. The HPLC system was equipped with a 1525 binary HPLC pump, 2707 autosampler and 2998 photodiode array detector (Waters Corp., Milford, MA, USA). A 10-μL sample was injected. The mobile phases were acetonitrile and 2 % acetic acid at a flow rate of 0.6 mL/min. An X-Bridge C18 column (4.6 × 150 mm; 5 μm) coupled with a guard column (Waters Corp.) was used. CA and DCA levels were detected at 280 nm. Retention times were approximately 9.3 and 9.7 min for CA and DCA, respectively. Stock solutions of CA and DCA (Sigma-Aldrich) were prepared in acetonitrile for a linear standard curve ranging from 12.5 to 500 ppm. The integrated peak areas of CA and DCA in unknown samples were converted to parts per million by using the equation of the line fitted to the CA and DCA standards (R² = 0.99).
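The conversion of integrated peak areas to parts per million through a linear standard curve can be illustrated with a minimal sketch. The standard concentrations below span the 12.5 to 500 ppm range stated above, but the peak areas, the function names `fit_line` and `area_to_ppm`, and the sample value are hypothetical and chosen only for illustration; they are not data from this study.

```python
# Minimal sketch of a linear standard curve (ordinary least squares).
# All peak areas below are invented for illustration.

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def area_to_ppm(area, slope, intercept):
    """Invert the standard curve: concentration (ppm) from an integrated peak area."""
    return (area - intercept) / slope


# Hypothetical standards within the 12.5-500 ppm range.
standard_ppm = [12.5, 25.0, 50.0, 100.0, 250.0, 500.0]
# Invented detector response: area = 2 * ppm + 1 (arbitrary units).
standard_area = [2.0 * c + 1.0 for c in standard_ppm]

slope, intercept = fit_line(standard_ppm, standard_area)

# Hypothetical unknown sample with an integrated peak area of 201 units.
sample_ppm = area_to_ppm(201.0, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, sample={sample_ppm:.1f} ppm")
```

In practice a separate curve would be fitted for CA and for DCA, and samples falling outside the range of the standards would need dilution or re-measurement rather than extrapolation.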

Marker resources
We used 176 single-locus SSRs mapped to various linkage groups of published maps (Minamiyama et al. 2006; Yi et al. 2006; Sugita et al. 2013). Chromosomal assignment and position on the physical map were deduced from the whole-genome sequence draft for hot pepper (CM334) that spanned 3.48 Gb (kindly provided by Drs. Park and Choi of Seoul National University). In all, 17 SSR markers were located in a physical distance of 172.08 Mb on

chromosome 1 (Chr-1). In addition, we used 11 SSRs spanning a cumulative length of 117.08 Mb on Chr-2, 24 SSRs (235.56 Mb) on Chr-3, 15 SSRs (187.54 Mb) on Chr-4, 12 SSRs (185.11 Mb) on Chr-5, 4 SSRs (138.71 Mb) on Chr-6, 11 SSRs (208.62 Mb) on Chr-7, 27 SSRs (171.70 Mb) on Chr-8, 12 SSRs (231.15 Mb) on Chr-9, 9 SSRs (216.12 Mb) on Chr-10, 10 SSRs (211.70 Mb) on Chr-11 and 7 SSRs (190.48 Mb) on Chr-12.
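The per-chromosome figures listed above can be tallied with a short sketch to recover the total physical distance covered and its share of the 3.48-Gb genome. The dictionary name `ssr_coverage` and the variable names are illustrative; the numbers are those given in the text.

```python
# Tally of the per-chromosome SSR counts and physical spans listed in the text.
# Each entry maps chromosome -> (number of SSRs, span in Mb).
ssr_coverage = {
    "Chr-1": (17, 172.08),
    "Chr-2": (11, 117.08),
    "Chr-3": (24, 235.56),
    "Chr-4": (15, 187.54),
    "Chr-5": (12, 185.11),
    "Chr-6": (4, 138.71),
    "Chr-7": (11, 208.62),
    "Chr-8": (27, 171.70),
    "Chr-9": (12, 231.15),
    "Chr-10": (9, 216.12),
    "Chr-11": (10, 211.70),
    "Chr-12": (7, 190.48),
}

GENOME_SIZE_MB = 3480.0  # 3.48 Gb draft genome, expressed in Mb

listed_markers = sum(count for count, _ in ssr_coverage.values())
total_span_mb = sum(span for _, span in ssr_coverage.values())
fraction_covered = total_span_mb / GENOME_SIZE_MB

print(f"markers itemized per chromosome: {listed_markers}")
print(f"total span: {total_span_mb:.2f} Mb")
print(f"share of genome: {fraction_covered:.1%}")
```

The summed span, 2,265.85 Mb, agrees with the 2,265.9 Mb reported in the abstract and corresponds to roughly 65 % of the draft genome. The itemized counts sum to 159 markers.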

DNA isolation and PCR
Genomic DNA was isolated with a plant DNA isolation kit (QIAGEN cat# 69104). Each PCR reaction consisted of 50 ng genomic DNA, 0.20 μM mixed forward and reverse primers, 1× buffer (10 mM Tris–HCl, pH 8.2, 50 mM KCl, Triton 0.1 %, BSA 1 mg/mL), 1.5 mM MgCl2, 0.2 mM dNTPs and 1 U Taq polymerase in a 10-μL reaction volume. Amplification involved a GeneAmp PCR 9700 System thermal cycler (Applied Biosystems Inc.) programmed at 94 °C for 2 min followed by 35 cycles of 94 °C for 30 s, 50–65 °C for 30 s and 72 °C for 1 min, and a final extension step at 72 °C for 10 min. Amplified products were separated on a high-throughput DNA fragment analyzer (AdvanCE FS; Advanced Analytical Technologies, Inc., Ames, IA). Amplified PCR products were diluted 1:11, and the dilution and injection voltage were adjusted according to product concentration to prevent overloading of the fragment analyzer. For each sample, 2 μL of PCR product was pipetted into 22 μL of 1× TE dilution buffer in plate wells. Samples were size-separated by using a 96-capillary automated system with capillaries of 80 cm. Polymer and other required reagents were from the double-stranded DNA kit DNF-900 (Advanced Analytical Technologies, Inc.), which can effectively separate amplicons between 35 and 500 bp and resolve 1-bp differences between alleles. Following capillary electrophoresis, data were processed by use of PROSize 2.0 (Advanced Analytical Technologies, Inc.). The data were normalized to the 35-bp lower marker and 500-bp upper marker and calibrated to the 75- to 400-bp range.

Statistical analysis
Data from all three replications for CA and DCA were log10-transformed before ANOVA. Data for FW, CA and DCA were analyzed by ANOVA. STRUCTURE 2.2 (Pritchard et al. 2000), which estimates the posterior probability of the data for a given K, Pr(X|K), was used to cluster individuals into K population groups. K was determined following the admixture model with correlated alleles, with K ranging from 2 to 10. Five independent runs of 100,000 Markov chain Monte Carlo generations and 100,000 generations of burn-in were used for estimating each value of K. The optimal K value was determined by an ad hoc statistic, ΔK (Evanno et al. 2005). The number of Ks in each dataset was evaluated by using ΔK values estimated with the software STRUCTURE Harvester. CLUMPP 1.1 (Jakobsson and Rosenberg 2007) was used to find optimal alignments of independent runs, and the output was used directly as input into DISTRUCT 1.1, a program for cluster visualization (Rosenberg 2004). Genetic distance estimates were calculated as described (Crossa and Franco 2004). The neighbor-joining (NJ) algorithm was used to build a dendrogram based on genetic distances with MEGA version 5 (Tamura et al. 2011). The genetic variance was partitioned within and among the pepper groups identified by molecular-diversity and population-structure analyses by analysis of molecular variance (AMOVA) with Arlequin 2.0 (Schneider et al. 2000). Molecular-genetic diversity was estimated by genetic diversity and heterozygosity (h). FIS and FST were estimated by Wright’s F-statistics (Weir and Cockerham 1984) with PopGene 1.31 (Yeh and Boyle 1999). Gene flow (Nm) based on the private allele method (Slatkin 1985) was estimated with Genepop 4.1 (Rousset 2008). LD was estimated as r² between all pairs of SSRs, with 1,000 permutations, in TASSEL 2.2 (http://www.maizegenetics.net) (Bradbury et al. 2007). To visualize LD throughout the genome, we produced heatmaps based on P values for pair-wise r² estimates for all marker pairs. These heatmaps were used to identify LD blocks in various pepper chromosomes at specific map locations. A marker set was used to estimate the kinship (K) matrix with TASSEL 2.1, based on the proportion of alleles shared between each pair of accessions in the study. The Q matrix was adapted from the K = 5 cluster results obtained in the STRUCTURE analysis. General linear model (GLM) and mixed linear model (MLM) analyses in TASSEL 3.0 were used for association mapping.
A threshold of 0.05 was used for significant association, applied either with Bonferroni correction (Sidak 1967) or as a more moderate false discovery rate (FDR) threshold as described (Benjamini and Hochberg 1995).

Results

We recorded FW for two seasons and CA and DCA levels for 96 C. annuum collections. FW and CA and DCA levels differed significantly among the collections. The correlation of FW across the two seasons was estimated at 0.92. The correlations of CA and DCA levels across the seasons were 0.755 and 0.746, respectively. The correlations between CA and DCA for 2011 and 2012 were 0.979 and 0.88, respectively. We found a significant negative correlation between FW and total capsaicinoids (r = −0.394).
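For readers who want to reproduce correlations of this kind, the following is a minimal sketch of the Pearson correlation coefficient. The function name `pearson_r` and the five data points are hypothetical and serve only to show the calculation; they are not values from this study. Note that r is signed, whereas r² is always non-negative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented accession means: fruit weight (g) and total capsaicinoids (ppm).
fw_g = [0.3, 2.1, 8.5, 40.2, 104.7]
capsaicinoids_ppm = [900.0, 650.0, 300.0, 40.0, 0.1]

r = pearson_r(fw_g, capsaicinoids_ppm)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

The same function applies to season-to-season comparisons by passing the 2011 and 2012 means of one trait as the two sequences.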

Mean FW ranged from 111.26 ± 3.78 g in Peperone Quarrato Red to 0.33 g in Bird’s Eye Baby during both 2011 and 2012; mean FW was 104.7 ± 4.47 g for Peperone Quarrato Red and 0.18 ± 0.02 g for Tepin, another wild accession. Mean CA level was highest for Prikkinu (933.19 ± 95.59) and lowest for Anthoni Romanian (0.04 ± 0.037). Mean DCA level was highest for Tepin (568.81 ± 11.60) and lowest for Yankee Bell (0.027 ± 0.02). In total, 118 polymorphic SSR loci amplified 366 scorable alleles among 96 diverse C. annuum types. Mean heterozygosity of polymorphic loci was 0.15. The five alleles amplified by CAMS-826 gave the highest Shannon index (SI, an indicator of the polymorphic information content in diverse accessions) of 1.73. The mean number of alleles was 2.1, with an average SI of 0.54. Observed and expected heterozygosity and SI for all the loci in the study are in Table S3. The molecular-genetic diversity study revealed five main clusters in the C. annuum collections. Molecular-diversity indices were used to construct an NJ tree (Fig. 1). Genetic divergence analysis revealed that clusters I and II both contained pungent types but showed the highest divergence (30 %), because wild to semi-wild collections were in cluster II. A divergence of 24 % was noted between clusters II and V, the latter predominantly sweet peppers. Medium-pungent types grouped into two clusters (III and IV), with a narrow diversity of 3 %.

Fig. 1  Neighbor-joining (NJ) tree of the 96 Capsicum annuum accessions constructed with SSR genotypes

STRUCTURE was used to infer clusters for K = 2 to K = 10. K = 5 was the most appropriate number of clusters because it produced the highest ΔK (9.45) and the highest Ln′(K) (616.5) (Fig. 2). Hot pungent C. annuum types were grouped into clusters I and II as resolved by both STRUCTURE and the NJ tree. Clusters III and V obtained from STRUCTURE analysis contained moderately pungent accessions. Accessions of cluster V in the NJ tree were largely sweet and bell peppers but were in cluster IV with STRUCTURE analysis. Because our research resolved the diverse C. annuum collections into five main groups, we further sought the support of population genetics. AMOVA revealed highly significant variance within and among the five groups (Table 1). Gene flow estimates (Nm) and distributions of FST and FIS are in Table S2. For 17 loci, FST ranged from 0.4 to 0.69, so these markers contributed most of the divergence among the five groups. FIS was >0.9 for 73 loci, so these loci are fixed. Higher Nm estimates indicate increased gene flow, and the loci with low Nm estimates carried the alleles private to various groups in the study. For HpmsE068, the Nm was 55.63, which indicates high gene flow at this locus across the groups, followed by CAMS-089 (18.59), CAMS-888 (18.41), HpmsE067 (16.88), CAMS-153 (15.88), HpmsE091 (14.09), HpmsE020 (13.66), Hpms 2-23 (12.23) and CAMS-301 (12.06). In total, for 67 loci, Nm estimates were
