Mol Genet Genomics DOI 10.1007/s00438-015-1017-7

ORIGINAL PAPER

Genome-wide patterns of genetic variation among silkworms

Xingtan Zhang · Mengyun Nie · Qian Zhao · Yuqian Wu · Genhong Wang · Qingyou Xia

Received: 17 September 2014 / Accepted: 20 February 2015 © Springer-Verlag Berlin Heidelberg 2015

Abstract  Although the draft genome sequence of the silkworm has been available for a decade, its genetic variations, especially structural variations, are far from well explored. In this study, we identified 1,298,659 SNPs and 9,731 indels, of which 32 % of SNPs and 92.2 % of indels were novel compared to a previous silkworm re-sequencing analysis. In addition, we applied a read depth-based approach to investigate copy number variations among 21 silkworm strains at the genome-wide level. This effort resulted in 562 duplicated and 41 deleted CNV regions, of which 442 CNVs were newly identified. Functional annotation of the genes affected by these genetic variations reveals that they cover a wide spectrum of molecular functions, such as immunity and drug detoxification, which are important for the adaptive evolution of silkworms. We further validated the predicted CNV regions using q-PCR: 94.7 % (36/38) of the selected regions showed divergent copy numbers compared to the single-copy gene OR2. In addition, potential presence/absence variations were also observed in our study: 11 genes are present in the reference genome but absent in other strains. Overall, we draw an integrative map of silkworm genetic variation at the genome-wide level. The identification of genetic variations in this study improves our understanding of the important roles these variants play in shaping phenotypic variations between wild and domesticated silkworms.

Keywords  Genetic variation · SNPs · Indels · CNVs · PAVs · Silkworm

Communicated by S. Hohmann.

Electronic supplementary material  The online version of this article (doi:10.1007/s00438-015-1017-7) contains supplementary material, which is available to authorized users.

X. Zhang · M. Nie · Y. Wu · G. Wang · Q. Xia (*)  State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing 400715, China e-mail: [email protected]

X. Zhang · Y. Wu  School of Life Sciences, Chongqing University, Chongqing 400044, China

Q. Zhao  Department of Biology, Syracuse University, Syracuse 13210, NY, USA

Introduction

As the fundamental level of biodiversity, genetic variation deserves more attention. It has been well documented that lower genetic variation depresses individual fitness, resistance to disease and parasites, and flexibility in coping with environmental challenges (Lacy 1997). In a small isolated population, individuals are forced to breed with close relatives. When this happens, the genetic features of the individuals become more and more uniform, and genetic flaws increasingly accumulate. By contrast, a population with a large gene pool has a greater opportunity of surviving and flourishing than a population with limited genetic variability. Awareness of the importance of genetic variation has resulted in extensive research in this field. In the postgenomic era, large efforts have been made to investigate genetic variations at the genome-wide level among individuals or between populations, since it is believed that such genomic variations contribute to phenotypic variation (Lai et al. 2010; Zheng et al. 2011; Horton et al. 2012).

In the broadest sense, variants in a genome can be divided into two classes according to nucleotide composition: single nucleotide variants and structural variants (Eichler et al. 2007). The most prevalent class of genetic variation is the single nucleotide polymorphism (SNP), and its functional importance has been discussed in hundreds of publications during the last decade. One of the most impressive applications of SNP analysis is the genome-wide association study (GWAS). GWAS is a recently developed approach to examine whether common genetic variants are associated with traits of interest. By interrogating the entire genome for associations between common gene variants (SNPs) and a phenotype, GWAS contributes significantly to the understanding of complex biological pathways and networks. For instance, human height is a highly heritable trait and serves as an important model for quantitative genetics. By analyzing hundreds of thousands of genetic markers genotyped in a total of ~63,000 people, 54 loci affecting height variation were identified (Gudbjartsson et al. 2008; Lettre et al. 2008; Weedon et al. 2008). Another application of SNP data is to understand the origin of domestic animals and cultivated crops. Using population SNP data, recent re-sequencing studies cast light on the genetic basis of domestication and identified artificially selected genomic regions harboring economically important genes (Xia et al. 2009; Lai et al. 2010; Rubin et al. 2010; Zheng et al. 2011; Xu et al. 2012; Mace et al. 2013). Whole-genome re-sequencing of eight chicken lines detected selective sweeps underlying important economic traits. One of the most striking, found in all domesticated chickens, occurred at the locus for the thyroid stimulating hormone receptor (TSHR), which has a pivotal role in metabolic regulation and photoperiod control of reproduction in vertebrates (Rubin et al. 2010).

In addition to single nucleotide variants, recent reports have made considerable advances on structural alterations, such as copy number variations (CNVs). This type of variant is expected to play vital roles in shaping phenotypic variation (Springer et al. 2009; Conrad et al. 2010). For example, autism spectrum disorders (ASDs) are characterized by language impairments, social deficits, and repetitive behaviors. Copy number variations probably contribute to these disorders in humans. An investigation of 165 families affected by autism and 99 control families demonstrated that a high frequency of de novo CNVs in ASD patients is a risk factor for autism (Sebat et al. 2007). CNVs also comprise a significant part of the genomes of domesticated species, indicating that these structural polymorphisms provide raw material for artificial selection (Clop et al. 2012). A typical example is that domestication generated significant diversity in the copy number of the CBP gene in the domesticated silkworm. This gene serves as a major determinant of cocoon color (Sakudoh et al. 2011). Another example demonstrating the importance of CNVs in shaping phenotype is the pea comb in chicken. A CNV in chickens with the pea comb phenotype disturbed the action of neighboring conserved regulatory elements and thus increased SOX5 mRNA expression in the developing comb (Wright et al. 2009).

The domesticated silkworm (Bombyx mori) serves as a research model organism due to its miniature size and interesting developmental processes; it is also an economically important insect, being a primary producer of silk. In addition, its ability to synthesize silk proteins rapidly allows this insect to serve as a bioreactor for the production of recombinant proteins for biomedical and biological applications (Xia et al. 2014). Pioneering studies have characterized the SNPs/indels in 40 silkworms (Xia et al. 2009). However, it is estimated that more than 1,000 Bombyx mori inbred and mutant strains are kept all over the world, suggesting that genetic variation has so far been addressed in only a very small proportion of strains.

In this study, we used newly released re-sequencing data (Zhao et al. 2014) of four silkworm genomes (two wild strains: AK and NC; two domestic strains: N4 and XF) to identify whole-genome patterns of SNPs, indels and PAVs. For CNV identification, we applied the mrsFAST-based method, which is powerful in accurately predicting the boundaries of duplicated CNVs. To generate a larger CNV set, we recruited re-sequencing data of 21 silkworm strains whose raw coverage was larger than 3.5× from two studies (Xia et al. 2009; Zhao et al. 2014). This study improves our understanding of how genetic variations shape phenotypic variations between wild and domesticated silkworms.

Results

SNPs and indels in silkworm genome

We detected SNPs and indel polymorphisms (IDPs) in non-repeat regions. SNPs were called with SOAPsnp (Li et al. 2009) using a conservative quality filter pipeline described previously (Wang et al. 2008). In total, we identified 1,298,659 potential SNPs in non-repeat regions, of which 629,799 were located in genic regions (Fig. 1). In addition, 9,731 potential indels ranging from 1 to 5 bp in length were identified, of which 4,750 potentially affected genic regions. The numbers of genic SNPs/indels identified as coding, intronic or UTR are listed in Table 1. Compared with a previous study (Xia et al. 2009), 412,952 (~32 %) SNPs and 8,972 (~92.2 %) indels in this study are newly identified. In addition, we compared our SNP/indel data between wild and domesticated silkworms. Only ~200,000 (15.4 %) polymorphisms (including SNPs and indels) are shared by the two sets, suggesting that domesticated strains are clearly genetically differentiated from the wild ones (Supplementary Fig. 1).
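The novelty percentages quoted above follow directly from the reported counts. A minimal Python check, using only the numbers given in this section (the variable and function names are illustrative and not part of the original analysis):

```python
# Counts reported in this section.
SNPS_TOTAL = 1_298_659
SNPS_NOVEL = 412_952
INDELS_TOTAL = 9_731
INDELS_NOVEL = 8_972


def percent(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`."""
    return 100.0 * part / total


novel_snp_pct = percent(SNPS_NOVEL, SNPS_TOTAL)
novel_indel_pct = percent(INDELS_NOVEL, INDELS_TOTAL)

print(f"novel SNPs:   {novel_snp_pct:.1f} %")
print(f"novel indels: {novel_indel_pct:.1f} %")
```

Rounded to one decimal place, the two ratios are 31.8 % and 92.2 %, consistent with the ~32 % and ~92.2 % stated in the text.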

[Fig. 1 image not recovered: circular genome plot. Surviving legend labels: Chromosome, Gene Density, SNP/Indel (AK, NC, N4, XF), CNVs (wild strains, domesticated strains)]

Fig. 1  Genome-wide landscape of genetic variations in silkworm. 28 chromosomes were reconstructed by joining the corresponding scaffolds. ChrUn represents all the scaffolds that cannot be anchored to 28 chromosomes. The gene density of chromosomes is visualized by the green lines; the regions with high gene density are marked by red color. SNPs/indels and CNV regions in wild and domesticated strains are shown along the chromosomes. In the CNV ring, dark blue and black represent duplicated and deleted CNVs, respectively (Color figure online)

We focused on annotation of large-effect SNPs/indels, which were predicted to have potentially disabling effects on gene function: 83 SNPs were expected to induce premature stop codons, 76 were expected to alter initiation methionine residues, 143 were expected to disrupt splicing donor or acceptor sites and 60 indels in coding regions were predicted to induce frame-shifts. In addition, 76 SNPs removed annotated stop codons, resulting in longer open reading frames (Supplementary Table 1). These large-effect SNPs and indels had a wide range of influence on silkworms. Enrichment analysis showed that the genes with large-effect SNPs/indels were significantly enriched (p value =10

[Fig. 3 image not recovered: panel a shows the CNV map along the chromosomes (surviving row labels: Chr3 to Chr28 and ChrUn); panel b is a histogram with the axis label "CNV Regions"]

Fig. 3  CNV features among domesticated and wild silkworms. a Silkworm genome assembly is represented by black bars, and an integrative CNVs map along the black bars is indicated by red rectangles. The blue colors along the black bars represent previously identified segmental duplications. Tracks underneath the chromosomes represent the CNV datasets (in order from top to bottom) for DZ, AK, N4, NC and XF. The colors for each bar in the dataset tracks represent the average estimated CN for each CNV as shown in the legend. b The histogram shows shared and unique CNV regions in the analyzed strains. The x-axis represents the numbers of CNV regions that are classified in each group. The y-axis represents the CNV regions shared by the number of strains. 286 CNV regions are unique CNVs that only appear in one specific strain (Color figure online)

Genes overlapping with CNVs

Based on the silkworm GLEAN gene annotation, we detected 1,280 genes overlapping with duplicated CNV regions and assigned a copy number (CN) estimate to each gene (Supplementary Table 3). We annotated these genes using Blast2GO (Supplementary Table 3). The genes overlapping with CNVs cover a wide spectrum of molecular functions, such as immunity, drug detoxification, molecular interaction and signal recognition. This analysis also provides a rich resource for investigating genes that may play important roles in lineage-specific adaptation to particular environments, for example, genes involved in immunity and molecular interaction, such as antibacterial peptides and zinc finger proteins. Furthermore, genes involved in drug detoxification, such as cytochrome p450 and oxidase peroxidase, are detected among these CNV-affected genes as well. It is worth noting that two lepidopteran-specific gene families (GMC and Lipoprotein 11) are found in the CNV regions, both of which show functional divergence in developmental and physiological processes (Sun et al. 2012; Zhang et al. 2012). These results suggest that CNVs might be responsible for gene duplication and gene family expansion. Other genes with important functions, such as signal transduction and transmembrane proteins, are also observed in our data.

We expected that the CNV-affected genes contributed to the phenotypic differentiation between wild and domesticated silkworms. To examine this, we used the reference gene set as a control and identified the CNV-affected genes shared among wild silkworms and those shared among domesticated silkworms. Subsequently, the two sets were compared to remove the genes that were present in both wild and domesticated silkworms, and the remaining genes were considered as the CNV-affected genes differentiating wild and domesticated silkworms. We compared the functional annotation of the wild and domesticated gene sets in WEGO (Ye et al. 2006) and observed obvious differences in multiple functional GO terms (Fig. 4). Briefly, domesticated silkworms harbored more genes involved in cellular component biogenesis, molecular transduction, multicellular organismal process and response to stimulus than wild silkworms. By contrast, wild silkworms presented more genes involved in molecular function and biological processes, such as enzyme regulation and biological adhesion.

[Fig. 4 image not recovered: WEGO bar chart with the axis label "Percent of genes" and the series Domesticated and Wild]

Fig. 4  Annotation of genes overlapping with duplicated CNVs. WEGO annotation of CNV-affected genes for domesticated and wild silkworms

In addition, we also detected 59 genes that overlapped with deleted CNVs. We annotated these genes using Blast2GO (Conesa et al. 2005) and observed that these genes were significantly enriched (p value […] 30 % and identity >80 % (Xu et al. 2012). Second, some of these potential contigs might exist in the reference genome but were not included in the current assembly. To filter such false-positive contigs, we remapped the Sanger reads that were used to assemble the reference genome to these potential contigs using megablast (Morgulis et al. 2008) and removed the contigs with any effective hit (hits were considered effective if they represented 90 % of the reads with a rescored similarity of >94 %). After that, we conducted de novo gene annotation with AUGUSTUS (Stanke and Morgenstern 2005) and excluded genes that had homologs in species other than animals by searching against the NR database, because these genes were likely to derive from contamination. Consequently, only one new gene was left, which showed homology with phosphocarrier protein HPr-like (XP_004531716) in Ceratitis capitata (data not shown).

In addition, we also identified genes that were present in the reference genome but absent in other strains. According to our strict criteria (see Methods), 11 potential PAV genes were detected (Supplementary Table 5). In our results, XF has the highest frequency of gene loss events: six genes are absent in XF, while only one gene loss event is detected in N4. Moreover, we extracted the annotation information for these genes from the silkworm database; four of them showed evidence of important functions in cells (Supplementary Table 5). For example, gene BGIBMGA004526 has transporter activity and is a predicted sugar transporter.

Discussion

Benefiting from next-generation sequencing technologies and the development of bioinformatics tools, we are able to unravel the tremendous genetic variation and diversity at the genome scale. We report here an integrative map of silkworm genetic variations, including SNPs, indels, CNVs and PAVs. These potential SNPs and indels will be useful as genetic markers of genome variation for silkworm breeding. Identification of CNV regions in the silkworm genome improves our understanding of the roles these genetic variations play in shaping the phenotypic variations between wild and domesticated silkworms. Furthermore, the functions of genes affected by these genetic variations were investigated using Blast2GO. These genes are enriched in a wide range of functions, including functions expected to differentiate wild and domesticated silkworms, such as immunity and response to stimulus. In addition, 11 genes show evidence of being present in the reference genome but absent in other silkworm strains. Our analysis provides a valuable resource for understanding phenotypic variations between wild and domesticated silkworms as well as for improving silkworm breeding.

Copy number variations among the four accessions of silkworm have been discussed using the same data (Zhao et al. 2014). However, it is well documented that different CNV-detection algorithms may give drastically different results and that each method has its own advantages and weaknesses (Mills et al. 2011; Duan et al. 2013; Zhao et al. 2013a). For example, ReadDepth is expected to perform well in identifying large CNV regions; however, it is not suited to detecting precise breakpoints. In this study, we applied a different CNV-detection method, which integrates the mrsFAST mapping algorithm and the whole-genome shotgun sequence detection (WSSD) method. Our method has several advantages. First, it is able to construct personalized CNV maps in or near segmental duplication (SD) regions by considering alternative mapping positions for reads aligning to repetitive regions, whereas other read depth-based methods report only one mapping location per read. Second, mrsFAST is more appropriate for detecting CNVs in duplication- and repeat-rich regions, since CNVs are often found in or near duplicated regions in the genome (Marques-Bonet et al. 2009a). Third, this method can accurately predict the boundaries of duplicated sequences by implementing a step that refines large CNVs (Alkan et al. 2009; Bickhart et al. 2012). In addition, we validated our CNV results using a de novo assembly-based CNV-detection algorithm and q-PCR analysis. Both validations show that our RD-based CNV-detection method is able to detect high-quality CNVs in or near duplicated regions in the genome.
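As a rough illustration of the read depth (RD) idea discussed here, the sketch below flags windows whose depth exceeds a cutoff derived from control regions and merges consecutive flagged windows into candidate duplications. It is a simplified sketch under stated assumptions, not the mrsFAST/WSSD pipeline itself: the function name, the window size and the rule for merging windows are hypothetical; the mean + 4 × STDEV cutoff is assumed to apply to duplicated windows based on the fragment that survives in the Methods, and the 20-kbp minimum length follows the filter stated there.

```python
from statistics import mean, stdev


def call_duplications(depths, control_depths, window_bp=1000,
                      k=4.0, min_len_bp=20_000):
    """Return (start_bp, end_bp) intervals of consecutive high-depth windows.

    depths          read depth per non-overlapping window along one scaffold
    control_depths  depths from windows assumed to be single copy
    k               number of standard deviations above the control mean
                    (assumed to be the duplication cutoff)
    """
    cutoff = mean(control_depths) + k * stdev(control_depths)

    runs = []            # half-open (first_window, last_window + 1) index pairs
    run_start = None
    for i, depth in enumerate(depths):
        if depth > cutoff:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            runs.append((run_start, i))
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(depths)))

    # Keep only runs longer than the minimum length (20 kbp in the Methods).
    return [(s * window_bp, e * window_bp)
            for s, e in runs
            if (e - s) * window_bp > min_len_bp]
```

For example, with control depths scattered around 10× and a 25-window stretch at 30×, only that stretch is reported, while a 3-window spike is dropped by the length filter.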




Methods

SNPs and indels detection

To identify potential SNPs in non-repeat regions, we first applied SOAP2 (Li et al. 2008) and aligned reads to a slightly masked version of the silkworm genome. Then, a three-step procedure was used to detect the SNPs. (i) We calculated the likelihood of each strain's genotype using SOAPsnp (Li et al. 2009). Based on the alignment results, and taking into account data characteristics, sequencing quality and other experimental influences, a Bayesian model was applied to the data to calculate the probability of each genotype. The genotype with the highest probability was selected for further analysis. (ii) We filtered out the sites that were consistent with the reference bases in the consensus sequences. (iii) On the basis of the re-sequencing data of the four silkworm strains, sites with sufficient quality, called effective sites, were used for SNP determination. Sufficient quality was based on the following criteria: 5 ≤ depth ≤ 60, with depth calculated using data from each individual and average mappable sites 20. To exclude SNP calling errors caused by incorrect mapping or indels, we did not call two adjacent SNPs that were separated by […]

To obtain confident CNV sets for 21 silkworm strains, four steps were applied. […] mean + 4 × STDEV; deletions […] mean + 3 × STDEV), we added them into the larger CNV calls. Otherwise, we chopped those small
windows out of the larger CNV calls. Finally, absolute copy number was estimated within 1-kbp non-overlapping windows for all the five genomes.

4. Filter CNV regions: Since this method worked well for duplication intervals greater than 20 kbp (Marques-Bonet et al. 2009b), we only retained the CNV regions longer than 20 kbp for further analysis. In addition, we removed short-read mapping artifacts following the steps reported previously (Alkan et al. 2009). The reference genome assembly was first broken into 36-bp reads (sliding size 1 bp) and remapped back to the same reference genome with mrsFAST. CNV regions were called using the same steps as for the real data. Comparing these predictions with public silkworm segmental duplications (Zhao et al. 2014), we classified any intervals as artifacts if they did not agree with the known duplication set. A total of 1.56 Mbp of artifact regions was removed from our CNV call sets.

We also applied a de novo assembly algorithm to call CNVs. Re-sequencing data were first assembled by ABySS version 1.5.2 (Simpson et al. 2009) with a k-mer size of 61. The rest of the parameters were left at their defaults. Then, Magnolya (Nijkamp et al. 2012) was used to fit the Poisson mixture model and determine copy numbers for the assembled contigs. Contigs with copy numbers of at least 2 were considered CNV contigs.

Q-PCR validation of CNV regions

Q-PCR analysis was used to validate predicted duplication CNVs. All the primers used in this study were designed with Primer 5.0 and are listed in Supplementary Table 6. Briefly, each PCR reaction was prepared according to the recommended protocol and carried out using the ABI StepOne Plus system. Copy numbers were analyzed according to the comparative Ct method. The ΔCt and ΔΔCt were calculated by the formulas $\Delta Ct = Ct_{\mathrm{target}} - Ct_{\mathrm{control}}$ (single copy) and $\Delta\Delta Ct = \Delta Ct_{\mathrm{cnv}} - \Delta Ct_{\mathrm{single\_copy}}$, respectively.
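The comparative Ct calculation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script. It assumes the standard comparative Ct relation, in which relative quantity equals 2 raised to the power −ΔΔCt, and it assumes that ΔCt for the CNV case is measured in the tested strain while the single-copy ΔCt comes from a calibrator sample. The function names and the symmetric treatment of the 1.5-fold criterion (gain or loss) are likewise assumptions made for the example.

```python
def relative_copy_number(ct_target_test: float, ct_control_test: float,
                         ct_target_calib: float, ct_control_calib: float) -> float:
    """Fold change of the target region relative to the calibrator.

    The control is the single-copy reference gene (OR2 in this study).
    """
    d_ct_cnv = ct_target_test - ct_control_test        # ΔCt in the tested strain
    d_ct_single = ct_target_calib - ct_control_calib   # ΔCt in the calibrator
    dd_ct = d_ct_cnv - d_ct_single                     # ΔΔCt
    return 2.0 ** (-dd_ct)


def is_effective_cnv(fold: float, threshold: float = 1.5) -> bool:
    """Apply a 1.5-fold criterion; treating gains and losses symmetrically
    is an assumption of this sketch."""
    return fold >= threshold or fold <= 1.0 / threshold
```

For instance, a target that crosses the threshold two cycles earlier than the control in the tested strain, but at the same cycle in the calibrator, gives ΔΔCt = −2 and a fold change of 4.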
Since q-PCR is a sensitive technology, we set the criterion that any region showing at least a 1.5-fold change compared to OR2 is considered an effective copy number variation in this study.

Identification of potential gene loss events

To identify potential gene loss events, we first extracted genes with 90 % coverage in the reference genome. Then, we used paired-end reads to support the gene loss events. As the insert size of these libraries is 500 bp, read pairs that mapped to the reference genome with obviously longer insert sizes indicated a possible deletion in between.
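The insert-size criterion can be illustrated with a short sketch. Everything beyond the 500-bp library insert size and the two-read support requirement stated for this step is an assumption made for the example: the `ReadPair` record, the requirement that a pair spans the whole gene, and the `stretch` factor that defines an "obviously longer" insert are hypothetical.

```python
from typing import Iterable, NamedTuple


class ReadPair(NamedTuple):
    left: int     # reference coordinate of the leftmost mapped base
    right: int    # reference coordinate of the rightmost mapped base
    unique: bool  # True if both mates map uniquely


def count_supporting_pairs(pairs: Iterable[ReadPair], gene_start: int,
                           gene_end: int, library_insert: int = 500,
                           stretch: float = 3.0) -> int:
    """Count uniquely mapped pairs that span the gene with an over-long insert."""
    support = 0
    for pair in pairs:
        observed_insert = pair.right - pair.left
        spans_gene = pair.left < gene_start and pair.right > gene_end
        # `stretch` is a hypothetical cutoff for an "obviously longer" insert.
        if pair.unique and spans_gene and observed_insert > stretch * library_insert:
            support += 1
    return support


def is_candidate_loss(pairs: Iterable[ReadPair], gene_start: int,
                      gene_end: int, min_support: int = 2) -> bool:
    """A gene is a candidate loss if at least `min_support` pairs support it."""
    return count_supporting_pairs(pairs, gene_start, gene_end) >= min_support
```

In practice the pairs would be read from an alignment file; the plain tuples used here keep the example self-contained.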

The candidate lost genes with at least two such uniquely mapped split reads were finally identified as lost genes.

Acknowledgments  The research was supported by the National Basic Research Program of China (No. 2012CB114600) and Fundamental Research Funds for the Central Universities (No. XDJK2013C043).

References

Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci F, Hormozdiari F, Kitzman JO, Baker C, Malig M, Mutlu O, Sahinalp SC, Gibbs RA, Eichler EE (2009) Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet 41:1061–1067
Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE (2002) Recent segmental duplications in the human genome. Science 297:1003–1007
Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27:573–580
Bickhart DM, Hou Y, Schroeder SG, Alkan C, Cardone MF, Matukumalli LK, Song J, Schnabel RD, Ventura M, Taylor JF, Garcia JF, Van Tassell CP, Sonstegard TS, Eichler EE, Liu GE (2012) Copy number variation of individual cattle genomes using next-generation sequencing. Genome Res 22:778–790
Cingolani P, Platts A, le Wang L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly Austin 6:80–92
Clop A, Vidal O, Amills M (2012) Copy number variation in the genomes of domestic animals. Anim Genet 43:503–517
Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21:3674–3676
Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, Fitzgerald T, Hu M, Ihm CH, Kristiansson K, Macarthur DG, Macdonald JR, Onyiah I, Pang AW, Robson S, Stirrups K, Valsesia A, Walter K, Wei J, Wellcome Trust Case Control C, Tyler-Smith C, Carter NP, Lee C, Scherer SW, Hurles ME (2010) Origins and functional impact of copy number variation in the human genome. Nature 464:704–712
Duan J, Zhang JG, Deng HW, Wang YP (2013) Comparative studies of copy number variation detection methods for next-generation sequencing technologies. PLoS One 8:e59128
Eichler EE, Nickerson DA, Altshuler D, Bowcock AM, Brooks LD, Carter NP, Church DM, Felsenfeld A, Guyer M, Lee C, Lupski JR, Mullikin JC, Pritchard JK, Sebat J, Sherry ST, Smith D, Valle D, Waterston RH (2007) Completing the map of human genetic variation. Nature 447:161–165
Gudbjartsson DF, Walters GB, Thorleifsson G, Stefansson H, Halldorsson BV, Zusmanovich P, Sulem P, Thorlacius S, Gylfason A, Steinberg S, Helgadottir A, Ingason A, Steinthorsdottir V, Olafsdottir EJ, Olafsdottir GH, Jonsson T, Borch-Johnsen K, Hansen T, Andersen G, Jorgensen T, Pedersen O, Aben KK, Witjes JA, Swinkels DW, den Heijer M, Franke B, Verbeek AL, Becker DM, Yanek LR, Becker LC, Tryggvadottir L, Rafnar T, Gulcher J, Kiemeney LA, Kong A, Thorsteinsdottir U, Stefansson K (2008) Many sequence variants affecting diversity of adult human height. Nat Genet 40:609–615

Hach F, Hormozdiari F, Alkan C, Hormozdiari F, Birol I, Eichler EE, Sahinalp SC (2010) mrsFAST: a cache-oblivious algorithm for short-read mapping. Nat Methods 7:576–577
Horton MW, Hancock AM, Huang YS, Toomajian C, Atwell S, Auton A, Muliyati NW, Platt A, Sperone FG, Vilhjalmsson BJ, Nordborg M, Borevitz JO, Bergelson J (2012) Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel. Nat Genet 44:212–216
Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C (2004) Detection of large-scale variation in the human genome. Nat Genet 36:949–951
Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong SY, Lopez R, Hunter S (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240
Krieger J, Klink O, Mohl C, Raming K, Breer H (2003) A candidate olfactory receptor subtype highly conserved across different insect orders. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 189:519–526
Lacy RC (1997) Importance of genetic variation to the viability of mammalian populations. J Mammal 78:320
Lai J, Li R, Xu X, Jin W, Xu M, Zhao H, Xiang Z, Song W, Ying K, Zhang M, Jiao Y, Ni P, Zhang J, Li D, Guo X, Ye K, Jian M, Wang B, Zheng H, Liang H, Zhang X, Wang S, Chen S, Li J, Fu Y, Springer NM, Yang H, Wang J, Dai J, Schnable PS, Wang J (2010) Genome-wide patterns of genetic variation among elite maize inbred lines. Nat Genet 42:1027–1030
Lettre G, Jackson AU, Gieger C, Schumacher FR, Berndt SI, Sanna S, Eyheramendy S, Voight BF, Butler JL, Guiducci C, Illig T, Hackett R, Heid IM, Jacobs KB, Lyssenko V, Uda M, Diabetes Genetics I, Fusion Kora, Prostate LC, Ovarian Cancer Screening T, Nurses’ Health S, SardiNia Boehnke M, Chanock SJ, Groop LC, Hu FB, Isomaa B, Kraft P, Peltonen L, Salomaa V, Schlessinger D, Hunter DJ, Hayes RB, Abecasis GR, Wichmann HE, Mohlke KL, Hirschhorn JN (2008) Identification of ten loci associated with height highlights new biological pathways in human growth. Nat Genet 40:584–591
Li R, Li Y, Kristiansen K, Wang J (2008) SOAP: short oligonucleotide alignment program. Bioinformatics 24:713–714
Li R, Li Y, Fang X, Yang H, Wang J, Kristiansen K, Wang J (2009) SNP detection for massively parallel whole-genome resequencing. Genome Res 19:1124–1132
Li S, Li R, Li H, Lu J, Li Y, Bolund L, Schierup MH, Wang J (2013) SOAPindel: efficient identification of indels from short paired reads. Genome Res 23:195–200
Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, Shi Y, Liu Y, Yu C, Wang B, Lu Y, Han C, Cheung DW, Yiu SM, Peng S, Xiaoqian Z, Liu G, Liao X, Li Y, Yang H, Wang J, Lam TW, Wang J (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1:18
Mace ES, Tai S, Gilding EK, Li Y, Prentis PJ, Bian L, Campbell BC, Hu W, Innes DJ, Han X, Cruickshank A, Dai C, Frere C, Zhang H, Hunt CH, Wang X, Shatte T, Wang M, Su Z, Li J, Lin X, Godwin ID, Jordan DR, Wang J (2013) Whole-genome sequencing reveals untapped genetic potential in Africa’s indigenous cereal crop sorghum. Nat Commun 4:2320
Marques-Bonet T, Girirajan S, Eichler EE (2009a) The origins and impact of primate segmental duplications. Trends Genet 25:443–454
Marques-Bonet T, Kidd JM, Ventura M, Graves TA, Cheng Z, Hillier LW, Jiang Z, Baker C, Malfavon-Borja R, Fulton LA, Alkan C, Aksay G, Girirajan S, Siswara P, Chen L, Cardone MF, Navarro A, Mardis ER, Wilson RK, Eichler EE (2009b) A burst of segmental duplications in the genome of the African great ape ancestor. Nature 457:877–881
Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, Alkan C, Abyzov A, Yoon SC, Ye K, Cheetham RK, Chinwalla A, Conrad DF, Fu Y, Grubert F, Hajirasouliha I, Hormozdiari F, Iakoucheva LM, Iqbal Z, Kang S, Kidd JM, Konkel MK, Korn J, Khurana E, Kural D, Lam HY, Leng J, Li R, Li Y, Lin CY, Luo R, Mu XJ, Nemesh J, Peckham HE, Rausch T, Scally A, Shi X, Stromberg MP, Stutz AM, Urban AE, Walker JA, Wu J, Zhang Y, Zhang ZD, Batzer MA, Ding L, Marth GT, McVean G, Sebat J, Snyder M, Wang J, Ye K, Eichler EE, Gerstein MB, Hurles ME, Lee C, McCarroll SA, Korbel JO, Genomes P (2011) Mapping copy number variation by population-scale genome sequencing. Nature 470:59–65
Morgulis A, Gertz EM, Schaffer AA, Agarwala R (2006) WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22:134–141
Morgulis A, Coulouris G, Raytselis Y, Madden TL, Agarwala R, Schaffer AA (2008) Database indexing for production MegaBLAST searches. Bioinformatics 24:1757–1764
Nakagawa T, Sakurai T, Nishioka T, Touhara K (2005) Insect sex-pheromone signals mediated by specific combinations of olfactory receptors. Science 307:1638–1642
Nijkamp JF, van den Broek MA, Geertman JM, Reinders MJ, Daran JM, de Ridder D (2012) De novo detection of copy number variation by co-assembly. Bioinformatics 28:3195–3202
Rubin CJ, Zody MC, Eriksson J, Meadows JR, Sherwood E, Webster MT, Jiang L, Ingman M, Sharpe T, Ka S, Hallbook F, Besnier F, Carlborg O, Bed’hom B, Tixier-Boichard M, Jensen P, Siegel P, Lindblad-Toh K, Andersson L (2010) Whole-genome resequencing reveals loci under selection during chicken domestication. Nature 464:587–591
Sakudoh T, Nakashima T, Kuroki Y, Fujiyama A, Kohara Y, Honda N, Fujimoto H, Shimada T, Nakagaki M, Banno Y, Tsuchida K (2011) Diversity in copy number and structure of a silkworm morphogenetic gene as a result of domestication. Genetics 187:965–976
Sebat J, Lakshmi B, Malhotra D, Troge J, Lese-Martin C, Walsh T, Yamrom B, Yoon S, Krasnitz A, Kendall J, Leotta A, Pai D, Zhang R, Lee YH, Hicks J, Spence SJ, Lee AT, Puura K, Lehtimaki T, Ledbetter D, Gregersen PK, Bregman J, Sutcliffe JS, Jobanputra V, Chung W, Warburton D, King MC, Skuse D, Geschwind DH, Gilliam TC, Ye K, Wigler M (2007) Strong association of de novo copy number mutations with autism. Science 316:445–449
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I (2009) ABySS: a parallel assembler for short read sequence data. Genome Res 19:1117–1123
Smit AFA, Hubley R, Green P (1996) RepeatMasker Open-3.0. http://www.repeatmasker.org. Accessed 15 Jan 2014
Springer NM, Ying K, Fu Y, Ji T, Yeh CT, Jia Y, Wu W, Richmond T, Kitzman J, Rosenbaum H, Iniguez AL, Barbazuk WB, Jeddeloh JA, Nettleton D, Schnable PS (2009) Maize inbreds exhibit high levels of copy number variation (CNV) and presence/absence variation (PAV) in genome content. PLoS Genet 5:e1000734
Stanke M, Morgenstern B (2005) AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res 33:W465–W467
Sudmant PH, Kitzman JO, Antonacci F, Alkan C, Malig M, Tsalenko A, Sampas N, Bruhn L, Shendure J, Genomes P, Eichler EE (2010) Diversity of human copy number variation and multicopy genes. Science 330:641–646
Sun W, Shen YH, Yang WJ, Cao YF, Xiang ZH, Zhang Z (2012) Expansion of the silkworm GMC oxidoreductase genes is associated with immunity. Insect Biochem Mol Biol 42:935–945

Turner DJ, Miretti M, Rajan D, Fiegler H, Carter NP, Blayney ML, Beck S, Hurles ME (2008) Germline rates of de novo meiotic deletions and duplications causing several genomic disorders. Nat Genet 40:90–95
Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, Fan W, Zhang J, Li J, Zhang J, Guo Y, Feng B, Li H, Lu Y, Fang X, Liang H, Du Z, Li D, Zhao Y, Hu Y, Yang Z, Zheng H, Hellmann I, Inouye M, Pool J, Yi X, Zhao J, Duan J, Zhou Y, Qin J, Ma L, Li G, Yang Z, Zhang G, Yang B, Yu C, Liang F, Li W, Li S, Li D, Ni P, Ruan J, Li Q, Zhu H, Liu D, Lu Z, Li N, Guo G, Zhang J, Ye J, Fang L, Hao Q, Chen Q, Liang Y, Su Y, San A, Ping C, Yang S, Chen F, Li L, Zhou K, Zheng H, Ren Y, Yang L, Gao Y, Yang G, Li Z, Feng X, Kristiansen K, Wong GK, Nielsen R, Durbin R, Bolund L, Zhang X, Li S, Yang H, Wang J (2008) The diploid genome sequence of an Asian individual. Nature 456:60–65
Weedon MN, Lango H, Lindgren CM, Wallace C, Evans DM, Mangino M, Freathy RM, Perry JR, Stevens S, Hall AS, Samani NJ, Shields B, Prokopenko I, Farrall M, Dominiczak A, Diabetes Genetics I, Wellcome Trust Case Control C, Johnson T, Bergmann S, Beckmann JS, Vollenweider P, Waterworth DM, Mooser V, Palmer CN, Morris AD, Ouwehand WH, Cambridge GEMC, Zhao JH, Li S, Loos RJ, Barroso I, Deloukas P, Sandhu MS, Wheeler E, Soranzo N, Inouye M, Wareham NJ, Caulfield M, Munroe PB, Hattersley AT, McCarthy MI, Frayling TM (2008) Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet 40:575–583
Wright D, Boije H, Meadows JR, Bed’hom B, Gourichon D, Vieaud A, Tixier-Boichard M, Rubin CJ, Imsland F, Hallbook F, Andersson L (2009) Copy number variation in intron 1 of SOX5 causes the Pea-comb phenotype in chickens. PLoS Genet 5:e1000512
Wu TD, Watanabe CK (2005) GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21:1859–1875
Xia Q, Zhou Z, Lu C, Cheng D, Dai F, Li B, Zhao P, Zha X, Cheng T, Chai C, Pan G, Xu J, Liu C, Lin Y, Qian J, Hou Y, Wu Z, Li G, Pan M, Li C, Shen Y, Lan X, Yuan L, Li T, Xu H, Yang G, Wan Y, Zhu Y, Yu M, Shen W, Wu D, Xiang Z, Yu J, Wang J, Li R, Shi J, Li H, Li G, Su J, Wang X, Li G, Zhang Z, Wu Q, Li J, Zhang Q, Wei N, Xu J, Sun H, Dong L, Liu D, Zhao S, Zhao X, Meng Q, Lan F, Huang X, Li Y, Fang L, Li C, Li D, Sun Y, Zhang Z, Yang Z, Huang Y, Xi Y, Qi Q, He D, Huang H, Zhang X, Wang Z, Li W, Cao Y, Yu Y, Yu H, Li J, Ye J, Chen H, Zhou Y, Liu B, Wang J, Ye J, Ji H, Li S, Ni P, Zhang J, Zhang Y, Zheng H, Mao B, Wang W, Ye C, Li S, Wang J, Wong GK, Yang H, Biology Analysis G (2004) A draft sequence for the genome of the domesticated silkworm (Bombyx mori). Science 306:1937–1940
Xia Q, Guo Y, Zhang Z, Li D, Xuan Z, Li Z, Dai F, Li Y, Cheng D, Li R, Cheng T, Jiang T, Becquet C, Xu X, Liu C, Zha X, Fan W, Lin Y, Shen Y, Jiang L, Jensen J, Hellmann I, Tang S, Zhao P, Xu H, Yu C, Zhang G, Li J, Cao J, Liu S, He N, Zhou Y, Liu H, Zhao J, Ye C, Du Z, Pan G, Zhao A, Shao H, Zeng W, Wu P, Li C, Pan M, Li J, Yin X, Li D, Wang J, Zheng H, Wang W, Zhang X, Li S, Yang H, Lu C, Nielsen R, Zhou Z, Wang J, Xiang Z, Wang J (2009) Complete resequencing of 40 genomes reveals domestication events and genes in silkworm (Bombyx). Science 326:433–436
Xia Q, Li S, Feng Q (2014) Advances in silkworm studies accelerated by the genome sequencing of Bombyx mori. Annu Rev Entomol 59:513–536
Xu X, Liu X, Ge S, Jensen JD, Hu F, Li X, Dong Y, Gutenkunst RN, Fang L, Huang L, Li J, He W, Zhang G, Zheng X, Zhang F, Li Y, Yu C, Kristiansen K, Zhang X, Wang J, Wright M, McCouch S, Nielsen R, Wang J, Wang W (2012) Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat Biotechnol 30:105–111
Ye J, Fang L, Zheng H, Zhang Y, Chen J, Zhang Z, Wang J, Li S, Li R, Bolund L, Wang J (2006) WEGO: a web tool for plotting GO annotations. Nucleic Acids Res 34:W293–W297
Zhang Y, Dong Z, Liu S, Yang Q, Zhao P, Xia Q (2012) Identification of novel members reveals the structural and functional divergence of lepidopteran-specific Lipoprotein_11 family. Funct Integr Genomics 12:705–715
Zhao M, Wang Q, Wang Q, Jia P, Zhao Z (2013a) Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC Bioinform 14(Suppl 11):S1
Zhao Q, Zhu Z, Kasahara M, Morishita S, Zhang Z (2013b) Segmental duplications in the silkworm genome. BMC Genom 14:521
Zhao Q, Han MJ, Sun W, Zhang Z (2014) Copy number variations among silkworms. BMC Genom 15:251
Zheng LY, Guo XS, He B, Sun LJ, Peng Y, Dong SS, Liu TF, Jiang S, Ramachandran S, Liu CM, Jing HC (2011) Genome-wide patterns of genetic variation in sweet and grain sorghum (Sorghum bicolor). Genome Biol 12:R114

