Gene 537 (2014) 189–196

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Incidence, complexity and diversity of simple sequence repeats across potexvirus genomes

Chaudhary Mashhood Alam a, Avadhesh Kumar Singh b, Choudhary Sharfuddin a, Safdar Ali b,⁎

a Department of Botany, Patna University, Bihar 800005, India
b Department of Biomedical Sciences, SRCASW, University of Delhi, Vasundhara Enclave, New Delhi 110096, India

Article info

Article history: Accepted 4 January 2014; available online 13 January 2014

Keywords: Potexvirus; Simple sequence repeats; Imperfect Microsatellite Extraction; Correlation studies

Abstract

An in silico analysis of simple sequence repeats (SSRs) in the genomes of 32 potexvirus species was performed, in which a total of 691 SSRs and 33 compound SSRs (cSSRs) were observed. Although SSRs were present in all the studied genomes, their incident frequency ranged from 11 to 30 per genome. Further, 10 potexvirus genomes possessed no cSSRs when extracted at a dMAX of 10 and, where cSSRs were present, the highest frequency was 3. SSR and cSSR incidence, relative density and relative abundance were non-significantly correlated with genome size and GC content, suggesting an ongoing evolutionary and adaptive phase of the virus species. The SSRs primarily comprised mono- to tri-nucleotide repeat motifs, with a greatly skewed distribution across the coding and non-coding regions. The present work contributes to the ongoing compilation and analysis of the incidence, distribution and variation of viral repeat sequences, with the aim of understanding their evolutionary and functional relevance. © 2014 Elsevier B.V. All rights reserved.

Abbreviations: SSR, simple sequence repeat; cSSR, compound simple sequence repeat; IMEx, Imperfect Microsatellite Extraction; RD, relative density; RA, relative abundance; RDRP, RNA dependent RNA polymerase; TGB, triple gene block; CP, coat protein.
⁎ Corresponding author. Tel.: +91 11 22623503; fax: +91 11 22623504. E-mail addresses: [email protected], [email protected] (S. Ali).
0378-1119/$ – see front matter © 2014 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.gene.2014.01.007

1. Introduction

A sizeable number of satellite sequences (mega-, mini- and micro-) have been characterized from different species (Jeffreys et al., 1998), and their presence has been established in coding regions as well. They may be present as part of the transcriptome or in the vicinity of the coding regions. Repeat sequences are known to regulate gene expression at the transcriptional and translational levels, or even through gene silencing (Li et al., 2004; Rocha et al., 2002; Vergnaud and Denoeud, 2000). At the clinical level, expansion and contraction of these repeats are established risk factors for Huntington's disease, myotonic dystrophy and several other genetic diseases (Borstnik and Pumpernik, 2002; Di Prospero and Fischbeck, 2005; Dushlaine et al., 2005; Richards, 2001; Sutherland and Richards, 1995).

Simple sequence repeats (SSRs), also called micro- or minisatellites, are tandem repetitions of relatively short DNA motifs. Variation in microsatellite distribution, complexity and copy number is high, owing to their instability, which arises from strand slippage and unequal recombination. This gives them a pivotal role in generating genomic diversity within and across species, which in turn provides a platform for evolution and adaptability (Deback et al., 2009; Kashi and King, 2006). Though their role in gene regulation has been the focus of studies for some time and has been elucidated in a few cases (Kashi and King, 2006; Usdin, 2008), understanding of their implications at the functional and evolutionary levels is still at a nascent stage.

Genome features such as size and GC content are known to influence the occurrence and complexity of microsatellites (Coenye and Vandamme, 2005; Dieringer and Schlotterer, 2003; Kelkar et al., 2008). However, this correlation is not universal and, therefore, no single rule can be formulated for predicting their occurrence and density. Moreover, the variable length of microsatellites, owing to differential copy numbers, affects the local DNA structure or the encoded proteins and can be one of the means of regulating or altering the expression profile of the respective genes (Mrazek et al., 2007). Their presence in viruses has long been debated, but with several reports of their incidence and analysis in viruses such as Hepatitis C virus (HCV) and Human cytomegalovirus (HCMV) (Chen et al., 2009; Picone et al., 2005), the focus has now shifted to understanding their role in the smallest entities of the living world.

Based on the presence of interruptions, microsatellites may be interrupted, pure, compound, interrupted compound, complex or interrupted complex (Chambers and MacAvoy, 2000). The present study focuses primarily on pure and compound microsatellites (two or more microsatellites adjacent to each other). Their presence has been reported in diverse taxa across viruses, prokaryotes and eukaryotes (Chen et al., 2012; Gur-Arie et al., 2000; Kofler et al., 2008). Furthermore, compound microsatellites constitute ~10% of SSRs in the human genome (Weber, 1990), including highly polymorphic compound repeats such as (dC-dA)n(dG-dT)n (Bull et al., 1999). Other eukaryotic genomes, such as those of Macaca mulatta, Mus musculus and Rattus norvegicus, have 4–25% compound microsatellites (Kofler et al., 2008). An exhaustive study of the diversification of satellite sequences would provide insight into the imperfections and evolution of microsatellites.


C.M. Alam et al. / Gene 537 (2014) 189–196

Also, viral microsatellites have the potential for generating genomic diversity and phenotypic changes (Li et al., 2004). Their presence and possible functional significance in plant viruses have been recognized only recently (George et al., 2012; Xiangyan et al., 2011). Here, we systematically analyzed the occurrence, size, and density of different microsatellites in the highly divergent potexviruses, a possible model for understanding functional aspects, evolutionary relationships, and adaptation to divergent hosts.

Potexviruses have positive-strand RNA genomes ranging from 4 to 7 kb, which encode five open reading frames (ORFs). Virus particles mostly occur in the cytoplasm and occasionally in the nuclei. The genomes possess a methylguanosine cap at the 5′ end and a poly(A) tail at the 3′ end (Huang et al., 2004; Huisman et al., 1988). The first ORF encodes the viral RNA dependent RNA polymerase (RDRP) required for replication. It is followed by three overlapping genes known as the triple gene block (TGB 1, 2 and 3), needed for virus cell-to-cell movement (Verchot-Lubicz, 2005), and the final ORF encodes the viral coat protein (CP), which is essential for virion assembly and virus cell-to-cell movement (Huisman et al., 1988; Santa Cruz et al., 1998). Infection by these viruses causes chlorosis, necrotic lesions, decreased leaf size, mosaic and ringspot symptoms in monocotyledonous and dicotyledonous plants. They cause severe infection in potato, papaya, tomato, strawberry and some ornamental plants, leading to severe loss of revenue worldwide. They are transmitted through sap inoculation and do not require a vector; they spread through contact between infected plants. Because almost all of these species occur in India, a few of them, such as Mint virus X, Papaya mosaic virus and Potato virus X, are candidates for infective genetics research in plants.

2. Materials and methods

2.1. Genome sequences

The genus Potexvirus, belonging to the family Alphaflexiviridae, has 37 known species according to the ninth report of the International Committee on the Taxonomy of Viruses (ICTV) (King et al., 2012). Of these 37 species, the complete genome sequences of 32 potexviruses available at NCBI (http://www.ncbi.nlm.nih.gov/) were assessed and analyzed for simple and compound microsatellites. The accession numbers of the studied sequences, the species and their GC content are summarized in Table 1. The genome size ranged from ~3.8 kb (P13) to 7.1 kb (P1).

2.2. Microsatellite identification and investigation

The search for microsatellites was performed using the 'Advance-Mode' of IMEx with the parameters reported for HIV (Chen et al., 2012): Type of Repeat: perfect; Repeat Size: all; Minimum Repeat Number: 6, 3, 3, 3, 3, 3 (for mono- to hexa-nucleotide motifs); Maximum distance allowed between any two SSRs (dMAX): 10. Other parameters were set as default.

2.3. Statistical analysis

We used Microsoft Office Excel 2007 for statistical analysis. Linear regression was used for correlation studies.

3. Results

3.1. Incident frequency of SSRs and cSSRs

A genome-wide scan for microsatellites across 32 potexviruses revealed a total of 691 SSRs and 33 cSSRs (Table 1, Supplementary Table 1). SSRs were observed in all the genomes, but their incident frequency varied from 11 (P20) to 30 (P4). Compound SSRs (cSSRs) are defined as two adjacent SSRs separated by less than a predefined maximum distance, dMAX (Kofler et al., 2008). Ten potexvirus genomes lacked cSSRs at a dMAX of 10, and in genomes harboring cSSRs the maximum observed frequency was 3 (P3 and P24). Interestingly, these variations are not directly correlated with genome size as

Table 1
Overview of simple microsatellites in complete potexvirus genome sequences.

S. no  Name                               Accession number  Genome size (bp)  GC content (%)  SSR  RA    RD
P1     Allium virus X                     FJ670570          7118              50.6            22   3.09  21.07
P2     Alstroemeria virus X               AB206396          7009              51.4            26   3.71  26.68
P3     Alternanthera mosaic virus         GQ179646          6607              51.7            29   4.39  31.63
P4     Asparagus virus 3                  AB304848          6935              55              30   4.33  28.55
P5     Bamboo mosaic virus                AF018156          6366              50.62           23   3.61  23.56
P6     Cactus virus X                     AF308158          6614              51.17           24   3.63  24.19
P7     Cassava common mosaic virus        U23414            6376              47.16           26   4.08  27.29
P8     Clover yellow mosaic virus         D29630            7015              49.42           27   3.85  25.23
P9     Cymbidium mosaic virus             JQ860108          6224              48.8            20   3.21  20.73
P10    Foxtail mosaic virus               M62730            6151              52.41           18   2.93  18.70
P11    Hosta virus X                      JQ911698          6431              52.48           16   2.49  16.33
P12    Hydrangea ringspot virus           AY707100          6185              58.02           26   4.20  29.91
P13    Lagenaria mild mosaic virus        AB546335          3860              49.06           12   3.11  22.28
P14    Lettuce virus X                    AM745758          7212              54.63           23   3.19  22.32
P15    Lily virus X                       AJ633822          5823              53.46           17   2.92  19.23
P16    Malva mosaic virus                 FZ416760          6858              44.8            23   3.35  22.16
P17    Mint virus X                       AY789138          5914              59.18           19   3.21  21.47
P18    Narcissus mosaic virus             AY225449          6956              47.25           26   3.74  24.73
P19    Nerine virus X                     HQ166713          6581              48.97           19   2.89  18.69
P20    Opuntia virus X                    AY366209          6663              49.87           11   1.65  10.96
P21    Papaya mosaic virus                D13957            6656              47.92           24   3.61  27.49
P22    Pepino mosaic virus                JX866666          6412              40.72           14   2.18  13.57
P23    Phaius virus X                     AB353071          5816              50.91           12   2.06  12.90
P24    Plantago asiatica mosaic virus     AB360796          6102              56.58           29   4.75  31.63
P25    Potato aucuba mosaic virus         S73580            7059              43.63           20   2.83  18.70
P26    Potato virus X                     M72416            7568              47.97           26   3.44  23.39
P27    Schlumbergera virus X              AY366207          6633              47.13           31   4.67  31.96
P28    Strawberry mild yellow edge virus  AJ577359          5970              53.11           13   2.18  13.74
P29    Tamus red mosaic virus             JN389521          6495              48.72           19   2.93  18.63
P30    Tulip virus X                      AB066288          6056              57.11           18   2.97  19.65
P31    White clover mosaic virus          X06728            5845              44.37           28   4.79  32.51
P32    Zygocactus virus X                 AY366208          6624              53.56           20   3.02  19.93
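The RA column of Table 1 can be recomputed from the SSR count and genome size, and the same two columns are enough to rerun a simple linear regression of the kind described in Section 2.3. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the data are copied from Table 1, and the coefficient of determination is computed with plain Python rather than with the spreadsheet software used in the study, so small numerical differences from the reported statistics are possible.

```python
from statistics import mean

# Genome size (bp) and SSR count for P1..P32, copied from Table 1.
GENOME_BP = [7118, 7009, 6607, 6935, 6366, 6614, 6376, 7015, 6224, 6151, 6431,
             6185, 3860, 7212, 5823, 6858, 5914, 6956, 6581, 6663, 6656, 6412,
             5816, 6102, 7059, 7568, 6633, 5970, 6495, 6056, 5845, 6624]
SSR_COUNT = [22, 26, 29, 30, 23, 24, 26, 27, 20, 18, 16, 26, 12, 23, 17, 23,
             19, 26, 19, 11, 24, 14, 12, 29, 20, 26, 31, 13, 19, 18, 28, 20]


def relative_abundance(count, genome_bp):
    """Number of repeats per kb of genome (hypothetical helper name)."""
    return count / (genome_bp / 1000.0)


def r_squared(xs, ys):
    """Coefficient of determination for a simple linear regression of ys on xs.

    For a single predictor this equals the squared Pearson correlation.
    """
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # RA for P1: 22 SSRs in a 7118 bp genome.
    print(round(relative_abundance(SSR_COUNT[0], GENOME_BP[0]), 2))  # 3.09
    # R^2 of SSR incidence against genome size across all 32 genomes.
    print(r_squared(GENOME_BP, SSR_COUNT))
```

RD cannot be recomputed in the same way because it depends on the total repeat length per genome, which Table 1 does not list.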


neither the smallest genome (P13) has the lowest incidence nor does the largest genome (P1) account for the highest incident frequency.

3.2. Relative abundance and relative density

Owing to the variable incident frequencies, we examined the relative abundance (RA) and relative density (RD) of microsatellites. RA is the number of microsatellites present per kb of the genome, whereas RD is the total length (bp) covered by microsatellites per kb of the genome. The distribution of RA and RD across the genomes is summarized in Table 1 and Fig. 1B–C for SSRs, and in Table 2 and Fig. 2B–C for cSSRs. For SSRs, P31 had the highest RA and RD, 4.79 and 32.51 respectively, closely followed by P27 with respective values of 4.67 and 31.96. The lowest observed values for RA and RD were 1.65 and 10.06 respectively, in species P20 (Table 1, Supplementary Table 1). For cSSRs, RA ranged from zero in ten species to a maximum of 0.45 in P3, which also had the highest RD of 6.81 (Table 2, Supplementary Table 2).

Fig. 1. Analysis of SSRs (A) distribution of SSRs; (B) relative abundance: SSRs present per kb of genome; (C) relative density: Total length covered by SSR per kb of genome.

3.3. cSSR% and complexity

Further, we studied what percentage of the SSRs were present as part of a cSSR, a value known as cSSR%. Across the 22 species that harbored cSSRs, the cSSR% ranged from 3.57 (P31) to 11.76 (P15) (Table 2). We also examined the number of individual microsatellites present in a compound microsatellite, which determines the cSSR complexity; it ranged from 1 to 2 (Table 2, Supplementary Table 2). These results strongly suggest a divergent presence and distribution of cSSRs across the genomes.

3.4. Effect of increasing dMAX on incidence of cSSRs

For IMEx, the dMAX value can be between 0 and 50 (Mudunuri and Nagarajaram, 2007). We studied the effect of increasing dMAX on cSSR incidence in five randomly selected genomes (T1, T8, T16, T24 and T32) and, as expected, observed an increase in cSSR% with higher dMAX (Fig. 3).
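The extraction settings of Section 2.2 and the dMAX dependence described above can be illustrated with a small scanner. This is a minimal sketch, not IMEx itself: the function names are hypothetical, only perfect repeats are detected, motifs that are themselves repeats of a shorter unit are skipped, and two SSRs are assumed to form a compound when the gap between them is at most dMAX (the exact boundary condition in IMEx is an assumption here).

```python
# Minimum copy number per motif length (mono- to hexa-nucleotide), as in Section 2.2.
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}
NUCLEOTIDES = set("ACGTU")


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit (e.g. 'AA', 'ATAT')."""
    size = len(motif)
    return not any(size % k == 0 and motif == motif[:k] * (size // k)
                   for k in range(1, size))


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for size, need in sorted(min_repeats.items()):
        i = 0
        while i + size * need <= len(seq):
            motif = seq[i:i + size]
            if set(motif) <= NUCLEOTIDES and is_primitive(motif):
                copies = 1
                while seq[i + copies * size:i + (copies + 1) * size] == motif:
                    copies += 1
                if copies >= need:
                    end = i + copies * size
                    hits.append((i, end, motif, copies))
                    i = end  # resume scanning after this repeat
                    continue
            i += 1
    hits.sort()
    return hits


def group_compound(ssrs, dmax=10):
    """Group SSRs (sorted by start) whose separating gap is at most dmax bases."""
    groups, current = [], []
    for hit in ssrs:
        if current and hit[0] - current[-1][1] <= dmax:
            current.append(hit)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [hit]
    if len(current) > 1:
        groups.append(current)
    return groups


if __name__ == "__main__":
    toy = "CAAAAAAGTCACACACGT"  # made-up sequence, not from any studied genome
    ssrs = find_perfect_ssrs(toy)
    for dmax in (1, 10):
        print(dmax, len(group_compound(ssrs, dmax)))
```

In the toy sequence the mononucleotide and dinucleotide repeats are two bases apart, so they are reported as one compound repeat at a dMAX of 10 but not at a dMAX of 1, which mirrors the trend of increasing cSSR incidence with larger dMAX.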

3.5. Correlation studies

We tested the SSR data for correlation between genome size/GC content and number/relative abundance/relative density of SSRs and cSSRs. Incidence of SSRs is non-significantly correlated (R² = 0.22, P > 0.01) with genome size and GC content (R² = 0.0003, P > 0.1). Similarly, relative density (R² = 0.009, P > 0.01) and relative abundance (R² = 0.002, P > 0.01) were non-significantly correlated with genome size, as well as with GC content (R² = 0.007, P > 0.01 and R² = 0.005, P > 0.01 respectively). However, since the genome sizes of the studied viruses are very similar to one another (with the exception of P13), a significant dependence was not expected. The regression analysis of cSSRs for cSSR% (R² = 0.002, P > 0.05), relative density (R² = 0.009, P > 0.05) and relative abundance (R² = 0.002, P > 0.05) shows a non-significant correlation with genome size. Similarly, GC content is non-significantly correlated with cSSR% (R² = 0.006, P > 0.05), relative density (R² = 0.042, P > 0.05) and relative abundance (R² = 0.021, P > 0.05).

3.6. Motif types in analyzed genomes

The extracted SSRs revealed the presence of mono- to penta-nucleotide repeat motifs, of which di-nucleotide motifs were the most predominant. This is in accordance with studies on other reported viral species. The di-nucleotides were followed by mono-nucleotide and tri-nucleotide repeat motifs (Fig. 4, Supplementary Table 1). As for the tetra- and penta-nucleotides, a total of 2 and 3 such motifs were observed respectively (Supplementary Table 1).

3.7. SSRs/cSSRs in coding regions

Table 2
Overview of compound microsatellites in potexvirus whole genome sequences.

S. no  SSR  cSSR(a)  RD(b)  RA(c)  cSSR%(d)
P1     22   0        0.00   0.00   0.00
P2     26   1        2.57   0.14   3.85
P3     29   3        6.81   0.45   10.34
P4     30   2        4.47   0.29   6.67
P5     23   1        2.83   0.16   4.35
P6     24   0        0.00   0.00   0.00
P7     26   2        4.86   0.31   7.69
P8     27   0        0.00   0.00   0.00
P9     20   2        5.30   0.32   10.00
P10    18   0        0.00   0.00   0.00
P11    16   1        2.95   0.16   6.25
P12    26   2        5.34   0.32   7.69
P13    12   0        0.00   0.00   0.00
P14    23   1        2.63   0.14   4.35
P15    17   2        5.32   0.34   11.76
P16    23   1        2.77   0.15   4.35
P17    19   1        2.87   0.17   5.26
P18    26   2        3.74   0.29   7.69
P19    19   1        1.67   0.15   5.26
P20    11   0        0.00   0.00   0.00
P21    24   1        2.70   0.15   4.17
P22    14   1        1.87   0.16   7.14
P23    12   0        0.00   0.00   0.00
P24    29   3        7.21   0.49   10.34
P25    20   1        1.70   0.14   5.00
P26    26   1        1.59   0.13   3.85
P27    31   2        4.82   0.30   6.45
P28    13   0        0.00   0.00   0.00
P29    19   0        0.00   0.00   0.00
P30    18   1        2.97   0.17   5.56
P31    28   1        2.57   0.17   3.57
P32    20   0        0.00   0.00   0.00

(a) Number of compound microsatellites.
(b) Relative density is defined as the total length (bp) contributed by each compound microsatellite per kb of sequence analyzed.
(c) Relative abundance: number of compound microsatellites present per kb of the genome.
(d) cSSR% is the percentage of individual microsatellites being part of a compound microsatellite.
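As a quick arithmetic check on Table 2, the tabulated cSSR% values appear to equal the number of cSSRs divided by the number of SSRs, multiplied by 100. The sketch below is an illustration under that assumption only; the function name is hypothetical and the rows are a subset copied from Table 2.

```python
# (SSR count, cSSR count, tabulated cSSR%) for selected species, from Table 2.
ROWS = {
    "P2": (26, 1, 3.85),
    "P3": (29, 3, 10.34),
    "P4": (30, 2, 6.67),
    "P9": (20, 2, 10.00),
    "P15": (17, 2, 11.76),
    "P31": (28, 1, 3.57),
}


def cssr_percent(ssr_count, cssr_count):
    """cSSR% taken as 100 * cSSR count / SSR count (assumed from the table values)."""
    if ssr_count == 0:
        return 0.0
    return 100.0 * cssr_count / ssr_count


if __name__ == "__main__":
    for name, (ssr, cssr, tabulated) in ROWS.items():
        # Print the recomputed value next to the tabulated one for comparison.
        print(name, round(cssr_percent(ssr, cssr), 2), tabulated)
```

For these rows the recomputed values match the table to two decimal places, for example 3 cSSRs among 29 SSRs in P3 gives 10.34.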

We further ascertained the differential distribution of SSRs across coding and non-coding regions. The SSRs were predominantly present in the coding region spanning three genes, namely RNA dependent RNA polymerase (RDRP), triple gene block 1, 2 and 3 (TGB) and coat protein. The RDRP gene houses 7 of every 10 observed SSRs, while ~5% of the SSRs were present in the non-coding region (Fig. 5A). A similar distribution was observed for cSSRs, with nearly 80% present on the RDRP gene and a minuscule ~3% in the non-coding regions (Fig. 5B). Further, we analyzed the differential distribution of mono-, di- and tri-nucleotide motifs across the coding and non-coding regions and observed that the RDRP gene accounted for ~70% of each motif type (Fig. 6).

4. Discussion

The fact that a bacterial genome can be sequenced in a day has led to a tremendous surge in available genomic sequences, with the thousandth prokaryotic genome sequenced a few years back (Flicek and Birney, 2009; Lagesen et al., 2010; Reeves et al., 2009). This accumulation of data has led to an increased and sustained effort to unravel genome complexities beyond the nucleotide sequence. One such effort has been to discern the distribution of the various types of repetitive sequences present across the genome. Of the repetitive sequences, SSRs are increasingly being reported across different species, in both coding and non-coding regions, along with evidence of their involvement at the functional level. The presence of SSRs in genomes thus assumes both clinical and evolutionary significance. Moreover, their presence and variation have been studied in reference to genome evolution in 257 viruses, revealing a strong correlation between type of repeat, genome size and host adaptability (Zhao et al., 2012). Thus, owing to their ubiquitous presence and instability, SSRs play a positive role in adaptive evolution both within and across species.

Herein we explored 32 potexvirus genomes for the presence, abundance, and composition of SSRs and observed a total of 691 SSRs and 33 cSSRs. Though SSRs were observed in all the studied genomes, their incident frequency varied from 11 to 30. The SSR incidence is proportional to genome size, with the tobamovirus having 11–36 SSRs, lower than potyviruses (23–45 SSRs) (Xiangyan et al., 2011) or Human immunodeficiency virus isolates (22–48 SSRs) (Chen et al., 2009) but higher


Fig. 2. Analysis of cSSRs (A) analysis of cSSR% — Percentage of individual microsatellites being part of a compound microsatellite; (B) relative abundance of cSSR — Number of cSSR present per kb of genome; (C) relative density of cSSR — total length covered by cSSR per kb of genome.

than geminivirus (4–19 SSRs) (George et al., 2012). Notably, no significant correlation was observed between genome size and relative density/relative abundance of microsatellites, consistent with observations in Escherichia coli and HIV-1. Though the significance of cSSRs in potexviruses is unclear, they are possibly associated with the regulation of gene expression (Chen et al., 2011). Most cSSRs comprised two motifs only. The largest cSSR was composed of two motifs, compared with 4 motifs in prokaryotes and >8 motifs in eukaryotes. Surprisingly, 10 potexvirus species did not possess any cSSR. This may be due to the non-evolving nature of these species, because in virus species where cSSRs are present, large numbers of strains and isolates have been identified, suggesting that cSSRs have a positive impact on the evolution of viruses towards better acclimatization to the given conditions.


Fig. 3. Frequency of cSSR-% (percentage of individual microsatellites being part of a compound microsatellite) in relation to varying dMAX (10 to 50) across five randomly selected potexvirus species.

The majority of the SSRs and compound microsatellites were found in coding regions, as has been reported in eukaryotes (Metzgar et al., 2000; Tóth et al., 2000) and some prokaryotes (Gur-Arie et al., 2000; Li et al., 2004), possibly due to increased selection in coding regions (Ellegren, 2004; Karaoglu et al., 2005). However, microsatellite accumulation in the coding regions of viral genomes can also be attributed to high coding density (Chen et al., 2009; George et al., 2012). SSRs present in protein-coding regions are known to be associated with functions such as social behavior in voles, sporulation efficiency and cell adhesion in yeast, skeletal morphology in domestic dogs and adaptive

Fig. 4. Average distribution of (A) mono- or di-nucleotide repeat motifs and (B) tri-nucleotide repeat motifs.


Fig. 5. Differential distribution of (A) SSRs (%) and (B) cSSRs (%) in coding/non-coding regions of potexvirus.

divergence in barley and wheat populations (Kashi and King, 2006). Though information about microsatellite sequences is increasing exponentially, our understanding of the complexities involved in their diversity and the resultant impact on functional attributes is far from satisfactory. An in-depth analysis of viral repetitive sequences in relation to their evolution, host adaptability and regulatory mechanisms

Fig. 6. Distribution of mono-, di- and tri-nucleotide SSR motifs (%) across coding/non-coding regions of potexvirus.


may provide the platform for ascertaining its impact on genome diversity, complexities and function.

5. Conclusion

The complete relevance of our findings would help in the understanding of the functional and evolutionary significance of viral repeat sequences. However, repetitive sequences are accepted hot spots for recombination, with recombination enzymes having high affinity for di-nucleotide repeat sequences (Biet et al., 1999). The distribution bias observed for SSRs and cSSRs at three loci, namely RDRP, TGB and coat protein, suggests their involvement in recombination, leading to sequence diversity and ultimately host adaptation. More detailed study of compound microsatellites in divergent viral genomes will pave the way for understanding complex biological features such as changes in virulence and their emergence as new epidemics.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gene.2014.01.007.

Conflict of interests

The authors declare that they have no conflicts of personal, communication or financial interests.

Acknowledgments

We thank the Department of Botany, Patna University and Department of Biomedical Sciences, Shaheed Rajguru College of Applied Sciences for Women, University of Delhi for the financial and infrastructural support provided.

References

Biet, E., Sun, J., Dutreix, M., 1999. Conserved sequence preference in DNA binding among recombination proteins: an effect of ssDNA secondary structure. Nucleic Acids Res. 27, 596–600.
Borstnik, B., Pumpernik, D., 2002. Tandem repeats in protein coding regions of primate genes. Genome Res. 12, 909–915.
Bull, L.N., Pabon-Pena, C.R., Freimer, N.B., 1999. Compound microsatellite repeats: practical and theoretical features. Genome Res. 9, 830–838.
Chambers, G.K., MacAvoy, E.S., 2000. Microsatellites: consensus and controversy. Comp. Biochem. Physiol. Biochem. Mol. Biol. 126, 455–476.
Chen, M., et al., 2009. Similar distribution of simple sequence repeats in diverse completed Human Immunodeficiency Virus Type 1 genomes. FEBS Lett. 583, 2959–2963.
Chen, M., et al., 2011. Compound microsatellites in complete Escherichia coli genomes. FEBS Lett. 585, 1072–1076.
Chen, M., Tan, Z., Zeng, G., Zhuotong, Z., 2012. Differential distribution of compound microsatellites in various Human Immunodeficiency Virus Type 1 complete genomes. Infect. Genet. Evol. 12, 1452–1457.
Coenye, T., Vandamme, P., 2005. Characterization of mononucleotide repeats in sequenced prokaryotic genomes. DNA Res. 12, 221–233.
Deback, C., et al., 2009. Utilization of microsatellite polymorphism for differentiating herpes simplex virus type 1 strains. J. Clin. Microbiol. 47, 533–540.
Di Prospero, N.A., Fischbeck, K.A., 2005. Therapeutic development for triplet repeat expansion diseases. Nat. Rev. Genet. 6, 756–765.
Dieringer, D., Schlotterer, C., 2003. Two distinct modes of microsatellite mutation processes: evidence from the complete genomic sequences of nine species. Genome Res. 13, 2242–2251.
Dushlaine, C.T.O., Edwards, R.J., Park, S.D., Shields, D.C., 2005. Tandem repeat copy number variation in protein-coding regions of the human genes. Genome Biol. 6, R69.
Ellegren, H., 2004. Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet. 5, 435–445.
Flicek, P., Birney, E., 2009. Sense from sequence reads: methods for alignment and assembly. Nat. Methods 6 (Suppl. 11), S6–S12.
George, B., Mashhood, A.C., Jain, S.K., Sharfuddin, C., Chakraborty, S., 2012. Differential distribution and occurrence of simple sequence repeats in diverse geminivirus genomes. Virus Genes 45, 556–566.
Gur-Arie, R., Cohen, C.J., Eitan, Y., 2000. Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism. Genome Res. 10, 62–71.
Huang, Y.L., Han, Y.T., Chang, Y.T., Hsu, Y.H., Meng, M., 2004. Critical residues for GTP methylation and formation of the covalent m7GMP-enzyme intermediate in the capping enzyme domain of bamboo mosaic virus. J. Virol. 78, 1271–1280.
Huisman, M.J., Linthorst, H.J., Bol, J.F., Cornelissen, J.C., 1988. The complete nucleotide sequence of potato virus X and its homologies at the amino acid level with various plus-stranded RNA viruses. J. Gen. Virol. 69, 1789–1798.
Jeffreys, J., Murray, J., Neumann, R., 1998. High-resolution mapping of crossovers in human sperm defines a minisatellite-associated recombination hotspot. Mol. Cell 2, 267–273.
Karaoglu, H., Lee, C.M., Meyer, W., 2005. Survey of simple sequence repeats in completed fungal genomes. Mol. Biol. Evol. 22, 639–649.
Kashi, Y., King, D.G., 2006. Simple sequence repeats as advantageous mutators in evolution. Trends Genet. 22, 253–259.
Kelkar, Y.D., Tyekucheva, S., Chiaromonte, F., Makova, K.D., 2008. The genome-wide determinants of human and chimpanzee microsatellite evolution. Genome Res. 18, 30–38.
King, A.M.Q., Adams, M.J., Carstens, E.B., Lefkowitz, E.J., 2012. Virus Taxonomy: Classification and Nomenclature of Viruses. Ninth Report of the International Committee on Taxonomy of Viruses. Elsevier, San Diego.
Kofler, R., Schlotterer, C., Luschutzky, E., Lelley, T., 2008. Survey of microsatellite clustering in eight fully sequenced species sheds light on the origin of compound microsatellites. BMC Genomics 9, 612.
Lagesen, K., Ussery, D.W., Wassenaar, T.M., 2010. Genome update: the 1000th genome - a cautionary tale. Microbiology 156, 603–608.
Li, Y.C., Korol, A.B., Fahima, T., Nevo, E., 2004. Microsatellites within genes: structure, function, and evolution. Mol. Biol. Evol. 21, 991–1007.
Metzgar, D., Bytof, J., Wills, C., 2000. Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Res. 10, 72–80.
Mrazek, J., Guo, X., Shah, A., 2007. Simple sequence repeats in prokaryotic genomes. Proc. Natl. Acad. Sci. U. S. A. 104, 8472–8477.
Mudunuri, S.B., Nagarajaram, H.A., 2007. IMEx: imperfect microsatellite extractor. Bioinformatics 23, 1181–1187.
Picone, O., Ville, Y., Costa, J.M., Rouzioux, C., Leruez-Ville, M., 2005. Human cytomegalovirus (HCMV) short tandem repeats analysis in congenital infection. J. Clin. Virol. 32, 254–256.
Reeves, G.A., Talavera, D., Thornton, J.M., 2009. Genome and proteome annotation: organization, interpretation and integration. J. R. Soc. Interface 6, 129–147.
Richards, R.I., 2001. Dynamic mutations: a decade of unstable expanded repeats in human genetic disease. Hum. Mol. Genet. 10, 2187–2194.
Rocha, E.P., Matic, I., Taddei, F., 2002. Over-expression of repeats in stress response genes: a strategy to increase versatility under stressful conditions? Nucleic Acids Res. 30, 1886–1894.
Santa Cruz, S., Roberts, A.G., Prior, D.A., Chapman, S., Oparka, K.J., 1998. Cell-to-cell and phloem-mediated transport of potato virus X. The role of virions. Plant Cell 10, 495–510.
Sutherland, G.R., Richards, R.I., 1995. Simple tandem repeats and human genetic disease. Proc. Natl. Acad. Sci. U. S. A. 92, 3636–3641.
Tóth, G., Gáspári, Z., Jurka, J., 2000. Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 10, 967–981.
Usdin, K., 2008. The biological effects of simple tandem repeats: lessons from the repeat expansion diseases. Genome Res. 18, 1011–1019.
Verchot-Lubicz, J., 2005. A new model for cell-to-cell movement of potexviruses. Mol. Plant Microbe Interact. 18, 283–290.
Vergnaud, G., Denoeud, F., 2000. Minisatellites: mutability and genome architecture. Genome Res. 10, 899–907.
Weber, J.L., 1990. Informativeness of human (dC-dA)n. (dG-dT)n polymorphisms. Genomics 7, 524–530.
Xiangyan, Z., et al., 2011. Microsatellites in different Potyvirus genomes: survey and analysis. Gene 488, 52–56.
Zhao, X., et al., 2012. Coevolution between simple sequence repeats (SSRs) and virus genome size. BMC Genomics 13, 435.
