Interdiscip Sci Comput Life Sci (2014) 6: 149–157 DOI: 10.1007/s12539-013-0024-9

Whole Genome Phylogeny of Prochlorococcus marinus Group of Cyanobacteria: Genome Alignment and Overlapping Gene Approach

Ratna Prabha1,4, Dhananjaya P. Singh1,∗, Shailendra K. Gupta2, Anil Rai3
1 (National Bureau of Agriculturally Important Microorganisms, Indian Council of Agricultural Research, Kushmaur, Maunath Bhanjan 275103, India)
2 (CSIR-Indian Institute of Toxicology Research, 80, Mahatma Gandhi Marg, Kaisarbagh, Lucknow 226001, India)
3 (Indian Agricultural Statistical Research Institute, Indian Council of Agricultural Research, Pusa, New Delhi 110012, India)
4 (Department of Biotechnology, Mewar University, Gangrar, Chittorgarh, Rajasthan, India)

Received 2 July 2013 / Revised 21 October 2013 / Accepted 10 January 2014

Abstract: Prochlorococcus is the smallest known oxygenic phototrophic marine cyanobacterium and dominates the mid-latitude oceans. Physiologically and genetically distinct P. marinus isolates from many oceans of the world have been assigned to two different groups: a tightly clustered high-light (HL)-adapted clade and a divergent low-light (LL)-adapted clade. Phylogenetic analysis of this cyanobacterium on the basis of 16S rRNA and other conserved genes did not show consistency with its phenotypic behavior. We analyzed the phylogeny of this genus on the basis of complete genome sequences through genome alignment and an overlapping-gene content and gene-order approach. The phylogenetic tree of P. marinus obtained by comparing whole genome sequences, in contrast to that based on the 16S rRNA gene, corresponded well with the HL/LL ecotypic distinction of twelve strains and was consistent with the phenotypic classification of P. marinus. Evidence for the horizontal descent and acquisition of genes within and across the genus was observed. Many genes involved in metabolic functions were found to be conserved across these genomes, and many were continuously gained by different strains as per their needs during the course of their evolution. Consistency between the physiological and genetic phylogeny based on whole genome sequences is established. These observations improve our understanding of the adaptation and diversification of these organisms under evolutionary pressure.

Key words: Prochlorococcus marinus, evolution, whole genome phylogeny, genome alignment, overlapping-gene content, horizontal gene transfer.

∗ Corresponding author. E-mail: dpsfarm@rediffmail.com

1 Introduction

Prochlorococcus is one of the most abundant unicellular marine photosynthetic cyanobacteria. It contributes 30-80% of the total photosynthesis in the low- to mid-latitude oceans and plays a significant role in global carbon cycling and climate regulation (Bryant, 2003; Paul et al., 2010). Prochlorococcus characteristically lacks phycobilisomes and uses a divinyl chlorophyll-binding protein in its unique light-harvesting machinery (Partensky et al., 1999). On the basis of vertical niche partitioning and pigment composition, members of Prochlorococcus are divided into two ecotypes. High-light (HL) adapted ecotypes are most abundant near the water surface, whereas low-light (LL) adapted strains dominate deeper water. These organisms are found from a depth of 5 m (MED4, MIT9215) up to a depth of 135 m (MIT9312, MIT9313) (Kettler et al., 2007; Luo et al., 2008; Paul et al., 2010). HL strains occupy the upper, well-illuminated but nutrient-poor layer of water, while LL strains thrive in dimmer light and a nutrient-rich environment (Dufresne et al., 2003). With their simple physiological and metabolic patterns, characteristic light-harvesting machinery and small genome size, these organisms represent an excellent biological system for exploring evolutionary diversification and ecological variation in the natural environment.

Prochlorococcus marinus is the smallest known oxygenic phototroph, with a relatively small genome (1.7 to 2.7 Mb) and a GC content of 30.8 to 50.7% (Kettler et al., 2007). Niche differentiation, molecular sequence data and physiological studies revealed complexity and inconsistency in the distribution of HL/LL ecotypes. The 16S rRNA phylogeny did not show consistency with the differences among the isolates because of the global diversification and adaptation mechanisms. This is why various attempts were made to decipher the phylogeny of this organism using diverse approaches (Kettler et al., 2007; Luo et al., 2008; Moore et al., 1998; Rocap et al., 2002). Phylogenies derived from single-gene comparisons (such as the 16S rRNA gene) are inconsistent with each other because of factors such as horizontal gene transfer (HGT), paralogy and highly variable rates of evolution (Snel et al., 1999). The weakness of using any single gene such as 16S rRNA, or a group of genes conserved across the genomes, lies in the assumption of vertical descent, which is not followed by all genes (Eisen, 2000). The footprints of evolutionary history are considered to be spread across the whole genome of an organism rather than limited to genes or other short, highly conserved sequences (Sims et al., 2009). Evolutionary analysis of microbes using complete genome sequences therefore provides a clearer picture and allows the identification of a complex combination of factors such as vertical descent, gene transfer, gene and genome duplication, gene invention, gene loss and degradation, recombination, convergence and selection that affect almost every lineage of a tree (Eisen, 2000).

Usually, three different approaches are available for whole genome phylogenetic analysis: i) information contained in the form of gene content or gene order; ii) sequence statistics such as the occurrence of k-mers; and iii) genomic sequence comparison (Eisen, 2000; Snel et al., 1999; Vishnoi et al., 2010). All these approaches have their own limitations. Gene content does not always reflect true evolutionary history, as it is affected by factors such as the loss and duplication of genes, and may lead to discrepancies in the phylogeny of both closely and distantly related genomes (Eisen, 2000; Jiang et al., 2008).
Gene order is not able to provide sufficient resolution (Jiang et al., 2008; Suyama and Bork, 2001; Vishnoi et al., 2010; Wolf et al., 2001). Sequence statistics is an alignment-free method for phylogenetic analysis and cannot be used to understand the biological basis of evolution, as it does not consider evolutionary mechanisms in the construction of genome trees (Qi et al., 2004). Overlapping genes (OGs), i.e. adjacent genes whose coding sequences are conserved among species and overlap partially or entirely, have also been suggested as phylogenomic markers because they are ubiquitous in microbial genomes and their number shows a strong relationship with the total number of genes (Luo et al., 2006). This approach was further modified by Jiang et al. (2008), who suggested that not only OG content but also OG order is important. This is why the comparison of the entire genome, instead of a small portion of it, for drawing phylogenetic inference is gaining attention. A major reason for whole-genome-based phylogenetic inference is that nucleotide composition is relatively constant within a species, so aligning complete genomes can lead to the identification of foreign genes and gives information about HGT (Eisen, 2000; Lawrence and Ochman, 1997; Lawrence and Ochman, 1998).

The phylogeny of the genus Prochlorococcus has been proposed on the basis of random concatenation of one hundred protein sequences sampled from the core genome and gene content (Kettler et al., 2007) and on the basis of gene order (Luo et al., 2008). Here, we propose a phylogeny of P. marinus on the basis of entire genomic sequence comparison, for which two different approaches were used: genome alignment (Henz et al., 2005) and the consideration of overlapping-gene content and gene order (Jiang et al., 2008). By comparing the whole genome phylogeny and the 16S rRNA phylogeny with earlier reports (Kettler et al., 2007; Luo et al., 2008), we show that the whole genome phylogeny is consistent with the phenotypic and genotypic HL/LL distinction in P. marinus strains. We also discuss the underlying reasons for the genome-wide diversification and adaptation.
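The notion of overlapping genes used above can be made concrete with a small sketch. The code below is only an illustration of the definition (adjacent genes whose coordinates overlap partially or entirely), not the OGtree implementation; the gene record layout, the 1-based inclusive coordinates and the function name `overlapping_pairs` are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical gene record: (name, start, end, strand).
# Coordinates are assumed to be 1-based and inclusive.
Gene = Tuple[str, int, int, str]


def overlapping_pairs(genes: List[Gene]) -> List[Tuple[str, str, int]]:
    """Return (left gene, right gene, overlap length) for adjacent genes that overlap."""
    # Sort by start (then end) so that neighbours in the list are neighbours on the genome.
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        # Overlap length of two closed intervals; positive only if they share positions.
        overlap = min(left[2], right[2]) - right[1] + 1
        if overlap > 0:
            pairs.append((left[0], right[0], overlap))
    return pairs


# Toy annotation (invented coordinates, for illustration only).
toy_genes = [
    ("geneA", 1, 100, "+"),
    ("geneB", 97, 200, "+"),   # overlaps geneA partially
    ("geneC", 250, 300, "-"),
    ("geneD", 260, 280, "+"),  # lies entirely inside geneC
]
toy_pairs = overlapping_pairs(toy_genes)
```

On the toy annotation, `geneA`/`geneB` share 4 positions and `geneD` is fully contained in `geneC`. The set of such pairs per genome corresponds to the OG content, and their sequence along the chromosome to the OG order.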

2 Materials and Methods

2.1 Nucleotide sequences

All twelve complete genome sequences available for P. marinus, along with their 16S rRNA gene sequences, were obtained from the NCBI FTP site (ftp://ftp.ncbi.nih.gov/genomes/bacteria/).

2.2 Distance estimation

MUMmer (Delcher et al., 2002), as provided by the GGDC web server (Auch et al., 2010), was used to infer intergenomic distances between complete genomes. Default parameters were used for MUMmer, which implements the coverage algorithm with a distance function and 100% identity. OGtree (Jiang et al., 2008) was used to compute the OG distance between genomes, taking into account both OG content and OG order (http://bioalgorithm.life.nctu.edu.tw/OGtree/). The default OGtree parameters were used: a weight of 1 for overlapping-gene order and gene content, an E-value threshold of 1e-9 and an alignment coverage threshold of 80% in each sequence. MUSCLE (Edgar, 2004) was used for alignment of the 16S rRNA gene.

2.3 Phylogenetic analysis

Neighbor-Net (Bryant and Moulton, 2004), as implemented in SplitsTree (Huson, 1998), was used for phylogenetic network construction for complete genomes. Genome trees have been found to influence the phylogenetic classification of bacteria (Snel et al., 2005). The phylogeny of the 16S rRNA gene was constructed with MEGA 5.0 (Tamura et al., 2011) using Neighbor-Joining reconstruction with 100 bootstrap iterations, and the tree was displayed using TreeDyn (Chevenet et al., 2006).
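The bootstrap step mentioned above can be sketched as follows. This is a minimal illustration of column resampling on an aligned set of sequences together with a simple pairwise distance, not the MEGA implementation. The substitution model used in the study is not stated, so the uncorrected p-distance here is an assumption, as are the function names and the toy alignment.

```python
import random
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore positions where either sequence has a gap
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances keyed by (name_i, name_j)."""
    names = sorted(alignment)
    return {(i, j): p_distance(alignment[i], alignment[j]) for i in names for j in names}


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement, keeping the alignment length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


# Toy alignment (invented sequences, for illustration only).
toy_alignment = {"s1": "ACGT", "s2": "ACGT", "s3": "ACGA"}
rng = random.Random(0)
replicates = [bootstrap_replicate(toy_alignment, rng) for _ in range(100)]
matrices = [distance_matrix(rep) for rep in replicates]
```

Each replicate matrix would then be passed to a tree-building method such as Neighbor-Joining, and the support for a clade is the fraction of replicate trees in which that clade appears.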

2.4 Identification of conserved regions across genomes

All the genomes were aligned using progressiveMauve with default parameters (Darling et al., 2010). During genome alignment, Mauve uses a breakpoint elimination algorithm to identify genome sequence regions, looks for local multiple alignments and computes a phylogenetic guide tree. A subset of the local alignments is then used as anchors for recursive anchoring, so that additional alignment anchors can be identified and progressive alignment is performed (Darling et al., 2007).

2.5 Identification of ortholog, paralog and horizontally transferred genes

The number and percentage of orthologs, paralogs and horizontally transferred genes in each genome were identified using the Integrated Microbial Genomes (IMG) system (http://img.jgi.doe.gov/cgi-bin/w/main.cgi), a resource for analyzing multiple microbial genomes in an integrated way (Markowitz et al., 2012).

3 Results and Discussion

Physiologically, the two major P. marinus ecotypes are the low-light (LL) adapted strains (MIT9313, MIT9303, CCMP1375 (SS120), MIT9211, NATL2A and NATL1A) and the high-light (HL) adapted strains (CCMP1986 (MED4), MIT9515, MIT9312, MIT9301, AS9601 and MIT9215). This evolutionary distinction is observed in P. marinus on the basis of genetic and physiological differences associated with light adaptation, source of nutrition and predator influence (Kettler et al., 2007; Paul et al., 2010; Rocap et al., 2003). Various genomic parameters of all 12 Prochlorococcus genome sequences are presented in Table 1. LL strains vary in genome size from 1.7 Mb (CCMP1375 (SS120) and MIT9211) to 2.7 Mb (MIT9303), while HL strains consistently possess a genome size of 1.7 Mb, with the exception of MIT9301 (1.6 Mb). GC content in LL strains varies from 34 to 50%, while HL strains have invariably low GC content (30-31%) (Table 1). HL strains have a slightly higher percentage of coding regions in their genomes than LL strains, whereas LL strains have more genes in total and more protein coding genes than HL strains (Table 1).

The number and percentage of genes identified as orthologs, paralogs or acquired through horizontal transfer in each of the twelve strains are shown in Table 2. In comparison to HL strains, LL strains show a higher percentage of paralogs and horizontally acquired genes. Across HL strains, the percentages of ortholog, paralog and horizontally transferred genes are consistent. The lowest number of horizontally transferred genes was found in NATL1A, whereas the highest number was found in MIT9303. Interestingly, both of these P. marinus genomes are members of the LL-adapted group.

Table 1  Information about general genomic features of Prochlorococcus marinus strains: genome size, GC content, percentage of coding region in the genome, total number of genes and number of protein coding genes

Ecotype                          Strain                        Genome size (Mb)  GC content (%)  Coding region in the genome (%)  Total number of genes  Protein coding genes
Low-light (LL) adapted strains   P. marinus MIT9313            2.4               50              81                               2330                   2269
                                 P. marinus MIT9303            2.7               50              84                               3136                   2997
                                 P. marinus CCMP1375 (SS120)   1.7               36              88                               1930                   1883
                                 P. marinus MIT9211            1.7               38              89                               1900                   1854
                                 P. marinus NATL2A             1.8               35              87                               2228                   2162
                                 P. marinus NATL1A             1.9               34              86                               2250                   2193
High-light (HL) adapted strains  P. marinus CCMP1986 (MED4)    1.7               30              87                               1763                   1717
                                 P. marinus MIT9515            1.7               30              88                               1964                   1905
                                 P. marinus MIT9312            1.7               31              89                               1856                   1810
                                 P. marinus MIT9301            1.6               31              90                               1962                   1906
                                 P. marinus AS9601             1.7               31              90                               1983                   1920
                                 P. marinus MIT9215            1.7               31              89                               2054                   1982
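The ecotype-level comparison drawn from Table 1 can be checked with a short calculation. The sketch below simply averages the coding percentage and the total gene count per ecotype using the Table 1 values; the tuple layout and the function name `ecotype_means` are assumptions made for this example.

```python
from statistics import mean

# (strain, ecotype, genome size in Mb, GC %, coding region %, total genes), from Table 1.
TABLE1 = [
    ("MIT9313", "LL", 2.4, 50, 81, 2330),
    ("MIT9303", "LL", 2.7, 50, 84, 3136),
    ("CCMP1375", "LL", 1.7, 36, 88, 1930),
    ("MIT9211", "LL", 1.7, 38, 89, 1900),
    ("NATL2A", "LL", 1.8, 35, 87, 2228),
    ("NATL1A", "LL", 1.9, 34, 86, 2250),
    ("CCMP1986", "HL", 1.7, 30, 87, 1763),
    ("MIT9515", "HL", 1.7, 30, 88, 1964),
    ("MIT9312", "HL", 1.7, 31, 89, 1856),
    ("MIT9301", "HL", 1.6, 31, 90, 1962),
    ("AS9601", "HL", 1.7, 31, 90, 1983),
    ("MIT9215", "HL", 1.7, 31, 89, 2054),
]


def ecotype_means(rows):
    """Mean coding percentage and mean total gene count for each ecotype."""
    summary = {}
    for ecotype in ("LL", "HL"):
        group = [row for row in rows if row[1] == ecotype]
        summary[ecotype] = {
            "coding_pct": mean(row[4] for row in group),
            "total_genes": mean(row[5] for row in group),
        }
    return summary


table1_summary = ecotype_means(TABLE1)
```

With the Table 1 values, the mean coding percentage is about 85.8% for LL and 88.8% for HL strains, and the mean gene count is about 2296 for LL and 1930 for HL strains, in line with the trend described in the text.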

Table 2  Number and percentage of genes identified as orthologs, paralogs and horizontally transferred in the genomes of all 12 strains

Ecotype                          Genome name                   Ortholog genes        Paralog genes         Horizontal transfer genes
                                                               Number   Percentage   Number   Percentage   Number   Percentage
Low-light (LL) adapted strains   P. marinus MIT9313            2216     94.94%       196      8.40%        60       2.57%
                                 P. marinus MIT9303            2719     86.79%       306      9.77%        229      7.31%
                                 P. marinus CCMP1375 (SS120)   1841     95.24%       139      7.19%        49       2.53%
                                 P. marinus MIT9211            1816     95.53%       110      5.79%        75       3.95%
                                 P. marinus NATL2A             2097     93.16%       222      9.86%        106      4.71%
                                 P. marinus NATL1A             1879     94.66%       183      9.22%        16       0.81%
High-light (HL) adapted strains  P. marinus CCMP1986 (MED4)    1705     96.55%       104      5.89%        35       1.98%
                                 P. marinus MIT9515            1870     95.02%       120      6.10%        69       3.51%
                                 P. marinus MIT9312            1786     96.02%       126      6.77%        64       3.44%
                                 P. marinus MIT9301            1865     94.81%       115      5.85%        51       2.59%
                                 P. marinus AS9601             1881     94.62%       122      6.14%        56       2.82%
                                 P. marinus MIT9215            1952     94.80%       154      7.48%        78       3.79%
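The statements drawn from Table 2 (which strains carry the fewest and the most horizontally transferred genes, and how the ecotypes compare on average) can be reproduced directly from the tabulated values. The sketch below is illustrative only; the tuple layout and the helper names are assumptions, and the percentages are taken as reported rather than recomputed.

```python
# (strain, ecotype, paralog %, horizontally transferred gene count, horizontal transfer %),
# copied from Table 2.
TABLE2 = [
    ("MIT9313", "LL", 8.40, 60, 2.57),
    ("MIT9303", "LL", 9.77, 229, 7.31),
    ("CCMP1375", "LL", 7.19, 49, 2.53),
    ("MIT9211", "LL", 5.79, 75, 3.95),
    ("NATL2A", "LL", 9.86, 106, 4.71),
    ("NATL1A", "LL", 9.22, 16, 0.81),
    ("CCMP1986", "HL", 5.89, 35, 1.98),
    ("MIT9515", "HL", 6.10, 69, 3.51),
    ("MIT9312", "HL", 6.77, 64, 3.44),
    ("MIT9301", "HL", 5.85, 51, 2.59),
    ("AS9601", "HL", 6.14, 56, 2.82),
    ("MIT9215", "HL", 7.48, 78, 3.79),
]


def hgt_extremes(rows):
    """Strains with the lowest and highest number of horizontally transferred genes."""
    lowest = min(rows, key=lambda row: row[3])
    highest = max(rows, key=lambda row: row[3])
    return lowest[0], highest[0]


def mean_percentage(rows, ecotype, column):
    """Mean of a percentage column (2 = paralogs, 4 = horizontal transfer) for one ecotype."""
    values = [row[column] for row in rows if row[1] == ecotype]
    return sum(values) / len(values)


extremes = hgt_extremes(TABLE2)
paralog_means = {e: mean_percentage(TABLE2, e, 2) for e in ("LL", "HL")}
hgt_means = {e: mean_percentage(TABLE2, e, 4) for e in ("LL", "HL")}
```

On these values, NATL1A and MIT9303 are the two extremes, and the LL means (about 8.4% paralogs, 3.6% horizontally transferred genes) exceed the HL means (about 6.4% and 3.0%). The large spread within the LL group (0.81% to 7.31% horizontally transferred genes) is worth keeping in mind when reading the averages.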

The phylogenetic tree prepared on the basis of the 16S rRNA gene did not correspond with the physiological differentiation of P. marinus ecotypes (Fig. 1). The phylogeny based on the 16S rRNA gene fails to resolve the distinction between the two groups and shows clades in which LL and HL strains are mixed. Fig. 2 shows the phylogenetic tree obtained through genome alignment, while that obtained by considering overlapping-gene content and gene order is shown in Fig. 3. Both of these trees corresponded well with the genotypic and phenotypic HL/LL distinction and clearly separated the twelve strains into two groups, one containing the LL strains and the other the HL strains. Fig. 4 shows an alignment of all the genomes of P. marinus with progressiveMauve using default parameters (Darling et al., 2010).

Fig. 1  Phylogenetic tree obtained by alignment of 16S rRNA gene sequences of 12 genomes.

Fig. 2  Phylogenetic tree obtained through genome alignment of all 12 genomes.

The phylogeny obtained through complete genomes (either genome alignment or overlapping-gene content and gene order) is similar to that obtained by considering gene order (Luo et al., 2008). All HL strains showed monophyly, whereas LL strains formed a paraphyletic cluster, suggesting multiple ancestors and parallel evolution. The clade including CCMP1375 and MIT9211 was resolved in both phylogenetic trees obtained through complete genome analysis and gene content and order, but it is not resolved when only the 16S rRNA gene (as shown by us) or a few other genes are considered (Luo et al., 2008). Furthermore, it is clear that the two LL strains NATL1A and NATL2A, which show divergence from other LL strains, originated from HL strains. Organisms sharing the same clade showed similarity in genomic features such as genome size, GC content and percentage of coding regions. Thus, the phylogeny obtained through complete genome sequences seems to be more accurate and is consistent with the phenotypic differentiation of this genus.

Fig. 3  Phylogenetic tree obtained by overlapping-gene content and overlapping-gene order analysis of 12 genomes.

Genome comparison identifies differences among organisms at the genome level, from which the biological significance of similarities or dissimilarities within related organisms can be drawn (Darling et al., 2007). In the genome alignment, 59 conserved regions or local colinear blocks (LCBs) were observed across all twelve strains (Fig. 4). Rearrangement in gene orientation is observed when HL strains are compared with LL strains. Strains MIT9313 and MIT9303 are quite similar to each other but diverge from the remaining ten strains. These two strains show the highest numbers of inverted genomic regions compared to the other members (Fig. 4), differ in other genomic parameters (such as size and GC content) and generally group together as a single clade in the above phylogenetic analyses. During evolution, organisms experience mutations in their genomes due to events such as rearrangement and lateral transfer of genes, which lead to significant variations in gene order and gene content, even among closely related species (Darling et al., 2007). Large-scale evolutionary processes such as lateral transfer, genome rearrangement, gene duplication and gene loss directly influence genomes, while local evolutionary factors influence individual genes (Darling et al., 2004). Due to these factors, genome arrangement has a large impact on an organism's phenotype and can also influence gene expression (Darling et al., 2008).

Horizontal gene transfer (HGT) is an important factor causing complexity in gene distribution and results in inconsistent phylogenies based on 16S rRNA (Kettler et al., 2007). Cyanobacterial genomes are reported to have acquired 9.5 to 16.6% of their genes through HGT (Nakamura et al., 2004), which is also suggested for the strains of P. marinus, since this species coexisted with Synechococcus (Kettler et al., 2007). Cyanophages, which infect cyanobacteria, are another important agent mediating the transfer and recombination of genes among marine organisms (Zhaxybayeva et al., 2006). Progressive genome decay and genome streamlining have been proposed as one line of evolutionary pattern for Prochlorococcus (Dufresne et al., 2003; Rocap et al., 2003). Although progressive gene loss is suggested across the lineage from Synechococcus to LL ecotypes to HL strains (Dufresne et al., 2003; Palenik et al., 2003; Rocap et al., 2003), the analysis of orthologs, paralogs and horizontally transferred genes in each genome using the Integrated Microbial Genomes (IMG) system (Markowitz et al., 2012) revealed that LL strains, especially MIT9313, MIT9303, NATL1A and NATL2A, possess a higher number of paralogs and horizontally transferred genes than HL ecotypes. It is suggested that the prominent LL strains MIT9303 (number of genes 3022) and MIT9313 (number of genes 2843) have gained a large number of genes, including many unique ones, even though their genome size is more or less similar to that of Synechococcus (Kettler et al., 2007). Although the differentiation of the Prochlorococcus genus involved the loss of a number of useful but dispensable core genes that were present in the genome of its last common ancestor with Synechococcus (Partensky and Garczarek, 2010), it is again evident that, compared to HL strains, certain LL strains acquired a significantly larger number of foreign genes and paralogs during the course of evolution.
Variation in gene gain is observed within LL strains, whereas HL strains follow an almost similar trend. It is reported that Prochlorococcus has lost the majority of genes involved in many biosynthetic functions (Hess et al., 2001) and that LL strains inhabiting the nutrient-rich environment could gain or retain certain genes for specific functions that are absent from HL ecotypes (Kettler et al., 2007). The majority of LL strains possess a bigger genome and a higher number of genes (Table 1) than HL strains, which supports the metabolic network needed for their environmental adaptation. It is interesting to note that even though gene loss has remained the dominant process during the evolution of the streamlined genomes of Prochlorococcus, the majority of these strains have continuously, but at a slower pace, acquired genes by lateral transfer from the coexisting microorganisms in their environment (Kettler et al., 2007).

Fig. 4  Alignment of all twelve genomes of P. marinus strains with progressiveMauve using default parameters. Each black horizontal line represents a genome sequence. High-scoring local alignments among the sequences are represented as rounded boxes. Inverted regions among the twelve genomes are depicted as blocks below a genome's center line. Conserved regions of the genome are represented as colored blocks and are called local colinear blocks (LCBs). Connecting lines between LCBs represent the complicated rearrangement landscape across the genomes of the dataset. Inverted LCBs appear on the bottom strand of the double-stranded genome.

In this study, we also aimed to identify genes that are shared by all the strains and genes that were gained over evolutionary time. For this purpose, we identified the pan genome of all twelve strains, which encompasses the core genome (conserved in all strains) and the dispensable genome (present in one or more but not in all genomes). The functional categories identified for genes in the core and dispensable genomes reveal that, across all the genomes, the major proportion of genes in both is involved in metabolic functions (carbohydrate transport and metabolism; nucleotide transport and metabolism; lipid transport and metabolism; inorganic ion transport and metabolism; coenzyme transport and metabolism; energy production and conversion; amino acid transport and metabolism; secondary metabolites biosynthesis, transport and catabolism), followed by genes involved in cellular processes and signaling functions (defense mechanisms; signal transduction mechanisms; cell wall/membrane/envelope biogenesis; posttranslational modification, protein turnover, chaperones; cell cycle control, cell division, chromosome partitioning; intracellular trafficking, secretion, and vesicular transport; cell motility; extracellular structures; cytoskeleton), and then in information storage and processing functions (translation, ribosomal structure and biogenesis; transcription; replication, recombination and repair; chromatin structure and dynamics). These three functional categories are the most affected in the course of evolution and are subject to change. Furthermore, genes for chromatin structure and dynamics (present in AS9601, MIT9515, MIT9312, MIT9301 and CCMP1986 (HL strains); NATL2A and NATL1A (LL strains)), cytoskeleton (present only in AS9601 (HL strain)) and extracellular structures (present in AS9601 (HL strain) and NATL2A, NATL1A (LL strains)) are not identified in the core genome and are reported only in the dispensable genome, suggesting that these genes were acquired by the strains as per their needs in the course of adaptation.
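The split of a pan genome into core and dispensable parts, as defined above, reduces to set operations once each strain is described by the gene families it contains. The sketch below is a minimal illustration under that assumption; the clustering of genes into families is taken as given, and the function name and the toy family identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple


def split_pan_genome(
    families_by_strain: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, dispensable) gene-family sets for a collection of strains."""
    if not families_by_strain:
        raise ValueError("at least one strain is required")
    family_sets = [set(families) for families in families_by_strain.values()]
    pan = set().union(*family_sets)          # present in at least one strain
    core = set.intersection(*family_sets)    # present in every strain
    dispensable = pan - core                 # present in some but not all strains
    return pan, core, dispensable


# Toy input (invented family identifiers, for illustration only).
toy_strains = {
    "strain1": {"famA", "famB", "famC"},
    "strain2": {"famA", "famB", "famD"},
    "strain3": {"famA", "famC"},
}
pan, core, dispensable = split_pan_genome(toy_strains)
```

In the toy input only `famA` is shared by all strains and therefore forms the core, while `famB`, `famC` and `famD` fall into the dispensable genome. Functional categories can then be tallied separately for the two sets.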

4 Conclusion

The whole genome phylogeny of Prochlorococcus genomes corresponded well with the genotypic and phenotypic diversification of HL and LL ecotypes. Genome alignment within these strains identified genomic regions that are highly conserved across the genomes. In comparison to the remaining ten strains, MIT9313 and MIT9303 showed divergence. Gene gain is observed to be a favorable process during the evolution of the Prochlorococcus genus, especially within the LL strains. P. marinus strains move from a nutrient-deficient to a nutrient-rich environment. There is continuous gene gain within and between the genomes to support metabolic activities, although genome reduction and gene loss are also proposed for HL ecotypes. During the course of evolution, strains of P. marinus showed flexibility in their genomes to cope with their surrounding environment. With a change in habitat, metabolic activities increase and the strains therefore require additional genes to meet this demand. In comparison to HL strains, LL strains show a greater number of paralogous and laterally acquired genes, and this gene gain probably serves their metabolic activities. Across all the genomes, the major proportion of genes in the core and dispensable genomes tends to be involved in metabolic functions, followed by genes involved in cellular processes and signaling functions and in information storage and processing. Genes for chromatin structure and dynamics, cytoskeleton and extracellular structures are not identified in the core genome and are reported only in the dispensable genome, suggesting that they were acquired by the strains as per their needs in the evolutionary process of their adaptation.

Acknowledgement DPS is grateful to Indian Council of Agricultural Research for financial support in the form of “National Agricultural Bioinformatics Grid” (NABG), NAIP.

Conflict of Interest Authors declared no conflict of interest.

References

[1] Angiuoli, S.V., Hotopp, J.C.D., Salzberg, S.L., Tettelin, H. 2011. Improving pan-genome annotation using whole genome multiple alignment. BMC Bioinformatics 12, 272.
[2] Auch, A.F., Klenk, H., Göker, M. 2010. Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs. Stand Genomic Sci 2, 142–148.
[3] Bryant. 2003. The beauty in small things revealed. Proc Natl Acad Sci USA 100, 9647–9649.
[4] Bryant, D., Moulton, V. 2004. Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Mol Biol Evol 21, 255–256.
[5] Chevenet, F., Brun, C., Banuls, A.L., Jacq, B., Christen, R. 2006. TreeDyn: towards dynamic graphics and annotations for analyses of trees. BMC Bioinformatics 7, 439.
[6] Darling, A.C., Mau, B., Blattner, F.R., Perna, N.T. 2004. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 14(7), 1394–1403.
[7] Darling, A.E., Mau, B., Perna, N.T. 2010. progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement. PLoS ONE 5(6), e11147.
[8] Darling, A.E., Miklós, I., Ragan, M.A. 2008. Dynamics of Genome Rearrangement in Bacterial Populations. PLoS Genet 4(7), e1000128.
[9] Darling, A.E., Treangen, T.J., Messeguer, X., Perna, N.T. 2007. Analyzing patterns of microbial evolution using the mauve genome alignment system. Methods Mol Biol 396, 135–152.
[10] Delcher, A.L., Phillippy, A., Carlton, J., Salzberg, S.L. 2002. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res 30, 2478–2483.
[11] Dikow, R.B. 2011. Genome-level homology and phylogeny of Shewanella (Gammaproteobacteria: Alteromonadales: Shewanellaceae). BMC Genomics 12, 237.
[12] Dufresne, A., Garczarek, L., Partensky, F. 2005. Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol 6, R14.
[13] Dufresne, A., Salanoubat, M., Partensky, F., Artiguenave, F., Axmann, I.M., Barbe, V., Duprat, S., Galperin, M.Y., Koonin, E.V., Le Gall, F., Makarova, K.S., Ostrowski, M., Oztas, S., Robert, C., Rogozin, I.B., Scanlan, D.J., Tandeau de Marsac, N., Weissenbach, J., Wincker, P., Wolf, Y.I., Hess, W.R. 2003. Genome sequence of the cyanobacterium Prochlorococcus marinus SS120, a nearly minimal oxyphototrophic genome. Proc Natl Acad Sci USA 100, 10020–10025.
[14] Edgar, R.C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res 32, 1792–1797.
[15] Eisen, J.A. 2000. Assessing evolutionary relationships among microbes from whole-genome analysis. Curr Opin Microbiol 3, 475–480.
[16] Henz, S.R., Huson, D.H., Auch, A.F., Nieselt-Struwe, K., Schuster, S.C. 2005. Whole-genome prokaryotic phylogeny. Bioinformatics 21, 2329–2335.
[17] Hess, W.R., Rocap, G., Ting, C.S., Larimer, F., Stilwagen, S., Lamerdin, J., Chisholm, S.W. 2001. The photosynthetic apparatus of Prochlorococcus: insights through comparative genomics. Photosynth Res 70, 53–71.
[18] Huson, D.H. 1998. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14, 68–73.
[19] Jiang, L., Lin, K., Lu, C.L. 2008. OGtree: a tool for creating genome trees of prokaryotes based on overlapping genes. Nucl Acids Res 36, W475–W480.
[20] Kettler, C.G., Martiny, A.C., Huang, K., Zucker, J., Coleman, M.L., Rodrigue, S., Chen, F., Lapidus, A., Ferriera, S., Johnson, J., Steglich, C., Church, G.M., Richardson, P., Chisholm, S.W. 2007. Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet 3, e231.
[21] Laing, C., Buchanan, C., Taboada, E.N., Zhang, Y., Kropinski, A., Villegas, A., Thomas, J.E., Gannon, V.P.J. 2010. Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions. BMC Bioinformatics 11, 461.
[22] Lawrence, J.G., Ochman, H. 1997. Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44, 383–397.
[23] Lawrence, J.G., Ochman, H. 1998. Molecular archaeology of the Escherichia coli genome. Proc Natl Acad Sci USA 95, 9413–9417.
[24] Luo, H., Shi, J., Arndt, W., Tang, J., Friedman, R. 2008. Gene Order phylogeny of the genus Prochlorococcus. PLoS ONE 3, e3837.
[25] Luo, Y., Fu, C., Zhang, D.Y., Lin, K. 2006. Overlapping genes as rare genomic markers: the phylogeny of γ-Proteobacteria as a case study. Trends Genet 22, 593–596.
[26] Markowitz, V.M., Chen, A.I., Palaniappan, K., Chu, K., Szeto, E., Grechkin, Y., Ratner, A., Jacob, B., Huang, J., Williams, P., Huntemann, M., Anderson, I., Mavromatis, K., Ivanova, N.N., Kyrpides, N.C. 2012. IMG: the integrated microbial genomes database and comparative analysis system. Nucl Acids Res 40, D115–D122.
[27] Martiny, A.C., Coleman, M.L., Chisholm, S.W. 2006. Phosphate acquisition genes in Prochlorococcus ecotypes: evidence for genome wide adaptation. Proc Natl Acad Sci USA 103, 12552–12557.
[28] Miele, V., Penel, S., Duret, L. 2011. Ultra-fast sequence clustering from similarity networks with SiLiX. BMC Bioinformatics 12, 116.
[29] Moore, L.R., Goericke, R.E., Chisholm, S.W. 2002. Utilization of different nitrogen sources by the marine cyanobacteria Prochlorococcus and Synechococcus. Limnol Oceanogr 47, 989–996.
[30] Moore, L.R., Rocap, G., Chisholm, S.W. 1998. Physiology and molecular phylogeny of coexisting Prochlorococcus ecotypes. Nature 393, 464–467.
[31] Nakamura, Y., Itoh, T., Matsuda, H., Gojobori, T. 2004. Biased biological functions of horizontally transferred genes in prokaryotic genomes. Nat Genet 36, 760–766.
[32] Partensky, F., Garczarek, L. 2010. Prochlorococcus: advantages and limits of minimalism. Ann Rev Mar Sci 2, 305–331.
[33] Partensky, F., Hess, W.R., Vaulot, D. 1999. Prochlorococcus, a marine photosynthetic prokaryote of global significance. Microbiol Mol Biol Rev 63, 106–127.
[34] Paul, S., Dutta, A., Bag, S.K., Das, S. 2010. Distinct, ecotype-specific genome and proteome signatures in the marine cyanobacteria Prochlorococcus. BMC Genomics 11, 103.
[35] Qi, J., Wang, B., Hao, B.I. 2004. Whole proteome prokaryote phylogeny without sequence alignment: a K-string composition approach. J Mol Evol 58, 1–11.
[36] Rocap, G., Distel, D.L., Waterbury, J.B., Chisholm, S.W. 2002. Resolution of Prochlorococcus and Synechococcus ecotypes by using 16S-23S ribosomal DNA internal transcribed spacer sequences. Appl Env Microbiol 68, 1180–1191.
[37] Rocap, G., Larimer, F.W., Lamerdin, J., Malfatti, S., Chain, P., Ahlgren, N.A., Arellano, A., Coleman, M., Hauser, L., Hess, W.R., Johnson, Z.I., Land, M., Lindell, D., Post, A.F., Regala, W., Shah, M., Shaw, S.L., Steglich, C., Sullivan, M.B., Ting, C.S., Tolonen, A., Webb, E.A., Zinser, E.R., Chisholm, S.W. 2003. Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature 424, 1042–1047.
[38] Sims, G.E., Jun, S., Wu, G.A., Kim, S. 2009. Whole-genome phylogeny of mammals: evolutionary information in genic and nongenic regions. Proc Natl Acad Sci USA 106, 17077–17082.
[39] Snel, B., Bork, P., Huynen, M.A. 1999. Genome phylogeny based on gene content. Nat Genet 21, 108–110.
[40] Snel, B., Huynen, M.A., Dutilh, B.E. 2005. Genome trees and the nature of genome evolution. Ann Rev Microbiol 59, 191–209.
[41] Sullivan, M.B., Coleman, M.L., Weigele, P., Rohwer, F., Chisholm, S.W. 2005. Three Prochlorococcus cyanophage genomes: signature features and ecological interpretations. PLoS Biol 3, e144.
[42] Sullivan, M.B., Waterbury, J.B., Chisholm, S.W. 2003. Cyanophages infecting the oceanic cyanobacterium Prochlorococcus. Nature 424, 1047–1051.
[43] Sun, Z. 2011. New genomic approaches reveal the process of genome reduction in Prochlorococcus. Open Access Dissertations. Paper 352.
[44] Suyama, M., Bork, P. 2001. Evolution of prokaryotic gene order: genome rearrangements in closely related species. Trends Genet 17, 10–13.
[45] Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., Kumar, S. 2011. MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28, 2731–2739.

157 [46] Tettelin, H., Riley, D., Cattuto, C., Medini, D. 2008. Comparative genomics: the bacterial pan-genome. Curr Opin Microbiol 12, 472–477. [47] Vishnoi, A., Roy, R., Prasad, H.K., Bhattacharya, A. 2010. Anchor-Based Whole Genome Phylogeny (ABWGP): a tool for inferring evolutionary relationship among closely related microorganisms. PLoS ONE 5, e14159. [48] Wolf, Y.I., Rogozin, I.B., Grishin, N.V., Tatusov, R.L., Koonin, E.V. 2001. Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol. 1, 8. [49] Zeidner, G., Bielawski, J.P., Shmoish, M., Scanlan, D.J., Sabehi, G., Beja, O. 2005. Potential photosynthesis gene recombination between Prochlorococcus and Synechococcus via viral intermediates. Environ Microbiol 7, 1505–1513. [50] Zhaxybayeva, O., Doolittle, W.F., Papke, R.T., Gogarten, J.P. 2009. Intertwined evolutionary histories of marine Synechococcus and Prochlorococcus marinus. Genome Biol Evol 1, 325–339. [51] Zhaxybayeva, O., Gogarten, J.P., Charlebois, R.L., Doolittle, W.F., Papke, R.T. 2006. Phylogenetic analyses of cyanobacterial genomes: quantification of horizontal gene transfer events. Genome Res 16, 1099-1108.
