FEBS Letters xxx (2015) xxx–xxx

journal homepage: www.FEBSLetters.org

Evolutionary analysis of the global landscape of protein domain types and domain architectures associated with family 14 carbohydrate-binding modules Ti-Cheng Chang, Ioannis Stergiopoulos ⇑ Department of Plant Pathology, University of California Davis, Davis, CA, USA

a r t i c l e

i n f o

Article history: Received 15 April 2015 Revised 11 May 2015 Accepted 20 May 2015 Available online xxxx Edited by Takashi Gojobori Keywords: Chitin Family 14 carbohydrate-binding module Modularity Promiscuity Versatility Supra-domain

a b s t r a c t Domain promiscuity is a powerful evolutionary force that promotes functional innovation in proteins, thus increasing proteome and organismal complexity. Carbohydrate-binding modules, in particular, are known to partake in complex modular architectures that play crucial roles in numerous biochemical and molecular processes. However, the extent, functional, and evolutionary significance of promiscuity is shrouded in mystery for most CBM families. Here, we analyzed the global promiscuity of family 14 carbohydrate-binding modules (CBM14s) and show that fusion, fission, and reorganization events with numerous other domain types interplayed incessantly in a lineage-dependent manner to likely facilitate species adaptation and functional innovation in the family. Ó 2015 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

1. Introduction Carbohydrate-binding modules (CBMs) are ubiquitous molecules in nature that are frequently found as discrete non-catalytic components of carbohydrate-active enzymes (CAZymes), in which they promote the avidity of the enzyme for the target saccharide substrate. As a result, the general architecture of modular CAZymes is frequently composed of the catalytic module linked to one or several CBMs, while additional discrete domains might

Abbreviations: CBM, carbohydrate-binding modules; CBM14, family 14 carbohydrate-binding module; CAZymes, carbohydrate-active enzymes; HMM, Hidden Markov Model; SP, signal peptide; GH18, glycoside hydrolase family 18 domain; GH19, glycoside hydrolase family 19 domain; LDLa, lipoprotein receptor class A domain; IGv, Immunoglobulin variable-set domain; GlcNAc, N-acetylglucosamine (or N-acetyl-D-glucosamine); Ig, immunoglobulin; VCBPs, variable region-containing chitin-binding proteins; SRCR, Scavenger receptor cysteine-rich protein domain; Sp, serine protease (e.g. as in Sp22D gene from Anopheles gambiae); Pdi, polysaccharide deacetylase domain Author contributions: T.C., performed analyses; analyzed data; wrote the manuscript. I.S., conceived and supervised the study; analyzed data; wrote the manuscript. ⇑ Corresponding author at: University of California Davis, Department of Plant Pathology, One Shield Avenue, Davis, CA 95616-8751, USA. Fax: +1 530 752 5674. E-mail address: [email protected] (I. Stergiopoulos).

be present as well, thereby increasing the complexity of these architectures [1,2]. Overall, the high propensity of CBMs to fuse with other domains suggests that these modules are part of an organism’s proteome backbone that can be readily recruited for service in a milieu of enzymatic and biochemical processes [3]. However, despite the functional and evolutionary significance of acknowledging modular arrangements in CBM-containing proteins, the number and diversity of combinations in which a particular CBM can be involved, often referred to as versatility or promiscuity [4,5], remains for most CBM families largely unknown. Recently, we reported on the molecular evolutionary analysis of family 14 carbohydrate-binding modules (CBM14s) across all domains of life [6]. Members of this family show specific affinity for chitin, a b(1 ? 4) linked N-acetyl-D-glucosamine (GlcNAc) polysaccharide, and thus, not surprisingly, CBM14s are frequently connected to catalytic domains with chitinolytic activity. Our previous analysis indicated that the evolution of this family was largely shaped by horizontal gene transfer, multiple lineage-specific expansions and contractions, and positive selection that were overall suggestive of functional diversification [6]. Here, we expand these studies to further examine the dynamics of the CBM14 family evolution from the perspective of versatility or promiscuity, by surveying the diversity of domain types and complexity of domain

http://dx.doi.org/10.1016/j.febslet.2015.05.048 0014-5793/Ó 2015 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

Please cite this article in press as: Chang, T.-C. and Stergiopoulos, I. Evolutionary analysis of the global landscape of protein domain types and domain architectures associated with family 14 carbohydrate-binding modules. FEBS Lett. (2015), http://dx.doi.org/10.1016/j.febslet.2015.05.048

2

T.-C. Chang, I. Stergiopoulos / FEBS Letters xxx (2015) xxx–xxx

architectures exhibited by modular CBM14-containing proteins. Our analysis revealed an impressive repertoire of domains and consequently domain architectures associated with the CBM14, indicating that the ligand-binding properties of this module have been exploited several times in nature, possibly as a means to facilitate functional innovation in higher eukaryotes. 2. Materials and methods 2.1. Identification of domains in multimeric CBM14-containing proteins and subsequent domain–domain interaction network analysis To precisely identify the diversity of domain types that are associated with the CBM14, we used InterProScan [7] and the Hidden Markov Model (HMM) of each domain deposited in the Pfam database [8] to search against the 3432 CBM14-containing proteins (e-value 0, in which the phylogenetic distributions of the domain pairs were mapped at the phylum and subphylum level (Fig. 3). The resulted network was clearly dominated by domain pairs present in Arthropoda and Cephalochordata, which included 52 Arthropoda-specific and 15 Cephalochordata-specific pairs. An analysis of the origin of the major domain pairs present in modular CBM14-containing proteins, indicated that the earliest domain fusion event took place between the CBM14 and GH18 domain, most likely before the divergence of the metazoans from the rest of the Eukaryotic linages. Another significant domain fusion event represents the fusion between CBM14 with LDLa, which probably took place before the divergence of Ecdysozoas within metazoan. Notably, in Arthropoda the LDLa domain was also frequently fused with the polysaccharide deacetylase domain (Pdi; Pfam ID: PF01522), indicating a further extension of the original CBM14– LDLa architecture before the divergence of insects. In these organisms, proteins with the CBM14–LDLa–Pdi architecture are members of the chitin deacetylase gene family, which catalyzes the N-deacetylation of chitin to chitosan during morphogenesis and molting [31]. Finally, the third major fusion event took place between CBM14 and IGv, most likely before the divergence of

Please cite this article in press as: Chang, T.-C. and Stergiopoulos, I. Evolutionary analysis of the global landscape of protein domain types and domain architectures associated with family 14 carbohydrate-binding modules. FEBS Lett. (2015), http://dx.doi.org/10.1016/j.febslet.2015.05.048

6

T.-C. Chang, I. Stergiopoulos / FEBS Letters xxx (2015) xxx–xxx

Chordata, as it is only present in the Cephalochordata and Tunicata subphyla of Choradata animals. The remaining domain pairs were mainly specific to each phylum, including 115 species-specific domain pairs, which may have occurred more recently compared to the three most abundant pairs described above. Overall, the network analyses indicated that the domain pairs and evolutionary constraints have interplayed incessantly to shape the sequence, structure, and function of the CBM14-containing proteins. Acknowledgments This work used resources of the UC Davis College of Agricultural and Environmental Sciences Computing Center Farm cluster and was financially supported by UC Davis faculty start-up funds. Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.febslet.2015.05. 048. References [1] Boraston, A., Bolam, D., Gilbert, H. and Davies, G. (2004) Carbohydratebinding modules: fine-tuning polysaccharide recognition. Biochem. J. 382, 769–781. [2] Shoseyov, O., Shani, Z. and Levy, I. (2006) Carbohydrate binding modules: biochemical properties and novel applications. Microbiol. Mol. Biol. Rev. 70 (2), 283–295. [3] Guillen, D., Sanchez, S. and Rodriguez-Sanoja, R. (2010) Carbohydrate-binding domains: multiplicity of biological roles. Appl. Microbiol. Biotechnol. 85 (5), 1241–1249. [4] Weiner, J., Moore, A.D. and Bornberg-Bauer, E. (2008) Just how versatile are domains? BMC Evol. Biol. 8 (1), 285. [5] Basu, M.K., Carmel, L., Rogozin, I.B. and Koonin, E.V. (2008) Evolution of protein domain promiscuity in eukaryotes. Genome Res. 18 (3), 449–461. [6] Chang, T.C. and Stergiopoulos, I. (2015) Inter-and intra-domain horizontal gene transfer, gain-loss asymmetry and positive selection mark the evolutionary history of the CBM14 family. FEBS J. 282 (10), 2014–2028. [7] Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R. and Lopez, R. (2005) InterProScan: protein domains identifier. Nucleic Acid Res. 33, W116–W120. [8] Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Heger, A., Hetherington, K., Holm, L., Mistry, J., et al. (2014) Pfam: the protein families database. Nucleic Acid Res. 42 (D1), D222–D230. [9] Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Heger, A., Hetherington, K., Holm, L. and Mistry, J. (2013) Pfam: the protein families database. Nucleic Acid Res., 1223. gkt1223. [10] Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L. and Ideker, T. (2011) Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27 (3), 431–432.

[11] Saito, R., Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L., Lotia, S., Pico, A.R., Bader, G.D. and Ideker, T. (2012) A travel guide to Cytoscape plugins. Nat. Methods 9 (11), 1069–1076. [12] Merzendorfer, H. and Zimoch, L. (2003) Chitin metabolism in insects: structure, function and regulation of chitin synthases and chitinases. J. Exp. Biol. 206 (24), 4393–4412. [13] Ruiz-Herrera, J. (1991) Fungal Cell Wall: Structure, Synthesis, and Assembly, CRC Press. [14] Martinez, J.P., Falomir, M.P. and Gozalbo, D. (2009) Chitin: a structural biopolysaccharide. eLS. [15] Karlsson, M. and Stenlid, J. (2008) Evolution of family 18 glycoside hydrolases: diversity, domain structures and phylogenetic relationships. J. Mol. Microbiol. Biotechnol. 16 (3–4), 208–223. [16] Dieckmann, M., Dietrich, M.F. and Herz, J. (2010) Lipoprotein receptors-an evolutionarily ancient multifunctional receptor family. Biol. Chem. 391 (11), 1341–1363. [17] Kim, E., Lee, Y., Kim, J.-S., Song, B.-S., Kim, S.-U., Huh, J.-W., Lee, S.-R., Kim, S.-H., Hong, Y. and Chang, K.-T. (2010) Extracellular domain of V-set and immunoglobulin domain containing 1 (VSIG1) interacts with sertoli cell membrane protein, while its PDZ-binding motif forms a complex with ZO-1. Mol. Cells 30 (5), 443–448. [18] Itoh, M., Nacher, J.C., Kuma, K.-I., Goto, S. and Kanehisa, M. (2007) Evolutionary history and functional implications of protein domains and their combinations in eukaryotes. Genome Biol. 8 (6), R121. [19] Pavlopoulos, G.A., Secrier, M., Moschopoulos, C.N., Soldatos, T.G., Kossida, S., Aerts, J., Schneider, R. and Bagos, P.G. (2011) Using graph theory to analyze biological networks. Biodata Min. 4. [20] Cannon, J., Haire, R. and Litman, G. (2002) Identification of diversified immunoglobulin-like variable region-containing genes in a protochordate. Nat. Immunol. 3, 1200–1207. [21] Hamid, R., Khan, M.A., Ahmad, M., Ahmad, M.M., Abdin, M.Z., Musarrat, J. and Javed, S. (2013) Chitinases: an update. J. Pharm. Biol. Sci. 5 (1), 21. [22] Ubhayasekera, W. (2011) Structure and function of chitinases from glycoside hydrolase family 19. Polym. Int. 60 (6), 890–896. [23] Zhao, Y., Park, R.-D. and Muzzarelli, R.A. (2010) Chitin deacetylases: properties and applications. Mar. Drugs 8 (1), 24–46. [24] Merzendorfer, H. (2013) Insect-derived chitinasesYellow Biotechnology II, pp. 19–50, Springer. [25] Kohler, A., Chen, L.-H,, Hurlburt, N., Salvucci, A., Schwessinger, B., Fisher, A. and Stergiopoulos, I. Structural analysis of an Avr4 effector ortholog provides mechanistic insight into chitin-binding and recognition by the Cf-4 receptor. Submitted for publication. [26] Vogel, C., Berzuini, C., Bashton, M., Gough, J. and Teichmann, S.A. (2004) Supradomains: evolutionary units larger than single protein domains. J. Mol. Biol. 336 (3), 809–823. [27] Vogel, C., Bashton, M., Kerrison, N.D., Chothia, C. and Teichmann, S.A. (2004) Structure, function and evolution of multidomain proteins. Curr. Opt. Struct. Biol. 14 (2), 208–216. [28] Gorman, M., Andreeva, O. and Paskewitz, S. (2000) Sp22D: a multidomain serine protease with a putative role in insect immunity. Gene 251 (1), 9–17. [29] Vogel, C., Teichmann, S.A. and Pereira-Leal, J. (2005) The relationship between domain duplication and recombination. J. Mol. Biol. 346 (1), 355–365. [30] Nasir, A., Kim, K.M. and Caetano-Anolles, G. (2014) Global patterns of protein domain gain and loss in superkingdoms. PLoS Comput. Biol. 10 (1), e1003452. [31] Dixit, R., Arakane, Y., Specht, C.A., Richard, C., Kramer, K.J., Beeman, R.W. and Muthukrishnan, S. (2008) Domain organization and phylogenetic analysis of proteins from the chitin deacetylase gene family of Tribolium castaneum and three other species of insects. Insect Biochem. Mol. Biol. 38 (4), 440–451.

Please cite this article in press as: Chang, T.-C. and Stergiopoulos, I. Evolutionary analysis of the global landscape of protein domain types and domain architectures associated with family 14 carbohydrate-binding modules. FEBS Lett. (2015), http://dx.doi.org/10.1016/j.febslet.2015.05.048

Evolutionary analysis of the global landscape of protein domain types and domain architectures associated with family 14 carbohydrate-binding modules.

Domain promiscuity is a powerful evolutionary force that promotes functional innovation in proteins, thus increasing proteome and organismal complexit...
2MB Sizes 1 Downloads 5 Views