Gene 540 (2014) 16–19


Comparative genomic analysis of eutherian Mas-related G protein-coupled receptor genes

Marko Premzl ⁎

Laboratory of Genomics, Centre of Animal Reproduction, Zagreb, Croatia

Article info

Article history:
Accepted 25 February 2014
Available online 26 February 2014

Keywords: Comparative genomic analysis; Gene annotations; Molecular evolution; Phylogenetic analysis

Abstract

The present study attempted to update comprehensive eutherian Mas-related G protein-coupled receptor gene data sets, using public eutherian genomic sequence data sets and new genomics and molecular evolution tests. Among 254 potential coding sequences, the most comprehensive gene data set of eutherian Mas-related G protein-coupled receptor genes included 119 complete coding sequences that described eight major gene clusters. The present analysis integrated gene annotations, phylogenetic analysis and protein molecular evolution analysis, and first explained differential gene expansion patterns of eutherian Mas-related G protein-coupled receptor genes. The updated classification and nomenclature of eutherian Mas-related G protein-coupled receptor genes are proposed as a new framework for future experiments.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

One major aim of the initial sequencing and analysis of the human genome was its potential application in biomedical research (International Human Genome Sequencing Consortium, 2001). Among other outcomes, discoveries of new human genes were expected to update gene data sets and uncover new potential drug targets. For example, the data set of human G protein-coupled receptor genes included known, as well as new potential drug targets (International Human Genome Sequencing Consortium, 2001). The numerous human G protein-coupled receptor genes, associated with different functions in cells, were classified in five major clades (Fredriksson et al., 2003; Lagerström and Schiöth, 2008). The eutherian Mas-related G protein-coupled receptor genes clustered within the rhodopsin clade of G protein-coupled receptors (Fredriksson et al., 2003; Lagerström and Schiöth, 2008). Mas-related G protein-coupled receptor genes have been described in human, rhesus monkey and some rodent species (Burstein et al., 2006; Dong et al., 2001; Lembo et al., 2002; Young et al., 1986; Zhang et al., 2005; Zylka et al., 2003). One unresolved issue in analyses of eutherian Mas-related G protein-coupled receptor genes was the pattern of their differential gene expansions. For example, there were differences in Mas-related G protein-coupled receptor gene numbers between primates and rodents that are of major interest, because rodents are common experimental model animals in biomedical research. In addition, the most comprehensive eutherian gene data sets were subject to further revisions, due to the incompleteness of present genomic sequence assemblies (Harrow et al., 2012) and potential sequence errors (International Human Genome Sequencing Consortium, 2004; Mouse Genome Sequencing Consortium, 2009). Thus, the present study attempted to update comprehensive eutherian Mas-related G protein-coupled receptor gene data sets, using public eutherian genomic sequence data sets (Margulies et al., 2005; Murphy et al., 2001; O'Leary et al., 2013) and the eutherian comparative genomic analysis protocol that included new genomics and molecular evolution tests (Premzl, 2012, 2014). The protocol was established as one framework of eutherian gene descriptions that included guidance in protection against potential sequence errors in public eutherian genomic sequences.

Abbreviations: ā, average pairwise identity; āad, average absolute deviation for ā; amax, largest pairwise identity; amin, smallest pairwise identity; C, C-terminal region; ECL1-3, extracellular regions 1-3; H-I-H-VIII, predicted α-helical secondary structure elements; ICL1-3, intracellular regions 1-3; MGR, Mas-related G protein-coupled receptor gene; N, N-terminal region.

⁎ Laboratory of Genomics, Centre of Animal Reproduction, 55 Heinzel St., 10000 Zagreb, Croatia. E-mail address: [email protected].

http://dx.doi.org/10.1016/j.gene.2014.02.049
0378-1119/© 2014 Elsevier B.V. All rights reserved.

2. Material and methods

2.1. Gene annotations

The BioEdit 7.0.5.3 program was used in nucleotide and protein sequence analyses (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). The Ensembl genome browser's BLAST or BLAT tools were used in identification of genes in genomic sequence assemblies (http://www.ensembl.org/index.html). The direct evidence of eutherian gene annotations in NCBI's nr, est_human, est_mouse and est_others databases was used in analyses of gene features (http://www.ncbi.nlm.nih.gov). The potential coding sequences were first annotated and tested using the test of reliability of eutherian public genomic sequences that used genomic

M. Premzl / Gene 540 (2014) 16–19

sequence redundancies. In the first test step, the nucleotide sequence coverage of each potential coding sequence was analysed using Netblast (ftp://ftp.ncbi.nlm.nih.gov/blast/documents/netblast.html) and primary experimental genomic sequence data (http://www.ncbi.nlm.nih.gov/Traces/trace.cgi). In the second test step, potential coding sequences were classified as complete coding sequences if consensus trace sequence coverage was available for every nucleotide. Otherwise, potential coding sequences were classified as putative coding sequences. The complete coding sequences were used in phylogenetic and protein molecular evolution analyses. The guidelines of human and mouse gene nomenclature were used in gene descriptions (http://www.genenames.org/guidelines.html and http://www.informatics.jax.org/mgihome/nomen/gene.shtml). The present eutherian Mas-related G protein-coupled receptor gene data set was reviewed as a third party annotation data set by EBI (http://www.ebi.ac.uk/embl/Documentation/third_party_annotation_dataset.html). Before genomic sequence alignments, transposable elements were masked using RepeatMasker version open-3.3.0 with its default settings (sensitive mode, cross_match version 1.080812, RepBase Update 20110920, RM database version 20110920), except that simple repeats and low complexity elements were not masked (http://www.repeatmasker.org/). The genomic sequence alignments used the mVISTA web tool with its AVID algorithm and default settings (http://genome.lbl.gov/vista/index.shtml).

2.2. Phylogenetic analysis

Using ClustalW implemented in BioEdit 7.0.5.3, the complete coding sequences were first translated and aligned at amino acid and nucleotide sequence levels, and these alignments were corrected manually.
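The second step of the reliability test in Section 2.1 (complete versus putative coding sequences) can be illustrated with a minimal Python sketch. It assumes that trace coverage has already been summarised as one integer per nucleotide; this representation and the function name `classify_coding_sequence` are hypothetical and are not part of the published protocol.

```python
def classify_coding_sequence(coverage):
    """Classify a potential coding sequence from per-nucleotide trace coverage.

    `coverage` holds one integer per nucleotide: the number of trace reads
    supporting the consensus at that position (an assumed representation,
    not the data format used in the paper).
    """
    if len(coverage) == 0:
        raise ValueError("empty coding sequence")
    # Complete only if every nucleotide is covered by at least one trace.
    if all(depth > 0 for depth in coverage):
        return "complete"
    return "putative"


if __name__ == "__main__":
    print(classify_coding_sequence([3, 2, 4, 1]))  # every position covered
    print(classify_coding_sequence([3, 0, 4, 1]))  # one uncovered position
```

A single uncovered nucleotide is enough to make a sequence putative under this rule, which mirrors the "every nucleotide" criterion stated in the text.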
The phylogenetic trees were calculated in the MEGA5 program (http://www.megasoftware.net) using the neighbour-joining (default settings, except gaps/missing data = pairwise deletion; data not shown), minimum evolution (default settings, except gaps/missing data = pairwise deletion) and maximum parsimony (default settings, except gaps/missing data = use all sites; data not shown) methods. However, the maximum likelihood methods were not used in phylogenetic analysis, because their homogeneity and stationarity assumptions were not satisfied (data not shown). The pairwise nucleotide sequence identities of complete coding sequences were calculated using BioEdit 7.0.5.3 and used in statistical analysis (Microsoft Office Excel).

2.3. Protein molecular evolution analysis

The test of protein molecular evolution integrated protein secondary structure element predictions with patterns of nucleotide sequence similarities. The protein secondary structure element predictions used the present eutherian Mas-related G protein-coupled receptor protein primary structure alignments and protein tertiary structures 1F88, 2RH1, 3EML, 3PBL, 3ODU and 3RZE deposited in the Protein Data Bank (http://www.rcsb.org/pdb/home/home.do). The DeepView/Swiss-PdbViewer 4.0.1 program was used in protein tertiary structure analyses (http://spdbv.vital-it.ch/). The codon usage statistics were calculated using MEGA5, and the relative synonymous codon usage (R) of each amino acid codon was determined as the ratio between its observed and expected counts. The amino acid codons with R ≤ 0.7 were designated as not preferable amino acid codons. In the present analysis, the not preferable amino acid codons were: TTT (0.66), TTA (0.32), TTG (0.54), CTT (0.51), CTA (0.3), ATA (0.36), GTT (0.64), GTA (0.26), TCA (0.67), TCG (0.33), CCG (0.53), ACG (0.51), GCA (0.61), GCG (0.43), TAT (0.64), CAT (0.65), CAA (0.43), AAA (0.64), CGT (0.27), CGA (0.33), AGT (0.7) and GGT (0.53).
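The relative synonymous codon usage calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the MEGA5 implementation: it assumes the standard genetic code, takes the expected count of a codon to be the total count of its amino acid divided by the number of synonymous codons, and skips stop codons and codons containing ambiguous bases. The function names `rscu` and `not_preferable` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# standard code applies to all sequences analysed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(coding_sequences):
    """Return {codon: R}, where R = observed count / expected count."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper()
        # Read in frame; ignore a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1
    values = {}
    for amino in set(CODON_TABLE.values()) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == amino]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the data set
        expected = total / len(family)
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


def not_preferable(values, threshold=0.7):
    """Codons with R at or below the threshold (0.7 in the text)."""
    return sorted(c for c, r in values.items() if r <= threshold)


if __name__ == "__main__":
    r = rscu(["TTTTTCTTCTTC"])  # one TTT and three TTC codons
    print(r["TTT"], r["TTC"], not_preferable(r))
```

In the toy input, phenylalanine is encoded once by TTT and three times by TTC, so the expected count of each codon is 2 and TTT falls below the 0.7 threshold.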
The reference protein sequence amino acid sites were designated as invariant amino acid sites (invariant alignment positions), forward amino acid sites (variant alignment positions that did not include not preferable amino acid codons) or compensatory amino acid sites (variant alignment positions that included not preferable amino acid codons). Thus, the presence of preferable amino acid codons and absence of not preferable amino acid codons indicated that forward amino acid sites could have a major influence on protein function. Conversely, the presence of not preferable amino acid codons indicated that compensatory amino acid sites could have a minor influence on protein function.
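The three-way designation of amino acid sites can be sketched in Python. The set of not preferable codons is copied from Section 2.3; everything else is an assumption made for illustration: each alignment column is given as a list of (amino acid, codon) pairs without gaps, invariance is judged at the amino acid level, and the function name `classify_site` is hypothetical.

```python
# Not preferable amino acid codons (R <= 0.7) as listed in Section 2.3.
NOT_PREFERABLE = {
    "TTT", "TTA", "TTG", "CTT", "CTA", "ATA", "GTT", "GTA", "TCA", "TCG",
    "CCG", "ACG", "GCA", "GCG", "TAT", "CAT", "CAA", "AAA", "CGT", "CGA",
    "AGT", "GGT",
}


def classify_site(column, not_preferable=NOT_PREFERABLE):
    """Designate one alignment position.

    `column` is a list of (amino_acid, codon) pairs, one per aligned
    sequence (an assumed input format; gaps are not handled here).
    """
    amino_acids = {amino for amino, _ in column}
    if len(amino_acids) == 1:
        return "invariant"
    # Variant position: check whether any sequence uses a not preferable codon.
    if any(codon in not_preferable for _, codon in column):
        return "compensatory"
    return "forward"


if __name__ == "__main__":
    print(classify_site([("N", "AAC"), ("N", "AAC")]))
    print(classify_site([("D", "GAC"), ("E", "GAG")]))
    print(classify_site([("F", "TTT"), ("L", "CTG")]))
```

Whether a synonymous change at an otherwise conserved amino acid position should count as invariant is not stated in the text, so the amino-acid-level rule used here is only one possible reading.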

3. Results and discussion

3.1. Gene annotations

The present study annotated the most comprehensive gene data set of eutherian Mas-related G protein-coupled receptor (MGR) genes. There were 119 eutherian MGR complete coding sequences among 254 potential coding sequences (Fig. 1). The eutherian MGR gene data set included 12 MGRA genes, 3 MGRB genes, 12 MGRC genes, 7 MGRD genes, 7 MGRE genes, 5 MGRF genes, 66 MGRG genes and 7 MGRH genes. The eutherian MGR gene data set was made available in public databases under accession numbers HG426065–HG426183 (Supplementary data file 1). For example, there were eight human, four rhesus monkey, 21 mouse and 16 brown rat MGR complete coding sequences. The present estimates confirmed that the patterns of MGR differential gene expansions differed between primates and rodents (Burstein et al., 2006; Dong et al., 2001; Zylka et al., 2003). The present genomic sequence alignments were limited by the unavailability of complete direct evidence of eutherian MGR gene annotations in public databases (Supplementary data file 2). However, the genomic sequence alignments described common predicted promoter regions of eutherian MGRC genes that included two common genomic sequence regions (Supplementary data file 2C). The nucleotide sequence alignments of the two common genomic sequence regions are available on request. In addition, the genomic sequence alignments were in agreement with the present calculations of pairwise nucleotide sequence identity patterns among eutherian MGR major gene clusters (Section 3.2).

3.2. Phylogenetic analysis

The present work first described eight eutherian MGRA-MGRH major gene clusters (Fig. 1). The evidence of differential gene expansions was shown in the eutherian MGRG major gene cluster, which included orthologues and paralogues. The updated eutherian MGR gene classification was corroborated by calculations of pairwise nucleotide sequence identity patterns among eutherian MGR major gene clusters (Supplementary data file 3). For example, the average pairwise nucleotide sequence identity of each of the eutherian MGRA-MGRF major gene clusters was typical of comparisons between eutherian orthologues, whereas the average pairwise nucleotide sequence identity of the eutherian MGRG major gene cluster was typical of comparisons between eutherian orthologues and paralogues. One exception was the eutherian MGRH major gene cluster, whose average pairwise nucleotide sequence identity was atypical of comparisons between eutherian orthologues (Supplementary data file 2G). The average pairwise nucleotide sequence identity of the whole eutherian MGR gene data set was ā = 0.459 (amax = 1, amin = 0.296, āad = 0.081), which described eutherian MGR genes as close homologues (Section 3.3). The present work proposed a revision of existing eutherian MGR gene classifications (Burstein et al., 2006; Dong et al., 2001; Zylka et al., 2003). For example, the eutherian MGRG major gene cluster included human and rhesus monkey MrgX genes and mouse and brown rat MrgA-MrgC genes that were previously classified in four separate gene clusters. Thus, the present description of the eutherian MGRG major gene cluster, which included evidence of differential gene expansions, first explained differential gene expansion patterns of eutherian MGR genes (Supplementary data file 2F). Indeed, the other eutherian MGR major gene clusters were expected to include orthologues (Zylka et al., 2003).
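The summary statistics reported above (ā, āad, amax and amin) can be computed from aligned sequences as in the following Python sketch. The study used BioEdit and Microsoft Office Excel; this code is only an illustration. It assumes that pairwise identity is the fraction of identical positions among alignment columns that are not gaps in both sequences, which may differ from the BioEdit definition, and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    # Skip columns where both sequences carry a gap (assumed convention).
    compared = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not compared:
        raise ValueError("no comparable positions")
    return sum(x == y for x, y in compared) / len(compared)


def identity_summary(aligned):
    """Mean, average absolute deviation, maximum and minimum identity."""
    values = [pairwise_identity(a, b) for a, b in combinations(aligned, 2)]
    mean = sum(values) / len(values)
    aad = sum(abs(v - mean) for v in values) / len(values)
    return {"mean": mean, "aad": aad, "max": max(values), "min": min(values)}


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ACGT"]  # invented example sequences
    print(identity_summary(toy_alignment))
```

For the toy alignment the three pairwise identities are 0.75, 1.0 and 0.75, so the mean is about 0.833 and the average absolute deviation about 0.111.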


Fig. 1. A) Phylogenetic analysis of eutherian Mas-related G protein-coupled receptor genes. The minimum evolution tree was calculated using the maximum composite likelihood method. The bootstrap estimates >50% are shown after 1000 bootstrap replicates. B) Predictions of secondary structure elements in eutherian Mas-related G protein-coupled receptors. The eight predicted α-helical secondary structure elements are indicated (H-I-H-VIII). The seven predicted transmembrane α-helical secondary structure elements are labelled grey (H-I-H-VII). The numbers indicate numbers of amino acids. C, C-terminal region; ICL, intracellular region; ECL, extracellular region; N, N-terminal region. C) Reference human MGRA protein primary structure. The eight predicted α-helical secondary structure elements are labelled using rectangles (H-I-H-VIII). The invariant amino acid sites are shown using white letters on violet backgrounds, and forward amino acid sites are shown using white letters on red backgrounds.

3.3. Protein molecular evolution analysis

The present analysis first described eight eutherian MGRA-MGRH major protein clusters (Fig. 1) that included a data set of 119 close homologues (Supplementary data file 4). The secondary structure element predictions of the eutherian MGR protein data set were made (Fig. 1B) using primary protein sequence comparisons with bovine rhodopsin G protein-coupled receptor (Palczewski et al., 2000), human β2 adrenergic G protein-coupled receptor (Rasmussen et al., 2007; Rosenbaum et al., 2007), human A2A adenosine G protein-coupled receptor (Jaakola et al., 2008), human dopamine D3 G protein-coupled receptor (Chien et al., 2010), human chemokine CXCR4 G protein-coupled receptor (Wu et al., 2010) and human histamine H1 G protein-coupled receptor (Shimamura et al., 2011). The clustering of human MGR genes within the rhodopsin clade of G protein-coupled receptors justified the present analytical approach (Fredriksson et al., 2003; Lagerström and Schiöth, 2008). Thus, the eutherian MGR protein data set primary structure regions were described as transmembrane regions (predicted α-helical secondary structure elements H-I-H-VII), extracellular regions (N-terminal region and extracellular regions ECL1-3) and intracellular regions (intracellular regions ICL1-3 and C-terminal region) (Fig. 1B). The average pairwise nucleotide sequence identity of transmembrane regions, ā = 0.554 (amax = 1, amin = 0.399, āad = 0.077), was higher than that of intracellular regions, ā = 0.424 (amax = 1, amin = 0.116, āad = 0.09), and that of extracellular regions, ā = 0.226 (amax = 1, amin = 0.038, āad = 0.091). Indeed, the present estimates were in agreement with the analysis of Choi and Lahn (2003), which indicated that functional diversifications of MGR proteins were evident in their extracellular regions. In the reference human MGRA protein primary structure, the present application of the new test of protein molecular evolution that used codon usage statistics described three invariant amino acid sites (N50, W157 and D173), as well as five forward amino acid sites (D77, A125, E129, W139 and D305) (Fig. 1C).
For example, the invariant amino acid site N50, located in predicted α-helical secondary structure element H-I, and the forward amino acid site D77, located in predicted α-helical secondary structure element H-II, corresponded to key bovine rhodopsin crystal structure determinant amino acid sites N55 and D83 (Palczewski et al., 2000). Whereas the forward amino acid site E129, located in predicted α-helical secondary structure element H-III, corresponded to the first amino acid of the (D/E)R(Y/W) amino acid motif common in G protein-coupled receptors, the invariant amino acid site W157, located in predicted α-helical secondary structure element H-IV, corresponded to key bovine rhodopsin crystal structure determinant amino acid site W161 (Palczewski et al., 2000).

4. Conclusions

The present analysis integrated gene annotations, phylogenetic analysis and protein molecular evolution analysis, and first explained differential gene expansion patterns of eutherian Mas-related G protein-coupled receptor genes. The updated classification and nomenclature of eutherian Mas-related G protein-coupled receptor genes are proposed as a new framework for future experiments. Finally, the eutherian comparative genomic analysis protocol is proposed as one framework of eutherian gene descriptions that includes guidance in protection against potential sequence errors in public eutherian genomic sequences.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gene.2014.02.049.

Conflict of Interest

There has been no conflict of interest.

Acknowledgments

MP would like to thank the reviewers for their helpful comments.

References

Burstein, E.S., Ott, T.R., Feddock, M., Ma, J.N., Fuhs, S., Wong, S., Schiffer, H.H., Brann, M.R., Nash, N.R., 2006. Characterization of the Mas-related gene family: structural and


functional conservation of human and rhesus MrgX receptors. British Journal of Pharmacology 147, 73–82. Chien, E.Y., Liu, W., Zhao, Q., Katritch, V., Han, G.W., Hanson, M.A., Shi, L., Newman, A.H., Javitch, J.A., Cherezov, V., Stevens, R.C., 2010. Structure of the human dopamine D3 receptor in complex with a D2/D3 selective antagonist. Science 330, 1091–1095. Choi, S.S., Lahn, B.T., 2003. Adaptive evolution of MRG, a neuron-specific gene family implicated in nociception. Genome Research 13, 2252–2259. Dong, X., Han, S., Zylka, M.J., Simon, M.I., Anderson, D.J., 2001. A diverse family of GPCRs expressed in specific subsets of nociceptive sensory neurons. Cell 106, 619–632. Fredriksson, R., Lagerström, M.C., Lundin, L.G., Schiöth, H.B., 2003. The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints. Molecular Pharmacology 63, 1256–1272. Harrow, J., Frankish, A., Gonzalez, J.M., Tapanari, E., Diekhans, M., Kokocinski, F., Aken, B.L., Barrell, D., Zadissa, A., Searle, S., Barnes, I., Bignell, A., Boychenko, V., Hunt, T., Kay, M., Mukherjee, G., Rajan, J., Despacio-Reyes, G., Saunders, G., Steward, C., Harte, R., Lin, M., Howald, C., Tanzer, A., Derrien, T., Chrast, J., Walters, N., Balasubramanian, S., Pei, B., Tress, M., Rodriguez, J.M., Ezkurdia, I., van Baren, J., Brent, M., Haussler, D., Kellis, M., Valencia, A., Reymond, A., Gerstein, M., Guigó, R., Hubbard, T.J., 2012. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Research 22, 1760–1774. International Human Genome Sequencing Consortium, 2001. Initial sequencing and analysis of the human genome. Nature 409, 860–921. International Human Genome Sequencing Consortium, 2004. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945. Jaakola, V.P., Griffith, M.T., Hanson, M.A., Cherezov, V., Chien, E.Y., Lane, J.R., Ijzerman, A.P., Stevens, R.C., 2008. 
The 2.6 angstrom crystal structure of a human A2A adenosine receptor bound to an antagonist. Science 322, 1211–1217. Lagerström, M.C., Schiöth, H.B., 2008. Structural diversity of G protein-coupled receptors and significance for drug discovery. Nature Reviews. Drug Discovery 7, 339–357. Lembo, P.M., Grazzini, E., Groblewski, T., O'Donnell, D., Roy, M.O., Zhang, J., Hoffert, C., Cao, J., Schmidt, R., Pelletier, M., Labarre, M., Gosselin, M., Fortin, Y., Banville, D., Shen, S.H., Ström, P., Payza, K., Dray, A., Walker, P., Ahmad, S., 2002. Proenkephalin A gene products activate a new family of sensory neuron-specific GPCRs. Nature Neuroscience 5, 201–209. Margulies, E.H., Vinson, J.P., Comparative Sequencing Program, N.I.S.C., Miller, W., Jaffe, D.B., Lindblad-Toh, K., Chang, J.L., Green, E.D., Lander, E.S., Mullikin, J.C., Clamp, M., 2005. An initial strategy for the systematic identification of functional elements in the human genome by low-redundancy comparative sequencing. Proceedings of the National Academy of Sciences of the United States of America 102, 4795–4800. Mouse Genome Sequencing Consortium, 2009. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biology 7, e1000112. Murphy, W.J., Eizirik, E., Johnson, W.E., Zhang, Y.P., Ryder, O.A., O'Brien, S.J., 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409, 614–618. O'Leary, M.A., Bloch, J.I., Flynn, J.J., Gaudin, T.J., Giallombardo, A., Giannini, N.P., Goldberg, S.L., Kraatz, B.P., Luo, Z.X., Meng, J., Ni, X., Novacek, M.J., Perini, F.A., Randall, Z.S., Rougier, G.W., Sargis, E.J., Silcox, M.T., Simmons, N.B., Spaulding, M., Velazco, P.M., Weksler, M., Wible, J.R., Cirranello, A.L., 2013. The placental mammal ancestor and the post-K-Pg radiation of placentals. Science 339, 662–667. Palczewski, K., Kumasaka, T., Hori, T., Behnke, C.A., Motoshima, H., Fox, B.A., Le Trong, I., Teller, D.C., Okada, T., Stenkamp, R.E., Yamamoto, M., Miyano, M., 2000. 
Crystal structure of rhodopsin: a G protein-coupled receptor. Science 289, 739–745. Premzl, M., 2012. Comparative genomic analysis of eutherian interferon-γ-inducible GTPases. Functional & Integrative Genomics 12, 599–607. Premzl, M., 2014. Comparative genomic analysis of eutherian ribonuclease A genes. Molecular Genetics and Genomics. http://dx.doi.org/10.1007/s00438-013-0801-5. Rasmussen, S.G., Choi, H.J., Rosenbaum, D.M., Kobilka, T.S., Thian, F.S., Edwards, P.C., Burghammer, M., Ratnala, V.R., Sanishvili, R., Fischetti, R.F., Schertler, G.F., Weis, W.I., Kobilka, B.K., 2007. Crystal structure of the human beta2 adrenergic G-protein-coupled receptor. Nature 450, 383–387. Rosenbaum, D.M., Cherezov, V., Hanson, M.A., Rasmussen, S.G., Thian, F.S., Kobilka, T.S., Choi, H.J., Yao, X.J., Weis, W.I., Stevens, R.C., Kobilka, B.K., 2007. GPCR engineering yields high-resolution structural insights into beta2-adrenergic receptor function. Science 318, 1266–1273. Shimamura, T., Shiroishi, M., Weyand, S., Tsujimoto, H., Winter, G., Katritch, V., Abagyan, R., Cherezov, V., Liu, W., Han, G.W., Kobayashi, T., Stevens, R.C., Iwata, S., 2011. Structure of the human histamine H1 receptor complex with doxepin. Nature 475, 65–70. Wu, B., Chien, E.Y., Mol, C.D., Fenalti, G., Liu, W., Katritch, V., Abagyan, R., Brooun, A., Wells, P., Bi, F.C., Hamel, D.J., Kuhn, P., Handel, T.M., Cherezov, V., Stevens, R.C., 2010. Structures of the CXCR4 chemokine GPCR with small-molecule and cyclic peptide antagonists. Science 330, 1066–1071. Young, D., Waitches, G., Birchmeier, C., Fasano, O., Wigler, M., 1986. Isolation and characterization of a new cellular oncogene encoding a protein with multiple potential transmembrane domains. Cell 45, 711–719. Zhang, L., Taylor, N., Xie, Y., Ford, R., Johnson, J., Paulsen, J.E., Bates, B., 2005. Cloning and expression of MRG receptors in macaque, mouse, and human. Brain Research. Molecular Brain Research 133, 187–197.
Zylka, M.J., Dong, X., Southwell, A.L., Anderson, D.J., 2003. Atypical expansion in mice of the sensory neuron-specific Mrg G protein-coupled receptor family. Proceedings of the National Academy of Sciences of the United States of America 100, 10043–10048.
