DNA AND CELL BIOLOGY Volume 10, Number 1, 1991 Mary Ann Liebert, Inc., Publishers Pp. 1-14

The P450 Superfamily: Update on New Sequences, Gene Mapping, and Recommended Nomenclature

DANIEL W. NEBERT,* DAVID R. NELSON,† MINOR J. COON,‡ RONALD W. ESTABROOK,§ RENE FEYEREISEN,¶ YOSHIAKI FUJII-KURIYAMA,‖ FRANK J. GONZALEZ,** F. PETER GUENGERICH,†† IRWIN C. GUNSALUS,‡‡ ERIC F. JOHNSON,§§ JOHN C. LOPER,¶¶ RYO SATO,‖‖ MICHAEL R. WATERMAN,§ and DAVID J. WAXMAN***

ABSTRACT

We provide here a list of 154 P450 genes and seven putative pseudogenes that have been characterized as of October 20, 1990. These genes have been described in a total of 23 eukaryotes (including nine mammalian and one plant species) and six prokaryotes. Of 27 gene families so far described, 10 exist in all mammals. These 10 families comprise 18 subfamilies, of which 16 and 14 have been mapped in the human and mouse genomes, respectively; to date, each subfamily appears to represent a cluster of tightly linked genes. We propose here a modest revision of the initially proposed (Nebert et al., DNA 6, 1-11, 1987) and updated (Nebert et al., DNA 8, 1-13, 1989) nomenclature system based on evolution of the superfamily. For the gene we recommend that the italicized root symbol CYP for human (Cyp for mouse), representing cytochrome P450, be followed by an Arabic number denoting the family, a letter designating the subfamily (when two or more exist), and an Arabic numeral representing the individual gene within the subfamily. A hyphen should precede the final number in mouse genes. We suggest that the human nomenclature system be used for other species. This system is consistent with our earlier proposed nomenclature for P450 of all eukaryotes and prokaryotes, except that we are discouraging the future use of cumbersome Roman numerals.

*Department of Environmental Health, University of Cincinnati Medical Center, Cincinnati, OH 45267-0056.
†Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC 27599.
‡Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109.
§Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75235.
¶Department of Entomology, Oregon State University, Corvallis, OR 97331-2907.
‖Department of Chemistry, Faculty of Science, Tohoku University, Sendai, 980 Japan.
**Laboratory of Molecular Carcinogenesis, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892.
††Department of Biochemistry and the Center in Molecular Toxicology, Vanderbilt University School of Medicine, Nashville, TN 37232.
‡‡Department of Biochemistry, University of Illinois, Urbana, IL 61801.
§§Department of Basic and Clinical Research, Scripps Clinic and Research Foundation, La Jolla, CA 92037.
¶¶Department of Molecular Genetics, Biochemistry and Microbiology, University of Cincinnati Medical Center, Cincinnati, OH 45267-0524.
‖‖Institute for Protein Research, Osaka University, Osaka, 565 Japan.
***Department of Biological Chemistry and Molecular Pharmacology and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115.

INTRODUCTION

P450 enzymes are important in the oxidative metabolism of numerous endogenous compounds such as steroids, bile acids, fatty acids, prostaglandins, leukotrienes, and biogenic amines. Many of these enzymes also metabolize a wide range of foreign chemicals (xenobiotics) including drugs, environmental pollutants, natural plant products, and alcohols. Frequently the metabolism of xenobiotics can lead to toxic metabolites, of which some have been implicated as agents that may initiate carcinogenesis and tumor promotion.

During 1985, with less than a dozen cDNA and deduced protein sequences available, it became apparent that the amino acid sequences could be aligned and classified on the basis of proposed evolutionary relationships of the corresponding genes. A committee was formed to carry out this task. The foundations for the nomenclature system were laid with the first report (Nebert et al., 1987) and subsequent update (Nebert et al., 1989). The latter report incorporated recommendations of the Committee for Human Gene Mapping and the Committee on Standardized Nomenclature of Mouse Genes.

Our present knowledge indicates that each P450 gene almost always produces a single protein. To date, there appear to be two exceptions to this rule where "functional" alternative splicing might occur, i.e., differential processing of the P450 transcript such that entire exons or portions of exons are exchanged in order to produce an enzyme with a new catalytic activity (Lacroix et al., 1990; Lephart et al., 1990).

Naming a P450 gene or enzyme

Recommendations for naming a P450 gene include the italicized root symbol CYP (Cyp for the mouse), denoting cytochrome P450, an Arabic number designating the P450 family, a letter indicating the subfamily when two or more subfamilies are known to exist within that family, and an Arabic numeral representing the individual gene. With mouse genes, the final number should be preceded by a hyphen.

For the gene product (enzyme), it had been suggested to use a Roman numeral designating the family, followed by a letter indicating the subfamily and an Arabic numeral corresponding to the individual gene (Nebert et al., 1989). Thus, it was proposed that the rat P-448 enzyme be called P450IA2, or simply IA2. This nomenclature has become quite cumbersome with the larger numbers, however. For example, if there were a hypothetical second gene and its enzyme in the B subfamily of the 88th family, CYP88B2 is clearly easier than P450LXXXVIIIB2. We therefore would like to propose the use of Arabic rather than Roman numerals and are in favor of the same nomenclature for the gene as well as its transcript and product: e.g., italicized CYP1A2 (Cyp1a-2 in mouse) for the gene and cDNA; nonitalicized CYP1A2 for the mRNA and protein in all species including mouse.

On the other hand, one might continue using the trivial name for the enzyme and Roman numerals where they have already been established, e.g., olf1 and IIC16 are encoded by the rat CYP2G1 and rabbit CYP2C16 genes, respectively. When describing data about the gene or gene product, the authors are free to use, for example, italicized CYP2G1 or nonitalicized CYP2G1 throughout the text. Another possibility for the protein might be P450 1A2 or P450 2G1, or simply 1A2 or 2G1. Alternatively, the authors might use the trivial name olf1 in the text, as long as the officially assigned names of the gene and the species under study are provided in a footnote.

Although we prefer the consistent usage of the gene nomenclature, authors may continue to use trivial names as long as the names are compatible with GenBank and other nucleic acid and protein databases. For the protein, this means no hyphen in P450, no Greek letters, and no subscripts or superscripts. For example, P4507α, P450scc, P45011β, P45017α, P450arom, P450c21, and P450c27 should be referred to as P450c7, P450scc, P450c11, P450c17, P450arom, P450c21, and P450c27, respectively, or simply c7, scc, c11, c17, arom, c21, and c27.

A P450 protein sequence from one gene family is defined as usually having [...] 40% identical, with three exceptions (Fig. 1). (i) The CYP2D subfamily includes the most distant members of the CYP2 family; exclusion of the CYP2D sequences results in all remaining CYP2 genes encoding enzymes that are >40% similar to each other. (ii) The CYP4C1 sequence from cockroach is clearly related to the CYP4 family but is only 30-36% identical to the other members of this family. (iii) The nuclear genes encoding two mitochondrial P450 proteins, scc and c11, are included in the same family (CYP11) even though the enzyme sequences are only 34-39% identical. Mammalian sequences within the same subfamily are always >55% identical. Inclusion of more distant species (e.g., chicken c17 and trout IA1) within the same subfamily drops this value to >46%.

To date, all genes within a given family have the same number of exons and similar intron-exon boundaries. Interestingly, P450 genes in evolutionarily related families share some intron-exon boundaries; for example, CYP17 and CYP21 share all except two introns (Picado-Leonard and Miller, 1987), CYP2 and CYP4 families share four introns, and two of those four shared introns are also found in CYP11. Conclusions drawn from intron-exon boundaries, in combination with the unweighted pair-group method of analysis (Fig. 1), are remarkably similar. Thus, comparative data about intron/exon organization should be considered, in addition to standard amino acid alignments, when one draws conclusions about P450 evolution (Nebert and Gonzalez, 1987).

If there appears to be no subfamily and no second functional gene in a particular family, one need not specify the subfamily and gene number. For example, the human (also cow, pig, rat, chicken, and mouse) P45017 is the sole member of the CYP17 gene family, and the gene and enzyme are named CYP17 (Cyp-17 in mouse) and CYP17, respectively.

There are presently 154 P450 genes and seven putative pseudogenes that have been described in 23 eukaryotes (including nine mammalian and one plant species) and in six prokaryotes (Table 2). Of the 27 gene families described so far, 10 exist in all mammals. These 10 families comprise 18 subfamilies, or clusters of genes, of which 16 and 14 have

been mapped in the human and murine genome, respectively (Table 1). This P450 nomenclature system affords a convenient medium for colleagues across distantly related fields to converse with one another. Discussed briefly below are several potential problems that may arise from the application of this system of nomenclature, as is true with any phylogenetic analysis based on evolution.

Table 1. Chromosomal and Subchromosomal Localization of CYP Genes^a

P450 gene or cluster    Chromosomal location

Human
CYP1                    15q22-qter (near MPI)
CYP2A                   19q13.1-13.2
CYP2B                   19q12-q13.2
CYP2C                   10q24.1-24.3
CYP2D                   22q11.2-qter
CYP2E                   10
CYP2F                   19
CYP3A                   7q22-qter
CYP4A                   1
CYP4B                   1p12-p34
CYP11A                  15q21-q22
CYP11B                  8q21-q22
CYP17                   10
CYP19                   15q21
CYP21                   6p (within HLA)
CYP27                   2q33-qter

Mouse
Cyp1                    Mid-9 (near Mpi-1)
Cyp2a                   7 (near Gpi-1)
Cyp2b                   Proximal 7 (Coh)
Cyp2c                   19
Cyp2d                   15
Cyp2e                   7
Cyp2f                   7
Cyp3                    5
Cyp4a                   4 (near MMTV13)
Cyp11a                  9
Cyp-17                  19
Cyp-19                  9
Cyp-21                  17 (within H-2)
Cyp-27                  1

References (human): Miles et al. (1990a); Miles et al. (1988); Yamano et al. (1989b); Shephard et al. (1989); Nhamburo et al. (1990); Spurr et al. (1989); O.W. McBride and J.P. Hardwick, in preparation; Nhamburo et al. (1989); J.J. Cali, D.W. Russell, U. Francke et al., in preparation.

References (mouse): Wong et al. (1989); Nhamburo et al. (1990); T. Matsunaga et al. (1990); Miles et al. (1990b); Burkhart et al. (1990); S. Kimura et al. (1989b); C.R. Wolf, personal communication; Youngblood et al. (1989); Youngblood et al. (1991); J.J. Cali, D.W. Russell, U. Francke et al., in preparation.

^a References are not included in this table if the work was cited in the Nebert et al. (1989) update.

Uncertainty about orthologous genes

An "orthologous gene" in two species refers to a gene that we know with certainty corresponds to the ancestral gene which existed before the evolutionary divergence of the two species. In several subfamilies (CYP2A, 2B, 2C, 2D, 3A, and 4A), numerous species-specific gene duplications and gene conversion events have made orthologue assignments between species impossible. "Gene conversion" refers to an event during replication in which some portion of a gene is replaced by the corresponding part of a nearby gene (or pseudogene). Where the classification of proteins encoded by orthologous genes cannot be certain (particularly between widely diverged species and especially in subfamilies containing three or more genes), sequential numbering is recommended, usually on a chronological basis, as the protein sequences become available in the literature (e.g., CYP2C1, CYP2C2, CYP2C3, CYP2C4, ... CYP2C23). The same rules might become necessary for genes in other subfamilies, once it becomes apparent that three or more closely related genes exist in that subfamily. On the basis of amino acid similarities, we can be certain about the orthologous CYP1A1 and the orthologous CYP1A2 genes in the human, mouse, rat, rabbit, hamster, dog, and monkey. On the other hand, trout appears to have only the CYP1A1 gene in its CYP1 family, suggesting that CYP1A2 arose from a gene duplication event in land animals following the divergence of land animals and sea animals approximately 370 million years ago (Heilmann et al., 1988).
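The recommended symbols, including the sequentially numbered genes discussed above, can be assembled mechanically from a family number, a subfamily letter, and a gene number. The following Python sketch is only an illustration of those recommendations: the function names gene_symbol, old_style_name, and to_roman are hypothetical, and treating a family number given alone as a sole-member gene (e.g., CYP17, Cyp-17 in mouse) is an assumption of the sketch. Italics, which distinguish the gene from its product, cannot be represented in a plain string.

```python
from typing import Optional


def to_roman(n: int) -> str:
    """Convert a positive integer to a Roman numeral (used by the older enzyme names)."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
             (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
             (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, numeral in pairs:
        count, n = divmod(n, value)
        out.append(numeral * count)
    return "".join(out)


def gene_symbol(family: int, subfamily: Optional[str] = None,
                gene: Optional[int] = None, mouse: bool = False) -> str:
    """Build a CYP symbol: root, family number, subfamily letter, gene number.

    Assumption of this sketch: a family number alone denotes a sole-member gene.
    """
    if gene is not None and subfamily is None:
        raise ValueError("a gene number requires a subfamily letter")
    if mouse:
        if subfamily is None:
            return f"Cyp-{family}"          # hyphen precedes the final number
        base = f"Cyp{family}{subfamily.lower()}"
        return base if gene is None else f"{base}-{gene}"
    base = f"CYP{family}"
    if subfamily is not None:
        base += subfamily.upper()
    if gene is not None:
        base += str(gene)
    return base


def old_style_name(family: int, subfamily: str, gene: int) -> str:
    """Earlier enzyme name with a Roman numeral for the family, e.g. P450IA2."""
    return f"P450{to_roman(family)}{subfamily.upper()}{gene}"


if __name__ == "__main__":
    print(gene_symbol(1, "A", 2), gene_symbol(1, "A", 2, mouse=True))
    print(gene_symbol(88, "B", 2), old_style_name(88, "B", 2))
```

Running the script prints the Arabic-numeral symbol next to the Roman-numeral name for the hypothetical 88th family, which makes the difference in legibility concrete.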

Uncertainty about enzyme activities

It had been believed that catalytic activity would aid us in assignment of orthologous genes across species (Nebert et al., 1989). This assumption can no longer be made. A recent study has shown that a single amino acid difference in a peptide of about 500 residues can be critical in changing the catalytic activity from "coumarin 7-hydroxylase" to "testosterone 15α-hydroxylase" (Lindberg and Negishi, 1989). These data indicate that a particular ancestral gene existing 20 million years ago, for example, could have undergone a single nucleotide substitution in rat but not in mouse 10 million years ago, leading to a single amino acid change, such that the orthologous gene in the rat and mouse would exhibit entirely different catalytic activities. [The rat and mouse species are believed to have separated from one another about 17 million years ago (for review, see Nebert and Gonzalez, 1987).] In studies too numerous to describe here, genetically engineered cDNAs encoding chimeric P450 proteins have also been shown to cause changes in catalytic activities.

CYP27 [formerly CYP26 (Nebert et al., 1989)] designates the gene whose product hydroxylates one of the two

FIG. 1. Unweighted-pair-group method of analysis (UPGMA) of the P450 superfamily. [Tree titled "CYP gene superfamily"; horizontal axis: evolutionary distance, 4.0 to 0.] The genes within each subfamily represent all those that have been characterized in various species. Estimations of branching in the oldest portion of the tree are subject to the largest error (Nelson and Strobel, 1987), and molecularly driven events during evolution will contribute to the uncertainty of UPGMA branching patterns (Gonzalez and Nebert, 1990). This tree was calculated by using amino acid sequences deduced from 147 of the 154 cDNA sequences available. The branch lengths on the trunk of the tree connecting major branches are so short that the order is not easy to determine; this causes considerable variation in the tree below the CYP71 branch. Above this point there is very little difference between the UPGMA and neighbor-joining methods.
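For readers unfamiliar with UPGMA, the clustering step behind a tree such as Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration of the general method, not the program used to calculate the figure; the function name upgma, the input format (a symmetric matrix of pairwise distances), and the convention of placing each node at half the distance between the merged clusters are assumptions of the sketch.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster sequences by UPGMA.

    labels: list of sequence names.
    dist:   symmetric matrix (list of lists) of pairwise distances.
    Returns a nested tuple (left_subtree, right_subtree, node_height).
    """
    # Active clusters: id -> (subtree, number of leaves in the subtree)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # Distances between active clusters, keyed by an unordered pair of ids
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)

    while len(clusters) > 1:
        # Pick the closest pair of clusters
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((i, j))] / 2
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to every other cluster is the
        # average over leaves, i.e. weighted by cluster size
        for k in clusters:
            d[frozenset((next_id, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j)
        next_id += 1

    tree, _ = next(iter(clusters.values()))
    return tree


if __name__ == "__main__":
    # Hypothetical distances among three sequences, for illustration only
    names = ["seqA", "seqB", "seqC"]
    matrix = [[0, 2, 6],
              [2, 0, 6],
              [6, 6, 0]]
    print(upgma(names, matrix))
```

Because each merge simply joins the two closest clusters, short internal branches (such as those on the trunk of the tree) can change the join order with small changes in the distances, which is consistent with the caution about branch order expressed in the legend.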

Table 2. Update [...] CYP Genes

Column headings: Gene symbol; Trivial [...]
Entries in extraction order (row alignment lost): CYP1A1; form 6; IA1; Dah1; MKah1; Cyp1a-1; P,; P-448, d, HCB; P3, d, form 4; LM4; MC4; Dah2; Dog; Monkey; Mouse; Rat; Human; a3; Rat; Rat; Rat; Cyp2a-4; 15αoh-1; Mouse; Cyp2a-5

15-hydroxylases. Third, the CYP10, CYP11, and CYP27 families appear to be more closely related in evolution; although not much is known yet about CYP10, the CYP11A, CYP11B, and CYP27 enzymes are bound to mitochondrial membranes. The vast majority of P450 enzymes (e.g., those encoded by genes in eight of the 10 mammalian families) are located in the endoplasmic reticulum. Fourth, most of the yeast genes seem to be clustered together, and most of the bacterial genes appear to be clustered together. The major exception is the CYP55 gene from the fungus Fusarium oxysporum that encodes a soluble, non-membrane-bound P450 enzyme similar to the soluble P450 enzymes found in prokaryotes. Lastly, although several laboratories have been unable to express numerous mammalian cDNA-derived P450 proteins in bacteria, the P450c7 protein has been expressed successfully in bacteria and will most likely be the first mammalian P450 protein to be studied by X-ray crystallography (J.Y.L. Chiang, personal communication). This success might somehow reflect the fact that the mammalian CYP7 family appears to be most distantly related to the other mammalian P450 gene families and most closely related to bacterial P450 genes (Fig. 1).

Allelic variants

Finally, a large number of allelic variants have been identified in the human and rodent. The Nomenclature Committee has arbitrarily assigned proteins having
