SPECIAL ARTICLE THE PRIMATE MHC WORKSHOP OEGSTGEEST, THE NETHERLANDS, 1989

Evolution of Class I Major Histocompatibility Complex Genes and Molecules in Humans and Apes

Peter Parham and David A. Lawlor

ABBREVIATIONS
EBV Epstein-Barr virus
MHC major histocompatibility complex

From the Department of Cell Biology, Stanford University, Stanford, California. Address reprint requests to Dr. Peter Parham, Department of Cell Biology, Sherman Fairchild Building, Stanford University, Stanford, CA 94305. Accepted September 18, 1990.

Human Immunology 30, 119-128 (1991). © American Society for Histocompatibility and Immunogenetics, 1991. 0198-8859/91/$3.50

As understanding of the physiological function of major histocompatibility complex (MHC) molecules has increased, so has interest in the evolution of their diversity and polymorphism. Intensive research has primarily been applied to the MHC of one rodent species, Mus musculus, and one primate species, Homo sapiens. Although serology revealed some cross-reactivities between HLA and H-2 products, it was only with the cloning, sequencing, and mapping of MHC genes that detailed comparisons could be made between the two systems. Serology had in fact indicated that H-2 and HLA polymorphisms were very different, and this was borne out by the molecular genetics. It was easier to discern evolutionary relationships for class II than for class I MHC products. For example, convincing homologies between I-E and HLA-DR and between I-A and HLA-DQ were identified, whereas no equivalent relationships could be found for H-2K,D,L and HLA-A,B,C. Moreover, the number of class I genes showed much greater variation than the number of class II genes, a tendency that was confirmed by studies of other species [1-3]. In summary, the first comparisons of humans and mice showed that, on an evolutionary time scale, the structure, expression, and numbers of class I genes can be subject to considerable change [4, 5]. Further insights into the details of this evolution were hard to find because of the extent of the divergence between the class I genes of humans and mice, although assessment of those amino acid residues conserved in all H-2K,D,L and HLA-A,B,C proteins identified basic structural features of antigen-presenting class I MHC molecules [6].

It gradually became apparent that reconstruction of the events contributing to the evolution of the class I MHC region would require extensive comparisons to be made, first within the polymorphisms and haplotypes of a species and second between closely related species. The clinical and legal interests in HLA polymorphisms, for example as they relate to tissue transplantation, disease susceptibility, paternity testing, and genetic fingerprinting, have resulted in Homo sapiens being the most extensively studied species [7]. Certain human populations have been studied in great detail and, with a focus of the Eleventh International Histocompatibility Workshop being human anthropology, it becomes realistic to plan for a comprehensive description of HLA polymorphisms in the contemporary human population.

The species most closely related to humans are the African anthropoid apes, comprising the gorillas and the chimpanzees. Combined estimates based on the analysis of fossils and on the extent of interspecies DNA hybridization suggest that these ape species shared a common ancestor with humans some 7 to 12 million years ago [8, 9]. In fact, the molecular and morphological similarities between humans, chimpanzees, and gorillas have so far made it impossible for investigators to agree on a pattern of evolutionary branching, i.e., which


two of the three species are most closely related [10, 11]. In contrast, it is generally agreed that the various species of Asian ape, the orangutans and gibbons, are more distantly related. Thus the lineage leading to the Asian apes separated from that leading to humans and the African apes 14 to 16 million years ago, prior to the divergence of the lineages giving rise to humans, chimpanzees, and gorillas [8, 9].

Southern blots indicate the presence of approximately 20 class I genes and gene fragments in the HLA region and, in contrast to observations on H-2, this number does not appear to differ drastically between haplotypes [12, 13]. Of these genes only three (HLA-A, -B, and -C) code for heavy-chain glycoproteins that associate with β2-microglobulin (β2-m) and are expressed at the surfaces of many cell types. HLA-A,B,C gene products are also the only human class I molecules to exhibit high diversity, to have a demonstrated role in antigen presentation, and to be distinguished by "HLA typing" with alloantisera.

Our initial goals were to analyze the nature of HLA-A,B,C allelic polymorphism, the mutagenetic events contributing to its evolution, and its consequences for the interactions of these antigen-presenting molecules with antigenic peptides, T-cell receptors, and the CD8 co-receptor. With contributions from many laboratories there are now sequences for over 77 different HLA-A,B,C alleles, not including those confirmed by independent determination [14-16]. They provide an understanding of the differences between the products of the three loci, of the overall pattern of conservation and variability, and of the mechanism generating new alleles that is unlikely to change as additional sequences accrue. Despite this accomplishment, the relationships between the various families of HLA-A,B,C alleles [17, 18] are far from being understood, and their clarification will require further sequence determination and better methods for sequence analysis.

Comparison of murine class I MHC genes has shown that genetically linked nonclassical class I MHC genes can contribute to the evolution of new alleles at antigen-presenting loci; in particular, various H-2Kb mutants were shown to be formed by the homologous introduction of small segments of sequence from Q and Tl genes [19]. Such results suggest that knowledge of all the class I genes, pseudogenes, and gene segments in the HLA region will be necessary for understanding the evolution of HLA-A,B,C alleles.

Serological comparisons of HLA antigens and their homologues in apes showed that various MHC polymorphisms were shared by these species and suggested that they may have originated in, and been inherited from, common ancestral species [20-23]. If true, then it is apparent that study of HLA-A,B,C alleles alone will

P. Parham and D. A. Lawlor

be insufficient to understand their evolution and that critical insights will be gained from phylogenetic comparisons. We have therefore begun a study of the class I MHC genes of the African and Asian apes [24, 25]. In collaboration with Dr. Frances E. Ward (Duke University) and with the help and resources of personnel at the Yerkes Primate Center (Atlanta, GA), we have established a panel of Epstein-Barr virus (EBV)-transformed B-cell lines from four ape species: Pan troglodytes (common chimpanzee), Pan paniscus (pygmy chimpanzee), Gorilla gorilla (lowland gorilla), and Pongo pygmaeus (Bornean and Sumatran orangutans). These cell lines [25], which have been used to isolate cDNA clones encoding class I heavy chains and β2-microglobulin, have also been deposited with the American Type Culture Collection (ATCC) and can be obtained from Dr. Robert Hay. We have recently reviewed the results so far obtained from this analysis and their interpretation in terms of the evolution of the MHC [3, 24]. Here we shall briefly summarize our findings, with emphasis on their implications for HLA-A,B,C evolution.

DIFFERENCES BETWEEN HLA-A, HLA-B, AND HLA-C ALLELES

Class I MHC proteins, including HLA-A, -B, and -C, have a characteristic structural organization which is also reflected in their genes [26]. The variable class I heavy chain consists of three extracellular domains (α1, α2, and α3), a membrane-spanning region, and a cytoplasmic tail. Each of the extracellular domains and the transmembrane region is encoded by a separate exon, while the cytoplasmic tail is encoded by two or three small exons depending upon the locus. A fourth extracellular domain is contributed by a second, and in humans invariant, polypeptide called β2-microglobulin. Coding sequences of HLA-A,B,C alleles can be divided into two parts according to their pattern of sequence variation [15].

The 3' exons encoding α3, the transmembrane region, and the cytoplasmic tail are quite conserved between alleles of the same locus but exhibit considerable differences between alleles of different loci. These exons contain many "locus-specific" positions, at which all alleles of a locus have the same nucleotide but the nucleotide differs between loci. The 5' exons encoding α1, α2, and also the leader peptide are in general more variable, having the majority of substitutions that distinguish alleles of a locus. On the basis of the complete coding sequence it is now possible to assign new sequences unambiguously to either the HLA-A, -B, or -C locus [15]. The division of the nucleotide sequence into a more variable 5' and more conserved, locus-specific 3' part

Evolution of Class I MHC Genes and Molecules

TABLE 1 Positions with high variability (≥4.0) in HLA-A,B,C molecules

Variability

Position

HLA-A

9 24 45

9.1

62 63 67 69 70 77 80 81 82

13.9 4.2

95 97

5.4 5.8

HLA-B

HLA-C

4.8 7.0 10.0

10.0

8.3 6.3

152 156 163

5.9 8.3

5.0 14.0

Peptide Peptide

5.2 6.2 15.9 5.6 8.1

Peptide Peptide Peptide ° TcR + peptide Peptide Peptide TcR Peptide Peptide Peptide Peptide TcR

13.5 4.8 6.4 6.2 7.0 5.0 5.5

99 114 116

Potential contact

Peptide Peptide

10.2 5.5 10.3

Peptide Peptide TcR + peptide

Variability was calculated from 25 HLA-A, 35 HLA-B, and 18 HLA-C molecules as described by Wu and Kabat [30]. Potential contacts for the side chains of positions with high variability are adapted from Bjorkman et al. [31] as described by Bjorkman and Parham [16]. Position 45 is at the end of the "45 pocket" described by Garrett et al. [33].
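The definition of "locus-specific" positions given in the text can be made concrete with a short sketch. It is an illustration only, not the authors' procedure: the function name `locus_specific_positions`, the toy sequences, and the reading of the definition (every locus is internally invariant at the position and at least two loci differ from each other) are assumptions introduced here.

```python
def locus_specific_positions(alleles_by_locus, first_position=1):
    """Return positions invariant within each locus but differing between loci.

    alleles_by_locus maps a locus name to a list of aligned sequences
    (hypothetical input format, chosen for this sketch).
    """
    sequences = [seq for seqs in alleles_by_locus.values() for seq in seqs]
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must be aligned to the same length")

    positions = []
    for i in range(length):
        # Set of nucleotides seen at this position, one set per locus.
        per_locus = [{seq[i] for seq in seqs} for seqs in alleles_by_locus.values()]
        conserved_within_loci = all(len(column) == 1 for column in per_locus)
        differs_between_loci = len(set().union(*per_locus)) > 1
        if conserved_within_loci and differs_between_loci:
            positions.append(first_position + i)
    return positions


# Invented toy data (not HLA sequences): two aligned "alleles" per locus.
toy_alleles = {
    "A": ["ACGT", "ACGA"],
    "B": ["ATGT", "ATGC"],
    "C": ["ACGT", "ACGG"],
}
print(locus_specific_positions(toy_alleles))  # [2]
```

In the toy data only position 2 qualifies: position 4 varies within each locus, and positions 1 and 3 are identical in all sequences. A looser reading of the definition, in which a single locus is invariant and differs from the others, would need a different test.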

reflects the different functions of the protein domains of the class I MHC molecule. The α1 and α2 domains form the binding sites for variable peptides and T-cell receptors (TcR) [27], whereas the α3 domain interacts with CD8, the invariant co-receptor of class I-reactive T cells [28, 29]. No functions other than as a membrane anchor have so far been ascribed to the transmembrane and cytoplasmic domains.

In addition to locus-specific sequences in the carboxy-terminal domains, the pattern of substitutions in α1 and α2 shows features that distinguish the products of the HLA-A, -B, and -C loci. Certain positions that are conserved in the products of one locus exhibit variation in the products of another, and certain sequence motifs are restricted to subsets of allelic products of one or other locus. On the basis of the Wu and Kabat [30] variability parameter, positions with high variability can be defined; their number and distribution vary with locus (Table 1, Fig. 1), although all are found within the antigen recognition site [16, 31, 32]. For example, we and others have often stated that the α1 helix is the most variable region of the class I heavy chain, yet inspection shows that this variation is highly


dependent upon locus. In the short amino-terminal region of the helix (residues 62 and 63) high variability is found only in HLA-A molecules, in contrast to the longer carboxy-terminal region (residues 67-82), where high variability derives from HLA-B molecules. For HLA-C molecules the α1 helix is strikingly conserved, there being no positions with high variability.

The crystallographic structures of HLA-A2.1 and -Aw68.1 indicate that the majority of high-variability positions directly affect peptide binding rather than contacts with TcRs, a property reflecting conservation of the interaction between TcRs and class I MHC molecules [31-33]. Even within the peptide-binding groove there is restriction of high-variability residues to particular areas, and again they differ for HLA-A, -B, and -C (Table 1, Fig. 1). Comparison of the A2.1 and Aw68.1 structures also revealed the division of the peptide-binding groove into various subsites or pockets [33]. Certain of these features are conserved in all HLA-A,B,C molecules and perhaps represent binding sites for common features of antigenic peptides, while others are diversified. Locus-specific effects are again evident, as illustrated by the 45 pocket, involving residues 24, 26, 34, 45, and 67, which exhibits 13 structural motifs in HLA-B molecules compared to just two in either HLA-A or HLA-C molecules.

MECHANISMS OF HLA-A,B,C DIVERSIFICATION

All inherited changes in the MHC originate with mutations in the germ line of single individuals, which may then be propagated and increase in frequency within the population. Two categories of mutagenetic event can be distinguished: those that change the number or organization of the genes of a haplotype and those that create new alleles at an individual locus. We shall focus upon the latter. The frequency with which new alleles are formed is low, and no example of an individual expressing an HLA-A,B,C antigen not shared with either parent has been documented. This means that we can only consider mutagenetic events that took place in the experimentally irretrievable past and attempt their reconstruction from comparisons of the current alleles. Insight into the different mutagenetic events that produce new HLA-A,B,C alleles has been gained from comparison of sequences of alleles that by some other criterion are judged to be very similar and may therefore be evolutionarily separated by one or a small number of genetic events. Serology, isoelectric focusing, and recognition by cytotoxic T cells (CTL) have all been used to identify such pairs or larger groups of alleles, with the subtypes of HLA-A2 and HLA-B27 being particularly informative [34].
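The comparison of closely related alleles described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a method from the paper: the function names, the descriptive labels, and the window `max_cluster_span` are hypothetical, and the sequences are invented toy strings rather than HLA data. A clustered pattern is only suggestive; deciding between a segmental exchange and independent point mutations requires the further evidence discussed in the text, such as whether the same substitutions occur in a possible donor allele.

```python
def differing_positions(seq_a, seq_b, first_position=1):
    """List the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [first_position + i
            for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def describe_difference(seq_a, seq_b, max_cluster_span=10):
    """Label a pair of alleles by how their differences are distributed.

    max_cluster_span is an arbitrary illustrative window, not a value
    taken from the paper.
    """
    diffs = differing_positions(seq_a, seq_b)
    if not diffs:
        return "identical"
    if len(diffs) == 1:
        return "single substitution"
    span = diffs[-1] - diffs[0] + 1
    if span <= max_cluster_span:
        return "clustered substitutions"
    return "scattered substitutions"


def shared_with_donor(variant, donor, positions, first_position=1):
    """True if the variant carries the donor's residue at every listed position."""
    return all(variant[p - first_position] == donor[p - first_position]
               for p in positions)


# Invented toy sequences (20 residues each), not HLA data.
reference = "ACDEFGHIKLMNPQRSTVWY"
single    = "ACDEYGHIKLMNPQRSTVWY"   # differs at position 5
clustered = "ACDEYGRIELMNPQRSTVWY"   # differs at positions 5, 7, 9
scattered = "ASDEFGHIKLMNPQRSTVFY"   # differs at positions 2 and 19
donor     = "MCDEYGRIELMNPQRSTVWH"   # carries the same residues at 5, 7, 9

print(describe_difference(reference, single))     # single substitution
print(describe_difference(reference, clustered))  # clustered substitutions
print(describe_difference(reference, scattered))  # scattered substitutions
print(shared_with_donor(clustered, donor, differing_positions(reference, clustered)))  # True
```

The last call mirrors the kind of check used when a cluster of substitutions in one allele is also found in another allele: a match is consistent with a common origin but does not prove it.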

FIGURE 1 Positions of high variability in the structure of HLA-A, -B, and -C molecules. Variability for 25 HLA-A, 35 HLA-B, and 18 HLA-C molecules was calculated as described by Wu and Kabat [30]. Positions with variability ≥ 4.0 were designated as having high variability and all such positions were found in the α1 and α2 domains. Their positions for HLA-A (A), HLA-B (B), and HLA-C (C) molecules are indicated on a ribbon diagram of the structure of HLA-A2. Adapted from Fig. 2 of Bjorkman et al. [31].
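The Wu and Kabat variability used for Table 1 and Figure 1 is the number of different amino acids observed at a position divided by the frequency of the most common amino acid at that position. The sketch below applies that formula to a set of aligned sequences and keeps the positions at or above the 4.0 threshold. The function names and the toy alignment are illustrative assumptions; the toy strings are not HLA sequences.

```python
from collections import Counter


def wu_kabat_variability(column):
    """Number of different residues / frequency of the most common residue."""
    counts = Counter(column)
    most_common_frequency = counts.most_common(1)[0][1] / len(column)
    return len(counts) / most_common_frequency


def high_variability_positions(aligned, threshold=4.0, first_position=1):
    """Map each position with variability >= threshold to its variability."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    result = {}
    for i in range(length):
        value = wu_kabat_variability([seq[i] for seq in aligned])
        if value >= threshold:
            result[first_position + i] = value
    return result


# Invented toy alignment of four three-residue sequences.
toy_alignment = ["AYK", "AHK", "AYR", "AFK"]
print(high_variability_positions(toy_alignment))  # {2: 6.0}
```

In the toy alignment, position 1 is invariant (variability 1.0), position 2 has three residues with the most common at frequency 0.5 (variability 6.0), and position 3 has two residues with the most common at frequency 0.75 (variability about 2.7), so only position 2 passes the threshold.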

Point mutations and a variety of homologous recombinational mechanisms are thought to have contributed to the contemporary HLA-A,B,C polymorphism [3, 17]. The original source of all substitutions has to be point mutation, as homologous recombination in the absence of sequence differences has no effect on sequence diversity. However, the power of recombination is its capacity to generate multiple new alleles from each new substitution. The importance of this capacity for "recombinational amplification" of point substitutions may explain the paucity of known alleles that differ by a single substitution. One example is the B*2703 subtype of HLA-B27, which differs from the most common allele, B*2705, by a single nucleotide substitution [34, 35]. This change has not been found in any other class I sequence and results in substitution of histidine for tyrosine at position 59. The OZB subtype of HLA-A2 differs from the common A2.1 subtype by a single unique substitution of glutamic acid for alanine at position 236 [36]; in this case, however, the structural analysis was done at the protein level and it is unknown whether a single nucleotide substitution is responsible. Comparison of the relevant codons indicates that these A2 alleles may differ by only a single, unique nucleotide substitution. That both B*2703 and A2-OZB are low-frequency variants is consistent with their being the result of relatively recent point mutations.

The relationships between other alleles implicate various recombinational mechanisms. For example, the HLA-Aw69 allele is probably the result of homologous recombination between Aw68.1 and A2.1 alleles [37], and Bw42 the result of a similar event between B7 and B8 alleles [14]. In each of these cases the 5' part of the hybrid gene is identical to one allele and the 3' part is identical to the other.

Another common pattern of relationship is when two otherwise identical alleles differ in a short localized segment of sequence. An illustrative example is provided by the HLA-A2.3 variant of HLA-A2, which appears restricted to Chinese individuals [17, 38]. This allele differs from HLA-A2.1 by a cluster of three substitutions in codons 149 to 156 of exon 3 encoding the α2 helix. The substitutions found in A2.3 are not unique to this molecule, all being found in alleles HLA-A25 and -A26 of the A10 family [39, 40]. The


simplest interpretation of these data is that the clusters of substitutions in A2.3 and the A10 alleles have a common evolutionary origin, in which case one must argue that a double recombination between A2.1 and an A10 allele was the mutational event responsible for the formation of A2.3. Such events have been called segmental exchanges or microrecombinations by different authors [19]. In considering such single events in the past, we cannot discern whether they were reciprocal exchanges, i.e., classical double recombinations, or nonreciprocal allelic conversions. An alternative but more complicated, and to our thinking less likely, interpretation is that the cluster of substitutions in A2.3 arose independently from those in A10 alleles. This would constitute convergent evolution requiring an identical set of three independent point substitutions and implies very strong selection for both the final sequence and the intermediate forms.

In this discussion of HLA-A2.3 we assumed that HLA-A2.1 is the older allele and that HLA-A2.3 was derived from it. The basis for this assumption is their differing frequency and geographical distribution: HLA-A2.1 is at high frequency in many different human populations, whereas HLA-A2.3 has been found in only one or two Chinese individuals. It is therefore argued that HLA-A2.1 is the older allele and has with time spread throughout the world, while HLA-A2.3 is newly arisen in the Chinese population. However, the argument can be inverted by suggesting that HLA-A2.3 was the older allele and gave rise to HLA-A2.1, which is so selectively advantageous that HLA-A2.3 is being lost from the population, and all we see are the final vestiges of A2.3, confined to a single population. Irrespective of which allele came first, the sequence relationship provides evidence for segmental exchange.

Another example of the segmental exchange phenomenon comes from comparison of the A25 and A26 subtypes of A10 [40].
Their sequences differ only by a cluster of substitutions in codons 76 to 83 encoding part of the α1 helix (Fig. 2). The substitutions found in A26 are shared by other HLA-A alleles, including A1 and A29.1, but are not found in HLA-B or -C sequences. The substitutions found in HLA-A25 are shared with HLA-A32 and no other HLA-A,B,C sequences. If A25 were the older allele, then one would postulate that A26 was produced by a segmental exchange between A25 and either an A1- or A29-related allele. On the other hand, if A26 were the older allele, then an exchange between A26 and A32 would be postulated for the formation of A25. Analysis of ancestral haplotypes and linkage relationships may provide evidence to favor one or other of these hypotheses.

The region of difference between A25 and A26 is also that part of the heavy chain that is critical for formation of the Bw4 and Bw6 epitopes [41]. This also appears to be a region in which there are recurrent segmental exchanges. For many years it has been appreciated by serologists that certain pairs of cross-reactive "splits" of HLA-B antigens could be distinguished on the basis of their having either the Bw4 or the Bw6 epitope. Müller et al. [42] have recently analyzed the sequences of B38 and B39, which constitute one such pair of molecules. These two molecules differ by a cluster of substitutions in codons 74 to 83 of exon 2 encoding the α1 helix. B38 carries the Bw4 epitope and within this region has a sequence identical to that of a number of other Bw4-positive molecules, including B51, B52, B57, B58, and B49, whereas B39 carries the Bw6 epitope and within this region has sequence identity to Bw6-carrying molecules including B14, B18, B7, B8, and Bw60. Again it seems clear that the B38 and B39 alleles are evolutionarily related by a single segmental exchange event, be it double recombination or allelic conversion.

FIGURE 2 Conservation of Bw4 and Bw6 sequences in class I heavy-chain sequences of humans and chimpanzees. Residues 76-83 contributing to the Bw4 and Bw6 epitopes are shown in bold. The presence of the serological Bw4 and Bw6 epitopes is shown on the right. For the chimpanzee proteins PatrA1, PatrB2, PatrB3, and PatrC1, the presence of the epitopes has not been directly tested and is predicted on the basis of sequence.

There are, however, other pairs of serologically cross-reactive B-locus molecules that have a more complex relationship. An example is provided by B57 and Bw58 [40]. These alleles differ by 16 nucleotide substitutions, scattered throughout the coding region, that give rise to eight amino acid substitutions: 2 in α1, 2 in α2, 1 in α3, 2 in the transmembrane domain, and 1 in the cytoplasmic region. The helices of the α1 and α2 domains are where most serological epitopes have been mapped, and these sequences are identical in B57 and Bw58, thus accounting for their antigenic cross-reactivity. The differences in α1 are at positions 45 and 46, where B57 has methionine and alanine in common with B13, B46, and B62, whereas Bw58 has threonine and

FIGURE 3 Comparison of the structures of chimpanzee, orangutan, and human class I MHC heavy-chain glycoproteins. As agreed at the meeting on The Primate MHC (November 19-22, 1989, Oegstgeest, The Netherlands), the nomenclature for the chimpanzee (Pan troglodytes) alleles is PatrA,B,C, etc., and for the orangutan (Pongo pygmaeus) is PopyA,B,C, etc. [47]. The new and the old designations for the alleles are included. The consensus is derived from all the sequences shown. Alignment panels: leader peptide; α1 domain.

glutamic acid in common with B18, B35, B51, and Bw52. These residues are at the end of the "45 pocket" found in the Aw68 structure [33] and are likely to influence peptide binding. In the α2 domain B57 and Bw58 differ at position 97, which is a peptide-binding residue on the floor of the groove, and at position 103, which is under the α2 helix. Although by serological criteria these two molecules appear similar and their sequences are very similar, one must invoke a number of independent mutational events to account for their differences, suggesting that they diverged at an earlier point in time than, for example, A2.1 and A2.3.

Analysis of class I H-2 alleles has indicated that segmental exchange between linked alleles of different loci

is a major contributor to diversification in mice [19]. Although evidence for such events can be found in human class I sequences, this mechanism does not appear to have been as important in the diversification of HLA-A,B,C alleles and has been a minor player compared to segmental exchange between alleles of the same locus. It is probably in part the relatively low frequency of intergenic compared to interallelic exchanges that has allowed the human class I loci to diverge and evolve their locus-specific characters.

The difference in the patterns of segmental exchange in humans and mice could derive from a number of sources. First, the number of class I genes in mice is higher than in humans, which will clearly increase the ratio in favor of intergenic events [43]. Second, the actual frequency of intergenic events between any two genes may be greater in the H-2 than in the HLA region. Third, selection may eliminate a greater proportion of HLA alleles resulting from intergenic conversion. Segmental exchanges between alleles of different loci will tend to homogenize different loci and prevent their diversification. Previous successful intergenic conversions will therefore tend to facilitate further events as the

FIGURE 3 (continued) Alignment panels: α2 domain; α3 domain; transmembrane domain; cytoplasmic domains.

.

3aO

. 342

CONS Ch26 eh25 AI08

GGKGGSYSQAASSDSAQGSDVSLTACKV DR ..... T ...............

A126 A3.1 Al1¢ OrAl OrEl HLAE

DR .......................... DR ..... T .................... DR ..... T .................... DR .......... N ............... ........ K-EW ....... ES.- .... A ........ K-EW ....... ESHSL

Chll ChLA-B2 Ch18 Ch39 ChLA-BI Bw58 Bw42 Ch34 Cwll Ch28 HLA~

DR .....

~-T-y

.--

T ....................

......................... ......................... .......................... ......................... ......................... ......................... ........... C ............. ...... C ...... ~ ...... E--I---A ...... C ...... N ...... E--I---A DRNR DRNR

....... .......

. ............... VT ...... G .....

N--

Figure 3 (Continued)

proteins are more likely to tolerate the changes and give functionally useful molecules. Conversely, locus diversification will decrease the frequency of intergenic exchanges that give functionally useful molecules and thereby facilitate further locus divergence. Thus once locus divergence has begun, it may become a positively reinforcing process [3, 44].

COMPARISON OF CHIMPANZEE AND HUMAN MHC

Characterization of class I MHC cDNA from chimpanzee EBV-transformed B-cell lines and Southern blot analysis suggest that the organization of the class I region of the chimpanzee and human MHC will be similar. Chimpanzee homologues of HLA-A, -B, and -C have been found, and for each locus (Mhc Patr-A, -B, and -C; for terminology see legend to Fig. 3) the alleles are no more different from human alleles than certain human alleles are from each other [3, 24, 25, 45]. In addition, a chimpanzee homologue of the nonclassical HLA-F locus has been identified [24] (Fig. 3).

Comparison of HLA-A sequences shows that they cluster into five allelic families that correlate well with the serologically cross-reactive antigens: A1/A3/A11, A2/A28, A9, A10, and A19. The only exceptional allele is HLA-A30, which serology groups into A19 but which by sequence is part of the A1/A3/A11 family. In this instance it is presumably a small number of amino acids that determines the antigenicity of A30 rather than the overall pattern of its polymorphic substitutions [18]. Patr-A alleles have been obtained from three unrelated chimpanzees and they are all related to the A1/A3/A11 family. Limitation in polymorphism is also seen with the five Patr-B alleles that have been described. Although only a few individuals have been examined and it is unclear how representative captive chimpanzees are of the wild population, these findings suggest that class I polymorphism in chimpanzees is limited compared to that in humans.

TABLE 2  Amino acid differences of PatrA locus products

A.
Position    9   62   66   67   70   71   97   99  137  144  156  163  197  212
PatrA2      F    E    S    A    H    S    I    Y    D    K    R    T    H    E
PatrA1      F    Q    S    A    H    S    I    Y    D    K    Q    T    H    E
PatrA3      Y    Q    N    M    S    A    I    Y    A    K    L    R    H    V
PatrA4      S    E    S    V    S    A    L    F    D    Q    L    T    R    E

B.
          CH26   CH25   A108
CH25         2
A108        10      9
A126         9     10     11

Only positions of amino acid differences between the four PatrA locus products are shown (A). In the original table, symbols mark residues with side chains postulated to interact with peptides, with T-cell receptors, or with both; these markers are not reproduced here. Below (B), pairwise differences between the amino acid sequences of the PatrA heavy chains are shown.

The presence of A1/A3/A11 alleles in both species shows that this family of alleles was present in the common ancestor of chimpanzees and humans. The absence in chimpanzees of alleles from the other "human" HLA-A families could have come about in the following ways. First, these families were not present in the common ancestor and arose subsequently and specifically in the human line. Second, they were present in the common ancestor but were not present in the individuals that formed the chimpanzee lineage. Third, they were in the common ancestor, entered the chimpanzee

lineage, and were subsequently lost. Fourth, the sample may have been too small for other alleles to be expected. It is possible that the chimpanzee may have been through a recent population bottleneck, as has been proposed for the cheetah [46]. Such an argument does not readily explain the diversity in chimpanzee alleles of the A1/A3/A11 family, unless one argues that the cause of the population contraction specifically selected for these alleles. Further studies on HLA-A homologues from the chimpanzee and other ape species should further resolve the question as to the time of origin of the other families of alleles.

The pattern of amino acid substitution in chimpanzee class I heavy chains is analogous to that seen with HLA-A,B,C heavy chains and indicates that similar types of selective pressures have been in operation. For example, comparison of the four Patr-A members of the A1/A3/A11 family shows that most differences are at positions which, from the A2 and Aw68 crystal structures, are postulated to participate in the interactions with peptides and T-cell receptors (Table 2). The Patr-A1 (CH25) and Patr-A2 (CH26) heavy chains differ by only two amino acids, but these substitutions are at positions 62 and 156, which have frequently been shown to be critical in determining patterns of T-cell recognition in humans and mice.

HLA-A2 is by far the most frequent of the HLA-A,B,C antigens, being found in over 40% of individuals in many human populations. Even though many subtypes have been found, there is a predominant allele, HLA-A2.1 (A*0201), which accounts for the majority of "A2 antigens." Thus the HLA-A locus could be considered either as "semimonomorphic" or as partially polymorphic. By comparison, no allele predominates in a similar fashion at the HLA-B locus. The question we would like to consider is whether the high frequency of A2 is due to a particular reason or is the result of chance, as suggested by Dr. Masatoshi Nei at the recent conference on the primate MHC. This is an issue of general importance, as so much of our understanding of the structure and function of HLA-A,B,C and other class I and II molecules is based on studies of HLA-A2. The ease of purification and crystallization, the strong antigenicity, and the inability to protect against natural killing are additional unusual features that mark HLA-A2.

HLA-A2 subtypes and the related HLA-A28 subtypes form the A2/A28 family of HLA-A locus alleles. Serology and analysis of cDNA clones agree in their total lack of evidence for A2/A28 alleles in chimpanzees. Alternative explanations for this phenomenon are: first, that recognizable A2/A28 alleles began to be formed on the human lineage only after separation of the chimpanzee lineage; second, that A2/A28-like alleles were present in the common ancestor of humans and chimpanzees but were either not taken into the chimpanzee lineage or subsequently eliminated from that line. The A1/A3/A11 family of alleles is well represented in both humans and chimpanzees and clearly was present in the common ancestral species.

Comparing alleles, it is apparent that A28 alleles are more closely related in sequence to the A1/A3/A11 family than are A2 alleles. For example, the α1 domain of Aw68.1 differs from that of A3.1 by only three substitutions, at positions 9, 62, and 63, whereas A2 differs by four substitutions, at positions 62, 66, 70, and 74. In the amino-terminal 36 residues of the α2 domain, Aw68.1 differs from A3.1 by only one substitution, at position 97, whereas A2.1 differs from A3.1 at five positions. In the carboxy-terminal 56 residues of α2, A2 and Aw68.1 differ at only one position, whereas A3.1 differs by five residues from A2.1 and by six residues from Aw68.1. This pattern of similarity between A2.1 and Aw68.1 and differences with A3.1 extends throughout α3. Thus Aw68.1 is in essence a composite of A2.1 and A3.1; it may represent an evolutionary intermediate in the derivation of A2 from an A3-like molecule or the subsequent product of exchange events between already formed A2 and A3 alleles. Evidence that can sway interpretation to one or other of these possibilities will come from analysis of class I molecules in other primate species.
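The pairwise counts reported in Table 2B can be rechecked directly from the variable positions listed in Table 2A. The following is a minimal sketch in Python, not part of the original analysis. It assumes that the residue rows of Table 2A follow the same order as the listed positions, and that CH26, CH25, A108, and A126 correspond to PatrA2, PatrA1, PatrA3, and PatrA4 respectively (the text states this only for CH25 and CH26). The function names are illustrative.

```python
from itertools import combinations

# Variable positions and residues transcribed from Table 2A.
# Assumption: residue rows are in the same order as the position list.
POSITIONS = [9, 62, 66, 67, 70, 71, 97, 99, 137, 144, 156, 163, 197, 212]
RESIDUES = {
    "PatrA2": "FESAHSIYDKRTHE",  # CH26
    "PatrA1": "FQSAHSIYDKQTHE",  # CH25
    "PatrA3": "YQNMSAIYAKLRHV",  # A108 (assumed correspondence)
    "PatrA4": "SESVSALFDQLTRE",  # A126 (assumed correspondence)
}


def differing_positions(a, b):
    """Return the positions at which heavy chains a and b carry different residues."""
    return [
        pos
        for pos, x, y in zip(POSITIONS, RESIDUES[a], RESIDUES[b])
        if x != y
    ]


def pairwise_counts():
    """Count amino acid differences for every pair of PatrA heavy chains."""
    return {
        (a, b): len(differing_positions(a, b))
        for a, b in combinations(RESIDUES, 2)
    }


if __name__ == "__main__":
    for (a, b), n in pairwise_counts().items():
        print(f"{a} vs {b}: {n} differences at {differing_positions(a, b)}")
```

Under these assumptions the counts agree with Table 2B, and the only two positions separating PatrA1 from PatrA2 are 62 and 156, as stated in the text.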

REFERENCES

1. Ivanyi P: Interspecies MHS relationships studied by serological and cellular cross-reactions. In Reisfeld, Ferrone (eds): Current Trends in Histocompatibility. New York, Plenum Press, 1980.
2. Klein J, Figueroa F: Evolution of the major histocompatibility complex. CRC Crit Rev Immunol 6:295, 1986.
3. Lawlor DA, Zemmour J, Ennis PD, Parham P: Evolution of class I MHC genes and proteins: From natural selection to thymic selection. Annu Rev Immunol 8:23, 1990.
4. Rogers JH: Mouse histocompatibility-related genes are not conserved in other mammals. EMBO J 4:749, 1985.
5. Kindt TJ, Singer D: Class I major histocompatibility genes in vertebrate species: What is the common denominator? Immunol Res 6:57, 1987.
6. Ennis PD, Jackson AP, Parham P: Molecular cloning of bovine class I MHC cDNA. J Immunol 141:643, 1988.
7. Dupont B (ed): Immunobiology of HLA, vol 2. New York, Springer-Verlag, 1989.
8. Hasegawa M, Kishino H, Yano T-a: Man's place in Hominoidea as inferred from molecular clocks of DNA. J Mol Evol 26:132, 1987.
9. Sibley CG, Ahlquist JE: The phylogeny of the hominoid primates, as indicated by DNA-DNA hybridization. J Mol Evol 20:2, 1984.
10. Lewin R: DNA clock conflict continues. Science 241:1756, 1988.
11. Diamond JM: DNA-based phylogenies of the three chimpanzees. Nature 332:685, 1988.
12. Orr HT, DeMars R: Mapping of class I DNA sequences within the human major histocompatibility complex. Immunogenetics 18:489, 1983.
13. Chimini G, Pontarotti P, Nguyen C, Toubert A, Boretto J, Jordan BR: The chromosome region containing the highly polymorphic HLA class I genes displays limited large scale variability in the human population. EMBO J 7:395, 1988.
14. Parham P, Lomen CE, Lawlor DA, Ways JP, Holmes N, Coppin HL, Salter RD, Wan AM, Ennis P: The nature of polymorphism in HLA-A,B,C molecules. Proc Natl Acad Sci USA 85:4005, 1988.
15. Parham P, Lawlor DA, Lomen CE, Ennis PD: Diversity and diversification of HLA-A,B,C alleles. J Immunol 142:3937, 1989.
16. Bjorkman PJ, Parham P: Structure, function and diversity of class I major histocompatibility molecules. Annu Rev Biochem 59:253, 1990.
17. Holmes N, Ennis P, Wan AM, Denney D, Parham P: Multiple genetic mechanisms have contributed to the generation of the HLA-A2/A28 family of class I MHC molecules. J Immunol 139:936, 1987.
18. Kato K, Trapani JA, Allopenna J, Dupont B, Yang SY: Molecular analysis of the serologically defined HLA-Aw19 antigens: A genetically distinct family of HLA-A antigens comprising A29, A31, A32 and Aw33, but probably not A30. J Immunol 143:3371, 1989.
19. Nathenson SG, Geliebter J, Pfaffenbach GM, Zeff RA: Murine major histocompatibility complex class I mutants: Molecular analysis and structure-function implications. Annu Rev Immunol 4:471, 1986.
20. Balner H, Gabb BW, D'Amaro J, van Vreeswijk W, Visser TP: Evidence for two linked loci controlling the serologically defined leukocyte antigens of chimpanzees (ChLA-A). Tissue Antigens 4:313, 1974.
21. Balner H, van Vreeswijk W, Roger JH, D'Amaro J: The major histocompatibility complex of chimpanzees: Identification of several new antigens controlled by the A and B loci of ChLA. Tissue Antigens 12:1, 1978.
22. Dorf ME, Metzgar RS: Serological relationships of human, chimpanzee and gorilla lymphocytes isoantigens. In Terasaki P (ed): Histocompatibility Testing 1970. Copenhagen, Munksgaard, 1970.
23. Metzgar RS, Ward FE, Seigler HF: Study of the HL-A system in chimpanzees. In Dausset, Colombani (eds): Histocompatibility Testing 1972. Copenhagen, Munksgaard, 1972.
24. Lawlor DA, Ward FE, Ennis PD, Jackson AP, Parham P: HLA-A,B polymorphisms predate the divergence of humans and chimpanzees. Nature 335:268, 1988.
25. Lawlor DA, Warren E, Ward FE, Parham P: Comparison of class I MHC alleles in humans and apes. Immunol Rev 113:147, 1990.
26. Malissen M, Malissen B, Jordan BR: Exon/intron organization and complete nucleotide sequence of an HLA gene. Proc Natl Acad Sci USA 79:893, 1982.
27. Davis MM, Bjorkman PJ: T-cell antigen receptor genes and T-cell recognition. Nature 334:395, 1988.
28. Connolly JM, Potter TA, Wormstall E-M, Hansen TH: The Lyt-2 molecule recognizes residues in the class I α3 domain in allogeneic cytotoxic T-cell responses. J Exp Med 168:325, 1988.
29. Salter RD, Norment AM, Chen BP, Clayberger C, Krensky AM, Littman DR, Parham P: Polymorphism in the α3 domain of HLA-A molecules affects binding to CD8. Nature 338:345, 1989.
30. Wu TT, Kabat EA: An analysis of the sequences of the variable region of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J Exp Med 132:211, 1970.
31. Bjorkman PJ, Saper MA, Samraoui B, Bennett WS, Strominger JL, Wiley DC: Structure of the human class I histocompatibility antigen, HLA-A2. Nature 329:506, 1987.
32. Bjorkman PJ, Saper MA, Samraoui B, Bennett WS, Strominger JL, Wiley DC: The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens. Nature 329:512, 1987.
33. Garrett TPJ, Saper MA, Bjorkman PJ, Strominger JL, Wiley DC: Specificity pockets for the side chains of peptide antigens in HLA-Aw68. Nature 342:692, 1989.
34. López de Castro JA: HLA-B27 and HLA-A2 subtypes: Structure, evolution and function. Immunol Today 10:239, 1989.
35. Choo SY, St. John T, Orr HT, Hansen JA: Molecular analysis of the variant alloantigen HLA-B27d (HLA-B*2703) identifies a unique single amino acid substitution. Hum Immunol 21:209, 1988.
36. Castaño R, Ezquerra A, Doménech N, López de Castro JA: An HLA-A2 population variant with structural polymorphism in the α3 region. Immunogenetics 27:345, 1988.
37. Holmes N, Parham P: Exon shuffling in vivo can generate novel HLA class I molecules. EMBO J 4:2849, 1985.
38. Mattson DH, Handy DE, Bradley DA, Coligan ME, Cowan EP, Biddison WE: DNA sequences of the genes that encode the CTL-defined HLA-A2 variants M7 and DK1. Immunogenetics 26:190, 1987.
39. Cianetti L, Testa U, Scotto L, La Valle R, Simeone A: Three new class I HLA alleles: Structure of mRNAs and alternative mechanisms of processing. Immunogenetics 29:80, 1989.
40. Ennis PD, Zemmour J, Salter RD, Parham P: Rapid cloning of HLA-A,B cDNA using the polymerase chain reaction. Proc Natl Acad Sci USA 87:2833, 1990.
41. Wan AM, Ennis P, Parham P, Holmes N: The primary structure of HLA-A32 suggests a region involved in formation of the Bw4/Bw6 epitopes. J Immunol 137:3671, 1986.
42. Müller CA, Engler-Blum G, Gekeler V, Steiert I, Weiss E, Schmidt H: Genetic and serological heterogeneity of the supertypic HLA-B locus specificities Bw4 and Bw6. Immunogenetics 30:200, 1989.
43. Flavell RA, Allen H, Burkly LC, Sherman DH, Waneck GL, Widera G: Molecular biology of the H-2 histocompatibility complex. Science 233:437, 1986.
44. Parham P, Benjamin PJ, Chen BP, Clayberger C, Ennis PD, Krensky AM, Lawlor DA, Littman DR, Norment AM, Orr HT, Salter RD, Zemmour J: Diversity of class I HLA molecules: Functional and evolutionary interactions with T cells. Cold Spring Harbor Symp Quant Biol 54:529, 1990.
45. Mayer WE, Jonker M, Klein D, Ivanyi P, van Seventer G, Klein J: Nucleotide sequences of chimpanzee MHC class I alleles: Evidence for trans-species mode of evolution. EMBO J 7:2765, 1988.
46. O'Brien SJ, Roelke ME, Marker L, Newman A, Winkler CA, Meltzer D, Colly L, Evermann JF, Bush M, Wildt DW: Genetic basis for species vulnerability in the cheetah. Science 227:1428, 1985.
47. Klein J, Bontrop RE, Dawkins RL, Erlich HA, Gyllensten UB, Heise ER, Jones PP, Parham P, Wakeland EK, Watkins DI: Nomenclature for the major histocompatibility complexes of different species: A proposal. Immunogenetics 31:217, 1990.
